[jira] [Updated] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645:

Description:
We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process. The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162)
  at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173)
  at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123)
  at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259)
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:156)
  ... 14 more
{quote}
The bug can be easily reproduced: modify {color:red} val metadataRoot = "hdfs://localhost:8020/tmp/checkpoint" {color} in StreamTest and then run the test {color:red} sort after aggregate in complete mode {color} in StreamingAggregationSuite.

The main reason is that when a streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file (the latest delta file generated before the restart, named "currentBatchId.delta"). In my opinion, this is a bug. If you also consider this a bug, I can fix it.

was: We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process.
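For context, here is a minimal sketch of the failure mode (the paths and object name are hypothetical, and it assumes fs.defaultFS points at an HDFS cluster): on HDFS, FileSystem.rename(src, dst) returns false when the destination already exists, which HDFSBackedStateStoreProvider surfaces as the "Failed to rename" IOException above.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RenameFailureSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val dir = new Path("/tmp/checkpoint/state/0/136")
    val temp = new Path(dir, "temp-1234567890")
    val delta = new Path(dir, "2.delta")

    fs.mkdirs(dir)
    fs.create(temp).close()
    fs.create(delta).close() // delta file left over from the run before the restart

    // After a restart, batch 2 is recomputed and committed again: renaming the
    // new temp file onto the existing 2.delta fails on HDFS.
    if (!fs.rename(temp, delta)) {
      throw new java.io.IOException(s"Failed to rename $temp to $delta")
    }
  }
}
{code}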
[jira] [Comment Edited] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872921#comment-15872921 ] guifeng edited comment on SPARK-19645 at 2/18/17 3:17 AM: -- [~zsxwing] OK, I think the solution is to check whether the current versionId's delta file exists before renaming the delta file. When the file exists, just skip the rename operation (we may also need to delete the current versionId's snapshot).

was (Author: guifengl...@gmail.com): [~zsxwing] OK, I think the solution is to check whether the delta file exists before renaming the delta file. When the file exists, just skip the rename operation.
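A minimal sketch of the guard proposed above (the helper name is illustrative, not Spark's actual private API): if version.delta already exists, the restarted job has merely recomputed a batch that was committed before the restart, so the recomputed temp file can be discarded.

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}

object CommitGuardSketch {
  def commitDeltaFile(fs: FileSystem, tempFile: Path, versionFile: Path): Unit = {
    if (fs.exists(versionFile)) {
      // This version was already committed before the restart: keep the
      // existing delta file and drop the recomputed one instead of renaming.
      fs.delete(tempFile, false)
    } else if (!fs.rename(tempFile, versionFile)) {
      throw new java.io.IOException(s"Failed to rename $tempFile to $versionFile")
    }
  }
}
{code}

Note that this relies on the recomputed batch being deterministic, so that the pre-restart delta file and the recomputed one are interchangeable.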
[jira] [Comment Edited] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872921#comment-15872921 ] guifeng edited comment on SPARK-19645 at 2/18/17 2:37 AM: -- [~zsxwing] OK, I think the solution is to check whether the delta file exists before renaming the delta file. When the file exists, just skip the rename operation.

was (Author: guifengl...@gmail.com): [~zsxwing] OK, I think the solution is to check whether the delta file exists before renaming the delta file. When the file exists, skip the rename operation.
[jira] [Commented] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872921#comment-15872921 ] guifeng commented on SPARK-19645: - [~zsxwing] OK, I think the solution is to check whether the delta file exists before renaming the delta file. When the file exists, skip the rename operation.
[jira] [Commented] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871716#comment-15871716 ] guifeng commented on SPARK-19645: - So I think the current workaround is to delete the delta file (if it exists) and then rename the file.
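A sketch of that workaround (an illustrative helper, not Spark's actual code): remove a stale version.delta left over by the previous run so the subsequent rename can succeed. Unlike a single overwriting rename, the delete and the rename are two separate operations, so a crash between them could leave the version with neither file.

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}

object DeleteThenRenameSketch {
  def overwriteByRename(fs: FileSystem, tempFile: Path, versionFile: Path): Unit = {
    if (fs.exists(versionFile)) {
      fs.delete(versionFile, false) // drop the stale delta file from the previous run
    }
    if (!fs.rename(tempFile, versionFile)) {
      throw new java.io.IOException(s"Failed to rename $tempFile to $versionFile")
    }
  }
}
{code}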
[jira] [Commented] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871627#comment-15871627 ] guifeng commented on SPARK-19645: - But Spark master doesn't set the rename options that support overwrite, so I think the rename will still fail. Even with Hadoop 2.6+, it still fails.
[jira] [Issue Comment Deleted] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645: Comment: was deleted (was: But Spark master doesn't set the rename options that support overwrite, so I think the rename will still fail. Even with Hadoop 2.6+, it still fails.)
[jira] [Comment Edited] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871552#comment-15871552 ] guifeng edited comment on SPARK-19645 at 2/17/17 10:36 AM: --- But Spark master doesn't set the rename options that support overwrite, so I think the rename will still fail. [~srowen] Even with Hadoop 2.6+, it still fails.

was (Author: guifengl...@gmail.com): But Spark master doesn't set the rename options that support overwrite, so I think the rename will still fail.
[jira] [Comment Edited] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871552#comment-15871552 ] guifeng edited comment on SPARK-19645 at 2/17/17 9:50 AM: -- But Spark master doesn't set the rename options that support overwrite, so I think the rename will still fail.

was (Author: guifengl...@gmail.com): But Spark master doesn't set rename options that support overwrite.
[jira] [Commented] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871552#comment-15871552 ] guifeng commented on SPARK-19645: - But Spark master doesn't set rename options that support overwrite.
[jira] [Commented] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871536#comment-15871536 ] guifeng commented on SPARK-19645: - Spark's default Hadoop version is Hadoop 2.2, whose rename method has no overwrite option, so we need to delete the file (if it exists) and then rename it.
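For reference, a sketch of an overwriting rename (assuming a Hadoop version where the FileContext API is available): FileContext.rename accepts Options.Rename.OVERWRITE and replaces an existing destination in a single call, unlike the plain FileSystem.rename(src, dst) used here.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileContext, Options, Path}

object OverwriteRenameSketch {
  def renameOverwriting(conf: Configuration, src: Path, dst: Path): Unit = {
    val fc = FileContext.getFileContext(conf)
    // Throws an IOException on failure instead of returning a boolean.
    fc.rename(src, dst, Options.Rename.OVERWRITE)
  }
}
{code}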
[jira] [Updated] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645:

Description:
We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process. The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
  ...
Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
  ... 14 more
{quote}
The bug can be easily reproduced by restarting a previous streaming job. The main reason is that when a streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file (the latest delta file generated before the restart, named "currentBatchId.delta"). In my opinion, this is a bug. If you also consider this a bug, I can fix it.

was: We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process.
[jira] [Updated] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645:

Description:
We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process. The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
  ...
Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
  ... 14 more
{quote}
The bug can be easily reproduced by restarting a previous streaming job. The main reason is that when a streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file (generated before the restart and named "currentBatchId.delta"). In my opinion, this is a bug. If you also consider this a bug, I can fix it.

was: We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process.
[jira] [Updated] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645:

Description:
We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process. The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
  ...
Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
  ... 14 more
{quote}
The bug can be easily reproduced by restarting a previous streaming job. The main reason is that when a streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file named "currentBatchId.delta". In my opinion, this is a bug. If you also consider this a bug, I can fix it.

was: We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process.
[jira] [Updated] (SPARK-19645) structured streaming job restart bug
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645: Summary: structured streaming job restart bug (was: structured streaming job restart)
[jira] [Updated] (SPARK-19645) structured streaming job restart
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645:

Description:
We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process. The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136]
  ...
Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta
  ... 14 more
{quote}
The bug can be easily reproduced by restarting a previous streaming job. The main reason is that when a streaming job is restarted, Spark recomputes the WAL offsets and generates the same HDFS delta file whose name is "currentBatchId.delta". In my opinion, this is a bug. If you also consider this a bug, I can fix it.

was: We are trying to use Structured Streaming in production, but there is currently a bug in the streaming job restart process.
The following is the concrete error message: {quote} Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259) at
[jira] [Updated] (SPARK-19645) structured streaming job restart
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645: Description: We are trying to use Structured Streaming in product, however currently there exists a bug refer to the process of streaming job restart. The following is the concrete error message: {quote} Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259) at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:156) ... 14 more {quote} The bug can be easily reproduce when restart previous streaming job, and the main reason is that when restart streaming job spark will recompute WAL offsets and generate the same hdfs delta file whose name is "currentBatchId.delta". In my opinion, this is a bug, and if you guy consider that this is a bug also, I can fix it. was: We are trying to use Structured Streaming in product, however currently there exists a bug refer to the process of streaming job restart. 
The following is the concrete error message: {quote} Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259) at
[jira] [Updated] (SPARK-19645) structured streaming job restart
[ https://issues.apache.org/jira/browse/SPARK-19645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guifeng updated SPARK-19645: Description: We are trying to use Structured Streaming in product, however currently there exists a bug refer to the process of streaming job restart. The following is the concrete error message: {quote} Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to rename /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/Pipeline_112346-continueagg-bxaxs/state/0/136/2.delta at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259) at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:156) ... 14 more {quote} The bug can be easily reproduce when restart previous streaming job, and the main reason is that when restart streaming job spark will recompute wal offsets and generate the same hdfs delta file whose name is "currentBatchId.delta". In my opinion, this is a bug, and if you guy consider that this is a bug also, I can fix it. was: We are trying to use Structured Streaming in product, however currently there exists a bug refer to the process of streaming job restart. 
The following is the concrete error message: {quote} Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173) at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to rename /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136/2.delta at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259)
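The report notes that the bug is reproduced simply by restarting a streaming job. A minimal reproduction sketch along those lines, assuming Spark 2.1, a socket source (fed by, e.g., {{nc -lk 9999}}), and an HDFS checkpoint directory that is reused across runs; the host, port, and checkpoint URL are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

object RestartReproSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-19645-repro")
      .getOrCreate()

    // Any source feeding a stateful aggregation will do; a socket source
    // keeps the sketch self-contained.
    val counts = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .groupBy("value")
      .count()

    // The checkpoint directory is deliberately reused across runs; pointing
    // it at HDFS matters, since HDFS rename fails on an existing target.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "hdfs://namenode:8020/tmp/checkpoint")
      .start()

    query.awaitTermination(10000L) // process a batch or two, then exit
    query.stop()
  }
}
{code}

Running the same program a second time restarts the query from the identical checkpoint; the recovered batch rewrites the latest {{<version>.delta}} file and should trigger the rename failure reported here.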
[jira] [Created] (SPARK-19645) structured streaming job restart
guifeng created SPARK-19645:
-------------------------------

             Summary: structured streaming job restart
                 Key: SPARK-19645
                 URL: https://issues.apache.org/jira/browse/SPARK-19645
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.1.0
            Reporter: guifeng
            Priority: Critical


We are trying to use Structured Streaming in production; however, there is currently a bug in the streaming job restart process.
The following is the concrete error message:
{quote}
Caused by: java.lang.IllegalStateException: Error committing version 2 into HDFSStateStore[id = (op=0, part=136), dir = /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136]
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:162)
  at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:173)
  at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(StatefulAggregate.scala:123)
  at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:64)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to rename /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136/temp--5345709896617324284 to /tmp/PipelinePolicy_112346-continueagg-bxaxs/state/0/136/2.delta
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$commitUpdates(HDFSBackedStateStoreProvider.scala:259)
  at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:156)
  ... 14 more
{quote}
The bug can easily be reproduced by restarting a previous streaming job. The root cause is that, on restart, Spark recomputes the WAL offsets and generates an HDFS delta file with the same name, "currentBatchId.delta", which already exists from the run before the restart, so the rename into it fails. In my opinion, this is a bug, and if you also consider it a bug, I can fix it.
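Since the reporter offers a fix: one possible direction, sketched below under the assumption that the commit boils down to renaming a temp file into place, is to make the commit idempotent by tolerating a delta file left over from the pre-restart run. This is a sketch only, not necessarily how Spark ultimately addressed the issue; {{commitDelta}} and its parameters are hypothetical names.

{code:scala}
import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

object IdempotentCommitSketch {
  // Tolerate a delta file left behind by the pre-restart run: the restarted
  // job is re-committing the same version, so replacing the old file keeps
  // the commit idempotent instead of failing the whole task.
  def commitDelta(fs: FileSystem, tempDelta: Path, finalDelta: Path): Unit = {
    if (fs.exists(finalDelta)) {
      fs.delete(finalDelta, false) // recursive = false: it is a single file
    }
    if (!fs.rename(tempDelta, finalDelta)) {
      throw new IOException(s"Failed to rename $tempDelta to $finalDelta")
    }
  }
}
{code}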