[ https://issues.apache.org/jira/browse/SPARK-22752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287305#comment-16287305 ]
Marco Gaido commented on SPARK-22752:
-------------------------------------

Hi [~zsxwing], thanks for looking at this. The checkpointDir is on HDFS. Its contents are:
{noformat}
metadata    0.1 kB
commits
offsets
sources
state
{noformat}
where {{commits}} contains files {{0}} through {{13}}, {{offsets}} contains files {{0}} through {{14}}, {{sources}} contains only the file {{0/0}}, and {{state}} contains only the directory {{0/0}}, which is empty. Do you need any other information? Do you have any idea what the root cause of the issue might be? Thanks.

> FileNotFoundException while reading from Kafka
> ----------------------------------------------
>
> Key: SPARK-22752
> URL: https://issues.apache.org/jira/browse/SPARK-22752
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.2.0
> Reporter: Marco Gaido
>
> We are running a stateful structured streaming job which reads from Kafka and
> writes to HDFS, and we are hitting this exception:
> {noformat}
> 17/12/08 05:20:12 ERROR FileFormatWriter: Aborting job null.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0
> (TID 4, hcube1-1n03.eng.hortonworks.com, executor 1):
> java.lang.IllegalStateException: Error reading delta file
> /checkpointDir/state/0/0/1.delta of HDFSStateStoreProvider[id = (op=0,
> part=0), dir = /checkpointDir/state/0/0]: /checkpointDir/state/0/0/1.delta
> does not exist
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$updateFromDeltaFile(HDFSBackedStateStoreProvider.scala:410)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:362)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
> at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
> at scala.Option.getOrElse(Option.scala:121)
> {noformat}
> Of course, the file doesn't exist in HDFS.
> And in the {{state/0/0}} directory there are no files at all, while we do have some files in the
> {{commits}} and {{offsets}} folders. I am not sure about the reason for this behavior. It seems to
> happen the second time the job is started, after the first run failed, so it looks like task
> failures can trigger it. Or it might be related to watermarks, since there were some problems with
> the incoming data that caused the watermark to filter out all the incoming data.
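For context, a query of the kind described above looks roughly like the sketch below. This is a minimal sketch, not the actual job from this report: the broker address, topic name, window and watermark durations, and output path are placeholders; only the {{/checkpointDir}} checkpoint location mirrors the path in the stack trace.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kafka-to-hdfs-stateful").getOrCreate()
import spark.implicits._

// Kafka source (broker and topic are placeholders).
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")

// Stateful windowed aggregation with a watermark; its state is persisted as
// delta files under <checkpointLocation>/state/<operatorId>/<partitionId>,
// e.g. /checkpointDir/state/0/0/1.delta in the stack trace above.
val counts = input
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"value")
  .count()

// HDFS (parquet) sink; the checkpoint location is the directory whose
// contents are listed in the comment above.
val query = counts.writeStream
  .format("parquet")
  .option("path", "/output/dir")
  .option("checkpointLocation", "/checkpointDir")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime("1 minute"))
  .start()

query.awaitTermination()
{code}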