[ https://issues.apache.org/jira/browse/SPARK-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu resolved SPARK-18905.
----------------------------------
    Resolution: Fixed
      Assignee: Nan Zhu
 Fix Version/s: 2.2.0
                2.1.1

> Potential Issue of Semantics of BatchCompleted
> ----------------------------------------------
>
>                 Key: SPARK-18905
>                 URL: https://issues.apache.org/jira/browse/SPARK-18905
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>            Reporter: Nan Zhu
>            Assignee: Nan Zhu
>             Fix For: 2.1.1, 2.2.0
>
> The current implementation of Spark Streaming considers a batch completed
> regardless of the results of its jobs
> (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L203).
>
> Consider the following case: a micro-batch contains two jobs, each reading
> from a different Kafka topic. One of the jobs fails due to a problem in the
> user-defined logic, after the other one has finished successfully.
>
> 1. The main thread in the Spark Streaming application executes the line
>    mentioned above,
> 2. another thread (the checkpoint writer) creates a checkpoint file
>    immediately after that line is executed,
> 3. and then, due to the current error-handling mechanism in Spark Streaming,
>    the StreamingContext is closed
>    (https://github.com/apache/spark/blob/1169db44bc1d51e68feb6ba2552520b2d660c2c0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala#L214).
>
> When the user recovers from that checkpoint file, the JobSet containing the
> failed job has already been removed (treated as completed) before the
> checkpoint was constructed, so the data being processed by the failed job
> would never be reprocessed.
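The race in steps 1-3 can be sketched with a deliberately simplified toy model. This is not Spark's actual code; the names (`Job`, `JobSet`, `handleJobCompletion`, `checkpointSnapshot`) only loosely mirror the real classes, and the checkpoint thread is reduced to a single snapshot call:

```scala
// Toy model (NOT actual Spark code) of the semantics described above.
case class Job(id: Int, failed: Boolean)
case class JobSet(batchTime: Long, jobs: Seq[Job])

object SimplifiedScheduler {
  // JobSets still awaiting completion; the checkpoint writer only sees these.
  private var pendingJobSets = Map.empty[Long, JobSet]

  def submit(jobSet: JobSet): Unit =
    pendingJobSets += jobSet.batchTime -> jobSet

  // The problematic semantics: the JobSet is removed ("completed") once all
  // its jobs have *finished*, whether or not any of them failed.
  def handleJobCompletion(jobSet: JobSet): Unit =
    pendingJobSets -= jobSet.batchTime

  // What the checkpoint thread would record if it snapshotted right now.
  def checkpointSnapshot: Set[Long] = pendingJobSets.keySet
}

// One job of the batch failed, yet after handleJobCompletion the batch is no
// longer pending, so a checkpoint taken now (and any recovery from it) has no
// record that its data still needs reprocessing.
val batch = JobSet(1000L, Seq(Job(1, failed = false), Job(2, failed = true)))
SimplifiedScheduler.submit(batch)
SimplifiedScheduler.handleJobCompletion(batch)
assert(SimplifiedScheduler.checkpointSnapshot.isEmpty)
```

Under these assumptions, tracking job *success* (not just termination) before removing the JobSet would keep the failed batch visible to the checkpoint writer.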
> I might have missed something in the checkpoint thread or in this
> handleJobCompletion()... or it is a potential bug.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)