[ https://issues.apache.org/jira/browse/SPARK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
SaintBacchus reopened SPARK-8163:
---------------------------------

> CheckPoint mechanism did not work well when error happened in big streaming
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-8163
>                 URL: https://issues.apache.org/jira/browse/SPARK-8163
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.4.0
>            Reporter: SaintBacchus
>
> I tested this with a Kafka DStream.
> Sometimes the Kafka producer had pushed a lot of data to the Kafka brokers,
> and the Streaming receiver then pulled that data without any rate limit.
> For this first batch, Streaming could take 10 or more seconds to consume the
> data (the batch interval was 2 seconds).
> To describe in more detail what Streaming was doing at that moment:
> the SparkContext was doing its job; the JobGenerator was still sending new
> batches to the StreamingContext, and the StreamingContext wrote them to the
> checkpoint files; the Receiver was still busy receiving data from Kafka and
> also tracked these events in the checkpoint.
> Then an unexpected error occurred, shutting down the Streaming application.
> We then wanted to recover the application from the checkpoint files. But
> since the StreamingContext had already recorded the next few batches, it
> recovered from the last batch. So Streaming had already missed the first
> batch and did not know what data had actually been consumed by the Receiver.
> Setting spark.streaming.concurrentJobs=2 can avoid this problem, but some
> applications cannot do this.
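A minimal sketch of the scenario and of the spark.streaming.concurrentJobs
workaround mentioned above, assuming a receiver-based Kafka stream on Spark
1.4; the checkpoint directory, ZooKeeper quorum, consumer group, topic map,
and maxRate value are hypothetical placeholders, not taken from the report:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Spark8163Sketch {
  val checkpointDir = "hdfs:///tmp/spark-8163-checkpoint" // hypothetical path

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("SPARK-8163-sketch")
      // Workaround from the report: allow a second concurrent job so the
      // JobGenerator does not checkpoint batches far ahead of execution.
      .set("spark.streaming.concurrentJobs", "2")
      // Receiver-side rate limit; without one, the first batch can absorb
      // the whole Kafka backlog and run far longer than the 2s interval.
      .set("spark.streaming.receiver.maxRate", "10000")

    val ssc = new StreamingContext(conf, Seconds(2)) // 2s batches, as in the report
    ssc.checkpoint(checkpointDir)

    // Receiver-based Kafka stream, the style this report exercises.
    val lines = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",       // hypothetical ZooKeeper quorum
      "spark-8163-group",   // hypothetical consumer group
      Map("events" -> 1),   // hypothetical topic -> receiver thread count
      StorageLevel.MEMORY_AND_DISK_SER_2
    ).map(_._2)

    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // After a crash, getOrCreate rebuilds the context from the checkpoint.
    // Per the report, recovery resumes from the last batch the JobGenerator
    // checkpointed rather than from the unfinished first batch, so the data
    // the Receiver had consumed for that batch is lost.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

The sketch only illustrates the timing gap the report describes: the
JobGenerator checkpoints batch metadata independently of whether the
corresponding jobs have finished, so a long first batch followed by a crash
leaves the checkpoint pointing past data that was received but never
processed.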