SaintBacchus created SPARK-8163:
-----------------------------------

             Summary: Checkpoint mechanism does not work well when an error 
occurs under heavy streaming load
                 Key: SPARK-8163
                 URL: https://issues.apache.org/jira/browse/SPARK-8163
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.4.0
            Reporter: SaintBacchus
             Fix For: 1.5.0


I tested this with a Kafka DStream.
Sometimes the Kafka producer had pushed a lot of data to the Kafka brokers, 
and the Streaming receiver then pulled all of it without any rate limit.
For this first batch, Streaming could take 10 or more seconds to consume the 
data (the batch interval was 2 seconds).
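
For reference, a minimal sketch of the kind of setup described (the ZooKeeper 
quorum, consumer group, topic, and checkpoint directory are hypothetical 
placeholders):

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaCheckpointTest")
// 2-second batch interval, as in the scenario above
val ssc = new StreamingContext(conf, Seconds(2))
ssc.checkpoint("hdfs:///tmp/checkpoint") // hypothetical checkpoint directory

// Receiver-based Kafka DStream; no rate limit is configured, so the
// receiver pulls the accumulated backlog as fast as it can
val lines = KafkaUtils.createStream(
  ssc,
  "zkhost:2181",          // hypothetical ZooKeeper quorum
  "test-group",           // hypothetical consumer group
  Map("test-topic" -> 1)) // hypothetical topic, 1 receiver thread

lines.count().print()
ssc.start()
ssc.awaitTermination()
{code}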
In more detail, here is what Streaming was doing at that moment:
the SparkContext was busy running the job; the JobGenerator was still sending 
new batches to the StreamingContext, which wrote them to the checkpoint files; 
and the Receiver was still busy receiving data from Kafka, also tracking those 
events in the checkpoint.
Then an unexpected error occurred, which shut down the Streaming application.
We then tried to recover the application from the checkpoint files. But since 
the StreamingContext had already recorded the next few batches, it was 
recovered from the last recorded batch. So Streaming had already missed the 
first batch and did not know which data the Receiver had actually consumed.
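
For context, recovery from checkpoint files is normally done with 
StreamingContext.getOrCreate; a minimal sketch, assuming the same hypothetical 
checkpoint directory as above:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Builds a fresh StreamingContext; only invoked when no checkpoint exists
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("KafkaCheckpointTest")
  val ssc = new StreamingContext(conf, Seconds(2))
  ssc.checkpoint("hdfs:///tmp/checkpoint")
  // ... set up the Kafka DStream as in the sketch above ...
  ssc
}

// On restart after the failure, the context is rebuilt from the checkpoint
// files, i.e. from the batches the JobGenerator had already recorded; data
// the Receiver consumed for the unfinished first batch is effectively lost.
val ssc = StreamingContext.getOrCreate("hdfs:///tmp/checkpoint", createContext _)
ssc.start()
ssc.awaitTermination()
{code}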
Setting spark.streaming.concurrentJobs=2 can avoid this problem, but some 
applications cannot do that.
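
A sketch of that workaround, plus the receiver rate limit whose absence 
triggers the oversized first batch (the maxRate value is an arbitrary 
example); both must be set before the StreamingContext is created:

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("KafkaCheckpointTest")
  // Workaround described above: allow two streaming jobs to run
  // concurrently so a slow first batch does not block later ones
  .set("spark.streaming.concurrentJobs", "2")
  // Alternatively, cap the receiver's ingestion rate (records/sec) so a
  // backlog cannot produce a batch that far exceeds the batch interval
  .set("spark.streaming.receiver.maxRate", "10000")
{code}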


