[ 
https://issues.apache.org/jira/browse/SPARK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578161#comment-14578161
 ] 

SaintBacchus commented on SPARK-8163:
-------------------------------------

Hi [~sowen], the description above is the problem exactly as I ran into it.
Since my English is poor, I think I may not have explained it clearly, so here are the steps again:
First, the producer pushed a lot of data to the Kafka brokers.
Second, after a while (about 10s), I shut down the streaming application.
Third, I recovered it from the checkpoint files.
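
A rough sketch of the kind of driver I used (hosts, paths and the topic name below are placeholders, not the real ones):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpoint-recovery-repro")
  val ssc = new StreamingContext(conf, Seconds(2))      // 2-second batch interval, as in the issue
  ssc.checkpoint(checkpointDir)
  // Receiver-based Kafka stream (zkQuorum, group id and topic are placeholders)
  val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "test-group", Map("test-topic" -> 1))
  stream.count().print()
  ssc
}

// On the first run this creates a fresh context; after the application is killed
// and restarted, getOrCreate rebuilds the context from the checkpoint files.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
{code}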

The result is that Streaming skipped many batches.

I really think this is a serious problem, so I am reopening this issue.

> CheckPoint mechanism did not work well when error happened in big streaming
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-8163
>                 URL: https://issues.apache.org/jira/browse/SPARK-8163
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.4.0
>            Reporter: SaintBacchus
>
> I tested this with a Kafka DStream.
> Sometimes the Kafka producer had pushed a lot of data to the Kafka brokers, and then 
> the Streaming receiver pulled this data without any rate limit.
> On this first batch, Streaming could take 10 or more seconds to consume the 
> data (the batch interval was 2 seconds).
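> For reference, the receiver rate can be capped with a configuration like the 
> following, which was NOT set in this scenario (the value is only an example):
> {code}
> import org.apache.spark.SparkConf
> // Example only: limit each receiver to 10000 records/sec so one batch cannot balloon
> val conf = new SparkConf().set("spark.streaming.receiver.maxRate", "10000")
> {code}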
> Let me describe in more detail what Streaming was doing at this moment:
> the SparkContext was doing its job; the JobGenerator was still sending new batches to 
> the StreamingContext, which wrote them to the checkpoint files; and 
> the Receiver was still busy receiving data from Kafka and also recorded 
> these events into the checkpoint.
> Then an unexpected error occurred, which shut down the Streaming 
> application.
> We then wanted to recover the application from the checkpoint files. But since 
> the StreamingContext had already recorded the next few batches, it was recovered 
> from the last batch. So Streaming had already missed the first batch and 
> did not know what data had actually been consumed by the Receiver.
> Setting spark.streaming.concurrentJobs=2 can avoid this problem, but some 
> applications cannot do this.
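> As a sketch, this workaround amounts to something like the following (example only):
> {code}
> import org.apache.spark.SparkConf
> // Allow a second streaming job to run concurrently with the long-running first batch
> val conf = new SparkConf().set("spark.streaming.concurrentJobs", "2")
> {code}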



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
