> stats.checkpoint(Durations.seconds(100)); // changed to 100, the default is 10
>
> The checkpoint time keeps increasing significantly: the first checkpoint takes 10s, the second 30s, the third 70s, and it keeps growing. Why is it so high when I increase the checkpoint interval? It takes me only 5 seconds to finish a micro-batch of the same size. What kind of job runs in a checkpoint, and why does it keep increasing?
>
> 2/ When I change the data checkpoint interval, it seems that the default interval works more stably.
> Cc: Adrian Tanase, "user@spark.apache.org"
> Subject: Re: Spark Streaming data checkpoint performance
>
> "trackStateByKey" is about to be added in 1.6 to resolve the performance
> issue of "updateStateByKey". You can take a look at
> https://issues.apache.org/jira/browse/SPARK-2629 and
> https://github.com/apache/spark/pull/9256
>
Nice! Thanks for sharing, I wasn’t aware of the new API.
Left some comments on the JIRA and design doc.
-adrian
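For context on why the new API helps (a toy Python simulation under my own assumptions, not Spark's actual implementation): with `updateStateByKey` the update function is invoked for every key held in state on every batch, even keys absent from that batch, while `trackStateByKey` only processes the keys that appear in the incoming batch.

```python
def update_all_keys(state, batch):
    # updateStateByKey-style: the update function runs for every
    # tracked key each batch, even keys absent from the batch
    touched = 0
    for key in set(state) | set(batch):
        state[key] = state.get(key, 0) + batch.get(key, 0)
        touched += 1
    return touched

def update_batch_keys(state, batch):
    # trackStateByKey-style: only keys in the current batch are touched
    touched = 0
    for key, n in batch.items():
        state[key] = state.get(key, 0) + n
        touched += 1
    return touched

state_a = {f"k{i}": 1 for i in range(1000)}
state_b = {f"k{i}": 1 for i in range(1000)}
batch = {"k0": 1, "k1": 2}
print(update_all_keys(state_a, batch))    # -> 1000 keys touched
print(update_batch_keys(state_b, batch))  # -> 2 keys touched
```

With a large, long-lived state, touching every key on every batch is what makes `updateStateByKey` increasingly expensive as the state grows.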
From: Shixiong Zhu
Date: Tuesday, November 3, 2015 at 3:32 AM
To: Thúy Hằng Lê
Cc: Adrian Tanase, "user@spark.apache.org"
Subject: Re: Spark Streaming data checkpoint performance
You are correct, the default checkpointing interval is 10 seconds or your batch
interval, whichever is bigger. You can change it by calling .checkpoint(x) on your
resulting DStream.
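The rule as stated above can be sketched as a one-line helper (plain Python for illustration, not Spark code; `default_checkpoint_interval` is a hypothetical name, not a Spark API):

```python
def default_checkpoint_interval(batch_interval_s, minimum_s=10):
    # Mirrors the rule described in the thread: the default checkpoint
    # interval is the larger of ~10 seconds and the batch interval.
    return max(minimum_s, batch_interval_s)

print(default_checkpoint_interval(2))   # -> 10 (small batches get the 10s floor)
print(default_checkpoint_interval(30))  # -> 30 (large batches checkpoint per batch)
```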
For the rest, you are probably keeping an “all time” word count that grows
unbounded if you never remove words from the state.
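A rough sketch of that growth pattern (toy Python, not Spark code): every distinct word ever seen stays in the state forever, so the data serialized at each checkpoint keeps growing with it.

```python
from collections import Counter

def update_counts(state, batch):
    # analogous to an updateStateByKey word count: keys are
    # added to the state but never removed
    state.update(batch)
    return state

state = Counter()
for batch in (["a", "b"], ["b", "c"], ["d"]):
    state = update_counts(state, batch)

# The state holds every distinct word ever seen; checkpoint size
# (and thus checkpoint time) scales with this ever-growing state.
print(len(state))  # -> 4 distinct keys after 3 batches
print(state["b"])  # -> 2
```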
Hi Adrian,
Thanks for the information.
However, your two suggestions can't really work for me.
Accuracy is the most important aspect of my application, so keeping only
15-minute window stats or pruning out some of the keys is impossible.
I can change the checkpoint interval
"trackStateByKey" is about to be added in 1.6 to resolve the performance
issue of "updateStateByKey". You can take a look at
https://issues.apache.org/jira/browse/SPARK-2629 and
https://github.com/apache/spark/pull/9256