>>>>> It took me 5 seconds to finish the same-size micro-batch, so why is
>>>>> the checkpoint time so high? What kind of job runs during a
>>>>> checkpoint, and why does it keep increasing?
>>>>>
>>>>> 2/ When I change the data checkpoint interval:
>>>>> stats.checkpoint(Durations.seconds(100)); // changed to 100; the default is 10
>>>>>
>>>>> The checkpoint duration keeps increasing significantly: the first
>>>>> checkpoint takes 10s, the second 30s, the third 70s ... and it keeps
>>>>> growing :) Why is it so high when the checkpoint interval is increased?
>>>>>
>>>>> It seems that the default interval works more stably.
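To make the reported setup concrete, here is a minimal, hypothetical sketch of the knobs involved, not the poster's actual job. The default data checkpoint interval is max(10 seconds, batch interval), and the Spark Streaming programming guide suggests roughly 5-10x the sliding interval as a starting point; raising it means more lineage accumulates between checkpoints, which, together with unbounded state growth (see Adrian's reply below), is consistent with checkpoints that take longer each time.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public final class CheckpointIntervalDemo {
      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
            .setAppName("checkpoint-interval-demo")
            .setMaster("local[2]");

        // Hypothetical 10-second batch interval, matching the numbers in the thread.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        jssc.checkpoint("/tmp/spark-checkpoints"); // where checkpoint data is written

        // Stand-in source; the poster's "stats" stream presumably comes from elsewhere.
        JavaDStream<String> stats = jssc.socketTextStream("localhost", 9999);

        // Override the data checkpoint interval for this one stream. The default is
        // max(10 seconds, batch interval); the tuning advice above suggests keeping
        // this at a small multiple of the sliding interval.
        stats.checkpoint(Durations.seconds(100));

        stats.print();
        jssc.start();
        jssc.awaitTermination();
      }
    }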
"Adrian Tanase" <atan...@adobe.com> wrote:
> Nice! Thanks for sharing, I wasn’t aware of the new API.
>
> Left some comments on the JIRA and design doc.
>
> -adrian
>
> From: Shixiong Zhu
> Date: Tuesday, November 3, 2015 at 3:32 AM
> To: Thúy Hằng Lê
Nice! Thanks for sharing, I wasn’t aware of the new API.

Left some comments on the JIRA and design doc.

-adrian

From: Shixiong Zhu
Date: Tuesday, November 3, 2015 at 3:32 AM
To: Thúy Hằng Lê
Cc: Adrian Tanase, user@spark.apache.org
Subject: Re:
You are correct, the default checkpointing interval is 10 seconds or your batch size, whichever is bigger. You can change it by calling .checkpoint(x) on your resulting DStream.

For the rest, you are probably keeping an “all time” word count that grows unbounded if you never remove words from the state.
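A concrete illustration of the pruning Adrian describes, as a minimal sketch rather than his actual code: with updateStateByKey in the Java API, returning an absent value drops the key from the state. The NOISE_THRESHOLD rule is hypothetical, and updateStateByKey also requires a checkpoint directory to be set.

    import java.util.List;

    import com.google.common.base.Optional; // Spark 1.x Java API; 2.x+ uses org.apache.spark.api.java.Optional
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    public final class PrunableWordCount {
      // Hypothetical rule: forget words that arrived in no batch this interval
      // and whose total is still below a noise threshold.
      private static final long NOISE_THRESHOLD = 5L;

      // Running per-word count that can also shrink: returning Optional.absent()
      // removes the key from the state, keeping the "all time" count bounded.
      // Requires jssc.checkpoint(dir) to have been set.
      public static JavaPairDStream<String, Long> runningCounts(
          JavaPairDStream<String, Long> wordCounts) {
        return wordCounts.updateStateByKey(
            (List<Long> newValues, Optional<Long> state) -> {
              long sum = state.isPresent() ? state.get() : 0L;
              for (Long v : newValues) {
                sum += v;
              }
              if (newValues.isEmpty() && sum < NOISE_THRESHOLD) {
                return Optional.absent(); // prune this key
              }
              return Optional.of(sum);
            });
      }
    }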
Hi Adrian,

Thanks for the information. However, your two suggestions don't really work for me: accuracy is the most important aspect of my application, so keeping only a 15-minute window of stats, or pruning out some of the keys, is impossible for my application. I can change the checkpoint interval ...
"trackStateByKey" is about to be added in 1.6 to resolve the performance
issue of "updateStateByKey". You can take a look at
https://issues.apache.org/jira/browse/SPARK-2629 and
https://github.com/apache/spark/pull/9256
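For reference, the trackStateByKey API proposed in SPARK-2629 was renamed before release and shipped in Spark 1.6 as mapWithState. Below is a minimal Java sketch of the released API, assuming the same word-count pair stream as above; unlike updateStateByKey, it processes only keys that appear in the current batch, which is the performance fix Shixiong refers to. On 1.6 the Optional import is Guava's; later versions use org.apache.spark.api.java.Optional.

    import org.apache.spark.api.java.Optional; // Spark 2.x+; on 1.6 this was Guava's Optional
    import org.apache.spark.api.java.function.Function3;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.State;
    import org.apache.spark.streaming.StateSpec;
    import org.apache.spark.streaming.api.java.JavaMapWithStateDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    public final class MapWithStateSketch {
      // Unlike updateStateByKey, which rewrites the whole state every batch,
      // mapWithState touches only the keys present in the current batch.
      public static JavaMapWithStateDStream<String, Long, Long, Tuple2<String, Long>>
          runningCounts(JavaPairDStream<String, Long> wordCounts) {

        Function3<String, Optional<Long>, State<Long>, Tuple2<String, Long>> mappingFunc =
            (word, count, state) -> {
              long sum = (count.isPresent() ? count.get() : 0L)
                  + (state.exists() ? state.get() : 0L);
              if (!state.isTimingOut()) {
                state.update(sum); // updating a timing-out state would throw
              }
              return new Tuple2<>(word, sum);
            };

        return wordCounts.mapWithState(
            StateSpec.function(mappingFunc)
                // Evict keys that have been idle for an hour (value is illustrative).
                .timeout(Durations.minutes(60)));
      }
    }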
Yeah, use streaming to gather the incoming logs and write them to a log file, then run a Spark job every 5 minutes to process the counts. Got it. Thanks a lot.
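The streaming half of that design could look like the following hypothetical sketch (assuming a recent Spark Java API): persist each micro-batch under a time-stamped directory, and let a separately scheduled batch job compute the counts from those files every 5 minutes. The path is made up.

    import org.apache.spark.streaming.api.java.JavaDStream;

    public final class LogArchiver {
      // Write each non-empty micro-batch to its own directory, keyed by the
      // batch time; a cron'd batch job then aggregates these files.
      public static void archive(JavaDStream<String> logLines) {
        logLines.foreachRDD((rdd, time) -> {
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile("hdfs:///logs/raw/batch-" + time.milliseconds());
          }
        });
      }
    }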
On 07:07, Mon, 26 Jan 2015 Tobias Pfeiffer t...@preferred.jp wrote:
Hi,

On Tue, Jan 20, 2015 at 8:16 PM, balu.naren <balu.na...@gmail.com> wrote:

I am a beginner to Spark Streaming, so I have a basic doubt regarding checkpoints. My use case is to calculate the number of unique users by day. I am using reduceByKeyAndWindow for this, where my window duration is 24 hours.
Thank you Jerry.

Does the window operation create new RDDs for each slide duration? I am asking this because I see a constant increase in memory even when no logs are received. If checkpointing is not the answer, is there any alternative that you would suggest?

On Tue, Jan 20, 2015 at 7:08 PM,
... for you? I think it's better and easier for you to change your implementation rather than rely on Spark to handle this.

Thanks,
Jerry

From: Balakrishnan Narendran [mailto:balu.na...@gmail.com]
Sent: Friday, January 23, 2015 12:19 AM
To: Shao, Saisai
Cc: user@spark.apache.org
Subject: Re: spark
Maybe you are using the wrong approach. Try something like HyperLogLog or bitmap structures, as found, for instance, in Redis; they are much smaller.

On 22 Jan 2015 at 17:19, Balakrishnan Narendran <balu.na...@gmail.com> wrote:
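Along the same lines, Spark itself ships a HyperLogLog-based estimator, countApproxDistinct. A minimal sketch, assuming a DStream of user ids; the window/slide values and names are illustrative, and for very long windows, folding sketches into state or an external store such as Redis avoids keeping the whole window cached.

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public final class ApproxUniqueUsers {
      // Estimate distinct users over a 24-hour window with Spark's built-in
      // HyperLogLog-based countApproxDistinct; 0.01 is the target relative
      // standard deviation (roughly 1% error), so no exact set of user ids
      // needs to be held in memory for the aggregation itself.
      public static void printApproxUniques(JavaDStream<String> userIds) {
        userIds
            .window(Durations.minutes(24 * 60), Durations.minutes(60)) // 24h window, 1h slide
            .foreachRDD(rdd -> {
              long uniques = rdd.countApproxDistinct(0.01);
              System.out.println("approx unique users in window: " + uniques);
            });
      }
    }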
Hi,

It seems you have a very large window (24 hours), so the increase in memory is expected: the window operation caches the RDDs that fall within the window in memory. For your requirement, memory therefore has to be large enough to hold 24 hours of data.

I don't think checkpointing in Spark ...
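For completeness, the incremental form of windowed aggregation the thread is circling: passing an inverse "subtract" function to reduceByKeyAndWindow makes Spark update the previous window's result instead of re-reducing 24 hours of data on every slide. It still requires checkpointing to be enabled, and it counts events per key rather than unique users; uniqueness needs the sketch-based approach above. A hypothetical sketch:

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    public final class IncrementalWindowCount {
      // Per-user event counts over a 24-hour window sliding hourly. The second
      // function "subtracts" the data that just left the window, so each slide
      // costs one hour of data instead of 24. Requires jssc.checkpoint(dir).
      public static JavaPairDStream<String, Long> windowedCounts(
          JavaPairDStream<String, Long> userEvents) {
        return userEvents.reduceByKeyAndWindow(
            (a, b) -> a + b, // values entering the window
            (a, b) -> a - b, // values leaving the window
            Durations.minutes(24 * 60),
            Durations.minutes(60));
      }
    }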