Re: Spark Streaming data checkpoint performance

2015-11-07 Thread trung kien
ng >>>>> It took me 5 seconds to finish a micro-batch of the same size. Why is it >>>>> so high? What kind of job runs in the checkpoint? >>>>> Why does it keep increasing? >>>>> >>>>> 2/ When I change the data checkpoint interval

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Thúy Hằng Lê
val is more stable. > > On Nov 4, 2015 9:08 PM, "Adrian Tanase" <atan...@adobe.com> wrote: > >> Nice! Thanks for sharing, I wasn’t aware of the new API. >> >> Left some comments on the JIRA and design doc. >> >> -adrian >> >> From: S

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Aniket Bhatnagar
kpoint interval? >> >> It seems that the default interval is more stable. >> >> On Nov 4, 2015 9:08 PM, "Adrian Tanase" <atan...@adobe.com> wrote: >> >>> Nice! Thanks for sharing, I wasn’t aware of the new API. >>> >>> Lef

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Aniket Bhatnagar
stats.checkpoint(Durations.seconds(100)); // changed to 100, >>>> default is 10 >>>> >>>> The checkpoint time keeps increasing significantly: the first checkpoint is >>>> 10s, the second is 30s, the third is 70s ... and it keeps increasing :) >>>> Why

Re: Spark Streaming data checkpoint performance

2015-11-06 Thread Thúy Hằng Lê
s 30s, the third is 70s ... and it keeps increasing :) >>> Why is it so high when the checkpoint interval is increased? >>> >>> It seems that the default interval is more stable. >>> >>> On Nov 4, 2015 9:08 PM, "Adrian Tanase" <atan...@adobe.com> wrote:

Re: Spark Streaming data checkpoint performance

2015-11-05 Thread Thúy Hằng Lê
> Cc: Adrian Tanase, "user@spark.apache.org" > Subject: Re: Spark Streaming data checkpoint performance > > "trackStateByKey" is about to be added in 1.6 to resolve the performance > issue of "updateStateByKey". You can take a look at > https://issues.apache.org/jira/browse/SPARK-2629 and > https://github.com/apache/spark/pull/9256 >

Re: Spark Streaming data checkpoint performance

2015-11-04 Thread Adrian Tanase
Spark Streaming data checkpoint performance "trackStateByKey" is about to be added in 1.6 to resolve the performance issue of "updateStateByKey". You can take a look at https://issues.apache.org/jira/browse/SPARK-2629 and https://github.com/apache/spark/pull/9256

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Adrian Tanase
From: Thúy Hằng Lê Date: Monday, November 2, 2015 at 7:20 AM To: "user@spark.apache.org" Subject: Spark Streaming data checkpoint performance JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Thúy Hằng Lê
er 2, 2015 at 7:20 AM > To: "user@spark.apache.org" > Subject: Spark Streaming data checkpoint performance > > JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, > Durations.seconds(2)); >

Re: Spark Streaming data checkpoint performance

2015-11-02 Thread Shixiong Zhu
"trackStateByKey" is about to be added in 1.6 to resolve the performance issue of "updateStateByKey". You can take a look at https://issues.apache.org/jira/browse/SPARK-2629 and https://github.com/apache/spark/pull/9256

Spark Streaming data checkpoint performance

2015-11-01 Thread Thúy Hằng Lê
Hi Spark gurus, I am evaluating Spark Streaming. In my application I need to maintain cumulative statistics (e.g. the total running word count), so I need to call the updateStateByKey function on every micro-batch. After setting this up, I observed the following behaviors: * The Processing Time
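
For readers following the thread, the running word count described above can be sketched in plain Java with no Spark dependency. This is only an illustration of the per-key update logic, not the poster's code: the class `RunningWordCount` and the methods `update` and `applyBatch` are hypothetical names. It assumes, as is the usual understanding of updateStateByKey, that the update function is applied to every key held in state on each batch, including keys that received no new values, which is one reason the per-batch cost grows with the size of the state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RunningWordCount {
    // Same shape as an update function for a running count:
    // (new values for one key in this batch, previous state) -> new state.
    static int update(List<Integer> newValues, Integer previous) {
        int sum = (previous == null) ? 0 : previous;
        for (int v : newValues) {
            sum += v;
        }
        return sum;
    }

    // Applies one micro-batch of words to the running totals.
    // Assumption: every key already in the state is visited on every batch,
    // even when the batch holds no new values for it.
    static Map<String, Integer> applyBatch(Map<String, Integer> state, List<String> batch) {
        // Group the batch by word, emitting a 1 per occurrence.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String word : batch) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }

        // Visit the union of old keys and keys seen in this batch.
        Set<String> keys = new HashSet<>(state.keySet());
        keys.addAll(grouped.keySet());

        Map<String, Integer> next = new HashMap<>();
        for (String key : keys) {
            List<Integer> values = grouped.getOrDefault(key, Collections.emptyList());
            next.put(key, update(values, state.get(key)));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();
        state = applyBatch(state, Arrays.asList("a", "b", "a"));
        state = applyBatch(state, Arrays.asList("a", "c"));
        System.out.println(state);
    }
}
```

In this sketch the second batch still touches the key "b" although no new "b" arrived, so the work per batch scales with the total number of keys rather than with the batch size.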