Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-29 Thread Shixiong(Ryan) Zhu
Could you post a screenshot of the Streaming DAG and also the driver log? It would be great if you had a simple producer for us to debug with. On Mon, Feb 29, 2016 at 1:39 AM, Abhishek Anand wrote: > Hi Ryan, > > It's not working even after removing the reduceByKey. > >

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-29 Thread Abhishek Anand
Hi Ryan, It's not working even after removing the reduceByKey. So, basically, I am doing the following:
- reading from Kafka
- flatMap inside transform
- mapWithState
- rdd.count on the output of mapWithState
But to my surprise I still don't see checkpointing taking place. Is there any restriction to

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-28 Thread Shixiong(Ryan) Zhu
Sorry, I forgot to tell you that you should call `rdd.count()` for "reduceByKey" as well. Could you try it and see if it works? On Sat, Feb 27, 2016 at 1:17 PM, Abhishek Anand wrote: > Hi Ryan, > > I am using mapWithState after doing reduceByKey. > > I am right

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-27 Thread Abhishek Anand
Hi Ryan, I am using mapWithState after doing reduceByKey. I am now using mapWithState as you suggested and triggering the count manually, but I am still unable to see any checkpointing taking place. In the DAG I can see that the reduceByKey operations for the previous batches are also being

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Shixiong(Ryan) Zhu
Hey Abhi, Using reduceByKeyAndWindow and mapWithState will trigger the bug in SPARK-6847. Here is a workaround to trigger the checkpoint manually: JavaMapWithStateDStream<...> stateDStream = myPairDstream.mapWithState(StateSpec.function(mappingFunc)); stateDStream.foreachRDD(new

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Shixiong(Ryan) Zhu
Hey Ted, As the fix for SPARK-6847 changes the semantics of Streaming checkpointing, it doesn't go into branch-1.6. A workaround is calling `count` to trigger the checkpoint manually. For example: val dstream = ... // dstream is an operator needing to be checkpointed. dstream.foreachRDD(rdd =>

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Ted Yu
The fix for SPARK-6847 is not in branch-1.6. Should the fix be ported to branch-1.6? Thanks > On Feb 22, 2016, at 11:55 AM, Shixiong(Ryan) Zhu wrote: > > Hey Abhi, > > Could you post how you use mapWithState? By default, it should do > checkpointing every 10 batches.

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Abhishek Anand
Hi Ryan, Reposting the code. Basically, my use case is something like this: I am receiving the web impression logs and may get the notify (listening from Kafka) for those impressions in the same interval (for me it's 1 min) or any later interval (up to 2 hours). Now, when I receive notify for a

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Shixiong(Ryan) Zhu
Hey Abhi, Could you post how you use mapWithState? By default, it should do checkpointing every 10 batches. However, there is a known issue that prevents mapWithState from checkpointing in some special cases: https://issues.apache.org/jira/browse/SPARK-6847 On Mon, Feb 22, 2016 at 5:47 AM,
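
To make the "every 10 batches" default concrete for this thread's 1-minute batch interval, here is a minimal plain-Java sketch (no Spark dependency). The class name, method names, and the constant are hypothetical and exist only for illustration; the model is simplified, since real checkpoint timing also depends on when the stream started, and it does not apply when the SPARK-6847 bug prevents checkpointing.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: models the checkpoint cadence only.
public class CheckpointCadence {
    // "Every 10 batches" as quoted in the thread; an assumption of this sketch.
    public static final int DEFAULT_BATCHES_PER_CHECKPOINT = 10;

    // Wall-clock time between checkpoints = batch interval * batches per checkpoint.
    public static Duration checkpointInterval(Duration batchInterval, int batchesPerCheckpoint) {
        return batchInterval.multipliedBy(batchesPerCheckpoint);
    }

    // 1-based batch numbers at which a checkpoint would be expected.
    public static List<Integer> checkpointBatches(int totalBatches, int batchesPerCheckpoint) {
        List<Integer> result = new ArrayList<>();
        for (int batch = batchesPerCheckpoint; batch <= totalBatches; batch += batchesPerCheckpoint) {
            result.add(batch);
        }
        return result;
    }

    public static void main(String[] args) {
        Duration batchInterval = Duration.ofMinutes(1); // batch interval mentioned in the thread
        Duration interval = checkpointInterval(batchInterval, DEFAULT_BATCHES_PER_CHECKPOINT);
        System.out.println("Minutes between checkpoints: " + interval.toMinutes());
        System.out.println("Checkpointed batches in the first 35: "
                + checkpointBatches(35, DEFAULT_BATCHES_PER_CHECKPOINT));
    }
}
```

Under these assumptions a checkpoint would be expected about every 10 minutes, so a DAG that keeps reaching back further than the last 10 batches is a hint that checkpointing is not happening.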

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-22 Thread Abhishek Anand
Any insights on this one? Thanks!!! Abhi On Mon, Feb 15, 2016 at 11:08 PM, Abhishek Anand wrote: > I am now trying to use mapWithState in the following way using some > example code. But, by looking at the DAG it does not seem to checkpoint > the state and when

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-15 Thread Abhishek Anand
I am now trying to use mapWithState in the following way using some example code. But, by looking at the DAG it does not seem to checkpoint the state and when restarting the application from checkpoint, it re-partitions all the previous batches' data from Kafka. static Function3

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-13 Thread Abhishek Anand
Does mapWithState checkpoint the data? When my application goes down and is restarted from checkpoint, will mapWithState need to recompute the previous batches' data? Also, to use mapWithState I will need to upgrade my application as I am using version 1.4.0 and mapWithState isn't supported

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-13 Thread Ted Yu
mapWithState supports checkpointing. There have been some bug fixes since the release of 1.6.0, e.g. SPARK-12591 (NullPointerException using checkpointed mapWithState with KryoSerializer), which is in the upcoming 1.6.1. Cheers On Sat, Feb 13, 2016 at 12:05 PM, Abhishek Anand

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-13 Thread Sebastian Piu
If you don't want to upgrade, your only option will be updateStateByKey. On 13 Feb 2016 8:48 p.m., "Ted Yu" wrote: > mapWithState supports checkpointing. > > There have been some bug fixes since the release of 1.6.0, > e.g. > SPARK-12591 NullPointerException using checkpointed

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-11 Thread Sebastian Piu
Looks like mapWithState could help you? On 11 Feb 2016 8:40 p.m., "Abhishek Anand" wrote: > Hi All, > > I have a use case as follows in my production environment, where I am > listening from Kafka with a slideInterval of 1 min and a windowLength of 2 > hours. > > I have a

Stateful Operation on JavaPairDStream Help Needed !!

2016-02-11 Thread Abhishek Anand
Hi All, I have a use case as follows in my production environment, where I am listening from Kafka with a slideInterval of 1 min and a windowLength of 2 hours. I have a JavaPairDStream where for each key I am getting the same key but with a different value, which might appear in the same batch or