Could be a bug. Can you share code and data that I can use to reproduce this?
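
Something as small as the following would be enough to start from. This is only a rough sketch of the update function as I understand it from your description below: I am assuming Int values, and the field names and the `UpdateSketch` object are made up for illustration. It runs without Spark and just calls the function twice, first with the values 1 and 2 and no prior state, then with an empty Seq and the state from the first call.

```scala
// Hypothetical state type mirroring the StateClass(sum, count, Seq[V])
// described in the quoted message; V is assumed to be Int here.
case class StateClass(sum: Int, count: Int, values: Seq[Int])

object UpdateSketch {
  // Same shape as the function passed to updateStateByKey:
  // (Seq[V], Option[S]) => Option[S]
  def update(values: Seq[Int], state: Option[StateClass]): Option[StateClass] = {
    // Start from an empty state when the key has not been seen before.
    val prev = state.getOrElse(StateClass(0, 0, Seq.empty))
    // Accumulate sum and count, and keep the values seen in this call.
    Some(StateClass(prev.sum + values.sum, prev.count + values.size, values))
  }

  def main(args: Array[String]): Unit = {
    // First call: both values for the key arrive together, no prior state.
    val first = update(Seq(1, 2), None)
    // Second call: no new values, prior state carried over.
    val second = update(Seq.empty, first)
    println(first)
    println(second)
  }
}
```

If your real function differs from this, or the second call shows up within the same batch rather than in a later one, that is the part I would need to see.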
TD

On May 2, 2014 9:49 AM, "Adrian Mocanu" <amoc...@verticalscope.com> wrote:

> Has anyone else noticed that *sometimes* the same tuple calls update
> state function twice?
>
> I have 2 tuples with the same key in 1 RDD part of DStream: RDD[ (a,1),
> (a,2) ]
>
> When the update function is called the first time Seq[V] has data: 1, 2
> which is correct: StateClass(3,2, ArrayBuffer(1, 2))
>
> Then right away (in my output I see this) the same key is used and the
> function is called again but this time Seq is empty: StateClass(3,2,
> ArrayBuffer( ))
>
> In the update function I also save Seq[V] to state so I can see it in the
> RDD. I also show a count and sum of the values.
>
> StateClass(sum, count, Seq[V])
>
> Why is the update function called with empty Seq[V] on the same key when
> all values for that key have been already taken care of in a previous
> update?
>
> -Adrian