RE: What is Seq[V] in updateStateByKey?

2014-05-01 Thread Adrian Mocanu
you confirm? Thanks again! -A -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: April-30-14 4:30 PM To: user@spark.apache.org Subject: Re: What is Seq[V] in updateStateByKey? S is the previous count, if any. Seq[V] are potentially many new counts. All of them have

Re: What is Seq[V] in updateStateByKey?

2014-05-01 Thread Tathagata Das
...@cloudera.com] Sent: April-30-14 4:30 PM To: user@spark.apache.org Subject: Re: What is Seq[V] in updateStateByKey? S is the previous count, if any. Seq[V] are potentially many new counts. All of them have to be added together to keep an accurate total. It's as if the count were 3, and I tell

Re: What is Seq[V] in updateStateByKey?

2014-04-30 Thread Sean Owen
S is the previous count, if any. Seq[V] are potentially many new counts. All of them have to be added together to keep an accurate total. It's as if the count were 3, and I tell you I've just observed 2, 5, and 1 additional occurrences -- the new count is 3 + (2+5+1) not 1 + 1. I butted in

Re: What is Seq[V] in updateStateByKey?

2014-04-30 Thread Tathagata Das
Yeah, I remember changing fold to sum in a few places, probably in testsuites, but missed this example I guess. On Wed, Apr 30, 2014 at 1:29 PM, Sean Owen so...@cloudera.com wrote: S is the previous count, if any. Seq[V] are potentially many new counts. All of them have to be added together

Re: What is Seq[V] in updateStateByKey?

2014-04-29 Thread Sean Owen
The original DStream is of (K,V). This function creates a DStream of (K,S). Each time slice brings one or more new V for each K. The old state S (can be different from V!) for each K -- possibly non-existent -- is updated in some way by a bunch of new V, to produce a new state S -- which also

Re: What is Seq[V] in updateStateByKey?

2014-04-29 Thread Tathagata Das
You may have already seen it, but I will mention it anyways. This example may help. https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala Here the state is essentially a running count of the words seen. So the value