Can you confirm?
Thanks again!
-A
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: April-30-14 4:30 PM
To: user@spark.apache.org
Subject: Re: What is Seq[V] in updateStateByKey?
S is the previous count, if any. Seq[V] are potentially many new
counts. All of them have to be added together to keep an accurate
total. It's as if the count were 3, and I tell you I've just observed
2, 5, and 1 additional occurrences -- the new count is 3 + (2+5+1) not
1 + 1.
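That arithmetic maps directly onto the update function you pass to updateStateByKey. A minimal sketch in plain Scala (the function name is illustrative, not from the thread):

```scala
// Update function for updateStateByKey: combine the previous count
// (if any) with all of the new counts observed in this batch.
// Seq[V] here is Seq[Int] (the new counts), S is Int (the running total).
def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(state.getOrElse(0) + newValues.sum)

// Sean's example: previous count 3, new observations 2, 5 and 1.
val updated = updateCount(Seq(2, 5, 1), Some(3)) // Some(3 + (2+5+1)) = Some(11)

// A key seen for the first time has no previous state.
val first = updateCount(Seq(1), None) // Some(1)
```

Note the newValues.sum: summing all of Seq[V] is exactly what keeps the total accurate, as opposed to only looking at one element.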
I'll butt in: yeah, I remember changing fold to sum in a few places, probably in the test suites, but missed this example, I guess.
On Wed, Apr 30, 2014 at 1:29 PM, Sean Owen so...@cloudera.com wrote:
S is the previous count, if any. Seq[V] are potentially many new
counts. All of them have to be added together
The original DStream is of (K, V). This function creates a DStream of (K, S). Each time slice brings one or more new Vs for each K. The old state S (which can be a different type from V!) for each K -- possibly non-existent -- is updated in some way by a bunch of new Vs, to produce a new state S -- which can also be absent: return None and the key's state is dropped.
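To illustrate that S need not be the same type as V, here is a hypothetical update function where V is a String event and S is the set of distinct events seen so far (names are mine, not from the thread):

```scala
// V = String (new events arriving in this batch),
// S = Set[String] (all distinct events seen so far for this key).
// Returning None instead would drop the key's state entirely.
def updateDistinct(newEvents: Seq[String],
                   seen: Option[Set[String]]): Option[Set[String]] =
  Some(seen.getOrElse(Set.empty[String]) ++ newEvents)

// With Spark Streaming this would be wired up as (sketch only):
//   val stateStream = pairDStream.updateStateByKey(updateDistinct _)
val state = updateDistinct(Seq("a", "b", "a"), Some(Set("b", "c")))
// state == Some(Set("a", "b", "c"))
```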
You may have already seen it, but I will mention it anyway. This example may help.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala
Here the state is essentially a running count of the words seen, so the update function adds the new counts for each word to its previous total.