I am trying out StatefulNetworkWordCount from the latest Spark master branch.
When I run this example, I see an odd behaviour: if a key is repeated
within a batch, the output stream prints one record for each repetition.
For example, if I key in "ab" five times as input, it shows:

(ab,1)
(ab,2)
(ab,3)
(ab,4)
(ab,5)
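A rough way to see where the five records come from: the example's mapping
function is invoked once per input record, and whatever it returns is
emitted each time, so a key that appears five times in a batch yields five
outputs. The sketch below is a plain-Scala model of that behaviour with no
Spark dependency. `runBatch` and the `Map`-based state are hypothetical
stand-ins for Spark's internal state store, not its actual implementation.

```scala
// Plain-Scala model (no Spark) of a per-record mapping function with
// per-key state. `runBatch` is a hypothetical helper, not a Spark API.

// Same shape as the running sum in StatefulNetworkWordCount: add the new
// value to the previous state and return (new state, emitted record).
def mappingFunc(word: String, one: Option[Int], prev: Option[Int]): (Int, (String, Int)) = {
  val sum = one.getOrElse(0) + prev.getOrElse(0)
  (sum, (word, sum))
}

// Apply the mapping function to every record of one batch, in order,
// threading the per-key state through. One output is emitted per record.
def runBatch(batch: Seq[(String, Int)], state: Map[String, Int]): (Seq[(String, Int)], Map[String, Int]) =
  batch.foldLeft((Vector.empty[(String, Int)], state)) { case ((out, st), (word, one)) =>
    val (newState, emitted) = mappingFunc(word, Some(one), st.get(word))
    (out :+ emitted, st.updated(word, newState))
  }

// Five occurrences of "ab" in a single batch.
val (outputs, finalState) = runBatch(Seq.fill(5)(("ab", 1)), Map.empty)
outputs.foreach(println) // one line per input record: (ab,1) ... (ab,5)
```

In this model the state ends at 5, but every intermediate value is also
emitted, which matches the output shown above.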

Is it intended behaviour to show every occurrence of the word, or is it
a bug? As a user, I would expect only the last entry, (ab,5). Otherwise,
users have to put some logic in the application code to get the latest
value. I know we can do this using snapshots, but IMO the updated stream
should give us similar functionality.

Is there a reason for not doing this? That is, for a given key, if multiple
outputs are generated in a batch, only the last one should be returned.
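The behaviour proposed here (keep only the last record per key within a
batch) can be expressed as a post-processing step. Below is a plain-Scala
sketch, assuming a batch's outputs are available in the order they were
processed; `lastPerKey` is a hypothetical helper, not part of Spark. In a
real job, similar logic would have to be applied to each batch of the mapped
stream, or one could read `stateSnapshots()` instead, which emits the state
of all keys rather than only the ones updated in that batch.

```scala
// Hypothetical helper: keep only the last emitted record for each key,
// assuming `outputs` is in processing order.
def lastPerKey[K, V](outputs: Seq[(K, V)]): Seq[(K, V)] = {
  // Keys in order of first appearance, so the result order is stable.
  val keys = outputs.map(_._1).distinct
  // Building a Map lets later entries overwrite earlier ones for the same key.
  val latest = outputs.toMap
  keys.map(k => (k, latest(k)))
}

val batchOutput = Seq(("ab", 1), ("ab", 2), ("cd", 1), ("ab", 3))
println(lastPerKey(batchOutput)) // List((ab,3), (cd,1))
```

For a running count, "last" and "largest" coincide, so keeping the maximum
per key would give the same result without relying on record order.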

Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra
