So if you want a running total from the beginning of the stream until the end of time, the interface is updateStateByKey; if you only need the total over a particular span, you can construct broader windows that cover several smaller windows/batches. A rough sketch of the first approach is below.
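
Something like the following (untested sketch, not a drop-in solution): it assumes `topic` and `ssc` are the DStream and StreamingContext from your snippet, and the checkpoint path is just a placeholder. Note that stateful operations like updateStateByKey require checkpointing to be enabled.

    import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits (needed pre-1.3)

    // updateStateByKey requires a checkpoint directory; this path is a placeholder.
    ssc.checkpoint("/tmp/spark-checkpoint")

    // Fold each batch's per-score count into the running total for that score.
    val updateTotals: (Seq[Long], Option[Long]) => Option[Long] =
      (batchCounts, total) => Some(total.getOrElse(0L) + batchCounts.sum)

    val score_count_all_time = topic.map(r => r._2)   // the integer score
      .countByValue()                                 // (score, count in this batch)
      .updateStateByKey(updateTotals)                 // (score, count so far)

    score_count_all_time.print()

Each print then shows the cumulative (score, count) pairs rather than the per-window ones.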
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Nov 14, 2014 at 9:17 AM, jay vyas <jayunit100.apa...@gmail.com> wrote:

> I would think this should be done at the application level.
> After all, the core functionality of Spark Streaming is to capture RDDs in
> some real-time interval and process them, not to aggregate their results.
>
> But maybe there is a better way...
>
> On Thu, Nov 13, 2014 at 8:28 PM, SK <skrishna...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the following code to generate the (score, count) pairs for
>> each window:
>>
>>     val score_count_by_window = topic.map(r => r._2)  // r._2 is the integer score
>>       .countByValue()
>>
>>     score_count_by_window.print()
>>
>> E.g. the output for one window is as follows, which means that within the
>> DStream for that window there are 2 records with score 0, 3 with score 1,
>> and 1 with score -1:
>>
>>     (0, 2)
>>     (1, 3)
>>     (-1, 1)
>>
>> I would like to get the aggregate count for each score over all windows
>> until the program terminates. I tried countByValueAndWindow(), but the
>> result is the same as countByValue() (i.e. it produces only per-window
>> counts). reduceByWindow() also does not produce the result I am
>> expecting. What is the correct way to sum up the counts over multiple
>> windows?
>>
>> thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-getting-total-count-over-all-windows-tp18888.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> --
> jay vyas