I would think this should be done at the application level.
After all, the core functionality of Spark Streaming is to slice the stream
into RDDs at some batch interval and process them -
not to aggregate their results across windows.
But maybe there is a better way...
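For example, if you want the streaming job itself to carry the running
totals, updateStateByKey keeps per-key state across batches. A rough,
untested sketch; it assumes topic is the same DStream as in your snippet,
ssc is your StreamingContext, and the checkpoint path is just a
placeholder:

// Running (score -> total count) across all batches via updateStateByKey.
// Stateful ops need a checkpoint directory (path below is a placeholder).
ssc.checkpoint("/tmp/streaming-checkpoint")

val running_score_counts = topic.map(r => (r._2, 1L)) // (score, 1) per record
  .updateStateByKey[Long] { (newCounts: Seq[Long], state: Option[Long]) =>
    Some(state.getOrElse(0L) + newCounts.sum)
  }

running_score_counts.print() // cumulative (score, count) since the job started

If the totals only need to live on the driver (say, printed at shutdown),
a plain map updated inside foreachRDD would do as well.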
On Thu, Nov 13, 2014 at 8:28 PM, SK skrishna...@gmail.com wrote:
Hi,
I am using the following code to generate the (score, count) for each
window:
val score_count_by_window = topic.map(r => r._2) // r._2 is the integer score
  .countByValue()
score_count_by_window.print()
E.g. the output for a window is as follows, which means that within the
DStream for that window there are 2 records with score 0, 3 with score 1,
and 1 with score -1.
(0, 2)
(1, 3)
(-1, 1)
I would like to get the aggregate count for each score over all windows,
until the program terminates. I tried countByValueAndWindow(), but the
result is the same as countByValue() (i.e. it produces only per-window
counts). reduceByWindow() also does not produce the result I am expecting.
What is the correct way to sum up the counts over multiple windows?
thanks
--
jay vyas