Hi all,

I was just reading this nice documentation here:
http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

And got to the end of it, which says:

"Note that there are more efficient ways to get the top 10 hashtags. For
example, instead of sorting the entire set of 5-minute counts (thereby
incurring the cost of a data shuffle), one can get the top 10 hashtags in
each partition, collect them together at the driver and then find the top
10 hashtags among them. We leave this as an exercise for the reader to try."

I was just wondering if anyone had managed to do this and would be willing
to share it as an example :) This seems to be exactly the use case that
would help me!
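In case it helps the discussion, here's my rough understanding of the idea in plain Python (just simulating partitions as lists of (hashtag, count) pairs — not actual Spark code; in Spark I assume this would be something like `mapPartitions` to get per-partition top-10s, then a `collect` at the driver). The hashtags and counts below are made up for illustration:

```python
import heapq

def top10(pairs):
    # Top 10 (hashtag, count) pairs within one partition, by count.
    return heapq.nlargest(10, pairs, key=lambda kv: kv[1])

# Simulated partitions. After a reduceByKey, each hashtag should live in
# exactly one partition, so merging per-partition top-10s is safe.
partitions = [
    [("#spark", 42), ("#scala", 7), ("#bigdata", 30)],
    [("#streaming", 55), ("#ml", 3)],
]

# Step 1: top 10 within each partition (in Spark: rdd.mapPartitions(...)).
candidates = [kv for part in partitions for kv in top10(part)]

# Step 2: collect the small candidate list at the driver and take the
# overall top 10 from it -- no full sort or shuffle of all the counts.
overall_top10 = heapq.nlargest(10, candidates, key=lambda kv: kv[1])
print(overall_top10)
```

The point, as I read it, is that only (10 × number-of-partitions) candidate pairs ever reach the driver, instead of sorting every count. Corrections welcome if I've misunderstood!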

Thanks!

Harold
