Re: Spark Streaming - Most popular Twitter Hashtags

2014-11-04 Thread Akhil Das
This might help:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala
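
For what it's worth, the per-partition idea from the tutorial note quoted below can be sketched without Spark at all. The following is only an illustration in plain Python, not the tutorial's or the linked example's code: the function names and the sample data are made up, and it assumes the counts are already aggregated so that each hashtag appears in exactly one partition (which is what you would have after a reduceByKey).

```python
import heapq
from typing import Iterable, List, Tuple

Count = Tuple[str, int]  # (hashtag, count)


def top_k_in_partition(partition: Iterable[Count], k: int) -> List[Count]:
    """Keep only the k highest-count pairs of a single partition."""
    return heapq.nlargest(k, partition, key=lambda pair: pair[1])


def top_k_overall(partitions: Iterable[Iterable[Count]], k: int) -> List[Count]:
    """Merge the per-partition winners and pick the global top k.

    Only k candidates per partition reach this merge step, so the
    full set of counts never has to be sorted.
    """
    candidates: List[Count] = []
    for partition in partitions:
        candidates.extend(top_k_in_partition(partition, k))
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Hypothetical sample data standing in for two partitions of aggregated counts.
partitions = [
    [("#spark", 50), ("#scala", 20), ("#java", 5)],
    [("#bigdata", 40), ("#hadoop", 30), ("#python", 10)],
]
print(top_k_overall(partitions, 3))
```

In Spark the first function would roughly be the body of a mapPartitions call and the second would run at the driver after a collect. As far as I know, RDD.top already works along these lines, so it may be worth trying that before hand-rolling anything.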

Thanks
Best Regards

On Tue, Nov 4, 2014 at 6:03 AM, Harold Nguyen har...@nexgate.com wrote:

 Hi all,

 I was just reading this nice documentation here:

 http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

 And got to the end of it, which says:

 Note that there are more efficient ways to get the top 10 hashtags. For
 example, instead of sorting the entire of 5-minute-counts (thereby,
 incurring the cost of a data shuffle), one can get the top 10 hashtags in
 each partition, collect them together at the driver and then find the top
 10 hashtags among them. We leave this as an exercise for the reader to try.

 I was just wondering if anyone has managed to do this and would be willing
 to share an example :) This seems to be the exact use case that will help
 me!

 Thanks!

 Harold



Spark Streaming - Most popular Twitter Hashtags

2014-11-03 Thread Harold Nguyen
Hi all,

I was just reading this nice documentation here:
http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html

And got to the end of it, which says:

Note that there are more efficient ways to get the top 10 hashtags. For
example, instead of sorting the entire of 5-minute-counts (thereby,
incurring the cost of a data shuffle), one can get the top 10 hashtags in
each partition, collect them together at the driver and then find the top
10 hashtags among them. We leave this as an exercise for the reader to try.

I was just wondering if anyone has managed to do this and would be willing
to share an example :) This seems to be the exact use case that will help
me!

Thanks!

Harold