This streaming example in the Spark repo might help:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala

Thanks
Best Regards

On Tue, Nov 4, 2014 at 6:03 AM, Harold Nguyen <har...@nexgate.com> wrote:

> Hi all,
>
> I was just reading this nice documentation here:
>
> http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html
>
> And got to the end of it, which says:
>
> "Note that there are more efficient ways to get the top 10 hashtags. For
> example, instead of sorting the entire of 5-minute-counts (thereby,
> incurring the cost of a data shuffle), one can get the top 10 hashtags in
> each partition, collect them together at the driver and then find the top
> 10 hashtags among them. We leave this as an exercise for the reader to try."
>
> I was just wondering if anyone had managed to do this, and was willing to
> share as an example :) This seems to be the exact use case that will help
> me!
>
> Thanks!
>
> Harold
>
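Here is a minimal sketch of the "top 10 per partition, then merge at the driver" idea in plain Scala. Each inner `Seq` stands in for one RDD partition of `(hashtag, count)` pairs. The names `TopHashtags`, `topKInPartition` and `topK` are hypothetical, used only for illustration. The sketch assumes the counts have already been reduced by key (e.g. with `reduceByKeyAndWindow`), so each hashtag appears in exactly one partition. Under that assumption, any global top-k hashtag must also be in the top k of its own partition, which is why merging the per-partition winners gives the right answer.

```scala
// Sketch only: each inner Seq plays the role of one partition of (hashtag, count) pairs.
// Assumes every hashtag's count lives in a single partition (counts already reduced by key).
object TopHashtags {

  // Step 1 (per partition; in Spark this would run on the executors):
  // keep only the k entries with the largest counts.
  def topKInPartition(partition: Iterator[(String, Int)], k: Int): Seq[(String, Int)] =
    partition.toSeq.sortBy { case (_, count) => -count }.take(k)

  // Step 2 (on the driver): merge at most k * numPartitions candidates
  // and pick the global top k, with no full sort of all counts.
  def topK(partitions: Seq[Seq[(String, Int)]], k: Int): Seq[(String, Int)] = {
    val candidates = partitions.flatMap(p => topKInPartition(p.iterator, k))
    candidates.sortBy { case (_, count) => -count }.take(k)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq("#spark" -> 50, "#scala" -> 20, "#java" -> 5),
      Seq("#bigdata" -> 40, "#hadoop" -> 10),
      Seq("#streaming" -> 30, "#kafka" -> 25, "#ml" -> 1)
    )
    println(topK(partitions, 3)) // prints List((#spark,50), (#bigdata,40), (#streaming,30))
  }
}
```

In Spark Streaming, step 1 would go inside `rdd.mapPartitions(...)` and step 2 would run on the array returned by `.collect()`, for example within `foreachRDD`. `RDD.top` already works this way: `rdd.top(10)(Ordering.by[(String, Int), Int](_._2))` keeps a bounded top-k per partition and merges the partial results on the driver, so no shuffle is needed. Sorting each partition, as in the sketch, costs O(n log n); a bounded priority queue of size k would reduce that to O(n log k).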
