My primary goal : To get top 10 hashtag for every 5 mins interval. I want to do this efficiently. I have already done this by using reducebykeyandwindow() and then sorting all hashtag in 5 mins interval taking only top 10 elements. But this is very slow.
So I now I am thinking of retaining only top 10 hashtags in each RDD because these only could come in the final answer. I am stuck at : how to retain only top 10 hashtag in each RDD of my DSTREAM ? Basically I need to transform my DTREAM in which each RDD contains only top 10 hashtags so that number of hashtags in 5 mins interval is low. If there is some more efficient way of doing this then please let me know that also. Thanx, Nilesh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Selecting-first-ten-values-in-a-RDD-partition-tp6517p6577.html Sent from the Apache Spark User List mailing list archive at Nabble.com.