Can you clarify what you're trying to achieve here? If you want only the top 10 of each RDD, why not sort and then call take(10) on every RDD?
Or do you want the top 10 over five minutes?

Cheers,

On Thu, May 29, 2014 at 2:04 PM, nilmish <nilmish....@gmail.com> wrote:

> I have a DStream which consists of RDDs partitioned every 2 sec. I have
> sorted each RDD and want to retain only the top 10 values and discard the
> rest. How can I retain only the top 10 values?
>
> I am trying to get the top 10 hashtags. Instead of sorting the entire
> 5-minute-counts (thereby incurring the cost of a data shuffle), I am
> trying to get the top 10 hashtags in each partition. I am stuck on how to
> retain the top 10 hashtags in each partition.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Selecting-first-ten-values-in-a-RDD-partition-tp6517.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
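
If the per-partition approach is what you are after, here is a rough sketch of the idea in plain Python, with no Spark dependency. The function names and sample data are made up for illustration, and plain lists stand in for partitions. It assumes the counts are already aggregated per hashtag, so that each hashtag shows up in only one partition; otherwise a per-partition top 10 can miss the true top 10.

```python
import heapq
from itertools import chain


def top_k_in_partition(records, k=10):
    """Keep the k (hashtag, count) pairs with the largest counts.

    `records` is any iterable of pairs, e.g. the iterator over one partition.
    heapq.nlargest keeps only k items in memory instead of sorting everything.
    """
    return heapq.nlargest(k, records, key=lambda pair: pair[1])


def merge_top_k(per_partition_results, k=10):
    """Combine the small per-partition lists into one overall top k."""
    candidates = chain.from_iterable(per_partition_results)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Hypothetical sample data: two "partitions" of already-aggregated counts.
partitions = [
    [("#spark", 42), ("#scala", 7), ("#java", 19)],
    [("#hadoop", 30), ("#python", 25)],
]

# Step 1: reduce each partition to its local top k (k=2 to keep the demo small).
local_tops = [top_k_in_partition(p, k=2) for p in partitions]

# Step 2: merge the local results; only k items per partition are compared.
global_top = merge_top_k(local_tops, k=2)

print(local_tops)
print(global_top)
```

In PySpark, a function shaped like top_k_in_partition could presumably be handed to rdd.mapPartitions, since it takes an iterator and returns a list, but I have not run that against your stream.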