It's an Iterator in both Java and Scala. In both cases you need to copy the stream of values into something List-like to sort it. An Iterable would not change that (and I'm not sure the API could promise multiple iterations anyway).
If you just want the equivalent of "toArray", you can use a utility method in Commons Collections or Guava. Guava's Lists.newArrayList(Iterator) does nicely; you can then Collections.sort() the list with a Comparator and return its iterator().

I dug this up too, remembering a similar question:
http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3c529f819f.3060...@vu.nl%3E

On Tue, May 20, 2014 at 2:25 PM, Madhu <ma...@madhu.com> wrote:
> I'm trying to sort data in each partition of an RDD.
> I was able to do it successfully in Scala like this:
>
> val sorted = rdd.mapPartitions(iter => {
>   iter.toArray.sortWith((x, y) => x._2.compare(y._2) < 0).iterator
> },
> preservesPartitioning = true)
>
> I used the same technique as in OrderedRDDFunctions.scala, so I assume it's
> a reasonable way to do it.
>
> This works well so far, but I can't seem to do the same thing in Java
> because 'iter' in the Java APIs is an Iterator rather than an Iterable.
> There may be an unattractive workaround, but I didn't pursue it.
>
> Ideally, it would be nice to have an efficient, robust method in RDD to sort
> each partition.
> Does something like that exist?
>
> Thanks!
>
> -----
> --
> Madhu
> https://www.linkedin.com/in/msiddalingaiah
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-partitions-in-Java-tp6715.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
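For illustration, here is a minimal JDK-only sketch of the workaround described above: buffer the partition's Iterator into a List (Guava's Lists.newArrayList(Iterator) does the same copy in one call), sort it with a Comparator, and hand back the list's iterator. The class name SortPartition and the sortedCopy helper are hypothetical names for this sketch; inside Spark, this would be the body of the function you pass to JavaRDD.mapPartitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper sketching the Iterator-based workaround discussed
// in this thread; not an official Spark API.
public class SortPartition {

    public static <T> Iterator<T> sortedCopy(Iterator<T> iter, Comparator<T> comp) {
        // Materialize the stream, as iter.toArray does in the Scala version.
        List<T> buffer = new ArrayList<>();
        while (iter.hasNext()) {
            buffer.add(iter.next());
        }
        Collections.sort(buffer, comp);  // sort the copy in place
        return buffer.iterator();        // fresh iterator over the sorted data
    }

    public static void main(String[] args) {
        Iterator<Integer> sorted = sortedCopy(
                java.util.Arrays.asList(3, 1, 2).iterator(),
                Comparator.<Integer>naturalOrder());
        StringBuilder sb = new StringBuilder();
        while (sorted.hasNext()) {
            sb.append(sorted.next());
        }
        System.out.println(sb);  // prints 123
    }
}
```

Note that, like the Scala snippet, this holds a whole partition in memory at once, so it assumes each partition fits on the heap.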