Voted :) https://issues.apache.org/jira/browse/SPARK-983
On Tue, May 20, 2014 at 10:21 AM, Sandy Ryza <sandy.r...@cloudera.com>wrote: > There is: SPARK-545 > > > On Tue, May 20, 2014 at 10:16 AM, Andrew Ash <and...@andrewash.com> wrote: > > > Sandy, is there a Jira ticket for that? > > > > > > On Tue, May 20, 2014 at 10:12 AM, Sandy Ryza <sandy.r...@cloudera.com > > >wrote: > > > > > sortByKey currently requires partitions to fit in memory, but there are > > > plans to add external sort > > > > > > > > > On Tue, May 20, 2014 at 10:10 AM, Madhu <ma...@madhu.com> wrote: > > > > > > > Thanks Sean, I had seen that post you mentioned. > > > > > > > > What you suggest looks an in-memory sort, which is fine if each > > partition > > > > is > > > > small enough to fit in memory. Is it true that rdd.sortByKey(...) > > > requires > > > > partitions to fit in memory? I wasn't sure if there was some magic > > behind > > > > the scenes that supports arbitrarily large sorts. > > > > > > > > None of this is a show stopper, it just might require a little more > > code > > > on > > > > the part of the developer. If there's a requirement for Spark > > partitions > > > to > > > > fit in memory, developers will have to be aware of that and plan > > > > accordingly. One nice feature of Hadoop MR is the ability to sort > very > > > > large > > > > sets without thinking about data size. > > > > > > > > In the case that a developer repartitions an RDD such that some > > > partitions > > > > don't fit in memory, sorting those partitions requires more work. For > > > these > > > > cases, I think there is value in having a robust partition sorting > > method > > > > that deals with it efficiently and reliably. > > > > > > > > Is there another solution for sorting arbitrarily large partitions? > If > > > not, > > > > I don't mind developing and contributing a solution. > > > > > > > > > > > > > > > > > > > > ----- > > > > -- > > > > Madhu > > > > https://www.linkedin.com/in/msiddalingaiah > > > > -- > > > > View this message in context: > > > > > > > > > > http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-partitions-in-Java-tp6715p6719.html > > > > Sent from the Apache Spark Developers List mailing list archive at > > > > Nabble.com. > > > > > > > > > >