Voted :)

https://issues.apache.org/jira/browse/SPARK-983


On Tue, May 20, 2014 at 10:21 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> There is: SPARK-545
>
>
> On Tue, May 20, 2014 at 10:16 AM, Andrew Ash <and...@andrewash.com> wrote:
>
> > Sandy, is there a Jira ticket for that?
> >
> >
> > On Tue, May 20, 2014 at 10:12 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >
> > > sortByKey currently requires each partition to fit in memory, but there
> > > are plans to add an external sort.
> > >
> > >
> > > On Tue, May 20, 2014 at 10:10 AM, Madhu <ma...@madhu.com> wrote:
> > >
> > > > Thanks Sean, I had seen that post you mentioned.
> > > >
> > > > > What you suggest looks like an in-memory sort, which is fine if each
> > > > > partition is small enough to fit in memory. Is it true that
> > > > > rdd.sortByKey(...) requires partitions to fit in memory? I wasn't sure
> > > > > if there was some magic behind the scenes that supports arbitrarily
> > > > > large sorts.
> > > >
> > > > > None of this is a show stopper; it just might require a little more
> > > > > code on the part of the developer. If there's a requirement for Spark
> > > > > partitions to fit in memory, developers will have to be aware of that
> > > > > and plan accordingly. One nice feature of Hadoop MR is the ability to
> > > > > sort very large sets without thinking about data size.
> > > >
> > > > > In the case that a developer repartitions an RDD such that some
> > > > > partitions don't fit in memory, sorting those partitions requires more
> > > > > work. For these cases, I think there is value in having a robust
> > > > > partition sorting method that deals with it efficiently and reliably.
> > > >
> > > > > Is there another solution for sorting arbitrarily large partitions? If
> > > > > not, I don't mind developing and contributing a solution.
> > > >
> > > >
> > > >
> > > >
> > > > -----
> > > > --
> > > > Madhu
> > > > https://www.linkedin.com/in/msiddalingaiah
> > > > --
> > > > > View this message in context:
> > > > > http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-partitions-in-Java-tp6715p6719.html
> > > > > Sent from the Apache Spark Developers List mailing list archive at
> > > > > Nabble.com.
> > > >
> > >
> >
>
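
On the question quoted above about sorting partitions that do not fit in
memory: one common general technique is an external merge sort, where bounded
chunks are sorted in memory, each sorted chunk is spilled to disk, and the
spilled runs are then merged. Below is a minimal plain-Java sketch of that
idea. It is not Spark's implementation and not necessarily what the linked
ticket proposes. The class name ExternalSortSketch, its methods, and the
maxInMemory parameter are all hypothetical, and it assumes records are
single-line strings compared in their natural order.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names come from Spark.
final class ExternalSortSketch {

    /** One open run file plus the record currently at its head. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file);
            head = reader.readLine();
        }

        /** Moves to the next record; returns false when the run is exhausted. */
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    /** Sorts at most maxInMemory records at a time and spills each sorted run to a temp file. */
    static List<Path> writeSortedRuns(Iterator<String> records, int maxInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>(maxInMemory);
        while (records.hasNext()) {
            buffer.add(records.next());
            // Spill when the buffer is full or the input has ended.
            if (buffer.size() == maxInMemory || !records.hasNext()) {
                Collections.sort(buffer);
                Path run = Files.createTempFile("sort-run-", ".txt");
                run.toFile().deleteOnExit();
                Files.write(run, buffer); // one record per line (assumption: no embedded newlines)
                runs.add(run);
                buffer.clear();
            }
        }
        return runs;
    }

    /** K-way merge: the heap holds one entry per run, ordered by that run's head record. */
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Run> heap = new PriorityQueue<>();
        for (Path p : runs) {
            Run r = new Run(p);
            if (r.head != null) heap.add(r); else r.reader.close();
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!heap.isEmpty()) {
                Run smallest = heap.poll();
                out.write(smallest.head);
                out.newLine();
                if (smallest.advance()) heap.add(smallest); else smallest.reader.close();
            }
        }
    }

    /** Sorts an arbitrarily long stream of records into the output file. */
    static void externalSort(Iterator<String> records, int maxInMemory, Path output) throws IOException {
        mergeRuns(writeSortedRuns(records, maxInMemory), output);
    }
}
```

In a Spark job, something along these lines could in principle be applied to
each partition's iterator, so that only maxInMemory records plus one record
per spilled run are held in memory at a time. Serialization, custom
comparators, and cleanup on failure are left out of the sketch.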
