Hah!  No, that is not a "starter" issue.  It touches on some fairly deep
Spark architecture, and there have already been a few attempts to resolve
the issue -- none entirely satisfactory, but you should definitely search
out the work that has already been done.
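For anyone landing here from the archives, the short version of why sortByKey is eager: it uses a RangePartitioner, which samples the RDD's keys up front to compute balanced partition boundaries, and that sampling pass is itself a Spark job. A minimal self-contained sketch of the idea in plain Scala (no Spark; the helper names `sampleKeys` and `rangeBounds` are mine, not Spark's, and the real RangePartitioner uses weighted reservoir sampling rather than a full shuffle):

```scala
import scala.util.Random

// sortByKey needs range-partition boundaries so that every key in
// partition i is smaller than every key in partition i+1. Spark's
// RangePartitioner computes those bounds by sampling the keys eagerly --
// that sampling pass is the job that shows up in the UI.

// Hypothetical stand-in for sampling an RDD's keys.
def sampleKeys(keys: Seq[Int], sampleSize: Int, seed: Long): Seq[Int] = {
  val rng = new Random(seed)
  rng.shuffle(keys).take(sampleSize)
}

// Pick (numPartitions - 1) boundary keys out of the sorted sample.
def rangeBounds(sample: Seq[Int], numPartitions: Int): Seq[Int] = {
  val sorted = sample.sorted
  (1 until numPartitions).map(i => sorted((i * sorted.length) / numPartitions))
}

val keys   = (0 to 5).toSeq
val bounds = rangeBounds(sampleKeys(keys, sampleSize = 4, seed = 42L), numPartitions = 2)
println(bounds) // a single boundary key splitting the key space in two
```

Because the bounds cannot be computed without looking at the data, sortByKey cannot stay fully lazy as written today; SPARK-1021 is about deferring that sampling pass until an action actually runs.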

On Mon, Nov 2, 2015 at 5:51 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Answering my own question after...searching sortByKey in the mailing
> list archives and later in JIRA.
>
> It turns out it's a known issue and filed under
> https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches
> a cluster job when it shouldn't".
>
> It's labelled "starter", so it should not be that hard to fix. Does this
> still hold? I'd like to work on it if it's "simple" and doesn't get me
> swamped. Thanks!
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Mon, Nov 2, 2015 at 2:34 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> > Hi Sparkians,
> >
> > I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
> > local[*] master.
> >
> > I created an RDD of pairs using the following snippet:
> >
> > val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
> >
> > It's all fine so far. The map transformation causes no computation.
> >
> > I thought all transformations are lazy and trigger no job until an
> > action's called. It seems I was wrong with sortByKey()! When I called
> > `rdd.sortByKey()`, it started a job: sortByKey at <console>:27 (!)
> >
> > Can anyone explain what makes for the different behaviour of sortByKey
> > since it is a transformation and hence should be lazy? Is this a
> > special transformation?
> >
> > Pozdrawiam,
> > Jacek
> >
> > --
> > Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> > Follow me at https://twitter.com/jaceklaskowski
> > Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
