Hi,

Answering my own question after searching for sortByKey in the mailing
list archives and later in JIRA.

It turns out it's a known issue and filed under
https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches
a cluster job when it shouldn't".
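
For anyone who wants to reproduce it, here is a minimal sketch for
spark-shell. It assumes `sc` is the SparkContext the shell provides;
the job counter (`jobsStarted`) is a hypothetical helper added only for
this illustration, and whether a job actually fires may depend on the
Spark version and the number of partitions.

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is already defined.
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical helper for this sketch: counts every job the scheduler starts.
val jobsStarted = new AtomicInteger(0)
sc.addSparkListener(new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    jobsStarted.incrementAndGet()
})

// Same RDD as in the original question; map is lazy, so no job is expected here.
val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))

// No action is called on this line, yet in the session described in this
// thread a job named "sortByKey at <console>" showed up at this point.
val sorted = rdd.sortByKey()

// Listener events are delivered asynchronously, so wait briefly before reading.
Thread.sleep(1000)
println(s"jobs started after sortByKey: ${jobsStarted.get}")

// An ordinary action, which is where a job would normally be expected.
sorted.collect().foreach(println)
```

The same thing can be checked without the listener by watching the Jobs
tab in the web UI while entering the lines one at a time.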

The issue is labelled "starter", so it should not be that hard to fix.
Does this still hold? I'd like to work on it if it's "simple" and won't
get me swamped. Thanks!

Regards,
Jacek

--
Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski


On Mon, Nov 2, 2015 at 2:34 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi Sparkians,
>
> I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
> local[*] master.
>
> I created an RDD of pairs using the following snippet:
>
> val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
>
> It's all fine so far. The map transformation causes no computation.
>
> I thought all transformations are lazy and trigger no job until an
> action is called. It seems I was wrong about sortByKey()! When I called
> `rdd.sortByKey()`, it started a job: sortByKey at <console>:27 (!)
>
> Can anyone explain what makes for the different behaviour of sortByKey
> since it is a transformation and hence should be lazy? Is this a
> special transformation?
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
