Hah! No, that is not a "starter" issue. It touches on some fairly deep Spark architecture, and there have already been a few attempts to resolve the issue -- none entirely satisfactory, but you should definitely search out the work that has already been done.
On Mon, Nov 2, 2015 at 5:51 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Answering my own question after searching for sortByKey in the mailing
> list archives and later in JIRA.
>
> It turns out it's a known issue, filed as
> https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches
> a cluster job when it shouldn't".
>
> It's labelled "starter", which suggests it should not be that hard to
> fix. Does that still hold? I'd like to work on it if it's "simple" and
> doesn't get me swamped. Thanks!
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Mon, Nov 2, 2015 at 2:34 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> > Hi Sparkians,
> >
> > I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
> > local[*] master.
> >
> > I created an RDD of pairs using the following snippet:
> >
> > val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
> >
> > All fine so far. The map transformation causes no computation.
> >
> > I thought all transformations were lazy and triggered no job until an
> > action was called. It seems I was wrong about sortByKey()! When I
> > called `rdd.sortByKey()`, it started a job: sortByKey at <console>:27 (!)
> >
> > Can anyone explain the different behaviour of sortByKey, given that it
> > is a transformation and hence should be lazy? Is it a special
> > transformation?
> >
> > Regards,
> > Jacek
> >
> > --
> > Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> > Follow me at https://twitter.com/jaceklaskowski
> > Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
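
For context on why a job appears at all: sortByKey has to decide which key
range maps to each output partition before the shuffle can be planned, so it
constructs a RangePartitioner, and that constructor samples the keys with a
collect(), which is itself a job. That sampling pass is what SPARK-1021 is
about. A minimal sketch reproducing the behaviour as a standalone app (the
object name, app name, and master here are arbitrary choices for the demo):

import org.apache.spark.{SparkConf, SparkContext}

object SortByKeyDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sortByKey-demo").setMaster("local[*]"))

    // Lazy: neither parallelize nor map runs anything yet.
    val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))

    // This line alone shows up as a "sortByKey" job in the web UI:
    // sortByKey builds a RangePartitioner, whose constructor samples
    // the keys (a collect()) to compute the range boundaries.
    val sorted = rdd.sortByKey()

    // Only now does the shuffle-and-sort itself actually run.
    sorted.collect().foreach(println)

    sc.stop()
  }
}

Note that only the sampling pass runs eagerly; the shuffle and the sort
itself still wait for the collect() at the end, as with any other
transformation.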