Why does sortByKey() transformation trigger a job in spark-shell?

2015-11-02 Thread Jacek Laskowski
Hi Sparkians,

I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default local[*] master. I created an RDD of pairs using the following snippet:

    val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))

It's all fine so far. The map transformation causes no

Re: Why does sortByKey() transformation trigger a job in spark-shell?

2015-11-02 Thread Jacek Laskowski
Hi, Answering my own question after searching for sortByKey in the mailing list archives and later in JIRA. It turns out it's a known issue, filed as https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches a cluster job when it shouldn't". It's labelled "starter" that should

Re: Why does sortByKey() transformation trigger a job in spark-shell?

2015-11-02 Thread Mark Hamstra
Hah! No, that is not a "starter" issue. It touches on some fairly deep Spark architecture, and there have already been a few attempts to resolve the issue -- none entirely satisfactory, but you should definitely search out the work that has already been done. On Mon, Nov 2, 2015 at 5:51 AM,
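
For readers wondering why a transformation would run a job at all: as far as I understand the Spark 1.x code, sortByKey() builds a RangePartitioner, and that partitioner samples the keys when it is constructed so it can pick range boundaries. The sampling is what shows up as a job. Below is a minimal toy model of that behaviour in plain Scala, with no Spark involved. The names LazyData, ToyRangePartitioner and jobsRun are hypothetical and exist only for this sketch; it is not Spark's implementation.

```scala
// Toy model only: illustrates eager sampling inside an otherwise lazy pipeline.
object EagerPartitionerSketch {
  // Counts how many times the data was actually materialised (a "job").
  var jobsRun = 0

  // Hypothetical lazy dataset: nothing is computed until collect() is called.
  final class LazyData[A](compute: () => Seq[A]) {
    def map[B](f: A => B): LazyData[B] = new LazyData(() => compute().map(f))
    def collect(): Seq[A] = { jobsRun += 1; compute() }
  }

  // Hypothetical stand-in for a range partitioner. It needs key boundaries,
  // so it reads the keys as soon as it is constructed.
  final class ToyRangePartitioner(partitions: Int, keys: LazyData[Int]) {
    val bounds: Seq[Int] = {
      val sorted = keys.collect().sorted // eager: this runs a "job"
      if (sorted.isEmpty) Seq.empty
      else (1 until partitions).map(i => sorted(i * sorted.length / partitions))
    }
  }

  // The "transformation": lazy in spirit, but creating the partitioner is not.
  def sortByKey[V](data: LazyData[(Int, V)], partitions: Int): ToyRangePartitioner =
    new ToyRangePartitioner(partitions, data.map(_._1))

  def main(args: Array[String]): Unit = {
    val pairs = new LazyData(() => (0 to 5).map(n => (n, scala.util.Random.nextBoolean())))
    println(s"jobs after map: $jobsRun")       // 0, map alone computes nothing
    val part = sortByKey(pairs, 2)
    println(s"jobs after sortByKey: $jobsRun") // 1, the partitioner sampled the keys
    println(s"bounds: ${part.bounds}")
  }
}
```

In this sketch, making the boundaries lazy (for example a lazy val) would defer the work until the first real action, which is roughly the kind of change the JIRA discussion is about.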