Hi Sparkians,
I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
local[*] master.
I created an RDD of pairs using the following snippet:
val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
It's all fine so far. The map transformation causes no job to be
submitted, since transformations are lazy. But as soon as I call
sortByKey on the pair RDD, a job is launched, even though sortByKey is
a transformation, too, and I haven't executed any action yet. Is that
expected?
Hi,
Answering my own question after...searching for sortByKey in the
mailing list archives and later in JIRA.
It turns out this is a known issue, filed under
https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches
a cluster job when it shouldn't".
It's labelled "starter", so it should be an easy one to pick up.
Hah! No, that is not a "starter" issue. It touches on some fairly deep
Spark architecture, and there have already been a few attempts to resolve
the issue -- none entirely satisfactory, but you should definitely search
out the work that has already been done.
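For anyone landing on this thread later: the surprising part is that map
(like most transformations) runs nothing until an action, while sortByKey
eagerly samples the data to compute its range partitioning, which is itself
a job. Here is a minimal plain-Scala sketch of that lazy-vs-eager
distinction, using an Iterator as a stand-in for an RDD. It is only an
analogy, not Spark code, so it runs without a SparkContext:

```scala
// Analogy for SPARK-1021 in plain Scala (no Spark needed):
// transformations are lazy, but a sort must see every element.
object LazyVsEager {
  def main(args: Array[String]): Unit = {
    var evaluated = 0
    // Like rdd.map(...): building the pipeline evaluates nothing yet.
    val pairs = (0 to 5).iterator.map { n =>
      evaluated += 1
      (n, n % 2 == 0)
    }
    println(evaluated) // 0 -- nothing has run, like a lazy RDD transformation

    // Sorting must materialize all elements, forcing the pipeline,
    // much as sortByKey touches the data up front for partitioning.
    val sorted = pairs.toSeq.sortBy(_._1)
    println(evaluated) // 6 -- every element was evaluated by the sort
    println(sorted.map(_._1).mkString(",")) // 0,1,2,3,4,5
  }
}
```

In real Spark the eager step is the sampling pass, not the full sort, but
the effect the original poster saw is the same: a job appears in the UI
before any action is called.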
On Mon, Nov 2, 2015 at 5:51 AM,