Spark RDD sortByKey triggering a new job

2015-04-24 Thread Spico Florin
I have tested the sortByKey method with the following code and observed that it triggers a new job when it is called. I could not find this documented in either the API docs or the code. Is this intended behavior? For example, the API documentation for the RDD zipWithIndex method states that it will trigger a new job. But

Re: Spark RDD sortByKey triggering a new job

2015-04-24 Thread Sean Owen
Yes, I think this is a known issue: sortByKey actually runs a job to assess the distribution of the data. See https://issues.apache.org/jira/browse/SPARK-1021 for details. I think further eyes on it would be welcome, as the behavior is not desirable.
On Fri, Apr 24, 2015 at 9:57 AM, Spico Florin spicoflo...@gmail.com
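
To make the point about "assessing the distribution of the data" concrete, here is a rough sketch in plain Python of why a range-partitioned sort needs an eager pass over the keys before it can do anything else. This is not Spark's code: the function names (sample_range_bounds, partition_for, sort_by_key) and the sampling scheme are assumptions made up for illustration, and partitions are modeled as plain lists of (key, value) pairs.

```python
import random
from bisect import bisect_left


def sample_range_bounds(partitions, num_partitions, sample_size=20, seed=0):
    """Pick split points for a range partitioner by sampling the keys.

    Hypothetical helper: this pass has to read the data immediately,
    which is the analogue of the extra job discussed in this thread.
    """
    rng = random.Random(seed)
    keys = [k for part in partitions for k, _ in part]
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Take num_partitions - 1 evenly spaced split points from the sorted sample.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, bounds):
    """Return the index of the key range that the key falls into."""
    return bisect_left(bounds, key)


def sort_by_key(partitions, num_partitions):
    """Range-partition the pairs, then sort each output partition locally."""
    bounds = sample_range_bounds(partitions, num_partitions)  # eager step
    buckets = [[] for _ in range(num_partitions)]
    for part in partitions:
        for k, v in part:
            buckets[partition_for(k, bounds)].append((k, v))
    # Bucket i only holds keys <= every key in bucket i + 1, so sorting
    # each bucket on its own yields a globally sorted result.
    return [sorted(b, key=lambda kv: kv[0]) for b in buckets]


if __name__ == "__main__":
    data = [[(5, "e"), (1, "a")], [(3, "c"), (9, "i")], [(7, "g"), (2, "b")]]
    for i, bucket in enumerate(sort_by_key(data, 3)):
        print(i, bucket)
```

The split points cannot be chosen without looking at the keys, so the sampling step cannot be deferred the way an ordinary lazy transformation can. Whether and how that work could be postponed in Spark itself is what the JIRA issue linked above discusses.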