Repository: spark
Updated Branches:
  refs/heads/branch-1.0 18c8c3833 -> d4aed266d

[SPARK-4304] [PySpark] Fix sort on empty RDD (1.0 branch)

This PR fixes sortBy()/sortByKey() on an empty RDD.

This should be backported into 1.0.

Author: Davies Liu <dav...@databricks.com>

Closes #3163 from davies/fix_sort_1.0 and squashes the following commits:

9be984f [Davies Liu] fix sort on empty RDD


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d4aed266
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d4aed266
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d4aed266

Branch: refs/heads/branch-1.0
Commit: d4aed266d3db3cb3aea711f30aa058c74bfe60a5
Parents: 18c8c38
Author: Davies Liu <dav...@databricks.com>
Authored: Fri Nov 7 20:57:56 2014 -0800
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Fri Nov 7 20:57:56 2014 -0800

----------------------------------------------------------------------
 python/pyspark/rdd.py   | 2 ++
 python/pyspark/tests.py | 3 +++
 2 files changed, 5 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 368ab50..57c2cd7 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -496,6 +496,8 @@ class RDD(object):
         # number of (key, value) pairs falling into them
         if numPartitions > 1:
             rddSize = self.count()
+            if not rddSize:
+                return self
             maxSampleSize = numPartitions * 20.0  # constant from Spark's RangePartitioner
             fraction = min(maxSampleSize / max(rddSize, 1), 1.0)

http://git-wip-us.apache.org/repos/asf/spark/blob/d4aed266/python/pyspark/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index 45284ee..8f5b48d 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -198,6 +198,9 @@ class TestRDDFunctions(PySparkTestCase):
         os.unlink(tempFile.name)
         self.assertRaises(Exception, lambda: filtered_data.count())
 
+    def test_sort_on_empty_rdd(self):
+        self.assertEqual([], self.sc.parallelize(zip([], [])).sortByKey().collect())
+
     def test_itemgetter(self):
         rdd = self.sc.parallelize([range(10)])
         from operator import itemgetter
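
For readers without a Spark cluster at hand, the effect of the empty-RDD guard can be sketched in plain Python. This is a simplified stand-in, not the PySpark code: `range_bounds` and `sort_by_range` are hypothetical names, a list plays the role of the RDD, and the whole input is used as the sample. It assumes (the commit message does not say so) that the original failure comes from picking range boundaries out of an empty list of sampled keys.

```python
import bisect


def range_bounds(samples, num_partitions):
    """Pick num_partitions - 1 boundary keys from the sampled keys.

    Hypothetical helper: indexes into the sorted samples at evenly
    spaced positions, so an empty sample list raises IndexError.
    """
    samples = sorted(samples)
    return [samples[(len(samples) - 1) * (i + 1) // num_partitions]
            for i in range(num_partitions - 1)]


def sort_by_range(keys, num_partitions):
    """Sort keys by range-partitioning them, then sorting each bucket."""
    if num_partitions > 1:
        if not keys:
            # Mirrors the `if not rddSize: return self` guard in the diff:
            # nothing to sample, so return the empty input unchanged.
            return keys
        bounds = range_bounds(keys, num_partitions)
    else:
        bounds = []

    # Assign each key to the bucket whose range contains it.
    buckets = [[] for _ in range(len(bounds) + 1)]
    for key in keys:
        buckets[bisect.bisect_left(bounds, key)].append(key)

    # Buckets are already ordered by range; sort within each one.
    return [key for bucket in buckets for key in sorted(bucket)]
```

Calling `sort_by_range([], 4)` returns an empty list, whereas calling `range_bounds([], 4)` directly raises `IndexError`, which is the kind of failure the early return avoids in this sketch.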