Repository: spark
Updated Branches:
  refs/heads/master d03aebbe6 -> c3713fde8
[SPARK-21358][EXAMPLES] Argument of repartitionAndSortWithinPartitions at pyspark

## What changes were proposed in this pull request?

In the example for repartitionAndSortWithinPartitions in rdd.py, the third argument should be True or False (the ascending flag). I fixed the example code accordingly.

## How was this patch tested?

* I renamed test_repartitionAndSortWithinPartitions to test_repartitionAndSortWithinPartitions_asc to make the boolean argument explicit.
* I added test_repartitionAndSortWithinPartitions_desc to cover the False case for the third argument.

Author: chie8842 <chie8...@gmail.com>

Closes #18586 from chie8842/SPARK-21358.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3713fde
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3713fde
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3713fde

Branch: refs/heads/master
Commit: c3713fde86204bf3f027483914ff9e60e7aad261
Parents: d03aebb
Author: chie8842 <chie8...@gmail.com>
Authored: Mon Jul 10 18:56:54 2017 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Jul 10 18:56:54 2017 -0700

----------------------------------------------------------------------
 python/pyspark/rdd.py   |  2 +-
 python/pyspark/tests.py | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 7dfa17f..3325b65 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -608,7 +608,7 @@ class RDD(object):
         sort records by their keys.
 
         >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
-        >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
+        >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
         >>> rdd2.glom().collect()
         [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
         """

http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py
index bb13de5..73ab442 100644
--- a/python/pyspark/tests.py
+++ b/python/pyspark/tests.py
@@ -1019,14 +1019,22 @@ class RDDTests(ReusedPySparkTestCase):
         self.assertEqual((["ab", "ef"], [5]), rdd.histogram(1))
         self.assertRaises(TypeError, lambda: rdd.histogram(2))
 
-    def test_repartitionAndSortWithinPartitions(self):
+    def test_repartitionAndSortWithinPartitions_asc(self):
         rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2)
 
-        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2)
+        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, True)
         partitions = repartitioned.glom().collect()
         self.assertEqual(partitions[0], [(0, 5), (0, 8), (2, 6)])
         self.assertEqual(partitions[1], [(1, 3), (3, 8), (3, 8)])
 
+    def test_repartitionAndSortWithinPartitions_desc(self):
+        rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2)
+
+        repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, False)
+        partitions = repartitioned.glom().collect()
+        self.assertEqual(partitions[0], [(2, 6), (0, 5), (0, 8)])
+        self.assertEqual(partitions[1], [(3, 8), (3, 8), (1, 3)])
+
     def test_repartition_no_skewed(self):
         num_partitions = 20
         a = self.sc.parallelize(range(int(1000)), 2)
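For readers without a Spark installation, here is a rough pure-Python sketch of what repartitionAndSortWithinPartitions(2, lambda x: x % 2, ascending) computes on the example data from the diff. The helper function name and the use of a stable sort among equal keys are illustrative assumptions, not Spark internals.

```python
def repartition_and_sort(pairs, num_partitions, partition_func, ascending):
    """Sketch of repartitionAndSortWithinPartitions semantics (not Spark code).

    Routes each (key, value) pair to a partition via partition_func,
    then sorts each partition by key; the boolean `ascending` plays the
    role of the third argument fixed in this patch.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_func(key) % num_partitions].append((key, value))
    # reverse=True gives descending order when ascending is False
    return [sorted(p, key=lambda kv: kv[0], reverse=not ascending)
            for p in partitions]

data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]
print(repartition_and_sort(data, 2, lambda x: x % 2, True))
print(repartition_and_sort(data, 2, lambda x: x % 2, False))
```

The True call reproduces the expected output in the fixed doctest, and the False call reproduces the expected output asserted by the new descending test.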