chie hayashida created SPARK-21358:
--------------------------------------

Summary: variable of repartitionandsortwithinpartitions at pyspark
Key: SPARK-21358
URL: https://issues.apache.org/jira/browse/SPARK-21358
Project: Spark
Issue Type: Improvement
Components: Documentation, Examples
Affects Versions: 2.1.1
Reporter: chie hayashida
Priority: Minor

In rdd.py, repartitionAndSortWithinPartitions is defined as follows.

```
def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                       ascending=True, keyfunc=lambda x: x):
```

The documentation includes the following sample script.

```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
```

The third argument (ascending) is expected to be a boolean, so I think the following script is better.

```
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
```
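
For illustration, here is a minimal pure-Python sketch of why the documented call with `2` probably gives the same result as `True`. It does not use Spark. The helper `repartition_and_sort` is hypothetical, and it assumes the real method passes `reverse=(not ascending)` to its per-partition sort, so any truthy value acts like `True`.

```python
def repartition_and_sort(pairs, num_partitions, partition_func,
                         ascending=True, keyfunc=lambda x: x):
    """Hypothetical stand-in for rdd.repartitionAndSortWithinPartitions."""
    partitions = [[] for _ in range(num_partitions)]
    # Route each (key, value) pair to a partition chosen from its key.
    for key, value in pairs:
        partitions[partition_func(key) % num_partitions].append((key, value))
    # Assumption: 'ascending' is only ever used through 'not ascending'.
    return [
        sorted(part, key=lambda kv: keyfunc(kv[0]), reverse=(not ascending))
        for part in partitions
    ]


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]

with_int = repartition_and_sort(data, 2, lambda x: x % 2, 2)       # as in the docs
with_bool = repartition_and_sort(data, 2, lambda x: x % 2, True)   # as proposed
descending = repartition_and_sort(data, 2, lambda x: x % 2, False)

print(with_int == with_bool)  # the two calls agree under this assumption
print(with_bool)
print(descending)
```

Under that assumption, the change is only about readability: `2` in the third position looks like a partition count or a key, while `True` makes the intent of `ascending` explicit.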