[ https://issues.apache.org/jira/browse/SPARK-21358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chie hayashida updated SPARK-21358:
-----------------------------------
    Description:
In rdd.py, repartitionAndSortWithinPartitions is implemented as follows.
{code}
def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                       ascending=True, keyfunc=lambda x: x):
{code}
The documentation contains the following sample script.
{code}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
{code}
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
{code}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
{code}

  was:
In rdd.py, repartitionAndSortWithinPartitions is implemented as follows.
{code:python}
def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
                                       ascending=True, keyfunc=lambda x: x):
{code}
The documentation contains the following sample script.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
{code}
The third argument (ascending) is expected to be a boolean, so I think the following script is better.
{code:python}
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
{code}

> Argument of repartitionandsortwithinpartitions at pyspark
> ---------------------------------------------------------
>
>                 Key: SPARK-21358
>                 URL: https://issues.apache.org/jira/browse/SPARK-21358
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, Examples
>    Affects Versions: 2.1.1
>            Reporter: chie hayashida
>            Priority: Minor
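The point about the third argument can be illustrated without a Spark cluster. The sketch below is a hypothetical pure-Python stand-in (the function name `repartition_and_sort` is invented here and is not part of PySpark). It assumes the `ascending` flag is consumed only for its truthiness, for example as `reverse=(not ascending)`. Under that assumption, passing `2` behaves the same as passing `True`, which would explain why the documented sample runs but reads misleadingly.

```python
def repartition_and_sort(pairs, num_partitions, partition_func,
                         ascending=True, keyfunc=lambda x: x):
    """Hypothetical stand-in: bucket (key, value) pairs, then sort each bucket by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # Assign each pair to a bucket chosen by the partition function.
        partitions[partition_func(key) % num_partitions].append((key, value))
    # Assumption: only the truthiness of `ascending` matters.
    return [
        sorted(part, key=lambda kv: keyfunc(kv[0]), reverse=(not ascending))
        for part in partitions
    ]


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]

with_int = repartition_and_sort(data, 2, lambda x: x % 2, 2)      # as in the current doc sample
with_bool = repartition_and_sort(data, 2, lambda x: x % 2, True)  # as proposed in this issue
descending = repartition_and_sort(data, 2, lambda x: x % 2, False)

print(with_int == with_bool)
print(with_bool)
print(descending)
```

Because any non-zero integer is truthy, the two ascending calls agree. The proposed `True` simply makes the intent explicit and matches the documented default.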