[ https://issues.apache.org/jira/browse/SPARK-21358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chie hayashida updated SPARK-21358:
-----------------------------------
    Summary: Argument of repartitionandsortwithinpartitions at pyspark  (was: variable of repartitionandsortwithinpartitions at pyspark)

> Argument of repartitionandsortwithinpartitions at pyspark
> ---------------------------------------------------------
>
>                 Key: SPARK-21358
>                 URL: https://issues.apache.org/jira/browse/SPARK-21358
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, Examples
>    Affects Versions: 2.1.1
>            Reporter: chie hayashida
>            Priority: Minor
>
> In rdd.py, repartitionAndSortWithinPartitions is defined as follows:
> ```
> def repartitionAndSortWithinPartitions(self, numPartitions=None, partitionFunc=portable_hash,
>                                        ascending=True, keyfunc=lambda x: x):
> ```
> The documentation contains the following sample script:
> ```
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2)
> ```
> The third argument (ascending) is expected to be a boolean, so I think the following script would be better:
> ```
> >>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
> >>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
> ```
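To show what the third positional argument actually controls, here is a minimal pure-Python sketch that models the method locally, without a SparkContext. It assumes, based on PySpark's rdd.py, that each (key, value) pair is routed to bucket `partitionFunc(key) % numPartitions` and that each bucket is then sorted by `keyfunc(key)` with `reverse=not ascending`. The function name `repartition_and_sort_within_partitions` is hypothetical and is not part of PySpark; the real method also defaults to `portable_hash` and uses an external sorter for large partitions, which this sketch omits.

```python
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def repartition_and_sort_within_partitions(
    pairs: Iterable[Pair],
    num_partitions: int,
    partition_func: Callable[[Any], int],
    ascending: bool = True,
    keyfunc: Callable[[Any], Any] = lambda x: x,
) -> List[List[Pair]]:
    """Hypothetical local model of RDD.repartitionAndSortWithinPartitions."""
    # Route every pair to a bucket chosen from its key.
    buckets: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[partition_func(key) % num_partitions].append((key, value))
    # Sort each bucket by key; `ascending` only sets the direction, and any
    # truthy value (such as 2) behaves like True because `not 2` is False.
    return [
        sorted(bucket, key=lambda kv: keyfunc(kv[0]), reverse=not ascending)
        for bucket in buckets
    ]


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]

as_documented = repartition_and_sort_within_partitions(data, 2, lambda x: x % 2, 2)
as_proposed = repartition_and_sort_within_partitions(data, 2, lambda x: x % 2, True)
descending = repartition_and_sort_within_partitions(data, 2, lambda x: x % 2, False)

print(as_documented)  # [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
print(as_documented == as_proposed)  # True
print(descending)  # [[(2, 6), (0, 5), (0, 8)], [(3, 8), (3, 8), (1, 3)]]
```

Under this model, passing `2` gives the same result as `True`, so the documented example still sorts in ascending order; the change proposed in the issue is about readability and matching the parameter's boolean meaning rather than about behavior.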