Davies Liu created SPARK-2983:
---------------------------------

             Summary: improve performance of sortByKey()
                 Key: SPARK-2983
                 URL: https://issues.apache.org/jira/browse/SPARK-2983
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 1.0.2, 0.9.0, 1.1.0
            Reporter: Davies Liu

For large datasets with many partitions (N), sortByKey() is very slow, because rangePartitioner takes O(N) time to find the partition for a key. Using binary search over the partition boundaries instead would reduce this to O(log N).
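
A minimal Python sketch of the proposed change, assuming the partition boundaries are kept in a sorted list. The function names and the sample bounds are hypothetical and are not taken from the PySpark source; the standard-library bisect module stands in for whatever binary search the actual fix might use.

```python
from bisect import bisect_left


def linear_partition(key, bounds):
    """O(N) lookup: scan the sorted bounds until one is >= key."""
    p = 0
    while p < len(bounds) and key > bounds[p]:
        p += 1
    return p


def binary_partition(key, bounds):
    """O(log N) lookup: bisect_left returns the first index i
    with bounds[i] >= key, or len(bounds) if there is none."""
    return bisect_left(bounds, key)


# Hypothetical boundaries: 3 bounds split the key space into 4 partitions.
bounds = [10, 20, 30]
keys = [5, 10, 11, 20, 25, 30, 31]

# Both lookups should assign every key to the same partition index.
assert all(linear_partition(k, bounds) == binary_partition(k, bounds)
           for k in keys)
```

Both functions return the same partition index for ascending order; only the number of comparisons per key differs, which matters when N is large and the lookup runs once per record.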