Xiangrui Meng created SPARK-4104:
------------------------------------

Summary: KVArraySortDataFormat is not as fast as Java's Arrays.sort()
Key: SPARK-4104
URL: https://issues.apache.org/jira/browse/SPARK-4104
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: Xiangrui Meng
The previous benchmark code in `SorterSuite` doesn't reset the array after each run, so we were comparing both algorithms on already-sorted arrays. With the corrected code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On Java 7, with arrays of 25 million elements, I got the following:

{code}
Tuple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
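A minimal sketch of the corrected benchmark pattern described above, under simplifying assumptions: it sorts a plain `long[]` of random keys rather than the key-value tuples that `SorterSuite` uses, and it only times `Arrays.sort()`, not Spark's `Sorter`. The point it illustrates is that every timed run gets a fresh copy of the same unsorted input, so no run sorts the previous run's already-sorted output. The class and method names (`SortBenchmarkSketch`, `randomKeys`, `timeSortMs`) are hypothetical, not Spark code.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical benchmark sketch; not the actual SorterSuite code.
public class SortBenchmarkSketch {

    // Builds the unsorted input once, with a fixed seed so every run sees identical data.
    static long[] randomKeys(int n, long seed) {
        Random rnd = new Random(seed);
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = rnd.nextLong();
        }
        return keys;
    }

    // Times a single in-place sort of the given working array, in milliseconds.
    static long timeSortMs(long[] work) {
        long start = System.nanoTime();
        Arrays.sort(work);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long[] original = randomKeys(1_000_000, 42L);
        for (int run = 0; run < 5; run++) {
            // Reset: copy the unsorted original before each run, outside the timed region.
            long[] work = original.clone();
            System.out.printf("Run %d: Took %d ms%n", run, timeSortMs(work));
        }
    }
}
```

Making the copy before starting the clock keeps the reset cost out of the measurement. Reusing `original` directly would reproduce the bug: only the first run would sort random data.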