Xiangrui Meng created SPARK-4104:
------------------------------------

             Summary: KVArraySortDataFormat is not as fast as Java's Arrays.sort()
                 Key: SPARK-4104
                 URL: https://issues.apache.org/jira/browse/SPARK-4104
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.1.0, 1.2.0
            Reporter: Xiangrui Meng


The previous benchmark code in `SorterSuite` doesn't reset the array after each 
run, so we were comparing both algorithms on already-sorted arrays. With the 
corrected code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On 
Java 7, I got the following on arrays of size 25 million:

{code}
Tuple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}
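
For illustration, here is a minimal sketch of a timing loop that restores the unsorted input before every run, so that no run after the first sorts already-sorted data. This is not the actual `SorterSuite` code: the class `ResetBenchmark` and the helpers `timeSort` and `isSorted` are hypothetical names, and the sketch uses a plain `long[]` with a smaller input rather than the tuple and key-value arrays from the benchmark above.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

class ResetBenchmark {

    // Returns true if the array is in non-decreasing order.
    static boolean isSorted(long[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Runs `sort` `runs` times and returns the average time in milliseconds.
    // The working copy is restored from `unsorted` before every run, and the
    // copy happens outside the timed region so it does not skew the result.
    static double timeSort(long[] unsorted, int runs, Consumer<long[]> sort) {
        long[] work = new long[unsorted.length];
        long totalNanos = 0L;
        for (int run = 0; run < runs; run++) {
            System.arraycopy(unsorted, 0, work, 0, unsorted.length); // reset
            long start = System.nanoTime();
            sort.accept(work);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        // Assumed input size for the sketch; much smaller than 25 million.
        long[] data = new Random(42).longs(1_000_000).toArray();
        double avgMs = timeSort(data, 5, Arrays::sort);
        System.out.printf("Arrays.sort(): %.1f ms average%n", avgMs);
    }
}
```

Because the source array is never sorted in place, the same `data` can be passed to a second `timeSort` call with a different sorter, and both algorithms see identical unsorted input on every run.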



