[ 
https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602512#comment-16602512
 ] 

Jungtaek Lim commented on SPARK-25317:
--------------------------------------

Why not run the test with JMH, with warmup and measurement iterations? I'm not 
sure JMH can be applied to a Scala test, but the Java test code should be simple 
as long as these Spark classes can be called from Java.
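
To illustrate what warmup and separate measurement iterations change, here is a rough hand-rolled sketch in plain Java. It is not JMH, which handles this more rigorously (forked JVMs, dead-code protection, statistics). The names `WarmupBench`, `hashBytes`, `run` and `median` are made up for this example, and `hashBytes` is a stand-in FNV-1a loop, not Spark's `Murmur3_x86_32`. The iteration counts are arbitrary.

```java
import java.util.Arrays;

class WarmupBench {
    // Written once at the end of run() so the JIT cannot discard the hash calls as dead code.
    static volatile int sink;

    // Hypothetical stand-in workload: FNV-1a over a byte array.
    // Replace with the real call (e.g. Murmur3_x86_32.hashUTF8String) inside Spark.
    static int hashBytes(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Runs warmupIters untimed rounds, then measureIters timed rounds of
    // opsPerIter calls each. Returns nanoseconds per call for each timed round.
    static double[] run(byte[] data, int warmupIters, int measureIters, int opsPerIter) {
        int acc = 0;
        for (int w = 0; w < warmupIters; w++) {
            for (int i = 0; i < opsPerIter; i++) {
                acc += hashBytes(data, 0);
            }
        }
        double[] nsPerOp = new double[measureIters];
        for (int m = 0; m < measureIters; m++) {
            long start = System.nanoTime();
            for (int i = 0; i < opsPerIter; i++) {
                acc += hashBytes(data, 0);
            }
            nsPerOp[m] = (System.nanoTime() - start) / (double) opsPerIter;
        }
        sink = acc;
        return nsPerOp;
    }

    // Median of the samples; less sensitive to a single slow round than the mean.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Same input size as the test in the issue: 10001 bytes of 'b'.
        byte[] data = new byte[10001];
        Arrays.fill(data, (byte) 'b');

        double[] samples = run(data, 5, 10, 2000);
        for (int i = 0; i < samples.length; i++) {
            System.out.println(String.format("round %d: %.1f ns/op", i, samples[i]));
        }
        System.out.println(String.format("median: %.1f ns/op", median(samples)));
    }
}
```

Reporting each round separately makes it visible whether the numbers have stabilized, which a single timed loop from a cold start cannot show.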

> MemoryBlock performance regression
> ----------------------------------
>
>                 Key: SPARK-25317
>                 URL: https://issues.apache.org/jira/browse/SPARK-25317
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Blocker
>
> There is a performance regression when calculating the hash code for UTF8String:
> {code:scala}
>   test("hashing") {
>     import org.apache.spark.unsafe.hash.Murmur3_x86_32
>     import org.apache.spark.unsafe.types.UTF8String
>     val hasher = new Murmur3_x86_32(0)
>     val str = UTF8String.fromString("b" * 10001)
>     val numIter = 100000
>     val start = System.nanoTime
>     for (i <- 0 until numIter) {
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>     }
>     val duration = (System.nanoTime() - start) / 1000 / numIter
>     println(s"duration $duration us")
>   }
> {code}
> To run this test in 2.3, we need to add
> {code:java}
> public static int hashUTF8String(UTF8String str, int seed) {
>   return hashUnsafeBytes(str.getBaseObject(), str.getBaseOffset(),
>     str.numBytes(), seed);
> }
> {code}
> to `Murmur3_x86_32`.
> On my laptop, the result for master vs. 2.3 is 120 us vs. 40 us.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

