[ 
https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603174#comment-16603174
 ] 

Marco Gaido commented on SPARK-25317:
-------------------------------------

I think I have a fix for this. I can submit a PR if you want, but I am still 
not sure about the root cause of the regression. My best guess is that there 
is more than one cause and that performance recovers only when all of them 
are fixed, which seems rather strange to me.

> MemoryBlock performance regression
> ----------------------------------
>
>                 Key: SPARK-25317
>                 URL: https://issues.apache.org/jira/browse/SPARK-25317
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Blocker
>
> There is a performance regression when calculating the hash code for UTF8String:
> {code:java}
>   test("hashing") {
>     import org.apache.spark.unsafe.hash.Murmur3_x86_32
>     import org.apache.spark.unsafe.types.UTF8String
>     val hasher = new Murmur3_x86_32(0)
>     val str = UTF8String.fromString("b" * 10001)
>     val numIter = 100000
>     val start = System.nanoTime
>     for (i <- 0 until numIter) {
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>     }
>     val duration = (System.nanoTime() - start) / 1000 / numIter
>     println(s"duration $duration us")
>   }
> {code}
> To run this test in 2.3, we need to add
> {code:java}
> public static int hashUTF8String(UTF8String str, int seed) {
>   return hashUnsafeBytes(str.getBaseObject(), str.getBaseOffset(), str.numBytes(), seed);
> }
> {code}
> to `Murmur3_x86_32`.
> On my laptop, the result for master vs. 2.3 is 120 us vs. 40 us.
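
Note that the test divides the elapsed time by `numIter` only, while each loop iteration makes 30 calls, so the reported 120 us vs 40 us works out to roughly 4 us vs 1.3 us per hash call. The sketch below shows that arithmetic in a small standalone Java harness. It is only an illustration under stated assumptions: `HashTimingSketch`, `standInHash`, `timeLoop` and `perCallMicros` are hypothetical names, the hash is a stand-in (`Arrays.hashCode`) rather than Spark's `Murmur3_x86_32`, and the warm-up pass is an addition that the original test does not have.

```java
import java.util.Arrays;

// Hypothetical harness for illustration; not part of Spark.
class HashTimingSketch {
    // Matches the 30 unrolled calls per loop iteration in the reported test.
    static final int CALLS_PER_ITERATION = 30;

    // Stand-in for Murmur3_x86_32.hashUTF8String so the sketch runs
    // without Spark on the classpath (assumption, not the real hash).
    static int standInHash(byte[] bytes, int seed) {
        return 31 * seed + Arrays.hashCode(bytes);
    }

    // Runs numIter iterations of CALLS_PER_ITERATION hash calls and
    // returns the elapsed wall-clock time in nanoseconds.
    static long timeLoop(byte[] bytes, int numIter) {
        int sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < numIter; i++) {
            for (int c = 0; c < CALLS_PER_ITERATION; c++) {
                sink += standInHash(bytes, 0);
            }
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot drop the calls.
        if (sink == 42) {
            System.out.println("sink=" + sink);
        }
        return elapsed;
    }

    // Converts a total elapsed time into microseconds per single hash call.
    static double perCallMicros(long totalNanos, int numIter, int callsPerIteration) {
        return totalNanos / 1000.0 / ((long) numIter * callsPerIteration);
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[10001];   // same length as "b" * 10001
        Arrays.fill(bytes, (byte) 'b');
        int numIter = 1_000;              // smaller than the test's 100000 to keep the run short

        timeLoop(bytes, numIter);         // warm-up pass, result discarded
        long nanos = timeLoop(bytes, numIter);

        System.out.printf("per iteration: %.2f us, per call: %.2f us%n",
                perCallMicros(nanos, numIter, 1),
                perCallMicros(nanos, numIter, CALLS_PER_ITERATION));
    }
}
```

For example, `perCallMicros(12_000_000_000L, 100_000, 30)` corresponds to the reported master figure (120 us per iteration over 100000 iterations) and yields 4.0 us per call. For a measurement meant to confirm a fix, a dedicated benchmark harness would likely be more trustworthy than a hand-rolled loop like this one.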


