[ https://issues.apache.org/jira/browse/SPARK-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603174#comment-16603174 ]
Marco Gaido commented on SPARK-25317:
-------------------------------------

I think I have a fix for this. I can submit a PR if you want, but I am still not sure about the root cause of the regression. My best guess is that there is more than one cause and the performance improvement appears only when all of them are fixed, which is rather strange to me.

> MemoryBlock performance regression
> ----------------------------------
>
>                 Key: SPARK-25317
>                 URL: https://issues.apache.org/jira/browse/SPARK-25317
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Blocker
>
> There is a performance regression when calculating hash code for UTF8String:
> {code:java}
>   test("hashing") {
>     import org.apache.spark.unsafe.hash.Murmur3_x86_32
>     import org.apache.spark.unsafe.types.UTF8String
>     val hasher = new Murmur3_x86_32(0)
>     val str = UTF8String.fromString("b" * 10001)
>     val numIter = 100000
>     val start = System.nanoTime
>     for (i <- 0 until numIter) {
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>       Murmur3_x86_32.hashUTF8String(str, 0)
>     }
>     val duration = (System.nanoTime() - start) / 1000 / numIter
>     println(s"duration $duration us")
>   }
> {code}
> To run this test in 2.3, we need to add
> {code:java}
>   public static int hashUTF8String(UTF8String str, int seed) {
>     return hashUnsafeBytes(str.getBaseObject(), str.getBaseOffset(),
>       str.numBytes(), seed);
>   }
> {code}
> to `Murmur3_x86_32`.
> On my laptop, the result for master vs 2.3 is 120 us vs 40 us.
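
For anyone who wants to reproduce the shape of the measurement above without a Spark checkout, here is a minimal sketch in plain Java. It is not the Spark code path: `HashTimingSketch`, `input` and `timePerIterationNanos` are hypothetical names made up for this example, and `java.util.Arrays.hashCode` is only a stand-in for `Murmur3_x86_32.hashUTF8String`. The untimed warm-up pass is an addition that the test in the report does not have; it is there on the assumption that JIT compilation should settle before timing starts.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

public class HashTimingSketch {
    // The accumulated hash values are stored here so the JIT cannot
    // treat the hashing work as dead code and drop it.
    public static volatile int sink;

    // Builds an input of n bytes, all 'b', mirroring the 10001-character
    // string used in the report.
    public static byte[] input(int n) {
        byte[] bytes = new byte[n];
        Arrays.fill(bytes, (byte) 'b');
        return bytes;
    }

    // Runs `work` warmupIters times untimed, then measuredIters times timed,
    // and returns the average wall-clock nanoseconds per timed call.
    public static long timePerIterationNanos(IntSupplier work,
                                             int warmupIters,
                                             int measuredIters) {
        int acc = 0;
        for (int i = 0; i < warmupIters; i++) {
            acc += work.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) {
            acc += work.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed / measuredIters;
    }

    public static void main(String[] args) {
        byte[] bytes = input(10001);
        // Stand-in hash; swap in the real call when running against Spark.
        long nanos = timePerIterationNanos(() -> Arrays.hashCode(bytes), 20_000, 100_000);
        System.out.println("duration " + nanos + " ns per call");
    }
}
```

Note that this prints an average per single call, whereas the test in the report prints microseconds per loop iteration, and each iteration there makes a batch of hash calls. A hand-rolled loop like this is only a rough indicator; a harness such as JMH would likely give steadier numbers.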