Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22338#discussion_r215192739
  
    --- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java ---
    @@ -69,22 +70,27 @@ public static int hashUnsafeWords(Object base, long 
offset, int lengthInBytes, i
       }
     
       public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
    +    return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), seed);
    +  }
    +
    +  private static int hashUnsafeBytesBlock(MemoryBlock base, int 
lengthInBytes, int seed) {
         // This is not compatible with original and another implementations.
         // But remain it for backward compatibility for the components 
existing before 2.3.
    -    int lengthInBytes = Ints.checkedCast(base.size());
         assert (lengthInBytes >= 0): "lengthInBytes cannot be negative";
         int lengthAligned = lengthInBytes - lengthInBytes % 4;
    -    int h1 = hashBytesByIntBlock(base.subBlock(0, lengthAligned), seed);
    +    int h1 = hashBytesByIntBlock(base, lengthAligned, seed);
    +    long offset = base.getBaseOffset();
    +    Object o = base.getBaseObject();
         for (int i = lengthAligned; i < lengthInBytes; i++) {
    -      int halfWord = base.getByte(i);
    +      int halfWord = Platform.getByte(o, offset + i);
    --- End diff --
    
    So seems the performance regression is due to the cost of virtual function 
calls on `MemoryBlock`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to