[GitHub] [kafka] jasonk000 commented on a change in pull request #11721: KAFKA-13629: use faster algorithm for ByteUtils sizeOfXxx algorithm

GitBox Wed, 02 Feb 2022 12:53:19 -0800


jasonk000 commented on a change in pull request #11721:
URL: https://github.com/apache/kafka/pull/11721#discussion_r798002944




##########
File path: clients/src/main/java/org/apache/kafka/common/utils/ByteUtils.java
##########
@@ -386,18 +386,39 @@ public static void writeDouble(double value, ByteBuffer 
buffer) {
         buffer.putDouble(value);
     }
 
+    final static int[] LEADING_ZEROS_TO_U_VARINT_SIZE = new int[] {
+        // 32 bits, and each 7-bits adds one byte to the output
+        5, 5, 5, 5, // 32
+        4, 4, 4, 4, 4, 4, 4, // 28
+        3, 3, 3, 3, 3, 3, 3, // 21
+        2, 2, 2, 2, 2, 2, 2, // 14
+        1, 1, 1, 1, 1, 1, 1, // 7
+        1 // 0
+    };
+
+    final static int[] LEADING_ZEROS_TO_U_VARLONG_SIZE = new int[] {
+        // 64 bits, and each 7-bits adds one byte to the output
+        10, // 64
+        9, 9, 9, 9, 9, 9, 9, // 63
+        8, 8, 8, 8, 8, 8, 8, // 56
+        7, 7, 7, 7, 7, 7, 7, // 49
+        6, 6, 6, 6, 6, 6, 6, // 42
+        5, 5, 5, 5, 5, 5, 5, // 35
+        4, 4, 4, 4, 4, 4, 4, // 28
+        3, 3, 3, 3, 3, 3, 3, // 21
+        2, 2, 2, 2, 2, 2, 2, // 14
+        1, 1, 1, 1, 1, 1, 1, // 7
+        1 // 0
+    };
+
     /**
      * Number of bytes needed to encode an integer in unsigned variable-length 
format.
      *
      * @param value The signed value
      */
     public static int sizeOfUnsignedVarint(int value) {
-        int bytes = 1;
-        while ((value & 0xffffff80) != 0L) {
-            bytes += 1;
-            value >>>= 7;
-        }
-        return bytes;
+        int leadingZeros = Integer.numberOfLeadingZeros(value);
+        return LEADING_ZEROS_TO_U_VARINT_SIZE[leadingZeros];

Review comment:
       Yes, I agree - the arithmetic / bitshift looks intuitively quicker since 
it requires no loads. Looking at the compiler output from JDK11, it does get 
compiled down to a sequence of operations that are reasonably clean (no DIV), 
just that there are a lot of instructions. Perhaps there is a slight advantage 
in different use-cases, but, in my case this fix takes the function completely 
out of the profiler results so it solves the problem.
   
   Out of interest sake, here is the sequence of operations according to 
perfasm on JDK11.
   ```
            ││  0x00007f26e04797b7: mov     0x10(%r12,%r10,8),%r10d
     0.05%  ││  0x00007f26e04797bc: lzcnt   %r10d,%r11d       ;*invokestatic 
numberOfLeadingZeros {reexecute=0 rethrow=0 return_oop=0}
            ││                                                ; - 
org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@6 
(line 76)
            ││                                                ; - 
org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17
 (line 119)
     0.03%  ││  0x00007f26e04797c1: mov     $0x26,%r8d
     0.01%  ││  0x00007f26e04797c7: sub     %r11d,%r8d        ;*isub 
{reexecute=0 rethrow=0 return_oop=0}
            ││                                                ; - 
org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@13
 (line 77)
            ││                                                ; - 
org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17
 (line 119)
     0.49%  ││  0x00007f26e04797ca: mov     %r11d,%r9d
     0.36%  ││  0x00007f26e04797cd: sar     $0x1f,%r9d
     0.02%  ││  0x00007f26e04797d1: movsxd  %r8d,%r10
     0.77%  ││  0x00007f26e04797d4: shr     $0x1b,%r9d
     7.24%  ││  0x00007f26e04797d8: add     %r11d,%r9d
     0.78%  ││  0x00007f26e04797db: imulq   $0x92492493,%r10,%r10
     0.05%  ││  0x00007f26e04797e2: sar     $0x5,%r9d
     2.24%  ││  0x00007f26e04797e6: sar     $0x20,%r10
     9.93%  ││  0x00007f26e04797ea: movsxd  %r9d,%r11
     0.12%  ││  0x00007f26e04797ed: mov     %r10d,%r10d
            ││  0x00007f26e04797f0: add     %r8d,%r10d
     6.30%  ││  0x00007f26e04797f3: sar     $0x1f,%r8d
     2.53%  ││  0x00007f26e04797f7: sar     $0x2,%r10d
     6.21%  ││  0x00007f26e04797fb: movsxd  %r8d,%r8
            ││  0x00007f26e04797fe: movsxd  %r10d,%rdx
     7.25%  ││  0x00007f26e0479801: sub     %r8,%rdx
     7.01%  ││  0x00007f26e0479804: add     %r11,%rdx         ;*i2l 
{reexecute=0 rethrow=0 return_oop=0}
            ││                                                ; - 
org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@22
 (line 77)
            ││                                                ; - 
org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17
 (line 119)
     8.11%  ││  0x00007f26e0479807: mov     (%rsp),%rsi
   ```
   
   Lookup table:
   ```
            │  0x00007fde904792a7: mov     0x10(%r12,%r10,8),%r10d
     1.42%  │  0x00007fde904792ac: lzcnt   %r10d,%r10d       ;*invokestatic 
numberOfLeadingZeros {reexecute=0 rethrow=0 return_oop=0}
            │                                                ; - 
org.apache.kafka.common.utils.ByteUtils::sizeOfUnsignedVarint@1 (line 420)
            │                                                ; - 
org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintOne@6 
(line 61)
            │                                                ; - 
org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintOne_jmhTest::testSizeOfUnsignedVarintOne_thrpt_jmhStub@17
 (line 119)
     1.26%  │  0x00007fde904792b1: cmp     $0x21,%r10d
     2.65%  │  0x00007fde904792b5: jnb     0x7fde90479332
     0.02%  │  0x00007fde904792b7: mov     %r8,0x40(%rsp)
     0.02%  │  0x00007fde904792bc: movabs  $0xfe434bf0,%r11  ;   
{oop([I{0x00000000fe434bf0})}
     1.68%  │  0x00007fde904792c6: movsxd  0x10(%r11,%r10,4),%rdx  ;*i2l 
{reexecute=0 rethrow=0 return_oop=0}
            │                                                ; - 
org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintOne@9 
(line 61)
            │                                                ; - 
org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintOne_jmhTest::testSizeOfUnsignedVarintOne_thrpt_jmhStub@17
 (line 119)
    37.64%  │  0x00007fde904792cb: mov     (%rsp),%rsi
   ```
   
   It may be that the code in yours is faster when dropped into a full 
application.
   
   Perhaps there are a more direct series of shift operations that would be 
better suited?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [kafka] jasonk000 commented on a change in pull request #11721: KAFKA-13629: use faster algorithm for ByteUtils sizeOfXxx algorithm

Reply via email to