[GitHub] [kafka] artemlivshits commented on a change in pull request #11721: KAFKA-13629: use faster algorithm for ByteUtils sizeOfXxx algorithm

GitBox Wed, 02 Feb 2022 12:27:51 -0800


artemlivshits commented on a change in pull request #11721:
URL: https://github.com/apache/kafka/pull/11721#discussion_r797932888




##########
File path: clients/src/main/java/org/apache/kafka/common/utils/ByteUtils.java
##########
@@ -386,18 +386,39 @@ public static void writeDouble(double value, ByteBuffer 
buffer) {
         buffer.putDouble(value);
     }
 
+    final static int[] LEADING_ZEROS_TO_U_VARINT_SIZE = new int[] {
+        // 32 bits, and each 7-bits adds one byte to the output
+        5, 5, 5, 5, // 32
+        4, 4, 4, 4, 4, 4, 4, // 28
+        3, 3, 3, 3, 3, 3, 3, // 21
+        2, 2, 2, 2, 2, 2, 2, // 14
+        1, 1, 1, 1, 1, 1, 1, // 7
+        1 // 0
+    };
+
+    final static int[] LEADING_ZEROS_TO_U_VARLONG_SIZE = new int[] {
+        // 64 bits, and each 7-bits adds one byte to the output
+        10, // 64
+        9, 9, 9, 9, 9, 9, 9, // 63
+        8, 8, 8, 8, 8, 8, 8, // 56
+        7, 7, 7, 7, 7, 7, 7, // 49
+        6, 6, 6, 6, 6, 6, 6, // 42
+        5, 5, 5, 5, 5, 5, 5, // 35
+        4, 4, 4, 4, 4, 4, 4, // 28
+        3, 3, 3, 3, 3, 3, 3, // 21
+        2, 2, 2, 2, 2, 2, 2, // 14
+        1, 1, 1, 1, 1, 1, 1, // 7
+        1 // 0
+    };
+
     /**
      * Number of bytes needed to encode an integer in unsigned variable-length 
format.
      *
      * @param value The signed value
      */
     public static int sizeOfUnsignedVarint(int value) {
-        int bytes = 1;
-        while ((value & 0xffffff80) != 0L) {
-            bytes += 1;
-            value >>>= 7;
-        }
-        return bytes;
+        int leadingZeros = Integer.numberOfLeadingZeros(value);
+        return LEADING_ZEROS_TO_U_VARINT_SIZE[leadingZeros];

Review comment:
       Interesting.  It's unfortunate that the second operation is a div in 
Java (I'm new to Java, so maybe there is a way to express it differently, but I 
couldn't think of one), in C++ I would've just done:
   ```
      return (38 - leadingZeros) / 7 + (leadingZeros == 32);
   ```
   which would just do cmp and add carryover bit, which is cheaper than div.  
Maybe the compiler would be smart enough to translate ((leadingZeros == 32) ? 1 
: 0) into math expression rather than do a branch?
   In these benchmarks, the lookup table is likely cached in L1 cache (hot loop 
that hits the same small amount of data), so memory access in the benchmark is 
likely cheaper than on average.  It's probably hard to do a proper model and 
I'm not sure if it's worth it.
   In any case, thanks for doing this comprehensive research, good stuff!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [kafka] artemlivshits commented on a change in pull request #11721: KAFKA-13629: use faster algorithm for ByteUtils sizeOfXxx algorithm

Reply via email to