jasonk000 commented on a change in pull request #11721: URL: https://github.com/apache/kafka/pull/11721#discussion_r798002944
########## File path: clients/src/main/java/org/apache/kafka/common/utils/ByteUtils.java ########## @@ -386,18 +386,39 @@ public static void writeDouble(double value, ByteBuffer buffer) { buffer.putDouble(value); } + final static int[] LEADING_ZEROS_TO_U_VARINT_SIZE = new int[] { + // 32 bits, and each 7-bits adds one byte to the output + 5, 5, 5, 5, // 32 + 4, 4, 4, 4, 4, 4, 4, // 28 + 3, 3, 3, 3, 3, 3, 3, // 21 + 2, 2, 2, 2, 2, 2, 2, // 14 + 1, 1, 1, 1, 1, 1, 1, // 7 + 1 // 0 + }; + + final static int[] LEADING_ZEROS_TO_U_VARLONG_SIZE = new int[] { + // 64 bits, and each 7-bits adds one byte to the output + 10, // 64 + 9, 9, 9, 9, 9, 9, 9, // 63 + 8, 8, 8, 8, 8, 8, 8, // 56 + 7, 7, 7, 7, 7, 7, 7, // 49 + 6, 6, 6, 6, 6, 6, 6, // 42 + 5, 5, 5, 5, 5, 5, 5, // 35 + 4, 4, 4, 4, 4, 4, 4, // 28 + 3, 3, 3, 3, 3, 3, 3, // 21 + 2, 2, 2, 2, 2, 2, 2, // 14 + 1, 1, 1, 1, 1, 1, 1, // 7 + 1 // 0 + }; + /** * Number of bytes needed to encode an integer in unsigned variable-length format. * * @param value The signed value */ public static int sizeOfUnsignedVarint(int value) { - int bytes = 1; - while ((value & 0xffffff80) != 0L) { - bytes += 1; - value >>>= 7; - } - return bytes; + int leadingZeros = Integer.numberOfLeadingZeros(value); + return LEADING_ZEROS_TO_U_VARINT_SIZE[leadingZeros]; Review comment: Yes, I agree - the arithmetic / bitshift looks intuitively quicker since it requires no loads. Looking at the compiler output from JDK11, it does get compiled down to a sequence of operations that are reasonably clean (no DIV), just that there are a lot of instructions. Perhaps there is a slight advantage in different use-cases, but, in my case this fix takes the function completely out of the profiler results so it solves the problem. Out of interest sake, here is the sequence of operations according to perfasm on JDK11. ``` ││ 0x00007f26e04797b7: mov 0x10(%r12,%r10,8),%r10d 0.05% ││ 0x00007f26e04797bc: lzcnt %r10d,%r11d ;*invokestatic numberOfLeadingZeros {reexecute=0 rethrow=0 return_oop=0} ││ ; - org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@6 (line 76) ││ ; - org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17 (line 119) 0.03% ││ 0x00007f26e04797c1: mov $0x26,%r8d 0.01% ││ 0x00007f26e04797c7: sub %r11d,%r8d ;*isub {reexecute=0 rethrow=0 return_oop=0} ││ ; - org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@13 (line 77) ││ ; - org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17 (line 119) 0.49% ││ 0x00007f26e04797ca: mov %r11d,%r9d 0.36% ││ 0x00007f26e04797cd: sar $0x1f,%r9d 0.02% ││ 0x00007f26e04797d1: movsxd %r8d,%r10 0.77% ││ 0x00007f26e04797d4: shr $0x1b,%r9d 7.24% ││ 0x00007f26e04797d8: add %r11d,%r9d 0.78% ││ 0x00007f26e04797db: imulq $0x92492493,%r10,%r10 0.05% ││ 0x00007f26e04797e2: sar $0x5,%r9d 2.24% ││ 0x00007f26e04797e6: sar $0x20,%r10 9.93% ││ 0x00007f26e04797ea: movsxd %r9d,%r11 0.12% ││ 0x00007f26e04797ed: mov %r10d,%r10d ││ 0x00007f26e04797f0: add %r8d,%r10d 6.30% ││ 0x00007f26e04797f3: sar $0x1f,%r8d 2.53% ││ 0x00007f26e04797f7: sar $0x2,%r10d 6.21% ││ 0x00007f26e04797fb: movsxd %r8d,%r8 ││ 0x00007f26e04797fe: movsxd %r10d,%rdx 7.25% ││ 0x00007f26e0479801: sub %r8,%rdx 7.01% ││ 0x00007f26e0479804: add %r11,%rdx ;*i2l {reexecute=0 rethrow=0 return_oop=0} ││ ; - org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintMathOne@22 (line 77) ││ ; - org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintMathOne_jmhTest::testSizeOfUnsignedVarintMathOne_thrpt_jmhStub@17 (line 119) 8.11% ││ 0x00007f26e0479807: mov (%rsp),%rsi ``` Lookup table: ``` │ 0x00007fde904792a7: mov 0x10(%r12,%r10,8),%r10d 1.42% │ 0x00007fde904792ac: lzcnt %r10d,%r10d ;*invokestatic numberOfLeadingZeros {reexecute=0 rethrow=0 return_oop=0} │ ; - org.apache.kafka.common.utils.ByteUtils::sizeOfUnsignedVarint@1 (line 420) │ ; - org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintOne@6 (line 61) │ ; - org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintOne_jmhTest::testSizeOfUnsignedVarintOne_thrpt_jmhStub@17 (line 119) 1.26% │ 0x00007fde904792b1: cmp $0x21,%r10d 2.65% │ 0x00007fde904792b5: jnb 0x7fde90479332 0.02% │ 0x00007fde904792b7: mov %r8,0x40(%rsp) 0.02% │ 0x00007fde904792bc: movabs $0xfe434bf0,%r11 ; {oop([I{0x00000000fe434bf0})} 1.68% │ 0x00007fde904792c6: movsxd 0x10(%r11,%r10,4),%rdx ;*i2l {reexecute=0 rethrow=0 return_oop=0} │ ; - org.apache.kafka.jmh.util.ByteUtilsBenchmark::testSizeOfUnsignedVarintOne@9 (line 61) │ ; - org.apache.kafka.jmh.util.jmh_generated.ByteUtilsBenchmark_testSizeOfUnsignedVarintOne_jmhTest::testSizeOfUnsignedVarintOne_thrpt_jmhStub@17 (line 119) 37.64% │ 0x00007fde904792cb: mov (%rsp),%rsi ``` It may be that the code in yours is faster when dropped into a full application. Perhaps there are a more direct series of shift operations that would be better suited? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org