twmb commented on a change in pull request #11721: URL: https://github.com/apache/kafka/pull/11721#discussion_r801157664
########## File path: clients/src/main/java/org/apache/kafka/common/utils/ByteUtils.java ########## @@ -386,18 +386,39 @@ public static void writeDouble(double value, ByteBuffer buffer) { buffer.putDouble(value); } + final static int[] LEADING_ZEROS_TO_U_VARINT_SIZE = new int[] { + // 32 bits, and each 7-bits adds one byte to the output + 5, 5, 5, 5, // 32 + 4, 4, 4, 4, 4, 4, 4, // 28 + 3, 3, 3, 3, 3, 3, 3, // 21 + 2, 2, 2, 2, 2, 2, 2, // 14 + 1, 1, 1, 1, 1, 1, 1, // 7 + 1 // 0 + }; + + final static int[] LEADING_ZEROS_TO_U_VARLONG_SIZE = new int[] { + // 64 bits, and each 7-bits adds one byte to the output + 10, // 64 + 9, 9, 9, 9, 9, 9, 9, // 63 + 8, 8, 8, 8, 8, 8, 8, // 56 + 7, 7, 7, 7, 7, 7, 7, // 49 + 6, 6, 6, 6, 6, 6, 6, // 42 + 5, 5, 5, 5, 5, 5, 5, // 35 + 4, 4, 4, 4, 4, 4, 4, // 28 + 3, 3, 3, 3, 3, 3, 3, // 21 + 2, 2, 2, 2, 2, 2, 2, // 14 + 1, 1, 1, 1, 1, 1, 1, // 7 + 1 // 0 + }; + /** * Number of bytes needed to encode an integer in unsigned variable-length format. * * @param value The signed value */ public static int sizeOfUnsignedVarint(int value) { - int bytes = 1; - while ((value & 0xffffff80) != 0L) { - bytes += 1; - value >>>= 7; - } - return bytes; + int leadingZeros = Integer.numberOfLeadingZeros(value); + return LEADING_ZEROS_TO_U_VARINT_SIZE[leadingZeros]; Review comment: Sorry to chime in late here, but I'm wondering if a lookup table may still be faster. You can see these two instructions in your lookup table implementation above, ``` 1.26% │ 0x00007fde904792b1: cmp $0x21,%r10d 2.65% │ 0x00007fde904792b5: jnb 0x7fde90479332 ``` this is comparing if the calculated number (leading zeroes) fits within the size-33 array. Does Java have the concept of a 256 sized array, and uint8 numbers? In my Go implementation (which uses bits.Len, i.e., the opposite of leading zeroes), using a size-256 string and a uint8 allows for no bounds check, resulting in even fewer instructions and shaving a fraction of a nanosecond of time in comparison to the pure math version. In this benchmark, ```go func BenchmarkUvarintLen(b *testing.B) { z := 0 for i := 0; i < b.N; i++ { z += UvarintLen(uint32(i)) } b.Log(z) } ``` Comparing the existing code to this code: ```go func UvarintLen(u uint32) int { lz := bits.LeadingZeros32(u) return (((38 - lz) * 0b10010010010010011) >> 19) + (lz >> 5)} ``` Top is math version, bottom is lookup table: ``` BenchmarkUvarintLen-8 1000000000 1.179 ns/op BenchmarkUvarintLen-8 1000000000 0.7811 ns/op ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org