Hi Ivan! Update: http://cr.openjdk.java.net/~shade/8136500/webrev.02/
Performance data updated (shaves off 0.5 ns at most): http://cr.openjdk.java.net/~shade/8136500/notes.txt On 11/21/2015 11:38 PM, Ivan Gerasimov wrote: > Just by a coincidence, I was experimenting with the same piece of code :) Not a surprise: that code screams for cleaning up. I know others tried to optimize at least stringSize before. > 1) > When benchmarking different implementation of Integer.stringSize() the > fastest (on my machine) variants were with explicitly unrolled > switch-like constructions and hard-coded binary search with guaranteed > no-more-than-four comparisons: > They appeared to be twice as fast as the original implementation. > However, the simple loop that you have is surely cleaner. > > static int stringSize_switch(int x) { > if (x < 10) return 1; > if (x < 100) return 2; > if (x < 1000) return 3; > if (x < 10000) return 4; > if (x < 100000) return 5; > if (x < 1000000) return 6; > if (x < 10000000) return 7; > if (x < 100000000) return 8; > if (x < 1000000000) return 9; > return 10; > } Yes. The thing is, if you look into the code generated by C2, then you will see the similar shape, because the loop is aggressively unrolled and constants are computed. This seems like a no-brainer for any decent compiler. I said that in the stringSize docs as the reminder. > static int stringSize_binchop(int x) { Binary search is an obvious idea too, but again the trouble is that many integers are small, and you are much better off linear-searching from small to large numbers. This is further exacerbated by the fact byte/short values are normally coerced to integers, and go through Integer.toString. I tried to capture that in stringSize docs as well. > > 2) > The lookup tables DigitTens and DigitOnes are always used together. > Packing them in an one table can save us one array access for the price > of little arithmetic. > > static final int[] TwoDigits = { > ('0'<<16)+'0', ('0'<<16)+'1', ('0'<<16)+'2', ('0'<<16)+'3', > ... > int twoDig = TwoDigits[r]; > buf[--charPos] = (byte)twoDig; > buf[--charPos] = (byte)(twoDig >> 16); > ... I agree, that's a nifty idea. As the improvement, you can do char[] array and access as (byte)v and (byte)(v >> 8). However, both variants are not improving against my current patch... # JDK 9 Patched (webrev.01) IntegerToString.test 0 avgt 9 7.008 ± 0.252 ns/op IntegerToString.test 1 avgt 9 16.352 ± 0.206 ns/op IntegerToString.test 2 avgt 9 18.064 ± 0.598 ns/op IntegerToString.test 3 avgt 9 18.626 ± 0.127 ns/op IntegerToString.test 4 avgt 9 20.591 ± 0.295 ns/op IntegerToString.test 5 avgt 9 20.796 ± 0.106 ns/op IntegerToString.test 6 avgt 9 22.585 ± 0.083 ns/op IntegerToString.test 7 avgt 9 23.247 ± 1.231 ns/op IntegerToString.test 8 avgt 9 24.090 ± 0.289 ns/op # JDK 9 Patched (webrev.01 + lookup tables combining) Benchmark (mag) Mode Cnt Score Error Units IntegerToString.test 0 avgt 9 6.846 ± 0.119 ns/op IntegerToString.test 1 avgt 9 16.315 ± 0.157 ns/op IntegerToString.test 2 avgt 9 17.967 ± 0.127 ns/op IntegerToString.test 3 avgt 9 18.429 ± 0.226 ns/op IntegerToString.test 4 avgt 9 20.653 ± 0.419 ns/op IntegerToString.test 5 avgt 9 21.104 ± 0.049 ns/op IntegerToString.test 6 avgt 9 22.728 ± 0.292 ns/op IntegerToString.test 7 avgt 9 24.014 ± 0.153 ns/op IntegerToString.test 8 avgt 9 25.626 ± 0.340 ns/op Therefore, I'd prefer to keep the cleaner (i.e. split lookup tables) alternative. But that reminds me we can omit duplicating the lookup tables for different coder, and use only a byte[] pair. This caters for the most frequent case of Latin1-encoded Strings. > 3) > In Integer.toString(int i) > > 463 if (i == Integer.MIN_VALUE) > 464 return "-2147483648"; > 465 int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i); > > It might be reasonable to reorganize the code a bit: > > int size = 0, i1 = i; > if (i < 0) { > if (i == Integer.MIN_VALUE) > return "-2147483648"; > size = 1; > i1 = -i; > } > size += stringSize(i1); > > First, we would skip one comparison of positive i with MIN_INT. Yes, I like this one. Grabbed a modified version of this approach. > Second, I think, the code of stringSize() might be better inlined, if it > was used only once. No-no! That's a helpful little fella. Not-yet-integrated Indify String Concat uses Integer/Long.stringSize. Thanks, -Aleksey