On Mon, 13 Dec 2021 09:39:55 GMT, Сергей Цыпанов <d...@openjdk.java.net> wrote:

> Originally this was spotted by by Amir Hadadi in 
> https://stackoverflow.com/questions/70272651/missing-bounds-checking-elimination-in-string-constructor
> 
> It looks like in the following code in `String(byte[], int, int, Charset)`
> 
> while (offset < sl) {
>     int b1 = bytes[offset];
>     if (b1 >= 0) {
>         dst[dp++] = (byte)b1;
>         offset++;                                  // <---
>         continue;
>     }
>     if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
>             offset + 1 < sl) {
>         int b2 = bytes[offset + 1];
>         if (!isNotContinuation(b2)) {
>             dst[dp++] = (byte)decode2(b1, b2);
>             offset += 2;
>             continue;
>         }
>     }
>     // anything not a latin1, including the repl
>     // we have to go with the utf16
>     break;
> }
> 
> bounds check elimination is not executed when accessing byte array via 
> `bytes[offset].
> 
> The reason, I guess, is that offset variable is modified within the loop 
> (marked with arrow).
> 
> Possible fix for this could be changing:
> 
> `while (offset < sl)` ---> `while (offset >= 0 && offset < sl)`
> 
> However the best is to invest in C2 optimization to handle all such cases.
> 
> The following benchmark demonstrates good improvement:
> 
> @State(Scope.Thread)
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> public class StringConstructorBenchmark {
>   private byte[] array;
>   private String str;
> 
>   @Setup
>   public void setup() {
>     str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. 
> Я";//Latin1 ending with Russian
>     array = str.getBytes(StandardCharsets.UTF_8);
>   }
> 
>   @Benchmark
>   public String newString()  {
>       return new String(array, 0, array.length, StandardCharsets.UTF_8);
>   }
> 
>   @Benchmark
>   public String translateEscapes()  {
>       return str.translateEscapes();
>   }
> }
> 
> Results:
> 
> //baseline
> Benchmark Mode Cnt Score Error Units
> StringConstructorBenchmark.newString avgt 50 173,092 ± 3,048 ns/op
> 
> //patched
> Benchmark Mode Cnt Score Error Units
> StringConstructorBenchmark.newString avgt 50 126,908 ± 2,355 ns/op
> 
> The same is observed in String.translateEscapes() for the same String as in 
> the benchmark above:
> 
> //baseline
> Benchmark Mode Cnt Score Error Units
> StringConstructorBenchmark.translateEscapes avgt 100 53,627 ± 0,850 ns/op
> 
> //patched
> Benchmark Mode Cnt Score Error Units
> StringConstructorBenchmark.translateEscapes avgt 100 48,087 ± 1,129 ns/op
> 
> Also I've looked into this with `LinuxPerfAsmProfiler`, full output for 
> baseline is available here 
> https://gist.github.com/stsypanov/d2524f98477d633fb1d4a2510fedeea6 and for 
> patched code here 
> https://gist.github.com/stsypanov/16c787e4f9fa3dd122522f16331b68b7.
> 
> Here's the part of baseline assembly responsible for `while` loop: 
> 
>   3.62%           ││            │  0x00007fed70eb4c1c:   mov    %ebx,%ecx
>   2.29%           ││            │  0x00007fed70eb4c1e:   mov    %edx,%r9d
>   2.22%           ││            │  0x00007fed70eb4c21:   mov    (%rsp),%r8    
>                ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
>                   ││            │                                             
>                ; - java.lang.String::&lt;init&gt;@107 (line 537)
>   2.32%           ↘│            │  0x00007fed70eb4c25:   cmp    %r13d,%ecx
>                    │            │  0x00007fed70eb4c28:   jge    
> 0x00007fed70eb5388           ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>                    │            │                                             
>                ; - java.lang.String::&lt;init&gt;@110 (line 537)
>   3.05%            │            │  0x00007fed70eb4c2e:   cmp    0x8(%rsp),%ecx
>                    │            │  0x00007fed70eb4c32:   jae    
> 0x00007fed70eb5319
>   2.38%            │            │  0x00007fed70eb4c38:   mov    %r8,(%rsp)
>   2.64%            │            │  0x00007fed70eb4c3c:   movslq %ecx,%r8
>   2.46%            │            │  0x00007fed70eb4c3f:   mov    %rax,%rbx
>   3.44%            │            │  0x00007fed70eb4c42:   sub    %r8,%rbx
>   2.62%            │            │  0x00007fed70eb4c45:   add    $0x1,%rbx
>   2.64%            │            │  0x00007fed70eb4c49:   and    
> $0xfffffffffffffffe,%rbx
>   2.30%            │            │  0x00007fed70eb4c4d:   mov    %ebx,%r8d
>   3.08%            │            │  0x00007fed70eb4c50:   add    %ecx,%r8d
>   2.55%            │            │  0x00007fed70eb4c53:   movslq %r8d,%r8
>   2.45%            │            │  0x00007fed70eb4c56:   add    
> $0xfffffffffffffffe,%r8
>   2.13%            │            │  0x00007fed70eb4c5a:   cmp    (%rsp),%r8
>                    │            │  0x00007fed70eb4c5e:   jae    
> 0x00007fed70eb5319
>   3.36%            │            │  0x00007fed70eb4c64:   mov    %ecx,%edi     
>                ;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
>                    │            │                                             
>                ; - java.lang.String::&lt;init&gt;@113 (line 538)
>   2.86%            │           ↗│  0x00007fed70eb4c66:   movsbl 
> 0x10(%r14,%rdi,1),%r8d       ;*baload {reexecute=0 rethrow=0 return_oop=0}
>                    │           ││                                             
>                ; - java.lang.String::&lt;init&gt;@115 (line 538)
>   2.48%            │           ││  0x00007fed70eb4c6c:   mov    %r9d,%edx
>   2.26%            │           ││  0x00007fed70eb4c6f:   inc    %edx          
>                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                    │           ││                                             
>                ; - java.lang.String::&lt;init&gt;@127 (line 540)
>   3.28%            │           ││  0x00007fed70eb4c71:   mov    %edi,%ebx
>   2.44%            │           ││  0x00007fed70eb4c73:   inc    %ebx          
>                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                    │           ││                                             
>                ; - java.lang.String::&lt;init&gt;@134 (line 541)
>   2.35%            │           ││  0x00007fed70eb4c75:   test   %r8d,%r8d
>                    ╰           ││  0x00007fed70eb4c78:   jge    
> 0x00007fed70eb4c04           ;*iflt {reexecute=0 rethrow=0 return_oop=0}
>                                ││                                             
>                ; - java.lang.String::&lt;init&gt;@120 (line 539)
> 
> and this one is for patched code:
> 
>  17.28%           ││  0x00007f6b88eb6061:   mov    %edx,%r10d                 
>   ;*iload_2 {reexecute=0 rethrow=0 return_oop=0}
>                   ││                                                          
>   ; - java.lang.String::&lt;init&gt;@107 (line 537)
>   0.11%           ↘│  0x00007f6b88eb6064:   test   %r10d,%r10d
>                    │  0x00007f6b88eb6067:   jl     0x00007f6b88eb669c         
>   ;*iflt {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@108 (line 537)
>   0.39%            │  0x00007f6b88eb606d:   cmp    %r13d,%r10d
>                    │  0x00007f6b88eb6070:   jge    0x00007f6b88eb66d0         
>   ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@114 (line 537)
>   0.66%            │  0x00007f6b88eb6076:   mov    %ebx,%r9d
>  13.70%            │  0x00007f6b88eb6079:   cmp    0x8(%rsp),%r10d
>   0.01%            │  0x00007f6b88eb607e:   jae    0x00007f6b88eb6671
>   0.14%            │  0x00007f6b88eb6084:   movsbl 0x10(%r14,%r10,1),%edi     
>   ;*baload {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@119 (line 538)
>   0.37%            │  0x00007f6b88eb608a:   mov    %r9d,%ebx
>   0.99%            │  0x00007f6b88eb608d:   inc    %ebx                       
>   ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@131 (line 540)
>  12.88%            │  0x00007f6b88eb608f:   movslq %r9d,%rsi                  
>   ;*bastore {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@196 (line 548)
>   0.17%            │  0x00007f6b88eb6092:   mov    %r10d,%edx
>   0.39%            │  0x00007f6b88eb6095:   inc    %edx                       
>   ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                    │                                                          
>   ; - java.lang.String::&lt;init&gt;@138 (line 541)
>   0.96%            │  0x00007f6b88eb6097:   test   %edi,%edi
>   0.02%            │  0x00007f6b88eb6099:   jl     0x00007f6b88eb60dc         
>   ;*iflt {reexecute=0 rethrow=0 return_oop=0}
> 
> Between instructions mapped to `if_icmpge` and `aload_1` in baseline we have 
> bounds check which is missing from patched code.

Hi, the problem here does not seem to lie in the bound check eliminations, as 
the patched version still contains the bound checks in these lines:

    13.70%            │  0x00007f6b88eb6079:   cmp    0x8(%rsp),%r10d
     0.01%            │  0x00007f6b88eb607e:   jae    0x00007f6b88eb6671

The problem, at first glance, seems to be that our compiled code is trying to 
compute this mysterious number

    long temp = sl;
    loop {
        long newTemp = (long)((int)((temp - (long)offset + 1) & (-2L)) + 
offset) - 2L;
        if (newTemp u>= temp) jump;
        temp = newTemp;
    }

I think further investigation is needed, maybe our compiler is trying too hard 
to deal with various appearance of `offset` in the loop.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6812

Reply via email to