On Thu, 13 May 2021 20:23:16 GMT, Richard Startin 
<github.com+16439049+richardstar...@openjdk.org> wrote:

>> In private correspondence with Vladimir, it was explained that where 
>> Vladimir's code and Laurent's code are identical, including typos 
>> ([Vladimir's 
>> code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672),
>>  [Laurent's 
>> code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719))
>>  it is because Vladimir sent the code to Laurent, not the other way around, 
>> therefore Vladimir's code does not derive from Laurent's, and it does not 
>> derive from mine. I can only trust that this is the case, so please 
>> disregard my claim that this is derivative work when reviewing this PR.
>
> For what it's worth, I benchmarked this implementation radix sort ([adapted 
> here to fit in to my 
> harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782))
>  against a [signed 
> variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478)
>  of what I have claimed this work was derived from and the proposed 
> implementation does not perform favourably on uniformly random data:
> 
> 
> 
> Benchmark                                         (bits)  (padding)  
> (scenario)  (seed)   (size)  Mode  Cnt      Score     Error  Units
> RadixSortBenchmark.jdk                                17          7     
> UNIFORM       0  1000000  avgt    5  11301.950 ± 113.691  us/op
> RadixSortBenchmark.jdk                                23          7     
> UNIFORM       0  1000000  avgt    5  11792.351 ±  60.757  us/op
> RadixSortBenchmark.jdk                                30          7     
> UNIFORM       0  1000000  avgt    5  11184.616 ±  67.094  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      17          7     
> UNIFORM       0  1000000  avgt    5   9564.626 ±  69.497  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      23          7     
> UNIFORM       0  1000000  avgt    5   9432.085 ±  58.983  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned      30          7     
> UNIFORM       0  1000000  avgt    5  10772.975 ±  51.848  us/op
> 
> 
> 
> I believe the root cause is a defect in the mechanism employed to skip passes 
> as can be seen by the increased number of instructions and cycles here. In 
> the proposed implementation, instructions is roughly constant as a function 
> of bits. In the case where all passes must be performed (bits = 30), IPC is 
> superior in `unrollOnePassHistogramsSkipLevelsSigned`.
> 
> 
> Benchmark                                                               
> (bits)  (padding)  (scenario)  (seed)   (size)  Mode  Cnt         Score     
> Error      Units
> RadixSortBenchmark.jdk:cycles                                               
> 17          7     UNIFORM       0  1000000  avgt       34976971.877           
>       #/op
> RadixSortBenchmark.jdk:instructions                                         
> 17          7     UNIFORM       0  1000000  avgt       70121142.003           
>       #/op
> RadixSortBenchmark.jdk:cycles                                               
> 23          7     UNIFORM       0  1000000  avgt       32369970.385           
>       #/op
> RadixSortBenchmark.jdk:instructions                                         
> 23          7     UNIFORM       0  1000000  avgt       70201664.963           
>       #/op
> RadixSortBenchmark.jdk:cycles                                               
> 30          7     UNIFORM       0  1000000  avgt       30789736.602           
>       #/op
> RadixSortBenchmark.jdk:instructions                                         
> 30          7     UNIFORM       0  1000000  avgt       70180942.122           
>       #/op
> RadixSortBenchmark.jdk:IPC                                                  
> 30          7     UNIFORM       0  1000000  avgt              2.279           
>  insns/clk
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     
> 17          7     UNIFORM       0  1000000  avgt       26983994.479           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               
> 17          7     UNIFORM       0  1000000  avgt       62065304.827           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     
> 23          7     UNIFORM       0  1000000  avgt       26161703.124           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               
> 23          7     UNIFORM       0  1000000  avgt       63102683.167           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles                     
> 30          7     UNIFORM       0  1000000  avgt       29780103.795           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions               
> 30          7     UNIFORM       0  1000000  avgt       74437097.025           
>       #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:IPC                        
> 30          7     UNIFORM       0  1000000  avgt              2.500           
>  insns/clk
> 
> 
> 
> The biggest difference in executed code appears to be that bounds checks are 
> not eliminated in the first counting pass in the proposed implementation:
> 
> 
> ....[Hottest Region 
> 1]..............................................................................
> c2, level 4, io.github.richardstartin.radixsort.RadixSort::jdk, version 402 
> (166 bytes) 
> 
>             0x00007f0d2fb20579: cmp    $0x100,%r10d
>             0x00007f0d2fb20580: jae    0x00007f0d2fb21344
>             0x00007f0d2fb20586: decl   0x10(%rdx,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@107 (line 697)
>             0x00007f0d2fb2058b: cmp    $0x1,%esi
>             0x00007f0d2fb2058e: jle    0x00007f0d2fb2134c
>             0x00007f0d2fb20594: mov    $0x1,%r11d
>             0x00007f0d2fb2059a: xor    %ecx,%ecx
>             0x00007f0d2fb2059c: nopl   0x0(%rax)          ;*aload_2 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@41 (line 694)
>   0.06%  ↗  0x00007f0d2fb205a0: movzbl 0x10(%rbp,%r11,4),%r10d  ;*iand 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@49 (line 694)
>   0.18%  │  0x00007f0d2fb205a6: decl   0x10(%r8,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@54 (line 694)
>   1.37%  │  0x00007f0d2fb205ab: mov    0x10(%rbp,%r11,4),%r10d
>   0.42%  │  0x00007f0d2fb205b0: shr    $0x8,%r10d
>   0.34%  │  0x00007f0d2fb205b4: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@66 (line 695)
>   0.50%  │  0x00007f0d2fb205b8: decl   0x10(%r9,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@71 (line 695)
>   1.75%  │  0x00007f0d2fb205bd: mov    0x10(%rbp,%r11,4),%r10d
>   0.42%  │  0x00007f0d2fb205c2: shr    $0x10,%r10d
>   0.12%  │  0x00007f0d2fb205c6: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@84 (line 696)
>   0.36%  │  0x00007f0d2fb205ca: decl   0x10(%rbx,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@89 (line 696)
>   1.73%  │  0x00007f0d2fb205cf: mov    0x10(%rbp,%r11,4),%r10d
>   0.26%  │  0x00007f0d2fb205d4: shr    $0x18,%r10d
>   0.38%  │  0x00007f0d2fb205d8: xor    $0x80,%r10d        ;*ixor {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@102 (line 697)
>   0.28%  │  0x00007f0d2fb205df: cmp    $0x100,%r10d
>          │  0x00007f0d2fb205e6: jae    0x00007f0d2fb20bab
>   0.38%  │  0x00007f0d2fb205ec: decl   0x10(%rdx,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@107 (line 697)
>   1.25%  │  0x00007f0d2fb205f1: movzbl 0x14(%rbp,%r11,4),%r10d  ;*iand 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@49 (line 694)
>   1.63%  │  0x00007f0d2fb205f7: decl   0x10(%r8,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@54 (line 694)
>   1.71%  │  0x00007f0d2fb205fc: mov    0x14(%rbp,%r11,4),%r10d
>   0.16%  │  0x00007f0d2fb20601: shr    $0x8,%r10d
>   0.18%  │  0x00007f0d2fb20605: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@66 (line 695)
>   0.36%  │  0x00007f0d2fb20609: decl   0x10(%r9,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@71 (line 695)
>   2.05%  │  0x00007f0d2fb2060e: mov    0x14(%rbp,%r11,4),%r10d
>   0.14%  │  0x00007f0d2fb20613: shr    $0x10,%r10d
>   0.38%  │  0x00007f0d2fb20617: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@84 (line 696)
>   0.14%  │  0x00007f0d2fb2061b: decl   0x10(%rbx,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@89 (line 696)
>   1.51%  │  0x00007f0d2fb20620: mov    0x14(%rbp,%r11,4),%r10d
>   0.14%  │  0x00007f0d2fb20625: shr    $0x18,%r10d
>   0.32%  │  0x00007f0d2fb20629: xor    $0x80,%r10d        ;*ixor {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@102 (line 697)
>   0.12%  │  0x00007f0d2fb20630: cmp    $0x100,%r10d
>          │  0x00007f0d2fb20637: jae    0x00007f0d2fb20ba8
>   0.78%  │  0x00007f0d2fb2063d: decl   0x10(%rdx,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@107 (line 697)
>   1.04%  │  0x00007f0d2fb20642: add    $0x2,%r11d         ;*iinc {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@108 (line 693)
>   0.64%  │  0x00007f0d2fb20646: cmp    %esi,%r11d
>          ╰  0x00007f0d2fb20649: jl     0x00007f0d2fb205a0  ;*if_icmpge 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@38 (line 693)
>             0x00007f0d2fb2064f: cmp    %r13d,%r11d
>             0x00007f0d2fb20652: jge    0x00007f0d2fb206ad  ;*aload_2 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@41 (line 694)
>             0x00007f0d2fb20654: movzbl 0x10(%rbp,%r11,4),%r10d  ;*iand 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@49 (line 694)
>             0x00007f0d2fb2065a: decl   0x10(%r8,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::jdk@54 (line 694)
> ....................................................................................................
>  21.11%  <total for region 1>
> 
> 
> 
> Whereas they have been eliminated in the first pass of 
> `unrollOnePassHistogramsSkipLevelsSigned`:
> 
> 
> ....[Hottest Region 
> 1]..............................................................................
> c2, level 4, 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned,
>  version 402 (544 bytes) 
> 
>             0x00007f123035bb6c: incl   0x14(%rbp,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>             0x00007f123035bb70: incl   0x14(%r9,%r10,4)   ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>             0x00007f123035bb75: cmp    $0x1,%esi
>             0x00007f123035bb78: jle    0x00007f123035cc28
>             0x00007f123035bb7e: mov    $0x1,%edi
>             0x00007f123035bb83: mov    $0x1,%ebx
>             0x00007f123035bb88: nopl   0x0(%rax,%rax,1)   ;*aload 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@43
>  (line 402)
>   0.12%  ↗  0x00007f123035bb90: mov    $0x80000000,%r10d
>   0.06%  │  0x00007f123035bb96: add    0x10(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.04%  │  0x00007f123035bb9b: mov    %r10d,%r8d
>   0.10%  │  0x00007f123035bb9e: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.12%  │  0x00007f123035bba2: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bba5: shr    $0x10,%r11d
>   0.08%  │  0x00007f123035bba9: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.08%  │  0x00007f123035bbad: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bbb0: shr    $0x8,%ecx
>   0.08%  │  0x00007f123035bbb3: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.14%  │  0x00007f123035bbb7: mov    0x18(%rsp),%r14
>   0.16%  │  0x00007f123035bbbc: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.36%  │  0x00007f123035bbc1: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.06%  │  0x00007f123035bbc4: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bbc9: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.26%  │  0x00007f123035bbce: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.28%  │  0x00007f123035bbd3: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.20%  │  0x00007f123035bbd8: mov    $0x80000000,%r10d
>   0.10%  │  0x00007f123035bbde: add    0x14(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.08%  │  0x00007f123035bbe3: mov    %r10d,%r8d
>   0.12%  │  0x00007f123035bbe6: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.04%  │  0x00007f123035bbea: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bbed: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bbf1: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.14%  │  0x00007f123035bbf5: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bbf8: shr    $0x8,%ecx
>   0.04%  │  0x00007f123035bbfb: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.08%  │  0x00007f123035bbff: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.46%  │  0x00007f123035bc04: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.10%  │  0x00007f123035bc07: mov    0x10(%rsp),%r10
>   0.08%  │  0x00007f123035bc0c: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.22%  │  0x00007f123035bc11: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.30%  │  0x00007f123035bc16: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.22%  │  0x00007f123035bc1b: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bc21: add    0x18(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.20%  │  0x00007f123035bc26: mov    %r10d,%r8d
>   0.18%  │  0x00007f123035bc29: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.12%  │  0x00007f123035bc2d: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bc30: shr    $0x10,%r11d
>   0.20%  │  0x00007f123035bc34: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.10%  │  0x00007f123035bc38: mov    %r10d,%ecx
>   0.12%  │  0x00007f123035bc3b: shr    $0x8,%ecx
>   0.20%  │  0x00007f123035bc3e: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.08%  │  0x00007f123035bc42: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.18%  │  0x00007f123035bc47: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.08%  │  0x00007f123035bc4a: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bc4f: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.20%  │  0x00007f123035bc54: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.12%  │  0x00007f123035bc59: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.34%  │  0x00007f123035bc5e: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bc64: add    0x1c(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.91%  │  0x00007f123035bc69: mov    %r10d,%r8d
>   0.02%  │  0x00007f123035bc6c: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.12%  │  0x00007f123035bc70: mov    %r10d,%r11d
>   0.06%  │  0x00007f123035bc73: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bc77: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.06%  │  0x00007f123035bc7b: mov    %r10d,%ecx
>          │  0x00007f123035bc7e: shr    $0x8,%ecx
>   0.10%  │  0x00007f123035bc81: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.08%  │  0x00007f123035bc85: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.16%  │  0x00007f123035bc8a: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>          │  0x00007f123035bc8d: mov    0x10(%rsp),%r10
>   0.14%  │  0x00007f123035bc92: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.32%  │  0x00007f123035bc97: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.32%  │  0x00007f123035bc9c: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.24%  │  0x00007f123035bca1: mov    $0x80000000,%r10d
>   0.12%  │  0x00007f123035bca7: add    0x20(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.08%  │  0x00007f123035bcac: mov    %r10d,%r8d
>   0.08%  │  0x00007f123035bcaf: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.06%  │  0x00007f123035bcb3: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bcb6: shr    $0x10,%r11d
>   0.10%  │  0x00007f123035bcba: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.14%  │  0x00007f123035bcbe: mov    %r10d,%ecx
>   0.14%  │  0x00007f123035bcc1: shr    $0x8,%ecx
>   0.14%  │  0x00007f123035bcc4: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.08%  │  0x00007f123035bcc8: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.22%  │  0x00007f123035bccd: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.10%  │  0x00007f123035bcd0: mov    0x10(%rsp),%r10
>   0.14%  │  0x00007f123035bcd5: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.24%  │  0x00007f123035bcda: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.20%  │  0x00007f123035bcdf: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.18%  │  0x00007f123035bce4: mov    $0x80000000,%r10d
>   0.04%  │  0x00007f123035bcea: add    0x24(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.12%  │  0x00007f123035bcef: mov    %r10d,%r8d
>   0.04%  │  0x00007f123035bcf2: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.08%  │  0x00007f123035bcf6: mov    %r10d,%r11d
>   0.16%  │  0x00007f123035bcf9: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bcfd: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.06%  │  0x00007f123035bd01: mov    %r10d,%ecx
>   0.04%  │  0x00007f123035bd04: shr    $0x8,%ecx
>   0.02%  │  0x00007f123035bd07: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.12%  │  0x00007f123035bd0b: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.22%  │  0x00007f123035bd10: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.06%  │  0x00007f123035bd13: mov    0x10(%rsp),%r10
>   0.04%  │  0x00007f123035bd18: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.20%  │  0x00007f123035bd1d: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.28%  │  0x00007f123035bd22: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.46%  │  0x00007f123035bd27: mov    $0x80000000,%r10d
>   0.08%  │  0x00007f123035bd2d: add    0x28(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.08%  │  0x00007f123035bd32: mov    %r10d,%r8d
>   0.04%  │  0x00007f123035bd35: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.10%  │  0x00007f123035bd39: mov    %r10d,%r11d
>   0.04%  │  0x00007f123035bd3c: shr    $0x10,%r11d
>   0.06%  │  0x00007f123035bd40: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.04%  │  0x00007f123035bd44: mov    %r10d,%ecx
>   0.10%  │  0x00007f123035bd47: shr    $0x8,%ecx
>   0.10%  │  0x00007f123035bd4a: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.10%  │  0x00007f123035bd4e: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.28%  │  0x00007f123035bd53: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.14%  │  0x00007f123035bd56: mov    0x10(%rsp),%r10
>   0.10%  │  0x00007f123035bd5b: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.36%  │  0x00007f123035bd60: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.20%  │  0x00007f123035bd65: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.28%  │  0x00007f123035bd6a: mov    $0x80000000,%r10d
>   0.16%  │  0x00007f123035bd70: add    0x2c(%rdx,%rdi,4),%r10d  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
>   0.08%  │  0x00007f123035bd75: mov    %r10d,%r8d
>   0.10%  │  0x00007f123035bd78: shr    $0x18,%r8d         ;*iushr 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@116
>  (line 406)
>   0.08%  │  0x00007f123035bd7c: mov    %r10d,%r11d
>   0.08%  │  0x00007f123035bd7f: shr    $0x10,%r11d
>   0.14%  │  0x00007f123035bd83: movzbl %r11b,%r11d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@99
>  (line 405)
>   0.02%  │  0x00007f123035bd87: mov    %r10d,%ecx
>   0.02%  │  0x00007f123035bd8a: shr    $0x8,%ecx
>   0.20%  │  0x00007f123035bd8d: movzbl %r10b,%r10d        ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@59
>  (line 403)
>   0.22%  │  0x00007f123035bd91: incl   0x14(%r14,%r10,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@66
>  (line 403)
>   0.22%  │  0x00007f123035bd96: movzbl %cl,%ecx           ;*iand {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@79
>  (line 404)
>   0.12%  │  0x00007f123035bd99: mov    0x10(%rsp),%r10
>   0.04%  │  0x00007f123035bd9e: incl   0x14(%r10,%rcx,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@86
>  (line 404)
>   0.22%  │  0x00007f123035bda3: incl   0x14(%rbp,%r11,4)  ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@106
>  (line 405)
>   0.20%  │  0x00007f123035bda8: incl   0x14(%r9,%r8,4)    ;*iastore 
> {reexecute=0 rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@123
>  (line 406)
>   0.26%  │  0x00007f123035bdad: add    $0x8,%edi          ;*iinc {reexecute=0 
> rethrow=0 return_oop=0}
>          │                                                ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@124
>  (line 402)
>   0.10%  │  0x00007f123035bdb0: cmp    %esi,%edi
>          ╰  0x00007f123035bdb2: jl     0x00007f123035bb90  ;*if_icmpge 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@40
>  (line 402)
>             0x00007f123035bdb8: cmp    %eax,%edi
>             0x00007f123035bdba: jge    0x00007f123035be09  ;*aload 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@43
>  (line 402)
>             0x00007f123035bdbc: mov    $0x80000000,%esi
>             0x00007f123035bdc1: add    0x10(%rdx,%rdi,4),%esi  ;*isub 
> {reexecute=0 rethrow=0 return_oop=0}
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::signed@3 (line 476)
>                                                           ; - 
> io.github.richardstartin.radixsort.RadixSort::unrollOnePassHistogramsSkipLevelsSigned@53
>  (line 403)
> ....................................................................................................
>  18.02%  <total for region 1>
> 
> 
> 
> So, I think there may be some room for improvement in this submission. 
> Benchmarking is hard, and it's possible I have made a mistake or have 
> misinterpreted the results, but I think such a big difference warrants 
> investigation before pushing this change to so many users. I ran these 
> benchmarks on JDK 11.0.11. I have run a range of randomised tests to verify 
> that the two implementations produce the same results.

So the issue of not skipping passes was my fault in the translation process, so 
not something to worry about, though after [fixing 
that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510)
 the original implementation still has the edge because of the bounds checks on 
the `xor` not getting eliminated.


Benchmark                                         (bits)  (padding)  (scenario) 
 (seed)   (size)  Mode  Cnt      Score    Error  Units
RadixSortBenchmark.jdk                                17          7     UNIFORM 
      0  1000000  avgt    5  10432.408 ± 87.024  us/op
RadixSortBenchmark.jdk                                23          7     UNIFORM 
      0  1000000  avgt    5   9465.990 ± 40.598  us/op
RadixSortBenchmark.jdk                                30          7     UNIFORM 
      0  1000000  avgt    5  11189.146 ± 50.972  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      17          7     UNIFORM 
      0  1000000  avgt    5   9546.963 ± 41.698  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      23          7     UNIFORM 
      0  1000000  avgt    5   9412.114 ± 43.081  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned      30          7     UNIFORM 
      0  1000000  avgt    5  10823.618 ± 64.311  us/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/3938

Reply via email to