Re: RFR: 8283681: Improve ZonedDateTime offset handling

2022-03-25 Thread Richard Startin
On Fri, 25 Mar 2022 12:28:58 GMT, Claes Redestad  wrote:

> Richard Startin prompted me to have a look at a case where java.time 
> underperforms relative to joda time 
> (https://twitter.com/richardstartin/status/1506975932271190017). 
> 
> It seems the java.time test of his suffer from heavy allocations due 
> ZoneOffset::getRules allocating a new ZoneRules object every time and escape 
> analysis failing to do the thing in his test. The patch here adds a simple 
> specialization so that when creating ZonedDateTimes using a ZoneOffset we 
> don't query the rules at all. This removes the risk of extra allocations and 
> slightly speeds up ZonedDateTime creation for both ZoneOffset (+14%) and 
> ZoneRegion (+5%) even when EA works like it should (the case in the here 
> provided microbenchmark).

test/micro/org/openjdk/bench/java/time/GetYearBench.java line 70:

> 68: private static final long[] INSTANT_MILLIS = createInstants();
> 69: 
> 70: private static final int[] YEARS = new int[INSTANT_MILLIS.length];

Does it make any difference if these aren't constant?

-

PR: https://git.openjdk.java.net/jdk/pull/7957


Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-08 Thread Richard Startin
On Mon, 7 Mar 2022 21:41:05 GMT, Richard Startin  wrote:

>> Ludovic Henry has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add UTF-16 benchmarks
>
> Great to see this taken up. As it’s implemented here, it’s still scalar, but 
> the unroll prevents a strength reduction of the multiplication in the loop 
> from
> 
> result = 31 * result + element;
> 
> to:
> 
> result = (result << 5) - result + element
> 
> which creates a data dependency and slows the loop down.
> 
> This was first reported by Peter Levart here: 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-September/028898.html

> @richardstartin - does that strength reduction actually happen? The bit-shift 
> transformation valid only if the original `result` is known to be 
> non-negative.

Yes.


@State(Scope.Benchmark)
public class StringHashCode {

  @Param({"sdjhfklashdfklashdflkashdflkasdhf", "締国件街徹条覧野武鮮覧横営績難比兵州催色"})
  String string;

  @CompilerControl(CompilerControl.Mode.DONT_INLINE)
  @Benchmark
  public int stringHashCode() {
return new String(string).hashCode();
  }
}



[Hottest Region 
1]..
c2, level 4, StringHashCode::stringHashCode, version 507 (384 bytes) 

0x7f2df0142da4: shl$0x3,%r10
0x7f2df0142da8: movabs $0x8,%r12
0x7f2df0142db2: add%r12,%r10
0x7f2df0142db5: xor%r12,%r12
0x7f2df0142db8: cmp%r10,%rax
0x7f2df0142dbb: jne0x7f2de8696080  ;   
{runtime_call ic_miss_stub}
0x7f2df0142dc1: data16 xchg %ax,%ax
0x7f2df0142dc4: nopl   0x0(%rax,%rax,1)
0x7f2df0142dcc: data16 data16 xchg %ax,%ax
  [Verified Entry Point]
  0.12% 0x7f2df0142dd0: mov%eax,-0x14000(%rsp)
  0.84% 0x7f2df0142dd7: push   %rbp
  0.22% 0x7f2df0142dd8: sub$0x30,%rsp ;*synchronization 
entry
  ; - 
StringHashCode::stringHashCode@-1 (line 14)
0x7f2df0142ddc: mov0xc(%rsi),%r8d ;*getfield string 
{reexecute=0 rethrow=0 return_oop=0}
  ; - 
StringHashCode::stringHashCode@5 (line 14)
  0.73% 0x7f2df0142de0: mov0x10(%r12,%r8,8),%eax  ; implicit 
exception: dispatches to 0x7f2df0142fc4
  0.10% 0x7f2df0142de5: test   %eax,%eax
 ╭  0x7f2df0142de7: je 0x7f2df0142df9  
;*synchronization entry
 │; - 
StringHashCode::stringHashCode@-1 (line 14)
  0.16%  │  0x7f2df0142de9: add$0x30,%rsp
 │  0x7f2df0142ded: pop%rbp
 │  0x7f2df0142dee: mov0x108(%r15),%r10
  0.88%  │  0x7f2df0142df5: test   %eax,(%r10);   {poll_return}
  0.18%  │  0x7f2df0142df8: retq   
 ↘  0x7f2df0142df9: mov0xc(%r12,%r8,8),%ecx  ;*getfield 
value {reexecute=0 rethrow=0 return_oop=0}
  ; - 
java.lang.String::<init>@6 (line 236)
  ; - 
StringHashCode::stringHashCode@8 (line 14)
0x7f2df0142dfe: mov0xc(%r12,%rcx,8),%r10d  
;*arraylength {reexecute=0 rethrow=0 return_oop=0}
  ; - 
java.lang.String::hashCode@13 (line 1503)
  ; - 
StringHashCode::stringHashCode@11 (line 14)
  ; implicit 
exception: dispatches to 0x7f2df0142fd0
  0.83% 0x7f2df0142e03: test   %r10d,%r10d
0x7f2df0142e06: jbe0x7f2df0142f86  ;*ifle 
{reexecute=0 rethrow=0 return_oop=0}
  ; - 
java.lang.String::hashCode@14 (line 1503)
  ; - 
StringHashCode::stringHashCode@11 (line 14)
  0.14% 0x7f2df0142e0c: movsbl 0x14(%r12,%r8,8),%r8d  ;*getfield 
coder {reexecute=0 rethrow=0 return_oop=0}
  ; - 
java.lang.String::<init>@14 (line 237)
  ; - 
StringHashCode::stringHashCode@8 (line 14)
  0.02% 0x7f2df0142e12: test   %r8d,%r8d
0x7f2df0142e15: jne0x7f2df0142fac  ;*ifne 
{reexecute=0 rethrow=0 return_oop=0}
  ; - 
java.lang.String::isLatin1@10 (li

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-07 Thread Richard Startin
On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry  wrote:

>> Despite the hash value being cached for Strings, computing the hash still 
>> represents a significant CPU usage for applications handling lots of text.
>> 
>> Even though it would be generally better to do it through an enhancement to 
>> the autovectorizer, the complexity of doing it by hand is trivial and the 
>> gain is sizable (2x speedup) even without the Vector API. The algorithm has 
>> been proposed by Richard Startin and Paul Sandoz [1].
>> 
>> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`
>> 
>> 
>> Benchmark(size)  Mode  Cnt Score 
>>Error  Units
>> StringHashCode.Algorithm.scalarLatin1 0  avgt   25 2.111 
>> ±  0.210  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25 3.500 
>> ±  0.127  ns/op
>> StringHashCode.Algorithm.scalarLatin110  avgt   25 7.001 
>> ±  0.099  ns/op
>> StringHashCode.Algorithm.scalarLatin1   100  avgt   2561.285 
>> ±  0.444  ns/op
>> StringHashCode.Algorithm.scalarLatin1  1000  avgt   25   628.995 
>> ±  0.846  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25  6307.990 
>> ±  4.071  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   0  avgt   25 2.358 
>> ±  0.092  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25 3.631 
>> ±  0.159  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16  10  avgt   25 7.049 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16 100  avgt   2533.626 
>> ±  1.218  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled161000  avgt   25   317.811 
>> ±  1.225  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25  3212.333 
>> ± 14.621  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled80  avgt   25 2.356 
>> ±  0.097  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25 3.630 
>> ±  0.158  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8   10  avgt   25 8.724 
>> ±  0.065  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8  100  avgt   2532.402 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000  avgt   25   321.949 
>> ±  0.251  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25  3202.083 
>> ±  1.667  ns/op
>> StringHashCode.Algorithm.scalarUTF16  0  avgt   25 2.135 
>> ±  0.191  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25 5.202 
>> ±  0.362  ns/op
>> StringHashCode.Algorithm.scalarUTF16 10  avgt   2511.105 
>> ±  0.112  ns/op
>> StringHashCode.Algorithm.scalarUTF16100  avgt   2575.974 
>> ±  0.702  ns/op
>> StringHashCode.Algorithm.scalarUTF16   1000  avgt   25   716.429 
>> ±  3.290  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25  7095.459 
>> ± 43.847  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled160  avgt   25 2.381 
>> ±  0.038  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25 5.268 
>> ±  0.422  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16   10  avgt   2511.248 
>> ±  0.178  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16  100  avgt   2552.966 
>> ±  0.089  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000  avgt   25   450.912 
>> ±  1.834  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25  4403.988 
>> ±  2.927  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 0  avgt   25 2.401 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25 5.091 
>> ±  0.396  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled810  avgt   2512.801 
>> ±  0.189  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8   100  avgt   2552.068 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8  1000  avgt   25   453.270 
>> ±  0.340  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25  4433.112 
>> ±  2.699  ns/op
>> 
>> 
>> At Datadog, we handle a great amount of text (through logs management for 
>> example), and hashing String represents a large part of our CPU usage. It'

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-14 Thread Richard Startin
On Tue, 14 Sep 2021 10:57:17 GMT, Alan Bateman  wrote:

>>> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I 
>>> believe this work derives from an unsigned radix sort I implemented on 
>>> 10/04/2021 
>>> [richardstartin/radix-sort-benchmark@ab4da23#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226](https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226)
>>>  which has numerous structural similarities to this work:
>>> Moreover, @bourgesl forked my repository on 11/04/2021 and communicated 
>>> with me about doing so. On 25/04/2021 there was a new implementation of 
>>> `DualPivotQuicksort` with a signed radix sort but the same structural 
>>> similarities, and with the same method and variable names in places 
>>> [bourgesl/radix-sort-benchmark@90ff7e4#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609](https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609)
>> 
>> @iaroslavski The attribution is not clear here. Can you provide a summary as 
>> to who is contributing to this patch? I can't tell if all involved have 
>> signed the OCA or not. I'm sure there will be questions about space/time 
>> trade-offs with radix sort but I think it's important to first establish the 
>> origins of this patch first.
>
>> @AlanBateman Vertical pipeline of PR hides comments in the middle and you 
>> have to click on "Show more..." to see all comments. There are no claims 
>> related to the origin of my patch, it doesn't violate any rights.
> 
> There is a comment from richardstartin suggesting that something was derived 
> from code in his repo. Is this a benchmark that is not part of this PR? Only 
> asking because I can't find him on OCA signatories. You can use the Skara 
> /contributor command to list the contributors.

@AlanBateman my claim was that the implementation was derived from my 
implementation, and demonstrated a sequence of name changes after @bourgesl 
forked my repository containing a structurally similar radix sort 
implementation and benchmarks, in order to provide circumstantial evidence for 
my claim. Via email @iaroslavski told me that this was not the case, which I 
decided to accept at face value. So please judge this PR on its merits, and 
disregard the claims made in these comments. I have not signed an OCA but do 
not want to block this PR if the space time tradeoff is deemed acceptable.

-

PR: https://git.openjdk.java.net/jdk/pull/3938


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Sat, 8 May 2021 20:54:48 GMT, iaroslavski 
 wrote:

> Sorting:
> 
> - adopt radix sort for sequential and parallel sorts on int/long/float/double 
> arrays (almost random and length > 6K)
> - fix tryMergeRuns() to better handle case when the last run is a single 
> element
> - minor javadoc and comment changes
> 
> Testing:
> - add new data inputs in tests for sorting
> - add min/max/infinity values to float/double testing
> - add tests for radix sort

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 672:

> 670: count2[(a[i] >>>  8) & 0xFF]--;
> 671: count3[(a[i] >>> 16) & 0xFF]--;
> 672: count4[(a[i] >>> 24) ^ 0x80]--;

It seems that C2 can't eliminate the bounds check here because of the `xor`, 
even though this can't possibly exceed 256. The three masked accesses above are 
all eliminated. Maybe someone could look in to improving that.

-

PR: https://git.openjdk.java.net/jdk/pull/3938


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 10:22:57 GMT, Laurent Bourgès  wrote:

>> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I 
>> believe this work derives from an unsigned radix sort I implemented on 
>> 10/04/2021 
>> https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226
>>  which has numerous structural similarities to this work:
>> * Producing all four histograms in one pass
>> * Skipping passes based on detecting the total in the histogram
>> * Bailing out of the skip detection if a nonzero value not equal to the 
>> total is encountered
>> * Manually unrolling the LSD radix sort loop in order to avoid array copies
>> 
>> My implementation from 10th April is below for reference:
>> 
>>   public static void unrollOnePassHistogramsSkipLevels(int[] data) {
>> int[] histogram1 = new int[257];
>> int[] histogram2 = new int[257];
>> int[] histogram3 = new int[257];
>> int[] histogram4 = new int[257];
>> 
>> for (int value : data) {
>>   ++histogram1[(value & 0xFF) + 1];
>>   ++histogram2[((value >>> 8) & 0xFF) + 1];
>>   ++histogram3[((value >>> 16) & 0xFF) + 1];
>>   ++histogram4[(value >>> 24) + 1];
>> }
>> boolean skipLevel1 = canSkipLevel(histogram1, data.length);
>> boolean skipLevel2 = canSkipLevel(histogram2, data.length);
>> boolean skipLevel3 = canSkipLevel(histogram3, data.length);
>> boolean skipLevel4 = canSkipLevel(histogram4, data.length);
>> 
>> if (skipLevel1 && skipLevel2 && skipLevel3 && skipLevel4) {
>>   return;
>> }
>> int[] copy = new int[data.length];
>> 
>> int[] source = data;
>> int[] dest = copy;
>> 
>> if (!skipLevel1) {
>>   for (int i = 1; i < histogram1.length; ++i) {
>> histogram1[i] += histogram1[i - 1];
>>   }
>>   for (int value : source) {
>> dest[histogram1[value & 0xFF]++] = value;
>>   }
>>   if (!skipLevel2 || !skipLevel3 || !skipLevel4) {
>> int[] tmp = dest;
>> dest = source;
>> source = tmp;
>>   }
>> }
>> 
>> if (!skipLevel2) {
>>   for (int i = 1; i < histogram2.length; ++i) {
>> histogram2[i] += histogram2[i - 1];
>>   }
>>   for (int value : source) {
>> dest[histogram2[(value >>> 8) & 0xFF]++] = value;
>>   }
>>   if (!skipLevel3 || !skipLevel4) {
>> int[] tmp = dest;
>> dest = source;
>> source = tmp;
>>   }
>> }
>> 
>> if (!skipLevel3) {
>>   for (int i = 1; i < histogram3.length; ++i) {
>> histogram3[i] += histogram3[i - 1];
>>   }
>>   for (int value : data) {
>> dest[histogram3[(value >>> 16) & 0xFF]++] = value;
>>   }
>>   if (!skipLevel4) {
>> int[] tmp = dest;
>> dest = source;
>> source = tmp;
>>   }
>> }
>> 
>> if (!skipLevel4) {
>>   for (int i = 1; i < histogram4.length; ++i) {
>> histogram4[i] += histogram4[i - 1];
>>   }
>>   for (int value : source) {
>> dest[histogram4[value >>> 24]++] = value;
>>   }
>> }
>> if (dest != data) {
>>   System.arraycopy(dest, 0, data, 0, data.length);
>> }
>>   }
>> 
>>   private static boolean canSkipLevel(int[] histogram, int dataSize) {
>> for (int count : histogram) {
>>   if (count == dataSize) {
>> return true;
>>   } else if (count > 0) {
>> return false;
>>   }
>> }
>> return true;
>>   }
>> 
>> 
>> Moreover, @bourgesl forked my repository on 11/04/2021 and communicated with 
>> me about doing so. On 25/04/2021 there was a new implementation of 
>> `DualPivotQuicksort` with a signed radix sort but the same structural 
>> similarities, and with the same method and variable names in places 
>> https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609
>> 
>> 
>> // TODO add javadoc
>> private static void radixSort(Sorter sorter, int[] a, int low, int high) 
>> {
>> int[] b;
>> // LBO: prealloc (high - low) +1 element:
>> if (sorter == null || (b = sorter.b) == null || b.length < (high - 
>> low)) {
>> // System.out.println("alloc b: " + (high - low));
>> b = new int[high - low];
>> }
>> 
>> int[] count1, count2, count3, count4;
>> if (sorter != null) {
>> sorter.resetRadixBuffers();
>> count1 = sorter.count1;
>> count2 = sorter.count2;
>> count3 = sorter.count3;
>> count4 = sorter.count4;
>> } else {
>> // System.out.println("alloc radix buffers(4x256)");
>> count1 = new int[256];
>> count2 = new int[256];
>> count3 = new int[256];
>> count4 = new int[256];
>> }
>> 
>> 

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Fri, 14 May 2021 07:14:27 GMT, Laurent Bourgès  wrote:

>> So the issue of not skipping passes was my fault in the translation process, 
>> so not something to worry about, though after [fixing 
>> that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510)
>>  the original implementation still has the edge because of the bounds checks 
>> on the `xor` not getting eliminated.
>> 
>> 
>> Benchmark (bits)  (padding)  
>> (scenario)  (seed)   (size)  Mode  Cnt  ScoreError  Units
>> RadixSortBenchmark.jdk17  7 
>> UNIFORM   0  100  avgt5  10432.408 ± 87.024  us/op
>> RadixSortBenchmark.jdk23  7 
>> UNIFORM   0  100  avgt5   9465.990 ± 40.598  us/op
>> RadixSortBenchmark.jdk30  7 
>> UNIFORM   0  100  avgt5  11189.146 ± 50.972  us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  17  7 
>> UNIFORM   0  100  avgt5   9546.963 ± 41.698  us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  23  7 
>> UNIFORM   0  100  avgt5   9412.114 ± 43.081  us/op
>> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  30  7 
>> UNIFORM   0  100  avgt5  10823.618 ± 64.311  us/op
>
> Great analysis on C2, richard.
> 
> maybe (x ^ 0x80) &0xFF would help C2 to eliminate bound checks...

I don't know Laurent, I find the handling of signed order over-complicated. 
Subtracting `Integer.MIN_VALUE` is really cheap...

-

PR: https://git.openjdk.java.net/jdk/pull/3938


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 14:44:28 GMT, Richard Startin 
 wrote:

>> @iaroslavski I would prefer to discuss this in private than here, but my 
>> argument is that the name `skipByte` came from Laurent's code, and that 
>> Laurent's code was clearly derived from my own within a fork of my 
>> repository. I linked the commits where you changed `skipByte` to `passLevel` 
>> and Laurent changed my name `canSkipLevel` to `skipByte`. 
>> 
>> For me, this raises questions about the independence of your work from 
>> Laurent's, and Laurent's work is clearly derived from my own (and I don't 
>> think anyone is disputing the latter). I would be happy to sort this out in 
>> private.
>
> In private correspondence with Vladimir, it was explained that where 
> Vladimir's code and Laurent's code are identical, including typos 
> ([Vladimir's 
> code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672),
>  [Laurent's 
> code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719))
>  it is because Vladimir sent the code to Laurent, not the other way around, 
> therefore Vladimir's code does not derive from Laurent's, and it does not 
> derive from mine. I can only trust that this is the case, so please disregard 
> my claim that this is derivative work when reviewing this PR.

For what it's worth, I benchmarked this implementation radix sort ([adapted 
here to fit in to my 
harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782))
 against a [signed 
variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478)
 of what I have claimed this work was derived from and the proposed 
implementation does not perform favourably on uniformly random data:



Benchmark (bits)  (padding)  (scenario) 
 (seed)   (size)  Mode  Cnt  Score Error  Units
RadixSortBenchmark.jdk17  7 UNIFORM 
  0  100  avgt5  11301.950 ± 113.691  us/op
RadixSortBenchmark.jdk23  7 UNIFORM 
  0  100  avgt5  11792.351 ±  60.757  us/op
RadixSortBenchmark.jdk30  7 UNIFORM 
  0  100  avgt5  11184.616 ±  67.094  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned  17  7 UNIFORM 
  0  100  avgt5   9564.626 ±  69.497  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned  23  7 UNIFORM 
  0  100  avgt5   9432.085 ±  58.983  us/op
RadixSortBenchmark.unrollOnePassSkipLevelsSigned  30  7 UNIFORM 
  0  100  avgt5  10772.975 ±  51.848  us/op



I believe the root cause is a defect in the mechanism employed to skip passes 
as can be seen by the increased number of instructions and cycles here. In the 
proposed implementation, instructions is roughly constant as a function of 
bits. In the case where all passes must be performed (bits = 30), IPC is 
superior in `unrollOnePassHistogramsSkipLevelsSigned`.


Benchmark   (bits)  
(padding)  (scenario)  (seed)   (size)  Mode  Cnt Score Error  
Units
RadixSortBenchmark.jdk:cycles   17  
7 UNIFORM   0  100  avgt   34976971.877 
#/op
RadixSortBenchmark.jdk:instructions 17  
7 UNIFORM   0  100  avgt   70121142.003 
#/op
RadixSortBenchmark.jdk:cycles   23  
7 UNIFORM   0  100  avgt   32369970.385 
#/op
RadixSortBenchmark.jdk:instructions 23  
7 UNIFORM   0  100  avgt   70201664.963 
#/op
RadixSortBenchmark.jdk:cycles   30  
7 UNIFORM   0  100  avgt   30789736.602 
#/op
RadixSortBenchmark.jdk:instructions 30  
7 UNIFORM   0  100  avgt   70180942.122 
#/op
RadixSortBenchmark.jdk:IPC  30  
7 UNIFORM   0  100  avgt  2.279
insns/clk
RadixSortBenc

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 20:23:16 GMT, Richard Startin 
 wrote:

>> In private correspondence with Vladimir, it was explained that where 
>> Vladimir's code and Laurent's code are identical, including typos 
>> ([Vladimir's 
>> code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672),
>>  [Laurent's 
>> code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719))
>>  it is because Vladimir sent the code to Laurent, not the other way around, 
>> therefore Vladimir's code does not derive from Laurent's, and it does not 
>> derive from mine. I can only trust that this is the case, so please 
>> disregard my claim that this is derivative work when reviewing this PR.
>
> For what it's worth, I benchmarked this implementation radix sort ([adapted 
> here to fit in to my 
> harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782))
>  against a [signed 
> variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478)
>  of what I have claimed this work was derived from and the proposed 
> implementation does not perform favourably on uniformly random data:
> 
> 
> 
> Benchmark (bits)  (padding)  
> (scenario)  (seed)   (size)  Mode  Cnt  Score Error  Units
> RadixSortBenchmark.jdk17  7 
> UNIFORM   0  100  avgt5  11301.950 ± 113.691  us/op
> RadixSortBenchmark.jdk23  7 
> UNIFORM   0  100  avgt5  11792.351 ±  60.757  us/op
> RadixSortBenchmark.jdk30  7 
> UNIFORM   0  100  avgt5  11184.616 ±  67.094  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  17  7 
> UNIFORM   0  100  avgt5   9564.626 ±  69.497  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  23  7 
> UNIFORM   0  100  avgt5   9432.085 ±  58.983  us/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned  30  7 
> UNIFORM   0  100  avgt5  10772.975 ±  51.848  us/op
> 
> 
> 
> I believe the root cause is a defect in the mechanism employed to skip passes 
> as can be seen by the increased number of instructions and cycles here. In 
> the proposed implementation, instructions is roughly constant as a function 
> of bits. In the case where all passes must be performed (bits = 30), IPC is 
> superior in `unrollOnePassHistogramsSkipLevelsSigned`.
> 
> 
> Benchmark   
> (bits)  (padding)  (scenario)  (seed)   (size)  Mode  Cnt Score 
> Error  Units
> RadixSortBenchmark.jdk:cycles   
> 17  7 UNIFORM   0  100  avgt   34976971.877   
>   #/op
> RadixSortBenchmark.jdk:instructions 
> 17  7 UNIFORM   0  100  avgt   70121142.003   
>   #/op
> RadixSortBenchmark.jdk:cycles   
> 23  7 UNIFORM   0  100  avgt   32369970.385   
>   #/op
> RadixSortBenchmark.jdk:instructions 
> 23  7 UNIFORM   0  100  avgt   70201664.963   
>   #/op
> RadixSortBenchmark.jdk:cycles   
> 30  7 UNIFORM   0  100  avgt   30789736.602   
>   #/op
> RadixSortBenchmark.jdk:instructions 
> 30  7 UNIFORM   0  100  avgt   70180942.122   
>   #/op
> RadixSortBenchmark.jdk:IPC  
> 30  7 UNIFORM   0  100  avgt  2.279   
>  insns/clk
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles 
> 17  7 UNIFORM   0  100  avgt   26983994.479   
>   #/op
> RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions   
> 17  7 UNIFORM   0  100  avgt   62065304.827   
>   #/op
> RadixSortBenchmark.unrollO

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 11:31:49 GMT, iaroslavski 
 wrote:

>> Perhaps we can resolve this issue in private - my email address is on my 
>> profile (or in the commits in `radix-sort-benchmark`)?
>
> @richardstartin And one more addon: my first version of Radix sort, see my 
> github https://github.com/iaroslavski/sorting/tree/master/radixsort uses 
> another name, like skipBytes, then renamed to passLevel.
> So, the common part is "skip". And this method has different number of 
> parameters. I don't see any collision with your code.

@iaroslavski I would prefer to discuss this in private than here, but my 
argument is that the name `skipByte` came from Laurent's code, and that 
Laurent's code was clearly derived from my own within a fork of my repository. 
I linked the commits where you changed `skipByte` to `passLevel` and Laurent 
changed my name `canSkipLevel` to `skipByte`. 

For me, this raises questions about the independence of your work from 
Laurent's, and Laurent's work is clearly derived from my own (and I don't think 
anyone is disputing the latter). I would be happy to sort this out in private.

-

PR: https://git.openjdk.java.net/jdk/pull/3938


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 11:47:58 GMT, Richard Startin 
 wrote:

>> @richardstartin And one more addon: my first version of Radix sort, see my 
>> github https://github.com/iaroslavski/sorting/tree/master/radixsort uses 
>> another name, like skipBytes, then renamed to passLevel.
>> So, the common part is "skip". And this method has different number of 
>> parameters. I don't see any collision with your code.
>
> @iaroslavski I would prefer to discuss this in private than here, but my 
> argument is that the name `skipByte` came from Laurent's code, and that 
> Laurent's code was clearly derived from my own within a fork of my 
> repository. I linked the commits where you changed `skipByte` to `passLevel` 
> and Laurent changed my name `canSkipLevel` to `skipByte`. 
> 
> For me, this raises questions about the independence of your work from 
> Laurent's, and Laurent's work is clearly derived from my own (and I don't 
> think anyone is disputing the latter). I would be happy to sort this out in 
> private.

In private correspondence with Vladimir, it was explained that where Vladimir's 
code and Laurent's code are identical, including typos ([Vladimir's 
code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672),
 [Laurent's 
code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719))
 it is because Vladimir sent the code to Laurent, not the other way around, 
therefore Vladimir's code does not derive from Laurent's, and it does not 
derive from mine. I can only trust that this is the case, so please disregard 
my claim that this is derivative work when reviewing this PR.

-

PR: https://git.openjdk.java.net/jdk/pull/3938


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Wed, 12 May 2021 12:20:09 GMT, iaroslavski 
 wrote:

>> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 47:
>> 
>>> 45:  * @author Doug Lea
>>> 46:  *
>>> 47:  * @version 2020.06.14
>> 
>> Vladimir, I would update to 2021.05.06 (+your hash)
>
> Laurent, the date in this class is not the date of our last commit,
> this date is the date when I have final idea regarding to Radix sort,
> therefore, I prefer to keep 2020.06.14

Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I believe 
this work derives from an unsigned radix sort I implemented on 10/04/2021 
https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226
 which has numerous structural similarities to this work:
* Producing all four histograms in one pass
* Skipping passes based on detecting the total in the histogram
* Bailing out of the skip detection if a nonzero value not equal to the total 
is encountered
* Manually unrolling the LSD radix sort loop in order to avoid array copies

My implementation from 10th April is below for reference:

  public static void unrollOnePassHistogramsSkipLevels(int[] data) {
int[] histogram1 = new int[257];
int[] histogram2 = new int[257];
int[] histogram3 = new int[257];
int[] histogram4 = new int[257];

for (int value : data) {
  ++histogram1[(value & 0xFF) + 1];
  ++histogram2[((value >>> 8) & 0xFF) + 1];
  ++histogram3[((value >>> 16) & 0xFF) + 1];
  ++histogram4[(value >>> 24) + 1];
}
boolean skipLevel1 = canSkipLevel(histogram1, data.length);
boolean skipLevel2 = canSkipLevel(histogram2, data.length);
boolean skipLevel3 = canSkipLevel(histogram3, data.length);
boolean skipLevel4 = canSkipLevel(histogram4, data.length);

if (skipLevel1 && skipLevel2 && skipLevel3 && skipLevel4) {
  return;
}
int[] copy = new int[data.length];

int[] source = data;
int[] dest = copy;

if (!skipLevel1) {
  for (int i = 1; i < histogram1.length; ++i) {
histogram1[i] += histogram1[i - 1];
  }
  for (int value : source) {
dest[histogram1[value & 0xFF]++] = value;
  }
  if (!skipLevel2 || !skipLevel3 || !skipLevel4) {
int[] tmp = dest;
dest = source;
source = tmp;
  }
}

if (!skipLevel2) {
  for (int i = 1; i < histogram2.length; ++i) {
histogram2[i] += histogram2[i - 1];
  }
  for (int value : source) {
dest[histogram2[(value >>> 8) & 0xFF]++] = value;
  }
  if (!skipLevel3 || !skipLevel4) {
int[] tmp = dest;
dest = source;
source = tmp;
  }
}

if (!skipLevel3) {
  for (int i = 1; i < histogram3.length; ++i) {
histogram3[i] += histogram3[i - 1];
  }
  for (int value : data) {
dest[histogram3[(value >>> 16) & 0xFF]++] = value;
  }
  if (!skipLevel4) {
int[] tmp = dest;
dest = source;
source = tmp;
  }
}

if (!skipLevel4) {
  for (int i = 1; i < histogram4.length; ++i) {
histogram4[i] += histogram4[i - 1];
  }
  for (int value : source) {
dest[histogram4[value >>> 24]++] = value;
  }
}
if (dest != data) {
  System.arraycopy(dest, 0, data, 0, data.length);
}
  }

  private static boolean canSkipLevel(int[] histogram, int dataSize) {
for (int count : histogram) {
  if (count == dataSize) {
return true;
  } else if (count > 0) {
return false;
  }
}
return true;
  }


Moreover, @bourgesl forked my repository on 11/04/2021 and communicated with me 
about doing so. On 25/04/2021 there was a new implementation of 
`DualPivotQuicksort` with a signed radix sort but the same structural 
similarities, and with the same method and variable names in places 
https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609


// TODO add javadoc
private static void radixSort(Sorter sorter, int[] a, int low, int high) {
int[] b;
// LBO: prealloc (high - low) +1 element:
if (sorter == null || (b = sorter.b) == null || b.length < (high - 
low)) {
// System.out.println("alloc b: " + (high - low));
b = new int[high - low];
}

int[] count1, count2, count3, count4;
if (sorter != null) {
sorter.resetRadixBuffers();
count1 = sorter.count1;
count2 = sorter.count2;
count3 = sorter.count3;
count4 = sorter.count4;
} else {
// System.out.println("alloc radix buffers(4x256)");
count1 = new int[256];
count2 = new int[256];
count3 = new int[256];
count4 = new int[256];
}

for (int i = low; i < hig