Re: RFR: 8283681: Improve ZonedDateTime offset handling
On Fri, 25 Mar 2022 12:28:58 GMT, Claes Redestad wrote: > Richard Startin prompted me to have a look at a case where java.time > underperforms relative to joda time > (https://twitter.com/richardstartin/status/1506975932271190017). > > It seems the java.time test of his suffer from heavy allocations due > ZoneOffset::getRules allocating a new ZoneRules object every time and escape > analysis failing to do the thing in his test. The patch here adds a simple > specialization so that when creating ZonedDateTimes using a ZoneOffset we > don't query the rules at all. This removes the risk of extra allocations and > slightly speeds up ZonedDateTime creation for both ZoneOffset (+14%) and > ZoneRegion (+5%) even when EA works like it should (the case in the here > provided microbenchmark). test/micro/org/openjdk/bench/java/time/GetYearBench.java line 70: > 68: private static final long[] INSTANT_MILLIS = createInstants(); > 69: > 70: private static final int[] YEARS = new int[INSTANT_MILLIS.length]; Does it make any difference if these aren't constant? - PR: https://git.openjdk.java.net/jdk/pull/7957
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
On Mon, 7 Mar 2022 21:41:05 GMT, Richard Startin wrote: >> Ludovic Henry has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add UTF-16 benchmarks > > Great to see this taken up. As it’s implemented here, it’s still scalar, but > the unroll prevents a strength reduction of the multiplication in the loop > from > > result = 31 * result + element; > > to: > > result = (result << 5) - result + element > > which creates a data dependency and slows the loop down. > > This was first reported by Peter Levart here: > http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-September/028898.html > @richardstartin - does that strength reduction actually happen? The bit-shift > transformation valid only if the original `result` is known to be > non-negative. Yes. @State(Scope.Benchmark) public class StringHashCode { @Param({"sdjhfklashdfklashdflkashdflkasdhf", "締国件街徹条覧野武鮮覧横営績難比兵州催色"}) String string; @CompilerControl(CompilerControl.Mode.DONT_INLINE) @Benchmark public int stringHashCode() { return new String(string).hashCode(); } } [Hottest Region 1].. c2, level 4, StringHashCode::stringHashCode, version 507 (384 bytes) 0x7f2df0142da4: shl$0x3,%r10 0x7f2df0142da8: movabs $0x8,%r12 0x7f2df0142db2: add%r12,%r10 0x7f2df0142db5: xor%r12,%r12 0x7f2df0142db8: cmp%r10,%rax 0x7f2df0142dbb: jne0x7f2de8696080 ; {runtime_call ic_miss_stub} 0x7f2df0142dc1: data16 xchg %ax,%ax 0x7f2df0142dc4: nopl 0x0(%rax,%rax,1) 0x7f2df0142dcc: data16 data16 xchg %ax,%ax [Verified Entry Point] 0.12% 0x7f2df0142dd0: mov%eax,-0x14000(%rsp) 0.84% 0x7f2df0142dd7: push %rbp 0.22% 0x7f2df0142dd8: sub$0x30,%rsp ;*synchronization entry ; - StringHashCode::stringHashCode@-1 (line 14) 0x7f2df0142ddc: mov0xc(%rsi),%r8d ;*getfield string {reexecute=0 rethrow=0 return_oop=0} ; - StringHashCode::stringHashCode@5 (line 14) 0.73% 0x7f2df0142de0: mov0x10(%r12,%r8,8),%eax ; implicit exception: dispatches to 0x7f2df0142fc4 0.10% 0x7f2df0142de5: test %eax,%eax ╭ 0x7f2df0142de7: je 0x7f2df0142df9 ;*synchronization entry │; - StringHashCode::stringHashCode@-1 (line 14) 0.16% │ 0x7f2df0142de9: add$0x30,%rsp │ 0x7f2df0142ded: pop%rbp │ 0x7f2df0142dee: mov0x108(%r15),%r10 0.88% │ 0x7f2df0142df5: test %eax,(%r10); {poll_return} 0.18% │ 0x7f2df0142df8: retq ↘ 0x7f2df0142df9: mov0xc(%r12,%r8,8),%ecx ;*getfield value {reexecute=0 rethrow=0 return_oop=0} ; - java.lang.String::<init>@6 (line 236) ; - StringHashCode::stringHashCode@8 (line 14) 0x7f2df0142dfe: mov0xc(%r12,%rcx,8),%r10d ;*arraylength {reexecute=0 rethrow=0 return_oop=0} ; - java.lang.String::hashCode@13 (line 1503) ; - StringHashCode::stringHashCode@11 (line 14) ; implicit exception: dispatches to 0x7f2df0142fd0 0.83% 0x7f2df0142e03: test %r10d,%r10d 0x7f2df0142e06: jbe0x7f2df0142f86 ;*ifle {reexecute=0 rethrow=0 return_oop=0} ; - java.lang.String::hashCode@14 (line 1503) ; - StringHashCode::stringHashCode@11 (line 14) 0.14% 0x7f2df0142e0c: movsbl 0x14(%r12,%r8,8),%r8d ;*getfield coder {reexecute=0 rethrow=0 return_oop=0} ; - java.lang.String::<init>@14 (line 237) ; - StringHashCode::stringHashCode@8 (line 14) 0.02% 0x7f2df0142e12: test %r8d,%r8d 0x7f2df0142e15: jne0x7f2df0142fac ;*ifne {reexecute=0 rethrow=0 return_oop=0} ; - java.lang.String::isLatin1@10 (li
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry wrote: >> Despite the hash value being cached for Strings, computing the hash still >> represents a significant CPU usage for applications handling lots of text. >> >> Even though it would be generally better to do it through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` >> >> >> Benchmark(size) Mode Cnt Score >>Error Units >> StringHashCode.Algorithm.scalarLatin1 0 avgt 25 2.111 >> ± 0.210 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 3.500 >> ± 0.127 ns/op >> StringHashCode.Algorithm.scalarLatin110 avgt 25 7.001 >> ± 0.099 ns/op >> StringHashCode.Algorithm.scalarLatin1 100 avgt 2561.285 >> ± 0.444 ns/op >> StringHashCode.Algorithm.scalarLatin1 1000 avgt 25 628.995 >> ± 0.846 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 6307.990 >> ± 4.071 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 0 avgt 25 2.358 >> ± 0.092 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3.631 >> ± 0.159 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 10 avgt 25 7.049 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 2533.626 >> ± 1.218 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled161000 avgt 25 317.811 >> ± 1.225 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3212.333 >> ± 14.621 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled80 avgt 25 2.356 >> ± 0.097 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3.630 >> ± 0.158 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 10 avgt 25 8.724 >> ± 0.065 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 2532.402 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000 avgt 25 321.949 >> ± 0.251 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3202.083 >> ± 1.667 ns/op >> StringHashCode.Algorithm.scalarUTF16 0 avgt 25 2.135 >> ± 0.191 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 5.202 >> ± 0.362 ns/op >> StringHashCode.Algorithm.scalarUTF16 10 avgt 2511.105 >> ± 0.112 ns/op >> StringHashCode.Algorithm.scalarUTF16100 avgt 2575.974 >> ± 0.702 ns/op >> StringHashCode.Algorithm.scalarUTF16 1000 avgt 25 716.429 >> ± 3.290 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 7095.459 >> ± 43.847 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled160 avgt 25 2.381 >> ± 0.038 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 5.268 >> ± 0.422 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 10 avgt 2511.248 >> ± 0.178 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 100 avgt 2552.966 >> ± 0.089 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000 avgt 25 450.912 >> ± 1.834 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 4403.988 >> ± 2.927 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 0 avgt 25 2.401 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 5.091 >> ± 0.396 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled810 avgt 2512.801 >> ± 0.189 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 100 avgt 2552.068 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1000 avgt 25 453.270 >> ± 0.340 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 4433.112 >> ± 2.699 ns/op >> >> >> At Datadog, we handle a great amount of text (through logs management for >> example), and hashing String represents a large part of our CPU usage. It'
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Tue, 14 Sep 2021 10:57:17 GMT, Alan Bateman wrote: >>> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I >>> believe this work derives from an unsigned radix sort I implemented on >>> 10/04/2021 >>> [richardstartin/radix-sort-benchmark@ab4da23#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226](https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226) >>> which has numerous structural similarities to this work: >>> Moreover, @bourgesl forked my repository on 11/04/2021 and communicated >>> with me about doing so. On 25/04/2021 there was a new implementation of >>> `DualPivotQuicksort` with a signed radix sort but the same structural >>> similarities, and with the same method and variable names in places >>> [bourgesl/radix-sort-benchmark@90ff7e4#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609](https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609) >> >> @iaroslavski The attribution is not clear here. Can you provide a summary as >> to who is contributing to this patch? I can't tell if all involved have >> signed the OCA or not. I'm sure there will be questions about space/time >> trade-offs with radix sort but I think it's important to first establish the >> origins of this patch first. > >> @AlanBateman Vertical pipeline of PR hides comments in the middle and you >> have to click on "Show more..." to see all comments. There are no claims >> related to the origin of my patch, it doesn't violate any rights. > > There is a comment from richardstartin suggesting that something was derived > from code in his repo. Is this a benchmark that is not part of this PR? Only > asking because I can't find him on OCA signatories. You can use the Skara > /contributor command to list the contributors. @AlanBateman my claim was that the implementation was derived from my implementation, and demonstrated a sequence of name changes after @bourgesl forked my repository containing a structurally similar radix sort implementation and benchmarks, in order to provide circumstantial evidence for my claim. Via email @iaroslavski told me that this was not the case, which I decided to accept at face value. So please judge this PR on its merits, and disregard the claims made in these comments. I have not signed an OCA but do not want to block this PR if the space time tradeoff is deemed acceptable. - PR: https://git.openjdk.java.net/jdk/pull/3938
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Sat, 8 May 2021 20:54:48 GMT, iaroslavski wrote: > Sorting: > > - adopt radix sort for sequential and parallel sorts on int/long/float/double > arrays (almost random and length > 6K) > - fix tryMergeRuns() to better handle case when the last run is a single > element > - minor javadoc and comment changes > > Testing: > - add new data inputs in tests for sorting > - add min/max/infinity values to float/double testing > - add tests for radix sort src/java.base/share/classes/java/util/DualPivotQuicksort.java line 672: > 670: count2[(a[i] >>> 8) & 0xFF]--; > 671: count3[(a[i] >>> 16) & 0xFF]--; > 672: count4[(a[i] >>> 24) ^ 0x80]--; It seems that C2 can't eliminate the bounds check here because of the `xor`, even though this can't possibly exceed 256. The three masked accesses above are all eliminated. Maybe someone could look in to improving that. - PR: https://git.openjdk.java.net/jdk/pull/3938
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Thu, 13 May 2021 10:22:57 GMT, Laurent Bourgès wrote: >> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I >> believe this work derives from an unsigned radix sort I implemented on >> 10/04/2021 >> https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226 >> which has numerous structural similarities to this work: >> * Producing all four histograms in one pass >> * Skipping passes based on detecting the total in the histogram >> * Bailing out of the skip detection if a nonzero value not equal to the >> total is encountered >> * Manually unrolling the LSD radix sort loop in order to avoid array copies >> >> My implementation from 10th April is below for reference: >> >> public static void unrollOnePassHistogramsSkipLevels(int[] data) { >> int[] histogram1 = new int[257]; >> int[] histogram2 = new int[257]; >> int[] histogram3 = new int[257]; >> int[] histogram4 = new int[257]; >> >> for (int value : data) { >> ++histogram1[(value & 0xFF) + 1]; >> ++histogram2[((value >>> 8) & 0xFF) + 1]; >> ++histogram3[((value >>> 16) & 0xFF) + 1]; >> ++histogram4[(value >>> 24) + 1]; >> } >> boolean skipLevel1 = canSkipLevel(histogram1, data.length); >> boolean skipLevel2 = canSkipLevel(histogram2, data.length); >> boolean skipLevel3 = canSkipLevel(histogram3, data.length); >> boolean skipLevel4 = canSkipLevel(histogram4, data.length); >> >> if (skipLevel1 && skipLevel2 && skipLevel3 && skipLevel4) { >> return; >> } >> int[] copy = new int[data.length]; >> >> int[] source = data; >> int[] dest = copy; >> >> if (!skipLevel1) { >> for (int i = 1; i < histogram1.length; ++i) { >> histogram1[i] += histogram1[i - 1]; >> } >> for (int value : source) { >> dest[histogram1[value & 0xFF]++] = value; >> } >> if (!skipLevel2 || !skipLevel3 || !skipLevel4) { >> int[] tmp = dest; >> dest = source; >> source = tmp; >> } >> } >> >> if (!skipLevel2) { >> for (int i = 1; i < histogram2.length; ++i) { >> histogram2[i] += histogram2[i - 1]; >> } >> for (int value : source) { >> dest[histogram2[(value >>> 8) & 0xFF]++] = value; >> } >> if (!skipLevel3 || !skipLevel4) { >> int[] tmp = dest; >> dest = source; >> source = tmp; >> } >> } >> >> if (!skipLevel3) { >> for (int i = 1; i < histogram3.length; ++i) { >> histogram3[i] += histogram3[i - 1]; >> } >> for (int value : data) { >> dest[histogram3[(value >>> 16) & 0xFF]++] = value; >> } >> if (!skipLevel4) { >> int[] tmp = dest; >> dest = source; >> source = tmp; >> } >> } >> >> if (!skipLevel4) { >> for (int i = 1; i < histogram4.length; ++i) { >> histogram4[i] += histogram4[i - 1]; >> } >> for (int value : source) { >> dest[histogram4[value >>> 24]++] = value; >> } >> } >> if (dest != data) { >> System.arraycopy(dest, 0, data, 0, data.length); >> } >> } >> >> private static boolean canSkipLevel(int[] histogram, int dataSize) { >> for (int count : histogram) { >> if (count == dataSize) { >> return true; >> } else if (count > 0) { >> return false; >> } >> } >> return true; >> } >> >> >> Moreover, @bourgesl forked my repository on 11/04/2021 and communicated with >> me about doing so. On 25/04/2021 there was a new implementation of >> `DualPivotQuicksort` with a signed radix sort but the same structural >> similarities, and with the same method and variable names in places >> https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609 >> >> >> // TODO add javadoc >> private static void radixSort(Sorter sorter, int[] a, int low, int high) >> { >> int[] b; >> // LBO: prealloc (high - low) +1 element: >> if (sorter == null || (b = sorter.b) == null || b.length < (high - >> low)) { >> // System.out.println("alloc b: " + (high - low)); >> b = new int[high - low]; >> } >> >> int[] count1, count2, count3, count4; >> if (sorter != null) { >> sorter.resetRadixBuffers(); >> count1 = sorter.count1; >> count2 = sorter.count2; >> count3 = sorter.count3; >> count4 = sorter.count4; >> } else { >> // System.out.println("alloc radix buffers(4x256)"); >> count1 = new int[256]; >> count2 = new int[256]; >> count3 = new int[256]; >> count4 = new int[256]; >> } >> >>
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Fri, 14 May 2021 07:14:27 GMT, Laurent Bourgès wrote: >> So the issue of not skipping passes was my fault in the translation process, >> so not something to worry about, though after [fixing >> that](https://github.com/richardstartin/radix-sort-benchmark/commit/ccbee984c6a0e0f50c30de59e1a5e9fbcad89510) >> the original implementation still has the edge because of the bounds checks >> on the `xor` not getting eliminated. >> >> >> Benchmark (bits) (padding) >> (scenario) (seed) (size) Mode Cnt ScoreError Units >> RadixSortBenchmark.jdk17 7 >> UNIFORM 0 100 avgt5 10432.408 ± 87.024 us/op >> RadixSortBenchmark.jdk23 7 >> UNIFORM 0 100 avgt5 9465.990 ± 40.598 us/op >> RadixSortBenchmark.jdk30 7 >> UNIFORM 0 100 avgt5 11189.146 ± 50.972 us/op >> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 >> UNIFORM 0 100 avgt5 9546.963 ± 41.698 us/op >> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 >> UNIFORM 0 100 avgt5 9412.114 ± 43.081 us/op >> RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 >> UNIFORM 0 100 avgt5 10823.618 ± 64.311 us/op > > Great analysis on C2, richard. > > maybe (x ^ 0x80) &0xFF would help C2 to eliminate bound checks... I don't know Laurent, I find the handling of signed order over-complicated. Subtracting `Integer.MIN_VALUE` is really cheap... - PR: https://git.openjdk.java.net/jdk/pull/3938
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Thu, 13 May 2021 14:44:28 GMT, Richard Startin wrote: >> @iaroslavski I would prefer to discuss this in private than here, but my >> argument is that the name `skipByte` came from Laurent's code, and that >> Laurent's code was clearly derived from my own within a fork of my >> repository. I linked the commits where you changed `skipByte` to `passLevel` >> and Laurent changed my name `canSkipLevel` to `skipByte`. >> >> For me, this raises questions about the independence of your work from >> Laurent's, and Laurent's work is clearly derived from my own (and I don't >> think anyone is disputing the latter). I would be happy to sort this out in >> private. > > In private correspondence with Vladimir, it was explained that where > Vladimir's code and Laurent's code are identical, including typos > ([Vladimir's > code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672), > [Laurent's > code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719)) > it is because Vladimir sent the code to Laurent, not the other way around, > therefore Vladimir's code does not derive from Laurent's, and it does not > derive from mine. I can only trust that this is the case, so please disregard > my claim that this is derivative work when reviewing this PR. For what it's worth, I benchmarked this implementation radix sort ([adapted here to fit in to my harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782)) against a [signed variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478) of what I have claimed this work was derived from and the proposed implementation does not perform favourably on uniformly random data: Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units RadixSortBenchmark.jdk17 7 UNIFORM 0 100 avgt5 11301.950 ± 113.691 us/op RadixSortBenchmark.jdk23 7 UNIFORM 0 100 avgt5 11792.351 ± 60.757 us/op RadixSortBenchmark.jdk30 7 UNIFORM 0 100 avgt5 11184.616 ± 67.094 us/op RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 UNIFORM 0 100 avgt5 9564.626 ± 69.497 us/op RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 UNIFORM 0 100 avgt5 9432.085 ± 58.983 us/op RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 UNIFORM 0 100 avgt5 10772.975 ± 51.848 us/op I believe the root cause is a defect in the mechanism employed to skip passes as can be seen by the increased number of instructions and cycles here. In the proposed implementation, instructions is roughly constant as a function of bits. In the case where all passes must be performed (bits = 30), IPC is superior in `unrollOnePassHistogramsSkipLevelsSigned`. Benchmark (bits) (padding) (scenario) (seed) (size) Mode Cnt Score Error Units RadixSortBenchmark.jdk:cycles 17 7 UNIFORM 0 100 avgt 34976971.877 #/op RadixSortBenchmark.jdk:instructions 17 7 UNIFORM 0 100 avgt 70121142.003 #/op RadixSortBenchmark.jdk:cycles 23 7 UNIFORM 0 100 avgt 32369970.385 #/op RadixSortBenchmark.jdk:instructions 23 7 UNIFORM 0 100 avgt 70201664.963 #/op RadixSortBenchmark.jdk:cycles 30 7 UNIFORM 0 100 avgt 30789736.602 #/op RadixSortBenchmark.jdk:instructions 30 7 UNIFORM 0 100 avgt 70180942.122 #/op RadixSortBenchmark.jdk:IPC 30 7 UNIFORM 0 100 avgt 2.279 insns/clk RadixSortBenc
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Thu, 13 May 2021 20:23:16 GMT, Richard Startin wrote: >> In private correspondence with Vladimir, it was explained that where >> Vladimir's code and Laurent's code are identical, including typos >> ([Vladimir's >> code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672), >> [Laurent's >> code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719)) >> it is because Vladimir sent the code to Laurent, not the other way around, >> therefore Vladimir's code does not derive from Laurent's, and it does not >> derive from mine. I can only trust that this is the case, so please >> disregard my claim that this is derivative work when reviewing this PR. > > For what it's worth, I benchmarked this implementation radix sort ([adapted > here to fit in to my > harness](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR681-R710782)) > against a [signed > variant](https://github.com/richardstartin/radix-sort-benchmark/commit/07169e8e8602152cfda859baa159db165bf5fcab#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR396-R478) > of what I have claimed this work was derived from and the proposed > implementation does not perform favourably on uniformly random data: > > > > Benchmark (bits) (padding) > (scenario) (seed) (size) Mode Cnt Score Error Units > RadixSortBenchmark.jdk17 7 > UNIFORM 0 100 avgt5 11301.950 ± 113.691 us/op > RadixSortBenchmark.jdk23 7 > UNIFORM 0 100 avgt5 11792.351 ± 60.757 us/op > RadixSortBenchmark.jdk30 7 > UNIFORM 0 100 avgt5 11184.616 ± 67.094 us/op > RadixSortBenchmark.unrollOnePassSkipLevelsSigned 17 7 > UNIFORM 0 100 avgt5 9564.626 ± 69.497 us/op > RadixSortBenchmark.unrollOnePassSkipLevelsSigned 23 7 > UNIFORM 0 100 avgt5 9432.085 ± 58.983 us/op > RadixSortBenchmark.unrollOnePassSkipLevelsSigned 30 7 > UNIFORM 0 100 avgt5 10772.975 ± 51.848 us/op > > > > I believe the root cause is a defect in the mechanism employed to skip passes > as can be seen by the increased number of instructions and cycles here. In > the proposed implementation, instructions is roughly constant as a function > of bits. In the case where all passes must be performed (bits = 30), IPC is > superior in `unrollOnePassHistogramsSkipLevelsSigned`. > > > Benchmark > (bits) (padding) (scenario) (seed) (size) Mode Cnt Score > Error Units > RadixSortBenchmark.jdk:cycles > 17 7 UNIFORM 0 100 avgt 34976971.877 > #/op > RadixSortBenchmark.jdk:instructions > 17 7 UNIFORM 0 100 avgt 70121142.003 > #/op > RadixSortBenchmark.jdk:cycles > 23 7 UNIFORM 0 100 avgt 32369970.385 > #/op > RadixSortBenchmark.jdk:instructions > 23 7 UNIFORM 0 100 avgt 70201664.963 > #/op > RadixSortBenchmark.jdk:cycles > 30 7 UNIFORM 0 100 avgt 30789736.602 > #/op > RadixSortBenchmark.jdk:instructions > 30 7 UNIFORM 0 100 avgt 70180942.122 > #/op > RadixSortBenchmark.jdk:IPC > 30 7 UNIFORM 0 100 avgt 2.279 > insns/clk > RadixSortBenchmark.unrollOnePassSkipLevelsSigned:cycles > 17 7 UNIFORM 0 100 avgt 26983994.479 > #/op > RadixSortBenchmark.unrollOnePassSkipLevelsSigned:instructions > 17 7 UNIFORM 0 100 avgt 62065304.827 > #/op > RadixSortBenchmark.unrollO
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Thu, 13 May 2021 11:31:49 GMT, iaroslavski wrote: >> Perhaps we can resolve this issue in private - my email address is on my >> profile (or in the commits in `radix-sort-benchmark`)? > > @richardstartin And one more addon: my first version of Radix sort, see my > github https://github.com/iaroslavski/sorting/tree/master/radixsort uses > another name, like skipBytes, then renamed to passLevel. > So, the common part is "skip". And this method has different number of > parameters. I don't see any collision with your code. @iaroslavski I would prefer to discuss this in private than here, but my argument is that the name `skipByte` came from Laurent's code, and that Laurent's code was clearly derived from my own within a fork of my repository. I linked the commits where you changed `skipByte` to `passLevel` and Laurent changed my name `canSkipLevel` to `skipByte`. For me, this raises questions about the independence of your work from Laurent's, and Laurent's work is clearly derived from my own (and I don't think anyone is disputing the latter). I would be happy to sort this out in private. - PR: https://git.openjdk.java.net/jdk/pull/3938
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Thu, 13 May 2021 11:47:58 GMT, Richard Startin wrote: >> @richardstartin And one more addon: my first version of Radix sort, see my >> github https://github.com/iaroslavski/sorting/tree/master/radixsort uses >> another name, like skipBytes, then renamed to passLevel. >> So, the common part is "skip". And this method has different number of >> parameters. I don't see any collision with your code. > > @iaroslavski I would prefer to discuss this in private than here, but my > argument is that the name `skipByte` came from Laurent's code, and that > Laurent's code was clearly derived from my own within a fork of my > repository. I linked the commits where you changed `skipByte` to `passLevel` > and Laurent changed my name `canSkipLevel` to `skipByte`. > > For me, this raises questions about the independence of your work from > Laurent's, and Laurent's work is clearly derived from my own (and I don't > think anyone is disputing the latter). I would be happy to sort this out in > private. In private correspondence with Vladimir, it was explained that where Vladimir's code and Laurent's code are identical, including typos ([Vladimir's code](https://github.com/iaroslavski/sorting/commit/f076073b8b819a9687613903a164e3ed71821769#diff-4b4d68fc834c2ad12a9fb9d316a812221af7c398338ed2ee907d0a795e7aadafR672), [Laurent's code](https://github.com/bourgesl/radix-sort-benchmark/commit/a693b26b2e2c14cfeedf9c753c9d643096b0e38d#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R719)) it is because Vladimir sent the code to Laurent, not the other way around, therefore Vladimir's code does not derive from Laurent's, and it does not derive from mine. I can only trust that this is the case, so please disregard my claim that this is derivative work when reviewing this PR. - PR: https://git.openjdk.java.net/jdk/pull/3938
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)
On Wed, 12 May 2021 12:20:09 GMT, iaroslavski wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 47: >> >>> 45: * @author Doug Lea >>> 46: * >>> 47: * @version 2020.06.14 >> >> Vladimir, I would update to 2021.05.06 (+your hash) > > Laurent, the date in this class is not the date of our last commit, > this date is the date when I have final idea regarding to Radix sort, > therefore, I prefer to keep 2020.06.14 Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I believe this work derives from an unsigned radix sort I implemented on 10/04/2021 https://github.com/richardstartin/radix-sort-benchmark/commit/ab4da230e1d0ac68e5ee2cee38d71c7e7d50f49b#diff-6c13d3fb74f38906677dbfa1a70a123c8e5baf4a39219c81ef121e078d0013bcR226 which has numerous structural similarities to this work: * Producing all four histograms in one pass * Skipping passes based on detecting the total in the histogram * Bailing out of the skip detection if a nonzero value not equal to the total is encountered * Manually unrolling the LSD radix sort loop in order to avoid array copies My implementation from 10th April is below for reference: public static void unrollOnePassHistogramsSkipLevels(int[] data) { int[] histogram1 = new int[257]; int[] histogram2 = new int[257]; int[] histogram3 = new int[257]; int[] histogram4 = new int[257]; for (int value : data) { ++histogram1[(value & 0xFF) + 1]; ++histogram2[((value >>> 8) & 0xFF) + 1]; ++histogram3[((value >>> 16) & 0xFF) + 1]; ++histogram4[(value >>> 24) + 1]; } boolean skipLevel1 = canSkipLevel(histogram1, data.length); boolean skipLevel2 = canSkipLevel(histogram2, data.length); boolean skipLevel3 = canSkipLevel(histogram3, data.length); boolean skipLevel4 = canSkipLevel(histogram4, data.length); if (skipLevel1 && skipLevel2 && skipLevel3 && skipLevel4) { return; } int[] copy = new int[data.length]; int[] source = data; int[] dest = copy; if (!skipLevel1) { for (int i = 1; i < histogram1.length; ++i) { histogram1[i] += histogram1[i - 1]; } for (int value : source) { dest[histogram1[value & 0xFF]++] = value; } if (!skipLevel2 || !skipLevel3 || !skipLevel4) { int[] tmp = dest; dest = source; source = tmp; } } if (!skipLevel2) { for (int i = 1; i < histogram2.length; ++i) { histogram2[i] += histogram2[i - 1]; } for (int value : source) { dest[histogram2[(value >>> 8) & 0xFF]++] = value; } if (!skipLevel3 || !skipLevel4) { int[] tmp = dest; dest = source; source = tmp; } } if (!skipLevel3) { for (int i = 1; i < histogram3.length; ++i) { histogram3[i] += histogram3[i - 1]; } for (int value : data) { dest[histogram3[(value >>> 16) & 0xFF]++] = value; } if (!skipLevel4) { int[] tmp = dest; dest = source; source = tmp; } } if (!skipLevel4) { for (int i = 1; i < histogram4.length; ++i) { histogram4[i] += histogram4[i - 1]; } for (int value : source) { dest[histogram4[value >>> 24]++] = value; } } if (dest != data) { System.arraycopy(dest, 0, data, 0, data.length); } } private static boolean canSkipLevel(int[] histogram, int dataSize) { for (int count : histogram) { if (count == dataSize) { return true; } else if (count > 0) { return false; } } return true; } Moreover, @bourgesl forked my repository on 11/04/2021 and communicated with me about doing so. On 25/04/2021 there was a new implementation of `DualPivotQuicksort` with a signed radix sort but the same structural similarities, and with the same method and variable names in places https://github.com/bourgesl/radix-sort-benchmark/commit/90ff7e427da0fa49f374bff0241fb2487bd87bde#diff-397ce8fd791e2ce508cf9127201bc9ab46264cd2a79fd0487a63569f2e4b59b2R607-R609 // TODO add javadoc private static void radixSort(Sorter sorter, int[] a, int low, int high) { int[] b; // LBO: prealloc (high - low) +1 element: if (sorter == null || (b = sorter.b) == null || b.length < (high - low)) { // System.out.println("alloc b: " + (high - low)); b = new int[high - low]; } int[] count1, count2, count3, count4; if (sorter != null) { sorter.resetRadixBuffers(); count1 = sorter.count1; count2 = sorter.count2; count3 = sorter.count3; count4 = sorter.count4; } else { // System.out.println("alloc radix buffers(4x256)"); count1 = new int[256]; count2 = new int[256]; count3 = new int[256]; count4 = new int[256]; } for (int i = low; i < hig