Re: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception

2024-07-03 Thread Paul Sandoz
On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule  wrote:

> Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this 
> change adds special casing for `VarHandle.{access-mode}` methods.
> 
> The exception message is less exact, but I think that's acceptable.

src/hotspot/share/prims/methodHandles.cpp line 1371:

> 1369:  * invoked directly.
> 1370:  */
> 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) {

Suggestion:

JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject vh, jobjectArray args)) {

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664836522


Re: RFR: 8327854: Test java/util/stream/test/org/openjdk/tests/java/util/stream/WhileOpStatefulTest.java failed with RuntimeException

2024-07-01 Thread Paul Sandoz
On Sat, 1 Jun 2024 11:49:39 GMT, Viktor Klang  wrote:

> This PR improves the test failure output for the failing test case, and the 
> underlying issue is likely going to be addressed by 
> https://github.com/openjdk/jdk/pull/19131 /cc @DougLea

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19508#pullrequestreview-2152007241


Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]

2024-06-28 Thread Paul Sandoz
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix comment

That's a clever change and more preferable (but still potentially fragile in 
the source to bytecode translation process). However I still don't understand 
the overall motivation, why does it matter?

-

PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2197275066


Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v2]

2024-06-27 Thread Paul Sandoz
On Thu, 27 Jun 2024 14:12:36 GMT, Shaojin Wen  wrote:

>> Currently, the java.util.Formatter$Conversion::isValid method is implemented 
>> based on switch, which cannot be inlined because codeSize > 325. This 
>> problem can be avoided by implementing it with ImmutableBitSetPredicate.
>> 
>> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master 
>> branch:
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (358 bytes)   failed to 
>> inline: hot method too big
>> 
>> 
>> current version
>> 
>> @ 109   java.util.Formatter$Conversion::isValid (10 bytes)   inline (hot)
>>   @ 4   
>> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test
>>  (50 bytes)   inline (hot)
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert & use `@ForceInline`

Can you provide some additional context here? I think we need to be very 
careful about the general use @ForceInline in core libraries. While you show a 
modest performance benefit using a the micro benchmark, will it actually make 
any difference overall given formatting strings is not particular efficient?

String templates, currently removed, provided good string formatting 
performance, and the redesign will i think also provide good performance.

-

PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2195095064


Re: [External] : Gatherer and primitive specialization

2024-06-13 Thread Paul Sandoz


> On Jun 13, 2024, at 8:20 AM, fo...@univ-mlv.fr wrote:
> 
> 
> 
> From: "Viktor Klang" 
> To: "Remi Forax" , "core-libs-dev" 
> 
> Sent: Thursday, June 13, 2024 12:52:03 PM
> Subject: Re: [External] : Gatherer and primitive specialization
> Hi Rémi,
> 
> Given that Collector has not been specialized since it was introduced, we 
> opted to not specialize Gatherer eagerly as Valhalla Value Classes may also 
> move the needle a bit regarding the need for specialization in general.
> 
> As i said previously, most collectors uses collections so until collections 
> are specialized, specializing the collector API is not that useful.
> A gatherer unlike a Collector can just transform values, thus specializing it 
> is more useful.
> 
> For Valhalla, we are at year 10 and value classes are not there yet, 
> specialization of generics is in my opinion 10 years ahead.
> I'm not complaining, if we had rush value classes, everybody will have regret 
> it, but those things take a lot of time.
> 

Also we may benefit from stack flattening with the boxed primitives. I would 
hold off on any such explicit specialization (the pressure on Streams was too 
great, less so for Gatherers IMO). Gatherers may provide a useful way to 
explore the effectiveness of using primitive boxes that don’t have identity.

Paul.   

Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template

2024-05-22 Thread Paul Sandoz
On Fri, 26 Apr 2024 14:06:02 GMT, Hamlin Li  wrote:

> Hi,
> Can you help to review this simple patch?
> Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not 
> necessary, could be removed.
> Thanks

The intrinsic implementation will not perform bounds checks. I think what you 
may be observing is the result of bounds checks when Java code is interpreted 
(or perhaps from compiled C1 code). You need to first ensure the code is 
compiled to C2 before executing with out of bounds values.

-

PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2125465612


Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v3]

2024-05-21 Thread Paul Sandoz
On Tue, 21 May 2024 10:20:27 GMT, Maurizio Cimadamore  
wrote:

>> When creating a nested memory access var handle, we ensure that the segment 
>> is accessed at the correct alignment for the root layout being accessed. But 
>> we do not ensure that the segment has at least the size of the accessed root 
>> layout. Example:
>> 
>> 
>> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), 
>> JAVA_INT.withName("y")));
>> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), 
>> PathElement.groupElement("x"));
>> 
>> 
>> If I have a memory segment whose size is 12, I can successfully call 
>> X_HANDLE on it with index 1, like so:
>> 
>> 
>> MemorySegment segment = Arena.ofAuto().allocate(12);
>> int x = (int)X_HANDLE.get(segment, 0, 1);
>> 
>> 
>> This seems incorrect: after all, the provided segment doesn't have enough 
>> space to fit _two_ elements of the nested structs. 
>> 
>> This PR adds checks to detect this kind of scenario earlier, thus improving 
>> usability. To achieve this we could, in principle, just tweak 
>> `LayoutPath::checkEnclosingLayout` and add the required size check.
>> 
>> But doing so will add yet another redundant check on top of the other 
>> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). 
>> Instead, this PR rethinks how memory segment var handles are created, in 
>> order to eliminate redundant checks.
>> 
>> The main observation is that size and alignment checks depend on an 
>> _enclosing_ layout. This layout *might* be the very same value layout being 
>> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in 
>> the general case, the accessed value layout and the enclosing layout might 
>> differ (e.g. think about accessing a struct field).
>> 
>> Furthermore, the enclosing layout check depends on the _base_ offset at 
>> which the segment is accessed, _prior_ to any index computation that occurs 
>> if the accessed layout path has any open elements. It is important to notice 
>> that this base offset is only available when looking at the var handle that 
>> is returned to the user. For instance, an indexed var handle with 
>> coordinates:
>> 
>> 
>> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, 
>> long /* index 3 */)
>> 
>> 
>> Is obtained through adaptation, by taking a more basic var handle of the 
>> form:
>> 
>> 
>> (MemorySegment, long /* offset */)
>> 
>> 
>> And then injecting the result of the index multiplication into `offset`. As 
>> such, we can't add an enclosing layout check inside the var handle returned 
>> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the 
>> *wrong* off...
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix typo in javadoc

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19251#pullrequestreview-2068996279


Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template

2024-05-21 Thread Paul Sandoz
On Mon, 20 May 2024 19:21:44 GMT, Paul Sandoz  wrote:

>> Hi,
>> Can you help to review this simple patch?
>> Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not 
>> necessary, could be removed.
>> Thanks
>
> That does not look correct and will only check a prefix indexes. A 
> `ByteVector` with a shape of 256 bits has 32 lanes, whereas an `IntVector` of 
> the same shape has 8 lanes. The `mapOffset` array will hold 32 indexes that 
> need checking, so we need to loop through `mapOffset` array four times.

> Thanks @PaulSandoz for comment. I just re-ran the vector api tests with this 
> patch on x64 and riscv64, but seems no failures triggered. Let me check 
> further, either I missed something or maybe there is some gap in test to be 
> filled.

More likely a gap in the tests, not sufficiently checking for out of bounds 
access across the range in the mapOffset array.

-

PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2122917109


Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template

2024-05-20 Thread Paul Sandoz
On Fri, 26 Apr 2024 14:06:02 GMT, Hamlin Li  wrote:

> Hi,
> Can you help to review this simple patch?
> Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not 
> necessary, could be removed.
> Thanks

That does not look correct and will only check a prefix indexes. A `ByteVector` 
with a shape of 256 bits has 32 lanes, whereas an `IntVector` of the same shape 
has 8 lanes. The `mapOffset` array will hold 32 indexes that need checking, so 
we need to loop through `mapOffset` array four times.

-

PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2121057883


Re: RFR: 8332486: ClassFile API ArrayIndexOutOfBoundsException with label metadata

2024-05-20 Thread Paul Sandoz
On Mon, 20 May 2024 08:37:49 GMT, Adam Sotona  wrote:

> Parsing of a specifically corrupted class file cause unexpected 
> `ArrayIndexOutOfBoundsException` during label inflation.
> This patch checks the valid range and throws expected 
> `IllegalArgumentException` instead.
> Relevant test is added.
> 
> Please review.
> 
> Thanks,
> Adam

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19304#pullrequestreview-2066846853


Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]

2024-05-20 Thread Paul Sandoz
On Mon, 20 May 2024 16:31:18 GMT, Maurizio Cimadamore  
wrote:

> 
> Ah, got it. You mean more support in VarHandleGuards. Yes, I have a separate 
> patch for that (actually had that for quite a while), but we're not super 
> sure how to evaluate what impact it has :-)

Ah, i did not realize that. Yes its tricky, it was designed for VHs to 
fields/arrays, to really minimize their overhead. With segments there is 
already some additional overhead e.g.,

if (derefAdapters.length == 0) {
// insert align check for the root layout on the initial MS + offset
List> coordinateTypes = handle.coordinateTypes();
MethodHandle alignCheck = 
MethodHandles.insertArguments(MH_CHECK_ENCL_LAYOUT, 2, rootLayout());
handle = MethodHandles.collectCoordinates(handle, 0, alignCheck);
int[] reorder = IntStream.concat(IntStream.of(0, 1), 
IntStream.range(0, coordinateTypes.size())).toArray();
handle = MethodHandles.permuteCoordinates(handle, coordinateTypes, 
reorder);
}

So perhaps it does not make much difference.

-

PR Comment: https://git.openjdk.org/jdk/pull/19251#issuecomment-2120813942


Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]

2024-05-17 Thread Paul Sandoz
On Thu, 16 May 2024 14:37:15 GMT, Maurizio Cimadamore  
wrote:

>> When creating a nested memory access var handle, we ensure that the segment 
>> is accessed at the correct alignment for the root layout being accessed. But 
>> we do not ensure that the segment has at least the size of the accessed root 
>> layout. Example:
>> 
>> 
>> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), 
>> JAVA_INT.withName("y")));
>> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), 
>> PathElement.groupElement("x"));
>> 
>> 
>> If I have a memory segment whose size is 12, I can successfully call 
>> X_HANDLE on it with index 1, like so:
>> 
>> 
>> MemorySegment segment = Arena.ofAuto().allocate(12);
>> int x = (int)X_HANDLE.get(segment, 0, 1);
>> 
>> 
>> This seems incorrect: after all, the provided segment doesn't have enough 
>> space to fit _two_ elements of the nested structs. 
>> 
>> This PR adds checks to detect this kind of scenario earlier, thus improving 
>> usability. To achieve this we could, in principle, just tweak 
>> `LayoutPath::checkEnclosingLayout` and add the required size check.
>> 
>> But doing so will add yet another redundant check on top of the other 
>> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). 
>> Instead, this PR rethinks how memory segment var handles are created, in 
>> order to eliminate redundant checks.
>> 
>> The main observation is that size and alignment checks depend on an 
>> _enclosing_ layout. This layout *might* be the very same value layout being 
>> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in 
>> the general case, the accessed value layout and the enclosing layout might 
>> differ (e.g. think about accessing a struct field).
>> 
>> Furthermore, the enclosing layout check depends on the _base_ offset at 
>> which the segment is accessed, _prior_ to any index computation that occurs 
>> if the accessed layout path has any open elements. It is important to notice 
>> that this base offset is only available when looking at the var handle that 
>> is returned to the user. For instance, an indexed var handle with 
>> coordinates:
>> 
>> 
>> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, 
>> long /* index 3 */)
>> 
>> 
>> Is obtained through adaptation, by taking a more basic var handle of the 
>> form:
>> 
>> 
>> (MemorySegment, long /* offset */)
>> 
>> 
>> And then injecting the result of the index multiplication into `offset`. As 
>> such, we can't add an enclosing layout check inside the var handle returned 
>> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the 
>> *wrong* off...
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix copyrights

Took me a few passes to work it all out. Looks good. All the bounds checking 
action now consistently passes through the call to `checkEnclosingLayout` in 
adaption of the raw (and unsafe) segment accessing VarHandle.

Separately, we might be missing a few long argument accepting guard methods for 
simpler cases as I suspect they are still focused more on int index types.

-

PR Review: https://git.openjdk.org/jdk/pull/19251#pullrequestreview-2064493620


Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]

2024-05-17 Thread Paul Sandoz
On Thu, 16 May 2024 14:37:15 GMT, Maurizio Cimadamore  
wrote:

>> When creating a nested memory access var handle, we ensure that the segment 
>> is accessed at the correct alignment for the root layout being accessed. But 
>> we do not ensure that the segment has at least the size of the accessed root 
>> layout. Example:
>> 
>> 
>> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), 
>> JAVA_INT.withName("y")));
>> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), 
>> PathElement.groupElement("x"));
>> 
>> 
>> If I have a memory segment whose size is 12, I can successfully call 
>> X_HANDLE on it with index 1, like so:
>> 
>> 
>> MemorySegment segment = Arena.ofAuto().allocate(12);
>> int x = (int)X_HANDLE.get(segment, 0, 1);
>> 
>> 
>> This seems incorrect: after all, the provided segment doesn't have enough 
>> space to fit _two_ elements of the nested structs. 
>> 
>> This PR adds checks to detect this kind of scenario earlier, thus improving 
>> usability. To achieve this we could, in principle, just tweak 
>> `LayoutPath::checkEnclosingLayout` and add the required size check.
>> 
>> But doing so will add yet another redundant check on top of the other 
>> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). 
>> Instead, this PR rethinks how memory segment var handles are created, in 
>> order to eliminate redundant checks.
>> 
>> The main observation is that size and alignment checks depend on an 
>> _enclosing_ layout. This layout *might* be the very same value layout being 
>> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in 
>> the general case, the accessed value layout and the enclosing layout might 
>> differ (e.g. think about accessing a struct field).
>> 
>> Furthermore, the enclosing layout check depends on the _base_ offset at 
>> which the segment is accessed, _prior_ to any index computation that occurs 
>> if the accessed layout path has any open elements. It is important to notice 
>> that this base offset is only available when looking at the var handle that 
>> is returned to the user. For instance, an indexed var handle with 
>> coordinates:
>> 
>> 
>> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, 
>> long /* index 3 */)
>> 
>> 
>> Is obtained through adaptation, by taking a more basic var handle of the 
>> form:
>> 
>> 
>> (MemorySegment, long /* offset */)
>> 
>> 
>> And then injecting the result of the index multiplication into `offset`. As 
>> such, we can't add an enclosing layout check inside the var handle returned 
>> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the 
>> *wrong* off...
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix copyrights

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 630:

> 628:  * The access operation must fall inside the spatial bounds 
> of the accessed
> 629:  * memory segment, or an {@link IndexOutOfBoundsException} is 
> thrown. This is the case
> 630:  * when {@code B + A <= S}, where {@code O} is the base offset 
> (defined above),

Do you mean `{@code O + A <= S}`?
(Still working my way through the changes...)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19251#discussion_r1605231757


Re: RFR: 8331940: ClassFile API ArrayIndexOutOfBoundsException with certain class files [v2]

2024-05-15 Thread Paul Sandoz
On Wed, 15 May 2024 07:51:33 GMT, Adam Sotona  wrote:

>> Class file with `LineNumberTable` attribute element pointing behind the 
>> bytecode array throws `ArrayIndexOutOfBoundsException`.
>> This patch performs the check and throws  expected 
>> `IllegalArgumentException` instead.
>> Relevant test is added.
>> 
>> Please review.
>> 
>> Thanks,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fixed exception message

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19230#pullrequestreview-2058434044


Re: RFR: 8331940: ClassFile API ArrayIndexOutOfBoundsException with certain class files

2024-05-14 Thread Paul Sandoz
On Tue, 14 May 2024 13:18:51 GMT, Adam Sotona  wrote:

> Class file with `LineNumberTable` attribute element pointing behind the 
> bytecode array throws `ArrayIndexOutOfBoundsException`.
> This patch performs the check and throws  expected `IllegalArgumentException` 
> instead.
> Relevant test is added.
> 
> Please review.
> 
> Thanks,
> Adam

src/java.base/share/classes/jdk/internal/classfile/impl/CodeImpl.java line 241:

> 239: int startPc = classReader.readU2(p);
> 240: if (startPc > codeLength) {
> 241: throw new 
> IllegalArgumentException(String.format("Line number out of range; 
> start_pc=%d, codeLength=%d",

It's the byte code index that is out of range, not the line number associated 
with it.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19230#discussion_r1600292689


Re: RFR: 8332003: Clarify javadoc for MemoryLayout::offsetHandle [v5]

2024-05-10 Thread Paul Sandoz
On Fri, 10 May 2024 16:10:29 GMT, Maurizio Cimadamore  
wrote:

>> Consider this layout:
>> 
>> 
>> MemoryLayout SEQ = MemoryLayout.sequenceLayout(5,
>>  MemoryLayout.sequenceLayout(10, JAVA_INT));
>> 
>> 
>> And the corresponding offset handle:
>> 
>> 
>> MethodHandle offsetHandle = SEQ.offsetHandle(PathElement.sequenceLayout(), 
>> PathElement.sequenceLayout());
>> 
>> 
>> The resulting method handle takes two additional `long` indices. The 
>> implementation checks that the dynamically provided indices conform to the 
>> bound available at construction: that is, the first index must be < 5, while 
>> the second must be < 10. If this is not true, an `IndexOutOfBoundException` 
>> is thrown.
>> 
>> However, the javadoc for `MemoryLayout::byteOffsetHandle` doesn't claim 
>> anything in this direction. There are only some vague claims in the javadoc 
>> for `PathElement::sequenceElement()` and `PathElement::sequenceElement(long, 
>> long, long)` which make some claims on which indices are actually allowed, 
>> but the text seems more in the tone of a discussion, rather than actual 
>> normative text.
>> 
>> I've tweaked the javadoc for `MemoryLayout::byteOffsetHandle` to actually 
>> state that the indices will be checked against the "size" of the 
>> corresponding open path element (this is a new concept that I also have 
>> defined in the javadoc).
>> 
>> I also added a test for `byteOffsetHandle` as I don't think we had a test 
>> for that specifically (although this method is tested indirectly, via 
>> `MemoryLayout::varHandle`).
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fix another index check

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19158#pullrequestreview-2050771038


Re: RFR: 8332003: Clarify javadoc for MemoryLayout::offsetHandle [v2]

2024-05-09 Thread Paul Sandoz
On Thu, 9 May 2024 18:26:17 GMT, Maurizio Cimadamore  
wrote:

>> Consider this layout:
>> 
>> 
>> MemoryLayout SEQ = MemoryLayout.sequenceLayout(5,
>>  MemoryLayout.sequenceLayout(10, JAVA_INT));
>> 
>> 
>> And the corresponding offset handle:
>> 
>> 
>> MethodHandle offsetHandle = SEQ.offsetHandle(PathElement.sequenceLayout(), 
>> PathElement.sequenceLayout());
>> 
>> 
>> The resulting method handle takes two additional `long` indices. The 
>> implementation checks that the dynamically provided indices conform to the 
>> bound available at construction: that is, the first index must be < 5, while 
>> the second must be < 10. If this is not true, an `IndexOutOfBoundException` 
>> is thrown.
>> 
>> However, the javadoc for `MemoryLayout::byteOffsetHandle` doesn't claim 
>> anything in this direction. There are only some vague claims in the javadoc 
>> for `PathElement::sequenceElement()` and `PathElement::sequenceElement(long, 
>> long, long)` which make some claims on which indices are actually allowed, 
>> but the text seems more in the tone of a discussion, rather than actual 
>> normative text.
>> 
>> I've tweaked the javadoc for `MemoryLayout::byteOffsetHandle` to actually 
>> state that the indices will be checked against the "size" of the 
>> corresponding open path element (this is a new concept that I also have 
>> defined in the javadoc).
>> 
>> I also added a test for `byteOffsetHandle` as I don't think we had a test 
>> for that specifically (although this method is tested indirectly, via 
>> `MemoryLayout::varHandle`).
>
> Maurizio Cimadamore has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Update copyright
>  - Add javadoc to other MemoryLayout methods returning VarHandle/MethodHandle 
> to describe which exception can be thrown by returned handle

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 574:

> 572:  * {@code c_1}, {@code c_2}, ... {@code c_m} are other 
> static offset
> 573:  * constants (such as field offsets) which are derived from the 
> layout path.
> 574:  * 

Can we connected back to the prior paragraph? e.g., for any given dynamic 
argument `x_n`, the index into the sequence associated with the n'th open path 
element whose size is `size_n`,`x_n` must be `>= 0` or `< size_n`

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19158#discussion_r1595836040


Re: RFR: 8331734: Atomic MemorySegment VarHandle operations fails for element layouts

2024-05-07 Thread Paul Sandoz
On Tue, 7 May 2024 15:42:23 GMT, Maurizio Cimadamore  
wrote:

> This PR fixes an issue that has crept into the FFM API implementation.
> 
> From very early stages, the FFM API used to disable the alignment check on 
> nested layout elements, in favor of an alignment check against the memory 
> segment base address. The rationale was that the JIT had issue with 
> eliminating redundant alignment checks, and accessing nested elements could 
> never result in alignment issues, since the alignment of the container is 
> provably bigger than that of the contained element. This means that, when 
> creating a var handle for a nested layout element, we set the nested layout 
> alignment to 1 (unaligned), derive its var handle, and then decorate the var 
> handle with an alignment check for the container.
> 
> At some point in 22, we tweaked the API to throw 
> `UnsupportedOperationException` when using an access mode incompatible with 
> the alignment constraint of the accessed layout. That is, a volatile read on 
> an `int` is only possible if the access occurs at an address that is at least 
> 4-byte aligned. Otherwise an `UOE` is thrown.
> 
> Unfortunately this change broke the aforementioned optimization: creating a 
> var handle for an unaligned layout works, but the resulting layout will *not* 
> support any of the atomic access modes.
> 
> Since this optimization is not really required anymore (proper C2 support to 
> hoist/eliminate alignment checks has been added since then), it is better to 
> disable this implementation quirk, and leave optimizations to C2.
> 
> (If we really really wanted to optimize things a bit, we could remove the 
> container alignment check in the case the accessed value is the first layout 
> element nested in the container, but this PR doesn't go that far).
> 
> I've run relevant benchmarks before/after and found no differences. In part 
> this is because `arrayElementVarHandle` is unaffected. But, even after 
> tweaking the `LoopOverNonConstant` benchmark to add explicit tests for the 
> code path affected, no significant difference was found, sign that C2 is 
> indeed able to spot (and remove) the redundant alignment check. Note: if we 
> know that `aligned_to_N(base)` holds, then it's easy to prove that 
> `aligned_to_M(base + offset + index * scale)` holds, when `N >= M` and with 
> `offset` and `scale` known (the latter a power of two).

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19124#pullrequestreview-2043909050


Re: RFR: 8048691: Spliterator.SORTED characteristics gets cleared for BaseStream.spliterator

2024-05-07 Thread Paul Sandoz
On Tue, 7 May 2024 14:58:00 GMT, Viktor Klang  wrote:

> Removes SORTED if not also ORDERED for escape-hatch `Stream::spliterator()`

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19123#pullrequestreview-2043897612


Re: RFR: 8331655: ClassFile API ClassCastException with verbose output of certain class files [v2]

2024-05-03 Thread Paul Sandoz
On Fri, 3 May 2024 17:13:04 GMT, Adam Sotona  wrote:

>> Specifically corrupted constant pool of a class file can cause 
>> ClassCastException, when the entries are accessed by Class-File API in exact 
>> order.
>> 
>> This fix avoids the ClassCastException and throws ConstantPoolException 
>> instead.
>> Test is attached.
>> 
>> Please review.
>> 
>> Thanks,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   applied suggested changes

Nice!

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19088#pullrequestreview-2038729421


Re: RFR: 8331655: ClassFile API ClassCastException with verbose output of certain class files

2024-05-03 Thread Paul Sandoz
On Fri, 3 May 2024 15:28:05 GMT, Adam Sotona  wrote:

> Specifically corrupted constant pool of a class file can cause 
> ClassCastException, when the entries are accessed by Class-File API in exact 
> order.
> 
> This fix avoids the ClassCastException and throws ConstantPoolException 
> instead.
> Test is attached.
> 
> Please review.
> 
> Thanks,
> Adam

src/java.base/share/classes/jdk/internal/classfile/impl/ClassReaderImpl.java 
line 402:

> 400: int tag = readU1(offset);
> 401: final int q = offset + 1;
> 402: if (tag == TAG_UTF8) {

Can we call into the tag accepting entryByIndex? e.g.,

if (entryByIndex(index, TAG_UTF8) instanceof AbstractPoolEntry.Utf8EntryImpl 
utf8) {
  return ...
}
throw new ...

?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19088#discussion_r1589423345


Re: RFR: 8331320: ClassFile API OutOfMemoryError with certain class files [v2]

2024-05-02 Thread Paul Sandoz
On Thu, 2 May 2024 11:13:22 GMT, Adam Sotona  wrote:

>> Class files with specifically corrupted tableswitch or lookupswitch 
>> instructions in the bytecode cause OutOfMemoryError while parsing with 
>> Class-File API.
>> This patch performs additional checks to avoid OOME and adds relevant tests.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8331320-OOME
>  - 8331320: ClassFile API OutOfMemoryError with certain class files

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/19024#pullrequestreview-2036210957


Re: RFR: 8331320: ClassFile API OutOfMemoryError with certain class files

2024-04-30 Thread Paul Sandoz
On Tue, 30 Apr 2024 15:31:02 GMT, Adam Sotona  wrote:

> Class files with specifically corrupted tableswitch or lookupswitch 
> instructions in the bytecode cause OutOfMemoryError while parsing with 
> Class-File API.
> This patch performs additional checks to avoid OOME and adds relevant tests.
> 
> Please review.
> 
> Thank you,
> Adam

src/java.base/share/classes/jdk/internal/classfile/impl/AbstractInstruction.java
 line 320:

> 318: int low = code.classReader.readInt(ap + 4);
> 319: int high = code.classReader.readInt(ap + 8);
> 320: if (high < low || high - low > code.codeLength >> 2) {

May be its also an opportunity to reduce duplication e.g., replace line 316 
with a call to `afterPadding()`

-

PR Review Comment: https://git.openjdk.org/jdk/pull/19024#discussion_r1585300727


Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes [v3]

2024-04-26 Thread Paul Sandoz
On Fri, 26 Apr 2024 13:34:08 GMT, Adam Sotona  wrote:

>> ClassFile API dives into the nested constant pool entries without type 
>> restrictions, while parsing a class file. Validation of the entry is 
>> performed post-parsing. Specifically corrupted constant pool entry may cause 
>> infinite loop during parsing and throws SOE.
>> This patch resolves the issue by providing specific implementations for the 
>> nested CP entries parsing, instead of sharing the common (post-checking) 
>> code.
>> Added test simulates the situation on inner-looped method reference entry.
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Apply suggestions from code review
>   
>   Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com>

Very nice. I think we got lucky this worked out :-)

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18907#pullrequestreview-2025411091


Re: RFR: 8322847: java.lang.classfile.BufWriter should specify @throws for its writeXXX methods

2024-04-25 Thread Paul Sandoz
On Tue, 23 Apr 2024 11:56:41 GMT, Adam Sotona  wrote:

> This patch adds missing `@throws` javadoc annotations to 
> `java.lang.classfile.BufWriter`.
> 
> Please review.
> 
> Thank you,
> Adam

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18913#pullrequestreview-2023498303


Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes

2024-04-25 Thread Paul Sandoz
On Thu, 25 Apr 2024 00:51:41 GMT, Adam Sotona  wrote:

> Unfortunately it would have to be an expected tags list or an extra 
> constructed bit mask, due to the multiple tags allowed for MemberRefEntry and 
> it would slightly affect the performance. 

Ah yes, i missed that. It could be two tags, a lower and upper bound, because 
TAG_FIELDREF, TAG_METHODREF, and TAG_INTERFACEMETHODREF are consecutive values 
(9 to 11).

-

PR Comment: https://git.openjdk.org/jdk/pull/18907#issuecomment-2078099914


Re: RFR: 8325373: Improve StackCounter error reporting for bad switch cases

2024-04-24 Thread Paul Sandoz
On Tue, 23 Apr 2024 12:59:18 GMT, Adam Sotona  wrote:

> ClassFile API `StackMapGenerator` attaches print or hex dump of the method to 
> an error message.
> However there is no such attachment when the stack maps generation is  turned 
> off.
> 
> This patch moves class print/dump to `impl.Util::dumpMethod`, so it is shared 
> by `StackMapGenerator` and `StackCounter` to provide the same level of 
> details in case of an error.
> 
> Please review.
> 
> Thank you,
> Adam

Looks good, just update the copy write date in Util.java.

src/java.base/share/classes/jdk/internal/classfile/impl/Util.java line 229:

> 227: };
> 228: ClassPrinter.toYaml(clm.methods().get(0).code().get(), 
> ClassPrinter.Verbosity.TRACE_ALL, dump);
> 229: } catch (Error | Exception suppresed) {

If you like you can replace `suppresed` [sic] with `_`.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18914#pullrequestreview-2021078844
PR Review Comment: https://git.openjdk.org/jdk/pull/18914#discussion_r1578604408


Re: RFR: 8322847: java.lang.classfile.BufWriter should specify @throws for its writeXXX methods

2024-04-24 Thread Paul Sandoz
On Tue, 23 Apr 2024 11:56:41 GMT, Adam Sotona  wrote:

> This patch adds missing `@throws` javadoc annotations to 
> `java.lang.classfile.BufWriter`.
> 
> Please review.
> 
> Thank you,
> Adam

This looks good, but for completeness it will need a CSR.

-

PR Review: https://git.openjdk.org/jdk/pull/18913#pullrequestreview-2021068931


Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes

2024-04-24 Thread Paul Sandoz
On Tue, 23 Apr 2024 07:39:47 GMT, Adam Sotona  wrote:

> ClassFile API dives into the nested constant pool entries without type 
> restrictions, while parsing a class file. Validation of the entry is 
> performed post-parsing. Specifically corrupted constant pool entry may cause 
> infinite loop during parsing and throws SOE.
> This patch resolves the issue by providing specific implementations for the 
> nested CP entries parsing, instead of sharing the common (post-checking) code.
> Added test simulates the situation on inner-looped method reference entry.
> 
> Please review.
> 
> Thank you,
> Adam

Rather than duplicating some checks I wonder if it is possible to add a private 
method `entryByIndex(int index, int expectedTag)` that the existing 
`entryByIndex` defers to. If the `expectedTag` is non-negative then it checks 
`tag` against `expectedTag` before proceeding to the switch expression. Then 
the implementations of `readClassEntry` etc can be adjusted to pass along the 
expected tag.

-

PR Review: https://git.openjdk.org/jdk/pull/18907#pullrequestreview-2021009969


Re: Vector (and integer) API: unsigned min/max

2024-04-18 Thread Paul Sandoz
Hi David,

It’s not at all outlandish, but would caution it's more work than one might 
initially think.

Could you describe a little more about your use case? that can be helpful to 
understand the larger picture and demand. Using unsigned comparison would be my 
recommended approach with the current API. CC'ing Sandhya for her opinion.

Generally when we add new Vector operations we also consider the impact on the 
scalar types and try to fill any gaps there, so the vector operation behavior 
is composed from the scalar operation behavior (so like you indicated regarding 
symmetry). 

We are seeing demand for saturated arithmetic primarily for vector (not 
“vector” as in hardware vector) search, so we may do something there for 
specific integral types.

Paul.

> On Apr 17, 2024, at 7:13 AM, David Lloyd  wrote:
> 
> I've been trying the the incubating Vector API and one thing I've missed on 
> integer-typed vectors is having unsigned min/max operations. There is a 
> signed min/max operation, and unsigned comparison, but no unsigned min/max.
> 
> I guess this is for symmetry with `Math`, which only has signed `min`/`max`. 
> However, I would point out that while it's not very hard to implement one's 
> own unsigned min/max for integer types using `compareUnsigned`, it is a bit 
> harder to do so with vectors. The best I've come up with is to take one of 
> two approaches:
> 
> 1. Zero-extend the vector to the next-larger size, perform the min/max, and 
> reduce it back down again, or
> 2. Add .MIN_VALUE, min/max with a value or vector also offset by 
> .MIN_VALUE, and then subtract the offset again
> 
> I guess a third approach could be to do a comparison using unsigned compares, 
> and then use the resultant vector mask to select items from the original two 
> vectors, but I didn't bother to work out this solution given I already had 
> the other two options.
> 
> Would it be feasible to add unsigned min/max operations for vectors? It looks 
> like at least AArch64 has support for this as a single instruction, so it 
> doesn't seem too outlandish.
> 
> And as a separate (but related) question, what about 
> `Math.minUnsigned`/`Math.maxUnsigned` of `int` and `long` for symmetry?
> 
> -- 
> - DML • he/him



Re: RFR: 8330467: NoClassDefFoundError when lambda is in a hidden class [v3]

2024-04-17 Thread Paul Sandoz
On Wed, 17 Apr 2024 16:08:31 GMT, Adam Sotona  wrote:

>> Current implementation of `LambdaMetafactory` does not allow to use lambdas 
>> in hidden classes. Invocation throws `NoClassDefFoundError` instead.
>> 
>> This patch includes lambda implementation in a hidden class under the 
>> special handling of `useImplMethodHandle`.
>> The patch also fixes `j/l/i/defineHiddenClass/BasicTest::testLambda` to 
>> correctly cover this test case.
>> 
>> Please review.
>> 
>> Thanks,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - typo in comment fix
>  - applied suggested changes

Looks good to me. I conservatively bumped the review count to 2.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18810#pullrequestreview-2006903761


Re: RFR: 8330467: NoClassDefFoundError when lambda is in a hidden class [v2]

2024-04-17 Thread Paul Sandoz
On Wed, 17 Apr 2024 12:51:55 GMT, Adam Sotona  wrote:

>> Current implementation of `LambdaMetafactory` does not allow to use lambdas 
>> in hidden classes. Invocation throws `NoClassDefFoundError` instead.
>> 
>> This patch includes lambda implementation in a hidden class under the 
>> special handling of `useImplMethodHandle`.
>> The patch also fixes `j/l/i/defineHiddenClass/BasicTest::testLambda` to 
>> correctly cover this test case.
>> 
>> Please review.
>> 
>> Thanks,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   applied suggested changes

test/jdk/java/lang/invoke/defineHiddenClass/src/Lambda.java line 28:

> 26: public class Lambda implements HiddenTest {
> 27:  public void test() {
> 28:  Function f = o -> o.toString();

Recommend you retain the existing method reference e.g.:

 Function fmref = Object::toString;
 Function flexp = o -> o.toString();
 String s = fmref.apply(this) + flexp.apply(this);
 ...

Or split them out into two tests (easier to debug if case fails).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18810#discussion_r1569060714


Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v8]

2024-04-15 Thread Paul Sandoz
On Sat, 13 Apr 2024 14:27:12 GMT, Viktor Klang  wrote:

>> This PR implements Gatherer-inspired encoding of `flatMap` that shows that 
>> it is both competitive performance-wise as well as improve correctness.
>> 
>> Below is the performance of `Stream::flatMap` (for reference types):
>> 
>> Before this PR (on my MacBook, aarch64):
>> 
>> Non-short-circuiting:
>> 
>> Benchmark (size)   Mode  CntScore   Error  Units
>> FlatMap.par_array 10  thrpt   12   257582,480 ? 31360,242  ops/s
>> FlatMap.par_array100  thrpt   1255202,015 ? 14011,668  ops/s
>> FlatMap.par_array   1000  thrpt   12 6546,252 ?  3675,764  ops/s
>> FlatMap.par_doublestream  10  thrpt   12   267423,410 ? 37338,089  ops/s
>> FlatMap.par_doublestream 100  thrpt   1227140,703 ?  4979,878  ops/s
>> FlatMap.par_doublestream1000  thrpt   12 2978,178 ?  1890,250  ops/s
>> FlatMap.par_intstream 10  thrpt   12   268194,530 ? 37295,092  ops/s
>> FlatMap.par_intstream100  thrpt   1230897,034 ?  5388,245  ops/s
>> FlatMap.par_intstream   1000  thrpt   12 3969,043 ?  2494,641  ops/s
>> FlatMap.par_longstream10  thrpt   12   279756,861 ? 19519,497  ops/s
>> FlatMap.par_longstream   100  thrpt   1245733,955 ? 18385,144  ops/s
>> FlatMap.par_longstream  1000  thrpt   12 5115,615 ?  4147,237  ops/s
>> FlatMap.seq_array 10  thrpt   12  2937192,934 ? 58605,583  ops/s
>> FlatMap.seq_array100  thrpt   1241859,462 ?   139,021  ops/s
>> FlatMap.seq_array   1000  thrpt   12  442,677 ? 1,041  ops/s
>> FlatMap.seq_doublestream  10  thrpt   12  4972651,093 ? 35997,267  ops/s
>> FlatMap.seq_doublestream 100  thrpt   1299265,005 ?   193,497  ops/s
>> FlatMap.seq_doublestream1000  thrpt   12 1037,030 ? 3,254  ops/s
>> FlatMap.seq_intstream 10  thrpt   12  5133751,888 ? 23516,294  ops/s
>> FlatMap.seq_intstream100  thrpt   12   145166,206 ?   399,263  ops/s
>> FlatMap.seq_intstream   1000  thrpt   12 1565,004 ? 3,207  ops/s
>> FlatMap.seq_longstream10  thrpt   12  5047029,363 ? 24742,300  ops/s
>> FlatMap.seq_longstream   100  thrpt   12   233225,307 ?  7162,604  ops/s
>> FlatMap.seq_longstream  1000  thrpt   12 2999,926 ? 9,945  ops/s
>> 
>> // Short-circuiting:
>> 
>> Benchmark   (size)   Mode  Cnt   Score   Error  Units
>> FlatMap.par_iterate_double  10  thrpt   12   46336,834 ?  6803,751  ops/s
>> FlatMap.par_iterate_double 100 ...
>
> Viktor Klang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Addressing review feedback

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18625#pullrequestreview-2001623065


Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v7]

2024-04-12 Thread Paul Sandoz
On Fri, 12 Apr 2024 14:53:01 GMT, Viktor Klang  wrote:

>> This PR implements Gatherer-inspired encoding of `flatMap` that shows that 
>> it is both competitive performance-wise as well as improve correctness.
>> 
>> Below is the performance of `Stream::flatMap` (for reference types):
>> 
>> Before this PR (on my MacBook, aarch64):
>> 
>> Non-short-circuiting:
>> 
>> Benchmark (size)   Mode  CntScore   Error  Units
>> FlatMap.par_array 10  thrpt   12   257582,480 ? 31360,242  ops/s
>> FlatMap.par_array100  thrpt   1255202,015 ? 14011,668  ops/s
>> FlatMap.par_array   1000  thrpt   12 6546,252 ?  3675,764  ops/s
>> FlatMap.par_doublestream  10  thrpt   12   267423,410 ? 37338,089  ops/s
>> FlatMap.par_doublestream 100  thrpt   1227140,703 ?  4979,878  ops/s
>> FlatMap.par_doublestream1000  thrpt   12 2978,178 ?  1890,250  ops/s
>> FlatMap.par_intstream 10  thrpt   12   268194,530 ? 37295,092  ops/s
>> FlatMap.par_intstream100  thrpt   1230897,034 ?  5388,245  ops/s
>> FlatMap.par_intstream   1000  thrpt   12 3969,043 ?  2494,641  ops/s
>> FlatMap.par_longstream10  thrpt   12   279756,861 ? 19519,497  ops/s
>> FlatMap.par_longstream   100  thrpt   1245733,955 ? 18385,144  ops/s
>> FlatMap.par_longstream  1000  thrpt   12 5115,615 ?  4147,237  ops/s
>> FlatMap.seq_array 10  thrpt   12  2937192,934 ? 58605,583  ops/s
>> FlatMap.seq_array100  thrpt   1241859,462 ?   139,021  ops/s
>> FlatMap.seq_array   1000  thrpt   12  442,677 ? 1,041  ops/s
>> FlatMap.seq_doublestream  10  thrpt   12  4972651,093 ? 35997,267  ops/s
>> FlatMap.seq_doublestream 100  thrpt   1299265,005 ?   193,497  ops/s
>> FlatMap.seq_doublestream1000  thrpt   12 1037,030 ? 3,254  ops/s
>> FlatMap.seq_intstream 10  thrpt   12  5133751,888 ? 23516,294  ops/s
>> FlatMap.seq_intstream100  thrpt   12   145166,206 ?   399,263  ops/s
>> FlatMap.seq_intstream   1000  thrpt   12 1565,004 ? 3,207  ops/s
>> FlatMap.seq_longstream10  thrpt   12  5047029,363 ? 24742,300  ops/s
>> FlatMap.seq_longstream   100  thrpt   12   233225,307 ?  7162,604  ops/s
>> FlatMap.seq_longstream  1000  thrpt   12 2999,926 ? 9,945  ops/s
>> 
>> // Short-circuiting:
>> 
>> Benchmark   (size)   Mode  Cnt   Score   Error  Units
>> FlatMap.par_iterate_double  10  thrpt   12   46336,834 ?  6803,751  ops/s
>> FlatMap.par_iterate_double 100 ...
>
> Viktor Klang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adding additional, short-circuit-specific, cases to the FlatMap benchmark

Very nice work.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18625#pullrequestreview-1998504550


Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v2]

2024-04-09 Thread Paul Sandoz
On Tue, 9 Apr 2024 10:07:46 GMT, Viktor Klang  wrote:

>> This PR implements Gatherer-inspired encoding of `flatMap` that shows that 
>> it is both competitive performance-wise as well as improve correctness.
>> 
>> Below is the performance of `Stream::flatMap` (for reference types):
>> 
>> Before this PR:
>> 
>> Benchmark(size)   Mode  CntScore   Error  Units
>> FlatMap.par_array10  thrpt   12   294008,937 ? 54369,110  ops/s
>> FlatMap.par_array   100  thrpt   1262411,229 ? 14868,119  ops/s
>> FlatMap.par_array  1000  thrpt   12 8263,821 ?   452,622  ops/s
>> FlatMap.par_iterate  10  thrpt   1223029,978 ?  4274,449  ops/s
>> FlatMap.par_iterate 100  thrpt   1210532,907 ?   321,694  ops/s
>> FlatMap.par_iterate1000  thrpt   12  981,571 ?   135,270  ops/s
>> FlatMap.seq_array10  thrpt   12  2955648,495 ? 32539,142  ops/s
>> FlatMap.seq_array   100  thrpt   1241851,009 ?   377,546  ops/s
>> FlatMap.seq_array  1000  thrpt   12 1740,281 ?  1229,974  ops/s
>> FlatMap.seq_iterate  10  thrpt   12   321727,690 ?  5149,356  ops/s
>> FlatMap.seq_iterate 100  thrpt   12 8437,198 ?56,635  ops/s
>> FlatMap.seq_iterate1000  thrpt   12   76,994 ? 0,965  ops/s
>> 
>> 
>> After this PR:
>> 
>> 
>> Benchmark(size)   Mode  CntScoreError  Units
>> FlatMap.par_array10  thrpt   12   283350,051 ?  35567,223  ops/s
>> FlatMap.par_array   100  thrpt   1253846,906 ?  19241,913  ops/s
>> FlatMap.par_array  1000  thrpt   12 8230,909 ?156,362  ops/s
>> FlatMap.par_iterate  10  thrpt   1226328,500 ?   5411,401  ops/s
>> FlatMap.par_iterate 100  thrpt   1210470,862 ?249,991  ops/s
>> FlatMap.par_iterate1000  thrpt   12  986,511 ?224,050  ops/s
>> FlatMap.seq_array10  thrpt   12  5654826,565 ?  27317,453  ops/s
>> FlatMap.seq_array   100  thrpt   12   187929,786 ?542,787  ops/s
>> FlatMap.seq_array  1000  thrpt   12 2385,346 ?  9,827  ops/s
>> FlatMap.seq_iterate  10  thrpt   12   812722,403 ? 160500,399  ops/s
>> FlatMap.seq_iterate 100  thrpt   1213542,472 ?118,769  ops/s
>> FlatMap.seq_iterate1000  thrpt   12  157,056 ?  1,814  ops/s
>> 
>> 
>> For streams of size 100k, the following numbers are interesting:
>> 
>> Before this PR:
>> 
>> Benchmark(size)   Mode  Cnt  ScoreError  Units
>> FlatMap.par_array10  thrpt   12  0,325 ?  0,004  ops/s
>> FlatMap.par_iterate  10  thrpt   12  0,106 ?  0,008  o...
>
> Viktor Klang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Updating copyright year

src/java.base/share/classes/java/util/stream/DoublePipeline.java line 280:

> 278: result.sequential().allMatch(this);
> 279: else
> 280: 
> result.sequential().forEach(sink::accept);

I think that might create a new double consumer instance for every input 
element. Alternatively you can compute and cache it as a field, replacing 
`shorts` and use a `null` check.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18625#discussion_r1557995257


Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams

2024-04-09 Thread Paul Sandoz
On Mon, 8 Apr 2024 10:20:11 GMT, Viktor Klang  wrote:

> @PaulSandoz @AlanBateman I've added a commit to this PR which removes the use 
> of Gatherer for Stream::flatMap, but instead implements flatMap for all of 
> the pipelines using the same encoding which Gatherer would use. It seems very 
> competitive performance-wise, and resolves at least one open JBS-issue with 
> flatMap (will look to see if it resolves more than that)

That's a rather clever use of `allMatch`!

Did you observe performance improvements using `@Stable` on the `cancel` field? 
It's really hard to predict in the abstract (since the default value of the 
field will be read in proportion to the number of elements until the stream is 
cancelled).

-

PR Comment: https://git.openjdk.org/jdk/pull/18625#issuecomment-2043218633


Re: RFR: 8325659: Normalize Random usage by incubator vector tests

2024-04-08 Thread Paul Sandoz
On Mon, 8 Apr 2024 10:50:00 GMT, Evgeny Nikitin  wrote:

> Improve RNG usage in said tests:
> 
> 1. The RNG is not created by our Utils class, as suggested in our JTReg tests;
> 2. The seed, accordingly, is not a fixed value now, but truly random;
> 3. The tests that had been creating their own Random instances, are now using 
> RAND static member from the AbstractVectorTest;
> 4. The generated tests sources have been re-generated.
> 
> The most important change is #2, it could add variability and help cover more 
> JIT Compiler and Runtime scenarios.

Looks good, thank you for cleaning this up.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18675#pullrequestreview-1987442519


Re: RFR: 8328316: Finisher cannot emit if stream is sequential and integrator returned false [v4]

2024-03-21 Thread Paul Sandoz
On Wed, 20 Mar 2024 23:58:32 GMT, Viktor Klang  wrote:

>> Adds differentiation between direct and transitive short circuiting which 
>> could prevent pushing downstream in the finisher for built-ins that were not 
>> `collect()`.
>> 
>> Creating this as a draft PR for now, just need to run some benchmarks to 
>> validate no significant regressions first.
>
> Viktor Klang has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Updating copyright year and switching to use underscores in 
> GathererShortCircuitTest
>  - Adds differentiation between direct and transitive short circutiting for 
> Gatherers

Marked as reviewed by psandoz (Reviewer).

src/java.base/share/classes/java/util/stream/GathererOp.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.

Suggestion:

 * Copyright (c) 2023, 2024, Oracle and/or its affiliates. All rights reserved.

-

PR Review: https://git.openjdk.org/jdk/pull/18351#pullrequestreview-1952810339
PR Review Comment: https://git.openjdk.org/jdk/pull/18351#discussion_r1534280411


Re: RFR: 8328316: Finisher cannot emit if stream is sequential and integrator returned false

2024-03-20 Thread Paul Sandoz
On Mon, 18 Mar 2024 16:27:13 GMT, Viktor Klang  wrote:

> Adds differentiation between direct and transitive short circuiting which 
> could prevent pushing downstream in the finisher for built-ins that were not 
> `collect()`.
> 
> Creating this as a draft PR for now, just need to run some benchmarks to 
> validate no significant regressions first.

Looks good, just a minor suggestion.

test/jdk/java/util/stream/GathererShortCircuitTest.java line 48:

> 46: Gatherer.of(
> 47: (unused, element, downstream) -> false,
> 48: (unused, downstream) -> downstream.push(expected)

Suggestion:

(_, element, downstream) -> false,
(_ downstream) -> downstream.push(expected)

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18351#pullrequestreview-1950412052
PR Review Comment: https://git.openjdk.org/jdk/pull/18351#discussion_r1533025318


Re: RFR: 8318650: Optimized subword gather for x86 targets. [v17]

2024-03-04 Thread Paul Sandoz
On Sat, 2 Mar 2024 16:22:22 GMT, Jatin Bhateja  wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and 
>> AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially 
>> partially unrolls scalar loop to accumulates values from gather indices into 
>> a quadword(64bit) slice followed by vector permutation to place the slice 
>> into appropriate vector lanes, it prevents code bloating and generates 
>> compact JIT sequence. This coupled with savings from expansive array 
>> allocation in existing java implementation translates into significant 
>> performance of 1.5-10x gains with included micro.
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by 
>> replacing temporary array allocation with zero initialized vector and a 
>> scalar loops which inserts gathered values into vector. But, vector insert 
>> operation in higher vector lanes is a three step process which first 
>> extracts the upper vector 128 bit lane, updates it with gather subword value 
>> and then inserts the lane back to its original position. This makes inserts 
>> into higher order lanes costly w.r.t to proposed solution. In addition 
>> generated JIT code for modified fallback implementation was very bulky. This 
>> may impact in-lining decisions into caller contexts.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review resolutions.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1914752388


Re: RFR: 8318650: Optimized subword gather for x86 targets. [v16]

2024-02-27 Thread Paul Sandoz
On Tue, 27 Feb 2024 02:47:13 GMT, Jatin Bhateja  wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and 
>> AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially 
>> partially unrolls scalar loop to accumulates values from gather indices into 
>> a quadword(64bit) slice followed by vector permutation to place the slice 
>> into appropriate vector lanes, it prevents code bloating and generates 
>> compact JIT sequence. This coupled with savings from expansive array 
>> allocation in existing java implementation translates into significant 
>> performance of 1.5-10x gains with included micro.
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by 
>> replacing temporary array allocation with zero initialized vector and a 
>> scalar loops which inserts gathered values into vector. But, vector insert 
>> operation in higher vector lanes is a three step process which first 
>> extracts the upper vector 128 bit lane, updates it with gather subword value 
>> and then inserts the lane back to its original position. This makes inserts 
>> into higher order lanes costly w.r.t to proposed solution. In addition 
>> generated JIT code for modified fallback implementation was very bulky. This 
>> may impact in-lining decisions into caller contexts.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comment resolutions.

The Java code looks correct.

It would be nice to common the non-mask and mask variants to reduce the code 
duplication. Perhaps even common to all element types if the loop check is 
efficient?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
 line 3637:

> 3635: } else {
> 3636: lsp = isp;
> 3637: }

No need to initialize `lsp` to null, since it will be definitely assigned after 
the if/else statement.

Suggestion:

// Constant folding should sweep out following conditonal logic.
VectorSpecies lsp;
if (isp.length() > IntVector.SPECIES_PREFERRED.length()) {
lsp = IntVector.SPECIES_PREFERRED;
} else {
lsp = isp;
}

-

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1904909095
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1505073358


Re: [jdk22] RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-02-07 Thread Paul Sandoz
On Wed, 7 Feb 2024 09:13:33 GMT, Aleksey Shipilev  wrote:

> Looks fine.

Thanks!

-

PR Comment: https://git.openjdk.org/jdk22/pull/109#issuecomment-1932769043


[jdk22] Integrated: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-02-07 Thread Paul Sandoz
On Tue, 6 Feb 2024 16:50:10 GMT, Paul Sandoz  wrote:

> This pull request contains a backport of commit 
> [1ae85138](https://github.com/openjdk/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33)
>  from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> 
> The commit being backported was authored by Paul Sandoz on 2 Feb 2024 and was 
> reviewed by Maurizio Cimadamore and Jatin Bhateja.

This pull request has now been integrated.

Changeset: 9cc260d3
Author:Paul Sandoz 
URL:   
https://git.openjdk.org/jdk22/commit/9cc260d3a10a4d31a2ee22be1715884e175f9679
Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod

8324858: [vectorapi] Bounds checking issues when accessing memory segments

Reviewed-by: shade
Backport-of: 1ae851387f881263ccc6aeace5afdd0f49d41d33

-

PR: https://git.openjdk.org/jdk22/pull/109


[jdk22] RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-02-06 Thread Paul Sandoz
This pull request contains a backport of commit 
[1ae85138](https://github.com/openjdk/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33)
 from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.

The commit being backported was authored by Paul Sandoz on 2 Feb 2024 and was 
reviewed by Maurizio Cimadamore and Jatin Bhateja.

-

Commit messages:
 - Backport 1ae851387f881263ccc6aeace5afdd0f49d41d33

Changes: https://git.openjdk.org/jdk22/pull/109/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk22=109=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324858
  Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod
  Patch: https://git.openjdk.org/jdk22/pull/109.diff
  Fetch: git fetch https://git.openjdk.org/jdk22.git pull/109/head:pull/109

PR: https://git.openjdk.org/jdk22/pull/109


Re: RFR: 8323058: Revisit j.l.classfile.CodeBuilder API surface [v2]

2024-02-05 Thread Paul Sandoz
On Mon, 5 Feb 2024 18:31:44 GMT, Adam Sotona  wrote:

>> `java.lang.classfile.CodeBuilder` contains more than 230 API methods.
>> Existing ClassFile API use cases proved the concept one big CodeBuilder is 
>> comfortable. However there are some redundancies, glitches in the naming 
>> conventions, some frequently used methods are hard to find and some methods 
>> have low practical use.
>> 
>> This patch revisits the `CodeBuilder` API methods and introduces some 
>> changes.
>> 
>> For more details, please, visit the [CSR 
>> ](https://bugs.openjdk.org/browse/JDK-8323067)
>> 
>> Please review.
>> 
>> Thank you,
>> Adam
>
> Adam Sotona has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   removed CodeBuilder::newObject methods

Marked as reviewed by psandoz (Reviewer).

src/java.base/share/classes/java/lang/classfile/CodeBuilder.java line 507:

> 505:  * @return this builder
> 506:  */
> 507: default CodeBuilder newObject(ClassEntry type) {

The two `newObject` methods seem to fit in the pattern of methods that are 
being removed, since they don't differentiate sufficiently with the `new_` 
methods that defer to them.

-

PR Review: https://git.openjdk.org/jdk/pull/17282#pullrequestreview-1863531162
PR Review Comment: https://git.openjdk.org/jdk/pull/17282#discussion_r1478611886


Integrated: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-02-02 Thread Paul Sandoz
On Mon, 29 Jan 2024 19:45:41 GMT, Paul Sandoz  wrote:

> The implementation of method `VectorSpecies::fromMemorySegment`, in 
> `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on 
> the offset argument when the method is compiled by C2 (bounds checks are 
> performed when interpreted and by C1).
> 
> This is an oversight and explicit bounds checks are required, as is already 
> case for the other load and store memory access methods (including storing to 
> memory memory segments).
> 
> The workaround is to call the static method `{T}Vector::fromMemorySegment`.
> 
> The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to 
> do the same and call `{T}Vector::fromMemorySegment`, following the same 
> pattern for implementations of `VectorSpecies::fromArray`.
> 
> The tests have been conservatively updated to call the species access method 
> where possible in the knowledge that it calls the vector access method (the 
> tests were intended to test out of bounds access when compiled by C2).
> 
> Thinking ahead its tempting to remove the species access methods, simplifying 
> functionality that is duplicated.

This pull request has now been integrated.

Changeset: 1ae85138
Author:Paul Sandoz 
URL:   
https://git.openjdk.org/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33
Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod

8324858: [vectorapi] Bounds checking issues when accessing memory segments

Reviewed-by: mcimadamore, jbhateja

-

PR: https://git.openjdk.org/jdk/pull/17621


Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments [v2]

2024-02-01 Thread Paul Sandoz
On Thu, 1 Feb 2024 12:22:10 GMT, Maurizio Cimadamore  
wrote:

>> My expectation is the risk is small, but of course non-zero. These tests can 
>> be expensive to run so i was trying balance the risk without increasing test 
>> execution times.
>> 
>> I could strengthen the comment from:
>> 
>> @ForceInline
>> @Override final
>> public ByteVector fromMemorySegment(MemorySegment ms, long offset, 
>> ByteOrder bo) {
>> // User entry point:  Be careful with inputs.
>> return ByteVector
>> .fromMemorySegment(this, ms, offset, bo);
>> }
>> 
>> 
>> to:
>> 
>> 
>> @ForceInline
>> @Override final
>> public ByteVector fromMemorySegment(MemorySegment ms, long offset, 
>> ByteOrder bo) {
>> // User entry point
>> // Defer only to the equivalent method on the vector class, 
>> using the same inputs
>> return ByteVector
>> .fromMemorySegment(this, ms, offset, bo);
>> }
>
> Sounds good

Done

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17621#discussion_r1475344913


Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments [v2]

2024-02-01 Thread Paul Sandoz
> The implementation of method `VectorSpecies::fromMemorySegment`, in 
> `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on 
> the offset argument when the method is compiled by C2 (bounds checks are 
> performed when interpreted and by C1).
> 
> This is an oversight and explicit bounds checks are required, as is already 
> case for the other load and store memory access methods (including storing to 
> memory memory segments).
> 
> The workaround is to call the static method `{T}Vector::fromMemorySegment`.
> 
> The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to 
> do the same and call `{T}Vector::fromMemorySegment`, following the same 
> pattern for implementations of `VectorSpecies::fromArray`.
> 
> The tests have been conservatively updated to call the species access method 
> where possible in the knowledge that it calls the vector access method (the 
> tests were intended to test out of bounds access when compiled by C2).
> 
> Thinking ahead its tempting to remove the species access methods, simplifying 
> functionality that is duplicated.

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  Update/add clarifying comments

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/17621/files
  - new: https://git.openjdk.org/jdk/pull/17621/files/ba326467..e3958a16

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=17621=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=17621=00-01

  Stats: 90 lines in 38 files changed: 76 ins; 0 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/17621.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17621/head:pull/17621

PR: https://git.openjdk.org/jdk/pull/17621


Re: RFR: 8318966: Some methods make promises about Java array element alignment that are too strong

2024-02-01 Thread Paul Sandoz
On Wed, 15 Nov 2023 22:46:03 GMT, Jorn Vernee  wrote:

> See the JBS issue for an extended problem description.
> 
> This patch changes the specification and implementation of 
> `MethodHandles::byteArrayViewVarHandle`, 
> `MethodHandles::byteBufferViewVarHandle`, `ByteBuffer::alignedSlice`, and 
> `ByteBuffer::alignmentOffset` to weaken the guarantees they make about the 
> alignment of Java array elements, in order to bring them in line with the 
> guarantees made by an arbitrary JVM implementation (which makes no guarantees 
> about array element alignment beyond them being aligned to their natural 
> alignment).
> 
> - `MethodHandles::byteArrayViewVarHandle`: we can not guarantee any alignment 
> for the accesses. Which means we can only reliably support plain get and set 
> access modes. The javadoc text explaining which other access modes are 
> supported, or how to compute aligned offsets into the array is dropped, as it 
> is not guaranteed to be correct on all JVM implementations. The 
> implementation of the returned VarHandle is changed to throw an 
> `UnsupportedOperationException` for the unsupported access modes, as mandated 
> by the spec of `VarHandle` [1].
> 
> - `MethodHandles::byteBufferViewVarHandle`: the implementation/spec is 
> incorrect when accessing a heap buffer (wrapping a byte[]), for the same 
> reasons as `byteArrayViewVarHandle`. The spec is changed to specify that when 
> accessing a _heap buffer_, only plain get and set access modes are supported. 
> The implementation of the returned var handle is changed to throw an 
> `IllegalStateException` when an access is attempted on a heap buffer using an 
> access mode other than plain get or set. Note that we don't throw an outright 
> `UnsupportedOperationException` for this case, since whether the access modes 
> are supported depends on the byte buffer instance being used.
> 
> - `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset`: The former 
> method depends directly on the latter for all its alignment computations. We 
> change the implementation of the latter method to throw an 
> `UnsupportedOperationException` for all unit sizes greater than 1, when the 
> buffer is non-direct. This change is largely covered by the existing 
> specification:
> 
> 
>  * @throws UnsupportedOperationException
>  * If the native platform does not guarantee stable alignment 
> offset
>  * values for the given unit size when managing the memory regions
>  * of buffers of the same kind as this buffer (direct or
>  * non-direct).  For example, if garbage collection would result
>  * in the mo...

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16681#pullrequestreview-1857461928


Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]

2024-01-31 Thread Paul Sandoz
On Wed, 31 Jan 2024 23:53:16 GMT, Sandhya Viswanathan 
 wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1613:
>> 
>>> 1611:   vpand(xtmp, idx_vec, xtmp, vlen_enc);
>>> 1612:   // Load double words from normalized indices.
>>> 1613:   evpgatherdd(dst, gmask, Address(base, xtmp, scale), vlen_enc);
>> 
>> Another question, looks to me that we could read beyond the allocated memory 
>> for the array here. e.g. consider the following case:
>> * It is a byte gather
>> * The byte source array is of size 41, i.e. only indices 0-40 are valid
>> * The gather index is 40
>> 
>> Then as part of evpgatherdd we would be reading bytes at 40-43 offset from 
>> source array.
>
> I guess the fact that the Java objects are 8 byte alignment padded and the 
> alignment being done at lines 1609-1611 and 1616-1621 somehow takes care of 
> this.

Drive by comment, that can change with Project Lilliput.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473650392


Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-01-30 Thread Paul Sandoz
On Wed, 31 Jan 2024 00:10:26 GMT, Maurizio Cimadamore  
wrote:

>> The implementation of method `VectorSpecies::fromMemorySegment`, in 
>> `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on 
>> the offset argument when the method is compiled by C2 (bounds checks are 
>> performed when interpreted and by C1).
>> 
>> This is an oversight and explicit bounds checks are required, as is already 
>> case for the other load and store memory access methods (including storing 
>> to memory memory segments).
>> 
>> The workaround is to call the static method `{T}Vector::fromMemorySegment`.
>> 
>> The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` 
>> to do the same and call `{T}Vector::fromMemorySegment`, following the same 
>> pattern for implementations of `VectorSpecies::fromArray`.
>> 
>> The tests have been conservatively updated to call the species access method 
>> where possible in the knowledge that it calls the vector access method (the 
>> tests were intended to test out of bounds access when compiled by C2).
>> 
>> Thinking ahead its tempting to remove the species access methods, 
>> simplifying functionality that is duplicated.
>
> test/jdk/jdk/incubator/vector/templates/X-LoadStoreTest.java.template line 
> 271:
> 
>> 269: @DontInline
>> 270: static $abstractvectortype$ fromArray($type$[] a, int i) {
>> 271: return ($abstractvectortype$) SPECIES.fromArray(a, i);
> 
> These new tests focus on the species method - but should we also test the 
> statics factories in the record class, just in case a regression is 
> introduced and the two implementation start diverging again?

My expectation is the risk is small, but of course non-zero. These tests can be 
expensive to run so i was trying balance the risk without increasing test 
execution times.

I could strengthen the comment from:

@ForceInline
@Override final
public ByteVector fromMemorySegment(MemorySegment ms, long offset, 
ByteOrder bo) {
// User entry point:  Be careful with inputs.
return ByteVector
.fromMemorySegment(this, ms, offset, bo);
}


to:


@ForceInline
@Override final
public ByteVector fromMemorySegment(MemorySegment ms, long offset, 
ByteOrder bo) {
// User entry point
// Defer only to the equivalent method on the vector class, using 
the same inputs
return ByteVector
.fromMemorySegment(this, ms, offset, bo);
}

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17621#discussion_r1472162760


RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments

2024-01-29 Thread Paul Sandoz
The implementation of method `VectorSpecies::fromMemorySegment`, in 
`AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on the 
offset argument when the method is compiled by C2 (bounds checks are performed 
when interpreted and by C1).

This is an oversight and explicit bounds checks are required, as is already 
case for the other load and store memory access methods (including storing to 
memory memory segments).

The workaround is to call the static method `{T}Vector::fromMemorySegment`.

The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to 
do the same and call `{T}Vector::fromMemorySegment`, following the same pattern 
for implementations of `VectorSpecies::fromArray`.

The tests have been conservatively updated to call the species access method 
were possible in the knowledge that calls the vector access method (the tests 
were intended to test out of bounds access when compiled by C2).

Thinking ahead its tempting to remove the species access methods, simplifying 
functionality that is duplicated.

-

Commit messages:
 - Merge branch 'master' into v-load-segment-bounds-checks
 - 8324858: [vectorapi] Bounds checking issues when accessing memory segments

Changes: https://git.openjdk.org/jdk/pull/17621/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=17621=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324858
  Stats: 165 lines in 39 files changed: 56 ins; 8 del; 101 mod
  Patch: https://git.openjdk.org/jdk/pull/17621.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17621/head:pull/17621

PR: https://git.openjdk.org/jdk/pull/17621


Re: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5]

2024-01-24 Thread Paul Sandoz
On Wed, 24 Jan 2024 18:48:34 GMT, Aleksey Shipilev  wrote:

>> @dholmes-ora Indeed it's a compiler magic, albeit not really weird. While 
>> the method execution only receives the evaluated value of `expr`, the method 
>> compilation has the expression in its original form. As a result, it can 
>> determine the result based on this information.
>
> It is still weird to talk about expressions at this level. We really check if 
> the value is constant, like the method name suggests now. Yes, this 
> implicitly tests that the expression that produced that value is fully 
> constant-folded. But that's a detail that we do not need to capture here. 
> Let's rename `expr` -> `val`, and tighten up the javadoc for the method to 
> mention we only test the constness of the final value.

I agree. All values are produced by evaluating expressions. In this case we 
want to query whether a value produced by the compiler evaluating its 
expression is a constant value (inputs to the expression are constants and the 
expression had no material side-effects). Meaning if the method returns true 
then we could use that knowledge in subsequent expressions that may also 
produce constants or some specific behavior.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465449454


Re: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5]

2024-01-23 Thread Paul Sandoz
On Tue, 23 Jan 2024 17:21:47 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch introduces `JitCompiler::isConstantExpression` which can be used 
>> to statically determine whether an expression has been constant-folded by 
>> the Jit compiler, leading to more constant-folding opportunities. For 
>> example, it can be used in `MemorySessionImpl::checkValidStateRaw` to 
>> eliminate the lifetime check on global sessions without imposing additional 
>> branches on other non-global sessions. This is similar to 
>> `__builtin_constant_p` in GCC and clang.
>> 
>> Please kindly give your opinion as well as your reviews, thanks very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   ident

src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 32:

> 30:  * Just-in-time-compiler-related queries
> 31:  */
> 32: public class JitCompiler {

An alternative name and location is `jdk.internal.vm.ConstantSupport` with 
initial class doc:

Defines methods to test if a value has been evaluated to a compile-time 
constant value by the HotSpot VM.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463926393


Re: RFR: 8316641: VarHandle template classes can share code in the base class [v8]

2023-12-04 Thread Paul Sandoz
On Mon, 4 Dec 2023 14:58:34 GMT, Jorn Vernee  wrote:

>> src/java.base/share/classes/java/lang/invoke/X-VarHandleByteArrayView.java.template
>>  line 594:
>> 
>>> 592: @ForceInline
>>> 593: static int indexRO(ByteBuffer bb, int index) {
>>> 594: if (bb.isReadOnly())
>> 
>> I have to think that there was a reason for using unsafe access in the 
>> current code. So, I'm not sure if it's safe to switch this to a direct 
>> method/field reference.
>
> @PaulSandoz Do you remember the history here?

I don't recall exactly but the unsafe usage is likely to avoid possible profile 
pollution due to the abstract method call.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15854#discussion_r1414197423


Re: RFR: 8321223: Implementation of Scoped Values (Second Preview)

2023-12-03 Thread Paul Sandoz
On Sun, 3 Dec 2023 08:46:07 GMT, Alan Bateman  wrote:

> This API is sitting out JDK 22, meaning no API/implementation changes in this 
> PR. Some small API changes are likely for JDK 23.
> 
> For now, we just need to bump JEP number/title that shows up in the preview 
> section of the javadoc.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16937#pullrequestreview-1761337322


Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]

2023-11-22 Thread Paul Sandoz
On Wed, 22 Nov 2023 09:05:31 GMT, Andrew Haley  wrote:

> > Have you considered the possibility of copying the sleef source to the 
> > OpenJDK repository and thereby it becomes part of the build process? I 
> > don't know how straightforward that is technically and IANAL but I think 
> > it's worth exploring.
> 
> Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds 
> good. However, I actually have no idea about how to handle the third-party 
> licence in OpenJDK project. Do you have any idea about this area? Some 
> suggestions/guidence from the JDK team will be much helpful. Thanks!

We (Oracle Java Platform Group) can handle the required "paperwork" on any 
third party dependencies and attribution of copyright before any PR can be 
integrated, if you can help detail what those are. First i think we need to 
determine if this is feasible e.g., copying a subset and integrating it into 
the build system, since it does not make sense to bring in the support for quad 
floats and DFT, which IIUC brings in a dependency on compiler support for 
OpenMP.

-

PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1823335443


Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]

2023-11-21 Thread Paul Sandoz
On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong  wrote:

>> Currently the vector floating-point math APIs like 
>> `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, 
>> which causes large performance gap on AArch64. Note that those APIs are 
>> optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
>> To close the gap, we would like to optimize these APIs for AArch64 by 
>> calling a third-party vector library called libsleef [2], which are 
>> available in mainstream Linux distros (e.g. [3] [4]).
>> 
>> SLEEF supports multiple accuracies. To match Vector API's requirement and 
>> implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA 
>> instructions used stubs in libsleef for most of the operations by default, 
>> and 2) add the vector calling convention to apply with the runtime calls to 
>> stub code in libsleef. Note that for those APIs that libsleef does not 
>> support 1.0 ULP, we choose 0.5 ULP instead.
>> 
>> To help loading the expected libsleef library, this patch also adds an 
>> experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. 
>> People can use it to denote the libsleef path/name explicitly. By default, 
>> it points to the system installed library. If the library does not exist or 
>> the dynamic loading of it in runtime fails, the math vector ops will 
>> fall-back to use the default scalar version without error. But a warning is 
>> printed out if people specifies a nonexistent library explicitly.
>> 
>> Note that this is a part of the original proposed patch in panama-dev [5], 
>> just with some initial review comments addressed. And now we'd like to get 
>> some wider feedbacks from more hotspot experts.
>> 
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains five additional 
> commits since the last revision:
> 
>  - Add a bundled native lib in jdk as a bridge to libsleef
>  - Merge 'jdk:master' into JDK-8312425
>  - Disable sleef by default
>  - Merge 'jdk:master' into JDK-8312425
>  - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF

Have you considered the possibility of copying the sleef source to the OpenJDK 
repository and thereby it becomes part of the build process? I don't know how 
straightforward that is technically and IANAL but I think it's worth exploring.

Also it may enable us to use sleef for other platforms where we have gaps 
(looking at Table 1.1 of https://sleef.org/). Further out it should inspire us 
to do a Java Vector API port to as indicated in a prior comment.  

> Yes, libsleef is used to build the binary if found. And at runtime, if the 
> libsleef with right version is not found, `dlopen` to the libvmath.so will 
> fail. And then all the operations will be fall-back to the java default 
> implementation. X86_64 has also bundled the Intel's SVML binary to jdk image 
> at build time. And we use the same way loading/opening the library at runtime.

-

PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821885433


Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]

2023-11-21 Thread Paul Sandoz
On Mon, 23 Oct 2023 09:02:35 GMT, Xiaohong Gong  wrote:

> This looks good. As far as I can tell the choice you've made of accuracy 
> matches what we need to meet the spec. 

Same here . Sinh/cosh/tanh/expm1 are specified to be within 2.5 ulps of the 
exact result, but i believe sleef does not offer that option, it's either 
within 1 or 3.5, so we have to pick the former. AFAICT sleef does not say 
anything about monotonicity, but we relax semi-monotonicity for all but sqrt 
(we defer to IEEE 754).

-

PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821422985


Re: RFR: 8318966: Some methods make promises about Java array element alignment that are too strong

2023-11-16 Thread Paul Sandoz
On Wed, 15 Nov 2023 22:46:03 GMT, Jorn Vernee  wrote:

> See the JBS issue for an extended problem description.
> 
> This patch changes the specification and implementation of 
> `MethodHandles::byteArrayViewVarHandle`, 
> `MethodHandles::byteBufferViewVarHandle`, `ByteBuffer::alignedSlice`, and 
> `ByteBuffer::alignmentOffset` to weaken the guarantees they make about the 
> alignment of Java array elements, in order to bring them in line with the 
> guarantees made by an arbitrary JVM implementation (which makes no guarantees 
> about array element alignment beyond them being aligned to their natural 
> alignment).
> 
> - `MethodHandles::byteArrayViewVarHandle`: we can not guarantee any alignment 
> for the accesses. Which means we can only reliably support plain get and set 
> access modes. The javadoc text explaining which other access modes are 
> supported, or how to compute aligned offsets into the array is dropped, as it 
> is not guaranteed to be correct on all JVM implementations. The 
> implementation of the returned VarHandle is changed to throw an 
> `UnsupportedOperationException` for the unsupported access modes, as mandated 
> by the spec of `VarHandle` [1].
> 
> - `MethodHandles::byteBufferViewVarHandle`: the implementation/spec is 
> incorrect when accessing a heap buffer (wrapping a byte[]), for the same 
> reasons as `byteArrayViewVarHandle`. The spec is changed to specify that when 
> accessing a _heap buffer_, only plain get and set access modes are supported. 
> The implementation of the returned var handle is changed to throw an 
> `IllegalStateException` when an access is attempted on a heap buffer using an 
> access mode other than plain get or set. Note that we don't throw an outright 
> `UnsupportedOperationException` for this case, since whether the access modes 
> are supported depends on the byte buffer instance being used.
> 
> - `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset`: The former 
> method depends directly on the latter for all its alignment computations. We 
> change the implementation of the latter method to throw an 
> `UnsupportedOperationException` for all unit sizes greater than 1, when the 
> buffer is non-direct. This change is largely covered by the existing 
> specification:
> 
> 
>  * @throws UnsupportedOperationException
>  * If the native platform does not guarantee stable alignment 
> offset
>  * values for the given unit size when managing the memory regions
>  * of buffers of the same kind as this buffer (direct or
>  * non-direct).  For example, if garbage collection would result
>  * in the mo...

src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 4518:

> 4516:  * Only plain {@linkplain VarHandle.AccessMode#GET get} and 
> {@linkplain VarHandle.AccessMode#SET set}
> 4517:  * access modes are supported by the returned var handle. For all 
> other access modes, an
> 4518:  * {@link UnsupportedOperationException} will be thrown.

I recommend adding an api note explaining that native memory segments, direct 
byte buffers, or heap memory segments backed by long[] should be used if 
support for other access modes are required.

src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 4610:

> 4608:  * {@link Double#doubleToRawLongBits}, respectively).
> 4609:  * 
> 4610:  * Access to heap byte buffers is always unaligned.

I recommend merging that sentence into the paragraph on heap byte buffers e.g.:
> For direct buffers, access of the bytes at an index is always misaligned. As 
> a result only the plain...

src/java.base/share/classes/java/nio/X-Buffer.java.template line 2218:

> 2216:  * @implNote
> 2217:  * This implementation throws {@code UnsupportedOperationException} 
> for
> 2218:  * non-direct buffers when the given unit size is greater then 
> {@code 1}.

This is no longer an implementation note, its now part of the specified API. So 
i think we can simplify the text of the `@throws UOE ...` to just say:

@throws UOE if the buffer is non-direct and the unit size > 1

Same for the other method.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396097577
PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396101675
PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396109655


Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v9]

2023-11-15 Thread Paul Sandoz
On Wed, 15 Nov 2023 17:50:48 GMT, Viktor Klang  wrote:

>> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461)
>
> Viktor Klang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Improvements after feedback

Elegantly and thoroughly done (I mostly focused on the API and implementation). 
I have been following the work and providing ongoing feedback hence no specific 
comments here.

Going preview now will allow for some additional time and feedback on the 
ergonomics of Gatherer construction. There is also the intriguing prospect of 
further work to replace stream internals if the performance holds up (the 
ability to optimize when composing gathers seems key here as you already have 
explored in the performance tests - the runtime compiler should be able to see 
through the shorter paths more easily).

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16420#pullrequestreview-1733265473


Re: RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview) [v5]

2023-11-13 Thread Paul Sandoz
On Mon, 13 Nov 2023 09:22:13 GMT, Viktor Klang  wrote:

>> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461)
>
> Viktor Klang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Minor improvements to Gatherer Javadoc

src/java.base/share/classes/java/util/stream/Gatherer.java line 40:

> 38:  * stream of output elements, optionally applying a final action when the 
> end of
> 39:  * the upstream is reached. The transformation may be stateless or 
> stateful,
> 40:  * and a Gatherer may buffer arbitrarily much input before producing any 
> output.

Suggestion:

 * and may buffer input before producing any output.

src/java.base/share/classes/java/util/stream/Gatherer.java line 63:

> 61:  * 
> 62:  *
> 63:  * Each invocation to {@link #initializer()}, {@link #integrator()},

Suggestion:

 * Each invocation of {@link #initializer()}, {@link #integrator()},

src/java.base/share/classes/java/util/stream/Gatherer.java line 119:

> 117:  *
> 118:  * As an example, in order to create a gatherer to implement a 
> sequential
> 119:  * Prefix Scan as a Gatherer, it could be done the following way:

Suggestion:

 * As an example, a Gatherer implementing a sequential Prefix Scan 
 * could be done the following way:

src/java.base/share/classes/java/util/stream/Gatherer.java line 152:

> 150:  * }
> 151:  *
> 152:  * @implSpec Libraries that implement transformation based on {@code 
> Gatherer},

Suggestion:

 * @implSpec Libraries that implement transformations based on {@code Gatherer},

src/java.base/share/classes/java/util/stream/Gatherer.java line 166:

> 164:  * arguments passed to the combiner function, and the argument 
> passed to the
> 165:  * finisher function must be the result of a previous invocation of 
> the
> 166:  * initializer, integrator, or combiner functions.

the integrator returns a boolean, so is it just the result of a previous 
invocation of the initializer or combiner functions? (Similarly for the next 
clause.)

src/java.base/share/classes/java/util/stream/Gatherer.java line 185:

> 183:  * partitions state using the combiner, and then invoking the 
> finisher on
> 184:  * the joined state. Outputs and state later in the input sequence 
> will
> 185:  * be discarded if processing an earlier segment short-circuits.

Suggestion:

 * be discarded if processing an earlier partition short-circuits.

src/java.base/share/classes/java/util/stream/Gatherer.java line 501:

> 499: 
> 500: /**
> 501:  * Allows for checking whether the next stage is known to not 
> want

Suggestion:

 * Checks whether the next stage is known to not want

src/java.base/share/classes/java/util/stream/Gatherer.java line 558:

> 556: 
> 557: /**
> 558:  * Factory method for turning Integrator-shaped lambdas into

Suggestion:

 * Targets Integrator-shaped lambdas to

It's not a factory, as it does produce anything, it's an identity function that 
aids in the targeting of the lambda expression to the integrator when there 
would otherwise be ambiguity as to the target.

src/java.base/share/classes/java/util/stream/Gatherers.java line 47:

> 45: 
> 46: /**
> 47:  * Implementations of {@link Gatherer} that implement various useful 
> intermediate

Suggestion:

 * Implementations of {@link Gatherer} that provide useful intermediate

src/java.base/share/classes/java/util/stream/Gatherers.java line 80:

> 78:  *  and eagerly. This means that choosing large window sizes 
> for
> 79:  *  small streams may use excessive memory for the duration of
> 80:  *  evaluation of this operation.

Suggestion:

 *  small streams may use excessive memory during
 *  evaluation of this operation.

(Same for other methods.)

src/java.base/share/classes/java/util/stream/Gatherers.java line 329:

> 327: 
> 328: /**
> 329:  * An operation which executes operations concurrently

Suggestion:

 * An operation which executes a function concurrently

src/java.base/share/classes/java/util/stream/Gatherers.java line 340:

> 338:  * If the mapper throws an exception during evaluation of this 
> Gatherer,
> 339:  * and the result of that invocation is to be produced to the 
> downstream,
> 340:  * then that exception will instead be rethrown as a {@link 
> RuntimeException}.

Suggestion:

 * If a result of the function is to be pushed downstream but instead 
the function completed
 * exceptionally then the corresponding exception will instead be rethrown 
by this method as an 
 * instance of {@link RuntimeException}. After which any remaining tasks 
are canceled.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391814346
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391815675
PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391818520
PR 

Re: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch

2023-11-07 Thread Paul Sandoz
On Tue, 7 Nov 2023 17:45:36 GMT, Paul Sandoz  wrote:

>> This test requires certain methods to be compiled, but without `-Xbatch` the 
>> compiler races against the test code, which can lead to intermittent 
>> failures.
>
> Marked as reviewed by psandoz (Reviewer).

> @PaulSandoz do you see any problem with this change? Adding `-Xbatch` does 
> not significantly increase the test run time.

Seems ok to me.

-

PR Comment: https://git.openjdk.org/jdk/pull/16023#issuecomment-1799331748


Re: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch

2023-11-07 Thread Paul Sandoz
On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany  wrote:

> This test requires certain methods to be compiled, but without `-Xbatch` the 
> compiler races against the test code, which can lead to intermittent failures.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16023#pullrequestreview-1718335476


Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v3]

2023-10-30 Thread Paul Sandoz
On Mon, 30 Oct 2023 13:43:13 GMT, Per Minborg  wrote:

>> This PR proposes removing the restriction that only heap `MemorySegment` 
>> wrapping a `byte` array can be accessed by Vectors. Now any array type can 
>> be used provided the element alignment constraints are respected.
>
> Per Minborg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add benchmark and intrinsification spy

Looks good. Please remove `IntrinsicHeapTest` and let's follow up in another 
issue/PR to fix the mismatched heap access not being intrinsic.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16360#pullrequestreview-1704600236


Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v3]

2023-10-30 Thread Paul Sandoz
On Mon, 30 Oct 2023 13:43:13 GMT, Per Minborg  wrote:

>> This PR proposes removing the restriction that only heap `MemorySegment` 
>> wrapping a `byte` array can be accessed by Vectors. Now any array type can 
>> be used provided the element alignment constraints are respected.
>
> Per Minborg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add benchmark and intrinsification spy

test/jdk/jdk/incubator/vector/IntrinsicHeapTest.java line 48:

> 46: import static org.testng.Assert.*;
> 47: 
> 48: public class IntrinsicHeapTest {

I don't think we need this test as part of this PR. It's useful ascertain what 
is intrinsic or not. We can rethink it as part of the following fix if needed 
to verify the intrinsics work across all the cross product of array types and 
vector types.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16360#discussion_r1376494655


Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v2]

2023-10-26 Thread Paul Sandoz
On Thu, 26 Oct 2023 09:17:25 GMT, Per Minborg  wrote:

>> This PR proposes removing the restriction that only heap `MemorySegment` 
>> wrapping a `byte` array can be accessed by Vectors. Now any array type can 
>> be used provided the element alignment constraints are respected.
>
> Per Minborg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Allow unaligned array access

We also need to review the intrinsic for load/store found here to ensure, when 
the hardware supports it, that mixed access is optimized:

https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L967

I recommend writing a simple benchmark with various forms of mixed access and 
verify the intrinsics kick in (longer term IR tests would be preferable).

-

PR Comment: https://git.openjdk.org/jdk/pull/16360#issuecomment-1781482409


Re: RFR: 8318421 : AbstractPipeline.sourceStageSpliterator() chases pointers needlessly

2023-10-19 Thread Paul Sandoz
On Wed, 18 Oct 2023 10:14:08 GMT, Viktor Klang  wrote:

> Removes a few unnecessary dereferences of `sourceStage`.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16241#pullrequestreview-1685995672


Re: RFR: 8318420 : AbstractPipeline invokes overridden method in constructor

2023-10-19 Thread Paul Sandoz
On Wed, 18 Oct 2023 09:45:53 GMT, Viktor Klang  wrote:

> This PR corrects so that `opIsStateful()` is not invoked as a part of the 
> java.util.stream.AbstractPipeline constructor—as `opIsStateful()` is intended 
> to be overridden by subclasses, and as their own constructors have not run 
> when their superclass constructor runs, this means that `opIsStateful()` 
> cannot base its return value on any class members of the subclass.
> 
> Since `opIsStateful()` is only needed for parallel streams, we can therefor 
> defer the invocation of `opIsStateful()` until evaluation-time, and as such 
> we can remove the need for having an instance field to store the result of 
> the invocation—making Stream instances potentially a tiny bit smaller, 
> reducing Stream-construction overhead, while still preserving semantics.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16239#pullrequestreview-1685991329


Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-12 Thread Paul Sandoz
On Thu, 12 Oct 2023 03:44:28 GMT, Danny Thomas  wrote:

> At least on Saphire Rapids the [emulation suggested 
> here](https://github.com/natmaurice/x86-simd-sort/commit/41d03b2d8f3b62a2ee6a3a97a8da7f193a407026)
>  only imposes a 6% penalty for `intSort`, while also mitigating the 
> performance issue on Zen 4.

Thanks for testing this out, this seems a good solution to explore. 

Is there is a way to `ifdef` switching on the CPU vendor? otherwise we could 
probably set a flag in the build script based of the CPU vendor, which may be 
more robust if different compilers are used.

-

PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-1759992778


Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-09 Thread Paul Sandoz
On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The pull request now contains 45 commits:
> 
>  - fix code style and formatting
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
>  - Update CompileThresholdScaling only for the sort and partition intrinsics; 
> update build script to remove nested if
>  - change variable names of indexPivot* to pivotIndex*
>  - Update DualPivotQuicksort.java
>  - Rename arraySort and arrayPartition Java methods to sort and partition. 
> Cleanup some comments
>  - Remove the unnecessary exception in single pivot partitioning fallback 
> method
>  - Move functional interfaces close to the associated methods
>  - Refactor the sort and partition 

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-09 Thread Paul Sandoz
On Sun, 8 Oct 2023 06:18:13 GMT, Danny Thomas  wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base 
>> due to a merge or a rebase. The pull request now contains 45 commits:
>> 
>>  - fix code style and formatting
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
>>  - Update CompileThresholdScaling only for the sort and partition 
>> intrinsics; update build script to remove nested if
>>  - change variable names of indexPivot* to pivotIndex*
>>  - Update DualPivotQuicksort.java
>>  - Rename arraySort and arrayPartition Java methods to sort and partition. 
>> Cleanup some comments
>>  - Remove the unnecessary exception in single pivot partitioning fallback 
>> method
>>  - Move functional interfaces close to the associated methods
>>  - Refactor the sort and partition intrinsics to accept method references 
>> for fallback functions
>>  - Refactor stub handling to use a generic function for all types
>>  - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86
>
> A [discussion on 
> Reddit](https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/)
>  raised that this had the potential to regress sort performance on AMD Zen 4. 
> The poster didn't have access to Zen 4 hardware, so I took a moment to run 
> the benchmark, comparing against a baseline with `UseAVX=2` on an AWS 
> `m7a.2xlarge`:
> 
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 25
> model : 17
> model name: AMD EPYC 9R14
> stepping  : 1
> microcode : 0xa10113e
> cpu MHz   : 3673.537
> cache size: 1024 KB
> physical id   : 0
> siblings  : 8
> core id   : 0
> cpu cores : 8
> apicid: 0
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 16
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf 
> tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic 
> movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy 
> cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core 
> invpcid_single ssbd perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 
> smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb 
> avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 
> clzero xsaveerptr rdpru wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni 
> vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d
> bugs  : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
> bogomips  : 5200.00
> TLB size  : 3584 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management:
> 
> 
> 
> $ make test TEST="micro:java.lang.ArraysSort.intSort"
> 
> Benchmark(size)  Mode  Cnt  Score Error  Units
> ArraysSort.intSort   10  avgt3  0.033 ?   0.001  us/op
> ArraysSort.intSort   25  avgt3  0.067 ?   0.002  us/op
> ArraysSort.intSort   50  avgt3  0.391 ?   0.007  us/op
> ArraysSort.intSort   75  avgt3  0.923 ?   0.041  us/op
> ArraysSort.intSort  100  avgt3  1.292 ?   0.009  us/op
> ArraysSort.intSort 1000  avgt3 24.268 ?   0.078  us/op
> ArraysSort.intSort1  avgt3320.131 ?   0.381  us/op
> ArraysSort.intSort   10  avgt3   4803.142 ?  63.901  us/op
> ArraysSort.intSort  100  avgt3  61457.708 ? 347.446  us/op
> 
> 
> 
> $ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO...

@DanielThomas thank you, i was not aware of the limitation on AMD Zen 4. We 
need to address this and it appears the fix is simple and localized. Thankfully 
we are early into the JDK 22 development cycle so we have time to shake this 
out.

-

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1753412092


Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]

2023-09-28 Thread Paul Sandoz
On Fri, 29 Sep 2023 00:21:52 GMT, Mourad Abbay  wrote:

> Are you referring to the expand of lambdas and the extra whitespaces ?

Yes, sometimes the IDE just does it automatically!

-

PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1740160022


Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]

2023-09-28 Thread Paul Sandoz
On Fri, 29 Sep 2023 00:37:16 GMT, Mourad Abbay  wrote:

> > > Are you referring to the expand of lambdas and the extra whitespaces ?
> > 
> > 
> > Yes, sometimes the IDE just does it automatically!
> 
> Is there a way I can make the IDE respect the code style?

I doubt it, since 1) there is no agreed on code style in OpenJDK (although 
there have been attempts to propose one); and 2) each source file might have 
its own style (and there may even be inconsistencies within the same file).

-

PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1740165050


Re: RFR: 8317119: Remove unused imports in the java.util.stream package [v2]

2023-09-28 Thread Paul Sandoz
On Wed, 27 Sep 2023 22:44:52 GMT, Mourad Abbay  wrote:

>> Remove unused imports in the java.util.stream package.
>
> Mourad Abbay has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update copyright year.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15949#pullrequestreview-1649773090


Re: RFR: 8317034: Remove redundant type cast in the java.util.stream package [v3]

2023-09-28 Thread Paul Sandoz
On Wed, 27 Sep 2023 22:47:43 GMT, Mourad Abbay  wrote:

>> Remove redundant type cast in the java.util.stream package.
>
> Mourad Abbay has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update copyright year.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15942#pullrequestreview-1649773625


Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]

2023-09-28 Thread Paul Sandoz
On Wed, 27 Sep 2023 22:44:07 GMT, Mourad Abbay  wrote:

>> Remove cases of redundant type arguments in the java.util.stream package.
>
> Mourad Abbay has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update copyright year.

Looks good. Note a general rule to mostly follow is to try and stick to the 
prevailing style in source being modified and not make additional and unrelated 
changes (no matter how tempting it might be). In this case i don't have strong 
opinions on those changes and they did not detract from reviewing the code.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15936#pullrequestreview-1649772340


Re: RFR: 8317264: Pattern.Bound has `static` fields that should be `static final`. [v2]

2023-09-28 Thread Paul Sandoz
On Thu, 28 Sep 2023 20:39:13 GMT, Eamonn McManus  wrote:

>> It looks to have been an oversight that `final` was omitted. The fields are 
>> never assigned after initialization. `final` leads to shorter bytecode.
>
> Eamonn McManus has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into staticfinal
>  - In `Pattern.Bound`, make some constants `static final`.
>
>It looks to have been an oversight that `final` was omitted. The fields are
>never assigned after initialization. `final` leads to shorter bytecode.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15967#pullrequestreview-1649761096


Re: RFR: 8317034: Redundant type cast

2023-09-27 Thread Paul Sandoz
On Wed, 27 Sep 2023 08:50:20 GMT, Mourad Abbay  wrote:

> Remove redundant type cast in the java.util.stream package.

src/java.base/share/classes/java/util/stream/DoublePipeline.java line 391:

> 389: if (maxSize < 0)
> 390: throw new IllegalArgumentException(Long.toString(maxSize));
> 391: return SliceOps.makeDouble(this, 0, maxSize);

Suggestion:

return SliceOps.makeDouble(this, 0L, maxSize);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15942#discussion_r1339158996


Re: RFR: 8316998: Remove redundant type arguments

2023-09-27 Thread Paul Sandoz
On Wed, 27 Sep 2023 00:58:01 GMT, Mourad Abbay  wrote:

> Remove cases of redundant type arguments in the java.util.stream package.

Looks like there are a few others in the package. In `Collectors`, see see 
construction of `CollectorImpl`. See also `DistinctOp`, `DoublePipeline`, 
`IntPipeline`, `LongPipeline`, `ReferencePipeline`, and `WhileOps`.

-

PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1738008502


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]

2023-09-26 Thread Paul Sandoz
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix typos

This looks good.

In the implementation the functional interfaces 
`BindingInterpreter.StoreFunc/LoadFunc` are package private, but are used in 
internal public signatures. This was previously like this and it's not a big 
deal but i recommend making those interfaces public.

We can also remove `@enablePreview` from `IndirectVarHandleTest`

-

Marked as reviewed by psandoz (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1645266436


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]

2023-09-26 Thread Paul Sandoz
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix typos

src/java.base/share/classes/java/lang/foreign/Linker.java line 35:

> 33: 
> 34: import java.lang.invoke.MethodHandle;
> 35: import java.nio.ByteOrder;

Unused?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337802958


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]

2023-09-26 Thread Paul Sandoz
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix typos

src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java 
line 152:

> 150: private static long allocateMemoryWrapper(long size) {
> 151: try {
> 152: return UNSAFE.allocateMemory(size);

Since we now zero memory only when needed we should test very carefully.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337794211


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]

2023-09-26 Thread Paul Sandoz
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix typos

src/java.base/share/classes/java/lang/foreign/Linker.java line 735:

> 733:  *
> 734:  * @apiNote This linker option can not be combined with {@link 
> #critical}.
> 735:  *

That seems more specification (that can be asserted on) then an informative 
note.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337661700


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]

2023-09-26 Thread Paul Sandoz
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix typos

src/java.base/share/classes/java/lang/Module.java line 328:

> 326: System.err.printf("""
> 327: WARNING: A restricted method in %s has been 
> called
> 328: WARNING: %s has been called%s in %s

Suggestion:

WARNING: %s has been called by %s in %s

?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337640870


Re: RFR: 8271268: Fix Javadoc links for Stream.mapMulti [v3]

2023-09-22 Thread Paul Sandoz
On Fri, 22 Sep 2023 17:08:48 GMT, Mourad Abbay  wrote:

>> Fix Javadoc links for Stream.mapMulti, Stream.MapMultiInt, 
>> Stream.mapMultiToInt, Stream.mapMultiToLong and Stream.mapMultiToDouble.
>
> Mourad Abbay has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Display links to Stream.flatMap and Stream.mapMulti without parameters.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15794#pullrequestreview-1640438479


Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Paul Sandoz
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

The Java changes look good to me, it slots in very neatly with the same 
space/memory characteristics. It's generally pleasing to me to see recent data 
parallel sorting techniques from academic literature applied in the Java 
platform.

I wish it were easier to support different OS platforms (e.g., MacOS with x86) 
but its a compromise on complexity of the sort implementations. 

I wonder if it makes sense to adjust the insertion sort thresholds when the 
intrinsics are enabled. That could be revisited later.

-

Marked as reviewed by psandoz (Reviewer).

PR Review: 

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Paul Sandoz
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

test/jdk/java/util/Arrays/Sorting.java line 30:

> 28:  * @build Sorting
> 29:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
> -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun
> 30:  * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation 
> Sorting -shortrun

It would be useful to target the compilation threshold only for the intrinsic 
methods, but i suspect that is not possible?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r151101


Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v39]

2023-09-20 Thread Paul Sandoz
On Wed, 20 Sep 2023 16:27:19 GMT, Srinivas Vamsi Parasa  
wrote:

> This API was suggested to me and was already reviewed by others.

Confirming so, this was my suggestion. We use the _double-register_ addressing 
mode (as commonly applied in unsafe), thereby the intrinsic is capable of being 
used on and off-heap. To reduce the number of intrinsic methods we pass the 
element class as the first argument.

-

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728089648


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]

2023-09-20 Thread Paul Sandoz
On Fri, 15 Sep 2023 20:17:17 GMT, Paul Sandoz  wrote:

>>> Alan, you mentioned that DualPivotQuicksort will need detailed review. Can 
>>> we go ahead and start reviewing? Laurent checked performance, JMH results 
>>> look fine.
>> 
>> As before, I think the main question with this change is whether adding 
>> radix sort to the mix is worth the complexity and additional code to 
>> maintain. Also as we discussed in the previous PR, the additional memory 
>> needed for the radix sort may have an effect on other things that are going 
>> on concurrently. I know it has been updated to handle OOME but I think 
>> potential reviewers would need to be comfortable with that part.
>
>> > Alan, you mentioned that DualPivotQuicksort will need detailed review. Can 
>> > we go ahead and start reviewing? Laurent checked performance, JMH results 
>> > look fine.
>> 
>> As before, I think the main question with this change is whether adding 
>> radix sort to the mix is worth the complexity and additional code to 
>> maintain. Also as we discussed in the previous PR, the additional memory 
>> needed for the radix sort may have an effect on other things that are going 
>> on concurrently. I know it has been updated to handle OOME but I think 
>> potential reviewers would need to be comfortable with that part.
> 
> I too share concerns about the potential increased use of memory for sorting 
> ints/longs/floats/doubles. With modern SIMD hardware and data parallel 
> techniques we can apply quicksort much more efficiently. I think it is 
> important to determine to what extent this reduces the need for radix sort. 
> To determine that we would need to carefully measure against the AVX-512 
> implementation (with ongoing improvements) with appropriately initialized 
> data to sort, and further measure against an AVX2 version.

> Hi Paul (@PaulSandoz), Alan (@AlanBateman), Any update? Do you agree with 
> Radix sort in parallel case only?

I think its definitely a better fit, but another aspect of my previous comment 
was wondering if we need a radix sort if the vectorized quicksort 
implementation is fast enough. IMO we need to compare performance results with 
the vectorized quick sort, and be aware of future enhancements to that.

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1728082819


Re: RFR: 8271268: Fix Javadoc links for Stream.mapMulti

2023-09-20 Thread Paul Sandoz
On Mon, 18 Sep 2023 18:09:57 GMT, Mourad Abbay  wrote:

> Fix Javadoc links for Stream.mapMulti, Stream.MapMultiInt, 
> Stream.mapMultiToInt, Stream.mapMultiToLong and Stream.mapMultiToDouble.

Note to others. I suggested @mabbay pick up the dropped 
[PR](https://github.com/openjdk/jdk/pull/3909/), as a starter issue to help 
achieve OpenJDK author status.

-

PR Comment: https://git.openjdk.org/jdk/pull/15794#issuecomment-1724417397


Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]

2023-09-15 Thread Paul Sandoz
On Fri, 1 Sep 2023 06:12:57 GMT, Alan Bateman  wrote:

> > Alan, you mentioned that DualPivotQuicksort will need detailed review. Can 
> > we go ahead and start reviewing? Laurent checked performance, JMH results 
> > look fine.
> 
> As before, I think the main question with this change is whether adding radix 
> sort to the mix is worth the complexity and additional code to maintain. Also 
> as we discussed in the previous PR, the additional memory needed for the 
> radix sort may have an effect on other things that are going on concurrently. 
> I know it has been updated to handle OOME but I think potential reviewers 
> would need to be comfortable with that part.

I too share concerns about the potential increased use of memory for sorting 
ints/longs/floats/doubles. With modern SIMD hardware and data parallel 
techniques we can apply quicksort much more efficiently. I think it is 
important to determine to what extent this reduces the need for radix sort. To 
determine that we would need to carefully measure against the AVX-512 
implementation (with ongoing improvements) with appropriately initialized data 
to sort, and further measure against an AVX2 version.

-

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1721811993


Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v36]

2023-09-14 Thread Paul Sandoz
On Wed, 13 Sep 2023 23:02:21 GMT, Srinivas Vamsi Parasa  
wrote:

> Could you please have a look at the changes in `DualPivotQuicksort.java` and 
> provide your feedback?

I agree that is much cleaner, glad that worked out. That neatly covers multiple 
element types and Java-based insertion sort algorithms (although I don't know 
why we need two since mixed insertion effectively embeds the other but we don't 
need to address that here).

I recommend embedding the functional interfaces next to the associated methods, 
rather than as auxiliary classes, and also adding `@ForceInline` on `arraySort`.

-

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1720264496


Re: RFR: 8315938: Deprecate for removal Unsafe methods that have standard APIs for many releases

2023-09-08 Thread Paul Sandoz
On Fri, 8 Sep 2023 15:54:05 GMT, Alan Bateman  wrote:

> There are several methods defined by sun.misc.Unsafe that have standard API 
> equivalents for many years and releases. The change proposed here is to 
> deprecate, for removal, the park, unpark, getLoadAverage, loadFence, 
> storeFence, and fullFence methods. Code using these methods should move to 
> LockSupport.park/unpark (Java 5), OperatingSystemMXBean.getSystemLoadAverage 
> (Java 6), and VarHandles xxxFence methods (Java 9).
> 
> The following is a summary of a search of 175973022 classes in 484751 
> artifacts to get a sense of the usage of these methods.
> 
> - 1290 usages of Unsafe.park. 1195 are libraries that have re-packaged some 
> version of ForkJoinPool, maybe from the jsr166 repo, maybe an older JDK 
> release. In the remaining, only Ehcache stands out with 29 matches. It seems 
> to be one usage but the library seems to copied/shaded into other artifacts.
> 
> - 1322 usages of Unsafe.unpark. 1243 are re-packaged ForkJoinPool. 29 
> occurrences are Encache, again one usage but the library seems to 
> copied/shaded into other artifacts.
> 
> - 22 usages of Unsafe.getLoadAverage. Most of these are one usage in many 
> published versions of Apache HBase.
> 
> - 339 usages of Unsafe.loadFence, 1057 usages of Unsafe.storeFence, 517 
> usages of Unsafe.fullFence. A handful of these are libraries with copies of 
> j.u.concurrent, the rest seem to be high performance libraries that support 
> older JDK releases or just haven't been updated to use VarHandles yet.

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15641#pullrequestreview-1618162112


Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v33]

2023-09-07 Thread Paul Sandoz
On Thu, 31 Aug 2023 21:31:40 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change name of the avxsort library to libx86_64_sort

Thank you for the summary.

> Currently there is a regression of up to 20%, when the intrinsic is disabled. 
> This has been root caused to two reasons:
> 
> * performance loss due to using a generalized intrinsic for all data 
> types (`int, long, float, double`) with multiple indirections and `switch()` 
> in the fallback method implementation.
> 

Ok, i cannot guarantee the fallback lambda approach does not regress :-)

> * performance loss due to passing a `pivotIndices` array to the partition 
> methods and modifying it in place. Passing the pivot indices explicitly 

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v33]

2023-09-07 Thread Paul Sandoz
On Thu, 31 Aug 2023 21:31:40 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change name of the avxsort library to libx86_64_sort

Focusing on the Java changes, I am glad you managed to introduce intrinsics 
into the main Java sorting loop, with leaf array sorting and array partitioning 
thereby taking advantage of existing functionality.

Initially i advised trying to make the intrinsic work across heap and off-heap 
data, e.g., sorting `MemorySegment`. I still think that is a worthy goal, but 
we don't need to do that now, but I still think intrinsics should be 
base+offset shaped in preparation for that. 

I think we need to summarize future planned steps, some i believe were 
discussed with regards to 

Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v10]

2023-09-06 Thread Paul Sandoz
On Wed, 6 Sep 2023 10:50:59 GMT, Jorn Vernee  wrote:

>> src/java.base/share/classes/java/lang/foreign/Linker.java line 152:
>> 
>>> 150:  * 
>>> 151:  * The following table shows some examples of how C types are modelled 
>>> in Linux/x64 (all the examples provided
>>> 152:  * here will assume these platform-dependent mappings):
>> 
>> Up to you, but it might be useful to link to the ABI specifications if the 
>> links are considered stable.
>
> [This SO question](https://stackoverflow.com/a/40348010) points to a gitlab 
> repo that seems to have the latest version: 
> https://gitlab.com/x86-psABIs/x86-64-ABI But, I'm not sure how stable that 
> is, or if that's an authoritative source.
> 
> Alternatively, we could refer to the name only: "System V Application Binary 
> Interface - AMD64 Architecture Processor Supplement" (or "x86-64 psABI") Then 
> people can google for themselves and find it.

Yeah, its hard to find the official and latest version. Referring to the full 
title will help.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1317517319


Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v10]

2023-09-05 Thread Paul Sandoz
On Mon, 4 Sep 2023 15:54:41 GMT, Jorn Vernee  wrote:

>> This patch contains the implementation of the foreign linker & memory API 
>> JEP for Java 22. The initial patch is composed of commits brought over 
>> directly from the [panama-foreign 
>> repo](https://github.com/openjdk/panama-foreign). The main changes found in 
>> this patch come from the following PRs:
>> 
>> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous 
>> iterations supported converting Java strings to and from native strings in 
>> the UTF-8 encoding, we've extended the supported encoding to all the 
>> encodings found in the `java.nio.charset.StandardCharsets` class.
>> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the 
>> `MemoryLayout::sequenceLayout` factory method which inferred the size of the 
>> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A 
>> client is now required to explicitly specify the sequence size.
>> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: 
>> `Linker::canonicalLayouts`, which exposes a map containing the 
>> platform-specific mappings of common C type names to memory layouts.
>> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access 
>> varhandles, as well as byte offset and slice handles derived from memory 
>> layouts, now feature an additional 'base offset' coordinate that is added to 
>> the offset computed by the handle. This allows composing these handles with 
>> other offset computation strategies that may not be based on the same memory 
>> layout. This addresses use-cases where clients are working with 'dynamic' 
>> layouts, whose size might not be known statically, such as variable length 
>> arrays, or variable size matrices.
>> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now 
>> redundant API. Clients can simply use the difference between the base 
>> address of two memory segments.
>> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of 
>> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + 
>> initialize memory segments to `allocateFrom`. (see the original PR for the 
>> problematic case)
>> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the 
>> documentation for variadic functions.
>> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to 
>> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking 
>> linux-x86 as a test bed.
>> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API 
>> required. The `Linker::nativeLinker` method is not longer allowed to throw 
>> an `UnsupportedO...
>
> Jorn Vernee has updated the pull request incrementally with five additional 
> commits since the last revision:
> 
>  - 8315096: Allowed access modes in memory segment should depend on layout 
> alignment
>
>Reviewed-by: psandoz
>  - Add missing @implSpec to AddressLayout
>
>Reviewed-by: pminborg
>  - Fix misc typos in FFM API javadoc
>
>Reviewed-by: jvernee
>  - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle 
> (part two)
>
>Reviewed-by: pminborg
>  - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle
>
>Reviewed-by: jvernee

Recommend you do a sweep through the code for unused imports and fields and any 
rogue `@since` in the internal classes.

src/java.base/share/classes/java/lang/foreign/AddressLayout.java line 53:

> 51:  *
> 52:  * @implSpec
> 53:  * This class is immutable, thread-safe and  href="{@docRoot}/java.base/java/lang/doc-files/ValueBased.html">value-based.

Strictly speaking it's the implementations, as stated in the `Linker`:

* Implementations of this interface are immutable, thread-safe and value-based.

src/java.base/share/classes/java/lang/foreign/Linker.java line 152:

> 150:  * 
> 151:  * The following table shows some examples of how C types are modelled 
> in Linux/x64 (all the examples provided
> 152:  * here will assume these platform-dependent mappings):

Up to you, but it might be useful to link to the ABI specifications if the 
links are considered stable.

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 439:

> 437:  * 
> 438:  * 
> 439:  * If the provided layout path {@code P} contains no dereference 
> elements, then the offset of the access operation is

Suggestion:

 * If the provided layout path {@code P} contains no dereference elements, 
then the offset {@code O} of the access operation is

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 443:

> 441:  *
> 442:  * {@snippet lang = "java":
> 443:  * offset = this.offsetHandle(P).invokeExact(B, I1, I2, ... In);

Suggestion:

 * O = this.offsetHandle(P).invokeExact(B, I1, I2, ... In);

To align with the use of `O` later on.

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 536:

> 534:  * 
> 535:  * 
> 

Re: RFR: 8314544: Matrix multiple benchmark using Vector API [v2]

2023-08-23 Thread Paul Sandoz
On Mon, 21 Aug 2023 03:55:42 GMT, Martin Stypinski  wrote:

>> Added a bunch of different implementations for Vector API Matrix 
>> Multiplications:
>> 
>> - Baseline
>> - Blocked (Cache Local)
>> - FMA
>> - Vector API Simple Implementation
>> - Vector API Blocked Implementation
>> 
>> Commit was discussed with @PaulSandoz
>
> Martin Stypinski has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - changed for consistency
>  - improved some RandomGenerator & unuseed Imports

Marked as reviewed by psandoz (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15338#pullrequestreview-1592478060


Re: RFR: 8314544: Matrix multiple benchmark using Vector API

2023-08-18 Thread Paul Sandoz
On Fri, 18 Aug 2023 03:57:24 GMT, Martin Stypinski  wrote:

> Added a bunch of different implementations for Vector API Matrix 
> Multiplications:
> 
> - Baseline
> - Blocked (Cache Local)
> - FMA
> - Vector API Simple Implementation
> - Vector API Blocked Implementation
> 
> Commit was discussed with @PaulSandoz

test/micro/org/openjdk/bench/jdk/incubator/vector/MatrixMultiplicationBenchmark.java
 line 33:

> 31: 
> 32: import org.openjdk.jmh.annotations.*;
> 33: import org.openjdk.jmh.infra.Blackhole;

Unused import.

test/micro/org/openjdk/bench/jdk/incubator/vector/MatrixMultiplicationBenchmark.java
 line 176:

> 174: 
> 175: private static float[] newFloatRowMatrix(int size) {
> 176: Random rand = new Random();

Instead we can use the new random generator API e.g,:

var rand = RandomGenerator.getDefault();

This is not super important, but its good practice to try and use more 
preferred APIs.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15338#discussion_r1298830226
PR Review Comment: https://git.openjdk.org/jdk/pull/15338#discussion_r1298830156


  1   2   3   4   >