Re: RFR: 8335638: Calling VarHandle.{access-mode} methods reflectively throws wrong exception
On Wed, 3 Jul 2024 19:43:05 GMT, Hannes Greule wrote: > Similar to how `MethodHandle#invoke(Exact)` methods are already handled, this > change adds special casing for `VarHandle.{access-mode}` methods. > > The exception message is less exact, but I think that's acceptable. src/hotspot/share/prims/methodHandles.cpp line 1371: > 1369: * invoked directly. > 1370: */ > 1371: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject mh, jobjectArray args)) { Suggestion: JVM_ENTRY(jobject, VH_UOE(JNIEnv* env, jobject vh, jobjectArray args)) { - PR Review Comment: https://git.openjdk.org/jdk/pull/20015#discussion_r1664836522
Re: RFR: 8327854: Test java/util/stream/test/org/openjdk/tests/java/util/stream/WhileOpStatefulTest.java failed with RuntimeException
On Sat, 1 Jun 2024 11:49:39 GMT, Viktor Klang wrote: > This PR improves the test failure output for the failing test case, and the > underlying issue is likely going to be addressed by > https://github.com/openjdk/jdk/pull/19131 /cc @DougLea Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19508#pullrequestreview-2152007241
Re: RFR: 8335252: Reduce size of j.u.Formatter.Conversion#isValid [v4]
On Fri, 28 Jun 2024 12:46:49 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > fix comment That's a clever change and more preferable (but still potentially fragile in the source to bytecode translation process). However I still don't understand the overall motivation, why does it matter? - PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2197275066
Re: RFR: 8335252: ForceInline j.u.Formatter.Conversion#isValid [v2]
On Thu, 27 Jun 2024 14:12:36 GMT, Shaojin Wen wrote: >> Currently, the java.util.Formatter$Conversion::isValid method is implemented >> based on switch, which cannot be inlined because codeSize > 325. This >> problem can be avoided by implementing it with ImmutableBitSetPredicate. >> >> use `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` to see the master >> branch: >> >> @ 109 java.util.Formatter$Conversion::isValid (358 bytes) failed to >> inline: hot method too big >> >> >> current version >> >> @ 109 java.util.Formatter$Conversion::isValid (10 bytes) inline (hot) >> @ 4 >> jdk.internal.util.ImmutableBitSetPredicate$SmallImmutableBitSetPredicate::test >> (50 bytes) inline (hot) > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert & use `@ForceInline` Can you provide some additional context here? I think we need to be very careful about the general use @ForceInline in core libraries. While you show a modest performance benefit using a the micro benchmark, will it actually make any difference overall given formatting strings is not particular efficient? String templates, currently removed, provided good string formatting performance, and the redesign will i think also provide good performance. - PR Comment: https://git.openjdk.org/jdk/pull/19926#issuecomment-2195095064
Re: [External] : Gatherer and primitive specialization
> On Jun 13, 2024, at 8:20 AM, fo...@univ-mlv.fr wrote: > > > > From: "Viktor Klang" > To: "Remi Forax" , "core-libs-dev" > > Sent: Thursday, June 13, 2024 12:52:03 PM > Subject: Re: [External] : Gatherer and primitive specialization > Hi Rémi, > > Given that Collector has not been specialized since it was introduced, we > opted to not specialize Gatherer eagerly as Valhalla Value Classes may also > move the needle a bit regarding the need for specialization in general. > > As i said previously, most collectors uses collections so until collections > are specialized, specializing the collector API is not that useful. > A gatherer unlike a Collector can just transform values, thus specializing it > is more useful. > > For Valhalla, we are at year 10 and value classes are not there yet, > specialization of generics is in my opinion 10 years ahead. > I'm not complaining, if we had rush value classes, everybody will have regret > it, but those things take a lot of time. > Also we may benefit from stack flattening with the boxed primitives. I would hold off on any such explicit specialization (the pressure on Streams was too great, less so for Gatherers IMO). Gatherers may provide a useful way to explore the effectiveness of using primitive boxes that don’t have identity. Paul.
Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template
On Fri, 26 Apr 2024 14:06:02 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not > necessary, could be removed. > Thanks The intrinsic implementation will not perform bounds checks. I think what you may be observing is the result of bounds checks when Java code is interpreted (or perhaps from compiled C1 code). You need to first ensure the code is compiled to C2 before executing with out of bounds values. - PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2125465612
Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v3]
On Tue, 21 May 2024 10:20:27 GMT, Maurizio Cimadamore wrote: >> When creating a nested memory access var handle, we ensure that the segment >> is accessed at the correct alignment for the root layout being accessed. But >> we do not ensure that the segment has at least the size of the accessed root >> layout. Example: >> >> >> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), >> JAVA_INT.withName("y"))); >> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), >> PathElement.groupElement("x")); >> >> >> If I have a memory segment whose size is 12, I can successfully call >> X_HANDLE on it with index 1, like so: >> >> >> MemorySegment segment = Arena.ofAuto().allocate(12); >> int x = (int)X_HANDLE.get(segment, 0, 1); >> >> >> This seems incorrect: after all, the provided segment doesn't have enough >> space to fit _two_ elements of the nested structs. >> >> This PR adds checks to detect this kind of scenario earlier, thus improving >> usability. To achieve this we could, in principle, just tweak >> `LayoutPath::checkEnclosingLayout` and add the required size check. >> >> But doing so will add yet another redundant check on top of the other >> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). >> Instead, this PR rethinks how memory segment var handles are created, in >> order to eliminate redundant checks. >> >> The main observation is that size and alignment checks depend on an >> _enclosing_ layout. This layout *might* be the very same value layout being >> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in >> the general case, the accessed value layout and the enclosing layout might >> differ (e.g. think about accessing a struct field). >> >> Furthermore, the enclosing layout check depends on the _base_ offset at >> which the segment is accessed, _prior_ to any index computation that occurs >> if the accessed layout path has any open elements. It is important to notice >> that this base offset is only available when looking at the var handle that >> is returned to the user. For instance, an indexed var handle with >> coordinates: >> >> >> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, >> long /* index 3 */) >> >> >> Is obtained through adaptation, by taking a more basic var handle of the >> form: >> >> >> (MemorySegment, long /* offset */) >> >> >> And then injecting the result of the index multiplication into `offset`. As >> such, we can't add an enclosing layout check inside the var handle returned >> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the >> *wrong* off... > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Fix typo in javadoc Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19251#pullrequestreview-2068996279
Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template
On Mon, 20 May 2024 19:21:44 GMT, Paul Sandoz wrote: >> Hi, >> Can you help to review this simple patch? >> Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not >> necessary, could be removed. >> Thanks > > That does not look correct and will only check a prefix indexes. A > `ByteVector` with a shape of 256 bits has 32 lanes, whereas an `IntVector` of > the same shape has 8 lanes. The `mapOffset` array will hold 32 indexes that > need checking, so we need to loop through `mapOffset` array four times. > Thanks @PaulSandoz for comment. I just re-ran the vector api tests with this > patch on x64 and riscv64, but seems no failures triggered. Let me check > further, either I missed something or maybe there is some gap in test to be > filled. More likely a gap in the tests, not sufficiently checking for out of bounds access across the range in the mapOffset array. - PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2122917109
Re: RFR: 8331196: vector api: Remove unnecessary index check in Byte/ShortVector.fromArray/fromArray0Template
On Fri, 26 Apr 2024 14:06:02 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Some index check in Byte/ShortVector.fromArray/fromArray0Template seems not > necessary, could be removed. > Thanks That does not look correct and will only check a prefix indexes. A `ByteVector` with a shape of 256 bits has 32 lanes, whereas an `IntVector` of the same shape has 8 lanes. The `mapOffset` array will hold 32 indexes that need checking, so we need to loop through `mapOffset` array four times. - PR Comment: https://git.openjdk.org/jdk/pull/18977#issuecomment-2121057883
Re: RFR: 8332486: ClassFile API ArrayIndexOutOfBoundsException with label metadata
On Mon, 20 May 2024 08:37:49 GMT, Adam Sotona wrote: > Parsing of a specifically corrupted class file cause unexpected > `ArrayIndexOutOfBoundsException` during label inflation. > This patch checks the valid range and throws expected > `IllegalArgumentException` instead. > Relevant test is added. > > Please review. > > Thanks, > Adam Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19304#pullrequestreview-2066846853
Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]
On Mon, 20 May 2024 16:31:18 GMT, Maurizio Cimadamore wrote: > > Ah, got it. You mean more support in VarHandleGuards. Yes, I have a separate > patch for that (actually had that for quite a while), but we're not super > sure how to evaluate what impact it has :-) Ah, i did not realize that. Yes its tricky, it was designed for VHs to fields/arrays, to really minimize their overhead. With segments there is already some additional overhead e.g., if (derefAdapters.length == 0) { // insert align check for the root layout on the initial MS + offset List> coordinateTypes = handle.coordinateTypes(); MethodHandle alignCheck = MethodHandles.insertArguments(MH_CHECK_ENCL_LAYOUT, 2, rootLayout()); handle = MethodHandles.collectCoordinates(handle, 0, alignCheck); int[] reorder = IntStream.concat(IntStream.of(0, 1), IntStream.range(0, coordinateTypes.size())).toArray(); handle = MethodHandles.permuteCoordinates(handle, coordinateTypes, reorder); } So perhaps it does not make much difference. - PR Comment: https://git.openjdk.org/jdk/pull/19251#issuecomment-2120813942
Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]
On Thu, 16 May 2024 14:37:15 GMT, Maurizio Cimadamore wrote: >> When creating a nested memory access var handle, we ensure that the segment >> is accessed at the correct alignment for the root layout being accessed. But >> we do not ensure that the segment has at least the size of the accessed root >> layout. Example: >> >> >> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), >> JAVA_INT.withName("y"))); >> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), >> PathElement.groupElement("x")); >> >> >> If I have a memory segment whose size is 12, I can successfully call >> X_HANDLE on it with index 1, like so: >> >> >> MemorySegment segment = Arena.ofAuto().allocate(12); >> int x = (int)X_HANDLE.get(segment, 0, 1); >> >> >> This seems incorrect: after all, the provided segment doesn't have enough >> space to fit _two_ elements of the nested structs. >> >> This PR adds checks to detect this kind of scenario earlier, thus improving >> usability. To achieve this we could, in principle, just tweak >> `LayoutPath::checkEnclosingLayout` and add the required size check. >> >> But doing so will add yet another redundant check on top of the other >> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). >> Instead, this PR rethinks how memory segment var handles are created, in >> order to eliminate redundant checks. >> >> The main observation is that size and alignment checks depend on an >> _enclosing_ layout. This layout *might* be the very same value layout being >> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in >> the general case, the accessed value layout and the enclosing layout might >> differ (e.g. think about accessing a struct field). >> >> Furthermore, the enclosing layout check depends on the _base_ offset at >> which the segment is accessed, _prior_ to any index computation that occurs >> if the accessed layout path has any open elements. It is important to notice >> that this base offset is only available when looking at the var handle that >> is returned to the user. For instance, an indexed var handle with >> coordinates: >> >> >> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, >> long /* index 3 */) >> >> >> Is obtained through adaptation, by taking a more basic var handle of the >> form: >> >> >> (MemorySegment, long /* offset */) >> >> >> And then injecting the result of the index multiplication into `offset`. As >> such, we can't add an enclosing layout check inside the var handle returned >> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the >> *wrong* off... > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Fix copyrights Took me a few passes to work it all out. Looks good. All the bounds checking action now consistently passes through the call to `checkEnclosingLayout` in adaption of the raw (and unsafe) segment accessing VarHandle. Separately, we might be missing a few long argument accepting guard methods for simpler cases as I suspect they are still focused more on int index types. - PR Review: https://git.openjdk.org/jdk/pull/19251#pullrequestreview-2064493620
Re: RFR: 8331865: Consolidate size and alignment checks in LayoutPath [v2]
On Thu, 16 May 2024 14:37:15 GMT, Maurizio Cimadamore wrote: >> When creating a nested memory access var handle, we ensure that the segment >> is accessed at the correct alignment for the root layout being accessed. But >> we do not ensure that the segment has at least the size of the accessed root >> layout. Example: >> >> >> MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), >> JAVA_INT.withName("y"))); >> VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), >> PathElement.groupElement("x")); >> >> >> If I have a memory segment whose size is 12, I can successfully call >> X_HANDLE on it with index 1, like so: >> >> >> MemorySegment segment = Arena.ofAuto().allocate(12); >> int x = (int)X_HANDLE.get(segment, 0, 1); >> >> >> This seems incorrect: after all, the provided segment doesn't have enough >> space to fit _two_ elements of the nested structs. >> >> This PR adds checks to detect this kind of scenario earlier, thus improving >> usability. To achieve this we could, in principle, just tweak >> `LayoutPath::checkEnclosingLayout` and add the required size check. >> >> But doing so will add yet another redundant check on top of the other >> already added by [pull/19124](https://git.openjdk.org/jdk/pull/19124). >> Instead, this PR rethinks how memory segment var handles are created, in >> order to eliminate redundant checks. >> >> The main observation is that size and alignment checks depend on an >> _enclosing_ layout. This layout *might* be the very same value layout being >> accessed (this is the case when e.g. using `JAVA_INT.varHandle()`). But, in >> the general case, the accessed value layout and the enclosing layout might >> differ (e.g. think about accessing a struct field). >> >> Furthermore, the enclosing layout check depends on the _base_ offset at >> which the segment is accessed, _prior_ to any index computation that occurs >> if the accessed layout path has any open elements. It is important to notice >> that this base offset is only available when looking at the var handle that >> is returned to the user. For instance, an indexed var handle with >> coordinates: >> >> >> (MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, >> long /* index 3 */) >> >> >> Is obtained through adaptation, by taking a more basic var handle of the >> form: >> >> >> (MemorySegment, long /* offset */) >> >> >> And then injecting the result of the index multiplication into `offset`. As >> such, we can't add an enclosing layout check inside the var handle returned >> by `VarHandles::memorySegmentViewHandle`, as doing so will end up seeing the >> *wrong* off... > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Fix copyrights src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 630: > 628: * The access operation must fall inside the spatial bounds > of the accessed > 629: * memory segment, or an {@link IndexOutOfBoundsException} is > thrown. This is the case > 630: * when {@code B + A <= S}, where {@code O} is the base offset > (defined above), Do you mean `{@code O + A <= S}`? (Still working my way through the changes...) - PR Review Comment: https://git.openjdk.org/jdk/pull/19251#discussion_r1605231757
Re: RFR: 8331940: ClassFile API ArrayIndexOutOfBoundsException with certain class files [v2]
On Wed, 15 May 2024 07:51:33 GMT, Adam Sotona wrote: >> Class file with `LineNumberTable` attribute element pointing behind the >> bytecode array throws `ArrayIndexOutOfBoundsException`. >> This patch performs the check and throws expected >> `IllegalArgumentException` instead. >> Relevant test is added. >> >> Please review. >> >> Thanks, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > fixed exception message Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19230#pullrequestreview-2058434044
Re: RFR: 8331940: ClassFile API ArrayIndexOutOfBoundsException with certain class files
On Tue, 14 May 2024 13:18:51 GMT, Adam Sotona wrote: > Class file with `LineNumberTable` attribute element pointing behind the > bytecode array throws `ArrayIndexOutOfBoundsException`. > This patch performs the check and throws expected `IllegalArgumentException` > instead. > Relevant test is added. > > Please review. > > Thanks, > Adam src/java.base/share/classes/jdk/internal/classfile/impl/CodeImpl.java line 241: > 239: int startPc = classReader.readU2(p); > 240: if (startPc > codeLength) { > 241: throw new > IllegalArgumentException(String.format("Line number out of range; > start_pc=%d, codeLength=%d", It's the byte code index that is out of range, not the line number associated with it. - PR Review Comment: https://git.openjdk.org/jdk/pull/19230#discussion_r1600292689
Re: RFR: 8332003: Clarify javadoc for MemoryLayout::offsetHandle [v5]
On Fri, 10 May 2024 16:10:29 GMT, Maurizio Cimadamore wrote: >> Consider this layout: >> >> >> MemoryLayout SEQ = MemoryLayout.sequenceLayout(5, >> MemoryLayout.sequenceLayout(10, JAVA_INT)); >> >> >> And the corresponding offset handle: >> >> >> MethodHandle offsetHandle = SEQ.offsetHandle(PathElement.sequenceLayout(), >> PathElement.sequenceLayout()); >> >> >> The resulting method handle takes two additional `long` indices. The >> implementation checks that the dynamically provided indices conform to the >> bound available at construction: that is, the first index must be < 5, while >> the second must be < 10. If this is not true, an `IndexOutOfBoundException` >> is thrown. >> >> However, the javadoc for `MemoryLayout::byteOffsetHandle` doesn't claim >> anything in this direction. There are only some vague claims in the javadoc >> for `PathElement::sequenceElement()` and `PathElement::sequenceElement(long, >> long, long)` which make some claims on which indices are actually allowed, >> but the text seems more in the tone of a discussion, rather than actual >> normative text. >> >> I've tweaked the javadoc for `MemoryLayout::byteOffsetHandle` to actually >> state that the indices will be checked against the "size" of the >> corresponding open path element (this is a new concept that I also have >> defined in the javadoc). >> >> I also added a test for `byteOffsetHandle` as I don't think we had a test >> for that specifically (although this method is tested indirectly, via >> `MemoryLayout::varHandle`). > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Fix another index check Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19158#pullrequestreview-2050771038
Re: RFR: 8332003: Clarify javadoc for MemoryLayout::offsetHandle [v2]
On Thu, 9 May 2024 18:26:17 GMT, Maurizio Cimadamore wrote: >> Consider this layout: >> >> >> MemoryLayout SEQ = MemoryLayout.sequenceLayout(5, >> MemoryLayout.sequenceLayout(10, JAVA_INT)); >> >> >> And the corresponding offset handle: >> >> >> MethodHandle offsetHandle = SEQ.offsetHandle(PathElement.sequenceLayout(), >> PathElement.sequenceLayout()); >> >> >> The resulting method handle takes two additional `long` indices. The >> implementation checks that the dynamically provided indices conform to the >> bound available at construction: that is, the first index must be < 5, while >> the second must be < 10. If this is not true, an `IndexOutOfBoundException` >> is thrown. >> >> However, the javadoc for `MemoryLayout::byteOffsetHandle` doesn't claim >> anything in this direction. There are only some vague claims in the javadoc >> for `PathElement::sequenceElement()` and `PathElement::sequenceElement(long, >> long, long)` which make some claims on which indices are actually allowed, >> but the text seems more in the tone of a discussion, rather than actual >> normative text. >> >> I've tweaked the javadoc for `MemoryLayout::byteOffsetHandle` to actually >> state that the indices will be checked against the "size" of the >> corresponding open path element (this is a new concept that I also have >> defined in the javadoc). >> >> I also added a test for `byteOffsetHandle` as I don't think we had a test >> for that specifically (although this method is tested indirectly, via >> `MemoryLayout::varHandle`). > > Maurizio Cimadamore has updated the pull request incrementally with two > additional commits since the last revision: > > - Update copyright > - Add javadoc to other MemoryLayout methods returning VarHandle/MethodHandle > to describe which exception can be thrown by returned handle src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 574: > 572: * {@code c_1}, {@code c_2}, ... {@code c_m} are other > static offset > 573: * constants (such as field offsets) which are derived from the > layout path. > 574: * Can we connected back to the prior paragraph? e.g., for any given dynamic argument `x_n`, the index into the sequence associated with the n'th open path element whose size is `size_n`,`x_n` must be `>= 0` or `< size_n` - PR Review Comment: https://git.openjdk.org/jdk/pull/19158#discussion_r1595836040
Re: RFR: 8331734: Atomic MemorySegment VarHandle operations fails for element layouts
On Tue, 7 May 2024 15:42:23 GMT, Maurizio Cimadamore wrote: > This PR fixes an issue that has crept into the FFM API implementation. > > From very early stages, the FFM API used to disable the alignment check on > nested layout elements, in favor of an alignment check against the memory > segment base address. The rationale was that the JIT had issue with > eliminating redundant alignment checks, and accessing nested elements could > never result in alignment issues, since the alignment of the container is > provably bigger than that of the contained element. This means that, when > creating a var handle for a nested layout element, we set the nested layout > alignment to 1 (unaligned), derive its var handle, and then decorate the var > handle with an alignment check for the container. > > At some point in 22, we tweaked the API to throw > `UnsupportedOperationException` when using an access mode incompatible with > the alignment constraint of the accessed layout. That is, a volatile read on > an `int` is only possible if the access occurs at an address that is at least > 4-byte aligned. Otherwise an `UOE` is thrown. > > Unfortunately this change broke the aforementioned optimization: creating a > var handle for an unaligned layout works, but the resulting layout will *not* > support any of the atomic access modes. > > Since this optimization is not really required anymore (proper C2 support to > hoist/eliminate alignment checks has been added since then), it is better to > disable this implementation quirk, and leave optimizations to C2. > > (If we really really wanted to optimize things a bit, we could remove the > container alignment check in the case the accessed value is the first layout > element nested in the container, but this PR doesn't go that far). > > I've run relevant benchmarks before/after and found no differences. In part > this is because `arrayElementVarHandle` is unaffected. But, even after > tweaking the `LoopOverNonConstant` benchmark to add explicit tests for the > code path affected, no significant difference was found, sign that C2 is > indeed able to spot (and remove) the redundant alignment check. Note: if we > know that `aligned_to_N(base)` holds, then it's easy to prove that > `aligned_to_M(base + offset + index * scale)` holds, when `N >= M` and with > `offset` and `scale` known (the latter a power of two). Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19124#pullrequestreview-2043909050
Re: RFR: 8048691: Spliterator.SORTED characteristics gets cleared for BaseStream.spliterator
On Tue, 7 May 2024 14:58:00 GMT, Viktor Klang wrote: > Removes SORTED if not also ORDERED for escape-hatch `Stream::spliterator()` Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19123#pullrequestreview-2043897612
Re: RFR: 8331655: ClassFile API ClassCastException with verbose output of certain class files [v2]
On Fri, 3 May 2024 17:13:04 GMT, Adam Sotona wrote: >> Specifically corrupted constant pool of a class file can cause >> ClassCastException, when the entries are accessed by Class-File API in exact >> order. >> >> This fix avoids the ClassCastException and throws ConstantPoolException >> instead. >> Test is attached. >> >> Please review. >> >> Thanks, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > applied suggested changes Nice! - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19088#pullrequestreview-2038729421
Re: RFR: 8331655: ClassFile API ClassCastException with verbose output of certain class files
On Fri, 3 May 2024 15:28:05 GMT, Adam Sotona wrote: > Specifically corrupted constant pool of a class file can cause > ClassCastException, when the entries are accessed by Class-File API in exact > order. > > This fix avoids the ClassCastException and throws ConstantPoolException > instead. > Test is attached. > > Please review. > > Thanks, > Adam src/java.base/share/classes/jdk/internal/classfile/impl/ClassReaderImpl.java line 402: > 400: int tag = readU1(offset); > 401: final int q = offset + 1; > 402: if (tag == TAG_UTF8) { Can we call into the tag accepting entryByIndex? e.g., if (entryByIndex(index, TAG_UTF8) instanceof AbstractPoolEntry.Utf8EntryImpl utf8) { return ... } throw new ... ? - PR Review Comment: https://git.openjdk.org/jdk/pull/19088#discussion_r1589423345
Re: RFR: 8331320: ClassFile API OutOfMemoryError with certain class files [v2]
On Thu, 2 May 2024 11:13:22 GMT, Adam Sotona wrote: >> Class files with specifically corrupted tableswitch or lookupswitch >> instructions in the bytecode cause OutOfMemoryError while parsing with >> Class-File API. >> This patch performs additional checks to avoid OOME and adds relevant tests. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains two additional > commits since the last revision: > > - Merge branch 'master' into JDK-8331320-OOME > - 8331320: ClassFile API OutOfMemoryError with certain class files Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/19024#pullrequestreview-2036210957
Re: RFR: 8331320: ClassFile API OutOfMemoryError with certain class files
On Tue, 30 Apr 2024 15:31:02 GMT, Adam Sotona wrote: > Class files with specifically corrupted tableswitch or lookupswitch > instructions in the bytecode cause OutOfMemoryError while parsing with > Class-File API. > This patch performs additional checks to avoid OOME and adds relevant tests. > > Please review. > > Thank you, > Adam src/java.base/share/classes/jdk/internal/classfile/impl/AbstractInstruction.java line 320: > 318: int low = code.classReader.readInt(ap + 4); > 319: int high = code.classReader.readInt(ap + 8); > 320: if (high < low || high - low > code.codeLength >> 2) { May be its also an opportunity to reduce duplication e.g., replace line 316 with a call to `afterPadding()` - PR Review Comment: https://git.openjdk.org/jdk/pull/19024#discussion_r1585300727
Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes [v3]
On Fri, 26 Apr 2024 13:34:08 GMT, Adam Sotona wrote: >> ClassFile API dives into the nested constant pool entries without type >> restrictions, while parsing a class file. Validation of the entry is >> performed post-parsing. Specifically corrupted constant pool entry may cause >> infinite loop during parsing and throws SOE. >> This patch resolves the issue by providing specific implementations for the >> nested CP entries parsing, instead of sharing the common (post-checking) >> code. >> Added test simulates the situation on inner-looped method reference entry. >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com> Very nice. I think we got lucky this worked out :-) - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18907#pullrequestreview-2025411091
Re: RFR: 8322847: java.lang.classfile.BufWriter should specify @throws for its writeXXX methods
On Tue, 23 Apr 2024 11:56:41 GMT, Adam Sotona wrote: > This patch adds missing `@throws` javadoc annotations to > `java.lang.classfile.BufWriter`. > > Please review. > > Thank you, > Adam Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/18913#pullrequestreview-2023498303
Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes
On Thu, 25 Apr 2024 00:51:41 GMT, Adam Sotona wrote: > Unfortunately it would have to be an expected tags list or an extra > constructed bit mask, due to the multiple tags allowed for MemberRefEntry and > it would slightly affect the performance. Ah yes, i missed that. It could be two tags, a lower and upper bound, because TAG_FIELDREF, TAG_METHODREF, and TAG_INTERFACEMETHODREF are consecutive values (9 to 11). - PR Comment: https://git.openjdk.org/jdk/pull/18907#issuecomment-2078099914
Re: RFR: 8325373: Improve StackCounter error reporting for bad switch cases
On Tue, 23 Apr 2024 12:59:18 GMT, Adam Sotona wrote: > ClassFile API `StackMapGenerator` attaches print or hex dump of the method to > an error message. > However there is no such attachment when the stack maps generation is turned > off. > > This patch moves class print/dump to `impl.Util::dumpMethod`, so it is shared > by `StackMapGenerator` and `StackCounter` to provide the same level of > details in case of an error. > > Please review. > > Thank you, > Adam Looks good, just update the copy write date in Util.java. src/java.base/share/classes/jdk/internal/classfile/impl/Util.java line 229: > 227: }; > 228: ClassPrinter.toYaml(clm.methods().get(0).code().get(), > ClassPrinter.Verbosity.TRACE_ALL, dump); > 229: } catch (Error | Exception suppresed) { If you like you can replace `suppresed` [sic] with `_`. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18914#pullrequestreview-2021078844 PR Review Comment: https://git.openjdk.org/jdk/pull/18914#discussion_r1578604408
Re: RFR: 8322847: java.lang.classfile.BufWriter should specify @throws for its writeXXX methods
On Tue, 23 Apr 2024 11:56:41 GMT, Adam Sotona wrote: > This patch adds missing `@throws` javadoc annotations to > `java.lang.classfile.BufWriter`. > > Please review. > > Thank you, > Adam This looks good, but for completeness it will need a CSR. - PR Review: https://git.openjdk.org/jdk/pull/18913#pullrequestreview-2021068931
Re: RFR: 8330684: ClassFile API runs into StackOverflowError while parsing certain class' bytes
On Tue, 23 Apr 2024 07:39:47 GMT, Adam Sotona wrote: > ClassFile API dives into the nested constant pool entries without type > restrictions, while parsing a class file. Validation of the entry is > performed post-parsing. Specifically corrupted constant pool entry may cause > infinite loop during parsing and throws SOE. > This patch resolves the issue by providing specific implementations for the > nested CP entries parsing, instead of sharing the common (post-checking) code. > Added test simulates the situation on inner-looped method reference entry. > > Please review. > > Thank you, > Adam Rather than duplicating some checks I wonder if it is possible to add a private method `entryByIndex(int index, int expectedTag)` that the existing `entryByIndex` defers to. If the `expectedTag` is non-negative then it checks `tag` against `expectedTag` before proceeding to the switch expression. Then the implementations of `readClassEntry` etc can be adjusted to pass along the expected tag. - PR Review: https://git.openjdk.org/jdk/pull/18907#pullrequestreview-2021009969
Re: Vector (and integer) API: unsigned min/max
Hi David, It’s not at all outlandish, but would caution it's more work than one might initially think. Could you describe a little more about your use case? that can be helpful to understand the larger picture and demand. Using unsigned comparison would be my recommended approach with the current API. CC'ing Sandhya for her opinion. Generally when we add new Vector operations we also consider the impact on the scalar types and try to fill any gaps there, so the vector operation behavior is composed from the scalar operation behavior (so like you indicated regarding symmetry). We are seeing demand for saturated arithmetic primarily for vector (not “vector” as in hardware vector) search, so we may do something there for specific integral types. Paul. > On Apr 17, 2024, at 7:13 AM, David Lloyd wrote: > > I've been trying the the incubating Vector API and one thing I've missed on > integer-typed vectors is having unsigned min/max operations. There is a > signed min/max operation, and unsigned comparison, but no unsigned min/max. > > I guess this is for symmetry with `Math`, which only has signed `min`/`max`. > However, I would point out that while it's not very hard to implement one's > own unsigned min/max for integer types using `compareUnsigned`, it is a bit > harder to do so with vectors. The best I've come up with is to take one of > two approaches: > > 1. Zero-extend the vector to the next-larger size, perform the min/max, and > reduce it back down again, or > 2. Add .MIN_VALUE, min/max with a value or vector also offset by > .MIN_VALUE, and then subtract the offset again > > I guess a third approach could be to do a comparison using unsigned compares, > and then use the resultant vector mask to select items from the original two > vectors, but I didn't bother to work out this solution given I already had > the other two options. > > Would it be feasible to add unsigned min/max operations for vectors? It looks > like at least AArch64 has support for this as a single instruction, so it > doesn't seem too outlandish. > > And as a separate (but related) question, what about > `Math.minUnsigned`/`Math.maxUnsigned` of `int` and `long` for symmetry? > > -- > - DML • he/him
Re: RFR: 8330467: NoClassDefFoundError when lambda is in a hidden class [v3]
On Wed, 17 Apr 2024 16:08:31 GMT, Adam Sotona wrote: >> Current implementation of `LambdaMetafactory` does not allow to use lambdas >> in hidden classes. Invocation throws `NoClassDefFoundError` instead. >> >> This patch includes lambda implementation in a hidden class under the >> special handling of `useImplMethodHandle`. >> The patch also fixes `j/l/i/defineHiddenClass/BasicTest::testLambda` to >> correctly cover this test case. >> >> Please review. >> >> Thanks, >> Adam > > Adam Sotona has updated the pull request incrementally with two additional > commits since the last revision: > > - typo in comment fix > - applied suggested changes Looks good to me. I conservatively bumped the review count to 2. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18810#pullrequestreview-2006903761
Re: RFR: 8330467: NoClassDefFoundError when lambda is in a hidden class [v2]
On Wed, 17 Apr 2024 12:51:55 GMT, Adam Sotona wrote: >> Current implementation of `LambdaMetafactory` does not allow to use lambdas >> in hidden classes. Invocation throws `NoClassDefFoundError` instead. >> >> This patch includes lambda implementation in a hidden class under the >> special handling of `useImplMethodHandle`. >> The patch also fixes `j/l/i/defineHiddenClass/BasicTest::testLambda` to >> correctly cover this test case. >> >> Please review. >> >> Thanks, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > applied suggested changes test/jdk/java/lang/invoke/defineHiddenClass/src/Lambda.java line 28: > 26: public class Lambda implements HiddenTest { > 27: public void test() { > 28: Function f = o -> o.toString(); Recommend you retain the existing method reference e.g.: Function fmref = Object::toString; Function flexp = o -> o.toString(); String s = fmref.apply(this) + flexp.apply(this); ... Or split them out into two tests (easier to debug if case fails). - PR Review Comment: https://git.openjdk.org/jdk/pull/18810#discussion_r1569060714
Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v8]
On Sat, 13 Apr 2024 14:27:12 GMT, Viktor Klang wrote: >> This PR implements Gatherer-inspired encoding of `flatMap` that shows that >> it is both competitive performance-wise as well as improve correctness. >> >> Below is the performance of `Stream::flatMap` (for reference types): >> >> Before this PR (on my MacBook, aarch64): >> >> Non-short-circuiting: >> >> Benchmark (size) Mode CntScore Error Units >> FlatMap.par_array 10 thrpt 12 257582,480 ? 31360,242 ops/s >> FlatMap.par_array100 thrpt 1255202,015 ? 14011,668 ops/s >> FlatMap.par_array 1000 thrpt 12 6546,252 ? 3675,764 ops/s >> FlatMap.par_doublestream 10 thrpt 12 267423,410 ? 37338,089 ops/s >> FlatMap.par_doublestream 100 thrpt 1227140,703 ? 4979,878 ops/s >> FlatMap.par_doublestream1000 thrpt 12 2978,178 ? 1890,250 ops/s >> FlatMap.par_intstream 10 thrpt 12 268194,530 ? 37295,092 ops/s >> FlatMap.par_intstream100 thrpt 1230897,034 ? 5388,245 ops/s >> FlatMap.par_intstream 1000 thrpt 12 3969,043 ? 2494,641 ops/s >> FlatMap.par_longstream10 thrpt 12 279756,861 ? 19519,497 ops/s >> FlatMap.par_longstream 100 thrpt 1245733,955 ? 18385,144 ops/s >> FlatMap.par_longstream 1000 thrpt 12 5115,615 ? 4147,237 ops/s >> FlatMap.seq_array 10 thrpt 12 2937192,934 ? 58605,583 ops/s >> FlatMap.seq_array100 thrpt 1241859,462 ? 139,021 ops/s >> FlatMap.seq_array 1000 thrpt 12 442,677 ? 1,041 ops/s >> FlatMap.seq_doublestream 10 thrpt 12 4972651,093 ? 35997,267 ops/s >> FlatMap.seq_doublestream 100 thrpt 1299265,005 ? 193,497 ops/s >> FlatMap.seq_doublestream1000 thrpt 12 1037,030 ? 3,254 ops/s >> FlatMap.seq_intstream 10 thrpt 12 5133751,888 ? 23516,294 ops/s >> FlatMap.seq_intstream100 thrpt 12 145166,206 ? 399,263 ops/s >> FlatMap.seq_intstream 1000 thrpt 12 1565,004 ? 3,207 ops/s >> FlatMap.seq_longstream10 thrpt 12 5047029,363 ? 24742,300 ops/s >> FlatMap.seq_longstream 100 thrpt 12 233225,307 ? 7162,604 ops/s >> FlatMap.seq_longstream 1000 thrpt 12 2999,926 ? 9,945 ops/s >> >> // Short-circuiting: >> >> Benchmark (size) Mode Cnt Score Error Units >> FlatMap.par_iterate_double 10 thrpt 12 46336,834 ? 6803,751 ops/s >> FlatMap.par_iterate_double 100 ... > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Addressing review feedback Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/18625#pullrequestreview-2001623065
Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v7]
On Fri, 12 Apr 2024 14:53:01 GMT, Viktor Klang wrote: >> This PR implements Gatherer-inspired encoding of `flatMap` that shows that >> it is both competitive performance-wise as well as improve correctness. >> >> Below is the performance of `Stream::flatMap` (for reference types): >> >> Before this PR (on my MacBook, aarch64): >> >> Non-short-circuiting: >> >> Benchmark (size) Mode CntScore Error Units >> FlatMap.par_array 10 thrpt 12 257582,480 ? 31360,242 ops/s >> FlatMap.par_array100 thrpt 1255202,015 ? 14011,668 ops/s >> FlatMap.par_array 1000 thrpt 12 6546,252 ? 3675,764 ops/s >> FlatMap.par_doublestream 10 thrpt 12 267423,410 ? 37338,089 ops/s >> FlatMap.par_doublestream 100 thrpt 1227140,703 ? 4979,878 ops/s >> FlatMap.par_doublestream1000 thrpt 12 2978,178 ? 1890,250 ops/s >> FlatMap.par_intstream 10 thrpt 12 268194,530 ? 37295,092 ops/s >> FlatMap.par_intstream100 thrpt 1230897,034 ? 5388,245 ops/s >> FlatMap.par_intstream 1000 thrpt 12 3969,043 ? 2494,641 ops/s >> FlatMap.par_longstream10 thrpt 12 279756,861 ? 19519,497 ops/s >> FlatMap.par_longstream 100 thrpt 1245733,955 ? 18385,144 ops/s >> FlatMap.par_longstream 1000 thrpt 12 5115,615 ? 4147,237 ops/s >> FlatMap.seq_array 10 thrpt 12 2937192,934 ? 58605,583 ops/s >> FlatMap.seq_array100 thrpt 1241859,462 ? 139,021 ops/s >> FlatMap.seq_array 1000 thrpt 12 442,677 ? 1,041 ops/s >> FlatMap.seq_doublestream 10 thrpt 12 4972651,093 ? 35997,267 ops/s >> FlatMap.seq_doublestream 100 thrpt 1299265,005 ? 193,497 ops/s >> FlatMap.seq_doublestream1000 thrpt 12 1037,030 ? 3,254 ops/s >> FlatMap.seq_intstream 10 thrpt 12 5133751,888 ? 23516,294 ops/s >> FlatMap.seq_intstream100 thrpt 12 145166,206 ? 399,263 ops/s >> FlatMap.seq_intstream 1000 thrpt 12 1565,004 ? 3,207 ops/s >> FlatMap.seq_longstream10 thrpt 12 5047029,363 ? 24742,300 ops/s >> FlatMap.seq_longstream 100 thrpt 12 233225,307 ? 7162,604 ops/s >> FlatMap.seq_longstream 1000 thrpt 12 2999,926 ? 9,945 ops/s >> >> // Short-circuiting: >> >> Benchmark (size) Mode Cnt Score Error Units >> FlatMap.par_iterate_double 10 thrpt 12 46336,834 ? 6803,751 ops/s >> FlatMap.par_iterate_double 100 ... > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Adding additional, short-circuit-specific, cases to the FlatMap benchmark Very nice work. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18625#pullrequestreview-1998504550
Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams [v2]
On Tue, 9 Apr 2024 10:07:46 GMT, Viktor Klang wrote: >> This PR implements Gatherer-inspired encoding of `flatMap` that shows that >> it is both competitive performance-wise as well as improve correctness. >> >> Below is the performance of `Stream::flatMap` (for reference types): >> >> Before this PR: >> >> Benchmark(size) Mode CntScore Error Units >> FlatMap.par_array10 thrpt 12 294008,937 ? 54369,110 ops/s >> FlatMap.par_array 100 thrpt 1262411,229 ? 14868,119 ops/s >> FlatMap.par_array 1000 thrpt 12 8263,821 ? 452,622 ops/s >> FlatMap.par_iterate 10 thrpt 1223029,978 ? 4274,449 ops/s >> FlatMap.par_iterate 100 thrpt 1210532,907 ? 321,694 ops/s >> FlatMap.par_iterate1000 thrpt 12 981,571 ? 135,270 ops/s >> FlatMap.seq_array10 thrpt 12 2955648,495 ? 32539,142 ops/s >> FlatMap.seq_array 100 thrpt 1241851,009 ? 377,546 ops/s >> FlatMap.seq_array 1000 thrpt 12 1740,281 ? 1229,974 ops/s >> FlatMap.seq_iterate 10 thrpt 12 321727,690 ? 5149,356 ops/s >> FlatMap.seq_iterate 100 thrpt 12 8437,198 ?56,635 ops/s >> FlatMap.seq_iterate1000 thrpt 12 76,994 ? 0,965 ops/s >> >> >> After this PR: >> >> >> Benchmark(size) Mode CntScoreError Units >> FlatMap.par_array10 thrpt 12 283350,051 ? 35567,223 ops/s >> FlatMap.par_array 100 thrpt 1253846,906 ? 19241,913 ops/s >> FlatMap.par_array 1000 thrpt 12 8230,909 ?156,362 ops/s >> FlatMap.par_iterate 10 thrpt 1226328,500 ? 5411,401 ops/s >> FlatMap.par_iterate 100 thrpt 1210470,862 ?249,991 ops/s >> FlatMap.par_iterate1000 thrpt 12 986,511 ?224,050 ops/s >> FlatMap.seq_array10 thrpt 12 5654826,565 ? 27317,453 ops/s >> FlatMap.seq_array 100 thrpt 12 187929,786 ?542,787 ops/s >> FlatMap.seq_array 1000 thrpt 12 2385,346 ? 9,827 ops/s >> FlatMap.seq_iterate 10 thrpt 12 812722,403 ? 160500,399 ops/s >> FlatMap.seq_iterate 100 thrpt 1213542,472 ?118,769 ops/s >> FlatMap.seq_iterate1000 thrpt 12 157,056 ? 1,814 ops/s >> >> >> For streams of size 100k, the following numbers are interesting: >> >> Before this PR: >> >> Benchmark(size) Mode Cnt ScoreError Units >> FlatMap.par_array10 thrpt 12 0,325 ? 0,004 ops/s >> FlatMap.par_iterate 10 thrpt 12 0,106 ? 0,008 o... > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Updating copyright year src/java.base/share/classes/java/util/stream/DoublePipeline.java line 280: > 278: result.sequential().allMatch(this); > 279: else > 280: > result.sequential().forEach(sink::accept); I think that might create a new double consumer instance for every input element. Alternatively you can compute and cache it as a field, replacing `shorts` and use a `null` check. - PR Review Comment: https://git.openjdk.org/jdk/pull/18625#discussion_r1557995257
Re: RFR: 8196106: Support nested infinite or recursive flat mapped streams
On Mon, 8 Apr 2024 10:20:11 GMT, Viktor Klang wrote: > @PaulSandoz @AlanBateman I've added a commit to this PR which removes the use > of Gatherer for Stream::flatMap, but instead implements flatMap for all of > the pipelines using the same encoding which Gatherer would use. It seems very > competitive performance-wise, and resolves at least one open JBS-issue with > flatMap (will look to see if it resolves more than that) That's a rather clever use of `allMatch`! Did you observe performance improvements using `@Stable` on the `cancel` field? It's really hard to predict in the abstract (since the default value of the field will be read in proportion to the number of elements until the stream is cancelled). - PR Comment: https://git.openjdk.org/jdk/pull/18625#issuecomment-2043218633
Re: RFR: 8325659: Normalize Random usage by incubator vector tests
On Mon, 8 Apr 2024 10:50:00 GMT, Evgeny Nikitin wrote: > Improve RNG usage in said tests: > > 1. The RNG is not created by our Utils class, as suggested in our JTReg tests; > 2. The seed, accordingly, is not a fixed value now, but truly random; > 3. The tests that had been creating their own Random instances, are now using > RAND static member from the AbstractVectorTest; > 4. The generated tests sources have been re-generated. > > The most important change is #2, it could add variability and help cover more > JIT Compiler and Runtime scenarios. Looks good, thank you for cleaning this up. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18675#pullrequestreview-1987442519
Re: RFR: 8328316: Finisher cannot emit if stream is sequential and integrator returned false [v4]
On Wed, 20 Mar 2024 23:58:32 GMT, Viktor Klang wrote: >> Adds differentiation between direct and transitive short circuiting which >> could prevent pushing downstream in the finisher for built-ins that were not >> `collect()`. >> >> Creating this as a draft PR for now, just need to run some benchmarks to >> validate no significant regressions first. > > Viktor Klang has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains two additional > commits since the last revision: > > - Updating copyright year and switching to use underscores in > GathererShortCircuitTest > - Adds differentiation between direct and transitive short circutiting for > Gatherers Marked as reviewed by psandoz (Reviewer). src/java.base/share/classes/java/util/stream/GathererOp.java line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2023, 2024, Oracle and/or its affiliates. All rights reserved. - PR Review: https://git.openjdk.org/jdk/pull/18351#pullrequestreview-1952810339 PR Review Comment: https://git.openjdk.org/jdk/pull/18351#discussion_r1534280411
Re: RFR: 8328316: Finisher cannot emit if stream is sequential and integrator returned false
On Mon, 18 Mar 2024 16:27:13 GMT, Viktor Klang wrote: > Adds differentiation between direct and transitive short circuiting which > could prevent pushing downstream in the finisher for built-ins that were not > `collect()`. > > Creating this as a draft PR for now, just need to run some benchmarks to > validate no significant regressions first. Looks good, just a minor suggestion. test/jdk/java/util/stream/GathererShortCircuitTest.java line 48: > 46: Gatherer.of( > 47: (unused, element, downstream) -> false, > 48: (unused, downstream) -> downstream.push(expected) Suggestion: (_, element, downstream) -> false, (_ downstream) -> downstream.push(expected) - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18351#pullrequestreview-1950412052 PR Review Comment: https://git.openjdk.org/jdk/pull/18351#discussion_r1533025318
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v17]
On Sat, 2 Mar 2024 16:22:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >> partially unrolls scalar loop to accumulates values from gather indices into >> a quadword(64bit) slice followed by vector permutation to place the slice >> into appropriate vector lanes, it prevents code bloating and generates >> compact JIT sequence. This coupled with savings from expansive array >> allocation in existing java implementation translates into significant >> performance of 1.5-10x gains with included micro. >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Review resolutions. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1914752388
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v16]
On Tue, 27 Feb 2024 02:47:13 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >> partially unrolls scalar loop to accumulates values from gather indices into >> a quadword(64bit) slice followed by vector permutation to place the slice >> into appropriate vector lanes, it prevents code bloating and generates >> compact JIT sequence. This coupled with savings from expansive array >> allocation in existing java implementation translates into significant >> performance of 1.5-10x gains with included micro. >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Review comment resolutions. The Java code looks correct. It would be nice to common the non-mask and mask variants to reduce the code duplication. Perhaps even common to all element types if the loop check is efficient? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 3637: > 3635: } else { > 3636: lsp = isp; > 3637: } No need to initialize `lsp` to null, since it will be definitely assigned after the if/else statement. Suggestion: // Constant folding should sweep out following conditonal logic. VectorSpecies lsp; if (isp.length() > IntVector.SPECIES_PREFERRED.length()) { lsp = IntVector.SPECIES_PREFERRED; } else { lsp = isp; } - PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1904909095 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1505073358
Re: [jdk22] RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
On Wed, 7 Feb 2024 09:13:33 GMT, Aleksey Shipilev wrote: > Looks fine. Thanks! - PR Comment: https://git.openjdk.org/jdk22/pull/109#issuecomment-1932769043
[jdk22] Integrated: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
On Tue, 6 Feb 2024 16:50:10 GMT, Paul Sandoz wrote: > This pull request contains a backport of commit > [1ae85138](https://github.com/openjdk/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33) > from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Paul Sandoz on 2 Feb 2024 and was > reviewed by Maurizio Cimadamore and Jatin Bhateja. This pull request has now been integrated. Changeset: 9cc260d3 Author:Paul Sandoz URL: https://git.openjdk.org/jdk22/commit/9cc260d3a10a4d31a2ee22be1715884e175f9679 Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod 8324858: [vectorapi] Bounds checking issues when accessing memory segments Reviewed-by: shade Backport-of: 1ae851387f881263ccc6aeace5afdd0f49d41d33 - PR: https://git.openjdk.org/jdk22/pull/109
[jdk22] RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
This pull request contains a backport of commit [1ae85138](https://github.com/openjdk/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Paul Sandoz on 2 Feb 2024 and was reviewed by Maurizio Cimadamore and Jatin Bhateja. - Commit messages: - Backport 1ae851387f881263ccc6aeace5afdd0f49d41d33 Changes: https://git.openjdk.org/jdk22/pull/109/files Webrev: https://webrevs.openjdk.org/?repo=jdk22=109=00 Issue: https://bugs.openjdk.org/browse/JDK-8324858 Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod Patch: https://git.openjdk.org/jdk22/pull/109.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/109/head:pull/109 PR: https://git.openjdk.org/jdk22/pull/109
Re: RFR: 8323058: Revisit j.l.classfile.CodeBuilder API surface [v2]
On Mon, 5 Feb 2024 18:31:44 GMT, Adam Sotona wrote: >> `java.lang.classfile.CodeBuilder` contains more than 230 API methods. >> Existing ClassFile API use cases proved the concept one big CodeBuilder is >> comfortable. However there are some redundancies, glitches in the naming >> conventions, some frequently used methods are hard to find and some methods >> have low practical use. >> >> This patch revisits the `CodeBuilder` API methods and introduces some >> changes. >> >> For more details, please, visit the [CSR >> ](https://bugs.openjdk.org/browse/JDK-8323067) >> >> Please review. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional > commit since the last revision: > > removed CodeBuilder::newObject methods Marked as reviewed by psandoz (Reviewer). src/java.base/share/classes/java/lang/classfile/CodeBuilder.java line 507: > 505: * @return this builder > 506: */ > 507: default CodeBuilder newObject(ClassEntry type) { The two `newObject` methods seem to fit in the pattern of methods that are being removed, since they don't differentiate sufficiently with the `new_` methods that defer to them. - PR Review: https://git.openjdk.org/jdk/pull/17282#pullrequestreview-1863531162 PR Review Comment: https://git.openjdk.org/jdk/pull/17282#discussion_r1478611886
Integrated: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
On Mon, 29 Jan 2024 19:45:41 GMT, Paul Sandoz wrote: > The implementation of method `VectorSpecies::fromMemorySegment`, in > `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on > the offset argument when the method is compiled by C2 (bounds checks are > performed when interpreted and by C1). > > This is an oversight and explicit bounds checks are required, as is already > case for the other load and store memory access methods (including storing to > memory memory segments). > > The workaround is to call the static method `{T}Vector::fromMemorySegment`. > > The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to > do the same and call `{T}Vector::fromMemorySegment`, following the same > pattern for implementations of `VectorSpecies::fromArray`. > > The tests have been conservatively updated to call the species access method > where possible in the knowledge that it calls the vector access method (the > tests were intended to test out of bounds access when compiled by C2). > > Thinking ahead its tempting to remove the species access methods, simplifying > functionality that is duplicated. This pull request has now been integrated. Changeset: 1ae85138 Author:Paul Sandoz URL: https://git.openjdk.org/jdk/commit/1ae851387f881263ccc6aeace5afdd0f49d41d33 Stats: 248 lines in 39 files changed: 132 ins; 8 del; 108 mod 8324858: [vectorapi] Bounds checking issues when accessing memory segments Reviewed-by: mcimadamore, jbhateja - PR: https://git.openjdk.org/jdk/pull/17621
Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments [v2]
On Thu, 1 Feb 2024 12:22:10 GMT, Maurizio Cimadamore wrote: >> My expectation is the risk is small, but of course non-zero. These tests can >> be expensive to run so i was trying balance the risk without increasing test >> execution times. >> >> I could strengthen the comment from: >> >> @ForceInline >> @Override final >> public ByteVector fromMemorySegment(MemorySegment ms, long offset, >> ByteOrder bo) { >> // User entry point: Be careful with inputs. >> return ByteVector >> .fromMemorySegment(this, ms, offset, bo); >> } >> >> >> to: >> >> >> @ForceInline >> @Override final >> public ByteVector fromMemorySegment(MemorySegment ms, long offset, >> ByteOrder bo) { >> // User entry point >> // Defer only to the equivalent method on the vector class, >> using the same inputs >> return ByteVector >> .fromMemorySegment(this, ms, offset, bo); >> } > > Sounds good Done - PR Review Comment: https://git.openjdk.org/jdk/pull/17621#discussion_r1475344913
Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments [v2]
> The implementation of method `VectorSpecies::fromMemorySegment`, in > `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on > the offset argument when the method is compiled by C2 (bounds checks are > performed when interpreted and by C1). > > This is an oversight and explicit bounds checks are required, as is already > case for the other load and store memory access methods (including storing to > memory memory segments). > > The workaround is to call the static method `{T}Vector::fromMemorySegment`. > > The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to > do the same and call `{T}Vector::fromMemorySegment`, following the same > pattern for implementations of `VectorSpecies::fromArray`. > > The tests have been conservatively updated to call the species access method > where possible in the knowledge that it calls the vector access method (the > tests were intended to test out of bounds access when compiled by C2). > > Thinking ahead its tempting to remove the species access methods, simplifying > functionality that is duplicated. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Update/add clarifying comments - Changes: - all: https://git.openjdk.org/jdk/pull/17621/files - new: https://git.openjdk.org/jdk/pull/17621/files/ba326467..e3958a16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=17621=01 - incr: https://webrevs.openjdk.org/?repo=jdk=17621=00-01 Stats: 90 lines in 38 files changed: 76 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/17621.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17621/head:pull/17621 PR: https://git.openjdk.org/jdk/pull/17621
Re: RFR: 8318966: Some methods make promises about Java array element alignment that are too strong
On Wed, 15 Nov 2023 22:46:03 GMT, Jorn Vernee wrote: > See the JBS issue for an extended problem description. > > This patch changes the specification and implementation of > `MethodHandles::byteArrayViewVarHandle`, > `MethodHandles::byteBufferViewVarHandle`, `ByteBuffer::alignedSlice`, and > `ByteBuffer::alignmentOffset` to weaken the guarantees they make about the > alignment of Java array elements, in order to bring them in line with the > guarantees made by an arbitrary JVM implementation (which makes no guarantees > about array element alignment beyond them being aligned to their natural > alignment). > > - `MethodHandles::byteArrayViewVarHandle`: we can not guarantee any alignment > for the accesses. Which means we can only reliably support plain get and set > access modes. The javadoc text explaining which other access modes are > supported, or how to compute aligned offsets into the array is dropped, as it > is not guaranteed to be correct on all JVM implementations. The > implementation of the returned VarHandle is changed to throw an > `UnsupportedOperationException` for the unsupported access modes, as mandated > by the spec of `VarHandle` [1]. > > - `MethodHandles::byteBufferViewVarHandle`: the implementation/spec is > incorrect when accessing a heap buffer (wrapping a byte[]), for the same > reasons as `byteArrayViewVarHandle`. The spec is changed to specify that when > accessing a _heap buffer_, only plain get and set access modes are supported. > The implementation of the returned var handle is changed to throw an > `IllegalStateException` when an access is attempted on a heap buffer using an > access mode other than plain get or set. Note that we don't throw an outright > `UnsupportedOperationException` for this case, since whether the access modes > are supported depends on the byte buffer instance being used. > > - `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset`: The former > method depends directly on the latter for all its alignment computations. We > change the implementation of the latter method to throw an > `UnsupportedOperationException` for all unit sizes greater than 1, when the > buffer is non-direct. This change is largely covered by the existing > specification: > > > * @throws UnsupportedOperationException > * If the native platform does not guarantee stable alignment > offset > * values for the given unit size when managing the memory regions > * of buffers of the same kind as this buffer (direct or > * non-direct). For example, if garbage collection would result > * in the mo... Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16681#pullrequestreview-1857461928
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]
On Wed, 31 Jan 2024 23:53:16 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1613: >> >>> 1611: vpand(xtmp, idx_vec, xtmp, vlen_enc); >>> 1612: // Load double words from normalized indices. >>> 1613: evpgatherdd(dst, gmask, Address(base, xtmp, scale), vlen_enc); >> >> Another question, looks to me that we could read beyond the allocated memory >> for the array here. e.g. consider the following case: >> * It is a byte gather >> * The byte source array is of size 41, i.e. only indices 0-40 are valid >> * The gather index is 40 >> >> Then as part of evpgatherdd we would be reading bytes at 40-43 offset from >> source array. > > I guess the fact that the Java objects are 8 byte alignment padded and the > alignment being done at lines 1609-1611 and 1616-1621 somehow takes care of > this. Drive by comment, that can change with Project Lilliput. - PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473650392
Re: RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
On Wed, 31 Jan 2024 00:10:26 GMT, Maurizio Cimadamore wrote: >> The implementation of method `VectorSpecies::fromMemorySegment`, in >> `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on >> the offset argument when the method is compiled by C2 (bounds checks are >> performed when interpreted and by C1). >> >> This is an oversight and explicit bounds checks are required, as is already >> case for the other load and store memory access methods (including storing >> to memory memory segments). >> >> The workaround is to call the static method `{T}Vector::fromMemorySegment`. >> >> The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` >> to do the same and call `{T}Vector::fromMemorySegment`, following the same >> pattern for implementations of `VectorSpecies::fromArray`. >> >> The tests have been conservatively updated to call the species access method >> where possible in the knowledge that it calls the vector access method (the >> tests were intended to test out of bounds access when compiled by C2). >> >> Thinking ahead its tempting to remove the species access methods, >> simplifying functionality that is duplicated. > > test/jdk/jdk/incubator/vector/templates/X-LoadStoreTest.java.template line > 271: > >> 269: @DontInline >> 270: static $abstractvectortype$ fromArray($type$[] a, int i) { >> 271: return ($abstractvectortype$) SPECIES.fromArray(a, i); > > These new tests focus on the species method - but should we also test the > statics factories in the record class, just in case a regression is > introduced and the two implementation start diverging again? My expectation is the risk is small, but of course non-zero. These tests can be expensive to run so i was trying balance the risk without increasing test execution times. I could strengthen the comment from: @ForceInline @Override final public ByteVector fromMemorySegment(MemorySegment ms, long offset, ByteOrder bo) { // User entry point: Be careful with inputs. return ByteVector .fromMemorySegment(this, ms, offset, bo); } to: @ForceInline @Override final public ByteVector fromMemorySegment(MemorySegment ms, long offset, ByteOrder bo) { // User entry point // Defer only to the equivalent method on the vector class, using the same inputs return ByteVector .fromMemorySegment(this, ms, offset, bo); } - PR Review Comment: https://git.openjdk.org/jdk/pull/17621#discussion_r1472162760
RFR: 8324858: [vectorapi] Bounds checking issues when accessing memory segments
The implementation of method `VectorSpecies::fromMemorySegment`, in `AbstractSpecies::fromMemorySegment`, neglects to perform bounds checks on the offset argument when the method is compiled by C2 (bounds checks are performed when interpreted and by C1). This is an oversight and explicit bounds checks are required, as is already case for the other load and store memory access methods (including storing to memory memory segments). The workaround is to call the static method `{T}Vector::fromMemorySegment`. The fix is for the implementation(s) of `VectorSpecies::fromMemorySegment` to do the same and call `{T}Vector::fromMemorySegment`, following the same pattern for implementations of `VectorSpecies::fromArray`. The tests have been conservatively updated to call the species access method were possible in the knowledge that calls the vector access method (the tests were intended to test out of bounds access when compiled by C2). Thinking ahead its tempting to remove the species access methods, simplifying functionality that is duplicated. - Commit messages: - Merge branch 'master' into v-load-segment-bounds-checks - 8324858: [vectorapi] Bounds checking issues when accessing memory segments Changes: https://git.openjdk.org/jdk/pull/17621/files Webrev: https://webrevs.openjdk.org/?repo=jdk=17621=00 Issue: https://bugs.openjdk.org/browse/JDK-8324858 Stats: 165 lines in 39 files changed: 56 ins; 8 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/17621.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17621/head:pull/17621 PR: https://git.openjdk.org/jdk/pull/17621
Re: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5]
On Wed, 24 Jan 2024 18:48:34 GMT, Aleksey Shipilev wrote: >> @dholmes-ora Indeed it's a compiler magic, albeit not really weird. While >> the method execution only receives the evaluated value of `expr`, the method >> compilation has the expression in its original form. As a result, it can >> determine the result based on this information. > > It is still weird to talk about expressions at this level. We really check if > the value is constant, like the method name suggests now. Yes, this > implicitly tests that the expression that produced that value is fully > constant-folded. But that's a detail that we do not need to capture here. > Let's rename `expr` -> `val`, and tighten up the javadoc for the method to > mention we only test the constness of the final value. I agree. All values are produced by evaluating expressions. In this case we want to query whether a value produced by the compiler evaluating its expression is a constant value (inputs to the expression are constants and the expression had no material side-effects). Meaning if the method returns true then we could use that knowledge in subsequent expressions that may also produce constants or some specific behavior. - PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465449454
Re: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5]
On Tue, 23 Jan 2024 17:21:47 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used >> to statically determine whether an expression has been constant-folded by >> the Jit compiler, leading to more constant-folding opportunities. For >> example, it can be used in `MemorySessionImpl::checkValidStateRaw` to >> eliminate the lifetime check on global sessions without imposing additional >> branches on other non-global sessions. This is similar to >> `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > ident src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 32: > 30: * Just-in-time-compiler-related queries > 31: */ > 32: public class JitCompiler { An alternative name and location is `jdk.internal.vm.ConstantSupport` with initial class doc: Defines methods to test if a value has been evaluated to a compile-time constant value by the HotSpot VM. - PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463926393
Re: RFR: 8316641: VarHandle template classes can share code in the base class [v8]
On Mon, 4 Dec 2023 14:58:34 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/invoke/X-VarHandleByteArrayView.java.template >> line 594: >> >>> 592: @ForceInline >>> 593: static int indexRO(ByteBuffer bb, int index) { >>> 594: if (bb.isReadOnly()) >> >> I have to think that there was a reason for using unsafe access in the >> current code. So, I'm not sure if it's safe to switch this to a direct >> method/field reference. > > @PaulSandoz Do you remember the history here? I don't recall exactly but the unsafe usage is likely to avoid possible profile pollution due to the abstract method call. - PR Review Comment: https://git.openjdk.org/jdk/pull/15854#discussion_r1414197423
Re: RFR: 8321223: Implementation of Scoped Values (Second Preview)
On Sun, 3 Dec 2023 08:46:07 GMT, Alan Bateman wrote: > This API is sitting out JDK 22, meaning no API/implementation changes in this > PR. Some small API changes are likely for JDK 23. > > For now, we just need to bump JEP number/title that shows up in the preview > section of the javadoc. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16937#pullrequestreview-1761337322
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Wed, 22 Nov 2023 09:05:31 GMT, Andrew Haley wrote: > > Have you considered the possibility of copying the sleef source to the > > OpenJDK repository and thereby it becomes part of the build process? I > > don't know how straightforward that is technically and IANAL but I think > > it's worth exploring. > > Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds > good. However, I actually have no idea about how to handle the third-party > licence in OpenJDK project. Do you have any idea about this area? Some > suggestions/guidence from the JDK team will be much helpful. Thanks! We (Oracle Java Platform Group) can handle the required "paperwork" on any third party dependencies and attribution of copyright before any PR can be integrated, if you can help detail what those are. First i think we need to determine if this is feasible e.g., copying a subset and integrating it into the build system, since it does not make sense to bring in the support for quad floats and DFT, which IIUC brings in a dependency on compiler support for OpenMP. - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1823335443
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like >> `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, >> which causes large performance gap on AArch64. Note that those APIs are >> optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. >> To close the gap, we would like to optimize these APIs for AArch64 by >> calling a third-party vector library called libsleef [2], which are >> available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and >> implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA >> instructions used stubs in libsleef for most of the operations by default, >> and 2) add the vector calling convention to apply with the runtime calls to >> stub code in libsleef. Note that for those APIs that libsleef does not >> support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an >> experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. >> People can use it to denote the libsleef path/name explicitly. By default, >> it points to the system installed library. If the library does not exist or >> the dynamic loading of it in runtime fails, the math vector ops will >> fall-back to use the default scalar version without error. But a warning is >> printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], >> just with some initial review comments addressed. And now we'd like to get >> some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains five additional > commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. Also it may enable us to use sleef for other platforms where we have gaps (looking at Table 1.1 of https://sleef.org/). Further out it should inspire us to do a Java Vector API port to as indicated in a prior comment. > Yes, libsleef is used to build the binary if found. And at runtime, if the > libsleef with right version is not found, `dlopen` to the libvmath.so will > fail. And then all the operations will be fall-back to the java default > implementation. X86_64 has also bundled the Intel's SVML binary to jdk image > at build time. And we use the same way loading/opening the library at runtime. - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821885433
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Mon, 23 Oct 2023 09:02:35 GMT, Xiaohong Gong wrote: > This looks good. As far as I can tell the choice you've made of accuracy > matches what we need to meet the spec. Same here . Sinh/cosh/tanh/expm1 are specified to be within 2.5 ulps of the exact result, but i believe sleef does not offer that option, it's either within 1 or 3.5, so we have to pick the former. AFAICT sleef does not say anything about monotonicity, but we relax semi-monotonicity for all but sqrt (we defer to IEEE 754). - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821422985
Re: RFR: 8318966: Some methods make promises about Java array element alignment that are too strong
On Wed, 15 Nov 2023 22:46:03 GMT, Jorn Vernee wrote: > See the JBS issue for an extended problem description. > > This patch changes the specification and implementation of > `MethodHandles::byteArrayViewVarHandle`, > `MethodHandles::byteBufferViewVarHandle`, `ByteBuffer::alignedSlice`, and > `ByteBuffer::alignmentOffset` to weaken the guarantees they make about the > alignment of Java array elements, in order to bring them in line with the > guarantees made by an arbitrary JVM implementation (which makes no guarantees > about array element alignment beyond them being aligned to their natural > alignment). > > - `MethodHandles::byteArrayViewVarHandle`: we can not guarantee any alignment > for the accesses. Which means we can only reliably support plain get and set > access modes. The javadoc text explaining which other access modes are > supported, or how to compute aligned offsets into the array is dropped, as it > is not guaranteed to be correct on all JVM implementations. The > implementation of the returned VarHandle is changed to throw an > `UnsupportedOperationException` for the unsupported access modes, as mandated > by the spec of `VarHandle` [1]. > > - `MethodHandles::byteBufferViewVarHandle`: the implementation/spec is > incorrect when accessing a heap buffer (wrapping a byte[]), for the same > reasons as `byteArrayViewVarHandle`. The spec is changed to specify that when > accessing a _heap buffer_, only plain get and set access modes are supported. > The implementation of the returned var handle is changed to throw an > `IllegalStateException` when an access is attempted on a heap buffer using an > access mode other than plain get or set. Note that we don't throw an outright > `UnsupportedOperationException` for this case, since whether the access modes > are supported depends on the byte buffer instance being used. > > - `ByteBuffer::alignedSlice` and `ByteBuffer::alignmentOffset`: The former > method depends directly on the latter for all its alignment computations. We > change the implementation of the latter method to throw an > `UnsupportedOperationException` for all unit sizes greater than 1, when the > buffer is non-direct. This change is largely covered by the existing > specification: > > > * @throws UnsupportedOperationException > * If the native platform does not guarantee stable alignment > offset > * values for the given unit size when managing the memory regions > * of buffers of the same kind as this buffer (direct or > * non-direct). For example, if garbage collection would result > * in the mo... src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 4518: > 4516: * Only plain {@linkplain VarHandle.AccessMode#GET get} and > {@linkplain VarHandle.AccessMode#SET set} > 4517: * access modes are supported by the returned var handle. For all > other access modes, an > 4518: * {@link UnsupportedOperationException} will be thrown. I recommend adding an api note explaining that native memory segments, direct byte buffers, or heap memory segments backed by long[] should be used if support for other access modes are required. src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 4610: > 4608: * {@link Double#doubleToRawLongBits}, respectively). > 4609: * > 4610: * Access to heap byte buffers is always unaligned. I recommend merging that sentence into the paragraph on heap byte buffers e.g.: > For direct buffers, access of the bytes at an index is always misaligned. As > a result only the plain... src/java.base/share/classes/java/nio/X-Buffer.java.template line 2218: > 2216: * @implNote > 2217: * This implementation throws {@code UnsupportedOperationException} > for > 2218: * non-direct buffers when the given unit size is greater then > {@code 1}. This is no longer an implementation note, its now part of the specified API. So i think we can simplify the text of the `@throws UOE ...` to just say: @throws UOE if the buffer is non-direct and the unit size > 1 Same for the other method. - PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396097577 PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396101675 PR Review Comment: https://git.openjdk.org/jdk/pull/16681#discussion_r1396109655
Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v9]
On Wed, 15 Nov 2023 17:50:48 GMT, Viktor Klang wrote: >> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Improvements after feedback Elegantly and thoroughly done (I mostly focused on the API and implementation). I have been following the work and providing ongoing feedback hence no specific comments here. Going preview now will allow for some additional time and feedback on the ergonomics of Gatherer construction. There is also the intriguing prospect of further work to replace stream internals if the performance holds up (the ability to optimize when composing gathers seems key here as you already have explored in the performance tests - the runtime compiler should be able to see through the shorter paths more easily). - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16420#pullrequestreview-1733265473
Re: RFR: JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview) [v5]
On Mon, 13 Nov 2023 09:22:13 GMT, Viktor Klang wrote: >> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Minor improvements to Gatherer Javadoc src/java.base/share/classes/java/util/stream/Gatherer.java line 40: > 38: * stream of output elements, optionally applying a final action when the > end of > 39: * the upstream is reached. The transformation may be stateless or > stateful, > 40: * and a Gatherer may buffer arbitrarily much input before producing any > output. Suggestion: * and may buffer input before producing any output. src/java.base/share/classes/java/util/stream/Gatherer.java line 63: > 61: * > 62: * > 63: * Each invocation to {@link #initializer()}, {@link #integrator()}, Suggestion: * Each invocation of {@link #initializer()}, {@link #integrator()}, src/java.base/share/classes/java/util/stream/Gatherer.java line 119: > 117: * > 118: * As an example, in order to create a gatherer to implement a > sequential > 119: * Prefix Scan as a Gatherer, it could be done the following way: Suggestion: * As an example, a Gatherer implementing a sequential Prefix Scan * could be done the following way: src/java.base/share/classes/java/util/stream/Gatherer.java line 152: > 150: * } > 151: * > 152: * @implSpec Libraries that implement transformation based on {@code > Gatherer}, Suggestion: * @implSpec Libraries that implement transformations based on {@code Gatherer}, src/java.base/share/classes/java/util/stream/Gatherer.java line 166: > 164: * arguments passed to the combiner function, and the argument > passed to the > 165: * finisher function must be the result of a previous invocation of > the > 166: * initializer, integrator, or combiner functions. the integrator returns a boolean, so is it just the result of a previous invocation of the initializer or combiner functions? (Similarly for the next clause.) src/java.base/share/classes/java/util/stream/Gatherer.java line 185: > 183: * partitions state using the combiner, and then invoking the > finisher on > 184: * the joined state. Outputs and state later in the input sequence > will > 185: * be discarded if processing an earlier segment short-circuits. Suggestion: * be discarded if processing an earlier partition short-circuits. src/java.base/share/classes/java/util/stream/Gatherer.java line 501: > 499: > 500: /** > 501: * Allows for checking whether the next stage is known to not > want Suggestion: * Checks whether the next stage is known to not want src/java.base/share/classes/java/util/stream/Gatherer.java line 558: > 556: > 557: /** > 558: * Factory method for turning Integrator-shaped lambdas into Suggestion: * Targets Integrator-shaped lambdas to It's not a factory, as it does produce anything, it's an identity function that aids in the targeting of the lambda expression to the integrator when there would otherwise be ambiguity as to the target. src/java.base/share/classes/java/util/stream/Gatherers.java line 47: > 45: > 46: /** > 47: * Implementations of {@link Gatherer} that implement various useful > intermediate Suggestion: * Implementations of {@link Gatherer} that provide useful intermediate src/java.base/share/classes/java/util/stream/Gatherers.java line 80: > 78: * and eagerly. This means that choosing large window sizes > for > 79: * small streams may use excessive memory for the duration of > 80: * evaluation of this operation. Suggestion: * small streams may use excessive memory during * evaluation of this operation. (Same for other methods.) src/java.base/share/classes/java/util/stream/Gatherers.java line 329: > 327: > 328: /** > 329: * An operation which executes operations concurrently Suggestion: * An operation which executes a function concurrently src/java.base/share/classes/java/util/stream/Gatherers.java line 340: > 338: * If the mapper throws an exception during evaluation of this > Gatherer, > 339: * and the result of that invocation is to be produced to the > downstream, > 340: * then that exception will instead be rethrown as a {@link > RuntimeException}. Suggestion: * If a result of the function is to be pushed downstream but instead the function completed * exceptionally then the corresponding exception will instead be rethrown by this method as an * instance of {@link RuntimeException}. After which any remaining tasks are canceled. - PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391814346 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391815675 PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1391818520 PR
Re: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch
On Tue, 7 Nov 2023 17:45:36 GMT, Paul Sandoz wrote: >> This test requires certain methods to be compiled, but without `-Xbatch` the >> compiler races against the test code, which can lead to intermittent >> failures. > > Marked as reviewed by psandoz (Reviewer). > @PaulSandoz do you see any problem with this change? Adding `-Xbatch` does > not significantly increase the test run time. Seems ok to me. - PR Comment: https://git.openjdk.org/jdk/pull/16023#issuecomment-1799331748
Re: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch
On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the > compiler races against the test code, which can lead to intermittent failures. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16023#pullrequestreview-1718335476
Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v3]
On Mon, 30 Oct 2023 13:43:13 GMT, Per Minborg wrote: >> This PR proposes removing the restriction that only heap `MemorySegment` >> wrapping a `byte` array can be accessed by Vectors. Now any array type can >> be used provided the element alignment constraints are respected. > > Per Minborg has updated the pull request incrementally with one additional > commit since the last revision: > > Add benchmark and intrinsification spy Looks good. Please remove `IntrinsicHeapTest` and let's follow up in another issue/PR to fix the mismatched heap access not being intrinsic. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16360#pullrequestreview-1704600236
Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v3]
On Mon, 30 Oct 2023 13:43:13 GMT, Per Minborg wrote: >> This PR proposes removing the restriction that only heap `MemorySegment` >> wrapping a `byte` array can be accessed by Vectors. Now any array type can >> be used provided the element alignment constraints are respected. > > Per Minborg has updated the pull request incrementally with one additional > commit since the last revision: > > Add benchmark and intrinsification spy test/jdk/jdk/incubator/vector/IntrinsicHeapTest.java line 48: > 46: import static org.testng.Assert.*; > 47: > 48: public class IntrinsicHeapTest { I don't think we need this test as part of this PR. It's useful ascertain what is intrinsic or not. We can rethink it as part of the following fix if needed to verify the intrinsics work across all the cross product of array types and vector types. - PR Review Comment: https://git.openjdk.org/jdk/pull/16360#discussion_r1376494655
Re: RFR: 8318678: Vector access on heap MemorySegments only works for byte[] [v2]
On Thu, 26 Oct 2023 09:17:25 GMT, Per Minborg wrote: >> This PR proposes removing the restriction that only heap `MemorySegment` >> wrapping a `byte` array can be accessed by Vectors. Now any array type can >> be used provided the element alignment constraints are respected. > > Per Minborg has updated the pull request incrementally with one additional > commit since the last revision: > > Allow unaligned array access We also need to review the intrinsic for load/store found here to ensure, when the hardware supports it, that mixed access is optimized: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L967 I recommend writing a simple benchmark with various forms of mixed access and verify the intrinsics kick in (longer term IR tests would be preferable). - PR Comment: https://git.openjdk.org/jdk/pull/16360#issuecomment-1781482409
Re: RFR: 8318421 : AbstractPipeline.sourceStageSpliterator() chases pointers needlessly
On Wed, 18 Oct 2023 10:14:08 GMT, Viktor Klang wrote: > Removes a few unnecessary dereferences of `sourceStage`. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16241#pullrequestreview-1685995672
Re: RFR: 8318420 : AbstractPipeline invokes overridden method in constructor
On Wed, 18 Oct 2023 09:45:53 GMT, Viktor Klang wrote: > This PR corrects so that `opIsStateful()` is not invoked as a part of the > java.util.stream.AbstractPipeline constructor—as `opIsStateful()` is intended > to be overridden by subclasses, and as their own constructors have not run > when their superclass constructor runs, this means that `opIsStateful()` > cannot base its return value on any class members of the subclass. > > Since `opIsStateful()` is only needed for parallel streams, we can therefor > defer the invocation of `opIsStateful()` until evaluation-time, and as such > we can remove the need for having an instance field to store the result of > the invocation—making Stream instances potentially a tiny bit smaller, > reducing Stream-construction overhead, while still preserving semantics. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16239#pullrequestreview-1685991329
Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]
On Thu, 12 Oct 2023 03:44:28 GMT, Danny Thomas wrote: > At least on Saphire Rapids the [emulation suggested > here](https://github.com/natmaurice/x86-simd-sort/commit/41d03b2d8f3b62a2ee6a3a97a8da7f193a407026) > only imposes a 6% penalty for `intSort`, while also mitigating the > performance issue on Zen 4. Thanks for testing this out, this seems a good solution to explore. Is there is a way to `ifdef` switching on the CPU vendor? otherwise we could probably set a flag in the build script based of the CPU vendor, which may be more robust if different compilers are used. - PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-1759992778
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The pull request now contains 45 commits: > > - fix code style and formatting > - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort > - Update CompileThresholdScaling only for the sort and partition intrinsics; > update build script to remove nested if > - change variable names of indexPivot* to pivotIndex* > - Update DualPivotQuicksort.java > - Rename arraySort and arrayPartition Java methods to sort and partition. > Cleanup some comments > - Remove the unnecessary exception in single pivot partitioning fallback > method > - Move functional interfaces close to the associated methods > - Refactor the sort and partition
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
On Sun, 8 Oct 2023 06:18:13 GMT, Danny Thomas wrote: >> Srinivas Vamsi Parasa has updated the pull request with a new target base >> due to a merge or a rebase. The pull request now contains 45 commits: >> >> - fix code style and formatting >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort >> - Update CompileThresholdScaling only for the sort and partition >> intrinsics; update build script to remove nested if >> - change variable names of indexPivot* to pivotIndex* >> - Update DualPivotQuicksort.java >> - Rename arraySort and arrayPartition Java methods to sort and partition. >> Cleanup some comments >> - Remove the unnecessary exception in single pivot partitioning fallback >> method >> - Move functional interfaces close to the associated methods >> - Refactor the sort and partition intrinsics to accept method references >> for fallback functions >> - Refactor stub handling to use a generic function for all types >> - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86 > > A [discussion on > Reddit](https://www.reddit.com/r/java/comments/171t5sj/heads_up_openjdk_implementation_of_avx512_based/) > raised that this had the potential to regress sort performance on AMD Zen 4. > The poster didn't have access to Zen 4 hardware, so I took a moment to run > the benchmark, comparing against a baseline with `UseAVX=2` on an AWS > `m7a.2xlarge`: > > > processor : 0 > vendor_id : AuthenticAMD > cpu family: 25 > model : 17 > model name: AMD EPYC 9R14 > stepping : 1 > microcode : 0xa10113e > cpu MHz : 3673.537 > cache size: 1024 KB > physical id : 0 > siblings : 8 > core id : 0 > cpu cores : 8 > apicid: 0 > initial apicid: 0 > fpu : yes > fpu_exception : yes > cpuid level : 16 > wp: yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov > pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb > rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf > tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic > movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy > cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core > invpcid_single ssbd perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 > smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb > avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 > clzero xsaveerptr rdpru wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni > vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d > bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass > bogomips : 5200.00 > TLB size : 3584 4K pages > clflush size : 64 > cache_alignment : 64 > address sizes : 48 bits physical, 48 bits virtual > power management: > > > > $ make test TEST="micro:java.lang.ArraysSort.intSort" > > Benchmark(size) Mode Cnt Score Error Units > ArraysSort.intSort 10 avgt3 0.033 ? 0.001 us/op > ArraysSort.intSort 25 avgt3 0.067 ? 0.002 us/op > ArraysSort.intSort 50 avgt3 0.391 ? 0.007 us/op > ArraysSort.intSort 75 avgt3 0.923 ? 0.041 us/op > ArraysSort.intSort 100 avgt3 1.292 ? 0.009 us/op > ArraysSort.intSort 1000 avgt3 24.268 ? 0.078 us/op > ArraysSort.intSort1 avgt3320.131 ? 0.381 us/op > ArraysSort.intSort 10 avgt3 4803.142 ? 63.901 us/op > ArraysSort.intSort 100 avgt3 61457.708 ? 347.446 us/op > > > > $ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO... @DanielThomas thank you, i was not aware of the limitation on AMD Zen 4. We need to address this and it appears the fix is simple and localized. Thankfully we are early into the JDK 22 development cycle so we have time to shake this out. - PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1753412092
Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]
On Fri, 29 Sep 2023 00:21:52 GMT, Mourad Abbay wrote: > Are you referring to the expand of lambdas and the extra whitespaces ? Yes, sometimes the IDE just does it automatically! - PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1740160022
Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]
On Fri, 29 Sep 2023 00:37:16 GMT, Mourad Abbay wrote: > > > Are you referring to the expand of lambdas and the extra whitespaces ? > > > > > > Yes, sometimes the IDE just does it automatically! > > Is there a way I can make the IDE respect the code style? I doubt it, since 1) there is no agreed on code style in OpenJDK (although there have been attempts to propose one); and 2) each source file might have its own style (and there may even be inconsistencies within the same file). - PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1740165050
Re: RFR: 8317119: Remove unused imports in the java.util.stream package [v2]
On Wed, 27 Sep 2023 22:44:52 GMT, Mourad Abbay wrote: >> Remove unused imports in the java.util.stream package. > > Mourad Abbay has updated the pull request incrementally with one additional > commit since the last revision: > > Update copyright year. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15949#pullrequestreview-1649773090
Re: RFR: 8317034: Remove redundant type cast in the java.util.stream package [v3]
On Wed, 27 Sep 2023 22:47:43 GMT, Mourad Abbay wrote: >> Remove redundant type cast in the java.util.stream package. > > Mourad Abbay has updated the pull request incrementally with one additional > commit since the last revision: > > Update copyright year. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15942#pullrequestreview-1649773625
Re: RFR: 8316998: Remove redundant type arguments in the java.util.stream package [v3]
On Wed, 27 Sep 2023 22:44:07 GMT, Mourad Abbay wrote: >> Remove cases of redundant type arguments in the java.util.stream package. > > Mourad Abbay has updated the pull request incrementally with one additional > commit since the last revision: > > Update copyright year. Looks good. Note a general rule to mostly follow is to try and stick to the prevailing style in source being modified and not make additional and unrelated changes (no matter how tempting it might be). In this case i don't have strong opinions on those changes and they did not detract from reviewing the code. - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15936#pullrequestreview-1649772340
Re: RFR: 8317264: Pattern.Bound has `static` fields that should be `static final`. [v2]
On Thu, 28 Sep 2023 20:39:13 GMT, Eamonn McManus wrote: >> It looks to have been an oversight that `final` was omitted. The fields are >> never assigned after initialization. `final` leads to shorter bytecode. > > Eamonn McManus has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains two additional > commits since the last revision: > > - Merge branch 'openjdk:master' into staticfinal > - In `Pattern.Bound`, make some constants `static final`. > >It looks to have been an oversight that `final` was omitted. The fields are >never assigned after initialization. `final` leads to shorter bytecode. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15967#pullrequestreview-1649761096
Re: RFR: 8317034: Redundant type cast
On Wed, 27 Sep 2023 08:50:20 GMT, Mourad Abbay wrote: > Remove redundant type cast in the java.util.stream package. src/java.base/share/classes/java/util/stream/DoublePipeline.java line 391: > 389: if (maxSize < 0) > 390: throw new IllegalArgumentException(Long.toString(maxSize)); > 391: return SliceOps.makeDouble(this, 0, maxSize); Suggestion: return SliceOps.makeDouble(this, 0L, maxSize); - PR Review Comment: https://git.openjdk.org/jdk/pull/15942#discussion_r1339158996
Re: RFR: 8316998: Remove redundant type arguments
On Wed, 27 Sep 2023 00:58:01 GMT, Mourad Abbay wrote: > Remove cases of redundant type arguments in the java.util.stream package. Looks like there are a few others in the package. In `Collectors`, see see construction of `CollectorImpl`. See also `DistinctOp`, `DoublePipeline`, `IntPipeline`, `LongPipeline`, `ReferencePipeline`, and `WhileOps`. - PR Comment: https://git.openjdk.org/jdk/pull/15936#issuecomment-1738008502
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > fix typos This looks good. In the implementation the functional interfaces `BindingInterpreter.StoreFunc/LoadFunc` are package private, but are used in internal public signatures. This was previously like this and it's not a big deal but i recommend making those interfaces public. We can also remove `@enablePreview` from `IndirectVarHandleTest` - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15103#pullrequestreview-1645266436
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/foreign/Linker.java line 35: > 33: > 34: import java.lang.invoke.MethodHandle; > 35: import java.nio.ByteOrder; Unused? - PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337802958
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > fix typos src/java.base/share/classes/jdk/internal/foreign/NativeMemorySegmentImpl.java line 152: > 150: private static long allocateMemoryWrapper(long size) { > 151: try { > 152: return UNSAFE.allocateMemory(size); Since we now zero memory only when needed we should test very carefully. - PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337794211
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/foreign/Linker.java line 735: > 733: * > 734: * @apiNote This linker option can not be combined with {@link > #critical}. > 735: * That seems more specification (that can be asserted on) then an informative note. - PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337661700
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v27]
On Mon, 25 Sep 2023 16:54:10 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > fix typos src/java.base/share/classes/java/lang/Module.java line 328: > 326: System.err.printf(""" > 327: WARNING: A restricted method in %s has been > called > 328: WARNING: %s has been called%s in %s Suggestion: WARNING: %s has been called by %s in %s ? - PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1337640870
Re: RFR: 8271268: Fix Javadoc links for Stream.mapMulti [v3]
On Fri, 22 Sep 2023 17:08:48 GMT, Mourad Abbay wrote: >> Fix Javadoc links for Stream.mapMulti, Stream.MapMultiInt, >> Stream.mapMultiToInt, Stream.mapMultiToLong and Stream.mapMultiToDouble. > > Mourad Abbay has updated the pull request incrementally with one additional > commit since the last revision: > > Display links to Stream.flatMap and Stream.mapMulti without parameters. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15794#pullrequestreview-1640438479
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* The Java changes look good to me, it slots in very neatly with the same space/memory characteristics. It's generally pleasing to me to see recent data parallel sorting techniques from academic literature applied in the Java platform. I wish it were easier to support different OS platforms (e.g., MacOS with x86) but its a compromise on complexity of the sort implementations. I wonder if it makes sense to adjust the insertion sort thresholds when the intrinsics are enabled. That could be revisited later. - Marked as reviewed by psandoz (Reviewer). PR Review:
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* test/jdk/java/util/Arrays/Sorting.java line 30: > 28: * @build Sorting > 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions > -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun > 30: * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation > Sorting -shortrun It would be useful to target the compilation threshold only for the intrinsic methods, but i suspect that is not possible? - PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r151101
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v39]
On Wed, 20 Sep 2023 16:27:19 GMT, Srinivas Vamsi Parasa wrote: > This API was suggested to me and was already reviewed by others. Confirming so, this was my suggestion. We use the _double-register_ addressing mode (as commonly applied in unsafe), thereby the intrinsic is capable of being used on and off-heap. To reduce the number of intrinsic methods we pass the element class as the first argument. - PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728089648
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]
On Fri, 15 Sep 2023 20:17:17 GMT, Paul Sandoz wrote: >>> Alan, you mentioned that DualPivotQuicksort will need detailed review. Can >>> we go ahead and start reviewing? Laurent checked performance, JMH results >>> look fine. >> >> As before, I think the main question with this change is whether adding >> radix sort to the mix is worth the complexity and additional code to >> maintain. Also as we discussed in the previous PR, the additional memory >> needed for the radix sort may have an effect on other things that are going >> on concurrently. I know it has been updated to handle OOME but I think >> potential reviewers would need to be comfortable with that part. > >> > Alan, you mentioned that DualPivotQuicksort will need detailed review. Can >> > we go ahead and start reviewing? Laurent checked performance, JMH results >> > look fine. >> >> As before, I think the main question with this change is whether adding >> radix sort to the mix is worth the complexity and additional code to >> maintain. Also as we discussed in the previous PR, the additional memory >> needed for the radix sort may have an effect on other things that are going >> on concurrently. I know it has been updated to handle OOME but I think >> potential reviewers would need to be comfortable with that part. > > I too share concerns about the potential increased use of memory for sorting > ints/longs/floats/doubles. With modern SIMD hardware and data parallel > techniques we can apply quicksort much more efficiently. I think it is > important to determine to what extent this reduces the need for radix sort. > To determine that we would need to carefully measure against the AVX-512 > implementation (with ongoing improvements) with appropriately initialized > data to sort, and further measure against an AVX2 version. > Hi Paul (@PaulSandoz), Alan (@AlanBateman), Any update? Do you agree with > Radix sort in parallel case only? I think its definitely a better fit, but another aspect of my previous comment was wondering if we need a radix sort if the vectorized quicksort implementation is fast enough. IMO we need to compare performance results with the vectorized quick sort, and be aware of future enhancements to that. - PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1728082819
Re: RFR: 8271268: Fix Javadoc links for Stream.mapMulti
On Mon, 18 Sep 2023 18:09:57 GMT, Mourad Abbay wrote: > Fix Javadoc links for Stream.mapMulti, Stream.MapMultiInt, > Stream.mapMultiToInt, Stream.mapMultiToLong and Stream.mapMultiToDouble. Note to others. I suggested @mabbay pick up the dropped [PR](https://github.com/openjdk/jdk/pull/3909/), as a starter issue to help achieve OpenJDK author status. - PR Comment: https://git.openjdk.org/jdk/pull/15794#issuecomment-1724417397
Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]
On Fri, 1 Sep 2023 06:12:57 GMT, Alan Bateman wrote: > > Alan, you mentioned that DualPivotQuicksort will need detailed review. Can > > we go ahead and start reviewing? Laurent checked performance, JMH results > > look fine. > > As before, I think the main question with this change is whether adding radix > sort to the mix is worth the complexity and additional code to maintain. Also > as we discussed in the previous PR, the additional memory needed for the > radix sort may have an effect on other things that are going on concurrently. > I know it has been updated to handle OOME but I think potential reviewers > would need to be comfortable with that part. I too share concerns about the potential increased use of memory for sorting ints/longs/floats/doubles. With modern SIMD hardware and data parallel techniques we can apply quicksort much more efficiently. I think it is important to determine to what extent this reduces the need for radix sort. To determine that we would need to carefully measure against the AVX-512 implementation (with ongoing improvements) with appropriately initialized data to sort, and further measure against an AVX2 version. - PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1721811993
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v36]
On Wed, 13 Sep 2023 23:02:21 GMT, Srinivas Vamsi Parasa wrote: > Could you please have a look at the changes in `DualPivotQuicksort.java` and > provide your feedback? I agree that is much cleaner, glad that worked out. That neatly covers multiple element types and Java-based insertion sort algorithms (although I don't know why we need two since mixed insertion effectively embeds the other but we don't need to address that here). I recommend embedding the functional interfaces next to the associated methods, rather than as auxiliary classes, and also adding `@ForceInline` on `arraySort`. - PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1720264496
Re: RFR: 8315938: Deprecate for removal Unsafe methods that have standard APIs for many releases
On Fri, 8 Sep 2023 15:54:05 GMT, Alan Bateman wrote: > There are several methods defined by sun.misc.Unsafe that have standard API > equivalents for many years and releases. The change proposed here is to > deprecate, for removal, the park, unpark, getLoadAverage, loadFence, > storeFence, and fullFence methods. Code using these methods should move to > LockSupport.park/unpark (Java 5), OperatingSystemMXBean.getSystemLoadAverage > (Java 6), and VarHandles xxxFence methods (Java 9). > > The following is a summary of a search of 175973022 classes in 484751 > artifacts to get a sense of the usage of these methods. > > - 1290 usages of Unsafe.park. 1195 are libraries that have re-packaged some > version of ForkJoinPool, maybe from the jsr166 repo, maybe an older JDK > release. In the remaining, only Ehcache stands out with 29 matches. It seems > to be one usage but the library seems to copied/shaded into other artifacts. > > - 1322 usages of Unsafe.unpark. 1243 are re-packaged ForkJoinPool. 29 > occurrences are Encache, again one usage but the library seems to > copied/shaded into other artifacts. > > - 22 usages of Unsafe.getLoadAverage. Most of these are one usage in many > published versions of Apache HBase. > > - 339 usages of Unsafe.loadFence, 1057 usages of Unsafe.storeFence, 517 > usages of Unsafe.fullFence. A handful of these are libraries with copies of > j.u.concurrent, the rest seem to be high performance libraries that support > older JDK releases or just haven't been updated to use VarHandles yet. Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15641#pullrequestreview-1618162112
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v33]
On Thu, 31 Aug 2023 21:31:40 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > Change name of the avxsort library to libx86_64_sort Thank you for the summary. > Currently there is a regression of up to 20%, when the intrinsic is disabled. > This has been root caused to two reasons: > > * performance loss due to using a generalized intrinsic for all data > types (`int, long, float, double`) with multiple indirections and `switch()` > in the fallback method implementation. > Ok, i cannot guarantee the fallback lambda approach does not regress :-) > * performance loss due to passing a `pivotIndices` array to the partition > methods and modifying it in place. Passing the pivot indices explicitly
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v33]
On Thu, 31 Aug 2023 21:31:40 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > Change name of the avxsort library to libx86_64_sort Focusing on the Java changes, I am glad you managed to introduce intrinsics into the main Java sorting loop, with leaf array sorting and array partitioning thereby taking advantage of existing functionality. Initially i advised trying to make the intrinsic work across heap and off-heap data, e.g., sorting `MemorySegment`. I still think that is a worthy goal, but we don't need to do that now, but I still think intrinsics should be base+offset shaped in preparation for that. I think we need to summarize future planned steps, some i believe were discussed with regards to
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v10]
On Wed, 6 Sep 2023 10:50:59 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 152: >> >>> 150: * >>> 151: * The following table shows some examples of how C types are modelled >>> in Linux/x64 (all the examples provided >>> 152: * here will assume these platform-dependent mappings): >> >> Up to you, but it might be useful to link to the ABI specifications if the >> links are considered stable. > > [This SO question](https://stackoverflow.com/a/40348010) points to a gitlab > repo that seems to have the latest version: > https://gitlab.com/x86-psABIs/x86-64-ABI But, I'm not sure how stable that > is, or if that's an authoritative source. > > Alternatively, we could refer to the name only: "System V Application Binary > Interface - AMD64 Architecture Processor Supplement" (or "x86-64 psABI") Then > people can google for themselves and find it. Yeah, its hard to find the official and latest version. Referring to the full title will help. - PR Review Comment: https://git.openjdk.org/jdk/pull/15103#discussion_r1317517319
Re: RFR: 8312522: Implementation of Foreign Function & Memory API [v10]
On Mon, 4 Sep 2023 15:54:41 GMT, Jorn Vernee wrote: >> This patch contains the implementation of the foreign linker & memory API >> JEP for Java 22. The initial patch is composed of commits brought over >> directly from the [panama-foreign >> repo](https://github.com/openjdk/panama-foreign). The main changes found in >> this patch come from the following PRs: >> >> 1. https://github.com/openjdk/panama-foreign/pull/836 Where previous >> iterations supported converting Java strings to and from native strings in >> the UTF-8 encoding, we've extended the supported encoding to all the >> encodings found in the `java.nio.charset.StandardCharsets` class. >> 2. https://github.com/openjdk/panama-foreign/pull/838 We dropped the >> `MemoryLayout::sequenceLayout` factory method which inferred the size of the >> sequence to be `Long.MAX_VALUE`, as this led to confusion among clients. A >> client is now required to explicitly specify the sequence size. >> 3. https://github.com/openjdk/panama-foreign/pull/839 A new API was added: >> `Linker::canonicalLayouts`, which exposes a map containing the >> platform-specific mappings of common C type names to memory layouts. >> 4. https://github.com/openjdk/panama-foreign/pull/840 Memory access >> varhandles, as well as byte offset and slice handles derived from memory >> layouts, now feature an additional 'base offset' coordinate that is added to >> the offset computed by the handle. This allows composing these handles with >> other offset computation strategies that may not be based on the same memory >> layout. This addresses use-cases where clients are working with 'dynamic' >> layouts, whose size might not be known statically, such as variable length >> arrays, or variable size matrices. >> 5. https://github.com/openjdk/panama-foreign/pull/841 Remove this now >> redundant API. Clients can simply use the difference between the base >> address of two memory segments. >> 6. https://github.com/openjdk/panama-foreign/pull/845 Disambiguate uses of >> `SegmentAllocator::allocateArray`, by renaming methods that both allocate + >> initialize memory segments to `allocateFrom`. (see the original PR for the >> problematic case) >> 7. https://github.com/openjdk/panama-foreign/pull/846 Improve the >> documentation for variadic functions. >> 8. https://github.com/openjdk/panama-foreign/pull/849 Several test fixes to >> make sure the `jdk_foreign` tests can pass on 32-bit machines, taking >> linux-x86 as a test bed. >> 9. https://github.com/openjdk/panama-foreign/pull/850 Make the linker API >> required. The `Linker::nativeLinker` method is not longer allowed to throw >> an `UnsupportedO... > > Jorn Vernee has updated the pull request incrementally with five additional > commits since the last revision: > > - 8315096: Allowed access modes in memory segment should depend on layout > alignment > >Reviewed-by: psandoz > - Add missing @implSpec to AddressLayout > >Reviewed-by: pminborg > - Fix misc typos in FFM API javadoc > >Reviewed-by: jvernee > - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle > (part two) > >Reviewed-by: pminborg > - Clarify javadoc w.r.t. exceptions thrown by a memory access var handle > >Reviewed-by: jvernee Recommend you do a sweep through the code for unused imports and fields and any rogue `@since` in the internal classes. src/java.base/share/classes/java/lang/foreign/AddressLayout.java line 53: > 51: * > 52: * @implSpec > 53: * This class is immutable, thread-safe and href="{@docRoot}/java.base/java/lang/doc-files/ValueBased.html">value-based. Strictly speaking it's the implementations, as stated in the `Linker`: * Implementations of this interface are immutable, thread-safe and value-based. src/java.base/share/classes/java/lang/foreign/Linker.java line 152: > 150: * > 151: * The following table shows some examples of how C types are modelled > in Linux/x64 (all the examples provided > 152: * here will assume these platform-dependent mappings): Up to you, but it might be useful to link to the ABI specifications if the links are considered stable. src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 439: > 437: * > 438: * > 439: * If the provided layout path {@code P} contains no dereference > elements, then the offset of the access operation is Suggestion: * If the provided layout path {@code P} contains no dereference elements, then the offset {@code O} of the access operation is src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 443: > 441: * > 442: * {@snippet lang = "java": > 443: * offset = this.offsetHandle(P).invokeExact(B, I1, I2, ... In); Suggestion: * O = this.offsetHandle(P).invokeExact(B, I1, I2, ... In); To align with the use of `O` later on. src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 536: > 534: * > 535: * >
Re: RFR: 8314544: Matrix multiple benchmark using Vector API [v2]
On Mon, 21 Aug 2023 03:55:42 GMT, Martin Stypinski wrote: >> Added a bunch of different implementations for Vector API Matrix >> Multiplications: >> >> - Baseline >> - Blocked (Cache Local) >> - FMA >> - Vector API Simple Implementation >> - Vector API Blocked Implementation >> >> Commit was discussed with @PaulSandoz > > Martin Stypinski has updated the pull request incrementally with two > additional commits since the last revision: > > - changed for consistency > - improved some RandomGenerator & unuseed Imports Marked as reviewed by psandoz (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/15338#pullrequestreview-1592478060
Re: RFR: 8314544: Matrix multiple benchmark using Vector API
On Fri, 18 Aug 2023 03:57:24 GMT, Martin Stypinski wrote: > Added a bunch of different implementations for Vector API Matrix > Multiplications: > > - Baseline > - Blocked (Cache Local) > - FMA > - Vector API Simple Implementation > - Vector API Blocked Implementation > > Commit was discussed with @PaulSandoz test/micro/org/openjdk/bench/jdk/incubator/vector/MatrixMultiplicationBenchmark.java line 33: > 31: > 32: import org.openjdk.jmh.annotations.*; > 33: import org.openjdk.jmh.infra.Blackhole; Unused import. test/micro/org/openjdk/bench/jdk/incubator/vector/MatrixMultiplicationBenchmark.java line 176: > 174: > 175: private static float[] newFloatRowMatrix(int size) { > 176: Random rand = new Random(); Instead we can use the new random generator API e.g,: var rand = RandomGenerator.getDefault(); This is not super important, but its good practice to try and use more preferred APIs. - PR Review Comment: https://git.openjdk.org/jdk/pull/15338#discussion_r1298830226 PR Review Comment: https://git.openjdk.org/jdk/pull/15338#discussion_r1298830156