Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v9]
On Wed, 15 Nov 2023 17:50:48 GMT, Viktor Klang wrote: >> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Improvements after feedback Really nicely done. - Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16420#pullrequestreview-1733713548
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Thu, 16 Nov 2023 05:33:13 GMT, David Holmes wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains five additional >> commits since the last revision: >> >> - Add a bundled native lib in jdk as a bridge to libsleef >> - Merge 'jdk:master' into JDK-8312425 >> - Disable sleef by default >> - Merge 'jdk:master' into JDK-8312425 >> - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF > > So to be clear, now you have to opt-in to using `libsleef` by building a > binary that will use it. That binary will always use `libsleef` if found, so > there is no way to opt-out at runtime. Is that the way it works on x86_64 too? Thanks for adding the labels @dholmes-ora ! > So to be clear, now you have to opt-in to using libsleef by building a binary > that will use it. That binary will always use libsleef if found, so there is > no way to opt-out at runtime. Is that the way it works on x86_64 too? Yes, libsleef is used to build the binary if found. And at runtime, if the libsleef with right version is not found, `dlopen` to the libvmath.so will fail. And then all the operations will be fall-back to the java default implementation. X86_64 has also bundled the Intel's SVML binary to jdk image at build time. And we use the same way loading/opening the library at runtime. - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813916971
Re: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5]
On Fri, 10 Nov 2023 01:49:17 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in >> ProcessTools.createLimitedTestJavaProcessBuilder(). Only >> createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the >> middle of the list. I don't think it makes sense to modify arguments in >> several places so I replaced it with the flag isLimited and moved all >> modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional > commit since the last revision: > > variable was renamed. It is expected from `createLimitedTestJavaProcessBuilder` to execute the process exactly as specified in the test, without any additional changes from jtreg. It is very likely that the test might fail if the process is executed in other modes. I agree that usage of this method is always a potential loss of coverage for vm flags being tested and for virtual threads. So this method should be used only when it is necessary. As well as we should minimize the number of flagless testing. - PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1813899682
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, long, float and double arrays)
On Wed, 15 Nov 2023 22:05:47 GMT, Srinivas Vamsi Parasa wrote: >> make/modules/java.base/Lib.gmk line 245: >> >>> 243: TOOLCHAIN := TOOLCHAIN_LINK_CXX, \ >>> 244: OPTIMIZATION := HIGH, \ >>> 245: CFLAGS := $(CFLAGS_JDKLIB) -std=c++17, \ >> >> This makes me uneasy. We do not in general use C++17 in the JDK. >> >> Is this flag needed for the code to compile? If so, would it be difficult to >> rewrite it not to require C++17 constructs? >> >> Or was it added since you noticed performance increases, not related to the >> new code, by forcing the compiler to use a higher language revision? >> >> We are supporting gcc versions from 6. From what I can tell, C++17 was fully >> introduced in gcc 11. Increasing the lowest supported gcc to 11 would >> require quite a jump, just for this library. >> >> In the worst case, you would need to make the existence of this library >> dependent on gcc version. (It is my understanding that the library is >> optional, and just produces a performance benefits if it exists). > > Hi Magnus, the new x86-simd-sort 4.0 needs C++17 to compile. Will look into > the changes needed for this library to compile without the C++17 standard and > get back to you. > > Thanks, > Vamsi Seems a bit odd to me too as the existing simd code seems to C code residing in .cpp files for some reason. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1395233522
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, long, float and double arrays)
On Tue, 7 Nov 2023 00:12:41 GMT, Srinivas Vamsi Parasa wrote: > The goal is to develop faster sort routines for x86_64 CPUs by taking > advantage of AVX2 instructions. This enhancement provides an order of > magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > For serial sort on random data, this PR shows upto ~7.5x improvement for > 32-bit datatypes (int, float) and upto ~3x improvement for 64-bit datatypes > (long, double) on Intel TigerLake machine as shown in the performance data > below. > > For parallel sort on random data, this PR shows upto ~3.4x for 32-bit > datatypes (int, float) and upto ~2.3x for 64-bit datatypes as shown below. > > **Note:** This PR also improves the performance of AVX512 sort by upto 35%. > > xmlns:o="urn:schemas-microsoft-com:office:office" > xmlns:x="urn:schemas-microsoft-com:office:excel" > xmlns="http://www.w3.org/TR/REC-html40;> > > > > > > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> > > > > > > > > > Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | > Speedup > -- | -- | -- | -- | -- > ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 > ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 > ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 > ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 > ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 > ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 > ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 > ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 > ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 > ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 > ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 > ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 > ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 > ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 > ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 > ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 > ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 > ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 > ArraysSort.longSort | 10 | 0.037 | 0.031 | 1.2 > ArraysSort.longSort | 25 | 0.101 | 0.073 | 1.4 > ArraysSort.longSort | 50 | 0.227 | 0.219 | 1.0 > ArraysSort.longSort | 75 | 0.446 | 0.332 | 1.3 > ArraysSort.longSort | 100 | 0.714 | 0.557 | 1.3 > ArraysSort.longSort | 1000 ... src/java.base/linux/native/libsimdsort/avx2-32bit-qsort.hpp line 3: > 1: /* > 2: * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved. > 3: * Copyright (c) 2021 Serge Sans Paille. All rights reserved. Is this person an OpenJDK Contributor? - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1395236944
Re: RFR: 8317834: java/lang/Thread/IsAlive.java timed out [v2]
On Tue, 14 Nov 2023 17:25:42 GMT, Darragh Clarke wrote: >> Currently the `IsAlive` test occasionally times out. >> >> This PR changes the timeout from `10` to `25` which I think is an adequate >> increase based on the failures I've seen. Though I'd be happy to change it >> to another value if someone thinks `25` isn't ideal. > > Darragh Clarke has updated the pull request incrementally with one additional > commit since the last revision: > > use default timeout Seems fine. If it ever times out again then we know we have a real problem. :) I suspect, as Alan suggested, the `othervm` could be dropped, but I'm not concerned about it staying. Thanks - Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16659#pullrequestreview-1733586933
Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy
On Wed, 15 Nov 2023 17:03:38 GMT, Steve Dohrmann wrote: >> Do you see any concerns while handling multithreaded case where writer is >> busy copying 256 bytes block in loop and reader try to access a location >> still not flushed out of write combining buffer. > > The results a concurrent reader sees could be different if the copy is using > nt writes, but if the read of the destination is not synced with the copy > operation, I think the reader would not see consistent state in either case. > Is it worse with nt writes? Thanks for the clarification, agree behavior is similar to non-NT case, in fact using NT for huge copy operations will prevent polluting caches due to destination cache line fills. - PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1395171526
Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy
On Tue, 14 Nov 2023 07:59:22 GMT, Jatin Bhateja wrote: >> Below is baseline data collected using a modified version of the >> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug >> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake >> i7-1185G7, which does support AVX512. >> >> Baseline data >> Benchmark (arrayKind) (sizeKind) Mode Cnt Score >> Error Units >> -- >> XorTest.copy ELEMENTS SMALL avgt 30 584737355.767 ± >> 60414308.540 ns/op >> XorTest.copy ELEMENTS MEDIUM avgt 30 272248995.683 ± >> 2924954.498 ns/op >> XorTest.copy ELEMENTS LARGE avgt 30 1019200210.900 ± >> 28334453.652 ns/op >> XorTest.copy REGION SMALL avgt 30 7399944.164 ± >> 216821.819 ns/op >> XorTest.copy REGION MEDIUM avgt 3020591454.558 ± >> 147398.572 ns/op >> XorTest.copy REGION LARGE avgt 3021649266.051 ± >> 179263.875 ns/op >> XorTest.copy CRITICAL SMALL avgt 30 51079.357 ± >> 542.482 ns/op >> XorTest.copy CRITICAL MEDIUM avgt 302496.961 ± >> 11.375 ns/op >> XorTest.copy CRITICAL LARGE avgt 30 515.454 ± >> 5.831 ns/op >> XorTest.copy FOREIGN SMALL avgt 30 7558432.075 ± >> 79489.276 ns/op >> XorTest.copy FOREIGN MEDIUM avgt 3019730666.341 ± >> 500505.099 ns/op >> XorTest.copy FOREIGN LARGE avgt 3034616758.085 ± >> 340300.726 ns/op >> XorTest.xor ELEMENTS SMALL avgt 30 219832692.489 ± >> 2329417.319 ns/op >> XorTest.xor ELEMENTS MEDIUM avgt 30 505138197.167 ± >> 3818334.424 ns/op >> XorTest.xor ELEMENTS LARGE avgt 30 1189608474.667 ± >> 5877981.900 ns/op >> XorTest.xorREGION SMALL avgt 3064093872.804 ± >> 599704.491 ns/op >> XorTest.xorREGION MEDIUM avgt 3081544576.454 ± >> 1406342.118 ns/op >> XorTest.xorREGION LARGE avgt 3090091424.883 ± >> 775577.613 ns/op >> XorTest.xor CRITICAL SMALL avgt 3057231375.744 ± >> 438223.342 ns/op >> XorTest.xor CRITICAL MEDIUM avgt 3058583884.930 ± >> 375355.215 ns/op >> XorTest.xor CRITICAL LARGE avgt 3060644832.949 ± >> 588120.738 ns/op >> XorTest.xor FOREIGN SMALL avgt 3073868679.405 ± >> 819965.524 ns/op >> XorTest.xor FOREIGN MEDIUM avgt 3088156275.944 ± >> 1051257.152 ns/op >> Xo... > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 585: > >> 583: __ shlq(temp2, shift); >> 584: __ cmpq(temp2, large_threshold); >> 585: __ jcc(Assembler::greaterEqual, L_copy_large); > > Hi @steveatgh , Can you please share the performance number of other Array > copy JMH micros in following directoy > https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/lang I will still request you to run BM in above path, we may see performance dips for sizes after special cases due to additional comparisons. - PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1395174203
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like >> `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, >> which causes large performance gap on AArch64. Note that those APIs are >> optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. >> To close the gap, we would like to optimize these APIs for AArch64 by >> calling a third-party vector library called libsleef [2], which are >> available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and >> implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA >> instructions used stubs in libsleef for most of the operations by default, >> and 2) add the vector calling convention to apply with the runtime calls to >> stub code in libsleef. Note that for those APIs that libsleef does not >> support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an >> experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. >> People can use it to denote the libsleef path/name explicitly. By default, >> it points to the system installed library. If the library does not exist or >> the dynamic loading of it in runtime fails, the math vector ops will >> fall-back to use the default scalar version without error. But a warning is >> printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], >> just with some initial review comments addressed. And now we'd like to get >> some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains five additional > commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF So to be clear, now you have to opt-in to using `libsleef` by building a binary that will use it. That binary will always use `libsleef` if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too? - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813816824
Re: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like >> `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, >> which causes large performance gap on AArch64. Note that those APIs are >> optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. >> To close the gap, we would like to optimize these APIs for AArch64 by >> calling a third-party vector library called libsleef [2], which are >> available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and >> implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA >> instructions used stubs in libsleef for most of the operations by default, >> and 2) add the vector calling convention to apply with the runtime calls to >> stub code in libsleef. Note that for those APIs that libsleef does not >> support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an >> experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. >> People can use it to denote the libsleef path/name explicitly. By default, >> it points to the system installed library. If the library does not exist or >> the dynamic loading of it in runtime fails, the math vector ops will >> fall-back to use the default scalar version without error. But a warning is >> printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], >> just with some initial review comments addressed. And now we'd like to get >> some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains five additional > commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Really this should go to panama-dev but core-libs will have to do. - PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813811597
Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v11]
On Wed, 15 Nov 2023 19:26:06 GMT, Eirik Bjorsnos wrote: >> Please review this PR which speeds up TestTooManyEntries and clarifies its >> purpose: >> >> - The name 'TestTooManyEntries' does not clearly convey the purpose of the >> test. What is tested is the validation that the total CEN size fits in a >> Java byte array. Suggested rename: CenSizeTooLarge >> - The test creates DEFLATED entries which incurs zlib costs and File Data / >> Data Descriptors for no additional benefit. We can use STORED instead. >> - By creating a single LocalDateTime and setting it with >> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. >> - The name of entries is generated by calling UUID.randomUUID, we could use >> simple counter instead. >> - The produced file is unnecessarily large. We know how large a CEN entry >> is, let's take advantage of that to create a file with the minimal size. >> - By adding a maximally large extra field to the CEN entries, we get away >> with fewer CEN records and save memory >> - The summary and comments of the test can be improved to help explain the >> purpose of the test and how we reach the limit being tested. >> - By writing sparse 'holes' until the last CEN entry, we can reduce required >> disk space. >> >> These speedups reduced the runtime from 4 min 17 sec to 3 seconds on my >> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to ~4K. Memory >> consumption is down from 8GB to something like 12MB. > > Eirik Bjorsnos has updated the pull request incrementally with two additional > commits since the last revision: > > - Fix spelling of LocalDateTime > - Add comment to the SparseOutputStream class noting that instances must be > passed directly to the ZipOutputStream constructor, without buffering. Hello Eirik, the code comment updates look good to me. I went ahead and ran another run our CI with the latest mainline and it too passed fine. I'll go ahead and sponsor this now. - PR Comment: https://git.openjdk.org/jdk/pull/12991#issuecomment-1813811258
Re: RFR: 8318364: Add an FFM-based implementation of harfbuzz OpenType layout [v4]
On Wed, 15 Nov 2023 15:52:43 GMT, Jayathirth D V wrote: >> Phil Race has updated the pull request incrementally with one additional >> commit since the last revision: >> >> indentation > > src/java.desktop/share/classes/sun/font/HBShaper.java line 142: > >> 140: private static final MemorySegment get_contour_pt_stub; >> 141: >> 142: private static final MemorySegment store_layout_results_stub; > > I see this Upcall is named `store_layout_results` in case of FFM and we call > it similar function in JNI case as `storeGVData`. If we can find some common > name and use it will be beneficial in future when we want to compare code > between FFM and JNI implementation. noted, but not something I intend to change here. - PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1395138283
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation >> based on hybrid algorithm which initially partially unrolls scalar loop to >> accumulates values from gather indices into a quadword(64bit) slice followed >> by vector permutation to place the slice into appropriate vector lanes, it >> prevents code bloating and generates compact >> JIT sequence. This coupled with savings from expansive array allocation in >> existing java implementation translates into significant performance of >> 1.3-5x gains with included micro. >> >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> 3) Some minor adjustments in existing gather instruction pattens for >> double/quad words. >> >> >> Kindly review and share your feedback. >> >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Fix incorrect comment src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3073: > 3071: .add(offset); > 3072: vix = VectorIntrinsics.checkIndex(vix, a.length); > 3073: } This has finished the boundary checks for the index array. But the drawback is it also generates the index vectors (I mean the load instructions), which may be duplicated with intrinsification on some hardwares like SVE (sve gatter instructions need the index vectors as well), because you passed the index address and offset to compiler. So for SVE, we will need double load instructions than what it actually needs. - PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1395126023
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation >> based on hybrid algorithm which initially partially unrolls scalar loop to >> accumulates values from gather indices into a quadword(64bit) slice followed >> by vector permutation to place the slice into appropriate vector lanes, it >> prevents code bloating and generates compact >> JIT sequence. This coupled with savings from expansive array allocation in >> existing java implementation translates into significant performance of >> 1.3-5x gains with included micro. >> >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> 3) Some minor adjustments in existing gather instruction pattens for >> double/quad words. >> >> >> Kindly review and share your feedback. >> >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Fix incorrect comment BTW, I have two questions: 1. An intrinsic which should accept the vector as index like non-subword gather is more benefical in real applications. See: https://github.com/openjdk/panama-vector/pull/201 please. 2. Do you have the plan for adding such optimization for subword scatter in future? Thanks, Xiaohong - PR Comment: https://git.openjdk.org/jdk/pull/16354#issuecomment-1813761378
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation >> based on hybrid algorithm which initially partially unrolls scalar loop to >> accumulates values from gather indices into a quadword(64bit) slice followed >> by vector permutation to place the slice into appropriate vector lanes, it >> prevents code bloating and generates compact >> JIT sequence. This coupled with savings from expansive array allocation in >> existing java implementation translates into significant performance of >> 1.3-5x gains with included micro. >> >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> 3) Some minor adjustments in existing gather instruction pattens for >> double/quad words. >> >> >> Kindly review and share your feedback. >> >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Fix incorrect comment I have only minor comment, please address. No need for re-review from my side. - Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1733388609
Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote: >> match_rule_supported_vector called in the beginning will enforce these >> checks. > > This method is match_rule_support_vector and it is not enforcing this check > now. It was doing so before through fall through. I will be more comfortable if either we add the size_in_bits == 64 check in LoadVectorGatherMasked for non subword types or document it properly via comment on why it is covered. - PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1395080418
Re: RFR: 8318364: Add an FFM-based implementation of harfbuzz OpenType layout [v4]
On Tue, 7 Nov 2023 00:37:47 GMT, Phil Race wrote: > > > So we have somewhere around a fixed 125ms startup cost for the FFM case - > > > as measured on my Mac, > > > but only 35-40ms of that is attributable to the specific needs of layout. > > > > > > That looks unfortunate. I guess if we will start to use ffm in other places > > we can easily spend of 1 second budget on startup=( > > Yes, this case is sufficiently uncommon, that it is OK, and is a decent way > to help us track improvements to FFM. But it would be another matter to have > to do it for however many of our core software loops and AWT window manager > interaction calls we need to get running for a minimal app. > > > > layoutCnt=16000 total=193ms <<< app fully displayed > > > vs > > > layoutCnt=16000 total=453ms <<< app fully displayed > > > > > > It looks strange that 16000 calls are not enough to warmup, and the > > difference is so large. > > I am not a C2 expert, (not even an amateur), I just assume that it takes a > lot of calls to be fully optimized. @JornVernee this looks suspicious and seems unrelated to the cold startup issues we discussed before. - PR Comment: https://git.openjdk.org/jdk/pull/15476#issuecomment-1813640596
Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v9]
On Wed, 15 Nov 2023 17:50:48 GMT, Viktor Klang wrote: >> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Improvements after feedback Elegantly and thoroughly done (I mostly focused on the API and implementation). I have been following the work and providing ongoing feedback hence no specific comments here. Going preview now will allow for some additional time and feedback on the ergonomics of Gatherer construction. There is also the intriguing prospect of further work to replace stream internals if the performance holds up (the ability to optimize when composing gathers seems key here as you already have explored in the performance tests - the runtime compiler should be able to see through the shorter paths more easily). - Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16420#pullrequestreview-1733265473
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref
On Tue, 14 Nov 2023 14:06:29 GMT, Alan Bateman wrote: >> Classes in the `java.lang.ref` package would benefit from an update to bring >> the spec in line with how the VM already behaves. The changes would focus on >> _happens-before_ edges at some key points during reference processing. >> >> A couple key things we want to be able to say are: >> - `Reference.reachabilityFence(x)` _happens-before_ reference processing >> occurs for 'x'. >> - `Cleaner.register()` _happens-before_ the Cleaner thread runs the >> registered cleaning action. >> >> This will bring Cleaner in line (or close) with the memory visibility >> guarantees made for finalizers in [JLS >> 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): >> _"There is a happens-before edge from the end of a constructor of an object >> to the start of a finalizer (§12.6) for that object."_ > > src/java.base/share/classes/java/lang/ref/Reference.java line 393: > >> 391: * Clears this reference object. Invoking this method does not >> enqueue this >> 392: * object, and the garbage collector will no longer enqueue this >> object once >> 393: * the referent reaches the designated reachability level. > > I'm wondering if "designated reachability level" is the right words to use > here. The Notification section in the package description speaks of when the > reachability of the referent has changed to the value corresponding to the > type of the reference and I wonder if it might be better to use wording > consistent with that. Minimally it could link to the package description > where the notion of reachability level is introduced. Perhaps it's sufficient to say simply that, _"the garbage collector will no longer enqueue this object."_ - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394969775
Re: RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
On Wed, 15 Nov 2023 21:59:56 GMT, Brian Burkhalter wrote: > Clean up HTML error due to nested anchor (``) elements. Marked as reviewed by iris (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16679#pullrequestreview-1733201186
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref
On Mon, 13 Nov 2023 22:31:16 GMT, Brent Christian wrote: > Classes in the `java.lang.ref` package would benefit from an update to bring > the spec in line with how the VM already behaves. The changes would focus on > _happens-before_ edges at some key points during reference processing. > > A couple key things we want to be able to say are: > - `Reference.reachabilityFence(x)` _happens-before_ reference processing > occurs for 'x'. > - `Cleaner.register()` _happens-before_ the Cleaner thread runs the > registered cleaning action. > > This will bring Cleaner in line (or close) with the memory visibility > guarantees made for finalizers in [JLS > 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): > _"There is a happens-before edge from the end of a constructor of an object > to the start of a finalizer (§12.6) for that object."_ src/java.base/share/classes/java/lang/ref/Reference.java line 564: > 562: * {@code reachabilityFence(x)} > 563: * href="{@docRoot}/java.base/java/util/concurrent/package-summary.html#MemoryVisibility">happen-before > 564: * the garbage collector clears any reference to {code x}. This is a fairly low-level specification, so the relationship described here is "reachabilityFence **hb** clearing". The relationships "clearing **hb** enqueue" and "enqueue **hb** dequeue" are described elsewhere. Thus the mention of Cleaner above seems misplaced. However, the whole chain of relationships > RF **hb** clearing **hb** enqueue **hb** dequeue **hb** cleaning-action is important. Should this be described somewhere? src/java.base/share/classes/java/lang/ref/ReferenceQueue.java line 187: > 185: * {@link Reference#enqueue()} instead of by the garbage collector, > its > 186: * former referent (which has since been cleared) could still be > strongly > 187: * reachable. See my comments on the corresponding Reference.enqueue() apiNote. I'm not sure we want to have this issue sprinkled around in the apiNotes for several methods, including others here on ReferenceQueue. If we do want to address this issue -- and it's not clear to me that we do -- then perhaps we should do it in a central location? - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394941127 PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394944893
Re: RFR: 8319567: Update java/lang/invoke tests to support vm flags [v2]
On Wed, 15 Nov 2023 02:39:56 GMT, Mandy Chung wrote: >> This PR includes test fixes for the following issues: >> >> 8319567: Update java/lang/invoke tests to support vm flags >> 8319568: Update java/lang/reflect/exeCallerAccessTest/CallerAccessTest.java >> to accept vm flags >> 8319672: Several classloader tests ignore VM flags >> 8319676: A couple of jdk/modules/incubator/ tests ignore VM flags >> 8319677: Test jdk/internal/misc/VM/RuntimeArguments.java should be marked as >> flagless >> >> It converts the test to use `ProcessTools::createTestJavaProcessBuilder` or >> `createNativeTestJavaProcessBuilder` so that the test will support VM flags >> passed to jtreg. A couple tests that ignore VM flags should use >> `ProcessTools::createLimtiedTestJavaProcessBuilder` and marks the test with >> `@requires vm.flagless`. > > Mandy Chung has updated the pull request incrementally with one additional > commit since the last revision: > > review feedback Marked as reviewed by jvernee (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/1#pullrequestreview-1733180278
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref
On Mon, 13 Nov 2023 22:31:16 GMT, Brent Christian wrote: > Classes in the `java.lang.ref` package would benefit from an update to bring > the spec in line with how the VM already behaves. The changes would focus on > _happens-before_ edges at some key points during reference processing. > > A couple key things we want to be able to say are: > - `Reference.reachabilityFence(x)` _happens-before_ reference processing > occurs for 'x'. > - `Cleaner.register()` _happens-before_ the Cleaner thread runs the > registered cleaning action. > > This will bring Cleaner in line (or close) with the memory visibility > guarantees made for finalizers in [JLS > 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): > _"There is a happens-before edge from the end of a constructor of an object > to the start of a finalizer (§12.6) for that object."_ src/java.base/share/classes/java/lang/ref/Reference.java line 559: > 557: * triggering garbage collection. This method is applicable only > 558: * when reclamation may have visible effects, > 559: * such as for objects with finalizers or that use Cleaner. Should reference processing be mentioned here? - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394936289
Re: RFR: 8311302: Allow for jlinking a custom runtime without packaged modules being present [v8]
On Wed, 15 Nov 2023 21:21:07 GMT, Severin Gehwolf wrote: > Note that plugins like `--add-options` have been modified so that only the > options passed at the current CLI will propagate to the final image. So in > this case, `image1` would have property `com.foo.XYZ` set, but not `image2`. > Incidentally, what you probably intended to use was `--add-options > "-Dcom.foo.XYZ=true -Dcom.acme.name=BAR". > > Having said that, depending on the contents of `argfile`, those could be > equivalent. In fact, they are with `--unlock-run-image`, and an empty > `argfile`. `--unlock-run-image` avoids adding the marker file, which is the > only difference when we extract the image. > > `--save-jlink-argfiles` brings a strange angle to this discussion, but it's > conceivable to get a similarly different image, even with > --keep-packaged-modules. Right. This example intends to show that the behavior is not straight-forward for users to follow and also subject to the implementation of each of the plugins. I think we need an easy-to-understand model for developers to understand. Some possible options: Option 1: all plugins applied in `image1` are _auto-applied_ to `image2` by default Option 2: all plugins applied in `image1` are _not applied_ to `image2` by default When there is an exception, it should be documented clearly by the plugin (possibly in the output from `--list-plugins`). I also think option 1 may be more useful to the developers. I'm not sure how many plugins can undo the transformation done in `image1` when creating `image2`. For example, with option 1, $ jdk22/bin/jlink --add-modules jdk.compiler,jdk.jlink --output image1 --vendor-bug.url https://xyz.com/bugs --save-jlink-argfiles argfile --generate-jli-classes jli_trace.txt --strip-debug --add-options "-Dcom.foo.XYZ=true" $ image1/bin/jlink --add-modules jdk.jlink --output image2 --add-options "-Dcom.acme.name=BAR" # equivalent to: $ jdk22/bin/jlink --add-modules jdk.jlink --output image2 --vendor-bug.url https://xyz.com/bugs --save-jlink-argfiles argfile --generate-jli-classes jli_trace.txt --strip-debug --add-options "-Dcom.foo.XYZ=true -Dcom.acme.name=BAR" $ image1/bin/jlink --add-modules java.base --output image3 --vendor-bug.url https://com.acme/bugs # equivalent to: $ jdk22/bin/jlink --add-modules java.base --output image3 --generate-jli-classes jli_trace.txt --strip-debug --add-options "-Dcom.foo.XYZ=true" --vendor-bug.url https://com.acme/bugs Discussion points are: - `--save-jlink-argfiles` is only applicable when `jdk.jlink` is added to the custom module. I think this one is not an issue. - `--add-options` concatenates the options? - `-vendor-bug.url https://xyz.com/bugs --vendor-bug.url https://com.acme/bugs` last one wins? We need to go through each plugin and decide on its behavior. I'm also pondering how the Plugin API should support this run-time image based linking. - PR Comment: https://git.openjdk.org/jdk/pull/14787#issuecomment-1813370985
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref
On Mon, 13 Nov 2023 22:31:16 GMT, Brent Christian wrote: > Classes in the `java.lang.ref` package would benefit from an update to bring > the spec in line with how the VM already behaves. The changes would focus on > _happens-before_ edges at some key points during reference processing. > > A couple key things we want to be able to say are: > - `Reference.reachabilityFence(x)` _happens-before_ reference processing > occurs for 'x'. > - `Cleaner.register()` _happens-before_ the Cleaner thread runs the > registered cleaning action. > > This will bring Cleaner in line (or close) with the memory visibility > guarantees made for finalizers in [JLS > 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): > _"There is a happens-before edge from the end of a constructor of an object > to the start of a finalizer (§12.6) for that object."_ src/java.base/share/classes/java/lang/ref/Reference.java line 489: > 487: * If this reference was already enqueued (by the garbage > collector, or a > 488: * previous call to {@code enqueue}), this method is not > successful, > 489: * and returns false. If we're going to talk about successful and unsuccessful calls, we should define what success is first. I guess that would be something like: if this ref is registered with a queue and not already enqueued, it is enqueued successfully and the method returns true. Otherwise, not successful, and returns false. src/java.base/share/classes/java/lang/ref/Reference.java line 495: > 493: * href="{@docRoot}/java.base/java/util/concurrent/package-summary.html#MemoryVisibility">happen-before > 494: * the reference is removed from the queue by {@link > ReferenceQueue#poll} > 495: * or {@link ReferenceQueue#remove}. Should there be an explicit statement about an unsuccessful call having no memory consistency effects? src/java.base/share/classes/java/lang/ref/Reference.java line 505: > 503: * to return this reference even though the referent is still > strongly > 504: * reachable. That is, before the referent has reached the expected > 505: * reachability level. The wording here is kind of awkward. We could try to wordsmith it, but there is also the possibility that there are valid use cases for using enqueue(), which makes this caution seem misplaced. An alternative would simply be to omit this paragraph. - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394916147 PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394922243 PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394927265
Re: RFR: 8314480: Memory ordering spec updates in java.lang.ref
On Mon, 13 Nov 2023 22:31:16 GMT, Brent Christian wrote: > Classes in the `java.lang.ref` package would benefit from an update to bring > the spec in line with how the VM already behaves. The changes would focus on > _happens-before_ edges at some key points during reference processing. > > A couple key things we want to be able to say are: > - `Reference.reachabilityFence(x)` _happens-before_ reference processing > occurs for 'x'. > - `Cleaner.register()` _happens-before_ the Cleaner thread runs the > registered cleaning action. > > This will bring Cleaner in line (or close) with the memory visibility > guarantees made for finalizers in [JLS > 17.4.5](https://docs.oracle.com/javase/specs/jls/se18/html/jls-17.html#jls-17.4.5): > _"There is a happens-before edge from the end of a constructor of an object > to the start of a finalizer (§12.6) for that object."_ src/java.base/share/classes/java/lang/ref/Reference.java line 401: > 399: * Avoid this race by ensuring the referent remains > strongly-reachable until > 400: * after the call to clear(), using {@link > #reachabilityFence(Object)} if necessary. > 401: * The text about the potential race condition and the recommendation to use reachabilityFence() should be in an apiNote at the end of the method spec, after the following paragraph. - PR Review Comment: https://git.openjdk.org/jdk/pull/16644#discussion_r1394908730
Re: RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
On Wed, 15 Nov 2023 21:59:56 GMT, Brian Burkhalter wrote: > Clean up HTML error due to nested anchor (``) elements. Marked as reviewed by lancea (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/16679#pullrequestreview-1733121214
Re: RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
On Wed, 15 Nov 2023 21:59:56 GMT, Brian Burkhalter wrote: > Clean up HTML error due to nested anchor (``) elements. Looks good; thanks for fixing this. - Marked as reviewed by darcy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16679#pullrequestreview-1733118262
Re: RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
On Wed, 15 Nov 2023 21:59:56 GMT, Brian Burkhalter wrote: > Clean up HTML error due to nested anchor (``) elements. Looks fine - Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16679#pullrequestreview-1733111801
Re: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4]
On Wed, 15 Nov 2023 15:23:48 GMT, Raffaello Giulietti wrote: >> Roger Riggs has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update PPC implementation of string_compress to return the index of the >> non-latin1 char >> Patch supplied by TheRealMDoerr > > test/jdk/java/lang/String/StringRacyConstructor.java line 110: > >> 108: for (int i = 0; i < orig.length(); i++) >> 109: accum |= orig.charAt(i); >> 110: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; > > I think this assumes that compact strings are enabled during this test. Correct, the test should be enabled iff COMPACT_STRINGS is true. > test/jdk/java/lang/String/StringRacyConstructor.java line 190: > >> 188: if (printWarningCount == 0) { >> 189: printWarningCount = 1; >> 190: System.out.println("StringUTF16.compress returned >> 0, may not be intrinsic"); > > It seems to me that the Java code for `StringUTF16.compress` also returns the > index of the non-latin1 char, so I'm not sure I understand this. Just caution? There are separate implementations of the intrinsic for each platform. This test would help identify the platforms on which the intrinsic had not been updated to return the index instead of zero. > test/jdk/java/lang/String/StringRacyConstructor.java line 288: > >> 286: } >> 287: if (i >= 1_000_000) { >> 288: System.err.printf("Unable to produce a UTF16 string >> in %d iterations: %s%n", i, original); > > AFAIU, this writes to `System.err` on "success". Is this the intent? System.out is more appropriate for an informational message that a great many attempts had been made to produce an unexpected coder without success. > test/jdk/java/lang/String/StringRacyConstructor.java line 400: > >> 398: @Override >> 399: public int length() { >> 400: return aString.length() + 1; > > Not sure why ` + 1`. Removed; it was a leftover from a prior way to throw an exception during `CharSequence.charAt(n)`. - PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394889724 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394885799 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394892328 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394883379
Re: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v5]
> Strings, after construction, are immutable but may be constructed from > mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the > arrays during construction that might invalidate internal invariants for the > correct behavior of operations on the resulting strings. In particular, a > number of operations have optimizations for operations on pairs of latin1 > strings and pairs of non-latin1 strings, while operations between latin1 and > non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to > indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from > the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one > non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or > another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 > with the added verification that a non-latin1 character is present at the > same index. > If that character is found to be latin1, then the input array has been > modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an > existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; > that scan determines whether the latin1 or the non-latin1 representation is > returned. > > - The methods that scan for non-latin1 characters and their intrinsic > implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be > guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - Cleanup of test with review comment recommendations - Enable racy constructor tests iff COMPACT_STRINGS is true Test of string_compress intrinsic is always enabled - Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/08f365f9..b84d09db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16425=04 - incr: https://webrevs.openjdk.org/?repo=jdk=16425=03-04 Stats: 30 lines in 1 file changed: 23 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425
Re: RFE: support safely wrapping restricted FFM calls
Hi Rob, honestly I think this all looks ok (as in: I don't think you need to do anything in particular). You are writing a library that will need to create some downcall method handles and dereference some pointers. You need unsafe access, so users of your library will need to pass your module name to the --enable-native-access flag. It is true that clients of your library will be able to, indirectly (through your framework), perform unsafe operations, but _only if_ they have explictly trusted your library module on the command line. At that point they get full access to your library (and the pointers it generate). This is not dissimilar, if you think about it, to what happens with method handles. E.g. if a client A creates a method handle for a method M that is _only_ accessible from A, it is then free to publish that method handle to be used by other clients, including a client B who might not otherwise have access to M. This is fine - method handles are _capabilities_ that are granted at the moment they are obtained (via a lookup object). Your library acts in a similar way - it creates a lot of unsafe stuff which is then exposed (as safely as possible, I hope :-) ), to a client that would not otherwise have access to such unsafe capabilities. P.S. If you _really_ want to ban clients that don't have enable native access supported, I propose that your API should take a MethodHandle.Lookup, and then you could check the module of the lookup class, and throw an exception if native access is not enabled: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/invoke/MethodHandles.Lookup.html#lookupClass() https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Module.html#isNativeAccessEnabled() You can even throw if the lookup doesn't have enough access: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/invoke/MethodHandles.Lookup.html#hasFullPrivilegeAccess() Maurizio On 15/11/2023 17:13, Rob Spoor wrote: Hello all, I'm working on a module that makes working with FFM easier; think of something like JNA. For instance, it allows creating structures without having to manually manage var handles etc. My module uses restricted mehods like AddressLayout.withTargetLayout to support pointers. Those correctly give warnings if I don't use --enable-native-access. This is where I've identified a potential security risk. Native access would need to be enabled for *my* module, which would allow modules that use my module to call these restricted methods indirectly and without needing native access enabled themselves. This means that any malicious module could piggy-back on the native access that would be enabled for my module. I can implement my own access checks using the following: StackWalker.getInstance(Set.of(Option.RETAIN_CLASS_REFERENCE)) .getCallerClass() .getModule() .isNativeAccessEnabled() However, that would mean users of my module would need to provide access using two different mechanisms. I think that making some existing code public could help situations like mine: * Changing the visibility of java.lang.Module.ensureNativeAccess from package-private to public would allow me to check access using the JVM's own mechanism, in combination with the StackWalker class to get the caller (current) class and its module. Alternatively, new instance method Class.ensureNativeAccess(owner, methodName) could delegate to Reflection.ensureNativeAccess(this, owner, methodName) to make sure that a different module couldn't be used instead, or static method Class.ensureNativeAccess(currentClass, owner, methodName) could delegate to Reflection.ensureNativeAccess to support null classes. * Moving jdk.internal.javac.Restricted to java.lang.foreign would allow me to easily document that methods are restricted. There is an alternative in using --add-exports to access jdk.internal.reflect (for Reflection.ensureNativeAccess that indirectly calls Module.ensureNativeAccess) and jdk.internal.javac (for Restricted), but that adds another burden on callers. In this case, a burden that cannot be easily remedied using a manifest entry (Enable-Native-Access). Kind regards, Rob
Re: RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
On Wed, 15 Nov 2023 21:59:56 GMT, Brian Burkhalter wrote: > Clean up HTML error due to nested anchor (``) elements. HTML Tidy (`tidy -e`) output after 6cc6f8b66cf6819efd3a8c73701722ee8539257b: Info: Document content looks like HTML5 No warnings or errors were found. Generated HTML looks okay and the link to the `BigInteger` anchor from `BigDecimal` works. - PR Comment: https://git.openjdk.org/jdk/pull/16679#issuecomment-181897
RFR: 8320199: Fix HTML 5 errors in java.math.BigInteger
Clean up HTML error due to nested anchor (``) elements. - Commit messages: - 8320199: Fix HTML 5 errors in java.math.BigInteger Changes: https://git.openjdk.org/jdk/pull/16679/files Webrev: https://webrevs.openjdk.org/?repo=jdk=16679=00 Issue: https://bugs.openjdk.org/browse/JDK-8320199 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16679.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16679/head:pull/16679 PR: https://git.openjdk.org/jdk/pull/16679
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, long, float and double arrays)
On Wed, 15 Nov 2023 15:15:37 GMT, Magnus Ihse Bursie wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) and upto ~3x improvement for 64-bit datatypes >> (long, double) on Intel TigerLake machine as shown in the performance data >> below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) and upto ~2.3x for 64-bit datatypes as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> ArraysSort.longSort | 10 | 0.037 | 0.031 | 1.2 >> ArraysSort.longSort | 25 | 0.101 | 0.073 | 1.4 >> ArraysSort.longSort | 50 | 0.227 | 0.219 | 1.0 >> ArraysSort.longS... > > make/modules/java.base/Lib.gmk line 245: > >> 243: TOOLCHAIN := TOOLCHAIN_LINK_CXX, \ >> 244: OPTIMIZATION := HIGH, \ >> 245: CFLAGS := $(CFLAGS_JDKLIB) -std=c++17, \ > > This makes me uneasy. We do not in general use C++17 in the JDK. > > Is this flag needed for the code to compile? If so, would it be difficult to > rewrite it not to require C++17 constructs? > > Or was it added since you noticed performance increases, not related to the > new code, by forcing the compiler to use a higher language revision? > > We are supporting gcc versions from 6. From what I can tell, C++17 was fully > introduced in gcc 11. Increasing the lowest supported gcc to 11 would > require quite a jump, just for this library. > > In the worst case, you would need to make the existence of this library > dependent on gcc version. (It is my understanding that the library is > optional, and just produces a performance benefits if it exists). Hi Magnus, the new x86-simd-sort 4.0 needs C++17 to compile. Will look into the changes needed for this library to compile without the C++17 standard and get back to you. Thanks, Vamsi - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1394882549
Re: RFR: 8311302: Allow for jlinking a custom runtime without packaged modules being present [v8]
On Tue, 14 Nov 2023 15:21:47 GMT, Severin Gehwolf wrote: >> Please review this patch which adds a "jmodless" jlink mode to the JDK. >> Fundamentally this patch adds an option to use `jlink` even though your JDK >> install might not come with the packaged modules (directory `jmods`). This >> is particularly useful to further reduce the size of a jlinked runtime. >> After the removal of the concept of a JRE, a common distribution mechanism >> is still the full JDK with all modules and packaged modules. However, >> packaged modules can incur an additional size tax. For example in a >> container scenario it could be useful to have a base JDK container including >> all modules, but without also delivering the packaged modules. This comes at >> a size advantage of `~25%`. Such a base JDK container could then be used to >> `jlink` application specific runtimes, further reducing the size of the >> application runtime image (App + JDK runtime; as a single image *or* >> separate bundles, depending on the app being modularized). >> >> The basic design of this approach is to add a jlink plugin for tracking >> non-class and non-resource files of a JDK install. I.e. files which aren't >> present in the jimage (`lib/modules`). This enables producing a >> `JmodLessArchive` class which has all the info of what constitutes the final >> jlinked runtime. >> >> Basic usage example: >> >> $ diff -u <(./bin/java --list-modules --limit-modules java.se) >> <(../linux-x86_64-server-release/images/jdk/bin/java --list-modules >> --limit-modules java.se) >> $ diff -u <(./bin/java --list-modules --limit-modules java.se) >> <(../linux-x86_64-server-release/images/jdk/bin/java --list-modules >> --limit-modules jdk.jlink) >> $ ls ../linux-x86_64-server-release/images/jdk/jmods >> java.base.jmodjava.net.http.jmod java.sql.rowset.jmod >> jdk.crypto.ec.jmod jdk.internal.opt.jmod >> jdk.jdi.jmod jdk.management.agent.jmod jdk.security.auth.jmod >> java.compiler.jmodjava.prefs.jmod java.transaction.xa.jmod >> jdk.dynalink.jmod jdk.internal.vm.ci.jmod >> jdk.jdwp.agent.jmod jdk.management.jfr.jmodjdk.security.jgss.jmod >> java.datatransfer.jmodjava.rmi.jmodjava.xml.crypto.jmod >> jdk.editpad.jmod jdk.internal.vm.compiler.jmod >> jdk.jfr.jmod jdk.management.jmodjdk.unsupported.desktop.jmod >> java.desktop.jmod java.scripting.jmod java.xml.jmod >> jdk.hotspot.agent.jmod jdk.internal.vm.compiler.management.jmod >> jdk.jlink.jmod jdk.naming.dns.j... > > Severin Gehwolf has updated the pull request incrementally with four > additional commits since the last revision: > > - First pass at 'run-image' => 'run-time image' > - Remove 'Please double check!' phrase. > - Don't show --add-run-image-resources plugin in listing > - Rename resource file to jdk_internal_runimage Thanks. > How can the user know what plugins are applied to `image2`? i.e. what is the > jlink command to produce `image2` if running from `jdk22` with the packaged > modules present? The only way to know is by knowing the chain of `jlink` commands yielding up to the final image. Let `jlink'` be the jlink using packaged modules. Currently this can be at most two. In addition, the contents of `argfile` needs to be known. That doesn't seem to be very different to the status quo, though. See below. > Reading the changes, I'm not sure but I think it's not equivalent to: > > ``` > jdk22/bin/jlink --add-modules jdk.jlink --output image2 --vendor-bug.url > https://xyz.com/bugs --save-jlink-argfiles argfile --generate-jli-classes > jli_trace.txt --strip-debug --add-options "-Dcom.foo.XYZ=true" --add-options > "-Dcom.acme.name=BAR" > ``` Note that plugins like `--add-options` have been modified so that only the options passed at the current CLI will propagate to the final image. So in this case, `image1` would have property `com.foo.XYZ` set, but not `image2`. Incidentally, what you probably intended to use was `--add-options "-Dcom.foo.XYZ=true -Dcom.acme.name=BAR". Having said that, depending on the contents of `argfile`, those could be equivalent. In fact, they are with `--unlock-run-image`, and an empty `argfile`. `--unlock-run-image` avoids adding the marker file, which is the only difference when we extract the image. `--save-jlink-argfiles` brings a strange angle to this discussion, but it's conceivable to get a similarly different image, even with --keep-packaged-modules. Consider: echo '--add-options=-XX:+UseParallelGC' > argfile ./jdk22/bin/jlink --save-jlink-argfiles argfile --strip-debug --add-modules jdk.jlink --keep-packaged-modules ./image_a/jmods --output image_a ./image_a/bin/jlink --add-modules java.base --output image_b It's not very apparent that `image_b` will have `-XX:+UseParallelGC`. So you need some
Re: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2]
On Wed, 23 Aug 2023 15:18:03 GMT, Matthias Baesken wrote: >> Currently there is a number of functionality that would be interesting to >> have for shared lib load operations in the JDK C code. >> Some examples : >> Events::log_dll_message for hs-err files reporting >> JFR event NativeLibraryLoad >> There is the need to update the shared lib Cache on AIX ( see >> LoadedLibraries::reload() , see also >> https://bugs.openjdk.org/browse/JDK-8314152 ), >> this is currently not fully in sync with libs loaded form jdk c-libs and >> sometimes reports outdated information >> >> Offer an interface (e.g. jvm.cpp) to support this. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > windows aarch64 build issues src/java.desktop/unix/native/common/awt/fontpath.c line 53: > 51: /* for dlopen */ > 52: #include > 53: I have no idea why we would want to use some VM internal in the desktop module. Please take the desktop module changes out of this PR. - PR Review Comment: https://git.openjdk.org/jdk/pull/15264#discussion_r1394756424
Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v6]
On Wed, 15 Nov 2023 20:13:15 GMT, Eirik Bjorsnos wrote: >> src/java.base/share/classes/java/util/zip/ZipInputStream.java line 581: >> >>> 579: if ((flag & 8) == 8) { >>> 580: /* "Data Descriptor" present */ >>> 581: if (hasZip64Extra(e) || >> >> You probably want to consider updating `readLOC` to make sure the extralen >> is != 0 if the appropriate fields are set to either 0x or 0x or >> update `hasZip64Extra` to do the validation > > I think I prefer keeping this PR maintaining a strict focus on expanding the > set of readable files to include those that use Zip64 extra fields for < 2GB > entries with data descriptors. > > Would you be ok with that? > > Adding validation to `readLOC` is a fair effort, but I would prefer this to > be done in a separate PR, similar to your work on adding Zip64 validation to > ZipFile. > > I wouldn't mind looking into that, but perhaps you would like to handle it, > given your comment above about spending some time on `ZipInputStream` in the > following days? A follow on PR is fine. Why don't you take that on seeing you already have your sleeves rolled up in this area :-) - PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394744061
Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v6]
On Tue, 14 Nov 2023 11:49:11 GMT, Lance Andersen wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Add a @bug reference in the test > > src/java.base/share/classes/java/util/zip/ZipInputStream.java line 581: > >> 579: if ((flag & 8) == 8) { >> 580: /* "Data Descriptor" present */ >> 581: if (hasZip64Extra(e) || > > You probably want to consider updating `readLOC` to make sure the extralen is > != 0 if the appropriate fields are set to either 0x or 0x or > update `hasZip64Extra` to do the validation I think I prefer keeping this PR maintaining a strict focus on expanding the set of readable files to include those that use Zip64 extra fields for < 2GB entries with data descriptors. Would you be ok with that? Adding validation to `readLOC` is a fair effort, but I would prefer this to be done in a separate PR, similar to your work on adding Zip64 validation to ZipFile. I wouldn't mind looking into that, but perhaps you would like to handle it, given your comment above about spending some time on `ZipInputStream` in the following days? > src/java.base/share/classes/java/util/zip/ZipInputStream.java line 689: > >> 687: return switch (blockSize) { >> 688: case 8, 16 -> true; >> 689: default -> false; > > Also from 4.5.3: > >> This entry in the Local header MUST include BOTH original and compressed >> file size fields > > So I believe the minimum value is 16 given both fields must be present So it seems 16 is the only valid size for a Zip64 block in a LOC header. Updated the code to reflect that and also added a comment referencing that original and uncompressed sizes are both required in a LOC header. - PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394742755 PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394743998
Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v6]
On Mon, 13 Nov 2023 19:02:28 GMT, Lance Andersen wrote: >> Eirik Bjorsnos has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Add a @bug reference in the test > > test/jdk/java/util/zip/ZipInputStream/Zip64DataDescriptor.java line 57: > >> 55: public void setup() { >> 56: /* >> 57: * Structure of the ZIP64 file used below . Note the precense > > typo: **precense** Thanks, fixed. > test/jdk/java/util/zip/ZipInputStream/Zip64DataDescriptor.java line 63: > >> 61: * The file was produced using the following command: >> 62: * echo hello | zip -fd > hello.zip >> 63: * > > Please document which zip command(and options) is being used by the above Documented the zip command used (zip 3.0, by Info-ZIP), the use of stdin/streaming to enable Zip64, and the use of -fd to force the use of data descriptors. > test/jdk/java/util/zip/ZipInputStream/Zip64DataDescriptor.java line 149: > >> 147: */ >> 148: private void setExtraSize(short size) { >> 149: int extSizeOffset = 33; > > I would suggest making this a constant. Either way I would like to have a > comment added indicating that the value represents of offset of the extra > length size in the LOC Header for `zip64File` used by the test Extracted the constant `ZIP64_BLOCK_SIZE_OFFSET` with a comment - PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394736922 PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394738124 PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1394737247
Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v8]
> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the > number of compressed or uncompressed bytes read from the inflater is larger > than the Zip64 magic value. > > While the ZIP format mandates that the data descriptor `SHOULD be stored in > ZIP64 format (as 8 byte values) when a file's size exceeds 0x`, it > also states that `ZIP64 format MAY be used regardless of the size of a file`. > For such small entries, the above assumption does not hold. > > This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the > ZipEntry includes a Zip64 extra information field. This brings ZipInputStream > into alignment with the APPNOTE format spec: > > > When extracting, if the zip64 extended information extra > field is present for the file the compressed and > uncompressed sizes will be 8 byte values. > > > While small Zip64 files with 8-byte data descriptors are not commonly found > in the wild, it is possible to create one using the Info-ZIP command line > `-fd` flag: > > `echo hello | zip -fd > hello.zip` > > The PR also adds a test verifying that such a small Zip64 file can be parsed > by ZipInputStream. Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision: Extract ZIP64_BLOCK_SIZE_OFFSET as a constant - Changes: - all: https://git.openjdk.org/jdk/pull/12524/files - new: https://git.openjdk.org/jdk/pull/12524/files/a77147bc..b6d0a0ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12524=07 - incr: https://webrevs.openjdk.org/?repo=jdk=12524=06-07 Stats: 5 lines in 1 file changed: 3 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12524.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12524/head:pull/12524 PR: https://git.openjdk.org/jdk/pull/12524
Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v7]
> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the > number of compressed or uncompressed bytes read from the inflater is larger > than the Zip64 magic value. > > While the ZIP format mandates that the data descriptor `SHOULD be stored in > ZIP64 format (as 8 byte values) when a file's size exceeds 0x`, it > also states that `ZIP64 format MAY be used regardless of the size of a file`. > For such small entries, the above assumption does not hold. > > This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the > ZipEntry includes a Zip64 extra information field. This brings ZipInputStream > into alignment with the APPNOTE format spec: > > > When extracting, if the zip64 extended information extra > field is present for the file the compressed and > uncompressed sizes will be 8 byte values. > > > While small Zip64 files with 8-byte data descriptors are not commonly found > in the wild, it is possible to create one using the Info-ZIP command line > `-fd` flag: > > `echo hello | zip -fd > hello.zip` > > The PR also adds a test verifying that such a small Zip64 file can be parsed > by ZipInputStream. Eirik Bjorsnos has updated the pull request incrementally with three additional commits since the last revision: - A Zip64 extra field used in a LOC header must include both the uncompressed and compressed size fields, and does not include local header offset or disk start number fields. Conequently, a valid LOC Zip64 block must always be 16 bytes long. - Document better the zip command and options used to generate the test vector ZIP - Fix spelling of "presence" - Changes: - all: https://git.openjdk.org/jdk/pull/12524/files - new: https://git.openjdk.org/jdk/pull/12524/files/7601b8dd..a77147bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12524=06 - incr: https://webrevs.openjdk.org/?repo=jdk=12524=05-06 Stats: 15 lines in 2 files changed: 8 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12524.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12524/head:pull/12524 PR: https://git.openjdk.org/jdk/pull/12524
Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v11]
On Wed, 15 Nov 2023 19:26:06 GMT, Eirik Bjorsnos wrote: >> Please review this PR which speeds up TestTooManyEntries and clarifies its >> purpose: >> >> - The name 'TestTooManyEntries' does not clearly convey the purpose of the >> test. What is tested is the validation that the total CEN size fits in a >> Java byte array. Suggested rename: CenSizeTooLarge >> - The test creates DEFLATED entries which incurs zlib costs and File Data / >> Data Descriptors for no additional benefit. We can use STORED instead. >> - By creating a single LocalDateTime and setting it with >> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. >> - The name of entries is generated by calling UUID.randomUUID, we could use >> simple counter instead. >> - The produced file is unnecessarily large. We know how large a CEN entry >> is, let's take advantage of that to create a file with the minimal size. >> - By adding a maximally large extra field to the CEN entries, we get away >> with fewer CEN records and save memory >> - The summary and comments of the test can be improved to help explain the >> purpose of the test and how we reach the limit being tested. >> - By writing sparse 'holes' until the last CEN entry, we can reduce required >> disk space. >> >> These speedups reduced the runtime from 4 min 17 sec to 3 seconds on my >> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to ~4K. Memory >> consumption is down from 8GB to something like 12MB. > > Eirik Bjorsnos has updated the pull request incrementally with two additional > commits since the last revision: > > - Fix spelling of LocalDateTime > - Add comment to the SparseOutputStream class noting that instances must be > passed directly to the ZipOutputStream constructor, without buffering. Marked as reviewed by lancea (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/12991#pullrequestreview-1732831434
Re: RFR: 8319928: Exceptions thrown by cleanup actions should be handled correctly
On Mon, 13 Nov 2023 11:08:13 GMT, Maurizio Cimadamore wrote: > Silently ignoring exceptions seems bad, but something that is forced when > using a Cleaner I also don't like that Cleaner ignores exceptions. One idea I have: [8305979 : UncaughtExceptionHandler for Cleaner ](https://bugs.openjdk.org/browse/JDK-8305979). - PR Comment: https://git.openjdk.org/jdk/pull/16619#issuecomment-1813163393
Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v11]
> Please review this PR which speeds up TestTooManyEntries and clarifies its > purpose: > > - The name 'TestTooManyEntries' does not clearly convey the purpose of the > test. What is tested is the validation that the total CEN size fits in a Java > byte array. Suggested rename: CenSizeTooLarge > - The test creates DEFLATED entries which incurs zlib costs and File Data / > Data Descriptors for no additional benefit. We can use STORED instead. > - By creating a single LocalDateTime and setting it with > `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations. > - The name of entries is generated by calling UUID.randomUUID, we could use > simple counter instead. > - The produced file is unnecessarily large. We know how large a CEN entry is, > let's take advantage of that to create a file with the minimal size. > - By adding a maximally large extra field to the CEN entries, we get away > with fewer CEN records and save memory > - The summary and comments of the test can be improved to help explain the > purpose of the test and how we reach the limit being tested. > - By writing sparse 'holes' until the last CEN entry, we can reduce required > disk space. > > These speedups reduced the runtime from 4 min 17 sec to 3 seconds on my > Macbook Pro. The produced ZIP size was reduced from 5.7 GB to ~4K. Memory > consumption is down from 8GB to something like 12MB. Eirik Bjorsnos has updated the pull request incrementally with two additional commits since the last revision: - Fix spelling of LocalDateTime - Add comment to the SparseOutputStream class noting that instances must be passed directly to the ZipOutputStream constructor, without buffering. - Changes: - all: https://git.openjdk.org/jdk/pull/12991/files - new: https://git.openjdk.org/jdk/pull/12991/files/c3d0069f..16a3c733 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12991=10 - incr: https://webrevs.openjdk.org/?repo=jdk=12991=09-10 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12991.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12991/head:pull/12991 PR: https://git.openjdk.org/jdk/pull/12991
Re: RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v10]
On Wed, 15 Nov 2023 07:41:32 GMT, Jaikiran Pai wrote: > Overall, this is a very good improvement to the test and looks good to me. I > just a have a trivial comment about a typo in a code comment, which I've > added inline. Thanks for your review, Jaikiran! With these latest, comment-only changes, this PR is now looking for a sponsor. > test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java line 78: > >> 76: public static final int NAME_LENGTH = 10; >> 77: >> 78: // Use a shared LocalDataTime on all entries to save processing time > > Hello Eirik, there appears to be a typo in the comment here. It should have > been `LocalDateTime`. Fixed > test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java line 204: > >> 202: channel.position(position); >> 203: // Check for last CEN record >> 204: if (Arrays.equals(LAST_CEN_COMMENT_BYTES, 0, >> LAST_CEN_COMMENT_BYTES.length, b, off, len)) { > > The way the instance of a `SparseOutputStream` is used in this test, it gets > passed to the constructor of `ZipOutputStream`. That then means that there is > no buffering involved when bytes are written out to the output stream > internally by `ZipOutputStream`. The implementation of `ZipOutputStream` > writes out the `ZipEntry`'s comment (if any) in one single `write(byte[])` > call on the outputstream, so it's guaranteed that the `byte[] b` passed in > here will actually have the entire comment (from `off` to `len`). So this > `Arrays.equals(...)` check is then guaranteed to pass (when that specific > entry does get written out). So this check looks good to me. Thanks, this was a nice observation. I added a short comment to the SparseOutputStream noting the instances should be passed directly, without any buffering. - PR Comment: https://git.openjdk.org/jdk/pull/12991#issuecomment-1813123375 PR Review Comment: https://git.openjdk.org/jdk/pull/12991#discussion_r1394673748 PR Review Comment: https://git.openjdk.org/jdk/pull/12991#discussion_r1394674564
Integrated: 8319986: Invalid/inconsistent description and example for DateFormat
On Mon, 13 Nov 2023 23:35:11 GMT, Naoto Sato wrote: > Correcting the explanation of the `DateFormat.SHORT` constant. This pull request has now been integrated. Changeset: 891d8cfa Author:Naoto Sato URL: https://git.openjdk.org/jdk/commit/891d8cfaf2fc0636bfe8f864cd010fb71266d723 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod 8319986: Invalid/inconsistent description and example for DateFormat Reviewed-by: joehw, rriggs, jlu, iris, lancea - PR: https://git.openjdk.org/jdk/pull/16645
Re: RFR: JDK-8319626: Override toString() for ZipFile [v2]
> Please review this PR and [CSR](https://bugs.openjdk.org/browse/JDK-8319982) > which overrides and provides an implementation of `toString()` in > _java.util.zip.ZipFile_ (and by extension, _java.util.jar.JarFile_). > > This change is primarily to provide a more informative String representation > of the two classes rather than the default hexadecimal representation of the > hash code. Justin Lu has updated the pull request incrementally with one additional commit since the last revision: reflect review: change string value and drop spec - Changes: - all: https://git.openjdk.org/jdk/pull/16643/files - new: https://git.openjdk.org/jdk/pull/16643/files/bc3cf3e1..c75029cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16643=01 - incr: https://webrevs.openjdk.org/?repo=jdk=16643=00-01 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16643.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16643/head:pull/16643 PR: https://git.openjdk.org/jdk/pull/16643
Re: RFR: JDK-8319626: Override toString() for ZipFile [v2]
On Tue, 14 Nov 2023 07:13:41 GMT, Alan Bateman wrote: >> Justin Lu has updated the pull request incrementally with one additional >> commit since the last revision: >> >> reflect review: change string value and drop spec > > src/java.base/share/classes/java/util/zip/ZipFile.java line 498: > >> 496: */ >> 497: @Override >> 498: public String toString() { > > I don't think the file name on its own is very helpful as it may not be > unique or there may be several instances of ZipFile that are backed by the > same zip file. Can you try `"" + file + > Integer.toHexString(System.identityHashCode(this))` ? Or, if you really want > to hide the file path in the String representation, then just use > file.getName() + identity string. No need for baseName field. I think we need > to be cautious about specifying anything, otherwise code will rely on it. Hi Alan, Thanks for taking a look. I updated the toString() value to the one you suggested, and also dropped the specific aspects of the specification. I am not sure if you have a preference one way or another regarding providing the full path versus just the file name, but I can switch the full path for just the file name if need be. - PR Review Comment: https://git.openjdk.org/jdk/pull/16643#discussion_r1394622024
Withdrawn: 8311085: Remove implementation detail writeTo from LocalVariable(Type)
On Thu, 29 Jun 2023 09:59:08 GMT, Chen Liang wrote: > `LocalVariable` and `LocalVariableType` includes `writeTo(BufWriter)`, which > should be implementation details. > > See > https://mail.openjdk.org/pipermail/classfile-api-dev/2023-June/000381.html > for context. > > This patch moves the implementation to `DirectCodeBuilder`'s original use > sites; the old `b.canWriteDirect` branch is redundant, as `writeIndex`'s > implementation already performs such an optimization. This pull request has been closed without being integrated. - PR: https://git.openjdk.org/jdk/pull/14705
Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v9]
> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) Viktor Klang has updated the pull request incrementally with one additional commit since the last revision: Improvements after feedback - Changes: - all: https://git.openjdk.org/jdk/pull/16420/files - new: https://git.openjdk.org/jdk/pull/16420/files/bc8f2364..ba84335e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16420=08 - incr: https://webrevs.openjdk.org/?repo=jdk=16420=07-08 Stats: 57 lines in 2 files changed: 17 ins; 18 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/16420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16420/head:pull/16420 PR: https://git.openjdk.org/jdk/pull/16420
Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v8]
On Wed, 15 Nov 2023 16:40:53 GMT, Alan Bateman wrote: >> Viktor Klang has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Unpacking the rethrown exceptions from Gatherers.mapConcurrent() > > test/jdk/java/util/stream/GathererAPITest.java line 77: > >> 75: return; >> 76: } >> 77: fail("Expected NullPointerException but wasn't thrown!"); > > If you are updating this test then you could use > assertThrows(UnsupportedOperationException.class, supplier::get) here, same > in assertThrowsNPE. It's just a minor thing as I'm reading through some of > the tests. That's a good point. Earlier these utilities did unwrapping to type-test a cause, but how it's written now it can definitely be replaced with an assertThrows. - PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1394527319
RFE: support safely wrapping restricted FFM calls
Hello all, I'm working on a module that makes working with FFM easier; think of something like JNA. For instance, it allows creating structures without having to manually manage var handles etc. My module uses restricted mehods like AddressLayout.withTargetLayout to support pointers. Those correctly give warnings if I don't use --enable-native-access. This is where I've identified a potential security risk. Native access would need to be enabled for *my* module, which would allow modules that use my module to call these restricted methods indirectly and without needing native access enabled themselves. This means that any malicious module could piggy-back on the native access that would be enabled for my module. I can implement my own access checks using the following: StackWalker.getInstance(Set.of(Option.RETAIN_CLASS_REFERENCE)) .getCallerClass() .getModule() .isNativeAccessEnabled() However, that would mean users of my module would need to provide access using two different mechanisms. I think that making some existing code public could help situations like mine: * Changing the visibility of java.lang.Module.ensureNativeAccess from package-private to public would allow me to check access using the JVM's own mechanism, in combination with the StackWalker class to get the caller (current) class and its module. Alternatively, new instance method Class.ensureNativeAccess(owner, methodName) could delegate to Reflection.ensureNativeAccess(this, owner, methodName) to make sure that a different module couldn't be used instead, or static method Class.ensureNativeAccess(currentClass, owner, methodName) could delegate to Reflection.ensureNativeAccess to support null classes. * Moving jdk.internal.javac.Restricted to java.lang.foreign would allow me to easily document that methods are restricted. There is an alternative in using --add-exports to access jdk.internal.reflect (for Reflection.ensureNativeAccess that indirectly calls Module.ensureNativeAccess) and jdk.internal.javac (for Restricted), but that adds another burden on callers. In this case, a burden that cannot be easily remedied using a manifest entry (Enable-Native-Access). Kind regards, Rob
Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy
On Wed, 15 Nov 2023 01:44:56 GMT, Jatin Bhateja wrote: >> Thanks, there is an store fence upon completion of the main loop for the >> large size code: >> >> ![image](https://github.com/openjdk/jdk/assets/3858882/3bcea3c6-3bda-458c-aa7c-29ed6010cde2) > > Do you see any concerns while handling multithreaded case where writer is > busy copying 256 bytes block in loop and reader try to access a location > still not flushed out of write combining buffer. The results a concurrent reader sees could be different if the copy is using nt writes, but if the read of the destination is not synced with the copy operation, I think the reader would not see consistent state in either case. Is it worse with nt writes? - PR Review Comment: https://git.openjdk.org/jdk/pull/16575#discussion_r1394500454
Re: RFR: 8319123: Implement JEP 461: Stream Gatherers (Preview) [v8]
On Tue, 14 Nov 2023 16:35:53 GMT, Viktor Klang wrote: >> This Pull-Request implements [JEP-461](https://openjdk.org/jeps/461) > > Viktor Klang has updated the pull request incrementally with one additional > commit since the last revision: > > Unpacking the rethrown exceptions from Gatherers.mapConcurrent() test/jdk/java/util/stream/GathererAPITest.java line 77: > 75: return; > 76: } > 77: fail("Expected NullPointerException but wasn't thrown!"); If you are updating this test then you could use assertThrows(UnsupportedOperationException.class, supplier::get) here, same in assertThrowsNPE. It's just a minor thing as I'm reading through some of the tests. - PR Review Comment: https://git.openjdk.org/jdk/pull/16420#discussion_r1394472880
Re: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4]
On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from >> mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the >> arrays during construction that might invalidate internal invariants for the >> correct behavior of operations on the resulting strings. In particular, a >> number of operations have optimizations for operations on pairs of latin1 >> strings and pairs of non-latin1 strings, while operations between latin1 and >> non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to >> indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled >> from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least >> one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or >> another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as >> non-latin1 with the added verification that a non-latin1 character is >> present at the same index. >> If that character is found to be latin1, then the input array has been >> modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an >> existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; >> that scan determines whether the latin1 or the non-latin1 representation is >> returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic >> implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be >> guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional > commit since the last revision: > > Update PPC implementation of string_compress to return the index of the > non-latin1 char > Patch supplied by TheRealMDoerr test/jdk/java/lang/String/StringRacyConstructor.java line 110: > 108: for (int i = 0; i < orig.length(); i++) > 109: accum |= orig.charAt(i); > 110: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; I think this assumes that compact strings are enabled during this test. test/jdk/java/lang/String/StringRacyConstructor.java line 119: > 117: for (int i = 0; i < orig.length(); i++) > 118: accum |= orig.charAt(i); > 119: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; Same as above. test/jdk/java/lang/String/StringRacyConstructor.java line 190: > 188: if (printWarningCount == 0) { > 189: printWarningCount = 1; > 190: System.out.println("StringUTF16.compress returned 0, > may not be intrinsic"); It seems to me that the Java code for `StringUTF16.compress` also returns the index of the non-latin1 char, so I'm not sure I understand this. Just caution? test/jdk/java/lang/String/StringRacyConstructor.java line 199: > 197: // Exhaustively check compress returning the correct index of > the non-latin1 char. > 198: final int SIZE = 48; > 199: final byte FILL_BYTE = 0x52; Suggestion: final byte FILL_BYTE = 'R'; This makes it more clear that `FILL_BYTE != 'A'` for the logic below. test/jdk/java/lang/String/StringRacyConstructor.java line 258: > 256: */ > 257: public static String racyStringConstruction(String original) throws > ConcurrentModificationException { > 258: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 288: > 286: } > 287: if (i >= 1_000_000) { > 288: System.err.printf("Unable to produce a UTF16 string > in %d iterations: %s%n", i, original); AFAIU, this writes to `System.err` on "success". Is this the intent? test/jdk/java/lang/String/StringRacyConstructor.java line 300: > 298: */ > 299: public static String racyStringConstructionCodepoints(String > original) throws ConcurrentModificationException { > 300: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 347: > 345: */ > 346: public static String > racyStringConstructionCodepointsSurrogates(String original) throws > ConcurrentModificationException { > 347:
Re: RFR: JDK-8315458 Implement JEP 463: Implicitly Declared Classes and Instance Main Method (Second Preview) [v19]
On Tue, 14 Nov 2023 13:35:46 GMT, Jim Laskey wrote: >> Address changes from JEP 445 to JEP 463. >> >> - Move from a SYNTHETIC unnamed class to a MANDATED implicit class. >> >> - Don't mark class on read. >> >> - Remove reflection and annotation processing related to unnamed classes. >> >> - Simplify main method search. > > Jim Laskey has updated the pull request incrementally with one additional > commit since the last revision: > > Filter abstract main methods and search interfaces for default main methods. Look at the spec https://bugs.openjdk.org/browse/JDK-8319252 under 7.3 Compilation Units. - It is not abstract (8.1.1.1 ⇗). - It is final (8.1.1.2 ⇗). - It is a member of an unnamed package (7.4.2 ⇗) and **has package access**. Its direct superclass is Object (8.1.4 ⇗). - It does not have any direct superinterfaces (8.1.5 ⇗). - The body of the class contains every ClassMemberDeclaration (these are declarations of fields (8.3 ⇗), methods (8.4 ⇗), member classes (8.5 ⇗), and member interfaces (9.1.1.3 ⇗)) from the simple compilation unit. It is not possible for a simple compilation unit to declare an instance initializer (8.6 ⇗), static initializer (8.7 ⇗), or constructor (8.8 ⇗). -It has an implicitly declared default constructor (8.8.9 ⇗). - PR Comment: https://git.openjdk.org/jdk/pull/16461#issuecomment-1812796001
Re: RFR: 8318364: Add an FFM-based implementation of harfbuzz OpenType layout [v4]
On Wed, 25 Oct 2023 23:42:08 GMT, Phil Race wrote: >> 8318364: Add an FFM-based implementation of harfbuzz OpenType layout > > Phil Race has updated the pull request incrementally with one additional > commit since the last revision: > > indentation src/java.desktop/share/classes/sun/font/HBShaper.java line 66: > 64: * }; > 65: */ > 66:private static final UnionLayout VarIntLayout = > MemoryLayout.unionLayout( Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 84: > 82: * }; > 83: */ > 84: private static final StructLayout PositionLayout = > MemoryLayout.structLayout( Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 142: > 140: private static final MemorySegment get_contour_pt_stub; > 141: > 142: private static final MemorySegment store_layout_results_stub; I see this Upcall is named `store_layout_results` in case of FFM and we call it similar function in JNI case as `storeGVData`. If we can find some common name and use it will be beneficial in future when we want to compare code between FFM and JNI implementation. src/java.desktop/share/classes/sun/font/HBShaper.java line 144: > 142: private static final MemorySegment store_layout_results_stub; > 143: > 144:private static FunctionDescriptor Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 153: > 151:} > 152: > 153:private static MethodHandle getMethodHandle Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 165: > 163:} > 164: > 165:static { Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 212: > 210: jdk_hb_shape_handle = tmp4; > 211: > 212: Arena garena = Arena.global(); // creating stubs that exist > until VM exit. Variable name `gArena`(for global arena) is better than using `garena`. src/java.desktop/share/classes/sun/font/HBShaper.java line 347: > 345: glyphIDPtr.setAtIndex(JAVA_INT, 0, glyphID); > 346: return (glyphID != 0) ? 1 : 0; > 347: } Indentation issue. src/java.desktop/share/classes/sun/font/HBShaper.java line 659: > 657: startPt.y = advY; > 658: startPt.x = advX; > 659: } Indentation issue. src/java.desktop/share/classes/sun/font/SunLayoutEngine.java line 193: > 191: } else { > 192:long pFace = getFacePtr(font); > 193: shape(font, strike, ptSize, mat, pFace, Is it okay to not have `(pFace != 0)` check here? - PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394338584 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394339310 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394403920 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394339849 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394340526 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394341308 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394370065 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394342419 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394345302 PR Review Comment: https://git.openjdk.org/jdk/pull/15476#discussion_r1394293925
Re: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4]
On Wed, 15 Nov 2023 15:32:54 GMT, Claes Redestad wrote: >> Roger Riggs has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update PPC implementation of string_compress to return the index of the >> non-latin1 char >> Patch supplied by TheRealMDoerr > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8617: > >> 8615: lea(dst, Address(dst, tmp5, Address::times_1)); >> 8616: subptr(len, tmp5); >> 8617: jmpb(copy_chars_loop); > > This cause a crash if I run with `-XX:UseAVX=3 -XX:AVX3Threshold=0`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (macroAssembler_x86.hpp:122), pid=3400452, tid=3400470 > # guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit > offset at :0 > # > > > Needs to be a `jmp(copy_chars_loop)`. Alternatively: if (UseSSE42Intrinsics) { jmpb(copy_chars_loop); } else { jmp(copy_chars_loop); } More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive, though. Especially since we shouldn't mix AVX and SSE code (the code in this intrinsic seem to follow paths which are either/or, but it seems fragile). Perhaps @TobiHartmann can advise? - PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394386963
Re: RFR: JDK-8315458 Implement JEP 463: Implicitly Declared Classes and Instance Main Method (Second Preview) [v17]
On Sat, 11 Nov 2023 15:57:40 GMT, Jim Laskey wrote: >> Jim Laskey has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Clean up previous commit > > Things should be fine now. @JimLaskey, I read JEP 463 but couldn't find any statements on access level of an implicit class. `javap` shows that such a class has package access (as I assume it should). Here's the most relevant find in JEP, but it's not about the class. It's about class' members: > An implicit class is almost exactly like an explicitly declared class. Its > members can have the same modifiers (e.g., private and static) and the > modifiers have the same defaults (e.g., package access and instance > membership). One key difference is that while an implicit class has a default > zero-parameter constructor, it can have no other constructor. - PR Comment: https://git.openjdk.org/jdk/pull/16461#issuecomment-1812771590
Re: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4]
On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from >> mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the >> arrays during construction that might invalidate internal invariants for the >> correct behavior of operations on the resulting strings. In particular, a >> number of operations have optimizations for operations on pairs of latin1 >> strings and pairs of non-latin1 strings, while operations between latin1 and >> non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to >> indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled >> from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least >> one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or >> another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as >> non-latin1 with the added verification that a non-latin1 character is >> present at the same index. >> If that character is found to be latin1, then the input array has been >> modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an >> existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; >> that scan determines whether the latin1 or the non-latin1 representation is >> returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic >> implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be >> guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional > commit since the last revision: > > Update PPC implementation of string_compress to return the index of the > non-latin1 char > Patch supplied by TheRealMDoerr src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8617: > 8615: lea(dst, Address(dst, tmp5, Address::times_1)); > 8616: subptr(len, tmp5); > 8617: jmpb(copy_chars_loop); This cause a crash if I run with `-XX:UseAVX=3 -XX:AVX3Threshold=0`: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (macroAssembler_x86.hpp:122), pid=3400452, tid=3400470 # guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset at :0 # Needs to be a `jmp(copy_chars_loop)`. - PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394375402
Re: RFR: 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts
On Wed, 15 Nov 2023 07:48:28 GMT, Eric Liu wrote: > Vector API defines zero-extend operations [1], which are going to be > intrinsified and generated to `VectorUCastNode` by C2. This patch adds > backend implementation for `VectorUCastNode` on AArch64. > > The micro benchmark shows significant performance improvement. In my test > machine (SVE, 256-bit), the result is shown as below: > > > > Benchmark Before After Units Gain > VectorZeroExtend.byte2Int 3168.251 243012.399 ops/ms 75.70 > VectorZeroExtend.byte2Long3212.201 216291.588 ops/ms 66.33 > VectorZeroExtend.byte2Short 3391.968 182655.365 ops/ms 52.85 > VectorZeroExtend.int2Long 1012.19780448.553 ops/ms 78.48 > VectorZeroExtend.short2Int1812.471 153416.828 ops/ms 83.65 > VectorZeroExtend.short2Long 1788.382 129794.814 ops/ms 71.58 > > > On other Neon systems, we can get similar performance boost as a result of > intrinsification success. > > Since `VectorUCastNode` only used in Vector API's zero extension currently, > this patch also adds assertion on nodes' definitions to clarify their usages. > > [TEST] > compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines. > > [1] > https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726 src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 2322: > 2320: ins_pipe(pipe_slow); > 2321: %} > 2322: The following hunk does not seem to be making good use of the macro processor. - PR Review Comment: https://git.openjdk.org/jdk/pull/16670#discussion_r1394363082
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, long, float and double arrays)
On Tue, 7 Nov 2023 00:12:41 GMT, Srinivas Vamsi Parasa wrote: > The goal is to develop faster sort routines for x86_64 CPUs by taking > advantage of AVX2 instructions. This enhancement provides an order of > magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > For serial sort on random data, this PR shows upto ~7.5x improvement for > 32-bit datatypes (int, float) and upto ~3x improvement for 64-bit datatypes > (long, double) on Intel TigerLake machine as shown in the performance data > below. > > For parallel sort on random data, this PR shows upto ~3.4x for 32-bit > datatypes (int, float) and upto ~2.3x for 64-bit datatypes as shown below. > > **Note:** This PR also improves the performance of AVX512 sort by upto 35%. > > xmlns:o="urn:schemas-microsoft-com:office:office" > xmlns:x="urn:schemas-microsoft-com:office:excel" > xmlns="http://www.w3.org/TR/REC-html40;> > > > > > > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> > > > > > > > > > Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | > Speedup > -- | -- | -- | -- | -- > ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 > ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 > ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 > ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 > ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 > ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 > ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 > ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 > ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 > ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 > ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 > ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 > ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 > ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 > ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 > ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 > ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 > ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 > ArraysSort.longSort | 10 | 0.037 | 0.031 | 1.2 > ArraysSort.longSort | 25 | 0.101 | 0.073 | 1.4 > ArraysSort.longSort | 50 | 0.227 | 0.219 | 1.0 > ArraysSort.longSort | 75 | 0.446 | 0.332 | 1.3 > ArraysSort.longSort | 100 | 0.714 | 0.557 | 1.3 > ArraysSort.longSort | 1000 ... make/modules/java.base/Lib.gmk line 245: > 243: TOOLCHAIN := TOOLCHAIN_LINK_CXX, \ > 244: OPTIMIZATION := HIGH, \ > 245: CFLAGS := $(CFLAGS_JDKLIB) -std=c++17, \ This makes me uneasy. We do not in general use C++17 in the JDK. Is this flag needed for the code to compile? If so, would it be difficult to rewrite it not to require C++17 constructs? Or was it added since you noticed performance increases, not related to the new code, by forcing the compiler to use a higher language revision? We are supporting gcc versions from 6. From what I can tell, C++17 was fully introduced in gcc 11. Increasing the lowest supported gcc to 11 would require quite a jump, just for this library. In the worst case, you would need to make the existence of this library dependent on gcc version. (It is my understanding that the library is optional, and just produces a performance benefits if it exists). - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1394350837
Integrated: 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882
On Wed, 15 Nov 2023 08:18:31 GMT, Per Minborg wrote: > This PR proposes to fix a failing test on big endian architectures like AIX. This pull request has now been integrated. Changeset: 4f4d00fa Author:Per Minborg URL: https://git.openjdk.org/jdk/commit/4f4d00fa756b1409692ada9aa2be76aa4f7da659 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882 Reviewed-by: mdoerr - PR: https://git.openjdk.org/jdk/pull/16672
Re: RFR: 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882
On Wed, 15 Nov 2023 08:18:31 GMT, Per Minborg wrote: > This PR proposes to fix a failing test on big endian architectures like AIX. Thanks for fixing it so quickly! Looks correct and makes the test happy on AIX. - Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16672#pullrequestreview-1732012496
Re: RFR: JDK-8318175 : AIX PPC64: Handle alignment of double in structs
On Thu, 9 Nov 2023 09:22:45 GMT, suchismith1993 wrote: > 1. use pragma directive to handle alignment. > > JBS Issue: [JDK-8318175](https://bugs.openjdk.org/browse/JDK-8318175) Looks good. I had proposed this solution here: https://github.com/openjdk/jdk/pull/16179#discussion_r1360709308 - Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16579#pullrequestreview-1731793602
Re: RFR: 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882
On Wed, 15 Nov 2023 09:27:45 GMT, Jaikiran Pai wrote: >> This PR proposes to fix a failing test on big endian architectures like AIX. > > test/jdk/java/foreign/TestLayouts.java line 223: > >> 221: public void testSequenceLayoutWithZeroLength() { >> 222: SequenceLayout layout = MemoryLayout.sequenceLayout(0, >> JAVA_INT); >> 223: assertEquals(layout.toString().toLowerCase(Locale.ROOT), >> "[0:i4]"); > > Hello Per, I don't yet have experience with these APIs; do you think the > implementation of `toString()` itself should perhaps be changed to return a > consistent value that doesn't depend on the endianness? Or is it intentional > that there is a dependency? I have seen this question several times in the past and it is indeed a valid one. It is like this by design and unfortunately, there will always be differences across platforms due to byte ordering, and mappings between C types and Java types. As FFM acts like a glue between the native and Java worlds, there is always going to be a risk that one gets glue on one's fingers... - PR Review Comment: https://git.openjdk.org/jdk/pull/16672#discussion_r1393947108
Re: RFR: 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882
On Wed, 15 Nov 2023 08:18:31 GMT, Per Minborg wrote: > This PR proposes to fix a failing test on big endian architectures like AIX. test/jdk/java/foreign/TestLayouts.java line 223: > 221: public void testSequenceLayoutWithZeroLength() { > 222: SequenceLayout layout = MemoryLayout.sequenceLayout(0, JAVA_INT); > 223: assertEquals(layout.toString().toLowerCase(Locale.ROOT), > "[0:i4]"); Hello Per, I don't yet have experience with these APIs; do you think the implementation of `toString()` itself should perhaps be changed to return a consistent value that doesn't depend on the endianness? Or is it intentional that there is a dependency? - PR Review Comment: https://git.openjdk.org/jdk/pull/16672#discussion_r1393908097
Re: [External] : Re: Provide thread-safe and concurrent sequenced collections with insertion order?
Thank you for those references! The docs by Ben Manes look like a good starting point to learn more about implementing a concurrent cache. To elaborate on my motivation: I work as an educator explaining concepts behind Java extensions in a corporate context. LRU caches seemed like an interesting choice as an example demonstrating the new interfaces for sequenced collections together with virtual threads. But my main interest (that's why I asked here) is in understanding design considerations for the Java core libraries so I can explain them to others. When preparing the example, I stumbled upon what I perceived as gaps in the core libraries: - Collections.synchronizedSequencedSet and similar wrappers are not provided although Collections.synchronizedSortedSet (and others) are - There is no sequenced collection with external ordering in java.util.concurrent Here is a summary of what I infer from our interaction: - Synchronized wrappers are of limited use. The benefit of adding them for sequenced collections is unclear, despite already providing them for other interfaces extending Set and Map. - The implementation of concurrent versions of sequenced collections with external ordering is non-trivial and the benefit of adding them to the core libraries is unclear. For specific use cases like concurrent caches there are external libraries. - Although the mentioned gaps are apparent, it is unlikely that they will be filled before clarifying how that could be useful. Assuming that's a fair summary, I think I understand better now why the mentioned gaps are there. So, thanks again for your input! On Tue, Nov 14, 2023 at 8:43 PM Stuart Marks wrote: > Ah, so you want a concurrent cache! This is why it's important to talk > about use cases. Turns out there are a lot of subtle issues here, and there > has been a lot of work in this area outside the JDK. For example, a bit of > searching turned up this Stack Overflow question [1]. The answers aren't > particularly valuable, but there is this comment [2] from Ben Manes, who > knows something about this topic: > > ConcurrentLinkedHashMap [3] beget Guava's Cache [4] which beget Caffeine. > [5] [6] > > [1]: > https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation > > [2]: > https://stackoverflow.com/questions/40239485/concurrent-lru-cache-implementation#comment67809840_40239485 > > [3]: https://github.com/ben-manes/concurrentlinkedhashmap/wiki/Design > > [4]: > https://github.com/ben-manes/concurrentlinkedhashmap/blob/wiki/ConcurrentCachingAtGoogle.pdf > > [5]: > http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html > > [6]: https://github.com/ben-manes/caffeine > > It looks like "ConcurrentLinkedHashMap" is a concurrent hash map that > maintains ordering but that doesn't actually maintain a linked list of > entries in the same way as LinkedHashMap. I haven't looked closely at the > implementation, but I'd guess that it maintains ordering using a weakly > consistent side data structure. > > Weak consistency seems to be the flavor of most concurrent data > structures; you give up non-essential consistency in order to gain > concurrency. For example, ConcurrentHashMap's views' iterators are "weakly > consistent" which roughly means that an element reported by the iterator > was a member at one time, but elements might not be reported by the > iterator if they were added or removed sometime during the iteration. For > an LRU cache, weaker semantics might be that you don't care about the exact > ordering of recency of usage. (Indeed, ordering of events is somewhat > ill-defined in a concurrent system.) Instead, you probably want things that > were used fairly recently to stay in the cache longer, while things that > haven't been used recently should tend to expire earlier. > > I'd suggest looking at some of these caching libraries. Or if you're keen > to roll your own, read some of the design documents. Which might convince > you that you don't want to roll your own. :-) > > s'marks > > On 11/10/23 3:26 AM, Sebastian Fischer wrote: > > Hi Stuart. > > Thanks, I understand now how the limited thread safety of "synchronized" > wrappers has led to not adding new ones. > > My use case is an LRU-cache that is accessed by a large number of > (virtual) threads. > > In my initial implementation I constructed a LinkedHashMap passing true > for the accessOrder parameter, implemented removeEldestEntry appropriately > and then wrapped it using Collections.sequencedMap. Thanks to this > interface, I did not need a specialized wrapper supporting putLast or > pollFirstEntry to manage the capacity myself. So, wrapping a LinkedHashMap > seemed like a viable approach at first. > > But, the map is being used by many threads, and a synchronized wrapper is > governed by a single exclusion lock. I improved throughput using a > concurrent map supporting concurrent reads and (some) writes. Here is a > simplified example using a sequenced collection
Re: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2]
On Wed, 23 Aug 2023 15:18:03 GMT, Matthias Baesken wrote: >> Currently there is a number of functionality that would be interesting to >> have for shared lib load operations in the JDK C code. >> Some examples : >> Events::log_dll_message for hs-err files reporting >> JFR event NativeLibraryLoad >> There is the need to update the shared lib Cache on AIX ( see >> LoadedLibraries::reload() , see also >> https://bugs.openjdk.org/browse/JDK-8314152 ), >> this is currently not fully in sync with libs loaded form jdk c-libs and >> sometimes reports outdated information >> >> Offer an interface (e.g. jvm.cpp) to support this. > > Matthias Baesken has updated the pull request incrementally with one > additional commit since the last revision: > > windows aarch64 build issues In the meantime, https://bugs.openjdk.org/browse/JDK-8295159 was integrated. There we restore on x86_64 and aarch the floating point environment (fenv_t) after loading libs in HS os::dll_load / dlopen_helper to avoid 'silent' manipulation of the fenv by 'bad' shared libs. But unfortunately this does not work for libs not going through this coding (like the ones loaded from JDK c code). (And even if restoring in these additional cases is not wanted, we could at least *warn* about the change by a trace .) Might be worth considering in context of this change . - PR Comment: https://git.openjdk.org/jdk/pull/15264#issuecomment-1812023112
RFR: 8319966: AIX: expected [[0:i4]] but found [[0:I4]] after JDK-8319882
This PR proposes to fix a failing test on big endian architectures like AIX. - Commit messages: - Fix AIX test issue Changes: https://git.openjdk.org/jdk/pull/16672/files Webrev: https://webrevs.openjdk.org/?repo=jdk=16672=00 Issue: https://bugs.openjdk.org/browse/JDK-8319966 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16672/head:pull/16672 PR: https://git.openjdk.org/jdk/pull/16672