Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]
On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke wrote: >>> Would it be possible to open/send me the failing test that triggers >>> vframeArray assert >>> or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take >> to move that test from closed to open. Will keep you posted. > >> > Would it be possible to open/send me the failing test that triggers >> > vframeArray assert >> > or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to >> move that test from closed to open. Will keep you posted. > > Thank you! > > Regarding moving this PR back to draft, I am not sure. I can do that, yes. > But really the fundamental algorithm and implementation is basically fixed > since half a year already. I have re-worked it into a fresh PR based on the > request to put it behind a flag. The recent change to a fixed-size lock-stack > has probably invalidated part of your previous reviews, and I am sorry for > that. On the upside, it removed a lot of complexity in the JIT compilers and > assembly code generators. > > What else do I expect to happen? > > Thomas is working on an ARM(32) port, but this is quite separate and could > even land after this PR is done. > > I still don't quite like the naming. Fast-locking doesn't really say anything > and it's not (meant to be) faster than the previous stack-locking. It is an > alternative (and less racy, on the object header) way to implement a > thin-locking layer before inflating monitors, that is all. So maybe > -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is > that when we eventually switch to Lilliput turned on by default, we would > entirely remove stack-locking. > > I would also add some code in arguments.cpp to keep this new thin locking > turned off on platforms that don't yet support it. > > Besides that, from my POV, it is pretty much done. > > What do you think? @rkennke The changed to fixed-size lock stack was a significant change as you note and that suggested to me that the design was still in flux. So I have to wonder whether everything is in fact now stable? (or as stable as one should expect for an experimental new feature) > Fast-locking doesn't really say anything and it's not (meant to be) faster > than the previous stack-locking. Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks - PR: https://git.openjdk.org/jdk/pull/10907
RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics
This is needed for performance improvements in support of virtual threads. The update includes the following: 1. Refactored the `VirtualThread` native methods: `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` `JVM_VirtualThreadUnmountEnd`=> `VTMS_mount_end` 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. 4. Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. Testing: - Ran mach5 tiers 1-6. No regressions were found. - Commit messages: - fix traling spaces in a couple of files - minor update for VTMS_notify_jvmti_events variable - 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Changes: https://git.openjdk.org/jdk/pull/13054/files Webrev: https://webrevs.openjdk.org/?repo=jdk=13054=00 Issue: https://bugs.openjdk.org/browse/JDK-8304303 Stats: 438 lines in 20 files changed: 265 ins; 125 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. > (Excluding the Unicode space and tab sequence). The conversion was done using > native2ascii. > > In addition, the build logic is adjusted to support reading in the > .properties files as UTF-8 during the conversion from .properties file to > .java ListResourceBundle file. test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156: > 154: zh=\u00A4 > 155: zh_CN=\uFFE5 > 156: zh_HK=HK$ Why are they not encoded into UTF-8 native? - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 18:50:25 GMT, Serguei Spitsyn wrote: > This looks pretty good. How did you test the fix? Does it never fail with > your fix? Thanks, Serguei I run test jobs for "serviceability/sa" 100 times on all platforms, no failures (neither attach timeout nor NoClassDefFoundError) - PR: https://git.openjdk.org/jdk/pull/13030
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 02:41:18 GMT, David Holmes wrote: > Not sure removing the build directives was the right way to go. As per the > jtreg tag guide: > > > A test that relies upon library classes should contain appropriate @build > > directives to ensure that the classes will be compiled. It is strongly > > recommended that tests do not rely on the use of implicit compilation by > > the Java compiler. > > so the problem is likely caused by missing build directives in the test(s) > that fails. This is long standing issue with NoClassDefFoundError for library classes. As far as I got from reading similar issues there was a try to add build directive for failing tests, but other tests from the same directory started to fail with NoClassDefFoundError later, and now most of the test have no build action for libraries. It seems to me that it's much simpler to remove build action from 4 tests in the directory than add it for other 55 - PR: https://git.openjdk.org/jdk/pull/13030
Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]
On Wed, 15 Mar 2023 19:14:09 GMT, Daniel D. Daugherty wrote: > > Would it be possible to open/send me the failing test that triggers > > vframeArray assert > > or extract a reproducer that you could publish? > > I have started an internal discussion at Oracle to see what it would take to > move that test from closed to open. Will keep you posted. Thank you! Regarding moving this PR back to draft, I am not sure. I can do that, yes. But really the fundamental algorithm and implementation is basically fixed since half a year already. I have re-worked it into a fresh PR based on the request to put it behind a flag. The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. What else do I expect to happen? Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. Besides that, from my POV, it is pretty much done. What do you think? - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8257967: JFR: Events for loaded agents [v10]
On Tue, 14 Mar 2023 12:26:16 GMT, Markus Grönlund wrote: >> I've had a good look through now and have a better sense of the refactoring. >> Seems good. >> >> I'll wait for any tweaks before hitting the approve button though. >> >> Thanks > >> I've had a good look through now and have a better sense of the refactoring. >> Seems good. >> >> >> >> I'll wait for any tweaks before hitting the approve button though. >> >> >> >> Thanks > > Thanks so much for taking a look. I realized that implementation details of > loading should probably reside in agent.cpp, not agentList.cpp. > > I am currently off on vacation and will update when back. Thanks also to > Andrew Dinn for comments. @mgronlun I'm looking at the fixes but it will take some time. - PR: https://git.openjdk.org/jdk/pull/12923
Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]
On Wed, 15 Mar 2023 09:36:25 GMT, Roman Kennke wrote: > Would it be possible to open/send me the failing test that triggers > vframeArray assert > or extract a reproducer that you could publish? I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for >> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its >> ambigious fields f1 and f2. This structure can hold information for fields, >> methods, and invokedynamics and each of its fields can hold different types >> of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain >> invokedynamic information in a manner that is easy to interpret and easy to >> extend. Resolved invokedynamic entries will be stored in an array in the >> constant pool cache and the operand of the invokedynamic bytecode will be >> rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from >> ConstantPoolCacheEntry will be replaced with accesses to this new array and >> structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by >> @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one > additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > 3396: const Bytecodes::Code code = bytecode(); > 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; > 3398: const bool is_invokedynamic= code == false; // should not reach > here with invokedynamic This looks strange! I guess you wanted to delete more? - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow standard SA way - attach to target from >> subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve >> NoClassDefFoundError failures; It's known JTReg issue that "@build" actions >> for part of used shared classes may cause intermittent NoClassDefFoundError >> in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional > commit since the last revision: > > feedback This looks pretty good. How did you test the fix? Does it never fail with your fix? Thanks, Serguei - PR: https://git.openjdk.org/jdk/pull/13030
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
> The current structure used to store the resolution information for > invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its > ambigious fields f1 and f2. This structure can hold information for fields, > methods, and invokedynamics and each of its fields can hold different types > of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain > invokedynamic information in a manner that is easy to interpret and easy to > extend. Resolved invokedynamic entries will be stored in an array in the > constant pool cache and the operand of the invokedynamic bytecode will be > rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from > ConstantPoolCacheEntry will be replaced with accesses to this new array and > structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by > @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed aarch64 interpreter mistake - Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/415e7116..9a3a63ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12778=05 - incr: https://webrevs.openjdk.org/?repo=jdk=12778=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2]
On Tue, 14 Mar 2023 17:01:20 GMT, Matias Saavedra Silva wrote: > > @matias9927 can I ask you to merge master? There seem to be conflicts (at > > least I see a message "This branch has conflicts that must be resolved"). > > I'd like to give the change a spin in our CI testing. This requires that it > > can be applied on master. > > I saw that merge error but nothing came up when I tried to merge locally. The > branch is updated nonetheless, so you should be able to test it now @reinrich > ! Thanks. The testing didn't reveal anything. - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has >> poor extension capabilities, a complex management code because of lack of >> strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, >> achieving better memory density and providing flexible extensibility, while >> exposing a strongly typed set of data when uncompressed. The stream is >> compressed using the unsigned5 encoding, which alreay present in the JDK >> (because of pack200) and the JVM (because JIT compulers use it to comrpess >> debugging information). >> >> More technical details are available in the CR: >> https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the >> previous heterogeneous AccessFlags field into three distincts flag >> categories: immutable flags from the class file, immutable fields defined by >> the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, >> have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one > additional commit since the last revision: > > SA and JVMCI fixes SA changes looks good. Thanks for taking care of this! - Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v5]
> The current structure used to store the resolution information for > invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its > ambigious fields f1 and f2. This structure can hold information for fields, > methods, and invokedynamics and each of its fields can hold different types > of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain > invokedynamic information in a manner that is easy to interpret and easy to > extend. Resolved invokedynamic entries will be stored in an array in the > constant pool cache and the operand of the invokedynamic bytecode will be > rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from > ConstantPoolCacheEntry will be replaced with accesses to this new array and > structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by > @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed indentation and other comments - Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/db892223..415e7116 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12778=04 - incr: https://webrevs.openjdk.org/?repo=jdk=12778=03-04 Stats: 71 lines in 9 files changed: 1 ins; 3 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. > (Excluding the Unicode space and tab sequence). The conversion was done using > native2ascii. > > In addition, the build logic is adjusted to support reading in the > .properties files as UTF-8 during the conversion from .properties file to > .java ListResourceBundle file. test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1: > 1: # This file should probably be excluded because it's used in a test that relates to UTF-8 encoding (or not) of property files. - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native
On Tue, 7 Mar 2023 23:15:14 GMT, Jonathan Gibbons wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. >> (Excluding the Unicode space and tab sequence). The conversion was done >> using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the >> .properties files as UTF-8 during the conversion from .properties file to >> .java ListResourceBundle file. > > make/langtools/tools/compileproperties/CompileProperties.java line 252: > >> 250: try { >> 251: writer = new BufferedWriter( >> 252: new OutputStreamWriter(new >> FileOutputStream(outputPath), StandardCharsets.ISO_8859_1)); > > Using ISO_8859_1 seems strange. > Since these are generated files, you could write them as UTF-8 and then > override the default javac option for ascii when compiling _just_ these files. > > Or else just stay with ascii; no one should be looking at these files! Will stick with your latter solution, as since the .properties files were converted via native2ascii, it makes sense to write out via ascii. - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. > (Excluding the Unicode space and tab sequence). The conversion was done using > native2ascii. > > In addition, the build logic is adjusted to support reading in the > .properties files as UTF-8 during the conversion from .properties file to > .java ListResourceBundle file. make/langtools/tools/compileproperties/CompileProperties.java line 252: > 250: try { > 251: writer = new BufferedWriter( > 252: new OutputStreamWriter(new > FileOutputStream(outputPath), StandardCharsets.ISO_8859_1)); Using ISO_8859_1 seems strange. Since these are generated files, you could write them as UTF-8 and then override the default javac option for ascii when compiling _just_ these files. Or else just stay with ascii; no one should be looking at these files! - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3]
On Tue, 14 Mar 2023 15:39:39 GMT, Gui Cao wrote: >> Matias Saavedra Silva has updated the pull request with a new target base >> due to a merge or a rebase. The incremental webrev excludes the unrelated >> changes brought in by the merge/rebase. The pull request contains five >> additional commits since the last revision: >> >> - Typo in comment >> - Merge branch 'master' into resolvedIndyEntry_8301995 >> - Interpreter optimization and comments >> - PPC and RISCV port >> - 8301995: Move invokedynamic resolution information out of the cpCache > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843: > >> 1841: ldr(cache, Address(rcpool, >> in_bytes(ConstantPoolCache::invokedynamic_entries_offset(; >> 1842: // Scale the index to be the entry index * >> sizeof(ResolvedInvokeDynamicInfo) >> 1843: mov(tmp, sizeof(ResolvedIndyEntry)); > > The tmp register is not used here, is it redundant? Right, the tmp register is not needed anymore thanks to the mul to shift optimization. Note that shifting will not be possible on 32-bit systems due to the size of ResolvedIndyEntry not being a power of two. This optimization only works on 64 bit builds. - PR: https://git.openjdk.org/jdk/pull/12778
RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native
This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. - Commit messages: - Write to ASCII - Read in .properties as UTF-8, but write to LRB .java as ISO-8859-1 - Compile class with ascii (Not ready to make system wide change) - Toggle UTF-8 for javac option in JavaCompilation.gmk - CompileProperties converts in UTF-8 - Convert .properties from ISO-8859-1 to UTF-8 Changes: https://git.openjdk.org/jdk/pull/12726/files Webrev: https://webrevs.openjdk.org/?repo=jdk=12726=00 Issue: https://bugs.openjdk.org/browse/JDK-8301991 Stats: 29093 lines in 490 files changed: 6 ins; 0 del; 29087 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]
> Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has > poor extension capabilities, a complex management code because of lack of > strong typing and semantic overloading, and a poor memory efficiency. > > The new implementation uses a compressed stream to store those metadata, > achieving better memory density and providing flexible extensibility, while > exposing a strongly typed set of data when uncompressed. The stream is > compressed using the unsigned5 encoding, which alreay present in the JDK > (because of pack200) and the JVM (because JIT compulers use it to comrpess > debugging information). > > More technical details are available in the CR: > https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the > previous heterogeneous AccessFlags field into three distincts flag > categories: immutable flags from the class file, immutable fields defined by > the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have > been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: SA and JVMCI fixes - Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/12b4f1b4..f81337f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=12855=04 - incr: https://webrevs.openjdk.org/?repo=jdk=12855=03-04 Stats: 130 lines in 13 files changed: 14 ins; 46 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4]
On Tue, 14 Mar 2023 23:29:17 GMT, Calvin Cheung wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one >> additional commit since the last revision: >> >> RISCV port update > > src/hotspot/share/interpreter/bootstrapInfo.cpp line 234: > >> 232: if (_indy_index > -1) { >> 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index); >> 234: } > > Since the `else` case doesn’t have braces, maybe omit the braces for this > case as well? The if statements below use braces so I think it would be better to add braces to the else case. - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]
On Tue, 14 Mar 2023 01:25:01 GMT, Chris Plummer wrote: >> Frederic Parain has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fixes includes and style > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75: > >> 73: int initialValueIndex; >> 74: int genericSignatureIndex; >> 75: int contendedGroup; > > It seems that these should all be shorts. All the getter methods are casting > them to short. Indexes in the constant pool are unsigned shorts, but Java shorts are signed, using ints is the simplest way to store those indexes. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java > line 108: > >> 106: CLASS_STATE_INITIALIZATION_ERROR = >> db.lookupIntConstant("InstanceKlass::initialization_error").intValue(); >> 107: // We need a new fieldsCache each time we attach. >> 108: fieldsCache = new HashMap(); > > This should probably be a WeakHashMap. I tried it and it seems to work (or at > least didn't cause any problems). However, when doing a heap dump I didn't > notice the table being any smaller on exit when it was made weak, even though > there were numerous GC's while dumping the heap. > > The `` is the Address of the hotspot InstanceKlass instance, and this > Address is referenced by the SA InstanceKlass mirror. So theoretically when > the reference to the mirror goes way, then the cache entry can be cleared. I've changed the map to a WeakHashMap and didn't see any issue during testing. But I didn't measure footprint. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java > line 325: > >> 323: >> 324: public int getFieldOffset(int index) { >> 325: return (int)getField(index).getOffset(); > > Cast to int is not needed Other APIs (like MetadaField) are using longs to pass offsets, doing a cast here is less disruptive than changing all the other APIs. - PR: https://git.openjdk.org/jdk/pull/12855
Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]
On Wed, 15 Mar 2023 01:25:33 GMT, Daniel D. Daugherty wrote: > I did Mach5 Tier{1,2,3} on v25. Please see the bug report for the gory > details: > > Tier1 - 1 known, unrelated failure Tier2 - 4 closed, unknown, related test > failures Tier3 - 8 closed, unknown, related test failures; 2 open, known, > unrelated test failures; 16 open, unknown, related test failures > > I'm pausing my Mach5 testing at this point. Hi Daniel, I could not reproduce any of the test failures, neither on x86_64 nor on aarch64. I have blindly fixed the code stub size issue, it seems rather trivial. Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? I looked at it but could not figure out what could be going on there. Thanks, Roman - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
> This change adds a fast-locking scheme as an alternative to the current > stack-locking implementation. It retains the advantages of stack-locking > (namely fast locking in uncontended code-paths), while avoiding the overload > of the mark word. That overloading causes massive problems with Lilliput, > because it means we have to check and deal with this situation when trying to > access the mark-word. And because of the very racy nature, this turns out to > be very complex and would involve a variant of the inflation protocol to > ensure that the object header is stable. (The current implementation of > setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto > the stack which consists only of the displaced header, and CAS a pointer to > this stack location into the object header (the lowest two header bits being > 00 indicate 'stack-locked'). The pointer into the stack can then be used to > identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two > header bits to 00 to indicate 'fast-locked' but does *not* overload the upper > bits with a stack-pointer. Instead, it pushes the object-reference to a > thread-local lock-stack. This is a new structure which is basically a small > array of oops that is associated with each thread. Experience shows that this > array typcially remains very small (3-5 elements). Using this lock stack, it > is possible to query which threads own which locks. Most importantly, the > most common question 'does the current thread own me?' is very quickly > answered by doing a quick scan of the array. More complex queries like 'which > thread owns X?' are not performed in very performance-critical paths (usually > in code like JVMTI or deadlock detection) where it is ok to do more complex > operations (and we already do). The lock-stack is also a new set of GC roots, > and would be scanned during thread scanning, possibly concurrently, via the > normal p rotocols. > > The lock-stack is grown when needed. This means that we need to check for > potential overflow before attempting locking. When that is the case, locking > fast-paths would call into the runtime to grow the stack and handle the > locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check > on method entry to avoid (possibly lots) of such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive > locking (yet). When that happens, the fast-lock gets inflated to a full > monitor. It is not clear if it is worth to add support for recursive > fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, > it must inflate the fast-lock to a full monitor. Normally, we need to know > the current owning thread, and record that in the monitor, so that the > contending thread can wait for the current owner to properly exit the > monitor. However, fast-locking doesn't have this information. What we do > instead is to record a special marker ANONYMOUS_OWNER. When the thread that > currently holds the lock arrives at monitorexit, and observes > ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, > and then properly exits the monitor, and thus handing over to the contending > thread. > > As an alternative, I considered to remove stack-locking altogether, and only > use heavy monitors. In most workloads this did not show measurable > regressions. However, in a few workloads, I have observed severe regressions. > All of them have been using old synchronized Java collections (Vector, > Stack), StringBuffer or similar code. The combination of two conditions leads > to regressions without stack- or fast-locking: 1. The workload synchronizes > on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and > 2. The workload churns such locks. IOW, uncontended use of Vector, > StringBuffer, etc as such is ok, but creating lots of such single-use, > single-threaded-locked objects leads to massive ObjectMonitor churn, which > can lead to a significant performance impact. But alas, such code exists, and > we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the > (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the > hashcode has been installed. Fast-locked headers can be used directly, for > monitor-locked objects we can easily reach-through to the displaced header. > This is safe because Java threads participate in monitor deflation protocol. > This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2
Re: RFR: 8291555: Implement alternative fast-locking scheme [v25]
On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke wrote: >> I've reviewed the changes in v23 and v24. Trying another >> Mach5 Tier1 job set. > >> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set. > > I just now pushed a simple change that changes the log message > 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. @rkennke : Hi, I have prepared some extra changes for RISC-V to make it work. See attachment. BTW: You might also want to use -w instructions in MacroAssembler::fast_unlock for aarch64. [more-riscv-changes.txt](https://github.com/openjdk/jdk/files/10977109/more-riscv-changes.txt) - PR: https://git.openjdk.org/jdk/pull/10907