Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke wrote: > > > > In my last changes I made a stupid mistake and don't set the condition > > > flags correctly to force the slow-path, on aarch64. This is only relevant > > > when we exceed the lock-stack capacity, that is why it's failing so > > > rarely. I don't see a similar problem on x86_64 - have we observed any > > > failures on x86_64? I pushed a fix for aarch64. > > > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems > > better. I also do it inside fast_lock, to give it a defined exit state wrt > > EQ|NE, since it saves me from having to think about this on every call > > site. But at least the fail case may be fiddly without conditional > > execution. > > Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null > register. Sure. I used cmp with an immediate that I knew was not the value. Clunky, I know. As I wrote, tst seems better. > Tst is better at least on x86 because it encodes smaller. _shrugs_ > > You can do it in the shared fast_lock() but it's really only needed in C2, > that's why I'm doing it there. Maybe I'm too perfectionist when it comes to > assembly code? I just felt better having it there, at least for the start. I may still move it outside to C2. Let's see. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Fri, 17 Mar 2023 06:15:28 GMT, Thomas Stuefe wrote: > > In my last changes I made a stupid mistake and don't set the condition > > flags correctly to force the slow-path, on aarch64. This is only relevant > > when we exceed the lock-stack capacity, that is why it's failing so rarely. > > I don't see a similar problem on x86_64 - have we observed any failures on > > x86_64? I pushed a fix for aarch64. > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems > better. I also do it inside fast_lock, to give it a defined exit state wrt > EQ|NE, since it saves me from having to think about this on every call site. > But at least the fail case may be fiddly without conditional execution. Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Tst is better at least on x86 because it encodes smaller. *shrugs* You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote: > In my last changes I made a stupid mistake and don't set the condition flags > correctly to force the slow-path, on aarch64. This is only relevant when we > exceed the lock-stack capacity, that is why it's failing so rarely. I don't > see a similar problem on x86_64 - have we observed any failures on x86_64? I > pushed a fix for aarch64. I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v3]
> This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => > `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => > `notifyJvmtiUnmount` > 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` > to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, > `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the `VirtualThread` static boolean state variable > `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the > jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` > `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics may be needed in the future > but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: address pre-review comments from Leonid - Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/397b6337..f3692263 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01-02 Stats: 22 lines in 2 files changed: 14 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054
RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE
The com/sun/jdi tests are all located in the same package, and classes with the same name cause a 'class duplication error' when this directory is opened as source code in an IDE. The simplest fix to avoid this is just to rename the classes. - Commit messages: - renamed classes Changes: https://git.openjdk.org/jdk/pull/13069/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13069&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304376 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13069.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13069/head:pull/13069 PR: https://git.openjdk.org/jdk/pull/13069
Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an > invoke on the breakpoint thread. The threads run until they generate 100 > breakpoints. There is an issue when these two threads are virtual threads. > Virtual threads are daemon threads. That means the JVM can exit while they > are still running. The above main() method is not waiting for these two > threads to exit, so main() exits immediately and the JVM starts the shutdown > process. It first must wait for all non-daemon threads to exit, but there are > none, so the JVM exits right away before the two threads are completed. The > end result of this early exit is that sometimes the invoke done by the > debugger never completes because the JVM has already issued a VMDeath event > and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they > complete before it exits, so they will always complete. The fix for virtual > threads is to do a join with t1 and t2. This forces the main() method to > block until they have completed. Marked as reviewed by lmesnik (Reviewer). - PR: https://git.openjdk.org/jdk/pull/13068
Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has >> poor extension capabilities, complex management code because of the lack of >> strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, >> achieving better memory density and providing flexible extensibility, while >> exposing a strongly typed set of data when uncompressed. The stream is >> compressed using the unsigned5 encoding, which is already present in the JDK >> (because of pack200) and the JVM (because JIT compilers use it to compress >> debugging information). >> >> More technical details are available in the CR: >> https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the >> previous heterogeneous AccessFlags field into three distinct flag >> categories: immutable flags from the class file, immutable flags defined by >> the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, >> have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one > additional commit since the last revision: > > SA and JVMCI fixes Nice piece of work Fred - I won't pretend to follow every detail. A few nits on unnecessary alignment (which may match pre-existing style not evident in the diff). Thanks. src/hotspot/share/oops/fieldInfo.inline.hpp line 170: > 168: new_flags = old_flags & ~mask; > 169: witness = Atomic::cmpxchg(&flags, old_flags, new_flags); > 170: } while(witness != old_flags); Just to prove I did read this :) a space is needed after `while` src/hotspot/share/oops/fieldInfo.inline.hpp line 174: > 172: > 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) > { > 174: if (z)atomic_set_bits( _flags, flag_mask(pos)); Nit: extra space before `_flags` src/hotspot/share/oops/fieldInfo.inline.hpp line 175: > 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) > { > 174: if (z)atomic_set_bits( _flags, flag_mask(pos)); > 175: else atomic_clear_bits(_flags, flag_mask(pos)); Nit: no need for the extra spaces. If you really want these to align, just place them on new lines. src/hotspot/share/oops/instanceKlass.inline.hpp line 50: > 48: > 49: inline Symbol* InstanceKlass::field_name(int index) const { > return field(index).name(constants()); } > 50: inline Symbol* InstanceKlass::field_signature (int index) const { > return field(index).signature(constants()); } There should not be spaces between a method name and the opening `(`. I'm really not a fan of this kind of alignment. - Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855
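For readers less familiar with the pattern in the quoted fieldInfo.inline.hpp snippet, the following is a Java analogue of the same compare-and-set retry loop for setting and clearing flag bits. It is purely illustrative (the class and method names are made up here) and is not the HotSpot code under review.

    import java.util.concurrent.atomic.AtomicInteger;

    // Conceptual Java analogue of the C++ Atomic::cmpxchg retry loop quoted above.
    final class FlagWord {
        private final AtomicInteger flags = new AtomicInteger();

        void setBits(int mask) {
            int old;
            do {
                old = flags.get();
            } while (!flags.compareAndSet(old, old | mask)); // retry until the witness matches
        }

        void clearBits(int mask) {
            int old;
            do {
                old = flags.get();
            } while (!flags.compareAndSet(old, old & ~mask));
        }

        void updateFlag(int mask, boolean z) {
            if (z) {
                setBits(mask);
            } else {
                clearBits(mask);
            }
        }
    }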
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Thu, 16 Mar 2023 16:11:57 GMT, Richard Reingruber wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fixed aarch64 interpreter mistake > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > >> 2333: >> 2334: __ load_resolved_indy_entry(cache, index); >> 2335: __ ldr(method, Address(cache, >> in_bytes(ResolvedIndyEntry::method_offset()))); > > Should this load have acquire semantics? > Like [here in template > interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) > and [here for the zero > interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? > > Call stack for zero interpreter is > > ConstantPoolCacheEntry::indices_ord() > ConstantPoolCacheEntry::bytecode_1() > ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) > BytecodeInterpreter::run(interpreterState) Yes, acquire semantics should be used here. Thank you for pointing this out! - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]
On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an > invoke on the breakpoint thread. The threads run until they generate 100 > breakpoints. There is an issue when these two threads are virtual threads. > Virtual threads are daemon threads. That means the JVM can exit while they > are still running. The above main() method is not waiting for these two > threads to exit, so main() exits immediately and the JVM starts the shutdown > process. It first must wait for all non-daemon threads to exit, but there are > none, so the JVM exits right away before the two threads are completed. The > end result of this early exit is that sometimes the invoke done by the > debugger never completes because the JVM has already issued a VMDeath event > and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they > complete before it exits, so they will always complete. The fix for virtual > threads is to do a join with t1 and t2. This forces the main() method to > block until they have completed. Marked as reviewed by amenkov (Reviewer). test/jdk/com/sun/jdi/InvokeHangTest.java line 200: > 198: try { > 199: StackFrame sf = thread.frame(0); > 200: System.out.println(" Debugger: Breakpoint hit at > "+sf.location()); while you are here please add spaces around plus - PR: https://git.openjdk.org/jdk/pull/13068
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 21:05:54 GMT, Daniel D. Daugherty wrote: > > I pushed a fix for aarch64. > > > > Do you think this is the cause for the -Xcheck:jni failures that I ran into > > in my Tier4 testing? Yes, and with high probability also for some/all of the other failures. It leads to the situation that when the lock-stack is full, it should take the slow-path, but doesn't (because the flags are not set correctly) and thus leaves the object unlocked. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer wrote: > The debuggee main method creates two threads and then starts them: > > > public static void main(String[] args) { > System.out.println("Howdy!"); > Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); > Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); > > t1.start(); > t2.start(); > } > > > These threads will hit breakpoints which the debugger handles and issues an > invoke on the breakpoint thread. The threads run until they generate 100 > breakpoints. There is an issue when these two threads are virtual threads. > Virtual threads are daemon threads. That means the JVM can exit while they > are still running. The above main() method is not waiting for these two > threads to exit, so main() exits immediately and the JVM starts the shutdown > process. It first must wait for all non-daemon threads to exit, but there are > none, so the JVM exits right away before the two threads are completed. The > end result of this early exit is that sometimes the invoke done by the > debugger never completes because the JVM has already issued a VMDeath event > and the debuggee has been disconnected. > > When these two threads are platform threads, the JVM has to wait until they > complete before it exits, so they will always complete. The fix for virtual > threads is to do a join with t1 and t2. This forces the main() method to > block until they have completed. test/jdk/com/sun/jdi/InvokeHangTest.java line 223: > 221: mainThread = bpe.thread(); > 222: EventRequestManager erm = vm().eventRequestManager(); > 223: final Thread mainTestThread = Thread.currentThread(); This local variable was conflicting with an instance field of the same name. See line 135. I ran into issues when I added some debugging code to this method that referenced the other `mainThread`, so I did a rename. - PR: https://git.openjdk.org/jdk/pull/13068
RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"
The debuggee main method creates two threads and then starts them: public static void main(String[] args) { System.out.println("Howdy!"); Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1); Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2); t1.start(); t2.start(); } These threads will hit breakpoints which the debugger handles and issues an invoke on the breakpoint thread. The threads run until they generate 100 breakpoints. There is an issue when these two threads are virtual threads. Virtual threads are daemon threads. That means the JVM can exit while they are still running. The above main() method is not waiting for these two threads to exit, so main() exits immediately and the JVM starts the shutdown process. It first must wait for all non-daemon threads to exit, but there are none, so the JVM exits right away before the two threads are completed. The end result of this early exit is that sometimes the invoke done by the debugger never completes because the JVM has already issued a VMDeath event and the debuggee has been disconnected. When these two threads are platform threads, the JVM has to wait until they complete before it exits, so they will always complete. The fix for virtual threads is to do a join with t1 and t2. This forces the main() method to block until they have completed. - Commit messages: - Make sure main() debuggee method does not exit until test threads are complete. Changes: https://git.openjdk.org/jdk/pull/13068/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290200 Stats: 13 lines in 2 files changed: 6 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13068.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13068/head:pull/13068 PR: https://git.openjdk.org/jdk/pull/13068
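The principle behind the fix can be shown with a self-contained sketch (this is not the actual test patch, which joins the two existing debuggee threads, and it needs a JDK with virtual threads): because virtual threads are daemon threads, main() has to join them to keep the JVM alive until they finish.

    public class JoinVirtualThreads {
        public static void main(String[] args) throws InterruptedException {
            // Virtual threads are always daemon threads, so the JVM will not wait
            // for them at shutdown; joining them keeps the (non-daemon) main thread
            // alive until both workers are done.
            Thread t1 = Thread.ofVirtual().start(() -> System.out.println("t1 done"));
            Thread t2 = Thread.ofVirtual().start(() -> System.out.println("t2 done"));
            t1.join();
            t2.join();
        }
    }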
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 20:57:31 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/x86_32.ad line 617: >> >>> 615: int bangsize = C->output()->bang_size_in_bytes(); >>> 616: >>> 617: __ verified_entry(framesize, >>> C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), >>> C->stub_function() != NULL); >> >> Why did this change from `nullptr` -> `NULL`? > > I reverted that part back to upstream state (at least what is in JDK-21+13). Okay. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote: > I pushed a fix for aarch64. Do you think this is the cause for the -Xcheck:jni failures that I ran into in my Tier4 testing? - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 20:50:12 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Several changes (mostly cosmetic) in response to reviews > > src/hotspot/cpu/x86/x86_32.ad line 617: > >> 615: int bangsize = C->output()->bang_size_in_bytes(); >> 616: >> 617: __ verified_entry(framesize, >> C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), >> C->stub_function() != NULL); > > Why did this change from `nullptr` -> `NULL`? I reverted that part back to upstream state (at least what is in JDK-21+13). - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]
> This change adds a fast-locking scheme as an alternative to the current > stack-locking implementation. It retains the advantages of stack-locking > (namely fast locking in uncontended code-paths), while avoiding the overload > of the mark word. That overloading causes massive problems with Lilliput, > because it means we have to check and deal with this situation when trying to > access the mark-word. And because of the very racy nature, this turns out to > be very complex and would involve a variant of the inflation protocol to > ensure that the object header is stable. (The current implementation of > setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto > the stack which consists only of the displaced header, and CAS a pointer to > this stack location into the object header (the lowest two header bits being > 00 indicate 'stack-locked'). The pointer into the stack can then be used to > identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two > header bits to 00 to indicate 'fast-locked' but does *not* overload the upper > bits with a stack-pointer. Instead, it pushes the object-reference to a > thread-local lock-stack. This is a new structure which is basically a small > array of oops that is associated with each thread. Experience shows that this > array typically remains very small (3-5 elements). Using this lock stack, it > is possible to query which threads own which locks. Most importantly, the > most common question 'does the current thread own me?' is very quickly > answered by doing a quick scan of the array. More complex queries like 'which > thread owns X?' are not performed in very performance-critical paths (usually > in code like JVMTI or deadlock detection) where it is ok to do more complex > operations (and we already do). The lock-stack is also a new set of GC roots, > and would be scanned during thread scanning, possibly concurrently, via the > normal protocols. > > The lock-stack is grown when needed. This means that we need to check for > potential overflow before attempting locking. When that is the case, locking > fast-paths would call into the runtime to grow the stack and handle the > locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check > on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive > locking (yet). When that happens, the fast-lock gets inflated to a full > monitor. It is not clear whether it is worth adding support for recursive > fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, > it must inflate the fast-lock to a full monitor. Normally, we need to know > the current owning thread, and record that in the monitor, so that the > contending thread can wait for the current owner to properly exit the > monitor. However, fast-locking doesn't have this information. What we do > instead is to record a special marker ANONYMOUS_OWNER. When the thread that > currently holds the lock arrives at monitorexit, and observes > ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, > and then properly exits the monitor, thus handing the lock over to the contending > thread. > > As an alternative, I considered removing stack-locking altogether, and only > using heavy monitors.
In most workloads this did not show measurable > regressions. However, in a few workloads, I have observed severe regressions. > All of them have been using old synchronized Java collections (Vector, > Stack), StringBuffer or similar code. The combination of two conditions leads > to regressions without stack- or fast-locking: 1. The workload synchronizes > on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and > 2. The workload churns such locks. IOW, uncontended use of Vector, > StringBuffer, etc. as such is ok, but creating lots of such single-use, > single-threaded-locked objects leads to massive ObjectMonitor churn, which > can lead to a significant performance impact. But alas, such code exists, and > we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the > (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the > hashcode has been installed. Fast-locked headers can be used directly; for > monitor-locked objects we can easily reach-through to the displaced header. > This is safe because Java threads participate in the monitor deflation protocol. > This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_6
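The per-thread lock-stack described above can be modelled with a small, conceptual Java sketch. The real data structure lives in C++ inside HotSpot; the capacity, the method names, and the exception-based "slow path" below are illustrative assumptions only.

    // Conceptual model of a per-thread lock stack: a small array of owned objects.
    final class LockStackModel {
        private static final int CAPACITY = 8;   // assumed; the real stack is grown on demand
        private final Object[] stack = new Object[CAPACITY];
        private int top;                         // next free slot

        boolean canPush() {                      // fast paths check this before locking
            return top < stack.length;
        }

        void push(Object lockee) {               // fast-lock: remember the owned object
            if (!canPush()) {
                throw new IllegalStateException("full: take the runtime slow path");
            }
            stack[top++] = lockee;
        }

        void pop(Object lockee) {                // fast-unlock: must match the most recent lock
            if (top == 0 || stack[top - 1] != lockee) {
                throw new IllegalStateException("not on top of the lock stack: slow path");
            }
            stack[--top] = null;
        }

        boolean ownsLock(Object lockee) {        // "does the current thread own me?" is a linear scan
            for (int i = 0; i < top; i++) {
                if (stack[i] == lockee) {
                    return true;
                }
            }
            return false;
        }
    }

The aarch64 condition-flag bug discussed elsewhere in this thread corresponds to the canPush() check: when the stack is full, the compiled fast path must report failure so that the runtime slow path is taken.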
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow the standard SA way - attach to the target from >> a subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve >> NoClassDefFoundError failures; it's a known JTReg issue that "@build" actions >> for only part of the used shared classes may cause intermittent NoClassDefFoundError >> in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional > commit since the last revision: > > feedback Marked as reviewed by cjplummer (Reviewer). - PR: https://git.openjdk.org/jdk/pull/13030
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov wrote: >> The change: >> - updates UniqueVtableTest to follow the standard SA way - attach to the target from >> a subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess; >> - updates several tests in the same directory to resolve >> NoClassDefFoundError failures; it's a known JTReg issue that "@build" actions >> for only part of the used shared classes may cause intermittent NoClassDefFoundError >> in other tests which use the same shared library classpath. >> >> Tested: 100 runs on all platforms, no failures > > Alex Menkov has updated the pull request incrementally with one additional > commit since the last revision: > > feedback Marked as reviewed by sspitsyn (Reviewer). - PR: https://git.openjdk.org/jdk/pull/13030
Disallowing the dynamic loading of agents by default
Hi. In JDK 21 we intend to disallow the dynamic loading of agents by default. This will affect tools that use the Attach API to load an agent into a JVM some time after the JVM has started [1]. There is no change to any of the mechanisms that load an agent at JVM startup (-javaagent/-agentlib on the command line or the Launcher-Agent-Class attribute in the main JAR's manifest). This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. At that time the consensus was to switch to this default not in JDK 9 but in a later release to give tool maintainers sufficient time to inform their users. To allow the dynamic loading of agents, users will need to specify -XX:+EnableDynamicAgentLoading on the command line. I'll post a draft JEP for review shortly. -- Ron [1]: https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html [2]: https://openjdk.org/jeps/261 [3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html
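For context, the affected mechanism is the Attach API flow sketched below; once the default changes, the target JVM will need to have been started with -XX:+EnableDynamicAgentLoading for loadAgent to be allowed. The pid and agent JAR path are placeholders, and the sketch is illustrative rather than taken from any particular tool.

    import com.sun.tools.attach.VirtualMachine;

    // Minimal sketch of dynamic agent loading via the Attach API (module jdk.attach).
    // args[0] is the target JVM's process id, args[1] the path to an agent JAR.
    public class DynamicAttach {
        public static void main(String[] args) throws Exception {
            VirtualMachine vm = VirtualMachine.attach(args[0]);
            try {
                vm.loadAgent(args[1]);   // the operation that will be disallowed by default
            } finally {
                vm.detach();
            }
        }
    }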
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]
On Thu, 16 Mar 2023 18:31:23 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. >> (Excluding the Unicode space and tab sequence). The conversion was done >> using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the >> .properties files as UTF-8 during the conversion from .properties file to >> .java ListResourceBundle file. > > Justin Lu has updated the pull request incrementally with two additional > commits since the last revision: > > - Reconvert CS.properties to UTF-8 > - Revert all changes to CurrencySymbols.properties test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1: > 1: # CurrencySymbols.properties is fully converted to UTF-8 now - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]
> This PR converts Unicode sequences to UTF-8 native in .properties file. > (Excluding the Unicode space and tab sequence). The conversion was done using > native2ascii. > > In addition, the build logic is adjusted to support reading in the > .properties files as UTF-8 during the conversion from .properties file to > .java ListResourceBundle file. Justin Lu has updated the pull request incrementally with two additional commits since the last revision: - Reconvert CS.properties to UTF-8 - Revert all changes to CurrencySymbols.properties - Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/6d6bffe8..7119830b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01-02 Stats: 87 lines in 1 file changed: 0 ins; 0 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726
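To illustrate the general point (the PR itself adjusts the build's properties-to-ListResourceBundle conversion, which is not shown here): once a .properties file stores characters as UTF-8 directly rather than as \uXXXX escapes, whatever reads it must decode it as UTF-8, for example by loading through a Reader. The file name below is a placeholder.

    import java.io.Reader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Properties;

    // Reads a UTF-8 encoded .properties file; Properties.load(Reader) uses the
    // reader's charset, whereas load(InputStream) assumes ISO-8859-1 with \uXXXX escapes.
    public class ReadUtf8Properties {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            try (Reader in = Files.newBufferedReader(Path.of("example.properties"),
                                                     StandardCharsets.UTF_8)) {
                props.load(in);
            }
            props.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }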
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. Cobbs wrote: >> Justin Lu has updated the pull request incrementally with four additional >> commits since the last revision: >> >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties > > test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1: > >> 1: # > > This file should probably be excluded because it's used in a test that > relates to UTF-8 encoding (or not) of property files. Thank you, removed the changes for this file - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato wrote: >> Justin Lu has updated the pull request incrementally with four additional >> commits since the last revision: >> >> - Bug6204853 should not be converted >> - Copyright year for CompileProperties >> - Redo translation for CS.properties >> - Spot convert CurrencySymbols.properties > > test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156: > >> 154: zh=\u00A4 >> 155: zh_CN=\uFFE5 >> 156: zh_HK=HK$ > > Why are they not encoded into UTF-8 native? Not sure, thank you for catching it. Working on it right now. - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu wrote: >> This PR converts Unicode sequences to UTF-8 native in .properties file. >> (Excluding the Unicode space and tab sequence). The conversion was done >> using native2ascii. >> >> In addition, the build logic is adjusted to support reading in the >> .properties files as UTF-8 during the conversion from .properties file to >> .java ListResourceBundle file. > > Justin Lu has updated the pull request incrementally with four additional > commits since the last revision: > > - Bug6204853 should not be converted > - Copyright year for CompileProperties > - Redo translation for CS.properties > - Spot convert CurrencySymbols.properties test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1: > 1: # Conversion did not work as expected, addressing right now. - PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]
> This PR converts Unicode sequences to UTF-8 native in .properties file. > (Excluding the Unicode space and tab sequence). The conversion was done using > native2ascii. > > In addition, the build logic is adjusted to support reading in the > .properties files as UTF-8 during the conversion from .properties file to > .java ListResourceBundle file. Justin Lu has updated the pull request incrementally with four additional commits since the last revision: - Bug6204853 should not be converted - Copyright year for CompileProperties - Redo translation for CS.properties - Spot convert CurrencySymbols.properties - Changes: - all: https://git.openjdk.org/jdk/pull/12726/files - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00-01 Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/12726.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726 PR: https://git.openjdk.org/jdk/pull/12726
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for >> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its >> ambiguous fields f1 and f2. This structure can hold information for fields, >> methods, and invokedynamics and each of its fields can hold different types >> of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain >> invokedynamic information in a manner that is easy to interpret and easy to >> extend. Resolved invokedynamic entries will be stored in an array in the >> constant pool cache and the operand of the invokedynamic bytecode will be >> rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from >> ConstantPoolCacheEntry will be replaced with accesses to this new array and >> structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by >> @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one > additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > 2333: > 2334: __ load_resolved_indy_entry(cache, index); > 2335: __ ldr(method, Address(cache, > in_bytes(ResolvedIndyEntry::method_offset()))); Should this load have acquire semantics? Like [here in template interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) and [here for the zero interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? Call stack for the zero interpreter is ConstantPoolCacheEntry::indices_ord() ConstantPoolCacheEntry::bytecode_1() ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) BytecodeInterpreter::run(interpreterState) - PR: https://git.openjdk.org/jdk/pull/12778
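For readers less familiar with the memory-ordering question raised above: the concern is the usual publish/consume pairing, where the resolver publishes a fully initialized entry with release semantics and readers load it with acquire semantics so they are guaranteed to see the initialized fields. The sketch below is a simplified Java analogue of that pairing; the class and field names are invented for illustration, and it is not the aarch64 assembly under review, where the same effect would come from a load-acquire instruction.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    // Hypothetical illustration of release-store / acquire-load pairing.
    class ResolvedEntryCache {
        static final class Entry {
            Object method;                 // filled in before publication
            Entry(Object m) { method = m; }
        }

        private Entry entry;               // published only via the VarHandle below
        private static final VarHandle ENTRY;
        static {
            try {
                ENTRY = MethodHandles.lookup()
                        .findVarHandle(ResolvedEntryCache.class, "entry", Entry.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        // Resolver thread: release-store, so the writes to the Entry happen-before
        // any acquire-load that observes the non-null reference.
        void publish(Object method) { ENTRY.setRelease(this, new Entry(method)); }

        // Reader thread: acquire-load; a plain load would not, in general,
        // guarantee that the Entry's fields are seen as initialized.
        Entry read() { return (Entry) ENTRY.getAcquire(this); }
    }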
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5]
On Thu, 16 Mar 2023 14:50:14 GMT, Alan Bateman wrote: >> test/jdk/java/util/ServiceLoader/BadProvidersTest.java line 216: >> >>> 214: clb.withSuperclass(CD_Object); >>> 215: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER); >>> 216: var provider$1Desc = ClassDesc.of("p", >>> "ProviderFactory$1"); >> >> This is class descriptor for ProviderFactory$1, not "Provider" so maybe >> rename this to providerFactory1 or something a bit clearer. > >> This is class descriptor for ProviderFactory$1, not "Provider" so maybe >> rename this to providerFactory1 or something a bit clearer. > > The updated version looks good. I assume you'll do a pass over the updated > tests to bump their copyright date as this is the first change in 2023 for > many of these tests. Yes, the copyright years are updated. Tested Serializable, instrument, and LambdaStackTrace as of "[Fix LambdaStackTrace after running](https://github.com/openjdk/jdk/pull/13009/commits/a728c9de1ff684bd30726eb8ea6e7a674cb5a140)" - PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5]
> Summaries: > 1. A few recommendations about updating the constant API is made at > https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html > and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their > code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Fix LambdaStackTrace after running - Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/09bbe4d5..a728c9de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v4]
> Summaries: > 1. A few recommendations about updating the constant API is made at > https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html > and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their > code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request incrementally with one additional commit since the last revision: formatting - Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/a50b94f9..09bbe4d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has >> poor extension capabilities, complex management code because of the lack of >> strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, >> achieving better memory density and providing flexible extensibility, while >> exposing a strongly typed set of data when uncompressed. The stream is >> compressed using the unsigned5 encoding, which is already present in the JDK >> (because of pack200) and the JVM (because JIT compilers use it to compress >> debugging information). >> >> More technical details are available in the CR: >> https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the >> previous heterogeneous AccessFlags field into three distinct flag >> categories: immutable flags from the class file, immutable flags defined by >> the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, >> have also been updated to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one > additional commit since the last revision: > > SA and JVMCI fixes Marked as reviewed by dnsimon (Committer). - PR: https://git.openjdk.org/jdk/pull/12855
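To make the "compressed stream" idea above concrete: per-field metadata becomes a sequence of small unsigned integers written in a variable-length byte encoding, so common small values cost a single byte. The sketch below uses a generic LEB128-style varint purely for illustration; the actual UNSIGNED5 encoding used by HotSpot has a different byte layout, and the class name and sample "field record" values are made up.

    import java.io.ByteArrayOutputStream;

    // Illustrative only: a generic unsigned varint (LEB128-style) stream, standing in
    // for the idea of packing per-field metadata into a compact byte stream.
    final class VarIntStream {
        static void write(ByteArrayOutputStream out, int value) {
            // 7 data bits per byte; a set high bit means "more bytes follow".
            while ((value & ~0x7F) != 0) {
                out.write((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            out.write(value);
        }

        static int read(byte[] buf, int[] pos) {
            int result = 0, shift = 0, b;
            do {
                b = buf[pos[0]++] & 0xFF;
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return result;
        }

        public static void main(String[] args) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // A made-up "field record": name index, signature index, access flags, offset.
            int[] record = {15, 42, 0x0012, 24};
            for (int v : record) {
                write(out, v);
            }
            byte[] stream = out.toByteArray();   // each small value fits in one byte here
            int[] pos = {0};
            for (int expected : record) {
                assert read(stream, pos) == expected;
            }
            System.out.println(stream.length + " bytes for " + record.length + " values");
        }
    }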
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v3]
> Summaries: > 1. A few recommendations about updating the constant API is made at > https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html > and I may update this patch shall the API changes be integrated before > 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their > code generation infrastructure upgraded from ASM to Classfile API. > 3. Most tests are included in tier1, but some are not: > In `:jdk_io`: (tier2, part 2) > > test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java > test/jdk/java/io/Serializable/records/ProhibitedMethods.java > test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java > > In `:jdk_instrument`: (tier 3) > > test/jdk/java/lang/instrument/RetransformAgent.java > test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java > test/jdk/java/lang/instrument/asmlib/Instrumentor.java > > > @asotona Would you mind reviewing? Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix failed LambdaStackTrace test, use more convenient APIs - Merge branch 'master' of https://git.openjdk.java.net/jdk into invoke-test-classfile - Shorten lines, move from mask() to ACC_ constants, other misc improvements - Convert test/jdk/java ASM tests to classfile api - Changes: - all: https://git.openjdk.org/jdk/pull/13009/files - new: https://git.openjdk.org/jdk/pull/13009/files/c6536bf9..a50b94f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=01-02 Stats: 68505 lines in 534 files changed: 42498 ins; 18129 del; 7878 mod Patch: https://git.openjdk.org/jdk/pull/13009.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009 PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]
On Tue, 14 Mar 2023 07:53:25 GMT, Alan Bateman wrote: > This is class descriptor for ProviderFactory$1, not "Provider" so maybe > rename this to providerFactory1 or something a bit clearer. The updated version looks good. I assume you'll do a pass over the updated tests to bump their copyright date as this is the first change in 2023 for many of these tests. - PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]
On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang wrote: >> Summaries: >> 1. A few recommendations about updating the constant API is made at >> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html >> and I may update this patch shall the API changes be integrated before >> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their >> code generation infrastructure upgraded from ASM to Classfile API. >> 3. Most tests are included in tier1, but some are not: >> In `:jdk_io`: (tier2, part 2) >> >> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java >> test/jdk/java/io/Serializable/records/ProhibitedMethods.java >> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java >> >> In `:jdk_instrument`: (tier 3) >> >> test/jdk/java/lang/instrument/RetransformAgent.java >> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java >> test/jdk/java/lang/instrument/asmlib/Instrumentor.java >> >> >> @asotona Would you mind reviewing? > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > Shorten lines, move from mask() to ACC_ constants, other misc improvements I like the easy way I can read the tests code now even I don't know them. They look great :) - PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]
On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang wrote: >> Summaries: >> 1. A few recommendations about updating the constant API is made at >> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html >> and I may update this patch shall the API changes be integrated before >> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their >> code generation infrastructure upgraded from ASM to Classfile API. >> 3. Most tests are included in tier1, but some are not: >> In `:jdk_io`: (tier2, part 2) >> >> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java >> test/jdk/java/io/Serializable/records/ProhibitedMethods.java >> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java >> >> In `:jdk_instrument`: (tier 3) >> >> test/jdk/java/lang/instrument/RetransformAgent.java >> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java >> test/jdk/java/lang/instrument/asmlib/Instrumentor.java >> >> >> @asotona Would you mind reviewing? > > Chen Liang has updated the pull request incrementally with one additional > commit since the last revision: > > Shorten lines, move from mask() to ACC_ constants, other misc improvements test/jdk/java/lang/Class/getSimpleName/GetSimpleNameTest.java line 174: > 172: clb.withSuperclass(CD_Object); > 173: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER); > 174: clb.accept(InnerClassesAttribute.of( During the API discussions, using `ClassfileBuilder::with` was slightly preferred over `ClassfileBuilder::accept`; however, it is just a cosmetic difference. - PR: https://git.openjdk.org/jdk/pull/13009
Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]
> This change adds a fast-locking scheme as an alternative to the current > stack-locking implementation. It retains the advantages of stack-locking > (namely fast locking in uncontended code-paths), while avoiding the overload > of the mark word. That overloading causes massive problems with Lilliput, > because it means we have to check and deal with this situation when trying to > access the mark-word. And because of the very racy nature, this turns out to > be very complex and would involve a variant of the inflation protocol to > ensure that the object header is stable. (The current implementation of > setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto > the stack which consists only of the displaced header, and CAS a pointer to > this stack location into the object header (the lowest two header bits being > 00 indicate 'stack-locked'). The pointer into the stack can then be used to > identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two > header bits to 00 to indicate 'fast-locked' but does *not* overload the upper > bits with a stack-pointer. Instead, it pushes the object-reference to a > thread-local lock-stack. This is a new structure which is basically a small > array of oops that is associated with each thread. Experience shows that this > array typically remains very small (3-5 elements). Using this lock stack, it > is possible to query which threads own which locks. Most importantly, the > most common question 'does the current thread own me?' is very quickly > answered by doing a quick scan of the array. More complex queries like 'which > thread owns X?' are not performed in very performance-critical paths (usually > in code like JVMTI or deadlock detection) where it is ok to do more complex > operations (and we already do). The lock-stack is also a new set of GC roots, > and would be scanned during thread scanning, possibly concurrently, via the > normal protocols. > > The lock-stack is grown when needed. This means that we need to check for > potential overflow before attempting locking. When that is the case, locking > fast-paths would call into the runtime to grow the stack and handle the > locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check > on method entry to avoid (possibly lots of) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive > locking (yet). When that happens, the fast-lock gets inflated to a full > monitor. It is not clear if it is worth adding support for recursive > fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, > it must inflate the fast-lock to a full monitor. Normally, we need to know > the current owning thread, and record that in the monitor, so that the > contending thread can wait for the current owner to properly exit the > monitor. However, fast-locking doesn't have this information. What we do > instead is to record a special marker ANONYMOUS_OWNER. When the thread that > currently holds the lock arrives at monitorexit, and observes > ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, > and then properly exits the monitor, thus handing the lock over to the contending > thread. > > As an alternative, I considered removing stack-locking altogether, and only > using heavy monitors.
In most workloads this did not show measurable > regressions. However, in a few workloads, I have observed severe regressions. > All of them have been using old synchronized Java collections (Vector, > Stack), StringBuffer or similar code. The combination of two conditions leads > to regressions without stack- or fast-locking: 1. The workload synchronizes > on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and > 2. The workload churns such locks. IOW, uncontended use of Vector, > StringBuffer, etc as such is ok, but creating lots of such single-use, > single-threaded-locked objects leads to massive ObjectMonitor churn, which > can lead to a significant performance impact. But alas, such code exists, and > we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the > (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the > hashcode has been installed. Fast-locked headers can be used directly, for > monitor-locked objects we can easily reach-through to the displaced header. > This is safe because Java threads participate in monitor deflation protocol. > This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_6
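To restate the data structure described above in code form, here is a deliberately simplified, hypothetical Java sketch of a per-thread lock stack. The real implementation is C++ inside HotSpot and additionally handles GC rooting, growth in the runtime, and inflation to full monitors; the names below are invented.

    // Illustrative sketch only: a per-thread "lock stack" as described above --
    // a small array of the objects this thread has fast-locked, kept in lock order.
    final class LockStack {
        private Object[] elems = new Object[4]; // experience: typically stays very small
        private int top = 0;

        // Checked before attempting a fast-lock; compiled code does this on method
        // entry so the locking fast path itself never has to grow the stack.
        boolean canPush() { return top < elems.length; }

        void grow() { elems = java.util.Arrays.copyOf(elems, elems.length * 2); }

        // Called after the mark-word CAS (unlocked -> fast-locked) succeeded.
        void push(Object o) { elems[top++] = o; }

        // Symmetric unlock: the object being unlocked is expected on top.
        Object pop() {
            Object o = elems[--top];
            elems[top] = null;
            return o;
        }

        // The common query "does the current thread own this lock?" is a scan
        // of a handful of elements.
        boolean contains(Object o) {
            for (int i = 0; i < top; i++) {
                if (elems[i] == o) return true;
            }
            return false;
        }
    }

The key property is that both the overflow check and the ownership query touch only a handful of thread-local elements, which is what keeps the uncontended path fast.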
Re: RFR: 8291555: Implement alternative fast-locking scheme [v21]
On Sat, 11 Mar 2023 14:52:54 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/share/runtime/lockStack.hpp line 64: > >> 62: >> 63: // GC support >> 64: inline void oops_do(OopClosure* cl); > > Does this need to be nonconst? Yes, because the OopClosures can (and will) update the inline array elements. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v21]
On Sat, 11 Mar 2023 14:57:19 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234: > >> 6232: >> 6233: // Load (object->mark() | 1) into hdr >> 6234: orr(hdr, hdr, markWord::unlocked_value); > > I wondered why this is needed. Should we not have the header of an unlocked > object in hdr? Or is this a safeguard against a misuse of this function > (called with the header of an already locked object)? It could be a monitor-locked header. In C2 this is not possible and we *could* save an instruction here, I guess. Not sure if it is worth it, though. - PR: https://git.openjdk.org/jdk/pull/10907
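A tiny, hypothetical model of what the `orr` above sets up, with an `AtomicLong` standing in for the object's mark word: or-ing in `unlocked_value` forces the expected value of the CAS to the "unlocked" bit pattern, so the CAS fails (and the slow path is taken) for any header that is already fast-locked or monitor-locked. The bit values mirror the description earlier in this thread; the class and method names are invented.

    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative only: lowest two mark bits 01 = unlocked, 00 = fast-locked;
    // the real code operates on the object header in assembly.
    final class FastLockSketch {
        static final long UNLOCKED_VALUE = 0b01;
        static final long LOCK_MASK      = 0b11;

        static boolean tryFastLock(AtomicLong markWord) {
            long mark     = markWord.get();
            long expected = mark | UNLOCKED_VALUE;   // corresponds to orr(hdr, hdr, unlocked_value)
            long locked   = expected & ~LOCK_MASK;   // clear the low bits -> "fast-locked"
            // The CAS succeeds only if the header really was in the unlocked state.
            return markWord.compareAndSet(expected, locked);
        }
    }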
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 10:26:26 GMT, Thomas Stuefe wrote: > Sounds good. Just to be clear, you mean enforce symmetric locking? resp. > forbid asymmetric locking? Yes, sorry, thanks for correcting! :) - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 10:20:21 GMT, Robbin Ehn wrote: > > > > > @rkennke I must be missing something. In aarch64, why do we handle the > > > > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we > > > > just seem to pop whatever is on top. > > > > > > > > > C1 and C2 don't allow assymmetric locking. If that ever happens, they > > > would refuse to compile the method. We should probably check that this > > > assumption holds true when popping the top entry in an #ASSERT block. > > > > > > Thanks for clarifying. Yes, asserting that would make sense. > > FYI: I'm trying to convince folks that JVMS should be allowed to enforce > asymmetric locking. We think most people don't know they will be stuck in > interpreter, unintended. What was discussed latest was to diagnose and warn > about this behavior as a first step. Sounds good. Just to be clear, you mean enforce symmetric locking? resp. forbid asymmetric locking? - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote: >> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is >> meant to be transitional, it is not meant to be used by end-users (except >> the bravest nerds) at all. When it lands, the Lilliput flag (e.g. >> +UseCompactObjectHeaders) will also control the locking flag. Eventually >> (e.g. release+1) both flags would become on by default and afterwards (e.g. >> release+2) would go away entirely, at which point the whole original >> stack-locking would disappear. > >> @rkennke I must be missing something. In aarch64, why do we handle the >> non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just >> seem to pop whatever is on top. > > C1 and C2 don't allow assymmetric locking. If that ever happens, they would > refuse to compile the method. We should probably check that this assumption > holds true when popping the top entry in an #ASSERT block. > > > @rkennke I must be missing something. In aarch64, why do we handle the > > > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we > > > just seem to pop whatever is on top. > > > > > > C1 and C2 don't allow assymmetric locking. If that ever happens, they would > > refuse to compile the method. We should probably check that this assumption > > holds true when popping the top entry in an #ASSERT block. > > Thanks for clarifying. Yes, asserting that would make sense. FYI: I'm trying to convince folks that JVMS should be allowed to enforce asymmetric locking. We think most people don't know they will be stuck in interpreter, unintended. What was discussed latest was to diagnose and warn about this behavior as a first step. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for >> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its >> ambigious fields f1 and f2. This structure can hold information for fields, >> methods, and invokedynamics and each of its fields can hold different types >> of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain >> invokedynamic information in a manner that is easy to interpret and easy to >> extend. Resolved invokedynamic entries will be stored in an array in the >> constant pool cache and the operand of the invokedynamic bytecode will be >> rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from >> ConstantPoolCacheEntry will be replaced with accesses to this new array and >> structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by >> @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one > additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > 3396: const Bytecodes::Code code = bytecode(); > 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; > 3398: const bool is_invokedynamic= code == false; // should not reach > here with invokedynamic This is what I meant. Suggestion: const bool is_invokedynamic= false; // should not reach here with invokedynamic Thanks! - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Thu, 16 Mar 2023 09:21:26 GMT, Martin Doerr wrote: >> Basically I kept the local variable as a name for the (now) constant value >> passed in the call at L3409. >> >> The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is >> declared in a shared header. >> >> I could replace the variable reference in the call with `false /* >> is_invokedynamic */` if you like that better. Personally I'd prefer the >> current version. > > I meant `code == false`. That was probably not intended. Oh my ... You are right, of course. - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Thu, 16 Mar 2023 09:07:27 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: >> >>> 3396: const Bytecodes::Code code = bytecode(); >>> 3397: const bool is_invokeinterface = code == >>> Bytecodes::_invokeinterface; >>> 3398: const bool is_invokedynamic= code == false; // should not reach >>> here with invokedynamic >> >> This looks strange! I guess you wanted to delete more? > > Basically I kept the local variable as a name for the (now) constant value > passed in the call at L3409. > > The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is > declared in a shared header. > > I could replace the variable reference in the call with `false /* > is_invokedynamic */` if you like that better. Personally I'd prefer the > current version. I meant `code == false`. That was probably not intended. - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]
On Wed, 15 Mar 2023 19:04:41 GMT, Martin Doerr wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fixed aarch64 interpreter mistake > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > >> 3396: const Bytecodes::Code code = bytecode(); >> 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; >> 3398: const bool is_invokedynamic= code == false; // should not reach >> here with invokedynamic > > This looks strange! I guess you wanted to delete more? Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409. The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. - PR: https://git.openjdk.org/jdk/pull/12778
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote: > > > @rkennke I must be missing something. In aarch64, why do we handle the > > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just > > seem to pop whatever is on top. > > C1 and C2 don't allow assymmetric locking. If that ever happens, they would > refuse to compile the method. We should probably check that this assumption > holds true when popping the top entry in an #ASSERT block. Thanks for clarifying. Yes, asserting that would make sense. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with three >> additional commits since the last revision: >> >> - More RISCV changes (by Fei Yang) >> - Use -w instructions in fast_unlock() >> - Increase stub size of C2HandleAnonOwnerStub to 18 > > I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is > meant to be transitional, it is not meant to be used by end-users (except the > bravest nerds) at all. When it lands, the Lilliput flag (e.g. > +UseCompactObjectHeaders) will also control the locking flag. Eventually > (e.g. release+1) both flags would become on by default and afterwards (e.g. > release+2) would go away entirely, at which point the whole original > stack-locking would disappear. > @rkennke I must be missing something. In aarch64, why do we handle the > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just > seem to pop whatever is on top. C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block. - PR: https://git.openjdk.org/jdk/pull/10907
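A possible shape for the assertion discussed here, written as a hypothetical addition to the simplified LockStack sketch shown earlier in this thread (it reuses that sketch's `elems` and `top` fields): because C1/C2 only compile methods with structured, symmetric locking, the object being unlocked should always be the current top of the lock stack, and a debug-only check can verify that.

    // Hypothetical debug-only check, analogous to an #ifdef ASSERT block in the VM:
    // compiled code may assume the object being unlocked is the top of the lock stack.
    Object popAndVerify(Object unlocking) {
        Object o = elems[--top];
        assert o == unlocking
            : "asymmetric locking: expected the unlocked object on top of the lock stack";
        elems[top] = null;
        return o;
    }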
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with three >> additional commits since the last revision: >> >> - More RISCV changes (by Fei Yang) >> - Use -w instructions in fast_unlock() >> - Increase stub size of C2HandleAnonOwnerStub to 18 > > I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is > meant to be transitional, it is not meant to be used by end-users (except the > bravest nerds) at all. When it lands, the Lilliput flag (e.g. > +UseCompactObjectHeaders) will also control the locking flag. Eventually > (e.g. release+1) both flags would become on by default and afterwards (e.g. > release+2) would go away entirely, at which point the whole original > stack-locking would disappear. @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Thu, 16 Mar 2023 08:00:38 GMT, Thomas Stuefe wrote: > I like UseNewLocks but fear that this may conflict with Oracle's plan (?) to > move OMs into heap, which would be another revamp of locking - fat locks in > this case - and may come with yet another switch. Other than that, > UseNewLocks sounds good and succinct. > > Another proposal: UseThreadLockStack or UseLockStack Just an FYI, at the moment we have:

    product(ccstr, ObjectSynchronizerMode, "fast",                    \
            "ObjectSynchronizer modes: "                              \
            "legacy: legacy native system; "                          \
            "native: java entry with native monitors; "               \
            "heavy: java entry with always inflated Java monitors; "  \
            "fast: java entry with fast-locks and"                    \
            " inflate-on-demand Java monitors; ")                     \

At least personally, I prefer one option rather than many. With a command line like `-XX:-UseLockStack -XX:+UseHeavyMonitors` it's harder, for me, to figure out what is selected and what was intended to be selected. - PR: https://git.openjdk.org/jdk/pull/10907
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current >> stack-locking implementation. It retains the advantages of stack-locking >> (namely fast locking in uncontended code-paths), while avoiding the overload >> of the mark word. That overloading causes massive problems with Lilliput, >> because it means we have to check and deal with this situation when trying >> to access the mark-word. And because of the very racy nature, this turns out >> to be very complex and would involve a variant of the inflation protocol to >> ensure that the object header is stable. (The current implementation of >> setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto >> the stack which consists only of the displaced header, and CAS a pointer to >> this stack location into the object header (the lowest two header bits being >> 00 indicate 'stack-locked'). The pointer into the stack can then be used to >> identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two >> header bits to 00 to indicate 'fast-locked' but does *not* overload the >> upper bits with a stack-pointer. Instead, it pushes the object-reference to >> a thread-local lock-stack. This is a new structure which is basically a >> small array of oops that is associated with each thread. Experience shows >> that this array typcially remains very small (3-5 elements). Using this lock >> stack, it is possible to query which threads own which locks. Most >> importantly, the most common question 'does the current thread own me?' is >> very quickly answered by doing a quick scan of the array. More complex >> queries like 'which thread owns X?' are not performed in very >> performance-critical paths (usually in code like JVMTI or deadlock >> detection) where it is ok to do more complex operations (and we already do). >> The lock-stack is also a new set of GC roots, and would be scanned during >> thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for >> potential overflow before attempting locking. When that is the case, locking >> fast-paths would call into the runtime to grow the stack and handle the >> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check >> on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive >> locking (yet). When that happens, the fast-lock gets inflated to a full >> monitor. It is not clear if it is worth to add support for recursive >> fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked >> object, it must inflate the fast-lock to a full monitor. Normally, we need >> to know the current owning thread, and record that in the monitor, so that >> the contending thread can wait for the current owner to properly exit the >> monitor. However, fast-locking doesn't have this information. What we do >> instead is to record a special marker ANONYMOUS_OWNER. When the thread that >> currently holds the lock arrives at monitorexit, and observes >> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, >> and then properly exits the monitor, and thus handing over to the contending >> thread. 
>> >> As an alternative, I considered to remove stack-locking altogether, and only >> use heavy monitors. In most workloads this did not show measurable >> regressions. However, in a few workloads, I have observed severe >> regressions. All of them have been using old synchronized Java collections >> (Vector, Stack), StringBuffer or similar code. The combination of two >> conditions leads to regressions without stack- or fast-locking: 1. The >> workload synchronizes on uncontended locks (e.g. single-threaded use of >> Vector or StringBuffer) and 2. The workload churns such locks. IOW, >> uncontended use of Vector, StringBuffer, etc as such is ok, but creating >> lots of such single-use, single-threaded-locked objects leads to massive >> ObjectMonitor churn, which can lead to a significant performance impact. But >> alas, such code exists, and we probably don't want to punish it if we can >> avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the >> (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the >> hashcode has been installed. Fast-locked headers can be used directly, for >> monitor-locked objects we can easily reach-through to the displaced header. >> This is safe because Java threads participate in monitor deflation protocol.
Re: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v2]
> This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => > `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => > `notifyJvmtiUnmount` > 2. Still useful implementation of old native methods is moved from `jvm.cpp` > to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd`=> `VTMS_mount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, > `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the`VirtualThread` static boolean state variable > `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the > jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` > `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics can be needed in the future > but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: include jniHandles.hpp into sharedRuntime.cpp - Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/8a379320..397b6337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054
Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]
On Wed, 15 Mar 2023 20:02:11 GMT, Alex Menkov wrote: > It seems to me that it's much simpler to remove build action from 4 tests in > the directory than add it for other 55 True. Sadly we keep getting bitten over and over by CODETOOLS-7902847. Sometimes the "fix" is to remove build directives (in Hotspot we switched from adding to removing back in 2017) and sometimes it is to add them (there have been some recent fixes that have done this - including by me). - PR: https://git.openjdk.org/jdk/pull/13030
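For context on what "build action" means here: jtreg tests declare explicit compilation steps with `@build` tags in the test header. The header below is a purely hypothetical example of such an explicit build action; the test and helper class names are made up and are not the tests discussed in this thread.

    /*
     * @test
     * @summary Hypothetical jtreg header showing an explicit build action
     * @library /test/lib
     * @build ExampleHelper
     * @run main/othervm ExampleSaTest
     */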