Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Thomas Stuefe
On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke  wrote:

> 



> > > In my last changes I made a stupid mistake and don't set the condition 
> > > flags correctly to force the slow-path, on aarch64. This is only relevant 
> > > when we exceed the lock-stack capacity, that is why it's failing so 
> > > rarely. I don't see a similar problem on x86_64 - have we observed any 
> > > failures on x86_64? I pushed a fix for aarch64.
> > 
> > 
> > I noticed this too for arm; I used cmp to clear EQ but using tst seems 
> > better. I also do it inside fast_lock, to give it a defined exit state wrt 
> > EQ|NE, since it saves me from having to think about this on every call 
> > site. But at least the fail case may be fiddly without conditional 
> > execution.
> 
> Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null 
> register.

Sure. I used cmp with an immediate that I knew was not the value. Clunky, I 
know. As I wrote, tst seems better.

> Tst is better at least on x86 because it encodes smaller. _shrugs_
> 
> You can do it in the shared fast_lock() but it's really only needed in C2, 
> that's why I'm doing it there. Maybe I'm too perfectionist when it comes to 
> assembly code?

I felt just better having it there, at least for the start. I may still move it 
outside to C2. Lets see.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Roman Kennke
On Fri, 17 Mar 2023 06:15:28 GMT, Thomas Stuefe  wrote:

> > In my last changes I made a stupid mistake and don't set the condition 
> > flags correctly to force the slow-path, on aarch64. This is only relevant 
> > when we exceed the lock-stack capacity, that is why it's failing so rarely. 
> > I don't see a similar problem on x86_64 - have we observed any failures on 
> > x86_64? I pushed a fix for aarch64.
> 
> 
> 
> I noticed this too for arm; I used cmp to clear EQ but using tst seems 
> better. I also do it inside fast_lock, to give it a defined exit state wrt 
> EQ|NE, since it saves me from having to think about this on every call site. 
> But at least the fail case may be fiddly without conditional execution.

Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null 
register. Tst is better at least on x86 because it encodes smaller. *shrugs*

You can do it in the shared fast_lock() but it's really only needed in C2, 
that's why I'm doing it there. Maybe I'm too perfectionist when it comes to 
assembly code?

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Thomas Stuefe
On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke  wrote:

> In my last changes I made a stupid mistake and don't set the condition flags 
> correctly to force the slow-path, on aarch64. This is only relevant when we 
> exceed the lock-stack capacity, that is why it's failing so rarely. I don't 
> see a similar problem on x86_64 - have we observed any failures on x86_64? I 
> pushed a fix for aarch64.

I noticed this too for arm; I used cmp to clear EQ but using tst seems better. 
I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since 
it saves me from having to think about this on every call site. But at least 
the fail case may be fiddly without conditional execution.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v3]

2023-03-16 Thread Serguei Spitsyn
> This is needed for performance improvements in support of virtual threads.
> The update includes the following:
> 
> 1. Refactored the `VirtualThread` native methods:
> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd`  => 
> `notifyJvmtiMount`
> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => 
> `notifyJvmtiUnmount`
> 2. Still useful implementation of old native methods is moved from `jvm.cpp` 
> to `jvmtiThreadState.cpp`:
>  `JVM_VirtualThreadMountStart`   => `VTMS_mount_begin`
>  `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
>  `JVM_VirtualThreadUnmountStart`  = > `VTMS_unmount_begin`
>  `JVM_VirtualThreadUnmountEnd`=> `VTMS_mount_end`
> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, 
> `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
> 4. Removed the`VirtualThread` static boolean state variable 
> `notifyJvmtiEvents` and its support in `javaClasses`.
> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the 
> jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` 
> `notifyJvmtiEvents` variable.
> 
> Implementing the same methods as C1 intrinsics can be needed in the future 
> but is a low priority for now.  
> 
> Testing:
>  - Ran mach5 tiers 1-6. No regressions were found.

Serguei Spitsyn has updated the pull request incrementally with one additional 
commit since the last revision:

  address pre-review comments from Leonid

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13054/files
  - new: https://git.openjdk.org/jdk/pull/13054/files/397b6337..f3692263

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01-02

  Stats: 22 lines in 2 files changed: 14 ins; 4 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13054.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054

PR: https://git.openjdk.org/jdk/pull/13054


RFR: 8304376: Rename t1/t2 classes in com/sun/jdi/CLETest.java to avoid class duplication error in IDE

2023-03-16 Thread Leonid Mesnik
The com/sun/jdi tests are located in the on package, and classes with same name 
cause 'class duplication error' when this directory is opened as source code in 
IDE.

The simplest fix to avoid this is just to rename class.

-

Commit messages:
 - renamed classes

Changes: https://git.openjdk.org/jdk/pull/13069/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13069&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8304376
  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13069.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13069/head:pull/13069

PR: https://git.openjdk.org/jdk/pull/13069


Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"

2023-03-16 Thread Leonid Mesnik
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer  wrote:

> The debuggee main method creates two threads and then starts them:
> 
> 
> public static void main(String[] args) {
> System.out.println("Howdy!");
> Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1);
> Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2);
> 
> t1.start();
> t2.start();
> }
> 
> 
> These threads will hit breakpoints which the debugger handles and issues an 
> invoke on the breakpoint thread. The threads run until they generate 100 
> breakpoints. There is an issue when these two threads are virtual threads. 
> Virtual threads are daemon threads. That means the JVM can exit while they 
> are still running. The above main() method is not waiting for these two 
> threads to exit, so main() exits immediately and the JVM starts the shutdown 
> process. It first must wait for all non-daemon threads to exit, but there are 
> none, so the JVM exits right away before the two threads are completed.  The 
> end result of this early exit is that sometimes the invoke done by the 
> debugger never completes because the JVM has already issued a VMDeath event 
> and the debuggee has been disconnected.
> 
> When these two threads are platform threads, the JVM has to wait until they 
> complete before it exits, so they will always complete. The fix for virtual 
> threads is to do a join with t1 and t2. This forces the main() method to 
> block until they have completed.

Marked as reviewed by lmesnik (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/13068


Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]

2023-03-16 Thread David Holmes
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain  wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing fields metadata. It has 
>> poor extension capabilities, a complex management code because of lack of 
>> strong typing and semantic overloading, and a poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, 
>> achieving better memory density and providing flexible extensibility, while 
>> exposing a strongly typed set of data when uncompressed. The stream is 
>> compressed using the unsigned5 encoding, which alreay present in the JDK 
>> (because of pack200) and the JVM (because JIT compulers use it to comrpess 
>> debugging information).
>> 
>> More technical details are available in the CR: 
>> https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the 
>> previous heterogeneous AccessFlags field into three distincts flag 
>> categories: immutable flags from the class file, immutable fields defined by 
>> the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, 
>> have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   SA and JVMCI fixes

Nice piece of work Fred - I won't pretend to follow every detail.

A few nits on unnecessary alignment (which may match pre-existing style not 
evident in the diff).

Thanks.

src/hotspot/share/oops/fieldInfo.inline.hpp line 170:

> 168: new_flags = old_flags & ~mask;
> 169: witness = Atomic::cmpxchg(&flags, old_flags, new_flags);
> 170:   } while(witness != old_flags);

Just to prove I did read this :) space needed after `while`

src/hotspot/share/oops/fieldInfo.inline.hpp line 174:

> 172: 
> 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) 
> {
> 174:   if (z)atomic_set_bits(  _flags, flag_mask(pos));

Nit: extra space before `_flags`

src/hotspot/share/oops/fieldInfo.inline.hpp line 175:

> 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) 
> {
> 174:   if (z)atomic_set_bits(  _flags, flag_mask(pos));
> 175:   else  atomic_clear_bits(_flags, flag_mask(pos));

Nit: no need for the extra spaces. If you really want these to align just place 
them on ne wlines.

src/hotspot/share/oops/instanceKlass.inline.hpp line 50:

> 48: 
> 49: inline Symbol* InstanceKlass::field_name(int index) const { 
> return field(index).name(constants()); }
> 50: inline Symbol* InstanceKlass::field_signature   (int index) const { 
> return field(index).signature(constants()); }

There should not be spaces between a method name and the opening `(`. I'm 
really not a fine of this kind of alignment.

-

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12855


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Matias Saavedra Silva
On Thu, 16 Mar 2023 16:11:57 GMT, Richard Reingruber  wrote:

>> Matias Saavedra Silva has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Fixed aarch64 interpreter mistake
>
> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335:
> 
>> 2333: 
>> 2334:   __ load_resolved_indy_entry(cache, index);
>> 2335:   __ ldr(method, Address(cache, 
>> in_bytes(ResolvedIndyEntry::method_offset(;
> 
> Should this load have acquire semantics?
> Like [here in template 
> interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239)
>  and [here for the zero 
> interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)?
> 
> Call stack for zero interpreter is
> 
> ConstantPoolCacheEntry::indices_ord()
> ConstantPoolCacheEntry::bytecode_1()
> ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code)
> BytecodeInterpreter::run(interpreterState)

Yes, acquire semantics should be used here. Thank you for pointing this out!

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"

2023-03-16 Thread Alex Menkov
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer  wrote:

> The debuggee main method creates two threads and then starts them:
> 
> 
> public static void main(String[] args) {
> System.out.println("Howdy!");
> Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1);
> Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2);
> 
> t1.start();
> t2.start();
> }
> 
> 
> These threads will hit breakpoints which the debugger handles and issues an 
> invoke on the breakpoint thread. The threads run until they generate 100 
> breakpoints. There is an issue when these two threads are virtual threads. 
> Virtual threads are daemon threads. That means the JVM can exit while they 
> are still running. The above main() method is not waiting for these two 
> threads to exit, so main() exits immediately and the JVM starts the shutdown 
> process. It first must wait for all non-daemon threads to exit, but there are 
> none, so the JVM exits right away before the two threads are completed.  The 
> end result of this early exit is that sometimes the invoke done by the 
> debugger never completes because the JVM has already issued a VMDeath event 
> and the debuggee has been disconnected.
> 
> When these two threads are platform threads, the JVM has to wait until they 
> complete before it exits, so they will always complete. The fix for virtual 
> threads is to do a join with t1 and t2. This forces the main() method to 
> block until they have completed.

Marked as reviewed by amenkov (Reviewer).

test/jdk/com/sun/jdi/InvokeHangTest.java line 200:

> 198: try {
> 199: StackFrame sf = thread.frame(0);
> 200: System.out.println("  Debugger: Breakpoint hit at 
> "+sf.location());

while you are here please add spaces around plus

-

PR: https://git.openjdk.org/jdk/pull/13068


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Roman Kennke
On Thu, 16 Mar 2023 21:05:54 GMT, Daniel D. Daugherty  
wrote:

> > I pushed a fix for aarch64.
> 
> 
> 
> Do you think this is the cause for the -Xcheck:jni failures that I ran into
> 
> in my Tier4 testing?

Yes, and with high probability also for some/all of the other failures. It 
leads to the situation that when the lock-stack is full, it should take the 
slow-path, but doesn't (because the flags are not set correctly) and thus leave 
the object unlocked.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"

2023-03-16 Thread Chris Plummer
On Thu, 16 Mar 2023 21:02:09 GMT, Chris Plummer  wrote:

> The debuggee main method creates two threads and then starts them:
> 
> 
> public static void main(String[] args) {
> System.out.println("Howdy!");
> Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1);
> Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2);
> 
> t1.start();
> t2.start();
> }
> 
> 
> These threads will hit breakpoints which the debugger handles and issues an 
> invoke on the breakpoint thread. The threads run until they generate 100 
> breakpoints. There is an issue when these two threads are virtual threads. 
> Virtual threads are daemon threads. That means the JVM can exit while they 
> are still running. The above main() method is not waiting for these two 
> threads to exit, so main() exits immediately and the JVM starts the shutdown 
> process. It first must wait for all non-daemon threads to exit, but there are 
> none, so the JVM exits right away before the two threads are completed.  The 
> end result of this early exit is that sometimes the invoke done by the 
> debugger never completes because the JVM has already issued a VMDeath event 
> and the debuggee has been disconnected.
> 
> When these two threads are platform threads, the JVM has to wait until they 
> complete before it exits, so they will always complete. The fix for virtual 
> threads is to do a join with t1 and t2. This forces the main() method to 
> block until they have completed.

test/jdk/com/sun/jdi/InvokeHangTest.java line 223:

> 221: mainThread = bpe.thread();
> 222: EventRequestManager erm = vm().eventRequestManager();
> 223: final Thread mainTestThread = Thread.currentThread();

This local variable was conflicting with an instance field of the same name. 
See line 135. I ran into issues when I added some debugging code to this method 
that referenced the other `mainThread`, so I did a rename.

-

PR: https://git.openjdk.org/jdk/pull/13068


RFR: 8290200: com/sun/jdi/InvokeHangTest.java fails with "Debuggee appears to be hung"

2023-03-16 Thread Chris Plummer
The debuggee main method creates two threads and then starts them:


public static void main(String[] args) {
System.out.println("Howdy!");
Thread t1 = TestScaffold.newThread(new InvokeHangTarg(), name1);
Thread t2 = TestScaffold.newThread(new InvokeHangTarg(), name2);

t1.start();
t2.start();
}


These threads will hit breakpoints which the debugger handles and issues an 
invoke on the breakpoint thread. The threads run until they generate 100 
breakpoints. There is an issue when these two threads are virtual threads. 
Virtual threads are daemon threads. That means the JVM can exit while they are 
still running. The above main() method is not waiting for these two threads to 
exit, so main() exits immediately and the JVM starts the shutdown process. It 
first must wait for all non-daemon threads to exit, but there are none, so the 
JVM exits right away before the two threads are completed.  The end result of 
this early exit is that sometimes the invoke done by the debugger never 
completes because the JVM has already issued a VMDeath event and the debuggee 
has been disconnected.

When these two threads are platform threads, the JVM has to wait until they 
complete before it exits, so they will always complete. The fix for virtual 
threads is to do a join with t1 and t2. This forces the main() method to block 
until they have completed.

-

Commit messages:
 - Make sure main() debuggee method does not exit until test threads are 
complete.

Changes: https://git.openjdk.org/jdk/pull/13068/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13068&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8290200
  Stats: 13 lines in 2 files changed: 6 ins; 2 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13068.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13068/head:pull/13068

PR: https://git.openjdk.org/jdk/pull/13068


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 20:57:31 GMT, Roman Kennke  wrote:

>> src/hotspot/cpu/x86/x86_32.ad line 617:
>> 
>>> 615:   int bangsize = C->output()->bang_size_in_bytes();
>>> 616: 
>>> 617:   __ verified_entry(framesize, 
>>> C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), 
>>> C->stub_function() != NULL);
>> 
>> Why did this change from `nullptr` -> `NULL`?
>
> I reverted that part back to upstream state (at least what is in JDK-21+13).

Okay.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke  wrote:

> I pushed a fix for aarch64.

Do you think this is the cause for the -Xcheck:jni failures that I ran into
in my Tier4 testing?

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Roman Kennke
On Thu, 16 Mar 2023 20:50:12 GMT, Daniel D. Daugherty  
wrote:

>> Roman Kennke has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Several changes (mostly cosmetic) in response to reviews
>
> src/hotspot/cpu/x86/x86_32.ad line 617:
> 
>> 615:   int bangsize = C->output()->bang_size_in_bytes();
>> 616: 
>> 617:   __ verified_entry(framesize, 
>> C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), 
>> C->stub_function() != NULL);
> 
> Why did this change from `nullptr` -> `NULL`?

I reverted that part back to upstream state (at least what is in JDK-21+13).

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Roman Kennke
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-16 Thread Roman Kennke
> This change adds a fast-locking scheme as an alternative to the current 
> stack-locking implementation. It retains the advantages of stack-locking 
> (namely fast locking in uncontended code-paths), while avoiding the overload 
> of the mark word. That overloading causes massive problems with Lilliput, 
> because it means we have to check and deal with this situation when trying to 
> access the mark-word. And because of the very racy nature, this turns out to 
> be very complex and would involve a variant of the inflation protocol to 
> ensure that the object header is stable. (The current implementation of 
> setting/fetching the i-hash provides a glimpse into the complexity).
> 
> What the original stack-locking does is basically to push a stack-lock onto 
> the stack which consists only of the displaced header, and CAS a pointer to 
> this stack location into the object header (the lowest two header bits being 
> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
> identify which thread currently owns the lock.
> 
> This change basically reverses stack-locking: It still CASes the lowest two 
> header bits to 00 to indicate 'fast-locked' but does *not* overload the upper 
> bits with a stack-pointer. Instead, it pushes the object-reference to a 
> thread-local lock-stack. This is a new structure which is basically a small 
> array of oops that is associated with each thread. Experience shows that this 
> array typcially remains very small (3-5 elements). Using this lock stack, it 
> is possible to query which threads own which locks. Most importantly, the 
> most common question 'does the current thread own me?' is very quickly 
> answered by doing a quick scan of the array. More complex queries like 'which 
> thread owns X?' are not performed in very performance-critical paths (usually 
> in code like JVMTI or deadlock detection) where it is ok to do more complex 
> operations (and we already do). The lock-stack is also a new set of GC roots, 
> and would be scanned during thread scanning, possibly concurrently, via the 
> normal p
 rotocols.
> 
> The lock-stack is grown when needed. This means that we need to check for 
> potential overflow before attempting locking. When that is the case, locking 
> fast-paths would call into the runtime to grow the stack and handle the 
> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
> on method entry to avoid (possibly lots) of such checks at locking sites.
> 
> In contrast to stack-locking, fast-locking does *not* support recursive 
> locking (yet). When that happens, the fast-lock gets inflated to a full 
> monitor. It is not clear if it is worth to add support for recursive 
> fast-locking.
> 
> One trouble is that when a contending thread arrives at a fast-locked object, 
> it must inflate the fast-lock to a full monitor. Normally, we need to know 
> the current owning thread, and record that in the monitor, so that the 
> contending thread can wait for the current owner to properly exit the 
> monitor. However, fast-locking doesn't have this information. What we do 
> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
> currently holds the lock arrives at monitorexit, and observes 
> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
> and then properly exits the monitor, and thus handing over to the contending 
> thread.
> 
> As an alternative, I considered to remove stack-locking altogether, and only 
> use heavy monitors. In most workloads this did not show measurable 
> regressions. However, in a few workloads, I have observed severe regressions. 
> All of them have been using old synchronized Java collections (Vector, 
> Stack), StringBuffer or similar code. The combination of two conditions leads 
> to regressions without stack- or fast-locking: 1. The workload synchronizes 
> on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 
> 2. The workload churns such locks. IOW, uncontended use of Vector, 
> StringBuffer, etc as such is ok, but creating lots of such single-use, 
> single-threaded-locked objects leads to massive ObjectMonitor churn, which 
> can lead to a significant performance impact. But alas, such code exists, and 
> we probably don't want to punish it if we can avoid it.
> 
> This change enables to simplify (and speed-up!) a lot of code:
> 
> - The inflation protocol is no longer necessary: we can directly CAS the 
> (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the 
> hashcode has been installed. Fast-locked headers can be used directly, for 
> monitor-locked objects we can easily reach-through to the displaced header. 
> This is safe because Java threads participate in monitor deflation protocol. 
> This would be implemented in a separate PR
> 
> 
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_6

Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-16 Thread Chris Plummer
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov  wrote:

>> The change:
>> - updates UniqueVtableTest to follow standard SA way - attach to target from 
>> subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess;
>> - updates several tests in the same directory to resolve 
>> NoClassDefFoundError failures; It's known JTReg issue that "@build" actions 
>> for part of used shared classes may cause intermittent NoClassDefFoundError 
>> in other tests which use the same shared library classpath.
>> 
>> Tested: 100 runs on all platforms, no failures
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   feedback

Marked as reviewed by cjplummer (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/13030


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Daniel D . Daugherty
On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-16 Thread Serguei Spitsyn
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov  wrote:

>> The change:
>> - updates UniqueVtableTest to follow standard SA way - attach to target from 
>> subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess;
>> - updates several tests in the same directory to resolve 
>> NoClassDefFoundError failures; It's known JTReg issue that "@build" actions 
>> for part of used shared classes may cause intermittent NoClassDefFoundError 
>> in other tests which use the same shared library classpath.
>> 
>> Tested: 100 runs on all platforms, no failures
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   feedback

Marked as reviewed by sspitsyn (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/13030


Disallowing the dynamic loading of agents by default

2023-03-16 Thread Ron Pressler
Hi.

In JDK 21 we intend to disallow the dynamic loading of agents by default. This
will affect tools that use the Attach API to load an agent into a JVM some time
after the JVM has started [1]. There is no change to any of the mechanisms that
load an agent at JVM startup (-javaagent/-agentlib on the command line or the
Launcher-Agent-Class attribute in the main JAR's manifest).

This change in default behavior was proposed in 2017 as part of JEP 261 [2][3]. 
At that time the consensus was to switch to this default not in JDK 9 but in a 
later release to give tool maintainers sufficient time to inform their users. 
To allow the dynamic loading of agents, users will need to specify 
-XX:+EnableDynamicAgentLoading on the command line.

I'll post a draft JEP for review shortly.

-- Ron

[1]: 
https://docs.oracle.com/en/java/javase/19/docs/api/jdk.attach/com/sun/tools/attach/package-summary.html
[2]: https://openjdk.org/jeps/261
[3]: https://mail.openjdk.org/pipermail/jigsaw-dev/2017-April/012040.html

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]

2023-03-16 Thread Justin Lu
On Thu, 16 Mar 2023 18:31:23 GMT, Justin Lu  wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file. 
>> (Excluding the Unicode space and tab sequence). The conversion was done 
>> using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Reconvert CS.properties to UTF-8
>  - Revert all changes to CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

CurrencySymbols.properties is fully converted to UTF-8 now

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v3]

2023-03-16 Thread Justin Lu
> This PR converts Unicode sequences to UTF-8 native in .properties file. 
> (Excluding the Unicode space and tab sequence). The conversion was done using 
> native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with two additional 
commits since the last revision:

 - Reconvert CS.properties to UTF-8
 - Revert all changes to CurrencySymbols.properties

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/6d6bffe8..7119830b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01-02

  Stats: 87 lines in 1 file changed: 0 ins; 0 del; 87 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 16:18:44 GMT, Archie L. Cobbs  wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:
> 
>> 1: #
> 
> This file should probably be excluded because it's used in a test that 
> relates to UTF-8 encoding (or not) of property files.

Thank you, removed the changes for this file

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Wed, 15 Mar 2023 20:19:51 GMT, Naoto Sato  wrote:

>> Justin Lu has updated the pull request incrementally with four additional 
>> commits since the last revision:
>> 
>>  - Bug6204853 should not be converted
>>  - Copyright year for CompileProperties
>>  - Redo translation for CS.properties
>>  - Spot convert CurrencySymbols.properties
>
> test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:
> 
>> 154: zh=\u00A4
>> 155: zh_CN=\uFFE5
>> 156: zh_HK=HK$
> 
> Why are they not encoded into UTF-8 native?

Not sure, thank you for catching it. Working on it right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
On Thu, 16 Mar 2023 18:19:29 GMT, Justin Lu  wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file. 
>> (Excluding the Unicode space and tab sequence). The conversion was done 
>> using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> Justin Lu has updated the pull request incrementally with four additional 
> commits since the last revision:
> 
>  - Bug6204853 should not be converted
>  - Copyright year for CompileProperties
>  - Redo translation for CS.properties
>  - Spot convert CurrencySymbols.properties

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 1:

> 1: #

Conversion did not work as expected, addressing right now.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v2]

2023-03-16 Thread Justin Lu
> This PR converts Unicode sequences to UTF-8 native in .properties file. 
> (Excluding the Unicode space and tab sequence). The conversion was done using 
> native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

Justin Lu has updated the pull request incrementally with four additional 
commits since the last revision:

 - Bug6204853 should not be converted
 - Copyright year for CompileProperties
 - Redo translation for CS.properties
 - Spot convert CurrencySymbols.properties

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12726/files
  - new: https://git.openjdk.org/jdk/pull/12726/files/1e798f24..6d6bffe8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=00-01

  Stats: 92 lines in 4 files changed: 0 ins; 0 del; 92 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Richard Reingruber
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva  
wrote:

>> The current structure used to store the resolution information for 
>> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its 
>> ambigious fields f1 and f2. This structure can hold information for fields, 
>> methods, and invokedynamics and each of its fields can hold different types 
>> of values depending on the entry. 
>> 
>> This enhancement proposes a new structure to exclusively contain 
>> invokedynamic information in a manner that is easy to interpret and easy to 
>> extend.  Resolved invokedynamic entries will be stored in an array in the 
>> constant pool cache and the operand of the invokedynamic bytecode will be 
>> rewritten to be the index into this array.
>> 
>> Any areas that previously accessed invokedynamic data from 
>> ConstantPoolCacheEntry will be replaced with accesses to this new array and 
>> structure. Verified with tier1-9 tests.
>> 
>> The PPC was provided by @reinrich and the RISCV port was provided by 
>> @DingliZhang and @zifeihan.
>> 
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fixed aarch64 interpreter mistake

src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335:

> 2333: 
> 2334:   __ load_resolved_indy_entry(cache, index);
> 2335:   __ ldr(method, Address(cache, 
> in_bytes(ResolvedIndyEntry::method_offset(;

Should this load have acquire semantics?
Like [here in template 
interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239)
 and [here for the zero 
interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)?

Call stack for zero interpreter is

ConstantPoolCacheEntry::indices_ord()
ConstantPoolCacheEntry::bytecode_1()
ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code)
BytecodeInterpreter::run(interpreterState)

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5]

2023-03-16 Thread Chen Liang
On Thu, 16 Mar 2023 14:50:14 GMT, Alan Bateman  wrote:

>> test/jdk/java/util/ServiceLoader/BadProvidersTest.java line 216:
>> 
>>> 214: clb.withSuperclass(CD_Object);
>>> 215: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER);
>>> 216: var provider$1Desc = ClassDesc.of("p", 
>>> "ProviderFactory$1");
>> 
>> This is class descriptor for ProviderFactory$1, not "Provider" so maybe 
>> rename this to providerFactory1 or something a bit clearer.
>
>> This is class descriptor for ProviderFactory$1, not "Provider" so maybe 
>> rename this to providerFactory1 or something a bit clearer.
> 
> The updated version looks good. I assume you'll do a pass over the updated 
> tests to bump their copyright date as this is the first change in 2023 for 
> many of these tests.

Yes, the copyright years are updated. Tested Serializable, instrument, and 
LambdaStackTrace as of "[Fix LambdaStackTrace after 
running](https://github.com/openjdk/jdk/pull/13009/commits/a728c9de1ff684bd30726eb8ea6e7a674cb5a140)"

-

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v5]

2023-03-16 Thread Chen Liang
> Summaries:
> 1. A few recommendations about updating the constant API is made at 
> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html 
> and I may update this patch shall the API changes be integrated before
> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their 
> code generation infrastructure upgraded from ASM to Classfile API.
> 3. Most tests are included in tier1, but some are not:
> In `:jdk_io`: (tier2, part 2)
> 
> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
> 
> In `:jdk_instrument`: (tier 3)
> 
> test/jdk/java/lang/instrument/RetransformAgent.java
> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
> 
> 
> @asotona Would you mind reviewing?

Chen Liang has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix LambdaStackTrace after running

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13009/files
  - new: https://git.openjdk.org/jdk/pull/13009/files/09bbe4d5..a728c9de

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13009.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v4]

2023-03-16 Thread Chen Liang
> Summaries:
> 1. A few recommendations about updating the constant API is made at 
> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html 
> and I may update this patch shall the API changes be integrated before
> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their 
> code generation infrastructure upgraded from ASM to Classfile API.
> 3. Most tests are included in tier1, but some are not:
> In `:jdk_io`: (tier2, part 2)
> 
> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
> 
> In `:jdk_instrument`: (tier 3)
> 
> test/jdk/java/lang/instrument/RetransformAgent.java
> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
> 
> 
> @asotona Would you mind reviewing?

Chen Liang has updated the pull request incrementally with one additional 
commit since the last revision:

  formatting

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13009/files
  - new: https://git.openjdk.org/jdk/pull/13009/files/a50b94f9..09bbe4d5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02-03

  Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13009.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]

2023-03-16 Thread Doug Simon
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain  wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing fields metadata. It has 
>> poor extension capabilities, a complex management code because of lack of 
>> strong typing and semantic overloading, and a poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, 
>> achieving better memory density and providing flexible extensibility, while 
>> exposing a strongly typed set of data when uncompressed. The stream is 
>> compressed using the unsigned5 encoding, which alreay present in the JDK 
>> (because of pack200) and the JVM (because JIT compulers use it to comrpess 
>> debugging information).
>> 
>> More technical details are available in the CR: 
>> https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the 
>> previous heterogeneous AccessFlags field into three distincts flag 
>> categories: immutable flags from the class file, immutable fields defined by 
>> the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, 
>> have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   SA and JVMCI fixes

Marked as reviewed by dnsimon (Committer).

-

PR: https://git.openjdk.org/jdk/pull/12855


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v3]

2023-03-16 Thread Chen Liang
> Summaries:
> 1. A few recommendations about updating the constant API is made at 
> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html 
> and I may update this patch shall the API changes be integrated before
> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their 
> code generation infrastructure upgraded from ASM to Classfile API.
> 3. Most tests are included in tier1, but some are not:
> In `:jdk_io`: (tier2, part 2)
> 
> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
> 
> In `:jdk_instrument`: (tier 3)
> 
> test/jdk/java/lang/instrument/RetransformAgent.java
> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
> 
> 
> @asotona Would you mind reviewing?

Chen Liang has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains four additional commits since 
the last revision:

 - Fix failed LambdaStackTrace test, use more convenient APIs
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into 
invoke-test-classfile
 - Shorten lines, move from mask() to ACC_ constants, other misc improvements
 - Convert test/jdk/java ASM tests to classfile api

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13009/files
  - new: https://git.openjdk.org/jdk/pull/13009/files/c6536bf9..a50b94f9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13009&range=01-02

  Stats: 68505 lines in 534 files changed: 42498 ins; 18129 del; 7878 mod
  Patch: https://git.openjdk.org/jdk/pull/13009.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13009/head:pull/13009

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]

2023-03-16 Thread Alan Bateman
On Tue, 14 Mar 2023 07:53:25 GMT, Alan Bateman  wrote:

> This is class descriptor for ProviderFactory$1, not "Provider" so maybe 
> rename this to providerFactory1 or something a bit clearer.

The updated version looks good. I assume you'll do a pass over the updated 
tests to bump their copyright date as this is the first change in 2023 for many 
of these tests.

-

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]

2023-03-16 Thread Adam Sotona
On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang  wrote:

>> Summaries:
>> 1. A few recommendations about updating the constant API is made at 
>> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html 
>> and I may update this patch shall the API changes be integrated before
>> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their 
>> code generation infrastructure upgraded from ASM to Classfile API.
>> 3. Most tests are included in tier1, but some are not:
>> In `:jdk_io`: (tier2, part 2)
>> 
>> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
>> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
>> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
>> 
>> In `:jdk_instrument`: (tier 3)
>> 
>> test/jdk/java/lang/instrument/RetransformAgent.java
>> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
>> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
>> 
>> 
>> @asotona Would you mind reviewing?
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Shorten lines, move from mask() to ACC_ constants, other misc improvements

I like the easy way I can read the tests code now even I don't know them.
They look great :)

-

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8294977: Convert test/jdk/java tests from ASM library to Classfile API [v2]

2023-03-16 Thread Adam Sotona
On Wed, 15 Mar 2023 04:09:04 GMT, Chen Liang  wrote:

>> Summaries:
>> 1. A few recommendations about updating the constant API is made at 
>> https://mail.openjdk.org/pipermail/classfile-api-dev/2023-March/000233.html 
>> and I may update this patch shall the API changes be integrated before
>> 2. One ASM library-specific test, `LambdaAsm` is removed. Others have their 
>> code generation infrastructure upgraded from ASM to Classfile API.
>> 3. Most tests are included in tier1, but some are not:
>> In `:jdk_io`: (tier2, part 2)
>> 
>> test/jdk/java/io/Serializable/records/SerialPersistentFieldsTest.java
>> test/jdk/java/io/Serializable/records/ProhibitedMethods.java
>> test/jdk/java/io/Serializable/records/BadCanonicalCtrTest.java
>> 
>> In `:jdk_instrument`: (tier 3)
>> 
>> test/jdk/java/lang/instrument/RetransformAgent.java
>> test/jdk/java/lang/instrument/NativeMethodPrefixAgent.java
>> test/jdk/java/lang/instrument/asmlib/Instrumentor.java
>> 
>> 
>> @asotona Would you mind reviewing?
>
> Chen Liang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Shorten lines, move from mask() to ACC_ constants, other misc improvements

test/jdk/java/lang/Class/getSimpleName/GetSimpleNameTest.java line 174:

> 172: clb.withSuperclass(CD_Object);
> 173: clb.withFlags(AccessFlag.PUBLIC, AccessFlag.SUPER);
> 174: clb.accept(InnerClassesAttribute.of(

During the API discussions there was slightly more recommended to use 
`ClasfileBuilder::with` over `ClasfileBuilder::accept`, however it is just a 
cosmetic difference.

-

PR: https://git.openjdk.org/jdk/pull/13009


Re: RFR: 8291555: Implement alternative fast-locking scheme [v28]

2023-03-16 Thread Roman Kennke
> This change adds a fast-locking scheme as an alternative to the current 
> stack-locking implementation. It retains the advantages of stack-locking 
> (namely fast locking in uncontended code-paths), while avoiding the overload 
> of the mark word. That overloading causes massive problems with Lilliput, 
> because it means we have to check and deal with this situation when trying to 
> access the mark-word. And because of the very racy nature, this turns out to 
> be very complex and would involve a variant of the inflation protocol to 
> ensure that the object header is stable. (The current implementation of 
> setting/fetching the i-hash provides a glimpse into the complexity).
> 
> What the original stack-locking does is basically to push a stack-lock onto 
> the stack which consists only of the displaced header, and CAS a pointer to 
> this stack location into the object header (the lowest two header bits being 
> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
> identify which thread currently owns the lock.
> 
> This change basically reverses stack-locking: It still CASes the lowest two 
> header bits to 00 to indicate 'fast-locked' but does *not* overload the upper 
> bits with a stack-pointer. Instead, it pushes the object-reference to a 
> thread-local lock-stack. This is a new structure which is basically a small 
> array of oops that is associated with each thread. Experience shows that this 
> array typcially remains very small (3-5 elements). Using this lock stack, it 
> is possible to query which threads own which locks. Most importantly, the 
> most common question 'does the current thread own me?' is very quickly 
> answered by doing a quick scan of the array. More complex queries like 'which 
> thread owns X?' are not performed in very performance-critical paths (usually 
> in code like JVMTI or deadlock detection) where it is ok to do more complex 
> operations (and we already do). The lock-stack is also a new set of GC roots, 
> and would be scanned during thread scanning, possibly concurrently, via the 
> normal p
 rotocols.
> 
> The lock-stack is grown when needed. This means that we need to check for 
> potential overflow before attempting locking. When that is the case, locking 
> fast-paths would call into the runtime to grow the stack and handle the 
> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
> on method entry to avoid (possibly lots) of such checks at locking sites.
> 
> In contrast to stack-locking, fast-locking does *not* support recursive 
> locking (yet). When that happens, the fast-lock gets inflated to a full 
> monitor. It is not clear if it is worth to add support for recursive 
> fast-locking.
> 
> One trouble is that when a contending thread arrives at a fast-locked object, 
> it must inflate the fast-lock to a full monitor. Normally, we need to know 
> the current owning thread, and record that in the monitor, so that the 
> contending thread can wait for the current owner to properly exit the 
> monitor. However, fast-locking doesn't have this information. What we do 
> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
> currently holds the lock arrives at monitorexit, and observes 
> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
> and then properly exits the monitor, and thus handing over to the contending 
> thread.
> 
> As an alternative, I considered to remove stack-locking altogether, and only 
> use heavy monitors. In most workloads this did not show measurable 
> regressions. However, in a few workloads, I have observed severe regressions. 
> All of them have been using old synchronized Java collections (Vector, 
> Stack), StringBuffer or similar code. The combination of two conditions leads 
> to regressions without stack- or fast-locking: 1. The workload synchronizes 
> on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 
> 2. The workload churns such locks. IOW, uncontended use of Vector, 
> StringBuffer, etc as such is ok, but creating lots of such single-use, 
> single-threaded-locked objects leads to massive ObjectMonitor churn, which 
> can lead to a significant performance impact. But alas, such code exists, and 
> we probably don't want to punish it if we can avoid it.
> 
> This change enables to simplify (and speed-up!) a lot of code:
> 
> - The inflation protocol is no longer necessary: we can directly CAS the 
> (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the 
> hashcode has been installed. Fast-locked headers can be used directly, for 
> monitor-locked objects we can easily reach-through to the displaced header. 
> This is safe because Java threads participate in monitor deflation protocol. 
> This would be implemented in a separate PR
> 
> 
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 x86_6

Re: RFR: 8291555: Implement alternative fast-locking scheme [v21]

2023-03-16 Thread Roman Kennke
On Sat, 11 Mar 2023 14:52:54 GMT, Thomas Stuefe  wrote:

>> Roman Kennke has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Use nullptr instead of NULL in touched code (shared)
>
> src/hotspot/share/runtime/lockStack.hpp line 64:
> 
>> 62: 
>> 63:   // GC support
>> 64:   inline void oops_do(OopClosure* cl);
> 
> Does this need to be nonconst?

Yes, because the OopClosures can (and will) update the inline array elements.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v21]

2023-03-16 Thread Roman Kennke
On Sat, 11 Mar 2023 14:57:19 GMT, Thomas Stuefe  wrote:

>> Roman Kennke has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Use nullptr instead of NULL in touched code (shared)
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:
> 
>> 6232: 
>> 6233:   // Load (object->mark() | 1) into hdr
>> 6234:   orr(hdr, hdr, markWord::unlocked_value);
> 
> I wondered why this is needed. Should we not have the header of an unloaded 
> object in hdr? Or is this a safeguard against a misuse of this function 
> (called with the header of an already locked object)?

It could be a monitor-locked header. In C2 this is not possible and we *could* 
save an instruction here, I guess. Not sure if it is worth it, though.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Robbin Ehn
On Thu, 16 Mar 2023 10:26:26 GMT, Thomas Stuefe  wrote:

> Sounds good. Just to be clear, you mean enforce symmetric locking? resp. 
> forbid asymmetric locking?

Yes, sorry, thanks for correcting! :)

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Thomas Stuefe
On Thu, 16 Mar 2023 10:20:21 GMT, Robbin Ehn  wrote:

> 



> > > > @rkennke I must be missing something. In aarch64, why do we handle the 
> > > > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we 
> > > > just seem to pop whatever is on top.
> > > 
> > > 
> > > C1 and C2 don't allow assymmetric locking. If that ever happens, they 
> > > would refuse to compile the method. We should probably check that this 
> > > assumption holds true when popping the top entry in an #ASSERT block.
> > 
> > 
> > Thanks for clarifying. Yes, asserting that would make sense.
> 
> FYI: I'm trying to convince folks that JVMS should be allowed to enforce 
> asymmetric locking. We think most people don't know they will be stuck in 
> interpreter, unintended. What was discussed latest was to diagnose and warn 
> about this behavior as a first step.

Sounds good. Just to be clear, you mean enforce symmetric locking? resp. forbid 
asymmetric locking?

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Robbin Ehn
On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke  wrote:

>> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is 
>> meant to be transitional, it is not meant to be used by end-users (except 
>> the bravest nerds) at all. When it lands, the Lilliput flag (e.g. 
>> +UseCompactObjectHeaders) will also control the locking flag. Eventually 
>> (e.g. release+1) both flags would become on by default and afterwards (e.g. 
>> release+2) would go away entirely, at which point the whole original 
>> stack-locking would disappear.
>
>> @rkennke I must be missing something. In aarch64, why do we handle the 
>> non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just 
>> seem to pop whatever is on top.
> 
> C1 and C2 don't allow assymmetric locking. If that ever happens, they would 
> refuse to compile the method. We should probably check that this assumption 
> holds true when popping the top entry in an #ASSERT block.

> > > @rkennke I must be missing something. In aarch64, why do we handle the 
> > > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we 
> > > just seem to pop whatever is on top.
> > 
> > 
> > C1 and C2 don't allow assymmetric locking. If that ever happens, they would 
> > refuse to compile the method. We should probably check that this assumption 
> > holds true when popping the top entry in an #ASSERT block.
> 
> Thanks for clarifying. Yes, asserting that would make sense.

FYI:
I'm trying to convince folks that JVMS should be allowed to enforce asymmetric 
locking.
We think most people don't know they will be stuck in interpreter, unintended.
What was discussed latest was to diagnose and warn about this behavior as a 
first step.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Richard Reingruber
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva  
wrote:

>> The current structure used to store the resolution information for 
>> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its 
>> ambigious fields f1 and f2. This structure can hold information for fields, 
>> methods, and invokedynamics and each of its fields can hold different types 
>> of values depending on the entry. 
>> 
>> This enhancement proposes a new structure to exclusively contain 
>> invokedynamic information in a manner that is easy to interpret and easy to 
>> extend.  Resolved invokedynamic entries will be stored in an array in the 
>> constant pool cache and the operand of the invokedynamic bytecode will be 
>> rewritten to be the index into this array.
>> 
>> Any areas that previously accessed invokedynamic data from 
>> ConstantPoolCacheEntry will be replaced with accesses to this new array and 
>> structure. Verified with tier1-9 tests.
>> 
>> The PPC was provided by @reinrich and the RISCV port was provided by 
>> @DingliZhang and @zifeihan.
>> 
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fixed aarch64 interpreter mistake

src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:

> 3396:   const Bytecodes::Code code = bytecode();
> 3397:   const bool is_invokeinterface  = code == Bytecodes::_invokeinterface;
> 3398:   const bool is_invokedynamic= code == false; // should not reach 
> here with invokedynamic

This is what I meant.

Suggestion:

  const bool is_invokedynamic= false; // should not reach here with 
invokedynamic


Thanks!

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Richard Reingruber
On Thu, 16 Mar 2023 09:21:26 GMT, Martin Doerr  wrote:

>> Basically I kept the local variable as a name for the (now) constant value 
>> passed in the call at L3409.
>> 
>> The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is 
>> declared in a shared header.
>> 
>> I could replace the variable reference in the call with `false /* 
>> is_invokedynamic */` if you like that better. Personally I'd prefer the 
>> current version.
>
> I meant `code == false`. That was probably not intended.

Oh my ... Your are right of course.

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Martin Doerr
On Thu, 16 Mar 2023 09:07:27 GMT, Richard Reingruber  wrote:

>> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:
>> 
>>> 3396:   const Bytecodes::Code code = bytecode();
>>> 3397:   const bool is_invokeinterface  = code == 
>>> Bytecodes::_invokeinterface;
>>> 3398:   const bool is_invokedynamic= code == false; // should not reach 
>>> here with invokedynamic
>> 
>> This looks strange! I guess you wanted to delete more?
>
> Basically I kept the local variable as a name for the (now) constant value 
> passed in the call at L3409.
> 
> The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is 
> declared in a shared header.
> 
> I could replace the variable reference in the call with `false /* 
> is_invokedynamic */` if you like that better. Personally I'd prefer the 
> current version.

I meant `code == false`. That was probably not intended.

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-16 Thread Richard Reingruber
On Wed, 15 Mar 2023 19:04:41 GMT, Martin Doerr  wrote:

>> Matias Saavedra Silva has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Fixed aarch64 interpreter mistake
>
> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:
> 
>> 3396:   const Bytecodes::Code code = bytecode();
>> 3397:   const bool is_invokeinterface  = code == Bytecodes::_invokeinterface;
>> 3398:   const bool is_invokedynamic= code == false; // should not reach 
>> here with invokedynamic
> 
> This looks strange! I guess you wanted to delete more?

Basically I kept the local variable as a name for the (now) constant value 
passed in the call at L3409.

The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is 
declared in a shared header.

I could replace the variable reference in the call with `false /* 
is_invokedynamic */` if you like that better. Personally I'd prefer the current 
version.

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Thomas Stuefe
On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke  wrote:

> 
> > @rkennke I must be missing something. In aarch64, why do we handle the 
> > non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just 
> > seem to pop whatever is on top.
> 
> C1 and C2 don't allow assymmetric locking. If that ever happens, they would 
> refuse to compile the method. We should probably check that this assumption 
> holds true when popping the top entry in an #ASSERT block.

Thanks for clarifying. Yes, asserting that would make sense.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Roman Kennke
On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke  wrote:

>> Roman Kennke has updated the pull request incrementally with three 
>> additional commits since the last revision:
>> 
>>  - More RISCV changes (by Fei Yang)
>>  - Use -w instructions in fast_unlock()
>>  - Increase stub size of C2HandleAnonOwnerStub to 18
>
> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is 
> meant to be transitional, it is not meant to be used by end-users (except the 
> bravest nerds) at all. When it lands, the Lilliput flag (e.g. 
> +UseCompactObjectHeaders) will also control the locking flag. Eventually 
> (e.g. release+1) both flags would become on by default and afterwards (e.g. 
> release+2) would go away entirely, at which point the whole original 
> stack-locking would disappear.

> @rkennke I must be missing something. In aarch64, why do we handle the 
> non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just 
> seem to pop whatever is on top.

C1 and C2 don't allow assymmetric locking. If that ever happens, they would 
refuse to compile the method. We should probably check that this assumption 
holds true when popping the top entry in an #ASSERT block.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Thomas Stuefe
On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke  wrote:

>> Roman Kennke has updated the pull request incrementally with three 
>> additional commits since the last revision:
>> 
>>  - More RISCV changes (by Fei Yang)
>>  - Use -w instructions in fast_unlock()
>>  - Increase stub size of C2HandleAnonOwnerStub to 18
>
> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is 
> meant to be transitional, it is not meant to be used by end-users (except the 
> bravest nerds) at all. When it lands, the Lilliput flag (e.g. 
> +UseCompactObjectHeaders) will also control the locking flag. Eventually 
> (e.g. release+1) both flags would become on by default and afterwards (e.g. 
> release+2) would go away entirely, at which point the whole original 
> stack-locking would disappear.

@rkennke I must be missing something. In aarch64, why do we handle the 
non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem 
to pop whatever is on top.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Roman Kennke
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Robbin Ehn
On Thu, 16 Mar 2023 08:00:38 GMT, Thomas Stuefe  wrote:

> I like UseNewLocks but fear that this may conflict with Oracles plan (?) to 
> move OMs into heap, which would be another revamp of locking - fat locks in 
> this case - and may come with yet another switch. Other than that, 
> UseNewLocks sounds good and succinct.
> 
> Another proposal: UseThreadLockStack or UseLockStack

Just a FYI, at the moment we have:

  product(ccstr, ObjectSynchronizerMode, "fast",\
  "ObjectSynchronizer modes: "  \
  "legacy: legacy native system; "  \
  "native: java entry with native monitors; "   \
  "heavy: java entry with always inflated Java monitors; "  \
  "fast: java entry with fast-locks and"\
  "  inflate-on-demand Java monitors; ")\



At least personally I prefer one option than using many.
A cmd line with e.g.
`-XX:-UseLockStack -XX:+UseHeavyMonitors`
It's harder, for me ?, to figure out what is selected and what was intended to 
be selected.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread Thomas Stuefe
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-16 Thread David Holmes
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation protocol.

Re: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v2]

2023-03-16 Thread Serguei Spitsyn
> This is needed for performance improvements in support of virtual threads.
> The update includes the following:
> 
> 1. Refactored the `VirtualThread` native methods:
> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd`  => 
> `notifyJvmtiMount`
> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => 
> `notifyJvmtiUnmount`
> 2. Still useful implementation of old native methods is moved from `jvm.cpp` 
> to `jvmtiThreadState.cpp`:
>  `JVM_VirtualThreadMountStart`   => `VTMS_mount_begin`
>  `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
>  `JVM_VirtualThreadUnmountStart`  = > `VTMS_unmount_begin`
>  `JVM_VirtualThreadUnmountEnd`=> `VTMS_mount_end`
> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, 
> `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
> 4. Removed the`VirtualThread` static boolean state variable 
> `notifyJvmtiEvents` and its support in `javaClasses`.
> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the 
> jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` 
> `notifyJvmtiEvents` variable.
> 
> Implementing the same methods as C1 intrinsics can be needed in the future 
> but is a low priority for now.  
> 
> Testing:
>  - Ran mach5 tiers 1-6. No regressions were found.

Serguei Spitsyn has updated the pull request incrementally with one additional 
commit since the last revision:

  include jniHandles.hpp into sharedRuntime.cpp

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/13054/files
  - new: https://git.openjdk.org/jdk/pull/13054/files/8a379320..397b6337

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00-01

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13054.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054

PR: https://git.openjdk.org/jdk/pull/13054


Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-16 Thread David Holmes
On Wed, 15 Mar 2023 20:02:11 GMT, Alex Menkov  wrote:

> It seems to me that it's much simpler to remove build action from 4 tests in 
> the directory than add it for other 55

True. Sadly we keep getting bitten over and over by CODETOOLS-7902847. 
Sometimes the "fix" is to remove build directives (in Hotspot we switched from 
adding to removing back in 2017) and sometimes it is to add them (there have 
been some recent fixes that have done this - including by me).

-

PR: https://git.openjdk.org/jdk/pull/13030