Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]

2023-03-15 Thread David Holmes
On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke  wrote:

>>> Would it be possible to open/send me the failing test that triggers 
>>> vframeArray assert
>>> or extract a reproducer that you could publish?
>> 
>> I have started an internal discussion at Oracle to see what it would take
>> to move that test from closed to open. Will keep you posted.
>
>> > Would it be possible to open/send me the failing test that triggers 
>> > vframeArray assert
>> > or extract a reproducer that you could publish?
>> 
>> I have started an internal discussion at Oracle to see what it would take to 
>> move that test from closed to open. Will keep you posted.
> 
> Thank you!
> 
> Regarding moving this PR back to draft, I am not sure. I can do that, yes. 
> But really the fundamental algorithm and implementation is basically fixed 
> since half a year already. I have re-worked it into a fresh PR based on the 
> request to put it behind a flag. The recent change to a fixed-size lock-stack 
> has probably invalidated part of your previous reviews, and I am sorry for 
> that. On the upside, it removed a lot of complexity in the JIT compilers and 
> assembly code generators.
> 
> What else do I expect to happen?
> 
> Thomas is working on an ARM(32) port, but this is quite separate and could 
> even land after this PR is done.
> 
> I still don't quite like the naming. Fast-locking doesn't really say anything 
> and it's not (meant to be) faster than the previous stack-locking. It is an 
> alternative (and less racy, on the object header) way to implement a 
> thin-locking layer before inflating monitors, that is all. So maybe 
> -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is 
> that when we eventually switch to Lilliput turned on by default, we would 
> entirely remove stack-locking.
> 
> I would also add some code in arguments.cpp to keep this new thin locking 
> turned off on platforms that don't yet support it.
> 
> Besides that, from my POV, it is pretty much done.
> 
> What do you think?

@rkennke The changed to fixed-size lock stack was a significant change as you 
note and that suggested to me that the design was still in flux. So I have to 
wonder whether everything is in fact now stable? (or as stable as one should 
expect for an experimental new feature)

> Fast-locking doesn't really say anything and it's not (meant to be) faster 
> than the previous stack-locking.

Agreed. But I don't think "Thin locks" is an option as that was specifically an 
IBM locking implementation. Historically Hotspot's locking mechanism has 
internally been referred to as stack-locks, so I would suggest UseNewStackLocks

-

PR: https://git.openjdk.org/jdk/pull/10907


RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics

2023-03-15 Thread Serguei Spitsyn
This is needed for performance improvements in support of virtual threads.
The update includes the following:

1. Refactored the `VirtualThread` native methods:
`notifyJvmtiMountBegin` and `notifyJvmtiMountEnd`  => 
`notifyJvmtiMount`
`notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => 
`notifyJvmtiUnmount`
2. Still useful implementation of old native methods is moved from `jvm.cpp` to 
`jvmtiThreadState.cpp`:
 `JVM_VirtualThreadMountStart`   => `VTMS_mount_begin`
 `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
 `JVM_VirtualThreadUnmountStart`  = > `VTMS_unmount_begin`
 `JVM_VirtualThreadUnmountEnd`=> `VTMS_mount_end`
3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, 
`notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
4. Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` 
and its support in `javaClasses`.
5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the 
jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` 
`notifyJvmtiEvents` variable.

Implementing the same methods as C1 intrinsics can be needed in the future but 
is a low priority for now.  

Testing:
 - Ran mach5 tiers 1-6. No regressions were found.

-

Commit messages:
 - fix traling spaces in a couple of files
 - minor update for VTMS_notify_jvmti_events variable
 - 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics

Changes: https://git.openjdk.org/jdk/pull/13054/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=13054=00
  Issue: https://bugs.openjdk.org/browse/JDK-8304303
  Stats: 438 lines in 20 files changed: 265 ins; 125 del; 48 mod
  Patch: https://git.openjdk.org/jdk/pull/13054.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054

PR: https://git.openjdk.org/jdk/pull/13054


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-15 Thread Thomas Stuefe
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation 

Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-15 Thread Daniel D . Daugherty
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation 

Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Naoto Sato
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode sequences to UTF-8 native in .properties file. 
> (Excluding the Unicode space and tab sequence). The conversion was done using 
> native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

test/jdk/java/text/Format/NumberFormat/CurrencySymbols.properties line 156:

> 154: zh=\u00A4
> 155: zh_CN=\uFFE5
> 156: zh_HK=HK$

Why are they not encoded into UTF-8 native?

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-15 Thread Alex Menkov
On Wed, 15 Mar 2023 18:50:25 GMT, Serguei Spitsyn  wrote:

> This looks pretty good. How did you test the fix? Does it never fail with 
> your fix? Thanks, Serguei

I run test jobs for "serviceability/sa" 100 times on all platforms, no failures 
(neither attach timeout nor NoClassDefFoundError)

-

PR: https://git.openjdk.org/jdk/pull/13030


Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-15 Thread Alex Menkov
On Wed, 15 Mar 2023 02:41:18 GMT, David Holmes  wrote:

> Not sure removing the build directives was the right way to go. As per the 
> jtreg tag guide:
> 
> > A test that relies upon library classes should contain appropriate @build 
> > directives to ensure that the classes will be compiled. It is strongly 
> > recommended that tests do not rely on the use of implicit compilation by 
> > the Java compiler.
> 
> so the problem is likely caused by missing build directives in the test(s) 
> that fails.

This is long standing issue with NoClassDefFoundError for library classes.
As far as I got from reading similar issues there was a try to add build 
directive for failing tests, but other tests from the same directory started to 
fail with NoClassDefFoundError later, and now most of the test have no build 
action for libraries.
It seems to me that it's much simpler to remove build action from 4 tests in 
the directory than add it for other 55

-

PR: https://git.openjdk.org/jdk/pull/13030


Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]

2023-03-15 Thread Roman Kennke
On Wed, 15 Mar 2023 19:14:09 GMT, Daniel D. Daugherty  
wrote:

> > Would it be possible to open/send me the failing test that triggers 
> > vframeArray assert
> > or extract a reproducer that you could publish?
> 
> I have started an internal discussion at Oracle to see what it would take to 
> move that test from closed to open. Will keep you posted.

Thank you!

Regarding moving this PR back to draft, I am not sure. I can do that, yes. But 
really the fundamental algorithm and implementation is basically fixed since 
half a year already. I have re-worked it into a fresh PR based on the request 
to put it behind a flag. The recent change to a fixed-size lock-stack has 
probably invalidated part of your previous reviews, and I am sorry for that. On 
the upside, it removed a lot of complexity in the JIT compilers and assembly 
code generators.

What else do I expect to happen?

Thomas is working on an ARM(32) port, but this is quite separate and could even 
land after this PR is done.

I still don't quite like the naming. Fast-locking doesn't really say anything 
and it's not (meant to be) faster than the previous stack-locking. It is an 
alternative (and less racy, on the object header) way to implement a 
thin-locking layer before inflating monitors, that is all. So maybe 
-XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is 
that when we eventually switch to Lilliput turned on by default, we would 
entirely remove stack-locking.

I would also add some code in arguments.cpp to keep this new thin locking 
turned off on platforms that don't yet support it.

Besides that, from my POV, it is pretty much done.

What do you think?

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8257967: JFR: Events for loaded agents [v10]

2023-03-15 Thread Serguei Spitsyn
On Tue, 14 Mar 2023 12:26:16 GMT, Markus Grönlund  wrote:

>> I've had a good look through now and have a better sense of the refactoring. 
>> Seems good.
>> 
>> I'll wait for any tweaks before hitting the approve button though.
>> 
>> Thanks
>
>> I've had a good look through now and have a better sense of the refactoring. 
>> Seems good.
>> 
>> 
>> 
>> I'll wait for any tweaks before hitting the approve button though.
>> 
>> 
>> 
>> Thanks
> 
> Thanks so much for taking a look. I realized that implementation details of 
> loading should probably reside in agent.cpp, not agentList.cpp.
> 
> I am currently off on vacation and will update when back. Thanks also to 
> Andrew Dinn for comments.

@mgronlun I'm looking at the fixes but it will take some time.

-

PR: https://git.openjdk.org/jdk/pull/12923


Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]

2023-03-15 Thread Daniel D . Daugherty
On Wed, 15 Mar 2023 09:36:25 GMT, Roman Kennke  wrote:

> Would it be possible to open/send me the failing test that triggers 
> vframeArray assert
> or extract a reproducer that you could publish?

I have started an internal discussion at Oracle to see what it would take
to move that test from closed to open. Will keep you posted.

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-15 Thread Martin Doerr
On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva  
wrote:

>> The current structure used to store the resolution information for 
>> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its 
>> ambigious fields f1 and f2. This structure can hold information for fields, 
>> methods, and invokedynamics and each of its fields can hold different types 
>> of values depending on the entry. 
>> 
>> This enhancement proposes a new structure to exclusively contain 
>> invokedynamic information in a manner that is easy to interpret and easy to 
>> extend.  Resolved invokedynamic entries will be stored in an array in the 
>> constant pool cache and the operand of the invokedynamic bytecode will be 
>> rewritten to be the index into this array.
>> 
>> Any areas that previously accessed invokedynamic data from 
>> ConstantPoolCacheEntry will be replaced with accesses to this new array and 
>> structure. Verified with tier1-9 tests.
>> 
>> The PPC was provided by @reinrich and the RISCV port was provided by 
>> @DingliZhang and @zifeihan.
>> 
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Fixed aarch64 interpreter mistake

src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:

> 3396:   const Bytecodes::Code code = bytecode();
> 3397:   const bool is_invokeinterface  = code == Bytecodes::_invokeinterface;
> 3398:   const bool is_invokedynamic= code == false; // should not reach 
> here with invokedynamic

This looks strange! I guess you wanted to delete more?

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-15 Thread Daniel D . Daugherty
On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typcially remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly lots) of such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear if it is worth to add support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, and thus handing over to the contending 
>> thread.
>> 
>> As an alternative, I considered to remove stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables to simplify (and speed-up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads participate in monitor deflation 

Re: RFR: 8303921: serviceability/sa/UniqueVtableTest.java timed out [v2]

2023-03-15 Thread Serguei Spitsyn
On Wed, 15 Mar 2023 00:34:00 GMT, Alex Menkov  wrote:

>> The change:
>> - updates UniqueVtableTest to follow standard SA way - attach to target from 
>> subprocess and use SATestUtils.addPrivilegesIfNeeded for the subprocess;
>> - updates several tests in the same directory to resolve 
>> NoClassDefFoundError failures; It's known JTReg issue that "@build" actions 
>> for part of used shared classes may cause intermittent NoClassDefFoundError 
>> in other tests which use the same shared library classpath.
>> 
>> Tested: 100 runs on all platforms, no failures
>
> Alex Menkov has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   feedback

This looks pretty good.
How did you test the fix?
Does it never fail with your fix?
Thanks,
Serguei

-

PR: https://git.openjdk.org/jdk/pull/13030


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6]

2023-03-15 Thread Matias Saavedra Silva
> The current structure used to store the resolution information for 
> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its 
> ambigious fields f1 and f2. This structure can hold information for fields, 
> methods, and invokedynamics and each of its fields can hold different types 
> of values depending on the entry. 
> 
> This enhancement proposes a new structure to exclusively contain 
> invokedynamic information in a manner that is easy to interpret and easy to 
> extend.  Resolved invokedynamic entries will be stored in an array in the 
> constant pool cache and the operand of the invokedynamic bytecode will be 
> rewritten to be the index into this array.
> 
> Any areas that previously accessed invokedynamic data from 
> ConstantPoolCacheEntry will be replaced with accesses to this new array and 
> structure. Verified with tier1-9 tests.
> 
> The PPC was provided by @reinrich and the RISCV port was provided by 
> @DingliZhang and @zifeihan.
> 
> This change supports the following platforms: x86, aarch64, PPC, and RISCV

Matias Saavedra Silva has updated the pull request incrementally with one 
additional commit since the last revision:

  Fixed aarch64 interpreter mistake

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12778/files
  - new: https://git.openjdk.org/jdk/pull/12778/files/415e7116..9a3a63ae

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12778=05
 - incr: https://webrevs.openjdk.org/?repo=jdk=12778=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12778.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2]

2023-03-15 Thread Richard Reingruber
On Tue, 14 Mar 2023 17:01:20 GMT, Matias Saavedra Silva  
wrote:

> > @matias9927 can I ask you to merge master? There seem to be conflicts (at 
> > least I see a message "This branch has conflicts that must be resolved"). 
> > I'd like to give the change a spin in our CI testing. This requires that it 
> > can be applied on master.
> 
> I saw that merge error but nothing came up when I tried to merge locally. The 
> branch is updated nonetheless, so you should be able to test it now @reinrich 
> !

Thanks. The testing didn't reveal anything.

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]

2023-03-15 Thread Chris Plummer
On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain  wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>> 
>> The FieldInfo array is an old data structure storing fields metadata. It has 
>> poor extension capabilities, a complex management code because of lack of 
>> strong typing and semantic overloading, and a poor memory efficiency.
>> 
>> The new implementation uses a compressed stream to store those metadata, 
>> achieving better memory density and providing flexible extensibility, while 
>> exposing a strongly typed set of data when uncompressed. The stream is 
>> compressed using the unsigned5 encoding, which alreay present in the JDK 
>> (because of pack200) and the JVM (because JIT compulers use it to comrpess 
>> debugging information).
>> 
>> More technical details are available in the CR: 
>> https://bugs.openjdk.org/browse/JDK-8292818
>> 
>> Those changes include a re-organisation of fields' flags, splitting the 
>> previous heterogeneous AccessFlags field into three distincts flag 
>> categories: immutable flags from the class file, immutable fields defined by 
>> the JVM, and finally mutable flags defined by the JVM.
>> 
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, 
>> have been updated too to deal with the new FieldInfo format.
>> 
>> Tested with mach5, tier 1 to 7.
>> 
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   SA and JVMCI fixes

SA changes looks good. Thanks for taking care of this!

-

Marked as reviewed by cjplummer (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12855


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v5]

2023-03-15 Thread Matias Saavedra Silva
> The current structure used to store the resolution information for 
> invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its 
> ambigious fields f1 and f2. This structure can hold information for fields, 
> methods, and invokedynamics and each of its fields can hold different types 
> of values depending on the entry. 
> 
> This enhancement proposes a new structure to exclusively contain 
> invokedynamic information in a manner that is easy to interpret and easy to 
> extend.  Resolved invokedynamic entries will be stored in an array in the 
> constant pool cache and the operand of the invokedynamic bytecode will be 
> rewritten to be the index into this array.
> 
> Any areas that previously accessed invokedynamic data from 
> ConstantPoolCacheEntry will be replaced with accesses to this new array and 
> structure. Verified with tier1-9 tests.
> 
> The PPC was provided by @reinrich and the RISCV port was provided by 
> @DingliZhang and @zifeihan.
> 
> This change supports the following platforms: x86, aarch64, PPC, and RISCV

Matias Saavedra Silva has updated the pull request incrementally with one 
additional commit since the last revision:

  Fixed indentation and other comments

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12778/files
  - new: https://git.openjdk.org/jdk/pull/12778/files/db892223..415e7116

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12778=04
 - incr: https://webrevs.openjdk.org/?repo=jdk=12778=03-04

  Stats: 71 lines in 9 files changed: 1 ins; 3 del; 67 mod
  Patch: https://git.openjdk.org/jdk/pull/12778.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Archie L . Cobbs
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode sequences to UTF-8 native in .properties file. 
> (Excluding the Unicode space and tab sequence). The conversion was done using 
> native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

test/jdk/java/util/ResourceBundle/Bug6204853.properties line 1:

> 1: #

This file should probably be excluded because it's used in a test that relates 
to UTF-8 encoding (or not) of property files.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Justin Lu
On Tue, 7 Mar 2023 23:15:14 GMT, Jonathan Gibbons  wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file. 
>> (Excluding the Unicode space and tab sequence). The conversion was done 
>> using native2ascii.
>> 
>> In addition, the build logic is adjusted to support reading in the 
>> .properties files as UTF-8 during the conversion from .properties file to 
>> .java ListResourceBundle file.
>
> make/langtools/tools/compileproperties/CompileProperties.java line 252:
> 
>> 250: try {
>> 251: writer = new BufferedWriter(
>> 252: new OutputStreamWriter(new 
>> FileOutputStream(outputPath), StandardCharsets.ISO_8859_1));
> 
> Using ISO_8859_1 seems strange.
> Since these are generated files, you could write them as UTF-8 and then 
> override the default javac option for ascii when compiling _just_ these files.
> 
> Or else just stay with ascii; no one should be looking at these files!

Will stick with your latter solution, as since the .properties files were 
converted via native2ascii, it makes sense to write out via ascii.

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Jonathan Gibbons
On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu  wrote:

> This PR converts Unicode sequences to UTF-8 native in .properties file. 
> (Excluding the Unicode space and tab sequence). The conversion was done using 
> native2ascii.
> 
> In addition, the build logic is adjusted to support reading in the 
> .properties files as UTF-8 during the conversion from .properties file to 
> .java ListResourceBundle file.

make/langtools/tools/compileproperties/CompileProperties.java line 252:

> 250: try {
> 251: writer = new BufferedWriter(
> 252: new OutputStreamWriter(new 
> FileOutputStream(outputPath), StandardCharsets.ISO_8859_1));

Using ISO_8859_1 seems strange.
Since these are generated files, you could write them as UTF-8 and then 
override the default javac option for ascii when compiling _just_ these files.

Or else just stay with ascii; no one should be looking at these files!

-

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3]

2023-03-15 Thread Matias Saavedra Silva
On Tue, 14 Mar 2023 15:39:39 GMT, Gui Cao  wrote:

>> Matias Saavedra Silva has updated the pull request with a new target base 
>> due to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains five 
>> additional commits since the last revision:
>> 
>>  - Typo in comment
>>  - Merge branch 'master' into resolvedIndyEntry_8301995
>>  - Interpreter optimization and comments
>>  - PPC and RISCV port
>>  - 8301995: Move invokedynamic resolution information out of the cpCache
>
> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843:
> 
>> 1841:   ldr(cache, Address(rcpool, 
>> in_bytes(ConstantPoolCache::invokedynamic_entries_offset(;
>> 1842:   // Scale the index to be the entry index * 
>> sizeof(ResolvedInvokeDynamicInfo)
>> 1843:   mov(tmp, sizeof(ResolvedIndyEntry));
> 
> The tmp register is not used here, is it redundant?

Right, the tmp register is not needed anymore thanks to the mul to shift 
optimization. Note that shifting will not be possible on 32-bit systems due to 
the size of ResolvedIndyEntry not being a power of two. This optimization only 
works on 64 bit builds.

-

PR: https://git.openjdk.org/jdk/pull/12778


RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native

2023-03-15 Thread Justin Lu
This PR converts Unicode sequences to UTF-8 native in .properties file. 
(Excluding the Unicode space and tab sequence). The conversion was done using 
native2ascii.

In addition, the build logic is adjusted to support reading in the .properties 
files as UTF-8 during the conversion from .properties file to .java 
ListResourceBundle file.

-

Commit messages:
 - Write to ASCII
 - Read in .properties as UTF-8, but write to LRB .java as ISO-8859-1
 - Compile class with ascii (Not ready to make system wide change)
 - Toggle UTF-8 for javac option in JavaCompilation.gmk
 - CompileProperties converts in UTF-8
 - Convert .properties from ISO-8859-1 to UTF-8

Changes: https://git.openjdk.org/jdk/pull/12726/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk=12726=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301991
  Stats: 29093 lines in 490 files changed: 6 ins; 0 del; 29087 mod
  Patch: https://git.openjdk.org/jdk/pull/12726.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726


Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5]

2023-03-15 Thread Frederic Parain
> Please review this change re-implementing the FieldInfo data structure.
> 
> The FieldInfo array is an old data structure storing fields metadata. It has 
> poor extension capabilities, a complex management code because of lack of 
> strong typing and semantic overloading, and a poor memory efficiency.
> 
> The new implementation uses a compressed stream to store those metadata, 
> achieving better memory density and providing flexible extensibility, while 
> exposing a strongly typed set of data when uncompressed. The stream is 
> compressed using the unsigned5 encoding, which alreay present in the JDK 
> (because of pack200) and the JVM (because JIT compulers use it to comrpess 
> debugging information).
> 
> More technical details are available in the CR: 
> https://bugs.openjdk.org/browse/JDK-8292818
> 
> Those changes include a re-organisation of fields' flags, splitting the 
> previous heterogeneous AccessFlags field into three distincts flag 
> categories: immutable flags from the class file, immutable fields defined by 
> the JVM, and finally mutable flags defined by the JVM.
> 
> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have 
> been updated too to deal with the new FieldInfo format.
> 
> Tested with mach5, tier 1 to 7.
> 
> Thank you.

Frederic Parain has updated the pull request incrementally with one additional 
commit since the last revision:

  SA and JVMCI fixes

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/12855/files
  - new: https://git.openjdk.org/jdk/pull/12855/files/12b4f1b4..f81337f7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=12855=04
 - incr: https://webrevs.openjdk.org/?repo=jdk=12855=03-04

  Stats: 130 lines in 13 files changed: 14 ins; 46 del; 70 mod
  Patch: https://git.openjdk.org/jdk/pull/12855.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855

PR: https://git.openjdk.org/jdk/pull/12855


Re: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4]

2023-03-15 Thread Matias Saavedra Silva
On Tue, 14 Mar 2023 23:29:17 GMT, Calvin Cheung  wrote:

>> Matias Saavedra Silva has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   RISCV port update
>
> src/hotspot/share/interpreter/bootstrapInfo.cpp line 234:
> 
>> 232:   if (_indy_index > -1) {
>> 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index);
>> 234:   }
> 
> Since the `else` case doesn’t have braces, maybe omit the braces for this 
> case as well?

The if statements below use braces so I think it would be better to add braces 
to the else case.

-

PR: https://git.openjdk.org/jdk/pull/12778


Re: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]

2023-03-15 Thread Frederic Parain
On Tue, 14 Mar 2023 01:25:01 GMT, Chris Plummer  wrote:

>> Frederic Parain has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Fixes includes and style
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75:
> 
>> 73: int initialValueIndex;
>> 74: int genericSignatureIndex;
>> 75: int contendedGroup;
> 
> It seems that these should all be shorts. All the getter methods are casting 
> them to short.

Indexes in the constant pool are unsigned shorts, but Java shorts are signed, 
using ints is the simplest way to store those indexes.

> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java 
> line 108:
> 
>> 106: CLASS_STATE_INITIALIZATION_ERROR = 
>> db.lookupIntConstant("InstanceKlass::initialization_error").intValue();
>> 107: // We need a new fieldsCache each time we attach.
>> 108: fieldsCache = new HashMap();
> 
> This should probably be a WeakHashMap. I tried it and it seems to work (or at 
> least didn't cause any problems). However, when doing a heap dump I didn't 
> notice the table being any smaller on exit when it was made weak, even though 
> there were numerous GC's while dumping the heap.
> 
> The `` is the Address of the hotspot InstanceKlass instance, and this 
> Address is referenced by the SA InstanceKlass mirror. So theoretically when 
> the reference to the mirror goes way, then the cache entry can be cleared.

I've changed the map to a WeakHashMap and didn't see any issue during testing. 
But I didn't measure footprint.

> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java 
> line 325:
> 
>> 323: 
>> 324:   public int getFieldOffset(int index) {
>> 325: return (int)getField(index).getOffset();
> 
> Cast to int is not needed

Other APIs (like MetadaField) are using longs to pass offsets, doing a cast 
here is less disruptive than changing all the other APIs.

-

PR: https://git.openjdk.org/jdk/pull/12855


Re: RFR: 8291555: Implement alternative fast-locking scheme [v26]

2023-03-15 Thread Roman Kennke
On Wed, 15 Mar 2023 01:25:33 GMT, Daniel D. Daugherty  
wrote:

> I did Mach5 Tier{1,2,3} on v25. Please see the bug report for the gory 
> details:
> 
> Tier1 - 1 known, unrelated failure Tier2 - 4 closed, unknown, related test 
> failures Tier3 - 8 closed, unknown, related test failures; 2 open, known, 
> unrelated test failures; 16 open, unknown, related test failures
> 
> I'm pausing my Mach5 testing at this point.

Hi Daniel,

I could not reproduce any of the test failures, neither on x86_64 nor on 
aarch64. I have blindly fixed the code stub size issue, it seems rather 
trivial. Would it be possible to open/send me the failing test that triggers 
vframeArray assert or extract a reproducer that you could publish? I looked at 
it but could not figure out what could be going on there.

Thanks,
Roman

-

PR: https://git.openjdk.org/jdk/pull/10907


Re: RFR: 8291555: Implement alternative fast-locking scheme [v27]

2023-03-15 Thread Roman Kennke
> This change adds a fast-locking scheme as an alternative to the current 
> stack-locking implementation. It retains the advantages of stack-locking 
> (namely fast locking in uncontended code-paths), while avoiding the overload 
> of the mark word. That overloading causes massive problems with Lilliput, 
> because it means we have to check and deal with this situation when trying to 
> access the mark-word. And because of the very racy nature, this turns out to 
> be very complex and would involve a variant of the inflation protocol to 
> ensure that the object header is stable. (The current implementation of 
> setting/fetching the i-hash provides a glimpse into the complexity).
> 
> What the original stack-locking does is basically to push a stack-lock onto 
> the stack which consists only of the displaced header, and CAS a pointer to 
> this stack location into the object header (the lowest two header bits being 
> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
> identify which thread currently owns the lock.
> 
> This change basically reverses stack-locking: It still CASes the lowest two 
> header bits to 00 to indicate 'fast-locked' but does *not* overload the upper 
> bits with a stack-pointer. Instead, it pushes the object-reference to a 
> thread-local lock-stack. This is a new structure which is basically a small 
> array of oops that is associated with each thread. Experience shows that this 
> array typcially remains very small (3-5 elements). Using this lock stack, it 
> is possible to query which threads own which locks. Most importantly, the 
> most common question 'does the current thread own me?' is very quickly 
> answered by doing a quick scan of the array. More complex queries like 'which 
> thread owns X?' are not performed in very performance-critical paths (usually 
> in code like JVMTI or deadlock detection) where it is ok to do more complex 
> operations (and we already do). The lock-stack is also a new set of GC roots, 
> and would be scanned during thread scanning, possibly concurrently, via the 
> normal p
 rotocols.
> 
> The lock-stack is grown when needed. This means that we need to check for 
> potential overflow before attempting locking. When that is the case, locking 
> fast-paths would call into the runtime to grow the stack and handle the 
> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
> on method entry to avoid (possibly lots) of such checks at locking sites.
> 
> In contrast to stack-locking, fast-locking does *not* support recursive 
> locking (yet). When that happens, the fast-lock gets inflated to a full 
> monitor. It is not clear if it is worth to add support for recursive 
> fast-locking.
> 
> One trouble is that when a contending thread arrives at a fast-locked object, 
> it must inflate the fast-lock to a full monitor. Normally, we need to know 
> the current owning thread, and record that in the monitor, so that the 
> contending thread can wait for the current owner to properly exit the 
> monitor. However, fast-locking doesn't have this information. What we do 
> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
> currently holds the lock arrives at monitorexit, and observes 
> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
> and then properly exits the monitor, and thus handing over to the contending 
> thread.
> 
> As an alternative, I considered to remove stack-locking altogether, and only 
> use heavy monitors. In most workloads this did not show measurable 
> regressions. However, in a few workloads, I have observed severe regressions. 
> All of them have been using old synchronized Java collections (Vector, 
> Stack), StringBuffer or similar code. The combination of two conditions leads 
> to regressions without stack- or fast-locking: 1. The workload synchronizes 
> on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 
> 2. The workload churns such locks. IOW, uncontended use of Vector, 
> StringBuffer, etc as such is ok, but creating lots of such single-use, 
> single-threaded-locked objects leads to massive ObjectMonitor churn, which 
> can lead to a significant performance impact. But alas, such code exists, and 
> we probably don't want to punish it if we can avoid it.
> 
> This change enables to simplify (and speed-up!) a lot of code:
> 
> - The inflation protocol is no longer necessary: we can directly CAS the 
> (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the 
> hashcode has been installed. Fast-locked headers can be used directly, for 
> monitor-locked objects we can easily reach-through to the displaced header. 
> This is safe because Java threads participate in monitor deflation protocol. 
> This would be implemented in a separate PR
> 
> 
> Testing:
>  - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>  - [x] tier2 

Re: RFR: 8291555: Implement alternative fast-locking scheme [v25]

2023-03-15 Thread Fei Yang
On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke  wrote:

>> I've reviewed the changes in v23 and v24. Trying another
>> Mach5 Tier1 job set.
>
>> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set.
> 
> I just now pushed a simple change that changes the log message 
> 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy.

@rkennke : Hi, I have prepared some extra changes for RISC-V to make it work. 
See attachment. 

BTW: You might also want to use -w instructions in MacroAssembler::fast_unlock 
for aarch64.

[more-riscv-changes.txt](https://github.com/openjdk/jdk/files/10977109/more-riscv-changes.txt)

-

PR: https://git.openjdk.org/jdk/pull/10907