On Fri, 12 Jul 2024 05:57:30 GMT, Axel Boldt-Christmas <abold...@openjdk.org> 
wrote:

>> When inflating a monitor the `ObjectMonitor*` is written directly over the 
>> `markWord` and any overwritten data is displaced into a displaced 
>> `markWord`. This is problematic for concurrent GCs which needs extra care or 
>> looser semantics to use this displaced data. In Lilliput this data also 
>> contains the klass forcing this to be something that the GC has to take into 
>> account everywhere.
>> 
>> This patch introduces an alternative solution where locking only uses the 
>> lock bits of the `markWord` and inflation does not override and displace the 
>> `markWord`. This is done by keeping associations between objects and 
>> `ObjectMonitor*` in an external hash table. Different caching techniques are 
>> used to speedup lookups from compiled code.
>> 
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is 
>> only supported in combination with the LM_LIGHTWEIGHT locking mode (the 
>> default). 
>> 
>> This patch has been evaluated to be performance neutral when 
>> `UseObjectMonitorTable` is turned off (the default). 
>> 
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` 
>> and `UseObjectMonitorTable` works.
>> 
>> # Cleanups
>> 
>> Cleaned up displaced header usage for:
>>   * BasicLock
>>     * Contains some Zero changes
>>     * Renames one exported JVMCI field
>>   * ObjectMonitor
>>     * Updates comments and tests consistencies
>> 
>> # Refactoring
>> 
>> `ObjectMonitor::enter` has been refactored an a 
>> `ObjectMonitorContentionMark` witness object has been introduced to the 
>> signatures. Which signals that the contentions reference counter is being 
>> held. More details are given below in the section about deflation.
>> 
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact 
>> more seamlessly with the `ObjectMonitor::enter` code. 
>> 
>> _There is even more `ObjectMonitor` refactoring which can be done here to 
>> create a more understandable and enforceable API. There are a handful of 
>> invariants / assumptions which are not always explicitly asserted which 
>> could be trivially abstracted and verified by the type system by using 
>> similar witness objects._
>> 
>> # LightweightSynchronizer
>> 
>> Working on adapting and incorporating the following section as a comment in 
>> the source code
>> 
>> ## Fast Locking
>> 
>>   CAS on locking bits in markWord. 
>>   0b00 (Fast Locked) <--> 0b01 (Unlocked)
>> 
>>   When locking and 0b00 (Fast Locked) is observed, it may be beneficial to 
>> avoid inflating by spinning a bit.
>> 
>>   If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Update arguments.cpp

I've reviewed and tested some earlier versions of this change in the context of 
Lilliput, and haven't encountered any showstopping problems. Very nice work!

When you say 'This patch has been evaluated to be performance neutral when 
UseObjectMonitorTable is turned off (the default).' - what does the performance 
look like with +UOMT? How does it compare to -UOMT?

I've only reviewed the platform-specific changes so far. Mostly looks good, I 
only have some relatively minor remarks.

Will review the shared code changes separately.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 318:

> 316: 
> 317:       // Loop after unrolling, advance iterator.
> 318:       increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));

Maybe I am misreading this but... in the unroll loop you avoid emitting the 
increment on the last iteration, but then you emit it explicitely here? 
Wouldn't it be cleaner to do it in the unroll loop always and elide the 
explicit increment after loop?

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 343:

> 341:     const Register t3_owner = t3;
> 342:     const ByteSize monitor_tag = in_ByteSize(UseObjectMonitorTable ? 0 : 
> checked_cast<int>(markWord::monitor_value));
> 343:     const Address owner_address{t1_monitor, 
> ObjectMonitor::owner_offset() - monitor_tag};

That may be just me, but I found that syntax weird. I first needed to look-up 
what the {}-initializer actually means. Hiccups like this reduce readability, 
IMO. I'd prefer the normal ()-init for the Address like we seem to do 
everywhere else.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:

> 672: 
> 673:       // Loop after unrolling, advance iterator.
> 674:       increment(t, in_bytes(OMCache::oop_to_oop_difference()));

Same issue as in aarch64 code.

-------------

Changes requested by rkennke (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2174300266
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675587650
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675597362
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1675605009

Reply via email to