On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <[email protected]>
wrote:
>> When inflating a monitor the `ObjectMonitor*` is written directly over the
>> `markWord` and any overwritten data is displaced into a displaced
>> `markWord`. This is problematic for concurrent GCs which needs extra care or
>> looser semantics to use this displaced data. In Lilliput this data also
>> contains the klass forcing this to be something that the GC has to take into
>> account everywhere.
>>
>> This patch introduces an alternative solution where locking only uses the
>> lock bits of the `markWord` and inflation does not override and displace the
>> `markWord`. This is done by keeping associations between objects and
>> `ObjectMonitor*` in an external hash table. Different caching techniques are
>> used to speedup lookups from compiled code.
>>
>> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is
>> only supported in combination with the LM_LIGHTWEIGHT locking mode (the
>> default).
>>
>> This patch has been evaluated to be performance neutral when
>> `UseObjectMonitorTable` is turned off (the default).
>>
>> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT`
>> and `UseObjectMonitorTable` works.
>>
>> # Cleanups
>>
>> Cleaned up displaced header usage for:
>> * BasicLock
>> * Contains some Zero changes
>> * Renames one exported JVMCI field
>> * ObjectMonitor
>> * Updates comments and tests consistencies
>>
>> # Refactoring
>>
>> `ObjectMonitor::enter` has been refactored an a
>> `ObjectMonitorContentionMark` witness object has been introduced to the
>> signatures. Which signals that the contentions reference counter is being
>> held. More details are given below in the section about deflation.
>>
>> The initial purpose of this was to allow `UseObjectMonitorTable` to interact
>> more seamlessly with the `ObjectMonitor::enter` code.
>>
>> _There is even more `ObjectMonitor` refactoring which can be done here to
>> create a more understandable and enforceable API. There are a handful of
>> invariants / assumptions which are not always explicitly asserted which
>> could be trivially abstracted and verified by the type system by using
>> similar witness objects._
>>
>> # LightweightSynchronizer
>>
>> Working on adapting and incorporating the following section as a comment in
>> the source code
>>
>> ## Fast Locking
>>
>> CAS on locking bits in markWord.
>> 0b00 (Fast Locked) <--> 0b01 (Unlocked)
>>
>> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to
>> avoid inflating by spinning a bit.
>>
>> If 0b10 (Inflated) is observed or there is to...
>
> Axel Boldt-Christmas has updated the pull request incrementally with 10
> additional commits since the last revision:
>
> - Remove try_read
> - Add explicit to single parameter constructors
> - Remove superfluous access specifier
> - Remove unused include
> - Update assert message OMCache::set_monitor
> - Fix indentation
> - Remove outdated comment LightweightSynchronizer::exit
> - Remove logStream include
> - Remove strange comment
> - Fix javaThread include
Another review pass by me. It looks to me like the cache lookup can be
improved, see comments below.
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 323:
> 321: ldr(t1, Address(t3_t));
> 322: cmp(obj, t1);
> 323: br(Assembler::EQ, monitor_found);
I think the loop could be optimized a bit, if we start with the (cache_address)
- 1 in t3, then increment t3 at the start of the loop, and let the success-case
fall-through and only branch back to loop-start or to failure-path. Something
like:
bind(loop);
increment(t3_t, in_bytes(OMCache::oop_to_oop_difference()));
ldr(t1, Address(t3_t));
cbnz(t1, loop);
cmp(obj, t1);
br(Assembler::NE, loop);
// Success
Advantage would be that we have no forward-branch in the fast/expected case.
CPU static branch prediction tends to not like that. I'm not sure if if makes a
difference, though. Also, if you do that, then the unrolled loop also needs
corresponding adjustment.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:
> 672:
> 673: // Search for obj in cache.
> 674: bind(loop);
Same loop transformation would be possible here.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776:
> 774: movl(top, Address(thread, JavaThread::lock_stack_top_offset()));
> 775:
> 776: if (!UseObjectMonitorTable) {
Why is the mark loaded here in the !UOMT case, but later in the +UOMT case?
-------------
PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2179942149
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679210139
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679313050
PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679315158