When inflating a monitor, the `ObjectMonitor*` is written directly over the
`markWord` and any overwritten data is displaced into a displaced `markWord`.
This is problematic for concurrent GCs, which need extra care or looser
semantics to use this displaced data. In Lilliput this data also contains the
klass, forcing this to be something that the GC has to take into account
everywhere.
This patch introduces an alternative solution where locking only uses the lock
bits of the `markWord` and inflation does not override and displace the
`markWord`. This is done by keeping associations between objects and
`ObjectMonitor*` in an external hash table. Different caching techniques are
used to speed up lookups from compiled code.
A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is only
supported in combination with the LM_LIGHTWEIGHT locking mode (the default).
This patch has been evaluated to be performance neutral when
`UseObjectMonitorTable` is turned off (the default).
Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT`
and `UseObjectMonitorTable` work.
# Cleanups
Cleaned up displaced header usage for:
* BasicLock
  * Contains some Zero changes
  * Renames one exported JVMCI field
* ObjectMonitor
  * Updates comments and tests consistencies
# Refactoring
`ObjectMonitor::enter` has been refactored and an `ObjectMonitorContentionMark`
witness object has been introduced to the signatures, which signals that the
contentions reference counter is being held. More details are given below in
the section about deflation.
The initial purpose of this was to allow `UseObjectMonitorTable` to interact
more seamlessly with the `ObjectMonitor::enter` code.
_There is even more `ObjectMonitor` refactoring which can be done here to
create a more understandable and enforceable API. There are a handful of
invariants / assumptions which are not always explicitly asserted which could
be trivially abstracted and verified by the type system by using similar
witness objects._
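
The witness-object idea can be illustrated with a minimal sketch. All names here (`MonitorSketch`, `ContentionMarkSketch`, `try_enter_with_mark`) are hypothetical stand-ins and not the HotSpot types; the sketch also skips the check for a negative (deflated) counter and only shows how a signature can demand proof that the counter was incremented.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ObjectMonitor; only the counter and owner matter here.
struct MonitorSketch {
  std::atomic<int> contentions{0};
  std::atomic<const void*> owner{nullptr};
};

// Hypothetical witness: its existence proves the counter was incremented,
// and its destructor guarantees the matching decrement.
class ContentionMarkSketch {
 public:
  explicit ContentionMarkSketch(MonitorSketch& m) : monitor_(m) {
    monitor_.contentions.fetch_add(1);
  }
  ~ContentionMarkSketch() { monitor_.contentions.fetch_sub(1); }
  ContentionMarkSketch(const ContentionMarkSketch&) = delete;
  ContentionMarkSketch& operator=(const ContentionMarkSketch&) = delete;

 private:
  MonitorSketch& monitor_;
};

// Taking the witness by reference means this cannot be called without one.
bool try_enter_with_mark(MonitorSketch& m, const ContentionMarkSketch&,
                         const void* self) {
  const void* expected = nullptr;
  return m.owner.compare_exchange_strong(expected, self);
}
```

A caller constructs the mark in a scope around the enter attempt; the counter is balanced automatically when the scope ends.
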
# LightweightSynchronizer
Working on adapting and incorporating the following section as a comment in the
source code
## Fast Locking
CAS on locking bits in markWord.
0b00 (Fast Locked) <--> 0b01 (Unlocked)
When locking and 0b00 (Fast Locked) is observed, it may be beneficial to
avoid inflating by spinning a bit.
If 0b10 (Inflated) is observed, or there is too much contention or critical
sections are too long for spinning to be feasible, inflated locking is performed.
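
As a rough illustration of the lock-bit transition, here is a sketch using `std::atomic` in place of HotSpot's own primitives. The constant and function names are hypothetical; only the bit patterns are taken from the text.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit patterns from the description; the names are hypothetical.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;

// Try to move the lock bits from Unlocked to Fast Locked, leaving every
// other bit of the mark word untouched.
bool try_fast_lock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kUnlocked) {
    return false;  // already fast locked, or inflated
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Reverse transition: Fast Locked back to Unlocked.
bool try_fast_unlock(std::atomic<std::uintptr_t>& mark) {
  std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked) {
    return false;
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  return mark.compare_exchange_strong(old_mark, new_mark,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Because only the two lock bits change, whatever else lives in the upper bits of the word survives both transitions.
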
### Fast Lock Spinning (UseObjectMonitorTable)
When a thread fails fast locking when a monitor is not yet inflated, it will
spin on the markWord using an exponential backoff scheme. The thread will
attempt the fast lock CAS and then SpinWait() for some time, doubling with
every failed attempt, up to a maximum number of attempts. There is a diagnostic
VM option LightweightFastLockingSpins which can be used to tune this value. The
behavior of SpinWait() can be hardware dependent.
A future improvement may be to adapt this spinning limit to observed
behavior, which would automatically adapt to the different hardware behavior of
SpinWait().
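
The spinning scheme might look roughly like the sketch below. `spin_pause` is a hypothetical stand-in for `SpinWait()`, `max_attempts` plays the role of `LightweightFastLockingSpins`, and the cap on the pause count is an assumption added to keep the arithmetic bounded.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical stand-in for SpinWait(); here just a compiler barrier.
inline void spin_pause() { std::atomic_signal_fence(std::memory_order_seq_cst); }

// Attempt the fast-lock CAS, pausing twice as long after each failure.
// Returns false if the monitor is inflated or the attempts run out, in which
// case the caller would fall back to inflated locking.
bool fast_lock_spin(std::atomic<std::uintptr_t>& mark, int max_attempts) {
  int pauses = 1;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::uintptr_t old_mark = mark.load(std::memory_order_relaxed);
    std::uintptr_t bits = old_mark & kLockMask;
    if (bits == kInflated) {
      return false;
    }
    if (bits == kUnlocked) {
      std::uintptr_t locked = (old_mark & ~kLockMask) | kFastLocked;
      if (mark.compare_exchange_strong(old_mark, locked,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        return true;
      }
    }
    for (int i = 0; i < pauses; ++i) {
      spin_pause();
    }
    if (pauses < (1 << 16)) {  // assumed cap, not taken from the patch
      pauses *= 2;
    }
  }
  return false;
}
```
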
## Inflated Locking
Inflated locking means that an ObjectMonitor is associated with the object and
is used for locking instead of the locking bits in the markWord.
## Inflated Locking without table (!UseObjectMonitorTable)
An inflating thread will create an ObjectMonitor and CAS the ObjectMonitor*
into the markWord along with the 0b10 (Inflated) lock bits. If the transition
of the lock bits is from 0b00 (Fast Locked), the ObjectMonitor must be published
with an anonymous owner (setting _owner to ANONYMOUS_OWNER). If the transition
of the lock bits is from 0b01 (Unlocked), the ObjectMonitor is published with no
owner.
When encountering an ObjectMonitor with an anonymous owner the thread checks
its lock stack to see if it is the owner, in which case it removes the object
from its lock stack and sets itself as the owner of the ObjectMonitor along
with fixing the recursion level to correspond to the number of removed lock
stack entries.
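
The anonymous-owner hand-over can be sketched as follows. This is a single-threaded illustration with hypothetical types (`MonitorSketch`, `ThreadSketch`) and a hypothetical sentinel; setting the recursion count to the number of removed entries minus one is an assumption about how "recursion level" is counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real fields are accessed atomically.
struct MonitorSketch {
  const void* owner = nullptr;
  std::intptr_t recursions = 0;
};

struct ThreadSketch {
  std::vector<const void*> lock_stack;  // objects this thread has fast locked
};

// Hypothetical sentinel playing the role of ANONYMOUS_OWNER.
inline const void* anonymous_owner() {
  static const char tag = 0;
  return &tag;
}

// If `self` fast locked `obj`, remove its lock stack entries and take
// ownership of the anonymously owned monitor.
bool claim_anonymous(ThreadSketch& self, const void* obj, MonitorSketch& m) {
  if (m.owner != anonymous_owner()) {
    return false;
  }
  auto new_end = std::remove(self.lock_stack.begin(), self.lock_stack.end(), obj);
  auto removed = self.lock_stack.end() - new_end;
  if (removed == 0) {
    return false;  // some other thread holds the fast lock
  }
  self.lock_stack.erase(new_end, self.lock_stack.end());
  m.owner = &self;
  m.recursions = removed - 1;  // assumption: first entry is the base lock
  return true;
}
```
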
## Inflated Locking with table (UseObjectMonitorTable)
Because publishing the ObjectMonitor* and signaling that an object's monitor
is inflated is not atomic, more care must be taken (in the presence of
deflation) so that all threads agree on which ObjectMonitor* to use.
When encountering an ObjectMonitor with an anonymous owner the thread checks
its lock stack to see if it is the owner, in which case it removes the object
from its lock stack and sets itself as the owner of the ObjectMonitor along
with fixing the recursion level to correspond to the number of removed lock
stack entries.
All complications arise from deflation, or the process of disassociating an
ObjectMonitor from its Java Object. So first the mechanism used for deflation
is explained, followed by retrieval and creation of ObjectMonitors.
### Deflation
An ObjectMonitor can only be deflated if it has no owner, its queues are
empty and no thread is in a scope where it has incremented and checked the
contentions reference counter.
The interactions between deflation and wait are handled by having the owner
and wait queue entry overlap, which blocks out deflation; the wait queue entry is
protected by a waiters reference counter which is only modified by the waiters
while holding the monitor, incremented before exiting the monitor and
decremented after reentering the monitor.
For enter and exit, where the deflator may observe empty queues and no owner, a
two-step mechanism is used to synchronize deflation with concurrently locking
threads; deflation is synchronized using the contentions reference counter.
In the text below we refer to "holding the contentions reference counter".
This means that a thread has incremented the contentions reference counter and
verified that it is not negative.
```c++
if (Atomic::fetch_and_add(&monitor->_contentions, 1) >= 0) {
  // holding the contentions reference counter
}
// The increment is always balanced, whether or not the counter was held.
Atomic::decrement(&monitor->_contentions);
```
#### Deflation protocol
The first step for the deflator is to try and CAS the owner from no owner to
a special marker (DEFLATER_MARKER). If this is successful it blocks any
entering thread from successfully installing themselves as the owner and causes
compiled code to take a slow path and call into the runtime.
The second step for the deflator is to check the waiters reference counter and,
if it is 0, try to CAS the contentions reference counter from 0 to a large
negative value (INT_MIN). If this succeeds the monitor is deflated.
The deflator does not have to check the entry queues because every thread on
the entry queues must either hold the contentions reference counter or have
incremented the waiters reference counter, in the case they were moved from the
wait queue to the entry queues by a notify. The deflator checks the waiters
reference counter, with the memory ordering of Waiter: { increment waiters
reference counter; release owner }, Deflator: { acquire owner; check waiters
reference counter }. All threads on the entry queues or wait queue invariantly
hold the contentions reference counter or the waiters reference counter.
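
The two steps can be sketched with `std::atomic` as below. The struct and function names are hypothetical, the marker is a made-up sentinel, and the failure path simply restores the owner; the case where another thread cancels the deflation is not modelled here.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical stand-in for the ObjectMonitor fields involved in deflation.
struct MonitorSketch {
  std::atomic<const void*> owner{nullptr};
  std::atomic<int> contentions{0};
  std::atomic<int> waiters{0};
};

// Hypothetical sentinel playing the role of DEFLATER_MARKER.
inline const void* deflater_marker() {
  static const char tag = 0;
  return &tag;
}

bool try_deflate(MonitorSketch& m) {
  // Step 1: claim the owner field so no entering thread can install itself.
  const void* expected = nullptr;
  if (!m.owner.compare_exchange_strong(expected, deflater_marker())) {
    return false;  // the monitor is owned
  }
  // Step 2: with no waiters, try to move contentions from 0 to INT_MIN.
  if (m.waiters.load() == 0) {
    int zero = 0;
    if (m.contentions.compare_exchange_strong(zero, INT_MIN)) {
      return true;  // deflated
    }
  }
  // Deflation failed: put the owner back to "no owner".
  expected = deflater_marker();
  m.owner.compare_exchange_strong(expected, nullptr);
  return false;
}
```
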
#### Deflation cleanup
If deflation succeeds, locking bits are then transitioned back to 0b01
(Unlocked). With UseObjectMonitorTable it is required that this is done by the
deflator, or it could lead to ABA problems in the locking bits. Without the
table the whole ObjectMonitor* is part of the markWord transition, with its
pointer being phased out of the system with a handshake, making every value
distinguishable and avoiding ABA issues.
For UseObjectMonitorTable the deflated monitor is also removed from the
table. This is done after transitioning the markWord to allow concurrently
entering threads to fast lock on the object while the monitor is being removed
from the hash table.
If deflation fails after the marker (DEFLATER_MARKER) has been CASed into the
owner field, the owner must be restored. From the deflation thread's point of
view it is as simple as CASing from the marker to no owner. However, to not have
all threads depend on the deflation thread making progress here, we allow any
thread to CAS from the marker if that thread has both incremented and checked
the contentions counter. This thread has now effectively canceled the
deflation, but it is important that the deflator observes this fact. We do this
by deliberately not decrementing the contentions counter. The effect is that the
contentions CAS will fail, which will force the deflator to try and restore the
owner, but this will also fail because it got canceled. So the deflator
decrements the contentions counter instead, on behalf of the canceling thread,
to balance the reference counting. (Currently this is implemented by doing a
+1 +1 -1 reference count on the locking thread, but a simple +1 alone would
suffice.)
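
The cancellation hand-off can be sketched like this. Names are hypothetical, and the sketch uses the "simple +1" variant mentioned in the text rather than the +1 +1 -1 sequence of the current implementation.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

struct MonitorSketch {
  std::atomic<const void*> owner{nullptr};
  std::atomic<int> contentions{0};
};

// Value installed in contentions by a successful deflation.
constexpr int kDeflated = INT_MIN;

// Hypothetical sentinel playing the role of DEFLATER_MARKER.
inline const void* deflater_marker() {
  static const char tag = 0;
  return &tag;
}

// Entering thread: steal ownership from the marker while holding the
// contentions counter. On success the increment is intentionally left in
// place so that the deflator's contentions CAS fails.
bool enter_cancelling_deflation(MonitorSketch& m, const void* self) {
  if (m.contentions.fetch_add(1) < 0) {
    m.contentions.fetch_sub(1);
    return false;  // already deflated
  }
  const void* expected = deflater_marker();
  if (m.owner.compare_exchange_strong(expected, self)) {
    return true;  // deflation cancelled; no decrement here
  }
  m.contentions.fetch_sub(1);
  return false;  // caller would continue on the normal enter path
}

// Deflator, after its contentions CAS failed: try to restore the owner.
// If that fails too, deflation was cancelled and the deflator drops the
// reference left behind by the cancelling thread.
void deflator_back_out(MonitorSketch& m) {
  const void* expected = deflater_marker();
  if (!m.owner.compare_exchange_strong(expected, nullptr)) {
    m.contentions.fetch_sub(1);
  }
}
```
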
### Retrieve ObjectMonitor
#### HashTable
Maintains a mapping between Java Objects and ObjectMonitors. Lookups are done
via the object's identity_hash. If the hash table contains an ObjectMonitor for
a specific object then that ObjectMonitor is used for locking unless it is
being deflated.
Only deflation removes entries for live (not dead) objects from the HashTable.
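
A minimal, single-threaded sketch of such a mapping is shown below. It uses `std::unordered_multimap` purely for illustration, which is an assumption and not the concurrent table used by the patch; the class and method names are hypothetical, and the "unless it is being deflated" check is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for ObjectMonitor; it remembers its object.
struct MonitorSketch {
  const void* object = nullptr;
};

class MonitorTableSketch {
 public:
  // Look up by identity hash, then confirm the object to handle collisions.
  MonitorSketch* lookup(const void* obj, std::uint32_t hash) const {
    auto range = buckets_.equal_range(hash);
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second->object == obj) {
        return it->second;
      }
    }
    return nullptr;
  }

  // Only one monitor per object can be inserted; a losing inserter gets
  // the monitor that is already in the table.
  MonitorSketch* insert_if_absent(const void* obj, std::uint32_t hash,
                                  MonitorSketch* m) {
    if (MonitorSketch* existing = lookup(obj, hash)) {
      return existing;
    }
    buckets_.emplace(hash, m);
    return m;
  }

  // Removal, which the text reserves for deflation (and dead objects).
  bool remove(const void* obj, std::uint32_t hash) {
    auto range = buckets_.equal_range(hash);
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second->object == obj) {
        buckets_.erase(it);
        return true;
      }
    }
    return false;
  }

 private:
  std::unordered_multimap<std::uint32_t, MonitorSketch*> buckets_;
};
```
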
#### ThreadLocal Cache (UseObjectMonitorTable)
The ObjectMonitors most recently locked by a thread are cached in that
thread's local storage. These are used to elide hash table lookups. These
caches use raw oops to make cache lookups trivial. However, this requires
special handling of the cache at safepoints. The caches are cleared when a
safepoint is triggered (instead of letting the GC visit them), to avoid
keeping cache entries as GC roots.
These cache entries may become deflated, but locking on such a monitor still
participates in the normal deflation protocol. Because these entries are
cleared during a safepoint, the handshake performed by monitor deflation to
phase out ObjectMonitor* from the system will also phase these out.
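
A small most-recently-used cache of this kind could be sketched as follows. The class name, the capacity of 8, and the move-to-front policy are assumptions for illustration; raw `const void*` keys stand in for raw oops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

struct MonitorSketch {};

class ThreadLocalMonitorCacheSketch {
 public:
  // Put (obj, m) at the front, shifting older entries back. A previous
  // entry for the same object is dropped; the oldest entry falls off.
  void set(const void* obj, MonitorSketch* m) {
    Entry carry{obj, m};
    for (std::size_t i = 0; i < kCapacity; ++i) {
      std::swap(carry, entries_[i]);
      if (carry.object == nullptr || carry.object == obj) {
        break;
      }
    }
  }

  // Plain pointer comparison; no hashing needed.
  MonitorSketch* get(const void* obj) const {
    for (const Entry& e : entries_) {
      if (e.object == obj) {
        return e.monitor;
      }
    }
    return nullptr;
  }

  // Called when a safepoint is triggered, so the raw pointers never need
  // to be visited by the GC.
  void clear() { entries_.fill(Entry{nullptr, nullptr}); }

 private:
  struct Entry {
    const void* object;
    MonitorSketch* monitor;
  };
  static constexpr std::size_t kCapacity = 8;  // hypothetical size
  std::array<Entry, kCapacity> entries_{};
};
```
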
#### StackLocal Cache
Each monitorenter has a corresponding BasicLock entry on the stack. Each
successful inflated monitorenter saves the ObjectMonitor* inside this BasicLock
entry and retrieves it when performing the corresponding monitorexit.
This means it is important that the BasicLock entry is always initialized to
a known state (nullptr is used).
The RAII class CacheSetter is used to ensure that the BasicLock gets
initialized before leaving the runtime code, and that both caches get updated
correctly (only once, with the same locked ObjectMonitor).
The cache entries are set when a monitor is entered and never used again
after that monitor has been exited. So there are no interactions with
deflation here. Similarly, these caches do not track the associated oop, but
rely on the fact that the same BasicLock data created for a monitorenter is
used when executing the corresponding monitorexit.
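
The RAII idea can be sketched as below. `BasicLockSlotSketch`, `CacheSetterSketch` and `slow_enter_sketch` are hypothetical names, and the sketch only covers the stack slot, not the thread-local cache that the real CacheSetter also updates.

```cpp
#include <cassert>

struct MonitorSketch {};

// Hypothetical stand-in for the per-monitorenter BasicLock stack entry.
struct BasicLockSlotSketch {
  MonitorSketch* cached;
};

// Whatever path the runtime code takes, the destructor writes a known
// value into the slot: the locked monitor, or nullptr if none was set.
class CacheSetterSketch {
 public:
  explicit CacheSetterSketch(BasicLockSlotSketch& slot) : slot_(slot) {}
  ~CacheSetterSketch() { slot_.cached = monitor_; }
  CacheSetterSketch(const CacheSetterSketch&) = delete;
  CacheSetterSketch& operator=(const CacheSetterSketch&) = delete;

  void set_monitor(MonitorSketch* m) {
    assert(monitor_ == nullptr);  // set at most once
    monitor_ = m;
  }

 private:
  BasicLockSlotSketch& slot_;
  MonitorSketch* monitor_ = nullptr;
};

// Hypothetical slow path: `inflated` is the monitor that ended up locked,
// or nullptr when the object was fast locked instead.
void slow_enter_sketch(BasicLockSlotSketch& slot, MonitorSketch* inflated) {
  CacheSetterSketch setter(slot);
  if (inflated == nullptr) {
    return;  // slot is still initialized (to nullptr) on this early exit
  }
  setter.set_monitor(inflated);
}
```
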
### Creating ObjectMonitor
If retrieval of the ObjectMonitor fails because there is no ObjectMonitor,
either because this is the first time inflating or because the ObjectMonitor has
been deflated, a new ObjectMonitor must be created and associated with the object.
The inflating thread will then attempt to insert a newly created
ObjectMonitor in the hash table. The important invariant is that any
ObjectMonitor inserted must have an anonymous owner (setting _owner to
ANONYMOUS_OWNER).
This solves the issue of not being able to atomically insert the
ObjectMonitor in the hash table and transition the markWord to 0b10
(Inflated). We instead have all inflating threads insert an identical
anonymously owned ObjectMonitor in the table and then decide ownership based on
how the markWord is transitioned to 0b10 (Inflated). Note: Only one
ObjectMonitor can be inserted.
This also has the effect of blocking deflation on a newly inserted
ObjectMonitor, until the contentions reference counter can be incremented. The
contentions reference counter is held while transitioning the markWord to block
out deflation.
* If a thread observes 0b10 (Inflated)
  * If the current thread is the thread that fast locked, take ownership.
    Update ObjectMonitor _recursions based on fast locked recursions.
    Call ObjectMonitor::enter(current);
  * Otherwise, some other thread is the owner, and will claim ownership.
    Call ObjectMonitor::enter(current);
* If a thread succeeds with the CAS to 0b10 (Inflated)
  * From 0b00 (Fast Locked)
    * If the current thread is the thread that fast locked, take ownership.
      Update ObjectMonitor _recursions based on fast locked recursions.
      Call ObjectMonitor::enter(current);
    * Otherwise, some other thread is the owner, and will claim ownership.
      Call ObjectMonitor::enter(current);
  * From 0b01 (Unlocked)
    * Claim ownership, no ObjectMonitor::enter is required.
* If a thread fails the CAS, reload the markWord and retry
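
The case analysis above can be sketched as a small decision loop. The `Action` enum and `inflate_and_decide` are hypothetical names, and `self_fast_locked` stands in for the lock stack check; the sketch only decides what to do next and does not touch a monitor.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kFastLocked = 0b00;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kInflated   = 0b10;

// Hypothetical labels for the outcomes listed in the text.
enum class Action {
  TakeOwnershipThenEnter,  // we fast locked: fix _recursions, then enter
  EnterOnly,               // another thread owns the lock: just enter
  ClaimOwnership           // we inflated from Unlocked: no enter required
};

Action inflate_and_decide(std::atomic<std::uintptr_t>& mark,
                          bool self_fast_locked) {
  for (;;) {
    std::uintptr_t old_mark = mark.load();
    std::uintptr_t bits = old_mark & kLockMask;
    if (bits == kInflated) {
      return self_fast_locked ? Action::TakeOwnershipThenEnter
                              : Action::EnterOnly;
    }
    std::uintptr_t new_mark = (old_mark & ~kLockMask) | kInflated;
    if (mark.compare_exchange_strong(old_mark, new_mark)) {
      if (bits == kUnlocked) {
        return Action::ClaimOwnership;
      }
      return self_fast_locked ? Action::TakeOwnershipThenEnter
                              : Action::EnterOnly;
    }
    // CAS failed: loop to reload the mark word and retry.
  }
}
```
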
### Un-contended Inflated Locking
CAS on _owner field in ObjectMonitor.
JavaThread* (Locked By Thread) <--> nullptr (Unlocked)
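
A minimal sketch of the un-contended transition, using `std::atomic` and hypothetical names; recursion handling and the release-store-then-check-successors logic of a real exit are left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the _owner field of an ObjectMonitor.
struct MonitorSketch {
  std::atomic<const void*> owner{nullptr};
};

// nullptr (Unlocked) -> self (Locked By Thread)
bool try_enter(MonitorSketch& m, const void* self) {
  const void* expected = nullptr;
  return m.owner.compare_exchange_strong(expected, self,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
}

// self (Locked By Thread) -> nullptr (Unlocked)
void exit_monitor(MonitorSketch& m, const void* self) {
  assert(m.owner.load(std::memory_order_relaxed) == self);
  (void)self;  // only used by the assert above
  m.owner.store(nullptr, std::memory_order_release);
}
```
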
### Contended Inflated Locking
Blocks out deflation.
Spin CAS on _owner field in ObjectMonitor.
JavaThread* (Locked By Thread) <--> nullptr (Unlocked)
Details in ObjectMonitor.hpp
### HashTable Resizing and Cleanup
Resizing is currently handled with logic similar to what the string and
symbol tables use, and is delegated to the ServiceThread.
The goal is to eventually move this to the deflation thread, to allow for better
interactions with the deflation cycles, making it possible to also shrink the
table. But this will be done incrementally as a separate enhancement. The
ServiceThread is currently used to deal with the fact that we currently allow
the deflation thread to be turned off via JVM options.
Cleanup is mostly handled by the deflator, which actively removes deflated
monitors, which includes monitors for dead objects. However we allow any thread
to remove dead objects' ObjectMonitor* associations. But actual memory
reclamation of the ObjectMonitor is always handled by the deflator.
The table is currently initialized before `init_globals`, as such the max
size of the table which is based on `MaxHeapSize` may be incorrect because it
is not yet finalized.
-------------
Commit messages:
- 8315884: New Object to ObjectMonitor mapping
Changes: https://git.openjdk.org/jdk/pull/20067/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20067&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8315884
Stats: 3613 lines in 70 files changed: 2700 ins; 313 del; 600 mod
Patch: https://git.openjdk.org/jdk/pull/20067.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20067/head:pull/20067
PR: https://git.openjdk.org/jdk/pull/20067