Re: Spin Loop Hint support: Draft JEP proposal

Gil Tene Thu, 08 Oct 2015 00:40:24 -0700

On Oct 7, 2015, at 3:01 PM, John Rose <john.r.r...@oracle.com> wrote:

On Oct 5, 2015, at 2:41 AM, Andrew Haley <a...@redhat.com> wrote:

Hi Gil,

On 04/10/15 17:22, Gil Tene wrote:

Summary

Add an API that would allow Java code to hint that a spin loop is
being executed.

I don't think this will work for ARM, which has a rather different
spinlock mechanism.

Instead of PAUSE, we wait on a lock word with WFE. WFE puts a core
into a lightweight sleep state waiting on a particular address (the
lock word) and a write to the lock word wakes it up. This is very
useful and somewhat analogous to x86's MONITOR/MWAIT.

I can't immediately see how to generalize your proposal to ARM, which
is a shame.

Suggestion: Allow the hint intrinsic to take an argument, from which
a JIT can infer a memory dependency (if one is in fact present).

Even if we are just targeting a PAUSE instruction, I think it is helpful
to the JIT to add more connection points (beyond control flow) between
the intrinsic and the surrounding loop.

class jdk.internal.vm.SpinLoop {
  /** Provides a hint to the processor that a spin loop is in progress.
   * The boolean is returned unchanged. The processor may assume
   * that the loop is likely to continue as long as the boolean is false.
   * The processor may pause or wait after a false result, if there is
   * some reason to believe that the boolean argument, if re-evaluated,
   * will be false again. Any pausing behavior is system-specific.
   * The processor may not pause indefinitely.
   * <p>Example:
   * <blockquote><pre>{@code
   MyMailbox mb = …;
   while (true) {
     if (!pollSpinExit(mb.hasMail())) continue;
     Object m = mb.getMail();
     if (m != null) return m;
   }
   * }</pre></blockquote>
   */
  @jdk.internal.HotSpotIntrinsicCandidate
  public static boolean pollSpinExit(boolean spinExit) { return spinExit; }
}

I'm going to guess that the extra hinting provided by the parameter would
make it easier for a JIT to generate MWAIT and WFEs.

On the one hand:

I like the idea of (an optional?) boolean parameter as a means of hinting at
the thing that may terminate the spin. It's probably much more general than
identifying a specific field or address. And it can be used to cover cases that
poll multiple addresses (an OR in the boolean) or look at a termination time. If
the JVM can track down the boolean's evaluation to dependencies on specific
memory state changes, it could pass it on to hardware, if such hardware exists.
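
To make the "multiple addresses or a termination time" case concrete, here is
a minimal sketch. The pollSpinExit below is only a plain-Java stand-in for the
proposed intrinsic (it returns its argument unchanged), and SpinExitDemo and
awaitEither are hypothetical names used purely for illustration:

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinExitDemo {
    // Stand-in for the proposed intrinsic: returns its argument unchanged.
    // A JIT that recognized it could emit a PAUSE-like hint here (assumption).
    static boolean pollSpinExit(boolean spinExit) {
        return spinExit;
    }

    // Hypothetical helper: spins until either flag is set or the timeout
    // elapses. Returns true if a flag was observed, false on timeout.
    static boolean awaitEither(AtomicBoolean a, AtomicBoolean b, long timeoutNanos) {
        final long start = System.nanoTime();
        while (true) {
            // The exit condition depends on two memory locations and on time.
            boolean flagged = a.get() || b.get();
            boolean timedOut = System.nanoTime() - start >= timeoutNanos;
            if (pollSpinExit(flagged || timedOut)) {
                return flagged;
            }
        }
    }
}
```

The boolean passed to pollSpinExit here depends on two flags plus the clock,
which is exactly the kind of condition a single-address hardware watch could
not express directly.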

On the other hand:

Unfortunately, I don't think that hardware support that can receive the address
information exists right now, and if/when it does, I'm not sure the semantics
of passing the boolean through are enough to cover the actual way to use such
hardware when it becomes available. It is probably premature to design a
generic way to provide addresses and/or state to this "spin until something
interesting changes" stuff without looking at working examples. A single
watched address API is much more likely to fit current implementations without
being fragile.

ARM v8's WFE is probably the most real user-mode-accessible thing for this right
now (MWAIT isn't real yet, as it's not accessible from user mode). We can look
at an example of how a spinloop needs to coordinate the use of WFE, SEVL, and
the evaluation of memory location with load exclusive operations here:
http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h . The
tricky part is that the SEVL needs to immediately precede the loop (and all
accesses that need to be watched by the WFE), but can't be part of the loop (if
it were in the loop, the WFE would always trigger immediately). But the code in
the spinning loop can only track a single address (the exclusive tag in load
exclusive applies only to the most recent address used), so it would be wrong
to allow generic code in the spin (it would have to be code that watches
exactly one address).
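
The SEVL placement point can be illustrated with a toy model. This is not ARM
code: WfeModel is a hypothetical Java class that models only the per-core event
register (SEVL sets it; WFE clears it and falls through if it is set, and
otherwise waits). It ignores the exclusive monitor and the wake-up event
entirely, and just counts how often a WFE would actually have waited:

```java
final class WfeModel {
    private boolean eventRegister; // toy model of the per-core event register
    int sleeps;                    // times wfe() would really have waited

    // SEVL: set the local event register.
    void sevl() {
        eventRegister = true;
    }

    // WFE: if the event register is set, clear it and continue immediately;
    // otherwise a real core would wait here for an event.
    void wfe() {
        if (eventRegister) {
            eventRegister = false;
            return;
        }
        sleeps++;
    }

    // SEVL once, immediately before the loop: only the first WFE falls through.
    static int sevlBeforeLoop(int iterations) {
        WfeModel core = new WfeModel();
        core.sevl();
        for (int i = 0; i < iterations; i++) {
            core.wfe();
            // load-exclusive of the single watched word and its test go here
        }
        return core.sleeps;
    }

    // SEVL inside the loop: every WFE falls through, so the loop never waits.
    static int sevlInsideLoop(int iterations) {
        WfeModel core = new WfeModel();
        for (int i = 0; i < iterations; i++) {
            core.sevl();
            core.wfe();
        }
        return core.sleeps;
    }
}
```

In this model, sevlBeforeLoop(5) waits on four of the five iterations, while
sevlInsideLoop(5) never waits at all and degenerates into a plain busy spin.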

My suspicion is that the "right" way to capture the various ways a spin loop
would need to interact with WFE logic will be different than tracking things
that can generically affect the value of a boolean. E.g. the evaluation of the
boolean could be based on multiple addresses, and since it's not clear (in the
API) that this is a problem, the benefits derived would be fragile. In
addition, there can validly be state mutating logic in the loop (e.g.
counting), and implicitly re-executing that logic repeatedly inside a
pollSpinExit(booleanThatOnlyWatchesOneAddress) call would seem "wrong" (the
logic would presumably precede the call, and it would be surprising to see it
execute more than once within the call).
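
As an illustration of the kind of loop I mean, here is a sketch with counting
logic ahead of the hint call. CountingSpin and spinWithBudget are hypothetical
names, and pollSpinExit is again just a plain-Java stand-in that returns its
argument:

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class CountingSpin {
    // Stand-in for the proposed intrinsic: returns its argument unchanged.
    static boolean pollSpinExit(boolean spinExit) {
        return spinExit;
    }

    // Hypothetical helper: spins until done is set or maxSpins polls have
    // been made. Returns the number of polls performed.
    static int spinWithBudget(AtomicBoolean done, int maxSpins) {
        int spins = 0;
        while (true) {
            spins++; // state-mutating logic that precedes the call
            boolean exit = done.get() || spins >= maxSpins;
            if (pollSpinExit(exit)) {
                return spins;
            }
        }
    }
}
```

The counter is incremented exactly once per poll. An implementation that
silently re-evaluated the exit condition inside pollSpinExit would either have
to repeat that increment or watch only done, and neither is obvious from the
API.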

I suspect that the right way to deal with WFE would be to provide an API that
is closer to what it needs (and which is different from spin-hinting in the
loop). E.g. some way to designate the beginning of the loop (so SEVL could be
inserted right before it), some way to indicate the address that needs to use
exclusive load in the loop, and some way to indicate that the loop is done. A
possible way to do this is by wrapping the spinloop code and providing the
address.

E.g.:

/**
 * Execute the spinCode repeatedly until it returns false. The processor
 * may assume that if the return value is false, it is likely to continue to
 * return false as long as the contents of the fieldToWatch field of the
 * objectToWatchFieldIn object do not change. The processor may therefore
 * pause or wait after a false result. The processor must not pause
 * indefinitely, but other pausing behavior is system-specific.
 */
void spinExecuteWhileTrue(BooleanSupplier spinCode, Field fieldToWatch,
                          Object objectToWatchFieldIn);

This would probably be a good fit for the specific WFE/SEVL semantics: the loop
is implicit to the call, so the SEVL can be placed ahead of it; the loop can
perform a load exclusive on the designated field of the designated object, and
the spinCode can then do whatever it wants, with the understanding that no
address other than the fieldToWatch is being watched to provide a timely exit
from the loop. [A similar variant can be done for watching array elements.]

The same single-watched-address API will probably fit MONITOR/MWAIT if it
becomes available, and possibly ll/sc variants in other CPUs too. But
wider-watching variants (NCAS, TSX) will not be covered by this API. And common
uses of the x86 PAUSE instruction wouldn't either (since they are not limited
to any fixed number of addresses). The good news is that even though the
single-address-watching API only covers limited use cases, it can be easily
implemented on architectures that only support spin hinting. So if someone's
use case does fit into the API and is coded to that form, they are likely to
gain benefits on both types of platforms.
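
A rough sketch of that fallback, assuming a platform with only a PAUSE-style
hint. It takes the method name at face value and loops while spinCode returns
true. SingleAddressSpin is a hypothetical holder class, and spinLoopHint() is a
hypothetical placeholder (a no-op here) for whatever the spin-hinting API ends
up being called:

```java
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class SingleAddressSpin {
    // Hypothetical placeholder for the spin-loop hint; a no-op in this sketch.
    static void spinLoopHint() {
    }

    // Fallback for platforms that only support spin hinting: the watched
    // field and object are ignored, and each iteration just issues the hint.
    static void spinExecuteWhileTrue(BooleanSupplier spinCode,
                                     Field fieldToWatch,
                                     Object objectToWatchFieldIn) {
        while (spinCode.getAsBoolean()) {
            spinLoopHint();
        }
    }

    // Small single-threaded demonstration: spinCode returns true twice and
    // then false, so the counter ends at zero.
    static int demo() {
        AtomicInteger remaining = new AtomicInteger(3);
        spinExecuteWhileTrue(() -> remaining.decrementAndGet() > 0, null, null);
        return remaining.get();
    }
}
```

On a WFE-capable platform the same call could instead place the SEVL before
the implicit loop and use a load exclusive on the designated field, without
the caller's code changing.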

This leads me to believe that we are looking at two different APIs:
- Spin loop hinting (matching the mature use cases of the PAUSE instruction in
x86 and HW thread priority reduction in Power).
- Single-watched-address spinning, matching ARM v8's WFE/SEVL use case, and
potential other single address watchers (MONITOR/MWAIT, and potential ll/sc
based hints in other future CPUs).

I think the first use case is very mature and well understood, and certainly
ready for a long term supported Java SE API. The second use case only applies
to recently introduced hardware (ARM v8 right now), but it is fairly simple and
*may* be useful more widely in the future.

Since it can be beneficially intrinsified on platforms that support the wider
spin-hinting API, we could add the single-address-watching API to the JEP (as the
two do seem related). I just worry that the questions about the usefulness and
longevity of the single-address-watching use model may shadow the simplicity
and apparent slam-dunkness of the spin loop hinting solution.

Also, the boolean argument is easy to profile in the interpreter, if that's what
a VM wants to do.

For a similar mechanism (which again uses a boolean to provide IR
connection to a data dependency), see:

http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/fe40b31c0e52/src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java#l697

In fact, something like the profileBoolean intrinsic might be useful to allow
spin loops to gather their own statistics. Getting the array allocation right
might require an invokedynamic site (or some equivalent, like a static
method handle), in order to control the allocation of profile state per call
site.

HTH
— John

P.S. I agree with others that this needs cooking, in a jdk.internal place,
before it is ready for SE.
