On Oct 7, 2015, at 3:01 PM, John Rose <john.r.r...@oracle.com> wrote:

On Oct 5, 2015, at 2:41 AM, Andrew Haley <a...@redhat.com> wrote:

Hi Gil,

On 04/10/15 17:22, Gil Tene wrote:

Summary

Add an API that would allow Java code to hint that a spin loop is
being executed.
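For orientation, here is a minimal sketch of the kind of loop such a hint targets. It assumes Java 9 or later, where this hint eventually shipped as `Thread.onSpinWait()`. The `SpinFlag` class and its methods are hypothetical names for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class; only Thread.onSpinWait() is a real JDK API (Java 9+).
final class SpinFlag {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    void set() { ready.set(true); }

    // Busy-waits until set() has been called. On every iteration it hints to
    // the runtime (which may emit PAUSE on x86) that this is a spin loop.
    // Returns the number of iterations spent waiting.
    long awaitSet() {
        long spins = 0;
        while (!ready.get()) {
            Thread.onSpinWait();
            spins++;
        }
        return spins;
    }
}
```

The hint changes no program semantics. It only tells the platform that the thread is busy-waiting, so the loop behaves identically on hardware without any such instruction.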


I don't think this will work for ARM, which has a rather different
spinlock mechanism.

Instead of PAUSE, we wait on a lock word with WFE.  WFE puts a core
into a lightweight sleep state waiting on a particular address (the
lock word), and a write to the lock word wakes it up.  This is very
useful and somewhat analogous to x86's MONITOR/MWAIT.

I can't immediately see how to generalize your proposal to ARM, which
is a shame.

Suggestion:  Allow the hint intrinsic to take an argument, from which
a JIT can infer a memory dependency (if one is in fact present).

Even if we are just targeting a PAUSE instruction, I think it is helpful
to the JIT to add more connection points (beyond control flow) between
the intrinsic and the surrounding loop.

package jdk.internal.vm;

public final class SpinLoop {
    private SpinLoop() { }

    /** Provides a hint to the processor that a spin loop is in progress.
     *  The boolean is returned unchanged.  The processor may assume
     *  that the loop is likely to continue as long as the boolean is false.
     *  The processor may pause or wait after a false result, if there is
     *  some reason to believe that the boolean argument, if re-evaluated,
     *  will be false again.  Any pausing behavior is system-specific.
     *  The processor must not pause indefinitely.
     *  <p>Example:
     * <blockquote><pre>{@code
     * MyMailbox mb = …;
     * while (true) {
     *   if (!pollSpinExit(mb.hasMail()))  continue;
     *   Object m = mb.getMail();
     *   if (m != null)  return m;
     * }
     * }</pre></blockquote>
     */
    @jdk.internal.HotSpotIntrinsicCandidate
    public static boolean pollSpinExit(boolean spinExit) { return spinExit; }
}
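To try the mailbox pattern from the javadoc example outside the JDK, a local stand-in can mimic the proposed contract: return the argument unchanged, and possibly pause after a false result. This sketch assumes Java 9+ for `Thread.onSpinWait()`. `SpinLoopStandIn` and this single-slot `MyMailbox` are hypothetical, since the proposed class is not something user code can call.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustration only: mimics the proposed SpinLoop.pollSpinExit contract
// using the Java 9+ Thread.onSpinWait() hint.
final class SpinLoopStandIn {
    static boolean pollSpinExit(boolean spinExit) {
        if (!spinExit) {
            Thread.onSpinWait(); // optional, system-specific pause
        }
        return spinExit;
    }
}

// Hypothetical single-slot mailbox standing in for MyMailbox.
final class MyMailbox {
    private final AtomicReference<Object> slot = new AtomicReference<>();

    void put(Object m) { slot.set(m); }
    boolean hasMail() { return slot.get() != null; }
    // Takes the message; returns null if another consumer got it first.
    Object getMail() { return slot.getAndSet(null); }

    // The loop from the javadoc example, with balanced parentheses.
    Object receive() {
        while (true) {
            if (!SpinLoopStandIn.pollSpinExit(hasMail())) continue;
            Object m = getMail();
            if (m != null) return m;
        }
    }
}
```

The `m != null` check matters because `hasMail()` and `getMail()` are separate reads. A competing consumer can empty the slot between them, and the loop simply resumes spinning.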

I'm going to guess that the extra hinting provided by the parameter would
make it easier for a JIT to generate MWAIT and WFEs.

On the one hand:

I like the idea of (an optional?) boolean parameter as a means of hinting at 
the thing that may terminate the spin. It's probably much more general than 
identifying a specific field or address. And it can be used to cover cases that 
poll multiple addresses (an OR in the boolean) or look at a termination time. If 
the JVM can track down the boolean's evaluation to dependencies on specific 
memory state changes, it could pass it on to hardware, if such hardware exists.

On the other hand:

Unfortunately, I don't think that hardware support that can receive the address 
information exists right now, and if/when it does, I'm not sure the semantics 
of passing the boolean through are enough to cover the actual way to use such 
hardware when it becomes available. It is probably premature to design a 
generic way to provide addresses and/or state to this "spin until something 
interesting changes" stuff without looking at working examples. A single 
watched address API is much more likely to fit current implementations without 
being fragile.

ARM v8's WFE is probably the most real user-mode-accessible thing for this right 
now (MWAIT isn't real yet, as it's not accessible from user mode). We can look 
at an example of how a spin loop needs to coordinate the use of WFE, SEVL, and 
the evaluation of a memory location with load-exclusive operations here: 
http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h . The 
tricky part is that the SEVL needs to immediately precede the loop (and all 
accesses that need to be watched by the WFE), but can't be part of the loop (if 
it were in the loop, the WFE would always trigger immediately). But the code in 
the spinning loop can only track a single address (the exclusive tag in a load 
exclusive applies only to the most recent address used), so it would be wrong 
to allow generic code in the spin (it would have to be code that watches 
exactly one address).
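Java cannot emit SEVL, WFE, or exclusive loads. The control-flow shape described above can still be modeled, following the pattern in the arm64 spinlock.h file linked above. In this sketch, the hypothetical `EventRegister` class simulates the per-core event flag. `sevl()` sets it, and `waitForEvent()` consumes it if it is set. Otherwise it falls back to a spin hint, because real WFE sleeps until the monitored address is written, which Java cannot express. It assumes Java 9+.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the ARMv8 per-core event register, for illustration only.
final class EventRegister {
    private boolean eventSet;

    // SEVL: set the local event register so the next WFE returns immediately.
    void sevl() { eventSet = true; }

    // WFE: if an event is pending, clear it and continue. Otherwise real
    // hardware would sleep until the monitored address is written; this
    // model can only approximate that with a spin hint.
    void waitForEvent() {
        if (eventSet) {
            eventSet = false;
            return;
        }
        Thread.onSpinWait();
    }
}

final class SingleAddressSpin {
    // Waits until lockWord equals expected, watching exactly one address.
    // Returns how many times the watched word was loaded.
    static int awaitValue(AtomicInteger lockWord, int expected) {
        EventRegister ev = new EventRegister();
        ev.sevl();                       // SEVL: once, immediately before the loop
        int loads = 0;
        while (true) {
            ev.waitForEvent();           // WFE: the first pass falls straight through
            int v = lockWord.get();      // stands in for LDAXR, which arms the monitor
            loads++;
            if (v == expected) return loads;
            // Only this one address is monitored; reading any other location
            // here would not wake the core on real hardware.
        }
    }
}
```

Placing `sevl()` inside the loop would make every `waitForEvent()` return at once, which is the failure mode described above.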

My suspicion is that the "right" way to capture the various ways a spin loop 
would need to interact with WFE logic will be different from tracking things 
that can generically affect the value of a boolean. E.g. the evaluation of the 
boolean could be based on multiple addresses, and since it's not clear (in the 
API) that this is a problem, the benefits derived would be fragile. In 
addition, there can validly be state-mutating logic in the loop (e.g. 
counting), and implicitly re-executing that logic repeatedly inside a 
pollSpinExit(booleanThatOnlyWatchesOneAddress) call would seem "wrong" (the 
logic would presumably precede the call, and it would be surprising to see it 
execute more than once within the call).
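The counting concern can be sketched with a hypothetical stand-in for `pollSpinExit`, assuming Java 9+. The boolean passed in has a side effect (`++spins`), so it must be evaluated exactly once per loop iteration. An implementation that silently re-evaluated the argument while waiting on one address would advance the counter without the programmer seeing it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class BoundedSpin {
    // Hypothetical stand-in for the proposed SpinLoop.pollSpinExit.
    static boolean pollSpinExit(boolean spinExit) {
        if (!spinExit) Thread.onSpinWait();
        return spinExit;
    }

    // Spins until the flag is set or the counter reaches maxSpins.
    // Returns how many times the counter was advanced.
    static int spin(AtomicBoolean flag, int maxSpins) {
        int spins = 0;
        // The condition mutates 'spins'; Java evaluates it once per iteration.
        while (!pollSpinExit(flag.get() || ++spins >= maxSpins)) {
            // empty body: all work happens in the condition
        }
        return spins;
    }
}
```

If the flag is never set, the counter advances exactly `maxSpins` times. If it is already set, short-circuit evaluation leaves the counter at zero.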

I suspect that the right way to deal with WFE would be to provide an API that 
is closer to what it needs (and which is different from spin-hinting in the 
loop). E.g. some way to designate the beginning of the loop (so SEVL could be 
inserted right before it), some way to indicate the address that needs to use 
an exclusive load in the loop, and some way to indicate that the loop is done. 
A possible way to do this is by wrapping the spin-loop code and providing the 
address.

E.g.:

/**
 * Execute the spinCode repeatedly until it returns false. The processor
 * may assume that if the return value is true, it is likely to continue to
 * return true as long as the contents of the fieldToWatch field of the
 * objectToWatchFieldIn object do not change. The processor may therefore
 * pause or wait after a true result. The processor must not pause
 * indefinitely, but other pausing behavior is system-specific.
 */
void spinExecuteWhileTrue(BooleanSupplier spinCode, Field fieldToWatch,
                          Object objectToWatchFieldIn);

This would probably be a good fit for the specific WFE/SEVL semantics: the loop 
is implicit to the call, so the SEVL can be placed ahead of it; the loop can 
perform a load exclusive on the designated field of the designated object, and 
the spinCode can then do whatever it wants, with the understanding that no 
address other than the fieldToWatch is being watched to provide a timely exit 
from the loop. [A similar variant can be done for watching array elements.]

The same single-watched-address API will probably fit MONITOR/MWAIT if it 
becomes available, and possibly ll/sc variants in other CPUs too. But 
wider-watching variants (NCAS, TSX) will not be covered by this API. Nor would 
common uses of the x86 PAUSE instruction (since they are not restricted to a 
limited number of addresses). The good news is that even though the 
single-address-watching API only covers limited use cases, it can be easily 
implemented on architectures that only support spin hinting. So if someone's 
use case does fit into the API and is coded to that form, they are likely to 
gain benefits on both types of platforms.
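The claim that the single-address API is easy to implement on spin-hint-only platforms can be checked with a sketch. The fallback just loops, hinting after each true result, and uses the watched field only for a sanity check. `SpinWatch`, `Worker`, and `awaitIdle` are hypothetical. The signature follows the proposal above, with true meaning "keep spinning". It assumes Java 9+ for `Thread.onSpinWait()`.

```java
import java.lang.reflect.Field;
import java.util.function.BooleanSupplier;

final class SpinWatch {
    // Fallback for a platform that only has a spin hint (e.g. PAUSE): the
    // watched field arms no hardware monitor; it is only validated here.
    // Instance fields only: a null object fails the check.
    static void spinExecuteWhileTrue(BooleanSupplier spinCode, Field fieldToWatch,
                                     Object objectToWatchFieldIn) {
        if (!fieldToWatch.getDeclaringClass().isInstance(objectToWatchFieldIn)) {
            throw new IllegalArgumentException("field does not belong to object");
        }
        while (spinCode.getAsBoolean()) {
            Thread.onSpinWait();
        }
    }

    // Usage: wait for a Worker to clear its busy flag.
    static void awaitIdle(Worker w) {
        try {
            Field busy = Worker.class.getDeclaredField("busy");
            spinExecuteWhileTrue(() -> w.busy, busy, w);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical class with a field to watch.
final class Worker {
    volatile boolean busy = true;
}
```

On hardware with WFE, an intrinsic could use the same signature to place SEVL before the loop and an exclusive load on `busy` inside it. Callers would not change.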

This leads me to believe that we are looking at two different APIs:
- Spin loop hinting (matching the mature use cases of the PAUSE instruction in 
x86 and HW thread priority reduction in Power).
- Single-watched-address spinning, matching ARM v8's WFE/SEVL use case, and 
potential other single address watchers (MONITOR/MWAIT, and potential ll/sc 
based hints in other future CPUs).

I think the first use case is very mature and well understood, and certainly 
ready for a long term supported Java SE API. The second use case only applies 
to recently introduced hardware (ARM v8 right now), but it is fairly simple and 
*may* be useful more widely in the future.

Since it can be beneficially intrinsified on platforms that support the wider 
spin-hinting API, we could add single-address watching to the JEP (as the 
two do seem related). I just worry that the questions about the usefulness and 
longevity of the single-address-watching use model may overshadow the simplicity 
and apparent slam-dunkness of the spin loop hinting solution.


Also, the boolean argument is easy to profile in the interpreter, if that's what
a VM wants to do.

For a similar mechanism (which again uses a boolean to provide IR
connection to a data dependency), see:

http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/fe40b31c0e52/src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java#l697

In fact, something like the profileBoolean intrinsic might be useful to allow
spin loops to gather their own statistics.  Getting the array allocation right
might require an invokedynamic site (or some equivalent, like a static
method handle), in order to control the allocation of profile state per call 
site.
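Here is a minimal sketch of a spin loop gathering its own statistics, in the spirit of the profileBoolean link above. This is not the JDK's code. `SpinProfile.profile` is a hypothetical helper, and a static final counter array stands in for the per-call-site state that an invokedynamic site or static method handle would allocate. It assumes Java 9+ for `Thread.onSpinWait()`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinProfile {
    // Returns the value unchanged, counting how often each outcome occurs:
    // counters[0] counts false results, counters[1] counts true results.
    static boolean profile(boolean value, int[] counters) {
        int idx = value ? 1 : 0;
        if (counters[idx] < Integer.MAX_VALUE) counters[idx]++; // saturate, don't overflow
        return value;
    }
}

final class ProfiledSpin {
    // One profile per call site: here, one static array for the single loop below.
    // Updates are not atomic; racy counts are tolerable for a heuristic profile.
    static final int[] SITE_PROFILE = new int[2];

    static void await(AtomicBoolean flag) {
        while (!SpinProfile.profile(flag.get(), SITE_PROFILE)) {
            Thread.onSpinWait();
        }
    }
}
```

A runtime could consult such counts, for example the ratio of false to true results, to decide whether a loop usually spins briefly or long enough to justify a deeper wait.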

HTH
— John

P.S. I agree with others that this needs cooking, in a jdk.internal place,
before it is ready for SE.
