On Oct 8, 2015, at 12:39 AM, Gil Tene <g...@azul.com> wrote:
> 
> On the one hand:
> 
> I like the idea of (an optional?) boolean parameter as a means of hinting at 
> the thing that may terminate the spin. It's probably much more general than 
> identifying a specific field or address. And it can be used to cover cases 
> that poll multiple addresses (an or in the boolean) or look at a termination 
> time. If the JVM can track down the boolean's evaluation to dependencies on 
> specific memory state changes, it could pass it on to hardware, if such 
> hardware exists.

Yep.  And there is a user-mode MWAIT in SPARC M7, today.  For Intel, Dave Dice 
wrote this up:
  https://blogs.oracle.com/dave/entry/monitor_mwait_for_spin_loops

Also, from a cross-platform POV, a boolean would provide an easy to use "hook" 
for profiling how often the polling is failing.  Failure frequency is an 
important input to the tuning of spin loops, isn't it?  Why not feed that info 
through to the JVM?
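
As a rough illustration of the profiling hook, here is a minimal sketch. 
PollProfile and its methods are hypothetical names invented for this example, 
not a JDK API, and the Thread.yield() call is only a stand-in for whatever the 
JVM would actually do on a failed poll. The point is just that a pass-through 
boolean gives one obvious place to count failures:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not a JDK API): a pass-through poll hint that
// counts how often the polled condition was false versus true.
final class PollProfile {
    private static final LongAdder failed = new LongAdder();
    private static final LongAdder succeeded = new LongAdder();

    // Returns its argument unchanged, so it can sit directly in a loop condition.
    static boolean poll(boolean conditionMet) {
        if (conditionMet) {
            succeeded.increment();
        } else {
            failed.increment();
            Thread.yield(); // assumption: stand-in for the real back-off action
        }
        return conditionMet;
    }

    static long failedPolls() {
        return failed.sum();
    }

    static long succeededPolls() {
        return succeeded.sum();
    }
}
```

A spin loop would then read something like "while (!PollProfile.poll(flag.get())) {}", 
and the ratio of failedPolls() to succeededPolls() is the kind of input a tuner 
(or the JVM itself) could consume.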

> On the other hand:
> 
> Unfortunately, I don't think that hardware support that can receive the 
> address information exists right now,

(It does, on SPARC.)

> and if/when it does, I'm not sure the semantics of passing the boolean 
> through are enough to cover the actual way to use such hardware when it 
> becomes available.

The alternative is to have the JIT pattern-match for loop control around the 
call to Thread.yield. That is obviously less robust than having the user thread 
the poll condition bit through the poll primitive.

> It is probably premature to design a generic way to provide addresses and/or 
> state to this "spin until something interesting changes" stuff without 
> looking at working examples. A single watched address API is much more likely 
> to fit current implementations without being fragile.
> 
> ARM v8's WFE is probably the most real user-mode-accessible thing for this 
> right now (MWAIT isn't real yet, as it's not accessible from user mode). We 
> can look at an example of how a spinloop needs to coordinate the use of WFE, 
> SEVL, and the evaluation of memory location with load exclusive operations 
> here: http://lxr.free-electrons.com/source/arch/arm64/include/asm/spinlock.h 
> . The tricky part is that the SEVL needs to immediately precede the loop (and 
> all accesses that need to be watched by the WFE), but can't be part of the 
> loop (if it were in the loop, the WFE would always trigger immediately). But 
> the code in the spinning loop can only track a single address (the exclusive 
> tag in load exclusive applies only to the most recent address used), so it 
> would be wrong to allow generic code in the spin (it would have to be code 
> that watches exactly one address). 
> 
> My suspicion is that the "right" way to capture the various ways a spin loop 
> would need to interact with WFE logic will be different than tracking things 
> that can generically affect the value of a boolean. E.g. the evaluation of 
> the boolean could be based on multiple addresses, and since it's not clear 
> (in the API) that this is a problem, the benefits derived would be fragile.

Having the JIT explore nearby loop structure for memory references is even more 
fragile.

If we can agree that (a) there are advantages to profiling the boolean 
parameter for all platforms, and (b) the single-poll-variable case is likely to 
be optimizable sooner *with* a parameter than *without*, maybe this is enough 
to tip the scales towards a boolean parameter.

The idea would be that programmers would take a little extra thought when using 
yield(Z)Z, and get paid immediately from good profiling.  They would get paid 
again later if and when platforms analyze data dependencies on the Z.
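
To make the call-site shape concrete, here is a small sketch under stated 
assumptions. SpinYield.pollOrYield(boolean) is a hypothetical stand-in for the 
proposed yield(Z)Z (it is not an existing Thread method), and the two await 
helpers are invented for illustration. The first loop is the 
single-poll-variable case; the second shows a compound condition (a flag or'ed 
with a deadline), where the boolean depends on more than one piece of state:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the proposed yield(Z)Z; not an existing JDK method.
final class SpinYield {
    // Takes the poll result and returns it unchanged; yields only on a failed poll.
    static boolean pollOrYield(boolean pollSucceeded) {
        if (!pollSucceeded) {
            Thread.yield();
        }
        return pollSucceeded;
    }

    // Single-poll-variable case: the boolean depends on exactly one location.
    static void awaitFlag(AtomicBoolean flag) {
        while (!pollOrYield(flag.get())) {
            // spin until the flag is observed set
        }
    }

    // Compound case: stop on the flag OR once the deadline has passed.
    // Returns whether the flag was set when the loop ended.
    static boolean awaitFlagOrDeadline(AtomicBoolean flag, long deadlineNanos) {
        while (!pollOrYield(flag.get() || System.nanoTime() - deadlineNanos >= 0)) {
            // spin until either condition holds
        }
        return flag.get();
    }
}
```

In both loops the value being polled is visible at the call, so nothing has to 
be inferred from the surrounding loop structure.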

If there's no initial payoff, then, yes, it is hard to ask programmers to 
expend extra thought that pays off only on some platforms.

— John
