> Dunno what to say. I'm not trying this on Plan 9, and I can't
> reproduce your results on an i7 or an e5-2690. I'm certainly not
> claiming that all pipelines, processors, and caches are equal, but
> I've simply never seen this behavior. I also can't think of an
> application in which one would want to execute a million consecutive
> LOCK-prefixed instructions. Perhaps I just lack imagination.

the original question was: are locked instructions wait-free?  the
purpose of the test was to see whether there could be more than a
few percent variation in their cost, and it appears there can be.

i think in modern intel microarchitectures this question boils down
to: can a given cpu thread in an arbitrary topology transition an
arbitrary cacheline from an arbitrary state to state E in a bounded
number of QPI cycles?

of course this reformulation isn't particularly helpful, at least to
me, given the vagueness in the descriptions i've seen.

this is practically important because if there is a dogpile lock or
bit of shared memory in the system, then a particular cpu may
end up waiting quite a long time to acquire the lock.  it may
even get starved out for a bit, perhaps with a subset of other cpus
"batting around" more than once.  
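
a rough sketch of how one might look for this from user space,
assuming a naive test-and-set spin lock (not any particular kernel's
lock): threads fight over the lock until n acquisitions have been
handed out, and we record the smallest and largest share any one
thread got.  dogpile, Taker and MAXK are invented names; it needs C11
atomics and pthreads (build with -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { MAXK = 64 };                    /* hypothetical cap on threads */

static atomic_flag lk = ATOMIC_FLAG_INIT;  /* naive test-and-set lock */
static unsigned long total;            /* acquisitions so far; guarded by lk */
static unsigned long limit;            /* stop once total reaches this */

typedef struct {
	pthread_t tid;
	unsigned long got;             /* acquisitions won by this thread */
} Taker;

static void*
take(void *v)
{
	Taker *t = v;
	int done = 0;

	while(!done){
		while(atomic_flag_test_and_set(&lk))
			;              /* spin: no fairness of any kind */
		if(total < limit){
			total++;
			t->got++;
		}else
			done = 1;
		atomic_flag_clear(&lk);
	}
	return 0;
}

/*
 * hand out n lock acquisitions among nthreads threads.  returns the
 * sum of the per-thread counts; stores the smallest and largest
 * per-thread count in *min and *max.
 */
unsigned long
dogpile(int nthreads, unsigned long n, unsigned long *min, unsigned long *max)
{
	Taker k[MAXK];
	unsigned long sum = 0;
	int i;

	assert(nthreads > 0 && nthreads <= MAXK);
	total = 0;
	limit = n;
	for(i = 0; i < nthreads; i++){
		k[i].got = 0;
		pthread_create(&k[i].tid, 0, take, &k[i]);
	}
	*min = n;
	*max = 0;
	for(i = 0; i < nthreads; i++){
		pthread_join(k[i].tid, 0);
		sum += k[i].got;
		if(k[i].got < *min)
			*min = k[i].got;
		if(k[i].got > *max)
			*max = k[i].got;
	}
	return sum;
}
```

a min far below n/nthreads would suggest that some thread was starved
for a stretch.  an even split proves nothing about the worst case, and
thread start-up order and the scheduler both skew the counts.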

this would be hidden waiting behavior that might lead to surprising
(lack of) performance.

i would say this vagueness could impede worst-case analysis for
safety-critical systems, but you'd be pretty adventuresome, to say the
least, to use a processor with SMM in such a system.

- erik
