> It is an ordering, but I don't think it's a valid one: your ellipses
> suggest an unbounded execution time (given the context of the
> discussion). I don't think that's valid because the protocol can't
> possibly negotiate execution for more instructions than it has space
> for in its pipeline. Furthermore, the pipeline cannot possibly be

yes, it is an unbounded set of instructions.  i am wondering whether
the same core could keep winning the MESI(F) arbitration.

i don't see what ties µ-ops to cachelines.  i believe the load/store
buffers are where cachelines come into play.

> filled with LOCK-prefixed instructions because it also needs to
> schedule instruction loading, and it pipelines μops, not whole

i didn't read that in the arch guide.  where did you see this?

> instructions anyway. Furthermore, part of the execution cycle is
> decomposing an instruction into its μop parts. At some point, that
> processor is not going to be executing a LOCK instruction, it is going
> to be executing some other μop (like decoding the next LOCK-prefixed
> instruction it wants to execute). This won't be done with any
> synchronization. When this happens, other processors will execute
> their LOCK-prefixed instructions.

and this is an additional assumption that i was trying to avoid.  i'm
interested in whether LOCK XADD is wait-free in theory.

> further instructions. Instruction load and decode stages are shared,

this is not always true, and i think it hints at the issue: it might be
inaccurate to generalize from your cpu to all MESI cpus.

i get a 126% difference executing lock xadd 1024*1024 times
with no branches using cores 4-7 of a xeon e3-1230.  i'm sure it would
be quite a bit more impressive if it were a bit easier to turn the timer
interrupt off.

i really wish i had a four package system to play with right now.  that
could yield some really fun numbers.  :-)

- erik

example run.  output columns are core and cycles.
; 6.lxac
4 152880511
7 288660939
6 320991900
5 338755451

