> It is an ordering, but I don't think it's a valid one: your ellipses
> suggest an unbounded execution time (given the context of the
> discussion). I don't think that's valid because the protocol can't
> possibly negotiate execution for more instructions than it has space
> for in its pipeline. Furthermore, the pipeline cannot possibly be
yes, it is an unbounded set of instructions. i am wondering if it isn't
possible for the same core to keep winning the MESI(F) arbitration. i
don't see tying µ-ops to cachelines. load/store buffers, i believe, are
where cachelines come into play.

> filled with LOCK-prefixed instructions because it also needs to
> schedule instruction loading, and it pipelines μops, not whole

i didn't read that in the arch guide. where did you see this?

> instructions anyway. Furthermore, part of the execution cycle is
> decomposing an instruction into its μop parts. At some point, that
> processor is not going to be executing a LOCK instruction, it is going
> to be executing some other μop (like decoding the next LOCK-prefixed
> instruction it wants to execute). This won't be done with any
> synchronization. When this happens, other processors will execute
> their LOCK-prefixed instructions.

and this is an additional assumption that i was trying to avoid. i'm
interested in whether LOCK XADD is wait-free in theory.

> further instructions. Instruction load and decode stages are shared,

this is not always true. and i think it hints at the issue that it might
be inaccurate to generalize from your cpu to all MESI cpus. i get a 126%
difference executing lock xadd 1024*1024 times with no branches using
cores 4-7 of a xeon e3-1230. i'm sure it would be quite a bit more
impressive if it were a bit easier to turn the timer interrupt off.

i really wish i had a four package system to play with right now. that
could yield some really fun numbers. :-)

- erik

example run. output core/cycles.

; 6.lxac
4 152880511
7 288660939
6 320991900
5 338755451
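For anyone who wants to try a similar measurement, here is a minimal sketch in C11 with POSIX threads. It is not the program that produced the run above, and it differs from it in several ways that are assumptions of the sketch: it uses a loop (so there are branches), it does not pin threads to cores, and it reports wall-clock nanoseconds rather than cycles. The names run_xadd_bench, worker_main and MAX_THREADS are made up for illustration. On x86, compilers commonly emit a LOCK-prefixed add or xadd for atomic_fetch_add, but that is up to the compiler and should be checked in the generated assembly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

enum { MAX_THREADS = 64 };            /* hypothetical upper bound for this sketch */

static atomic_ulong counter;          /* the one location every thread contends on */
static atomic_int go;                 /* start flag so workers begin together */

struct worker {
    long iters;                       /* number of atomic adds to perform */
    uint64_t ns;                      /* elapsed wall-clock time for this worker */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    while (!atomic_load(&go))         /* spin until all workers are created */
        ;
    uint64_t start = now_ns();
    for (long i = 0; i < w->iters; i++)
        atomic_fetch_add(&counter, 1);   /* the contended read-modify-write */
    w->ns = now_ns() - start;
    return NULL;
}

/* Runs nthreads workers, each doing iters atomic adds on one shared counter.
 * Stores each worker's elapsed nanoseconds in ns_out[0..nthreads-1] and
 * returns the final counter value. */
unsigned long run_xadd_bench(int nthreads, long iters, uint64_t *ns_out)
{
    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    int started = 0;

    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    atomic_store(&counter, 0);
    atomic_store(&go, 0);
    for (int i = 0; i < nthreads; i++) {
        w[i].iters = iters;
        w[i].ns = 0;
        if (pthread_create(&tid[i], NULL, worker_main, &w[i]) != 0)
            break;
        started++;
    }
    atomic_store(&go, 1);             /* release the workers */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        ns_out[i] = w[i].ns;
    }
    return atomic_load(&counter);
}
```

Calling run_xadd_bench(4, 1024*1024, ns) and comparing the four entries of ns gives a rough idea of how unevenly the contended adds are served. A large spread between workers is suggestive only: scheduling, interrupts and thread placement all affect it, and no finite run can settle whether the operation is wait-free in theory.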