On Nov 29, 2011, at 9:08 AM, Jed Brown wrote:

> On Tue, Nov 29, 2011 at 08:52, Dmitry Karpeev <karpeev at mcs.anl.gov> wrote:
>
> From what I understand Barry doesn't want the threads to spin.
>
> Lots of MPI calls spin because it's MUCH lower latency. Unless we have more
> threads than cores, what is the problem?
>
> Also, synchronizing through an unguarded memory location seems to create a
> race condition.
>
> Not if writes are atomic. There is always a way to do atomic writes (usually
> machine-word) because otherwise the operating system could not implement
> synchronization primitives.
>
> Isn't cmpxchg instruction-set specific?
>
> All instruction sets have some analogue of cmpxchg because it's the building
> block for all other primitives.
Shri,

I think we need to investigate both the method Dmitry suggested and what Jed suggests.

Barry

Generally I think of signals as being pretty slow, so I am surprised by the result on the website, but hey.

In the world Intel is pushing us toward, they want us to have SEVERAL threads per core (and IBM Blue Gene/Q also, I think). How is the spinning going to work in that situation?
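
For concreteness, here is a minimal sketch in C11 of the two ideas in the quoted exchange: a worker that spins on an atomically written flag, and a compare-and-swap (the portable analogue of cmpxchg). This is not PETSc code. All names (job_t, worker, try_claim, run_demo, SPIN_LIMIT) are hypothetical, and the "yield after a bounded number of spins" fallback is only one possible answer to the several-threads-per-core question, not something proposed in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical illustration only; none of these names are PETSc API. */
typedef struct {
  atomic_int ready;   /* 0 = no work posted, 1 = work posted */
  int        payload; /* plain data, published before 'ready' is set */
  int        result;  /* written by the worker, read after join */
} job_t;

enum { SPIN_LIMIT = 1000 }; /* assumed tuning value, not measured */

/* Worker: spin on an atomic load instead of blocking on a signal. */
void *worker(void *arg)
{
  job_t *job = arg;
  int spins = 0;

  /* Acquire ordering pairs with the release store in run_demo(), so
     'payload' is guaranteed visible once 'ready' is observed as 1. */
  while (atomic_load_explicit(&job->ready, memory_order_acquire) == 0) {
    if (++spins >= SPIN_LIMIT) {
      sched_yield(); /* give the core away if oversubscribed */
      spins = 0;
    }
  }
  job->result = job->payload * 2;
  return NULL;
}

/* Compare-and-swap: claim 'owner' only if it still holds -1.
   Returns 1 if this caller won the claim, 0 otherwise. */
int try_claim(atomic_int *owner, int my_id)
{
  int expected = -1;
  return atomic_compare_exchange_strong(owner, &expected, my_id);
}

/* Post one job to a spinning worker and return its result,
   or -1 if the thread could not be created. */
int run_demo(void)
{
  job_t job;
  pthread_t t;
  int rc;

  atomic_init(&job.ready, 0);
  job.payload = 0;
  job.result = 0;

  if (pthread_create(&t, NULL, worker, &job) != 0) return -1;

  job.payload = 21; /* ordinary write ... */
  atomic_store_explicit(&job.ready, 1, memory_order_release); /* ... published here */

  rc = pthread_join(t, NULL);
  assert(rc == 0);
  (void)rc;
  return job.result;
}
```

Calling run_demo() from a main() and linking with -pthread exercises the spin path. With more runnable threads than hardware threads, a pure spin can burn the time slice the producer needs, which is why the sketch falls back to sched_yield(); whether that beats signal-based wakeups would have to be measured.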