On Mon, 05 Jan 2004 10:13, Dan Sugalski wrote:

  [...]
  > these things. It's a set of 8 4-processor nodes with a fast 
  > interconnect between them which functions as a 32 CPU system. The 
  > four processors in each node are in a traditional SMP setup with a 
  > shared memory bus, tightly coupled caches, and fight-for-the-bus 
  [...]

I know what a NUMA system is; I was just a little worried by the
combination of the terms SMP and NUMA in the same sentence :).

Normally "SMP" means "Shared Everything" - meaning Uniform Memory
Access.  If compared to the term MPP or AMP (in which different CPUs
are put to different tasks), it is true that each node in a NUMA
system could be put to any task.  So, the term "SMP" would seem to fit
partially; but the implication is with NUMA that there are clear
benefits to *not* having each processor doing *exactly* the same
thing, all the time.  "CPU affinity" & all that.
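
For concreteness, here is a minimal sketch of pinning a process to a
single CPU on Linux 2.6, assuming glibc's three-argument
sched_setaffinity() (other systems spell this differently):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);  /* allow this process on CPU 0 only */
        if (sched_setaffinity(0, sizeof mask, &mask) == -1)
            perror("sched_setaffinity");

        /* From here on, a NUMA-aware scheduler should keep the
         * memory this process touches close to CPU 0's node. */
        return 0;
    }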

In all the NUMA systems I've seen, groups of processors share a
group of memory, too (SGI Origin/Onyx; the Sun Enterprise servers
*must* work this way, though they don't mention it!).  So I'd say
the "SMP" is practically redundant, bordering on confusing.  Maybe a
term like "8 x 4MP NUMA" is better.

I did apologise at the beginning for being pedantic.  But hey, didn't
this digression serve to elaborate on the meaning of NUMA?  :)

  > Given the increases in processor vs memory vs bus speeds, this
  > setup may not hold for that much longer, as it's only really
  > workable when a single CPU doesn't saturate the memory bus with
  > any regularity, which is getting harder and harder to
  > do. (backplane and memory speeds can be increased pretty
  > significantly with a sufficient application of cash, which is why
  > the mini and mainframe systems can actually do it, but there are
  > limits beyond which cash just won't get you)

The Opteron and the SPARC IV (IIRC) both have three bi-directional
high-speed (core-speed) interconnects, so these could `easily' be
arranged into NUMA configurations with SMP groups.  Also, some
high-end processors are going multicore, which presumably has
different characteristics again (especially if the two cores on the
die share a cache!).

Then of course there's single-processor multi-threading (e.g., Intel
HyperThreading).  These systems have two copies of the registers
internally and interleave instructions from each `thread' as the
processor can deal with them; using separate registers for each
thread helps keep the execution units busy (kind of like what GCC
does with unrolled loops on a RISC system that has more registers in
the first place).  These perform like little NUMAs: the cache is
`hotter' on the sibling virtual processor (it shares the physical
cache, so data the other `thread' has touched is already close) than
on the other CPUs on the motherboard.  If my understanding is
correct, the Intel implementation is not truly SMP, as the two
virtual processors must share code segments to run threads.  If that
is true, doing JIT (or otherwise changing the executable code, e.g.
via dlopen()) in a thread might break HyperThreading.  But then
again, it might not.  Maybe someone who gives a flying fork() would
like to devise a test to see if this is the case.
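
Something like the following, perhaps - an untested sketch, assuming
Linux, x86, and gcc -pthread, with all the names my own.  One thread
keeps rewriting a tiny generated function while the main thread
calls it and counts the call rate; run it with and without the
patching thread and compare:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    /* x86 machine code for "mov eax, 42; ret" */
    static unsigned char mov_ret[] = { 0xB8, 42, 0, 0, 0, 0xC3 };
    static unsigned char *buf;
    static volatile int stop;

    /* Endlessly rewrite the mov's immediate operand -- crude
     * self-modifying code, standing in for a JIT recompiling a
     * hot function. */
    static void *patcher(void *unused)
    {
        volatile unsigned char *p = buf;
        while (!stop)
            p[1]++;
        return NULL;
    }

    int main(int argc, char **argv)
    {
        buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memcpy(buf, mov_ret, sizeof mov_ret);

        /* Casting data to a function pointer isn't portable C,
         * but neither is HyperThreading.  For a fair test, pin
         * the two threads to sibling logical CPUs with
         * sched_setaffinity(), as above. */
        int (*fn)(void) = (int (*)(void))buf;

        pthread_t t;
        if (argc > 1)           /* any argument enables the patcher */
            pthread_create(&t, NULL, patcher, NULL);

        long calls = 0;
        time_t end = time(NULL) + 5;
        while (time(NULL) < end) {
            fn();               /* return value varies; ignore it */
            calls++;
        }
        stop = 1;
        if (argc > 1)
            pthread_join(t, NULL);
        printf("%ld calls in 5 seconds%s\n", calls,
               argc > 1 ? " (while patching)" : "");
        return 0;
    }

If the trace cache (which holds decoded instructions and is shared
between the two virtual processors) really must be flushed on every
write to the code, the patched run should be dramatically slower.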

Apparently current dual-Opteron systems are also effectively NUMA
(each chip has its own memory controller), but at the moment NUMA
mode under Linux is slower than straight SMP mode - presumably
because it's a bitch to code for ;-)
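
You can already play with the placement policy from userland, by the
way, using Andi Kleen's numactl tool - assuming your kernel and
distribution have NUMA support ("./myprog" is just a placeholder):

    numactl --hardware                  # show nodes and their memory
    numactl --interleave=all ./myprog   # spread pages over all nodes
    numactl --membind=0 ./myprog        # allocate only on node 0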

So these fun systems are here to stay!  :)
-- 
Sam Vilain, [EMAIL PROTECTED]

All things being equal, a fat person uses more soap than a thin
person.
 - anon.
