On Mon, 05 Jan 2004 10:13, Dan Sugalski wrote:
[...]
> these things. It's a set of 8 4-processor nodes with a fast
> interconnect between them which functions as a 32 CPU system. The
> four processors in each node are in a traditional SMP setup with a
> shared memory bus, tightly coupled caches, and fight-for-the-bus
[...]
I know what a NUMA system is; I was just a little worried by the
combination of the terms SMP and NUMA in the same sentence :).

Normally "SMP" means "shared everything" - that is, Uniform Memory
Access. Compared with MPP or AMP (in which different CPUs are put to
different tasks), it is true that each node in a NUMA system could be
put to any task, so the term "SMP" would seem to fit partially; but
the implication with NUMA is that there are clear benefits to *not*
having each processor doing *exactly* the same thing all the time.
"CPU affinity" and all that.

Groups of processors in a NUMA machine share a group of memory in all
the NUMA systems I've seen, too (SGI Origin/Onyx; and the Sun
Enterprise servers *must* be, though they don't mention it!). So I'd
say that the "SMP" is practically redundant, bordering on confusing.
Maybe a term like "8 x 4MP NUMA" is better.

I did apologise at the beginning for being pedantic. But hey, didn't
this digression serve to elaborate on the meaning of NUMA? :)

> Given the increases in processor vs memory vs bus speeds, this
> setup may not hold for that much longer, as it's only really
> workable when a single CPU doesn't saturate the memory bus with
> any regularity, which is getting harder and harder to
> do. (backplane and memory speeds can be increased pretty
> significantly with a sufficient application of cash, which is why
> the mini and mainframe systems can actually do it, but there are
> limits beyond which cash just won't get you)

The Opteron and the SPARC IV (IIRC) both have three bi-directional
high-speed (= core speed) interconnects, so these could `easily' be
arranged into NUMA configurations with SMP groups. Also, some
high-end processors are going multi-core, which presumably has
different characteristics again (especially if the two chips on the
die share a cache!). Then of course there's single-processor
multi-threading (eg, Intel HyperThreading).
These systems have twice the registers internally and interleave
instructions from each `thread' as the processor can deal with them;
using separate registers for them all helps keep the execution units
busy (kind of like what GCC does with unrolled loops on a RISC system
with more registers in the first place). These perform like little
NUMAs, because the cache is `hotter' (ie, locks on those memory pages
are held) on the other virtual processor than on other CPUs on the
motherboard.

If my understanding is correct, the Intel implementation is not truly
SMP, as the two virtual processors must share code segments to run
threads. If that is true, doing JIT (or otherwise changing the
executable code, eg with dlopen()) in a thread might break
HyperThreading. But then again, it might not. Maybe someone who gives
a flying fork() would like to devise a test to see if this is the
case.

Apparently current dual-Opteron systems are also effectively NUMA (as
each chip has its own memory controller), but at the moment NUMA mode
under Linux is slower than straight SMP mode. Presumably because it's
a bitch to code for ;-)

So these fun systems are here to stay! :)

--
Sam Vilain, [EMAIL PROTECTED]

All things being equal, a fat person uses more soap than a thin
person.
 - anon.