Karel Gardas wrote:
Erik,

to be honest, I don't quite get what you write here. The reason is the engineering: 
a 4-socket server means that each CPU is connected to 2 others, and together they form 
a kind of ring. There is a nice, simple picture here: 
http://ixbtlabs.com/articles2/cpu/rmma-numa2.html

Now, from this picture I would imagine the memory hierarchy should be divided into 3 
parts (not two, as you write). The first is the local memory connected to the CPU's own memory 
controller. The second is memory connected directly to the CPU's two neighbour CPUs, and the 
third is memory connected directly to the CPU in the "opposite corner", i.e. the CPU you 
need to go through 2 HT links to reach. That's exactly why I'm not able to understand why 
you talk about just 2 CPU groups and seem to assume that local memory access has the same 
latency as access over an HT link. Sorry, I don't understand. Could you be so kind as to 
provide me with a link to some AMD documentation? I've really tried hard googling for it, 
but to no avail. I've just found various articles like the one above, but none that 
explains what you have described here.

I went back and looked at my AMD system documentation, and I think I led you astray. All 800-series AMD chips have a total of 3 HyperTransport links, and in a 4-socket system they are indeed laid out in a ring formation. So, any single socket has a direct connection to its "own" memory, plus connections to two neighbor CPUs.

http://support.amd.com/us/Processor_TechDocs/40555.pdf

The document above is actually really good for exactly your problem: how to optimize workloads for a 4-socket system.

What this boils down to is that it takes about 50 ns to get to local RAM (the DIMMs wired directly to that socket), 50 ns + 1 "hop" to get to DRAM attached to either neighbor CPU (which looks like 50 ns + 55 ns = 105 ns), and 50 ns + 2 "hops" to get to the cross-wise CPU (50 ns + 55 ns + 55 ns = 160 ns). So, you are correct - there are 3 levels to the memory access hierarchy. You still want to pin processes to individual CPUs if you can, to limit this NUMA penalty. If you need more than one CPU's power, I would recommend grouping them in 2s like I had originally mentioned, as it avoids the 2-hop penalty. That is, CPU0 & 1, and 2 & 3. Avoid grouping 1 & 2 or 0 & 3, as those pairs will definitely suffer the 2-hop problem.
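To make the arithmetic concrete, here is a small sketch that models the latency estimates above. The 50 ns local figure and 55 ns per-hop figure are the rough numbers from this thread, and the specific link layout (which socket pairs share a direct HT link) is an assumption for illustration - check your board's actual topology:

```python
# Sketch: estimated memory access latency in a 4-socket Opteron ring.
# Figures are the rough estimates from this thread, not measured values:
#   ~50 ns to local DRAM, plus ~55 ns per HyperTransport hop.
# Assumed link layout (hypothetical): sockets 0-1, 0-2, 1-3, 2-3 are
# directly linked, so the diagonals (0 & 3, 1 & 2) are 2 hops apart.

LOCAL_NS = 50
HOP_NS = 55

# Direct HT links in the assumed 4-socket ring (unordered pairs).
LINKS = {(0, 1), (0, 2), (1, 3), (2, 3)}

def hops(a, b):
    """Number of HT hops from socket a to socket b in the assumed ring."""
    if a == b:
        return 0
    if (min(a, b), max(a, b)) in LINKS:
        return 1
    return 2  # diagonally opposite sockets: two hops around the ring

def latency_ns(cpu, mem_socket):
    """Estimated latency for cpu accessing DRAM attached to mem_socket."""
    return LOCAL_NS + hops(cpu, mem_socket) * HOP_NS

for target in range(4):
    print(f"CPU0 -> socket {target} DRAM: {latency_ns(0, target)} ns")
# CPU0 sees 50 ns locally, 105 ns to its two neighbors, 160 ns diagonally.
```

This also shows why pairing directly-linked sockets matters: a process group spanning a diagonal pays the 160 ns worst case on every cross-socket access.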


What I had originally described is the layout of the 8-socket systems, which, because they likewise have only 3 HT links per socket, are also arranged in a ring, but each "corner" of that ring is a 2-CPU group.


Sorry about that - I should think more at 3am before I post.  <yawn>

If your boss will let you, get a whole bunch of 512MB or 1GB HP DIMMs from eBay, as they're dirt cheap (search for "HP (512*,1gb) PC-3200 ECC" and see what it shows, or just search for the relevant HP part numbers: 376638-B21 and 376639-B21) - I'm seeing prices under $100 for 4GB of additional RAM. Otherwise, go to someone like www.memoryx.net and get certified memory - it's still going to be super-cheap for these machines, so you /should/ be able to get enough to balance out the banks of memory for better performance.

In fact, DIMMs in the USA look like they're about 1/3 of the price here in the EU. That's the 
reason I wrote about expensive memory - I had been looking at our EU prices. For the 
record, I'm probably going to purchase from a US e-shop then.

Thanks,
Karel

MemoryX is here in Silicon Valley, and I highly recommend them. They're very professional, very thorough, and they are /very/ experienced with dealing with non-USA customers. They have a massive stock of stuff for virtually everything, and they have a lifetime guarantee (including a money-back compatibility guarantee). If you're not doing eBay, that's who I'd deal with.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
