Karel Gardas wrote:
NO NO NO, you never want to do this. Processes that
run on multiple CPUs need to be able to access the
same memory range. Say you have a process loaded on cpu0
that starts running, accessing a given memory range, and
then it gets kicked off cpu0 due to an interrupt or
some other reason, and its next run lands on cpu1. If
cpu1 does not have access to the memory range needed,
CRASH!
Excuse me, but aren't you mixing up physical and virtual memory concepts? I'm
talking here about the physical memory attached to different CPUs in a NUMA box,
while you seem to be talking about the virtual memory available to processes.
The way it works is that the OS is responsible for presenting a single virtual
memory space to every process, completely independently of which CPU the process
is running on. That's at least my understanding of the topic.
Karel
You are correct - Solaris is a single-image Virtual Memory OS, so all
memory is available to all processes in a 64-bit setup (we won't touch
that PAE stuff). NUMA here means the memory access /time/ is non-uniform,
so different sections of memory have different performance
characteristics. But all memory is usable by all processes.
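On Solaris you can inspect those memory "groups" (locality groups, or lgroups) directly. A sketch, assuming a Solaris release recent enough to ship lgrpinfo(1) -- older releases may lack it, and the output format varies by build:

```shell
# Print the full lgroup hierarchy: one leaf lgroup per NUMA node,
# showing which CPUs and how much memory belong to each.
lgrpinfo -a

# Or just the CPU membership and memory sizes per lgroup:
lgrpinfo -c -m
```

On the 4-socket Opteron box described above, you'd expect two leaf lgroups, each listing the CPUs and local RAM of one group.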
For a 4-socket 800-series Opteron, there are two CPU "groups" -
CPU0/CPU1 in group 0, and CPU2/CPU3 in group 1. Each group has "local"
memory, which it can access at about 50ns speeds. Memory on the other
group has a latency of around 110ns or so, roughly double that of
"local" RAM. (I'm doing this from memory, so my exact numbers may be
off, but it's roughly correct). Inside each group, you generally get
better performance by installing memory in banks of two, meaning each CPU
should have matched pairs of DIMMs in its memory bank, and you should spread
DIMM pairs evenly between the two groups. IIRC, populating only 1 DIMM for a
pair of CPUs gives you roughly 50% of the throughput that 2 matched
pairs of DIMMs would get you.
If you don't plan to use all the CPUs, then you can pin a process to a
group of CPUs, defined at the thread level. Since each Opteron 800 here has
2 cores (and 1 thread/core), you can define a CPU group with 4
threads, all on CPU0 and CPU1. You can then tell your process to run
only on that specific CPU group, which means it will avoid the
serious memory latency issue.
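A sketch of that pinning using Solaris processor sets -- the CPU IDs (0-3) and the PID (1234) below are placeholders, not values from this thread; check psrinfo on your own box first:

```shell
# List the virtual CPUs Solaris sees (each dual-core Opteron 800
# shows up as two CPUs).
psrinfo -v

# Create a processor set from the CPUs assumed to be in group 0.
# psrset prints the ID of the new set (e.g. 1).
psrset -c 0 1 2 3

# Bind an already-running process (placeholder PID 1234) to set 1 ...
psrset -b 1 1234

# ... or launch a command directly inside the set.
psrset -e 1 ./my_server
```

Processes bound to the set will only ever be scheduled on those CPUs, so their memory stays local to that group.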
Unless you do this (CPU group pinning), even if your process only uses a
small amount of RAM, there is a chance it will get scheduled on a CPU
where the relevant memory belongs to the other group, and thus take the
NUMA performance hit.
If your boss will let you, get a whole bunch of 512MB or 1GB HP DIMMs
from eBay, as they're dirt cheap (search for "HP (512*,1gb) PC-3200 ECC"
and see what it shows, or just search for the relevant HP part numbers:
376638-B21 and 376639-B21) - I'm seeing prices that are under $100 for
4GB of additional RAM. Otherwise go to someone like www.memoryx.net and
get certified memory - it's still going to be super-cheap for these
machines, so you /should/ be able to get enough to balance out the banks
of memory for better performance.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
opensolaris-discuss mailing list
[email protected]