Jeremy,

You might be better off posting to the Linux Kernel Mailing List (LKML) about this issue. There are a few experts here (Don, Joe, Mike, ....) who might know, but LKML is more likely to give you correct guidance quickly.

cheers,
        Bruce

On Mon, 24 Sep 2007, Jeremy Fleming wrote:

I have a quad-Opteron machine where each node is a dual-core CPU with each
core running at 2.0 GHz.  The machine has 64 GB of RAM, two Broadcom gigabit
Ethernet cards, and two other Intel gigabit cards each supplying 2 ports,
which are supported by the e1000 driver.  The machine is running the default
install of Red Hat Enterprise Linux 5.0 (original release, no patches or updates).


Remote machines are supplying ~512 megabit/sec streams over gigabit Ethernet
to this machine.  There are two streams on separate Ethernet lines, and I have
each stream connected to a different port on one of the Intel cards.  The
streams are sent via multicast, and there are 4 sub-streams per Ethernet
line.  Each sub-stream is approximately 131.072 megabits/sec.

On the Opteron machine I have a process that can pull a sub-stream off an
Ethernet port and dump it to a ring buffer in shared memory.  At first, the
process could never keep up with receiving the data via Ethernet and then
doing a memcpy to shared memory.  Then I found out about NUMA and decided
to use sched_setaffinity to bind the process to a CPU; I bound the process
to the same CPU the Ethernet card is bound to via its IRQ.
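Roughly, the capture process looks like the sketch below (the CPU number,
multicast group, and port are placeholders for illustration, and in the real
program the destination of the memcpy is the ring buffer in shared memory):

    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        /* Bind to the CPU that services the NIC's IRQ (2 is a placeholder). */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Join the multicast group carrying one sub-stream. */
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5001);               /* placeholder port */
        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
            perror("bind");
            return 1;
        }

        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("239.1.1.1");  /* placeholder group */
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        if (setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                       &mreq, sizeof(mreq)) != 0) {
            perror("IP_ADD_MEMBERSHIP");
            return 1;
        }

        /* Receive packets and copy them out. */
        char pkt[9000], copy[9000];
        for (;;) {
            ssize_t n = recv(s, pkt, sizeof(pkt), 0);
            if (n <= 0)
                break;
            memcpy(copy, pkt, (size_t)n);   /* stands in for the ring-buffer memcpy */
        }
        close(s);
        return 0;
    }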

I looked in /proc/interrupts, found "eth0" or "eth1", looked up its IRQ,
then went into /proc/irq/<eth0 IRQ>/smp_affinity and checked which CPU the
IRQ was bound to.  I bound the process to that processor and ran it again.
Luckily there was no data loss and it could keep up.  I bound the process
before I allocated memory, so the memory ended up local to that processor
too.  I was even able to run three more processes, bound to the same CPU,
and have all 4 read the sub-streams from the Ethernet device eth0 with no
data loss.  I can even run another process which reads from the ring buffer
and dumps the data to disk, and it causes no slowdowns or data loss.
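The check-and-bind step looks roughly like this (a sketch only; the IRQ
number 48 and the buffer size are placeholders, and the real ring buffer
lives in a shared memory segment rather than anonymous memory -- the point
is just that the binding happens before the first touch of the buffer, so
the default local-allocation policy puts the pages on the same node):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define RING_BYTES (64UL * 1024 * 1024)   /* placeholder ring size */

    int main(void)
    {
        /* 1. Read the hex CPU mask for the NIC's IRQ (48 is a placeholder). */
        FILE *f = fopen("/proc/irq/48/smp_affinity", "r");
        if (!f) { perror("fopen"); return 1; }
        unsigned long mask = 0;
        if (fscanf(f, "%lx", &mask) != 1) { fclose(f); return 1; }
        fclose(f);

        /* 2. Bind to the lowest CPU present in that mask. */
        int cpu = 0;
        while (cpu < 63 && !(mask & (1UL << cpu)))
            cpu++;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* 3. Allocate and touch the buffer only after binding, so the
           pages are placed on this CPU's node by first touch. */
        void *ring = mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ring == MAP_FAILED) { perror("mmap"); return 1; }
        memset(ring, 0, RING_BYTES);

        printf("bound to CPU %d, ring buffer ready\n", cpu);
        return 0;
    }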

Now I want to read a sub-stream from the other stream connected to eth1 while
reading from the other 4 sub-streams.  I start that up just by binding the
same application to the processor associated with eth1, by checking
"/proc/irq/<eth1 IRQ>/smp_affinity".  When that process starts, the system
can no longer keep up, just like in the beginning when I had one processor
reading one stream without doing anything else.  I thought I was just trying
to do too much work, so I turned off all streams and ran just two processes
bound to two different processors, each bound to the same processor as its
associated eth device.  I ran them both, and they lose data.  If I run them
separately they work fine, but when I read 1 sub-stream from each of the two
unique streams they fail.

Are the two Ethernet devices dumping their multicast data into kernel
buffers associated with different processors?
How do I know which processor the kernel's Ethernet buffers are associated
with?
Is there a way to set CPU affinity for the Ethernet devices at boot time, so
I know which processor's memory they are dumping data to?  (See the sketch
after these questions.)
Any ideas on why there would be a problem with reading a stream from each
eth device at the same time, but not with reading 4 streams from one eth
device?
Do I need to turn on a NUMA-aware scheduler somehow, or is that on by
default in RHEL 5?
I also noticed that Linux assigns IRQs at bootup that vary with each boot;
is there a way to statically assign IRQs to the Ethernet cards?
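
For what it's worth, the smp_affinity file can also be written, to pin an
IRQ to a chosen CPU at runtime -- a rough sketch (IRQ 48 and CPU 1 are
placeholders, and this must run as root):

    #include <stdio.h>

    /* Pin an IRQ to one CPU by writing a hex mask to its smp_affinity file. */
    int main(void)
    {
        int irq = 48;                     /* placeholder IRQ */
        unsigned long mask = 1UL << 1;    /* CPU 1 */
        char path[64];
        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        FILE *f = fopen(path, "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "%lx\n", mask);
        fclose(f);
        return 0;
    }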

Any help or pointers at all would be great!

Thanks in advance
Jeremy

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
