You are far away from the real world here. This has been explained a
million times, just like I teach my intern students every summer :-)
First of all, DDR400 and a 200 MHz bus by themselves mean nothing -- a
DDR266 + 500 MHz CPU system can outperform a DDR400 + 1.7 GHz CPU
system. Another example: the Ixxxx 2 CPU was designed with three
levels of cache. Supposedly
  level 1 to level 2 takes 5 cycles
  level 2 to level 3 takes 11 cycles
What would you expect the CPU-to-memory time (in cycles) to be, given
that CPU to level 1 is one cycle? You would expect 17 to 20 cycles in
total. But it actually takes 210 cycles, due to some design issues.
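
For what it is worth, here is the kind of toy latency test I have in
mind (my own sketch, nothing to do with his particular box): it walks
a randomly permuted chain that is much larger than the caches, so the
average time per dependent load approximates the real CPU-to-memory
latency instead of the sum of the data-sheet cache latencies.

/*
 * Pointer-chasing latency sketch.  The chain is built with Sattolo's
 * algorithm so it forms one big random cycle: hardware prefetch cannot
 * follow it, and with a 128 MB working set (on 64-bit) nearly every
 * step misses all cache levels.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (16UL * 1024 * 1024)      /* 16M pointers */
#define STEPS (20UL * 1000 * 1000)

int main(void)
{
    size_t *chain = malloc(N * sizeof(*chain));
    size_t i, j, tmp, idx;
    struct timespec t0, t1;
    double ns;

    if (chain == NULL)
        return 1;
    for (i = 0; i < N; i++)
        chain[i] = i;
    srandom(1);
    for (i = N - 1; i > 0; i--) {       /* Sattolo: a single cycle */
        j = random() % i;
        tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (idx = 0, i = 0; i < STEPS; i++)
        idx = chain[idx];               /* one dependent load per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg %.1f ns per dependent load (idx = %lu)\n",
           ns / STEPS, (unsigned long)idx);
    free(chain);
    return 0;
}

Run it on real hardware and the number it prints is usually far worse
than what you would guess by adding up the per-level cache latencies.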
Now your 1.6 GB/s has shrunk to something like 16 MB/s, or even worse,
based on this factor alone. A number of other factors affect memory
bandwidth as well, such as bus arbitration. Have you ever run a memory
benchmark on a real system before doing such a simple calculation?
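
By a memory benchmark I mean something even as crude as this (again
just my own sketch; STREAM is the proper tool, and the buffer size
here is arbitrary):

/*
 * Crude memcpy bandwidth check: copy a 64 MB buffer repeatedly and
 * report MB/s of data copied.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUFSZ  (64UL * 1024 * 1024)
#define PASSES 20

int main(void)
{
    char *src = malloc(BUFSZ), *dst = malloc(BUFSZ);
    struct timespec t0, t1;
    double sec, mb;
    int i;

    if (src == NULL || dst == NULL)
        return 1;
    memset(src, 1, BUFSZ);              /* touch the pages up front */
    memset(dst, 2, BUFSZ);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < PASSES; i++)
        memcpy(dst, src, BUFSZ);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    mb  = (double)BUFSZ * PASSES / (1024.0 * 1024.0);
    printf("memcpy: %.1f MB/s (%.0f MB in %.2f s)\n", mb / sec, mb, sec);
    free(src);
    free(dst);
    return 0;
}

Note the MB/s it prints counts only the bytes copied; the real bus
traffic is roughly double that (a read plus a write per byte), and it
will still land well below the theoretical peak.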
Secondly, DMA moves the data from the NIC into an mbuf, but then who
moves the data from the mbuf to the user buffer? Not a human -- the
CPU does. And while the DMA engine is moving data, can the CPU move
data at the same time? DMA consumes both I/O bandwidth and memory
bandwidth. If your system has only 16 MB/s of effective memory
bandwidth, your network throughput will be less than 8 MB/s, and
typically below 6.4 MB/s.
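
To spell out that arithmetic (still using the hypothetical 16 MB/s
figure above), every received byte crosses the memory bus at least
twice:

  DMA write into the mbuf       -- one crossing
  CPU copy mbuf -> user buffer  -- another crossing
                                   (and the copy is itself a read plus
                                    a write)

  ceiling ~ 16 MB/s / 2 = 8 MB/s
  counting the copy's read and write separately: 16 MB/s / 3 ~ 5.3 MB/s

which is consistent with the "less than 8 MB/s, typically below
6.4 MB/s" above.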
If you cannot move data away from the NIC fast enough, what happens?
Packet loss! That is why his CPU utilization was low: there was not
much data actually crossing the CPU.
So that is why I asked him first what the CPU utilization was, and
then about the chipset. Those are the basic steps for diagnosing
network performance.
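
In practice that just means watching a few things on the FreeBSD box
while the test runs (only an illustration of the kind of checks I
mean):

  top               # overall CPU utilization and interrupt load
  vmstat 1          # interrupts, context switches, CPU idle time
  netstat -i        # Ierrs/Oerrs counters show drops at the interface
  netstat -s -p ip  # protocol-level drop and error counters
  systat -vmstat 1  # per-device interrupt rates, including the NIC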
If you know the CPU and the chipset of a system, you will know the
network performance ceiling for that system, guaranteed. But that does
not guarantee you can actually reach that ceiling, especially over
OC-12 (622 Mb/s) and faster networks. That requires a solid knowledge
of how to tune the current TCP stack, which is well explained on the
Internet if you search for "TCP tuning".
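
Just to give a flavor of what that tuning looks like on FreeBSD, these
are the usual sysctl(8) knobs; the values below are only placeholders,
since the right numbers depend on the bandwidth-delay product of your
path:

  kern.ipc.maxsockbuf=8388608      # raise the socket buffer limit
  net.inet.tcp.rfc1323=1           # window scaling/timestamps for large windows
  net.inet.tcp.sendspace=1048576   # default send socket buffer
  net.inet.tcp.recvspace=1048576   # default receive socket buffer

Put them in /etc/sysctl.conf (or set them with sysctl) and re-test.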
-Jin
Gary Thorpe wrote:
I thought all modern NICs used bus mastering DMA, i.e. not dependent
on the CPU for data transfers? In addition, the available memory
bandwidth for modern CPUs/systems is well over 100 MB/s: DDR400 peaks
at 3.2 GB/s (the 400 refers to megatransfers per second, not
megabytes). Bus mastering DMA will be limited primarily by the memory
or I/O bus bandwidth. The system bus bandwidth cannot be the problem
either: his motherboard's lowest front side bus speed is 200 MHz *
64-bit width = 1.6 GB/s (gigabytes per second) of peak system bus
bandwidth.
The limit of 32-bit/33 MHz PCI is 133 MB/s (again, megabytes, not
bits) maximum. Gigabit Ethernet requires at most 125 MB/s (not Mb/s):
32/33 PCI has enough for bursts, but bus contention with disk traffic
will reduce the sustained bandwidth. The motherboard in question has
an option for integrated gigabit LAN, which may bypass the shared PCI
bus altogether (or it might not).
Anyway, the original problem was packet loss, not bandwidth. His CPU
is mostly idle, so that cannot be the reason for the packet loss. If
32/33 PCI can sustain 133 MB/s then it cannot be the problem either,
because he needs less than that. If it cannot, then packets will
arrive from the network faster than they can be moved from the board
into memory, and that would cause the packet loss. Otherwise, his
system is in theory capable of achieving what he wants, and the
suboptimal behavior may be due to hardware limitations (e.g. the PCI
bus not being able to sustain 133 MB/s) or software limitations (e.g.
an inefficient operating system).
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[EMAIL PROTECTED]"