You are far away from the real world here. This has been explained a million times; it is
the same thing I teach intern students every summer :-)

First of all, DDR400 and a 200 MHz bus by themselves mean nothing -- a DDR266 + 500 MHz CPU
system can outperform a DDR400 + 1.7 GHz CPU system. Another example:
   The Ixxxx 2 CPU was designed with three levels of cache. Supposedly
       Level 1 to level 2 takes 5 cycles
       Level 2 to level 3 takes 11 cycles
What would you expect the CPU-to-memory time (in cycles) to be? If CPU to level 1 is one
cycle, you would expect a total of 17 to 20 cycles. But it actually takes 210 cycles,
due to some design issues.
Now your 1.6 GB/s is reduced to 16 MB/s or even worse, just based on this factor.
A number of other factors affect memory bandwidth as well, such as bus arbitration.
Have you done any memory benchmarks on a system before doing such a simple calculation?
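
To make the point concrete, here is a rough memory-copy benchmark sketch in C (my own
illustration, not something from this thread; the 64 MB buffer and loop count are
arbitrary). Run something like this and compare the number it prints against the peak
figure on the memory spec sheet -- they are usually far apart:

    /* bwtest.c -- rough sustained memory-copy bandwidth estimate. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    #define BUF_BYTES (64 * 1024 * 1024)    /* 64 MB: larger than any cache */
    #define LOOPS     16

    int
    main(void)
    {
        char *src = malloc(BUF_BYTES);
        char *dst = malloc(BUF_BYTES);
        struct timeval t0, t1;
        double secs, mbytes;
        int i;

        if (src == NULL || dst == NULL)
            return (1);
        memset(src, 1, BUF_BYTES);          /* touch the pages so they are resident */
        memset(dst, 2, BUF_BYTES);

        gettimeofday(&t0, NULL);
        for (i = 0; i < LOOPS; i++)
            memcpy(dst, src, BUF_BYTES);
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        /* each memcpy reads BUF_BYTES and writes BUF_BYTES */
        mbytes = (double)LOOPS * 2.0 * BUF_BYTES / (1024.0 * 1024.0);
        printf("copy bandwidth: %.1f MB/s\n", mbytes / secs);
        return (0);
    }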

Secondly, DMA moves the data from the NIC into mbufs, but then who moves the data from the
mbufs to the user buffer? Not a human -- the CPU. While the DMA engine is moving data, can
the CPU move data at the same time? DMA consumes both I/O bandwidth and memory bandwidth.
If your system has only 16 MB/s of effective memory bandwidth, your network throughput is
less than 8 MB/s, typically below 6.4 MB/s. If you cannot move data away from the NIC fast
enough, what happens? Packet loss!
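
To show where that second copy happens, here is a minimal receive-loop sketch (my own
illustration; the port number and buffer size are made up). The NIC DMAs each frame into
kernel mbufs, but every recv() below still costs a CPU copy from the socket buffer into
the user buffer:

    /* rxloop.c -- receive loop illustrating the mbuf-to-user-buffer copy. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct sockaddr_in sin;
        char buf[64 * 1024];                /* user buffer */
        ssize_t n;
        int ls, s;

        ls = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(5001);         /* arbitrary test port */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(ls, (struct sockaddr *)&sin, sizeof(sin));
        listen(ls, 1);
        s = accept(ls, NULL, NULL);

        /*
         * The NIC has already DMAed the packets into kernel mbufs.
         * This recv() is where the CPU copies the data out of the
         * socket buffer (mbuf chain) into buf, and that copy competes
         * with the NIC's DMA for the same memory bandwidth.
         */
        while ((n = recv(s, buf, sizeof(buf), 0)) > 0)
            ;                               /* discard; we only care about the copy */

        close(s);
        close(ls);
        return (0);
    }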

That is why his CPU utilization was low: not much data was actually crossing the CPU. That
is also why I asked him for the CPU utilization first, and then for the chipset. Those are
the basic first steps in diagnosing network performance.
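
Another cheap first step on FreeBSD is to look at the per-interface error and input-queue
drop counters; they tell you whether the box is already losing packets at the driver. A
small sketch using getifaddrs(3) and the counters in struct if_data (my illustration;
netstat -i reports much the same thing):

    /* ifdrops.c -- print per-interface input packets, errors and queue drops. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <net/if.h>
    #include <ifaddrs.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct ifaddrs *ifap, *ifa;
        struct if_data *ifd;

        if (getifaddrs(&ifap) != 0)
            return (1);
        for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
            /* the AF_LINK entry of each interface carries its statistics */
            if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_LINK)
                continue;
            ifd = (struct if_data *)ifa->ifa_data;
            printf("%-8s ipackets %lu  ierrors %lu  iqdrops %lu\n",
                ifa->ifa_name,
                (u_long)ifd->ifi_ipackets,
                (u_long)ifd->ifi_ierrors,
                (u_long)ifd->ifi_iqdrops);
        }
        freeifaddrs(ifap);
        return (0);
    }
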
If you know the CPU and the chipset of a system, you will know the network performance
ceiling for that system, guaranteed. But that does not guarantee you can actually reach
the ceiling, especially over OC-12 (622 Mb/s) and faster networks. Getting there requires
serious tuning knowledge for the current TCP stack, which is well explained on the
Internet -- just search for "TCP tuning".
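
The biggest single item in that tuning is usually the socket buffer size, so TCP can keep
a window as large as the bandwidth-delay product. A rough sketch (the 1 MB value is only
an example; size it to bandwidth times RTT, and kern.ipc.maxsockbuf has to allow it):

    /* sockbuf.c -- request larger TCP socket buffers before connecting. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <stdio.h>

    int
    main(void)
    {
        int s, sz = 1024 * 1024;            /* 1 MB, illustrative only */

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) != 0)
            perror("SO_RCVBUF");
        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz)) != 0)
            perror("SO_SNDBUF");
        /* ... connect() and move data as usual ... */
        return (0);
    }

System-wide defaults are controlled by the net.inet.tcp.sendspace, net.inet.tcp.recvspace
and kern.ipc.maxsockbuf sysctls; that is exactly the kind of thing those "TCP tuning"
pages walk through.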

   -Jin

Gary Thorpe wrote:

I thought all modern NICs used bus mastering DMA, i.e. not dependent on the CPU for data transfers? In addition, the available memory bandwidth for modern CPUs/systems is well over 100 MB/s. DDR400 is 400 MB/s (megabytes per second). Bus mastering DMA will be limited by the memory or IO bus bandwidth primarily. The system bus bandwidth cannot be the problem either: his motherboard's lowest front side bus speed is 200 MHz * 64-bit width = 1.6 GB/s (gigabytes per second) of peak system bus bandwidth.

The limitation of 32-bit/33 MHz PCI is 133 MB/s (again, megabytes not bits) 
maximum. Gigabit ethernet requires 125 MB/s (not Mb/s) maximum bandwidth: 32/33 
PCI has enough for bursts but bus contention with disk bandwidth will reduce 
the sustained bandwidth. The motherboard in question has an option for 
integrated gigabit LAN which may bypass the shared PCI bus altogether (or it 
might not).

Anyway, the original problem was packet loss and not bandwidth. His CPU is mostly idle, so that cannot be the reason for packet loss. If 32/33 PCI can sustain 133 MB/s then it cannot be a problem because he needs less than this. If it cannot, then packets will arrive too fast from the network before they can be moved from the board into memory and would cause the packet loss. Otherwise, his system is capable of achieving what he wants in theory and the suboptimal behavior may be due to hardware (e.g. PCI bus bandwidth not being able to reach 133 MB/s sustained) or software limitations (e.g. inefficient operating system).
