All,
I am not sure if this is the right list for this, but I guess this is a good
place to start.
I am developing for an embedded system (a 64-bit Core 2 Duo). I am running a
vanilla kernel (2.6.39). Our application is simply this - receive multicast
data from the network and dump it to an SSD.
My problem comes in two parts: unicast data mostly works, and multicast data
does not work at all.
Here is my test scenario (everything is UDP - no TCP):
A PC running Win7 blasts a file over the network - this file simply contains a
ramp of integers (0, 1, 2, 3, 4, 5, ...). On the receiving side I receive the
datagrams (however large - we are letting the IP stack handle fragmentation)
and check the first value of the ramp in that datagram. When the value of the
data in the packet is not what I expect, I print an error, set my counter to
the value of the data, and move on.
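In C, that check amounts to roughly the following (a simplified sketch, not my
actual test code; 'sock' is an already-bound UDP socket and MAX_DGRAM is just a
placeholder for the largest datagram I expect):

/* Simplified sketch of the ramp check described above; not the real code. */
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

#define MAX_DGRAM 65536

static void check_loop(int sock)
{
        static uint32_t buf[MAX_DGRAM / sizeof(uint32_t)];
        uint32_t expected = 0;

        for (;;) {
                ssize_t n = recv(sock, buf, sizeof(buf), 0);
                if (n <= 0)
                        break;

                /* The sender transmits a ramp (0, 1, 2, ...), so the first
                 * integer of each datagram tells me where I should be. */
                if (buf[0] != expected)
                        fprintf(stderr, "lost packets: expected %u, got %u\n",
                                (unsigned)expected, (unsigned)buf[0]);

                /* Resynchronize and advance past this datagram's payload. */
                expected = buf[0] + (uint32_t)(n / sizeof(uint32_t));
        }
}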
When I am unicasting, everything is just peachy until I get to large datagram
sizes. I can run at high rates (I have tested up to 30 MiB/s) and don't lose
anything. I have run with varying datagram lengths, from 5000 bytes to 30000
bytes. Once I get into the 15000 to 20000 byte range, things go south
(no offense to anyone from the south :) ). My MTU is set to 9000 on both the
transmit and receive sides (and yes, my switches handle jumbo frames...).
When I am multicasting, things go bad. Even running at a low rate (say 5
MiB/s) I see missing datagrams. The thing that really bothers me is that these
losses do not show up anywhere: ifconfig does not report anything dropped, and
ethtool -S does not report any errors whatsoever. But it is clear to me that
packets are being missed. With a datagram size of 5000 things work okay, but
when I step up to 8500 things go awry.
It seems like something is up with larger datagrams, but I can't seem to put my
finger on it. I have tried both of my network interfaces and they both behave
in the same manner.
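To be concrete about what the receive side is doing, it is a plain
AF_INET/SOCK_DGRAM socket with a bind() and an IP_ADD_MEMBERSHIP join - nothing
exotic. A minimal sketch of that setup (not my actual code; the group address
is a placeholder, and the interface address is just eth0's from the ifconfig
output below):

/* Minimal sketch of a UDP multicast receive socket; placeholder values. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

int open_mcast_socket(unsigned short port)
{
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); exit(1); }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("bind");
                exit(1);
        }

        /* Join the group on a specific interface.  Leaving imr_interface as
         * INADDR_ANY lets the routing table pick the NIC, which may not be
         * the one you expect on a multi-homed box. */
        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("224.1.1.3");    /* placeholder group */
        mreq.imr_interface.s_addr = inet_addr("172.31.22.90"); /* eth0 */
        if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                       &mreq, sizeof(mreq)) < 0) {
                perror("IP_ADD_MEMBERSHIP");
                exit(1);
        }
        return sock;
}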
I have looked at some sysctl parameters and everything looks good. I
increased net.core.rmem_max and net.core.wmem_max to 64 MB (way overkill, I
think, but nevertheless...). net.ipv4.udp_mem is "186048 248064 372096"; that
is just the default, which seems like oodles of pages. Nothing else I could
find really dealt with my scenario.
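For completeness: raising net.core.rmem_max only matters if the application
also requests a large buffer with SO_RCVBUF, and Linux silently caps that
request at rmem_max (and doubles the stored value for bookkeeping), so reading
the value back is the only way to see what was actually granted. A minimal
sketch of that request-and-verify step (generic, not my actual code):

/* Sketch: request a large socket receive buffer and read back what the
 * kernel actually granted.  Requests above net.core.rmem_max are silently
 * capped, and Linux doubles the stored value for bookkeeping overhead. */
#include <stdio.h>
#include <sys/socket.h>

static void set_rcvbuf(int sock, int kbytes)
{
        int request = kbytes * 1024;
        int actual = 0;
        socklen_t len = sizeof(actual);

        if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                       &request, sizeof(request)) < 0)
                perror("SO_RCVBUF");

        if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
                printf("requested %d bytes, kernel granted %d bytes\n",
                       request, actual);
}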
I have attached the test code in case it helps anyone get to the bottom of the
issue. I don't know if this is just an exercise in optimizing the network
stack, or if there really is a bug somewhere. Either way you look at it, any
help is appreciated.
Output follows...
Thanks!
Jonathan
Here is the hardware I have:
ADLGS45 (Advanced Digital Logic PCI/104-Express Core 2 Duo SBC)
eth0 = Intel 82567LM-2 Gigabit Network Controller
eth1 = Intel 82574L Gigabit Network Controller
Here is some output from the system:
# ifconfig
eth0 Link encap:Ethernet HWaddr 00:01:05:0A:4A:92
inet addr:172.31.22.90 Bcast:172.31.22.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:1674366 errors:0 dropped:0 overruns:0 frame:0
TX packets:2620 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10694569545 (9.9 GiB) TX bytes:459481 (448.7 KiB)
Interrupt:20 Memory:fdfc0000-fdfe0000
# ethtool -i eth0
driver: e1000e
version: 1.3.10-k2
firmware-version: 1.8-4
bus-info: 0000:00:19.0
# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
# ethtool -S eth0
NIC statistics:
rx_packets: 1674840
tx_packets: 2772
rx_bytes: 10694622001
tx_bytes: 481113
rx_broadcast: 1063
tx_broadcast: 0
rx_multicast: 159412
tx_multicast: 4
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 159412
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 10694622001
rx_csum_offload_good: 106021
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
rx_dma_failed: 0
tx_dma_failed: 0
Output for eth1 is very similar, but here is ethtool -i eth1
# ethtool -i eth1
driver: e1000e
version: 1.3.10-k2
firmware-version: 15.255-15
bus-info: 0000:02:00.0
Now here is some output from my test program. The "Packets = XXX --> MB = XXXX"
lines are just a status thread printing out the current rates.
# benchnetRX
Usage: benchnetRX <ip> <port> <max datagramlen> <recvbuf in KB> <multicast> <benchmark>
# benchnetRX 3 19601 50000 16384 0 1
-> Beginning network benchmark ...
-> Creating socket for benchmark on 00000000:19601
-> Packets = 159 --> MB = 1.011
-> Packets = 750 --> MB = 4.768
lost packets = 0
-> Packets = 749 --> MB = 4.762
lost packets = 1
-> Packets = 750 --> MB = 4.768
lost packets = 1
-> Packets = 748 --> MB = 4.756
-> Packets = 749 --> MB = 4.762
lost packets = 2
-> Packets = 746 --> MB = 4.743
lost packets = 5
-> Packets = 739 --> MB = 4.698
-> Packets = 520 --> MB = 3.306
^C -> Exiting network RX benchmark...
# benchnetRX 3 19601 50000 16384 1 1
-> Beginning network benchmark ...
-> Creating socket for benchmark on e0010103:19601
-> Packets = 256 --> MB = 1.628
lost packets = 1
-> Packets = 750 --> MB = 4.768
-> Packets = 749 --> MB = 4.762
lost packets = 3
-> Packets = 750 --> MB = 4.768
lost packets = 0
-> Packets = 751 --> MB = 4.775
lost packets = 0
-> Packets = 749 --> MB = 4.762
-> Packets = 750 --> MB = 4.768
lost packets = 3
-> Packets = 750 --> MB = 4.768
lost packets = 2
-> Packets = 743 --> MB = 4.724
-> Packets = 718 --> MB = 4.565
lost packets = 37
^C -> Exiting network RX benchmark...
--
Jonathan R. Haws
Electrical Engineer
Space Dynamics Laboratory
(435) 713-3489
[email protected]