BUG: sky2: hw csum failure with dual-port copper NIC on SMP
I am getting "hw csum failure" messages with sky2. I have seen this problem reported elsewhere with a fibre NIC, but I am using a copper NIC. It seems to be triggered by SMP. It is easy to reproduce in 2.6.23. 2.6.24-rc2-git3 still has the problem, but it happens less frequently.

To reproduce the problem, I am using a simple network benchmark program that I wrote, which basically does send()/recv() as fast as possible using a memory buffer (null data, no disk I/O, no data integrity checking). The computer with the SysKonnect NIC acts as the server. I have two other computers with Intel PRO/1000 NICs that are directly cabled to the two ports on the SysKonnect NIC. Each of them runs the client program, which connects to the server, send()s 10 GB, and then recv()s 10 GB. Essentially, both ports on the SysKonnect NIC are receiving at the maximum rate for a few minutes, and then transmitting at the maximum rate for a few minutes. Sustained throughput is about 117 MB/s on both ports simultaneously.

The "hw csum failure" does not seem to affect the test. send()/recv() continue to work normally. Nothing locks up.

I get several "hw csum failure" messages per minute on 2.6.23-SMP. The error does not happen with 2.6.23 if I boot with "maxcpus=1". The message seems less frequent with 2.6.24-SMP, but it still happens once every minute or so.

The "hw csum failure" message does not happen when only one port is in use. You have to stress both ports simultaneously to reproduce the problem.

Another, cosmetic, issue is that "ifconfig" shows eth2 at IRQ 16 and eth3 at IRQ 218, when in fact both are at IRQ 218. IRQ 16 is the regular interrupt line and IRQ 218 is the MSI interrupt. I imagine that the driver is just reporting the IRQ incorrectly in this case. It is a minor issue that doesn't break anything.

Let me know if I can be of any further assistance in tracking down this problem.
NIC: Syskonnect SK-9E22 dual-port copper PCI-express
motherboard: SuperMicro PDSME
CPU: Pentium D 945 (dual-core 3.4 GHz)
kernel versions: 2.6.23 and 2.6.24-rc2-git3

All information below is from 2.6.24-rc2-git3.

portion of dmesg showing error:

: hw csum failure.
 [] skb_copy_and_csum_datagram_iovec+0x120/0x130
 [] __set_page_dirty+0x83/0x140
 [] tcp_rcv_established+0x981/0x9a0
 [] tcp_v4_do_rcv+0xc0/0x370
 [] release_sock+0x12/0xa0
 [] sk_wait_data+0xa1/0xd0
 [] tcp_prequeue_process+0x48/0x70
 [] tcp_recvmsg+0x671/0xc50
 [] enqueue_task_fair+0x73/0xb0
 [] sock_common_recvmsg+0x45/0x70
 [] sock_recvmsg+0xd8/0x130
 [] autoremove_wake_function+0x0/0x50
 [] __do_softirq+0x82/0x100
 [] irq_exit+0x52/0x90
 [] smp_apic_timer_interrupt+0x54/0x80
 [] sys_recvfrom+0xeb/0x180
 [] read_hpet+0xa/0x10
 [] getnstimeofday+0x40/0xf0
 [] rebalance_domains+0x110/0x3e0
 [] sys_recv+0x33/0x40
 [] sys_socketcall+0x165/0x280
 [] sysenter_past_esp+0x5f/0x85
 ===

dmesg | grep sky2

sky2 :04:00.0: v1.20 addr 0xea30 irq 16 Yukon-XL (0xb3) rev 1
sky2 :04:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
sky2 eth2: addr 00:00:5a:72:b8:91
sky2 eth3: addr 00:00:5a:72:b8:92
sky2 eth2: enabling interface
sky2 eth3: enabling interface
sky2 eth2: Link is up at 1000 Mbps, full duplex, flow control both
sky2 eth3: Link is up at 1000 Mbps, full duplex, flow control both

ifconfig

eth2      Link encap:Ethernet  HWaddr 00:00:5A:72:B8:91
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34910877 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22659597 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3207874526 (2.9 GiB)  TX bytes:2888042042 (2.6 GiB)
          Interrupt:16

eth3      Link encap:Ethernet  HWaddr 00:00:5A:72:B8:92
          inet addr:137.157.10.224  Bcast:137.157.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34902414 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22641940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3207442696 (2.9 GiB)  TX bytes:2886952355 (2.6 GiB)
          Interrupt:218

ethtool -i eth2

driver: sky2
version: 1.20
firmware-version: N/A
bus-info: :04:00.0

ethtool eth2

Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Trans
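Editorial sketch: the report describes the benchmark only in words (the client send()s a fixed amount of null data from a memory buffer, then recv()s the same amount, with no disk I/O and no integrity checking). The original ~650-line C program is not part of this thread, so the Python below is only an illustration of that traffic pattern under stated assumptions: it runs over loopback, uses a much smaller transfer size than the 10 GB in the report, and all names (`CHUNK`, `TOTAL`, `run_benchmark`, and so on) are hypothetical.

```python
import socket
import threading
import time

CHUNK = 64 * 1024          # size of the reusable zero-filled buffer (assumed value)
TOTAL = 8 * 1024 * 1024    # bytes per direction here; the report used 10 GB


def recv_exactly(sock, total):
    """Receive `total` bytes into a reusable buffer and discard them."""
    buf = bytearray(CHUNK)
    view = memoryview(buf)
    received = 0
    while received < total:
        n = sock.recv_into(view, min(CHUNK, total - received))
        if n == 0:
            raise ConnectionError("peer closed early")
        received += n
    return received


def send_exactly(sock, total):
    """Send `total` zero bytes from a memory buffer (no disk I/O)."""
    payload = bytes(CHUNK)
    sent = 0
    while sent < total:
        n = min(CHUNK, total - sent)
        sock.sendall(payload[:n])
        sent += n
    return sent


def serve_one(listener, total, result):
    """Server side: receive `total` bytes, then send `total` bytes back."""
    conn, _ = listener.accept()
    with conn:
        result["server_rx"] = recv_exactly(conn, total)
        result["server_tx"] = send_exactly(conn, total)


def run_benchmark(total=TOTAL):
    """Run one client and one server over loopback; return byte counts and MB/s."""
    result = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_one, args=(listener, total, result))
        server.start()
        start = time.monotonic()
        with socket.create_connection(("127.0.0.1", port)) as client:
            result["client_tx"] = send_exactly(client, total)
            result["client_rx"] = recv_exactly(client, total)
        server.join()
    elapsed = time.monotonic() - start
    result["mb_per_s"] = (2 * total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return result
```

To approximate the reported setup, one would run the server half on the machine with the dual-port NIC and one client per port from the two peer machines; the loopback form here only demonstrates the send-then-receive sequence.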
Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP
Stephen Hemminger wrote:
> I can reproduce the problem under load with only a single port on 2.6.23.
> I haven't been able to reproduce it on 2.6.24-rc2 (latest) but that maybe
> because of either insufficient stress or another bug fix correcting the
> problem. There is an issue with Yukon XL updating the receive status index
> before updating the receive status structure, that is now fixed in 2.6.24.
> The fix is:
>
> commit ab5adecb2d02f3688719dfb5936a82833fcc3955
> Author: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Mon Nov 5 15:52:09 2007 -0800
>
>     sky2: status ring race fix
>
>     The D-Link PCI-X board (and maybe others) can lie about status
>     ring entries. It seems it will update the register for last status
>     index before completing the DMA for the ring entry. To avoid reading
>     stale data, zap the old entry and check.
>
>     Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>     Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

The kernel I tested (2.6.24-rc2-git3) has this patch in it already. Perhaps that is why the problem happens less frequently with that kernel, but it didn't fix it entirely.

Do you want the test program I am using? It is a pretty basic send()/recv() program, ~650 lines of C.

Tony

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
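Editorial sketch: the quoted commit message describes a device that advances the "last status index" register before the DMA write of the ring entry has landed, and a driver-side remedy of zapping consumed entries and checking for a zapped entry before trusting it. The toy model below only illustrates that idea; it is not the sky2 code, and the class, field, and method names are hypothetical.

```python
class StatusRing:
    """Toy model of a status ring shared by a device and a driver."""

    def __init__(self, size):
        self.entries = [{"opcode": 0, "data": None} for _ in range(size)]
        self.hw_index = 0   # "last status index" register written by the device
        self.sw_index = 0   # next entry the driver will consume

    def device_publish_index_only(self):
        """Device bumps the index; the entry's DMA may not have landed yet."""
        self.hw_index = (self.hw_index + 1) % len(self.entries)

    def device_complete_dma(self, slot, opcode, data):
        """The DMA write for `slot` lands in memory."""
        self.entries[slot] = {"opcode": opcode, "data": data}

    def poll(self, zap):
        """Consume entries up to hw_index; return the data values seen."""
        seen = []
        while self.sw_index != self.hw_index:
            entry = self.entries[self.sw_index]
            if zap and entry["opcode"] == 0:
                break                 # not written yet: retry on the next poll
            seen.append(entry["data"])
            if zap:
                entry["opcode"] = 0   # zap so a stale entry is recognisable
            self.sw_index = (self.sw_index + 1) % len(self.entries)
        return seen


def demo(zap):
    """Run one clean lap of the ring, then one racy update of slot 0."""
    ring = StatusRing(4)
    log = []
    # First lap: every entry is written before the index moves.
    for slot in range(4):
        ring.device_complete_dma(slot, opcode=1, data=f"lap1-{slot}")
        ring.device_publish_index_only()
        log += ring.poll(zap)
    # Second lap, slot 0: the index moves first, the DMA lands later.
    ring.device_publish_index_only()
    log += ring.poll(zap)
    ring.device_complete_dma(0, opcode=1, data="lap2-0")
    log += ring.poll(zap)
    return log
```

In this model, `demo(False)` consumes the stale `lap1-0` entry a second time and never sees `lap2-0`, while `demo(True)` waits and then consumes `lap2-0`.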
Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP
>> Do you want the test program I am using? It is a pretty basic
>> send()/recv() program, ~650 lines of C.
>>
>> Tony
>
> Not really, iperf drives the problem fine.

Thanks for the tip; I hadn't tried iperf before today.

I can reproduce the problem with "iperf -s" on the system with the dual-port SysKonnect NIC and "iperf -c host -t 120" on the two Intel PRO/1000 (e1000 driver) client systems. The problem doesn't seem to happen the other way around though (running the server on the two Intel PRO/1000 systems and two iperf clients on the SysKonnect system). So the problem appears to be triggered by recv() on the SysKonnect side but not send().

Tony
Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
> What bus and chipset is in use on the systems with sky2?
> I have seen problems when using PCI-X on AMD systems (documented in AMD
> errata) due to multiple outstanding transactions.

Motherboard: SuperMicro PDSME
Chipset: Intel E7230
Processor: Intel Pentium D 3.4 GHz
(note: tried both SMP and booting with maxcpus=1)

lspci:

00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub (rev 81)
00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port (rev 81)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
04:00.0 Ethernet controller: SysKonnect SK-9E21D 10/100/1000Base-T Adapter, Copper RJ-45 (rev 14)
05:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
06:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0a:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

cat /proc/cpuinfo:

processor: 0
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 0
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6789.26
clflush size: 64

processor: 1
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 1
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6783.57
clflush size: 64

cat /proc/interrupts

        CPU0   CPU1
  0:      86      0   IO-APIC-edge      timer
  1:      81      0   IO-APIC-edge      i8042
  7:       0      0   IO-APIC-edge      parport0
  8:       1      0   IO-APIC-edge      rtc
  9:       0      0   IO-APIC-fasteoi   acpi
 12:       5      0   IO-APIC-edge      i8042
 14:     412      0   IO-APIC-edge      ide0
 16:       0      0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:       0      0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:      31      0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 20:       0      0   IO-APIC-fasteoi   uhci_hcd:usb3
219:       1      0   PCI-MSI-edge      eth0
NMI:       0      0   Non-maskable interrupts
LOC:    1924    514   Local timer interrupts
RES:      16     20   Rescheduling interrupts
CAL:      19     56   function call interrupts
TLB:      21     41   TLB shootdowns
TRM:       0      0   Thermal event interrupts
SPU:       0      0   Spurious interrupts
ERR:       0
MIS:       0

(note: tried booting with pci=nomsi also)

Tony
sky2: tx hang on dual-port Yukon XL when rx csum disabled
I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in 2.6.24. The problem is triggered by both ports transmitting at high speed simultaneously. It reproduces quickly, 100% of the time.

Here is the setup:

PC #1 with Intel PRO/1000 NIC:
  e1000 IP address 192.168.1.1
  running iperf -s

PC #2 with Intel PRO/1000 NIC:
  e1000 IP address 192.168.2.1
  running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express):
  sky2 IP address 192.168.1.2
  sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf -s". Each of these Intel NICs is directly cabled to one of the two ports of the SysKonnect NIC.

When I run:

(PC #3 tty1) iperf -c 192.168.1.1 -t 30
(wait for a second or two)
(PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does not respond. Ping 192.168.2.1 does respond, but each ping has almost exactly 1 second latency (the latency should be < 1 ms).

When I switch the order of the tests, whichever iperf -c was started _first_ is the one that locks up with no ping afterward, and whichever was started _second_ is the one that finishes, but with a 1-second ping latency afterward. So the problem follows the ordering of the tests rather than a specific port.

Also, the trigger seems to be transmitting, not receiving. If I run "iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000 PCs, then the tests pass.

When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx checksumming on both ports of the SysKonnect NIC, both tests pass successfully. Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2: disable rx checksum on Yukon XL" disabled rx checksumming by default on this NIC to get rid of some "hw csum failure" messages (http://marc.info/?l=linux-netdev&m=119497815523843&w=4). However, this seems to have exposed a different (and arguably worse) bug.

I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect the problem.

As a temporary workaround, I will use ethtool to turn on rx checksumming and live with the "hw csum failure" messages, since they are better than network lockups.

Let me know if I can be of any further assistance in tracking down this problem.

Tony Battersby
Cybernetics
Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
Brandeburg, Jesse wrote:
> make sure to disable the default Linux arp behavior for this kind of
> test on PC3 by*
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/eth1/arp_filter
>
> *see http://linux-ip.net/html/ether-arp.html

Yeah, that bit me a few years ago, and I now have it in one of my boot startup scripts... But thanks anyway.

Tony
Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
Tony Battersby wrote:
> "iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
> finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does
> not respond. Ping 192.168.2.1 does respond, but each ping has almost
> exactly 1 second latency (the latency should be < 1 ms).

Update: after triggering the problem, the ping latency on the interface that still responds is the same as the ping interval. The default ping interval is 1 second, so in my initial test I was seeing a 1 second ping latency. If I do "ping -i 2 192.168.2.1", then each ping takes 2 seconds to receive the response. If I do "ping -i 5 192.168.2.1", then each ping takes 5 seconds to receive the response. This implies that the network stack doesn't realize that it received the ping reply until it goes to send another ping.

Hope that helps.

Tony Battersby
Cybernetics
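Editorial sketch: the inference in the update above (a reply is only noticed when the next ping is sent) can be checked with a tiny timing model. This is only an illustration of the hypothesis, not a model of the driver; the function name and the `wire_rtt` parameter are hypothetical, and the model assumes the true round-trip time is shorter than the ping interval.

```python
def observed_latencies(interval, wire_rtt, count):
    """Latency ping would report if RX is only processed on the next transmit.

    interval -- seconds between pings (ping -i)
    wire_rtt -- true round-trip time on the wire; assumed <= interval
    count    -- number of pings to evaluate
    """
    # One extra send so the last evaluated ping has a "next transmit".
    send_times = [i * interval for i in range(count + 1)]
    latencies = []
    for i in range(count):
        arrival = send_times[i] + wire_rtt
        # The reply sits unprocessed until the first later transmit.
        noticed = next(t for t in send_times[i + 1:] if t >= arrival)
        latencies.append(noticed - send_times[i])
    return latencies
```

Under this model, intervals of 1, 2, and 5 seconds give reported latencies of 1, 2, and 5 seconds regardless of the sub-millisecond wire time, which matches the behaviour described in the message.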
[PATCH] net: fix kernel_accept() error path
If accept() returns an error, kernel_accept() releases the new socket but passes a pointer to the released socket back to the caller. Make it pass back NULL instead.

Signed-off-by: Tony Battersby <[EMAIL PROTECTED]>
---
--- linux-2.6.23-rc9/net/socket.c.bak 2007-10-04 15:21:17.0 -0400
+++ linux-2.6.23-rc9/net/socket.c 2007-10-04 15:21:22.0 -0400
@@ -2230,6 +2230,7 @@ int kernel_accept(struct socket *sock, s
 	err = sock->ops->accept(sock, *newsock, flags);
 	if (err < 0) {
 		sock_release(*newsock);
+		*newsock = NULL;
 		goto done;
 	}
Re: [PATCH] net: fix kernel_accept() error path
James Morris wrote:
> On Thu, 4 Oct 2007, Tony Battersby wrote:
>
>> If accept() returns an error, kernel_accept() releases the new socket
>> but passes a pointer to the released socket back to the caller. Make it
>> pass back NULL instead.
>>
>> Signed-off-by: Tony Battersby <[EMAIL PROTECTED]>
>> ---
>> --- linux-2.6.23-rc9/net/socket.c.bak 2007-10-04 15:21:17.0 -0400
>> +++ linux-2.6.23-rc9/net/socket.c 2007-10-04 15:21:22.0 -0400
>> @@ -2230,6 +2230,7 @@ int kernel_accept(struct socket *sock, s
>>  	err = sock->ops->accept(sock, *newsock, flags);
>>  	if (err < 0) {
>>  		sock_release(*newsock);
>> +		*newsock = NULL;
>>  		goto done;
>>  	}
>
> If you get an error back from kernel_accept, you should not be trying to
> use newsock.

Here is an example of what I would consider "reasonable code" that would fail:

int example()
{
	struct socket *conn_socket = NULL;
	int err;

	...

	if ((err = kernel_accept(sock, &conn_socket, 0)) < 0)
		goto out_cleanup;

	[do whatever with conn_socket]

out_cleanup:
	if (conn_socket != NULL)
		sock_release(conn_socket);
	return err;
}

Without the patch, the double sock_release() will cause a BUG().

Also compare to sock_create_lite(), which sets *res to NULL on error.

Tony
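Editorial sketch: the argument above is about an out-parameter that is left pointing at an already-released object on the error path, so that a caller's shared cleanup code releases it a second time. The user-space model below mimics that control flow; it is not kernel code, a one-element list stands in for `struct socket **`, and every name in it is hypothetical.

```python
class Sock:
    """Stand-in for a socket that must be released exactly once."""

    def __init__(self):
        self.released = False

    def release(self):
        if self.released:
            raise RuntimeError("double release")  # plays the role of BUG()
        self.released = True


def accept_into(out, fail, clear_on_error):
    """Model of an accept call that hands back a new socket through `out`.

    out            -- one-element list acting as the out-parameter
    fail           -- simulate an error from the underlying accept
    clear_on_error -- True models the patched behaviour
    """
    out[0] = Sock()
    if fail:
        out[0].release()
        if clear_on_error:
            out[0] = None   # the one-line change proposed in the patch
        return -1
    return 0


def caller(fail, clear_on_error):
    """Caller with a single cleanup path, as in the example in the message."""
    conn = [None]
    err = accept_into(conn, fail, clear_on_error)
    # Shared cleanup: release whatever the out-parameter points at.
    if conn[0] is not None:
        conn[0].release()
    return err
```

With `clear_on_error=True` the failing call returns -1 cleanly; with `clear_on_error=False` the same caller raises on the second release.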
TG3 network data corruption regression 2.6.24/2.6.23.4
tets: 5403873
rx_fragments: 0
rx_ucast_packets: 77197
rx_mcast_packets: 0
rx_bcast_packets: 1
rx_fcs_errors: 0
rx_align_errors: 0
rx_xon_pause_rcvd: 0
rx_xoff_pause_rcvd: 0
rx_mac_ctrl_rcvd: 0
rx_xoff_entered: 0
rx_frame_too_long_errors: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_in_length_errors: 0
rx_out_length_errors: 0
rx_64_or_less_octet_packets: 2
rx_65_to_127_octet_packets: 77196
rx_128_to_255_octet_packets: 0
rx_256_to_511_octet_packets: 0
rx_512_to_1023_octet_packets: 0
rx_1024_to_1522_octet_packets: 0
rx_1523_to_2047_octet_packets: 0
rx_2048_to_4095_octet_packets: 0
rx_4096_to_8191_octet_packets: 0
rx_8192_to_9022_octet_packets: 0
tx_octets: 1000276920
tx_collisions: 0
tx_xon_sent: 0
tx_xoff_sent: 0
tx_flow_control: 0
tx_mac_errors: 0
tx_single_collisions: 0
tx_mult_collisions: 0
tx_deferred: 0
tx_excessive_collisions: 0
tx_late_collisions: 0
tx_collide_2times: 0
tx_collide_3times: 0
tx_collide_4times: 0
tx_collide_5times: 0
tx_collide_6times: 0
tx_collide_7times: 0
tx_collide_8times: 0
tx_collide_9times: 0
tx_collide_10times: 0
tx_collide_11times: 0
tx_collide_12times: 0
tx_collide_13times: 0
tx_collide_14times: 0
tx_collide_15times: 0
tx_ucast_packets: 3488350
tx_mcast_packets: 0
tx_bcast_packets: 0
tx_carrier_sense_errors: 0
tx_discards: 0
tx_errors: 0
dma_writeq_full: 0
dma_write_prioq_full: 0
rxbds_empty: 0
rx_discards: 0
rx_errors: 0
rx_threshold_hit: 11
dma_readq_full: 2188114
dma_read_prioq_full: 162588
tx_comp_queue_full: 0
ring_set_send_prod_index: 2901128
ring_status_update: 218885
nic_irqs: 146494
nic_avoided_irqs: 72391
nic_tx_threshold_hit: 103584

Tony Battersby
Cybernetics
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote:
> On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:
>
>> One consequence of Herbert's change is that the chip will see a
>> different datastream. The initial skb->data linear area will be
>> smaller, and the transition to the fragmented area of pages will be
>> quicker.
>
> I see. Perhaps when we get to the end of the data-stream, there is a
> tiny frag that the chip cannot handle. That's the only thing I can
> think of.
>
> Please try this patch to see if the problem goes away. This will
> disable SG on 5701 so we always get linear SKBs.
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..bb37e76 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>  	} else
>  		tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
>
> +	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> +		dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
> +
>  	/* flow control autonegotiation is default behavior */
>  	tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
>  	tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;

This patch does appear to fix the data corruption (tested with 2.6.24.2). However, it results in performance problems with the iSCSI application that I am trying to run on this machine.

The test program that I described in the previous message still gets good performance in both directions. "iperf -r" gets good performance in both directions (940 Mbits/s or 117 MB/s). However, my target-mode iSCSI application (which obviously generates rx/tx traffic patterns more complicated than the synthetic tests) gets very poor performance in one direction but good performance in the other direction. iSCSI performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx with light tx, but remains at a decent 115 MB/s when the 3Com NIC is doing heavy tx with light rx.

When I revert Herbert's patch instead of applying the patch above, I get 115 MB/s in both cases. (With a stock unpatched kernel, the test fails almost immediately because the iSCSI control PDUs are corrupted, causing the TCP connection to be dropped.)

The SysKonnect NIC that does not exhibit this problem has a chip that says "BCM5411KQM" "TT0128 P2Q" and "56975E".

Tony
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote:
>> The SysKonnect NIC that does not exhibit this problem has a chip that
>> says "BCM5411KQM" "TT0128 P2Q" and "56975E".
>
> I think this is the 5700, but please send me the tg3 output that
> identifies the chip and the revision. Something like this:
>
> eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 10/100/1000Base-T Ethernet 00:10:18:04:57:0d
> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]

Here is the dmesg output for the SysKonnect NIC:

eth0: Tigon3 [partno(SK-9D21) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:00:5a:9d:0c:4a
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]

Tony
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote:
> On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
>
>> iSCSI
>> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
>> with light tx,
>
> That's strange. The patch should only affect TX performance slightly
> since we are just turning off SG for TX. Please take an ethereal trace
> to see what's happening and compare with a good trace.

Update: when I revert Herbert's patch in addition to applying your patch, the iSCSI performance goes back up to 115 MB/s again in both directions. So it looks like turning off SG for TX didn't itself cause the performance drop, but rather that the performance drop is just another manifestation of whatever bug is causing the data corruption.

I do not regularly use wireshark or look at network packet dumps, so I am not really sure what to look for. Given the above information, do you still believe that there is value in examining the packet dump?

Tony
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Michael Chan wrote:
> On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:
>
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions. So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>>
>> I do not regularly use wireshark or look at network packet dumps, so I
>> am not really sure what to look for. Given the above information, do
>> you still believe that there is value in examining the packet dump?
>
> Can you confirm whether you're getting TCP checksum errors on the other
> side that is receiving packets from the 5701? You can just check
> statistics using netstat -s. I suspect that after we turn off SG,
> checksum is no longer offloaded and we are getting lots of TCP checksum
> errors instead that are slowing the performance.

Confirmed. With a 100 MB read/write test, netstat -s shows 75 bad segments received, and performance in the one direction is about 5 MB/s. When I switch to the SysKonnect NIC, netstat -s shows 0 bad segments received, and performance is 115 MB/s. So that solves that mystery - there is still data corruption, but the software-computed TCP checksum causes the bad packets to be retransmitted rather than being passed on to the application.

Tony
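Editorial sketch: the conclusion above rests on the software-computed checksum being calculated before the data is corrupted on its way to the wire, so the receiver sees a mismatch and drops the segment. The snippet below shows the 16-bit one's-complement Internet checksum (as described in RFC 1071) catching a single flipped bit. It is a simplification: a real TCP checksum also covers a pseudo-header and the TCP header, which are omitted here, and the function names are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment_ok(payload: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute and compare."""
    return internet_checksum(payload) == checksum


# Sender computes the checksum in software before handing data to the NIC.
good = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(good)

# Corruption after checksumming: one bit flipped in the payload.
corrupt = bytearray(good)
corrupt[3] ^= 0x01
corrupt = bytes(corrupt)
```

Here `segment_ok(good, csum)` holds and `segment_ok(corrupt, csum)` does not, so a receiver would count the segment as bad and rely on retransmission, which is consistent with the "bad segments received" counter mentioned in the message.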
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Herbert Xu wrote:
> On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions. So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>
> Interesting. So the workload that regressed is mostly RX with a
> little TX traffic? Can you try to reproduce this with something
> like netperf to eliminate other variables?
>
> This is all very puzzling since the patch in question shouldn't
> change an RX load at all.
>
> Thanks,

We have established that the slowdown was caused by TCP checksum errors and retransmits. I assume that the slowdown in my test was due to the light TX rather than the heavy RX. I am no TCP protocol expert, but perhaps heavy TX (such as iperf) might not be affected as much because the wire stays busy while waiting for the retransmit, whereas with my light TX iSCSI load, the wire goes idle while waiting for the retransmit because the iSCSI state machine is stalled.

Tony
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Matt Carlson wrote:
> Hi Tony. Can you give us the output of :
>
> sudo lspci -vvv - -s 03:01.0'

03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
	Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter (PCI-X, 10/100/1000-T)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
		Address: 063000119b608000  Data: 0423
00: e4 14 45 16 06 00 b0 02 15 00 00 02 10 40 00 00
10: 04 00 7f df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 11 0e 7c 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 40 00
40: 07 48 00 00 09 03 03 00 01 50 02 c0 00 20 00 64
50: 03 58 00 00 08 10 21 08 05 00 86 00 00 80 60 9b
60: 11 00 30 06 23 04 00 00 98 02 05 01 0f 00 db 76
70: 8a 10 00 00 c7 00 00 80 50 00 00 00 00 00 00 00
80: 03 58 00 00 00 00 00 00 34 80 13 04 82 10 00 00
90: 09 06 00 01 00 00 00 00 00 00 00 00 c6 01 00 00
a0: 00 00 00 00 fe 02 00 00 00 00 00 00 af 01 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

> Also, after some digging, I found that the 5701 can run into trouble if
> a 64-bit DMA read terminates early and then completes as a 32-bit transfer.
> The problem is reportedly very rare, but the failure mode looks like a
> match. Can you apply the following patch and see if it helps your
> performance / corruption problems?
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..7ad08ce 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
>  		tp->tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
>  	if ((pci_state_reg & PCISTATE_BUS_32BIT) != 0)
>  		tp->tg3_flags |= TG3_FLAG_PCI_32BIT;
> +	else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> +		tp->grc_mode |= GRC_MODE_FORCE_PCI32BIT;
>
>  	/* Chip-specific fixup from Broadcom driver */
>  	if ((tp->pci_chip_rev_id == CHIPREV_ID_5704_A0) &&

Sorry, this didn't help. I still get data corruption with hardware checksumming or poor performance with software checksumming.

Tony
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
Update: Herbert's patch alters the arguments to alloc_skb_fclone() and skb_reserve() from within sk_stream_alloc_pskb(). This changes the skb_headroom() and skb_tailroom() of the returned skb. I decided to see if I could detect the precise point at which data corruption started to happen. The result is this table:

(sk_stream_alloc_pskb() called with size == 1448; sk->sk_prot->max_header == 160)

skb_headroom  skb_tailroom  test result  note
216           1448          fail         [1]
344           1448          fail
340           1452          pass
336           1456          pass
332           1460          pass
328           1464          fail
324           1468          pass
320           1472          pass
316           1476          pass
312           1480          fail
308           1484          pass
304           1488          pass
300           1492          pass
296           1496          fail
292           1500          pass
288           1504          pass
284           1508          pass
280           1512          fail
276           1516          pass
272           1520          pass
268           1524          pass
264           1528          fail
260           1532          pass
256           1536          pass         [2]

Notes:
[1] Kernels 2.6.23.4 - 2.6.23.16 and 2.6.24 - current with Herbert's patch
[2] Kernels 2.6.23.3 and before without Herbert's patch

Note that the first row has skb_headroom + skb_tailroom == 1664; the remaining rows have skb_headroom + skb_tailroom == 1792.

From these results, it looks like a data alignment issue. Herbert's patch unfortunately just happened to change the alignment in a way that made it break.

Tony
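Editorial sketch: the alignment pattern in the table above can be checked mechanically. The snippet below transcribes the rows from the message and groups the headroom values by their remainder modulo 16; the modulus is an assumption chosen for illustration, and the function name is hypothetical. It describes the data only and makes no claim about the root cause in the hardware.

```python
# (skb_headroom, skb_tailroom, result) rows transcribed from the message.
ROWS = [
    (216, 1448, "fail"), (344, 1448, "fail"), (340, 1452, "pass"),
    (336, 1456, "pass"), (332, 1460, "pass"), (328, 1464, "fail"),
    (324, 1468, "pass"), (320, 1472, "pass"), (316, 1476, "pass"),
    (312, 1480, "fail"), (308, 1484, "pass"), (304, 1488, "pass"),
    (300, 1492, "pass"), (296, 1496, "fail"), (292, 1500, "pass"),
    (288, 1504, "pass"), (284, 1508, "pass"), (280, 1512, "fail"),
    (276, 1516, "pass"), (272, 1520, "pass"), (268, 1524, "pass"),
    (264, 1528, "fail"), (260, 1532, "pass"), (256, 1536, "pass"),
]


def residues(rows, result, modulus=16):
    """Sorted set of headroom remainders for rows with the given result."""
    return sorted({head % modulus for head, _tail, res in rows if res == result})


failing = residues(ROWS, "fail")
passing = residues(ROWS, "pass")
print("failing headroom mod 16:", failing)
print("passing headroom mod 16:", passing)
```

Every failing row has a headroom of 8 modulo 16, and no passing row does. Because headroom plus tailroom is a multiple of 16 in every row (1664 or 1792), the tailroom of each failing row is also 8 modulo 16, so the table alone cannot say which of the two quantities matters.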
Re: TG3 network data corruption regression 2.6.24/2.6.23.4
The following patch fixes the problem for me. Do we want to accept this patch and call it a day or continue investigating the source of the problem?

Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc. If everyone agrees that this is the right solution, I will resubmit with a proper subject line and description.

Tony

--- linux-2.6.24.2/include/net/sock.h.orig 2008-02-20 17:19:20.0 -0500
+++ linux-2.6.24.2/include/net/sock.h 2008-02-20 17:25:55.0 -0500
@@ -1236,8 +1236,10 @@ static inline struct sk_buff *sk_stream_
 {
 	struct sk_buff *skb;
-	/* The TCP header must be at least 32-bit aligned. */
-	size = ALIGN(size, 4);
+	/* The TCP header must be at least 32-bit aligned, but some chipsets
+	 * such as Broadcom BCM5701 require at least 16-byte alignment.
+	 */
+	size = ALIGN(size, 16);
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {
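Editorial sketch: the patch changes only the rounding boundary passed to ALIGN(). The helper below reproduces the usual round-up-to-a-power-of-two arithmetic in user space so the effect on the size used in this thread (1448) can be seen; `align_up` is a hypothetical name, not a kernel function, and the snippet says nothing about how the resulting headroom is derived.

```python
def align_up(value: int, boundary: int) -> int:
    """Round `value` up to the next multiple of `boundary` (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (value + boundary - 1) & ~(boundary - 1)


size = 1448                      # size used in the tests reported in this thread
old_size = align_up(size, 4)     # behaviour before the patch
new_size = align_up(size, 16)    # behaviour with the patch
print(old_size, new_size)
```

With a boundary of 4 the size is left at 1448, which is already a multiple of 4; with a boundary of 16 it becomes 1456.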