BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-13 Thread Tony Battersby
I am getting "hw csum failure" messages with sky2.  I have seen this
problem reported elsewhere with a fibre NIC, but I am using a copper
NIC.  It seems to be triggered by SMP.  It is easy to reproduce in
2.6.23.  2.6.24-rc2-git3 still has the problem, but it happens less
frequently.

To reproduce the problem, I am using a simple network benchmark program
that I wrote that basically does send()/recv() as fast as possible using
a memory buffer (null data, no disk I/O, no data integrity checking).
The computer with the SysKonnect NIC acts as the server.  I have two
other computers with Intel PRO/1000 NICs that are directly cabled to the
two ports on the SysKonnect NIC.  Each of them runs the client program,
which connects to the server, send()s 10 GB, and then recv()s 10 GB.
Essentially, both ports on the SysKonnect NIC are receiving at the
maximum rate for a few minutes, and then transmitting at the maximum
rate for a few minutes.  Sustained throughput is about 117 MB/s on both
ports simultaneously.
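
The benchmark source was not posted in this thread, so the following is only a minimal sketch of what the receive side of such a send()/recv() stress loop might look like. The function name drain_socket and its error policy are hypothetical and are not taken from the actual test program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: read from fd into a scratch buffer until `want`
 * bytes have arrived, the peer closes the connection, or an error occurs.
 * The data is discarded (no integrity checking), as in the description.
 * Returns the number of bytes received, or -1 on error.
 */
long long drain_socket(int fd, char *buf, size_t buflen, long long want)
{
    long long total = 0;

    while (total < want) {
        size_t chunk = buflen;
        ssize_t n;

        /* Never ask for more than is still outstanding. */
        if ((long long)chunk > want - total)
            chunk = (size_t)(want - total);

        n = recv(fd, buf, chunk, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;      /* assumed policy: give up on other errors */
        }
        if (n == 0)
            break;          /* peer closed the connection */
        total += n;
    }
    return total;
}
```

A server in this style would accept() one connection per port and call such a loop once per client, so both ports are being drained at the same time.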

The "hw csum failure" does not seem to affect the test.  send()/recv()
continue to work normally.  Nothing locks up.

I get several "hw csum failure" messages per minute on 2.6.23-SMP.  The
error does not happen with 2.6.23 if I boot with "maxcpus=1".  The
message seems less frequent with 2.6.24-SMP, but it still happens once
every minute or so.

The "hw csum failure" message does not happen when only one port is in
use.  You have to stress both ports simultaneously to reproduce the
problem.

Another cosmetic issue is that "ifconfig" shows eth2 at IRQ 16 and eth3
at IRQ 218, when in fact both are at IRQ 218.  IRQ 16 is the regular
interrupt line and IRQ 218 is the MSI interrupt.  I imagine that the
driver is just reporting the IRQ incorrectly in this case.  It is just a
minor cosmetic issue which doesn't break anything.

Let me know if I can be of any further assistance in tracking down this
problem.

NIC: SysKonnect SK-9E22 dual-port copper PCI-express
motherboard: SuperMicro PDSME
CPU: Pentium D 945 (dual-core 3.4 GHz)
kernel versions: 2.6.23 and 2.6.24-rc2-git3

All information below is from 2.6.24-rc2-git3.

portion of dmesg showing error:
: hw csum failure.
 [] skb_copy_and_csum_datagram_iovec+0x120/0x130
 [] __set_page_dirty+0x83/0x140
 [] tcp_rcv_established+0x981/0x9a0
 [] tcp_v4_do_rcv+0xc0/0x370
 [] release_sock+0x12/0xa0
 [] sk_wait_data+0xa1/0xd0
 [] tcp_prequeue_process+0x48/0x70
 [] tcp_recvmsg+0x671/0xc50
 [] enqueue_task_fair+0x73/0xb0
 [] sock_common_recvmsg+0x45/0x70
 [] sock_recvmsg+0xd8/0x130
 [] autoremove_wake_function+0x0/0x50
 [] __do_softirq+0x82/0x100
 [] irq_exit+0x52/0x90
 [] smp_apic_timer_interrupt+0x54/0x80
 [] sys_recvfrom+0xeb/0x180
 [] read_hpet+0xa/0x10
 [] getnstimeofday+0x40/0xf0
 [] rebalance_domains+0x110/0x3e0
 [] sys_recv+0x33/0x40
 [] sys_socketcall+0x165/0x280
 [] sysenter_past_esp+0x5f/0x85
 ===

dmesg | grep sky2
sky2 :04:00.0: v1.20 addr 0xea30 irq 16 Yukon-XL (0xb3) rev 1
sky2 :04:00.0: PCI Express Advanced Error Reporting not configured or MMCONFIG problem?
sky2 eth2: addr 00:00:5a:72:b8:91
sky2 eth3: addr 00:00:5a:72:b8:92
sky2 eth2: enabling interface
sky2 eth3: enabling interface
sky2 eth2: Link is up at 1000 Mbps, full duplex, flow control both
sky2 eth3: Link is up at 1000 Mbps, full duplex, flow control both

ifconfig
eth2  Link encap:Ethernet  HWaddr 00:00:5A:72:B8:91  
  inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:34910877 errors:0 dropped:0 overruns:0 frame:0
  TX packets:22659597 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3207874526 (2.9 GiB)  TX bytes:2888042042 (2.6 GiB)
  Interrupt:16 

eth3  Link encap:Ethernet  HWaddr 00:00:5A:72:B8:92  
  inet addr:137.157.10.224  Bcast:137.157.255.255  Mask:255.255.0.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:34902414 errors:0 dropped:0 overruns:0 frame:0
  TX packets:22641940 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:3207442696 (2.9 GiB)  TX bytes:2886952355 (2.6 GiB)
  Interrupt:218 

ethtool -i eth2
driver: sky2
version: 1.20
firmware-version: N/A
bus-info: :04:00.0

ethtool eth2
Settings for eth2:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Trans

Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-13 Thread Tony Battersby
Stephen Hemminger wrote:
> I can reproduce the problem under load with only a single port on 2.6.23.
> I haven't been able to reproduce it on 2.6.24-rc2 (latest) but that may be
> because of either insufficient stress or another bug fix correcting the
> problem.  There is an issue with Yukon XL updating the receive status index
> before updating the receive status structure, that is now fixed in 2.6.24.
> The fix is:
>
> commit ab5adecb2d02f3688719dfb5936a82833fcc3955
> Author: Stephen Hemminger <[EMAIL PROTECTED]>
> Date:   Mon Nov 5 15:52:09 2007 -0800
>
> sky2: status ring race fix
> 
> The D-Link PCI-X board (and maybe others) can lie about status
> ring entries. It seems it will update the register for last status
> index before completing the DMA for the ring entry. To avoid reading
> stale data, zap the old entry and check.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
The kernel I tested (2.6.24-rc2-git3) has this patch in it already.
Perhaps that is why the problem happens less frequently with that
kernel, but it didn't fix it entirely.
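
The "zap the old entry and check" idea from the quoted commit message can be illustrated with a small user-space sketch. This is not the sky2 code: the struct layout, the RING_SIZE value, and the function name ring_consume are all hypothetical, and a real driver would also need the appropriate DMA memory barriers, which are omitted here.

```c
#include <stdint.h>

#define RING_SIZE 8u    /* hypothetical ring size */

struct status_entry {
    uint8_t  opcode;    /* 0 means "not written yet" in this sketch */
    uint32_t value;
};

/*
 * Consume entries starting at *next until hw_index is reached.
 * If the device advertised an index ahead of the data it has actually
 * written, the entry still has opcode 0, so stop and try again later
 * instead of processing stale contents.
 * Returns the number of entries consumed.
 */
unsigned ring_consume(struct status_entry *ring, unsigned *next,
                      unsigned hw_index)
{
    unsigned done = 0;

    while (*next != hw_index) {
        struct status_entry *e = &ring[*next];

        if (e->opcode == 0)
            break;      /* index ran ahead of the DMA write */

        /* ... handle e->value here ... */

        e->opcode = 0;  /* zap, so an old entry is never read twice */
        *next = (*next + 1) % RING_SIZE;
        done++;
    }
    return done;
}
```

In this sketch a later call with the same hw_index picks up the remaining entries once they have been written.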

Do you want the test program I am using? It is a pretty basic
send()/recv() program, ~650 lines of C.

Tony

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: sky2: hw csum failure with dual-port copper NIC on SMP

2007-11-14 Thread Tony Battersby

>> Do you want the test program I am using? It is a pretty basic
>> send()/recv() program, ~650 lines of C.
>>
>> Tony
>>
>> 
>
> Not really, iperf drives the problem fine.
>
>   

Thanks for the tip; I hadn't tried iperf before today.  I can reproduce
the problem with "iperf -s" on the system with the dual-port SysKonnect
NIC and "iperf -c host -t 120" on the two Intel PRO/1000 (e1000 driver)
client systems.  The problem doesn't seem to happen the other way around
though (running the server on the two Intel PRO/1000 systems and two
iperf clients on the SysKonnect system).  So the problem appears to be
triggered by recv() on the SysKonnect side but not send().

Tony



Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby

> What bus and chipset is in use on the systems with sky2?
> I have seen problems when using PCI-X on AMD systems (documented in AMD 
> errata)
> due to multiple outstanding transactions.

Motherboard: SuperMicro PDSME
Chipset: Intel E7230
Processor: Intel Pentium D 3.4 GHz
(note: tried both SMP and booting with maxcpus=1)

lspci:

00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub 
(rev 81)
00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port 
(rev 81)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 
(rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express 
Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express 
Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI 
Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI 
Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface 
Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller 
(rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 
09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A 
(rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 
09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
04:00.0 Ethernet controller: SysKonnect SK-9E21D 10/100/1000Base-T Adapter, 
Copper RJ-45 (rev 14)
05:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet 
Controller (Copper) (rev 03)
06:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet 
Controller (Copper) (rev 03)
0a:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

cat /proc/cpuinfo:

processor: 0
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 0
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts 
sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6789.26
clflush size: 64

processor: 1
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 1
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts 
sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6783.57
clflush size: 64

cat /proc/interrupts

CPU0 CPU1
0: 86 0 IO-APIC-edge timer
1: 81 0 IO-APIC-edge i8042
7: 0 0 IO-APIC-edge parport0
8: 1 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 5 0 IO-APIC-edge i8042
14: 412 0 IO-APIC-edge ide0
16: 0 0 IO-APIC-fasteoi uhci_hcd:usb5
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 31 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
20: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
219: 1 0 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 1924 514 Local timer interrupts
RES: 16 20 Rescheduling interrupts
CAL: 19 56 function call interrupts
TLB: 21 41 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

(note: tried booting with pci=nomsi also)

Tony



sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby
I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
2.6.24.  The problem is triggered by both ports transmitting at high
speed simultaneously.  This problem is 100% quickly reproducible.  Here
is the setup:

PC #1 with Intel PRO/1000 NIC:
e1000 IP address 192.168.1.1
running iperf -s

PC #2 with Intel PRO/1000 NIC:
e1000 IP address 192.168.2.1
running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
sky2 IP address 192.168.1.2
sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
-s".  Each of these Intel NICs is directly cabled to one of the two
ports of the SysKonnect NIC.

When I run:
(PC #3 tty1) iperf -c 192.168.1.1 -t 30
(wait for a second or two)
(PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
finish.  Press Ctrl-C to abort the hung iperf.  Ping 192.168.1.1 does
not respond.  Ping 192.168.2.1 does respond, but each ping has almost
exactly 1 second latency (the latency should be < 1 ms).

When I switch the order of the tests, whichever iperf -c was started
_first_ is the one that locks up with no ping afterward, and whichever
was started _second_ is the one that finishes, but with a 1-second ping
latency afterward.  So the problem follows the ordering of the tests
rather than a specific port.

Also, the trigger seems to be transmitting, not receiving.  If I run
"iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
PCs, then the tests pass.

When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
checksumming on both ports of the SysKonnect NIC, both tests pass
successfully.  Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
disable rx checksum on Yukon XL" disabled rx checksumming by default on
this NIC to get rid of some "hw csum failure" messages
(http://marc.info/?l=linux-netdev&m=119497815523843&w=4).  However, this
seems to have exposed a different (and arguably worse) bug.

I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
the problem.

As a temporary workaround, I will use ethtool to turn on rx checksumming
and live with the "hw csum failure" messages, since they are better than
network lockups.

Let me know if I can be of any further assistance in tracking down this
problem.

Tony Battersby
Cybernetics



Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-28 Thread Tony Battersby
Brandeburg, Jesse wrote:
> make sure to disable the default Linux arp behavior for this kind of
> test on PC3 by*
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
> [EMAIL PROTECTED] echo 1 > /proc/sys/net/ipv4/conf/eth1/arp_filter
>
> *see http://linux-ip.net/html/ether-arp.html
>
>   

Yeah, that bit me a few years ago, and I now have it in one of my boot
startup scripts...

But thanks anyway.
Tony



Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled

2008-01-29 Thread Tony Battersby
Tony Battersby wrote:
> "iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
> finish.  Press Ctrl-C to abort the hung iperf.  Ping 192.168.1.1 does
> not respond.  Ping 192.168.2.1 does respond, but each ping has almost
> exactly 1 second latency (the latency should be < 1 ms).
>
>   

Update: after triggering the problem, the ping latency on the interface
that still responds is the same as the ping interval.  The default ping
interval is 1 second, so in my initial test I was seeing a 1 second ping
latency.  If I do "ping -i 2 192.168.2.1", then each ping takes 2
seconds to receive the response.  If I do "ping -i 5 192.168.2.1", then
each ping takes 5 seconds to receive the response.  This implies that
the network stack doesn't realize that it received the ping reply until
it goes to send another ping.

Hope that helps.

Tony Battersby
Cybernetics



[PATCH] net: fix kernel_accept() error path

2007-10-04 Thread Tony Battersby
If accept() returns an error, kernel_accept() releases the new socket
but passes a pointer to the released socket back to the caller.  Make it
pass back NULL instead.

Signed-off-by: Tony Battersby <[EMAIL PROTECTED]>
---
--- linux-2.6.23-rc9/net/socket.c.bak   2007-10-04 15:21:17.0 -0400
+++ linux-2.6.23-rc9/net/socket.c   2007-10-04 15:21:22.0 -0400
@@ -2230,6 +2230,7 @@ int kernel_accept(struct socket *sock, s
 	err = sock->ops->accept(sock, *newsock, flags);
 	if (err < 0) {
 		sock_release(*newsock);
+		*newsock = NULL;
 		goto done;
 	}
 




Re: [PATCH] net: fix kernel_accept() error path

2007-10-04 Thread Tony Battersby
James Morris wrote:
> On Thu, 4 Oct 2007, Tony Battersby wrote:
>
>   
>> If accept() returns an error, kernel_accept() releases the new socket
>> but passes a pointer to the released socket back to the caller.  Make it
>> pass back NULL instead.
>>
>> Signed-off-by: Tony Battersby <[EMAIL PROTECTED]>
>> ---
>> --- linux-2.6.23-rc9/net/socket.c.bak   2007-10-04 15:21:17.0 -0400
>> +++ linux-2.6.23-rc9/net/socket.c   2007-10-04 15:21:22.0 -0400
>> @@ -2230,6 +2230,7 @@ int kernel_accept(struct socket *sock, s
>>  err = sock->ops->accept(sock, *newsock, flags);
>>  if (err < 0) {
>>  sock_release(*newsock);
>> +*newsock = NULL;
>>  goto done;
>>  }
>>  
>> 
>
> If you get an error back from kernel_accept, you should not be trying to 
> use newsock.
>
>   

Here is an example of what I would consider "reasonable code" that would
fail:

int example()
{
	struct socket *conn_socket = NULL;
	int err;

	...

	if ((err = kernel_accept(sock, &conn_socket, 0)) < 0)
		goto out_cleanup;

	[do whatever with conn_socket]

 out_cleanup:

	if (conn_socket != NULL)
		sock_release(conn_socket);

	return err;
}

Without the patch, the double sock_release() will cause a BUG().

Also compare to sock_create_lite(), which sets *res to NULL on error.
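
The same pattern can be tried outside the kernel. The sketch below is a user-space analogue only: struct conn, conn_release(), conn_accept(), and the release_count counter are hypothetical stand-ins for struct socket, sock_release(), and kernel_accept(), and exist just to show why clearing the out-pointer makes a shared cleanup label safe.

```c
#include <stdlib.h>

struct conn { int fd; };    /* hypothetical stand-in for struct socket */

int release_count;          /* counts releases, for illustration only */

void conn_release(struct conn *c)
{
    release_count++;
    free(c);
}

/*
 * Hypothetical accept-like helper. On failure it releases the new
 * object and clears the caller's pointer, which is the step the patch
 * adds to kernel_accept().
 */
int conn_accept(int simulate_error, struct conn **newconn)
{
    *newconn = malloc(sizeof(**newconn));
    if (*newconn == NULL)
        return -1;

    if (simulate_error) {
        conn_release(*newconn);
        *newconn = NULL;    /* without this, the caller frees it again */
        return -1;
    }
    (*newconn)->fd = 3;     /* arbitrary placeholder value */
    return 0;
}

/* Caller using a single cleanup label, as in the example above. */
int caller(int simulate_error)
{
    struct conn *c = NULL;
    int err = conn_accept(simulate_error, &c);

    if (err < 0)
        goto out_cleanup;

    /* ... use c ... */

out_cleanup:
    if (c != NULL)
        conn_release(c);
    return err;
}
```

With the pointer cleared on error, each path through caller() releases the object exactly once.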

Tony



TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Tony Battersby
tets: 5403873
 rx_fragments: 0
 rx_ucast_packets: 77197
 rx_mcast_packets: 0
 rx_bcast_packets: 1
 rx_fcs_errors: 0
 rx_align_errors: 0
 rx_xon_pause_rcvd: 0
 rx_xoff_pause_rcvd: 0
 rx_mac_ctrl_rcvd: 0
 rx_xoff_entered: 0
 rx_frame_too_long_errors: 0
 rx_jabbers: 0
 rx_undersize_packets: 0
 rx_in_length_errors: 0
 rx_out_length_errors: 0
 rx_64_or_less_octet_packets: 2
 rx_65_to_127_octet_packets: 77196
 rx_128_to_255_octet_packets: 0
 rx_256_to_511_octet_packets: 0
 rx_512_to_1023_octet_packets: 0
 rx_1024_to_1522_octet_packets: 0
 rx_1523_to_2047_octet_packets: 0
 rx_2048_to_4095_octet_packets: 0
 rx_4096_to_8191_octet_packets: 0
 rx_8192_to_9022_octet_packets: 0
 tx_octets: 1000276920
 tx_collisions: 0
 tx_xon_sent: 0
 tx_xoff_sent: 0
 tx_flow_control: 0
 tx_mac_errors: 0
 tx_single_collisions: 0
 tx_mult_collisions: 0
 tx_deferred: 0
 tx_excessive_collisions: 0
 tx_late_collisions: 0
 tx_collide_2times: 0
 tx_collide_3times: 0
 tx_collide_4times: 0
 tx_collide_5times: 0
 tx_collide_6times: 0
 tx_collide_7times: 0
 tx_collide_8times: 0
 tx_collide_9times: 0
 tx_collide_10times: 0
 tx_collide_11times: 0
 tx_collide_12times: 0
 tx_collide_13times: 0
 tx_collide_14times: 0
 tx_collide_15times: 0
 tx_ucast_packets: 3488350
 tx_mcast_packets: 0
 tx_bcast_packets: 0
 tx_carrier_sense_errors: 0
 tx_discards: 0
 tx_errors: 0
 dma_writeq_full: 0
 dma_write_prioq_full: 0
 rxbds_empty: 0
 rx_discards: 0
 rx_errors: 0
 rx_threshold_hit: 11
 dma_readq_full: 2188114
 dma_read_prioq_full: 162588
 tx_comp_queue_full: 0
 ring_set_send_prod_index: 2901128
 ring_status_update: 218885
 nic_irqs: 146494
 nic_avoided_irqs: 72391
 nic_tx_threshold_hit: 103584

Tony Battersby
Cybernetics



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
> On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:
>
>   
>> One consequence of Herbert's change is that the chip will see a
>> different datastream.  The initial skb->data linear area will be
>> smaller, and the transition to the fragmented area of pages will be
>> quicker.
>>
>> 
>
> I see.  Perhaps when we get to the end of the data-stream, there is a
> tiny frag that the chip cannot handle.  That's the only thing I can
> think of.
>
> Please try this patch to see if the problem goes away.  This will
> disable SG on 5701 so we always get linear SKBs.
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..bb37e76 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev 
> *pdev,
>   } else
>   tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
>  
> + if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> + dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
> +
>   /* flow control autonegotiation is default behavior */
>   tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
>   tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;
>
>
>
>   
This patch does appear to fix the data corruption (tested with
2.6.24.2).  However, it results in performance problems with the iSCSI
application that I am trying to run on this machine.

The test program that I described in the previous message still gets
good performance in both directions.  "iperf -r" gets good performance
in both directions (940 Mbits/s or 117 MB/s).  However, my target-mode
iSCSI application (which obviously generates rx/tx traffic patterns more
complicated than the synthetic tests) gets very poor performance in one
direction but good performance in the other direction.  iSCSI
performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
with light tx, but remains at a decent 115 MB/s when the 3Com NIC is
doing heavy tx with light rx.  When I revert Herbert's patch instead of
applying the patch above, I get 115 MB/s in both cases.  (With a stock
unpatched kernel, the test fails almost immediately because the iSCSI
control PDUs are corrupted, causing the TCP connection to be dropped.)
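
The 940 Mbits/s and 117 MB/s figures are close to what gigabit Ethernet can carry as TCP payload. The rough check below assumes a 1500-byte MTU, TCP timestamps enabled (giving 1448 payload bytes per segment), and 38 bytes of per-frame Ethernet overhead. These framing assumptions and the function names are mine, not something stated in the thread.

```c
/* Assumed per-frame sizes for standard Ethernet with a 1500-byte MTU. */
#define TCP_PAYLOAD 1448.0  /* 1500 - 20 (IP) - 20 (TCP) - 12 (timestamps) */
#define WIRE_BYTES  1538.0  /* 1500 + 14 (header) + 4 (FCS)
                               + 8 (preamble) + 12 (inter-frame gap) */

/* TCP payload rate in Mbit/s for a given line rate in Mbit/s. */
double goodput_mbit(double line_rate_mbit)
{
    return line_rate_mbit * TCP_PAYLOAD / WIRE_BYTES;
}

/* The same rate expressed in MB/s (10^6 bytes per second). */
double goodput_mbyte(double line_rate_mbit)
{
    return goodput_mbit(line_rate_mbit) / 8.0;
}
```

For a 1000 Mbit/s link this gives roughly 941 Mbit/s, or a little under 118 MB/s, so the synthetic tests are effectively running at line rate.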

The SysKonnect NIC that does not exhibit this problem has a chip that
says "BCM5411KQM" "TT0128 P2Q" and "56975E".

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
>> The SysKonnect NIC that does not exhibit this problem has a chip that
>> says "BCM5411KQM" "TT0128 P2Q" and "56975E".
> I think this is the 5700, but please send me the tg3 output that
> identifies the chip and the revision.  Something like this:
>
> eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 
> 10/100/1000Base-T Ethernet 00:10:18:04:57:0d
> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]
>   
Here is the dmesg output for the SysKonnect NIC:

eth0: Tigon3 [partno(SK-9D21) rev 7104 PHY(5411)] (PCI:66MHz:64-bit)
10/100/1000Base-T Ethernet 00:00:5a:9d:0c:4a
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby
Michael Chan wrote:
> On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
>   
>> iSCSI
>> performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
>> with light tx,
>> 
>
> That's strange.  The patch should only affect TX performance slightly
> since we are just turning off SG for TX.  Please take an ethereal trace
> to see what's happening and compare with a good trace.
>
>   

Update: when I revert Herbert's patch in addition to applying your
patch, the iSCSI performance goes back up to 115 MB/s again in both
directions.  So it looks like turning off SG for TX didn't itself cause
the performance drop, but rather that the performance drop is just
another manifestation of whatever bug is causing the data corruption.

I do not regularly use wireshark or look at network packet dumps, so I
am not really sure what to look for.  Given the above information, do
you still believe that there is value in examining the packet dump?

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Michael Chan wrote:
> On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:
>
>   
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions.  So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>>
>> I do not regularly use wireshark or look at network packet dumps, so I
>> am not really sure what to look for.  Given the above information, do
>> you still believe that there is value in examining the packet dump?
>>
>> 
>
> Can you confirm whether you're getting TCP checksum errors on the other
> side that is receiving packets from the 5701?  You can just check
> statistics using netstat -s.  I suspect that after we turn off SG,
> checksum is no longer offloaded and we are getting lots of TCP checksum
> errors instead that are slowing the performance.
>
>
>   
Confirmed.  With a 100 MB read/write test, netstat -s shows 75 bad
segments received, and performance in the one direction is about 5
MB/s.  When I switch to the SysKonnect NIC, netstat -s shows 0 bad
segments received, and performance is 115 MB/s.  So that solves that
mystery - there is still data corruption, but the software-computed TCP
checksum causes the bad packets to be retransmitted rather than being
passed on to the application.
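
For reference, the checksum that catches these bad segments is the 16-bit one's-complement Internet checksum. The sketch below is a plain user-space version in the style of RFC 1071, not the kernel's optimized routine. The function name inet_checksum is hypothetical, and a real TCP checksum also covers a pseudo-header that is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of big-endian 16-bit words, folded to 16 bits
 * and inverted. The 32-bit accumulator is sufficient for buffers up to
 * the size of an IP packet (64 KB).
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    if (len & 1)                    /* odd trailing byte, pad with zero */
        sum += (uint32_t)(data[len - 1] << 8);

    while (sum >> 16)               /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A single flipped bit in the payload changes the result, so a receiver that recomputes the checksum in software drops the segment and TCP retransmits it.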

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Herbert Xu wrote:
> On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>   
>> Update: when I revert Herbert's patch in addition to applying your
>> patch, the iSCSI performance goes back up to 115 MB/s again in both
>> directions.  So it looks like turning off SG for TX didn't itself cause
>> the performance drop, but rather that the performance drop is just
>> another manifestation of whatever bug is causing the data corruption.
>> 
>
> Interesting.  So the workload that regressed is mostly RX with a
> little TX traffic? Can you try to reproduce this with something
> like netperf to eliminate other variables?
>
> This is all very puzzling since the patch in question shouldn't
> change an RX load at all.
>
> Thanks,
>   
We have established that the slowdown was caused by TCP checksum errors
and retransmits.  I assume that the slowdown in my test was due to the
light TX rather than the heavy RX.  I am no TCP protocol expert, but
perhaps heavy TX (such as iperf) might not be affected as much because
the wire stays busy while waiting for the retransmit, whereas with my
light TX iSCSI load, the wire goes idle while waiting for the retransmit
because the iSCSI state machine is stalled.

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Matt Carlson wrote:
> Hi Tony.  Can you give us the output of :
>
> sudo lspci -vvv - -s 03:01.0'
>   
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit 
Ethernet (rev 15)
Subsystem: Compaq Computer Corporation NC7770 Gigabit Server Adapter 
(PCI-X, 10/100/1000-T)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 
Enable-
Address: 063000119b608000  Data: 0423
00: e4 14 45 16 06 00 b0 02 15 00 00 02 10 40 00 00
10: 04 00 7f df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 11 0e 7c 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 40 00
40: 07 48 00 00 09 03 03 00 01 50 02 c0 00 20 00 64
50: 03 58 00 00 08 10 21 08 05 00 86 00 00 80 60 9b
60: 11 00 30 06 23 04 00 00 98 02 05 01 0f 00 db 76
70: 8a 10 00 00 c7 00 00 80 50 00 00 00 00 00 00 00
80: 03 58 00 00 00 00 00 00 34 80 13 04 82 10 00 00
90: 09 06 00 01 00 00 00 00 00 00 00 00 c6 01 00 00
a0: 00 00 00 00 fe 02 00 00 00 00 00 00 af 01 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00



> Also, after some digging, I found that the 5701 can run into trouble if
> a 64-bit DMA read terminates early and then completes as a 32-bit transfer.
> The problem is reportedly very rare, but the failure mode looks like a
> match.  Can you apply the following patch and see if it helps your
> performance / corruption problems?
>
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index db606b6..7ad08ce 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 
> *tp)
>   tp->tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
>   if ((pci_state_reg & PCISTATE_BUS_32BIT) != 0)
>   tp->tg3_flags |= TG3_FLAG_PCI_32BIT;
> + else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
> + tp->grc_mode |= GRC_MODE_FORCE_PCI32BIT;
>  
>   /* Chip-specific fixup from Broadcom driver */
>   if ((tp->pci_chip_rev_id == CHIPREV_ID_5704_A0) &&
>
>   
Sorry, this didn't help.  I still get data corruption with hardware
checksumming or poor performance with software checksumming.

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
Update:

Herbert's patch alters the arguments to alloc_skb_fclone() and
skb_reserve() from within sk_stream_alloc_pskb().  This changes the
skb_headroom() and skb_tailroom() of the returned skb.  I decided to see
if I could detect the precise point at which data corruption started to
happen.  The result is this table:

(sk_stream_alloc_pskb() called with size == 1448;
sk->sk_prot->max_header == 160)

skb_headroom  skb_tailroom  test result  note
         216          1448  fail         [1]
         344          1448  fail
         340          1452  pass
         336          1456  pass
         332          1460  pass
         328          1464  fail
         324          1468  pass
         320          1472  pass
         316          1476  pass
         312          1480  fail
         308          1484  pass
         304          1488  pass
         300          1492  pass
         296          1496  fail
         292          1500  pass
         288          1504  pass
         284          1508  pass
         280          1512  fail
         276          1516  pass
         272          1520  pass
         268          1524  pass
         264          1528  fail
         260          1532  pass
         256          1536  pass         [2]

Notes:
[1] Kernels 2.6.23.4 - 2.6.23.16 and 2.6.24 - current with Herbert's patch
[2] Kernels 2.6.23.3 and before without Herbert's patch

Note that the first row has skb_headroom + skb_tailroom == 1664; the
remaining rows have skb_headroom + skb_tailroom == 1792.

From these results, it looks like a data alignment issue.  Herbert's
patch unfortunately just happened to change the alignment in a way that
made it break.
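
One way to read the table is that every failing row has skb_headroom equal to 8 modulo 16, while every passing row does not. The sketch below just checks that observation against the rows above. It is a pattern fitted to this data, not a statement about what the BCM5701 actually requires, and the names predicts_failure and rule_matches_all_rows are hypothetical.

```c
#include <stddef.h>

struct row { unsigned headroom; int failed; };

/* The 24 rows of the table, in order (1 = fail, 0 = pass). */
const struct row observed[] = {
    {216, 1}, {344, 1}, {340, 0}, {336, 0}, {332, 0}, {328, 1},
    {324, 0}, {320, 0}, {316, 0}, {312, 1}, {308, 0}, {304, 0},
    {300, 0}, {296, 1}, {292, 0}, {288, 0}, {284, 0}, {280, 1},
    {276, 0}, {272, 0}, {268, 0}, {264, 1}, {260, 0}, {256, 0},
};

/* Candidate rule: corruption occurs when headroom is 8 modulo 16. */
int predicts_failure(unsigned headroom)
{
    return headroom % 16 == 8;
}

/* Returns 1 if the rule agrees with every observed row, else 0. */
int rule_matches_all_rows(void)
{
    size_t i;

    for (i = 0; i < sizeof(observed) / sizeof(observed[0]); i++)
        if (predicts_failure(observed[i].headroom) != observed[i].failed)
            return 0;
    return 1;
}
```

Because the headroom was only varied in steps of 4, the data cannot distinguish this rule from every other possible one, but it is consistent with a 16-byte alignment sensitivity.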

Tony



Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-20 Thread Tony Battersby
The following patch fixes the problem for me.  Do we want to accept this
patch and call it a day or continue investigating the source of the problem?

Patch applies to 2.6.24.2, but doesn't apply to 2.6.25-rc.  If everyone
agrees that this is the right solution, I will resubmit with a proper
subject line and description.

Tony

--- linux-2.6.24.2/include/net/sock.h.orig  2008-02-20 17:19:20.0 -0500
+++ linux-2.6.24.2/include/net/sock.h   2008-02-20 17:25:55.0 -0500
@@ -1236,8 +1236,10 @@ static inline struct sk_buff *sk_stream_
 {
 	struct sk_buff *skb;
 
-	/* The TCP header must be at least 32-bit aligned.  */
-	size = ALIGN(size, 4);
+	/* The TCP header must be at least 32-bit aligned, but some chipsets
+	 * such as Broadcom BCM5701 require at least 16-byte alignment.
+	 */
+	size = ALIGN(size, 16);
 
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {
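
To see what the change does to the requested size, the rounding can be reproduced in user space. The align_up() function below is a hypothetical stand-in for the kernel's ALIGN() macro and, like it, assumes the alignment is a power of two.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of a.
 * Assumes a is a power of two, as with the kernel's ALIGN().
 */
size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}
```

For the size of 1448 used in the tests, 4-byte alignment leaves the value at 1448, while 16-byte alignment rounds it up to 1456.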

