Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Anders Boström
 BH == Ben Hutchings b...@decadent.org.uk writes:

 BH On Tue, 2010-01-26 at 09:34 +0100, Anders Boström wrote:
   JY == Jie Yang jie.y...@atheros.com writes:
  
 JY Anders Boström and...@netinsight.net wrote:
  
 JY following is my test cese,
   
 JY a nfs server server with ar8131chip, device id 1063.
   export /tmp/ dir as the nfs share directory,  JY the client,
   mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
   script to write and read data on the  JY
   /mnt/nfs/testnfs.log. it works fine.
   
   OK, the device-ID in our NFS-server is 1026, rev. b0. So it
   is possible that the problem is specific to that chip/version.
 JY oops, its my mistake in writing, my case is 1026 device ID
  
   
 JY Can you give me some advice on how to reproduce this bug??
   
   The only suggestion I have is to try to find a board with a
   1026-chip on it.
   
   My test-case is just copy of a 1 Gbyte file from the
   NFS-server to /dev/null , after making sure that the file
   isn't cached on the client by reading huge amounts of other data.
   
 JY just to check, if the kernel version is 2.6.26-2 ??
  
  I've tested with
  Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
  Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
  kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
  result.

 BH Does booting with the kernel parameter 'pci=nomsi' avoid the problem?

I'm sorry, but I can't test this at the moment. The computer with the
TSO problem is running as a file server, so it can't be used for testing.
Also, we no longer use the Atheros Ethernet interface because of
another problem: a hard hang of the Eth interface (reset needed)
roughly every 6 months.

However, the computer is scheduled to be replaced as file-server quite
soon, so I might be able to test this again after the replacement.

/ Anders


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 09:35:04AM +0200, Anders Boström wrote:
 I'm sorry, but I can't test this at the moment. The computer with the
 TSO-problem is running as a file-server = can't be used for testing.
 Also, we don't use the Atheros Ethernet interface any more due to
 other problems, hard hang (need reset) of the Eth-interface
 every ~6 month's.

The bug is definitely still around. Yesterday I could reproduce it, and I will
look for a solution in the next few days.

Do you have any details on the hangs every 6 months? Could you catch
thread dumps or oopses?

Thanks,

  Hannes





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Anders Boström
 HFS == Hannes Frederic Sowa han...@stressinduktion.org writes:

 HFS The bug is definitely still around. Yesterday I could reproduce it and will
 HFS look for a solution in the next days.

This sounds great!

 HFS Do you have any details on the hangs every 6 months? Could you catch
 HFS thread dumps or oopses?

As I wrote, the computer is a live file server, so we have restarted
it as soon as possible whenever this has occurred, and we currently
use an Intel NIC instead.

The following was logged when the hang occurred:

May 19 12:50:32 flash kernel: [12182478.782248] ATL1E 0000:03:00.0: atl1e_clean is called when AT_DOWN
...
Dec  8 15:00:28 flash kernel: [5282450.781172] ATL1E 0000:03:00.0: atl1e_clean is called when AT_DOWN

/ Anders





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Mon, Apr 01, 2013 at 02:51:56AM +0000, Huang, Xiong wrote:
  
   I checked windows driver, it does limit  the max packet length for TSO
   windows XP : 32*1024 bytes (include MAC header and all MAC payload). No
  support IP/TCP option.
   Windows 7:  15, 000 bytes, support IP/TCP option.
  
  If TSO on these devices don't work properly with TCP options then you're
  just going to have to disable it - Linux requires it to support at least the
  timestamp option.  I'm not sure about IP options (this really ought to be
  documented).
  
  If there's a length limit lower than 64K, you'll need to set the limit using
  netif_set_gso_max_size() before registering the net device.
  
 
 Ben, thanks for your advice. 
 I have discussed with windows driver developer and hardware designer, the TSO 
 limitation for win driver is just
 For simplifying windows driver due to the buffer length limitation of TX 
 descriptor. The hardware itself has no limitation on
 TSO packet length.

The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
in the driver. MAX_TX_BUF_LEN seems to be arbitrarily set to 0x2000. I
can even raise it to 0x3000 and don't see any tcp retransmits. Do you
have any advice on how to size this value (e.g. should we switch to the
windows values)?

I also found some irregularities in the mtu update code. It differs from the
calculations in the init function (I will send a patch for that).





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Huang, Xiong
 The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN in
 the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I can even
 raise it to 0x3000 and don't see any tcp retransmits. Do you have an advice on
 how to size this value (e.g. should we switch to the windows values)?
 

Would you try 0x4000? Because the buffer-length field in the TX descriptor is
14 bits, 0x4000 exceeds the max value.
Did you find any bug/issue in the code that calculates the length for each TX
descriptor?

Thanks
Xiong
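
The remark about the 14-bit buffer-length field can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are made up, and it assumes that an oversized length would simply lose its upper bits when stored in the field. How the hardware really treats such a value is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define TPD_BUFLEN_BITS 14u                             /* field width stated in the thread */
#define TPD_BUFLEN_MASK ((1u << TPD_BUFLEN_BITS) - 1u)  /* 0x3FFF, largest storable length */

/* Value a 14-bit length field would hold after storing `len`
 * (assumption: the upper bits are simply dropped). */
static uint32_t tpd_buflen_store(uint32_t len)
{
    return len & TPD_BUFLEN_MASK;
}

/* Non-zero if `len` survives the round trip through the field unchanged. */
static int tpd_buflen_fits(uint32_t len)
{
    return tpd_buflen_store(len) == len;
}
```

Under that assumption, 0x2000 and 0x3000 fit, while 0x4000 needs a 15th bit and would be stored as 0.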


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Eric Dumazet
On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:

 The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
 in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
 can even raise it to 0x3000 and don't see any tcp retransmits. Do you
 have an advice on how to size this value (e.g. should we switch to the
 windows values)?

This looks like an overflow error...

diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
index 7e0a822..7965f89 100644
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -1569,18 +1569,17 @@ static u16 atl1e_cal_tdp_req(const struct sk_buff *skb)
 {
 	int i = 0;
 	u16 tpd_req = 1;
-	u16 fg_size = 0;
-	u16 proto_hdr_len = 0;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		u32 fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
 		tpd_req += ((fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT);
 	}
 
 	if (skb_is_gso(skb)) {
 		if (skb->protocol == htons(ETH_P_IP) ||
 		   (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6)) {
-			proto_hdr_len = skb_transport_offset(skb) +
+			u32 proto_hdr_len = skb_transport_offset(skb) +
 					tcp_hdrlen(skb);
 			if (proto_hdr_len < skb_headlen(skb)) {
 				tpd_req += ((skb_headlen(skb) - proto_hdr_len +




Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
 On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
 
  The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
  in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
  can even raise it to 0x3000 and don't see any tcp retransmits. Do you
  have an advice on how to size this value (e.g. should we switch to the
  windows values)?
 
 This looks like an overflow error...

Thanks for your input, Eric.

I am limited in my time to work on this today but nonetheless just tested
your patch without any of my changes and counted a lot of TcpRetransSegs
again. Either there really is some hardware limitation or another
overflow.
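
For anyone who wants to watch the same counter while testing: as far as I know, the TcpRetransSegs value mentioned above corresponds to the RetransSegs column on the two "Tcp:" lines of /proc/net/snmp (one line of column names, one line of values). The sketch below is one way to read it. The function names are invented for this example, and it is not taken from any tool used in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `field` given the "Tcp:" name line and the "Tcp:" value line.
 * Returns 0 and stores the value in *out, or -1 if the field is missing.
 */
static int snmp_tcp_field(const char *names, const char *values,
                          const char *field, long long *out)
{
    char name[64], value[64];
    int nlen, vlen;

    /* Walk both lines token by token; the first token on each is "Tcp:". */
    while (sscanf(names, "%63s%n", name, &nlen) == 1 &&
           sscanf(values, "%63s%n", value, &vlen) == 1) {
        if (strcmp(name, field) == 0) {
            *out = strtoll(value, NULL, 10);
            return 0;
        }
        names += nlen;
        values += vlen;
    }
    return -1;
}

/* Read RetransSegs from /proc/net/snmp; returns -1 if it cannot be read. */
static long long read_retrans_segs(void)
{
    char names[1024], values[1024];
    long long val = -1;
    FILE *f = fopen("/proc/net/snmp", "r");

    if (!f)
        return -1;
    while (fgets(names, sizeof names, f)) {
        if (strncmp(names, "Tcp:", 4) == 0 && fgets(values, sizeof values, f)) {
            if (snmp_tcp_field(names, values, "RetransSegs", &val) != 0)
                val = -1;
            break;
        }
    }
    fclose(f);
    return val;
}
```

Calling read_retrans_segs() before and after a transfer and subtracting the two values gives the number of retransmitted segments during the test.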





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 09:51:12PM +0000, Huang, Xiong wrote:
  The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN in
  the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I can even
  raise it to 0x3000 and don't see any tcp retransmits. Do you have an advice 
  on
  how to size this value (e.g. should we switch to the windows values)?
  
 
 Would you try 0x4000 ? because the buffer-length in TX descriptor is 14bits, 
 0x4000 exceeds max value.
 Do you find any bug/issue on the code that calculate the length for each TX 
 descriptor ?

Setting MAX_TX_BUF_LEN to 0x4000

[ 8949.833750] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
[ 8949.833783] IPv6: ADDRCONF(NETDEV_CHANGE): p33p1: link becomes ready
[ 8960.861557] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
[ 8960.866879] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
[ 8961.095266] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
[ 8961.100791] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex

I have not looked at the buffer calculations closely.





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Huang, Xiong

 
 On Tue, Apr 02, 2013 at 09:51:12PM +0000, Huang, Xiong wrote:
   The error vanishes as soon as I put a gso size limit of
   MAX_TX_BUF_LEN in the driver. MAX_TX_BUF_LEN seems to be
 arbitrary
   set to 0x2000. I can even raise it to 0x3000 and don't see any tcp
   retransmits. Do you have an advice on how to size this value (e.g. should
 we switch to the windows values)?
  
 
  Would you try 0x4000 ? because the buffer-length in TX descriptor is 14bits,
 0x4000 exceeds max value.
  Do you find any bug/issue on the code that calculate the length for each TX
 descriptor ?
 
 Setting MAX_TX_BUF_LEN to 0x4000
 
 [ 8949.833750] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
 [ 8949.833783] IPv6: ADDRCONF(NETDEV_CHANGE): p33p1: link becomes ready
 [ 8960.861557] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
 [ 8960.866879] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
 [ 8961.095266] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
 [ 8961.100791] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
 
Hannes, thanks for your testing!

Simply changing MAX_TX_BUF_LEN to 0x4000 will cause an incorrect TX
configuration...
I meant that you could try a gso size limit of 0x4000 (or 0x5000).

Thanks
Xiong



Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Eric Dumazet
On Wed, 2013-04-03 at 00:15 +0200, Hannes Frederic Sowa wrote:
 On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
  On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
  
   The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
   in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
   can even raise it to 0x3000 and don't see any tcp retransmits. Do you
   have an advice on how to size this value (e.g. should we switch to the
   windows values)?
  
  This looks like an overflow error...
 
 Thanks for your input, Eric.
 
 I am limited in my time to work on this today but nontheless just tested
 your patch without any of my changes and count a lot of TcpRetransSegs
 again. Either there is really some hardware limitation or another
 overflow.

Another overflow...

Really I don't understand why people use u16 instead of u32.

u16 is slower most of the time, and more prone to overflows.

diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
index 7e0a822..48ac487 100644
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -1569,18 +1569,17 @@ static u16 atl1e_cal_tdp_req(const struct sk_buff *skb)
 {
 	int i = 0;
 	u16 tpd_req = 1;
-	u16 fg_size = 0;
-	u16 proto_hdr_len = 0;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		u32 fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
 		tpd_req += ((fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT);
 	}
 
 	if (skb_is_gso(skb)) {
 		if (skb->protocol == htons(ETH_P_IP) ||
 		   (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6)) {
-			proto_hdr_len = skb_transport_offset(skb) +
+			u32 proto_hdr_len = skb_transport_offset(skb) +
 					tcp_hdrlen(skb);
 			if (proto_hdr_len < skb_headlen(skb)) {
 				tpd_req += ((skb_headlen(skb) - proto_hdr_len +
@@ -1670,7 +1669,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
 {
 	struct atl1e_tpd_desc *use_tpd = NULL;
 	struct atl1e_tx_buffer *tx_buffer = NULL;
-	u16 buf_len = skb_headlen(skb);
+	u32 buf_len = skb_headlen(skb);
 	u16 map_len = 0;
 	u16 mapped_len = 0;
 	u16 hdr_len = 0;





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 03:34:53PM -0700, Eric Dumazet wrote:
 On Wed, 2013-04-03 at 00:15 +0200, Hannes Frederic Sowa wrote:
  On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
   On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
   
The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
can even raise it to 0x3000 and don't see any tcp retransmits. Do you
have an advice on how to size this value (e.g. should we switch to the
windows values)?
   
   This looks like an overflow error...
  
  Thanks for your input, Eric.
  
  I am limited in my time to work on this today but nontheless just tested
  your patch without any of my changes and count a lot of TcpRetransSegs
  again. Either there is really some hardware limitation or another
  overflow.
 
 Another overflow...
 
 Really I don't understand why people use u16 instead of u32.
 
 u16 is slower most of the time, and more prone to overflows.

Just gave your patch a test and I still have a fast increasing tcp
retransmitted segments counter.

Maximum skb length hitting the device is 23234 in my tests (as reported
by ftrace). So I actually think it is a device limitation.





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 10:23:54PM +0000, Huang, Xiong wrote:
 
  
  On Tue, Apr 02, 2013 at 09:51:12PM +0000, Huang, Xiong wrote:
The error vanishes as soon as I put a gso size limit of
MAX_TX_BUF_LEN in the driver. MAX_TX_BUF_LEN seems to be
  arbitrary
set to 0x2000. I can even raise it to 0x3000 and don't see any tcp
retransmits. Do you have an advice on how to size this value (e.g. 
should
  we switch to the windows values)?
   
  
   Would you try 0x4000 ? because the buffer-length in TX descriptor is 
   14bits,
  0x4000 exceeds max value.
   Do you find any bug/issue on the code that calculate the length for each 
   TX
  descriptor ?
  
  Setting MAX_TX_BUF_LEN to 0x4000
  
  [ 8949.833750] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
  [ 8949.833783] IPv6: ADDRCONF(NETDEV_CHANGE): p33p1: link becomes ready
  [ 8960.861557] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
  [ 8960.866879] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
  [ 8961.095266] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
  [ 8961.100791] ATL1E 0000:04:00.0 p33p1: NIC Link is Up 100 Mbps Full Duplex
  
 Hannes,  Thanks for your testing !
 
  simply revising MAX_TX_BUF_LEN to 0x4000 will cause incorrect TX 
 configuration...
 I mean you can try to put a gso size limit of 0x4000 (or 0x5000)

I tested both values with multi-gigabyte nfsv4 traffic and both values are ok.
If I understand you correctly 0x4000 is a safe limit?





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Huang, Xiong
  Hannes,  Thanks for your testing !
 
   simply revising MAX_TX_BUF_LEN to 0x4000 will cause incorrect TX
 configuration...
  I mean you can try to put a gso size limit of 0x4000 (or 0x5000)
 
 I tested both values with multi-gigabyte nfsv4 traffic and both values are ok.
 If I understand you correctly 0x4000 is a safe limit?

Since the Win7 driver uses 15000 bytes as its max packet length for TSO, I think
0x3C00 is safer than 0x4000. :)

Thanks
Xiong
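
To make the candidate limits easier to compare, the sketch below writes the figures from the thread as constants and checks them against each other. The constant and function names are invented for this illustration. Only the numeric values come from the messages above.

```c
#include <assert.h>

enum {
    GSO_CAP_PROPOSED = 0x3C00,     /* 15360 bytes, i.e. 15 * 1024 */
    GSO_CAP_TESTED_A = 0x4000,     /* 16384 bytes */
    GSO_CAP_TESTED_B = 0x5000,     /* 20480 bytes */
    WIN7_TSO_LIMIT   = 15000,      /* Windows 7 driver limit quoted in the thread */
    WINXP_TSO_LIMIT  = 32 * 1024,  /* Windows XP driver limit quoted in the thread */
    LARGEST_SKB_SEEN = 23234       /* largest skb reported via ftrace in the thread */
};

/* Non-zero if a packet of `len` bytes would be allowed under `cap`. */
static int fits_cap(unsigned len, unsigned cap)
{
    return len <= cap;
}
```

Note that 0x3C00 is slightly larger than the Windows 7 figure of 15000 bytes, and that the 23234-byte skb seen in testing exceeds every Linux cap discussed here while staying below the Windows XP limit.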


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Tue, Apr 02, 2013 at 03:34:53PM -0700, Eric Dumazet wrote:
 Really I don't understand why people use u16 instead of u32.
 
 u16 is slower most of the time, and more prone to overflows.
 
 diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
 index 7e0a822..48ac487 100644
 --- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
 +++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
 @@ -1569,18 +1569,17 @@ static u16 atl1e_cal_tdp_req(const struct sk_buff *skb)
  {
  	int i = 0;
  	u16 tpd_req = 1;
 -	u16 fg_size = 0;
 -	u16 proto_hdr_len = 0;
  
  	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 -		fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
 +		u32 fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
 +
  		tpd_req += ((fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT);
  	}
  
  	if (skb_is_gso(skb)) {
  		if (skb->protocol == htons(ETH_P_IP) ||
  		   (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6)) {
 -			proto_hdr_len = skb_transport_offset(skb) +
 +			u32 proto_hdr_len = skb_transport_offset(skb) +
  					tcp_hdrlen(skb);
  			if (proto_hdr_len < skb_headlen(skb)) {
  				tpd_req += ((skb_headlen(skb) - proto_hdr_len +
 @@ -1670,7 +1669,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
  {
  	struct atl1e_tpd_desc *use_tpd = NULL;
  	struct atl1e_tx_buffer *tx_buffer = NULL;
 -	u16 buf_len = skb_headlen(skb);
 +	u32 buf_len = skb_headlen(skb);
  	u16 map_len = 0;
  	u16 mapped_len = 0;
  	u16 hdr_len = 0;
 

I tested this patch on top of the patch which reduces gso max size to 0x3c00.
If you want to submit the patch, you can add my acked-by.

Thanks,

  Hannes





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-04-02 Thread Hannes Frederic Sowa
On Wed, Apr 03, 2013 at 12:12:12AM +0000, Huang, Xiong wrote:
   Hannes,  Thanks for your testing !
  
simply revising MAX_TX_BUF_LEN to 0x4000 will cause incorrect TX
  configuration...
   I mean you can try to put a gso size limit of 0x4000 (or 0x5000)
  
  I tested both values with multi-gigabyte nfsv4 traffic and both values are 
  ok.
  If I understand you correctly 0x4000 is a safe limit?
 
 Since Win7 driver uses 15000 bytes as its max packet length for TSO, I think 
 0x3C00 is more safer than 0x4000. :)

Thanks again for helping to resolve this issue. I just submitted a patch
but accidentally killed the cc-line.

Greetings,

  Hannes





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-31 Thread Hannes Frederic Sowa
On Sun, Mar 31, 2013 at 12:25:58AM +0000, Ben Hutchings wrote:
 On Tue, 2010-01-26 at 09:34 +0100, Anders Boström wrote:
   JY == Jie Yang jie.y...@atheros.com writes:
  
   JY Anders Boström and...@netinsight.net wrote:
  
   JY following is my test cese,

   JY a nfs server server with ar8131chip, device id 1063.
export /tmp/ dir as the nfs share directory,  JY the client,
mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
script to write and read data on the  JY
/mnt/nfs/testnfs.log. it works fine.

OK, the device-ID in our NFS-server is 1026, rev. b0. So it
is possible that the problem is specific to that chip/version.
   JY oops, its my mistake in writing, my case is 1026 device ID
  

   JY Can you give me some advice on how to reproduce this bug??

The only suggestion I have is to try to find a board with a
1026-chip on it.

My test-case is just copy of a 1 Gbyte file from the
NFS-server to /dev/null , after making sure that the file
isn't cached on the client by reading huge amounts of other data.

   JY just to check, if the kernel version is 2.6.26-2 ??
  
  I've tested with
  Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
  Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
  kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
  result.
 
 Does booting with the kernel parameter 'pci=nomsi' avoid the problem?

Thanks Ben for bringing this up.

I'll check whether I can reproduce it in the next few days and, if so, try to
find a workaround.





Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-31 Thread Huang, Xiong
 
  I checked windows driver, it does limit  the max packet length for TSO
  windows XP : 32*1024 bytes (include MAC header and all MAC payload). No
 support IP/TCP option.
  Windows 7:  15, 000 bytes, support IP/TCP option.
 
 If TSO on these devices don't work properly with TCP options then you're
 just going to have to disable it - Linux requires it to support at least the
 timestamp option.  I'm not sure about IP options (this really ought to be
 documented).
 
 If there's a length limit lower than 64K, you'll need to set the limit using
 netif_set_gso_max_size() before registering the net device.
 

Ben, thanks for your advice.
I have discussed this with the Windows driver developer and the hardware
designer. The TSO limitation in the Windows driver exists only to simplify
that driver, because of the buffer length limitation of the TX descriptor.
The hardware itself has no limitation on TSO packet length.

BTW, IP/TCP options are supported as well.

Thanks
Xiong


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-30 Thread Ben Hutchings
On Tue, 2010-01-26 at 09:34 +0100, Anders Boström wrote:
  JY == Jie Yang jie.y...@atheros.com writes:
 
  JY Anders Boström and...@netinsight.net wrote:
 
  JY following is my test cese,
   
  JY a nfs server server with ar8131chip, device id 1063.
   export /tmp/ dir as the nfs share directory,  JY the client,
   mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
   script to write and read data on the  JY
   /mnt/nfs/testnfs.log. it works fine.
   
   OK, the device-ID in our NFS-server is 1026, rev. b0. So it
   is possible that the problem is specific to that chip/version.
  JY oops, its my mistake in writing, my case is 1026 device ID
 
   
  JY Can you give me some advice on how to reproduce this bug??
   
   The only suggestion I have is to try to find a board with a
   1026-chip on it.
   
   My test-case is just copy of a 1 Gbyte file from the
   NFS-server to /dev/null , after making sure that the file
   isn't cached on the client by reading huge amounts of other data.
   
  JY just to check, if the kernel version is 2.6.26-2 ??
 
 I've tested with
 Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
 Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
 kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
 result.

Does booting with the kernel parameter 'pci=nomsi' avoid the problem?

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.


signature.asc
Description: This is a digitally signed message part


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-30 Thread Huang, Xiong
 
  I've tested with
  Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
  linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and kernel.org
  2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
 
 Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
 

Hannes has found that DMA-write (for rx-packets) behaves abnormally due to the
MSI function. But TSO is for tx-packets, the opposite direction, so I'm not sure. :(
If someone has this issue, he/she could give it a try.

Thanks
Xiong


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-30 Thread Huang, Xiong
  
   I've tested with
   Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
   linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
   kernel.org
   2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
 
  Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
 
 
 Hannes has found DMA-write (for rx-packet)  is abnormal due to msi function.
 But TSO is for rx-packet, an opposite direction. I'm not sure :(, If someone
 has this issue,  he/she could have a try.
 

I checked the Windows driver; it does limit the max packet length for TSO:
Windows XP: 32*1024 bytes (including the MAC header and all MAC payload). No
support for IP/TCP options.
Windows 7: 15,000 bytes; IP/TCP options are supported.

Thanks
Xiong



Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2013-03-30 Thread Ben Hutchings
On Sun, 2013-03-31 at 02:18 +0000, Huang, Xiong wrote:
   
I've tested with
Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
kernel.org
2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
  
   Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
  
  
  Hannes has found DMA-write (for rx-packet)  is abnormal due to msi function.
  But TSO is for rx-packet, an opposite direction. I'm not sure :(, If someone
  has this issue,  he/she could have a try.
  
 
 I checked windows driver, it does limit  the max packet length for TSO
 windows XP : 32*1024 bytes (include MAC header and all MAC payload). No 
 support IP/TCP option.
 Windows 7:  15, 000 bytes, support IP/TCP option.

If TSO on these devices don't work properly with TCP options then you're
just going to have to disable it - Linux requires it to support at least
the timestamp option.  I'm not sure about IP options (this really ought
to be documented).

If there's a length limit lower than 64K, you'll need to set the limit
using netif_set_gso_max_size() before registering the net device.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.


signature.asc
Description: This is a digitally signed message part


Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-26 Thread Anders Boström
 JY == Jie Yang jie.y...@atheros.com writes:

 JY Anders Boström and...@netinsight.net wrote:

 JY following is my test cese,
  
 JY a nfs server server with ar8131chip, device id 1063.
  export /tmp/ dir as the nfs share directory,  JY the client,
  mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
  script to write and read data on the  JY
  /mnt/nfs/testnfs.log. it works fine.
  
  OK, the device-ID in our NFS-server is 1026, rev. b0. So it
  is possible that the problem is specific to that chip/version.
 JY oops, its my mistake in writing, my case is 1026 device ID

  
 JY Can you give me some advice on how to reproduce this bug??
  
  The only suggestion I have is to try to find a board with a
  1026-chip on it.
  
  My test-case is just copy of a 1 Gbyte file from the
  NFS-server to /dev/null , after making sure that the file
  isn't cached on the client by reading huge amounts of other data.
  
 JY just to check, if the kernel version is 2.6.26-2 ??

I've tested with
Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
result.

/ Anders






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-25 Thread Jie Yang
Anders Boström and...@netinsight.net wrote:
 IP-header length field, but is shorter.
  
  JY following is my test cese,

  JY a nfs server server with ar8131chip, device id 1063.
 export /tmp/ dir as the nfs share directory,  JY the client,
 mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
 script to write and read data on the  JY
 /mnt/nfs/testnfs.log. it works fine.

 OK, the device-ID in our NFS-server is 1026, rev. b0. So it
 is possible that the problem is specific to that chip/version.
Oops, that was my mistake in writing; the device ID in my case is 1026.


  JY Can you give me some advice on how to reproduce this bug??

 The only suggestion I have is to try to find a board with a
 1026-chip on it.

 My test-case is just copy of a 1 Gbyte file from the
 NFS-server to /dev/null , after making sure that the file
 isn't cached on the client by reading huge amounts of other data.

Just to check: is the kernel version 2.6.26-2?

Best wishes
jie






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-25 Thread Anders Boström
 JY == Jie Yang jie.y...@atheros.com writes:

 JY Anders Boström and...@netinsight.net wrote:
  Cc: b...@decadent.org.uk; net...@vger.kernel.org;
  565...@bugs.debian.org; Xiong Huang
  Subject: Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e:
  TSO is broken

  One strange observation is that I can only reproduce this
  problem when transmitting data from a NFS-server using TCP
  with Atheros AR8121/AR8113/AR8114.
  
  I've tried to reproduce the problem using test-programs, like
  nttcp and netpipe, without any success. One observation is
  that the test-programs *only* generates 1500 bytes
  IP-packets. When the NFS-server sends data, a sequence of
  1500 bytes IP-packets are generated, ending with a shorter
  packet. And this last packet in the sequence has 1500 in the
  IP-header length field, but is shorter.
  
 JY The following is my test case:

 JY An NFS server with an ar8131 chip, device id 1063, exports the /tmp/ dir as the NFS share directory.
 JY The client mounts server_ip:/tmp on the local dir /mnt/nfs and uses a python script to write and read data in
 JY /mnt/nfs/testnfs.log. It works fine.

OK, the device-ID in our NFS-server is 1026, rev. b0. So it is
possible that the problem is specific to that chip/version.

 JY Can you give me some advice on how to reproduce this bug??

The only suggestion I have is to try to find a board with a 1026-chip
on it.

My test case is just a copy of a 1 Gbyte file from the
NFS server to /dev/null, after making sure that the file isn't cached
on the client by reading huge amounts of other data.

/ Anders






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-24 Thread Jie Yang
Anders Boström and...@netinsight.net wrote:

 Cc: b...@decadent.org.uk; net...@vger.kernel.org;
 565...@bugs.debian.org; Xiong Huang
 Subject: Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e:
 TSO is broken

 One strange observation is that I can only reproduce this
 problem when transmitting data from a NFS-server using TCP
 with Atheros AR8121/AR8113/AR8114.

 I've tried to reproduce the problem using test-programs, like
 nttcp and netpipe, without any success. One observation is
 that the test-programs *only* generates 1500 bytes
 IP-packets. When the NFS-server sends data, a sequence of
 1500 bytes IP-packets are generated, ending with a shorter
 packet. And this last packet in the sequence has 1500 in the
 IP-header length field, but is shorter.

The following is my test case:

An NFS server with an AR8131 chip, device ID 1063, exports the /tmp/
directory as the NFS share. The client mounts server_ip:/tmp on the
local directory /mnt/nfs and uses a Python script to write and read
data in /mnt/nfs/testnfs.log. It works fine.

Can you give me some advice on how to reproduce this bug??

Best wishes
jie






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-23 Thread Ben Hutchings
On Thu, 2010-01-21 at 17:42 +0100, Anders Boström wrote:
  JY == Jie Yang jie.y...@atheros.com writes:
 
   Have you tested NFS over TCP? The block-size the application
   uses can have an effect on this. What application did you
   use? Block-size?
   
  JY yes, I tested NFS over TCP.
 
 One strange observation is that I can only reproduce this problem when
 transmitting data from a NFS-server using TCP with Atheros
 AR8121/AR8113/AR8114.
 
 I've tried to reproduce the problem using test-programs, like nttcp
 and netpipe, without any success. One observation is that the
 test-programs *only* generates 1500 bytes IP-packets. When
 the NFS-server sends data, a sequence of 1500 bytes IP-packets are
 generated, ending with a shorter packet. And this last packet in the
 sequence has 1500 in the IP-header length field, but is shorter.

I ran tcpdump over your packet capture and saw:

13:48:39.122723 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 32664, offset 0, flags [DF], proto TCP (6), length 1500)
    10.100.0.88.2049 > 10.100.1.25.888: Flags [.], cksum 0x3ebd (correct), seq 21720:23168, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448
13:48:39.122733 00:18:f3:52:22:3f > 00:26:18:ae:69:6d, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 39773, offset 0, flags [DF], proto TCP (6), length 52)
    10.100.1.25.888 > 10.100.0.88.2049: Flags [.], cksum 0x5cfc (correct), ack 23168, win 58293, options [nop,nop,TS val 1212787170 ecr 152460082], length 0
13:48:39.122742 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1462: truncated-ip - 52 bytes missing! (tos 0x0, ttl 64, id 32664, offset 0, flags [DF], proto TCP (6), length 1500)
    10.100.0.88.2049 > 10.100.1.25.888: Flags [.], seq 23168:24616, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448
13:48:39.122747 00:26:18:ae:69:6d > 00:18:f3:52:22:3f, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 64, id 32666, offset 0, flags [DF], proto TCP (6), length 1500)
    10.100.0.88.2049 > 10.100.1.25.888: Flags [.], cksum 0x33a1 (correct), seq 24564:26012, ack 157, win 501, options [nop,nop,TS val 152460082 ecr 1212787170], length 1448

Based on the TCP sequence numbers, it seems that the length of the
broken packet is correct but its IP header is wrong.
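The arithmetic behind this reading can be checked from the tcpdump lines above. The following is only a sketch: it assumes a 14-byte Ethernet header, a 20-byte IPv4 header without options, and a 32-byte TCP header (20 bytes plus the nop,nop,timestamp options shown in the capture). The function name `analyse` is made up for illustration.

```python
ETH_HEADER = 14        # Ethernet II header, assuming no VLAN tag
IP_HEADER = 20         # IPv4 header, assuming no IP options
TCP_HEADER = 20 + 12   # base TCP header plus nop,nop,timestamp options


def analyse(frame_len, ip_total_len, seq_start):
    """Compare what the IP header claims with what was really on the wire."""
    ip_bytes_on_wire = frame_len - ETH_HEADER
    tcp_payload = ip_bytes_on_wire - IP_HEADER - TCP_HEADER
    return {
        "ip_bytes_on_wire": ip_bytes_on_wire,
        "missing": ip_total_len - ip_bytes_on_wire,
        "tcp_payload": tcp_payload,
        "next_seq": seq_start + tcp_payload,
    }


# Frame lengths, IP lengths and sequence numbers taken from the capture.
good = analyse(frame_len=1514, ip_total_len=1500, seq_start=21720)
broken = analyse(frame_len=1462, ip_total_len=1500, seq_start=23168)

print("good:  ", good)
print("broken:", broken)
```

Under these assumptions the broken frame carries 1396 bytes of TCP payload, so the following segment should start at 23168 + 1396 = 24564, which is the sequence number the next packet in the capture starts with. The 52 missing bytes match what tcpdump reports.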

My understanding is that the length of the TCP payload in a GSO skb must
always be a multiple of the gso_size, so that hardware is not required
to adjust length fields.  So I see several possible explanations:

1. Something generated invalid GSO skbs (unlikely; other hardware should
show the same problem)
2. The driver constructed TSO DMA descriptors for a non-GSO skb
3. The hardware is continuing to apply TSO to packets with non-TSO DMA
descriptors

Ben.

-- 
Ben Hutchings
Any smoothly functioning technology is indistinguishable from a rigged demo.




Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-23 Thread Herbert Xu
Ben Hutchings b...@decadent.org.uk wrote:
 
 Based on the TCP sequence numbers, it seems that the length of the
 broken packet is correct but its IP header is wrong.
 
 My understanding is that the length of the TCP payload in a GSO skb must
 always be a multiple of the gso_size, so that hardware is not required
 to adjust length fields.  So I see several possible explanations:

No, there is no such requirement.  The trailer skb can be of any
size less than or equal to gso_size.

However, if the hardware assumed this then yes it would explain
the problem.
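To make the point concrete, here is a small sketch of how a TSO payload splits into segments and what the IPv4 Total Length of each segment should be. It is an illustration only, not the kernel's or the atl1e driver's code. The 52-byte header size (20-byte IPv4 plus 32-byte TCP with timestamps) and the gso_size of 1448 are assumptions taken from the capture earlier in the thread, and `faulty_lengths` is a hypothetical model of hardware that stamps every segment with the full-size length.

```python
HEADERS = 20 + 32  # assumed IPv4 header plus TCP header with timestamp option


def segment_payload_sizes(total_payload, gso_size):
    """Split a payload into gso_size chunks; the trailer may be shorter."""
    full, rest = divmod(total_payload, gso_size)
    sizes = [gso_size] * full
    if rest:
        sizes.append(rest)
    return sizes


def correct_lengths(sizes):
    """IPv4 Total Length each segment should carry."""
    return [HEADERS + size for size in sizes]


def faulty_lengths(sizes, gso_size):
    """Hypothetical faulty behaviour: every segment claims a full payload."""
    return [HEADERS + gso_size for _ in sizes]


sizes = segment_payload_sizes(total_payload=2 * 1448 + 1396, gso_size=1448)
print(sizes)                        # per-segment TCP payload sizes
print(correct_lengths(sizes))       # trailer gets a smaller Total Length
print(faulty_lengths(sizes, 1448))  # trailer wrongly claims 1500
```

With these numbers the trailer segment should advertise 1448 bytes, while the faulty model advertises 1500, which is the symptom seen in the capture.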

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-21 Thread Anders Boström
 JY == Jie Yang jie.y...@atheros.com writes:

  Have you tested NFS over TCP? The block-size the application
  uses can have an effect on this. What application did you
  use? Block-size?
  
 JY yes, I tested NFS over TCP.

One strange observation is that I can only reproduce this problem when
transmitting data from an NFS server using TCP with the Atheros
AR8121/AR8113/AR8114.

I've tried to reproduce the problem using test programs such as nttcp
and netpipe, without any success. One observation is that the
test programs generate *only* 1500-byte IP packets. When
the NFS server sends data, it generates a sequence of 1500-byte IP
packets ending with a shorter packet. This last packet in the
sequence has 1500 in the IP header's length field, but is shorter.

/ Anders






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-20 Thread Anders Boström
 JY == Jie Yang jie.y...@atheros.com writes:

 JY Anders Boström and...@netinsight.net wrote:
  It is an ASUS M4A78 PRO motherboard with the Atheros
  AR8121/AR8113/AR8114 on-board.
  
  ~25Mbyte/s performance. I get ~5000 retransmitted packets
  per GByte data, according to RetransSegs in
  /proc/net/snmp . wireshark in the client show that the
  server send out a sequence of frames. All but the last
  one are 1500 bytes IP-packets. The last one is shorter, but
  the IP-header still say 1500 byte. The client then
  requests retransmit, and the retransmitted frame arrives
  with correct IP-header.

 JY i just test it on Linux localhost.localdomain 2.6.31.5-127.fc12.x86_64 #1
 JY SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux.
 JY with hardware, Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller
 JY (rev b0)
 JY device id : 1969:1026 (rev b0)

 JY i upload/download a 382M it work well with retransmit packet:

Have you tested NFS over TCP? The block-size the application uses can
have an effect on this. What application did you use? Block-size?

/ Anders






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-20 Thread Jie Yang
 Anders Boström and...@netinsight.net
 Sent: Wednesday, January 20, 2010 5:27 PM
 To: Jie Yang
 Cc: b...@decadent.org.uk; net...@vger.kernel.org;
 565...@bugs.debian.org; Xiong Huang
 Subject: Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e:
 TSO is broken

  JY == Jie Yang jie.y...@atheros.com writes:

  JY Anders Boström and...@netinsight.net wrote:
   It is an ASUS M4A78 PRO motherboard with the Atheros
   AR8121/AR8113/AR8114 on-board.

   ~25Mbyte/s performance. I get ~5000 retransmitted packets
   per GByte data, according to RetransSegs in
   /proc/net/snmp . wireshark in the client show that the
   server send out a sequence of frames. All but the last
   one are 1500 bytes IP-packets. The last one is shorter, but
   the IP-header still say 1500 byte. The client then
   requests retransmit, and the retransmitted frame arrives
   with correct IP-header.

  JY i just test it on Linux localhost.localdomain
  JY 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009
  JY x86_64 x86_64 x86_64 GNU/Linux.
  JY with hardware, Atheros AR8121/AR8113/AR8114 PCI-E
  JY Ethernet Controller (rev b0)
  JY device id : 1969:1026 (rev b0)

  JY i upload/download a 382M it work well with retransmit packet:

 Have you tested NFS over TCP? The block-size the application
 uses can have an effect on this. What application did you
 use? Block-size?

yes, I tested NFS over TCP.






Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-19 Thread Jie Yang
Anders Boström and...@netinsight.net wrote:

 It is an ASUS M4A78 PRO motherboard with the Atheros
 AR8121/AR8113/AR8114 on-board.

 ~25Mbyte/s performance. I get ~5000 retransmitted packets
 per GByte data, according to RetransSegs in
 /proc/net/snmp . wireshark in the client show that the
 server send out a sequence of frames. All but the last
 one are 1500 bytes IP-packets. The last one is shorter, but
 the IP-header still say 1500 byte. The client then
 requests retransmit, and the retransmitted frame arrives
 with correct IP-header.

I just tested it on Linux localhost.localdomain 2.6.31.5-127.fc12.x86_64 #1 SMP
Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux,
with this hardware: Atheros AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0),
device ID 1969:1026 (rev b0).

I uploaded and downloaded a 382M file and it worked well. The retransmit counters:

Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 12 -1 2 4 2 0 2 532501 220631 6 0 2
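For anyone comparing counters before and after a transfer, the Tcp lines of /proc/net/snmp can be paired up by field name. The following is a minimal sketch, using the two lines quoted above as sample input; the helper name `parse_snmp` is made up for illustration, and on a live system the text would be read from /proc/net/snmp instead.

```python
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
    "InErrs OutRsts\n"
    "Tcp: 1 200 12 -1 2 4 2 0 2 532501 220631 6 0 2\n"
)


def parse_snmp(text, protocol="Tcp"):
    """Return {field name: value} for one protocol block of /proc/net/snmp."""
    prefix = protocol + ":"
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(prefix)]
    if len(rows) < 2:
        raise ValueError("no header/value pair found for " + protocol)
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(value) for value in values)))


tcp = parse_snmp(SAMPLE)
print("RetransSegs:", tcp["RetransSegs"])
print("OutSegs:    ", tcp["OutSegs"])
print("retransmit ratio: %.6f" % (tcp["RetransSegs"] / tcp["OutSegs"]))
```

Taking the difference of RetransSegs before and after a test run gives the number of retransmitted segments for that run.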

I also tried kernel 2.6.33-rc1, synced from git, but it failed to boot.







Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-17 Thread Ben Hutchings
On Fri, 2010-01-15 at 14:25 +0100, Anders Boström wrote:
 When I run NFS over TCP (default options) and read large files from a
 server with Atheros AR8121/AR8113/AR8114 Ethernet chip, I only get

Do you know which specific chip it is?

 ~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte
 data, according to RetransSegs in /proc/net/snmp . wireshark in the
 client show that the server send out a sequence of frames. All but the
 last one are 1500 bytes IP-packets. The last one is shorter, but the
 IP-header still say 1500 byte. The client then requests retransmit,
 and the retransmitted frame arrives with correct IP-header.

Please can you send a longer packet capture in pcap format?

[...]
 I've also reported this upstream.

Since this is network-related, you should mail net...@vger.kernel.org
not linux-ker...@vger.

Ben.

-- 
Ben Hutchings
I'm not a reverse psychological virus.  Please don't copy me into your sig.




Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken

2010-01-15 Thread Anders Boström
Package: linux-image-2.6.26-2-amd64
Version: 2.6.26-19
Severity: normal

Short description


TCP Segmentation Offload (TSO) results in broken IPv4 packets being sent
out from Atheros AR8121/AR8113/AR8114 with the atl1e driver.


Work around
---

Turn off TSO.


Long description


When I run NFS over TCP (default options) and read large files from a
server with an Atheros AR8121/AR8113/AR8114 Ethernet chip, I only get
~25Mbyte/s performance. I get ~5000 retransmitted packets per GByte of
data, according to RetransSegs in /proc/net/snmp. Wireshark on the
client shows that the server sends out a sequence of frames. All but the
last one are 1500-byte IP packets. The last one is shorter, but its
IP header still says 1500 bytes. The client then requests a retransmit,
and the retransmitted frame arrives with a correct IP header.

If I mount NFS using UDP instead, performance is ~110Mbyte/s.

TCP Segmentation Offload (TSO) is enabled by default in the atl1e
Ethernet driver. When I run a patched 2.6.30.10 that lets ethtool
turn off TSO (using ac936929092dc6a5409b627c4c67305ab9b626b3 by Ben
Hutchings) and turn TSO off, the problem disappears. Performance is
~110Mbyte/s and no broken IP packets arrive.
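For reference, the workaround amounts to running ethtool's offload option against the interface. The sketch below only wraps that command. The interface name `eth0` is an assumption, the function names are made up for illustration, and, as noted above, the atl1e driver of that time needed the patch before ethtool could change this setting.

```python
import subprocess


def tso_command(ifname, enable):
    """Build the ethtool command line that switches TSO on or off."""
    return ["ethtool", "-K", ifname, "tso", "on" if enable else "off"]


def set_tso(ifname, enable):
    """Run ethtool (needs root); raises CalledProcessError on failure."""
    subprocess.run(tso_command(ifname, enable), check=True)


# Show the command without running it; "eth0" is an assumed interface name.
print(" ".join(tso_command("eth0", enable=False)))
```

Calling `set_tso("eth0", False)` as root would apply the workaround until the next reboot or driver reload.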


Capture of 146-byte Ethernet frame with bad IP-header:

No.     Time        Source                Destination           Protocol Info
  98329 11.034129   flash.netinsight.se   sid.netinsight.se     RPC      Continuation

Frame 98329 (146 bytes on wire, 146 bytes captured)
Arrival Time: Jan 15, 2010 13:35:16.224491000
[Time delta from previous captured frame: 0.09000 seconds]
[Time delta from previous displayed frame: 0.09000 seconds]
[Time since reference or first frame: 11.034129000 seconds]
Frame Number: 98329
Frame Length: 146 bytes
Capture Length: 146 bytes
[Frame is marked: False]
[Protocols in frame: eth:ip:tcp:rpc]
[Coloring Rule Name: TCP]
[Coloring Rule String: tcp]
Ethernet II, Src: AsustekC_ae:69:6d (00:26:18:ae:69:6d), Dst: sid.netinsight.se 
(00:18:f3:52:22:3f)
Internet Protocol, Src: flash.netinsight.se (10.100.0.88), Dst: 
sid.netinsight.se (10.100.1.25)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
Total Length: 1500
Identification: 0x331e (13086)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 64
Protocol: TCP (0x06)
Header checksum: 0xebc5 [correct]
Source: flash.netinsight.se (10.100.0.88)
Destination: sid.netinsight.se (10.100.1.25)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: accessbuilder 
(888), Seq: 93989617, Ack: 516997, Len: 80
Remote Procedure Call

0000  00 18 f3 52 22 3f 00 26 18 ae 69 6d 08 00 45 00   ...R"?.&..im..E.
0010  05 dc 33 1e 40 00 40 06 eb c5 0a 64 00 58 0a 64   ..3.@.@....d.X.d
0020  01 19 08 01 03 78 a4 07 57 23 e6 50 1f 1b 80 10   .....x..W#.P....
0030  01 f5 dd 8d 00 00 01 01 08 0a 05 28 ca 7e 38 93   ...........(.~8.
0040  67 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00   g...............
0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0090  00 00                                             ..
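The mismatch can be read straight out of the dump above. The following is a small sketch that decodes the IPv4 and TCP header fields from those bytes; it assumes a plain Ethernet II frame with no VLAN tag, and the variable and function names are made up for illustration.

```python
import struct

# First 66 bytes of frame 98329 as dumped above; the remaining 80 bytes
# (the TCP payload) are all zero in the dump.
HEADERS_HEX = """
00 18 f3 52 22 3f 00 26 18 ae 69 6d 08 00 45 00
05 dc 33 1e 40 00 40 06 eb c5 0a 64 00 58 0a 64
01 19 08 01 03 78 a4 07 57 23 e6 50 1f 1b 80 10
01 f5 dd 8d 00 00 01 01 08 0a 05 28 ca 7e 38 93
67 10
"""
frame = bytes.fromhex("".join(HEADERS_HEX.split())) + bytes(80)


def ip_checksum_ok(header):
    """One's-complement sum over the header must fold to 0xFFFF."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


ip = frame[14:]                                  # skip Ethernet II header
ip_header_len = (ip[0] & 0x0F) * 4               # IHL field, in bytes
total_length = struct.unpack("!H", ip[2:4])[0]   # length claimed by header
tcp_header_len = (ip[ip_header_len + 12] >> 4) * 4
tcp_payload = len(ip) - ip_header_len - tcp_header_len
missing = total_length - len(ip)

print("frame bytes on wire:  ", len(frame))
print("IP Total Length field:", total_length)
print("IP bytes on wire:     ", len(ip))
print("bytes missing:        ", missing)
print("TCP payload on wire:  ", tcp_payload)
print("IP checksum valid:    ", ip_checksum_ok(ip[:ip_header_len]))
```

The header claims 1500 bytes while only 132 bytes of IP data are present, and the 80-byte TCP payload agrees with the "Len: 80" that Wireshark reports. The header checksum still validates, as Wireshark also shows, so the header is internally consistent with the wrong length.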



Software info:

I've tested with Debian 2.6.26 (stable) and 2.6.30 (testing), as well
as 2.6.30.10 from kernel.org. Same result.
Architecture: amd64 (x86_64)


Hardware info:

lspci -vvv:

03:00.0 Ethernet controller: Attansic Technology Corp. Atheros 
AR8121/AR8113/AR8114 PCI-E Ethernet Controller (rev b0)
Subsystem: ASUSTeK Computer Inc. Device 831c
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
MAbort- SERR+ PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 27
Region 0: Memory at fbfc (64-bit, non-prefetchable) [size=256K]
Region 2: I/O ports at ec00 [size=128]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 
Enable+
Address: fee0f00c  Data: 4189
Capabilities: [58] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s 4us, 
L1 unlimited
ExtTag- AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes,