Hi everyone,

I'm getting e1000 tx unit hangs. After a lot of digging, I think I know why
they are occurring.

In very rare circumstances, TCP seems to send packets with gso_size=16. For
these, e1000_xmit_frame computes max_per_txd=64. The headers are 66 bytes, so
they get split across two descriptors. Intel's open source programmer's manual
says this is not supported (see section 3.5.6), so my guess is that this is the
cause of my hangs.
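
To make the arithmetic concrete, here is a minimal standalone sketch. It
assumes the driver clamps the per-descriptor limit to 4 * MSS (which matches
the max_per_txd=64 I see for gso_size=16) and that the default limit is 4096
bytes. DEFAULT_MAX_PER_TXD, tso_max_per_txd and header_descriptors are names
made up for this illustration, not driver symbols.

```c
#include <assert.h>

/* Assumed default per-descriptor byte limit; illustrative only. */
#define DEFAULT_MAX_PER_TXD 4096u

/* Per-descriptor limit for a TSO packet, assuming a clamp of 4 * MSS. */
static unsigned int tso_max_per_txd(unsigned int mss)
{
    unsigned int limit = mss << 2;

    return limit < DEFAULT_MAX_PER_TXD ? limit : DEFAULT_MAX_PER_TXD;
}

/* Number of descriptors the first hdr_len bytes occupy (ceiling division). */
static unsigned int header_descriptors(unsigned int hdr_len,
                                       unsigned int max_per_txd)
{
    assert(max_per_txd != 0);
    return (hdr_len + max_per_txd - 1) / max_per_txd;
}
```

With mss = 16 this gives a limit of 64 bytes, so 66 bytes of headers (14
Ethernet + 20 IPv4 + 32 TCP with options) need 2 descriptors. With a more
typical MSS such as 1448 the limit stays at the assumed 4096 and the headers
fit in one descriptor.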

The dump below shows the suspected trouble. Descriptor 0x0BF is the context
descriptor and sets MSS=0010h. The next two descriptors carry the packet's
headers: 0x0C0 has the first 64 bytes, and 0x0C1 (6 bytes long) starts with the
last 2 bytes of the headers. This is the only packet in the ring that sets TSE.

0000:04:00.1: eth1: Detected Tx Unit Hang:
TDH                  <c0>
TDT                  <ab>
next_to_use          <ab>
next_to_clean        <bf>
buffer_info[next_to_clean]:
time_stamp           <105e55050>
next_to_watch        <c3>
jiffies              <105e5509e>
next_to_watch.status <0>

Tc[0x0BF]    000032220021180E 0010420027000020 0000000000000000 0021   C3 
0000000105E55050 (null) NTC
Td[0x0C0]    000000003062207E 0000030026100040 000000003062207E 0040   C0 
0000000105E55050 (null)
buffer at 000000003062207E (64)
0000: 00 00 0c 07 ac 00 00 15 17 64 af 99 08 00 45 00 
0010: 00 00 29 76 40 00 40 06 00 00 ac 12 2f eb ce c8 
0020: fb 92 67 67 b8 f6 74 4e 49 cc 25 d2 38 8a 80 18 
0030: 00 06 a6 5f 00 00 01 01 08 0a 05 e5 50 50 04 bf 
Td[0x0C1]    00000000306220BE 0000030026100006 00000000306220BE 0006   C1 
0000000105E55050 (null)
buffer at 00000000306220BE (6)
0000: 9e c1 00 0e 53 44                               
Td[0x0C2]    00000000306A5677 0000030026100018 00000000306A5677 0018   C2 
0000000105E55050 (null)
buffer at 00000000306A5677 (24)
0000: 05 a1 a8 22 00 00 00 00 04 77 da 03 00 0e 53 44 
0010: 05 a2 11 16 00 00 00 00                         
Td[0x0C3]    00000000306A568F 00000300AF100004 00000000306A568F 0004   C3 
0000000105E55050 ffff88004da8fe80
buffer at 00000000306A568F (4)
0000: 04 85 1b cf                                     
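
For anyone who wants to double-check my reading of the context descriptor,
here is a small decoder sketch. It assumes the first two hex quadwords printed
after Tc[...] are the raw descriptor in host order, and that the field layout
follows the driver's struct e1000_context_desc as I understand it. The
ctx_fields struct, decode_ctx and CTX_CMD_TSE are names I made up for this
example.

```c
#include <stdint.h>

/* TSE bit within the 8-bit command field (assumed position). */
#define CTX_CMD_TSE 0x04u

/* Decoded fields of a TCP/IP context descriptor (assumed legacy layout). */
struct ctx_fields {
    uint8_t  ipcss;    /* IP checksum start offset */
    uint8_t  ipcso;    /* IP checksum field offset */
    uint16_t ipcse;    /* IP checksum end offset */
    uint8_t  tucss;    /* TCP/UDP checksum start offset */
    uint8_t  tucso;    /* TCP/UDP checksum field offset */
    uint16_t tucse;    /* TCP/UDP checksum end offset (0 = end of packet) */
    uint32_t paylen;   /* TSO payload length, bits 19:0 of quadword 1 */
    uint8_t  cmd;      /* command byte, bits 31:24 of quadword 1 */
    uint8_t  status;   /* status byte, bits 39:32 of quadword 1 */
    uint8_t  hdr_len;  /* total header length, bits 47:40 of quadword 1 */
    uint16_t mss;      /* maximum segment size, bits 63:48 of quadword 1 */
};

/* Split the two 64-bit words of a context descriptor into its fields. */
static struct ctx_fields decode_ctx(uint64_t q0, uint64_t q1)
{
    struct ctx_fields f;

    f.ipcss   = (uint8_t)(q0 & 0xff);
    f.ipcso   = (uint8_t)((q0 >> 8) & 0xff);
    f.ipcse   = (uint16_t)((q0 >> 16) & 0xffff);
    f.tucss   = (uint8_t)((q0 >> 32) & 0xff);
    f.tucso   = (uint8_t)((q0 >> 40) & 0xff);
    f.tucse   = (uint16_t)((q0 >> 48) & 0xffff);
    f.paylen  = (uint32_t)(q1 & 0xfffff);
    f.cmd     = (uint8_t)((q1 >> 24) & 0xff);
    f.status  = (uint8_t)((q1 >> 32) & 0xff);
    f.hdr_len = (uint8_t)((q1 >> 40) & 0xff);
    f.mss     = (uint16_t)((q1 >> 48) & 0xffff);
    return f;
}
```

Applied to Tc[0x0BF] (000032220021180E and 0010420027000020), this gives
MSS=16, a header length of 66, a payload length of 32 and a command byte of
0x27, which has the TSE bit set. The payload length matches the 4 + 24 + 4
bytes that follow the headers in descriptors 0x0C1 through 0x0C3.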

Environment:
# ethtool -i eth1
driver: e1000e
version: 0.3.3.3-k6
firmware-version: 1.0-0
bus-info: 0000:04:00.1
# lspci | grep 04:00.1
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet 
Controller (Copper) (rev 01)
# uname -srvmo
Linux 2.6.29.4 #1 SMP Tue Aug 10 18:20:22 EDT 2010 x86_64 GNU/Linux


This hang has occurred twice with the dump code patched into the kernel, and
both times the dump showed the same split header descriptor with MSS=16.

Is my diagnosis correct? Is the split header the cause of the hang?

Moreover, does the hardware actually support sending TSO packets with MSS=16,
or even MSS=1 (assuming the headers are not split across two descriptors)?

Can this be fixed by ensuring the headers for each packet are placed into a 
single descriptor? Or does TCP need to be fixed to not send gso_size=16?


Regards,
-Bob

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
