Hi everyone,

I'm getting e1000 Tx unit hangs. I've done a lot of digging, and I think I know why they are occurring.
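
The diagnosis that follows rests on a little arithmetic that can be checked outside the kernel. The Python sketch below is a rough model, not driver code. It assumes the per-descriptor cap is derived as min(gso_size << 2, limit), which matches the max_per_txd=64 reported for gso_size=16. It also assumes the second quadword of the context descriptor packs PAYLEN in bits 19:0, the command byte in bits 31:24, HDRLEN in bits 47:40, and MSS in bits 63:48. The function names and the 4096-byte limit are hypothetical, chosen only for illustration. The 70-byte linear length is the sum of the two contiguous buffers (64 + 6) shown in the dump.

```python
def max_per_txd(gso_size, limit=4096):
    """Assumed cap on bytes per data descriptor; `limit` is a hypothetical default."""
    return min(gso_size << 2, limit)


def split_linear(linear_len, cap):
    """Split one contiguous buffer into descriptor-sized chunks of at most `cap` bytes."""
    sizes = []
    remaining = linear_len
    while remaining > 0:
        size = min(remaining, cap)
        sizes.append(size)
        remaining -= size
    return sizes


def header_descriptor_count(sizes, hdr_len):
    """Count how many leading descriptors carry at least one header byte."""
    count = 0
    covered = 0
    for size in sizes:
        if covered >= hdr_len:
            break
        covered += size
        count += 1
    return count


def decode_context_qword1(qword):
    """Unpack the second quadword of a context descriptor (assumed field layout)."""
    return {
        "paylen": qword & 0xFFFFF,        # bits 19:0, payload length
        "tucmd": (qword >> 24) & 0xFF,    # bits 31:24, command byte
        "hdr_len": (qword >> 40) & 0xFF,  # bits 47:40, header length
        "mss": (qword >> 48) & 0xFFFF,    # bits 63:48, segment size
    }


# Values taken from the dump: Tc[0x0BF] second quadword, 66-byte headers,
# and a 70-byte linear region (Td[0x0C0] + Td[0x0C1]).
ctx = decode_context_qword1(0x0010420027000020)
chunks = split_linear(70, max_per_txd(ctx["mss"]))
print(ctx)
print(chunks, header_descriptor_count(chunks, ctx["hdr_len"]))
```

With the values from the dump, the model places the 66 header bytes in two descriptors, and the context descriptor decodes to HDRLEN=66, MSS=16, PAYLEN=32. That payload length agrees with the 98 bytes of buffers (64 + 6 + 24 + 4) minus the 66 header bytes.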

It seems that, in very rare circumstances, TCP sends packets with gso_size=16. For such a packet, e1000_xmit_frame computes max_per_txd=64. The headers are 66 bytes, so they get split across two descriptors. Intel's open source programmer's manual says this is not supported (see section 3.5.6), so my guess is that this is the cause of my hangs.

The dump below shows the suspected trouble. Descriptor 0x0BF sets MSS=0010h, and the following two descriptors hold the headers for the packet: 0x0C0 has the first 64 bytes, and 0x0C1 has the last 2 bytes of the headers. This is the only packet in the ring that sets TSE.

  0000:04:00.1: eth1: Detected Tx Unit Hang:
    TDH                  <c0>
    TDT                  <ab>
    next_to_use          <ab>
    next_to_clean        <bf>
  buffer_info[next_to_clean]:
    time_stamp           <105e55050>
    next_to_watch        <c3>
    jiffies              <105e5509e>
    next_to_watch.status <0>

  Tc[0x0BF] 000032220021180E 0010420027000020 0000000000000000 0021 C3 0000000105E55050 (null) NTC

  Td[0x0C0] 000000003062207E 0000030026100040 000000003062207E 0040 C0 0000000105E55050 (null)
  buffer at 000000003062207E (64)
  0000: 00 00 0c 07 ac 00 00 15 17 64 af 99 08 00 45 00
  0010: 00 00 29 76 40 00 40 06 00 00 ac 12 2f eb ce c8
  0020: fb 92 67 67 b8 f6 74 4e 49 cc 25 d2 38 8a 80 18
  0030: 00 06 a6 5f 00 00 01 01 08 0a 05 e5 50 50 04 bf

  Td[0x0C1] 00000000306220BE 0000030026100006 00000000306220BE 0006 C1 0000000105E55050 (null)
  buffer at 00000000306220BE (6)
  0000: 9e c1 00 0e 53 44

  Td[0x0C2] 00000000306A5677 0000030026100018 00000000306A5677 0018 C2 0000000105E55050 (null)
  buffer at 00000000306A5677 (24)
  0000: 05 a1 a8 22 00 00 00 00 04 77 da 03 00 0e 53 44
  0010: 05 a2 11 16 00 00 00 00

  Td[0x0C3] 00000000306A568F 00000300AF100004 00000000306A568F 0004 C3 0000000105E55050 ffff88004da8fe80
  buffer at 00000000306A568F (4)
  0000: 04 85 1b cf

Environment:

  # ethtool -i eth1
  driver: e1000e
  version: 0.3.3.3-k6
  firmware-version: 1.0-0
  bus-info: 0000:04:00.1

  # lspci | grep 04:00.1
  04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)

  # uname -srvmo
  Linux 2.6.29.4 #1 SMP Tue Aug 10 18:20:22 EDT 2010 x86_64 GNU/Linux

This hang has occurred twice with the dump code patched into the kernel, and both times the dump showed the same split header descriptor with MSS=16.

Is my diagnosis correct? Is the split header the cause of the hang? Moreover, does the hardware actually support sending TSO packets with MSS=16, or even MSS=1 (assuming the headers are not split across two descriptors)?

Can this be fixed by ensuring that the headers for each packet are placed into a single descriptor? Or does TCP need to be fixed so that it does not send gso_size=16?

Regards,
-Bob

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired