Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> Bjorn, I was thinking yesterday and thought if you could have sent me the wrong dump file, may be! Can you check confirm that is not the case?

I'm pretty sure it's the right log. You can even see when the USB keyboard was connected and the driver reloaded at 16:36:

[94258.692165] usb 1-3.1: new low speed USB device using ehci_hcd and address 3
[94258.796645] usb 1-3.1: New USB device found, idVendor=03f0, idProduct=0024
[94258.799573] usb 1-3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[94258.802513] usb 1-3.1: Product: HP Basic USB Keyboard
[94258.805482] usb 1-3.1: Manufacturer: CHICONY
[94258.992191] input: CHICONY HP Basic USB Keyboard as /devices/pci:00/:00:1d.7/usb1/1-3/1-3.1/1-3.1:1.0/input/input3
[94258.995591] generic-usb 0003:03F0:0024.0001: input,hidraw0: USB HID v1.10 Keyboard [CHICONY HP Basic USB Keyboard] on usb-:00:1d.7-3.1/input0
[94259.002616] usbcore: registered new interface driver usbhid
[94259.006046] usbhid: USB HID core driver
[94352.064182] e1000 :07:08.0: PCI INT A disabled
[94352.496202] e1000 :06:07.0: PCI INT A disabled
[94352.568084] e1000 :03:0b.1: PCI INT B disabled
[94352.632079] e1000 :03:0b.0: PCI INT A disabled
[94390.926335] Intel(R) PRO/1000 Network Driver - version 8.0.30-NAPI_debug
[94390.930272] Copyright (c) 1999-2010 Intel Corporation.
[94390.934269] e1000 :03:0b.0: PCI INT A - GSI 37 (level, low) - IRQ 37
[94391.209054] e1000: :03:0b.0: e1000_probe: (PCI-X:133MHz:64-bit) 00:1b:21:5d:e4:10
[94391.249271] [ cut here ]
[94391.253330] WARNING: at /usr/src/linux-headers-2.6.37-1-common/include/linux/netdevice.h:1557 netif_tx_stop_queue+0x24/0x40 [e1000]()
[94391.261724] Hardware name: PowerEdge 1850
[94391.266092] Modules linked in: e1000(+) usbhid hid binfmt_misc fuse ipmi_si ipmi_devintf ipmi_msghandler ide_generic ide_gd_mod ide_cd_mod ide_core r 2c_algo_bit i2c_core power_supply e752x_edac tpm_tis dcdbas edac_core video tpm shpchp tpm_bios processor thermal_sys psmouse evdev output pci_hotplug r w ext3 jbd mbcache sd_mod crc_t10dif sg sr_mod cdrom ata_generic ata_piix libata mptspi mptscsih uhci_hcd mptbase floppy scsi_transport_spi scsi_mod ehc nloaded: e1000]
[94391.290790] Pid: 22748, comm: insmod Tainted: GW 2.6.37-1-amd64 #1
[94391.295916] Call Trace:
[94391.301049] [81046ed4] ? warn_slowpath_common+0x78/0x8c

> I have confirmed that the ignore_64bit_dma is working on my system.

And as far as I can tell it doesn't work here. :-(

I have considered modifying the driver code, adding logging of the parameter at the time of the hangs. But I haven't had time to do that yet.

--
Björn

___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> # rmmod e1000
> # insmod ./e1000.ko ignore_64bit_dma=1
> # cat /sys/module/e1000/parameters/ignore_64bit_dma

Well, now the /sys variable finally says 1, but the hangs are still occurring.

Here is a kernel log showing Tx hangs (with 64-bit DMA on), reloading of the module at 16:39 and then more Tx hangs (with 64-bit DMA off):
http://bjorn.haxx.se/e1000/64bit-dma-disabled.log.bz2

--
Björn
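The reload-and-check sequence quoted above can be wrapped in a small helper. This is only a sketch: the function name `param_is` and its optional sysfs-root argument (there so the check can be tried against a scratch directory) are illustrative and not part of the driver or its README. It confirms only what value the module reports, not that the DMA mask actually changed.

```shell
#!/bin/sh
# Sketch: confirm that a module parameter has the expected value.
# param_is MODULE PARAM EXPECTED [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys; it is an illustrative extra argument.
param_is() {
    file="${4:-/sys}/module/$1/parameters/$2"
    if [ ! -r "$file" ]; then
        echo "cannot read $file (module not loaded?)" >&2
        return 2
    fi
    actual=$(cat "$file")
    if [ "$actual" = "$3" ]; then
        echo "$1.$2 = $actual (as expected)"
    else
        echo "$1.$2 = $actual (expected $3)" >&2
        return 1
    fi
}

# Typical use, as root, after reloading the module with the parameter set:
#   rmmod e1000
#   insmod ./e1000.ko ignore_64bit_dma=1
#   param_is e1000 ignore_64bit_dma 1
```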
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Brandeburg, Jesse wrote:
> I believe it has to be done at load time, there is no code to re-register its dma masks, and the registration is done early in probe.

I am unable to get the ignore_64bit_dma parameter to take at boot. I have the following line in /etc/modprobe.conf:

options e1000 ignore_64bit_dma=1

But after rebooting, the parameter is still 0:

# cat /sys/module/e1000/parameters/ignore_64bit_dma
0

The following dmesg lines look like they might be related:

[1.473278] [ cut here ]
[1.473355] WARNING: at /usr/src/linux-headers-2.6.37-1-common/include/linux/netdevice.h:1557 netif_tx_stop_queue+0x24/0x40 [e1000]()
[1.473435] Hardware name: PowerEdge 1850
[1.473491] Modules linked in: mptspi(+) mptscsih uhci_hcd(+) mptbase floppy scsi_transport_spi scsi_mod e1000(+) ehci_hcd usbcore nls_base
[1.474029] Pid: 137, comm: modprobe Not tainted 2.6.37-1-amd64 #1
[1.474088] Call Trace:
[1.474150] [81046ed4] ? warn_slowpath_common+0x78/0x8c
[1.474220] [a018e924] ? netif_tx_stop_queue+0x24/0x40 [e1000]
[1.474292] [a01a931c] ? e1000_probe+0x9fc/0xb2d [e1000]
[1.474355] [811ad232] ? local_pci_probe+0x49/0x92
[1.474416] [811adf39] ? pci_device_probe+0xc2/0xef
[1.474478] [8123439e] ? driver_sysfs_add+0x66/0x8d
[1.474538] [812344df] ? driver_probe_device+0xa8/0x138
[1.474599] [812345be] ? __driver_attach+0x4f/0x6f
[1.474659] [8123456f] ? __driver_attach+0x0/0x6f
[1.474719] [81233b68] ? bus_for_each_dev+0x44/0x78
[1.474779] [81233fc0] ? bus_add_driver+0xa8/0x1f0
[1.474839] [81234865] ? driver_register+0x90/0xf8
[1.474901] [811ae183] ? __pci_register_driver+0x4e/0xc0
[1.474969] [a01b7000] ? e1000_init_module+0x0/0x82 [e1000]
[1.475037] [a01b704c] ? e1000_init_module+0x4c/0x82 [e1000]
[1.475099] [81002079] ? do_one_initcall+0x78/0x131
[1.475161] [8107588c] ? sys_init_module+0x97/0x1d3
[1.475223] [81033073] ? ia32_sysret+0x0/0x5
[1.475281] ---[ end trace 5be424714a9ab14e ]---
[1.475338] netif_stop_queue() cannot be called before register_netdev()
[1.476285] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[1.476441] e1000 :03:0b.1: PCI INT B - GSI 38 (level, low) - IRQ 38
[1.785286] [ cut here ]
[1.785361] WARNING: at /usr/src/linux-headers-2.6.37-1-common/include/linux/netdevice.h:1557 netif_tx_stop_queue+0x24/0x40 [e1000]()
[1.785441] Hardware name: PowerEdge 1850
[1.785496] Modules linked in: sg sr_mod cdrom ata_generic ata_piix libata mptspi(+) mptscsih uhci_hcd mptbase floppy scsi_transport_spi scsi_mod e1000(+) ehci_hcd usbcore nls_base
[1.786286] Pid: 137, comm: modprobe Tainted: GW 2.6.37-1-amd64 #1
[1.799486] Call Trace:
[1.799549] [81046ed4] ? warn_slowpath_common+0x78/0x8c
[1.799620] [a018e924] ? netif_tx_stop_queue+0x24/0x40 [e1000]
[1.799692] [a01a931c] ? e1000_probe+0x9fc/0xb2d [e1000]
[1.799755] [811ad232] ? local_pci_probe+0x49/0x92
[1.799817] [811adf39] ? pci_device_probe+0xc2/0xef
[1.799879] [8123439e] ? driver_sysfs_add+0x66/0x8d
[1.799940] [812344df] ? driver_probe_device+0xa8/0x138
[1.81] [812345be] ? __driver_attach+0x4f/0x6f
[1.800074] [8123456f] ? __driver_attach+0x0/0x6f
[1.800135] [81233b68] ? bus_for_each_dev+0x44/0x78
[1.800196] [81233fc0] ? bus_add_driver+0xa8/0x1f0
[1.800257] [81234865] ? driver_register+0x90/0xf8
[1.800318] [811ae183] ? __pci_register_driver+0x4e/0xc0
[1.800391] [a01b7000] ? e1000_init_module+0x0/0x82 [e1000]
[1.800463] [a01b704c] ? e1000_init_module+0x4c/0x82 [e1000]
[1.800526] [81002079] ? do_one_initcall+0x78/0x131
[1.800587] [8107588c] ? sys_init_module+0x97/0x1d3
[1.800648] [81033073] ? ia32_sysret+0x0/0x5
[1.800707] ---[ end trace 5be424714a9ab14f ]---
[1.800764] netif_stop_queue() cannot be called before register_netdev()
[1.801749] e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
[1.801913] e1000 :06:07.0: PCI INT A - GSI 64 (level, low) - IRQ 64
[1.861060] e1000: :06:07.0: e1000_probe: (PCI:66MHz:32-bit) 00:14:22:16:b6:7b
[2.110075] [ cut here ]
[2.110147] WARNING: at /usr/src/linux-headers-2.6.37-1-common/include/linux/netdevice.h:1557 netif_tx_stop_queue+0x24/0x40 [e1000]()
[2.110226] Hardware name: PowerEdge 1850
[2.110281] Modules linked in: sg sr_mod cdrom ata_generic ata_piix libata mptspi(+) mptscsih uhci_hcd mptbase floppy scsi_transport_spi scsi_mod e1000(+) ehci_hcd usbcore nls_base
[2.111064] Pid: 137, comm: modprobe Tainted: GW
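The timestamps in the trace above (e1000 probing about 1.5 s into boot, from a modprobe process) suggest the module may be loaded from the initramfs. If so, an option set on the root filesystem would only take effect if it is also present in the initramfs. That is an assumption, not something confirmed in this thread. On Debian, one common approach is to place the option in a file under /etc/modprobe.d/ and regenerate the initramfs. A minimal sketch, in which the file name `e1000.conf` and the helper name are illustrative:

```shell
#!/bin/sh
# Sketch: persist an e1000 module option so it is applied at load time.
# write_e1000_option [DIR]
# DIR defaults to /etc/modprobe.d; the file name e1000.conf is illustrative.
write_e1000_option() {
    dir="${1:-/etc/modprobe.d}"
    printf 'options e1000 ignore_64bit_dma=1\n' > "$dir/e1000.conf"
}

# Typical use on Debian, as root:
#   write_e1000_option
#   update-initramfs -u   # rebuild the initramfs so early module loads see the option
#   reboot
#   cat /sys/module/e1000/parameters/ignore_64bit_dma
```

If the parameter still reads 0 after this, the module is probably being loaded by some other path and the assumption above does not hold.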
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> Can you disable tso?

Disabled. I'll get back with results later.

> If you already doing ignore_64bit_dma=1 then , we should make sure the w/o is working correctly. Please make sure that it's sets correctly by
> #cat /sys/module/e1000/parameters/ignore_64bit_dma
> It should return 1.

It does. I have had 'ignore_64bit_dma' enabled for the last month. However, this was enabled using

echo 1 > /sys/module/e1000/parameters/ignore_64bit_dma

rather than passing the parameter at module load time. I understood the driver README to mean that this should have the same effect?

--
Björn
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> Can you please disable gso? (#ethtool -K ethx gso off)

It didn't make any obvious difference. Here's a log of 21 hangs with gso off:
http://bjorn.haxx.se/e1000/gso-off-kern.log.bz2

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

--
Björn
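When switching offloads off one at a time, it helps to see at a glance which ones are still enabled (in the output above, tcp-segmentation-offload is still on while gso is off). A small filter along these lines would do it. It is a sketch: the function name is illustrative, and it assumes the "name: on/off" layout shown above.

```shell
#!/bin/sh
# Sketch: print the names of offload features reported as "on".
# Reads `ethtool -k <iface>` output on stdin.
enabled_offloads() {
    awk -F': ' 'NF == 2 && $2 ~ /^on/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Typical use:
#   ethtool -k eth0 | enabled_offloads
```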
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> Also I noticed that you are running debian kernel linux-image-2.6.37-1-amd64. This is an unstable debian kernel. Have you seen the tx hang with latest stable debian release (Squeeze) which has 2.6.32 kernel?

Yes I have. I normally run the 'testing' kernel on this machine. I only upgraded to the unstable kernel because I saw the subject patch and hoped it would help my system too.

> So far, I have not see tx hang with my setup.

Do you want to trade? :-)

I have uploaded a new log for your viewing pleasure, with debug data from 352 hangs over the past week:
http://bjorn.haxx.se/e1000/kern.log.bz2

Let me know if there is anything else I can do.

--
Björn
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Björn Stenberg wrote:
> Yes I have. I normally run the 'testing' kernel on this machine. I only upgraded to the unstable kernel because I saw the subject patch and hoped it would help my system too.

I grepped some logs to see how long I've been having this problem, and found IRC discussions mentioning it in September 2009. If I'm reading the Debian release notes right, at that time I was most likely running 2.6.30.

It is perhaps also worth noting that this machine moved to a different datacenter in the spring of 2010. So it got the same issues in two different network environments (different sites, different ISPs).

--
Björn
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> Sorry for this to taking so long in coming, however I have not been able to reproduced the issue locally in lab. Are you running any tools or application that helps reproducing this problem? How long it takes this issue to occurs?

The machine is running a number of network services, but nothing exotic: rsync, svn, git, smtp, imap, ftp, ssh, cvs, http. The vast majority of the traffic (~1 TB/month) is http. I have not seen any correlation between a specific service and the issue occurring.

The issue seems to take something like 10-12 hours after reboot to occur. After that it's quite sporadic, sometimes many hours between hangs, sometimes seconds. See earlier in this thread for logs.

--
Björn
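To put numbers on "sometimes many hours between hangs, sometimes seconds", the "Detected Tx Unit Hang" lines in a kernel log can be grouped into bursts. The following is a sketch under stated assumptions: each log line contains a bracketed uptime timestamp in seconds, and messages no more than 10 seconds apart belong to the same burst. The threshold and the function name are arbitrary choices, not taken from the driver.

```shell
#!/bin/sh
# Sketch: group "Detected Tx Unit Hang" messages into bursts.
# Reads kernel log lines on stdin; each line must contain "[seconds.fraction]".
# hang_bursts [GAP]   GAP = max seconds between messages of one burst (default 10)
hang_bursts() {
    awk -v gap="${1:-10}" '
    /Detected Tx Unit Hang/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
        # Extract the uptime timestamp from between the brackets.
        t = substr($0, RSTART + 1, RLENGTH - 2) + 0
        if (n > 0 && t - last > gap) {
            printf "burst start=%ds count=%d\n", start, n
            n = 0
        }
        if (n == 0) start = t
        n++
        last = t
    }
    END { if (n > 0) printf "burst start=%ds count=%d\n", start, n }'
}

# Typical use:
#   dmesg | hang_bursts
#   bzcat kern.log.bz2 | hang_bursts 30
```

Each output line gives the uptime (in whole seconds) at which a burst began and how many hang messages it contained, which makes the gaps between bursts easy to read off.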
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> At this point I think a bus trace can be very helpful. We will be trying to repro this issue locally. If we have successful repro I can capture bus trace and look into more details of the cause.

Yesterday I moved the network cables to the internal ports, which are connected via the PCI bus rather than the PCI-X bus. Unfortunately this did not improve the situation. I still get Tx hangs.

Let me know if there is anything more I can do to help.

--
Björn
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
Dave, Tushar N wrote:
> While I continue analyzing the dump, can you try setting pci=nommconf and boot the kernel to see if it makes any difference.

Tested now. I detect no difference. Kernel log available here:
http://bjorn.haxx.se/e1000/nommconf-kern.log.bz2

> Also this could be an issue with 64bit system only if you have 32 bit kernel live cd it's worth trying it.

Unfortunately this machine lives in a remote data center so I have limited physical access to it.

It certainly seems to be a 64-bit issue. It started happening when increasing RAM from 2 to 6 GB, the same as others have reported.

I have also tried setting /sys/module/e1000/parameters/ignore_64bit_dma to 1 but it makes no difference.

--
Björn
Re: [E1000-devel] e1000: fix Tx hangs by disabling 64-bit DMA
[Apologies for the top-post and huge quote. I'm including everything in this first list post to provide full history.]

While I don't have a recipe for reproducing the hang, I get it quite frequently:

# dmesg | grep "Tx Unit Hang"
[41236.796234] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[41238.796456] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[41240.796175] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[41242.796146] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[62224.808317] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[62226.808288] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[62228.808257] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64685.784121] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64687.784089] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64689.784309] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64691.784283] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64733.784322] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64735.784295] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[64737.784264] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131620.784217] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131622.784186] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131624.784159] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131626.784135] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131628.784103] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[131630.784323] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156240.784131] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156242.784106] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156244.784073] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156246.784295] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156325.796095] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156327.796317] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156329.796288] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156331.796259] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[156333.796231] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[165260.820271] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[165262.820242] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[165264.820215] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[165266.820182] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[195756.784165] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[195758.784142] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[195760.784113] e1000 :03:0b.1: eth2: Detected Tx Unit Hang
[195762.784329] e1000 :03:0b.1: eth2: Detected Tx Unit Hang

I am running _with_ the below patch and still get the issue. Presumably because my NIC is on the PCI-X bus.

I don't have flow control configured, but the driver reports RX flow control each time the link comes back up:

e1000: eth2 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX

Thanks,
/Björn

Brandeburg, Jesse wrote:
> Hi Bjorn,
>
> We haven't had any reports of reproducible tx hangs on PCI-X slots. We can hardly reproduce the issue on the PCI adapters (read we've had a few reports, but most everyone works fine, and *all* our test machines are fine)
>
> We can try to repro this issue here, were you able to verify that the patch mentioned below fixes your issue or do you need a test patch?
>
> tx hangs at 100Mb are kind of surprising. Do you happen to have flow control enabled?
>
> I'd prefer in the future if you could keep this stuff on e1000-devel as it allows others to contribute as well.
>
> Jesse
>
> -----Original Message-----
> From: Björn Stenberg [mailto:bj...@haxx.se]
> Sent: Tuesday, February 22, 2011 4:54 AM
> To: Brandeburg, Jesse
> Subject: e1000: fix Tx hangs by disabling 64-bit DMA
>
> Hi.
>
> I have a Dell PowerEdge 1850 system with a PCI-X add-on network where this bug keeps occurring even after upgrading to kernel 2.6.37-1. I noticed the commit message said DMA was only disabled in PCI, not PCI-X mode. Can I ask why?
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commitdiff;h=e508be174ad36b0cf9b324cd04978c2b13c21502
>
> The add-on PCI-X card was added in an attempt to troubleshoot this bug. Should I go back to using the internal interfaces and avoid the 6700PXH bridge?
>
> Is there any information I can provide to help investigate this issue?
>
> I am using the debian kernel linux-image-2.6.37-1-amd64. The machine has 6GB ram. Here is my lspci output:
>
> 00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
> 00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09)
> 00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09)
> 00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 09)
> 00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 09)
> 00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB