Re: [E1000-devel] DCA/IOAT problem
Brandeburg, Jesse wrote:
> Forwarding entire message to include e1000-devel
>
> Pawel Staszewski wrote:
>
>> Hello
>>
>> Some lspci:
>> lspci
>> 00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev 90)
>> 00:02.0 PCI bridge: Intel Corporation PCI Express x8 Port 2-3 (rev 90)
>> 00:04.0 PCI bridge: Intel Corporation PCI Express x16 Port 4-7 (rev 90)
>> 00:08.0 System peripheral: Intel Corporation DMA Engine (rev 90)
>> 00:10.0 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:10.1 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:10.2 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:11.0 Host bridge: Intel Corporation Reserved Registers (rev 90)
>> 00:13.0 Host bridge: Intel Corporation Reserved Registers (rev 90)
>> 00:15.0 Host bridge: Intel Corporation DDR Channel 0 Registers (rev 90)
>> 00:16.0 Host bridge: Intel Corporation DDR Channel 1 Registers (rev 90)
>> 00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
>> 00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
>> 00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
>> 00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
>> 00:1c.5 PCI bridge: Intel Corporation PCI Express Port 6 (rev 02)
>> 00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
>> 00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
>> 00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
>> 00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
>> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
>> 00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
>> 00:1f.2 IDE interface: Intel Corporation 4 port SATA IDE Controller (rev 02)
>> 00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
>> 00:1f.5 IDE interface: Intel Corporation 2 port SATA IDE Controller (rev 02)
>> 01:00.0 Ethernet controller: Intel Corporation Device 10dd (rev 01)
>> 04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
>> 05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
>> 06:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
>> and then:
>> modprobe ioatdma
>>
>> dmesg:
>> ...
>> ...
>> IPv4 FIB: Using LC-trie version 0.408
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :05:00.0: irq 1273 for MSI/MSI-X
>> e1000e :05:00.0: irq 1273 for MSI/MSI-X
>> :05:00.0: eth1: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> dca service started, version 1.4
>> ioatdma :00:08.0: can't find IRQ for PCI INT A; probably buggy MP table
>> ioatdma :00:08.0: setting latency timer to 64
>> ioatdma :00:08.0: Intel(R) I/OAT DMA Engine found, 4 channels, device version 0x12, driver version 3.30
>> ioatdma :00:08.0: irq 1255 for MSI/MSI-X
>> ioatdma :00:08.0: DCA is disabled in BIOS
>> ixgbe: eth2: ixgbe_watchdog_task: NIC Link is Up 10 Gbps, Flow Control: None
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>>
>> ...
>> ...
>>
>>
>> lspci -vvv (for dma engine)
>> 00:08.0 System peripheral: Intel Corporation DMA Engine (rev 90)
>>     Subsystem: Super Micro Computer Inc Device de80
>>     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>>     Latency: 0
>>     Interrupt: pin A routed to IRQ 1255
>>     Region 0: Memory at fe70 (64-bit, non-prefetchable) [size=1K]
>>     Capabilities: [50] Power Management version 2
>>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>     Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
>>         Address: feeff00c Data: 41b2
>>     Capabilities: [6c] Express (v1) Root Complex Integrated Endpoint, MSI 00
>>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>>             ExtTag- RBE- FLReset-
>>         DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>             MaxPayload 128 bytes, MaxReadReq 128 bytes
>>         DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>>         LnkCap: Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
>>             ClockPM- Suprise- LLActRep- BwNot-
>>         LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
>>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>         LnkSta: Speed unkno
Re: [E1000-devel] Linux 2.6.27.13
Greg KH wrote:
> On Mon, Jan 26, 2009 at 09:01:36PM +0100, Jesper Krogh wrote:
>> Greg KH wrote:
>>> We (the -stable team) are announcing the release of the 2.6.27.13
>>> kernel. It contains a wide range of bugfixes, and all users of the
>>> 2.6.27 kernel series are strongly encouraged to upgrade.
>>> I'll also be replying to this message with a copy of the patch
>>> between
>>> 2.6.27.12 and 2.6.27.13
>>
>> Hi.
>>
>> I'm getting some e1000 noise on a 2.6.27.6. I searched the log up to
>> .13 but couldn't find any log message that looked like it fixed it.
>>
>>
>> [862734.501786] [ cut here ]
>> [862734.501793] WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x1f8/0x210()
>> [862734.501795] NETDEV WATCHDOG: eth0 (e1000): transmit timed out
>
> I've been getting a lot of reports about this as well. Did it show up
> in 2.6.27.6?
>
> Netdev developers, any ideas of what would be causing this?

No immediate idea, but a quick test to help isolate which functionality could be causing problems is to disable TSO on all four interfaces using ethtool. It could be that GSO is somehow playing into this as well, but I don't know why (you could disable it with ethtool too).

It could be unrelated, but I've noticed that the TCP window size can grow much larger now than it used to (especially when talking to LRO-enabled clients), and this might cause some kind of overflow in the TCP transmit offloading hardware in the e1000 parts.

>>
>> Complete dmesg here:
>> http://krogh.cc/~jesper/dmesg-2.6.27.6.txt
>>
>> The system is running with bonded interfaces with (lspci output)
>> 06:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>>
>> The system is still "fully functional", and I haven't noticed
>> anything wrong, but there sure is a lot of link ups and downs on
>> that bond.

In your log I saw one tx timeout for each interface: the first one by itself, then several more all within a few minutes, but then no more for a really long time. My first reaction is to ask what test you're running, and to ask you to run the e1000_dump code (see google) to dump the tx descriptor rings at the time of failure. I can get you that code with updates if you're willing to test, but it might take a couple of days.

Jesse

--
This SF.net email is sponsored by: SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
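The isolation test suggested in the reply above (turning off TSO, and optionally GSO, with ethtool) could be sketched roughly as follows. This is only an illustration: the function name `offload_off` and the interface names `eth0` to `eth3` are assumptions, since the thread does not list the names of the four bonded interfaces. By default the script only prints the commands; setting `RUN=1` would execute them.

```shell
#!/bin/sh
# Sketch: disable TSO and GSO on a set of interfaces with ethtool.
# offload_off is a hypothetical helper name; interface names are assumptions.
offload_off() {
    for ifname in "$@"; do
        # -K changes offload settings; tso/gso are standard ethtool keywords.
        cmd="ethtool -K $ifname tso off gso off"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd            # really apply the change (needs root)
        else
            echo "$cmd"     # dry run: only show what would be executed
        fi
    done
}

# Assumed names for the four bonded ports; substitute the real ones.
offload_off eth0 eth1 eth2 eth3
```

The current offload state of an interface can be inspected with `ethtool -k eth0` (lower-case `-k`) before and after the change. If the transmit timeouts stop with TSO off, that would point toward the offload path rather than the link itself.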
Re: [E1000-devel] recommended reporting/resolution method
Hi Corey,

Please use the tracker on e1000.sf.net. You may also want to look through the bugs, as there have been reports of Tx hangs with some chipsets that may be similar to yours.

Thanks,
Emil

-----Original Message-----
From: Corey Wright [mailto:undefi...@pobox.com]
Sent: Monday, January 26, 2009 8:19 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] recommended reporting/resolution method

intel developers,

what's the recommended procedure for getting assistance with an "Intel Corporation 82540EM Gigabit Ethernet Controller [8086:100e] (rev 02)" that inevitably, eventually reports "e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang" and stops functioning using 7.3.20-k2-NAPI (from debian's 2.6.26 kernel package), 8.0.6-NAPI, and 8.0.6-stepfix-NAPI (with TxDescriptorStep=4)?

mailing list? bug tracker? irc?

i don't know if mailing a bunch of attachments (eg lspci, dmesg) to the list is recommended or if there's already enough similar bug reports in the bug tracker to avoid making another one.

please advise.

thanks for the excellent hardware and driver (as this nic model runs perfectly in two other computers)!

corey
--
undefi...@pobox.com
[E1000-devel] recommended reporting/resolution method
intel developers,

what's the recommended procedure for getting assistance with an "Intel Corporation 82540EM Gigabit Ethernet Controller [8086:100e] (rev 02)" that inevitably, eventually reports "e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang" and stops functioning using 7.3.20-k2-NAPI (from debian's 2.6.26 kernel package), 8.0.6-NAPI, and 8.0.6-stepfix-NAPI (with TxDescriptorStep=4)?

mailing list? bug tracker? irc?

i don't know if mailing a bunch of attachments (eg lspci, dmesg) to the list is recommended or if there's already enough similar bug reports in the bug tracker to avoid making another one.

please advise.

thanks for the excellent hardware and driver (as this nic model runs perfectly in two other computers)!

corey
--
undefi...@pobox.com
[E1000-devel] skb_over_panic in igb when changing mtu
Hello,

I am experiencing skb_over_panic in the igb driver when changing the MTU on vanilla 2.6.28.2. This is happening on Intel PRO/1000 VT quad and PRO/1000 EF dual adapters.

A connection (telnet) is opened from a remote host through a gateway that uses the igb driver. I then change the MTU to 900 and try to pass some traffic on the same connection, and the gateway crashes. The same happened with the latest igb 1.3.8.6 from e1000.sf.net.

[r...@gateway ~]# ifconfig eth2 mtu 900
[r...@gateway ~]# skb_over_panic: text:f7fc2d6a len:1432 put:408 head:f34b6770 data:f34b6782 tail:0xf34b6d1a end:0xf34b6bf0 dev:eth2
[ cut here ]
kernel BUG at net/core/skbuff.c:128!
invalid opcode: [#1] SMP
last sysfs file: /sys/devices/pci:00/:00:04.0/:05:00.0/host0/target0:1:0/0:1:0:0/vendor
Modules linked in: ipv6 autofs4 sunrpc dm_multipath scsi_dh rfkill input_polldev sbs sbshc battery ac parport_pc lp parport sg dcdbas sr_mod cdrom serio_raw igb button rtc_cmos rtc_core rtc_lib tg3 libphy pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata mptsas mptscsih scsi_transport_sas mptbase sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 0, comm: swapper Not tainted (2.6.28.2 #1) PowerEdge R300
EIP: 0060:[] EFLAGS: 00010296 CPU: 0
EIP is at skb_over_panic+0x3a/0x41
EAX: 0076 EBX: f57f1000 ECX: c064f121 EDX: 0168a000
ESI: f57f1000 EDI: f4ab7e38 EBP: c096ff24 ESP: c096fef8
 DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Process swapper (pid: 0, ti=c096f000 task=c07883e0 task.ti=c07f6000)
Stack:
 c075fb32 f7fc2d6a 0598 0198 f34b6770 f34b6782 f34b6d1a f34b6bf0
 f57f1000 f4ab7e38 f34b6b82 c096ff34 c05d7137 f6dc0648 c096ff9c f7fc2d6a
 0040 c096ffa8 f57ade3c 8840 0001
Call Trace:
 [] ? igb_clean_rx_irq_adv+0x10f/0x412 [igb]
 [] ? skb_put+0x35/0x3b
 [] ? igb_clean_rx_irq_adv+0x10f/0x412 [igb]
 [] ? igb_clean_rx_ring_msix+0x2d/0x137 [igb]
 [] ? net_rx_action+0xb5/0x18e
 [] ? __do_softirq+0x85/0x138
 [] ? __do_softirq+0x0/0x138
 <0> [] ? handle_edge_irq+0x0/0x105
 [] ? irq_exit+0x44/0x46
 [] ? do_IRQ+0xea/0x101
 [] ? common_interrupt+0x28/0x30
 [] ? find_usage_forwards+0x2b/0x84
 [] ? acpi_idle_enter_simple+0x190/0x201
 [] ? acpi_idle_enter_bm+0xc8/0x312
 [] ? cpuidle_idle_call+0x5f/0x98
 [] ? cpu_idle+0x66/0x7a
 [] ? rest_init+0x4e/0x50
Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 00 00 00 ff b0 a0 00 00 00 ff b0 9c 00 00 00 52 ff 70 50 51 68 32 fb 75 c0 e8 4b 16 e5 ff <0f> 0b 83 c4 24 eb fe 55 89 e5 57 56 0f b7 f2 53 89 c3 83 ec 04
EIP: [] skb_over_panic+0x3a/0x41 SS:ESP 0068:c096fef8
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 15 seconds..

Any ideas?

Thanks,
Igor.
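For readers decoding the skb_over_panic line in the report above, the pointer fields can be turned into byte counts with plain shell arithmetic. The sketch below is not part of the original report; the helper name `skb_decode` is made up for illustration, and it assumes that the printed `tail` and `len` are the values after the failed `skb_put()`, which the numbers are consistent with (`tail - data` equals the reported `len`).

```shell
#!/bin/sh
# Sketch: decode the pointer fields of an skb_over_panic line.
# skb_decode is a hypothetical helper, not a kernel or driver tool.
# Arguments: head data tail end (hex, with 0x prefix) and put (decimal).
skb_decode() {
    head=$1 data=$2 tail=$3 end=$4 put=$5
    echo "buffer_size=$(( $end - $head ))"               # bytes between head and end
    echo "headroom=$(( $data - $head ))"                 # bytes reserved in front of data
    echo "len_after_put=$(( $tail - $data ))"            # should match the reported len
    echo "room_before_put=$(( $end - ($tail - $put) ))"  # tailroom left before skb_put()
    echo "overrun=$(( $tail - $end ))"                   # bytes past the end of the buffer
}

# Values copied from the report; the 0x prefix on head and data is added here.
skb_decode 0xf34b6770 0xf34b6782 0xf34b6d1a 0xf34b6bf0 408
```

With the values from the report this gives a 1152-byte buffer with 18 bytes of headroom, and an overrun of 298 bytes: the 408-byte `skb_put()` was attempted with only 110 bytes of tailroom left. That only describes the symptom; it does not by itself say why the receive path tried to place that much data in the buffer after the MTU change.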