Re: [E1000-devel] DCA/IOAT problem

2009-01-26 Thread Paweł Staszewski
Brandeburg, Jesse wrote:
> Forwarding entire message to include e1000-devel
>
> Pawel Staszewski wrote:
>   
>> Hello
>>
>> Some lspci:
>> lspci
>> 00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev 90)
>> 00:02.0 PCI bridge: Intel Corporation PCI Express x8 Port 2-3 (rev 90)
>> 00:04.0 PCI bridge: Intel Corporation PCI Express x16 Port 4-7 (rev 90)
>> 00:08.0 System peripheral: Intel Corporation DMA Engine (rev 90)
>> 00:10.0 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:10.1 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:10.2 Host bridge: Intel Corporation FSB Registers (rev 90)
>> 00:11.0 Host bridge: Intel Corporation Reserved Registers (rev 90)
>> 00:13.0 Host bridge: Intel Corporation Reserved Registers (rev 90)
>> 00:15.0 Host bridge: Intel Corporation DDR Channel 0 Registers (rev 90)
>> 00:16.0 Host bridge: Intel Corporation DDR Channel 1 Registers (rev 90)
>> 00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
>> 00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
>> 00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
>> 00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
>> 00:1c.5 PCI bridge: Intel Corporation PCI Express Port 6 (rev 02)
>> 00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
>> 00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
>> 00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
>> 00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
>> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
>> 00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
>> 00:1f.2 IDE interface: Intel Corporation 4 port SATA IDE Controller (rev 02)
>> 00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
>> 00:1f.5 IDE interface: Intel Corporation 2 port SATA IDE Controller (rev 02)
>> 01:00.0 Ethernet controller: Intel Corporation Device 10dd (rev 01)
>> 04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
>> 05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
>> 06:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
>> and then:
>> modprobe ioatdma
>>
>> dmesg:
>> ...
>> ...
>> IPv4 FIB: Using LC-trie version 0.408
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :05:00.0: irq 1273 for MSI/MSI-X
>> e1000e :05:00.0: irq 1273 for MSI/MSI-X
>> :05:00.0: eth1: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> dca service started, version 1.4
>> ioatdma :00:08.0: can't find IRQ for PCI INT A; probably buggy MP table
>> ioatdma :00:08.0: setting latency timer to 64
>> ioatdma :00:08.0: Intel(R) I/OAT DMA Engine found, 4 channels, device version 0x12, driver version 3.30
>> ioatdma :00:08.0: irq 1255 for MSI/MSI-X
>> ioatdma :00:08.0: DCA is disabled in BIOS
>> ixgbe: eth2: ixgbe_watchdog_task: NIC Link is Up 10 Gbps, Flow Control: None
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>> e1000e :04:00.0: irq 1274 for MSI/MSI-X
>>
>> ...
>> ...
>>
>>
>> lspci -vvv (for dma engine)
>> 00:08.0 System peripheral: Intel Corporation DMA Engine (rev 90)
>> Subsystem: Super Micro Computer Inc Device de80
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- >
>> Latency: 0
>> Interrupt: pin A routed to IRQ 1255
>> Region 0: Memory at fe70 (64-bit, non-prefetchable) [size=1K]
>> Capabilities: [50] Power Management version 2
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
>> Address: feeff00c  Data: 41b2
>> Capabilities: [6c] Express (v1) Root Complex Integrated Endpoint, MSI 00
>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>> ExtTag- RBE- FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 128 bytes, MaxReadReq 128 bytes
>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>> LnkCap: Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
>> ClockPM- Suprise- LLActRep- BwNot-
>> LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed unkno

Re: [E1000-devel] Linux 2.6.27.13

2009-01-26 Thread Brandeburg, Jesse
Greg KH wrote:
> On Mon, Jan 26, 2009 at 09:01:36PM +0100, Jesper Krogh wrote:
>> Greg KH wrote:
>>> We (the -stable team) are announcing the release of the 2.6.27.13
>>> kernel. It contains a wide range of bugfixes, and all users of the
>>> 2.6.27 kernel series are strongly encouraged to upgrade.
>>> I'll also be replying to this message with a copy of the patch
>>> between 2.6.27.12 and 2.6.27.13
>> 
>> Hi.
>> 
>> I'm getting some e1000 noise on 2.6.27.6. I searched the log up to
>> .13 but couldn't find any log message that looked like it fixed it.
>> 
>> 
>> [862734.501786] [ cut here ]
>> [862734.501793] WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x1f8/0x210()
>> [862734.501795] NETDEV WATCHDOG: eth0 (e1000): transmit timed out
> 
> I've been getting a lot of reports about this as well.  Did it show up
> in 2.6.27.6?
> 
> Netdev developers, any ideas of what would be causing this?

No immediate idea, but a quick test to help isolate which functionality
could be causing problems is to disable TSO on all four interfaces using
ethtool.

It could be that GSO is somehow playing into this as well, but I don't
know why (you could disable it with ethtool too).
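
A minimal sketch of the isolation test described above: it builds one `ethtool -K` command per interface to switch TSO and GSO off, and only prints the commands unless the dry run is turned off. The interface names eth0 through eth3 are an assumption (the report confirms only eth0 and says four ports are bonded), and the helper names are hypothetical.

```python
import shlex
import subprocess

# Assumed interface names; the report only confirms eth0.
INTERFACES = ["eth0", "eth1", "eth2", "eth3"]


def offload_commands(interfaces, features=("tso", "gso"), state="off"):
    """Return one `ethtool -K` argument list per interface."""
    commands = []
    for iface in interfaces:
        argv = ["ethtool", "-K", iface]
        for feature in features:
            argv += [feature, state]
        commands.append(argv)
    return commands


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply(offload_commands(INTERFACES))
```

Passing `state="on"` produces the commands that restore the offloads once the test is over, and `features=("tso",)` limits the change to TSO alone so the two offloads can be ruled out one at a time.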

It could be unrelated but I've noticed that TCP window size can grow much
larger now than it used to (especially talking to LRO enabled clients) 
and this might cause some kind of an overflow in the TCP transmit
offloading hardware in the e1000 parts.


>> 
>> Complete dmesg here:
>> http://krogh.cc/~jesper/dmesg-2.6.27.6.txt
>> 
>> The system is running with bonded interfaces with  (lspci output)
>> 06:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 06:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
>> 
>> The system is still "fully functional", and I haven't noticed
>> anything wrong, but there sure are a lot of link ups and downs on
>> that bond.

In your log I saw one tx timeout for each interface: the first one by itself,
then several more all within a few minutes, but then no more for
a really long time.

My first reaction is to ask you what test you're running, and ask you to
run the e1000_dump code (see google) to dump the tx descriptor rings at 
the time of failure.

I can get you that code with updates if you're willing to test, but 
it might take a couple of days.

Jesse
--
This SF.net email is sponsored by:
SourceForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel


Re: [E1000-devel] recommended reporting/resolution method

2009-01-26 Thread Tantilov, Emil S
Hi Corey,

Please use the tracker on e1000.sf.net.

You may also want to look through the bugs as there have been reports of Tx 
hangs with some chipsets that may be similar to yours.

Thanks,
Emil

-Original Message-
From: Corey Wright [mailto:undefi...@pobox.com]
Sent: Monday, January 26, 2009 8:19 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] recommended reporting/resolution method

intel developers,

what's the recommended procedure for getting assistance with an "Intel
Corporation 82540EM Gigabit Ethernet Controller [8086:100e] (rev 02)" that
inevitably, eventually reports "e1000: eth1: e1000_clean_tx_irq: Detected
Tx Unit Hang" and stops functioning using 7.3.20-k2-NAPI (from debian's
2.6.26 kernel package), 8.0.6-NAPI, and 8.0.6-stepfix-NAPI (with
TxDescriptorStep=4)?

mailing list?  bug tracker?  irc?

i don't know if mailing a bunch of attachments (eg lspci, dmesg) to the
list is recommended or if there's already enough similar bug reports in the
bug tracker to avoid making another one.

please advise.

thanks for the excellent hardware and driver (as this nic model runs
perfectly in two other computers)!

corey
--
undefi...@pobox.com



[E1000-devel] recommended reporting/resolution method

2009-01-26 Thread Corey Wright
intel developers,

what's the recommended procedure for getting assistance with an "Intel
Corporation 82540EM Gigabit Ethernet Controller [8086:100e] (rev 02)" that
inevitably, eventually reports "e1000: eth1: e1000_clean_tx_irq: Detected
Tx Unit Hang" and stops functioning using 7.3.20-k2-NAPI (from debian's
2.6.26 kernel package), 8.0.6-NAPI, and 8.0.6-stepfix-NAPI (with
TxDescriptorStep=4)?

mailing list?  bug tracker?  irc?

i don't know if mailing a bunch of attachments (eg lspci, dmesg) to the
list is recommended or if there's already enough similar bug reports in the
bug tracker to avoid making another one.

please advise.

thanks for the excellent hardware and driver (as this nic model runs
perfectly in two other computers)!

corey
-- 
undefi...@pobox.com



[E1000-devel] skb_over_panic in igb when changing mtu

2009-01-26 Thread Igor Smolyar
Hello,

I am experiencing an skb_over_panic in the igb driver when changing the MTU
on vanilla 2.6.28.2.
This is happening on Intel PRO/1000 VT quad and PRO/1000 EF dual adapters.

A connection (telnet) is opened from a remote host through a gateway that
uses the igb driver. Then I change the MTU to 900 and try to pass some
traffic on the same connection, and the gateway crashes.

The same happens with the latest igb 1.3.8.6 from e1000.sf.net.

[r...@gateway ~]# ifconfig eth2 mtu 900
[r...@gateway ~]# skb_over_panic: text:f7fc2d6a len:1432 put:408
head:f34b6770 data:f34b6782 tail:0xf34b6d1a end:0xf34b6bf0 dev:eth2
[ cut here ]
kernel BUG at net/core/skbuff.c:128!
invalid opcode:  [#1] SMP
last sysfs file:
/sys/devices/pci:00/:00:04.0/:05:00.0/host0/target0:1:0/0:1:0:0/vendor
Modules linked in: ipv6 autofs4 sunrpc dm_multipath scsi_dh rfkill
input_polldev sbs sbshc battery ac parport_pc lp parport sg dcdbas
sr_mod cdrom serio_raw igb button rtc_cmos rtc_core
rtc_lib tg3 libphy pcspkr dm_snapshot dm_zero dm_mirror
dm_region_hash dm_log dm_mod ata_piix libata mptsas mptscsih
scsi_transport_sas mptbase sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd
ehci_hcd [last unloaded: microcode]

Pid: 0, comm: swapper Not tainted (2.6.28.2 #1) PowerEdge R300
EIP: 0060:[] EFLAGS: 00010296 CPU: 0
EIP is at skb_over_panic+0x3a/0x41
EAX: 0076 EBX: f57f1000 ECX: c064f121 EDX: 0168a000
ESI: f57f1000 EDI: f4ab7e38 EBP: c096ff24 ESP: c096fef8
DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process swapper (pid: 0, ti=c096f000 task=c07883e0 task.ti=c07f6000)
Stack:
 c075fb32 f7fc2d6a 0598 0198 f34b6770 f34b6782 f34b6d1a f34b6bf0
 f57f1000 f4ab7e38 f34b6b82 c096ff34 c05d7137  f6dc0648 c096ff9c
 f7fc2d6a 0040 c096ffa8 f57ade3c 8840  0001 
Call Trace:
 [] ? igb_clean_rx_irq_adv+0x10f/0x412 [igb]
 [] ? skb_put+0x35/0x3b
 [] ? igb_clean_rx_irq_adv+0x10f/0x412 [igb]
 [] ? igb_clean_rx_ring_msix+0x2d/0x137 [igb]
 [] ? net_rx_action+0xb5/0x18e
 [] ? __do_softirq+0x85/0x138
 [] ? __do_softirq+0x0/0x138
  <0> [] ? handle_edge_irq+0x0/0x105
 [] ? irq_exit+0x44/0x46
 [] ? do_IRQ+0xea/0x101
 [] ? common_interrupt+0x28/0x30
 [] ? find_usage_forwards+0x2b/0x84
 [] ? acpi_idle_enter_simple+0x190/0x201
 [] ? acpi_idle_enter_bm+0xc8/0x312
 [] ? cpuidle_idle_call+0x5f/0x98
 [] ? cpu_idle+0x66/0x7a
 [] ? rest_init+0x4e/0x50
Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 00 00 00 ff b0 a0 00 00
00 ff b0 9c 00 00 00 52 ff 70 50 51 68 32 fb 75 c0 e8 4b 16 e5 ff <0f>
0b 83 c4 24 eb fe 55 89 e5 57 56 0f b7 f2 53 89 c3 83 ec 04
EIP: [] skb_over_panic+0x3a/0x41 SS:ESP 0068:c096fef8
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 15 seconds..


Any ideas?

Thanks, Igor.
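
The numbers in the skb_over_panic line above can be decoded with a little arithmetic. The sketch below is only an illustration and not part of any driver. It assumes `len` and `put` are printed in decimal and `head`, `data`, `tail` and `end` are hexadecimal addresses, which fits the values shown (tail - data equals len). The function name and the derived field names are hypothetical.

```python
import re

PANIC_LINE = (
    "skb_over_panic: text:f7fc2d6a len:1432 put:408 "
    "head:f34b6770 data:f34b6782 tail:0xf34b6d1a end:0xf34b6bf0 dev:eth2"
)


def decode_skb_over_panic(line):
    """Parse the pointer and length fields and derive the buffer geometry."""
    info = {}
    for key in ("head", "data", "tail", "end"):   # hexadecimal addresses
        match = re.search(rf"\b{key}:(?:0x)?([0-9a-fA-F]+)", line)
        info[key] = int(match.group(1), 16)
    for key in ("len", "put"):                    # decimal byte counts
        match = re.search(rf"\b{key}:(\d+)", line)
        info[key] = int(match.group(1))
    info["capacity"] = info["end"] - info["head"]    # bytes between head and end
    info["headroom"] = info["data"] - info["head"]   # bytes reserved before data
    info["used"] = info["tail"] - info["data"]       # should match len
    info["len_before_put"] = info["len"] - info["put"]
    info["overrun"] = info["tail"] - info["end"]     # bytes written past end
    return info


if __name__ == "__main__":
    for name, value in decode_skb_over_panic(PANIC_LINE).items():
        print(f"{name}: {value}")
```

For the line in this report the buffer spans 1152 bytes from head to end with 18 bytes of headroom, the skb already held 1024 bytes, and the 408-byte skb_put() pushed tail 298 bytes past end. That matches the 0598 (1432) and 0198 (408) words visible in the stack dump, but it does not by itself say why the receive path appended to a buffer of that size.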
