Re: 2.6.17-mm6

2006-07-07 Thread Reuben Farrelly



On 3/07/2006 10:03 p.m., Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm6/


- A major update to the e1000 driver.

- 1394 updates


Some minor breakage in the e1000...

Fedora Core release 5.90 (Test)
Kernel 2.6.17-mm6 on an x86_64

tornado.reub.net login: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  a
  TDT  1c
  next_to_use  1c
  next_to_clean  8
buffer_info[next_to_clean]
  time_stamp   100027f1a
  next_to_watch  a
  jiffies  1000281d4
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  a
  TDT  1c
  next_to_use  1c
  next_to_clean  8
buffer_info[next_to_clean]
  time_stamp   100027f1a
  next_to_watch  a
  jiffies  1000283c8
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  a
  TDT  1c
  next_to_use  1c
  next_to_clean  8
buffer_info[next_to_clean]
  time_stamp   100027f1a
  next_to_watch  a
  jiffies  1000285bc
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  a
  TDT  1c
  next_to_use  1c
  next_to_clean  8
buffer_info[next_to_clean]
  time_stamp   100027f1a
  next_to_watch  a
  jiffies  1000287b0
  next_to_watch.status 0
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue 0
  TDH  a
  TDT  1c
  next_to_use  1c
  next_to_clean  8
buffer_info[next_to_clean]
  time_stamp   100027f1a
  next_to_watch  a
  jiffies  1000289a4
  next_to_watch.status 0


A look through my switch logs and kernel logs over the last few days shows these 
messages and layer 2/link down disconnections every few hours or so, but of very 
short duration (I hadn't noticed until now).


This output above was under virtually no load.
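
For anyone reading the Tx hang dumps above, the interesting numbers are the distances between the ring indices and between the jiffies values. The helpers below are a small sketch of that arithmetic, not code from the e1000 driver. The function names are made up for illustration, and the ring size of 256 descriptors is an assumption (the driver's usual default), since the log does not show it.

```c
#include <stdint.h>

/* Assumption: 256 Tx descriptors (the usual e1000 default).
 * The actual ring size is not shown in the log. */
#define TX_RING_SIZE 256u

/* Number of descriptors from index `from` up to (not including) `to`,
 * allowing for wrap-around at the end of the ring. */
static unsigned ring_distance(unsigned from, unsigned to)
{
    return (to + TX_RING_SIZE - from) % TX_RING_SIZE;
}

/* Elapsed ticks between two jiffies samples copied from a dump. */
static uint64_t jiffies_delta(uint64_t earlier, uint64_t later)
{
    return later - earlier;
}
```

With the values from the first dump, ring_distance(0xa, 0x1c) is 18 descriptors between TDH and TDT, ring_distance(0x8, 0x1c) is 20 between next_to_clean and next_to_use, and jiffies_delta(0x100027f1a, 0x1000281d4) is 698 ticks since the stuck buffer was timestamped. Successive reports are 500 ticks apart; how many seconds that is depends on HZ, which is not stated here. TDH, TDT and time_stamp are identical in all five dumps, so the ring made no progress between reports.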

Both the e1000 and switch port on the other end are doing RX and TX flow 
control.

The controller is a built-in chip on an Intel D945GNT board.

01:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet 
Controller (Copper) (rev 03)

Subsystem: Intel Corporation Unknown device 3094
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
MAbort- SERR- PERR-

Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 313
Region 0: Memory at 4800 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)

Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 
Enable+
Address: fee0100c  Data: 4142
Capabilities: [e0] Express Endpoint IRQ 0
Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s 512ns, L1 64us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM unknown, Port 0
Link: Latency L0s 128ns, L1 64us
Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
Link: Speed 2.5Gb/s, Width x1

[EMAIL PROTECTED] log]# ethtool -i eth0
driver: e1000
version: 7.1.9-k2-NAPI
firmware-version: 1.0-5
bus-info: :01:00.0
[EMAIL PROTECTED] log]#

Where can I go from here to help debug this further?

reuben
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.17-mm6

2006-07-05 Thread Stefan Richter
On 7/4/2006 10:01 PM, Arjan van de Ven wrote:
 this is one for the networking people, and thus netdev

It's actually ieee1394 using net infrastructure for purposes which are
unrelated to networking.

Furthermore...

 On Tue, 2006-07-04 at 21:53 +0200, Rafael J. Wysocki wrote:
 On Monday 03 July 2006 12:03, Andrew Morton wrote:
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm6/
  
  - A major update to the e1000 driver.
  - 1394 updates

...I believe it is unrelated to the 1394 updates new to -mm6.

 Just found this in dmesg:
 
 =================================
 [ INFO: inconsistent lock state ]
 ---------------------------------
 inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
 nscd/4929 [HC0[0]:SC0[1]:HE1:SE0] takes:
  (skb_queue_lock_key){++..}, at: [8044fe40] udp_ioctl+0x50/0xa0
 {in-hardirq-W} state was registered at:
   [8024b4fa] lock_acquire+0x8a/0xc0
   [80476e3f] _spin_lock_irqsave+0x3f/0x60
   [80408c25] skb_queue_tail+0x25/0x60
 
 ok so skb_queue_lock is used in a hardirq context
 
   [881c9517] queue_packet_complete+0x27/0x40 [ieee1394]
   [881c9d6b] hpsb_packet_sent+0xab/0x100 [ieee1394]
   [8822a4b5] dma_trm_reset+0x115/0x140 [ohci1394]
   [8822c512] ohci_devctl+0x1c2/0x540 [ohci1394]
   [881c9673] hpsb_bus_reset+0x43/0xb0 [ieee1394]
   [8822d7f6] ohci_irq_handler+0x416/0x830 [ohci1394]
   [802631ab] handle_IRQ_event+0x2b/0x70
   [80264dd4] handle_level_irq+0xc4/0x130
   [8020c762] do_IRQ+0x112/0x130
   [80209d90] common_interrupt+0x64/0x65
 irq event stamp: 4280
 hardirqs last  enabled at (4279): [8047606a] 
 trace_hardirqs_on_thunk+0x35/0x37
 hardirqs last disabled at (4278): [804760a1] 
 trace_hardirqs_off_thunk+0x35/0x67
 softirqs last  enabled at (4258): [804065b5] release_sock+0xd5/0xe0
 softirqs last disabled at (4280): [804764d1] 
 _spin_lock_bh+0x11/0x50
 
 other info that might help us debug this:
 no locks held by nscd/4929.
 
 stack backtrace:
 
 Call Trace:
  [8020ab9f] show_trace+0x9f/0x240
  [8020af75] dump_stack+0x15/0x20
  [80249e52] print_usage_bug+0x272/0x290
  [8024a0d7] mark_lock+0x267/0x5f0
  [8024a9a6] __lock_acquire+0x546/0xd10
  [8024b4fb] lock_acquire+0x8b/0xc0
  [804764f4] _spin_lock_bh+0x34/0x50
  [8044fe40] udp_ioctl+0x50/0xa0
 
 yet udp_ioctl takes it only for _bh
 
  [80457359] inet_ioctl+0x69/0x70
  [804033ac] sock_ioctl+0x22c/0x270
  [802a32b1] do_ioctl+0x31/0xa0
  [802a35db] vfs_ioctl+0x2bb/0x2e0
  [802a366a] sys_ioctl+0x6a/0xa0
  [8020985a] system_call+0x7e/0x83
  [2b2d76ab98a9]
 
 is this a real scenario, or is this a case of firewire is special and
 needs its own rules?

Well, firewire is special, but that should already be addressed by this
patch: lockdep: annotate ieee1394 skb-queue-head locking
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=d378834840907326ac9d448056d957d13cc3718f

Why is there still a lockdep warning?

(Ieee1394 core's usage of the skb_* API is entirely unrelated to
networking; even if eth1394 was used.)
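
To make the report above easier to follow, here is a toy model of the check lockdep is complaining about. It is a simplified sketch, not lockdep's actual implementation: the names are made up, and only two usage bits are tracked. The point it illustrates is that usage is recorded per lock class, so two unrelated queues that share one class (here, the skb queue-head lock) have their usage merged, while separate classes, which is what the annotation patch's title suggests it sets up, would not conflict.

```c
#include <stdbool.h>

/* Usage bits recorded per lock class (a tiny subset of what lockdep tracks). */
enum {
    USED_IN_HARDIRQ = 1u << 0,  /* class was acquired from hardirq context */
    HARDIRQ_ENABLED = 1u << 1   /* class was acquired with hardirqs enabled */
};

/* Hypothetical stand-in for a lockdep lock class. */
struct lock_class {
    unsigned usage;
};

/* Record one acquisition against a class.
 * Returns false once the class has been seen both inside a hardirq and
 * with hardirqs enabled, i.e. the "inconsistent lock state" case. */
static bool record_acquire(struct lock_class *cls,
                           bool in_hardirq, bool hardirqs_on)
{
    const unsigned both = USED_IN_HARDIRQ | HARDIRQ_ENABLED;

    if (in_hardirq)
        cls->usage |= USED_IN_HARDIRQ;
    if (hardirqs_on)
        cls->usage |= HARDIRQ_ENABLED;

    return (cls->usage & both) != both;
}
```

In this model, recording the ohci1394 interrupt path (in hardirq) and then the udp_ioctl path (hardirqs enabled, _bh only) against one shared class makes the second call return false. Recording them against two separate classes keeps both calls returning true.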
-- 
Stefan Richter
-=-=-==- -=== --=-=
http://arcgraph.de/sr/


Re: 2.6.17-mm6

2006-07-05 Thread Stefan Richter
I wrote:
 (Ieee1394 core's usage of the skb_* API is entirely unrelated to
 networking; even if eth1394 was used.)

PS:
I wonder if it wouldn't be better to migrate ieee1394 core away from
skb_*. I didn't look thoroughly at it yet but the benefit of using this
API appears quite low to me.

We use it to keep track of IEEE 1394 transactions [ = outgoing request,
followed by (incoming response || expiry)], with completion of transactions often
in-order due to mostly single-threaded usage, but sometimes out-of-order
(may happen regardless of multithreaded or single-threaded usage).
-- 
Stefan Richter
-=-=-==- -=== --=-=
http://arcgraph.de/sr/


Re: 2.6.17-mm6

2006-07-05 Thread Ingo Molnar

* Stefan Richter [EMAIL PROTECTED] wrote:

 I wrote:
  (Ieee1394 core's usage of the skb_* API is entirely unrelated to
  networking; even if eth1394 was used.)
 
 PS:
 I wonder if it wouldn't be better to migrate ieee1394 core away from 
 skb_*. I didn't look thoroughly at it yet but the benefit of using 
 this API appears quite low to me.

yeah, it seems to be the wrong abstraction to use. It's also more 
expensive than necessary: e.g. skb-heads have a qlen field that is 
maintained in every list op - but the ieee1394 code does not make use of 
the queue-length information. Using list.h plus a spinlock should do the 
trick?
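
A rough userspace sketch of what "a list plus a spinlock" could look like for tracking pending transactions. This is not code from the ieee1394 subsystem or from anyone in this thread. The struct and function names are hypothetical, the intrusive list only imitates the kernel's list.h, and a C11 atomic_flag stands in for spinlock_t. In the kernel the lock would presumably need the irqsave variants, since the trace above shows the queue being touched from hardirq context.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list node, imitating the kernel's list.h. */
struct list_node {
    struct list_node *prev, *next;
};

/* Hypothetical pending transaction; the field names are illustrative only. */
struct pending_txn {
    struct list_node node;  /* kept first so a node pointer can be cast back */
    unsigned tlabel;        /* label used to match a response to its request */
};

/* A list head plus a lock, with no queue-length counter to maintain. */
struct txn_queue {
    atomic_flag lock;       /* userspace stand-in for a kernel spinlock */
    struct list_node head;
};

static void txn_queue_init(struct txn_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head.prev = q->head.next = &q->head;
}

static void txn_queue_lock(struct txn_queue *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void txn_queue_unlock(struct txn_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Append an outgoing request to the tail of the pending list. */
static void txn_queue_add_tail(struct txn_queue *q, struct pending_txn *t)
{
    txn_queue_lock(q);
    t->node.prev = q->head.prev;
    t->node.next = &q->head;
    q->head.prev->next = &t->node;
    q->head.prev = &t->node;
    txn_queue_unlock(q);
}

/* Unlink and return the transaction with the given label, or NULL if it is
 * not pending. Works for in-order and out-of-order completion alike. */
static struct pending_txn *txn_queue_complete(struct txn_queue *q,
                                              unsigned tlabel)
{
    struct pending_txn *found = NULL;
    struct list_node *n;

    txn_queue_lock(q);
    for (n = q->head.next; n != &q->head; n = n->next) {
        struct pending_txn *t = (struct pending_txn *)n;
        if (t->tlabel == tlabel) {
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->prev = n->next = n;
            found = t;
            break;
        }
    }
    txn_queue_unlock(q);
    return found;
}

/* Returns nonzero when no transactions are pending. */
static int txn_queue_empty(struct txn_queue *q)
{
    int empty;

    txn_queue_lock(q);
    empty = (q->head.next == &q->head);
    txn_queue_unlock(q);
    return empty;
}
```

A caller would add each outgoing request with txn_queue_add_tail() and call txn_queue_complete() with the matching label when a response arrives or the request expires. Completion in the common in-order case finds its entry at the front of the list, so the linear search is short.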

Ingo


Re: 2.6.17-mm6

2006-07-04 Thread Arjan van de Ven
this is one for the networking people, and thus netdev


On Tue, 2006-07-04 at 21:53 +0200, Rafael J. Wysocki wrote:
 On Monday 03 July 2006 12:03, Andrew Morton wrote:
  
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm6/
  
  
  - A major update to the e1000 driver.
  
  - 1394 updates
 
 Just found this in dmesg:
 
 =================================
 [ INFO: inconsistent lock state ]
 ---------------------------------
 inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
 nscd/4929 [HC0[0]:SC0[1]:HE1:SE0] takes:
  (skb_queue_lock_key){++..}, at: [8044fe40] udp_ioctl+0x50/0xa0
 {in-hardirq-W} state was registered at:
   [8024b4fa] lock_acquire+0x8a/0xc0
   [80476e3f] _spin_lock_irqsave+0x3f/0x60
   [80408c25] skb_queue_tail+0x25/0x60

ok so skb_queue_lock is used in a hardirq context

   [881c9517] queue_packet_complete+0x27/0x40 [ieee1394]
   [881c9d6b] hpsb_packet_sent+0xab/0x100 [ieee1394]
   [8822a4b5] dma_trm_reset+0x115/0x140 [ohci1394]
   [8822c512] ohci_devctl+0x1c2/0x540 [ohci1394]
   [881c9673] hpsb_bus_reset+0x43/0xb0 [ieee1394]
   [8822d7f6] ohci_irq_handler+0x416/0x830 [ohci1394]
   [802631ab] handle_IRQ_event+0x2b/0x70
   [80264dd4] handle_level_irq+0xc4/0x130
   [8020c762] do_IRQ+0x112/0x130
   [80209d90] common_interrupt+0x64/0x65
 irq event stamp: 4280
 hardirqs last  enabled at (4279): [8047606a] 
 trace_hardirqs_on_thunk+0x35/0x37
 hardirqs last disabled at (4278): [804760a1] 
 trace_hardirqs_off_thunk+0x35/0x67
 softirqs last  enabled at (4258): [804065b5] release_sock+0xd5/0xe0
 softirqs last disabled at (4280): [804764d1] _spin_lock_bh+0x11/0x50
 
 other info that might help us debug this:
 no locks held by nscd/4929.
 
 stack backtrace:
 
 Call Trace:
  [8020ab9f] show_trace+0x9f/0x240
  [8020af75] dump_stack+0x15/0x20
  [80249e52] print_usage_bug+0x272/0x290
  [8024a0d7] mark_lock+0x267/0x5f0
  [8024a9a6] __lock_acquire+0x546/0xd10
  [8024b4fb] lock_acquire+0x8b/0xc0
  [804764f4] _spin_lock_bh+0x34/0x50
  [8044fe40] udp_ioctl+0x50/0xa0

yet udp_ioctl takes it only for _bh

  [80457359] inet_ioctl+0x69/0x70
  [804033ac] sock_ioctl+0x22c/0x270
  [802a32b1] do_ioctl+0x31/0xa0
  [802a35db] vfs_ioctl+0x2bb/0x2e0
  [802a366a] sys_ioctl+0x6a/0xa0
  [8020985a] system_call+0x7e/0x83
  [2b2d76ab98a9]


is this a real scenario, or is this a case of firewire is special and
needs its own rules?

