Re: [PATCH] mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
On 1/23/07, Dale Farnsworth <[EMAIL PROTECTED]> wrote:
> From: Dale Farnsworth <[EMAIL PROTECTED]>
>
> mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs
>
> This bug was found and isolated by Thibaut VARENE <[EMAIL PROTECTED]>
> and Jarek Poplawski <[EMAIL PROTECTED]>. This patch is a modification
> of their fixes.
>
> We acquire and release the lock for each descriptor that is freed
> to minimize the time the lock is held.
> ---
>  drivers/net/mv643xx_eth.c |   11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
> index c41ae42..b3bf864 100644
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -332,13 +339,13 @@ int mv643xx_eth_free_tx_descs(struct net
>  		if (skb)
>  			mp->tx_skb[tx_index] = NULL;
>  
> -		spin_unlock_irqrestore(&mp->lock, flags);
> -
>  		if (cmd_sts & ETH_ERROR_SUMMARY) {
>  			printk("%s: Error in TX\n", dev->name);
>  			mp->stats.tx_errors++;
>  		}

Note that this printk probably won't show immediately because IRQs are disabled. But that may not be a big deal.

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
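A minimal userspace sketch of the per-descriptor locking the patch description names, assuming a hypothetical `toy_ring` in place of the driver's private struct and a C11 `atomic_flag` spinlock standing in for `spin_lock_irqsave()` (it does not disable interrupts). The unlocked loop test is only a cheap hint; the re-check under the lock is what keeps the loop from reclaiming from an empty ring. This is not the driver's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a kernel spinlock. Unlike spin_lock_irqsave(),
 * it does not disable interrupts. */
typedef struct { atomic_flag busy; } toy_lock;

static void toy_lock_acquire(toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void toy_lock_release(toy_lock *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

#define TOY_RING_SIZE 8

/* Hypothetical ring state; field names only loosely echo the driver's. */
struct toy_ring {
	toy_lock lock;
	atomic_int desc_count;          /* descriptors waiting to be reclaimed */
	int used_idx;                   /* next descriptor to reclaim (locked) */
	bool error[TOY_RING_SIZE];      /* per-descriptor error flag (locked) */
	unsigned long tx_errors;        /* error counter (locked) */
};

static void toy_ring_init(struct toy_ring *r)
{
	atomic_flag_clear(&r->lock.busy);
	atomic_init(&r->desc_count, 0);
	r->used_idx = 0;
	for (int i = 0; i < TOY_RING_SIZE; i++)
		r->error[i] = false;
	r->tx_errors = 0;
}

/* Reclaim descriptors, holding the lock for one descriptor at a time.
 * desc_count is atomic so the unlocked peek is not a C11 data race. */
static int toy_reclaim_all(struct toy_ring *r)
{
	int released = 0;

	while (atomic_load(&r->desc_count) > 0) {      /* cheap, may be stale */
		toy_lock_acquire(&r->lock);
		if (atomic_load(&r->desc_count) == 0) { /* authoritative re-check */
			toy_lock_release(&r->lock);
			break;
		}
		int idx = r->used_idx;
		r->used_idx = (idx + 1) % TOY_RING_SIZE;
		atomic_fetch_sub(&r->desc_count, 1);
		if (r->error[idx])
			r->tx_errors++;                 /* counted under the lock */
		toy_lock_release(&r->lock);

		/* Per-descriptor work that touches no shared state could run
		 * here with the lock dropped. */
		released++;
	}
	return released;
}
```

The trade-off is the one discussed further down the thread: shorter lock hold times in exchange for taking the lock once per descriptor and reading the count twice.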
Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/23/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
> - As Jarek pointed out, you're checking the value of mp->tx_desc_count
> twice, which means dereferencing a pointer and accessing memory twice.
> I don't know how perf-critical this bit of code is, but I wonder which
> of keeping the lock for a long time or doing what you did is better
> (I'm being anal and you probably know that better than me :)

Forget that. That's an IRQ-disabling lock, so holding it for a long time is worse than anything else :)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/22/07, Dale Farnsworth <[EMAIL PROTECTED]> wrote:
> Jarek and Thibaut,
>
> Thank you both very much for your work finding and fixing this bug.
>
> Jarek, can you verify that the following patch fixes the problem you
> were seeing?
>
> -Dale

Hi Dale,

The patch seems to work fine. Just thinking out loud (as I really don't know this part of the kernel), here are a few remarks:

- As Jarek pointed out, you're checking the value of mp->tx_desc_count twice, which means dereferencing a pointer and accessing memory twice. I don't know how perf-critical this bit of code is, but I wonder which of keeping the lock for a long time or doing what you did is better (I'm being anal and you probably know that better than me :)

- Also, in lines 344-349, cmd_sts (an indirection to mp content) is accessed in the test condition (dunno if it's OK to do that outside of the lock), and on line 346, mp->stats.tx_errors is incremented outside of the spinlock protection. But then, I don't know what that lock is meant to protect; I'm just pointing this out :)

Thanks for your help; I hope the fix will go upstream ASAP :)

And about being the author of the patch: since I'm not, I don't really mind 8)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
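To illustrate the concern about `mp->stats.tx_errors` being incremented outside the lock, here is a hypothetical, deterministic replay of one interleaving of two contexts that each execute `tx_errors++` as separate load/add/store steps. Whether two contexts can really race on that counter in this driver isn't settled in the thread; the sketch only shows how an unprotected increment can lose an update.

```c
#include <assert.h>

/* Two contexts each doing `tx_errors++`, split into the load/add/store
 * steps a compiler may emit, replayed in one unlucky order. */
static unsigned long replay_unlocked_increments(void)
{
	unsigned long tx_errors = 0;

	unsigned long a = tx_errors;  /* context A loads 0                  */
	unsigned long b = tx_errors;  /* context B loads 0                  */
	tx_errors = b + 1;            /* B stores 1                         */
	tx_errors = a + 1;            /* A stores 1: B's increment is lost  */

	return tx_errors;
}

/* The same two increments when each load/add/store runs as one unit,
 * which is what holding the lock around the ++ guarantees. */
static unsigned long replay_serialized_increments(void)
{
	unsigned long tx_errors = 0;

	tx_errors = tx_errors + 1;    /* A, start to finish                 */
	tx_errors = tx_errors + 1;    /* then B                             */

	return tx_errors;
}
```

The first replay ends at 1 and the second at 2; real interleavings are non-deterministic, which is why such losses are hard to reproduce on demand.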
Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/21/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
> On 1/11/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> >
> > PS: alas I didn't even check compiling - I had no time to
> > find all compile dependencies of this driver
> > ---
>
> Hmm, I think this is guaranteed not to work. In between those lines the
> lock is released, while data in the mp structure is still being
> accessed. It seems that this bit of code is indeed not race-safe
> though; I'm gonna try to figure something out.

This was indeed the right spot. The attached raw hack seems to fix the bug (I haven't been able to crash the box so far).

I haven't checked whether the same "situation" happens elsewhere in the code; I leave that as an exercise for the maintainers (or until I experience another kind of crash :)

The patch is a bit ugly (printk with IRQs disabled will not show, etc.) but at least it does work. I'm sure somebody will figure something out.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

--- linux-2.6.19.orig/drivers/net/mv643xx_eth.c	2007-01-21 13:56:04.450689123 +0100
+++ linux-2.6.19/drivers/net/mv643xx_eth.c	2007-01-21 13:39:58.228404763 +0100
@@ -312,8 +312,8 @@
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -332,8 +332,6 @@
 		if (skb)
 			mp->tx_skb[tx_index] = NULL;
 
-		spin_unlock_irqrestore(&mp->lock, flags);
-
 		if (cmd_sts & ETH_ERROR_SUMMARY) {
 			printk("%s: Error in TX\n", dev->name);
 			mp->stats.tx_errors++;
@@ -349,6 +347,7 @@
 
 		released = 1;
 	}
 
+	spin_unlock_irqrestore(&mp->lock, flags);
 	return released;
 }
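A hand-replayed sketch of the check-then-act window in the original loop, where `mp->tx_desc_count > 0` is tested before the lock is taken. The `toy_state` struct and the `toy_take()`/`toy_drop()` pseudo-lock are hypothetical stand-ins and the interleaving is fixed by hand; the point is only the ordering. Both proposed patches in this thread arrange for the count to be tested with the lock held, as in the second function.

```c
#include <assert.h>

/* Hypothetical model: desc_count stands in for mp->tx_desc_count and
 * locked for the spinlock. Steps are replayed in one fixed order. */
struct toy_state {
	int desc_count;
	int locked;
};

static void toy_take(struct toy_state *s)
{
	assert(!s->locked);                  /* nobody else may hold it */
	s->locked = 1;
}

static void toy_drop(struct toy_state *s)
{
	assert(s->locked);
	s->locked = 0;
}

/* Original shape: context A tests the count with no lock held. */
static int replay_check_then_lock(void)
{
	struct toy_state s = { .desc_count = 1, .locked = 0 };

	int a_saw_work = s.desc_count > 0;   /* A: loop test, unlocked        */

	toy_take(&s);                        /* B slips in and reclaims the   */
	s.desc_count--;                      /* last descriptor               */
	toy_drop(&s);

	toy_take(&s);                        /* A locks and acts on its stale */
	if (a_saw_work)                      /* test, reclaiming a descriptor */
		s.desc_count--;                  /* that is no longer there       */
	toy_drop(&s);

	return s.desc_count;                 /* -1: bookkeeping is now wrong  */
}

/* Fixed shape: A only tests the count while holding the lock. */
static int replay_check_under_lock(void)
{
	struct toy_state s = { .desc_count = 1, .locked = 0 };

	toy_take(&s);                        /* B reclaims the last one first */
	s.desc_count--;
	toy_drop(&s);

	toy_take(&s);
	if (s.desc_count > 0)                /* A's test now sees B's update  */
		s.desc_count--;
	toy_drop(&s);

	return s.desc_count;                 /* 0 */
}
```

How a negative or otherwise inconsistent count would later surface as the BUG in eth_alloc_tx_desc_index is not shown here; the sketch stops at the corrupted count.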
Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/11/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
>
> PS: alas I didn't even check compiling - I had no time to
> find all compile dependencies of this driver
> ---
>
> Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>
> ---
>
> diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
> --- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c	2006-12-18 08:57:52.0 +0100
> +++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c	2007-01-11 08:55:34.0 +0100
> @@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
>  	int count;
>  	int released = 0;
>  
> +	spin_lock_irqsave(&mp->lock, flags);
>  	while (mp->tx_desc_count > 0) {
> -		spin_lock_irqsave(&mp->lock, flags);
>  		tx_index = mp->tx_used_desc_q;
>  		desc = &mp->p_tx_desc_area[tx_index];
>  		cmd_sts = desc->cmd_sts;
> @@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
>  			dev_kfree_skb_irq(skb);

Hmm, I think this is guaranteed not to work. In between those lines the lock is released, while data in the mp structure is still being accessed. It seems that this bit of code is indeed not race-safe though; I'm gonna try to figure something out.

>  
>  		released = 1;
> +		spin_lock_irqsave(&mp->lock, flags);
>  	}
>  
> +	spin_unlock_irqrestore(&mp->lock, flags);
>  	return released;
>  }

Ugh, this is really unclean... Taking a lock "for nothing" like that has a perf cost.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
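The shape of Jarek's patch can be sketched in userspace as a lock hand-off across loop iterations: the lock is taken before the loop, so the `while` test always runs with it held; it is dropped for per-item work and re-taken at the bottom of the body. The `toy_queue` type and the `atomic_flag` spinlock are hypothetical stand-ins. The sketch assumes the unlocked section touches no shared state, which is exactly the condition the objection above says the real code did not meet.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for a spinlock; it does not disable interrupts. */
typedef struct { atomic_flag busy; } toy_lock;

static void toy_lock_init(toy_lock *l)
{
	atomic_flag_clear(&l->busy);
}

static void toy_lock_acquire(toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
		; /* spin */
}

static void toy_lock_release(toy_lock *l)
{
	atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/* Hypothetical work queue; `pending` is only touched under `lock`. */
struct toy_queue {
	toy_lock lock;
	int pending;
};

/* The while test always runs with the lock held; the lock is dropped for
 * per-item work and re-taken before the next test. Returns items drained. */
static int toy_drain(struct toy_queue *q)
{
	int drained = 0;

	toy_lock_acquire(&q->lock);
	while (q->pending > 0) {
		q->pending--;                   /* claim one item while locked */
		toy_lock_release(&q->lock);

		drained++;                      /* slow per-item work goes here */

		toy_lock_acquire(&q->lock);     /* re-take before re-testing */
	}
	toy_lock_release(&q->lock);
	return drained;
}
```

Compared with holding the lock for the whole loop, this pays one extra acquire/release per item, which is the "lock for nothing" cost mentioned above.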
Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/9/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
> On 1/9/07, Dale Farnsworth <[EMAIL PROTECTED]> wrote:
> >
> > Thank you Thibaut. Please try the following patch:
> >
> > From: Dale Farnsworth <[EMAIL PROTECTED]>
> >
> > Reserve one unused descriptor in the TX ring
> > to facilitate testing for when the ring is full.
>
> Dale, tried it and unfortunately:

Also, I don't know if you read that bit, but every time I reboot the box immediately after a crash, the NIC gets a bogus (always the same, it seems) MAC address, and I have to reboot one more time to get back to the "normal" MAC address. Dunno if that hints at anything, though.

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
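Reserving one descriptor is a common ring-buffer technique: with only a fill index and a reclaim index, `head == tail` could mean either empty or full, so one slot is left permanently unused to tell the two apart. A generic sketch, assuming a hypothetical 4-slot ring; whether Dale's patch tests fullness exactly this way isn't shown in the thread (the driver also keeps a `tx_desc_count`).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SLOTS 4   /* hypothetical size; usable capacity is 3 */

/* Generic ring indices (not the driver's): head is the next slot to fill,
 * tail the next slot to reclaim. */
struct toy_ring {
	unsigned head;
	unsigned tail;
};

static bool toy_ring_empty(const struct toy_ring *r)
{
	return r->head == r->tail;
}

/* Full when advancing head would make it equal tail, so one slot always
 * stays unused and "full" never looks like "empty". */
static bool toy_ring_full(const struct toy_ring *r)
{
	return (r->head + 1) % TOY_RING_SLOTS == r->tail;
}

/* Claim the next slot; -1 if the ring is full. */
static int toy_ring_alloc(struct toy_ring *r)
{
	if (toy_ring_full(r))
		return -1;
	int slot = (int)r->head;
	r->head = (r->head + 1) % TOY_RING_SLOTS;
	return slot;
}

/* Release the oldest slot; -1 if the ring is empty. */
static int toy_ring_reclaim(struct toy_ring *r)
{
	if (toy_ring_empty(r))
		return -1;
	int slot = (int)r->tail;
	r->tail = (r->tail + 1) % TOY_RING_SLOTS;
	return slot;
}
```

With 4 slots, three allocations succeed and the fourth reports "full"; after one reclaim, allocation succeeds again.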
Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/9/07, Dale Farnsworth <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote:
> > On 1/9/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > >On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> > >...
> > >> I suspected both and changed both the disk and the RAM for quality
> > >> parts that I tested afterwards. Both passed thorough tests.
> > >>
> > >> Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps)
> > >> works absolutely fine.
> > >
> > >If you are not tired, I'd suggest two more tests:
> >
> > I volunteered to help :)
>
> Thank you Thibaut. Please try the following patch:
>
> From: Dale Farnsworth <[EMAIL PROTECTED]>
>
> Reserve one unused descriptor in the TX ring
> to facilitate testing for when the ring is full.

Dale, tried it and unfortunately:

Alucard login: [ cut here ]
kernel BUG at drivers/net/mv643xx_eth.c:1071!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd vt8231 ohci1394 ieee13t
NIP: C0210B40 LR: C02126DC CTR: C0212620
REGS: dd2d7b40 TRAP: 0700 Not tainted (2.6.20-rc4)
MSR: 00021032 CR: 28242488 XER:
TASK = da03c640[1775] 'ncftp' THREAD: dd2d6000
GPR00: DD2D7BF0 DA03C640 CFB16260 CFB16000 000B DF79FDD2
GPR08: 0BA9 0001 1000 0BAA 28242482 10056CD0 28004422 C03D9BF8
GPR16: DD2D6000 0001 CFB162BC 9032
GPR24: 05A8 C03E CFB16000 C0212620 CFCB3260 CFB16260 DF79FDA0
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8
Call Trace:
[DD2D7BF0] [DF79FDD0] 0xdf79fdd0 (unreliable)
[DD2D7C30] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8
[DD2D7C50] [C02A1BF4] dev_queue_xmit+0x2bc/0x334
[DD2D7C70] [C02BC8A8] ip_output+0x120/0x244
[DD2D7C90] [C02BD8DC] ip_queue_xmit+0x17c/0x408
[DD2D7D00] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc
[DD2D7D40] [C02C2FC0] tcp_cleanup_rbuf+0xb8/0x158
[DD2D7D50] [C02C5C14] tcp_recvmsg+0x4c0/0xbcc
[DD2D7DB0] [C0294490] sock_common_recvmsg+0x3c/0x60
[DD2D7DD0] [C02920E4] sock_aio_read+0x10c/0x114
[DD2D7E30] [C006F210] do_sync_read+0xc4/0x138
[DD2D7EF0] [C006FECC] vfs_read+0x19c/0x1a4
[DD2D7F10] [C00702E4] sys_read+0x4c/0x90
[DD2D7F40] [C00122EC] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff5ba98
    LR = 0x10032fc0
Instruction dump:
5400fffe 0f00 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850
7d694a78 91630020 7d290034 5529d97e <0f09> 7d034378 4e800020 2f840001
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some .
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/9/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> ...
> > I suspected both and changed both the disk and the RAM for quality
> > parts that I tested afterwards. Both passed thorough tests.
> >
> > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps)
> > works absolutely fine.
>
> If you are not tired, I'd suggest two more tests:

I volunteered to help :)

For the sake of testing up-to-date code, I performed the following tests with 2.6.20-rc4.

First test was the usual NFS video playback. Crashdump is panic-2.6.20-rc4-nfs.txt. Went down in about 20 minutes.

> - as above but with NIC set to 100Mbps also,

Couldn't crash the machine (or at least it didn't happen in the time frame I was willing to wait while doing FTP downloads, ~20 minutes).

One note though: the throughput of the card was terribly sucky when set to 100-FD. I couldn't get more than 5.5MB/s doing an FTP get writing to /dev/null (to rule out disk perf), i.e., roughly half the max link speed, though the /only/ thing I changed in the setup was the link speed (same switch - made sure it properly detected link speed/duplex, same file server, same everything else). When configured in 1000-FD, still writing to /dev/null, I could get about 60MB/s. Again roughly half the link speed, but there, I suppose that the remote file server couldn't pull data faster from the disks :)

> - long downloading but without nfs e.g. ftp

That was fast and easy. In 1000-FD, I took down the box in 2s (after downloading 90MB). Crashdump is panic-2.6.20-rc4-ftp.txt

> (btw. there were some patches after 2.6.19 for rpc memory races).

It seems that's something else. I think I also reproduced the bug while surfing the internet with Firefox, but I didn't have a serial line hooked up to capture a dump, unfortunately.

> PS: Maintainers were cc-ed, I hope?
Now they are :)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

Debian GNU/Linux 4.0 Alucard ttyS0

Alucard login: [ cut here ]
kernel BUG at drivers/net/mv643xx_eth.c:1071!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci1394 parport_pc pae
NIP: C0210B40 LR: C02126DC CTR: C0212620
REGS: da247ac0 TRAP: 0700 Not tainted (2.6.20-rc4)
MSR: 00021032 CR: 28222488 XER:
TASK = db82a050[1780] 'ncftp' THREAD: da246000
GPR00: DA247B70 DB82A050 CFB14260 CFB14000 000B DED5FD72
GPR08: 0819 0001 1000 081A 48222422 10056CD0 28004422 C03D9BF8
GPR16: DA246000 0001 CFB142BC 9032
GPR24: C03E CFB14000 C0212620 DEDFD160 CFB14260 DED5FD40
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8
Call Trace:
[DA247B70] [DED5FD70] 0xded5fd70 (unreliable)
[DA247BB0] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8
[DA247BD0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334
[DA247BF0] [C02BC8A8] ip_output+0x120/0x244
[DA247C10] [C02BD8DC] ip_queue_xmit+0x17c/0x408
[DA247C80] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc
[DA247CC0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc
[DA247CD0] [C02CDA94] tcp_rcv_established+0x5d4/0x980
[DA247D00] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0
[DA247D30] [C0294B58] release_sock+0x7c/0xf4
[DA247D50] [C02C5C1C] tcp_recvmsg+0x4c8/0xbcc
[DA247DB0] [C0294490] sock_common_recvmsg+0x3c/0x60
[DA247DD0] [C02920E4] sock_aio_read+0x10c/0x114
[DA247E30] [C006F210] do_sync_read+0xc4/0x138
[DA247EF0] [C006FECC] vfs_read+0x19c/0x1a4
[DA247F10] [C00702E4] sys_read+0x4c/0x90
[DA247F40] [C00122EC] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff5ba98
    LR = 0x10032fc0
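A quick check of the throughput figures reported in the test results above, assuming 1 MB = 10^6 bytes and ignoring Ethernet/IP/TCP framing overhead (so real achievable payload rates sit somewhat below these ceilings). The helper names are illustrative only.

```c
#include <assert.h>

/* Nominal link rate in MB/s; assumes 1 MB = 10^6 bytes and ignores
 * framing and protocol overhead. */
static double link_mbytes_per_s(double link_mbit_per_s)
{
	return link_mbit_per_s / 8.0;
}

/* Measured rate as a percentage of the nominal link ceiling. */
static double percent_of_link(double measured_mbytes_per_s, double link_mbit_per_s)
{
	return 100.0 * measured_mbytes_per_s / link_mbytes_per_s(link_mbit_per_s);
}
```

Under these assumptions, 5.5MB/s on 100-FD is 44% of the nominal 12.5MB/s, and 60MB/s on 1000-FD is 48% of 125MB/s.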
Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/9/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
> ...
> > I suspected both and changed both the disk and the RAM for quality
> > parts that I tested afterwards. Both passed thorough tests.
>
> You wrote about half an hour, so overheating was also considered, I presume.

Yes, but since it works fine with the other NIC... :)

> > Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps)
> > works absolutely fine.
>
> So it looks like the card/driver (or maybe this specimen?).

I'm suspecting the driver, but I'm not a specialist :) It's true that this particular card specimen could be damaged, even though that seems a bit unlikely.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
On 1/9/07, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> On 05-01-2007 20:03, Thibaut VARENE wrote:
> > Hi,
> >
> > I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
> ...
> > [C7F6FA60] [C0012498] ret_from_except+0x0/0x14
> > --- Exception: 501 at __kmalloc+0x30/0xc0
> >     LR = rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)
> > [C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]
> > [C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]
> > [C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]
>
> Aren't there any other warnings displayed before?

No, I've pasted the full dump that appeared on the serial console I had set up when the crash occurred.

> No problems with memory or disk?

I suspected both and changed both the disk and the RAM for quality parts that I tested afterwards. Both passed thorough tests.

Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps) works absolutely fine.

HTH

T-Bone

(PS: please CC me in answers)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
Hi,

I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M RAM) box for a while: I can reliably kill my machine in about half an hour while watching some video read from a remote NFS volume (hence the "mplayer" task in the following dump).

It was relatively difficult to get proper debug info, since the crash happens while video is playing on the screen, but it's there anyway :)

This particular dump comes from kernel 2.6.19-ck2, but I reproduced the bug with vanilla 2.6.19 too, so the bug lives in mainline. I'm not really familiar with that particular code, but I'd gladly provide as much debug info as I can.

The box is hooked to a gigabit switch and the NIC is configured as gigabit too. Interestingly, when I reboot immediately after the crash, the NIC gets a bogus MAC address, and I have to reboot again to get back to normal.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
Modules linked in: nfs lockd sunrpc eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci14
NIP: C020F0E0 LR: C0210C54 CTR: C0210B98
REGS: c7f6f670 TRAP: 0700 Not tainted (2.6.19-ck2)
MSR: 00021032 CR: 24022488 XER:
TASK = c49a8d10[2227] 'mplayer' THREAD: c7f6e000
GPR00: C7F6F720 C49A8D10 DFF41260 DFF41000 000B CE0CF932
GPR08: 0CEA 0001 1000 0CEB 44022422 1085F9B8 C50B0368 B241
GPR16: C7F6FD28 B240 DFF412DC C038 9032 0400 C7F6E000
GPR24: DFF41000 C7F6E000 C0210B98 CE0EAC80 DFF41260 CE0CF900
NIP [C020F0E0] eth_alloc_tx_desc_index+0x44/0x50
LR [C0210C54] mv643xx_eth_start_xmit+0xbc/0x3b8
Call Trace:
[C7F6F720] [CE0CF930] 0xce0cf930 (unreliable)
[C7F6F760] [C0299714] dev_hard_start_xmit+0x1d4/0x2c8
[C7F6F780] [C029C0E0] dev_queue_xmit+0x2bc/0x334
[C7F6F7A0] [C02B6E1C] ip_output+0x124/0x248
[C7F6F7C0] [C02B7E54] ip_queue_xmit+0x17c/0x404
[C7F6F830] [C02C91BC] tcp_transmit_skb+0x38c/0x7dc
[C7F6F860] [C02C65E4] __tcp_ack_snd_check+0x64/0xbc
[C7F6F870] [C02C8100] tcp_rcv_established+0x5d4/0x980
[C7F6F8A0] [C02CEDCC] tcp_v4_do_rcv+0xd8/0x3e4
[C7F6F8D0] [C02D1610] tcp_v4_rcv+0x788/0x98c
[C7F6F900] [C02B2594] ip_local_deliver+0xe4/0x1a4
[C7F6F920] [C02B2A50] ip_rcv+0x288/0x46c
[C7F6F950] [C0299308] netif_receive_skb+0x214/0x304
[C7F6F980] [C0211CBC] mv643xx_poll+0x41c/0x48c
[C7F6F9D0] [C029B550] net_rx_action+0x98/0x200
[C7F6FA00] [C0026958] __do_softirq+0x80/0xf4
[C7F6FA30] [C0006930] do_softirq+0x58/0x5c
[C7F6FA40] [C0026408] irq_exit+0x60/0x80
[C7F6FA50] [C00069DC] do_IRQ+0xa8/0xc8
[C7F6FA60] [C0012498] ret_from_except+0x0/0x14
--- Exception: 501 at __kmalloc+0x30/0xc0
    LR = rpc_malloc+0x48/0xac [sunrpc]
[C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)
[C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc]
[C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc]
[C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc]
[C7F6FB80] [E2B0EEB0] nfs_execute_read+0x40/0x64 [nfs]
[C7F6FBB0] [E2B0F6A4] nfs_pagein_one+0x2a0/0x300 [nfs]
[C7F6FBF0] [E2B0FA9C] nfs_readpages+0x118/0x1f8 [nfs]
[C7F6FC40] [C00521DC] __do_page_cache_readahead+0x1e8/0x318
[C7F6FCD0] [C0052390] blockable_page_cache_readahead+0x84/0x114
[C7F6FCF0] [C00524A4] make_ahead_window+0x84/0xd4
[C7F6FD00] [C00525AC] page_cache_readahead+0xb8/0x220
[C7F6FD20] [C004B00C] do_generic_mapping_read+0x574/0x5e8
[C7F6FDC0] [C004D624] generic_file_aio_read+0x120/0x274
Re: [BUG] in skge.c on 2.6.18-rc5
On 8/30/06, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 30 Aug 2006 19:21:20 +0200
> "Thibaut VARENE" <[EMAIL PROTECTED]> wrote:
>
> > Replying to myself as I've been pointed at Stephen's reply (please CC
> > me, I'm not subscribed):
> >
> > I'm bringing the interface up with 'dhclient eth0', and yes it's using autoneg.
>
> Any chance of getting a backtrace; serial port, digital camera, handwritten note?

If you can deal with this extremely blurry shot: http://www.pateam.org/archive/tmp/IMGP0825.JPG

The trace begins with "mod_timer" / "neigh_update" / "read_lock" and so on.

Worst case, I'll reproduce the bug again and dump a better backtrace, but I'd rather avoid that as much as possible, as I use that machine a lot right now ;P

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [BUG] in skge.c on 2.6.18-rc5
Replying to myself as I've been pointed at Stephen's reply (please CC me, I'm not subscribed):

I'm bringing the interface up with 'dhclient eth0', and yes, it's using autoneg.

HTH

T-Bone

On 8/30/06, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
> Hi,
>
> The following commit:
>
> commit 239e44e1f05e2163ee066c07a753f9fb445979b2
> Author: Edgar E. Iglesias <[EMAIL PROTECTED]>
> Date:   Mon Aug 14 23:00:24 2006 -0700

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
[BUG] in skge.c on 2.6.18-rc5
Hi,

The following commit:

commit 239e44e1f05e2163ee066c07a753f9fb445979b2
Author: Edgar E. Iglesias <[EMAIL PROTECTED]>
Date:   Mon Aug 14 23:00:24 2006 -0700

    [PATCH] skge: remember to run netif_poll_disable()

    Signed-off-by: Edgar E. Iglesias <[EMAIL PROTECTED]>
    Cc: Stephen Hemminger <[EMAIL PROTECTED]>
    Cc: Jeff Garzik <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 7de9a07..ad878df 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2211,6 +2211,7 @@ static int skge_up(struct net_device *de
 	skge_write8(hw, Q_ADDR(rxqaddr[port], Q_CSR), CSR_START | CSR_IRQ_CL_F);
 	skge_led(skge, LED_MODE_ON);
 
+	netif_poll_enable(dev);
 	return 0;
 
  free_rx_ring:
@@ -2279,6 +2280,7 @@ static int skge_down(struct net_device *
 
 	skge_led(skge, LED_MODE_OFF);
 
+	netif_poll_disable(dev);
 	skge_tx_clean(skge);
 	skge_rx_clean(skge);

panics my 2.6.18-rc5 kernel on my em64t box, apparently on the first network activity (e.g. 'ping'). Reverting it gets me back to a functional kernel.

HTH

T-Bone

[EMAIL PROTECTED]:~$ lspci | grep Ethernet
02:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
05:04.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [patch 2/2] tulip: NatSemi DP83840A PHY fix
On 6/25/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> From: Thibaut VARENE <[EMAIL PROTECTED]>
>
> Fix a problem with Tulip 21142 HP branded PCI cards (PN#: B5509-66001),
> which feature a NatSemi DP83840A PHY.
>
> Without that patch, it is impossible to properly initialize the card's
> PHY, and it's thus impossible to monitor/configure it.

This patch can now be dropped. There's a better fix from Kyle McMartin in our CVS, and Val should have gotten it already.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Re: [patch 2/9] tulip: NatSemi DP83840A PHY fix
On 4/27/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > +	if (startup) {
> > +		int timeout = 10;	/* max 1 ms */
> >  		for (i = 0; i < reset_length; i++)
> >  			iowrite32(get_u16(&reset_sequence[i]) << 16, ioaddr + CSR15);
> > +
> > +		/* flush posted writes */
> > +		ioread32(ioaddr + CSR15);
> > +
> > +		/* Sect 3.10.3 in DP83840A.pdf (p39) */
> > +		udelay(500);
> > +
> > +		/* Section 4.2 in DP83840A.pdf (p43) */
> > +		/* and IEEE 802.3 "22.2.4.1.1 Reset" */
> > +		while (timeout-- &&
> > +			(tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
> > +			udelay(100);
>
> What can we do about this?
>
> It's a huge delay to be taken inside a spinlock.

This is device setup code. ISTR Grant showing other similar examples of delays in such code in the kernel. Unless you keep configuring/deconfiguring the device and hit the worst-case scenario every time, it won't be a problem. But if you're doing that, you already have a problem elsewhere. Or am I missing something?

> Anybody interested in converting the driver to use schedule_work() or
> similar?

That question was raised months ago without any significant outcome. Maybe it's time to move on? At least this code does respect the hardware specs, which isn't the case with the existing code, and it fixes a bug...

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
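For a sense of scale, the worst-case busy-wait in the quoted reset sequence can be computed directly: a fixed `udelay(500)` plus up to 10 polls of `udelay(100)`, about 1.5 ms in total, assuming each delay runs its full length and ignoring the time spent in the MDIO reads. The sketch mirrors the bounded `while (timeout-- && ...)` loop, with a hypothetical simulated PHY (`fake_phy`) in place of `tulip_mdio_read()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants taken from the quoted patch. */
enum {
	SETTLE_US  = 500,   /* udelay(500) after the reset sequence   */
	POLL_US    = 100,   /* udelay(100) between BMCR_RESET polls   */
	POLL_TRIES = 10,    /* int timeout = 10;  "max 1 ms"          */
};

static int worst_case_delay_us(void)
{
	return SETTLE_US + POLL_TRIES * POLL_US;
}

/* Hypothetical simulated PHY: reports "reset in progress" for the first
 * busy_polls reads, then reports it cleared. */
struct fake_phy { int busy_polls; };

static bool fake_reset_bit_set(void *ctx)
{
	struct fake_phy *p = ctx;

	if (p->busy_polls > 0) {
		p->busy_polls--;
		return true;
	}
	return false;
}

/* Same bounded-poll shape as the quoted loop; returns the microseconds
 * that would have been spent busy-waiting. */
static int poll_reset_done(bool (*reset_bit_set)(void *ctx), void *ctx)
{
	int timeout = POLL_TRIES;
	int waited = SETTLE_US;              /* the fixed settle delay */

	while (timeout-- && reset_bit_set(ctx))
		waited += POLL_US;               /* stands in for udelay(POLL_US) */
	return waited;
}
```

A PHY that clears reset after three polls costs 800 us; one that never clears it hits the 1500 us bound, after which the loop simply gives up.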