Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On 06/29/2007 03:12 PM, Satyam Sharma wrote:
> Hi Clemens,
>
> [ Cc:'ing Andrew, original thread at http://lkml.org/lkml/2007/5/15/354 ]
>
> On 6/29/07, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:
>> On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
>> > Hi,
>>
>> I had my system up and running for about one month without any issues, and
>> then it happened again: same kernel oops, panic, end.
>>
>> So I upgraded to 2.6.22-rc4-mm2 in the hope that it might fix it, but I just
>> got another oops (uptime 4d) [see attached file]
>
> You "upgraded" from -stable series kernels (2.6.19.2 / 2.6.20.6 /
> 2.6.21.1) to a -mm kernel, which is anything but :-)

Yeah, it's sort of a "last hope".

> On the one hand, I really like that we're getting testers for -mm
> kernels, but on the other hand, my good and honest side would
> recommend that you install stable kernels (2.6.x.y versions) on
> production systems, if you really care about uptimes.

That's fine. It's just my workstation here. I would never ever do that on
any production box. I am too old to be that experimental :)

>> My config hasn't changed in any way from the previous kernels.
>
> This doesn't look like the same oops you were getting persistently
> with 2.6.21.1 ... you could try upgrading to 2.6.22-rc6 (without -mm)
> too, if the oops in 2.6.21.1 was occurring too frequently in your setup;
> possibly it has been resolved in the 22-rc series.

I will try that. Thanks a lot for the tip. (I upgraded to rc6-mm1, and I
will see if I get the oops again, or the other one ...)

> Satyam
>
> [ Clemens' 2.6.22-rc4-mm2 oops below.
> ]
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
Hi Clemens,

[ Cc:'ing Andrew, original thread at http://lkml.org/lkml/2007/5/15/354 ]

On 6/29/07, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:
> On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
> > Hi,
>
> I had my system up and running for about one month without any issues, and
> then it happened again: same kernel oops, panic, end.
>
> So I upgraded to 2.6.22-rc4-mm2 in the hope that it might fix it, but I just
> got another oops (uptime 4d) [see attached file]

You "upgraded" from -stable series kernels (2.6.19.2 / 2.6.20.6 /
2.6.21.1) to a -mm kernel, which is anything but :-)

On the one hand, I really like that we're getting testers for -mm
kernels, but on the other hand, my good and honest side would
recommend that you install stable kernels (2.6.x.y versions) on
production systems, if you really care about uptimes.

> My config hasn't changed in any way from the previous kernels.

This doesn't look like the same oops you were getting persistently
with 2.6.21.1 ... you could try upgrading to 2.6.22-rc6 (without -mm)
too, if the oops in 2.6.21.1 was occurring too frequently in your setup;
possibly it has been resolved in the 22-rc series.

Satyam

[ Clemens' 2.6.22-rc4-mm2 oops below.
]
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
> Hi,

I had my system up and running for about one month without any issues, and
then it happened again: same kernel oops, panic, end.

So I upgraded to 2.6.22-rc4-mm2 in the hope that it might fix it, but I just
got another oops (uptime 4d) [see attached file]

My config hasn't changed in any way from the previous kernels.

-- 
[ Clemens Schwaighofer -=:~ ]
[ TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]

Jun 29 11:25:08 saturn kernel: [348308.690154] BUG: unable to handle kernel NULL pointer dereference at virtual address 0001
Jun 29 11:25:08 saturn kernel: [348308.690160] printing eip:
Jun 29 11:25:08 saturn kernel: [348308.690162] c108887c
Jun 29 11:25:08 saturn kernel: [348308.690163] *pde =
Jun 29 11:25:08 saturn kernel: [348308.690166] Oops: [#2]
Jun 29 11:25:08 saturn kernel: [348308.690167] PREEMPT
Jun 29 11:25:08 saturn kernel: [348308.690169] Modules linked in: eeprom pcspkr i2c_viapro k8temp hwmon i2c_core
Jun 29 11:25:08 saturn kernel: [348308.690177] CPU: 0
Jun 29 11:25:08 saturn kernel: [348308.690177] EIP: 0060:[__d_lookup+108/336] Not tainted VLI
Jun 29 11:25:08 saturn kernel: [348308.690179] EFLAGS: 00010202 (2.6.22-rc4-mm2 #1)
Jun 29 11:25:08 saturn kernel: [348308.690185] EIP is at __d_lookup+0x6c/0x150
Jun 29 11:25:08 saturn kernel: [348308.690187] eax: 0001 ebx: 0001 ecx: 0001 edx: 089c1579
Jun 29 11:25:08 saturn kernel: [348308.690190] esi: c6840ee8 edi: c301d734 ebp: f786f080 esp: c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690193] ds: 007b es: 007b fs: gs: 0033 ss: 0068
Jun 29 11:25:08 saturn kernel: [348308.690196] Process nfsd (pid: 30536, ti=c684 task=c62bce90 task.ti=c684)
Jun 29 11:25:08 saturn kernel: [348308.690198] Stack: c301d734 089c1579 c6840edb 0002 c6840ee8 0005 c6840edb
Jun 29 11:25:08 saturn kernel: [348308.690205]        f9ec c6840ee8 c301d734 c471ba84 c1088976 c7bbb090 c7bbb090
Jun 29 11:25:08 saturn kernel: [348308.690211]        c10b49fc c6840edb 000d c14914fb 7793 332bce90 31313630 c5469900
Jun 29 11:25:08 saturn kernel: [348308.690217] Call Trace:
Jun 29 11:25:08 saturn kernel: [348308.690220]  [d_lookup+22/64] d_lookup+0x16/0x40
Jun 29 11:25:08 saturn kernel: [348308.690224]  [proc_flush_task+76/496] proc_flush_task+0x4c/0x1f0
Jun 29 11:25:08 saturn kernel: [348308.690229]  [release_task+612/880] release_task+0x264/0x370
Jun 29 11:25:08 saturn kernel: [348308.690234]  [do_wait+1850/3072] do_wait+0x73a/0xc00
Jun 29 11:25:08 saturn kernel: [348308.690239]  [_spin_unlock_irq+38/64] _spin_unlock_irq+0x26/0x40
Jun 29 11:25:08 saturn kernel: [348308.690243]  [default_wake_function+0/16] default_wake_function+0x0/0x10
Jun 29 11:25:08 saturn kernel: [348308.690247]  [sys_wait4+49/64] sys_wait4+0x31/0x40
Jun 29 11:25:08 saturn kernel: [348308.690251]  [sys_waitpid+39/48] sys_waitpid+0x27/0x30
Jun 29 11:25:08 saturn kernel: [348308.690255]  [syscall_call+7/11] syscall_call+0x7/0xb
Jun 29 11:25:08 saturn kernel: [348308.690259]  ===
Jun 29 11:25:08 saturn kernel: [348308.690260] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690262] Code: d3 e8 31 c3 23 1d b4 d8 51 c1 b8 01 00 00 00 c1 e3 02 03 1d bc d8 51 c1 e8 e2 24 f9 ff 8b 1b 85 db 75 08 eb 44 85 c0 89 c3 74 3e <8b> 03 0f 18 00 90 8d 6b d8 8b 54 24 04 3b 55 34 75 e8 8b 34 24
Jun 29 11:25:08 saturn kernel: [348308.690289] EIP: [__d_lookup+108/336] __d_lookup+0x6c/0x150 SS:ESP 0068:c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690296] note: nfsd[30536] exited with preempt_count 1
Jun 29 11:25:08 saturn kernel: [348308.690303] BUG: scheduling while atomic: nfsd/0x1002/30536
Jun 29 11:25:08 saturn kernel: [348308.690305] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690307]  [schedule+1490/1744] schedule+0x5d2/0x6d0
Jun 29 11:25:08 saturn kernel: [348308.690311]  [vt_console_print+106/688] vt_console_print+0x6a/0x2b0
Jun 29 11:25:08 saturn kernel: [348308.690316]  [__cond_resched+18/48] __cond_resched+0x12/0x30
Jun 29 11:25:08 saturn kernel: [348308.690319]  [cond_resched+42/64] cond_resched+0x2a/0x40
Jun 29 11:25:08 saturn kernel: [348308.690322]  [unmap_vmas+1116/1184] unmap_vmas+0x45c/0x4a0
Jun 29 11:25:08 saturn kernel: [348308.690327]  [exit_mmap+105/256] exit_mmap+0x69/0x100
Jun 29 11:25:08 saturn kernel: [348308.690331]  [mmput+68/256] mmput+0x44/0x100
Jun 29 11:25:08 saturn kernel: [348308.690335]  [do_exit+301/2224] do_exit+0x12d/0x8b0
Jun 29 11:25:08 saturn kernel: [348308.690339]  [__wake_up+56/80] __wake_up+0x38/0x50
Jun 29 11:25:08 saturn kernel: [348308.690342]  [die+574/576] die+0x23e/0x240
Jun 29 11:25:08 saturn kernel: [348308.690346]  [do_page_fault
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
Andrew Morton wrote:
> On Wed, 16 May 2007 17:40:53 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:
>
>> I see. I thought there was a different approach on fixing the problem.
>> I'll try to backport the synchronization fix but am afraid it can be too
>> risky for -stable. If it seems too risky, I'll send a patch to disable
>> reclamation.
>
> OK. Sad.
>
> Maybe we add /proc/sys/fs/i-have-lots-of-disks-and-dont-mind-if-it-oopses
> to enable the old behaviour.

Out of curiosity, is there a decent reproducer for this problem, or is it
just a few lucky individuals? :)

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On Wed, 16 May 2007 17:40:53 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:

> I see. I thought there was a different approach on fixing the problem.
> I'll try to backport the synchronization fix but am afraid it can be too
> risky for -stable. If it seems too risky, I'll send a patch to disable
> reclamation.

OK. Sad.

Maybe we add /proc/sys/fs/i-have-lots-of-disks-and-dont-mind-if-it-oopses
to enable the old behaviour.
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
Tejun Heo wrote:
>>> The safest approach I can think of is making
>>> dentries for attributes unreclaimable but those are made reclaimable for
>>> good reasons. :-(
>> Yeah, that was the google workaround. It's OK unless you happen to have
>> thousands of disks on an ia32 box.
>
> I see. I thought there was a different approach on fixing the problem.
> I'll try to backport the synchronization fix but am afraid it can be too
> risky for -stable. If it seems too risky, I'll send a patch to disable
> reclamation.
>

Realistically, how can disabling the reclamation be worse than what's there now?
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
Andrew Morton wrote:
>>> a number of people have hit that, on and off.
>> Yeah, I've been seeing that one. It should have been fixed with the big
>> fat patchset.
>
> Great - fingers crossed.
>
>>> We were close to having a fix, I think, but then we decided that great
>>> chunks of sysfs needed rewriting and I believe that we believe that this
>>> great rewrite will fix this bug.
>> How were we gonna fix it? If it isn't too complex, I can cook up a
>> patch for -stable series.
>
> Do we actually understand the causes?

Yeah, I think I do. Basically, the problem is that on-demand attach and
reclamation both update sd->s_dentry, but accesses to it aren't
synchronized properly. In the big fat patchset, I first tried to fix it by
removing sd->s_dentry completely, which didn't work because of shadow
nodes, so the second try was to fix the synchronization, which is in -mm
now.

>> The safest approach I can think of is making
>> dentries for attributes unreclaimable but those are made reclaimable for
>> good reasons. :-(
>
> Yeah, that was the google workaround. It's OK unless you happen to have
> thousands of disks on an ia32 box.

I see. I thought there was a different approach on fixing the problem.
I'll try to backport the synchronization fix but am afraid it can be too
risky for -stable. If it seems too risky, I'll send a patch to disable
reclamation.

-- 
tejun
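The race Tejun describes can be illustrated with a small userspace sketch. It is a hypothetical model, not the -mm patch: struct node, node_init(), attach_dentry() and release_dentry() are made-up names, and a pthread mutex stands in for whatever locking sysfs actually uses. A node keeps a back-pointer, like sd->s_dentry, to the dentry currently attached to it. In this model the on-demand attach path may overwrite that pointer before the older dentry is reclaimed, so the sketch serializes both paths with one lock and has the release path clear the pointer only if it still refers to the dentry being released, instead of asserting the equivalent of BUG_ON(sd->s_dentry != dentry).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real sysfs/VFS types. */
struct dentry {
	int id;
};

struct node {
	pthread_mutex_t lock;     /* serializes every access to s_dentry */
	struct dentry *s_dentry;  /* back-pointer to the currently attached dentry */
};

void node_init(struct node *n)
{
	pthread_mutex_init(&n->lock, NULL);
	n->s_dentry = NULL;
}

void node_destroy(struct node *n)
{
	pthread_mutex_destroy(&n->lock);
}

/* On-demand attach: point the node at a newly instantiated dentry.
 * This may replace an older dentry that has not been reclaimed yet. */
void attach_dentry(struct node *n, struct dentry *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&n->lock);
	n->s_dentry = d;
	pthread_mutex_unlock(&n->lock);
}

/* Reclaim path (the iput side). Rather than asserting s_dentry == d,
 * clear the back-pointer only if it still refers to d.
 * Returns 1 if it was cleared, 0 if d was stale (already replaced). */
int release_dentry(struct node *n, struct dentry *d)
{
	int cleared = 0;

	pthread_mutex_lock(&n->lock);
	if (n->s_dentry == d) {
		n->s_dentry = NULL;
		cleared = 1;
	}
	pthread_mutex_unlock(&n->lock);
	return cleared;
}
```

For example, attaching dentry A, then B, and then releasing A returns 0 and leaves s_dentry pointing at B; in this model an unconditional check like the BUG_ON quoted in this thread would fire at exactly that point.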
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On Wed, 16 May 2007 13:05:19 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Andrew Morton wrote:
> > On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]>
> > wrote:
> >
> >> I have recurring oopses and panics in those above kernels. The error
> >> is always the same. I have the last Kernel Panic as a picture here:
> >>
> >> http://dev.tequila.jp/clemens/R0010172.JPG
> >>
> >> The oopses have the same error style as this Panic. I tried to capture
> >> one, but right after copying it into vim, I got a Panic. So next time I
> >> will try again.
> >>
> >> I think it started with 2.6.19.2; I cannot remember having any of those
> >> problems before. The box can work fine for about a week or more, or it
> >> locks up several times a day. I ran memtest for 10 h, but I had no
> >> errors.
> >
> > shrink_dcache_memory->...sysfs_d_iput->BUG
> >
> >     BUG_ON(sd->s_dentry != dentry);
> >
> > a number of people have hit that, on and off.
>
> Yeah, I've been seeing that one. It should have been fixed with the big
> fat patchset.

Great - fingers crossed.

> > We were close to having a fix, I think, but then we decided that great
> > chunks of sysfs needed rewriting and I believe that we believe that this
> > great rewrite will fix this bug.
>
> How were we gonna fix it? If it isn't too complex, I can cook up a
> patch for -stable series.

Do we actually understand the causes?

> The safest approach I can think of is making
> dentries for attributes unreclaimable but those are made reclaimable for
> good reasons. :-(

Yeah, that was the google workaround. It's OK unless you happen to have
thousands of disks on an ia32 box.
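To make the trade-off concrete, here is a hypothetical sketch of what "unreclaimable" means to a cache shrinker: entries flagged as pinned are skipped, so they are never freed under memory pressure. struct cache_entry, its pinned flag and shrink_cache() are illustrative names, not kernel APIs, and the thread does not say how the google workaround was actually implemented. Because pinned entries stay resident, memory use grows with the number of attribute files, which is presumably why Andrew singles out thousands of disks on an ia32 box, where dentries and inodes have to fit in the limited low-memory zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry; not a kernel structure. */
struct cache_entry {
	bool cached;  /* currently held in the cache */
	bool in_use;  /* still referenced, so not reclaimable right now */
	bool pinned;  /* the "unreclaimable" workaround: never freed */
};

/* Scan the cache and drop up to nr_to_scan entries that are cached,
 * unused and not pinned. Returns the number of entries freed. */
size_t shrink_cache(struct cache_entry *entries, size_t n, size_t nr_to_scan)
{
	size_t freed = 0;

	assert(entries != NULL || n == 0);
	for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
		struct cache_entry *e = &entries[i];

		/* Pinned entries are skipped just like busy ones. */
		if (!e->cached || e->in_use || e->pinned)
			continue;
		e->cached = false;  /* reclaimed */
		freed++;
	}
	return freed;
}
```

With one pinned and one busy entry among four, a scan frees only the other two, and every later scan frees nothing: the pinned entry's memory is never given back, however much pressure there is.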
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
Hello,

Andrew Morton wrote:
> On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]>
> wrote:
>
>> I have recurring oopses and panics in those above kernels. The error
>> is always the same. I have the last Kernel Panic as a picture here:
>>
>> http://dev.tequila.jp/clemens/R0010172.JPG
>>
>> The oopses have the same error style as this Panic. I tried to capture
>> one, but right after copying it into vim, I got a Panic. So next time I
>> will try again.
>>
>> I think it started with 2.6.19.2; I cannot remember having any of those
>> problems before. The box can work fine for about a week or more, or it
>> locks up several times a day. I ran memtest for 10 h, but I had no errors.
>
> shrink_dcache_memory->...sysfs_d_iput->BUG
>
>     BUG_ON(sd->s_dentry != dentry);
>
> a number of people have hit that, on and off.

Yeah, I've been seeing that one. It should have been fixed with the big
fat patchset.

> We were close to having a fix, I think, but then we decided that great
> chunks of sysfs needed rewriting and I believe that we believe that this
> great rewrite will fix this bug.

How were we gonna fix it? If it isn't too complex, I can cook up a
patch for -stable series.

The safest approach I can think of is making dentries for attributes
unreclaimable but those are made reclaimable for good reasons. :-(

Thanks.

-- 
tejun
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On Wed, 16 May 2007 11:46:00 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:

> On 05/16/2007 10:53 AM, Andrew Morton wrote:
>
> > How frequently do you see these failures? If it's repeatable with any
> > reliability then it'd be great if you could test a patchset for us. It's at:
> >
> > http://userweb.kernel.org/~akpm/cs.gz
> >
> > that's a single patch against 2.6.21-rc1, containing the following patches,
> > which are from the forthcoming 2.6.21-rc1-mm1 lineup:
>
> (and those above are 2.6.22-rc1 of course)
>
> Well, I tried to apply those patches and when I compile I get this error:
>
>   CC      net/ipv6/exthdrs.o
> net/ipv6/exthdrs.c: In function ‘ipv6_rthdr_rcv’:
> net/ipv6/exthdrs.c:390: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:398: error: ‘struct sk_buff’ has no member named ‘h’
> make[2]: *** [net/ipv6/exthdrs.o] Error 1
> make[1]: *** [net/ipv6] Error 2
> make: *** [net] Error 2
>
> It's probably totally unrelated but sadly a showstopper for testing the
> new sysfs patches.

oh crap, sorry.

Oh well. Please test 2.6.22-rc1-mm1 which has all the fixes and is a damn
fine kernel. It's at

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/

The core patch is at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/2.6.22-rc1-mm1.gz,
and it's against 2.6.22-rc1.

Thanks.
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On 05/16/2007 10:53 AM, Andrew Morton wrote:
> How frequently do you see these failures? If it's repeatable with any
> reliability then it'd be great if you could test a patchset for us. It's at:
>
> http://userweb.kernel.org/~akpm/cs.gz
>
> that's a single patch against 2.6.21-rc1, containing the following patches,
> which are from the forthcoming 2.6.21-rc1-mm1 lineup:

(and those above are 2.6.22-rc1 of course)

Well, I tried to apply those patches and when I compile I get this error:

  CC      net/ipv6/exthdrs.o
net/ipv6/exthdrs.c: In function ‘ipv6_rthdr_rcv’:
net/ipv6/exthdrs.c:390: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:398: error: ‘struct sk_buff’ has no member named ‘h’
make[2]: *** [net/ipv6/exthdrs.o] Error 1
make[1]: *** [net/ipv6] Error 2
make: *** [net] Error 2

It's probably totally unrelated but sadly a showstopper for testing the
new sysfs patches.
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On 05/16/2007 10:53 AM, Andrew Morton wrote:
>
>> I think it started with 2.6.19.2; I cannot remember having any of those
>> problems before. The box can work fine for about a week or more, or it
>> locks up several times a day. I ran memtest for 10 h, but I had no errors.
>
> shrink_dcache_memory->...sysfs_d_iput->BUG
>
>     BUG_ON(sd->s_dentry != dentry);
>
> a number of people have hit that, on and off.
>
> We were close to having a fix, I think, but then we decided that great
> chunks of sysfs needed rewriting and I believe that we believe that this
> great rewrite will fix this bug.
>
> But alas, it's all too late for 2.6.22.

Well, there is always hope for 2.6.23 :)

> How frequently do you see these failures? If it's repeatable with any
> reliability then it'd be great if you could test a patchset for us. It's at:
>
> http://userweb.kernel.org/~akpm/cs.gz
>
> that's a single patch against 2.6.21-rc1, containing the following patches,
> which are from the forthcoming 2.6.21-rc1-mm1 lineup:

I have been getting this very frequently recently. I just got hit by another
PANIC, which was probably the same issue. I will get this patch, try it out
and see if it helps.
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:

> I have recurring oopses and panics in those above kernels. The error
> is always the same. I have the last Kernel Panic as a picture here:
>
> http://dev.tequila.jp/clemens/R0010172.JPG
>
> The oopses have the same error style as this Panic. I tried to capture
> one, but right after copying it into vim, I got a Panic. So next time I
> will try again.
>
> I think it started with 2.6.19.2; I cannot remember having any of those
> problems before. The box can work fine for about a week or more, or it
> locks up several times a day. I ran memtest for 10 h, but I had no errors.

shrink_dcache_memory->...sysfs_d_iput->BUG

    BUG_ON(sd->s_dentry != dentry);

a number of people have hit that, on and off.

We were close to having a fix, I think, but then we decided that great
chunks of sysfs needed rewriting and I believe that we believe that this
great rewrite will fix this bug.

But alas, it's all too late for 2.6.22.

How frequently do you see these failures? If it's repeatable with any
reliability then it'd be great if you could test a patchset for us. It's at:

http://userweb.kernel.org/~akpm/cs.gz

that's a single patch against 2.6.21-rc1, containing the following patches,
which are from the forthcoming 2.6.21-rc1-mm1 lineup:

origin
gregkh-driver-uio
gregkh-driver-uio-documentation
gregkh-driver-uio-hilscher-cif-card-driver
gregkh-driver-idr-fix-obscure-bug-in-allocation-path
gregkh-driver-idr-separate-out-idr_mark_full
gregkh-driver-ida-implement-idr-based-id-allocator
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc
gregkh-driver-sysfs-allocate-inode-number-using-ida
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd
gregkh-driver-sysfs-fix-error-handling-in-binattr-write
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir
gregkh-driver-sysfs-flatten-and-fix-sysfs_rename_dir-error-handling
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions
gregkh-driver-sysfs-add-sysfs_dirent-s_parent
gregkh-driver-sysfs-add-sysfs_dirent-s_name
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree
gregkh-driver-sysfs-implement-bin_buffer
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect
gregkh-driver-sysfs-kill-attribute-file-orphaning
gregkh-driver-sysfs-separate-out-sysfs_attach_dentry
gregkh-driver-sysfs-reimplement-syfs_drop_dentry
gregkh-driver-sysfs-kill-unnecessary-attribute-owner
gregkh-driver-driver-core-make-devt_attr-and-uevent_attr-static
gregkh-driver-put_device-might_sleep
gregkh-driver-kobject-warn
gregkh-driver-warn-when-statically-allocated-kobjects-are-used
gregkh-driver-nozomi
fix-gregkh-driver-sysfs-fix-error-handling-in-binattr-write

Thanks.
Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2
On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
> The oopses have the same error style as this Panic. I tried to capture
> one, but right after copying it into vim, I got a Panic. So next time I
> will try again.

I just got an oops and was able to record it. The Kernel Panic that
followed didn't send any panic message to my remote serial box, so I
cannot give more information about that.

[ 5955.558356] BUG: unable to handle kernel paging request at virtual address 182b10f7
[ 5955.558362] printing eip:
[ 5955.558363] 182b10f7
[ 5955.558365] *pde =
[ 5955.558367] Oops: [#1]
[ 5955.558369] PREEMPT
[ 5955.558370] Modules linked in: eeprom i2c_viapro i2c_core pcspkr k8temp hwmon eth1394
[ 5955.558377] CPU: 0
[ 5955.558378] EIP: 0060:[<182b10f7>] Not tainted VLI
[ 5955.558379] EFLAGS: 00010202 (2.6.21.1 #1)
[ 5955.558382] EIP is at 0x182b10f7
[ 5955.558385] eax: 80c7dcc1 ebx: f786bc00 ecx: 182b10f7 edx: 0002
[ 5955.558388] esi: f7836200 edi: c2669f6c ebp: f780c88f esp: c2669f34
[ 5955.558390] ds: 007b es: 007b fs: 00d8 gs: ss: 0068
[ 5955.558393] Process pdflush (pid: 192, ti=c2669000 task=c268da90 task.ti=c2669000)
[ 5955.558395] Stack: c108a454 f7836284 00158443 024a f7836200 f783623c c2669f6c c147774c
[ 5955.558401]        c108aa8b c2669fb8 00158925 bc01 c1053030 c1052c44
[ 5955.558406]        c2669f94 01b6 024a 0025
[ 5955.558411] Call Trace:
[ 5955.558413]  [] sync_sb_inodes+0x74/0x280
[ 5955.558419]  [] writeback_inodes+0xab/0x110
[ 5955.558423]  [] pdflush+0x0/0x220
[ 5955.558426]  [] wb_kupdate+0x74/0xe0
[ 5955.558430]  [] pdflush+0x114/0x220
[ 5955.558433]  [] wb_kupdate+0x0/0xe0
[ 5955.558436]  [] kthread+0xa8/0xe0
[ 5955.558439]  [] kthread+0x0/0xe0
[ 5955.558442]  [] kernel_thread_helper+0x7/0x18
[ 5955.558446]  ===
[ 5955.558447] Code: Bad EIP value.
[ 5955.558449] EIP: [<182b10f7>] 0x182b10f7 SS:ESP 0068:c2669f34
[ 5955.558455] note: pdflush[192] exited with preempt_count 1