Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-06-28 Thread Clemens Schwaighofer
On 06/29/2007 03:12 PM, Satyam Sharma wrote:
> Hi Clemens,
> 
> [ Cc:'ing Andrew, original thread at http://lkml.org/lkml/2007/5/15/354 ]
> 
> On 6/29/07, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:
>> On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
>> > Hi,
>>
>> I had my system running up for about one month without any issues, and
>> then it happened again, same kernel oops, panic, end.
>>
>> So I have upgraded to 2.6.22-rc4-mm2 in hope it might fix it, but I just
>> got another oops (uptime 4d) [see attached file]
> 
> You "upgraded" from -stable series kernels (2.6.19.2 / 2.6.20.6 /
> 2.6.21.1) to a -mm kernel, which is anything but :-)

Yeah, it's sort of a "last hope".

> On the one hand, I really like that we're getting testers for -mm
> kernels, but on the other hand, my good and honest side would
> recommend you to install stable kernels (2.6.x.y versions) on
> production systems, if you really care about uptimes.

That's fine. It's just my workstation here. I would never ever do that on
any production box. I am too old to be that experimental :)

>> my config hasn't changed in any way to the previous kernels.
> 
> This doesn't look like the same oops you were getting persistently
> with 2.6.21.1 ... you could try upgrading to 2.6.22-rc6 (without -mm)
> too, if the oops in 2.6.21.1 was occurring too frequently in your setup;
> possibly it has been resolved in the 22-rc series.

I will try that. Thanks a lot for the tip. (I upgraded to rc6-mm1, and I
will see if I get the oops again, or the other one ...)

> Satyam
> 
> [ Clemens' 2.6.22-rc4-mm2 oops below. ]
> 
> 
> Jun 29 11:25:08 saturn kernel: [348308.690154] BUG: unable to handle
> kernel NULL pointer dereference at virtual address 0001
> Jun 29 11:25:08 saturn kernel: [348308.690160]  printing eip:
> Jun 29 11:25:08 saturn kernel: [348308.690162] c108887c
> Jun 29 11:25:08 saturn kernel: [348308.690163] *pde = 
> Jun 29 11:25:08 saturn kernel: [348308.690166] Oops:  [#2]
> Jun 29 11:25:08 saturn kernel: [348308.690167] PREEMPT
> Jun 29 11:25:08 saturn kernel: [348308.690169] Modules linked in:
> eeprom pcspkr i2c_viapro k8temp hwmon i2c_core
> Jun 29 11:25:08 saturn kernel: [348308.690177] CPU:0
> Jun 29 11:25:08 saturn kernel: [348308.690177] EIP:
> 0060:[__d_lookup+108/336]Not tainted VLI
> Jun 29 11:25:08 saturn kernel: [348308.690179] EFLAGS: 00010202
> (2.6.22-rc4-mm2 #1)
> Jun 29 11:25:08 saturn kernel: [348308.690185] EIP is at
> __d_lookup+0x6c/0x150
> Jun 29 11:25:08 saturn kernel: [348308.690187] eax: 0001   ebx:
> 0001   ecx: 0001   edx: 089c1579
> Jun 29 11:25:08 saturn kernel: [348308.690190] esi: c6840ee8   edi:
> c301d734   ebp: f786f080   esp: c6840e84
> Jun 29 11:25:08 saturn kernel: [348308.690193] ds: 007b   es: 007b
> fs:   gs: 0033  ss: 0068
> Jun 29 11:25:08 saturn kernel: [348308.690196] Process nfsd (pid:
> 30536, ti=c684 task=c62bce90 task.ti=c684)
> Jun 29 11:25:08 saturn kernel: [348308.690198] Stack: c301d734
> 089c1579 c6840edb 0002 c6840ee8  0005 c6840edb
> Jun 29 11:25:08 saturn kernel: [348308.690205]f9ec
> c6840ee8 c301d734 c471ba84 c1088976 c7bbb090 c7bbb090 
> Jun 29 11:25:08 saturn kernel: [348308.690211]c10b49fc
> c6840edb 000d c14914fb 7793 332bce90 31313630 c5469900
> Jun 29 11:25:08 saturn kernel: [348308.690217] Call Trace:
> Jun 29 11:25:08 saturn kernel: [348308.690220]  [d_lookup+22/64]
> d_lookup+0x16/0x40
> Jun 29 11:25:08 saturn kernel: [348308.690224]
> [proc_flush_task+76/496] proc_flush_task+0x4c/0x1f0
> Jun 29 11:25:08 saturn kernel: [348308.690229]  [release_task+612/880]
> release_task+0x264/0x370
> Jun 29 11:25:08 saturn kernel: [348308.690234]  [do_wait+1850/3072]
> do_wait+0x73a/0xc00
> Jun 29 11:25:08 saturn kernel: [348308.690239]
> [_spin_unlock_irq+38/64] _spin_unlock_irq+0x26/0x40
> Jun 29 11:25:08 saturn kernel: [348308.690243]
> [default_wake_function+0/16] default_wake_function+0x0/0x10
> Jun 29 11:25:08 saturn kernel: [348308.690247]  [sys_wait4+49/64]
> sys_wait4+0x31/0x40
> Jun 29 11:25:08 saturn kernel: [348308.690251]  [sys_waitpid+39/48]
> sys_waitpid+0x27/0x30
> Jun 29 11:25:08 saturn kernel: [348308.690255]  [syscall_call+7/11]
> syscall_call+0x7/0xb
> Jun 29 11:25:08 saturn kernel: [348308.690259]  ===
> Jun 29 11:25:08 saturn kernel: [348308.690260] INFO: lockdep is turned off.
> Jun 29 11:25:08 saturn kernel: [348308.690262] Code: d3 e8 31 c3 23 1d
> b4 d8 51 c1 b8 01 00 00 00 c1 e3 02 03 1d bc d8 51 c1 e8 e2 24 f9 ff
> 8b 1b 85 db 75 08 eb 44 85 c0 89 c3 74 3e <8b> 03 0f 18 00 90 8d 6b d8
> 8b 54 24 04 3b 55 34 75 e8 8b 34 24
> Jun 29 11:25:08 saturn kernel: [348308.690289] EIP:
> [__d_lookup+108/336] __d_lookup+0x6c/0x150 SS:ESP 0068:c6840e84
> Jun 29 11:25:08 saturn kernel: [348308.690296] note: nfsd[30536]
> exited with preempt_count 1
> Jun 29 11:25:08 saturn kernel: [348308.690303] BUG: scheduling while
> atomic: nfsd/0x1002/30536
> Jun 29

Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-06-28 Thread Satyam Sharma

Hi Clemens,

[ Cc:'ing Andrew, original thread at http://lkml.org/lkml/2007/5/15/354 ]

On 6/29/07, Clemens Schwaighofer <[EMAIL PROTECTED]> wrote:
> On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
> > Hi,
>
> I had my system running up for about one month without any issues, and
> then it happened again, same kernel oops, panic, end.
>
> So I have upgraded to 2.6.22-rc4-mm2 in hope it might fix it, but I just
> got another oops (uptime 4d) [see attached file]

You "upgraded" from -stable series kernels (2.6.19.2 / 2.6.20.6 /
2.6.21.1) to a -mm kernel, which is anything but :-)

On the one hand, I really like that we're getting testers for -mm
kernels, but on the other hand, my good and honest side would
recommend you to install stable kernels (2.6.x.y versions) on
production systems, if you really care about uptimes.

> my config hasn't changed in any way to the previous kernels.

This doesn't look like the same oops you were getting persistently
with 2.6.21.1 ... you could try upgrading to 2.6.22-rc6 (without -mm)
too, if the oops in 2.6.21.1 was occurring too frequently in your setup;
possibly it has been resolved in the 22-rc series.
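
One quick way to check whether two reports are the same oops is to compare the faulting process and the "EIP is at" location. The following is a minimal Python sketch, not an existing tool: the helper name `oops_signature` is made up for illustration, and the sample lines are copied from the two oopses posted in this thread.

```python
import re

def oops_signature(log_text):
    """Return (process, location) taken from an i386 oops log.

    process  -- command name from the "Process <comm> (pid: ...)" line
    location -- whatever follows "EIP is at", e.g. "__d_lookup+0x6c/0x150"
    Either element is None if the corresponding line is missing.
    """
    process = re.search(r"Process (\S+) \(pid:", log_text)
    location = re.search(r"EIP is at (\S+)", log_text)
    return (
        process.group(1) if process else None,
        location.group(1) if location else None,
    )

# Lines copied from the two oopses posted in this thread.
OOPS_2_6_21_1 = """\
[ 5955.558382] EIP is at 0x182b10f7
[ 5955.558393] Process pdflush (pid: 192, ti=c2669000 task=c268da90 task.ti=c2669000)
"""
OOPS_2_6_22_RC4_MM2 = """\
Jun 29 11:25:08 saturn kernel: [348308.690185] EIP is at __d_lookup+0x6c/0x150
Jun 29 11:25:08 saturn kernel: [348308.690196] Process nfsd (pid: 30536, ti=c684 task=c62bce90 task.ti=c684)
"""

old = oops_signature(OOPS_2_6_21_1)
new = oops_signature(OOPS_2_6_22_RC4_MM2)
print(old, new, "same oops" if old == new else "different oops")
```

Log lines that a mail client has wrapped (for example "EIP is at" split from the symbol) would need to be joined back together before being passed to this helper.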

Satyam

[ Clemens' 2.6.22-rc4-mm2 oops below. ]


Jun 29 11:25:08 saturn kernel: [348308.690154] BUG: unable to handle
kernel NULL pointer dereference at virtual address 0001
Jun 29 11:25:08 saturn kernel: [348308.690160]  printing eip:
Jun 29 11:25:08 saturn kernel: [348308.690162] c108887c
Jun 29 11:25:08 saturn kernel: [348308.690163] *pde = 
Jun 29 11:25:08 saturn kernel: [348308.690166] Oops:  [#2]
Jun 29 11:25:08 saturn kernel: [348308.690167] PREEMPT
Jun 29 11:25:08 saturn kernel: [348308.690169] Modules linked in:
eeprom pcspkr i2c_viapro k8temp hwmon i2c_core
Jun 29 11:25:08 saturn kernel: [348308.690177] CPU:0
Jun 29 11:25:08 saturn kernel: [348308.690177] EIP:
0060:[__d_lookup+108/336]Not tainted VLI
Jun 29 11:25:08 saturn kernel: [348308.690179] EFLAGS: 00010202
(2.6.22-rc4-mm2 #1)
Jun 29 11:25:08 saturn kernel: [348308.690185] EIP is at __d_lookup+0x6c/0x150
Jun 29 11:25:08 saturn kernel: [348308.690187] eax: 0001   ebx:
0001   ecx: 0001   edx: 089c1579
Jun 29 11:25:08 saturn kernel: [348308.690190] esi: c6840ee8   edi:
c301d734   ebp: f786f080   esp: c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690193] ds: 007b   es: 007b
fs:   gs: 0033  ss: 0068
Jun 29 11:25:08 saturn kernel: [348308.690196] Process nfsd (pid:
30536, ti=c684 task=c62bce90 task.ti=c684)
Jun 29 11:25:08 saturn kernel: [348308.690198] Stack: c301d734
089c1579 c6840edb 0002 c6840ee8  0005 c6840edb
Jun 29 11:25:08 saturn kernel: [348308.690205]f9ec
c6840ee8 c301d734 c471ba84 c1088976 c7bbb090 c7bbb090 
Jun 29 11:25:08 saturn kernel: [348308.690211]c10b49fc
c6840edb 000d c14914fb 7793 332bce90 31313630 c5469900
Jun 29 11:25:08 saturn kernel: [348308.690217] Call Trace:
Jun 29 11:25:08 saturn kernel: [348308.690220]  [d_lookup+22/64]
d_lookup+0x16/0x40
Jun 29 11:25:08 saturn kernel: [348308.690224]
[proc_flush_task+76/496] proc_flush_task+0x4c/0x1f0
Jun 29 11:25:08 saturn kernel: [348308.690229]  [release_task+612/880]
release_task+0x264/0x370
Jun 29 11:25:08 saturn kernel: [348308.690234]  [do_wait+1850/3072]
do_wait+0x73a/0xc00
Jun 29 11:25:08 saturn kernel: [348308.690239]
[_spin_unlock_irq+38/64] _spin_unlock_irq+0x26/0x40
Jun 29 11:25:08 saturn kernel: [348308.690243]
[default_wake_function+0/16] default_wake_function+0x0/0x10
Jun 29 11:25:08 saturn kernel: [348308.690247]  [sys_wait4+49/64]
sys_wait4+0x31/0x40
Jun 29 11:25:08 saturn kernel: [348308.690251]  [sys_waitpid+39/48]
sys_waitpid+0x27/0x30
Jun 29 11:25:08 saturn kernel: [348308.690255]  [syscall_call+7/11]
syscall_call+0x7/0xb
Jun 29 11:25:08 saturn kernel: [348308.690259]  ===
Jun 29 11:25:08 saturn kernel: [348308.690260] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690262] Code: d3 e8 31 c3 23 1d
b4 d8 51 c1 b8 01 00 00 00 c1 e3 02 03 1d bc d8 51 c1 e8 e2 24 f9 ff
8b 1b 85 db 75 08 eb 44 85 c0 89 c3 74 3e <8b> 03 0f 18 00 90 8d 6b d8
8b 54 24 04 3b 55 34 75 e8 8b 34 24
Jun 29 11:25:08 saturn kernel: [348308.690289] EIP:
[__d_lookup+108/336] __d_lookup+0x6c/0x150 SS:ESP 0068:c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690296] note: nfsd[30536]
exited with preempt_count 1
Jun 29 11:25:08 saturn kernel: [348308.690303] BUG: scheduling while
atomic: nfsd/0x1002/30536
Jun 29 11:25:08 saturn kernel: [348308.690305] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690307]  [schedule+1490/1744]
schedule+0x5d2/0x6d0
Jun 29 11:25:08 saturn kernel: [348308.690311]
[vt_console_print+106/688] vt_console_print+0x6a/0x2b0
Jun 29 11:25:08 saturn kernel: [348308.690316]  [__cond_resched+18/48]
__cond_resched+0x12/0x30
Jun 29 11:25:08 saturn kernel: [348308.690319]  [cond_resched+42/64]
cond_resched+0x2a/0x40
Jun 29 11:25:08 saturn kernel: [348308.690322]  [unmap_vmas+1116/1184]
unmap

Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-06-28 Thread Clemens Schwaighofer
On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:
> Hi,

I had my system running up for about one month without any issues, and
then it happened again, same kernel oops, panic, end.

So I have upgraded to 2.6.22-rc4-mm2 in hope it might fix it, but I just
got another oops (uptime 4d) [see attached file]

my config hasn't changed in any way to the previous kernels.

-- 
[ Clemens Schwaighofer  -=:~ ]
[ TEQUILA\ Japan IT Group]
[6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp   ]
Jun 29 11:25:08 saturn kernel: [348308.690154] BUG: unable to handle kernel NULL pointer dereference at virtual address 0001
Jun 29 11:25:08 saturn kernel: [348308.690160]  printing eip:
Jun 29 11:25:08 saturn kernel: [348308.690162] c108887c
Jun 29 11:25:08 saturn kernel: [348308.690163] *pde = 
Jun 29 11:25:08 saturn kernel: [348308.690166] Oops:  [#2]
Jun 29 11:25:08 saturn kernel: [348308.690167] PREEMPT
Jun 29 11:25:08 saturn kernel: [348308.690169] Modules linked in: eeprom pcspkr i2c_viapro k8temp hwmon i2c_core
Jun 29 11:25:08 saturn kernel: [348308.690177] CPU:0
Jun 29 11:25:08 saturn kernel: [348308.690177] EIP:0060:[__d_lookup+108/336]Not tainted VLI
Jun 29 11:25:08 saturn kernel: [348308.690179] EFLAGS: 00010202   (2.6.22-rc4-mm2 #1)
Jun 29 11:25:08 saturn kernel: [348308.690185] EIP is at __d_lookup+0x6c/0x150
Jun 29 11:25:08 saturn kernel: [348308.690187] eax: 0001   ebx: 0001   ecx: 0001   edx: 089c1579
Jun 29 11:25:08 saturn kernel: [348308.690190] esi: c6840ee8   edi: c301d734   ebp: f786f080   esp: c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690193] ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Jun 29 11:25:08 saturn kernel: [348308.690196] Process nfsd (pid: 30536, ti=c684 task=c62bce90 task.ti=c684)
Jun 29 11:25:08 saturn kernel: [348308.690198] Stack: c301d734 089c1579 c6840edb 0002 c6840ee8  0005 c6840edb
Jun 29 11:25:08 saturn kernel: [348308.690205]f9ec c6840ee8 c301d734 c471ba84 c1088976 c7bbb090 c7bbb090 
Jun 29 11:25:08 saturn kernel: [348308.690211]c10b49fc c6840edb 000d c14914fb 7793 332bce90 31313630 c5469900
Jun 29 11:25:08 saturn kernel: [348308.690217] Call Trace:
Jun 29 11:25:08 saturn kernel: [348308.690220]  [d_lookup+22/64] d_lookup+0x16/0x40
Jun 29 11:25:08 saturn kernel: [348308.690224]  [proc_flush_task+76/496] proc_flush_task+0x4c/0x1f0
Jun 29 11:25:08 saturn kernel: [348308.690229]  [release_task+612/880] release_task+0x264/0x370
Jun 29 11:25:08 saturn kernel: [348308.690234]  [do_wait+1850/3072] do_wait+0x73a/0xc00
Jun 29 11:25:08 saturn kernel: [348308.690239]  [_spin_unlock_irq+38/64] _spin_unlock_irq+0x26/0x40
Jun 29 11:25:08 saturn kernel: [348308.690243]  [default_wake_function+0/16] default_wake_function+0x0/0x10
Jun 29 11:25:08 saturn kernel: [348308.690247]  [sys_wait4+49/64] sys_wait4+0x31/0x40
Jun 29 11:25:08 saturn kernel: [348308.690251]  [sys_waitpid+39/48] sys_waitpid+0x27/0x30
Jun 29 11:25:08 saturn kernel: [348308.690255]  [syscall_call+7/11] syscall_call+0x7/0xb
Jun 29 11:25:08 saturn kernel: [348308.690259]  ===
Jun 29 11:25:08 saturn kernel: [348308.690260] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690262] Code: d3 e8 31 c3 23 1d b4 d8 51 c1 b8 01 00 00 00 c1 e3 02 03 1d bc d8 51 c1 e8 e2 24 f9 ff 8b 1b 85 db 75 08 eb 44 85 c0 89 c3 74 3e <8b> 03 0f 18 00 90 8d 6b d8 8b 54 24 04 3b 55 34 75 e8 8b 34 24
Jun 29 11:25:08 saturn kernel: [348308.690289] EIP: [__d_lookup+108/336] __d_lookup+0x6c/0x150 SS:ESP 0068:c6840e84
Jun 29 11:25:08 saturn kernel: [348308.690296] note: nfsd[30536] exited with preempt_count 1
Jun 29 11:25:08 saturn kernel: [348308.690303] BUG: scheduling while atomic: nfsd/0x1002/30536
Jun 29 11:25:08 saturn kernel: [348308.690305] INFO: lockdep is turned off.
Jun 29 11:25:08 saturn kernel: [348308.690307]  [schedule+1490/1744] schedule+0x5d2/0x6d0
Jun 29 11:25:08 saturn kernel: [348308.690311]  [vt_console_print+106/688] vt_console_print+0x6a/0x2b0
Jun 29 11:25:08 saturn kernel: [348308.690316]  [__cond_resched+18/48] __cond_resched+0x12/0x30
Jun 29 11:25:08 saturn kernel: [348308.690319]  [cond_resched+42/64] cond_resched+0x2a/0x40
Jun 29 11:25:08 saturn kernel: [348308.690322]  [unmap_vmas+1116/1184] unmap_vmas+0x45c/0x4a0
Jun 29 11:25:08 saturn kernel: [348308.690327]  [exit_mmap+105/256] exit_mmap+0x69/0x100
Jun 29 11:25:08 saturn kernel: [348308.690331]  [mmput+68/256] mmput+0x44/0x100
Jun 29 11:25:08 saturn kernel: [348308.690335]  [do_exit+301/2224] do_exit+0x12d/0x8b0
Jun 29 11:25:08 saturn kernel: [348308.690339]  [__wake_up+56/80] __wake_up+0x38/0x50
Jun 29 11:25:08 saturn kernel: [348308.690342]  [die+574/576] die+0x23e/0x240
Jun 29 11:25:08 saturn kernel: [348308.690346]  [do_page_fault
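
The trace above prints each location twice, once in decimal (`[__d_lookup+108/336]`) and once in hex (`__d_lookup+0x6c/0x150`), and the `Code:` line marks the byte at the faulting EIP with `<..>`. A small Python sketch to check that the two forms agree and to pull out the faulting bytes follows; the function names are illustrative and do not come from any kernel tool.

```python
import re

def parse_symbol_offset(text):
    """Split 'sym+off/size' into (sym, off, size); 0x-prefixed numbers are hex."""
    m = re.fullmatch(r"(\w+)\+(\w+)/(\w+)", text)
    if m is None:
        raise ValueError("not a sym+off/size string: %r" % text)
    sym, off, size = m.groups()
    return sym, int(off, 0), int(size, 0)

_CODE_RE = re.compile(
    r"\s*((?:[0-9a-f]{2}\s+)*)<([0-9a-f]{2})>((?:\s+[0-9a-f]{2})*)\s*"
)

def split_code_line(code):
    """Return (bytes_before_eip, bytes_from_eip) from an oops 'Code:' dump.

    The byte wrapped in <..> is the first byte of the faulting instruction.
    """
    m = _CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError("no <..> marker found in code dump")
    before, marked, after = m.groups()
    return bytes.fromhex(before), bytes.fromhex(marked + after)

# The Code: dump from the 2.6.22-rc4-mm2 oops in this thread.
CODE = (
    "d3 e8 31 c3 23 1d b4 d8 51 c1 b8 01 00 00 00 c1 e3 02 03 1d bc d8 51 c1 "
    "e8 e2 24 f9 ff 8b 1b 85 db 75 08 eb 44 85 c0 89 c3 74 3e <8b> 03 0f 18 "
    "00 90 8d 6b d8 8b 54 24 04 3b 55 34 75 e8 8b 34 24"
)

print(parse_symbol_offset("__d_lookup+108/336"))
print(parse_symbol_offset("__d_lookup+0x6c/0x150"))
before, at_eip = split_code_line(CODE)
print(len(before), at_eip[:2].hex())
```

The two bytes at the marker, `8b 03`, encode `mov (%ebx),%eax` on i386, which fits the register dump: ebx holds the same small value as the reported fault address.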

Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-17 Thread Eric Sandeen

Andrew Morton wrote:
> On Wed, 16 May 2007 17:40:53 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:
>
>> I see.  I thought there was different approach on fixing the problem.
>> I'll try to backport the synchronization fix but am afraid it can be too
>> risky for -stable.  If it seems too risky, I'll send a patch to disable
>> reclamation.
>
> OK.  Sad.  Maybe we add
> /proc/sys/fs/i-have-lots-of-disks-and-dont-mind-if-it-oopses
> to enable the old behaviour.


Out of curiosity, is there a decent reproducer for this problem, or is 
it just a few lucky individuals? :)


-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 17:40:53 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:

> I see.  I thought there was different approach on fixing the problem.
> I'll try to backport the synchronization fix but am afraid it can be too
> risky for -stable.  If it seems too risky, I'll send a patch to disable
> reclamation.

OK.  Sad.  Maybe we add 
/proc/sys/fs/i-have-lots-of-disks-and-dont-mind-if-it-oopses
to enable the old behaviour.


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-16 Thread Chuck Ebbert
Tejun Heo wrote:
>>>  The safest approach I can think of is making
>>> dentries for attributes unreclaimable but those are made reclaimable for
>>> good reasons.  :-(
>> Yeah, that was the google workaround.  It's OK unless you happen to have
>> thousands of disks on an ia32 box.
> 
> I see.  I thought there was different approach on fixing the problem.
> I'll try to backport the synchronization fix but am afraid it can be too
> risky for -stable.  If it seems too risky, I'll send a patch to disable
> reclamation.
> 

Realistically, how can disabling the reclamation be worse than what's
there now?



Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-16 Thread Tejun Heo
Andrew Morton wrote:
>>> a number of people have hit that, on and off.
>> Yeah, I've been seeing that one.  It should have been fixed with the big
>> fat patchset.
> 
> Great - fingers crossed.
> 
>>> We were close to having a fix, I think, but then we decided that great
>>> chunks of sysfs needed rewriting and I believe that we believe that this
>>> great rewrite will fix this bug.
>> How were we gonna fix it?  If it isn't too complex, I can cook up a
>> patch for -stable series.
> 
> Do we actually understand the causes?

Yeah, I think I do.  Basically, the problem is that on-demand attach and
reclamation update sd->s_dentry but accesses to it aren't synchronized
properly.  In the big fat patchset, first I tried to fix it by removing
sd->s_dentry completely which didn't work because of shadow nodes, so
the second try was to fix the synchronization which is in -mm now.
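
To make the failure mode concrete, here is a toy Python model (not kernel code) of the invariant behind `BUG_ON(sd->s_dentry != dentry)`. It assumes a simplified picture in which attaching a dentry stores it in `sd.s_dentry` and reclaiming a dentry expects to find itself there. All class and function names are hypothetical, and the interleaving shown is only one plausible way unsynchronized updates could collide.

```python
class SysfsDirent:
    """Toy stand-in for a sysfs_dirent: remembers the dentry attached to it."""
    def __init__(self):
        self.s_dentry = None

class Dentry:
    """Toy stand-in for a dentry; only carries a name for error messages."""
    def __init__(self, name):
        self.name = name

def attach(sd, dentry):
    """On-demand attach: point the sysfs_dirent at a freshly created dentry."""
    sd.s_dentry = dentry

def reclaim(sd, dentry):
    """Reclaim path: models BUG_ON(sd->s_dentry != dentry), then detaches."""
    if sd.s_dentry is not dentry:
        raise AssertionError("BUG: sd.s_dentry != dentry (%s)" % dentry.name)
    sd.s_dentry = None

def ordered():
    """Attach and reclaim strictly alternate, so the invariant holds."""
    sd, a, b = SysfsDirent(), Dentry("a"), Dentry("b")
    attach(sd, a)
    reclaim(sd, a)
    attach(sd, b)
    reclaim(sd, b)
    return "ok"

def racy():
    """Assumed interleaving: a second attach lands before 'a' is reclaimed."""
    sd, a, b = SysfsDirent(), Dentry("a"), Dentry("b")
    attach(sd, a)
    attach(sd, b)
    try:
        reclaim(sd, a)
    except AssertionError as exc:
        return str(exc)
    return "ok"

print(ordered())
print(racy())
```

In this model the check only fails when the two updates are allowed to interleave, which is the kind of ordering that proper synchronization around `sd->s_dentry` is meant to rule out.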

>>  The safest approach I can think of is making
>> dentries for attributes unreclaimable but those are made reclaimable for
>> good reasons.  :-(
> 
> Yeah, that was the google workaround.  It's OK unless you happen to have
> thousands of disks on an ia32 box.

I see.  I thought there was different approach on fixing the problem.
I'll try to backport the synchronization fix but am afraid it can be too
risky for -stable.  If it seems too risky, I'll send a patch to disable
reclamation.

-- 
tejun


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 13:05:19 +0200 Tejun Heo <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Andrew Morton wrote:
> > On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> 
> > wrote:
> > 
> >> I have re-occurring oopses and panics in those above kernels. The error
> >> is always the same. I have the last Kernel Panic as a picture here:
> >>
> >> http://dev.tequila.jp/clemens/R0010172.JPG
> >>
> >> The oops have the same error style like this Panic. I tried to capture
> >> one, but right after copying it into vim, I got a Panic. So next time I
> >> try to.
> >>
> >> I think it started with 2.6.19.2, I cannot remember I had any of those
> >> problems before. The box can work fine for about a week or more, or it
> > >> locks up several times a day. I ran a memtest for 10 h, but I had no
> > >> errors.
> > 
> > shrink_dcache_memory->...sysfs_d_iput->BUG
> > 
> > BUG_ON(sd->s_dentry != dentry);
> > 
> > a number of people have hit that, on and off.
> 
> Yeah, I've been seeing that one.  It should have been fixed with the big
> fat patchset.

Great - fingers crossed.

> > We were close to having a fix, I think, but then we decided that great
> > chunks of sysfs needed rewriting and I believe that we believe that this
> > great rewrite will fix this bug.
> 
> How were we gonna fix it?  If it isn't too complex, I can cook up a
> patch for -stable series.

Do we actually understand the causes?

>  The safest approach I can think of is making
> dentries for attributes unreclaimable but those are made reclaimable for
> good reasons.  :-(

Yeah, that was the google workaround.  It's OK unless you happen to have
thousands of disks on an ia32 box.



Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-16 Thread Tejun Heo
Hello,

Andrew Morton wrote:
> On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> 
> wrote:
> 
>> I have re-occurring oopses and panics in those above kernels. The error
>> is always the same. I have the last Kernel Panic as a picture here:
>>
>> http://dev.tequila.jp/clemens/R0010172.JPG
>>
>> The oops have the same error style like this Panic. I tried to capture
>> one, but right after copying it into vim, I got a Panic. So next time I
>> try to.
>>
>> I think it started with 2.6.19.2, I cannot remember I had any of those
>> problems before. The box can work fine for about a week or more, or it
>> locks up several times a day. I ran a memtest for 10 h, but I had no errors.
> 
> shrink_dcache_memory->...sysfs_d_iput->BUG
> 
> BUG_ON(sd->s_dentry != dentry);
> 
> a number of people have hit that, on and off.

Yeah, I've been seeing that one.  It should have been fixed with the big
fat patchset.

> We were close to having a fix, I think, but then we decided that great
> chunks of sysfs needed rewriting and I believe that we believe that this
> great rewrite will fix this bug.

How were we gonna fix it?  If it isn't too complex, I can cook up a
patch for -stable series.  The safest approach I can think of is making
dentries for attributes unreclaimable but those are made reclaimable for
good reasons.  :-(

Thanks.

-- 
tejun


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-15 Thread Andrew Morton
On Wed, 16 May 2007 11:46:00 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> 
wrote:

> On 05/16/2007 10:53 AM, Andrew Morton wrote:
> 
> > How frequently do you see these failures?  If it's repeatable with any
> > reliability then it'd be great if you could test a patchset for us.  It's at:
> > 
> > http://userweb.kernel.org/~akpm/cs.gz
> > 
> > that's a single patch against 2.6.21-rc1, containing the following patches,
> > which are from the forthcoming 2.6.21-rc1-mm1 lineup:
> 
> (and those above are 2.6.22-rc1 of course)
> 
> well, I tried to apply those patches and when I compile I get this error:
> 
>   CC  net/ipv6/exthdrs.o
> net/ipv6/exthdrs.c: In function ‘ipv6_rthdr_rcv’:
> net/ipv6/exthdrs.c:390: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
> net/ipv6/exthdrs.c:398: error: ‘struct sk_buff’ has no member named ‘h’
> make[2]: *** [net/ipv6/exthdrs.o] Error 1
> make[1]: *** [net/ipv6] Error 2
> make: *** [net] Error 2
> 
> It's probably totally unrelated but sadly a showstopper for testing the
> new sysfs patches,

oh crap, sorry.  Oh well.

Please test 2.6.22-rc1-mm1 which has all the fixes and is a damn fine
kernel.  It's at
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/

The core patch is at 
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/2.6.22-rc1-mm1.gz,
 and it's against 2.6.22-rc1.

Thanks.


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-15 Thread Clemens Schwaighofer
On 05/16/2007 10:53 AM, Andrew Morton wrote:

> How frequently do you see these failures?  If it's repeatable with any
> reliability then it'd be great if you could test a patchset for us.  It's at:
> 
> http://userweb.kernel.org/~akpm/cs.gz
> 
> that's a single patch against 2.6.21-rc1, containing the following patches,
> which are from the forthcoming 2.6.21-rc1-mm1 lineup:

(and those above are 2.6.22-rc1 of course)

well, I tried to apply those patches and when I compile I get this error:

  CC  net/ipv6/exthdrs.o
net/ipv6/exthdrs.c: In function ‘ipv6_rthdr_rcv’:
net/ipv6/exthdrs.c:390: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:391: error: ‘struct sk_buff’ has no member named ‘h’
net/ipv6/exthdrs.c:398: error: ‘struct sk_buff’ has no member named ‘h’
make[2]: *** [net/ipv6/exthdrs.o] Error 1
make[1]: *** [net/ipv6] Error 2
make: *** [net] Error 2

It's probably totally unrelated but sadly a showstopper for testing the
new sysfs patches,

-- 
[ Clemens Schwaighofer  -=:~ ]
[ TEQUILA\ Japan IT Group]
[6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp   ]





Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-15 Thread Clemens Schwaighofer
On 05/16/2007 10:53 AM, Andrew Morton wrote:
>
>> I think it started with 2.6.19.2, I cannot remember I had any of those
>> problems before. The box can work fine for about a week or more, or it
>> locks up several times a day. I ran a memtest for 10 h, but I had no errors.
> 
> shrink_dcache_memory->...sysfs_d_iput->BUG
> 
> BUG_ON(sd->s_dentry != dentry);
> 
> a number of people have hit that, on and off.
> 
> We were close to having a fix, I think, but then we decided that great
> chunks of sysfs needed rewriting and I believe that we believe that this
> great rewrite will fix this bug.
> 
> But alas, it's all too late for 2.6.22.

Well, there is always hope for 2.6.23 :)

> How frequently do you see these failures?  If it's repeatable with any
> reliability then it'd be great if you could test a patchset for us.  It's at:
> 
> http://userweb.kernel.org/~akpm/cs.gz
> 
> that's a single patch against 2.6.21-rc1, containing the following patches,
> which are from the forthcoming 2.6.21-rc1-mm1 lineup:

I get this very frequently recently. I just got hit by another PANIC
which was probably the same issue. I will get this patch and try it out
and see if it helps for me.

-- 
[ Clemens Schwaighofer  -=:~ ]
[ TEQUILA\ Japan IT Group]
[6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp   ]





Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-15 Thread Andrew Morton
On Wed, 16 May 2007 09:24:54 +0900 Clemens Schwaighofer <[EMAIL PROTECTED]> 
wrote:

> I have re-occurring oopses and panics in those above kernels. The error
> is always the same. I have the last Kernel Panic as a picture here:
> 
> http://dev.tequila.jp/clemens/R0010172.JPG
> 
> The oops have the same error style like this Panic. I tried to capture
> one, but right after copying it into vim, I got a Panic. So next time I
> try to.
> 
> I think it started with 2.6.19.2, I cannot remember I had any of those
> problems before. The box can work fine for about a week or more, or it
> locks up several times a day. I ran a memtest for 10 h, but I had no errors.

shrink_dcache_memory->...sysfs_d_iput->BUG

BUG_ON(sd->s_dentry != dentry);

a number of people have hit that, on and off.

We were close to having a fix, I think, but then we decided that great
chunks of sysfs needed rewriting and I believe that we believe that this
great rewrite will fix this bug.

But alas, it's all too late for 2.6.22.

How frequently do you see these failures?  If it's repeatable with any
reliability then it'd be great if you could test a patchset for us.  It's at:

http://userweb.kernel.org/~akpm/cs.gz

that's a single patch against 2.6.21-rc1, containing the following patches,
which are from the forthcoming 2.6.21-rc1-mm1 lineup:

origin
gregkh-driver-uio
gregkh-driver-uio-documentation
gregkh-driver-uio-hilscher-cif-card-driver
gregkh-driver-idr-fix-obscure-bug-in-allocation-path
gregkh-driver-idr-separate-out-idr_mark_full
gregkh-driver-ida-implement-idr-based-id-allocator
gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc
gregkh-driver-sysfs-allocate-inode-number-using-ida
gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd
gregkh-driver-sysfs-fix-error-handling-in-binattr-write
gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir
gregkh-driver-sysfs-flatten-and-fix-sysfs_rename_dir-error-handling
gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions
gregkh-driver-sysfs-add-sysfs_dirent-s_parent
gregkh-driver-sysfs-add-sysfs_dirent-s_name
gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union
gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock
gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree
gregkh-driver-sysfs-implement-bin_buffer
gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect
gregkh-driver-sysfs-kill-attribute-file-orphaning
gregkh-driver-sysfs-separate-out-sysfs_attach_dentry
gregkh-driver-sysfs-reimplement-syfs_drop_dentry
gregkh-driver-sysfs-kill-unnecessary-attribute-owner
gregkh-driver-driver-core-make-devt_attr-and-uevent_attr-static
gregkh-driver-put_device-might_sleep
gregkh-driver-kobject-warn
gregkh-driver-warn-when-statically-allocated-kobjects-are-used
gregkh-driver-nozomi
fix-gregkh-driver-sysfs-fix-error-handling-in-binattr-write

Thanks.


Re: Oops and Panics in 2.6.21.1, 2.6.20.6 and 2.6.19.2

2007-05-15 Thread Clemens Schwaighofer
On 05/16/2007 09:24 AM, Clemens Schwaighofer wrote:

> The oops have the same error style like this Panic. I tried to capture
> one, but right after copying it into vim, I got a Panic. So next time I
> try to.

I just got an oops and was able to record it. The Kernel Panic that
followed didn't send anything to my remote serial box, so I cannot give
more information about that.



-- 
[ Clemens Schwaighofer  -=:~ ]
[ TEQUILA\ Japan IT Group]
[6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp   ]
[ 5955.558356] BUG: unable to handle kernel paging request at virtual address 182b10f7
[ 5955.558362]  printing eip:
[ 5955.558363] 182b10f7
[ 5955.558365] *pde = 
[ 5955.558367] Oops:  [#1]
[ 5955.558369] PREEMPT
[ 5955.558370] Modules linked in: eeprom i2c_viapro i2c_core pcspkr k8temp hwmon eth1394
[ 5955.558377] CPU:0
[ 5955.558378] EIP:0060:[<182b10f7>]Not tainted VLI
[ 5955.558379] EFLAGS: 00010202   (2.6.21.1 #1)
[ 5955.558382] EIP is at 0x182b10f7
[ 5955.558385] eax: 80c7dcc1   ebx: f786bc00   ecx: 182b10f7   edx: 0002
[ 5955.558388] esi: f7836200   edi: c2669f6c   ebp: f780c88f   esp: c2669f34
[ 5955.558390] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[ 5955.558393] Process pdflush (pid: 192, ti=c2669000 task=c268da90 task.ti=c2669000)
[ 5955.558395] Stack: c108a454 f7836284 00158443 024a f7836200 f783623c 
c2669f6c c147774c
[ 5955.558401]c108aa8b c2669fb8 00158925 bc01 c1053030 c1052c44 
 
[ 5955.558406]c2669f94 01b6 024a    
 0025
[ 5955.558411] Call Trace:
[ 5955.558413]  [] sync_sb_inodes+0x74/0x280
[ 5955.558419]  [] writeback_inodes+0xab/0x110
[ 5955.558423]  [] pdflush+0x0/0x220
[ 5955.558426]  [] wb_kupdate+0x74/0xe0
[ 5955.558430]  [] pdflush+0x114/0x220
[ 5955.558433]  [] wb_kupdate+0x0/0xe0
[ 5955.558436]  [] kthread+0xa8/0xe0
[ 5955.558439]  [] kthread+0x0/0xe0
[ 5955.558442]  [] kernel_thread_helper+0x7/0x18
[ 5955.558446]  ===
[ 5955.558447] Code:  Bad EIP value.
[ 5955.558449] EIP: [<182b10f7>] 0x182b10f7 SS:ESP 0068:c2669f34
[ 5955.558455] note: pdflush[192] exited with preempt_count 1
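
In this oops the CPU ended up at an address the kernel reports as a "Bad EIP value", so it can help to check whether any register still holds that address. A small Python sketch over the register lines above follows; the helper name `registers_holding` is illustrative only.

```python
import re

def registers_holding(value, register_lines):
    """Return names of registers whose dumped hex value equals `value`."""
    regs = re.findall(r"\b([a-z]{2,3}): ([0-9a-f]+)", register_lines)
    wanted = int(value, 16)
    return [name for name, val in regs if int(val, 16) == wanted]

# Register lines copied from the 2.6.21.1 oops above.
REGS = """\
[ 5955.558385] eax: 80c7dcc1   ebx: f786bc00   ecx: 182b10f7   edx: 0002
[ 5955.558388] esi: f7836200   edi: c2669f6c   ebp: f780c88f   esp: c2669f34
"""

print(registers_holding("182b10f7", REGS))
```

Here `ecx` matches the bad EIP, which would be consistent with a call through a corrupted function pointer, though the log alone does not prove that.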


