Re: 2.6.21.14 NFS related oops

2007-06-20 Thread Maciej Sołtysiak
> > I'm running 2.6.21.5 now with slab debugging on, here's what I got about
> > slab corruption:
> >
> > Slab corruption: skbuff_head_cache start=ef287b78, len=164
> > Redzone: 0x5a2cf071/0x5a2cf071.
> > Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
> > 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > Single bit error detected. Probably bad RAM.
> > Run memtest86+ or a similar memory test tool.
> > Prev obj: start=ef287ac8, len=164
> > Redzone: 0x170fc2a5/0x170fc2a5.
> > Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> > 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
> > Next obj: start=ef287c28, len=164
> > Redzone: 0x170fc2a5/0x170fc2a5.
> > Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> > 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
> > 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
> >
> > How probable is that it is really a bad memory issue?
> > Does this report say anything about which RAM chip I should
> > investigate/replace ? I have 1x512MB+1x256MB
> >
> > Best Regards,
> > Maciej
>
> I'd try doing as suggested above: run memtest86 on the computer for a
> couple of hours and see what it tells you. That should hopefully give
> you enough information to figure out which chips need replacing.

I am also getting BAD CRC errors on the disk that holds my swap partition.
I was wondering whether slab debugging could report slab corruption not
because my RAM chips are bad, but because the swap partition has bad
blocks? In other words, could the whole problem be related to the swap
disk rather than to RAM?

> Cheers
>   Trond
Regards,
Maciej

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21.14 NFS related oops

2007-06-16 Thread Trond Myklebust
On Sat, 2007-06-16 at 11:26 +0200, Maciej Sołtysiak wrote:
> >>  ===
> >> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
> >> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
> >> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
> >> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
> >
> > At a first guess, it looks as though something has scribbled over your
> > credential. Have you tried running this kernel with slab debugging
> > enabled?
> 
> I'm running 2.6.21.5 now with slab debugging on, here's what I got about
> slab corruption:
> 
> Slab corruption: skbuff_head_cache start=ef287b78, len=164
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
> 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Probably bad RAM.
> Run memtest86+ or a similar memory test tool.
> Prev obj: start=ef287ac8, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
> Next obj: start=ef287c28, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
> 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
> 
> How probable is that it is really a bad memory issue?
> Does this report say anything about which RAM chip I should
> investigate/replace ? I have 1x512MB+1x256MB
> 
> Best Regards,
> Maciej

I'd try doing as suggested above: run memtest86 on the computer for a
couple of hours and see what it tells you. That should hopefully give
you enough information to figure out which chips need replacing.

Cheers
  Trond



Re: 2.6.21.14 NFS related oops

2007-06-16 Thread Maciej Sołtysiak

>>  ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?


I'm running 2.6.21.5 now with slab debugging on, here's what I got about
slab corruption:

Slab corruption: skbuff_head_cache start=ef287b78, len=164
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run memtest86+ or a similar memory test tool.
Prev obj: start=ef287ac8, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
Next obj: start=ef287c28, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee

How probable is it that this really is a bad memory issue?
Does this report say anything about which RAM chip I should
investigate/replace? I have 1x512MB+1x256MB.

Best Regards,
Maciej



Re: 2.6.21.14 NFS related oops

2007-06-14 Thread Maciej Soltysiak

Trond Myklebust wrote:

> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>> Hi,
>>
>> If anyone is interested I got this OOPS while running a torrent
>> (btdownloadcurses)
>> application writing directly to a NAS mounted via nfs3.
>>
>> The client machine is 2.6.21.14 and it is mounted with options:
>> wsize=8192,rsize=8192,hard,intr,tcp
>
> Hmm. The Oops says '2.6.20.14-cks1'
>
> Firstly, does that have any extra out-of-tree patches?
> Secondly, is it reproducible with 2.6.21 or a more recent kernel?

  

Ah, yes: it is 2.6.20.14, not 2.6.21.14, and it does contain two extra things:
- Con Kolivas' -cks1 (server version)
- reiser4 code, with one mounted filesystem.

>> After that, the application hung and i am unable to cd into the mounted
>> nfs directory
>> nor unmount it (busy), nor kill the app (kill -9 fails, process in D state)
>>
>> Best regards,
>> Maciej

>> BUG: unable to handle kernel paging request at virtual address 5018f248
>>  printing eip:
>> f0a93c94
>> *pde = 
>> Oops: 0002 [#1]
>> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf
>> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
>> CPU:    0
>> EIP:    0060:[<f0a93c94>]    Not tainted VLI
>> EFLAGS: 00010206   (2.6.20.14-cks1 #15)
>> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
>> eax: d2f4447c   ebx: c655d584   ecx:    edx: f0aa9f60
>> esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
>> ds: 007b   es: 007b   ss: 0068
>> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
>> Stack: 0286 ede2f8a0 ede2f8a0 0286 c655d584 121d0da3 0820 f0a8d7fd
>>        f0a93d60 f08bae07 0286 c655d5cc 0286 0286 f08c0520 c655d584
>>        c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 c034e11c
>> Call Trace:
>>  [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
>>  [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
>>  [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
>>  [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
>>  [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
>>  [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
>>  [<c034e11c>] schedule+0x21c/0x450
>>  [<c01283aa>] run_workqueue+0x7a/0x110
>>  [<c0128a07>] worker_thread+0x137/0x160
>>  [<c01176b0>] default_wake_function+0x0/0x10
>>  [<c01288d0>] worker_thread+0x0/0x160
>>  [<c012b329>] kthread+0xa9/0xe0
>>  [<c012b280>] kthread+0x0/0xe0
>>  [<c0103a97>] kernel_thread_helper+0x7/0x10
>>  ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec



> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?

  
No, I will turn it on, though. The server crashes under heavy NFS traffic
(e.g. the nightly rsync backup).

It crashed again today, but the oops did not get written to kern.log.

> Cheers
>   Trond
  

Thanks for your reply and best regards,
Maciej



Re: 2.6.21.14 NFS related oops

2007-06-13 Thread Chuck Ebbert
On 06/13/2007 03:17 PM, Trond Myklebust wrote:
> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>>  ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
> 
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?
> 

Disassembly of this code yields gibberish, like a bit got flipped
somewhere:

  1c:   ff 51 18                call   *0x18(%ecx)
  1f:   8b 5c 24 10             mov    0x10(%esp),%ebx
  23:   83 74 24 14 8b          xorl   $0xffffff8b,0x14(%esp)
  28:   7c 24                   jl     4e <_EIP+0x4e>
   0:   18 83 c4 1c c3 89       sbb    %al,0x89c31cc4(%ebx)   <=
   6:   74 24                   je     2c <_EIP+0x2c>
   8:   0c 8b                   or     $0x8b,%al
   a:   40                      inc    %eax
   b:   10 8b 40 24 8b 40       adc    %cl,0x408b2440(%ebx)
  11:   10                      .byte 0x10
  12:   8b 40 08                mov    0x8(%eax),%eax

Somewhere around 23: things went horribly wrong.
At 12: it starts to make sense again.



Re: 2.6.21.14 NFS related oops

2007-06-13 Thread Trond Myklebust
On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
> Hi,
> 
> If anyone is interested I got this OOPS while running a torrent 
> (btdownloadcurses)
> application writing directly to a NAS mounted via nfs3.
> 
> The client machine is 2.6.21.14 and it is mounted with options:
> wsize=8192,rsize=8192,hard,intr,tcp

Hmm. The Oops says '2.6.20.14-cks1'

Firstly, does that have any extra out-of-tree patches?
Secondly, is it reproducible with 2.6.21 or a more recent kernel?

> After that, the application hung and i am unable to cd into the mounted 
> nfs directory
> nor unmount it (busy), nor kill the app (kill -9 fails, process in D state)
> 
> Best regards,
> Maciej
> 
> BUG: unable to handle kernel paging request at virtual address 5018f248
>  printing eip:
> f0a93c94
> *pde = 
> Oops: 0002 [#1]
> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf 
> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
> CPU:    0
> EIP:    0060:[<f0a93c94>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.20.14-cks1 #15)
> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
> eax: d2f4447c   ebx: c655d584   ecx:    edx: f0aa9f60
> esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
> ds: 007b   es: 007b   ss: 0068
> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
> Stack: 0286 ede2f8a0 ede2f8a0 0286 c655d584 121d0da3 0820 f0a8d7fd
>        f0a93d60 f08bae07 0286 c655d5cc 0286 0286 f08c0520 c655d584
>        c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 c034e11c
> Call Trace:
>  [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
>  [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
>  [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
>  [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
>  [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
>  [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
>  [<c034e11c>] schedule+0x21c/0x450
>  [<c01283aa>] run_workqueue+0x7a/0x110
>  [<c0128a07>] worker_thread+0x137/0x160
>  [<c01176b0>] default_wake_function+0x0/0x10
>  [<c01288d0>] worker_thread+0x0/0x160
>  [<c012b329>] kthread+0xa9/0xe0
>  [<c012b280>] kthread+0x0/0xe0
>  [<c0103a97>] kernel_thread_helper+0x7/0x10
>  ===
> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec

At a first guess, it looks as though something has scribbled over your
credential. Have you tried running this kernel with slab debugging
enabled?

Cheers
  Trond



2.6.21.14 NFS related oops

2007-06-13 Thread Maciej Soltysiak

Hi,

If anyone is interested, I got this OOPS while running a torrent
application (btdownloadcurses) writing directly to a NAS mounted via nfs3.

The client machine is 2.6.21.14 and the share is mounted with options:
wsize=8192,rsize=8192,hard,intr,tcp

After that, the application hung, and I am unable to cd into the mounted
nfs directory, unmount it (busy), or kill the app (kill -9 fails; the
process is in D state).

Best regards,
Maciej

BUG: unable to handle kernel paging request at virtual address 5018f248
printing eip:
f0a93c94
*pde = 
Oops: 0002 [#1]
Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf 
i2c_isa i2c_viapro i2c_core via_agp agpgart rtc

CPU:    0
EIP:    0060:[<f0a93c94>]    Not tainted VLI
EFLAGS: 00010206   (2.6.20.14-cks1 #15)
EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
eax: d2f4447c   ebx: c655d584   ecx:    edx: f0aa9f60
esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
ds: 007b   es: 007b   ss: 0068
Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
Stack: 0286 ede2f8a0 ede2f8a0 0286 c655d584 121d0da3 0820 f0a8d7fd
       f0a93d60 f08bae07 0286 c655d5cc 0286 0286 f08c0520 c655d584
       c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 c034e11c

Call Trace:
 [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
 [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
 [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
 [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
 [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
 [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
 [<c034e11c>] schedule+0x21c/0x450
 [<c01283aa>] run_workqueue+0x7a/0x110
 [<c0128a07>] worker_thread+0x137/0x160
 [<c01176b0>] default_wake_function+0x0/0x10
 [<c01288d0>] worker_thread+0x0/0x160
 [<c012b329>] kthread+0xa9/0xe0
 [<c012b280>] kthread+0x0/0xe0
 [<c0103a97>] kernel_thread_helper+0x7/0x10
===
Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec


