Re: 2.6.21.14 NFS related oops
> > I'm running 2.6.21.5 now with slab debugging on, here's what I got about
> > slab corruption:
> >
> > Slab corruption: skbuff_head_cache start=ef287b78, len=164
> > Redzone: 0x5a2cf071/0x5a2cf071.
> > Last user: [](kfree_skbmem+0x3c/0x90)
> > 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > Single bit error detected. Probably bad RAM.
> > Run memtest86+ or a similar memory test tool.
> > Prev obj: start=ef287ac8, len=164
> > Redzone: 0x170fc2a5/0x170fc2a5.
> > Last user: [](__alloc_skb+0x2b/0x100)
> > 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
> > Next obj: start=ef287c28, len=164
> > Redzone: 0x170fc2a5/0x170fc2a5.
> > Last user: [](__alloc_skb+0x2b/0x100)
> > 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
> > 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
> >
> > How probable is that it is really a bad memory issue?
> > Does this report say anything about which RAM chip I should
> > investigate/replace ? I have 1x512MB+1x256MB
> >
> > Best Regards,
> > Maciej
>
> I'd try doing as suggested above: run memtest86 on the computer for a
> couple of hours and see what it tells you. That should hopefully give
> you enough information to figure out which chips need replacing.

I am also getting BAD CRC on the disk that holds my swap partition.
I was wondering if slab debugging could say I have slab corruption not
because my RAM chips are bad, but because SWAP has bad blocks ?
And that the whole problem might be swap disk related not ram related.

> Cheers
> Trond

Regards,
Maciej
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21.14 NFS related oops
On Sat, 2007-06-16 at 11:26 +0200, Maciej Sołtysiak wrote:
> >> ===
> >> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a
> >> 8b
> >> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c
> >> c3
> >> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: []
> >> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
> >
> > At a first guess, it looks as though something has scribbled over your
> > credential. Have you tried running this kernel with slab debugging
> > enabled?
>
> I'm running 2.6.21.5 now with slab debugging on, here's what I got about
> slab corruption:
>
> Slab corruption: skbuff_head_cache start=ef287b78, len=164
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [](kfree_skbmem+0x3c/0x90)
> 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Probably bad RAM.
> Run memtest86+ or a similar memory test tool.
> Prev obj: start=ef287ac8, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [](__alloc_skb+0x2b/0x100)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
> Next obj: start=ef287c28, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [](__alloc_skb+0x2b/0x100)
> 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
> 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
>
> How probable is that it is really a bad memory issue?
> Does this report say anything about which RAM chip I should
> investigate/replace ? I have 1x512MB+1x256MB
>
> Best Regards,
> Maciej

I'd try doing as suggested above: run memtest86 on the computer for a
couple of hours and see what it tells you. That should hopefully give
you enough information to figure out which chips need replacing.

Cheers
  Trond
Re: 2.6.21.14 NFS related oops
>> ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: []
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?

I'm running 2.6.21.5 now with slab debugging on, here's what I got about
slab corruption:

Slab corruption: skbuff_head_cache start=ef287b78, len=164
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [](kfree_skbmem+0x3c/0x90)
090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run memtest86+ or a similar memory test tool.
Prev obj: start=ef287ac8, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [](__alloc_skb+0x2b/0x100)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
Next obj: start=ef287c28, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [](__alloc_skb+0x2b/0x100)
000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee

How probable is that it is really a bad memory issue?
Does this report say anything about which RAM chip I should
investigate/replace ? I have 1x512MB+1x256MB

Best Regards,
Maciej
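For readers wondering why the report says "Single bit error detected", here is a rough sketch of the check in Python. It assumes the freed object was filled with the slab poison byte 0x6b (POISON_FREE in include/linux/poison.h) and that the "090:" line is copied correctly from the report. The helper names are made up for illustration and are not kernel functions.

```python
# POISON_FREE is the byte written over freed slab objects when slab
# debugging is enabled (0x6b in include/linux/poison.h).
POISON_FREE = 0x6B


def differing_bits(observed: int, expected: int) -> int:
    """Return how many bit positions differ between two byte values."""
    return bin(observed ^ expected).count("1")


def scan_poison_line(offset: int, hex_bytes: str):
    """Yield (offset, byte, flipped_bits) for each byte that is not POISON_FREE.

    Hypothetical helper: `offset` is the object offset printed at the start
    of the dump line, `hex_bytes` is the rest of that line.
    """
    for i, token in enumerate(hex_bytes.split()):
        value = int(token, 16)
        if value != POISON_FREE:
            yield offset + i, value, differing_bits(value, POISON_FREE)


# Line "090:" of the slab corruption report quoted in this thread.
line_090 = "6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b"
findings = list(scan_poison_line(0x90, line_090))

for off, value, bits in findings:
    print(f"offset 0x{off:03x}: 0x{value:02x} differs from poison in {bits} bit(s)")
```

Only one byte on that line is off, and it differs from 0x6b in a single bit (0x6b ^ 0x63 = 0x08). That pattern is what makes the message point at RAM. It does not identify which module holds the bad cell, so a memory tester is still needed for that.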
Re: 2.6.21.14 NFS related oops
Trond Myklebust wrote:
> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>> Hi,
>>
>> If anyone is interested I got this OOPS while running a torrent
>> (btdownloadcurses)
>> application writing directly to a NAS mounted via nfs3.
>>
>> The client machine is 2.6.21.14 and it is mounted with options:
>> wsize=8192,rsize=8192,hard,intr,tcp
>
> Hmm. The Oops says '2.6.20.14-cks1'
>
> Firstly, does that have any extra out-of-tree patches?
> Secondly, is it reproducible with 2.6.21 or a more recent kernel?

Ah, yes, 2.6.20.14 not 2.6.21.14 and it does contain 2 extra things:
- Con Kolivas' -cks1 (server version)
- reiser4 code, one mounted filesystem.

>> After that, the application hung and i am unable to cd into the mounted
>> nfs directory
>> nor unmount it (busy), nor kill the app (kill -9 fails, process in D state)
>>
>> Best regards,
>> Maciej
>>
>> BUG: unable to handle kernel paging request at virtual address 5018f248
>> printing eip:
>> f0a93c94
>> *pde =
>> Oops: 0002 [#1]
>> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf
>> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
>> CPU:0
>> EIP:0060:[]Not tainted VLI
>> EFLAGS: 00010206 (2.6.20.14-cks1 #15)
>> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
>> eax: d2f4447c ebx: c655d584 ecx: edx: f0aa9f60
>> esi: e91ea640 edi: d2f44474 ebp: ede2f228 esp: e64b5eec
>> ds: 007b es: 007b ss: 0068
>> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
>> Stack: 0286 ede2f8a0 ede2f8a0 0286 c655d584 121d0da3 0820 f0a8d7fd
>>        f0a93d60 f08bae07 0286 c655d5cc 0286 0286 f08c0520 c655d584
>>        c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 c034e11c
>> Call Trace:
>> [] call_decode+0x27d/0x5e0 [sunrpc]
>> [] rpcauth_unbindcred+0x20/0x60 [sunrpc]
>> [] nfs_readpage_result_full+0xf7/0x120 [nfs]
>> [] nfs3_xdr_readres+0x0/0x160 [nfs]
>> [] rpc_async_schedule+0x0/0x10 [sunrpc]
>> [] __rpc_execute+0x5f/0x250 [sunrpc]
>> [] schedule+0x21c/0x450
>> [] run_workqueue+0x7a/0x110
>> [] worker_thread+0x137/0x160
>> [] default_wake_function+0x0/0x10
>> [] worker_thread+0x0/0x160
>> [] kthread+0xa9/0xe0
>> [] kthread+0x0/0xe0
>> [] kernel_thread_helper+0x7/0x10
>> ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: []
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?

No, i will turn it on, though.
The server crashes on heavy NFS traffic (eg. nightly rsync backup)
It crashed again today, but the oops did not get written to kern.log

> Cheers
> Trond

Thanks for your reply and best regards,
Maciej
Re: 2.6.21.14 NFS related oops
On 06/13/2007 03:17 PM, Trond Myklebust wrote:
> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>> ===
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: []
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?
>

Disassembly of this code yields gibberish, like a bit got flipped somewhere:

  1c:   ff 51 18                call   *0x18(%ecx)
  1f:   8b 5c 24 10             mov    0x10(%esp),%ebx
  23:   83 74 24 14 8b          xorl   $0xff8b,0x14(%esp)
  28:   7c 24                   jl     4e <_EIP+0x4e>
   0:   18 83 c4 1c c3 89       sbb    %al,0x89c31cc4(%ebx)   <=
   6:   74 24                   je     2c <_EIP+0x2c>
   8:   0c 8b                   or     $0x8b,%al
   a:   40                      inc    %eax
   b:   10 8b 40 24 8b 40       adc    %cl,0x408b2440(%ebx)
  11:   10                      .byte 0x10
  12:   8b 40 08                mov    0x8(%eax),%eax

Somewhere around 23: things went horribly wrong. At 12: it starts to make
sense again.
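The bit-flip reading of the disassembly can be checked with a little arithmetic. The Python sketch below uses only values quoted in this thread, plus one labeled guess: the function prologue in the same Code: line stores %esi with "89 74 24 14", so the matching restore at offset 23: would presumably have been "8b 74 24 14" (mov 0x14(%esp),%esi). That expected opcode byte is an inference, not something the oops states.

```python
# Values copied from the oops and the disassembly in this thread.
EBX = 0xC655D584         # ebx from the register dump
DISP32 = 0x89C31CC4      # displacement in "sbb %al,0x89c31cc4(%ebx)"
FAULT_ADDR = 0x5018F248  # "unable to handle kernel paging request at ..."

# Effective address the marked instruction would touch, wrapped to 32 bits.
effective = (EBX + DISP32) & 0xFFFFFFFF

# ASSUMPTION: the opcode byte at offset 23: should have been 0x8b (mov),
# mirroring the "89 74 24 14" store in the prologue; the dump shows 0x83.
OBSERVED_OPCODE = 0x83
EXPECTED_OPCODE = 0x8B

flipped = OBSERVED_OPCODE ^ EXPECTED_OPCODE
# A non-zero value with exactly one bit set is a power of two.
is_single_bit = flipped != 0 and (flipped & (flipped - 1)) == 0

print(f"effective address: 0x{effective:08x}")
print(f"matches fault address: {effective == FAULT_ADDR}")
print(f"opcode difference: 0x{flipped:02x}, single bit: {is_single_bit}")
```

Under that assumption the two opcodes differ only in bit 3 (0x08), and the address the misdecoded sbb would write to equals the reported faulting address. That would be consistent with the CPU having executed a corrupted copy of the function rather than the oops dump itself being garbled, but it is an interpretation of the numbers, not a confirmed diagnosis.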
Re: 2.6.21.14 NFS related oops
On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
> Hi,
>
> If anyone is interested I got this OOPS while running a torrent
> (btdownloadcurses)
> application writing directly to a NAS mounted via nfs3.
>
> The client machine is 2.6.21.14 and it is mounted with options:
> wsize=8192,rsize=8192,hard,intr,tcp

Hmm. The Oops says '2.6.20.14-cks1'

Firstly, does that have any extra out-of-tree patches?
Secondly, is it reproducible with 2.6.21 or a more recent kernel?

> After that, the application hung and i am unable to cd into the mounted
> nfs directory
> nor unmount it (busy), nor kill the app (kill -9 fails, process in D state)
>
> Best regards,
> Maciej
>
> BUG: unable to handle kernel paging request at virtual address 5018f248
> printing eip:
> f0a93c94
> *pde =
> Oops: 0002 [#1]
> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf
> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010206 (2.6.20.14-cks1 #15)
> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
> eax: d2f4447c ebx: c655d584 ecx: edx: f0aa9f60
> esi: e91ea640 edi: d2f44474 ebp: ede2f228 esp: e64b5eec
> ds: 007b es: 007b ss: 0068
> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
> Stack: 0286 ede2f8a0 ede2f8a0 0286 c655d584 121d0da3 0820
> f0a8d7fd
>        f0a93d60 f08bae07 0286 c655d5cc 0286 0286 f08c0520
> c655d584
>        c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0
> c034e11c
> Call Trace:
> [] call_decode+0x27d/0x5e0 [sunrpc]
> [] rpcauth_unbindcred+0x20/0x60 [sunrpc]
> [] nfs_readpage_result_full+0xf7/0x120 [nfs]
> [] nfs3_xdr_readres+0x0/0x160 [nfs]
> [] rpc_async_schedule+0x0/0x10 [sunrpc]
> [] __rpc_execute+0x5f/0x250 [sunrpc]
> [] schedule+0x21c/0x450
> [] run_workqueue+0x7a/0x110
> [] worker_thread+0x137/0x160
> [] default_wake_function+0x0/0x10
> [] worker_thread+0x0/0x160
> [] kthread+0xa9/0xe0
> [] kthread+0x0/0xe0
> [] kernel_thread_helper+0x7/0x10
> ===
> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: []
> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec

At a first guess, it looks as though something has scribbled over your
credential. Have you tried running this kernel with slab debugging
enabled?

Cheers
  Trond