Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Fri, 21 Dec 2007 22:51:45 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> > Here's a test patch:
>
> Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.
>
> Thanks a lot to both of you.

Thank you for testing -mm (especially on sparc64) and for reporting the bug
and for testing the fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> > > 005119b0 Y: Not tainted
> > > [ 145.128940] TPC:
> >
> > My suspicion at this point is that with certain RAM layouts, simply
> > iterating over PFN's is simply not working out.
>
> That was my original suspicion, which is why I asked Mariusz to
> effectively comment out the actual PFN lookup up-thread. I didn't send
> him a patch to do that, so I guess my instructions on how to hack it
> may have been misunderstood.

No. I just made a trivial mistake :-/ Sorry for the confusion. I guess I need
to verify things three times before sending an email next time.

> > pfn_to_page() seems to be doing no range checking, and with sparsemem
> > vmemmap, which sparc64 always uses, this can be problematic.
> >
> > It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> > particular when the physical RAM layout really is sparse.
> >
> > Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
> > means there is a vmemmap translation setup for that page struct too,
> > it would work.
>
> Here's a test patch:

Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.

Thanks a lot to both of you.

	Mariusz
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 19:06:55 -0600

> @@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
>  		return -EIO;
>
>  	while (count > 0) {
> -		ppage = pfn_to_page(pfn++);
> +		ppage = 0;
> +		if (pfn_valid(pfn))
> +			ppage = pfn_to_page(pfn);
> +		pfn++;
>  		if (!ppage)
>  			pcount = 0;
>  		else

Yes that should work, please use "NULL" in the final version of the
patch instead of "0" so that sparse is happy.

Thanks.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:17:26PM -0800, David Miller wrote:
> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 20:47:55 +0100
>
> > [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> > 005119b0 Y: Not tainted
> > [ 145.128940] TPC:
>
> My suspicion at this point is that with certain RAM layouts, simply
> iterating over PFN's is simply not working out.

That was my original suspicion, which is why I asked Mariusz to
effectively comment out the actual PFN lookup up-thread. I didn't send
him a patch to do that, so I guess my instructions on how to hack it
may have been misunderstood.

> pfn_to_page() seems to be doing no range checking, and with sparsemem
> vmemmap, which sparc64 always uses, this can be problematic.
>
> It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> particular when the physical RAM layout really is sparse.
>
> Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
> means there is a vmemmap translation setup for that page struct too,
> it would work.

Here's a test patch:

Index: mm/fs/proc/proc_misc.c
===
--- mm.orig/fs/proc/proc_misc.c	2007-12-20 19:04:35.0 -0600
+++ mm/fs/proc/proc_misc.c	2007-12-20 19:06:01.0 -0600
@@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
 		return -EIO;

 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			pcount = 0;
 		else
@@ -773,7 +776,10 @@ static ssize_t kpageflags_read(struct fi
 		return -EIO;

 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			kflags = 0;
 		else

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 20:47:55 +0100

> [ 145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC:
> 005119b0 Y: Not tainted
> [ 145.128940] TPC:

My suspicion at this point is that with certain RAM layouts, simply
iterating over PFN's is simply not working out.

pfn_to_page() seems to be doing no range checking, and with sparsemem
vmemmap, which sparc64 always uses, this can be problematic.

It just blindly goes "vmemmap + pfn" which is asking for trouble, in
particular when the physical RAM layout really is sparse.

Maybe it's enough to add a pfn_valid() check here? If pfn_valid()
means there is a vmemmap translation setup for that page struct too,
it would work.
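The pfn_valid() idea in the message above can be modelled in userspace. What
follows is only a minimal sketch, not kernel code: it assumes a toy memory map
split into fixed-size sections, some of which are holes with no backing array.
Every demo_* name is invented for illustration; the real pfn_valid() and
pfn_to_page() are architecture- and config-specific. The point is just the
ordering: check validity first, look up second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct page; not the kernel's layout. */
struct demo_page {
	uint64_t flags;
	int count;
};

#define DEMO_SECTION_PFNS 4UL	/* pages per simulated section */
#define DEMO_NR_SECTIONS  4UL	/* sections in the simulated map */

static struct demo_page demo_sec0[DEMO_SECTION_PFNS] = {
	{0x01, 1}, {0x02, 2}, {0x04, 3}, {0x08, 4}
};
static struct demo_page demo_sec2[DEMO_SECTION_PFNS] = {
	{0x10, 1}, {0x20, 1}, {0x40, 1}, {0x80, 1}
};

/* Sections 1 and 3 are holes: nothing backs those PFNs. */
static struct demo_page *demo_sections[DEMO_NR_SECTIONS] = {
	demo_sec0, NULL, demo_sec2, NULL
};

/* Nonzero only if the PFN lies inside the map and its section is populated. */
int demo_pfn_valid(unsigned long pfn)
{
	unsigned long sec = pfn / DEMO_SECTION_PFNS;

	return sec < DEMO_NR_SECTIONS && demo_sections[sec] != NULL;
}

/* Unchecked lookup: only meaningful for a PFN that passed demo_pfn_valid(). */
struct demo_page *demo_pfn_to_page(unsigned long pfn)
{
	return &demo_sections[pfn / DEMO_SECTION_PFNS][pfn % DEMO_SECTION_PFNS];
}

/* Guarded read: a hole reports 0 instead of being dereferenced. */
uint64_t demo_read_flags(unsigned long pfn)
{
	struct demo_page *page = NULL;

	if (demo_pfn_valid(pfn))
		page = demo_pfn_to_page(pfn);
	return page ? page->flags : 0;
}
```

With this layout, demo_read_flags(0) returns the flags stored for the first
page, while demo_read_flags(5) returns 0 because PFN 5 falls in a hole and is
never looked up.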
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > Actually, you may only need these two:
> > >
> > > > maps4-add-proc-kpagecount-interface.patch
> > > > maps4-add-proc-kpageflags-interface.patch
> >
> > Yes these two were enough, and exporting fs/proc/base.c's
> > mem_lseek().
> >
> > As hard as I try, I can't reproduce this at all. I tried
> > both on my workstation and my niagara boxes.
>
> That's good to know, I was having a very hard time imagining how the
> kpagecount code could be going south.
>
> > It must be other needle in the 30MB+ -mm haystack. :-(

I'm afraid you are wrong. Earlier kernels are affected as well. On reading your
mail I was thinking of applying those two patches to 2.6.24-rc5 and doing a
bisection on the rest of the -mm series. Unfortunately clean 2.6.24-rc5 with
these two patches is affected as well (new processes stuck in D state etc).

So I tried vanilla 2.6.23 patched by these two patches (and mem_lseek export
from fs/proc/base.c). Now at least I got a trace produced by
'cat /proc/kpagecount' which you can find below. Also, in spite of the oops,
the box doesn't get locked (as with -mm) and is still usable.

[ 126.060976] TSTATE: 009980009603 TPC: 00428a84 TNPC: 00428a88 Y: Not tainted
[ 126.063486] TPC:
[ 126.065986] g0: 0009 g1: 04804000 g2: 000f g3: 007204c0
[ 126.068636] g4: 007244c0 g5: f8007f878000 g6: 007204c0 g7: 00724958
[ 126.071232] o0: 0001 o1: 007204c8 o2: 0001 o3:
[ 126.073924] o4: 6000 o5: 0078f140 sp: 007239b1 ret_pc: 00428a78
[ 126.076569] RPC:
[ 126.079185] l0: 0072 l1: 0002 l2: 0001 l3: 0075d400
[ 126.081934] l4: 0075d400 l5: f80080015b10 l6: f80080005b08 l7: 0001
[ 126.084637] i0: 0001 i1: 00720094 i2: i3:
[ 126.087375] i4: 007204c0 i5: 0002 i6: 00723a71 i7: 00665a24
[ 126.090135] I7:
[ 145.121228] Unable to handle kernel NULL pointer dereference
[ 145.124515] tsk->{mm,active_mm}->context = 0d41
[ 145.127778] tsk->{mm,active_mm}->pgd = f800bd8d2000
[ 145.127801] \|/ \|/
[ 145.127808] "@'/ .. \`@"
[ 145.127815] /_| \__/ |_\
[ 145.127821] \__U_/
[ 145.127831] cat(3111): Oops [#1]
[ 145.127849]
[ 145.127853] =
[ 145.127861] [ INFO: inconsistent lock state ]
[ 145.127873] 2.6.23 #1
[ 145.127880] -
[ 145.127891] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 145.127906] cat/3111 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 145.127918] (regdump_lock){+...}, at: [<004281d0>] __show_regs+0x18/0x320
[ 145.127951] {in-hardirq-W} state was registered at:
[ 145.127960] [<00669780>] _spin_lock+0x28/0x40
[ 145.127983] [<004281d0>] __show_regs+0x18/0x320
[ 145.128000] [<004284e4>] show_regs+0xc/0x20
[ 145.128016] [<005ac9d8>] sysrq_handle_showregs+0x20/0x40
[ 145.128041] [<005ac7fc>] __handle_sysrq+0x84/0x160
[ 145.128060] [<005ac8f8>] handle_sysrq+0x20/0x40
[ 145.128078] [<005a4f08>] kbd_event+0x670/0xb60
[ 145.128110] [<005ea0c0>] input_event+0x1e8/0x560
[ 145.128140] [<005efa2c>] sunkbd_interrupt+0x114/0x140
[ 145.128167] [<005e6270>] serio_interrupt+0x38/0xa0
[ 145.128186] [<005b2e58>] sunsu_kbd_ms_interrupt+0xa0/0x140
[ 145.128212] [<0049f6f8>] handle_IRQ_event+0x20/0x80
[ 145.128251] [<0049f808>] __do_IRQ+0xb0/0x140
[ 145.128268] [<0042f48c>] handler_irq+0x94/0xc0
[ 145.128306] [<00426f30>] sunos_sys_table+0x560/0x728
[ 145.128324] [<00428a78>] cpu_idle+0x20/0xe0
[ 145.128341] [<00665a24>] rest_init+0x6c/0x80
[ 145.128375] [<0076ec24>] start_kernel+0x2ec/0x340
[ 145.128405] [<0066599c>] tlb_fixup_done+0xa0/0xbc
[ 145.128425] [<>] 0x8
[ 145.128443] irq event stamp: 1209
[ 145.128451] hardirqs last enabled at (1209): [<00404b74>] __handle_softirq_continue+0x20/0x24
[ 145.128480] hardirqs last disabled at (1207): [<00474494>] __do_softirq+0xbc/0x140
[ 145.128506] softirqs last enabled at (1208): [<004744dc>] __do_softirq+0x104/0x140
[ 145.128526] softirqs last disabled at (1203): [<004745a0>] do_softirq+0x88/0xa0
[ 145.128546]
[ 145.128551] other info that might help us debug this:
[ 145.128562] no locks held by cat/3111.
[ 145.128570]
[ 145.128574] stack backtrace:
[ 145.128582] Call Trace:
[ 145.128590] [004907a0] print_usage_bug+0x148/0x160
[ 145.128624] [004917f4] mark_lock+0x6dc/0x780
[ 145.128641] [0049286c] __lock_acquire+0x734/0x12a0
[ 145.128659] [0049
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Thu, Dec 20, 2007 at 04:53:59AM -0800, David Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Mon, 17 Dec 2007 08:55:54 -0600
>
> > On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> > Actually, you may only need these two:
> >
> > > maps4-add-proc-kpagecount-interface.patch
> > > maps4-add-proc-kpageflags-interface.patch
>
> Yes these two were enough, and exporting fs/proc/base.c's
> mem_lseek().
>
> As hard as I try, I can't reproduce this at all. I tried
> both on my workstation and my niagara boxes.

That's good to know, I was having a very hard time imagining how the
kpagecount code could be going south.

> It must be other needle in the 30MB+ -mm haystack. :-(

Have we seen a config for the broken machine? Perhaps that'll help us
make a guess..

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Mon, 17 Dec 2007 08:55:54 -0600

> On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> Actually, you may only need these two:
>
> > maps4-add-proc-kpagecount-interface.patch
> > maps4-add-proc-kpageflags-interface.patch

Yes these two were enough, and exporting fs/proc/base.c's
mem_lseek().

As hard as I try, I can't reproduce this at all. I tried
both on my workstation and my niagara boxes.

It must be other needle in the 30MB+ -mm haystack. :-(
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > cat /proc/kpagecount on the other hand - with the change in line 710
> > - locks the box. Sysrq works, changing consoles works, but there is
> > no "BUG: soft lockup ..." message. After a while the box becomes
> > totally unresponsive - even caps lock doesn't work, no responses to
> > ping.
>
> Well I'm baffled. There's basically two things in that function that
> do anything interesting: pfn_to_page and put_user. access_ok is
> "return 1" on Sparc64. atomic_read is a simple read.
>
> My usual approach at this point would be to litter it with printks and
> see where its hanging.

Ok. Maybe this will help. Don't know how to compare that to the results from
yesterday (test with ppage = NULL) - maybe I f something up. This time I added
a bunch of printks and got these results:

This is from 'cat /proc/kpageflags' (after this the box is locked):

01
pfn:0, src:0, KPMSIZE:8
23458
ppage:0002, pfn:1

and the relevant code:

static ssize_t kpageflags_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 kflags, uflags;

	printk("0");
	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;
	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);
	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
	printk("2");
	if (src & KPMMASK || count & KPMMASK)
		return -EIO;
	printk("3");
	while (count > 0) {
		printk("4");
		ppage = pfn_to_page(pfn++);
		printk("5");
		if (!ppage) {
			printk("6");
			kflags = 0;
			printk("7");
		} else {
			printk("8");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			kflags = ppage->flags; // < something bad happens
			printk("9");
		}
		printk("a");

This is from 'cat /proc/kpagecount' (after this the box is locked):

01
pfn:0, src:0, KPMSIZE:8
23567a
ppage:0002, pfn:1

and this is the relevant code:

static ssize_t kpagecount_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 pcount;

	printk("0");
	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;
	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);
	printk("2");
	count = min_t(size_t, count, (max_pfn * KPMSIZE) - src);
	printk("3");
	if (src & KPMMASK || count & KPMMASK) {
		printk("4");
		return -EIO;
	}
	printk("5");
	while (count > 0) {
		printk("6");
		ppage = pfn_to_page(pfn++);
		printk("7");
		if (!ppage) {
			printk("8");
			pcount = 0;
		} else {
			printk("a");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			pcount = atomic_read(&ppage->_count); // < something bad happens
			printk("b");
		}

Regards,

	Mariusz
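For readers following the offset arithmetic in the code above (pfn = src /
KPMSIZE, with -EIO for a misaligned offset or count), here is a minimal
userspace sketch of the same layout. It assumes one 8-byte entry per PFN, as
the "KPMSIZE:8" debug output suggests, and that KPMMASK is KPMSIZE - 1; the
demo_* names are hypothetical and nothing here is taken from the kernel
source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_ENTRY_SIZE 8UL	/* assumed from "KPMSIZE:8" in the output above */

/* Byte offset of the entry that describes a given PFN. */
unsigned long demo_entry_offset(unsigned long pfn)
{
	return pfn * DEMO_ENTRY_SIZE;
}

/* Mirrors the alignment test: nonzero means the request would be rejected. */
int demo_request_misaligned(unsigned long offset, unsigned long count)
{
	return (offset % DEMO_ENTRY_SIZE) != 0 || (count % DEMO_ENTRY_SIZE) != 0;
}

/*
 * Read the 64-bit entry for one PFN from an already-open stream.
 * Returns 0 on success and -1 if the seek fails or the entry is not there.
 */
int demo_read_entry(FILE *fp, unsigned long pfn, uint64_t *out)
{
	if (fseek(fp, (long)demo_entry_offset(pfn), SEEK_SET) != 0)
		return -1;
	return fread(out, sizeof(*out), 1, fp) == 1 ? 0 : -1;
}
```

On a kernel that provides the file, the stream would come from
fopen("/proc/kpagecount", "rb"); whether seeking works there depends on the
kernel in use, so treat that part as an assumption rather than a guarantee.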
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]>
> wrote:
>
> > From: Matt Mackall <[EMAIL PROTECTED]>
> > Date: Sun, 16 Dec 2007 20:11:49 -0600
> >
> > > But as the function doesn't actually show up in your stack trace,
> > > something else is probably wrong. So I'd also try commenting out
> > > pieces of that function until it started working.
> >
> > Some piece of state is being indirectly corrupted and this
> > is showing up later in some unrelated operation.
> >
> > Can someone send me this kpageflags patch under separate
> > cover? I'll try figure out why it farts on sparc64.
>
> hm, non trivial. It's the third-from-last patch in:
>
> maps4-add-proportional-set-size-accounting-in-smaps.patch
> maps4-rework-task_size-macros.patch
> maps4-rework-task_size-macros-mips-fix.patch
> maps4-move-is_swap_pte.patch
> maps4-introduce-a-generic-page-walker.patch
> maps4-use-pagewalker-in-clear_refs-and-smaps.patch
> maps4-simplify-interdependence-of-maps-and-smaps.patch
> maps4-move-clear_refs-code-to-task_mmuc.patch
> maps4-regroup-task_mmu-by-interface.patch
> maps4-add-proc-pid-pagemap-interface.patch

Actually, you may only need these two:

> maps4-add-proc-kpagecount-interface.patch
> maps4-add-proc-kpageflags-interface.patch

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Sun, 16 Dec 2007 20:11:49 -0600
>
> > But as the function doesn't actually show up in your stack trace,
> > something else is probably wrong. So I'd also try commenting out
> > pieces of that function until it started working.
>
> Some piece of state is being indirectly corrupted and this
> is showing up later in some unrelated operation.
>
> Can someone send me this kpageflags patch under separate
> cover? I'll try figure out why it farts on sparc64.

hm, non trivial. It's the third-from-last patch in:

maps4-add-proportional-set-size-accounting-in-smaps.patch
maps4-rework-task_size-macros.patch
maps4-rework-task_size-macros-mips-fix.patch
maps4-move-is_swap_pte.patch
maps4-introduce-a-generic-page-walker.patch
maps4-use-pagewalker-in-clear_refs-and-smaps.patch
maps4-simplify-interdependence-of-maps-and-smaps.patch
maps4-move-clear_refs-code-to-task_mmuc.patch
maps4-regroup-task_mmu-by-interface.patch
maps4-add-proc-pid-pagemap-interface.patch
maps4-add-proc-kpagecount-interface.patch
maps4-add-proc-kpageflags-interface.patch
maps4-make-page-monitoring-proc-file-optional.patch
maps4-make-page-monitoring-proc-file-optional-fix.patch

from ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/broken-out

That patch series does apply OK to mainline though.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Sun, 16 Dec 2007 20:11:49 -0600

> But as the function doesn't actually show up in your stack trace,
> something else is probably wrong. So I'd also try commenting out
> pieces of that function until it started working.

Some piece of state is being indirectly corrupted and this
is showing up later in some unrelated operation.

Can someone send me this kpageflags patch under separate
cover? I'll try figure out why it farts on sparc64.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 08:10:10PM +0100, Mariusz Kozlowski wrote:
> > > Can you change line 710 of fs/proc/proc_misc.c to:
> > >
> > > ppage = NULL;
> >
> > Sure.
> >
> > > ..and see if it still breaks?
> >
> > Yes it does - the same way as earlier. Box is locked, processes stuck in D
> > state
> > and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".
>
> My mistake. I run cat /proc/kpageflags in the first place - so how
> could anything change :)
>
> cat /proc/kpagecount on the other hand - with the change in line 710
> - locks the box. Sysrq works, changing consoles works, but there is
> no "BUG: soft lockup ..." message. After a while the box becomes
> totally unresponsive - even caps lock doesn't work, no responses to
> ping.

Well I'm baffled. There's basically two things in that function that
do anything interesting: pfn_to_page and put_user. access_ok is
"return 1" on Sparc64. atomic_read is a simple read.

My usual approach at this point would be to litter it with printks and
see where it's hanging.

But as the function doesn't actually show up in your stack trace,
something else is probably wrong. So I'd also try commenting out
pieces of that function until it started working.

-- 
Mathematics is the supreme nostalgia of our time.
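The "litter it with printks" approach mentioned above can be tried out in
userspace first. This is a rough sketch under stated assumptions, not kernel
code: demo_mark() and demo_lookup() are invented names, and fputc() to stderr
stands in for printk(). The idea is that each step leaves a one-character
mark, so the last mark seen before a hang points at the step that never
returned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trail of single-character checkpoints reached so far. */
static char demo_trail[64];
static size_t demo_trail_len;

/* Record a checkpoint and emit it immediately. */
void demo_mark(char tag)
{
	if (demo_trail_len + 1 < sizeof(demo_trail)) {
		demo_trail[demo_trail_len++] = tag;
		demo_trail[demo_trail_len] = '\0';
	}
	/* stderr is not fully buffered, so the mark is not held back
	 * waiting for a later flush if the next step hangs. */
	fputc(tag, stderr);
}

/* Hypothetical function under investigation, instrumented step by step. */
unsigned long demo_lookup(const unsigned long *table, unsigned long n,
			  unsigned long idx)
{
	unsigned long val;

	demo_mark('0');
	if (idx >= n) {
		demo_mark('1');		/* out of range: nothing to read */
		return 0;
	}
	demo_mark('2');
	val = table[idx];		/* the suspect step */
	demo_mark('3');
	return val;
}
```

A trail that ends in '2' with no following '3' would single out the table
read as the place to look.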
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

> > > > cat /proc/kpageflags on sparc64 causes the box to lock.
> > > > I can not write on any terminal - but I can issue sysrqs and switch
> > > > between consoles.
> > > >
> > > > cat process hangs in read(3, ...
> > >
> > > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w
> > > sshd trace:
> > >
> > > __down
> > > __down_interruptible
> > > kobject_get
> > > lock_kernel
> > > chrdev_open
> > > __dentry_open
> > > nameidata_to_filp
> > > open_pathname
> > > do_sys_open
> > > sparc32_open
> > > linux_sparc_syscall32
> >
> > Perhaps this is related to sparsemem.
> >
> > Can you change line 710 of fs/proc/proc_misc.c to:
> >
> > ppage = NULL;
>
> Sure.
>
> > ..and see if it still breaks?
>
> Yes it does - the same way as earlier. Box is locked, processes stuck in D
> state
> and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".

My mistake. I ran cat /proc/kpageflags in the first place - so how
could anything change :)

cat /proc/kpagecount on the other hand - with the change in line 710
- locks the box. Sysrq works, changing consoles works, but there is
no "BUG: soft lockup ..." message. After a while the box becomes
totally unresponsive - even caps lock doesn't work, no responses to
ping.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
> > > cat /proc/kpageflags on sparc64 causes the box to lock.
> > > I can not write on any terminal - but I can issue sysrqs and switch
> > > between consoles.
> > >
> > > cat process hangs in read(3, ...
> >
> > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w
> > sshd trace:
> >
> > __down
> > __down_interruptible
> > kobject_get
> > lock_kernel
> > chrdev_open
> > __dentry_open
> > nameidata_to_filp
> > open_pathname
> > do_sys_open
> > sparc32_open
> > linux_sparc_syscall32
>
> Perhaps this is related to sparsemem.
>
> Can you change line 710 of fs/proc/proc_misc.c to:
>
> ppage = NULL;

Sure.

> ..and see if it still breaks?

Yes it does - the same way as earlier. Box is locked, processes stuck in D state
and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
On Sun, Dec 16, 2007 at 12:40:53PM +0100, Mariusz Kozlowski wrote:
> > cat /proc/kpageflags on sparc64 causes the box to lock.
> > I can not write on any terminal - but I can issue sysrqs and switch
> > between consoles.
> >
> > cat process hangs in read(3, ...
>
> cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w sshd
> trace:
>
> __down
> __down_interruptible
> kobject_get
> lock_kernel
> chrdev_open
> __dentry_open
> nameidata_to_filp
> open_pathname
> do_sys_open
> sparc32_open
> linux_sparc_syscall32

Perhaps this is related to sparsemem.

Can you change line 710 of fs/proc/proc_misc.c to:

ppage = NULL;

..and see if it still breaks?

-- 
Mathematics is the supreme nostalgia of our time.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
> cat /proc/kpageflags on sparc64 causes the box to lock.
> I can not write on any terminal - but I can issue sysrqs and switch
> between consoles.
>
> cat process hangs in read(3, ...

cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w sshd trace:

__down
__down_interruptible
kobject_get
lock_kernel
chrdev_open
__dentry_open
nameidata_to_filp
open_pathname
do_sys_open
sparc32_open
linux_sparc_syscall32

then again:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3242]
...
TPC: spitfire_xcall_helper+0xa0/0x100
...
RPC: spitfire_xcall_helper+0xac/0x100
...
I7: flush_dcache_page_all+0x1a4/0x1e0

or:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3242]
...
TPC: tick_get_tick+0xc/0x20
...
RPC: __handle_softirq_continue+0x20/0x24
...
I7: __delay+0x2c/0x60

Box is unusable. Easy to reproduce - every time.

Regards,

	Mariusz
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello

> Will reply soon with correct data.

Ok here it goes:

cat /proc/kpageflags on sparc64 causes the box to lock.
I can not write on any terminal - but I can issue sysrqs and switch
between consoles.

cat process hangs in read(3, ...

sysrq-w shows:

syslogd D 0069240c 0 2470 1
Call Trace:
[00692224] __down+0x8c/0x100
[0069240c] __down_interruptible+0x174/0x1a0
[006935d4] mutex_trylock+0xfc/0x1e0
[00695c7c] lock_kernel+0x24/0x40
[005b0cc0] tty_write+0x168/0x200
[004d0b08] do_loop_readv_writev+0x30/0x60
[00507540] compat_do_readv_writev+0x268/0x280
[005075b0] compat_sys_writev+0x58/0x80
[004062d4] linux_sparc_syscall32+0x3c/0x40
[f7e3f408] 0xf7e3f410

Then when I try to ssh to the sparc machine I fail, but on the sparc console
you can see this:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3227]
TSTATE: 009911009607 TPC: 00430c2c TNPC:00430c30 Y: Not tainted
TPC: <__delay+0x34/0x60>
g0: g1: 0042875103e3 g2: 00430800 g3: 0001869c
g4: f800bf086100 g5: f8007f832000 g6: f800be4a g7: 0004
o0: 0042875103e3 o1: o2: 00430c78 o3:
o4: 7fff o5: sp: f800be4a2e81 ret_pc: 00430c24
RPC: <__delay+0x2c/0x60>
l0: 0042875100df l1: 007a4000 l2: l3: 007d9000
l4: l5: 0001 l6: l7:
i0: 0382 i1: f800be4a0400 i2: 00445d3c i3:
i4: 0002 i5: 0045388c i6: f800be4a2f41 i7: 00430c6c
I7:

When this happens the box seems to react only to sysrq-b or manual reset.
Anything else is useless.
Regards,

	Mariusz

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc5-mm1
# Fri Dec 14 19:47:15 2007
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_QUICKLIST=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_SPARC64_PAGE_SIZE_8KB=y
# CONFIG_SPARC64_PAGE_SIZE_64KB is not set
# CONFIG_SPARC64_PAGE_SIZE_512KB is not set
# CONFIG_SPARC64_PAGE_SIZE_4MB is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_HOTPLUG_CPU is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROFILING is not set
# CONFIG_MARKERS is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_KPROBES=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_SYSVIPC_COMPAT=y
CONFIG_GENERIC_HARDIRQS=y

#
# General machine setup
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_NR_CPUS=4
# CONFIG_CPU
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
> cat /proc/kpageflags on sparc64 causes the box to lock.
> I can not write on any terminal - but I can issue sysrqs and switch
> between consoles.
>
> cat process hangs in read(3, ...
>
> sysrq-w shows:
>
> syslogd D 0069240c 0 2470 1
> Call Trace:
> [00692224]
> [00692224]
> [00692224]
> [00692224]
> [00692224]
> [00692224]

aggrh ... please ignore. Sent by mistake when retyping info from sparc
(no camera right now :/)

Will reply soon with correct data.
Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,

cat /proc/kpageflags on sparc64 causes the box to lock.
I can not write on any terminal - but I can issue sysrqs and switch
between consoles.

cat process hangs in read(3, ...

sysrq-w shows:

syslogd D 0069240c 0 2470 1
Call Trace:
[00692224]
[00692224]
[00692224]
[00692224]
[00692224]
[00692224]