Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-21 Thread Andrew Morton
On Fri, 21 Dec 2007 22:51:45 +0100 Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> > Here's a test patch:
> 
> Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.
> 
> Thanks a lot to both of you.

Thank you for testing -mm (especially on sparc64) and for reporting
the bug and for testing the fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-21 Thread Mariusz Kozlowski
Hello,

> > > [  145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC: 
> > > 005119b0 Y: Not tainted
> > > [  145.128940] TPC: 
> > 
> > My suspicion at this point is that with certain RAM layouts, simply
> > iterating over PFNs is not working out.
> 
> That was my original suspicion, which is why I asked Mariusz to
> effectively comment out the actual PFN lookup up-thread. I didn't send
> him a patch to do that, so I guess my instructions on how to hack it
> may have been misunderstood.

No. I just made a trivial mistake :-/ Sorry for the confusion. I guess I need to
verify things three times before sending an email next time.
  
> > pfn_to_page() seems to be doing no range checking, and with sparsemem
> > vmemmap, which sparc64 always uses, this can be problematic.
> > 
> > It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> > particular when the physical RAM layout really is sparse.
> > 
> > Maybe it's enough to add a pfn_valid() check here?  If pfn_valid()
> > means there is a vmemmap translation setup for that page struct too,
> > it would work.
> 
> Here's a test patch:

Tested on 2.6.23 and 2.6.24-rc5-mm1. The patch fixes the bug.

Thanks a lot to both of you.

Mariusz


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread David Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 19:06:55 -0600

> @@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
>  		return -EIO;
>  
>  	while (count > 0) {
> -		ppage = pfn_to_page(pfn++);
> +		ppage = 0;
> +		if (pfn_valid(pfn))
> +			ppage = pfn_to_page(pfn);
> +		pfn++;
>  		if (!ppage)
>  			pcount = 0;
>  		else

Yes that should work, please use "NULL" in the final
version of the patch instead of "0" so that sparse is
happy.

Thanks.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread Matt Mackall
On Thu, Dec 20, 2007 at 04:17:26PM -0800, David Miller wrote:
> From: Mariusz Kozlowski <[EMAIL PROTECTED]>
> Date: Thu, 20 Dec 2007 20:47:55 +0100
> 
> > [  145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC: 
> > 005119b0 Y: Not tainted
> > [  145.128940] TPC: 
> 
> My suspicion at this point is that with certain RAM layouts, simply
> iterating over PFNs is not working out.

That was my original suspicion, which is why I asked Mariusz to
effectively comment out the actual PFN lookup up-thread. I didn't send
him a patch to do that, so I guess my instructions on how to hack it
may have been misunderstood.
 
> pfn_to_page() seems to be doing no range checking, and with sparsemem
> vmemmap, which sparc64 always uses, this can be problematic.
> 
> It just blindly goes "vmemmap + pfn" which is asking for trouble, in
> particular when the physical RAM layout really is sparse.
> 
> Maybe it's enough to add a pfn_valid() check here?  If pfn_valid()
> means there is a vmemmap translation setup for that page struct too,
> it would work.

Here's a test patch:

Index: mm/fs/proc/proc_misc.c
===
--- mm.orig/fs/proc/proc_misc.c 2007-12-20 19:04:35.0 -0600
+++ mm/fs/proc/proc_misc.c  2007-12-20 19:06:01.0 -0600
@@ -707,7 +707,10 @@ static ssize_t kpagecount_read(struct fi
 		return -EIO;
 
 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			pcount = 0;
 		else
@@ -773,7 +776,10 @@ static ssize_t kpageflags_read(struct fi
 		return -EIO;
 
 	while (count > 0) {
-		ppage = pfn_to_page(pfn++);
+		ppage = 0;
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		pfn++;
 		if (!ppage)
 			kflags = 0;
 		else


-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread David Miller
From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Thu, 20 Dec 2007 20:47:55 +0100

> [  145.128915] TSTATE: 004411009603 TPC: 005119ac TNPC: 
> 005119b0 Y: Not tainted
> [  145.128940] TPC: 

My suspicion at this point is that with certain RAM layouts, simply
iterating over PFNs is not working out.

pfn_to_page() seems to be doing no range checking, and with sparsemem
vmemmap, which sparc64 always uses, this can be problematic.

It just blindly goes "vmemmap + pfn" which is asking for trouble, in
particular when the physical RAM layout really is sparse.

Maybe it's enough to add a pfn_valid() check here?  If pfn_valid()
means there is a vmemmap translation setup for that page struct too,
it would work.
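
Editorial sketch of the failure mode described above, in userspace C rather than kernel code. Every toy_* name and the two-bank layout are hypothetical stand-ins for pfn_valid(), pfn_to_page() and a sparse RAM layout; the sketch only assumes, as the message above hopes, that a valid PFN implies a usable page struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every toy_* name and the layout below are hypothetical. */
struct toy_page {
	uint64_t flags;
	int count;
};

#define TOY_MAX_PFN 16UL

static struct toy_page toy_bank0[4];	/* backs PFNs 0..3 */
static struct toy_page toy_bank1[4];	/* backs PFNs 12..15; PFNs 4..11 are a hole */

/* Stand-in for pfn_valid(): true only where a page struct exists. */
static int toy_pfn_valid(unsigned long pfn)
{
	return pfn < 4 || (pfn >= 12 && pfn < TOY_MAX_PFN);
}

/*
 * Stand-in for pfn_to_page(). The real one does no range checking; the
 * assert marks the spot where an unchecked lookup would hand back a
 * pointer into the hole.
 */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	assert(toy_pfn_valid(pfn));
	return pfn < 4 ? &toy_bank0[pfn] : &toy_bank1[pfn - 12];
}

/* Walk PFNs one by one, checking validity first: holes read back as zero. */
static size_t toy_read_counts(unsigned long pfn, size_t n, uint64_t *out)
{
	size_t done = 0;

	while (n > 0 && pfn < TOY_MAX_PFN) {
		struct toy_page *ppage = NULL;

		if (toy_pfn_valid(pfn))
			ppage = toy_pfn_to_page(pfn);
		pfn++;
		out[done++] = ppage ? (uint64_t)ppage->count : 0;
		n--;
	}
	return done;
}
```

Calling toy_read_counts(0, 16, out) walks all sixteen PFNs and stores zero for PFNs 4..11 instead of touching memory that backs no page struct.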


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread Mariusz Kozlowski
Hello, 

> > > Actually, you may only need these two:
> > > 
> > > > maps4-add-proc-kpagecount-interface.patch
> > > > maps4-add-proc-kpageflags-interface.patch
> > 
> > Yes these two were enough, and exporting fs/proc/base.c's
> > mem_lseek().
> > 
> > As hard as I try, I can't reproduce this at all.  I tried
> > both on my workstation and my niagara boxes.
> 
> That's good to know, I was having a very hard time imagining how the
> kpagecount code could be going south.
>  
> > It must be another needle in the 30MB+ -mm haystack. :-(

I'm afraid you are wrong. Earlier kernels are affected as well. On reading your
mail I was thinking of applying those two patches to 2.6.24-rc5 and bisecting
the rest of the -mm series. Unfortunately, clean 2.6.24-rc5 with these two
patches is affected as well (new processes stuck in D state, etc.). So I tried
vanilla 2.6.23 with these two patches applied (and the mem_lseek export from
fs/proc/base.c). Now at least I got a trace produced by 'cat /proc/kpagecount',
which you can find below. Also, in spite of the oops, the box doesn't get
locked (as it does with -mm) and is still usable.

[  126.060976] TSTATE: 009980009603 TPC: 00428a84 TNPC: 00428a88 Y: Not tainted
[  126.063486] TPC: 
[  126.065986] g0: 0009 g1: 04804000 g2: 000f g3: 007204c0
[  126.068636] g4: 007244c0 g5: f8007f878000 g6: 007204c0 g7: 00724958
[  126.071232] o0: 0001 o1: 007204c8 o2: 0001 o3: 
[  126.073924] o4: 6000 o5: 0078f140 sp: 007239b1 ret_pc: 00428a78
[  126.076569] RPC: 
[  126.079185] l0: 0072 l1: 0002 l2: 0001 l3: 0075d400
[  126.081934] l4: 0075d400 l5: f80080015b10 l6: f80080005b08 l7: 0001
[  126.084637] i0: 0001 i1: 00720094 i2:  i3: 
[  126.087375] i4: 007204c0 i5: 0002 i6: 00723a71 i7: 00665a24
[  126.090135] I7: 
[  145.121228] Unable to handle kernel NULL pointer dereference
[  145.124515] tsk->{mm,active_mm}->context = 0d41
[  145.127778] tsk->{mm,active_mm}->pgd = f800bd8d2000
[  145.127801]   \|/  \|/
[  145.127808]   "@'/ .. \`@"
[  145.127815]   /_| \__/ |_\
[  145.127821]  \__U_/
[  145.127831] cat(3111): Oops [#1]
[  145.127849] 
[  145.127853] =
[  145.127861] [ INFO: inconsistent lock state ]
[  145.127873] 2.6.23 #1
[  145.127880] -
[  145.127891] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[  145.127906] cat/3111 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  145.127918]  (regdump_lock){+...}, at: [<004281d0>] __show_regs+0x18/0x320
[  145.127951] {in-hardirq-W} state was registered at:
[  145.127960]   [<00669780>] _spin_lock+0x28/0x40
[  145.127983]   [<004281d0>] __show_regs+0x18/0x320
[  145.128000]   [<004284e4>] show_regs+0xc/0x20
[  145.128016]   [<005ac9d8>] sysrq_handle_showregs+0x20/0x40
[  145.128041]   [<005ac7fc>] __handle_sysrq+0x84/0x160
[  145.128060]   [<005ac8f8>] handle_sysrq+0x20/0x40
[  145.128078]   [<005a4f08>] kbd_event+0x670/0xb60
[  145.128110]   [<005ea0c0>] input_event+0x1e8/0x560
[  145.128140]   [<005efa2c>] sunkbd_interrupt+0x114/0x140
[  145.128167]   [<005e6270>] serio_interrupt+0x38/0xa0
[  145.128186]   [<005b2e58>] sunsu_kbd_ms_interrupt+0xa0/0x140
[  145.128212]   [<0049f6f8>] handle_IRQ_event+0x20/0x80
[  145.128251]   [<0049f808>] __do_IRQ+0xb0/0x140
[  145.128268]   [<0042f48c>] handler_irq+0x94/0xc0
[  145.128306]   [<00426f30>] sunos_sys_table+0x560/0x728
[  145.128324]   [<00428a78>] cpu_idle+0x20/0xe0
[  145.128341]   [<00665a24>] rest_init+0x6c/0x80
[  145.128375]   [<0076ec24>] start_kernel+0x2ec/0x340
[  145.128405]   [<0066599c>] tlb_fixup_done+0xa0/0xbc
[  145.128425]   [<>] 0x8
[  145.128443] irq event stamp: 1209
[  145.128451] hardirqs last  enabled at (1209): [<00404b74>] __handle_softirq_continue+0x20/0x24
[  145.128480] hardirqs last disabled at (1207): [<00474494>] __do_softirq+0xbc/0x140
[  145.128506] softirqs last  enabled at (1208): [<004744dc>] __do_softirq+0x104/0x140
[  145.128526] softirqs last disabled at (1203): [<004745a0>] do_softirq+0x88/0xa0
[  145.128546] 
[  145.128551] other info that might help us debug this:
[  145.128562] no locks held by cat/3111.
[  145.128570] 
[  145.128574] stack backtrace:
[  145.128582] Call Trace:
[  145.128590]  [004907a0] print_usage_bug+0x148/0x160
[  145.128624]  [004917f4] mark_lock+0x6dc/0x780
[  145.128641]  [0049286c] __lock_acquire+0x734/0x12a0
[  145.128659]  [0049

Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread Matt Mackall
On Thu, Dec 20, 2007 at 04:53:59AM -0800, David Miller wrote:
> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Mon, 17 Dec 2007 08:55:54 -0600
> 
> > On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> > Actually, you may only need these two:
> > 
> > > maps4-add-proc-kpagecount-interface.patch
> > > maps4-add-proc-kpageflags-interface.patch
> 
> Yes these two were enough, and exporting fs/proc/base.c's
> mem_lseek().
> 
> As hard as I try, I can't reproduce this at all.  I tried
> both on my workstation and my niagara boxes.

That's good to know, I was having a very hard time imagining how the
kpagecount code could be going south.
 
> It must be another needle in the 30MB+ -mm haystack. :-(

Have we seen a config for the broken machine? Perhaps that'll help us
make a guess..

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-20 Thread David Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Mon, 17 Dec 2007 08:55:54 -0600

> On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> Actually, you may only need these two:
> 
> > maps4-add-proc-kpagecount-interface.patch
> > maps4-add-proc-kpageflags-interface.patch

Yes these two were enough, and exporting fs/proc/base.c's
mem_lseek().

As hard as I try, I can't reproduce this at all.  I tried
both on my workstation and my niagara boxes.

It must be another needle in the 30MB+ -mm haystack. :-(



Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-17 Thread Mariusz Kozlowski
Hello,

> > cat /proc/kpagecount on the other hand - with the change in line 710
> > - locks the box. Sysrq works, changing consoles works, but there is
> > no "BUG: soft lockup ..." message. After a while the box becomes
> > totally unresponsive - even caps lock doesn't work, no responses to
> > ping.
> 
> Well I'm baffled. There are basically two things in that function that
> do anything interesting: pfn_to_page and put_user. access_ok is
> "return 1" on Sparc64. atomic_read is a simple read.
>
> My usual approach at this point would be to litter it with printks and
> see where it's hanging.

Ok. Maybe this will help. I don't know how to compare that to the results from
yesterday (test with ppage = NULL) - maybe I messed something up. This time I
added a bunch of printks and got these results:

This is from 'cat /proc/kpageflags' (after this the box is locked):

01
pfn:0, src:0, KPMSIZE:8
23458
ppage:0002, pfn:1

and the relevant code:

static ssize_t kpageflags_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 kflags, uflags;

	printk("0");

	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;

	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);
	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);

	printk("2");
	if (src & KPMMASK || count & KPMMASK)
		return -EIO;

	printk("3");
	while (count > 0) {
		printk("4");
		ppage = pfn_to_page(pfn++);
		printk("5");
		if (!ppage) {
			printk("6");
			kflags = 0;
			printk("7");
		} else {
			printk("8");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			kflags = ppage->flags; // <-- something bad happens
			printk("9");
		}

		printk("a");



This is from 'cat /proc/kpagecount' (after this the box is locked)

01
pfn:0, src:0, KPMSIZE:8
23567a
ppage:0002, pfn:1

and this is the relevant code:

static ssize_t kpagecount_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	u64 __user *out = (u64 __user *)buf;
	struct page *ppage;
	unsigned long src = *ppos;
	unsigned long pfn;
	ssize_t ret = 0;
	u64 pcount;

	printk("0");
	if (!access_ok(VERIFY_WRITE, buf, count))
		return -EFAULT;

	printk("1");
	pfn = src / KPMSIZE;
	printk("\npfn:%u, src:%u, KPMSIZE:%d\n", pfn, src, KPMSIZE);

	printk("2");
	count = min_t(size_t, count, (max_pfn * KPMSIZE) - src);
	printk("3");
	if (src & KPMMASK || count & KPMMASK) {
		printk("4");
		return -EIO;
	}
	printk("5");
	while (count > 0) {
		printk("6");
		ppage = pfn_to_page(pfn++);
		printk("7");
		if (!ppage) {
			printk("8");
			pcount = 0;
		} else {
			printk("a");
			printk("\nppage:%p, pfn:%u\n", ppage, pfn);
			pcount = atomic_read(&ppage->_count); // <-- something bad happens
			printk("b");
		}


Regards,

Mariusz
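
Editorial sketch of the offset arithmetic both functions above share: one u64 entry per PFN, so a file offset maps to a PFN by dividing by KPMSIZE. This is userspace C under stated assumptions, not the kernel source: kpm_span_for() and struct kpm_span are hypothetical names, KPMMASK is assumed to be KPMSIZE - 1, and treating an offset at or past the end as end-of-file is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KPMSIZE sizeof(uint64_t)	/* 8 bytes per PFN, as in "KPMSIZE:8" */
#define KPMMASK (KPMSIZE - 1)		/* assumed: low bits that must be clear */

struct kpm_span {
	uint64_t first_pfn;	/* PFN of the first entry returned */
	size_t nbytes;		/* bytes to copy after clamping */
};

/*
 * Hypothetical helper: returns 0 and fills *span, or -1 where the
 * quoted code would return -EIO for a misaligned request.
 */
static int kpm_span_for(uint64_t src, size_t count, uint64_t max_pfn,
			struct kpm_span *span)
{
	uint64_t end = max_pfn * KPMSIZE;	/* file size in bytes */

	assert(span != NULL);

	/* Clamp to the end of the file; the sketch treats src >= end as EOF. */
	if (src >= end)
		count = 0;
	else if (count > end - src)
		count = (size_t)(end - src);

	/* Both the offset and the length must be multiples of KPMSIZE. */
	if ((src & KPMMASK) || (count & KPMMASK))
		return -1;

	span->first_pfn = src / KPMSIZE;
	span->nbytes = count;
	return 0;
}
```

A read at offset 0 therefore starts at PFN 0, which matches the "pfn:0, src:0, KPMSIZE:8" line printed in both runs; the "pfn:1" printed next to ppage reflects the post-increment in pfn_to_page(pfn++).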


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-17 Thread Matt Mackall
On Sun, Dec 16, 2007 at 10:39:17PM -0800, Andrew Morton wrote:
> On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]> 
> wrote:
> 
> > From: Matt Mackall <[EMAIL PROTECTED]>
> > Date: Sun, 16 Dec 2007 20:11:49 -0600
> > 
> > > But as the function doesn't actually show up in your stack trace,
> > > something else is probably wrong. So I'd also try commenting out
> > > pieces of that function until it started working.
> > 
> > Some piece of state is being indirectly corrupted and this
> > is showing up later in some unrelated operation.
> > 
> > Can someone send me this kpageflags patch under separate
> > cover?  I'll try to figure out why it farts on sparc64.
> 
> hm, non trivial.  It's the third-from-last patch in:
> 
> maps4-add-proportional-set-size-accounting-in-smaps.patch
> maps4-rework-task_size-macros.patch
> maps4-rework-task_size-macros-mips-fix.patch
> maps4-move-is_swap_pte.patch
> maps4-introduce-a-generic-page-walker.patch
> maps4-use-pagewalker-in-clear_refs-and-smaps.patch
> maps4-simplify-interdependence-of-maps-and-smaps.patch
> maps4-move-clear_refs-code-to-task_mmuc.patch
> maps4-regroup-task_mmu-by-interface.patch
> maps4-add-proc-pid-pagemap-interface.patch

Actually, you may only need these two:

> maps4-add-proc-kpagecount-interface.patch
> maps4-add-proc-kpageflags-interface.patch

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Andrew Morton
On Sun, 16 Dec 2007 20:26:11 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Matt Mackall <[EMAIL PROTECTED]>
> Date: Sun, 16 Dec 2007 20:11:49 -0600
> 
> > But as the function doesn't actually show up in your stack trace,
> > something else is probably wrong. So I'd also try commenting out
> > pieces of that function until it started working.
> 
> Some piece of state is being indirectly corrupted and this
> is showing up later in some unrelated operation.
> 
> Can someone send me this kpageflags patch under separate
> cover?  I'll try to figure out why it farts on sparc64.

hm, non trivial.  It's the third-from-last patch in:

maps4-add-proportional-set-size-accounting-in-smaps.patch
maps4-rework-task_size-macros.patch
maps4-rework-task_size-macros-mips-fix.patch
maps4-move-is_swap_pte.patch
maps4-introduce-a-generic-page-walker.patch
maps4-use-pagewalker-in-clear_refs-and-smaps.patch
maps4-simplify-interdependence-of-maps-and-smaps.patch
maps4-move-clear_refs-code-to-task_mmuc.patch
maps4-regroup-task_mmu-by-interface.patch
maps4-add-proc-pid-pagemap-interface.patch
maps4-add-proc-kpagecount-interface.patch
maps4-add-proc-kpageflags-interface.patch
maps4-make-page-monitoring-proc-file-optional.patch
maps4-make-page-monitoring-proc-file-optional-fix.patch

from
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/broken-out

That patch series does apply OK to mainline though.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread David Miller
From: Matt Mackall <[EMAIL PROTECTED]>
Date: Sun, 16 Dec 2007 20:11:49 -0600

> But as the function doesn't actually show up in your stack trace,
> something else is probably wrong. So I'd also try commenting out
> pieces of that function until it started working.

Some piece of state is being indirectly corrupted and this
is showing up later in some unrelated operation.

Can someone send me this kpageflags patch under separate
cover?  I'll try to figure out why it farts on sparc64.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Matt Mackall
On Sun, Dec 16, 2007 at 08:10:10PM +0100, Mariusz Kozlowski wrote:
> > > Can you change line 710 of fs/proc/proc_misc.c to:
> > > 
> > >   ppage = NULL;
> > 
> > Sure.
> > 
> > > ..and see if it still breaks?
> > 
> > Yes it does - the same way as earlier. Box is locked, processes stuck in D
> > state and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".
> 
> My mistake. I ran cat /proc/kpageflags in the first place - so how
> could anything change :)
> 
> cat /proc/kpagecount on the other hand - with the change in line 710
> - locks the box. Sysrq works, changing consoles works, but there is
> no "BUG: soft lockup ..." message. After a while the box becomes
> totally unresponsive - even caps lock doesn't work, no responses to
> ping.

Well I'm baffled. There are basically two things in that function that
do anything interesting: pfn_to_page and put_user. access_ok is
"return 1" on Sparc64. atomic_read is a simple read.

My usual approach at this point would be to litter it with printks and
see where it's hanging.

But as the function doesn't actually show up in your stack trace,
something else is probably wrong. So I'd also try commenting out
pieces of that function until it started working.

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
Hello,

> > > > cat /proc/kpageflags on sparc64 causes the box to lock.
> > > > I can not write on any terminal - but I can issue sysrqs and switch
> > > > between consoles.
> > > > 
> > > > cat process hangs in read(3, ...
> > > 
> > > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w 
> > > sshd trace:
> > > 
> > > __down
> > > __down_interruptible
> > > kobject_get
> > > lock_kernel
> > > chrdev_open
> > > __dentry_open
> > > nameidata_to_filp
> > > open_pathname
> > > do_sys_open
> > > sparc32_open
> > > linux_sparc_syscall32
> > 
> > Perhaps this is related to sparsemem.
> > 
> > Can you change line 710 of fs/proc/proc_misc.c to:
> > 
> > ppage = NULL;
> 
> Sure.
> 
> > ..and see if it still breaks?
> 
> Yes it does - the same way as earlier. Box is locked, processes stuck in D
> state and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".

My mistake. I ran cat /proc/kpageflags in the first place - so how could
anything change :)

cat /proc/kpagecount on the other hand - with the change in line 710 - locks
the box. Sysrq works, changing consoles works, but there is no
"BUG: soft lockup ..." message. After a while the box becomes totally
unresponsive - even caps lock doesn't work, no responses to ping.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
> > > cat /proc/kpageflags on sparc64 causes the box to lock.
> > > I can not write on any terminal - but I can issue sysrqs and switch
> > > between consoles.
> > > 
> > > cat process hangs in read(3, ...
> > 
> > cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w 
> > sshd trace:
> > 
> > __down
> > __down_interruptible
> > kobject_get
> > lock_kernel
> > chrdev_open
> > __dentry_open
> > nameidata_to_filp
> > open_pathname
> > do_sys_open
> > sparc32_open
> > linux_sparc_syscall32
> 
> Perhaps this is related to sparsemem.
> 
> Can you change line 710 of fs/proc/proc_misc.c to:
> 
>   ppage = NULL;

Sure.

> ..and see if it still breaks?

Yes it does - the same way as earlier. Box is locked, processes stuck in D state
and after a while "BUG: soft lockup - CPU#0 stuck for 11s!".




Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Matt Mackall
On Sun, Dec 16, 2007 at 12:40:53PM +0100, Mariusz Kozlowski wrote:
> > cat /proc/kpageflags on sparc64 causes the box to lock.
> > I can not write on any terminal - but I can issue sysrqs and switch
> > between consoles.
> > 
> > cat process hangs in read(3, ...
> 
> cat /proc/kpagecount produces similar symptoms. box is locked - sysrq-w sshd 
> trace:
> 
> __down
> __down_interruptible
> kobject_get
> lock_kernel
> chrdev_open
> __dentry_open
> nameidata_to_filp
> open_pathname
> do_sys_open
> sparc32_open
> linux_sparc_syscall32

Perhaps this is related to sparsemem.

Can you change line 710 of fs/proc/proc_misc.c to:

ppage = NULL;

..and see if it still breaks?

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
> cat /proc/kpageflags on sparc64 causes the box to lock.
> I can not write on any terminal - but I can issue sysrqs and switch
> between consoles.
> 
> cat process hangs in read(3, ...

cat /proc/kpagecount produces similar symptoms. The box is locked; sysrq-w sshd
trace:

__down
__down_interruptible
kobject_get
lock_kernel
chrdev_open
__dentry_open
nameidata_to_filp
open_pathname
do_sys_open
sparc32_open
linux_sparc_syscall32

then again:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3242]
...
TPC: spitfire_xcall_helper+0xa0/0x100
...
RPC: spitfire_xcall_helper+0xac/0x100
...
I7: flush_dcache_page_all+0x1a4/0x1e0

or:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3242]
...
TPC: tick_get_tick+0xc/0x20
...
RPC: __handle_softirq_continue+0x20/0x24
...
I7: __delay+0x2c/0x60

Box is unusable. Easy to reproduce - every time.

Regards,

Mariusz


Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
Hello

> Will reply soon with correct data.

Ok here it goes:

cat /proc/kpageflags on sparc64 causes the box to lock.
I can not write on any terminal - but I can issue sysrqs and switch
between consoles.

cat process hangs in read(3, ...

sysrq-w shows:

syslogd   D 0069240c 0  2470  1
Call Trace:
 [00692224] __down+0x8c/0x100
 [0069240c] __down_interruptible+0x174/0x1a0 
 [006935d4] mutex_trylock+0xfc/0x1e0 
 [00695c7c] lock_kernel+0x24/0x40 
 [005b0cc0] tty_write+0x168/0x200 
 [004d0b08] do_loop_readv_writev+0x30/0x60
 [00507540] compat_do_readv_writev+0x268/0x280
 [005075b0] compat_sys_writev+0x58/0x80
 [004062d4] linux_sparc_syscall32+0x3c/0x40
 [f7e3f408] 0xf7e3f410

Then, when I try to ssh to the sparc machine, it fails, but on the sparc box you
can see this:

BUG: soft lockup - CPU#0 stuck for 11s! [sshd:3227]
TSTATE: 009911009607 TPC: 00430c2c TNPC: 00430c30 Y: Not tainted
TPC: <__delay+0x34/0x60>
g0:  g1: 0042875103e3 g2: 00430800 g3: 0001869c
g4: f800bf086100 g5: f8007f832000 g6: f800be4a g7: 0004
o0: 0042875103e3 o1:  o2: 00430c78 o3: 
o4: 7fff o5:  sp: f800be4a2e81 ret_pc: 00430c24
RPC: <__delay+0x2c/0x60>
l0: 0042875100df l1: 007a4000 l2:  l3: 007d9000
l4:  l5: 0001 l6:  l7: 
i0: 0382 i1: f800be4a0400 i2: 00445d3c i3: 
i4: 0002 i5: 0045388c i6: f800be4a2f41 i7: 00430c6c
I7: 

When this happens, the box seems to react only to sysrq-b or a manual reset.
Anything else is useless.

Regards,

Mariusz
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc5-mm1
# Fri Dec 14 19:47:15 2007
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_QUICKLIST=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_SPARC64_PAGE_SIZE_8KB=y
# CONFIG_SPARC64_PAGE_SIZE_64KB is not set
# CONFIG_SPARC64_PAGE_SIZE_512KB is not set
# CONFIG_SPARC64_PAGE_SIZE_4MB is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_HOTPLUG_CPU is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_PROFILING is not set
# CONFIG_MARKERS is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_KPROBES=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_SYSVIPC_COMPAT=y
CONFIG_GENERIC_HARDIRQS=y

#
# General machine setup
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_NR_CPUS=4
# CONFIG_CPU

Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
>   cat /proc/kpageflags on sparc64 causes the box to lock.
> I can not write on any terminal - but I can issue sysrqs and switch
> between consoles.
> 
> cat process hangs in read(3, ...
> 
> sysrq-w shows:
> 
> syslogd   D 0069240c 0  2470  1
> Call Trace:
>  [00692224] 
>  [00692224] 
>  [00692224] 
>  [00692224] 
>  [00692224] 
>  [00692224] 

aggrh ... please ignore.

Sent by mistake when retyping info from sparc (no camera right now :/)

Will reply soon with correct data.



Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags

2007-12-16 Thread Mariusz Kozlowski
Hello,

cat /proc/kpageflags on sparc64 causes the box to lock.
I can not write on any terminal - but I can issue sysrqs and switch
between consoles.

cat process hangs in read(3, ...

sysrq-w shows:

syslogd   D 0069240c 0  2470  1
Call Trace:
 [00692224] 
 [00692224] 
 [00692224] 
 [00692224] 
 [00692224] 
 [00692224] 