Re: [PATCH] Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

2007-09-12 Thread Balbir Singh
Lee Schermerhorn wrote:
> On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
>> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
>>
>>>> Interesting, I don't see a memory controller function in the stack
>>>> trace, but I'll double check to see if I can find some silly race
>>>> condition in there.
>>> right.  I noticed that after I sent the mail.  
>>>
>>> Also, config available at:
>>> http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
>> Be interested to know the outcome of any bisect you do, given it's
>> tripping in reclaim.
> 
> Problem isolated to the memory controller patches.  This patch seems to fix
> this particular problem.  I've only run the test for a few minutes with
> and without the memory controller configured, but I did observe reclaim
> kicking in several times.  W/o this patch, the system would panic as soon as
> it entered direct/zone reclaim--in less than a minute.
> 

Excellent catch! The patch looks sane; thanks for your help in sorting
this issue out. Hmm, that means I never hit direct/zone reclaim in my
tests (I'll make a mental note to enhance my test cases to cover this
scenario).
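
For what it's worth, the direct reclaim path is reached as soon as a test
allocates and touches more anonymous memory than is free; zone reclaim
additionally needs zone_reclaim_mode set and off-node allocation pressure.
A minimal pressure generator along these lines would cover the
direct-reclaim case -- the chunk size and loop below are illustrative
only, not taken from any existing test suite:

#include <stdlib.h>
#include <unistd.h>

/*
 * Illustrative sketch only: keep allocating and touching anonymous
 * memory until allocations outrun free memory, pushing the allocator
 * past kswapd and into direct reclaim.  Real tests would size the
 * chunks against the machine instead of looping until malloc fails.
 */
int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t chunk = 256UL << 20;		/* 256MB per step */

	for (;;) {
		char *p = malloc(chunk);
		if (!p)
			break;			/* overcommit/address space exhausted */
		for (size_t off = 0; off < chunk; off += (size_t)page)
			p[off] = 1;		/* fault the pages in */
	}
	return 0;
}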

> Lee
> 
> 
> PATCH 2.6.23-rc4-mm1 Memory Controller:  initialize all scan_controls'
>   isolate_pages member.
> 
> We need to initialize all scan_controls' isolate_pages member.
> Otherwise, shrink_active_list() attempts to execute at an undefined
> location.
> 
> Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>
> 
>  mm/vmscan.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: Linux/mm/vmscan.c
> ===================================================================
> --- Linux.orig/mm/vmscan.c2007-09-10 13:22:21.0 -0400
> +++ Linux/mm/vmscan.c 2007-09-12 15:30:27.0 -0400
> @@ -1758,6 +1758,7 @@ unsigned long shrink_all_memory(unsigned
>   .swap_cluster_max = nr_pages,
>   .may_writepage = 1,
>   .swappiness = vm_swappiness,
> + .isolate_pages = isolate_pages_global,
>   };
> 
>   current->reclaim_state = &reclaim_state;
> @@ -1941,6 +1942,7 @@ static int __zone_reclaim(struct zone *z
>   SWAP_CLUSTER_MAX),
>   .gfp_mask = gfp_mask,
>   .swappiness = vm_swappiness,
> + .isolate_pages = isolate_pages_global,
>   };
>   unsigned long slab_reclaimable;
> 
> 
> 


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


[PATCH] Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

2007-09-12 Thread Lee Schermerhorn
On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
> 
> > > Interesting, I don't see a memory controller function in the stack
> > > trace, but I'll double check to see if I can find some silly race
> > > condition in there.
> > 
> > right.  I noticed that after I sent the mail.  
> > 
> > Also, config available at:
> > http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
> 
> Be interested to know the outcome of any bisect you do, given it's
> tripping in reclaim.

Problem isolated to the memory controller patches.  This patch seems to fix
this particular problem.  I've only run the test for a few minutes with
and without the memory controller configured, but I did observe reclaim
kicking in several times.  W/o this patch, the system would panic as soon as
it entered direct/zone reclaim--in less than a minute.

Lee


PATCH 2.6.23-rc4-mm1 Memory Controller:  initialize all scan_controls'
isolate_pages member.

We need to initialize all scan_controls' isolate_pages member.
Otherwise, shrink_active_list() attempts to execute at an undefined
location.

Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>

 mm/vmscan.c |2 ++
 1 file changed, 2 insertions(+)

Index: Linux/mm/vmscan.c
===================================================================
--- Linux.orig/mm/vmscan.c  2007-09-10 13:22:21.0 -0400
+++ Linux/mm/vmscan.c   2007-09-12 15:30:27.0 -0400
@@ -1758,6 +1758,7 @@ unsigned long shrink_all_memory(unsigned
.swap_cluster_max = nr_pages,
.may_writepage = 1,
.swappiness = vm_swappiness,
+   .isolate_pages = isolate_pages_global,
};
 
current->reclaim_state = &reclaim_state;
@@ -1941,6 +1942,7 @@ static int __zone_reclaim(struct zone *z
SWAP_CLUSTER_MAX),
.gfp_mask = gfp_mask,
.swappiness = vm_swappiness,
+   .isolate_pages = isolate_pages_global,
};
unsigned long slab_reclaimable;
 

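For anyone not following the -mm reclaim rework closely: scan_control in
this tree carries an isolate_pages function pointer that the shrinker
calls to pull pages off the LRU.  A designated initializer that omits the
member leaves it zeroed, so the first reclaim pass calls through NULL --
the "undefined location" above.  Below is a stripped-down userspace
sketch of that failure mode; the field names mirror the struct members
visible in the diff, everything else is illustrative rather than kernel
code:

#include <stdio.h>

/* Illustrative stand-in for the -mm scan_control: reclaim calls
 * through the isolate_pages member rather than a fixed function. */
struct scan_control {
	unsigned long swap_cluster_max;
	int may_writepage;
	int swappiness;
	unsigned long (*isolate_pages)(unsigned long nr, int active);
};

/* Stands in for isolate_pages_global(): global-LRU isolation. */
static unsigned long isolate_pages_global(unsigned long nr, int active)
{
	(void)active;
	return nr;			/* pretend nr pages were isolated */
}

/* Stands in for shrink_active_list(): trusts the callback blindly. */
static void shrink_active_list(struct scan_control *sc)
{
	/* If the member was left out of the initializer, this is a
	 * call through NULL. */
	sc->isolate_pages(32, 1);
}

int main(void)
{
	struct scan_control good = {
		.swap_cluster_max = 32,
		.may_writepage = 1,
		.swappiness = 60,
		.isolate_pages = isolate_pages_global,	/* the one-line fix */
	};
	struct scan_control bad = {
		.swap_cluster_max = 32,
		.may_writepage = 1,
		.swappiness = 60,
		/* .isolate_pages omitted: zeroed by the initializer */
	};

	shrink_active_list(&good);	/* fine */
	printf("initialized scan_control survived\n");
	shrink_active_list(&bad);	/* NULL call: crashes here */
	return 0;
}

shrink_all_memory() and __zone_reclaim() build their scan_control with
exactly this kind of initializer, which is why the patch touches those
two and nothing else.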



Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

2007-09-12 Thread Lee Schermerhorn
On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
> 
> > > Interesting, I don't see a memory controller function in the stack
> > > trace, but I'll double check to see if I can find some silly race
> > > condition in there.
> > 
> > right.  I noticed that after I sent the mail.  
> > 
> > Also, config available at:
> > http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
> 
> Be interested to know the outcome of any bisect you do, given it's
> tripping in reclaim.

FYI:  doesn't seem to fail with 23-rc6.  

> 
> What size of box is this?  Wondering if we have anything big enough to
> test with.

This is a 16-cpu, 4-node, 32GB HP rx8620.  The test load that I'm
running is Dave Anderson's "usex" with a custom test script that runs:

5 built-in usex IO tests to a separate file system on a SCSI disk.
1 built-in usex IO rate test -- to/from same disk/fs.
1 POV ray tracing app--just because I had it :-)
1 script that does "find / -type f | xargs strings >/dev/null" to
pollute the page cache.
2 memtoy scripts to allocate various size anon segments--up to 20GB--
and mlock() them down to force reclaim.
1 32-way parallel kernel build
3 1GB random vm tests
3 1GB sequential vm tests
9 built-in usex "bin" tests--these run a series of programs
from /usr/bin to simulate users doing random things.  Not really random,
tho'.  Just walks a table of commands sequentially.

This load beats up on the system fairly heavily.
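
The memtoy step is what actually forces reclaim here: pinning a large
anonymous segment takes those pages away from everything else.  Its
effect boils down to roughly the following -- a sketch of the effect
only, not memtoy itself; the 20GB figure just mirrors the number above,
and mlock() at that size needs root or a raised RLIMIT_MEMLOCK:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

/* Pin a large anonymous segment, as the memtoy scripts do, so the
 * rest of the workload is pushed into reclaim. */
int main(void)
{
	size_t len = 20UL << 30;	/* ~20GB, matching the test above */
	void *seg = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (seg == MAP_FAILED || mlock(seg, len) != 0) {
		perror("mmap/mlock");
		return 1;
	}
	pause();			/* hold the pages until killed */
	return 0;
}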

I can package up the usex input script and the other associated scripts
that it invokes, if you're interested.  Let me know...

Lee



Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

2007-09-12 Thread Andy Whitcroft
On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:

> > Interesting, I don't see a memory controller function in the stack
> > trace, but I'll double check to see if I can find some silly race
> > condition in there.
> 
> right.  I noticed that after I sent the mail.  
> 
> Also, config available at:
> http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont

Be interested to know the outcome of any bisect you do, given it's
tripping in reclaim.

What size of box is this?  Wondering if we have anything big enough to
test with.

-apw


Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...

2007-09-12 Thread Lee Schermerhorn
On Wed, 2007-09-12 at 19:38 +0530, Balbir Singh wrote:
> Lee Schermerhorn wrote:
> > On Wed, 2007-09-12 at 07:22 +0530, Balbir Singh wrote:
> >> Lee Schermerhorn wrote:
> >>> [Balbir:  see notes re:  replication and memory controller below]
> >>>
> >>> A quick update:  I have rebased the automatic/lazy page migration and
> >>> replication patches to 23-rc4-mm1.  If interested, you can find the
> >>> entire series that I push in the '070911' tarball at:
> >>>
> >>>   http://free.linux.hp.com/~lts/Patches/Replication/
> >>>
> >>> I haven't gotten around to some of the things you suggested to address
> >>> the soft lockups, etc.  I just wanted to keep the patches up to date.  
> >>>
> >>> In the process of doing a quick sanity test, I encountered an issue with
> >>> replication and the new memory controller patches.  I had built the
> >>> kernel with the memory controller enabled.  I encountered a panic in
> >>> reclaim, while attempting to "drop caches", because replication was not
> >>> "charging" the replicated pages and reclaim tried to deref a null
> >>> "page_container" pointer.  [!!! new member in page struct !!!]
> >>>
> >>> I added code to try_to_create_replica(), __remove_replicated_page() and
> >>> release_pcache_desc() to charge/uncharge where I thought appropriate
> >>> [replication patch # 02].  That seemed to solve the panic during drop
> >>> caches triggered reclaim.  However, when I tried a more stressful load,
> >>> I hit another panic ["NaT Consumption" == ia64-ese for invalid pointer
> >>> deref, I think] in shrink_active_list() called from direct reclaim.
> >>> Still to be investigated.  I wanted to give you and Balbir a heads up
> >>> about the interaction of memory controllers with page replication.
> >>>
> >> Hi, Lee,
> >>
> >> Thanks for testing the memory controller with page replication. I do
> >> have some questions on the problem you are seeing
> >>
> >> Did you see the problem with direct reclaim or container reclaim?
> >> drop_caches calls remove_mapping(), which should eventually call
> >> the uncharge routine. We have some sanity checks in there.
> > 
> > Sorry.  This one wasn't in reclaim.  It was from the fault path, via
> > activate page.  The bug in reclaim occurred after I "fixed" page
> > replication to charge for replicated pages, thus adding the
> > page_container.  The second panic resulted from bad pointer ref in
> > shrink_active_list() from direct reclaim.
> > 
> > [abbreviated] stack traces attached below.
> > 
> > I took a look at an assembly language objdump and it appears that the
> > bad pointer deref occurred in the "while (!list_empty(&l_inactive))"
> > loop.  I see that there is also a mem_container_move_lists() call there.
> > I will try to rerun the workload on an unpatched 23-rc4-mm1 today to see
> > if it's reproducible there.  I can believe that this is a race between
> > replication [possibly "unreplicate"] and vmscan.  I don't know what type
> > of protection, if any, we have against that.  
> > 
> 
> 
> Thanks, the stack trace makes sense now. So basically, we have a case
> where a page is on the zone LRU, but does not belong to any container,
> which is why we do indeed need your first fix (to charge/uncharge) the
> pages on replication/removal.
> 
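Concretely, the "first fix" referred to above brackets a replica's
pagecache lifetime with the controller's charge/uncharge, so reclaim
never meets an LRU page whose page_container is NULL.  A rough sketch of
the shape of that change follows; only mem_container_move_lists()
actually appears in this thread, so the charge/uncharge entry points,
the alloc_replica_page() helper, and all of the signatures below are
assumptions for illustration, not code from either patch set:

/* Sketch only -- not taken from the replication or memory-controller
 * patches. */
static struct page *try_to_create_replica(struct address_space *mapping,
					   struct page *master, gfp_t gfp)
{
	struct page *replica = alloc_replica_page(master, gfp);  /* hypothetical */

	if (!replica)
		return NULL;

	/* Charge the replica before it can appear on a zone LRU. */
	if (mem_container_charge(replica, current->mm, gfp)) {	/* assumed API */
		__free_page(replica);
		return NULL;
	}

	copy_highpage(replica, master);
	/* ... insert into the per-node pagecache copy ... */
	return replica;
}

static void __remove_replicated_page(struct page *replica)
{
	/* ... existing removal work ... */

	/* Mirror image of the charge above. */
	mem_container_uncharge_page(replica);			/* assumed API */
}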
> >> We do try to see at several places if the page->page_container is NULL
> >> and check for it. I'll look at your patches to see if there are any
> >> changes to the reclaim logic. I tried looking for the oops you
> >> mentioned, but could not find it in your directory, I saw the soft
> >> lockup logs though. Do you still have the oops saved somewhere?
> >>
> >> I think the fix you have is correct and makes things works, but it
> >> worries me that in direct reclaim we dereference the page_container
> >> pointer without the page belonging to a container? What are the
> >> properties of replicated pages? Are they assumed to be exact
> >> replicas (struct page mappings, page_container expected to be the
> >> same for all replicated pages) of the replicated page?
> > 
> > Before "fix"
> > 
> > Running spol+lpm+repl patches on 23-rc4-mm1.  kernel build test
> > echo 1 >/proc/sys/vm/drop_caches
> > Then [perhaps a coincidence]:
> > 
> > Unable to handle kernel NULL pointer dereference (address 0008)
> > cc1[23366]: Oops 11003706212352 [1]
> > Modules linked in: sunrpc binfmt_misc fan dock sg thermal processor 
> > container button sr_mod scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
> > 
> > Pid: 23366, CPU 6, comm:  cc1
> > 
> >  [] __mem_container_move_lists+0x50/0x100
> > sp=e720449a7d60 bsp=e720449a1040
> >  [] mem_container_move_lists+0x50/0x80
> > sp=e720449a7d60 bsp=e720449a1010
> >  [] activate_page+0x1d0/0x220
> > sp=e720449a7d60 bsp=e720449a0fd0
> >  [] mark_page_accessed+0xe0/0x160
> > sp=e720449a7d60 
