Re: [PATCH] Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...
Lee Schermerhorn wrote:
> On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
>> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
>>>> Interesting, I don't see a memory controller function in the stack
>>>> trace, but I'll double check to see if I can find some silly race
>>>> condition in there.
>>>
>>> right.  I noticed that after I sent the mail.
>>>
>>> Also, config available at:
>>> http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
>>
>> Be interested to know the outcome of any bisect you do.  Given it's
>> tripping in reclaim.
>
> Problem isolated to the memory controller patches.  This patch seems to
> fix this particular problem.  I've only run the test for a few minutes
> with and without the memory controller configured, but I did observe
> reclaim kicking in several times.  W/o this patch, the system would panic
> as soon as I entered direct/zone reclaim--less than a minute.

Thanks, excellent catch!  The patch looks sane.  Thanks for your help in
sorting this issue out.  Hmm.. that means I never hit direct/zone reclaim
in my tests (I'll make a mental note to enhance my test cases to cover
this scenario).

> Lee
>
> PATCH 2.6.23-rc4-mm1 Memory Controller:  initialize all scan_controls'
> isolate_pages member.
>
> We need to initialize all scan_controls' isolate_pages member.
> Otherwise, shrink_active_list() attempts to execute at an undefined
> location.
>
> Signed-off-by: Lee Schermerhorn <[EMAIL PROTECTED]>
>
>  mm/vmscan.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: Linux/mm/vmscan.c
> ===================================================================
> --- Linux.orig/mm/vmscan.c	2007-09-10 13:22:21.0 -0400
> +++ Linux/mm/vmscan.c	2007-09-12 15:30:27.0 -0400
> @@ -1758,6 +1758,7 @@ unsigned long shrink_all_memory(unsigned
>  		.swap_cluster_max = nr_pages,
>  		.may_writepage = 1,
>  		.swappiness = vm_swappiness,
> +		.isolate_pages = isolate_pages_global,
>  	};
>
>  	current->reclaim_state = &reclaim_state;
>
> @@ -1941,6 +1942,7 @@ static int __zone_reclaim(struct zone *z
>  						SWAP_CLUSTER_MAX),
>  		.gfp_mask = gfp_mask,
>  		.swappiness = vm_swappiness,
> +		.isolate_pages = isolate_pages_global,
>  	};
>  	unsigned long slab_reclaimable;

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH] Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...
On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
> > > Interesting, I don't see a memory controller function in the stack
> > > trace, but I'll double check to see if I can find some silly race
> > > condition in there.
> >
> > right.  I noticed that after I sent the mail.
> >
> > Also, config available at:
> > http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
>
> Be interested to know the outcome of any bisect you do.  Given it's
> tripping in reclaim.

Problem isolated to the memory controller patches.  This patch seems to
fix this particular problem.  I've only run the test for a few minutes
with and without the memory controller configured, but I did observe
reclaim kicking in several times.  W/o this patch, the system would panic
as soon as I entered direct/zone reclaim--less than a minute.

Lee

PATCH 2.6.23-rc4-mm1 Memory Controller:  initialize all scan_controls'
isolate_pages member.

We need to initialize all scan_controls' isolate_pages member.
Otherwise, shrink_active_list() attempts to execute at an undefined
location.

Signed-off-by: Lee Schermerhorn <[EMAIL PROTECTED]>

 mm/vmscan.c |    2 ++
 1 file changed, 2 insertions(+)

Index: Linux/mm/vmscan.c
===================================================================
--- Linux.orig/mm/vmscan.c	2007-09-10 13:22:21.0 -0400
+++ Linux/mm/vmscan.c	2007-09-12 15:30:27.0 -0400
@@ -1758,6 +1758,7 @@ unsigned long shrink_all_memory(unsigned
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
 		.swappiness = vm_swappiness,
+		.isolate_pages = isolate_pages_global,
 	};

 	current->reclaim_state = &reclaim_state;

@@ -1941,6 +1942,7 @@ static int __zone_reclaim(struct zone *z
 						SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
 		.swappiness = vm_swappiness,
+		.isolate_pages = isolate_pages_global,
 	};
 	unsigned long slab_reclaimable;
Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...
On Wed, 2007-09-12 at 16:41 +0100, Andy Whitcroft wrote:
> On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
> > > Interesting, I don't see a memory controller function in the stack
> > > trace, but I'll double check to see if I can find some silly race
> > > condition in there.
> >
> > right.  I noticed that after I sent the mail.
> >
> > Also, config available at:
> > http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont
>
> Be interested to know the outcome of any bisect you do.  Given it's
> tripping in reclaim.

FYI:  doesn't seem to fail with 23-rc6.

> What size of box is this?  Wondering if we have anything big enough to
> test with.

This is a 16-cpu, 4-node, 32GB HP rx8620.

The test load that I'm running is Dave Anderson's "usex" with a custom
test script that runs:

5  built-in usex IO tests to a separate file system on a SCSI disk.
1  built-in usex IO rate test -- to/from same disk/fs.
1  POV ray tracing app--just because I had it :-)
1  script that does "find / -type f | xargs strings >/dev/null" to
   pollute the page cache.
2  memtoy scripts to allocate various size anon segments--up to 20GB--
   and mlock() them down to force reclaim.
1  32-way parallel kernel build
3  1GB random vm tests
3  1GB sequential vm tests
9  built-in usex "bin" tests--these run a series of programs from
   /usr/bin to simulate users doing random things.  Not really random,
   tho'.  Just walks a table of commands sequentially.

This load beats up on the system fairly heavily.  I can package up the
usex input script and the other associated scripts that it invokes, if
you're interested.  Let me know...

Lee
Re: Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...
On Wed, Sep 12, 2007 at 11:09:47AM -0400, Lee Schermerhorn wrote:
> > Interesting, I don't see a memory controller function in the stack
> > trace, but I'll double check to see if I can find some silly race
> > condition in there.
>
> right.  I noticed that after I sent the mail.
>
> Also, config available at:
> http://free.linux.hp.com/~lts/Temp/config-2.6.23-rc4-mm1-gwydyr-nomemcont

Be interested to know the outcome of any bisect you do.  Given it's
tripping in reclaim.

What size of box is this?  Wondering if we have anything big enough to
test with.

-apw
Kernel Panic - 2.6.23-rc4-mm1 ia64 - was Re: Update: [Automatic] NUMA replicated pagecache ...
On Wed, 2007-09-12 at 19:38 +0530, Balbir Singh wrote:
> Lee Schermerhorn wrote:
> > On Wed, 2007-09-12 at 07:22 +0530, Balbir Singh wrote:
> >> Lee Schermerhorn wrote:
> >>> [Balbir: see notes re: replication and memory controller below]
> >>>
> >>> A quick update:  I have rebased the automatic/lazy page migration and
> >>> replication patches to 23-rc4-mm1.  If interested, you can find the
> >>> entire series that I push in the '070911' tarball at:
> >>>
> >>> http://free.linux.hp.com/~lts/Patches/Replication/
> >>>
> >>> I haven't gotten around to some of the things you suggested to address
> >>> the soft lockups, etc.  I just wanted to keep the patches up to date.
> >>>
> >>> In the process of doing a quick sanity test, I encountered an issue
> >>> with replication and the new memory controller patches.  I had built
> >>> the kernel with the memory controller enabled.  I encountered a panic
> >>> in reclaim, while attempting to "drop caches", because replication was
> >>> not "charging" the replicated pages and reclaim tried to deref a null
> >>> "page_container" pointer.  [!!! new member in page struct !!!]
> >>>
> >>> I added code to try_to_create_replica(), __remove_replicated_page()
> >>> and release_pcache_desc() to charge/uncharge where I thought
> >>> appropriate [replication patch # 02].  That seemed to solve the panic
> >>> during drop caches triggered reclaim.  However, when I tried a more
> >>> stressful load, I hit another panic ["NaT Consumption" == ia64-ese
> >>> for invalid pointer deref, I think] in shrink_active_list() called
> >>> from direct reclaim.  Still to be investigated.  I wanted to give you
> >>> and Balbir a heads up about the interaction of memory controllers
> >>> with page replication.
> >>>
> >> Hi, Lee,
> >>
> >> Thanks for testing the memory controller with page replication.  I do
> >> have some questions on the problem you are seeing.
> >>
> >> Did you see the problem with direct reclaim or container reclaim?
> >> drop_caches calls remove_mapping(), which should eventually call
> >> the uncharge routine.  We have some sanity checks in there.
> >
> > Sorry.  This one wasn't in reclaim.  It was from the fault path, via
> > activate page.  The bug in reclaim occurred after I "fixed" page
> > replication to charge for replicated pages, thus adding the
> > page_container.  The second panic resulted from a bad pointer ref in
> > shrink_active_list() from direct reclaim.
> >
> > [abbreviated] stack traces attached below.
> >
> > I took a look at an assembly language objdump and it appears that the
> > bad pointer deref occurred in the "while (!list_empty(&l_inactive))"
> > loop.  I see that there is also a mem_container_move_lists() call
> > there.  I will try to rerun the workload on an unpatched 23-rc4-mm1
> > today to see if it's reproducible there.  I can believe that this is a
> > race between replication [possibly "unreplicate"] and vmscan.  I don't
> > know what type of protection, if any, we have against that.
>
> Thanks, the stack trace makes sense now.  So basically, we have a case
> where a page is on the zone LRU, but does not belong to any container,
> which is why we do indeed need your first fix (to charge/uncharge) the
> pages on replication/removal.
>
> >> We do try to see at several places if the page->page_container is NULL
> >> and check for it.  I'll look at your patches to see if there are any
> >> changes to the reclaim logic.  I tried looking for the oops you
> >> mentioned, but could not find it in your directory; I saw the soft
> >> lockup logs though.  Do you still have the oops saved somewhere?
> >>
> >> I think the fix you have is correct and makes things work, but it
> >> worries me that in direct reclaim we dereference the page_container
> >> pointer without the page belonging to a container?  What are the
> >> properties of replicated pages?  Are they assumed to be exact
> >> replicas (struct page mappings, page_container expected to be the
> >> same for all replicated pages) of the replicated page?

> > Before "fix":
> >
> > Running spol+lpm+repl patches on 23-rc4-mm1.  kernel build test
> > echo 1 >/proc/sys/vm/drop_caches
> > Then [perhaps a coincidence]:
> >
> > Unable to handle kernel NULL pointer dereference (address 0008)
> > cc1[23366]: Oops 11003706212352 [1]
> > Modules linked in: sunrpc binfmt_misc fan dock sg thermal processor
> > container button sr_mod scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
> >
> > Pid: 23366, CPU 6, comm: cc1
> >
> > [] __mem_container_move_lists+0x50/0x100
> >     sp=e720449a7d60 bsp=e720449a1040
> > [] mem_container_move_lists+0x50/0x80
> >     sp=e720449a7d60 bsp=e720449a1010
> > [] activate_page+0x1d0/0x220
> >     sp=e720449a7d60 bsp=e720449a0fd0
> > [] mark_page_accessed+0xe0/0x160
> >     sp=e720449a7d60