Re: Fw: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take3
On Thu, 8 Feb 2007 11:28:30 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> > @@ -193,9 +197,11 @@
> >  		break;
> >  	case MPOL_BIND:
> >  		policy->v.zonelist = bind_zonelist(nodes);
> > -		if (policy->v.zonelist == NULL) {
> > +		if (IS_ERR(policy->v.zonelist)) {
> > +			void *val = policy->v.zonelist;
> > +			policy->v.zonelist = NULL;
>
> void *? Ahh. It takes the error code.
>
> Looks good. But if we are really going down this road of memory-less
> nodes we may want to audit the kernel for other issues.
>
> Could you run a series of tests on that machine?
>
Yes. The program that caused the trouble now works fine.
I also ran the 'numademo' command from the numactl package. With this
patch it now works fine (it reports -EINVAL).

I have used this system, which has an empty node, for 5 months and have
reported 2 bugs so far:
 - oom-kill's memory-less node detection logic.
 - mempolicy's NULL access (this one).
In general, it works fine.
(The old RHEL4/linux-2.6.9 kernel doesn't boot on this system.)

-Kame
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: Fw: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take3
On Thu, 8 Feb 2007, KAMEZAWA Hiroyuki wrote:

> @@ -162,6 +162,10 @@
>  			break;
>  		k--;
>  	}
> +	if (!num) {
> +		kfree(zl);
> +		return ERR_PTR(-EINVAL);
> +	}
>  	zl->zones[num] = NULL;
>  	return zl;
>  }

Ok. So you are detecting a set of nodes that has nodes specified but the
zones that these nodes refer to are empty, as an error. Should work.

> @@ -193,9 +197,11 @@
>  		break;
>  	case MPOL_BIND:
>  		policy->v.zonelist = bind_zonelist(nodes);
> -		if (policy->v.zonelist == NULL) {
> +		if (IS_ERR(policy->v.zonelist)) {
> +			void *val = policy->v.zonelist;
> +			policy->v.zonelist = NULL;

void *? Ahh. It takes the error code.

Looks good. But if we are really going down this road of memory-less
nodes we may want to audit the kernel for other issues.

Could you run a series of tests on that machine?
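Christoph's "It takes the error code" refers to the kernel's ERR_PTR convention: a small negative errno is stored in the pointer value itself, so the MPOL_BIND branch can free the policy and hand the saved pointer back as its own error. The following is a minimal userspace sketch of that flow, under stated assumptions: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here in the spirit of the kernel's <linux/err.h> (not copied from it), and struct policy, fake_bind_zonelist() and fake_mpol_new() are hypothetical simplifications, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for the ERR_PTR convention: the top MAX_ERRNO
 * values of the address space are never valid object pointers, so a
 * negative errno can be carried in a pointer. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified policy object. */
struct policy {
	void *zonelist;
};

/* Hypothetical builder: -EINVAL for an empty request, -ENOMEM if
 * allocation fails, otherwise a real allocation. */
static void *fake_bind_zonelist(int nzones)
{
	void *zl;

	if (nzones == 0)
		return ERR_PTR(-EINVAL);
	zl = malloc((size_t)nzones * sizeof(void *));
	return zl ? zl : ERR_PTR(-ENOMEM);
}

/* Same shape as the patched MPOL_BIND branch: keep the error pointer in
 * a local, clear the field, free the policy, and return the saved error
 * pointer so the caller sees -EINVAL or -ENOMEM unchanged. */
static struct policy *fake_mpol_new(int nzones)
{
	struct policy *policy = malloc(sizeof(*policy));

	if (!policy)
		return ERR_PTR(-ENOMEM);
	policy->zonelist = fake_bind_zonelist(nzones);
	if (IS_ERR(policy->zonelist)) {
		void *val = policy->zonelist;

		policy->zonelist = NULL;
		free(policy);
		return val;
	}
	return policy;
}
```

For example, fake_mpol_new(0) returns a pointer for which IS_ERR() is true and PTR_ERR() gives -EINVAL, while fake_mpol_new(2) returns a usable policy. Returning the saved pointer avoids converting the error to a number and back.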
Fw: [BUG][PATCH] fix mempolcy's check on a system with memory-less-node take3
Hi, thank you for reviewing. This is take 3.
(Very sorry for sending twice.)

-Kame

The following is the backtrace of the NULL pointer access in slab_node().
This patch fixes it.

== backtrace from crash (linux-2.6.20) ==
 #0 [BSP:e00121f412d8] schedule at a0010061ccc0
 #1 [BSP:e00121f41280] rwsem_down_failed_common at a00100290490
 #2 [BSP:e00121f41260] rwsem_down_read_failed at a00100620d30
 #3 [BSP:e00121f41240] down_read at a001000b01a0
 #4 [BSP:e00121f411e8] ia64_do_page_fault at a00100625710
 #5 [BSP:e00121f411e8] ia64_leave_kernel at a001c660
  EFRAME: e00121f47100
      B0: a0010013cc40            CR_IIP: a0010012aa30
 CR_IPSR: 101008022018            CR_IFS: 8205
  AR_PFS: 0309                    AR_RSC: 0003
 AR_UNAT:                        AR_RNAT:
  AR_CCV:                        AR_FPSR: 0009804c8a70033f
  LOADRS:                    AR_BSPSTORE:
      B6: a0010003f040                B7: a001ccd0
      PR: 0055a9a5                    R1: a00100d5a5b0
      R2: e0010c50df7c                R3: 0030
      R8:                             R9: e0011dc52930
     R10: e0011dc52928               R11: e0010c50df80
     R12: e00121f472c0               R13: e00121f4
     R14: 0002                       R15: 3f00
     R16: 1040                       R17: e00121f4
     R18: a00100b5a9d0               R19: e00121f40018
     R20: e00121f40c84               R21:
     R22: e00121f47330               R23: e00121f47334
     R24: e00121f40b88               R25: e00121f47340
     R26: e00121f47334               R27:
     R28:                            R29: e00121f47338
     R30: 7fff                       R31: a00100b5b5e0
      F6: 1003eccd55056199632ec       F7: 1003e9e3779b97f4a7c16
      F8: 1003e0a0010001422           F9: 1003e0fa0
     F10: 1003e3b9aca00              F11: 1003e431bde82d7b634db
 #6 [BSP:e00121f411c0] slab_node at a0010012aa30
 #7 [BSP:e00121f41190] alternate_node_alloc at a0010013cc40
 #8 [BSP:e00121f41160] kmem_cache_alloc at a0010013dc40
 #9 [BSP:e00121f41100] desc_prologue at a0010003ee00
 #10 [BSP:e00121f410c0] unw_decode_r2 at a0010003f0c0
 #11 [BSP:e00121f41068] find_save_locs at a0010003fbf0
 #12 [BSP:e00121f41038] unw_init_frame_info at a00100040900
 #13 [BSP:e00121f41010] unw_init_running at a001ccf0
==

This panic (hang) was found by a NUMA test set on a system with 3 nodes,
where node 2 was a memory-less node.

This patch fixes the zero-length zonelist problem in MPOL_BIND. If the
zonelist would be empty, bind_zonelist() now returns ERR_PTR(-EINVAL) and
mpol_new() passes that error back, so the request simply fails with -EINVAL.
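To make the failure mode concrete, here is a minimal userspace sketch of the zone-collection loop with the patch's empty-result check. It assumes a made-up 3-node topology in which node 2 has no present pages in any zone, mirroring the reported machine; struct zone, struct zonelist, node_zones and sketch_bind_zonelist() are hypothetical stand-ins, and the ERR_PTR-style macros are simplified re-implementations. Without the check, a mask containing only node 2 would produce a list whose first entry is the NULL terminator, and a caller that reads the first zone without checking (presumably what happened in slab_node() here) would dereference NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified ERR_PTR-style helpers (userspace stand-ins). */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(long)(err))
#define PTR_ERR(ptr) ((long)(ptr))
#define IS_ERR(ptr) ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

#define NR_NODES 3
#define NR_ZONES 2

/* Hypothetical stand-in for struct zone: only the field the loop checks. */
struct zone {
	unsigned long present_pages;
};

/* NULL-terminated array of zone pointers, sized for every zone plus the
 * terminator. */
struct zonelist {
	struct zone *zones[NR_NODES * NR_ZONES + 1];
};

/* Made-up topology: node 2 has no memory in any zone. */
static struct zone node_zones[NR_NODES][NR_ZONES] = {
	{ { 1024 }, { 4096 } },
	{ { 1024 }, { 4096 } },
	{ { 0 }, { 0 } },
};

/* Collect populated zones, highest zone index first, from every node in
 * the mask. An empty result is reported as -EINVAL instead of returning
 * a list whose first entry is the terminator. */
static struct zonelist *sketch_bind_zonelist(unsigned int nodemask)
{
	struct zonelist *zl = malloc(sizeof(*zl));
	int num = 0;

	if (!zl)
		return ERR_PTR(-ENOMEM);
	for (int k = NR_ZONES - 1; k >= 0; k--)
		for (int nd = 0; nd < NR_NODES; nd++)
			if ((nodemask & (1u << nd)) &&
			    node_zones[nd][k].present_pages > 0)
				zl->zones[num++] = &node_zones[nd][k];
	if (!num) {
		free(zl);
		return ERR_PTR(-EINVAL);
	}
	zl->zones[num] = NULL;
	return zl;
}
```

With mask 0x3 (nodes 0 and 1) the sketch returns four zones, the first being node 0's higher zone; with mask 0x4 (node 2 only) it returns an error pointer carrying -EINVAL.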
Changelog: v2 -> v3
 - changed handling of the void * pointer.
 - fixed warnings (misuse of PTR_ERR).
Changelog: v1 -> v2
 - avoid extra pgdat scanning; it is not necessary.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

Index: linux-2.6.20/mm/mempolicy.c
===================================================================
--- linux-2.6.20.orig/mm/mempolicy.c	2007-02-08 09:50:45.0 +0900
+++ linux-2.6.20/mm/mempolicy.c	2007-02-08 17:25:34.0 +0900
@@ -144,7 +144,7 @@
 	max++;			/* space for zlcache_ptr (see mmzone.h) */
 	zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
 	if (!zl)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	zl->zlcache_ptr = NULL;
 	num = 0;
 	/* First put in the highest zones from all nodes, then all the next
@@ -162,6 +162,10 @@
 			break;
 		k--;
 	}
+	if (!num) {
+		kfree(zl);
+		return ERR_PTR(-EINVAL);
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }
@@ -193,9 +197,11 @@
 		break;
 	case MPOL_BIND:
 		policy->v.zonelist = bind_zonelist(nodes);
-		if (policy->v.zonelist == NULL) {
+		if (IS_ERR(policy->v.zonelist)) {
+			void *val = policy->v.zonelist;
+			policy->v.zonelist = NULL;
 			kmem_cache_free(policy_cache, policy);
-			return ERR_PTR(-ENOMEM);
+			return val;
 		}
 		break;
 	}
@@ -1662,12 +1668,12 @@
 		zonelist = bind_zonelist(&nodes);
 
-		/* If no mem, then zonelist is NULL and we keep old zonelist.
+		/* If no mem, then zonelist is ERR_PTR and we keep old zonelist.
 		 * If that old zonelist has no remaining mems_allowed nodes,
 		 * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
 		 */
 
-		if (zonelist) {
+		if (!IS_ERR(zonelist)) {
 			/* Good - got mem - substitute new zonelist */
 			kfree(pol->v.zonelist);
 			pol->v.zonelist = zonelist;
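The last hunk matters because error pointers are non-NULL: once bind_zonelist() returns ERR_PTR values instead of NULL, the old `if (zonelist)` test would accept an error pointer and install it as the policy's zonelist. A minimal userspace sketch of the patched rebind logic follows, assuming simplified types: struct policy, build_zonelist() and rebind() are hypothetical (an int array stands in for the zonelist), and ERR_PTR()/IS_ERR() are simplified re-implementations of the kernel convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified ERR_PTR-style helpers (userspace stand-ins). */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(long)(err))
#define IS_ERR(ptr) ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical, simplified policy: just the zonelist pointer. */
struct policy {
	int *zonelist;
};

/* Hypothetical builder: an empty request yields ERR_PTR(-EINVAL), as the
 * patched bind_zonelist() does; allocation failure yields -ENOMEM. */
static int *build_zonelist(int nzones)
{
	int *zl;

	if (nzones == 0)
		return ERR_PTR(-EINVAL);
	zl = calloc((size_t)nzones, sizeof(*zl));
	return zl ? zl : ERR_PTR(-ENOMEM);
}

/* Shape of the rebind path after the patch: test with IS_ERR() rather
 * than a NULL check, and only replace the old zonelist on success. On
 * error the old zonelist stays in place. */
static void rebind(struct policy *pol, int nzones)
{
	int *zonelist = build_zonelist(nzones);

	if (!IS_ERR(zonelist)) {
		free(pol->zonelist);
		pol->zonelist = zonelist;
	}
}
```

Rebinding to an empty set leaves the existing zonelist untouched, which matches the comment in the hunk ("we keep old zonelist").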