Re: Fw: [BUG][PATCH] fix mempolicy's check on a system with memory-less-node take3

2007-02-08 Thread KAMEZAWA Hiroyuki
On Thu, 8 Feb 2007 11:28:30 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > @@ -193,9 +197,11 @@
> > break;
> > case MPOL_BIND:
> > policy->v.zonelist = bind_zonelist(nodes);
> > -   if (policy->v.zonelist == NULL) {
> > +   if (IS_ERR(policy->v.zonelist)) {
> > +   void *val = policy->v.zonelist;
> > +   policy->v.zonelist = NULL;
> 
> void *? Ahh. It takes the error code.
> 
> Looks good. But if we are really going down this road of memory-less 
> nodes we may want to audit the kernel for other issues.
> 
> Could you run a series of tests on that machine?
> 
Yes. The program that caused the trouble now works fine.
I used the 'numademo' command from the numactl package;
with this patch it works fine (it reports -EINVAL).

I have been using a system with an empty (memory-less) node for 5 months
and have reported 2 bugs:
- oom-kill's memory-less node detection logic.
- mempolicy's NULL pointer access (this one).

Otherwise it works fine in general.
(The old RHEL4/linux-2.6.9 kernel doesn't boot on this system.)

-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [BUG][PATCH] fix mempolicy's check on a system with memory-less-node take3

2007-02-08 Thread Christoph Lameter
On Thu, 8 Feb 2007, KAMEZAWA Hiroyuki wrote:

> @@ -162,6 +162,10 @@
>   break;
>   k--;
>   }
> + if (!num) {
> + kfree(zl);
> + return ERR_PTR(-EINVAL);
> + }
>   zl->zones[num] = NULL;
>   return zl;
>  }

Ok. So you detect, as an error, a set of nodes that does specify nodes
but whose zones are all empty.

Should work.
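A user-space model of the check being discussed: build a NULL-terminated list of the populated zones of every node in a mask, and report -EINVAL when nothing was collected, as the `if (!num)` hunk does. Everything here (toy_zone, toy_zonelist, toy_bind_zonelist, the bitmask used as a nodemask) is hypothetical and simplified; in particular it does not order zones highest-first the way the real bind_zonelist() does, and ERR_PTR()/IS_ERR()/PTR_ERR() are local re-implementations of the kernel helpers, not the kernel header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Local re-implementations of the kernel's err.h helpers (illustration only). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical, simplified stand-ins for struct zone and struct zonelist. */
struct toy_zone {
	int node;       /* node this zone belongs to */
	bool populated; /* false for every zone of a memory-less node */
};

struct toy_zonelist {
	int nr;                   /* entries before the terminating NULL */
	struct toy_zone *zones[]; /* NULL-terminated */
};

/* Collect the populated zones of every node set in nodemask. An empty
 * result is reported as -EINVAL instead of returning a list whose first
 * entry is already the NULL terminator. */
static struct toy_zonelist *toy_bind_zonelist(unsigned long nodemask,
					      struct toy_zone *zones,
					      int nr_zones)
{
	struct toy_zonelist *zl;
	int num = 0;

	/* room for every zone plus the terminating NULL */
	zl = malloc(sizeof(*zl) + sizeof(zl->zones[0]) * (nr_zones + 1));
	if (!zl)
		return ERR_PTR(-ENOMEM);

	for (int i = 0; i < nr_zones; i++)
		if ((nodemask & (1UL << zones[i].node)) && zones[i].populated)
			zl->zones[num++] = &zones[i];

	if (!num) { /* only memory-less nodes were requested */
		free(zl);
		return ERR_PTR(-EINVAL);
	}
	zl->zones[num] = NULL;
	zl->nr = num;
	return zl;
}
```

With a 3-node topology where node 2 has no populated zone, asking for node 2 alone yields ERR_PTR(-EINVAL), while asking for nodes 0 and 2 yields a one-entry list containing node 0's zone.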

> @@ -193,9 +197,11 @@
>   break;
>   case MPOL_BIND:
>   policy->v.zonelist = bind_zonelist(nodes);
> - if (policy->v.zonelist == NULL) {
> + if (IS_ERR(policy->v.zonelist)) {
> + void *val = policy->v.zonelist;
> + policy->v.zonelist = NULL;

void *? Ahh. It takes the error code.
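The `void *` here carries an error code rather than a zonelist. The kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers (include/linux/err.h) encode a negative errno in the top MAX_ERRNO (4095) addresses, which are never valid object pointers. Below is a minimal user-space re-implementation of those three helpers, written only to illustrate the encoding; it is not the kernel header itself (the real IS_ERR() goes through IS_ERR_VALUE() and an unlikely() hint).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space re-implementation (for illustration only) of the helpers in
 * the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

/* Encode a negative errno (e.g. -EINVAL) as a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the negative errno back out of an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True only for the last MAX_ERRNO addresses; NULL is NOT an error. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

Because an error pointer is still an ordinary pointer, a function that itself returns a pointer can hand it back unchanged, which is what `return val;` in the patch relies on. Note also that IS_ERR(NULL) is false, so code that switches from NULL checks to ERR_PTR returns must update every `if (ptr)` test as well.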

Looks good. But if we are really going down this road of memory-less 
nodes we may want to audit the kernel for other issues.

Could you run a series of tests on that machine?


Fw: [BUG][PATCH] fix mempolicy's check on a system with memory-less-node take3

2007-02-08 Thread KAMEZAWA Hiroyuki

Hi, thank you for reviewing. This is take3.
(Very sorry for sending it twice.)

-Kame
Following is the backtrace of the NULL pointer access in slab_node().
This patch fixes it.
== backtrace from crash (linux-2.6.20) ==
 #0 [BSP:e00121f412d8] schedule at a0010061ccc0
 #1 [BSP:e00121f41280] rwsem_down_failed_common at a00100290490
 #2 [BSP:e00121f41260] rwsem_down_read_failed at a00100620d30
 #3 [BSP:e00121f41240] down_read at a001000b01a0
 #4 [BSP:e00121f411e8] ia64_do_page_fault at a00100625710
 #5 [BSP:e00121f411e8] ia64_leave_kernel at a001c660
  EFRAME: e00121f47100
  B0: a0010013cc40  CR_IIP: a0010012aa30
 CR_IPSR: 101008022018  CR_IFS: 8205
  AR_PFS: 0309  AR_RSC: 0003
 AR_UNAT:  AR_RNAT: 
  AR_CCV:  AR_FPSR: 0009804c8a70033f
  LOADRS:  AR_BSPSTORE: 
  B6: a0010003f040  B7: a001ccd0
  PR: 0055a9a5  R1: a00100d5a5b0
  R2: e0010c50df7c  R3: 0030
  R8:   R9: e0011dc52930
 R10: e0011dc52928 R11: e0010c50df80
 R12: e00121f472c0 R13: e00121f4
 R14: 0002 R15: 3f00
 R16: 1040 R17: e00121f4
 R18: a00100b5a9d0 R19: e00121f40018
 R20: e00121f40c84 R21: 
 R22: e00121f47330 R23: e00121f47334
 R24: e00121f40b88 R25: e00121f47340
 R26: e00121f47334 R27: 
 R28:  R29: e00121f47338
 R30: 7fff R31: a00100b5b5e0
  F6: 1003eccd55056199632ec F7: 1003e9e3779b97f4a7c16
  F8: 1003e0a0010001422 F9: 1003e0fa0
 F10: 1003e3b9aca00F11: 1003e431bde82d7b634db
 #6 [BSP:e00121f411c0] slab_node at a0010012aa30
 #7 [BSP:e00121f41190] alternate_node_alloc at a0010013cc40
 #8 [BSP:e00121f41160] kmem_cache_alloc at a0010013dc40
 #9 [BSP:e00121f41100] desc_prologue at a0010003ee00
#10 [BSP:e00121f410c0] unw_decode_r2 at a0010003f0c0
#11 [BSP:e00121f41068] find_save_locs at a0010003fbf0
#12 [BSP:e00121f41038] unw_init_frame_info at a00100040900
#13 [BSP:e00121f41010] unw_init_running at a001ccf0
==
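This panic (hang) was found by a NUMA test set on a system with 3 nodes,
where node 2 was a memory-less node.
This patch fixes the zero-length zonelist problem in MPOL_BIND.
If the zonelist would have zero length, bind_zonelist() now returns
ERR_PTR(-EINVAL) (and ERR_PTR(-ENOMEM) on allocation failure), and its
callers check the result with IS_ERR().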
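With the patch, an empty bind set is rejected when the policy is created instead of leaving a zero-length zonelist behind for slab_node() to trip over. The sketch below models the MPOL_BIND branch of the patched mpol_new() in user space: the error pointer is kept in a local, the half-built policy is freed, and the same pointer is returned so the caller sees -EINVAL or -ENOMEM exactly as the zonelist builder reported it. The toy_* names and the stub builder (which treats an empty mask as "memory-less nodes only") are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Local re-implementations of the kernel's err.h helpers (illustration only). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_zonelist { int nr; };
struct toy_policy { struct toy_zonelist *zonelist; };

/* Stub builder: an empty mask plays the role of "memory-less nodes only". */
static struct toy_zonelist *toy_bind_zonelist(unsigned long nodemask)
{
	struct toy_zonelist *zl;

	if (!nodemask)
		return ERR_PTR(-EINVAL);
	zl = malloc(sizeof(*zl));
	if (!zl)
		return ERR_PTR(-ENOMEM);
	zl->nr = 1;
	return zl;
}

/* Models the patched MPOL_BIND branch of mpol_new(): remember the error
 * pointer, clear the field, free the half-built policy, and return the
 * same error pointer so the caller sees the original errno. */
static struct toy_policy *toy_mpol_new(unsigned long nodemask)
{
	struct toy_policy *policy = malloc(sizeof(*policy));

	if (!policy)
		return ERR_PTR(-ENOMEM);
	policy->zonelist = toy_bind_zonelist(nodemask);
	if (IS_ERR(policy->zonelist)) {
		void *val = policy->zonelist;

		policy->zonelist = NULL;
		free(policy);
		return val;
	}
	return policy;
}
```

Returning `val` rather than a fixed ERR_PTR(-ENOMEM), as the pre-patch code did, is what lets numademo see -EINVAL for a memory-less node instead of an out-of-memory error.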

Changelog: v2 -> v3
- changed handling of the void * pointer
- fixed warnings (misuse of PTR_ERR)

Changelog: v1 -> v2
- avoid extra pgdat scanning; it is not necessary.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


Index: linux-2.6.20/mm/mempolicy.c
===
--- linux-2.6.20.orig/mm/mempolicy.c	2007-02-08 09:50:45.0 +0900
+++ linux-2.6.20/mm/mempolicy.c 2007-02-08 17:25:34.0 +0900
@@ -144,7 +144,7 @@
max++;  /* space for zlcache_ptr (see mmzone.h) */
zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
if (!zl)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
zl->zlcache_ptr = NULL;
num = 0;
/* First put in the highest zones from all nodes, then all the next 
@@ -162,6 +162,10 @@
break;
k--;
}
+   if (!num) {
+   kfree(zl);
+   return ERR_PTR(-EINVAL);
+   }
zl->zones[num] = NULL;
return zl;
 }
@@ -193,9 +197,11 @@
break;
case MPOL_BIND:
policy->v.zonelist = bind_zonelist(nodes);
-   if (policy->v.zonelist == NULL) {
+   if (IS_ERR(policy->v.zonelist)) {
+   void *val = policy->v.zonelist;
+   policy->v.zonelist = NULL;
kmem_cache_free(policy_cache, policy);
-   return ERR_PTR(-ENOMEM);
+   return val;
}
break;
}
@@ -1662,12 +1668,12 @@
 
zonelist = bind_zonelist(&nodes);
 
-   /* If no mem, then zonelist is NULL and we keep old zonelist.
+   /* If no mem, then zonelist is ERR_PTR and we keep old zonelist.
 * If that old zonelist has no remaining mems_allowed nodes,
 * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT.
 */
 
-   if (zonelist) {
+   if (!IS_ERR(zonelist)) {
/* Good - got mem - substitute new zonelist */
kfree(pol->v.zonelist);
pol->v.zonelist = zonelist;
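The last hunk matters because an ERR_PTR value is non-NULL: once bind_zonelist() stops returning NULL on failure, the old `if (zonelist)` test would install the error pointer as the policy's zonelist. A user-space sketch of the corrected keep-the-old-list-on-error logic follows; toy_rebind() and the other toy_* names are hypothetical stand-ins for the MPOL_BIND case of mpol_rebind_policy() and its structures, and the stub builder treats an empty mask as a bind set with no memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Local re-implementations of the kernel's err.h helpers (illustration only). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct toy_zonelist { int nr; };
struct toy_policy { struct toy_zonelist *zonelist; };

/* Stub builder: nr counts the nodes in the mask; an empty mask plays the
 * role of a bind set that contains no memory. */
static struct toy_zonelist *toy_bind_zonelist(unsigned long nodemask)
{
	struct toy_zonelist *zl;

	if (!nodemask)
		return ERR_PTR(-EINVAL);
	zl = malloc(sizeof(*zl));
	if (!zl)
		return ERR_PTR(-ENOMEM);
	for (zl->nr = 0; nodemask; nodemask &= nodemask - 1)
		zl->nr++; /* one iteration per set bit */
	return zl;
}

/* Install the new list only if it was built; on any ERR_PTR keep (and do
 * not free) the old list. A plain `if (zonelist)` would accept the
 * non-NULL error pointer here. */
static void toy_rebind(struct toy_policy *pol, unsigned long newmask)
{
	struct toy_zonelist *zonelist = toy_bind_zonelist(newmask);

	if (!IS_ERR(zonelist)) {
		free(pol->zonelist);
		pol->zonelist = zonelist;
	}
}
```

After a failed rebind the policy still points at its previous zonelist, which matches the intent stated in the patched comment ("we keep old zonelist").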