Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread David Rientjes
On Mon, 1 Dec 2014, Paul Mackerras wrote:

> The bounds check for nodeid in cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time.  For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> returns 4, so when cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
> 
> kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
> task: c13ba230 ti: c1494000 task.ti: c1494000
> NIP: c0264f6c LR: c0264f5c CTR: 
> REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
> MSR: 92021032   CR: 28000448  XER: 2000
> CFAR: c047e978 SOFTE: 0
> GPR00: c0264f5c c1497c20 c1499d48 0004
> GPR04: 0100 0010 0068 
> GPR08:  0001 082d c0cca5a8
> GPR12: 48000448 cfda 01003bd44ff0 10020578
> GPR16: 01003bd44ff8 01003bd45000 0001 
> GPR20:    0010
> GPR24: c00ffe80 c0c824ec 0068 c00ffe80
> GPR28: 0010 c00ffe80 0010 
> NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
> LR [c0264f5c] .cache_alloc_node+0x5c/0x270
> Call Trace:
> [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
> (unreliable)
> [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
> [c1497dc0] [c0c824ec] .init_list+0x3c/0x128
> [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
> [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
> [c1497f90] [c0008c6c] start_here_common+0x20/0xa8
> Instruction dump:
> 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
> 6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14
> 
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.
> 
> Signed-off-by: Paul Mackerras 

Acked-by: David Rientjes 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread Pekka Enberg

On 12/1/14 6:28 AM, Paul Mackerras wrote:

---
v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;
  
-	VM_BUG_ON(nodeid > num_online_nodes());

+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);


Reviewed-by: Pekka Enberg 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread Michael Ellerman
On Mon, 2014-12-01 at 16:24 +1100, Paul Mackerras wrote:
> On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
> > On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> > > The bounds check for nodeid in cache_alloc_node gives false
> > > positives on machines where the node IDs are not contiguous, leading
> > > to a panic at boot time.  For example, on a POWER8 machine the node
> > > IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> > > returns 4, so when cache_alloc_node is called with nodeid = 16 the
> > > VM_BUG_ON triggers, like this:
> > ...
> > > 
> > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> > > additionally make sure it isn't negative (since nodeid is an int).
> > > The check is there mainly to protect the array dereference in the
> > > get_node() call in the next line, and the array being dereferenced is
> > > of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> > > example if the node is off-line), the BUG_ON in the next line will
> > > catch that.
> > 
> > When did this break? How come we only just noticed?
> 
> Commit 14e50c6a9bc2, which went into 3.10-rc1.

OK. So a Fixes tag is nice:

Fixes: 14e50c6a9bc2 ("mm: slab: Verify the nodeid passed to 
cache_alloc_node")

> You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
> and you're running on a machine with discontiguous node IDs.

Right. And we have SLUB=y for all the defconfigs that are likely to hit that.

> > Also needs:
> > 
> > Cc: sta...@vger.kernel.org
> 
> It does.  I remembered that a minute after I sent the patch.

OK. Hopefully one of the slab maintainers will be happy to add it for us when
they merge this?

cheers



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread Michael Ellerman
On Mon, 2014-12-01 at 16:24 +1100, Paul Mackerras wrote:
 On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
  On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
   The bounds check for nodeid in cache_alloc_node gives false
   positives on machines where the node IDs are not contiguous, leading
   to a panic at boot time.  For example, on a POWER8 machine the node
   IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
   returns 4, so when cache_alloc_node is called with nodeid = 16 the
   VM_BUG_ON triggers, like this:
  ...
   
   To fix this, we instead compare the nodeid with MAX_NUMNODES, and
   additionally make sure it isn't negative (since nodeid is an int).
   The check is there mainly to protect the array dereference in the
   get_node() call in the next line, and the array being dereferenced is
   of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
   example if the node is off-line), the BUG_ON in the next line will
   catch that.
  
  When did this break? How come we only just noticed?
 
 Commit 14e50c6a9bc2, which went into 3.10-rc1.

OK. So a Fixes tag is nice:

Fixes: 14e50c6a9bc2 (mm: slab: Verify the nodeid passed to 
cache_alloc_node)

 You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
 and you're running on a machine with discontiguous node IDs.

Right. And we have SLUB=y for all the defconfigs that are likely to hit that.

  Also needs:
  
  Cc: sta...@vger.kernel.org
 
 It does.  I remembered that a minute after I sent the patch.

OK. Hopefully one of the slab maintainers will be happy to add it for us when
they merge this?

cheers



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread Pekka Enberg

On 12/1/14 6:28 AM, Paul Mackerras wrote:

---
v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;
  
-	VM_BUG_ON(nodeid  num_online_nodes());

+   VM_BUG_ON(nodeid  0 || nodeid = MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);


Reviewed-by: Pekka Enberg penb...@kernel.org

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-12-01 Thread David Rientjes
On Mon, 1 Dec 2014, Paul Mackerras wrote:

 The bounds check for nodeid in cache_alloc_node gives false
 positives on machines where the node IDs are not contiguous, leading
 to a panic at boot time.  For example, on a POWER8 machine the node
 IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
 returns 4, so when cache_alloc_node is called with nodeid = 16 the
 VM_BUG_ON triggers, like this:
 
 kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
 Oops: Exception in kernel mode, sig: 5 [#1]
 SMP NR_CPUS=1024 NUMA PowerNV
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
 task: c13ba230 ti: c1494000 task.ti: c1494000
 NIP: c0264f6c LR: c0264f5c CTR: 
 REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
 MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI  CR: 28000448  XER: 2000
 CFAR: c047e978 SOFTE: 0
 GPR00: c0264f5c c1497c20 c1499d48 0004
 GPR04: 0100 0010 0068 
 GPR08:  0001 082d c0cca5a8
 GPR12: 48000448 cfda 01003bd44ff0 10020578
 GPR16: 01003bd44ff8 01003bd45000 0001 
 GPR20:    0010
 GPR24: c00ffe80 c0c824ec 0068 c00ffe80
 GPR28: 0010 c00ffe80 0010 
 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
 LR [c0264f5c] .cache_alloc_node+0x5c/0x270
 Call Trace:
 [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
 (unreliable)
 [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128
 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8
 Instruction dump:
 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
 6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14
 
 To fix this, we instead compare the nodeid with MAX_NUMNODES, and
 additionally make sure it isn't negative (since nodeid is an int).
 The check is there mainly to protect the array dereference in the
 get_node() call in the next line, and the array being dereferenced is
 of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
 example if the node is off-line), the BUG_ON in the next line will
 catch that.
 
 Signed-off-by: Paul Mackerras pau...@samba.org

Acked-by: David Rientjes rient...@google.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Paul Mackerras
On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
> On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> > The bounds check for nodeid in cache_alloc_node gives false
> > positives on machines where the node IDs are not contiguous, leading
> > to a panic at boot time.  For example, on a POWER8 machine the node
> > IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> > returns 4, so when cache_alloc_node is called with nodeid = 16 the
> > VM_BUG_ON triggers, like this:
> ...
> > 
> > To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> > additionally make sure it isn't negative (since nodeid is an int).
> > The check is there mainly to protect the array dereference in the
> > get_node() call in the next line, and the array being dereferenced is
> > of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> > example if the node is off-line), the BUG_ON in the next line will
> > catch that.
> 
> When did this break? How come we only just noticed?

Commit 14e50c6a9bc2, which went into 3.10-rc1.

You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
and you're running on a machine with discontiguous node IDs.

> Also needs:
> 
> Cc: sta...@vger.kernel.org

It does.  I remembered that a minute after I sent the patch.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Michael Ellerman
On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> The bounds check for nodeid in cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time.  For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> returns 4, so when cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
...
> 
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.

When did this break? How come we only just noticed?

Also needs:

Cc: sta...@vger.kernel.org

cheers



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Yasuaki Ishimatsu

(2014/12/01 13:28), Paul Mackerras wrote:

The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c13ba230 ti: c1494000 task.ti: c1494000
NIP: c0264f6c LR: c0264f5c CTR: 
REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 92021032   CR: 28000448  XER: 2000
CFAR: c047e978 SOFTE: 0
GPR00: c0264f5c c1497c20 c1499d48 0004
GPR04: 0100 0010 0068 
GPR08:  0001 082d c0cca5a8
GPR12: 48000448 cfda 01003bd44ff0 10020578
GPR16: 01003bd44ff8 01003bd45000 0001 
GPR20:    0010
GPR24: c00ffe80 c0c824ec 0068 c00ffe80
GPR28: 0010 c00ffe80 0010 
NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
LR [c0264f5c] .cache_alloc_node+0x5c/0x270
Call Trace:
[c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
(unreliable)
[c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c1497dc0] [c0c824ec] .init_list+0x3c/0x128
[c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
[c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
[c1497f90] [c0008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.

Signed-off-by: Paul Mackerras 
---


Looks good to me.

Reviewed-by: Yasuaki Ishimatsu 

If you need to backport it into -stable kernel, please read
Documentation/stable_kernel_rules.txt.

Thanks,
Yasuaki Ishimatsu


v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;

-   VM_BUG_ON(nodeid > num_online_nodes());
+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Paul Mackerras
The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c13ba230 ti: c1494000 task.ti: c1494000
NIP: c0264f6c LR: c0264f5c CTR: 
REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 92021032   CR: 28000448  XER: 2000
CFAR: c047e978 SOFTE: 0
GPR00: c0264f5c c1497c20 c1499d48 0004
GPR04: 0100 0010 0068 
GPR08:  0001 082d c0cca5a8
GPR12: 48000448 cfda 01003bd44ff0 10020578
GPR16: 01003bd44ff8 01003bd45000 0001 
GPR20:    0010
GPR24: c00ffe80 c0c824ec 0068 c00ffe80
GPR28: 0010 c00ffe80 0010 
NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
LR [c0264f5c] .cache_alloc_node+0x5c/0x270
Call Trace:
[c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
(unreliable)
[c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c1497dc0] [c0c824ec] .init_list+0x3c/0x128
[c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
[c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
[c1497f90] [c0008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.

Signed-off-by: Paul Mackerras 
---
v2: include the oops message in the patch description

 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;
 
-   VM_BUG_ON(nodeid > num_online_nodes());
+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);
 
-- 
2.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Paul Mackerras
The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c13ba230 ti: c1494000 task.ti: c1494000
NIP: c0264f6c LR: c0264f5c CTR: 
REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI  CR: 28000448  XER: 2000
CFAR: c047e978 SOFTE: 0
GPR00: c0264f5c c1497c20 c1499d48 0004
GPR04: 0100 0010 0068 
GPR08:  0001 082d c0cca5a8
GPR12: 48000448 cfda 01003bd44ff0 10020578
GPR16: 01003bd44ff8 01003bd45000 0001 
GPR20:    0010
GPR24: c00ffe80 c0c824ec 0068 c00ffe80
GPR28: 0010 c00ffe80 0010 
NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
LR [c0264f5c] .cache_alloc_node+0x5c/0x270
Call Trace:
[c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
(unreliable)
[c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c1497dc0] [c0c824ec] .init_list+0x3c/0x128
[c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
[c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
[c1497f90] [c0008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.

Signed-off-by: Paul Mackerras pau...@samba.org
---
v2: include the oops message in the patch description

 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;
 
-   VM_BUG_ON(nodeid  num_online_nodes());
+   VM_BUG_ON(nodeid  0 || nodeid = MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);
 
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Yasuaki Ishimatsu

(2014/12/01 13:28), Paul Mackerras wrote:

The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c13ba230 ti: c1494000 task.ti: c1494000
NIP: c0264f6c LR: c0264f5c CTR: 
REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI  CR: 28000448  XER: 2000
CFAR: c047e978 SOFTE: 0
GPR00: c0264f5c c1497c20 c1499d48 0004
GPR04: 0100 0010 0068 
GPR08:  0001 082d c0cca5a8
GPR12: 48000448 cfda 01003bd44ff0 10020578
GPR16: 01003bd44ff8 01003bd45000 0001 
GPR20:    0010
GPR24: c00ffe80 c0c824ec 0068 c00ffe80
GPR28: 0010 c00ffe80 0010 
NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
LR [c0264f5c] .cache_alloc_node+0x5c/0x270
Call Trace:
[c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
(unreliable)
[c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c1497dc0] [c0c824ec] .init_list+0x3c/0x128
[c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
[c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
[c1497f90] [c0008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.

Signed-off-by: Paul Mackerras pau...@samba.org
---


Looks good to me.

Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

If you need to backport it into -stable kernel, please read
Documentation/stable_kernel_rules.txt.

Thanks,
Yasuaki Ishimatsu


v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;

-   VM_BUG_ON(nodeid  num_online_nodes());
+   VM_BUG_ON(nodeid  0 || nodeid = MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Michael Ellerman
On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
 The bounds check for nodeid in cache_alloc_node gives false
 positives on machines where the node IDs are not contiguous, leading
 to a panic at boot time.  For example, on a POWER8 machine the node
 IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
 returns 4, so when cache_alloc_node is called with nodeid = 16 the
 VM_BUG_ON triggers, like this:
...
 
 To fix this, we instead compare the nodeid with MAX_NUMNODES, and
 additionally make sure it isn't negative (since nodeid is an int).
 The check is there mainly to protect the array dereference in the
 get_node() call in the next line, and the array being dereferenced is
 of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
 example if the node is off-line), the BUG_ON in the next line will
 catch that.

When did this break? How come we only just noticed?

Also needs:

Cc: sta...@vger.kernel.org

cheers



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Paul Mackerras
On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
 On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
  The bounds check for nodeid in cache_alloc_node gives false
  positives on machines where the node IDs are not contiguous, leading
  to a panic at boot time.  For example, on a POWER8 machine the node
  IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
  returns 4, so when cache_alloc_node is called with nodeid = 16 the
  VM_BUG_ON triggers, like this:
 ...
  
  To fix this, we instead compare the nodeid with MAX_NUMNODES, and
  additionally make sure it isn't negative (since nodeid is an int).
  The check is there mainly to protect the array dereference in the
  get_node() call in the next line, and the array being dereferenced is
  of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
  example if the node is off-line), the BUG_ON in the next line will
  catch that.
 
 When did this break? How come we only just noticed?

Commit 14e50c6a9bc2, which went into 3.10-rc1.

You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
and you're running on a machine with discontiguous node IDs.

 Also needs:
 
 Cc: sta...@vger.kernel.org

It does.  I remembered that a minute after I sent the patch.

Paul.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/