Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 1 Dec 2014, Paul Mackerras wrote: > The bounds check for nodeid in cache_alloc_node gives false > positives on machines where the node IDs are not contiguous, leading > to a panic at boot time. For example, on a POWER8 machine the node > IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() > returns 4, so when cache_alloc_node is called with nodeid = 16 the > VM_BUG_ON triggers, like this: > > kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! > Oops: Exception in kernel mode, sig: 5 [#1] > SMP NR_CPUS=1024 NUMA PowerNV > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 > task: c13ba230 ti: c1494000 task.ti: c1494000 > NIP: c0264f6c LR: c0264f5c CTR: > REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) > MSR: 92021032 CR: 28000448 XER: 2000 > CFAR: c047e978 SOFTE: 0 > GPR00: c0264f5c c1497c20 c1499d48 0004 > GPR04: 0100 0010 0068 > GPR08: 0001 082d c0cca5a8 > GPR12: 48000448 cfda 01003bd44ff0 10020578 > GPR16: 01003bd44ff8 01003bd45000 0001 > GPR20: 0010 > GPR24: c00ffe80 c0c824ec 0068 c00ffe80 > GPR28: 0010 c00ffe80 0010 > NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 > LR [c0264f5c] .cache_alloc_node+0x5c/0x270 > Call Trace: > [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 > (unreliable) > [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 > [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 > [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 > [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 > [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 > Instruction dump: > 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 > 6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14 > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and > additionally make sure it isn't negative (since nodeid is an int). > The check is there mainly to protect the array dereference in the > get_node() call in the next line, and the array being dereferenced is > of size MAX_NUMNODES. If the nodeid is in range but invalid (for > example if the node is off-line), the BUG_ON in the next line will > catch that. > > Signed-off-by: Paul Mackerras Acked-by: David Rientjes -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On 12/1/14 6:28 AM, Paul Mackerras wrote: --- v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid > num_online_nodes()); + VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); Reviewed-by: Pekka Enberg -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 2014-12-01 at 16:24 +1100, Paul Mackerras wrote: > On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote: > > On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: > > > The bounds check for nodeid in cache_alloc_node gives false > > > positives on machines where the node IDs are not contiguous, leading > > > to a panic at boot time. For example, on a POWER8 machine the node > > > IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() > > > returns 4, so when cache_alloc_node is called with nodeid = 16 the > > > VM_BUG_ON triggers, like this: > > ... > > > > > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and > > > additionally make sure it isn't negative (since nodeid is an int). > > > The check is there mainly to protect the array dereference in the > > > get_node() call in the next line, and the array being dereferenced is > > > of size MAX_NUMNODES. If the nodeid is in range but invalid (for > > > example if the node is off-line), the BUG_ON in the next line will > > > catch that. > > > > When did this break? How come we only just noticed? > > Commit 14e50c6a9bc2, which went into 3.10-rc1. OK. So a Fixes tag is nice: Fixes: 14e50c6a9bc2 ("mm: slab: Verify the nodeid passed to cache_alloc_node") > You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y > and you're running on a machine with discontiguous node IDs. Right. And we have SLUB=y for all the defconfigs that are likely to hit that. > > Also needs: > > > > Cc: sta...@vger.kernel.org > > It does. I remembered that a minute after I sent the patch. OK. Hopefully one of the slab maintainers will be happy to add it for us when they merge this? cheers -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 2014-12-01 at 16:24 +1100, Paul Mackerras wrote: On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote: On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: ... To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. When did this break? How come we only just noticed? Commit 14e50c6a9bc2, which went into 3.10-rc1. OK. So a Fixes tag is nice: Fixes: 14e50c6a9bc2 (mm: slab: Verify the nodeid passed to cache_alloc_node) You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y and you're running on a machine with discontiguous node IDs. Right. And we have SLUB=y for all the defconfigs that are likely to hit that. Also needs: Cc: sta...@vger.kernel.org It does. I remembered that a minute after I sent the patch. OK. Hopefully one of the slab maintainers will be happy to add it for us when they merge this? cheers -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On 12/1/14 6:28 AM, Paul Mackerras wrote: --- v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid num_online_nodes()); + VM_BUG_ON(nodeid 0 || nodeid = MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); Reviewed-by: Pekka Enberg penb...@kernel.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 1 Dec 2014, Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 task: c13ba230 ti: c1494000 task.ti: c1494000 NIP: c0264f6c LR: c0264f5c CTR: REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI CR: 28000448 XER: 2000 CFAR: c047e978 SOFTE: 0 GPR00: c0264f5c c1497c20 c1499d48 0004 GPR04: 0100 0010 0068 GPR08: 0001 082d c0cca5a8 GPR12: 48000448 cfda 01003bd44ff0 10020578 GPR16: 01003bd44ff8 01003bd45000 0001 GPR20: 0010 GPR24: c00ffe80 c0c824ec 0068 c00ffe80 GPR28: 0010 c00ffe80 0010 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 LR [c0264f5c] .cache_alloc_node+0x5c/0x270 Call Trace: [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable) [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 Instruction dump: 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Signed-off-by: Paul Mackerras pau...@samba.org Acked-by: David Rientjes rient...@google.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote: > On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: > > The bounds check for nodeid in cache_alloc_node gives false > > positives on machines where the node IDs are not contiguous, leading > > to a panic at boot time. For example, on a POWER8 machine the node > > IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() > > returns 4, so when cache_alloc_node is called with nodeid = 16 the > > VM_BUG_ON triggers, like this: > ... > > > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and > > additionally make sure it isn't negative (since nodeid is an int). > > The check is there mainly to protect the array dereference in the > > get_node() call in the next line, and the array being dereferenced is > > of size MAX_NUMNODES. If the nodeid is in range but invalid (for > > example if the node is off-line), the BUG_ON in the next line will > > catch that. > > When did this break? How come we only just noticed? Commit 14e50c6a9bc2, which went into 3.10-rc1. You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y and you're running on a machine with discontiguous node IDs. > Also needs: > > Cc: sta...@vger.kernel.org It does. I remembered that a minute after I sent the patch. Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: > The bounds check for nodeid in cache_alloc_node gives false > positives on machines where the node IDs are not contiguous, leading > to a panic at boot time. For example, on a POWER8 machine the node > IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() > returns 4, so when cache_alloc_node is called with nodeid = 16 the > VM_BUG_ON triggers, like this: ... > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and > additionally make sure it isn't negative (since nodeid is an int). > The check is there mainly to protect the array dereference in the > get_node() call in the next line, and the array being dereferenced is > of size MAX_NUMNODES. If the nodeid is in range but invalid (for > example if the node is off-line), the BUG_ON in the next line will > catch that. When did this break? How come we only just noticed? Also needs: Cc: sta...@vger.kernel.org cheers -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
(2014/12/01 13:28), Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 task: c13ba230 ti: c1494000 task.ti: c1494000 NIP: c0264f6c LR: c0264f5c CTR: REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) MSR: 92021032 CR: 28000448 XER: 2000 CFAR: c047e978 SOFTE: 0 GPR00: c0264f5c c1497c20 c1499d48 0004 GPR04: 0100 0010 0068 GPR08: 0001 082d c0cca5a8 GPR12: 48000448 cfda 01003bd44ff0 10020578 GPR16: 01003bd44ff8 01003bd45000 0001 GPR20: 0010 GPR24: c00ffe80 c0c824ec 0068 c00ffe80 GPR28: 0010 c00ffe80 0010 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 LR [c0264f5c] .cache_alloc_node+0x5c/0x270 Call Trace: [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable) [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 Instruction dump: 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Signed-off-by: Paul Mackerras --- Looks good to me. Reviewed-by: Yasuaki Ishimatsu If you need to backport it into -stable kernel, please read Documentation/stable_kernel_rules.txt. Thanks, Yasuaki Ishimatsu v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid > num_online_nodes()); + VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 task: c13ba230 ti: c1494000 task.ti: c1494000 NIP: c0264f6c LR: c0264f5c CTR: REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) MSR: 92021032 CR: 28000448 XER: 2000 CFAR: c047e978 SOFTE: 0 GPR00: c0264f5c c1497c20 c1499d48 0004 GPR04: 0100 0010 0068 GPR08: 0001 082d c0cca5a8 GPR12: 48000448 cfda 01003bd44ff0 10020578 GPR16: 01003bd44ff8 01003bd45000 0001 GPR20: 0010 GPR24: c00ffe80 c0c824ec 0068 c00ffe80 GPR28: 0010 c00ffe80 0010 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 LR [c0264f5c] .cache_alloc_node+0x5c/0x270 Call Trace: [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable) [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 Instruction dump: 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 6000 7f83e000 7d301026 5529effe <0b09> 393c0010 79291f24 7d3d4a14 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Signed-off-by: Paul Mackerras --- v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid > num_online_nodes()); + VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); -- 2.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 task: c13ba230 ti: c1494000 task.ti: c1494000 NIP: c0264f6c LR: c0264f5c CTR: REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI CR: 28000448 XER: 2000 CFAR: c047e978 SOFTE: 0 GPR00: c0264f5c c1497c20 c1499d48 0004 GPR04: 0100 0010 0068 GPR08: 0001 082d c0cca5a8 GPR12: 48000448 cfda 01003bd44ff0 10020578 GPR16: 01003bd44ff8 01003bd45000 0001 GPR20: 0010 GPR24: c00ffe80 c0c824ec 0068 c00ffe80 GPR28: 0010 c00ffe80 0010 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 LR [c0264f5c] .cache_alloc_node+0x5c/0x270 Call Trace: [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable) [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 Instruction dump: 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Signed-off-by: Paul Mackerras pau...@samba.org --- v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid num_online_nodes()); + VM_BUG_ON(nodeid 0 || nodeid = MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
(2014/12/01 13:28), Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA PowerNV Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17 task: c13ba230 ti: c1494000 task.ti: c1494000 NIP: c0264f6c LR: c0264f5c CTR: REGS: c14979a0 TRAP: 0700 Not tainted (3.18.0-rc5-kvm+) MSR: 92021032 SF,HV,VEC,ME,IR,DR,RI CR: 28000448 XER: 2000 CFAR: c047e978 SOFTE: 0 GPR00: c0264f5c c1497c20 c1499d48 0004 GPR04: 0100 0010 0068 GPR08: 0001 082d c0cca5a8 GPR12: 48000448 cfda 01003bd44ff0 10020578 GPR16: 01003bd44ff8 01003bd45000 0001 GPR20: 0010 GPR24: c00ffe80 c0c824ec 0068 c00ffe80 GPR28: 0010 c00ffe80 0010 NIP [c0264f6c] .cache_alloc_node+0x6c/0x270 LR [c0264f5c] .cache_alloc_node+0x5c/0x270 Call Trace: [c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 (unreliable) [c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360 [c1497dc0] [c0c824ec] .init_list+0x3c/0x128 [c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258 [c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568 [c1497f90] [c0008c6c] start_here_common+0x20/0xa8 Instruction dump: 7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959 6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14 To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. Signed-off-by: Paul Mackerras pau...@samba.org --- Looks good to me. Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com If you need to backport it into -stable kernel, please read Documentation/stable_kernel_rules.txt. Thanks, Yasuaki Ishimatsu v2: include the oops message in the patch description mm/slab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab.c b/mm/slab.c index eb2b2ea..f34e053 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, void *obj; int x; - VM_BUG_ON(nodeid num_online_nodes()); + VM_BUG_ON(nodeid 0 || nodeid = MAX_NUMNODES); n = get_node(cachep, nodeid); BUG_ON(!n); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: ... To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. When did this break? How come we only just noticed? Also needs: Cc: sta...@vger.kernel.org cheers -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote: On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote: The bounds check for nodeid in cache_alloc_node gives false positives on machines where the node IDs are not contiguous, leading to a panic at boot time. For example, on a POWER8 machine the node IDs are typically 0, 1, 16 and 17. This means that num_online_nodes() returns 4, so when cache_alloc_node is called with nodeid = 16 the VM_BUG_ON triggers, like this: ... To fix this, we instead compare the nodeid with MAX_NUMNODES, and additionally make sure it isn't negative (since nodeid is an int). The check is there mainly to protect the array dereference in the get_node() call in the next line, and the array being dereferenced is of size MAX_NUMNODES. If the nodeid is in range but invalid (for example if the node is off-line), the BUG_ON in the next line will catch that. When did this break? How come we only just noticed? Commit 14e50c6a9bc2, which went into 3.10-rc1. You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y and you're running on a machine with discontiguous node IDs. Also needs: Cc: sta...@vger.kernel.org It does. I remembered that a minute after I sent the patch. Paul. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/