Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, [EMAIL PROTECTED] wrote: > On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said: > > > #define BADPTR ((void *)16) > > > I bet you'd find *more* problems that way than by returning NULL, and > > you'd also avoid the whole problem with "if (!ptr) return -ENOMEM". > > Hmm.. this looks like a good contender for "first usage of #ifndef > CONFIG_STABLE" The warning? Sure but the BADPTR (or ZERO_SIZE_PTR now) will stay. It is definitely an error to deference an object that has no size. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said: > #define BADPTR ((void *)16) > I bet you'd find *more* problems that way than by returning NULL, and > you'd also avoid the whole problem with "if (!ptr) return -ENOMEM". Hmm.. this looks like a good contender for "first usage of #ifndef CONFIG_STABLE" :) pgp5hRgrE3k4B.pgp Description: PGP signature
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > > On Fri, 1 Jun 2007, Andrew Morton wrote: > > > I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, > > The trouble with the WARN_ON is that it triggers even for code that is > okay like noted by Jeremy. Yes. Sometimes it's just more natural to have ptr = kmalloc(size); .. use it .. free(ptr); and if a *degenerate* case of size=0 happens, who cares? It should just work, as long as we (obviously) don't actually try to access the pointer. So I don't much like the WARN_ON(size == 0). I think it potentially just causes people to write around it, and quite possibly causes the callers to write code that is not at all more readable or maintainable! That's why I'd much rather return BADPTR instead: we'll get an oops for buggy code, but we don't penalize the "natural" and good code! So once you return BADPTR, there really isn't any good reason for the WARN_ON. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Andrew Morton wrote: > In some cases, sure, and we should remove the warning at some stage. > > But it has exposed a couple of bugs and a few weird things. > We just need to scatter better bug powder into the corners of the allocations. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 01 Jun 2007 17:43:46 -0700 Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > As I said, it's specific to the kmalloc(0) problem, and we're fixing that > > by other means anyway. > > > > I guess I'm not getting much traction in my campaign for equal rights > for zero-sized allocations. They're perfectly reasonable things to have, > if that's the way the code falls out. The warning is just prejudice. > In some cases, sure, and we should remove the warning at some stage. But it has exposed a couple of bugs and a few weird things. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Linus Torvalds wrote: > So for *both* of the above reasons, it's actually stupid to return NULL > for a zero-sized allocation. It would be much better to return another > pointer that will trap on access. A good candidate might be to return > > #define BADPTR ((void *)16) > I think this is a good idea in principle, but I wonder if there's any code which assumes that kmalloc(x) != kmalloc(x) for all non-NULL returns from kmalloc. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Andrew Morton wrote: > As I said, it's specific to the kmalloc(0) problem, and we're fixing that > by other means anyway. > I guess I'm not getting much traction in my campaign for equal rights for zero-sized allocations. They're perfectly reasonable things to have, if that's the way the code falls out. The warning is just prejudice. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph Lameter wrote: > Well there are architectural problems. We determine the power of two slab > at compile time. The object size information is currently not available in > the binary :=). > That only applies to allocations with constant sizes. One presumes nobody is explicitly doing kmalloc(0), so we can use a separate runtime-computed-size path to do poisoning. (Which is probably 90% of the problem, since people who kmalloc(sizeof(struct foo)) will generally stay within bounds without too much effort.) J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: > I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, The trouble with the WARN_ON is that it triggers even for code that is okay like noted by Jeremy. My initial intend with NULL was to allow the allocation of a zero sized pointer without extra checks. It should only trigger a failure if something bad was done. NULL had the problem of confusion with no memory available. I think BADPTR is good. If the coding warts are causing trouble then BADPTR will result in a failure. Otherwise if the code is doing a kmalloc(0) and not dereferencing the pointer (like Paul's fixed code) then we should be fine and not issue a warning. The false positives may be upsetting some people. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 16:57:20 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote: > Andrew, want to take this patch to -mm to see if it triggers anything? spose so. I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, because it is exposing some coding warts. But we should turn it off for 2.6.22 and make it conditional on CONFIG_DEVEL_KERNEL (or whatever it will be called) later. The BADPTR thing is a little worrying because it will make previously-working-by-luck code go oops. I guess we can live with that. So we end up with the BADPTR code enabled even in production kernels, in which case your ((unsigned long)x <= 16) trick is worth doing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: > > A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in > > kmalloc_slab() will trigger. > > Did we use to BUG_ON()? I think that's wrong. There are ways for users to > potentially ask the kernel to do big allocations, and the correct response > is to say "no can do", not to crash! There is no way to distinguish that from out of memory. Failing on large allocs is what we have always done for kmalloc(). Before 2.6.22 we used to fail for allocs > 256k which was a big nuisance for NUMAs large allocs. The patches in 2.6.22 allow us for the first time to allocate arbitrary sized objects up to MAX_ORDER. So we no longer have troubles with large NUMA objects. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > > A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in > kmalloc_slab() will trigger. Did we use to BUG_ON()? I think that's wrong. There are ways for users to potentially ask the kernel to do big allocations, and the correct response is to say "no can do", not to crash! > Here is the updated patch. It works fine here: > > SLUB: Return BADPTR instead of warning for kmalloc(0) Looks fine to me. My only comment is that > - if (!x) > + if (!x || x == BADPTR) > return; This could be micro-optimized (again, non-standard, but it should be "practically portable") to have just a single test using something like if ((unsigned long)x <= 16) return; but I guess it doesn't really matter much. I think this is better than what we have now, but I also suspect it's *not* something we should try this late in the -rc sequence ;) Andrew, want to take this patch to -mm to see if it triggers anything? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: > I'm not seeing the path on the freeing side that silently accepts BADPTR. > > It is not ok to derefence BADPTR, but it's obviously ok to _free_ it. Ok. > Also, if I read the patch correctly, you _also_ return BADPTR for slabs > that are too large. No? That would be wrong - those need to return NULL > for "out of memory". A too large alloc is >32MB or MAX_ORDER << PAGE_SIZE. A BUG_ON in kmalloc_slab() will trigger. Here is the updated patch. It works fine here: SLUB: Return BADPTR instead of warning for kmalloc(0) Remove the WARN_ON_ONCE and simply return BADPTR. BADPTR can be used legitimately as long as it is not dereferenced. Can even be freed. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h | 18 -- mm/slub.c| 10 +- 2 files changed, 13 insertions(+), 15 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:43:57.0 -0700 @@ -12,6 +12,8 @@ #include #include +#define BADPTR ((void *)16) + struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; @@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[ */ static inline int kmalloc_index(size_t size) { - /* -* We should return 0 if size == 0 (which would result in the -* kmalloc caller to get NULL) but we use the smallest object -* here for legacy reasons. Just issue a warning so that -* we can discover locations where we do 0 sized allocations. -*/ - WARN_ON_ONCE(size == 0); + + if (!size) + return 0; if (size > KMALLOC_MAX_SIZE) return -1; @@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc(s, flags); } else @@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_zalloc(s, flags); } else @@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc_node(s, flags, node); } else Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:43:27.0 -0700 @@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags if (s) return slab_alloc(s, flags, -1, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc); @@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t if (s) return slab_alloc(s, flags, node, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc_node); #endif @@ -2338,7 +2338,7 @@ void kfree(const void *x) struct kmem_cache *s; struct page *page; - if (!x) + if (!x || x == BADPTR) return; page = virt_to_head_page(x); @@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, -1, caller); } @@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, node, caller); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > > Something like this? (Not tested yet just for review): I'm not seeing the path on the freeing side that silently accepts BADPTR. It is not ok to derefence BADPTR, but it's obviously ok to _free_ it. Also, if I read the patch correctly, you _also_ return BADPTR for slabs that are too large. No? That would be wrong - those need to return NULL for "out of memory". (But I only looked at the diff, not at the end result, so it may be that all the cases where you changed "return NULL" into "return BADPTR" were really just the size-zero ones - it just looked suspicious). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: > No, I don't think you can do it this way. Ultimately not. But its worth to see if this works. > At a minimum, you'd need to test that the result is word-aligned. > Preferably 8-byte aligned. We literally have stuff that knows about these > things and uses the low bits in the pointer to keep extra data. kmalloc allocations are guaranteed to be aligned to KMALLOC_MINALIGN. I can bring that in but it will make the patch less readable. > Of course, there migth be other (even more subtle) cases where we just > assume certain alignment, and depend on the fact that we just _happen_ to > get it. Who knows.. I tried to get rid of those cased and I hope that work is complete in 2.6.22. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > > Hmmm... We are going rapidly here. This is a patch that I am testing right > now. It right adjust the object and the patch is manageable: No, I don't think you can do it this way. At a minimum, you'd need to test that the result is word-aligned. Preferably 8-byte aligned. We literally have stuff that knows about these things and uses the low bits in the pointer to keep extra data. (That said, I didn't check whether we actually kmalloc() the data, but I think we do - things like "struct key" etc). Now, maybe those things always have a 8-byte-aligned size, and it works out, but I'd be worried. Of course, there migth be other (even more subtle) cases where we just assume certain alignment, and depend on the fact that we just _happen_ to get it. Who knows.. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: > So for *both* of the above reasons, it's actually stupid to return NULL > for a zero-sized allocation. It would be much better to return another > pointer that will trap on access. A good candidate might be to return > > #define BADPTR ((void *)16) Something like this? (Not tested yet just for review): SLUB: Return BADPTR instead of warning for kmalloc(0) Remove the WARN_ON_ONCE and simply return BADPTR. BADPTR can be used legitimately as long as it is not dereferenced. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h | 18 -- mm/slub.c|8 2 files changed, 12 insertions(+), 14 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:24:54.0 -0700 @@ -12,6 +12,8 @@ #include #include +#define BADPTR ((void *)16) + struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; @@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[ */ static inline int kmalloc_index(size_t size) { - /* -* We should return 0 if size == 0 (which would result in the -* kmalloc caller to get NULL) but we use the smallest object -* here for legacy reasons. Just issue a warning so that -* we can discover locations where we do 0 sized allocations. -*/ - WARN_ON_ONCE(size == 0); + + if (!size) + return 0; if (size > KMALLOC_MAX_SIZE) return -1; @@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc(s, flags); } else @@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_zalloc(s, flags); } else @@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc_node(s, flags, node); } else Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:27:12.0 -0700 @@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags if (s) return slab_alloc(s, flags, -1, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc); @@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t if (s) return slab_alloc(s, flags, node, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc_node); #endif @@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, -1, caller); } @@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, node, caller); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: > > We could store the size of the allocation in the allocated object? Just > add four bytes to the user's request, then pick the appropriate cache based > on that, then put the user's `size' at the tail of the resulting allocation? It should be easy enough to do it for _most_ allocations by just doing it when there is already "enough slack" to do it (which is likely true most of the time). IOW, if you ask for a 42-byte allocation, and we allocate from a 64-byte slab, you get the slab allocation at address X, you don't actually have to return "X" at all. Just return "X+8", and then you do: - at 32-bit word at X+0 you put the "real length" - at 32-bit word at X+4 you put some good redzone marker - at 32-bit word at "X + reallen + 8" you put the endzone marker. And then you say: if the real length was within 12 bytes of the allocation length, we just don't do this. So you wouldn't get any redzoning for those allocations that are exactly sized (or close enough) to fit in an allocation block, but I bet *most* allocations would get this for free. And then, if you actually turn on redzoning, you just always add the 12 byte to the allocation size (assuming the alignment rules allow you to). The nice thing about this is that the freeing path already knows where the object is *supposed* to start (because it sees the allocation size in the slub/slab data structures), so the kfree() path can actually figure out on its own whether it is given a "X" or an "X+8" kind of address. So you don't actually need any extra information. You literally just need enough slop in the allocation that you can do this in the first place, so there is no cost (except for the cost of checking itself, of course). Hmm? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > Hmmm... We are going rapidly here. This is a patch that I am testing right > now. It right adjust the object and the patch is manageable: Does not boot sigh. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
> So a kmalloc(62) would get upped to 66, so we allocate from size-128 > and put the number 62 at bytes 124-127 and we poison bytes 62-123? Hmmm... We are going rapidly here. This is a patch that I am testing right now. It right adjust the object and the patch is manageable: SLUB mm-only: Right align kmalloc objects to trigger overwrite detection Right align kmalloc objects if they are less than the full kmalloc slab size. This will move the object to be flush with the end of the object in order to allow the easy detection of writes / reads after the end of the kmalloc object. Without slub_debug overwrites will destroy the free pointer of the next object or the next object. Read will yield garbage that is likely zero. With slub_debug redzone checks will be triggered. Reads will read redzone poison. This patch is only for checking things out. There are issues: 1. Alignment of kmalloc objects may now be different. In particular objects whose size is not a multiple of wordsize may be not word alignmed. 2. __kmalloc and kfree need to touch an additional cacheline in struct kmem_cache thereby reducing performance. 3. An object allocated via kmalloc may no longer be freed via kmem_cache_free. So we need to figure out some may to make this configurable. Preferably runtime configurable. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h | 22 +++--- mm/slub.c| 11 --- 2 files changed, 27 insertions(+), 6 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 15:56:42.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:00:03.0 -0700 @@ -120,6 +120,19 @@ static inline struct kmem_cache *kmalloc return _caches[index]; } +static inline unsigned long kmalloc_size(size_t size) +{ + int index = kmalloc_index(size); + + if (index >= KMALLOC_SHIFT_LOW) + return 1 << index; + + if (index == 2) + return 192; + return 96; +} + + #ifdef CONFIG_ZONE_DMA #define SLUB_DMA __GFP_DMA #else @@ -135,7 +148,8 @@ static inline void *kmalloc(size_t size, if (!s) return NULL; - return kmem_cache_alloc(s, flags); + return kmem_cache_alloc(s, flags) + + kmalloc_size(size) - size; } else return __kmalloc(size, flags); } @@ -148,7 +162,8 @@ static inline void *kzalloc(size_t size, if (!s) return NULL; - return kmem_cache_zalloc(s, flags); + return kmem_cache_zalloc(s, flags) + + kmalloc_size(size) - size; } else return __kzalloc(size, flags); } @@ -159,7 +174,8 @@ extern void *__kmalloc_node(size_t size, static inline void *kmalloc_node(size_t size, gfp_t flags, int node) { if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) { - struct kmem_cache *s = kmalloc_slab(size); + struct kmem_cache *s = kmalloc_slab(size) + + kmalloc_size(size) - size; if (!s) return NULL; Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 15:51:05.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:15:21.0 -0700 @@ -2283,9 +2283,10 @@ static struct kmem_cache *get_slab(size_ void *__kmalloc(size_t size, gfp_t flags) { struct kmem_cache *s = get_slab(size, flags); + int offset = size - s->size; if (s) - return slab_alloc(s, flags, -1, __builtin_return_address(0)); + return slab_alloc(s, flags, -1, __builtin_return_address(0)) + offset; return NULL; } EXPORT_SYMBOL(__kmalloc); @@ -2294,9 +2295,10 @@ EXPORT_SYMBOL(__kmalloc); void *__kmalloc_node(size_t size, gfp_t flags, int node) { struct kmem_cache *s = get_slab(size, flags); + int offset = size - s->size; if (s) - return slab_alloc(s, flags, node, __builtin_return_address(0)); + return slab_alloc(s, flags, node, __builtin_return_address(0)) + offset; return NULL; } EXPORT_SYMBOL(__kmalloc_node); @@ -2337,6 +2339,7 @@ void kfree(const void *x) { struct kmem_cache *s; struct page *page; + unsigned long addr = (unsigned long) x; if (!x) return; @@ -2344,7 +2347,9 @@ void kfree(const void *x) page = virt_to_head_page(x); s = page->slab; - slab_free(s, page, (void *)x, __builtin_return_address(0)); + addr &= ~((unsigned long)s->size - 1); + + slab_free(s, page, (void *)addr, __builtin_return_address(0)); } EXPORT_SYMBOL(kfree); - To unsubscribe from this
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 15:41:48 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 1 Jun 2007, Andrew Morton wrote: > > > > I should make SLUB put poisoning values in unused areas of a kmalloced > > > object? > > > > hm, I hadn't thought of it that way actually. I was thinking it was > > specific to kmalloc(0) but as you point out, the situation is > > generalisable. > > Right it could catch a lot of other bugs as well. > > > Yes, if someone does kmalloc(42) and we satisfy the allocation from the > > size-64 slab, we should poison and then check the allegedly-unused 22 > > bytes. > > > > Please ;) > > > > (vaguely stunned that we didn't think of doing this years ago). > > Well there are architectural problems. We determine the power of two slab > at compile time. The object size information is currently not available in > the binary :=). > > > It'll be a large patch, I expect? > > Ummm... Yes. We need to switch off the compile time power of two slab > calculation. Then I need to have some way of storing the object size in > the metainformation of each object. Changes a lot of function calls. Oh well. Don't lose any sleep over it ;) We could store the size of the allocation in the allocated object? Just add four bytes to the user's request, then pick the appropriate cache based on that, then put the user's `size' at the tail of the resulting allocation? So a kmalloc(62) would get upped to 66, so we allocate from size-128 and put the number 62 at bytes 124-127 and we poison bytes 62-123? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: > > Right it could catch a lot of other bugs as well. I'd actually prefer "malloc(0)" to _not_ return NULL, but some known (non-NULL) bogus pointer. Why? Because it's quite sane to have simple logic like ptr = malloc(size); if (!ptr) return -ENOMEM; and writing it as if (size && !ptr) return -ENOMEM; is just annoying. Also, NULL is _special_. There are absolutely tons of code in the kernel (and elsewhere) that just does something *different* from NULL pointers, and that totally breaks the whole notion of "you can allocate a zero-sized allocation, you just must not dereference it". If people special-case NULL as something else, they won't even go through the normal code-path. So for *both* of the above reasons, it's actually stupid to return NULL for a zero-sized allocation. It would be much better to return another pointer that will trap on access. A good candidate might be to return #define BADPTR ((void *)16) which is a portable-enough (where "portable-enough" is "against strict ANSI C rules, but works in practice on all architectures") way to return something that will cause the same page fault behaviour as NULL, but will _not_ trigger the "NULL is special" code. (Of course, you then need to teach "kfree()" to accept it as another pointer to be ignored, that's fine). I bet you'd find *more* problems that way than by returning NULL, and you'd also avoid the whole problem with "if (!ptr) return -ENOMEM". Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: > > I should make SLUB put poisoning values in unused areas of a kmalloced > > object? > > hm, I hadn't thought of it that way actually. I was thinking it was > specific to kmalloc(0) but as you point out, the situation is > generalisable. Right it could catch a lot of other bugs as well. > Yes, if someone does kmalloc(42) and we satisfy the allocation from the > size-64 slab, we should poison and then check the allegedly-unused 22 > bytes. > > Please ;) > > (vaguely stunned that we didn't think of doing this years ago). Well there are architectural problems. We determine the power of two slab at compile time. The object size information is currently not available in the binary :=). > It'll be a large patch, I expect? Ummm... Yes. We need to switch off the compile time power of two slab calculation. Then I need to have some way of storing the object size in the metainformation of each object. Changes a lot of function calls. > Actually, I have this vague memory that slab would take that kmalloc(42) > and would then return kmalloc(64)+22, so the returned memory is > "right-aligned". This way the existing overrun-detection is applicable to > all kmallocs. Maybe I dreamed it. Yes, that would be possible by simply adding a compile time generated offset to the compile time generated call to kmem_cache_alloc. But then kfree() will have a difficult time figuring out which object to free. Hmmm... But I can get to the slab via the page struct which allows me to figure out the power of two size. This would mean that kfree would work with an arbitrary pointer anywhere into the object. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 15:20:05 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 1 Jun 2007, Andrew Morton wrote: > > > If slab was smart enough, it would have poisoned those 8 bytes to some > > known pattern, and then checked that they still had that pattern when the > > memory got freed again. > > So this is new feature request? > > > But it isn't smart enough, so the bug went undetected. > > I should make SLUB put poisoning values in unused areas of a kmalloced > object? hm, I hadn't thought of it that way actually. I was thinking it was specific to kmalloc(0) but as you point out, the situation is generalisable. Yes, if someone does kmalloc(42) and we satisfy the allocation from the size-64 slab, we should poison and then check the allegedly-unused 22 bytes. Please ;) (vaguely stunned that we didn't think of doing this years ago). It'll be a large patch, I expect? Actually, I have this vague memory that slab would take that kmalloc(42) and would then return kmalloc(64)+22, so the returned memory is "right-aligned". This way the existing overrun-detection is applicable to all kmallocs. Maybe I dreamed it. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: > If slab was smart enough, it would have poisoned those 8 bytes to some > known pattern, and then checked that they still had that pattern when the > memory got freed again. So this is new feature request? > But it isn't smart enough, so the bug went undetected. I should make SLUB put poisoning values in unused areas of a kmalloced object? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 14:45:27 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 1 Jun 2007, Andrew Morton wrote: > > > Poisoning and redzoning could have caught that. > > Redzoning would not have caught it. This was a kmalloc allocation and > SLAB always gave them 32 bytes to play with. Only writes more than 32 > bytes behind would have been caught. > > Poisoning is only applicable to unallocated objects and these were > allocated. Nope and nope. This is a special case where the user asked for zero bytes and the kernel gave him 8 (or 32) bytes instead. If slab was smart enough, it would have poisoned those 8 bytes to some known pattern, and then checked that they still had that pattern when the memory got freed again. But it isn't smart enough, so the bug went undetected. As I said, it's specific to the kmalloc(0) problem, and we're fixing that by other means anyway. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: > Poisoning and redzoning could have caught that. Redzoning would not have caught it. This was a kmalloc allocation and SLAB always gave them 32 bytes to play with. Only writes more than 32 bytes behind would have been caught. Poisoning is only applicable to unallocated objects and these were allocated. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 13:47:23 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: > On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: > > > I think this is a good example of why having to special-case kmalloc(0) > > is a bad idea. The original code was straightforward and, barring > > silliness, should be completely correct with npids==0. This new code > > does nothing other than make things more complex. > > Hehe we got you. The code is indexing the pidarray allocated with > kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave > him 32 bytes and it now only works because SLUB give him 8. That is enough > to illegally index the first array element. > Poisoning and redzoning could have caught that. But I guess it doesn't matter now, as this shortcoming is specific to the zero-length allocations, and we're weeding those out anyway. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph Lameter wrote: > On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: > > >> I think this is a good example of why having to special-case kmalloc(0) >> is a bad idea. The original code was straightforward and, barring >> silliness, should be completely correct with npids==0. This new code >> does nothing other than make things more complex. >> > > Hehe we got you. Bugger ;) Well, my get-out clause was "barring silliness". J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph wrote: > Then you are deferencing an element in the pidarray that you did not > allocate! This is a bug in cpuset code. Absolutely - as I described in more detail in a reply Jeremy a few minutes ago. Thanks for smoking it out. If, as I suggested in my previous message to which you are responding: > Perhaps if you moved the "if (unlikely(n == npids))" test before the > "pidarray[n++] = p->pid" assignment, it would be safe. then it looks safe to me, without having to add the bogus "+1" to the allocated size. That is, I suspect the following patch fixes this long standing cpuset bug (warning - white space mangled): --- kernel/cpuset.c 2006-12-10 12:27:37.0 -0800 +++ kernel/cpuset.c.new 2007-06-01 13:53:00.271010074 -0700 @@ -1661,9 +1661,9 @@ static int pid_array_load(pid_t *pidarra do_each_thread(g, p) { if (p->cpuset == cs) { - pidarray[n++] = p->pid; if (unlikely(n == npids)) goto array_full; + pidarray[n++] = p->pid; } } while_each_thread(g, p); -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: > I think this is a good example of why having to special-case kmalloc(0) > is a bad idea. The original code was straightforward and, barring > silliness, should be completely correct with npids==0. This new code > does nothing other than make things more complex. Hehe we got you. The code is indexing the pidarray allocated with kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave him 32 bytes and it now only works because SLUB give him 8. That is enough to illegally index the first array element. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: > > There are no checks necessary. Your function worked fine so far for > > the case of zero objects with the pointer returned by kmalloc. If the > > code is correct then it will not dereference the pointer to the zero > > sized array. If not then we may find a bug and fix it. > > I suspect you got lucky. The check for a full pidarray[] in the routine > pid_array_load() occurs -after- a pid is put in the array. If a task > showed up in this cpuset at the wrong time, we would fall over and die > in the code: Then you are deferencing an element in the pidarray that you did not allocate! This is a bug in cpuset code. So we would need to allocate at mininum one array element? Or would we need to allocate npids + 1 to be safe??? --- kernel/cpuset.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/kernel/cpuset.c === --- linux-2.6.orig/kernel/cpuset.c 2007-06-01 13:41:24.0 -0700 +++ linux-2.6/kernel/cpuset.c 2007-06-01 13:42:08.0 -0700 @@ -1741,7 +1741,7 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(>count); - pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); + pidarray = kmalloc(max(1, npids) * sizeof(pid_t), GFP_KERNEL); if (!pidarray) goto err1; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
> I think this is a good example of why having to special-case kmalloc(0) > is a bad idea. The original code was straightforward and, barring > silliness, should be completely correct with npids==0. This new code > does nothing other than make things more complex. Perhaps so. Perhaps not. Actually, the original cpuset pid_array_load() code was not correct, and Christoph's work is smoking this out. The original code assumed that it could access the first element of the kmalloc'd array, before it checked if it could go even further. But with the kmalloc call of: kmalloc(npids * sizeof(pid_t), GFP_KERNEL); it is impossible for kmalloc to know how much to allocate, to guarantee such behaviour. If npids is zero, all kmalloc sees, in essence, is the call: kmalloc(0, GFP_KERNEL); There is no way it can guess that I might still want to access the first "size(pid_t)" bytes of whatever it returned. If we had a different sort of kmalloc API, such as would be called with: kmalloc(npids, sizeof(pid_t), GFP_KERNEL); rather like the libc calloc() routine, having separate count and size fields, then in theory, kmalloc could treat calls of the form: kmalloc(0, sz, GFP_KERNEL); /* for sz > 0 */ as if they were actually: kmalloc(1, sz, GFP_KERNEL); and then my cpuset code might not have been broken. But, since some of us can never remember whether it is the size or the count that comes first in such an API, we would still have had such bugs -- just fewer, for those who got those arguments backwards, and finally tripped over this anyway. That doesn't address the other likely broken code, elsewhere in the kernel somewhere that happened to try to access the 2nd or 3rd element of such an empty array before noticing. So ... while I started this reply agreeing with you, I end up still agreeing with Christoph's changes here. Thanks, Christoph, for finding a bug in the cpuset code - good work! -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Srinivasa Ds wrote: > --- > kernel/cpuset.c | 10 ++ > 1 file changed, 10 insertions(+) > > Index: linux-2.6.22-rc3/kernel/cpuset.c > === > --- linux-2.6.22-rc3.orig/kernel/cpuset.c > +++ linux-2.6.22-rc3/kernel/cpuset.c > @@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod >* show up until sometime later on. >*/ > npids = atomic_read(>count); > + if (!npids) { > + ctr->buf = NULL; > + ctr->bufsz = 0; > + file->private_data = ctr; > + return 0; > + } > + > I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
> There are no checks necessary. Your function worked fine so far for > the case of zero objects with the pointer returned by kmalloc. If the > code is correct then it will not dereference the pointer to the zero > sized array. If not then we may find a bug and fix it. I suspect you got lucky. The check for a full pidarray[] in the routine pid_array_load() occurs -after- a pid is put in the array. If a task showed up in this cpuset at the wrong time, we would fall over and die in the code: static int pid_array_load(pid_t *pidarray, int npids, struct cpuset *cs) { int n = 0; struct task_struct *g, *p; read_lock(_lock); do_each_thread(g, p) { if (p->cpuset == cs) { pidarray[n++] = p->pid; /* Death if pidarray == NULL */ if (unlikely(n == npids)) goto array_full; } } while_each_thread(g, p); Perhaps if you moved the "if (unlikely(n == npids))" test before the "pidarray[n++] = p->pid" assignment, it would be safe. And does the next line of code, the call to sort() after the call of pid_array_load(), work with pidarray == NULL and npids == 0: npids = pid_array_load(pidarray, npids, cs); sort(pidarray, npids, sizeof(pid_t), cmppid, NULL); /* <== ?? */ Off hand, I didn't know. I guess it must, or you would have already tripped over it. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: > > We must have a NULL pointer exactly because it cannot be dereferenced. > > Well, both patch versions had NULL pointers - either pidarray or ctr->buf. > > But now I can make more sense of what you say -- you -want- the pidarray > pointer, in particular, to be NULL, so that we don't accidentally use it. > > Does that still mean that your patch suggestion was incomplete, in that > it lacked the additional checks to avoid using a NULL pidarray? There are no checks necessary. Your function worked fine so far for the case of zero objects with the pointer returned by kmalloc. If the code is correct then it will not dereference the pointer to the zero sized array. If not then we may find a bug and fix it. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
> We must have a NULL pointer exactly because it cannot be dereferenced. Well, both patch versions had NULL pointers - either pidarray or ctr->buf. But now I can make more sense of what you say -- you -want- the pidarray pointer, in particular, to be NULL, so that we don't accidentally use it. Does that still mean that your patch suggestion was incomplete, in that it lacked the additional checks to avoid using a NULL pidarray? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: > > The allocated array has 0 bytes. Any dereference is an error. > > Well, then, that doesn't work either. > > We -agree- we can't have this failing. You suggested (if I understood > correctly, which I doubt) that we set the pointer to NULL, and noted > that derefencing a NULL pointer would fail. I agree such derefences > would fail, and tried to point out that this would be bad. We can't > have pointer dereferences failing in the kernel ... duh. > > Your reply seems like a non sequiter to me, pointing out that having > a pointer to an array of 0 bytes fails as well. Ok - that's bad too. Think about it some more... > So we cannot have a NULL pointer, used unchecked, and we cannot have > a non-NULL pointer to a zero byte array, used unchecked. We must have a NULL pointer exactly because it cannot be dereferenced. It is a safety precaution that the following code will not deference the pointer to the array of zero size. Because attempting to access an object in an array with zero object in them is an error. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
> The allocated array has 0 bytes. Any dereference is an error. Well, then, that doesn't work either. We -agree- we can't have this failing. You suggested (if I understood correctly, which I doubt) that we set the pointer to NULL, and noted that derefencing a NULL pointer would fail. I agree such derefences would fail, and tried to point out that this would be bad. We can't have pointer dereferences failing in the kernel ... duh. Your reply seems like a non sequiter to me, pointing out that having a pointer to an array of 0 bytes fails as well. Ok - that's bad too. So we cannot have a NULL pointer, used unchecked, and we cannot have a non-NULL pointer to a zero byte array, used unchecked. As I said, and as I thought Srinivasa coded, with these patch lines: @@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct { struct ctr_struct *ctr = file->private_data; + if (!ctr->buf) /* No tasks in this cpuset */ + return 0; + return simple_read_from_buffer(buf, nbytes, ppos, ctr->buf, ctr->bufsz); } we actually have to check the NULL-ness (or emptiness) of this thing, down where it would be used, to avoid using NULL or empty things. Summary - Srinivasa's patch still looks ok to me. This followup discussion between you and I is just confusing me - sorry. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: > What do you mean, Christoph, by "then any dereferences will fail" ? > That sounds bad to me -- we don't want pointer dereferences failing > in the kernel, do we? Wouldn't you have to add some more lines of > code, a bit further down, where pidarray is used, to avoid trying to > use it if its NULL? The allocated array has 0 bytes. Any dereference is an error. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph wrote: > H Isnt it easier to set the pidarray to NULL in the case it > has no objects? Then any deferences will fail. Paul? >From what I can tell, in 30 seconds of reading, either of these approaches, Christoph's or Srinivasa's, based on Vatsa's suggestions -could- be made to work. I don't understand how Christoph's approach works, as stated, however. What do you mean, Christoph, by "then any dereferences will fail" ? That sounds bad to me -- we don't want pointer dereferences failing in the kernel, do we? Wouldn't you have to add some more lines of code, a bit further down, where pidarray is used, to avoid trying to use it if its NULL? I think that Srinivasa's point is that it is legitimate for this array to be empty, and that we have to code for that case, one way or another. Now that we can't simply kmalloc zero bytes and get away with it, that requires explicit tests in the code for the empty case, one way or another. This sounds like a valid point to me. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Srinivasa Ds wrote: > I have modified the patch little bit, Thanks Vatsa for input and hence > resending it. Please let me know your comments on this. H Isnt it easier to set the pidarray to NULL in the case it has no objects? Then any deferences will fail. Paul? --- kernel/cpuset.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) Index: linux-2.6/kernel/cpuset.c === --- linux-2.6.orig/kernel/cpuset.c 2007-06-01 11:09:39.0 -0700 +++ linux-2.6/kernel/cpuset.c 2007-06-01 11:11:59.0 -0700 @@ -1723,7 +1723,7 @@ static int cpuset_tasks_open(struct inod { struct cpuset *cs = __d_cs(file->f_path.dentry->d_parent); struct ctr_struct *ctr; - pid_t *pidarray; + pid_t *pidarray = NULL; int npids; char c; @@ -1741,9 +1741,11 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(>count); - pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); - if (!pidarray) - goto err1; + if (npids) + pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); + if (!pidarray) + goto err1; + } npids = pid_array_load(pidarray, npids, cs); sort(pidarray, npids, sizeof(pid_t), cmppid, NULL); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Friday 01 June 2007 12:57:53 Srinivasa Ds wrote: I have modified the patch little bit, Thanks Vatsa for input and hence resending it. Please let me know your comments on this. Signed-off-by: Srinivasa DS <[EMAIL PROTECTED]> --- kernel/cpuset.c | 10 ++ 1 file changed, 10 insertions(+) Index: linux-2.6.22-rc3/kernel/cpuset.c === --- linux-2.6.22-rc3.orig/kernel/cpuset.c +++ linux-2.6.22-rc3/kernel/cpuset.c @@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(>count); + if (!npids) { + ctr->buf = NULL; + ctr->bufsz = 0; + file->private_data = ctr; + return 0; + } + pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); if (!pidarray) goto err1; @@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct { struct ctr_struct *ctr = file->private_data; + if (!ctr->buf) /* No tasks in this cpuset */ + return 0; + return simple_read_from_buffer(buf, nbytes, ppos, ctr->buf, ctr->bufsz); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Friday 01 June 2007 12:57:53 Srinivasa Ds wrote: I have modified the patch little bit, Thanks Vatsa for input and hence resending it. Please let me know your comments on this. Signed-off-by: Srinivasa DS [EMAIL PROTECTED] --- kernel/cpuset.c | 10 ++ 1 file changed, 10 insertions(+) Index: linux-2.6.22-rc3/kernel/cpuset.c === --- linux-2.6.22-rc3.orig/kernel/cpuset.c +++ linux-2.6.22-rc3/kernel/cpuset.c @@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(cs-count); + if (!npids) { + ctr-buf = NULL; + ctr-bufsz = 0; + file-private_data = ctr; + return 0; + } + pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); if (!pidarray) goto err1; @@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct { struct ctr_struct *ctr = file-private_data; + if (!ctr-buf) /* No tasks in this cpuset */ + return 0; + return simple_read_from_buffer(buf, nbytes, ppos, ctr-buf, ctr-bufsz); } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Srinivasa Ds wrote: I have modified the patch little bit, Thanks Vatsa for input and hence resending it. Please let me know your comments on this. H Isnt it easier to set the pidarray to NULL in the case it has no objects? Then any deferences will fail. Paul? --- kernel/cpuset.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) Index: linux-2.6/kernel/cpuset.c === --- linux-2.6.orig/kernel/cpuset.c 2007-06-01 11:09:39.0 -0700 +++ linux-2.6/kernel/cpuset.c 2007-06-01 11:11:59.0 -0700 @@ -1723,7 +1723,7 @@ static int cpuset_tasks_open(struct inod { struct cpuset *cs = __d_cs(file-f_path.dentry-d_parent); struct ctr_struct *ctr; - pid_t *pidarray; + pid_t *pidarray = NULL; int npids; char c; @@ -1741,9 +1741,11 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(cs-count); - pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); - if (!pidarray) - goto err1; + if (npids) + pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); + if (!pidarray) + goto err1; + } npids = pid_array_load(pidarray, npids, cs); sort(pidarray, npids, sizeof(pid_t), cmppid, NULL); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph wrote: H Isnt it easier to set the pidarray to NULL in the case it has no objects? Then any deferences will fail. Paul? From what I can tell, in 30 seconds of reading, either of these approaches, Christoph's or Srinivasa's, based on Vatsa's suggestions -could- be made to work. I don't understand how Christoph's approach works, as stated, however. What do you mean, Christoph, by then any dereferences will fail ? That sounds bad to me -- we don't want pointer dereferences failing in the kernel, do we? Wouldn't you have to add some more lines of code, a bit further down, where pidarray is used, to avoid trying to use it if its NULL? I think that Srinivasa's point is that it is legitimate for this array to be empty, and that we have to code for that case, one way or another. Now that we can't simply kmalloc zero bytes and get away with it, that requires explicit tests in the code for the empty case, one way or another. This sounds like a valid point to me. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: What do you mean, Christoph, by then any dereferences will fail ? That sounds bad to me -- we don't want pointer dereferences failing in the kernel, do we? Wouldn't you have to add some more lines of code, a bit further down, where pidarray is used, to avoid trying to use it if its NULL? The allocated array has 0 bytes. Any dereference is an error. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
The allocated array has 0 bytes. Any dereference is an error. Well, then, that doesn't work either. We -agree- we can't have this failing. You suggested (if I understood correctly, which I doubt) that we set the pointer to NULL, and noted that derefencing a NULL pointer would fail. I agree such derefences would fail, and tried to point out that this would be bad. We can't have pointer dereferences failing in the kernel ... duh. Your reply seems like a non sequiter to me, pointing out that having a pointer to an array of 0 bytes fails as well. Ok - that's bad too. So we cannot have a NULL pointer, used unchecked, and we cannot have a non-NULL pointer to a zero byte array, used unchecked. As I said, and as I thought Srinivasa coded, with these patch lines: @@ -1772,6 +1779,9 @@ static ssize_t cpuset_tasks_read(struct { struct ctr_struct *ctr = file-private_data; + if (!ctr-buf) /* No tasks in this cpuset */ + return 0; + return simple_read_from_buffer(buf, nbytes, ppos, ctr-buf, ctr-bufsz); } we actually have to check the NULL-ness (or emptiness) of this thing, down where it would be used, to avoid using NULL or empty things. Summary - Srinivasa's patch still looks ok to me. This followup discussion between you and I is just confusing me - sorry. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: The allocated array has 0 bytes. Any dereference is an error. Well, then, that doesn't work either. We -agree- we can't have this failing. You suggested (if I understood correctly, which I doubt) that we set the pointer to NULL, and noted that derefencing a NULL pointer would fail. I agree such derefences would fail, and tried to point out that this would be bad. We can't have pointer dereferences failing in the kernel ... duh. Your reply seems like a non sequiter to me, pointing out that having a pointer to an array of 0 bytes fails as well. Ok - that's bad too. Think about it some more... So we cannot have a NULL pointer, used unchecked, and we cannot have a non-NULL pointer to a zero byte array, used unchecked. We must have a NULL pointer exactly because it cannot be dereferenced. It is a safety precaution that the following code will not deference the pointer to the array of zero size. Because attempting to access an object in an array with zero object in them is an error. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
We must have a NULL pointer exactly because it cannot be dereferenced. Well, both patch versions had NULL pointers - either pidarray or ctr-buf. But now I can make more sense of what you say -- you -want- the pidarray pointer, in particular, to be NULL, so that we don't accidentally use it. Does that still mean that your patch suggestion was incomplete, in that it lacked the additional checks to avoid using a NULL pidarray? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: We must have a NULL pointer exactly because it cannot be dereferenced. Well, both patch versions had NULL pointers - either pidarray or ctr-buf. But now I can make more sense of what you say -- you -want- the pidarray pointer, in particular, to be NULL, so that we don't accidentally use it. Does that still mean that your patch suggestion was incomplete, in that it lacked the additional checks to avoid using a NULL pidarray? There are no checks necessary. Your function worked fine so far for the case of zero objects with the pointer returned by kmalloc. If the code is correct then it will not dereference the pointer to the zero sized array. If not then we may find a bug and fix it. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
There are no checks necessary. Your function worked fine so far for the case of zero objects with the pointer returned by kmalloc. If the code is correct then it will not dereference the pointer to the zero sized array. If not then we may find a bug and fix it. I suspect you got lucky. The check for a full pidarray[] in the routine pid_array_load() occurs -after- a pid is put in the array. If a task showed up in this cpuset at the wrong time, we would fall over and die in the code: static int pid_array_load(pid_t *pidarray, int npids, struct cpuset *cs) { int n = 0; struct task_struct *g, *p; read_lock(tasklist_lock); do_each_thread(g, p) { if (p-cpuset == cs) { pidarray[n++] = p-pid; /* Death if pidarray == NULL */ if (unlikely(n == npids)) goto array_full; } } while_each_thread(g, p); Perhaps if you moved the if (unlikely(n == npids)) test before the pidarray[n++] = p-pid assignment, it would be safe. And does the next line of code, the call to sort() after the call of pid_array_load(), work with pidarray == NULL and npids == 0: npids = pid_array_load(pidarray, npids, cs); sort(pidarray, npids, sizeof(pid_t), cmppid, NULL); /* == ?? */ Off hand, I didn't know. I guess it must, or you would have already tripped over it. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Srinivasa Ds wrote: --- kernel/cpuset.c | 10 ++ 1 file changed, 10 insertions(+) Index: linux-2.6.22-rc3/kernel/cpuset.c === --- linux-2.6.22-rc3.orig/kernel/cpuset.c +++ linux-2.6.22-rc3/kernel/cpuset.c @@ -1741,6 +1741,13 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(cs-count); + if (!npids) { + ctr-buf = NULL; + ctr-bufsz = 0; + file-private_data = ctr; + return 0; + } + I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Paul Jackson wrote: There are no checks necessary. Your function worked fine so far for the case of zero objects with the pointer returned by kmalloc. If the code is correct then it will not dereference the pointer to the zero sized array. If not then we may find a bug and fix it. I suspect you got lucky. The check for a full pidarray[] in the routine pid_array_load() occurs -after- a pid is put in the array. If a task showed up in this cpuset at the wrong time, we would fall over and die in the code: Then you are deferencing an element in the pidarray that you did not allocate! This is a bug in cpuset code. So we would need to allocate at mininum one array element? Or would we need to allocate npids + 1 to be safe??? --- kernel/cpuset.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/kernel/cpuset.c === --- linux-2.6.orig/kernel/cpuset.c 2007-06-01 13:41:24.0 -0700 +++ linux-2.6/kernel/cpuset.c 2007-06-01 13:42:08.0 -0700 @@ -1741,7 +1741,7 @@ static int cpuset_tasks_open(struct inod * show up until sometime later on. */ npids = atomic_read(cs-count); - pidarray = kmalloc(npids * sizeof(pid_t), GFP_KERNEL); + pidarray = kmalloc(max(1, npids) * sizeof(pid_t), GFP_KERNEL); if (!pidarray) goto err1; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. Perhaps so. Perhaps not. Actually, the original cpuset pid_array_load() code was not correct, and Christoph's work is smoking this out. The original code assumed that it could access the first element of the kmalloc'd array, before it checked if it could go even further. But with the kmalloc call of: kmalloc(npids * sizeof(pid_t), GFP_KERNEL); it is impossible for kmalloc to know how much to allocate, to guarantee such behaviour. If npids is zero, all kmalloc sees, in essence, is the call: kmalloc(0, GFP_KERNEL); There is no way it can guess that I might still want to access the first size(pid_t) bytes of whatever it returned. If we had a different sort of kmalloc API, such as would be called with: kmalloc(npids, sizeof(pid_t), GFP_KERNEL); rather like the libc calloc() routine, having separate count and size fields, then in theory, kmalloc could treat calls of the form: kmalloc(0, sz, GFP_KERNEL); /* for sz 0 */ as if they were actually: kmalloc(1, sz, GFP_KERNEL); and then my cpuset code might not have been broken. But, since some of us can never remember whether it is the size or the count that comes first in such an API, we would still have had such bugs -- just fewer, for those who got those arguments backwards, and finally tripped over this anyway. That doesn't address the other likely broken code, elsewhere in the kernel somewhere that happened to try to access the 2nd or 3rd element of such an empty array before noticing. So ... while I started this reply agreeing with you, I end up still agreeing with Christoph's changes here. Thanks, Christoph, for finding a bug in the cpuset code - good work! -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. Hehe we got you. The code is indexing the pidarray allocated with kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave him 32 bytes and it now only works because SLUB give him 8. That is enough to illegally index the first array element. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph wrote: Then you are deferencing an element in the pidarray that you did not allocate! This is a bug in cpuset code. Absolutely - as I described in more detail in a reply Jeremy a few minutes ago. Thanks for smoking it out. If, as I suggested in my previous message to which you are responding: Perhaps if you moved the if (unlikely(n == npids)) test before the pidarray[n++] = p-pid assignment, it would be safe. then it looks safe to me, without having to add the bogus +1 to the allocated size. That is, I suspect the following patch fixes this long standing cpuset bug (warning - white space mangled): --- kernel/cpuset.c 2006-12-10 12:27:37.0 -0800 +++ kernel/cpuset.c.new 2007-06-01 13:53:00.271010074 -0700 @@ -1661,9 +1661,9 @@ static int pid_array_load(pid_t *pidarra do_each_thread(g, p) { if (p-cpuset == cs) { - pidarray[n++] = p-pid; if (unlikely(n == npids)) goto array_full; + pidarray[n++] = p-pid; } } while_each_thread(g, p); -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.925.600.0401 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph Lameter wrote: On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. Hehe we got you. Bugger ;) Well, my get-out clause was barring silliness. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 13:47:23 -0700 (PDT) Christoph Lameter [EMAIL PROTECTED] wrote: On Fri, 1 Jun 2007, Jeremy Fitzhardinge wrote: I think this is a good example of why having to special-case kmalloc(0) is a bad idea. The original code was straightforward and, barring silliness, should be completely correct with npids==0. This new code does nothing other than make things more complex. Hehe we got you. The code is indexing the pidarray allocated with kmalloc(0). So it uncovered a latent bug. It only worked because SLAB gave him 32 bytes and it now only works because SLUB give him 8. That is enough to illegally index the first array element. Poisoning and redzoning could have caught that. But I guess it doesn't matter now, as this shortcoming is specific to the zero-length allocations, and we're weeding those out anyway. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: Poisoning and redzoning could have caught that. Redzoning would not have caught it. This was a kmalloc allocation and SLAB always gave them 32 bytes to play with. Only writes more than 32 bytes behind would have been caught. Poisoning is only applicable to unallocated objects and these were allocated. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 14:45:27 -0700 (PDT) Christoph Lameter [EMAIL PROTECTED] wrote: On Fri, 1 Jun 2007, Andrew Morton wrote: Poisoning and redzoning could have caught that. Redzoning would not have caught it. This was a kmalloc allocation and SLAB always gave them 32 bytes to play with. Only writes more than 32 bytes behind would have been caught. Poisoning is only applicable to unallocated objects and these were allocated. Nope and nope. This is a special case where the user asked for zero bytes and the kernel gave him 8 (or 32) bytes instead. If slab was smart enough, it would have poisoned those 8 bytes to some known pattern, and then checked that they still had that pattern when the memory got freed again. But it isn't smart enough, so the bug went undetected. As I said, it's specific to the kmalloc(0) problem, and we're fixing that by other means anyway. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: If slab was smart enough, it would have poisoned those 8 bytes to some known pattern, and then checked that they still had that pattern when the memory got freed again. So this is new feature request? But it isn't smart enough, so the bug went undetected. I should make SLUB put poisoning values in unused areas of a kmalloced object? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 15:20:05 -0700 (PDT) Christoph Lameter [EMAIL PROTECTED] wrote: On Fri, 1 Jun 2007, Andrew Morton wrote: If slab was smart enough, it would have poisoned those 8 bytes to some known pattern, and then checked that they still had that pattern when the memory got freed again. So this is new feature request? But it isn't smart enough, so the bug went undetected. I should make SLUB put poisoning values in unused areas of a kmalloced object? hm, I hadn't thought of it that way actually. I was thinking it was specific to kmalloc(0) but as you point out, the situation is generalisable. Yes, if someone does kmalloc(42) and we satisfy the allocation from the size-64 slab, we should poison and then check the allegedly-unused 22 bytes. Please ;) (vaguely stunned that we didn't think of doing this years ago). It'll be a large patch, I expect? Actually, I have this vague memory that slab would take that kmalloc(42) and would then return kmalloc(64)+22, so the returned memory is right-aligned. This way the existing overrun-detection is applicable to all kmallocs. Maybe I dreamed it. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: I should make SLUB put poisoning values in unused areas of a kmalloced object? hm, I hadn't thought of it that way actually. I was thinking it was specific to kmalloc(0) but as you point out, the situation is generalisable. Right it could catch a lot of other bugs as well. Yes, if someone does kmalloc(42) and we satisfy the allocation from the size-64 slab, we should poison and then check the allegedly-unused 22 bytes. Please ;) (vaguely stunned that we didn't think of doing this years ago). Well there are architectural problems. We determine the power of two slab at compile time. The object size information is currently not available in the binary :=). It'll be a large patch, I expect? Ummm... Yes. We need to switch off the compile time power of two slab calculation. Then I need to have some way of storing the object size in the metainformation of each object. Changes a lot of function calls. Actually, I have this vague memory that slab would take that kmalloc(42) and would then return kmalloc(64)+22, so the returned memory is right-aligned. This way the existing overrun-detection is applicable to all kmallocs. Maybe I dreamed it. Yes, that would be possible by simply adding a compile time generated offset to the compile time generated call to kmem_cache_alloc. But then kfree() will have a difficult time figuring out which object to free. Hmmm... But I can get to the slab via the page struct which allows me to figure out the power of two size. This would mean that kfree would work with an arbitrary pointer anywhere into the object. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: Right it could catch a lot of other bugs as well. I'd actually prefer malloc(0) to _not_ return NULL, but some known (non-NULL) bogus pointer. Why? Because it's quite sane to have simple logic like ptr = malloc(size); if (!ptr) return -ENOMEM; and writing it as if (size !ptr) return -ENOMEM; is just annoying. Also, NULL is _special_. There are absolutely tons of code in the kernel (and elsewhere) that just does something *different* from NULL pointers, and that totally breaks the whole notion of you can allocate a zero-sized allocation, you just must not dereference it. If people special-case NULL as something else, they won't even go through the normal code-path. So for *both* of the above reasons, it's actually stupid to return NULL for a zero-sized allocation. It would be much better to return another pointer that will trap on access. A good candidate might be to return #define BADPTR ((void *)16) which is a portable-enough (where portable-enough is against strict ANSI C rules, but works in practice on all architectures) way to return something that will cause the same page fault behaviour as NULL, but will _not_ trigger the NULL is special code. (Of course, you then need to teach kfree() to accept it as another pointer to be ignored, that's fine). I bet you'd find *more* problems that way than by returning NULL, and you'd also avoid the whole problem with if (!ptr) return -ENOMEM. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 15:41:48 -0700 (PDT) Christoph Lameter [EMAIL PROTECTED] wrote: On Fri, 1 Jun 2007, Andrew Morton wrote: I should make SLUB put poisoning values in unused areas of a kmalloced object? hm, I hadn't thought of it that way actually. I was thinking it was specific to kmalloc(0) but as you point out, the situation is generalisable. Right it could catch a lot of other bugs as well. Yes, if someone does kmalloc(42) and we satisfy the allocation from the size-64 slab, we should poison and then check the allegedly-unused 22 bytes. Please ;) (vaguely stunned that we didn't think of doing this years ago). Well there are architectural problems. We determine the power of two slab at compile time. The object size information is currently not available in the binary :=). It'll be a large patch, I expect? Ummm... Yes. We need to switch off the compile time power of two slab calculation. Then I need to have some way of storing the object size in the metainformation of each object. Changes a lot of function calls. Oh well. Don't lose any sleep over it ;) loses sleep We could store the size of the allocation in the allocated object? Just add four bytes to the user's request, then pick the appropriate cache based on that, then put the user's `size' at the tail of the resulting allocation? So a kmalloc(62) would get upped to 66, so we allocate from size-128 and put the number 62 at bytes 124-127 and we poison bytes 62-123? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
So a kmalloc(62) would get upped to 66, so we allocate from size-128 and put the number 62 at bytes 124-127 and we poison bytes 62-123? Hmmm... We are going rapidly here. This is a patch that I am testing right now. It right adjust the object and the patch is manageable: SLUB mm-only: Right align kmalloc objects to trigger overwrite detection Right align kmalloc objects if they are less than the full kmalloc slab size. This will move the object to be flush with the end of the object in order to allow the easy detection of writes / reads after the end of the kmalloc object. Without slub_debug overwrites will destroy the free pointer of the next object or the next object. Read will yield garbage that is likely zero. With slub_debug redzone checks will be triggered. Reads will read redzone poison. This patch is only for checking things out. There are issues: 1. Alignment of kmalloc objects may now be different. In particular objects whose size is not a multiple of wordsize may be not word alignmed. 2. __kmalloc and kfree need to touch an additional cacheline in struct kmem_cache thereby reducing performance. 3. An object allocated via kmalloc may no longer be freed via kmem_cache_free. So we need to figure out some may to make this configurable. Preferably runtime configurable. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/slub_def.h | 22 +++--- mm/slub.c| 11 --- 2 files changed, 27 insertions(+), 6 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 15:56:42.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:00:03.0 -0700 @@ -120,6 +120,19 @@ static inline struct kmem_cache *kmalloc return kmalloc_caches[index]; } +static inline unsigned long kmalloc_size(size_t size) +{ + int index = kmalloc_index(size); + + if (index = KMALLOC_SHIFT_LOW) + return 1 index; + + if (index == 2) + return 192; + return 96; +} + + #ifdef CONFIG_ZONE_DMA #define SLUB_DMA __GFP_DMA #else @@ -135,7 +148,8 @@ static inline void *kmalloc(size_t size, if (!s) return NULL; - return kmem_cache_alloc(s, flags); + return kmem_cache_alloc(s, flags) + + kmalloc_size(size) - size; } else return __kmalloc(size, flags); } @@ -148,7 +162,8 @@ static inline void *kzalloc(size_t size, if (!s) return NULL; - return kmem_cache_zalloc(s, flags); + return kmem_cache_zalloc(s, flags) + + kmalloc_size(size) - size; } else return __kzalloc(size, flags); } @@ -159,7 +174,8 @@ extern void *__kmalloc_node(size_t size, static inline void *kmalloc_node(size_t size, gfp_t flags, int node) { if (__builtin_constant_p(size) !(flags SLUB_DMA)) { - struct kmem_cache *s = kmalloc_slab(size); + struct kmem_cache *s = kmalloc_slab(size) + + kmalloc_size(size) - size; if (!s) return NULL; Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 15:51:05.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:15:21.0 -0700 @@ -2283,9 +2283,10 @@ static struct kmem_cache *get_slab(size_ void *__kmalloc(size_t size, gfp_t flags) { struct kmem_cache *s = get_slab(size, flags); + int offset = size - s-size; if (s) - return slab_alloc(s, flags, -1, __builtin_return_address(0)); + return slab_alloc(s, flags, -1, __builtin_return_address(0)) + offset; return NULL; } EXPORT_SYMBOL(__kmalloc); @@ -2294,9 +2295,10 @@ EXPORT_SYMBOL(__kmalloc); void *__kmalloc_node(size_t size, gfp_t flags, int node) { struct kmem_cache *s = get_slab(size, flags); + int offset = size - s-size; if (s) - return slab_alloc(s, flags, node, __builtin_return_address(0)); + return slab_alloc(s, flags, node, __builtin_return_address(0)) + offset; return NULL; } EXPORT_SYMBOL(__kmalloc_node); @@ -2337,6 +2339,7 @@ void kfree(const void *x) { struct kmem_cache *s; struct page *page; + unsigned long addr = (unsigned long) x; if (!x) return; @@ -2344,7 +2347,9 @@ void kfree(const void *x) page = virt_to_head_page(x); s = page-slab; - slab_free(s, page, (void *)x, __builtin_return_address(0)); + addr = ~((unsigned long)s-size - 1); + + slab_free(s, page, (void *)addr, __builtin_return_address(0)); } EXPORT_SYMBOL(kfree); - To unsubscribe from this list: send
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: Hmmm... We are going rapidly here. This is a patch that I am testing right now. It right adjust the object and the patch is manageable: Does not boot sigh. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: We could store the size of the allocation in the allocated object? Just add four bytes to the user's request, then pick the appropriate cache based on that, then put the user's `size' at the tail of the resulting allocation? It should be easy enough to do it for _most_ allocations by just doing it when there is already enough slack to do it (which is likely true most of the time). IOW, if you ask for a 42-byte allocation, and we allocate from a 64-byte slab, you get the slab allocation at address X, you don't actually have to return X at all. Just return X+8, and then you do: - at 32-bit word at X+0 you put the real length - at 32-bit word at X+4 you put some good redzone marker - at 32-bit word at X + reallen + 8 you put the endzone marker. And then you say: if the real length was within 12 bytes of the allocation length, we just don't do this. So you wouldn't get any redzoning for those allocations that are exactly sized (or close enough) to fit in an allocation block, but I bet *most* allocations would get this for free. And then, if you actually turn on redzoning, you just always add the 12 byte to the allocation size (assuming the alignment rules allow you to). The nice thing about this is that the freeing path already knows where the object is *supposed* to start (because it sees the allocation size in the slub/slab data structures), so the kfree() path can actually figure out on its own whether it is given a X or an X+8 kind of address. So you don't actually need any extra information. You literally just need enough slop in the allocation that you can do this in the first place, so there is no cost (except for the cost of checking itself, of course). Hmm? Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: So for *both* of the above reasons, it's actually stupid to return NULL for a zero-sized allocation. It would be much better to return another pointer that will trap on access. A good candidate might be to return #define BADPTR ((void *)16) Something like this? (Not tested yet just for review): SLUB: Return BADPTR instead of warning for kmalloc(0) Remove the WARN_ON_ONCE and simply return BADPTR. BADPTR can be used legitimately as long as it is not dereferenced. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/slub_def.h | 18 -- mm/slub.c|8 2 files changed, 12 insertions(+), 14 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:24:54.0 -0700 @@ -12,6 +12,8 @@ #include linux/kobject.h #include linux/log2.h +#define BADPTR ((void *)16) + struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; @@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[ */ static inline int kmalloc_index(size_t size) { - /* -* We should return 0 if size == 0 (which would result in the -* kmalloc caller to get NULL) but we use the smallest object -* here for legacy reasons. Just issue a warning so that -* we can discover locations where we do 0 sized allocations. -*/ - WARN_ON_ONCE(size == 0); + + if (!size) + return 0; if (size KMALLOC_MAX_SIZE) return -1; @@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc(s, flags); } else @@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_zalloc(s, flags); } else @@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc_node(s, flags, node); } else Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:27:12.0 -0700 @@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags if (s) return slab_alloc(s, flags, -1, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc); @@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t if (s) return slab_alloc(s, flags, node, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc_node); #endif @@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, -1, caller); } @@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, node, caller); } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: Hmmm... We are going rapidly here. This is a patch that I am testing right now. It right adjust the object and the patch is manageable: No, I don't think you can do it this way. At a minimum, you'd need to test that the result is word-aligned. Preferably 8-byte aligned. We literally have stuff that knows about these things and uses the low bits in the pointer to keep extra data. (That said, I didn't check whether we actually kmalloc() the data, but I think we do - things like struct key etc). Now, maybe those things always have a 8-byte-aligned size, and it works out, but I'd be worried. Of course, there migth be other (even more subtle) cases where we just assume certain alignment, and depend on the fact that we just _happen_ to get it. Who knows.. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: No, I don't think you can do it this way. Ultimately not. But its worth to see if this works. At a minimum, you'd need to test that the result is word-aligned. Preferably 8-byte aligned. We literally have stuff that knows about these things and uses the low bits in the pointer to keep extra data. kmalloc allocations are guaranteed to be aligned to KMALLOC_MINALIGN. I can bring that in but it will make the patch less readable. Of course, there migth be other (even more subtle) cases where we just assume certain alignment, and depend on the fact that we just _happen_ to get it. Who knows.. I tried to get rid of those cased and I hope that work is complete in 2.6.22. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: Something like this? (Not tested yet just for review): I'm not seeing the path on the freeing side that silently accepts BADPTR. It is not ok to derefence BADPTR, but it's obviously ok to _free_ it. Also, if I read the patch correctly, you _also_ return BADPTR for slabs that are too large. No? That would be wrong - those need to return NULL for out of memory. (But I only looked at the diff, not at the end result, so it may be that all the cases where you changed return NULL into return BADPTR were really just the size-zero ones - it just looked suspicious). Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: I'm not seeing the path on the freeing side that silently accepts BADPTR. It is not ok to derefence BADPTR, but it's obviously ok to _free_ it. Ok. Also, if I read the patch correctly, you _also_ return BADPTR for slabs that are too large. No? That would be wrong - those need to return NULL for out of memory. A too large alloc is 32MB or MAX_ORDER PAGE_SIZE. A BUG_ON in kmalloc_slab() will trigger. Here is the updated patch. It works fine here: SLUB: Return BADPTR instead of warning for kmalloc(0) Remove the WARN_ON_ONCE and simply return BADPTR. BADPTR can be used legitimately as long as it is not dereferenced. Can even be freed. Signed-off-by: Christoph Lameter [EMAIL PROTECTED] --- include/linux/slub_def.h | 18 -- mm/slub.c| 10 +- 2 files changed, 13 insertions(+), 15 deletions(-) Index: slub/include/linux/slub_def.h === --- slub.orig/include/linux/slub_def.h 2007-06-01 16:19:36.0 -0700 +++ slub/include/linux/slub_def.h 2007-06-01 16:43:57.0 -0700 @@ -12,6 +12,8 @@ #include linux/kobject.h #include linux/log2.h +#define BADPTR ((void *)16) + struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; @@ -74,13 +76,9 @@ extern struct kmem_cache kmalloc_caches[ */ static inline int kmalloc_index(size_t size) { - /* -* We should return 0 if size == 0 (which would result in the -* kmalloc caller to get NULL) but we use the smallest object -* here for legacy reasons. Just issue a warning so that -* we can discover locations where we do 0 sized allocations. -*/ - WARN_ON_ONCE(size == 0); + + if (!size) + return 0; if (size KMALLOC_MAX_SIZE) return -1; @@ -133,7 +131,7 @@ static inline void *kmalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc(s, flags); } else @@ -146,7 +144,7 @@ static inline void *kzalloc(size_t size, struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_zalloc(s, flags); } else @@ -162,7 +160,7 @@ static inline void *kmalloc_node(size_t struct kmem_cache *s = kmalloc_slab(size); if (!s) - return NULL; + return BADPTR; return kmem_cache_alloc_node(s, flags, node); } else Index: slub/mm/slub.c === --- slub.orig/mm/slub.c 2007-06-01 16:21:00.0 -0700 +++ slub/mm/slub.c 2007-06-01 16:43:27.0 -0700 @@ -2286,7 +2286,7 @@ void *__kmalloc(size_t size, gfp_t flags if (s) return slab_alloc(s, flags, -1, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc); @@ -2297,7 +2297,7 @@ void *__kmalloc_node(size_t size, gfp_t if (s) return slab_alloc(s, flags, node, __builtin_return_address(0)); - return NULL; + return BADPTR; } EXPORT_SYMBOL(__kmalloc_node); #endif @@ -2338,7 +2338,7 @@ void kfree(const void *x) struct kmem_cache *s; struct page *page; - if (!x) + if (!x || x == BADPTR) return; page = virt_to_head_page(x); @@ -2707,7 +2707,7 @@ void *__kmalloc_track_caller(size_t size struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, -1, caller); } @@ -2718,7 +2718,7 @@ void *__kmalloc_node_track_caller(size_t struct kmem_cache *s = get_slab(size, gfpflags); if (!s) - return NULL; + return BADPTR; return slab_alloc(s, gfpflags, node, caller); } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: A too large alloc is 32MB or MAX_ORDER PAGE_SIZE. A BUG_ON in kmalloc_slab() will trigger. Did we use to BUG_ON()? I think that's wrong. There are ways for users to potentially ask the kernel to do big allocations, and the correct response is to say no can do, not to crash! Here is the updated patch. It works fine here: SLUB: Return BADPTR instead of warning for kmalloc(0) Looks fine to me. My only comment is that - if (!x) + if (!x || x == BADPTR) return; This could be micro-optimized (again, non-standard, but it should be practically portable) to have just a single test using something like if ((unsigned long)x = 16) return; but I guess it doesn't really matter much. I think this is better than what we have now, but I also suspect it's *not* something we should try this late in the -rc sequence ;) Andrew, want to take this patch to -mm to see if it triggers anything? Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Linus Torvalds wrote: A too large alloc is 32MB or MAX_ORDER PAGE_SIZE. A BUG_ON in kmalloc_slab() will trigger. Did we use to BUG_ON()? I think that's wrong. There are ways for users to potentially ask the kernel to do big allocations, and the correct response is to say no can do, not to crash! There is no way to distinguish that from out of memory. Failing on large allocs is what we have always done for kmalloc(). Before 2.6.22 we used to fail for allocs 256k which was a big nuisance for NUMAs large allocs. The patches in 2.6.22 allow us for the first time to allocate arbitrary sized objects up to MAX_ORDER. So we no longer have troubles with large NUMA objects. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007 16:57:20 -0700 (PDT) Linus Torvalds [EMAIL PROTECTED] wrote: Andrew, want to take this patch to -mm to see if it triggers anything? spose so. I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, because it is exposing some coding warts. But we should turn it off for 2.6.22 and make it conditional on CONFIG_DEVEL_KERNEL (or whatever it will be called) later. The BADPTR thing is a little worrying because it will make previously-working-by-luck code go oops. I guess we can live with that. So we end up with the BADPTR code enabled even in production kernels, in which case your ((unsigned long)x = 16) trick is worth doing. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Andrew Morton wrote: I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, The trouble with the WARN_ON is that it triggers even for code that is okay like noted by Jeremy. My initial intend with NULL was to allow the allocation of a zero sized pointer without extra checks. It should only trigger a failure if something bad was done. NULL had the problem of confusion with no memory available. I think BADPTR is good. If the coding warts are causing trouble then BADPTR will result in a failure. Otherwise if the code is doing a kmalloc(0) and not dereferencing the pointer (like Paul's fixed code) then we should be fine and not issue a warning. The false positives may be upsetting some people. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Christoph Lameter wrote: Well there are architectural problems. We determine the power of two slab at compile time. The object size information is currently not available in the binary :=). That only applies to allocations with constant sizes. One presumes nobody is explicitly doing kmalloc(0), so we can use a separate runtime-computed-size path to do poisoning. (Which is probably 90% of the problem, since people who kmalloc(sizeof(struct foo)) will generally stay within bounds without too much effort.) J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Andrew Morton wrote: As I said, it's specific to the kmalloc(0) problem, and we're fixing that by other means anyway. I guess I'm not getting much traction in my campaign for equal rights for zero-sized allocations. They're perfectly reasonable things to have, if that's the way the code falls out. The warning is just prejudice. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Linus Torvalds wrote: So for *both* of the above reasons, it's actually stupid to return NULL for a zero-sized allocation. It would be much better to return another pointer that will trap on access. A good candidate might be to return #define BADPTR ((void *)16) I think this is a good idea in principle, but I wonder if there's any code which assumes that kmalloc(x) != kmalloc(x) for all non-NULL returns from kmalloc. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 01 Jun 2007 17:43:46 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] wrote: Andrew Morton wrote: As I said, it's specific to the kmalloc(0) problem, and we're fixing that by other means anyway. I guess I'm not getting much traction in my campaign for equal rights for zero-sized allocations. They're perfectly reasonable things to have, if that's the way the code falls out. The warning is just prejudice. In some cases, sure, and we should remove the warning at some stage. But it has exposed a couple of bugs and a few weird things. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
Andrew Morton wrote: In some cases, sure, and we should remove the warning at some stage. But it has exposed a couple of bugs and a few weird things. We just need to scatter better bug powder into the corners of the allocations. J - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, Christoph Lameter wrote: On Fri, 1 Jun 2007, Andrew Morton wrote: I think it'd be better if we kept the WARN_ON_ONCE(size == 0) in there, The trouble with the WARN_ON is that it triggers even for code that is okay like noted by Jeremy. Yes. Sometimes it's just more natural to have ptr = kmalloc(size); .. use it .. free(ptr); and if a *degenerate* case of size=0 happens, who cares? It should just work, as long as we (obviously) don't actually try to access the pointer. So I don't much like the WARN_ON(size == 0). I think it potentially just causes people to write around it, and quite possibly causes the callers to write code that is not at all more readable or maintainable! That's why I'd much rather return BADPTR instead: we'll get an oops for buggy code, but we don't penalize the natural and good code! So once you return BADPTR, there really isn't any good reason for the WARN_ON. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said: #define BADPTR ((void *)16) I bet you'd find *more* problems that way than by returning NULL, and you'd also avoid the whole problem with if (!ptr) return -ENOMEM. Hmm.. this looks like a good contender for first usage of #ifndef CONFIG_STABLE :) pgp5hRgrE3k4B.pgp Description: PGP signature
Re: [RFC] [PATCH] cpuset operations causes Badness at mm/slab.c:777 warning
On Fri, 1 Jun 2007, [EMAIL PROTECTED] wrote: On Fri, 01 Jun 2007 16:00:30 PDT, Linus Torvalds said: #define BADPTR ((void *)16) I bet you'd find *more* problems that way than by returning NULL, and you'd also avoid the whole problem with if (!ptr) return -ENOMEM. Hmm.. this looks like a good contender for first usage of #ifndef CONFIG_STABLE The warning? Sure but the BADPTR (or ZERO_SIZE_PTR now) will stay. It is definitely an error to deference an object that has no size. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/