Re: [PATCH v3 1/2] kunit: add a KUnit test for SLUB debugging functionality

Marco Elver Thu, 01 Apr 2021 02:17:44 -0700

[Note, if you'd like me to see future versions, please Cc me, otherwise
it's unlikely I see it in time. Also add [email protected] if
perhaps a KUnit dev should have another look, too.]


On Wed, Mar 31, 2021 at 10:51AM +0200, [email protected] wrote:
> From: Oliver Glitta <[email protected]>
> 
> SLUB has resiliency_test() function which is hidden behind #ifdef
> SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody
> runs it. KUnit should be a proper replacement for it.
> 
> Try changing byte in redzone after allocation and changing
> pointer to next free node, first byte, 50th byte and redzone
> byte. Check if validation finds errors.
> 
> There are several differences from the original resiliency test:
> Tests create own caches with known state instead of corrupting
> shared kmalloc caches.
> 
> The corruption of freepointer uses correct offset, the original
> resiliency test got broken with freepointer changes.
> 
> Scratch changing random byte test, because it does not have
> meaning in this form where we need deterministic results.
> 
> Add new option CONFIG_SLUB_KUNIT_TEST in Kconfig.
> 
> Add a counter field "errors" to struct kmem_cache to count number
> of errors detected in cache.
> 
> Silence bug report in SLUB test. Add SLAB_SILENT_ERRORS debug flag.
> Add SLAB_SILENT_ERRORS flag to SLAB_NEVER_MERGE, SLAB_DEBUG_FLAGS,
> SLAB_FLAGS_PERMITTED macros.
> 
> Signed-off-by: Oliver Glitta <[email protected]>
> ---
> Changes since v2
> 
> Use bit operation & instead of logical && as reported by kernel test 
> robot and Dan Carpenter
> 
> Changes since v1
> 
> Conversion from kselftest to KUnit test suggested by Marco Elver.
> Error silencing.
> Error counting improvements. 
> 
>  include/linux/slab.h     |   2 +
>  include/linux/slub_def.h |   2 +
>  lib/Kconfig.debug        |   5 ++
>  lib/Makefile             |   1 +
>  lib/test_slub.c          | 124 +++++++++++++++++++++++++++++++++++++++
>  mm/slab.h                |   7 ++-
>  mm/slab_common.c         |   2 +-
>  mm/slub.c                |  64 +++++++++++++-------
>  8 files changed, 184 insertions(+), 23 deletions(-)
>  create mode 100644 lib/test_slub.c
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 7ae604076767..ed1a5a64d028 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -25,6 +25,8 @@
>   */
>  /* DEBUG: Perform (expensive) checks on alloc/free */
>  #define SLAB_CONSISTENCY_CHECKS      ((slab_flags_t __force)0x00000100U)
> +/* DEBUG: Silent bug reports */
> +#define SLAB_SILENT_ERRORS   ((slab_flags_t __force)0x00000200U)

This flag wouldn't be necessary if you do the design using
kunit_resource (see below).

(But perhaps I missed a conversation that said that this flag is
generally useful, but if so, it should probably be in a separate patch
justifying why it is required beyond the test.)

>  /* DEBUG: Red zone objs in a cache */
>  #define SLAB_RED_ZONE                ((slab_flags_t __force)0x00000400U)
>  /* DEBUG: Poison objects */
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index dcde82a4434c..e4b51bb5bb83 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -133,6 +133,8 @@ struct kmem_cache {
>       unsigned int usersize;          /* Usercopy region size */
>  
>       struct kmem_cache_node *node[MAX_NUMNODES];
> +
> +     int errors;                     /* Number of errors in cache */

So, I think it's bad design to add a new field 'errors', just for the
test. This will increase kmem_cache size for all builds, which is
unnecessary.

Is there use to retrieve 'errors' elsewhere?

While you could guard this with #ifdef CONFIG_SLUB_DEBUG or so, there's
a better design option if this is just for the KUnit test's benefit: use
kunit_resource.

The way it'd work is that for each test (you can add a common init
function) you add a named resource, in this case just an 'int' I guess,
that slab would be able to retrieve if this test is being run.

In the test somewhere, you could add something like this:


        static struct kunit_resource resource;
        static int slab_errors;

        ..........

        static int test_init(struct kunit *test)
        {
                slab_errors = 0;
                kunit_add_named_resource(test, NULL, NULL, &resource,
                                         "slab_errors", &slab_errors);
                return 0;
        }

        ...... tests now check slab_errors .....

and then in slub.c you'd have:

        #if IS_ENABLED(CONFIG_KUNIT)
        static bool slab_add_kunit_errors(void)
        {
                struct kunit_resource *resource;

                if (likely(!current->kunit_test))
                        return false;
                resource = kunit_find_named_resource(current->kunit_test, 
"slab_errors");
                if (!resource)
                        return false;
                (*(int *)resource->data)++;
                kunit_put_resource(resource);
        }
        #else
        static inline bool slab_add_kunit_errors(void) { return false; }
        #endif

And anywhere you want to increase the error count, you'd call
slab_add_kunit_errors().

Another benefit of this approach is that if KUnit is disabled, there is
zero overhead and no additional code generated (vs. the current
approach).

>  };
>  
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 2779c29d9981..e0dec830c269 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2371,6 +2371,11 @@ config BITS_TEST
>  
>         If unsure, say N.
>  
> +config SLUB_KUNIT_TEST
> +     tristate "KUnit Test for SLUB cache error detection" if !KUNIT_ALL_TESTS
> +     depends on SLUB_DEBUG && KUNIT
> +     default KUNIT_ALL_TESTS

Help text please.

> +
>  config TEST_UDELAY
>       tristate "udelay test driver"
>       help
> diff --git a/lib/Makefile b/lib/Makefile
> index b5307d3eec1a..e1eb986c0e87 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -352,5 +352,6 @@ obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
>  obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
>  obj-$(CONFIG_BITS_TEST) += test_bits.o
>  obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
> +obj-$(CONFIG_SLUB_KUNIT_TEST) += test_slub.o

Any reason to not name this slub_kunit.c? This is a new test, and it
could follow the naming convention of the other KUnit tests.

Some of it is also documented:
https://www.kernel.org/doc/html/latest/dev-tools/kunit/style.html

>  obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
> diff --git a/lib/test_slub.c b/lib/test_slub.c
> new file mode 100644
> index 000000000000..4f8ea3c7d867
> --- /dev/null
> +++ b/lib/test_slub.c
> @@ -0,0 +1,124 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <kunit/test.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include "../mm/slab.h"
> +
> +
> +static void test_clobber_zone(struct kunit *test)
> +{
> +     struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_alloc", 64, 0,
> +                             SLAB_RED_ZONE | SLAB_SILENT_ERRORS, NULL);
> +     u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +     p[64] = 0x12;
> +
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +     kmem_cache_free(s, p);
> +     kmem_cache_destroy(s);
> +}
> +
> +static void test_next_pointer(struct kunit *test)
> +{
> +     struct kmem_cache *s = kmem_cache_create("TestSlub_next_ptr_free", 64, 
> 0,
> +                             SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +     u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +     unsigned long tmp;
> +     unsigned long *ptr_addr;
> +
> +     kmem_cache_free(s, p);
> +
> +     ptr_addr = (unsigned long *)(p + s->offset);
> +     tmp = *ptr_addr;
> +     p[s->offset] = 0x12;
> +
> +     /*
> +      * Expecting two errors.
> +      * One for the corrupted freechain and the other one for the wrong
> +      * count of objects in use.
> +      */
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 2, s->errors);
> +
> +     /*
> +      * Try to repair corrupted freepointer.
> +      * Still expecting one error for the wrong count of objects in use.
> +      */
> +     *ptr_addr = tmp;
> +
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +     /*
> +      * Previous validation repaired the count of objects in use.
> +      * Now expecting no error.
> +      */
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 0, s->errors);
> +
> +     kmem_cache_destroy(s);
> +}
> +
> +static void test_first_word(struct kunit *test)
> +{
> +     struct kmem_cache *s = kmem_cache_create("TestSlub_1th_word_free", 64, 
> 0,
> +                             SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +     u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +     kmem_cache_free(s, p);
> +     *p = 0x78;
> +
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 1, s->errors);
> +
> +     kmem_cache_destroy(s);
> +}
> +
> +static void test_clobber_50th_byte(struct kunit *test)
> +{
> +     struct kmem_cache *s = kmem_cache_create("TestSlub_50th_word_free", 64, 
> 0,
> +                             SLAB_POISON | SLAB_SILENT_ERRORS, NULL);
> +     u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +     kmem_cache_free(s, p);
> +     p[50] = 0x9a;
> +
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 1, s->errors);
> +     kmem_cache_destroy(s);
> +}
> +
> +static void test_clobber_redzone_free(struct kunit *test)
> +{
> +     struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_free", 64, 0,
> +                             SLAB_RED_ZONE | SLAB_SILENT_ERRORS, NULL);
> +     u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +     kmem_cache_free(s, p);
> +     p[64] = 0xab;
> +
> +     validate_slab_cache(s);
> +     KUNIT_EXPECT_EQ(test, 1, s->errors);
> +     kmem_cache_destroy(s);
> +}
> +
> +static struct kunit_case test_cases[] = {
> +     KUNIT_CASE(test_clobber_zone),
> +     KUNIT_CASE(test_next_pointer),
> +     KUNIT_CASE(test_first_word),
> +     KUNIT_CASE(test_clobber_50th_byte),
> +     KUNIT_CASE(test_clobber_redzone_free),
> +     {}
> +};
> +
> +static struct kunit_suite test_suite = {
> +     .name = "slub_test",
> +     .test_cases = test_cases,

Here you could add a 

        .init = test_init,

or so, if you go with the kunit_resource design.

> +};
> +kunit_test_suite(test_suite);
> +
> +MODULE_LICENSE("GPL");
> diff --git a/mm/slab.h b/mm/slab.h
> index 076582f58f68..382507b6cab9 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -134,7 +134,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int 
> object_size,
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
>  #elif defined(CONFIG_SLUB_DEBUG)
>  #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> -                       SLAB_TRACE | SLAB_CONSISTENCY_CHECKS)
> +                       SLAB_TRACE | SLAB_CONSISTENCY_CHECKS | \
> +                       SLAB_SILENT_ERRORS)
>  #else
>  #define SLAB_DEBUG_FLAGS (0)
>  #endif
> @@ -164,7 +165,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int 
> object_size,
>                             SLAB_NOLEAKTRACE | \
>                             SLAB_RECLAIM_ACCOUNT | \
>                             SLAB_TEMPORARY | \
> -                           SLAB_ACCOUNT)
> +                           SLAB_ACCOUNT | \
> +                           SLAB_SILENT_ERRORS)
>  
>  bool __kmem_cache_empty(struct kmem_cache *);
>  int __kmem_cache_shutdown(struct kmem_cache *);
> @@ -215,6 +217,7 @@ DECLARE_STATIC_KEY_TRUE(slub_debug_enabled);
>  DECLARE_STATIC_KEY_FALSE(slub_debug_enabled);
>  #endif
>  extern void print_tracking(struct kmem_cache *s, void *object);
> +long validate_slab_cache(struct kmem_cache *s);
>  #else
>  static inline void print_tracking(struct kmem_cache *s, void *object)
>  {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 88e833986332..239c9095e7ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -55,7 +55,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>   */
>  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
>               SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> -             SLAB_FAILSLAB | kasan_never_merge())
> +             SLAB_FAILSLAB | SLAB_SILENT_ERRORS | kasan_never_merge())
>  
>  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
>                        SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> diff --git a/mm/slub.c b/mm/slub.c
> index 3021ce9bf1b3..7083e89432d0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -673,14 +673,16 @@ static void slab_bug(struct kmem_cache *s, char *fmt, 
> ...)
>  
>  static void slab_fix(struct kmem_cache *s, char *fmt, ...)
>  {
> -     struct va_format vaf;
> -     va_list args;
> -
> -     va_start(args, fmt);
> -     vaf.fmt = fmt;
> -     vaf.va = &args;
> -     pr_err("FIX %s: %pV\n", s->name, &vaf);
> -     va_end(args);
> +     if (!(s->flags & SLAB_SILENT_ERRORS)) {

This style of guarding causes lots of line diffs.

Instead, if the whole function is to be skipped, please prefer:

        if (skip_things_if_true)
                return;

instead of

        if (!skip_things_if_true) {
                ...
        }

here and everywhere else.

Specifically, for the kunit_resource design, this would simply become:

        if (slab_add_kunit_errors())
                return;

Meaning that whenever a KUnit test has a resource "slab_errors", no
messages are printed -- otherwise it'll always print a message.

> +             struct va_format vaf;
> +             va_list args;
> +
> +             va_start(args, fmt);
> +             vaf.fmt = fmt;
> +             vaf.va = &args;
> +             pr_err("FIX %s: %pV\n", s->name, &vaf);
> +             va_end(args);
> +     }
>  }
>  
>  static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
> @@ -739,8 +741,10 @@ static void print_trailer(struct kmem_cache *s, struct 
> page *page, u8 *p)
>  void object_err(struct kmem_cache *s, struct page *page,
>                       u8 *object, char *reason)
>  {
> -     slab_bug(s, "%s", reason);
> -     print_trailer(s, page, object);
> +     if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +             slab_bug(s, "%s", reason);
> +             print_trailer(s, page, object);
> +     }
>  }
>  
>  static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
> @@ -752,9 +756,11 @@ static __printf(3, 4) void slab_err(struct kmem_cache 
> *s, struct page *page,
>       va_start(args, fmt);
>       vsnprintf(buf, sizeof(buf), fmt, args);
>       va_end(args);
> -     slab_bug(s, "%s", buf);
> -     print_page_info(page);
> -     dump_stack();
> +     if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +             slab_bug(s, "%s", buf);
> +             print_page_info(page);
> +             dump_stack();
> +     }
>  }
>  
>  static void init_object(struct kmem_cache *s, void *object, u8 val)
> @@ -798,11 +804,13 @@ static int check_bytes_and_report(struct kmem_cache *s, 
> struct page *page,
>       while (end > fault && end[-1] == value)
>               end--;
>  
> -     slab_bug(s, "%s overwritten", what);
> -     pr_err("INFO: 0x%p-0x%p @offset=%tu. First byte 0x%x instead of 0x%x\n",
> +     if (!(s->flags & SLAB_SILENT_ERRORS)) {
> +             slab_bug(s, "%s overwritten", what);
> +             pr_err("INFO: 0x%p-0x%p @offset=%tu. First byte 0x%x instead of 
> 0x%x\n",
>                                       fault, end - 1, fault - addr,
>                                       fault[0], value);
> -     print_trailer(s, page, object);
> +             print_trailer(s, page, object);
> +     }
>  
>       restore_bytes(s, what, value, fault, end);
>       return 0;
> @@ -964,6 +972,7 @@ static int check_slab(struct kmem_cache *s, struct page 
> *page)
>  
>       if (!PageSlab(page)) {
>               slab_err(s, page, "Not a valid slab page");
> +             s->errors += 1;

So to me it looks like errors is set whenever there's a slab_err() or
slab_fix() call. So why not move setting that an error occurred into
those functions?

>               return 0;
>       }
>  
> @@ -971,11 +980,13 @@ static int check_slab(struct kmem_cache *s, struct page 
> *page)
>       if (page->objects > maxobj) {
>               slab_err(s, page, "objects %u > max %u",
>                       page->objects, maxobj);
> +             s->errors += 1;
>               return 0;
>       }
>       if (page->inuse > page->objects) {
>               slab_err(s, page, "inuse %u > max %u",
>                       page->inuse, page->objects);
> +             s->errors += 1;
>               return 0;
>       }
>       /* Slab_pad_check fixes things up after itself */
> @@ -1008,8 +1019,10 @@ static int on_freelist(struct kmem_cache *s, struct 
> page *page, void *search)
>                               page->freelist = NULL;
>                               page->inuse = page->objects;
>                               slab_fix(s, "Freelist cleared");
> +                             s->errors += 1;
>                               return 0;
>                       }
> +                     s->errors += 1;
>                       break;
>               }
>               object = fp;
> @@ -1026,12 +1039,14 @@ static int on_freelist(struct kmem_cache *s, struct 
> page *page, void *search)
>                        page->objects, max_objects);
>               page->objects = max_objects;
>               slab_fix(s, "Number of objects adjusted.");
> +             s->errors += 1;
>       }
>       if (page->inuse != page->objects - nr) {
>               slab_err(s, page, "Wrong object count. Counter is %d but 
> counted were %d",
>                        page->inuse, page->objects - nr);
>               page->inuse = page->objects - nr;
>               slab_fix(s, "Object count adjusted.");
> +             s->errors += 1;
>       }
>       return search == NULL;
>  }
> @@ -4629,8 +4644,10 @@ static void validate_slab(struct kmem_cache *s, struct 
> page *page)
>               u8 val = test_bit(__obj_to_index(s, addr, p), map) ?
>                        SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
>  
> -             if (!check_object(s, page, p, val))
> +             if (!check_object(s, page, p, val)) {
> +                     s->errors += 1;
>                       break;
> +             }
>       }
>       put_map(map);
>  unlock:
> @@ -4650,9 +4667,11 @@ static int validate_slab_node(struct kmem_cache *s,
>               validate_slab(s, page);
>               count++;
>       }
> -     if (count != n->nr_partial)
> +     if (count != n->nr_partial) {
>               pr_err("SLUB %s: %ld partial slabs counted but counter=%ld\n",
>                      s->name, count, n->nr_partial);
> +             s->errors += 1;
> +     }
>  
>       if (!(s->flags & SLAB_STORE_USER))
>               goto out;
> @@ -4661,20 +4680,23 @@ static int validate_slab_node(struct kmem_cache *s,
>               validate_slab(s, page);
>               count++;
>       }
> -     if (count != atomic_long_read(&n->nr_slabs))
> +     if (count != atomic_long_read(&n->nr_slabs)) {
>               pr_err("SLUB: %s %ld slabs counted but counter=%ld\n",
>                      s->name, count, atomic_long_read(&n->nr_slabs));
> +             s->errors += 1;

I think these here are a few cases where there's no slab_err() or
slab_fix() call, and open coding setting errors is still required...

> +     }
>  
>  out:
>       spin_unlock_irqrestore(&n->list_lock, flags);
>       return count;
>  }
>  
> -static long validate_slab_cache(struct kmem_cache *s)
> +long validate_slab_cache(struct kmem_cache *s)
>  {
>       int node;
>       unsigned long count = 0;
>       struct kmem_cache_node *n;
> +     s->errors = 0;

This would also go away if you use the kunit_resource design. I also
think it's better if the test fully controls 'slab_errors', and it isn't
reset by this function.

>       flush_all(s);
>       for_each_kmem_cache_node(s, node, n)
> @@ -4682,6 +4704,8 @@ static long validate_slab_cache(struct kmem_cache *s)
>  
>       return count;
>  }
> +EXPORT_SYMBOL(validate_slab_cache);
> +
>  /*
>   * Generate lists of code addresses where slabcache objects are allocated
>   * and freed.
> -- 
> 2.17.1

Re: [PATCH v3 1/2] kunit: add a KUnit test for SLUB debugging functionality

Reply via email to