On Mon, 7 Sep 2020 15:40:46 +0200
Marco Elver <el...@google.com> wrote:

> From: Alexander Potapenko <gli...@google.com>
> 
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.
> 
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design, is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
> 
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error.
> 
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval, a
> guarded allocation from the KFENCE object pool is returned to the main
> allocator (SLAB or SLUB). At this point, the timer is reset, and the
> next allocation is set up after the expiration of the interval.
> 
> To enable/disable a KFENCE allocation through the main allocator's
> fast-path without overhead, KFENCE relies on static branches via the
> static keys infrastructure. The static branch is toggled to redirect the
> allocation to KFENCE. To date, we have verified by running synthetic
> benchmarks (sysbench I/O workloads) that a kernel compiled with KFENCE
> is performance-neutral compared to the non-KFENCE baseline.
> 
> For more details, see Documentation/dev-tools/kfence.rst (added later in
> the series).
> 
> Co-developed-by: Marco Elver <el...@google.com>
> Signed-off-by: Marco Elver <el...@google.com>
> Signed-off-by: Alexander Potapenko <gli...@google.com>

Interesting bit of work. A few trivial things inline I spotted whilst having
a first read through.

Thanks,

Jonathan

> +
> +static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, 
> gfp_t gfp)
> +{
> +     /*
> +      * Note: for allocations made before RNG initialization, will always
> +      * return zero. We still benefit from enabling KFENCE as early as
> +      * possible, even when the RNG is not yet available, as this will allow
> +      * KFENCE to detect bugs due to earlier allocations. The only downside
> +      * is that the out-of-bounds accesses detected are deterministic for
> +      * such allocations.
> +      */
> +     const bool right = prandom_u32_max(2);
> +     unsigned long flags;
> +     struct kfence_metadata *meta = NULL;
> +     void *addr = NULL;

I think this is set in all paths, so no need to initialize here.

> +
> +     /* Try to obtain a free object. */
> +     raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
> +     if (!list_empty(&kfence_freelist)) {
> +             meta = list_entry(kfence_freelist.next, struct kfence_metadata, 
> list);
> +             list_del_init(&meta->list);
> +     }
> +     raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
> +     if (!meta)
> +             return NULL;
> +
> +     if (unlikely(!raw_spin_trylock_irqsave(&meta->lock, flags))) {
> +             /*
> +              * This is extremely unlikely -- we are reporting on a
> +              * use-after-free, which locked meta->lock, and the reporting
> +              * code via printk calls kmalloc() which ends up in
> +              * kfence_alloc() and tries to grab the same object that we're
> +              * reporting on. While it has never been observed, lockdep does
> +              * report that there is a possibility of deadlock. Fix it by
> +              * using trylock and bailing out gracefully.
> +              */
> +             raw_spin_lock_irqsave(&kfence_freelist_lock, flags);
> +             /* Put the object back on the freelist. */
> +             list_add_tail(&meta->list, &kfence_freelist);
> +             raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags);
> +
> +             return NULL;
> +     }
> +
> +     meta->addr = metadata_to_pageaddr(meta);
> +     /* Unprotect if we're reusing this page. */
> +     if (meta->state == KFENCE_OBJECT_FREED)
> +             kfence_unprotect(meta->addr);
> +
> +     /* Calculate address for this allocation. */
> +     if (right)
> +             meta->addr += PAGE_SIZE - size;
> +     meta->addr = ALIGN_DOWN(meta->addr, cache->align);
> +
> +     /* Update remaining metadata. */
> +     metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED);
> +     /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */
> +     WRITE_ONCE(meta->cache, cache);
> +     meta->size = right ? -size : size;
> +     for_each_canary(meta, set_canary_byte);
> +     virt_to_page(meta->addr)->slab_cache = cache;
> +
> +     raw_spin_unlock_irqrestore(&meta->lock, flags);
> +
> +     /* Memory initialization. */
> +
> +     /*
> +      * We check slab_want_init_on_alloc() ourselves, rather than letting
> +      * SL*B do the initialization, as otherwise we might overwrite KFENCE's
> +      * redzone.
> +      */
> +     addr = (void *)meta->addr;
> +     if (unlikely(slab_want_init_on_alloc(gfp, cache)))
> +             memzero_explicit(addr, size);
> +     if (cache->ctor)
> +             cache->ctor(addr);
> +
> +     if (CONFIG_KFENCE_FAULT_INJECTION && 
> !prandom_u32_max(CONFIG_KFENCE_FAULT_INJECTION))
> +             kfence_protect(meta->addr); /* Random "faults" by protecting 
> the object. */
> +
> +     atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCATED]);
> +     atomic_long_inc(&counters[KFENCE_COUNTER_ALLOCS]);
> +
> +     return addr;
> +}

...

> +
> +size_t kfence_ksize(const void *addr)
> +{
> +     const struct kfence_metadata *meta = addr_to_metadata((unsigned 
> long)addr);
> +
> +     /*
> +      * Read locklessly -- if there is a race with __kfence_alloc(), this
> +      * most certainly is either a use-after-free, or invalid access.
> +      */
> +     return meta ? abs(meta->size) : 0;
> +}
> +
> +void *kfence_object_start(const void *addr)
> +{
> +     const struct kfence_metadata *meta = addr_to_metadata((unsigned 
> long)addr);
> +
> +     /*
> +      * Read locklessly -- if there is a race with __kfence_alloc(), this
> +      * most certainly is either a use-after-free, or invalid access.

To my reading using "most certainly" makes this statement less clear

Read locklessly -- if there is a race with __kfence_alloc() this
is either a use-after-free or invalid access.

Same for other cases of that particular "most certainly".

> +      */
> +     return meta ? (void *)meta->addr : NULL;
> +}
> +
> +void __kfence_free(void *addr)
> +{
> +     struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
> +
> +     if (unlikely(meta->cache->flags & SLAB_TYPESAFE_BY_RCU))
> +             call_rcu(&meta->rcu_head, rcu_guarded_free);
> +     else
> +             kfence_guarded_free(addr, meta);
> +}
> +
> +bool kfence_handle_page_fault(unsigned long addr)
> +{
> +     const int page_index = (addr - (unsigned long)__kfence_pool) / 
> PAGE_SIZE;
> +     struct kfence_metadata *to_report = NULL;
> +     enum kfence_error_type error_type;
> +     unsigned long flags;
> +
> +     if (!is_kfence_address((void *)addr))
> +             return false;
> +
> +     if (!READ_ONCE(kfence_enabled)) /* If disabled at runtime ... */
> +             return kfence_unprotect(addr); /* ... unprotect and proceed. */
> +
> +     atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
> +
> +     if (page_index % 2) {
> +             /* This is a redzone, report a buffer overflow. */
> +             struct kfence_metadata *meta = NULL;

Not need to set to NULL here as assigned 3 lines down.

> +             int distance = 0;
> +
> +             meta = addr_to_metadata(addr - PAGE_SIZE)

> +             if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
> +                     to_report = meta;
> +                     /* Data race ok; distance calculation approximate. */
> +                     distance = addr - data_race(meta->addr + 
> abs(meta->size));
> +             }
> +
> +             meta = addr_to_metadata(addr + PAGE_SIZE);
> +             if (meta && READ_ONCE(meta->state) == KFENCE_OBJECT_ALLOCATED) {
> +                     /* Data race ok; distance calculation approximate. */
> +                     if (!to_report || distance > data_race(meta->addr) - 
> addr)
> +                             to_report = meta;
> +             }
> +
> +             if (!to_report)
> +                     goto out;
> +
> +             raw_spin_lock_irqsave(&to_report->lock, flags);
> +             to_report->unprotected_page = addr;
> +             error_type = KFENCE_ERROR_OOB;
> +
> +             /*
> +              * If the object was freed before we took the look we can still
> +              * report this as an OOB -- the report will simply show the
> +              * stacktrace of the free as well.
> +              */
> +     } else {
> +             to_report = addr_to_metadata(addr);
> +             if (!to_report)
> +                     goto out;
> +
> +             raw_spin_lock_irqsave(&to_report->lock, flags);
> +             error_type = KFENCE_ERROR_UAF;
> +             /*
> +              * We may race with __kfence_alloc(), and it is possible that a
> +              * freed object may be reallocated. We simply report this as a
> +              * use-after-free, with the stack trace showing the place where
> +              * the object was re-allocated.
> +              */
> +     }
> +
> +out:
> +     if (to_report) {
> +             kfence_report_error(addr, to_report, error_type);
> +             raw_spin_unlock_irqrestore(&to_report->lock, flags);
> +     } else {
> +             /* This may be a UAF or OOB access, but we can't be sure. */
> +             kfence_report_error(addr, NULL, KFENCE_ERROR_INVALID);
> +     }
> +
> +     return kfence_unprotect(addr); /* Unprotect and let access proceed. */
> +}
...

Reply via email to