Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-10-05 Thread Jann Horn
On Mon, Oct 5, 2020 at 6:01 PM Alexander Potapenko  wrote:
>
> On Tue, Sep 29, 2020 at 5:06 PM Mark Rutland  wrote:
> >
> > On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> > > On Tue, 29 Sep 2020 at 16:24, Mark Rutland  wrote:
> > > [...]
> > > >
> > > > From other sub-threads it sounds like these addresses are not part of
> > > > the linear/direct map. Having kmalloc return addresses outside of the
> > > > linear map is going to break anything that relies on virt<->phys
> > > > conversions, and is liable to make DMA corrupt memory. There were
> > > > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > > > separate from kmalloc().
> > > >
> > > > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> > > >
> > > > I strongly suspect this isn't going to be safe unless you always use an
> > > > in-place carveout from the linear map (which could be the linear alias
> > > > of a static carveout).
> > >
> > > That's an excellent point, thank you! Indeed, on arm64, a version with
> > > naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
> > >
> > > We'll try to put together an arm64 version using a carveout as you 
> > > suggest.
> >
> > Great, thanks!
> >
> > Just to be clear, the concerns for DMA and virt<->phys conversions also
> > apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
> > in the common case, but still has cases that can go wrong).
>
> To clarify, shouldn't kmalloc/kmem_cache allocations used with DMA be
> allocated with explicit GFP_DMA?
> If so, how practical would it be to just skip such allocations in
> KFENCE allocator?

AFAIK GFP_DMA doesn't really mean "I will use this allocation for
DMA"; it means "I will use this allocation for DMA using some ancient
hardware (e.g. stuff on the ISA bus?) that only supports 24-bit
physical addresses (or maybe different limits on other
architectures)".
There's also GFP_DMA32, which means the same thing, except with 32-bit
physical addresses.

You can see in e.g. __dma_direct_alloc_pages() that the GFP_DMA32 and
GFP_DMA flags are only used if the hardware can't address the full
physical address space supported by the CPU.
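
To illustrate the distinction (a sketch, not from this thread's
patches; the device pointer and buffer size are placeholders):

#include <linux/slab.h>
#include <linux/dma-mapping.h>

static void *isa_buf_alloc(void)
{
	/* Zone-constrained: the buffer lands in ZONE_DMA (low 16 MiB on x86). */
	return kmalloc(512, GFP_KERNEL | GFP_DMA);
}

static void *modern_buf_alloc(struct device *dev, dma_addr_t *handle)
{
	/* DMA API: placement follows the device's DMA mask, not a gfp zone flag. */
	return dma_alloc_coherent(dev, 512, handle, GFP_KERNEL);
}

So GFP_DMA constrains where the pages come from; it says nothing about
whether the memory will be used for DMA at all.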


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-10-05 Thread Alexander Potapenko
On Tue, Sep 29, 2020 at 5:06 PM Mark Rutland  wrote:
>
> On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> > On Tue, 29 Sep 2020 at 16:24, Mark Rutland  wrote:
> > [...]
> > >
> > > From other sub-threads it sounds like these addresses are not part of
> > > the linear/direct map. Having kmalloc return addresses outside of the
> > > linear map is going to break anything that relies on virt<->phys
> > > conversions, and is liable to make DMA corrupt memory. There were
> > > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > > separate from kmalloc().
> > >
> > > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> > >
> > > I strongly suspect this isn't going to be safe unless you always use an
> > > in-place carveout from the linear map (which could be the linear alias
> > > of a static carveout).
> >
> > That's an excellent point, thank you! Indeed, on arm64, a version with
> > naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
> >
> > We'll try to put together an arm64 version using a carveout as you suggest.
>
> Great, thanks!
>
> Just to be clear, the concerns for DMA and virt<->phys conversions also
> apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
> in the common case, but still has cases that can go wrong).

To clarify, shouldn't kmalloc/kmem_cache allocations used with DMA be
allocated with explicit GFP_DMA?
If so, how practical would it be to just skip such allocations in
KFENCE allocator?


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-10-01 Thread Mark Rutland
On Tue, Sep 29, 2020 at 05:51:58PM +0200, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 4:24 PM Mark Rutland  wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> > > From: Alexander Potapenko 
> > >
> > > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > > low-overhead sampling-based memory safety error detector of heap
> > > use-after-free, invalid-free, and out-of-bounds access errors.
> > >
> > > KFENCE is designed to be enabled in production kernels, and has near
> > > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > > for precision. The main motivation behind KFENCE's design is that with
> > > enough total uptime KFENCE will detect bugs in code paths not typically
> > > exercised by non-production test workloads. One way to quickly achieve a
> > > large enough total uptime is when the tool is deployed across a large
> > > fleet of machines.
> > >
> > > KFENCE objects each reside on a dedicated page, at either the left or
> > > right page boundaries. The pages to the left and right of the object
> > > page are "guard pages", whose attributes are changed to a protected
> > > state, and cause page faults on any attempted access to them. Such page
> > > faults are then intercepted by KFENCE, which handles the fault
> > > gracefully by reporting a memory access error. To detect out-of-bounds
> > > writes to memory within the object's page itself, KFENCE also uses
> > > pattern-based redzones. The following figure illustrates the page
> > > layout:
> > >
> > >   ---+-----------+-----------+-----------+-----------+-----------+---
> > >      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> > >      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> > >      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> > >      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> > >      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> > >      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> > >   ---+-----------+-----------+-----------+-----------+-----------+---
> > >
> > > Guarded allocations are set up based on a sample interval (can be set
> > > via kfence.sample_interval). After expiration of the sample interval, a
> > > guarded allocation from the KFENCE object pool is returned to the main
> > > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > > next allocation is set up after the expiration of the interval.
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map.
> For x86 these addresses belong to .bss, i.e. the "kernel text mapping"
> section; isn't that the linear map?

No; the "linear map" is the "direct mapping" on x86, and the "image" or
"kernel text mapping" is a distinct VA region. The image mapping aliases
(i.e. uses the same physical pages as) a portion of the linear map, and
every page in the linear map has a struct page.

For the x86_64 virtual memory layout, see:

https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html

Originally, the kernel image lived in the linear map, but it was split
out into a distinct VA range (among other things) to permit KASLR.  When
that split was made, the x86 virt_to_*() helpers were updated to detect
when they were passed a kernel image address, and automatically fix that
up as-if they'd been handed the linear map alias of that address.

For going one-way from virt->{phys,page} that works ok, but it doesn't
survive the round-trip, and introduces redundant work into each
virt_to_*() call.
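
For illustration, a sketch of that asymmetry (some_image_symbol is a
placeholder for any symbol living in the kernel image):

#include <linux/io.h>

extern char some_image_symbol[];	/* placeholder: any in-image symbol */

static void round_trip(void)
{
	void *image_addr = some_image_symbol;		/* image-mapping VA */
	phys_addr_t pa = virt_to_phys(image_addr);	/* one-way: fixed up on x86 */
	void *lm_addr = phys_to_virt(pa);		/* returns the linear-map alias */

	/* lm_addr != image_addr, although both alias the same physical page. */
	(void)lm_addr;
}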

As it was largely arch code that was using image addresses, we didn't
bother with the fixup on arm64, as we preferred the stronger warning. At
the time I was also under the impression that on x86 they wanted to get
rid of the automatic fixup, but that doesn't seem to have happened.

Thanks,
Mark.


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Alexander Potapenko
On Tue, Sep 29, 2020 at 4:24 PM Mark Rutland  wrote:
>
> On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> > From: Alexander Potapenko 
> >
> > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > low-overhead sampling-based memory safety error detector of heap
> > use-after-free, invalid-free, and out-of-bounds access errors.
> >
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. The main motivation behind KFENCE's design is that with
> > enough total uptime KFENCE will detect bugs in code paths not typically
> > exercised by non-production test workloads. One way to quickly achieve a
> > large enough total uptime is when the tool is deployed across a large
> > fleet of machines.
> >
> > KFENCE objects each reside on a dedicated page, at either the left or
> > right page boundaries. The pages to the left and right of the object
> > page are "guard pages", whose attributes are changed to a protected
> > state, and cause page faults on any attempted access to them. Such page
> > faults are then intercepted by KFENCE, which handles the fault
> > gracefully by reporting a memory access error. To detect out-of-bounds
> > writes to memory within the object's page itself, KFENCE also uses
> > pattern-based redzones. The following figure illustrates the page
> > layout:
> >
> >   ---+-----------+-----------+-----------+-----------+-----------+---
> >      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> >      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> >      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> >      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> >      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> >      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> >   ---+-----------+-----------+-----------+-----------+-----------+---
> >
> > Guarded allocations are set up based on a sample interval (can be set
> > via kfence.sample_interval). After expiration of the sample interval, a
> > guarded allocation from the KFENCE object pool is returned to the main
> > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > next allocation is set up after the expiration of the interval.
>
> From other sub-threads it sounds like these addresses are not part of
> the linear/direct map.
For x86 these addresses belong to .bss, i.e. the "kernel text mapping"
section; isn't that the linear map?
I also don't see lm_alias being used much outside arm64 code.
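
(For reference, lm_alias() is defined in include/linux/mm.h as

#define lm_alias(x)	__va(__pa_symbol(x))

i.e. it translates a kernel-image address into its linear-map alias.)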

> Having kmalloc return addresses outside of the
> linear map is going to break anything that relies on virt<->phys
> conversions, and is liable to make DMA corrupt memory. There were
> problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> separate from kmalloc().
>
> Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.

Just checked - it doesn't scream on x86.
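
(Likely because of the automatic image-address fixup described
elsewhere in this thread; a simplified sketch of x86-64's checked
conversion from arch/x86/mm/physaddr.c under CONFIG_DEBUG_VIRTUAL, with
some checks abbreviated:

unsigned long __phys_addr(unsigned long x)
{
	unsigned long y = x - __START_KERNEL_map;

	if (x > y) {
		/* x was an image address: fix up against phys_base. */
		x = y + phys_base;
	} else {
		/* x must be a linear-map address, or the check fires. */
		x = y + (__START_KERNEL_map - PAGE_OFFSET);
		VIRTUAL_BUG_ON(!phys_addr_valid(x));
	}
	return x;
}

so an in-image .bss address is silently fixed up rather than flagged.)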

> I strongly suspect this isn't going to be safe unless you always use an
> in-place carveout from the linear map (which could be the linear alias
> of a static carveout).
>
> [...]
>
> > +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> > +{
> > +	return static_branch_unlikely(&kfence_allocation_key) ?
> > +		__kfence_alloc(s, size, flags) : NULL;
> > +}
>
> Minor (unrelated) nit, but this would be easier to read as:
>
> static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> {
> 	if (static_branch_unlikely(&kfence_allocation_key))
> 		return __kfence_alloc(s, size, flags);
> 	return NULL;
> }
>
> Thanks,
> Mark.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Mark Rutland
On Tue, Sep 29, 2020 at 04:51:29PM +0200, Marco Elver wrote:
> On Tue, 29 Sep 2020 at 16:24, Mark Rutland  wrote:
> [...]
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map. Having kmalloc return addresses outside of the
> > linear map is going to break anything that relies on virt<->phys
> > conversions, and is liable to make DMA corrupt memory. There were
> > problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> > separate from kmalloc().
> >
> > Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
> >
> > I strongly suspect this isn't going to be safe unless you always use an
> > in-place carveout from the linear map (which could be the linear alias
> > of a static carveout).
> 
> That's an excellent point, thank you! Indeed, on arm64, a version with
> naive static-pool screams with CONFIG_DEBUG_VIRTUAL.
> 
> We'll try to put together an arm64 version using a carveout as you suggest.

Great, thanks!

Just to be clear, the concerns for DMA and virt<->phys conversions also
apply to x86 (the x86 virt<->phys conversion behaviour is more forgiving
in the common case, but still has cases that can go wrong).

Other than the code to initialize the page tables for the carveout, I
think the carveout code can be generic.
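
A sketch of what such a carveout could look like (illustrative only;
KFENCE_POOL_SIZE and the function name here are placeholders, not from
the patch):

#include <linux/init.h>
#include <linux/memblock.h>

#define KFENCE_POOL_SIZE	(255 * 2 * PAGE_SIZE)	/* placeholder sizing */

char *__kfence_pool;

void __init kfence_alloc_pool(void)
{
	/* Carve the pool out of memblock early: the pages stay in the
	 * linear map, so virt<->phys round-trips keep working. */
	__kfence_pool = memblock_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
}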

Thanks,
Mark.


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Marco Elver
On Tue, 29 Sep 2020 at 16:24, Mark Rutland  wrote:
[...]
>
> From other sub-threads it sounds like these addresses are not part of
> the linear/direct map. Having kmalloc return addresses outside of the
> linear map is going to break anything that relies on virt<->phys
> conversions, and is liable to make DMA corrupt memory. There were
> problems of that sort with VMAP_STACK, and this is why kvmalloc() is
> separate from kmalloc().
>
> Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.
>
> I strongly suspect this isn't going to be safe unless you always use an
> in-place carveout from the linear map (which could be the linear alias
> of a static carveout).

That's an excellent point, thank you! Indeed, on arm64, a version with
naive static-pool screams with CONFIG_DEBUG_VIRTUAL.

We'll try to put together an arm64 version using a carveout as you suggest.

> [...]
>
> > +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> > +{
> > +	return static_branch_unlikely(&kfence_allocation_key) ?
> > +		__kfence_alloc(s, size, flags) : NULL;
> > +}
>
> Minor (unrelated) nit, but this would be easier to read as:
>
> static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> {
> 	if (static_branch_unlikely(&kfence_allocation_key))
> 		return __kfence_alloc(s, size, flags);
> 	return NULL;
> }

Will fix for v5.

Thanks,
-- Marco


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Mark Rutland
On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> From: Alexander Potapenko 
> 
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.
> 
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
> 
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error. To detect out-of-bounds
> writes to memory within the object's page itself, KFENCE also uses
> pattern-based redzones. The following figure illustrates the page
> layout:
> 
>   ---+-----------+-----------+-----------+-----------+-----------+---
>      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
>      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
>      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
>      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
>      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
>      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
>   ---+-----------+-----------+-----------+-----------+-----------+---
> 
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval, a
> guarded allocation from the KFENCE object pool is returned to the main
> allocator (SLAB or SLUB). At this point, the timer is reset, and the
> next allocation is set up after the expiration of the interval.

>From other sub-threads it sounds like these addresses are not part of
the linear/direct map. Having kmalloc return addresses outside of the
linear map is going to break anything that relies on virt<->phys
conversions, and is liable to make DMA corrupt memory. There were
problems of that sort with VMAP_STACK, and this is why kvmalloc() is
separate from kmalloc().

Have you tested with CONFIG_DEBUG_VIRTUAL? I'd expect that to scream.

I strongly suspect this isn't going to be safe unless you always use an
in-place carveout from the linear map (which could be the linear alias
of a static carveout).

[...]

> +static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
> +{
> +	return static_branch_unlikely(&kfence_allocation_key) ?
> +		__kfence_alloc(s, size, flags) : NULL;
> +}

Minor (unrelated) nit, but this would be easier to read as:

static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
	if (static_branch_unlikely(&kfence_allocation_key))
		return __kfence_alloc(s, size, flags);
	return NULL;
}
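
For context, a minimal sketch of the gating machinery the commit
message describes, using the static keys API (the key name follows the
quoted code; the re-arming logic is paraphrased in the comment):

#include <linux/jump_label.h>
#include <linux/workqueue.h>

DEFINE_STATIC_KEY_FALSE(kfence_allocation_key);

static void toggle_allocation_gate(struct work_struct *work)
{
	/* Arm the gate: the next slab allocation is redirected into
	 * __kfence_alloc(); once served, the key is disabled again and
	 * this work is re-queued after the sample interval. */
	static_branch_enable(&kfence_allocation_key);
}
static DECLARE_DELAYED_WORK(kfence_timer, toggle_allocation_gate);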

Thanks,
Mark.


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Andrey Konovalov
On Tue, Sep 29, 2020 at 3:49 PM Marco Elver  wrote:
>
> On Tue, 29 Sep 2020 at 15:48, Andrey Konovalov  wrote:
> > On Tue, Sep 29, 2020 at 3:11 PM Marco Elver  wrote:
> > >
> > > On Tue, Sep 29, 2020 at 02:42PM +0200, Andrey Konovalov wrote:
> > > [...]
> > > > > +*/
> > > > > +   index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
> > > >
> > > > Why do we subtract 1 here? We do have the metadata entry reserved for 
> > > > something?
> > >
> > > Above the declaration of __kfence_pool it says:
> > >
> > > * We allocate an even number of pages, as it simplifies calculations
> > > * to map address to metadata indices; effectively, the very first page
> > > * serves as an extended guard page, but otherwise has no special purpose.
> > >
> > > Hopefully that clarifies the `- 1` here.
> >
> > So there are two guard pages at the beginning and only then a page
> > that holds an object?
>
> Yes, correct.

OK, I see. This isn't directly clear from the comment though, at least for me :)


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Marco Elver
On Tue, 29 Sep 2020 at 15:48, Andrey Konovalov  wrote:
> On Tue, Sep 29, 2020 at 3:11 PM Marco Elver  wrote:
> >
> > On Tue, Sep 29, 2020 at 02:42PM +0200, Andrey Konovalov wrote:
> > [...]
> > > > +*/
> > > > +   index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
> > >
> > > Why do we subtract 1 here? We do have the metadata entry reserved for 
> > > something?
> >
> > Above the declaration of __kfence_pool it says:
> >
> > * We allocate an even number of pages, as it simplifies calculations
> > * to map address to metadata indices; effectively, the very first page
> > * serves as an extended guard page, but otherwise has no special purpose.
> >
> > Hopefully that clarifies the `- 1` here.
>
> So there are two guard pages at the beginning and only then a page
> that holds an object?

Yes, correct.


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Andrey Konovalov
On Tue, Sep 29, 2020 at 3:11 PM Marco Elver  wrote:
>
> On Tue, Sep 29, 2020 at 02:42PM +0200, Andrey Konovalov wrote:
> [...]
> > > +*/
> > > +   index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
> >
> > Why do we subtract 1 here? We do have the metadata entry reserved for 
> > something?
>
> Above the declaration of __kfence_pool it says:
>
> * We allocate an even number of pages, as it simplifies calculations
> * to map address to metadata indices; effectively, the very first page
> * serves as an extended guard page, but otherwise has no special purpose.
>
> Hopefully that clarifies the `- 1` here.

So there are two guard pages at the beginning and only then a page
that holds an object?


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Marco Elver
On Tue, Sep 29, 2020 at 02:42PM +0200, Andrey Konovalov wrote:
[...]
> > +*/
> > +   index = (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
> 
> Why do we subtract 1 here? We do have the metadata entry reserved for 
> something?

Above the declaration of __kfence_pool it says:

* We allocate an even number of pages, as it simplifies calculations
* to map address to metadata indices; effectively, the very first page
* serves as an extended guard page, but otherwise has no special purpose.

Hopefully that clarifies the `- 1` here.
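
To spell out the layout and the resulting math (a sketch; the helper
names are illustrative, not from the patch):

  page:     0        1        2        3        4      ...
  use:   [guard]  [guard]  [obj 0]  [guard]  [obj 1]   ...

static inline int addr_to_index(unsigned long addr)
{
	/* Object i occupies page 2 * (i + 1) of the pool; hence the "- 1". */
	return (addr - (unsigned long)__kfence_pool) / (PAGE_SIZE * 2) - 1;
}

static inline unsigned long index_to_page_addr(int index)
{
	return (unsigned long)__kfence_pool + PAGE_SIZE * 2 * (index + 1);
}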

[...]
> > +   /* Allocation and free stack information. */
> > +   int num_alloc_stack;
> > +   int num_free_stack;
> > +   unsigned long alloc_stack[KFENCE_STACK_DEPTH];
> > +   unsigned long free_stack[KFENCE_STACK_DEPTH];
> 
> It was a concious decision to not use stackdepot, right? Perhaps it
> makes sense to document the reason somewhere.

Yes; we want to avoid the dynamic allocations that stackdepot does.
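
A sketch of the fixed-storage alternative this implies, using
stack_trace_save() (the struct and wrapper here are illustrative; the
field names follow the quoted code):

#include <linux/stacktrace.h>

struct kfence_alloc_track {
	int num_alloc_stack;
	unsigned long alloc_stack[KFENCE_STACK_DEPTH];
};

static void save_alloc_stack(struct kfence_alloc_track *t)
{
	/* Record into preallocated storage -- no dynamic allocation;
	 * skip 1 frame (this helper itself). */
	t->num_alloc_stack = stack_trace_save(t->alloc_stack,
					      KFENCE_STACK_DEPTH, 1);
}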

[...]

Thanks,
-- Marco


Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-29 Thread Andrey Konovalov
On Mon, Sep 21, 2020 at 3:26 PM Marco Elver  wrote:
>
> From: Alexander Potapenko 
>
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.
>
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
>
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error. To detect out-of-bounds
> writes to memory within the object's page itself, KFENCE also uses
> pattern-based redzones. The following figure illustrates the page
> layout:
>
>   ---+-----------+-----------+-----------+-----------+-----------+---
>      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
>      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
>      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
>      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
>      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
>      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
>   ---+-----------+-----------+-----------+-----------+-----------+---
>
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval, a
> guarded allocation from the KFENCE object pool is returned to the main
> allocator (SLAB or SLUB). At this point, the timer is reset, and the
> next allocation is set up after the expiration of the interval.
>
> To enable/disable a KFENCE allocation through the main allocator's
> fast-path without overhead, KFENCE relies on static branches via the
> static keys infrastructure. The static branch is toggled to redirect the
> allocation to KFENCE. To date, we have verified by running synthetic
> benchmarks (sysbench I/O workloads) that a kernel compiled with KFENCE
> is performance-neutral compared to the non-KFENCE baseline.
>
> For more details, see Documentation/dev-tools/kfence.rst (added later in
> the series).
>
> Reviewed-by: Dmitry Vyukov 
> Co-developed-by: Marco Elver 
> Signed-off-by: Marco Elver 
> Signed-off-by: Alexander Potapenko 
> ---
> v3:
> * Reports by SeongJae Park:
>   * Remove reference to Documentation/dev-tools/kfence.rst.
>   * Remove redundant braces.
>   * Use CONFIG_KFENCE_NUM_OBJECTS instead of ARRAY_SIZE(...).
>   * Align some comments.
> * Add figure from Documentation/dev-tools/kfence.rst added later in
>   series to patch description.
>
> v2:
> * Add missing __printf attribute to seq_con_printf, and fix new warning.
>   [reported by kernel test robot ]
> * Fix up some comments [reported by Jonathan Cameron].
> * Remove 2 cases of redundant stack variable initialization
>   [reported by Jonathan Cameron].
> * Fix printf format [reported by kernel test robot ].
> * Print (in kfence-#nn) after address, to more clearly establish link
>   between first and second stacktrace [reported by Andrey Konovalov].
> * Make choice between KASAN and KFENCE clearer in Kconfig help text
>   [suggested by Dave Hansen].
> * Document CONFIG_KFENCE_SAMPLE_INTERVAL=0.
> * Shorten memory corruption report line length.
> * Make /sys/module/kfence/parameters/sample_interval root-writable for
>   all builds (to enable debugging, automatic dynamic tweaking).
> * Reports by Dmitry Vyukov:
>   * Do not store negative size for right-located objects
>   * Only cache-align addresses of right-located objects.
>   * Run toggle_allocation_gate() after KFENCE is enabled.
>   * Add empty line between allocation and free stacks.
>   * Add comment about SLAB_TYPESAFE_BY_RCU.
>   * Also skip internals for allocation/free stacks.
>   * s/KFENCE_FAULT_INJECTION/KFENCE_STRESS_TEST_FAULTS/ as FAULT_INJECTION
> is already overloaded in different contexts.
>   * Parenthesis for macro variable.
>   * Lower max of KFENCE_NUM_OBJECTS config variable.
> ---
>  MAINTAINERS|  11 +
>  include/linux/kfence.h | 174 ++
>  init/main.c|   2 +
>  lib/Kconfig.debug  |   1 +
>  lib/Kconfig.kfence |  63 
>  mm/Makefile|   1 +
>  mm/kfence/Makefile |   3 +
>  mm/kfence/core.c   | 733 +
>  mm/kfence/kfence.h | 102 ++
>  

Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-25 Thread Marco Elver
On Fri, 25 Sep 2020 at 13:24, 'SeongJae Park' via kasan-dev  wrote:
>
> On Mon, 21 Sep 2020 15:26:02 +0200 Marco Elver  wrote:
>
> > From: Alexander Potapenko 
> >
> > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > low-overhead sampling-based memory safety error detector of heap
> > use-after-free, invalid-free, and out-of-bounds access errors.
> >
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision. The main motivation behind KFENCE's design is that with
> > enough total uptime KFENCE will detect bugs in code paths not typically
> > exercised by non-production test workloads. One way to quickly achieve a
> > large enough total uptime is when the tool is deployed across a large
> > fleet of machines.
> >
> > KFENCE objects each reside on a dedicated page, at either the left or
> > right page boundaries. The pages to the left and right of the object
> > page are "guard pages", whose attributes are changed to a protected
> > state, and cause page faults on any attempted access to them. Such page
> > faults are then intercepted by KFENCE, which handles the fault
> > gracefully by reporting a memory access error. To detect out-of-bounds
> > writes to memory within the object's page itself, KFENCE also uses
> > pattern-based redzones. The following figure illustrates the page
> > layout:
> >
> >   ---+-----------+-----------+-----------+-----------+-----------+---
> >      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> >      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> >      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
> >      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
> >      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> >      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> >   ---+-----------+-----------+-----------+-----------+-----------+---
> >
> > Guarded allocations are set up based on a sample interval (can be set
> > via kfence.sample_interval). After expiration of the sample interval, a
> > guarded allocation from the KFENCE object pool is returned to the main
> > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > next allocation is set up after the expiration of the interval.
> >
> > To enable/disable a KFENCE allocation through the main allocator's
> > fast-path without overhead, KFENCE relies on static branches via the
> > static keys infrastructure. The static branch is toggled to redirect the
> > allocation to KFENCE. To date, we have verified by running synthetic
> > benchmarks (sysbench I/O workloads) that a kernel compiled with KFENCE
> > is performance-neutral compared to the non-KFENCE baseline.
> >
> > For more details, see Documentation/dev-tools/kfence.rst (added later in
> > the series).
> >
> > Reviewed-by: Dmitry Vyukov 
> > Co-developed-by: Marco Elver 
> > Signed-off-by: Marco Elver 
> > Signed-off-by: Alexander Potapenko 
> > ---
> > v3:
> > * Reports by SeongJae Park:
> >   * Remove reference to Documentation/dev-tools/kfence.rst.
> >   * Remove redundant braces.
> >   * Use CONFIG_KFENCE_NUM_OBJECTS instead of ARRAY_SIZE(...).
> >   * Align some comments.
> > * Add figure from Documentation/dev-tools/kfence.rst added later in
> >   series to patch description.
> >
> > v2:
> > * Add missing __printf attribute to seq_con_printf, and fix new warning.
> >   [reported by kernel test robot ]
> > * Fix up some comments [reported by Jonathan Cameron].
> > * Remove 2 cases of redundant stack variable initialization
> >   [reported by Jonathan Cameron].
> > * Fix printf format [reported by kernel test robot ].
> > * Print (in kfence-#nn) after address, to more clearly establish link
> >   between first and second stacktrace [reported by Andrey Konovalov].
> > * Make choice between KASAN and KFENCE clearer in Kconfig help text
> >   [suggested by Dave Hansen].
> > * Document CONFIG_KFENCE_SAMPLE_INTERVAL=0.
> > * Shorten memory corruption report line length.
> > * Make /sys/module/kfence/parameters/sample_interval root-writable for
> >   all builds (to enable debugging, automatic dynamic tweaking).
> > * Reports by Dmitry Vyukov:
> >   * Do not store negative size for right-located objects
> >   * Only cache-align addresses of right-located objects.
> >   * Run toggle_allocation_gate() after KFENCE is enabled.
> >   * Add empty line between allocation and free stacks.
> >   * Add comment about SLAB_TYPESAFE_BY_RCU.
> >   * Also skip internals for allocation/free stacks.
> >   * s/KFENCE_FAULT_INJECTION/KFENCE_STRESS_TEST_FAULTS/ as FAULT_INJECTION
> > is already overloaded in different contexts.
> >   * Parenthesis for macro variable.
> >   * Lower max of KFENCE_NUM_OBJECTS config variable.
> > ---
> >  MAINTAINERS|  11 +
> >  include/linux/kfence.h | 174 ++
> >  init/main.c|   2 +

Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

2020-09-25 Thread SeongJae Park
On Mon, 21 Sep 2020 15:26:02 +0200 Marco Elver  wrote:

> From: Alexander Potapenko 
> 
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors.
> 
> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
> 
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error. To detect out-of-bounds
> writes to memory within the object's page itself, KFENCE also uses
> pattern-based redzones. The following figure illustrates the page
> layout:
> 
>   ---+-----------+-----------+-----------+-----------+-----------+---
>      | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
>      | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
>      | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
>      | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
>      | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
>      | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
>   ---+-----------+-----------+-----------+-----------+-----------+---
> 
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval, a
> guarded allocation from the KFENCE object pool is returned to the main
> allocator (SLAB or SLUB). At this point, the timer is reset, and the
> next allocation is set up after the expiration of the interval.
> 
> To enable/disable a KFENCE allocation through the main allocator's
> fast-path without overhead, KFENCE relies on static branches via the
> static keys infrastructure. The static branch is toggled to redirect the
> allocation to KFENCE. To date, we have verified by running synthetic
> benchmarks (sysbench I/O workloads) that a kernel compiled with KFENCE
> is performance-neutral compared to the non-KFENCE baseline.
> 
> For more details, see Documentation/dev-tools/kfence.rst (added later in
> the series).
> 
> Reviewed-by: Dmitry Vyukov 
> Co-developed-by: Marco Elver 
> Signed-off-by: Marco Elver 
> Signed-off-by: Alexander Potapenko 
> ---
> v3:
> * Reports by SeongJae Park:
>   * Remove reference to Documentation/dev-tools/kfence.rst.
>   * Remove redundant braces.
>   * Use CONFIG_KFENCE_NUM_OBJECTS instead of ARRAY_SIZE(...).
>   * Align some comments.
> * Add figure from Documentation/dev-tools/kfence.rst added later in
>   series to patch description.
> 
> v2:
> * Add missing __printf attribute to seq_con_printf, and fix new warning.
>   [reported by kernel test robot ]
> * Fix up some comments [reported by Jonathan Cameron].
> * Remove 2 cases of redundant stack variable initialization
>   [reported by Jonathan Cameron].
> * Fix printf format [reported by kernel test robot ].
> * Print (in kfence-#nn) after address, to more clearly establish link
>   between first and second stacktrace [reported by Andrey Konovalov].
> * Make choice between KASAN and KFENCE clearer in Kconfig help text
>   [suggested by Dave Hansen].
> * Document CONFIG_KFENCE_SAMPLE_INTERVAL=0.
> * Shorten memory corruption report line length.
> * Make /sys/module/kfence/parameters/sample_interval root-writable for
>   all builds (to enable debugging, automatic dynamic tweaking).
> * Reports by Dmitry Vyukov:
>   * Do not store negative size for right-located objects
>   * Only cache-align addresses of right-located objects.
>   * Run toggle_allocation_gate() after KFENCE is enabled.
>   * Add empty line between allocation and free stacks.
>   * Add comment about SLAB_TYPESAFE_BY_RCU.
>   * Also skip internals for allocation/free stacks.
>   * s/KFENCE_FAULT_INJECTION/KFENCE_STRESS_TEST_FAULTS/ as FAULT_INJECTION
> is already overloaded in different contexts.
>   * Parenthesis for macro variable.
>   * Lower max of KFENCE_NUM_OBJECTS config variable.
> ---
>  MAINTAINERS|  11 +
>  include/linux/kfence.h | 174 ++
>  init/main.c|   2 +
>  lib/Kconfig.debug  |   1 +
>  lib/Kconfig.kfence |  63 
>  mm/Makefile|   1 +
>  mm/kfence/Makefile |   3 +
>  mm/kfence/core.c   | 733 +
>  mm/kfence/kfence.h | 102