Re: [RFC PATCH 3/6] mm, arm64: untag user addresses in memory syscalls

2018-03-09 Thread Evgenii Stepanov
On Fri, Mar 9, 2018 at 9:31 AM, Andrey Konovalov  wrote:
> On Fri, Mar 9, 2018 at 4:53 PM, Catalin Marinas  
> wrote:
>> On Fri, Mar 09, 2018 at 03:02:01PM +0100, Andrey Konovalov wrote:
>>> Memory subsystem syscalls accept user addresses as arguments, but don't use
>>> copy_from_user and other similar functions, so we need to handle this case
>>> separately.
>>>
>>> Untag user pointers passed to madvise, mbind, get_mempolicy, mincore,
>>> mlock, mlock2, brk, mmap_pgoff, old_mmap, munmap, remap_file_pages,
>>> mprotect, pkey_mprotect, mremap and msync.
>>>
>>> Signed-off-by: Andrey Konovalov 
>>
>> Please keep the cc list small (maybe linux-arch, linux-arm-kernel) as
>> I'm sure some lists would consider this spam.
>
> OK.
>
>>
>>>  mm/madvise.c   | 2 ++
>>>  mm/mempolicy.c | 6 ++
>>>  mm/mincore.c   | 2 ++
>>>  mm/mlock.c | 5 +
>>>  mm/mmap.c  | 9 +
>>>  mm/mprotect.c  | 2 ++
>>>  mm/mremap.c| 2 ++
>>>  mm/msync.c | 3 +++
>>
>> I'm not yet convinced these functions need to allow tagged pointers.
>> They are not doing memory accesses but rather dealing with the memory
>> range, hence an untagged pointer is better suited. There is probably a
>> reason why the "start" argument is "unsigned long" vs "void __user *"
>> (in the kernel, not the man page).
>
> So that would make the user untag pointers before passing them to these
> syscalls.
>
> Evgenii, would it be possible to untag pointers in HWASan before
> using memory subsystem syscalls? Is there a reason for untagging them
> in the kernel?

Generally, no. It's possible to intercept a libc call using symbol
interposition, but I don't know how to rewrite arguments of a raw
system call other than through ptrace, and that creates more problems
than it solves.

AFAIU, it's valid for a program to pass an address obtained from
malloc or, better, posix_memalign to an mm syscall like mprotect().
These arguments are pointers from the userspace point of view.
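For context, the libc side of this is easy to illustrate. Below is a minimal
sketch of symbol interposition (hypothetical code, not from any patch) that
strips a TBI tag before forwarding to the real mprotect(); this is exactly
what cannot be done for raw syscalls that bypass libc:

    /* Build as a shared object, load with LD_PRELOAD, link with -ldl. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdint.h>
    #include <sys/mman.h>

    int mprotect(void *addr, size_t len, int prot)
    {
            /* Drop bits 63:56, the byte TBI tells the hardware to ignore. */
            void *untagged = (void *)((uintptr_t)addr & ~(0xffULL << 56));
            int (*real)(void *, size_t, int) =
                    (int (*)(void *, size_t, int))dlsym(RTLD_NEXT, "mprotect");

            return real(untagged, len, prot);
    }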


Re: [PATCH RFC v3 22/35] arm64: mte: Enable tag storage if CMA areas have been activated

2024-02-02 Thread Evgenii Stepanov
On Thu, Jan 25, 2024 at 8:44 AM Alexandru Elisei
 wrote:
>
> Before enabling MTE tag storage management, make sure that the CMA areas
> have been successfully activated. If a CMA area fails activation, the pages
> are kept as reserved. Reserved pages are never used by the page allocator.
>
> If this happens, the kernel would have to manage tag storage for only some
> of the memory, not all of it, and that would make the code unreasonably
> complicated.
>
> Choose to disable tag storage management altogether if a CMA area fails to
> be activated.
>
> Signed-off-by: Alexandru Elisei 
> ---
>
> Changes since v2:
>
> * New patch.
>
>  arch/arm64/include/asm/mte_tag_storage.h | 12 ++
>  arch/arm64/kernel/mte_tag_storage.c  | 50 
>  2 files changed, 62 insertions(+)
>
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> index 3c2cd29e053e..7b3f6bff8e6f 100644
> --- a/arch/arm64/include/asm/mte_tag_storage.h
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -6,8 +6,20 @@
>  #define __ASM_MTE_TAG_STORAGE_H
>
>  #ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +
> +DECLARE_STATIC_KEY_FALSE(tag_storage_enabled_key);
> +
> +static inline bool tag_storage_enabled(void)
> +{
> +   return static_branch_likely(&tag_storage_enabled_key);
> +}
> +
>  void mte_init_tag_storage(void);
>  #else
> +static inline bool tag_storage_enabled(void)
> +{
> +   return false;
> +}
>  static inline void mte_init_tag_storage(void)
>  {
>  }
> diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> index 9a1a8a45171e..d58c68b4a849 100644
> --- a/arch/arm64/kernel/mte_tag_storage.c
> +++ b/arch/arm64/kernel/mte_tag_storage.c
> @@ -19,6 +19,8 @@
>
>  #include 
>
> +__ro_after_init DEFINE_STATIC_KEY_FALSE(tag_storage_enabled_key);
> +
>  struct tag_region {
> struct range mem_range; /* Memory associated with the tag storage, in PFNs. */
> struct range tag_range; /* Tag storage memory, in PFNs. */
> @@ -314,3 +316,51 @@ void __init mte_init_tag_storage(void)
> num_tag_regions = 0;
> pr_info("MTE tag storage region management disabled");
>  }
> +
> +static int __init mte_enable_tag_storage(void)
> +{
> +   struct range *tag_range;
> +   struct cma *cma;
> +   int i, ret;
> +
> +   if (num_tag_regions == 0)
> +   return 0;
> +
> +   for (i = 0; i < num_tag_regions; i++) {
> +   tag_range = &tag_regions[i].tag_range;
> +   cma = tag_regions[i].cma;
> +   /*
> +* CMA will keep the pages as reserved when the region fails
> +* activation.
> +*/
> +   if (PageReserved(pfn_to_page(tag_range->start)))
> +   goto out_disabled;
> +   }
> +
> +   static_branch_enable(&tag_storage_enabled_key);
> +   pr_info("MTE tag storage region management enabled");
> +
> +   return 0;
> +
> +out_disabled:
> +   for (i = 0; i < num_tag_regions; i++) {
> +   tag_range = &tag_regions[i].tag_range;
> +   cma = tag_regions[i].cma;
> +
> +   if (PageReserved(pfn_to_page(tag_range->start)))
> +   continue;
> +
> +   /* Try really hard to reserve the tag storage. */
> +   ret = cma_alloc(cma, range_len(tag_range), 8, true);
> +   /*
> +* Tag storage is still in use for data; memory and/or tag
> +* corruption will ensue.
> +*/
> +   WARN_ON_ONCE(ret);

cma_alloc() returns (struct page *), so this condition needs to be inverted,
and the type of `ret` changed.
Not sure how it slipped through; this is a compile error with clang.
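A sketch of the corrected hunk (assuming the current cma_alloc() signature,
which returns struct page *; the variable name is mine):

    struct page *pages;     /* instead of reusing "int ret" for this call */

    /* Try really hard to reserve the tag storage. */
    pages = cma_alloc(cma, range_len(tag_range), 8, true);
    WARN_ON_ONCE(!pages);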

> +   }
> +   num_tag_regions = 0;
> +   pr_info("MTE tag storage region management disabled");
> +
> +   return -EINVAL;
> +}
> +arch_initcall(mte_enable_tag_storage);
> --
> 2.43.0
>



Re: [PATCH RFC v3 28/35] arm64: mte: swap: Handle tag restoring when missing tag storage

2024-02-02 Thread Evgenii Stepanov
On Fri, Feb 2, 2024 at 6:56 AM Alexandru Elisei
 wrote:
>
> Hi Peter,
>
> On Thu, Feb 01, 2024 at 08:02:40PM -0800, Peter Collingbourne wrote:
> > On Thu, Jan 25, 2024 at 8:45 AM Alexandru Elisei
> >  wrote:
> > >
> > > Linux restores tags when a page is swapped in and there are tags 
> > > associated
> > > with the swap entry which the new page will replace. The saved tags are
> > > restored even if the page will not be mapped as tagged, to protect against
> > > cases where the page is shared between different VMAs, and is tagged in
> > > some, but untagged in others. By using this approach, the process can 
> > > still
> > > access the correct tags following an mprotect(PROT_MTE) on the non-MTE
> > > enabled VMA.
> > >
> > > But this poses a challenge for managing tag storage: in the scenario 
> > > above,
> > > when a new page is allocated to be swapped in for the process where it 
> > > will
> > > be mapped as untagged, the corresponding tag storage block is not 
> > > reserved.
> > > mte_restore_page_tags_by_swp_entry(), when it restores the saved tags, 
> > > will
> > > overwrite data in the tag storage block associated with the new page,
> > > leading to data corruption if the block is in use by a process.
> > >
> > > Get around this issue by saving the tags in a new xarray, this time 
> > > indexed
> > > by the page pfn, and then restoring them when tag storage is reserved for
> > > the page.
> > >
> > > Signed-off-by: Alexandru Elisei 
> > > ---
> > >
> > > Changes since rfc v2:
> > >
> > > * Restore saved tags **before** setting the PG_tag_storage_reserved bit to
> > > eliminate a brief window of opportunity where userspace can access 
> > > uninitialized
> > > tags (Peter Collingbourne).
> > >
> > >  arch/arm64/include/asm/mte_tag_storage.h |   8 ++
> > >  arch/arm64/include/asm/pgtable.h |  11 +++
> > >  arch/arm64/kernel/mte_tag_storage.c  |  12 ++-
> > >  arch/arm64/mm/mteswap.c  | 110 +++
> > >  4 files changed, 140 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> > > index 50bdae94cf71..40590a8c3748 100644
> > > --- a/arch/arm64/include/asm/mte_tag_storage.h
> > > +++ b/arch/arm64/include/asm/mte_tag_storage.h
> > > @@ -36,6 +36,14 @@ bool page_is_tag_storage(struct page *page);
> > >
> > >  vm_fault_t handle_folio_missing_tag_storage(struct folio *folio, struct vm_fault *vmf,
> > > bool *map_pte);
> > > +vm_fault_t mte_try_transfer_swap_tags(swp_entry_t entry, struct page *page);
> > > +
> > > +void tags_by_pfn_lock(void);
> > > +void tags_by_pfn_unlock(void);
> > > +
> > > +void *mte_erase_tags_for_pfn(unsigned long pfn);
> > > +bool mte_save_tags_for_pfn(void *tags, unsigned long pfn);
> > > +void mte_restore_tags_for_pfn(unsigned long start_pfn, int order);
> > >  #else
> > >  static inline bool tag_storage_enabled(void)
> > >  {
> > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > index 0174e292f890..87ae59436162 100644
> > > --- a/arch/arm64/include/asm/pgtable.h
> > > +++ b/arch/arm64/include/asm/pgtable.h
> > > @@ -1085,6 +1085,17 @@ static inline void arch_swap_invalidate_area(int type)
> > > mte_invalidate_tags_area_by_swp_entry(type);
> > >  }
> > >
> > > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> > > +#define __HAVE_ARCH_SWAP_PREPARE_TO_RESTORE
> > > +static inline vm_fault_t arch_swap_prepare_to_restore(swp_entry_t entry,
> > > + struct folio *folio)
> > > +{
> > > +   if (tag_storage_enabled())
> > > +   return mte_try_transfer_swap_tags(entry, &folio->page);
> > > +   return 0;
> > > +}
> > > +#endif
> > > +
> > >  #define __HAVE_ARCH_SWAP_RESTORE
> > >  static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> > >  {
> > > diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> > > index afe2bb754879..ac7b9c9c585c 100644
> > > --- a/arch/arm64/kernel/mte_tag_storage.c
> > > +++ b/arch/arm64/kernel/mte_tag_storage.c
> > > @@ -567,6 +567,7 @@ int reserve_tag_storage(struct page *page, int order, gfp_t gfp)
> > > }
> > > }
> > >
> > > +   mte_restore_tags_for_pfn(page_to_pfn(page), order);
> > > page_set_tag_storage_reserved(page, order);
> > >  out_unlock:
> > > mutex_unlock(&tag_blocks_lock);
> > > @@ -595,7 +596,8 @@ void free_tag_storage(struct page *page, int order)
> > > struct tag_region *region;
> > > unsigned long page_va;
> > > unsigned long flags;
> > > -   int ret;
> > > +   void *tags;
> > > +   int i, ret;
> > >
> > > ret = tag_storage_find_block(page, &start_block, &region);
> > > if (WARN_ONCE(ret, "Missing tag storage block for pfn 0x%lx", page_to_pfn(pa

Re: [RFC][PATCH 0/3] arm64 relaxed ABI

2019-02-11 Thread Evgenii Stepanov
On Mon, Feb 11, 2019 at 9:28 AM Kevin Brodsky  wrote:
>
> On 19/12/2018 12:52, Dave Martin wrote:
> > On Tue, Dec 18, 2018 at 05:59:38PM +, Catalin Marinas wrote:
> >> On Tue, Dec 18, 2018 at 04:03:38PM +0100, Andrey Konovalov wrote:
> >>> On Wed, Dec 12, 2018 at 4:02 PM Catalin Marinas  
> >>> wrote:
>  The summary of our internal discussions (mostly between kernel
>  developers) is that we can't properly describe a user ABI that covers
>  future syscalls or syscall extensions while not all syscalls accept
>  tagged pointers. So we tweaked the requirements slightly to only allow
>  tagged pointers back into the kernel *if* the originating address is
>  from an anonymous mmap() or below sbrk(0). This should cover some of the
>  ioctls or getsockopt(TCP_ZEROCOPY_RECEIVE) where the user passes a
>  pointer to a buffer obtained via mmap() on the device operations.
> 
>  (sorry for not being clear on what Vincenzo's proposal implies)
> >>> OK, I see. So I need to make the following changes to my patchset AFAIU.
> >>>
> >>> 1. Make sure that we only allow tagged user addresses that originate
> >>> from an anonymous mmap() or below sbrk(0). How exactly should this
> >>> check be performed?
> >> I don't think we should perform such checks. That's rather stating that
> >> the kernel only guarantees that the tagged pointers work if they
> >> originated from these memory ranges.
> > I concur.
> >
> > Really, the kernel should do the expected thing with all "non-weird"
> > memory.
> >
> > In lieu of a proper definition of "non-weird", I think we should have
> > some lists of things that are explicitly included, and also excluded:
> >
> > OK:
> >   kernel-allocated process stack
> >   brk area
> >   MAP_ANONYMOUS | MAP_PRIVATE
> >   MAP_PRIVATE mappings of /dev/zero
> >
> > Not OK:
> >   MAP_SHARED
> >   mmaps of non-memory-like devices
> >   mmaps of anything that is not a regular file
> >   the VDSO
> >   ...
> >
> > In general, userspace can tag memory that it "owns", and we do not assume
> > a transfer of ownership except in the "OK" list above.  Otherwise, it's
> > the kernel's memory, or the owner is simply not well defined.
>
> Agreed on the general idea: a process should be able to pass tagged pointers 
> at the
> syscall interface, as long as they point to memory privately owned by the 
> process. I
> think it would be possible to simplify the definition of "non-weird" memory 
> by using
> only this "OK" list:
> - mmap() done by the process itself, where either:
>* flags = MAP_PRIVATE | MAP_ANONYMOUS
>* flags = MAP_PRIVATE and fd refers to a regular file or a well-defined 
> list of
> device files (like /dev/zero)
> - brk() done by the process itself
> - Any memory mapped by the kernel in the new process's address space during 
> execve(),
> with the same restrictions as above ([vdso]/[vvar] are therefore excluded)
>
> > I would also like to see advice for userspace developers, particularly
> > things like (strawman, please challenge!):
>
> To some extent, one could call me a userspace developer, so I'll try to help 
> :)
>
> >   * Userspace should set tags at the point of allocation only.
>
> Yes, tags are only supposed to be set at the point of either allocation or
> deallocation/reallocation. However, allocators can in principle be nested, so 
> an
> allocator could take a region allocated by malloc() as input and subdivide it
> (changing tags in the process). That said, this suballocator must not free() 
> that
> region until all the suballocations themselves have been freed (thereby 
> restoring the
> tags initially set by malloc()).
>
> >   * If you don't know how an object was allocated, you cannot modify the
> > tag, period.
>
> Agreed, allocators that tag memory can only operate on memory with a 
> well-defined
> provenance (for instance anonymous mmap() or malloc()).
>
> >   * A single C object should be accessed using a single, fixed pointer tag
> > throughout its entire lifetime.
>
> Agreed. Allocators themselves may need to be excluded though, depending on 
> how they
> represent their managed memory.
>
> >   * Tags can be changed only when there are no outstanding pointers to
> > the affected object or region that may be used to access the object
> > or region (i.e., if the object were allocated from the C heap and
> > is it safe to realloc() it, then it is safe to change the tag; for
> > other types of allocation, analogous arguments can be applied).
>
> Tags can only be changed at the point of deallocation/reallocation. Pointers 
> to the
> object become invalid and cannot be used after it has been deallocated; memory
> > tagging makes it possible to catch such invalid usage.
>
> >   * When the kernel dereferences a pointer on userspace's behalf, it
> > shall behave equivalently to userspace dereferencing the same pointer,
> > including use of the same tag (where passed by userspace).
> 
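To make the "OK" list concrete, here is a minimal userspace sketch (assuming
an arm64 process, where Linux enables TBI for user addresses): tag a pointer
into a private anonymous mapping and hand it to a syscall that writes through
it.

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Put an arbitrary tag in the top byte; hardware ignores it on access. */
    static void *tag_ptr(void *p, uint8_t tag)
    {
            return (void *)(((uintptr_t)p & ~(0xffULL << 56)) |
                            ((uintptr_t)tag << 56));
    }

    int main(void)
    {
            /* On the "OK" list above: MAP_ANONYMOUS | MAP_PRIVATE. */
            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return 1;

            /* Under the relaxed ABI, the kernel accepts the tagged pointer. */
            return read(STDIN_FILENO, tag_ptr(p, 0x2a), 16) < 0;
    }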

Re: [RFC][PATCH 0/3] arm64 relaxed ABI

2019-02-13 Thread Evgenii Stepanov
On Wed, Feb 13, 2019 at 9:43 AM Dave Martin  wrote:
>
> On Wed, Feb 13, 2019 at 04:42:11PM +, Kevin Brodsky wrote:
> > (+Cc other people with MTE experience: Branislav, Ruben)
>
> [...]
>
> > >I'm wondering whether we can piggy-back on existing concepts.
> > >
> > >We could say that recolouring memory is safe when and only when
> > >unmapping of the page or removing permissions on the page (via
> > >munmap/mremap/mprotect) would be safe.  Otherwise, the resulting
> > >behaviour of the process is undefined.
> >
> > Is that a sufficient requirement? I don't think that anything prevents you
> > from using mprotect() on say [vvar], but we don't necessarily want to map
> > [vvar] as tagged. I'm not sure it's easy to define what "safe" would mean
> > here.
>
> I think the origin rules have to apply too: [vvar] is not a regular,
> private page but a weird, shared thing mapped for you by the kernel.
>
> Presumably userspace _cannot_ do mprotect(PROT_WRITE) on it.
>
> I'm also assuming that userspace cannot recolour memory in read-only
> pages.  That sounds bad if there's no way to prevent it.

That sounds like something we would like to do to catch out-of-bounds
reads of .rodata globals.
Another potentially interesting use case for MTE is infinite hardware
watchpoints - that would require trapping reads of individual tagging
granules, including those in read-only binary segments.

>
> [...]
>
> > >It might be reasonable to do the check in access_ok() and skip it in
> > >__put_user() etc.
> > >
> > >(I seem to remember some separate discussion about abolishing
> > >__put_user() and friends though, due to the accident risk they pose.)
> >
> > Keep in mind that with MTE, there is no need to do any explicit check when
> > accessing user memory via a user-provided pointer. The tagged user pointer
> > is directly passed to copy_*_user() or put_user(). If the load/store causes
> > a tag fault, then it is handled just like a page fault (i.e. invoking the
> > fixup handler). As far as I can tell, there's no need to do anything special
> > in access_ok() in that case.
> >
> > [The above applies to precise mode. In imprecise mode, some more work will
> > be needed after the load/store to check whether a tag fault happened.]
>
> Fair enough, I'm a bit hazy on the details as of right now..
>
> [...]
>
> > There are many possible ways to deploy MTE, and debugging is just one of
> > them. For instance, you may want to turn on heap colouring for some
> > processes in the system, including in production.
>
> To implement enforceable protection, or as a diagnostic tool for when
> something goes wrong?
>
> In the latter case it's still OK for the kernel's tag checking not to be
> exhaustive.
>
> > Regarding those cases where it is impossible to check tags at the point of
> > accessing user memory, it is indeed possible to check the memory tags at the
> > point of stripping the tag from the user pointer. Given that some MTE
> > use-cases favour performance over tag check coverage, the ideal approach
> > would be to make these checks configurable (e.g. check one granule, check
> > all of them, or check none). I don't know how feasible this is in practice.
>
> Check all granules of a massive DMA buffer?
>
> That doesn't sound feasible without explicit support in the hardware to
> have the DMA check tags itself as the memory is accessed.  MTE by itself
> doesn't provide for this IIUC (at least, it would require support in the
> platform, not just the CPU).
>
> We do not want to bake any assumptions into the ABI about whether a
> given data transfer may or may not be offloaded to DMA.  That feels
> like a slippery slope.
>
> Provided we get the checks for free in put_user/get_user/
> copy_{to,from}_user(), those will cover a lot of the non-bulk-IO
> cases.
>
>
> My assumption has been that, at this point in time, we are mainly
> aiming to support the debug/diagnostic use cases.
>
> At least, those are the low(ish)-hanging fruit.
>
> Others are better placed than me to comment on the goals here.
>
> Cheers
> ---Dave
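On checking tags at the point of stripping: for a single granule this is
cheap. A rough sketch follows (assumptions: arm64 with MTE, compiled with
-march=armv8.5-a+memtag, the granule accessible and tagged; the function name
is mine):

    #include <stdbool.h>

    /* Compare a pointer's logical tag with the allocation tag of the first
     * 16-byte granule it points to. LDG loads the allocation tag into bits
     * 59:56 of the destination register, leaving the other bits unchanged. */
    static bool logical_tag_matches(unsigned long addr)
    {
            unsigned long retagged = addr;

            asm("ldg %0, [%1]" : "+r"(retagged) : "r"(addr));
            return retagged == addr;
    }

Checking "all of them" means one LDG per granule, which is why bulk paths
such as large DMA buffers are the problem case discussed above.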


Re: [RFC PATCH 3/6] mm, arm64: untag user addresses in memory syscalls

2018-03-15 Thread Evgenii Stepanov
On Wed, Mar 14, 2018 at 10:44 AM, Catalin Marinas
 wrote:
> On Wed, Mar 14, 2018 at 04:45:20PM +0100, Andrey Konovalov wrote:
>> On Fri, Mar 9, 2018 at 6:42 PM, Evgenii Stepanov  wrote:
>> > On Fri, Mar 9, 2018 at 9:31 AM, Andrey Konovalov  
>> > wrote:
>> >> On Fri, Mar 9, 2018 at 4:53 PM, Catalin Marinas  
>> >> wrote:
>> >>> I'm not yet convinced these functions need to allow tagged pointers.
>> >>> They are not doing memory accesses but rather dealing with the memory
>> >>> range, hence an untagged pointer is better suited. There is probably a
>> >>> reason why the "start" argument is "unsigned long" vs "void __user *"
>> >>> (in the kernel, not the man page).
>> >>
>> >> So that would make the user untag pointers before passing them to these
>> >> syscalls.
>> >>
>> >> Evgenii, would it be possible to untag pointers in HWASan before
>> >> using memory subsystem syscalls? Is there a reason for untagging them
>> >> in the kernel?
>> >
>> > Generally, no. It's possible to intercept a libc call using symbol
>> > interposition, but I don't know how to rewrite arguments of a raw
>> > system call other than through ptrace, and that creates more problems
>> > than it solves.
>
> With these patches, we are trying to relax the user/kernel ABI so that
> tagged pointers can be passed into the kernel. Since this is a new ABI
> (or an extension to the existing one), it might be ok to change the libc
> so that the top byte is zeroed on specific syscalls before issuing the
> SVC.
>
> I agree that it is problematic for HWASan if it only relies on
> overriding malloc/free.
>
>> > AFAIU, it's valid for a program to pass an address obtained from
>> > malloc or, better, posix_memalign to an mm syscall like mprotect().
>> > These arguments are pointers from the userspace point of view.
>>
>> Catalin, do you think this is a good reason to have the untagging done
>> in the kernel?
>
> malloc() or posix_memalign() are C library implementations and it's the
> C library (or overridden functions) setting a tag on the returned
> pointers. Since the TBI hardware feature allows memory accesses with a
> non-zero tag, we could allow them in the kernel for syscalls performing
> such accesses on behalf of the user (e.g. get_user/put_user would not
> need to clear the tag).
>
> madvise(), OTOH, does not perform a memory access on behalf of the user,
> it's just advising the kernel about a range of virtual addresses. That's
> where I think, from an ABI perspective, it doesn't make much sense to
> allow tags into the kernel for these syscalls (even if it's simpler from
> a user space perspective).
>
> (but I don't have a very strong opinion on this ;))

I don't have a strong opinion on this, either.
Ideally, I would like tags to be fully transparent to user space
code. MM syscalls used on a malloc/memalign address are not a very
common pattern, so it might be OK to not allow tags there. But all
such code would then have to be changed to gain explicit knowledge of TBI.
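For reference, the kernel-side untagging discussed here is essentially a
sign-extension from bit 55 (the form arm64's untagged_addr() eventually
adopted): user addresses lose the tag while kernel addresses are preserved.
A standalone illustration in plain C, not kernel code:

    #include <stdint.h>
    #include <stdio.h>

    /* Equivalent of sign_extend64(addr, 55): bit 55 is replicated upward,
     * clearing the tag byte for user addresses (bit 55 == 0) and leaving
     * kernel addresses (bit 55 == 1) untouched. */
    static uint64_t untagged_addr(uint64_t addr)
    {
            return (uint64_t)((int64_t)(addr << 8) >> 8);
    }

    int main(void)
    {
            /* Tagged user pointer: tag 0x2a is stripped. */
            printf("%#llx\n", (unsigned long long)untagged_addr(0x2a007fff12345678ULL));
            /* Kernel pointer: unchanged. */
            printf("%#llx\n", (unsigned long long)untagged_addr(0xffff800012345678ULL));
            return 0;
    }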


Re: [PATCH 21/35] arm64: mte: Add in-kernel tag fault handler

2020-08-27 Thread Evgenii Stepanov
On Thu, Aug 27, 2020 at 7:56 AM Catalin Marinas  wrote:
>
> On Thu, Aug 27, 2020 at 03:34:42PM +0200, Andrey Konovalov wrote:
> > On Thu, Aug 27, 2020 at 3:10 PM Catalin Marinas  
> > wrote:
> > > On Thu, Aug 27, 2020 at 02:31:23PM +0200, Andrey Konovalov wrote:
> > > > On Thu, Aug 27, 2020 at 11:54 AM Catalin Marinas
> > > >  wrote:
> > > > > On Fri, Aug 14, 2020 at 07:27:03PM +0200, Andrey Konovalov wrote:
> > > > > > +static int do_tag_recovery(unsigned long addr, unsigned int esr,
> > > > > > +struct pt_regs *regs)
> > > > > > +{
> > > > > > + report_tag_fault(addr, esr, regs);
> > > > > > +
> > > > > > + /* Skip over the faulting instruction and continue: */
> > > > > > + arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > > > >
> > > > > Ooooh, do we expect the kernel to still behave correctly after this? I
> > > > > thought the recovery means disabling tag checking altogether and
> > > > > restarting the instruction rather than skipping over it.
> [...]
> > > > Can we disable MTE, reexecute the instruction, and then reenable MTE,
> > > > or something like that?
> > >
> > > If you want to preserve the MTE enabled, you could single-step the
> > > instruction or execute it out of line, though it's a bit more convoluted
> > > (we have a similar mechanism for kprobes/uprobes).
> > >
> > > Another option would be to attempt to set the matching tag in memory,
> > > under the assumption that it is writable (if it's not, maybe it's fine
> > > to panic). Not sure how this interacts with the slub allocator since,
> > > presumably, the logical tag in the pointer is wrong rather than the
> > > allocation one.
> > >
> > > Yet another option would be to change the tag in the register and
> > > re-execute but this may confuse the compiler.
> >
> > Which one of these would be simpler to implement?
>
> Either 2 or 3 would be simpler (re-tag the memory location or the
> pointer) with the caveats I mentioned. Also, does the slab allocator
> need to touch the memory on free with a tagged pointer? Otherwise slab
> may hit an MTE fault itself.

Changing the memory tag can cause faults in other threads, and that
could be very confusing.
Probably the safest thing is to retag the register, single-step, and
then retag it back, being careful with instructions that change the
address register (like ldr x0, [x0]).

>
> > Perhaps we could somehow only skip faulting instructions that happen
> > in the KASAN test module?.. Decoding stack trace would be an option,
> > but that's a bit weird.
>
> If you want to restrict this to the KASAN tests, just add some
> MTE-specific accessors with a fixup entry similar to get_user/put_user.
> __do_kernel_fault() (if actually called) will invoke the fixup code
> which skips the access and returns an error. This way KASAN tests can
> actually verify that tag checking works, I'd find this a lot more
> useful.
>
> --
> Catalin
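To illustrate the hazard with the retag-and-single-step approach (hypothetical
AArch64, not from any patch):

    /*
     *      ldr x0, [x0]    // tag-check fault: x0 holds the bad tagged pointer
     *
     * After single-stepping this with a fixed-up tag, x0 no longer holds the
     * address at all: it holds the loaded value, so "retagging it back" would
     * corrupt the result. The handler has to notice that the address register
     * was overwritten and skip the restore.
     */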


Re: [PATCH 00/20] kasan: boot parameters for hardware tag-based mode

2020-11-05 Thread Evgenii Stepanov
On Wed, Nov 4, 2020 at 4:02 PM Andrey Konovalov  wrote:
>
> === Overview
>
> Hardware tag-based KASAN mode [1] is intended to eventually be used in
> production as a security mitigation. Therefore there's a need for finer
> control over KASAN features and for an existence of a kill switch.
>
> This patchset adds a few boot parameters for hardware tag-based KASAN that
> allow disabling or otherwise controlling particular KASAN features.
>
> There's another planned patchset that will further optimize hardware
> tag-based KASAN, provide proper benchmarking and tests, and will fully
> enable tag-based KASAN for production use.
>
> Hardware tag-based KASAN relies on arm64 Memory Tagging Extension (MTE)
> [2] to perform memory and pointer tagging. Please see [3] and [4] for
> detailed analysis of how MTE helps to fight memory safety problems.
>
> The features that can be controlled are:
>
> 1. Whether KASAN is enabled at all.
> 2. Whether KASAN collects and saves alloc/free stacks.
> 3. Whether KASAN panics on a detected bug or not.
>
> The patch titled "kasan: add and integrate kasan boot parameters" of this
> series adds a few new boot parameters.
>
> kasan.mode allows to choose one of three main modes:
>
> - kasan.mode=off - KASAN is disabled, no tag checks are performed
> - kasan.mode=prod - only essential production features are enabled
> - kasan.mode=full - all KASAN features are enabled
>
> The chosen mode provides default control values for the features mentioned
> above. However it's also possible to override the default values by
> providing:
>
> - kasan.stack=off/on - enable stacks collection
>(default: on for mode=full, otherwise off)

I think this was discussed before, but should this be kasan.stacktrace
or something like that?
In other places "kasan stack" refers to stack instrumentation, not
stack trace collection.
Ex.: CONFIG_KASAN_STACK

> - kasan.fault=report/panic - only report tag fault or also panic
>  (default: report)
>
> If the kasan.mode parameter is not provided, it defaults to full when
> CONFIG_DEBUG_KERNEL is enabled, and to prod otherwise.
>
> It is essential that switching between these modes doesn't require
> rebuilding the kernel with different configs, as this is required by
> the Android GKI (Generic Kernel Image) initiative.
>
> === Benchmarks
>
> For now I've only performed a few simple benchmarks such as measuring
> kernel boot time and slab memory usage after boot. There's an upcoming
> patchset which will optimize KASAN further and include more detailed
> benchmarking results.
>
> The benchmarks were performed in QEMU and the results below exclude the
> slowdown caused by QEMU memory tagging emulation (as it's different from
> the slowdown that will be introduced by hardware and is therefore
> irrelevant).
>
> KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory
> impact compared to KASAN_HW_TAGS=n.
>
> kasan.mode=prod (manually excluding tagging) introduces a 3% performance
> impact and no memory impact (except the memory used by hardware to store
> tags) compared to kasan.mode=off.
>
> kasan.mode=full has about 40% performance and 30% memory impact over
> kasan.mode=prod. Both come from alloc/free stack collection.
>
> === Notes
>
> This patchset is available here:
>
> https://github.com/xairy/linux/tree/up-boot-mte-v1
>
> and on Gerrit here:
>
> https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/3707
>
> This patchset is based on v8 of "kasan: add hardware tag-based mode for
> arm64" patchset [1].
>
> For testing in QEMU hardware tag-based KASAN requires:
>
> 1. QEMU built from master [6] (use "-machine virt,mte=on -cpu max" arguments
>to run).
> 2. GCC version 10.
>
> [1] https://lkml.org/lkml/2020/11/4/1208
> [2] 
> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
> [3] https://arxiv.org/pdf/1802.09517.pdf
> [4] 
> https://github.com/microsoft/MSRC-Security-Research/blob/master/papers/2020/Security%20analysis%20of%20memory%20tagging.pdf
> [5] 
> https://source.android.com/devices/architecture/kernel/generic-kernel-image
> [6] https://github.com/qemu/qemu
>
> === History
>
> Changes RFC v2 -> v1:
> - Rebrand the patchset from fully enabling production use to partially
>   addressing that; another optimization and testing patchset will be
>   required.
> - Rebase onto v8 of KASAN_HW_TAGS series.
> - Fix "ASYNC" -> "async" typo.
> - Rework depends condition for VMAP_STACK and update config text.
> - Remove unneeded reset_tag() macro, use kasan_reset_tag() instead.
> - Rename kasan.stack to kasan.stacks to avoid confusion with stack
>   instrumentation.
> - Introduce kasan_stack_collection_enabled() and kasan_is_enabled()
>   helpers.
> - Simplify kasan_stack_collection_enabled() usage.
> - Rework SLAB_KASAN flag and metadata allocation (see the corresponding
>   patch for details).
> - Allow cache merging with KASAN_HW_TAGS when kasan.stacks 
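Putting the parameters together, boot-time selection could look like this
(illustrative command lines only; note the kasan.stack -> kasan.stacks rename
mentioned in the changelog above):

    kasan.mode=off                           # kill switch: no tag checks at all
    kasan.mode=prod kasan.fault=panic        # mitigation: essential checks, panic on bug
    kasan.mode=full kasan.stacks=on          # debugging: all checks plus alloc/free stack traces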

Re: [PATCH 00/20] kasan: boot parameters for hardware tag-based mode

2020-11-05 Thread Evgenii Stepanov
On Thu, Nov 5, 2020 at 12:55 PM Andrey Konovalov  wrote:
>
> On Thu, Nov 5, 2020 at 9:49 PM Evgenii Stepanov  wrote:
> >
> > > The chosen mode provides default control values for the features mentioned
> > > above. However it's also possible to override the default values by
> > > providing:
> > >
> > > - kasan.stack=off/on - enable stacks collection
> > >(default: on for mode=full, otherwise off)
> >
> > I think this was discussed before, but should this be kasan.stacktrace
> > or something like that?
> > In other places "kasan stack" refers to stack instrumentation, not
> > stack trace collection.
> > Ex.: CONFIG_KASAN_STACK
>
> Forgot to update it here, but it's kasan.stacks now (with an s at the
> end). kasan.stacktrace might be better, although it's somewhat long.
> WDYT?

I like kasan.stacktrace, but I would not insist.