On Fri, Aug 09, 2019 at 07:10:18AM -0700, Dave Hansen wrote:
> On 8/8/19 10:27 AM, Catalin Marinas wrote:
> > On Wed, Aug 07, 2019 at 01:38:16PM -0700, Dave Hansen wrote:
> >> Also, shouldn't this be converted over to an arch_prctl()?
> >
> > What do you mean by arch_prctl()? We don't have such thing, apart from
> > maybe arch_prctl_spec_ctrl_*(). We achieve the same thing with the
> > {SET,GET}_TAGGED_ADDR_CTRL macros. They could be renamed to
> > arch_prctl_tagged_addr_{set,get} or something but I don't see much
> > point.
>
> Silly me. We have an x86-specific:
>
> SYSCALL_DEFINE2(arch_prctl, int , option, unsigned long , arg2)
>
> I guess there's no ARM equivalent so you're stuck with the generic one.
>
> > What would be better (for a separate patch series) is to clean up
> > sys_prctl() and move the arch-specific options into separate
> > arch_prctl() under arch/*/kernel/. But it's not really for this series.
>
> I think it does make sense for truly arch-specific features to stay out
> of the arch-generic prctl(). Yes, I know I've personally violated this
> in the past. :)
Maybe Dave M could revive his prctl() clean-up patches which moves the
arch specific cases to the corresponding arch/*/ code
> >> What is the scope of these prctl()'s? Are they thread-scoped or
> >> process-scoped? Can two threads in the same process run with different
> >> tagging ABI modes?
> >
> > Good point. They are thread-scoped and this should be made clear in the
> > doc. Two threads can have different modes.
> >
> > The expectation is that this is invoked early during process start (by
> > the dynamic loader or libc init) while in single-thread mode and
> > subsequent threads will inherit the same mode. However, other uses are
> > possible.
>
> If that's the expectation, it would be really nice to codify it.
> Basically, you can't enable the feature if another thread is already
> been forked off.
Well, that's my expectation but I'm not a userspace developer. I don't
think there is any good reason to prevent it. It doesn't cost us
anything to support in the kernel, other than making the documentation
clearer.
> >>> +When a process has successfully enabled the new ABI by invoking
> >>> +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following
> >>> +behaviours are guaranteed:
> >>> +
> >>> +- Every currently available syscall, except the cases mentioned in
> >>> section
> >>> + 3, can accept any valid tagged pointer. The same rule is applicable to
> >>> + any syscall introduced in the future.
> >>> +
> >>> +- The syscall behaviour is undefined for non valid tagged pointers.
> >>
> >> Do you really mean "undefined"? I mean, a bad pointer is a bad pointer.
> >> Why should it matter if it's a tagged bad pointer or an untagged bad
> >> pointer?
> >
> > Szabolcs already replied here. We may have tagged pointers that can be
> > dereferenced just fine but being passed to the kernel may not be well
> > defined (e.g. some driver doing a find_vma() that fails unless it
> > explicitly untags the address). It's as undefined as the current
> > behaviour (without these patches) guarantees.
>
> It might just be nicer to point out what this features *changes* about
> invalid pointer handling, which is nothing. :) Maybe:
>
> The syscall behaviour for invalid pointers is the same for both
> tagged and untagged pointers.
Good point.
> >>> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and
> >>> + PR_SET_MM_MAP_SIZE.
> >>> +
> >>> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map
> >>> fields.
> >>> +
> >>> +Any attempt to use non-zero tagged pointers will lead to undefined
> >>> +behaviour.
> >>
> >> I wonder if you want to generalize this a bit. I think you're saying
> >> that parts of the ABI that modify the *layout* of the address space
> >> never accept tagged pointers.
> >
> > I guess our difficulty in specifying this may have been caused by
> > over-generalising. For example, madvise/mprotect came under the same
> > category but there is a use-case for malloc'ed pointers (and tagged) to
> > the kernel (e.g. MADV_DONTNEED). If we can restrict the meaning to
> > address space *layout* manipulation, we'd have mmap/mremap/munmap,
> > brk/sbrk, prctl(PR_SET_MM). Did I miss anything?. Other related syscalls
> > like mprotect/madvise preserve the layout while only changing permissions,
> > backing store, so the would be allowed to accept tags.
>
> shmat() comes to mind. I also did a quick grep for mmap_sem taken for
> write and didn't see anything else obvious jump out at me.
I'll document shmat() as not supported, together with the prctl().
As I submitted a fixup already, I propose that we allow tagged pointers
on mmap/munmap/mremap/brk. It makes the documentation simpler ;) (and
the user understanding of what is and is not allowed).
> >> It looks like the TAG_SHIFT and tag size are pretty baked into the
> >> aarch64 architecture. But, are you confident that no future
> >> implementations will want different positions or sizes? (obviously
> >> controlled by other TCR_EL1 bits)
> >
> > For the top-byte-ignore (TBI), that's been baked in the architecture
> > since ARMv8.0 and we'll have to keep the backwards compatible mode. As
> > the name implies, it's the top byte of the address and that's what the
> > document above refers to.
> >
> > With MTE, I can't exclude other configurations in the future but I'd
> > expect the kernel to present the option as a new HWCAP and the user to
> > explicitly opt in via a new prctl() flag. I seriously doubt we'd break
> > existing binaries. So, yes TAG_SHIFT may be different but so would the
> > prctl() above.
>
> Basically, what you have is a "turn tagging on" and "turn tagging off"
> call which are binary: all on or all off. How about exposing a mask:
>
> /* Replace hard-coded mask size/position: */
> unsigned long mask = prctl(GET_POSSIBLE_TAGGED_ADDR_BITS);
>
> if (mask == 0)
> // no tagging is supported obviously
>
> prctl(SET_TAGGED_ADDR_BITS, mask);
>
> // now userspace knows via 'mask' where the tag bits are
For the actual hardware memory tagging, maybe we could get the possible
bits but for TBI, as I said above, that's baked into the architecture. I
don't think it's worth the effort of getting a mask as I don't see ARM
changing this without breaking existing software. Even compiler support
like hwasan relies on the 8-bit TBI.
--
Catalin