[Bug sanitizer/90414] [Feature] Implementing HWASAN (and eventually MTE)

marxin at gcc dot gnu.org Sun, 12 May 2019 23:47:50 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414


Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-05-13
     Ever confirmed|0                           |1

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Matthew Malcomson from comment #0)
> Hello,
> 
> I'm looking into how we can implement MTE in the compiler.
> A productive first step could be implementing HWASAN for GCC, which does a
> software implementation of MTE using the top-byte-ignore feature.

Agree with that, HWASAN can be a first step.
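
As a rough illustration of what "software MTE via top-byte-ignore" means, here is a minimal sketch. It assumes 64-bit addresses with the tag kept in bits 56..63; the helper names are hypothetical and are not part of GCC or of any sanitizer runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the tag lives in the top byte (bits 56..63) of a 64-bit
   address, which the hardware ignores when top-byte-ignore is enabled.
   All names below are hypothetical.  */
#define TAG_SHIFT 56
#define TAG_MASK ((uint64_t) 0xff << TAG_SHIFT)

/* Return ADDR with its top byte replaced by TAG.  */
static inline uint64_t
set_tag (uint64_t addr, uint8_t tag)
{
  return (addr & ~TAG_MASK) | ((uint64_t) tag << TAG_SHIFT);
}

/* Return the tag stored in the top byte of ADDR.  */
static inline uint8_t
get_tag (uint64_t addr)
{
  return (uint8_t) (addr >> TAG_SHIFT);
}

/* Return ADDR with the top byte cleared.  */
static inline uint64_t
strip_tag (uint64_t addr)
{
  return addr & ~TAG_MASK;
}

/* Usage: tagging and untagging round-trips the address.  */
static inline void
tag_roundtrip_demo (void)
{
  uint64_t tagged = set_tag (0x1000, 0x2a);
  assert (get_tag (tagged) == 0x2a);
  assert (strip_tag (tagged) == 0x1000);
}
```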

> 
> This has already been implemented in LLVM and the design can be found at the
> link below.
> https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> 
> 
> Hopefully we can make this change in such a way that will enable the use of
> MTE in the future.
> 
> 
> I don't know the best approach here, and would appreciate any feedback.
> From inspection it looks like most of the work is already handled by ASAN --
> especially in finding all those places that need to be instrumented -- so I
> was looking into what modifications would need to be made from that starting
> point.

Yes, that is a large common base that can be leveraged with small changes.

> 
> 
> I believe that tagging stack allocated memory can be done in a similar way to
> ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.

Take a look at asan.c (asan_emit_stack_protection). That is the place where the
shadow memory for a stack frame is generated. With HWASAN you won't need to
emit red zones, only tags in shadow memory.
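
To make "only tags in shadow memory" concrete, here is a toy sketch. It assumes one shadow byte per 16-byte granule (the granularity described in the LLVM design document) and a small fixed arena standing in for a stack frame. The names are hypothetical and this is not what asan_emit_stack_protection emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: 16-byte granules and a 256-byte toy arena in place of a
   real stack frame.  All names are hypothetical.  */
#define GRANULE 16
#define ARENA_SIZE 256

/* One shadow byte (the tag) per granule of the arena.  */
static uint8_t shadow[ARENA_SIZE / GRANULE];

/* Tag the region [OFFSET, OFFSET + SIZE) of the arena with TAG.
   No red zones are written; a partially used last granule is tagged
   as a whole.  */
static inline void
tag_region (size_t offset, size_t size, uint8_t tag)
{
  assert (offset % GRANULE == 0);
  assert (offset + size <= ARENA_SIZE);
  size_t granules = (size + GRANULE - 1) / GRANULE;
  memset (&shadow[offset / GRANULE], tag, granules);
}
```

For example, tag_region (32, 20, 0x11) tags the two granules that cover bytes 32..51 and leaves the neighbouring shadow bytes untouched.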

> 
> However, checking memory accesses seems to need a different approach to the
> current ASAN one with ASAN_CHECK.
> 
> For both HWASAN and MTE we need to find the tag that a given memory access
> should be done through.
> In order to produce the best machine-code we would need to associate each
> stack variable with a tag internally.

Yes.

> In the LLVM implementation this is done by generating a random tag for the
> current stack, and associating each stack variable with an increment from
> this tag.
> 
> Also, for MTE the access itself needs to be made with a tagged pointer, which
> means the current method of adding instructions before a memory access can't
> be used and instead we need to modify the memory access itself.
> 
> 
> I have some very basic questions that I would appreciate any help in
> answering.
> 
> 1) Where should such passes be put?
>    I would guess that putting HWASAN and/or MTE passes in the same position
>    as the ASAN passes and updating the SANOPT pass to handle any changes
>    would be ok, but I don't have a good understanding of why they are in
>    their current position.

The current asan pass is responsible for instrumenting memory accesses with
checks (IFN_ASAN_CHECK) and for emitting IFN_ASAN_MARK (poisoning/unpoisoning).

Sanopt is responsible for lowering these internal functions, and that is the
place you will need to tweak. Generally speaking, you would only need
a different instrumentation for:

                case IFN_ASAN_CHECK:
                  no_next = asan_expand_check_ifn (&gsi, use_calls);
                  break;
                case IFN_ASAN_MARK:
                  no_next = asan_expand_mark_ifn (&gsi);
                  break;
                case IFN_ASAN_POISON:
                  no_next = asan_expand_poison_ifn (&gsi,
                                                    &need_commit_edge_insert,
                                                    shadow_vars_mapping);

> 
> 2) Can we always find the base object that's being referenced from the gimple
>    statement where memory is accessed or a pointer is created?
>    If not, when is it problematic?
>    Finding the base object is pretty fundamental to getting the tag for a
>    pointer.
>    It seems like this should be possible based on a reading of the
>    documentation and looking at the TREE_CODEs that the current ASAN
>    `instrument_derefs` function works on.
> 
>    (ARRAY_REF     -> first operand is the array
>     MEM_REF       -> first operand is the base
>     COMPONENT_REF -> first operand is the object
>     INDIRECT_REF  -> first operand is the pointer which should reference
>                      object
>     VAR_DECL      -> this is the object
>     BIT_FIELD_REF -> first operand is the object)

There will be cases where the base is known, and for these you could probably
instrument checks with a constant, known tag. In other situations you'll
probably need to extract the tag from the pointer. Right?
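
The two cases can be sketched roughly as follows. This is a toy model, not the code sanopt would expand to: it assumes a tag in the top byte of a 64-bit address, 16-byte granules, and a tiny shadow array indexed from address 0. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: tag in bits 56..63, 16-byte granules, and a toy shadow
   covering addresses 0..255.  All names are hypothetical.  */
#define GRANULE 16
#define SHADOW_SLOTS 16
#define TAG_SHIFT 56
#define TAG_MASK ((uint64_t) 0xff << TAG_SHIFT)

static uint8_t shadow_tags[SHADOW_SLOTS];

/* Base unknown: extract the tag from the pointer itself and compare it
   with the tag recorded in shadow memory for the accessed granule.  */
static inline bool
access_ok (uint64_t tagged_addr)
{
  uint8_t ptr_tag = (uint8_t) (tagged_addr >> TAG_SHIFT);
  uint64_t addr = tagged_addr & ~TAG_MASK;
  assert (addr / GRANULE < SHADOW_SLOTS);
  return shadow_tags[addr / GRANULE] == ptr_tag;
}

/* Base known: the tag of the object is a compile-time constant, so no
   tag extraction from the pointer is needed.  */
static inline bool
access_ok_known_tag (uint64_t addr, uint8_t known_tag)
{
  assert (addr / GRANULE < SHADOW_SLOTS);
  return shadow_tags[addr / GRANULE] == known_tag;
}
```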

> 
> 3) Would there be any obvious difficulties with a transformation of the form:
>       _4 = big_arrayD.3771[num_3(D)]
>       
>       TO
>       
>       _6 = &big_arrayD.3771[num_3(D)];
>       _7 = HWASAN_CHECK(6, _6, 4, 4);
>       _4 = *_7;
> 
>    Instead of
>       _4 = big_arrayD.3771[num_3(D)]
>       
>       TO
>       
>       _6 = &big_arrayD.3771[num_3(D)];
>       ASAN_CHECK(6, _6, 4, 4);
>       _4 = big_arrayD.3771[num_3(D)]
> 
>    which is what ASAN currently does.

No, it's just an implementation detail.
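
For illustration only, the proposed form (a check that yields the pointer the access is then made through) could look like this in plain C. The function names and the fixed tag are hypothetical stand-ins; a real HWASAN_CHECK would be an internal function lowered by sanopt, not a C call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object and the tag assumed to be assigned to it.  */
static int big_array[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
static const uint8_t big_array_tag = 0x2a;

/* Toy stand-in for the check: abort on a tag mismatch, otherwise return
   the pointer the access should use.  Here it is returned unchanged; for
   MTE this is where a tagged pointer could be produced instead.  */
static inline int *
toy_hwasan_check (int *addr, uint8_t ptr_tag)
{
  if (ptr_tag != big_array_tag)
    abort ();
  return addr;
}

/* Mirrors the proposed sequence: take the address, check it, then load
   through the pointer returned by the check.  */
static inline int
load_element (unsigned num, uint8_t ptr_tag)
{
  assert (num < 8);
  int *p6 = &big_array[num];
  int *p7 = toy_hwasan_check (p6, ptr_tag);
  return *p7;
}
```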

>    This new form would enable using MTE by allowing the check to modify the
>    pointer that the access will be made with (so it can have its tag).
> 
> 4) Builtin memory calls look like they could be handled with HWASAN in
>    basically the same way as ASAN, while for MTE they should be fine once the
>    pointers the calls are provided are tagged.
>    Is there anything stopping that approach?
> 
> 
> 
> Thanks,
> MM

In general, I'm interested in the implementation of this feature, but I
probably won't find the time to do it myself. However, I can help you with it.
