https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90414
Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-05-13
     Ever confirmed|0                           |1

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Matthew Malcomson from comment #0)
> Hello,
>
> I'm looking into how we can implement MTE in the compiler.
> A productive first step could be implementing HWASAN for GCC, which does a
> software implementation of MTE using the top-byte-ignore feature.

I agree, HWASAN can be a first step.

> This has already been implemented in LLVM and the design can be found at the
> link below.
> https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
>
> Hopefully we can make this change in such a way that will enable the use of
> MTE in the future.
>
> I don't know the best approach here, and would appreciate any feedback.
> From inspection it looks like most of the work is already handled by ASAN --
> especially in finding all those places that need to be instrumented -- so I
> was looking into what modifications would need to be made from that starting
> point.

Yes, that is a very common base that can just be leveraged a bit.

> I believe that tagging stack allocated memory can be done in a similar way to
> ASAN by expanding the equivalent of ASAN_MARK in a relevant manner.

Take a look at asan.c (asan_emit_stack_protection). That is where shadow
memory is generated for a stack frame. With HWASAN you won't need to emit red
zones, only tags in shadow memory.

> However, checking memory accesses seems to need a different approach to the
> current ASAN one with ASAN_CHECK.
>
> For both HWASAN and MTE we need to find the tag that a given memory access
> should be done through.
> In order to produce the best machine-code we would need to associate each
> stack variable with a tag internally.

Yes.

> In the LLVM implementation this is done by generating a random tag for the
> current stack, and associating each stack variable with an increment from
> this tag.
>
> Also, for MTE the access itself needs to be made with a tagged pointer, which
> means the current method of adding instructions before a memory access can't
> be used and instead we need to modify the memory access itself.
>
> I have some very basic questions that I would appreciate any help in
> answering.
>
> 1) Where should such passes be put?
>    I would guess that putting HWASAN and/or MTE passes in the same position
>    as the ASAN passes and updating the SANOPT pass to handle any changes
>    would be ok, but I don't have a good understanding of why they are in
>    their current position.

The current asan pass is responsible for instrumenting memory accesses with
checks (IFN_ASAN_CHECK) and for emitting IFN_ASAN_MARK
(poisoning/unpoisoning). Sanopt is responsible for lowering these internal
functions, and that is the place you'll need to tweak. Generally speaking,
you would only need a different instrumentation for:

    case IFN_ASAN_CHECK:
      no_next = asan_expand_check_ifn (&gsi, use_calls);
      break;
    case IFN_ASAN_MARK:
      no_next = asan_expand_mark_ifn (&gsi);
      break;
    case IFN_ASAN_POISON:
      no_next = asan_expand_poison_ifn (&gsi,
                                        &need_commit_edge_insert,
                                        shadow_vars_mapping);

> 2) Can we always find the base object that's being referenced from the gimple
>    statement where memory is accessed or a pointer is created?
>    If not, when is it problematic?
>    Finding the base object is pretty fundamental to getting the tag for a
>    pointer.
>    It seems like this should be possible based on a reading of the
>    documentation and looking at the TREE_CODEs that the current ASAN
>    `instrument_derefs` function works on.
>
>    (ARRAY_REF     -> first operand is the array
>     MEM_REF       -> first operand is the base
>     COMPONENT_REF -> first operand is the object
>     INDIRECT_REF  -> first operand is the pointer which should reference
>                      object
>     VAR_DECL      -> this is the object
>     BIT_FIELD_REF -> first operand is the object)

There will be cases where the base is known, and for these you could probably
instrument checks with a constant known tag. For other situations, you'll
probably need to extract the tag from the pointer. Right?

> 3) Would there be any obvious difficulties with a transformation of the form:
>      _4 = big_arrayD.3771[num_3(D)]
>
>    TO
>
>      _6 = &big_arrayD.3771[num_3(D)];
>      _7 = HWASAN_CHECK(6, _6, 4, 4);
>      _4 = *_7;
>
>    Instead of
>      _4 = big_arrayD.3771[num_3(D)]
>
>    TO
>
>      _6 = &big_arrayD.3771[num_3(D)];
>      ASAN_CHECK(6, _6, 4, 4);
>      _4 = big_arrayD.3771[num_3(D)]
>
>    which is what ASAN currently does.

No, it's just an implementation detail.

>    This new form would enable using MTE by allowing the check to modify the
>    pointer that the access will be made with (so it can have its tag).
>
> 4) Builtin memory calls look like they could be handled with HWASAN in
>    basically the same way as ASAN, while for MTE they should be fine once the
>    pointers the calls are provided are tagged.
>    Is there anything stopping that approach?
>
> Thanks,
> MM

In general, I'm interested in the implementation of this feature, but I
probably won't find the time to do it myself. However, I can help you with
that.
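
For illustration, the kind of check that a lowered HWASAN_CHECK would need to
perform can be sketched in plain C. This is only a sketch under assumptions:
the tag in the top byte and the 16-byte granule follow the LLVM design
document linked above, while the shadow array, the 1 KiB simulated address
space and every function name below are hypothetical and are not GCC or
libsanitizer code. Addresses are modelled as uint64_t values so the example
runs on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SHIFT 4     /* assumed 16-byte tagging granule */
#define TAG_SHIFT     56    /* tag is kept in the top byte of the pointer */
#define SIM_MEM_SIZE  1024  /* hypothetical simulated address space */
#define NUM_GRANULES  (SIM_MEM_SIZE >> GRANULE_SHIFT)
#define TAG_MASK      (UINT64_C (0xff) << TAG_SHIFT)

/* One shadow byte (the memory tag) per 16-byte granule; all zero initially.  */
static uint8_t shadow[NUM_GRANULES];

/* Place TAG in the top byte of ADDR, replacing any previous tag.  */
static uint64_t
set_tag (uint64_t addr, uint8_t tag)
{
  return (addr & ~TAG_MASK) | ((uint64_t) tag << TAG_SHIFT);
}

/* Extract the tag from the top byte of PTR.  */
static uint8_t
get_tag (uint64_t ptr)
{
  return (uint8_t) (ptr >> TAG_SHIFT);
}

/* Return the address part of PTR with the tag byte cleared.  */
static uint64_t
strip_tag (uint64_t ptr)
{
  return ptr & ~TAG_MASK;
}

/* Write TAG into the shadow bytes covering [ADDR, ADDR + SIZE) and return
   a pointer to ADDR that carries the same tag.  */
static uint64_t
tag_memory (uint64_t addr, size_t size, uint8_t tag)
{
  assert (size > 0 && addr + size <= SIM_MEM_SIZE);
  uint64_t first = addr >> GRANULE_SHIFT;
  uint64_t last = (addr + size - 1) >> GRANULE_SHIFT;
  for (uint64_t g = first; g <= last; g++)
    shadow[g] = tag;
  return set_tag (addr, tag);
}

/* Return true if an access of SIZE bytes through PTR is allowed, i.e. the
   pointer tag matches the shadow tag of every granule the access touches.  */
static bool
hwasan_check_sketch (uint64_t ptr, size_t size)
{
  uint64_t addr = strip_tag (ptr);
  uint8_t tag = get_tag (ptr);
  if (size == 0 || addr + size > SIM_MEM_SIZE)
    return false;
  uint64_t first = addr >> GRANULE_SHIFT;
  uint64_t last = (addr + size - 1) >> GRANULE_SHIFT;
  for (uint64_t g = first; g <= last; g++)
    if (shadow[g] != tag)
      return false;
  return true;
}
```

Here tag_memory plays the role of the stack tagging step and
hwasan_check_sketch the role of the access check. When the base object is
known at compile time, the result of get_tag could in principle be replaced by
a constant; otherwise it has to be extracted from the pointer as shown.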