Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

Jakub Jelinek Thu, 18 Aug 2016 06:36:49 -0700

On Thu, May 12, 2016 at 04:12:21PM +0200, Martin Liška wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value;
>  static bool asan_shadow_offset_computed;
>  static vec<char *> sanitized_sections;
>  
> +/* Set of variable declarations that are going to be guarded by
> +   use-after-scope sanitizer.  */
> +
> +static hash_set<tree> asan_handled_variables (13);


Doesn't this introduce yet another global ctor?  If yes (and we
unfortunately have already way too many), I'd strongly prefer to avoid that,
use pointer to hash_set<tree> or something similar.

> +/* Depending on POISON flag, emit a call to poison (or unpoison) stack memory
> +   allocated for local variables, localted in OFFSETS.  LENGTH is number
> +   of OFFSETS, BASE is the register holding the stack base,
> +   against which OFFSETS array offsets are relative to.  BASE_OFFSET 
> represents
> +   an offset requested by alignment and similar stuff.  */
> +
> +static void
> +asan_poison_stack_variables (rtx shadow_base, rtx base,
> +                          HOST_WIDE_INT base_offset,
> +                          HOST_WIDE_INT *offsets, int length,
> +                          tree *decls, bool poison)
> +{
> +  if (asan_sanitize_use_after_scope ())
> +    for (int l = length - 2; l > 0; l -= 2)
> +      {

I think this is unfortunate, it leads to:
        movl    $-235802127, 2147450880(%rax)
        movl    $-185335552, 2147450884(%rax)
        movl    $-202116109, 2147450888(%rax)
        movb    $-8, 2147450884(%rax)
        movb    $-8, 2147450885(%rax)
(e.g. on use-after-scope-1.c).
The asan_emit_stack_protection function already walks all the
entries in the offsets array in both of the
  for (l = length; l; l -= 2)
loops, so please handle the initial poisoning and final unpoisoning there
as well.  The goal is that for variables that you want poison-after-scope
at the start of the function (btw, I've noticed that current SVN LLVM
doesn't bother with it and thus doesn't track "use before scope" (before the
scope is entered for the first time, maybe we shouldn't either, that would
catch only compiler bugs rather than user code bugs, right?)) have 0xf8
on all corresponding bytes including the one that would otherwise have 0x01
through 0x07.  When unpoisoning at the end of the function, again you should
combine that with unpoisoning of the red zone and partial zone bytes plus
the last 0x01 through 0x07, etc.

Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK
unpoison appearing strictly before (i.e. dominating) the first (non-shadow) 
memory read
or write in the function (explicit or possible through function calls etc.)
you really don't need to unpoison (depending on whether we follow LLVM as
mentioned above then it can be removed without anything, or the decl needs
to be somehow marked and tell asan_emit_stack_protection it shouldn't poison
it at the start), and for ASAN_MARK poisoning appearing after the last
load/store in the function (post dominating those, you don't care about
noreturn though) you can combine that (remove the ASAN_MARK) with letting
asan_emit_stack_protection know it doesn't need to unpoison.

> +         char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0;
> +         for (unsigned i = 0; i < shadow_size; ++i)
> +           {
> +             emit_move_insn (var_mem, gen_int_mode (c, QImode));
> +             var_mem = adjust_address (var_mem, QImode, 1);

When you combine it with the loop, you can also use the infrastructure to
handle it 4 bytes at a time.

Another thing I've noticed is that the inline expansion of
__asan_unpoison_stack_memory you emit looks buggy.
In use-after-scope-1.c I see:
  _9 = (unsigned long) &my_char;
  _10 = _9 >> 3;
  _11 = _10 + 2147450880;
  _12 = (signed char *) _11;
  MEM[(short int *)_12] = 0;

That would be fine only for 16 byte long my_char, but we have instead 9 byte
one.  So I believe in that case we need to store
0x00, 0x01 bytes, for little endian thus 0x0100.  You could use for it
a function similarly to asan_shadow_cst, just build INTEGER_CST rather than
CONST_INT out of it.  In general, poisioning is storing 0xf8 to all affected
shadow bytes, unpoisioning should restore the state what we would emit
without use-after-scope sanitization, which is all but the last byte 0, and
the last byte 0 only if the var size is a multiple of 8, otherwise number
of valid bytes (1-7).

As for the option, it seems clang uses now
-fsanitize-address-use-after-scope option, while I don't like that much, if
they have already released some version with that option or if they are
unwilling to change, I'd go with their option.

> +         if (flag_stack_reuse != SR_NONE
> +             && flag_openacc
> +             && oacc_declare_returns != NULL)

This actually looks like preexisting OpenACC bug, I doubt the OpenACC
behavior should depend on -fstack-reuse= setting.

+      bool unpoison_var = asan_poisoned_variables.contains (t);
+      if (asan_sanitize_use_after_scope ()
+         && unpoison_var)
+       asan_poisoned_variables.remove (t);

Similarly to asan_handled_variables, I'd prefer it to be a pointer to
hash_set or something similar, so that it costs as few as possible for the
general case (no sanitization).  Similarly, querying the hash_set even for
no use-after-scope sanitization looks wrong.

+         if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())

I would say if asan_sanitize_stack_p () is false, then we should not be
doing use-after-scope sanitization (error if user requested that
explicitly).

Don't remember if I've mentioned it earlier, but for vars that are
TREE_ADDRESSABLE only because of ASAN_MARK calls, we should probably turn
them non-addressable and remove those ASAN_MARK calls, those shouldn't leak.
You can have a look at the r237814 change for how similarly
compare and exchange is special cased for the
addressables discovery (though, the ASAN_MARK case would be easier, just
drop it rather than turn it into something different).

I think I miss some testsuite coverage for what has been discussed, stuff
like:
  switch (x)
    {
      int a;
    case 1:
      int b;
      foo (&a);
      foo (&b);
      /* FALLTHRU */
    case 2:
      int c;
      foo (&a);
      foo (&b);
      foo (&c);
...
    }

On use-after-scope-goto-1.c it seems LLVM does a better job, there the goto
doesn't cross the declaration of any of the a/b/c/d/e/f vars, so doesn't
need unpoisioning of them; and another testcase is:
int main(int argc, char **argv)
{
  if (argc == 0)
  {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    int *ptr;
    int *ptr2;
    int *ptr3;
    int *ptr4;
    int *ptr5;
    int *ptr6;
    label:
      {
        a = 123; b = 123; c = 123; d = 123; e = 123; f = 123;
        ptr = &a;
        *ptr = 1;
        ptr2 = &b;
        *ptr2 = 1;
        ptr3 = &c;
        *ptr3 = 1;
        ptr4 = &d;
        *ptr4 = 1;
        ptr5 = &e;
        *ptr5 = 1;
        ptr6 = &f;
        *ptr6 = 1;
        return 0;
      }
  }
  else
    goto label;

  return 0;
}

I believe the C/C++ FEs have warnings/errors for crossing initialization of
a variable with goto, can't that infrastructure be used also for discovery
of what variables needs unpoisoning when entering a scope through a goto?
Of course for something like setjmp (if we need to do anything about it at
all, maybe longjmp just clears it), or non-local or computed goto we
probably have to be conservative, but for the common case of local goto
can't we figure out what needs to be unpoisoned either in the FEs, or during
gimplification, and emit the unpoisoning not at the label, but on the edge
between the goto and the label?  

        Jakub

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope

Reply via email to