Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08

Jeff Law Wed, 12 Jul 2017 20:56:53 -0700

On 07/12/2017 06:31 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Jul 11, 2017 at 03:19:57PM -0600, Jeff Law wrote:
>> It introduces a new style of stack probing -fstack-check=clash,
>> including documentation of the new option, how it differs from
>> -fstack-check=specific, etc.
>>
>> FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
>> address that.
> 
> -fstack-check=clash is itself not such a super name either.  It's not
> checking stack, and it isn't clashing: it just does a store to every
> page the stack will touch (minus the first and last or something).
Yea.  I don't particularly like it either.  Suggestions?  I considered
"probe" as well, but "specific" also does probing.  In the end I used
the part of the marketing name of the exploits.


I also considered trying to separate the fact that we are doing stack
probing from the exact method of probing.  So something like:

-fstack-check -fstack-check-probe=frob

And the current default would map to

-fstack-check -fstack-check-probe=ahead <probing ahead of need>

I didn't particularly like that either and using something like
-fstack-check -fstack-check-probe=frob is a bit cumbersome.  It's also
the case that we're changing two key aspects relative to
-fstack-check=specific.

1. We never probe ahead of need.  ie, if a function requests N bytes of
stack space, then we'll never probe beyond N bytes.  In contrast
-fstack-check=specific will tend to probe 2-3 pages beyond the N byte
request.

2. We probe as each page is allocated.  in contrast most
-fstack-check=specific implementations allocate all the space, then
probe into the space.

THe concept I keep coming back to is that we're changing probing policy.
 as-needed vs ahead-of-need for how much to probe.  And "as-allocated"
rather than "after-allocation" for when we emit the probe.

Certainly open to ideas on the interface aspects.

> 
> How does this work for targets that want to enable this by default?  How
> does that interact with checking for stack size overflow?
I don't currently have a way to enable it by default -- for my tests I
just slam the value I want into the initializer in common.opt :-)

It's independent of stack size overflow checking.  They could (in
theory) even co-exist on ports that support stack size overflow
checking, but I didn't test that.

> 
>> However, a sufficiently smart compiler could realize that a call to a
>> noreturn function could be converted into a jump, even if the caller has
>> a frame because that frame need not be torn down.
> 
> Currently GCC will never make a tail call to a noreturn function (see
> calls.c:can_implement_as_sibling_call_p).  It would be nice (for code
> size, if nothing else) if that was changed.  But you know all this :-)
:-)  Initially when I started looking at this issue I fully expected to
need to explicitly reject tail calls to noreturn functions to make
things safe.  Imagine my surprise when I found out that we look at the
preds of the exit block to find opportunities.  Which obviously doesn't
work with noreturn calls.

On the theory that someone might want to change that one day, there is a
test in the suite that will complain if we have a tail optimized call to
a noreturn function.  At the least it will spark a discussion about the
validity of such an optimization in a world where jumping the guard is a
concern.


> 
>> @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the 
>> following drawbacks:
>>  @enumerate
>>  @item
>>  Modified allocation strategy for large objects: they are always
>> -allocated dynamically if their size exceeds a fixed threshold.
>> +allocated dynamically if their size exceeds a fixed threshold.  Note this
>> +may change the semantics of some code.
> 
> How so?  Semantics of valid (portable) code, as well?  It would be nice
> to have some detail here, not just a threat :-)
I'd have to go back and run the testsuite with generic checking enabled
by default.  I didn't dig deeply (because generic testing just isn't
interesting for so many reasons).  But what happened was we had a large
auto object, which gets turned into an alloca'd object with generic
checking.

We created that alloca'd object at the wrong lexical scope which mucked
up its expected persistence.  I'm sure I'd spot it trivially once I set
up the test again.

> 
>> +@samp{clash} is designed to prevent jumping the stack guard page as seen in
>> +stack clash style attacks.  However, it does not guarantee sufficient stack
>> +space to handle a signal in the event that the program hits the stack guard
>> +and is thus incompatible with Ada's needs.
> 
> Is there some way we could have both?
With kernel support, yes.  The kernel would set up a reserved stack area
contiguous with the guard and the two areas would move in unison.  Once
they're unable to move, the kernel could map in the reserved page(s) and
trigger the signal handler.


> 
> In principle stack-clash protection is completely orthogonal to
> -fstack-check.
To some degree, yes.  But on other levels they are tightly related.

In fact, the indirection with get_stack_check_protect allows us to use
the existing -fstack-check=specific prologue code in many targets to
give a fairly high amount of stack-clash protection.

The biggest problem is that -fstack-check=specific code allocates all
the stack it needs.  Then after allocation is complete it goes back and
probes the pages it allocated.

That leaves the target vulnerable to an async signal arriving after
allocation, but before probing and the signal handler running on a
clashed stack.


> 
>>    /* Check the stack and entirely rely on the target configuration
>> -     files, i.e. do not use the generic mechanism at all.  */
>> +     files, i.e. do not use the generic mechanism at all.  This
>> +     does not prevent stack guard jumping and stack clash style
>> +     attacks.  */
>>    FULL_BUILTIN_STACK_CHECK
>>  };
> 
>> +      else if (!strcmp (arg, "clash"))
>> +    {
>> +      /* This is the stack checking method, designed to prevent
>> +         stack-clash attacks.  */
>> +      if (!STACK_GROWS_DOWNWARD)
>> +        sorry ("-fstack-check=clash not implemented on this target");
>> +      else
>> +        opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
>> +                                    ? FULL_BUILTIN_STACK_CHECK
>> +                                    : (STACK_CHECK_STATIC_BUILTIN
>> +                                       ? STACK_CLASH_BUILTIN_STACK_CHECK
>> +                                       : GENERIC_STACK_CHECK));
>> +    }
> 
> So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get
> stack clash protection if you asked for it specifically, without warning,
> if I read that correctly?
That's an unknown.  I'd have to dig into the guts of the alpha and spu
to understand precisely how their STACK_CHECK_BUILTIN works.


> 
>> +# Return 1 if the target has support for stack probing designed
>> +# to avoid stack-clash style attacks
>> +#
>> +# This is used to restrict the stack-clash mitigation tests to
>> +# just those targets that have been explicitly supported
> 
> Missing full stop, twice.  More later in the file.
Will fix.

> 
>> +proc check_effective_target_stack_clash_protected { } {
> 
> The name is maybe not so great: nothing is protected until you actually
> use the option.  "supported", maybe?
I hate all the names...  "supports_stack_clash_protection" perhaps?


> 
>> +# Return 1 if the target's calling sequence or its ABI
>> +# create implicit stack probes at *sp at function
>> +# entry.
>> +proc check_effective_target_caller_implicit_probes { } {
> 
> "At function entry" isn't really true for Power ("when setting up a
> stack frame", instead -- and you are required to set one up before
> calling anything).
I think it's close enough -- I'll ponder a better name.  s390x doesn't
really have caller implicit probes either, but stack saves in the callee
act like them (because the caller allocates the space for the callee's
save area).


> 
>> +# The stack realignment often causes residual stack allocations and
>> +# can also make it difficult to elide loops or residual allocations
>> +# for dynamic allocations
>> +proc check_effective_target_callee_realigns_stack { } {
> 
> Does this return 1 if the callee always realigns the stack?  Or maybe
> realigns the stack?
Yea, I can clarify that it's the "callee may realign the stack".

Thanks for the comments questions.  I've got more to respond to, but
probably won't get to it tonight.

jeff

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08

Reply via email to