Jeff Law wrote:

> So would a half-half (2k caller/2k callee) split like Florian has
> proposed work for you?  ie, we simply declare a caller that pushes the
> stack pointer 2k or more into the guard as broken?

My results show that a 4KB guard size is not ideal. Reserving 2KB for outgoing
args is far more than needed (it corresponds to 264 arguments), so it wastes a
large part of an already limited range.
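The 264 figure follows from the AArch64 procedure call standard (AAPCS64): the first eight integer arguments are passed in x0-x7 and each further one takes an 8-byte stack slot. A quick check, assuming every argument is a 64-bit integer:

```python
# Assumes AAPCS64: the first 8 integer arguments go in x0-x7 and every
# further argument takes one 8-byte stack slot.
REG_ARGS = 8
STACK_SLOT_BYTES = 8


def max_int_args(outgoing_bytes: int) -> int:
    """Largest number of 64-bit integer arguments whose stack-passed
    part fits in an outgoing-argument area of outgoing_bytes."""
    return outgoing_bytes // STACK_SLOT_BYTES + REG_ARGS


print(max_int_args(2048))  # 264
```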

> While I'm not really comfortable with the 2k/2k split in general, I
> think I can support it from a Red Hat point of view -- largely because
> we use 64k pages in RHEL.  So our guard is actually 64k.  Having a
> hostile call chain push 2k into the guard isn't so bad in that case.

A minimum guard size of 64KB seems reasonable even on systems with
4KB pages. However, whatever the chosen guard size, you cannot defend
against hostile code. An OS can of course increase the guard size well
beyond the minimum required, but that only reduces the probability of a
successful jump; it will never block a big unchecked alloca.
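A toy address model illustrates the point (the addresses and sizes are made up for illustration): with the stack growing down and SP sitting right at the top of the guard, an allocation that is never probed has, in the worst case, only its lowest byte touched. That byte misses the guard as soon as the size exceeds the guard, so a bigger guard only raises the threshold.

```python
KB = 1024


def first_touch_misses_guard(sp: int, size: int, guard_lo: int) -> bool:
    """Toy model: the stack grows down and the guard sits just below it,
    starting at guard_lo. An unprobed allocation moves SP to sp - size;
    in the worst case the first store hits that lowest address."""
    return sp - size < guard_lo


GUARD_LO = 0x7F00_0000  # arbitrary illustrative address

for guard in (4 * KB, 64 * KB, 1024 * KB):
    sp = GUARD_LO + guard  # SP right at the top of the guard
    # An allocation exactly the size of the guard still lands on its lowest byte...
    assert not first_touch_misses_guard(sp, guard, GUARD_LO)
    # ...but one byte more jumps straight past it.
    assert first_touch_misses_guard(sp, guard + 1, GUARD_LO)
    print(f"{guard // KB:5d}KB guard: skipped by any unprobed alloca > {guard} bytes")
```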

> In fact, with a 64k guard, the dynamic area dwarfs the outgoing args as
> the area of most concern.  And with that size guard we're far more
> likely to see an attempted jump with an unconstrained alloca rather than
> with a fairly large alloca.

Outgoing args are not an area of concern, even with a 4KB stack guard.

Unconstrained allocas are indeed the most risky: if a program has a single
alloca whose size an attacker can control, then it is 100% unsafe no matter
how many million probes you do elsewhere.

> And if there's an attacker controlled unconstrained alloca, then, well,
> we've lost because no matter the size of the guard, the attacker can
> jump it and there isn't a damn thing the callee can do about it.

Exactly. This means checking dynamic allocation by default is essential.
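Checking a dynamic allocation amounts to growing the stack in steps of at most the probe interval and touching each step. A minimal model of that, assuming the probe-then-allocate ordering discussed further down and a guard at least one probe interval in size (`probed_alloca` and the addresses are illustrative, not GCC's code):

```python
def probed_alloca(sp: int, size: int, interval: int) -> tuple[int, list[int]]:
    """Model of a probed dynamic allocation on a downward-growing stack.

    The allocation is made in chunks of at most `interval` bytes; the lowest
    byte of each chunk is probed before SP is moved there. Returns the final
    SP and the probed addresses in order."""
    probes = []
    target = sp - size
    while sp > target:
        step = min(interval, sp - target)
        probes.append(sp - step)  # touch the new lowest byte first...
        sp -= step                # ...then move SP
    return sp, probes


# Hypothetical guard occupying [GUARD_LO, GUARD_HI): 16 pages of 4KB.
GUARD_LO, GUARD_HI = 0x10000, 0x20000
sp, probes = probed_alloca(0x30000, 1 << 20, 4096)  # an oversized 1MB alloca
first_below_top = next(p for p in probes if p < GUARD_HI)
print(hex(first_below_top))  # 0x1f000: inside the guard, so it faults there
```

Because no step exceeds the interval and the guard is at least that large, the first probe below the top of the guard cannot land beneath it.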

> I believe Richard Earnshaw indicated that 4k pages were in use on some
> aarch64 systems, so I didn't try to use a larger probe interval.  Though
> I certainly considered it.  I think that's a decision that belongs in
> the hands of the target maintainers.  Though I think it's best if you're
> going to use a larger probe interval to mandate a suitable page size in
> the ABI.

The probe interval doesn't have to be the same as the (minimum) page size.
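In a simple model the constraint is only that the probe interval must not exceed the guard size; how the guard is divided into pages does not enter into it. A brute-force check (a toy model: the guard is placed at address 0, SP is assumed 16-byte aligned, and every step is a full interval with its lowest byte probed, which is the worst case):

```python
def guard_always_hit(page_size: int, guard_pages: int, interval: int) -> bool:
    """Return True if probing every `interval` bytes always touches the guard,
    for every 16-byte-aligned starting SP in a window above it."""
    guard_lo, guard_hi = 0, page_size * guard_pages
    for sp in range(guard_hi, guard_hi + 4 * interval, 16):
        probe = sp - interval
        while probe >= guard_hi:   # still above the guard: keep stepping down
            probe -= interval
        if probe < guard_lo:       # first probe below the top skipped the guard
            return False
    return True


KB = 1024
print(guard_always_hit(4 * KB, 16, 64 * KB))  # True: 64KB interval, 4KB pages
print(guard_always_hit(4 * KB, 1, 64 * KB))   # False: guard smaller than interval
```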

> Some (simpler) tracking is still needed because allocations happen in
> potentially 3 places for aarch64.  There's almost certainly cases where
> none of them individually are larger than PROBE_INTERVAL, but as a group
> they could be.

In 99% of the frames only one stack allocation is made. There are a few
cases where the stack can be adjusted twice.

> So how about this.
>
> First we have to disallow any individual allocation from allocating more
> than PROBE_INTERVAL bytes.
>
> If the total frame size is less than X (where we're still discussing X),
> we emit no probes.

... and the outgoing args are smaller than Y.

> If the total frame size is greater than X, then after the first
> allocation we probe the highest address within the just allocated
> area and we probe every PROBE_INTERVAL bytes thereafter as space is
> allocated.

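To be safe I think we first need to probe and then allocate: if SP is moved
first, a signal delivered before the probe would push its frame at an SP that
may already be past the guard, without ever touching it. Or are there going
to be extra checks in asynchronous interrupt handlers that verify whether SP is
above the stack guard?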
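Put together, the scheme under discussion might be modelled like this. It is only a sketch: X and Y are still open in this thread, so they are plain parameters here, `plan_frame` is a hypothetical helper rather than anything in GCC, and each chunk is probed before SP moves, following the probe-then-allocate ordering.

```python
def plan_frame(allocs: list[int], outgoing_args: int,
               x_limit: int, y_limit: int,
               interval: int = 4096) -> list[tuple[str, int]]:
    """Model of the proposed prologue policy.

    allocs: stack adjustments in the order the prologue makes them (bytes).
    Returns ("probe", n) for a store to [sp - n] and ("sub", n) for sp -= n."""
    # No probes only if the frame is below X *and* the outgoing args below Y.
    need_probes = sum(allocs) >= x_limit or outgoing_args >= y_limit
    steps = []
    for size in allocs:
        while size > 0:
            chunk = min(size, interval)  # no single adjustment exceeds the interval
            if need_probes:
                steps.append(("probe", chunk))  # touch the new lowest byte first
            steps.append(("sub", chunk))
            size -= chunk
    return steps


print(plan_frame([10000], outgoing_args=0, x_limit=2048, y_limit=1024))
# [('probe', 4096), ('sub', 4096), ('probe', 4096), ('sub', 4096), ('probe', 1808), ('sub', 1808)]
```

Splitting every adjustment into interval-sized chunks covers the case raised above where several allocations, each below PROBE_INTERVAL, add up to more than it.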

> PROBE_INTERVAL is currently 4k, but the aarch64 maintainers can choose
> to change that.  Note that significantly larger probe intervals may
> require tweaking the sequences -- I'm not familiar enough with the
> various immediate field limitations on aarch64 instructions to be sure
> either way on that issue.

On AArch64 a probe can reach up to 16MBytes using 2 instructions.
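The sequence is not spelled out here; one plausible pair, assumed for illustration, is a SUB with a 12-bit immediate shifted left by 12 into a scratch register, followed by a store of XZR to that address. The shifted immediate tops out at 4095 × 4096 = 16,773,120 bytes, just under 16MB. A small helper that emits such a pair for 4KB-aligned offsets (`probe_sequence` and the choice of x16 are hypothetical):

```python
PAGE = 4096
MAX_IMM12 = 4095


def probe_sequence(offset: int, scratch: str = "x16") -> list[str]:
    """Two-instruction AArch64 probe at [sp - offset].

    Assumes offset is a positive multiple of 4096 that fits in a 12-bit
    immediate shifted left by 12 (at most 4095 * 4096 bytes)."""
    if offset <= 0 or offset % PAGE or offset // PAGE > MAX_IMM12:
        raise ValueError(f"offset {offset} not encodable as imm12, lsl #12")
    return [
        f"sub {scratch}, sp, #{offset // PAGE}, lsl #12",  # scratch = sp - offset
        f"str xzr, [{scratch}]",                           # the probe itself
    ]


print(probe_sequence(64 * 1024))  # ['sub x16, sp, #16, lsl #12', 'str xzr, [x16]']
```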

Wilco
