I don't have any special expertise in this matter, but the
possibility occurs to me that if the caller is an improperly vetted
runtime linked-in agent such as a device driver, then the stack
scrubbing might accidentally or intentionally be omitted, reopening
the security hole that stack scrubbing is intended to close.

Having the scrubbing occur in the callee means the callee controls
what information is returned, making it responsible for its own
security.

Someone with a deeper understanding of the security reasons for
stack scrubbing may know whether my concern has any basis.

- Patrick McGehearty


On 7/14/2021 12:28 AM, Alexandre Oliva wrote:
I've been working on an implementation of stack scrubbing, strub for
short.  It's quite different from the one that Embecosm folks presented
at the Cauldron, in that this one aims to be machine-independent.
Instead of machine-specific tweaking of epilogue logic to zero out a
function's own stack frame, this design performs scrubbing at callers,
passing to strubbed functions a watermark pointer, that they update as
they move the stack pointer.  The caller performs the stack cleanup in
a "finally" block after the call.
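The division of labor between caller and callee can be modeled with a simulated stack.  This is a minimal sketch, assuming a downward-growing stack represented as a byte array; `sim_stack`, `sim_sp`, `strubbed_callee` and `caller` are hypothetical names for illustration, not part of the prototype.  It shows only the protocol: the callee lowers the watermark and leaves its frame dirty, and the caller zeroes everything from the watermark up to its own stack pointer once the call returns.

```c
#include <stddef.h>
#include <string.h>

/* Simulated downward-growing stack: higher indices are "older" frames. */
#define SIM_STACK_SIZE 256
static unsigned char sim_stack[SIM_STACK_SIZE];
static size_t sim_sp = SIM_STACK_SIZE;   /* empty stack: sp at the top */

/* Strubbed callee: it does not clear its own frame.  It only lowers the
   watermark it was given to the deepest stack position it used. */
static int strubbed_callee(int key, size_t *watermark)
{
    const size_t frame = 64;
    sim_sp -= frame;                          /* "prologue": allocate frame */
    if (sim_sp < *watermark)
        *watermark = sim_sp;                  /* update the watermark */
    memset(&sim_stack[sim_sp], 0xA5, frame);  /* secrets left in the frame */
    int result = key * 2;
    sim_sp += frame;                          /* "epilogue": frame NOT cleared */
    return result;
}

/* Caller: initializes the watermark to its own stack pointer, makes the
   call, then zeroes everything between the watermark and its stack
   pointer, playing the role of the "finally" block. */
static int caller(int key)
{
    size_t watermark = sim_sp;
    int r = strubbed_callee(key, &watermark);
    memset(&sim_stack[watermark], 0, sim_sp - watermark);
    return r;
}
```

C has no "finally", so this sketch ignores exceptions; in the proposal the cleanup also has to run when an exception propagates through the call.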

- functions explicitly marked for strubbing, or internal-to-a-unit
functions that use variables marked as requiring strubbing, just have
their signature modified to add the watermark pointer, and they update
the watermark at the end of the prologue and after alloca calls.

- for functions that require strubbing (say, for using variables that
require strubbing) but whose interface cannot be modified, the body is
split into a clone, and the function is turned into a wrapper that calls
the clone with its modified calling conventions, and then performs the
strubbing.  Variable argument lists and builtin apply args are passed
as extra arguments to the wrapped function, so that these features are
not obstacles to strubbing.  Large (> 4 words) arguments that are not
gimple registers are passed by reference from the wrapper to the wrapped
clone, to avoid duplicate copying.
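Both bullets can be illustrated at the source level, although the real transformation happens inside the compiler.  In this sketch the clone uses the at-calls convention (an extra watermark pointer, updated after the prologue and after a dynamic allocation, with a VLA standing in for alloca), while the wrapper keeps the original variadic interface, passes the `va_list` and the large struct by reference, and scrubs afterwards.  All names (`weighted_sum`, `strub_enter_sketch`, `strub_update_sketch`, `strub_leave_sketch`, `struct big`) are hypothetical.  The helpers assume a downward-growing stack and use the address of a local as an approximation of the stack pointer; the "leave" step only records the watermark, because actually zeroing the stack cannot be done portably from C.

```c
#include <stdarg.h>
#include <stdint.h>

/* A "large" argument: six words, more than the 4-word threshold. */
struct big { long w[6]; };

/* Hypothetical stand-ins for the watermark entry points. */
static void strub_enter_sketch(uintptr_t *wm)
{
    volatile char probe = 0;
    *wm = (uintptr_t)&probe;            /* start at the current depth */
}

static void strub_update_sketch(uintptr_t *wm)
{
    volatile char probe = 0;
    uintptr_t sp = (uintptr_t)&probe;
    if (sp < *wm)
        *wm = sp;                       /* keep the deepest point reached */
}

static uintptr_t last_leave_mark;       /* what a real scrub would start from */

static void strub_leave_sketch(const uintptr_t *wm)
{
    /* A real implementation would zero the stack from *wm up to the
       caller's stack pointer; this sketch only records the watermark. */
    last_leave_mark = *wm;
}

/* Clone holding the original body, in at-calls form: it receives a
   watermark pointer, the variable arguments as a va_list, and the
   large struct by reference. */
static long weighted_sum_clone(const struct big *b, int n, va_list ap,
                               uintptr_t *wm)
{
    strub_update_sketch(wm);            /* end of "prologue" */

    int len = n > 0 ? n : 1;            /* VLA length must be positive */
    int args[len];                      /* dynamic allocation, like alloca */
    strub_update_sketch(wm);            /* update after the allocation */

    long total = 0;
    for (int i = 0; i < n; i++) {
        args[i] = va_arg(ap, int);      /* sensitive data on the stack */
        total += b->w[i % 6] * args[i];
    }
    return total;
}

/* Wrapper with the original interface: calls the clone, then scrubs. */
long weighted_sum(struct big b, int n, ...)
{
    uintptr_t wm;
    strub_enter_sketch(&wm);

    va_list ap;
    va_start(ap, n);
    long r = weighted_sum_clone(&b, n, ap, &wm);
    va_end(ap);

    strub_leave_sketch(&wm);
    return r;
}
```

Passing `&b` rather than `b` to the clone is what avoids the second copy of the large argument.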

This is currently prototyped with an implementation that enables
strubbing for nearly every function.  Some exceptions are always_inline
functions, and externally-visible functions with attributes that prevent
cloning/splitting.

Inlining strubbed functions into non-strubbed ones is not allowed (this
would reverse the wrapping); I'm yet to figure out how to enable
inlining of a wrapped body when the wrapper gets inlined into a strubbed
function.  Furthermore, I'm yet to implement logic to prevent strubbed
functions from calling non-strubbed functions.
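The two restrictions can be stated as simple predicates.  This is a sketch with a hypothetical `struct fn_sketch` that records only whether a function's body runs under strubbing; it assumes that inlining into a strubbed caller is acceptable, and it does not model the unresolved case of inlining a wrapped body when its wrapper is inlined.

```c
#include <stdbool.h>

/* Simplified view of a function: does its body run under strubbing
   (in either mode)? */
struct fn_sketch { bool strubbed; };

/* Inlining a strubbed callee into a non-strubbed caller is rejected:
   it would reverse the wrapping, leaving the callee's locals in a frame
   that nobody scrubs.  Assumption: inlining into a strubbed caller is
   fine, since the merged frame is scrubbed along with the caller's. */
static bool may_inline(const struct fn_sketch *caller,
                       const struct fn_sketch *callee)
{
    return caller->strubbed || !callee->strubbed;
}

/* The planned, not yet implemented, check: a strubbed function must
   not call a non-strubbed one. */
static bool may_call(const struct fn_sketch *caller,
                     const struct fn_sketch *callee)
{
    return !caller->strubbed || callee->strubbed;
}
```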

The prototype bootstraps on x86_64-linux-gnu, and builds some working
cross toolchains.  I expect to contribute it not long after it's
completed.  For now, I look forward to feedback on any potentially
objectionable implementation details that I've described, and I welcome
advice on some issues described below.


I've added a builtin that returns the stack address, and 3 new entry
points in libgcc, each one also associated with a builtin: one to be
called before a strubbed function, to initialize the watermark to be
passed to it, one to update the watermark, and one to clean the stack up
to the watermark.  I'm considering making them independently inlineable,
inlining none of them at -O0, the first one at -O[1gs], the second one
at -O2, and all of them at -O3.
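The inlining plan for the three entry points can be written down as a small predicate.  This sketch assumes the plan is cumulative (each level also inlines whatever lower levels inline) and treats -O1, -Og and -Os alike; the enum names are hypothetical labels, not libgcc symbols.

```c
#include <stdbool.h>

/* The three entry points, in the order a strubbed call uses them. */
enum strub_entry { STRUB_ENTER, STRUB_UPDATE, STRUB_LEAVE };

/* Optimization levels mentioned in the plan. */
enum opt_level { OPT_O0, OPT_O1, OPT_Og, OPT_Os, OPT_O2, OPT_O3 };

/* Should entry point E be inlined at optimization level LVL? */
static bool inline_entry(enum strub_entry e, enum opt_level lvl)
{
    switch (lvl) {
    case OPT_O0:
        return false;                 /* call all three out of line */
    case OPT_O1: case OPT_Og: case OPT_Os:
        return e == STRUB_ENTER;      /* only the watermark initializer */
    case OPT_O2:
        return e != STRUB_LEAVE;      /* plus the watermark update */
    case OPT_O3:
        return true;                  /* the stack cleanup too */
    }
    return false;
}
```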

Functions and variables with strubbing functionality are to be marked
with an attribute, and I'm leaning towards naming it "strub".  For
functions, I intend the attribute to take a parameter that selects one of
the two strubbing modes or disables strubbing, in the latter case either
enabling or preventing calls from strubbing functions.  Internally, I'm using a
numeric encoding for this attribute parameter, but I'm considering using
such mnemonic terms as "at_calls", "internal", "callable", and
"disabled".  WDYT?

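One way to pair the numeric encoding with the proposed mnemonics is a lookup table.  The enum values and the function name below are hypothetical; the comments paraphrase the semantics described above, with "callable" and "disabled" both turning strubbing off and differing only in whether strubbing functions may call them.  Under this naming, and assuming the attribute syntax as proposed, a declaration might read `int f (void) __attribute__ ((strub ("internal")));`.

```c
#include <string.h>

/* Hypothetical internal encoding of the attribute parameter. */
enum strub_mode {
    STRUB_INVALID  = -1,
    STRUB_DISABLED = 0,  /* not strubbed; not callable from strubbing code */
    STRUB_AT_CALLS = 1,  /* interface gains a watermark pointer */
    STRUB_INTERNAL = 2,  /* wrapper plus clone; interface unchanged */
    STRUB_CALLABLE = 3   /* not strubbed, but callable from strubbing code */
};

/* Map a mnemonic attribute argument to its numeric encoding. */
static enum strub_mode strub_mode_from_name(const char *name)
{
    static const struct {
        const char *name;
        enum strub_mode mode;
    } table[] = {
        { "at_calls", STRUB_AT_CALLS },
        { "internal", STRUB_INTERNAL },
        { "callable", STRUB_CALLABLE },
        { "disabled", STRUB_DISABLED },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].mode;
    return STRUB_INVALID;            /* unknown mnemonic */
}
```
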
I'm also considering the possibility of adding yet another scrubbing
mode, that would select optional machine-dependent epilogue logic, as
implemented by Embecosm.  That will depend on schedule and on whether
this possibility is found to be useful.  Extending it to catch
exceptions and scrub the frames they propagate through seems more
challenging than the caller-based strubbing I've implemented, which
already supports exceptions.  I could use feedback on the usefulness of this
strubbing mode (and on any issues with the others :-)


The prototype uses modified copies of create_wrapper and expand_thunk
for the wrapping.  I find the body copying and dropping wasteful,
constraining, and, in some cases, bug-inducing (taking address of labels
comes to mind).  I wonder if it would be acceptable to introduce
wrapping logic to short-circuit the process, moving the body instead of
copying it, and introducing hooks to grant callers better control over
argument passing.  The approaches to va_list and apply_args, and to
passing some arguments by reference could presumably be useful to other
future wrapping transformations.

It would be nice if the arguments turned into by-reference ones were NOT
detached from their abstract origin, but rather were supported as a new
IPA_PARAM_OP_ kind.  Do these ideas sound worth pursuing, to make these
possibilities available to others?


Thanks in advance for feedback and advice,

