I don't have any special expertise in this matter, but it occurs to me that if the caller is an improperly vetted agent linked in at run time, such as a device driver, then the stack scrubbing might accidentally or intentionally be omitted, reopening the security hole that stack scrubbing is intended to close.
Having the scrubbing occur in the callee means the callee controls what information is returned, making it responsible for its own security. Someone with a deeper understanding of the security reasons for stack scrubbing may know whether my concern has any basis.

- Patrick McGehearty

On 7/14/2021 12:28 AM, Alexandre Oliva wrote:
I've been working on an implementation of stack scrubbing, strub for short. It's quite different from the one that Embecosm folks presented at the Cauldron, in that this one aims to be machine-independent. Instead of machine-specific tweaking of epilogue logic to zero out a function's own stack frame, this design performs scrubbing at callers, passing to strubbed functions a watermark pointer that they update as they move the stack pointer. The caller performs the stack cleanup as a "finally" block after the call.

- functions explicitly marked for strubbing, or internal-to-a-unit functions that use variables marked as requiring strubbing, just have their signature modified to add the watermark pointer, and they update the watermark at the end of the prologue and after alloca calls.

- for functions that require strubbing (say, for using variables that require strubbing) but whose interface cannot be modified, the body is split into a clone, and the function is turned into a wrapper that calls the clone with its modified calling conventions, and then performs the strubbing. Variable argument lists and builtin apply args are passed as extra arguments to the wrapped function, so that these features are not obstacles to strubbing. Large (> 4 words) arguments that are not gimple registers are passed by reference from the wrapper to the wrapped clone, to avoid duplicate copying.

This is currently prototyped with an implementation that enables strubbing for nearly every function. Some exceptions are always_inline functions, and externally-visible functions with attributes that prevent cloning/splitting.

Inlining strubbed functions into non-strubbed ones is not allowed (this would reverse the wrapping); I'm yet to figure out how to enable inlining of a wrapped body when the wrapper gets inlined into a strubbed function. Furthermore, I'm yet to implement logic to prevent strubbed functions from calling non-strubbed functions.
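To make the caller-side scheme described above easier to picture, here is a minimal sketch in portable C. It is a toy model, not the GCC implementation: a byte array stands in for a downward-growing machine stack, and every name in it (stack_mem, sim_sp, push_frame, check_pin, check_pin_clone, stack_is_clean) is hypothetical. It models only the wrapper/clone split and the watermark; exceptions (the "finally" aspect) are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: a byte array plays the role of the machine stack,
   growing downward from STACK_SIZE toward 0. All names are hypothetical. */
#define STACK_SIZE 256
unsigned char stack_mem[STACK_SIZE];
size_t sim_sp = STACK_SIZE;             /* simulated stack pointer */

/* Reserve n bytes of simulated stack and lower the watermark to the
   deepest point reached so far. */
unsigned char *push_frame(size_t n, size_t *watermark)
{
    sim_sp -= n;
    if (sim_sp < *watermark)
        *watermark = sim_sp;
    return &stack_mem[sim_sp];
}

/* The "clone": the original body, with the watermark pointer added to
   its signature. It leaves sensitive bytes behind when it returns. */
int check_pin_clone(const char *pin, size_t *watermark)
{
    size_t saved_sp = sim_sp;
    char *secret = (char *)push_frame(8, watermark);
    memcpy(secret, "4711", 5);          /* sensitive local data */
    int ok = strcmp(pin, secret) == 0;
    sim_sp = saved_sp;                  /* frame popped, bytes remain */
    return ok;
}

/* The "wrapper": keeps the original interface, initializes the
   watermark, calls the clone, then scrubs up to its own stack pointer. */
int check_pin(const char *pin)
{
    size_t watermark = sim_sp;
    int ok = check_pin_clone(pin, &watermark);
    memset(&stack_mem[watermark], 0, sim_sp - watermark);
    return ok;
}

/* Helper for inspection: is everything below the stack pointer zero? */
int stack_is_clean(void)
{
    for (size_t i = 0; i < sim_sp; i++)
        if (stack_mem[i] != 0)
            return 0;
    return 1;
}
```

In this model, calling check_pin_clone directly leaves the secret in the dead frame, so stack_is_clean() returns 0 afterwards; calling check_pin returns the same result and leaves the simulated stack zeroed.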
The prototype bootstraps on x86_64-linux-gnu, and builds some working cross toolchains. I expect to contribute it not long after it's completed. For now, I look forward to feedback on any potentially objectionable implementation details that I've described, and I welcome advice on some issues described below.

I've added a builtin that returns the stack address, and 3 new entry points in libgcc, each one also associated with a builtin: one to be called before a strubbed function, to initialize the watermark to be passed to it, one to update the watermark, and one to clean the stack up to the watermark. I'm considering making them independently inlineable, inlining none of them at -O0, the first one at -O[1gs], the second one at -O2, and all of them at -O3.

Functions and variables with strubbing functionality are to be marked with an attribute, and I'm leaning towards naming it "strub". For functions, I intend the attribute to take a parameter, to select between the two strubbing modes, or to disable strubbing, whether enabling or preventing calls from strubbing functions. Internally, I'm using a numeric encoding for this attribute parameter, but I'm considering using such mnemonic terms as "at_calls", "internal", "callable", and "disabled". WDYT?

I'm also considering the possibility of adding yet another scrubbing mode, that would select optional machine-dependent epilogue logic, as implemented by Embecosm. That will depend on schedule and on whether this possibility is found to be useful. Extending it to catch exceptions and perform strubbing of the propagating frame seems more challenging than the caller-based strubbing I've implemented, with exception support. I could use feedback on the usefulness of this strubbing mode (and on any issues with the others :-)

The prototype uses modified copies of create_wrapper and expand_thunk for the wrapping.
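As a rough illustration of the three entry points mentioned above (initialize the watermark, update it, clean the stack up to it), the following sketch shows one way they could behave. It is a toy model under stated assumptions: the stack is a byte array growing downward, and the names toy_stack, toy_sp, toy_strub_enter, toy_strub_update, toy_strub_leave and toy_callee are all hypothetical, not libgcc symbols or builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a byte array stands in for a downward-growing stack.
   Every name here is hypothetical; none is a libgcc symbol. */
enum { TOY_STACK_SIZE = 128 };
unsigned char toy_stack[TOY_STACK_SIZE];
size_t toy_sp = TOY_STACK_SIZE;         /* simulated stack pointer */

/* 1. Before calling a strubbed function: start the watermark at the
      caller's current stack position. */
void toy_strub_enter(size_t *watermark)
{
    *watermark = toy_sp;
}

/* 2. Inside the strubbed function, after the prologue and after each
      alloca-like allocation: lower the watermark if the stack grew. */
void toy_strub_update(size_t *watermark)
{
    if (toy_sp < *watermark)
        *watermark = toy_sp;
}

/* 3. After the strubbed function returns: zero everything between the
      watermark and the caller's stack position. */
void toy_strub_leave(const size_t *watermark)
{
    if (*watermark < toy_sp)
        memset(&toy_stack[*watermark], 0, toy_sp - *watermark);
}

/* A strubbed callee: a fixed 16-byte frame plus an alloca-like
   extension of `extra` bytes, both filled with marker data. */
void toy_callee(size_t extra, size_t *watermark)
{
    size_t saved = toy_sp;
    toy_sp -= 16;                       /* prologue */
    toy_strub_update(watermark);
    memset(&toy_stack[toy_sp], 0xAA, 16);
    toy_sp -= extra;                    /* stands in for alloca(extra) */
    toy_strub_update(watermark);
    memset(&toy_stack[toy_sp], 0xBB, extra);
    toy_sp = saved;                     /* epilogue: data is left behind */
}
```

A caller in this model would run toy_strub_enter, then toy_callee, then toy_strub_leave, in that order. Keeping the three steps as separate functions mirrors the idea of making them independently inlineable at different optimization levels.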
I find the body copying and dropping wasteful, constraining, and, in some cases, bug-inducing (taking address of labels comes to mind). I wonder if it would be acceptable to introduce wrapping logic to short-circuit the process, moving the body instead of copying it, and introducing hooks to grant callers better control over argument passing.

The approaches to va_list and apply_args, and to passing some arguments by reference, could presumably be useful to other future wrapping transformations. It would be nice if the arguments turned into by-reference were NOT detached from their abstract origin, but rather were supported as a new IPA_PARAM_OP_ kind. Do these sound worth pursuing, to make these possibilities available to others?

Thanks in advance for feedback and advice,