[ARM] Fix PR85434: spill of stack protector's guard address
On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by loading its address first before loading its value from it as part of the stack_protect_set or stack_protect_check insn pattern. This creates the risk of spilling between the two. It is particularly likely on Aarch32 when compiling PIC code because - computing the address takes several instructions (first compute the GOT base and then the GOT entry by adding an offset) which increases the likelyhood of CSE - the address computation can be CSEd due to the GOT entry computation being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC of a MEM like on AArche64. This patch address both issues by (i) adding some scheduler barriers around the stack protector code and (ii) making all memory loads involved in computing the guard's address volatile. The use of volatile rather than unspec was chosen so that the patterns for computing the guard address can be the same as for normal global variable access thus reusing more code. Finally the patch also improves the documentation to mention the need to be careful when computing the address of the guard. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-27 Thomas Preud'homme PR target/85434 * cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around stack protector code. * function.c (stack_protect_epilogue): Likewise. * config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare. * config/arm/arm.md (calculate_pic_address): Mark memory volatile if is computing address of stack protector's guard. (calculate_pic_address splitter): Likewise. * config/arm/arm.c (require_pic_register): Add parameter to control whether to insert instruction at the end of the instruction stream. (legitimize_pic_address): Force computing PIC address at the end of instruction stream and adapt logic to change in calculate_pic_address insn pattern. (arm_stack_chk_guard_decl_p): New function. (arm_emit_call_insn): Adapt to change in require_pic_register(). * target.def (TARGET_STACK_PROTECT_GUARD): Document requirement on guard's address computation to be careful about not spilling. * doc/tm.texi: Regenerate. *** gcc/testsuite/ChangeLog *** 2018-04-27 Thomas Preud'homme PR target/85434 * gcc.target/arm/pr85434.c: New testcase. Testing: The code has been boostraped on an Armv8-A machine targeting: - Aarch32 ARM mode with -mfpu=neon-fpv4 and hardfloat ABI - Aarch64 Testsuite has been run for the following sets of flag: - arm-eabi-aem/-mthumb/-march=armv4t - arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp - arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard (thereby testing the code for ARM, Thumb-2 and Thumb-1 mode) without any regression. Is it ok for trunk? Best regards, Thomas From 76c48e31130f212721addeeca830477e3b6f5e10 Mon Sep 17 00:00:00 2001 From: Thomas Preud'homme Date: Mon, 23 Apr 2018 14:37:11 +0100 Subject: [PATCH] [ARM] Fix PR85434: spill of stack protector's guard address On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by loading its address first before loading its value from it as part of the stack_protect_set or stack_protect_check insn pattern. This creates the risk of spilling between the two. It is particularly likely on Aarch32 when compiling PIC code because - computing the address takes several instructions (first compute the GOT base and then the GOT entry by adding an offset) which increases the likelyhood of CSE - the address computation can be CSEd due to the GOT entry computation being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC of a MEM like on AArche64. This patch address both issues by (i) adding some scheduler barriers around the stack protector code and (ii) making all memory loads involved in computing the guard's address volatile. The use of volatile rather than unspec was chosen so that the patterns for computing the guard address can be the same as for normal global variable access thus reusing more code. Finally the patch also improves the documentation to mention the need to be careful when computing the address of the guard. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-27 Thomas Preud'homme * cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around stack protector code. * function.c (stack_protect_epilogue): Likewise. * config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare. * config/arm/arm.md (calculate_pic_address): Mark memory volatile if is computing address of stack protector's guard. (calculate_pic_address splitter): Likewise. * config/arm/arm.c (require_pic_register): Add parameter to control whether to insert instruction at the end of the instru
[ARM] Fix PR85434: spill of stack protector's guard address
I'll make a fool of myself but I still have further questions if you don't mind (see inline). On Friday, 4 May 2018, Segher Boessenkool wrote: > Hi! > > On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote: >> As mentionned in the ticket this was my first thought but this means >> making the pattern aware of all the possible way the address could be >> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many >> scratch registers are needed. I'd rather reuse the existing pattern as >> much as possible to make sure they are well tested. Ideally I wanted a >> way to mark a REG RTX so that it is never spilled and such that the >> mark is propagated when the register is moved to another register or >> propagated. But that is a bigger change so decided it should be an >> improvement for later but needed another solution right now. > > How would that work, esp. for pseudos? If too many regs have such a > mark then the compiler will have to sorry() or similar, not a good > thing at all. I'm missing something, there should be the same amount of pseudo with that mark as there is scratch in the new pattern doing memory address load(s) + set / check. I'm guessing this is not as easy to achieve as it sounds. > >> By the way about making sure the address is not left in a register, I >> have a question regarding the current stack_protect_set and >> stack_protect_check pattern and their requirements to have register >> cleared afterwards: why is that necessary? Currently not all registers >> are cleared and the guard is available in the canari before it is >> overwritten anyway so I don't see how clearing the register adds any >> extra security. What sort of attack is it protecting against? > > From md.texi: > > @item @samp{stack_protect_set} > This pattern, if defined, moves a @code{ptr_mode} value from the memory > in operand 1 to the memory in operand 0 without leaving the value in > a register afterward. This is to avoid leaking the value some place > that an attacker might use to rewrite the stack guard slot after > having clobbered it. > > (etc.) I've read that doc but what I don't understand is why the guard value being leaked in a register would be a problem if modified. The pattern as they are guarantee the guard is always reloaded from its canonical location (e.g. TLS var). Because the patterns do not represent in RTL what they do the compiler could not reuse the value left in a register. Are we worrying about optimization the assembler could do? > > Having the canary in a global variable makes it a lot easier for exploit > code to access it then if it is e.g. in TLS data. Actually leaking a > pointer to it would make it extra easy... If an attacker can execute code to access and modify the guard, why would s/he bother doing a stack overflow instead of just executing the code he wants to directly? Best regards, Thomas
Re: [ARM] Fix PR85434: spill of stack protector's guard address
Thomas Preudhomme writes: > On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by > loading its address first before loading its value from it as part of > the stack_protect_set or stack_protect_check insn pattern. This creates > the risk of spilling between the two. > > It is particularly likely on Aarch32 when compiling PIC code because > - computing the address takes several instructions (first compute the > GOT base and then the GOT entry by adding an offset) which increases > the likelyhood of CSE > - the address computation can be CSEd due to the GOT entry computation > being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC > of a MEM like on AArche64. > > This patch address both issues by (i) adding some scheduler barriers > around the stack protector code and (ii) making all memory loads > involved in computing the guard's address volatile. The use of volatile > rather than unspec was chosen so that the patterns for computing the > guard address can be the same as for normal global variable access thus > reusing more code. Finally the patch also improves the documentation to > mention the need to be careful when computing the address of the guard. [...] > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c > index deab929..c7ced8f 100644 > --- a/gcc/cfgexpand.c > +++ b/gcc/cfgexpand.c > @@ -6156,6 +6156,10 @@ stack_protect_prologue (void) >tree guard_decl = targetm.stack_protect_guard (); >rtx x, y; > > + /* Prevent scheduling of instruction(s) between computation of the guard's > + address and setting of the canari to avoid any spill of the guard's > + address if computed outside the setting of the canari. */ > + emit_insn (gen_blockage ()); >x = expand_normal (crtl->stack_protect_guard); >if (guard_decl) > y = expand_normal (guard_decl); Scheduling barriers should definitely make spilling less likely, but I don't think they avoid it completely. For that I think we'd need to: (a) make sure that gen_stack_protect_set gets passed (mem (symbol_ref)), which we can probably do by using DECL_RTL directly. (Or failing that, expand_expr with EXPAND_CONST_ADDRESS should work.) (b) make the target pattern accept (mem (symbol_ref)) and only legitimise it during a post-RA split. The scheduling barriers would then be unnecessary, since the pattern would be indivisible and self-contained until after RA. On its own that would probably mean changing all backends to accept the new style of mem. One way of avoiding that would be to use the maybe_expand_insn interface with targetm.code_for_stack_protect_set instead of calling targetm.gen_stack_protect_set directly. Arguably that's better anyway. Thanks, Richard
Re: [ARM] Fix PR85434: spill of stack protector's guard address
Hi! On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote: > On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by > loading its address first before loading its value from it as part of > the stack_protect_set or stack_protect_check insn pattern. This creates > the risk of spilling between the two. > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c > index deab929..c7ced8f 100644 > --- a/gcc/cfgexpand.c > +++ b/gcc/cfgexpand.c > @@ -6156,6 +6156,10 @@ stack_protect_prologue (void) >tree guard_decl = targetm.stack_protect_guard (); >rtx x, y; > > + /* Prevent scheduling of instruction(s) between computation of the guard's > + address and setting of the canari to avoid any spill of the guard's > + address if computed outside the setting of the canari. */ > + emit_insn (gen_blockage ()); >x = expand_normal (crtl->stack_protect_guard); >if (guard_decl) > y = expand_normal (guard_decl); [ etc. ] Why pessimise code for all targets (quite a lot), when it does not even fix the problem on Arm completely (or not obviously, anyway)? Instead, implement stack_protect_* and hide the memory accesses to the stored canary value (and make sure its address is not left in a register either!) I doubt this can be done completely target-independent, it will always be best effort that way, aka it won't really work. Segher
Re: [ARM] Fix PR85434: spill of stack protector's guard address
Hi Segher, As mentionned in the ticket this was my first thought but this means making the pattern aware of all the possible way the address could be access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many scratch registers are needed. I'd rather reuse the existing pattern as much as possible to make sure they are well tested. Ideally I wanted a way to mark a REG RTX so that it is never spilled and such that the mark is propagated when the register is moved to another register or propagated. But that is a bigger change so decided it should be an improvement for later but needed another solution right now. By the way about making sure the address is not left in a register, I have a question regarding the current stack_protect_set and stack_protect_check pattern and their requirements to have register cleared afterwards: why is that necessary? Currently not all registers are cleared and the guard is available in the canari before it is overwritten anyway so I don't see how clearing the register adds any extra security. What sort of attack is it protecting against? Best regards, Thomas On 29 April 2018 at 00:33, Segher Boessenkool wrote: > Hi! > > On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote: >> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by >> loading its address first before loading its value from it as part of >> the stack_protect_set or stack_protect_check insn pattern. This creates >> the risk of spilling between the two. > >> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c >> index deab929..c7ced8f 100644 >> --- a/gcc/cfgexpand.c >> +++ b/gcc/cfgexpand.c >> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void) >>tree guard_decl = targetm.stack_protect_guard (); >>rtx x, y; >> >> + /* Prevent scheduling of instruction(s) between computation of the guard's >> + address and setting of the canari to avoid any spill of the guard's >> + address if computed outside the setting of the canari. */ >> + emit_insn (gen_blockage ()); >>x = expand_normal (crtl->stack_protect_guard); >>if (guard_decl) >> y = expand_normal (guard_decl); > > [ etc. ] > > Why pessimise code for all targets (quite a lot), when it does not even > fix the problem on Arm completely (or not obviously, anyway)? > > Instead, implement stack_protect_* and hide the memory accesses to the > stored canary value (and make sure its address is not left in a register > either!) > > I doubt this can be done completely target-independent, it will always > be best effort that way, aka it won't really work. > > > Segher
Re: [ARM] Fix PR85434: spill of stack protector's guard address
Hi! On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote: > As mentionned in the ticket this was my first thought but this means > making the pattern aware of all the possible way the address could be > access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many > scratch registers are needed. I'd rather reuse the existing pattern as > much as possible to make sure they are well tested. Ideally I wanted a > way to mark a REG RTX so that it is never spilled and such that the > mark is propagated when the register is moved to another register or > propagated. But that is a bigger change so decided it should be an > improvement for later but needed another solution right now. How would that work, esp. for pseudos? If too many regs have such a mark then the compiler will have to sorry() or similar, not a good thing at all. > By the way about making sure the address is not left in a register, I > have a question regarding the current stack_protect_set and > stack_protect_check pattern and their requirements to have register > cleared afterwards: why is that necessary? Currently not all registers > are cleared and the guard is available in the canari before it is > overwritten anyway so I don't see how clearing the register adds any > extra security. What sort of attack is it protecting against? >From md.texi: @item @samp{stack_protect_set} This pattern, if defined, moves a @code{ptr_mode} value from the memory in operand 1 to the memory in operand 0 without leaving the value in a register afterward. This is to avoid leaking the value some place that an attacker might use to rewrite the stack guard slot after having clobbered it. (etc.) Having the canary in a global variable makes it a lot easier for exploit code to access it then if it is e.g. in TLS data. Actually leaking a pointer to it would make it extra easy... Segher
Re: [ARM] Fix PR85434: spill of stack protector's guard address
On 05/03/2018 10:55 AM, Segher Boessenkool wrote: > Hi! > > On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote: >> As mentionned in the ticket this was my first thought but this means >> making the pattern aware of all the possible way the address could be >> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many >> scratch registers are needed. I'd rather reuse the existing pattern as >> much as possible to make sure they are well tested. Ideally I wanted a >> way to mark a REG RTX so that it is never spilled and such that the >> mark is propagated when the register is moved to another register or >> propagated. But that is a bigger change so decided it should be an >> improvement for later but needed another solution right now. > > How would that work, esp. for pseudos? If too many regs have such a > mark then the compiler will have to sorry() or similar, not a good > thing at all. > >> By the way about making sure the address is not left in a register, I >> have a question regarding the current stack_protect_set and >> stack_protect_check pattern and their requirements to have register >> cleared afterwards: why is that necessary? Currently not all registers >> are cleared and the guard is available in the canari before it is >> overwritten anyway so I don't see how clearing the register adds any >> extra security. What sort of attack is it protecting against? > > From md.texi: > > @item @samp{stack_protect_set} > This pattern, if defined, moves a @code{ptr_mode} value from the memory > in operand 1 to the memory in operand 0 without leaving the value in > a register afterward. This is to avoid leaking the value some place > that an attacker might use to rewrite the stack guard slot after > having clobbered it. > > (etc.) > > Having the canary in a global variable makes it a lot easier for exploit > code to access it then if it is e.g. in TLS data. Actually leaking a > pointer to it would make it extra easy... Yup. And at least one guard (not the stack guard) has that properly. It's stuffed into static storage. Not good. Though I must admit it was awful convenient when tracking down a bug in how the guard was used. jeff
Re: [ARM] Fix PR85434: spill of stack protector's guard address
Hi Thomas, On Fri, May 04, 2018 at 05:52:57AM +0100, Thomas Preudhomme wrote: > >> As mentionned in the ticket this was my first thought but this means > >> making the pattern aware of all the possible way the address could be > >> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many > >> scratch registers are needed. I'd rather reuse the existing pattern as > >> much as possible to make sure they are well tested. Ideally I wanted a > >> way to mark a REG RTX so that it is never spilled and such that the > >> mark is propagated when the register is moved to another register or > >> propagated. But that is a bigger change so decided it should be an > >> improvement for later but needed another solution right now. > > > > How would that work, esp. for pseudos? If too many regs have such a > > mark then the compiler will have to sorry() or similar, not a good > > thing at all. > > I'm missing something, there should be the same amount of pseudo with that > mark as there is scratch in the new pattern doing memory address load(s) + > set / check. I'm guessing this is not as easy to achieve as it sounds. But this pattern is expanded all the way at the beginning of the RTL pipeline, so you'll need to prevent anything copying this. And if any other pattern wants to use this do-not-spill feature as well, you'll have a problem no matter what. > >> By the way about making sure the address is not left in a register, I > >> have a question regarding the current stack_protect_set and > >> stack_protect_check pattern and their requirements to have register > >> cleared afterwards: why is that necessary? Currently not all registers > >> are cleared and the guard is available in the canari before it is > >> overwritten anyway so I don't see how clearing the register adds any > >> extra security. What sort of attack is it protecting against? > > > > From md.texi: > > > > @item @samp{stack_protect_set} > > This pattern, if defined, moves a @code{ptr_mode} value from the memory > > in operand 1 to the memory in operand 0 without leaving the value in > > a register afterward. This is to avoid leaking the value some place > > that an attacker might use to rewrite the stack guard slot after > > having clobbered it. > > > > (etc.) > > I've read that doc but what I don't understand is why the guard value being > leaked in a register would be a problem if modified. The pattern as they > are guarantee the guard is always reloaded from its canonical location > (e.g. TLS var). Because the patterns do not represent in RTL what they do > the compiler could not reuse the value left in a register. Are we worrying > about optimization the assembler could do? > > > Having the canary in a global variable makes it a lot easier for exploit > > code to access it then if it is e.g. in TLS data. Actually leaking a > > pointer to it would make it extra easy... > > If an attacker can execute code to access and modify the guard, why would > s/he bother doing a stack overflow instead of just executing the code he > wants to directly? The issue is leaking the value so the user can observe it, and then when overwriting the stack write the expected value to the cookie again, so that the protection isn't triggered. You don't necessarily need to execute code of your choice to overwrite a memory location of your choice, fwiw. SSP does not prevent all attacks, just very many. Segher
Re: [ARM] Fix PR85434: spill of stack protector's guard address
On 05/02/2018 12:57 AM, Thomas Preudhomme wrote: > Hi Segher, > > As mentionned in the ticket this was my first thought but this means > making the pattern aware of all the possible way the address could be > access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many > scratch registers are needed. I'd rather reuse the existing pattern as > much as possible to make sure they are well tested. Ideally I wanted a > way to mark a REG RTX so that it is never spilled and such that the > mark is propagated when the register is moved to another register or > propagated. But that is a bigger change so decided it should be an > improvement for later but needed another solution right now. > > By the way about making sure the address is not left in a register, I > have a question regarding the current stack_protect_set and > stack_protect_check pattern and their requirements to have register > cleared afterwards: why is that necessary? Currently not all registers > are cleared and the guard is available in the canari before it is > overwritten anyway so I don't see how clearing the register adds any > extra security. What sort of attack is it protecting against? I'm not aware of any way to make a REG so that it's never spilled. It's a concept we simply don't have. About the closest you get is a fixed register. But you certainly don't want to do that. I really think you're going to have to address this primarily in the ARM backend, probably making a fair amount of things opaque to the rest of the compiler. Jeff