[ARM] Fix PR85434: spill of stack protector's guard address

2018-04-27 Thread Thomas Preudhomme
On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
loading its address first before loading its value from it as part of
the stack_protect_set or stack_protect_check insn pattern. This creates
the risk of spilling between the two.

It is particularly likely on Aarch32 when compiling PIC code because
- computing the address takes several instructions (first compute the
  GOT base and then the GOT entry by adding an offset) which increases
  the likelyhood of CSE
- the address computation can be CSEd due to the GOT entry computation
  being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC
  of a MEM like on AArche64.

This patch address both issues by (i) adding some scheduler barriers
around the stack protector code and (ii) making all memory loads
involved in computing the guard's address volatile. The use of volatile
rather than unspec was chosen so that the patterns for computing the
guard address can be the same as for normal global variable access thus
reusing more code. Finally the patch also improves the documentation to
mention the need to be careful when computing the address of the guard.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-27  Thomas Preud'homme  

PR target/85434
* cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around
stack protector code.
* function.c (stack_protect_epilogue): Likewise.
* config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare.
* config/arm/arm.md (calculate_pic_address): Mark memory volatile if
is computing address of stack protector's guard.
(calculate_pic_address splitter): Likewise.
* config/arm/arm.c (require_pic_register): Add parameter to control
whether to insert instruction at the end of the instruction stream.
(legitimize_pic_address): Force computing PIC address at the end of
instruction stream and adapt logic to change in calculate_pic_address
insn pattern.
(arm_stack_chk_guard_decl_p): New function.
(arm_emit_call_insn): Adapt to change in require_pic_register().
* target.def (TARGET_STACK_PROTECT_GUARD): Document requirement on
guard's address computation to be careful about not spilling.
* doc/tm.texi: Regenerate.

*** gcc/testsuite/ChangeLog ***

2018-04-27  Thomas Preud'homme  

PR target/85434
* gcc.target/arm/pr85434.c: New testcase.

Testing: The code has been boostraped on an Armv8-A machine targeting:
- Aarch32 ARM mode with -mfpu=neon-fpv4 and hardfloat ABI
- Aarch64
Testsuite has been run for the following sets of flag:

- arm-eabi-aem/-mthumb/-march=armv4t
- arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
- 
arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard

(thereby testing the code for ARM, Thumb-2 and Thumb-1 mode) without any
regression.

Is it ok for trunk?

Best regards,

Thomas
From 76c48e31130f212721addeeca830477e3b6f5e10 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme 
Date: Mon, 23 Apr 2018 14:37:11 +0100
Subject: [PATCH] [ARM] Fix PR85434: spill of stack protector's guard address

On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
loading its address first before loading its value from it as part of
the stack_protect_set or stack_protect_check insn pattern. This creates
the risk of spilling between the two.

It is particularly likely on Aarch32 when compiling PIC code because
- computing the address takes several instructions (first compute the
  GOT base and then the GOT entry by adding an offset) which increases
  the likelyhood of CSE
- the address computation can be CSEd due to the GOT entry computation
  being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC
  of a MEM like on AArche64.

This patch address both issues by (i) adding some scheduler barriers
around the stack protector code and (ii) making all memory loads
involved in computing the guard's address volatile. The use of volatile
rather than unspec was chosen so that the patterns for computing the
guard address can be the same as for normal global variable access thus
reusing more code. Finally the patch also improves the documentation to
mention the need to be careful when computing the address of the guard.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-27  Thomas Preud'homme  

	* cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around
	stack protector code.
	* function.c (stack_protect_epilogue): Likewise.
	* config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare.
	* config/arm/arm.md (calculate_pic_address): Mark memory volatile if
	is computing address of stack protector's guard.
	(calculate_pic_address splitter): Likewise.
	* config/arm/arm.c (require_pic_register): Add parameter to control
	whether to insert instruction at the end of the instru

[ARM] Fix PR85434: spill of stack protector's guard address

2018-05-03 Thread Thomas Preudhomme
I'll make a fool of myself but I still have further questions if you don't
mind (see inline).

On Friday, 4 May 2018, Segher Boessenkool 
wrote:
> Hi!
>
> On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:
>> As mentionned in the ticket this was my first thought but this means
>> making the pattern aware of all the possible way the address could be
>> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
>> scratch registers are needed. I'd rather reuse the existing pattern as
>> much as possible to make sure they are well tested. Ideally I wanted a
>> way to mark a REG RTX so that it is never spilled and such that the
>> mark is propagated when the register is moved to another register or
>> propagated. But that is a bigger change so decided it should be an
>> improvement for later but needed another solution right now.
>
> How would that work, esp. for pseudos?  If too many regs have such a
> mark then the compiler will have to sorry() or similar, not a good
> thing at all.

I'm missing something, there should be the same amount of pseudo with that
mark as there is scratch in the new pattern doing memory address load(s) +
set / check. I'm guessing this is not as easy to achieve as it sounds.

>
>> By the way about making sure the address is not left in a register, I
>> have a question regarding the current stack_protect_set and
>> stack_protect_check pattern and their requirements to have register
>> cleared afterwards: why is that necessary? Currently not all registers
>> are cleared and the guard is available in the canari before it is
>> overwritten anyway so I don't see how clearing the register adds any
>> extra security. What sort of attack is it protecting against?
>
> From md.texi:
>
> @item @samp{stack_protect_set}
> This pattern, if defined, moves a @code{ptr_mode} value from the memory
> in operand 1 to the memory in operand 0 without leaving the value in
> a register afterward.  This is to avoid leaking the value some place
> that an attacker might use to rewrite the stack guard slot after
> having clobbered it.
>
> (etc.)

I've read that doc but what I don't understand is why the guard value being
leaked in a register would be a problem if modified. The pattern as they
are guarantee the guard is always reloaded from its canonical location
(e.g. TLS var). Because the patterns do not represent in RTL what they do
the compiler could not reuse the value left in a register. Are we worrying
about optimization the assembler could do?

>
> Having the canary in a global variable makes it a lot easier for exploit
> code to access it then if it is e.g. in TLS data.  Actually leaking a
> pointer to it would make it extra easy...

If an attacker can execute code to access and modify the guard, why would
s/he bother doing a stack overflow instead of just executing the code he
wants to directly?

Best regards,

Thomas


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-04-28 Thread Richard Sandiford
Thomas Preudhomme  writes:
> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
> loading its address first before loading its value from it as part of
> the stack_protect_set or stack_protect_check insn pattern. This creates
> the risk of spilling between the two.
>
> It is particularly likely on Aarch32 when compiling PIC code because
> - computing the address takes several instructions (first compute the
>   GOT base and then the GOT entry by adding an offset) which increases
>   the likelyhood of CSE
> - the address computation can be CSEd due to the GOT entry computation
>   being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC
>   of a MEM like on AArche64.
>
> This patch address both issues by (i) adding some scheduler barriers
> around the stack protector code and (ii) making all memory loads
> involved in computing the guard's address volatile. The use of volatile
> rather than unspec was chosen so that the patterns for computing the
> guard address can be the same as for normal global variable access thus
> reusing more code. Finally the patch also improves the documentation to
> mention the need to be careful when computing the address of the guard.

[...]

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index deab929..c7ced8f 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)
>tree guard_decl = targetm.stack_protect_guard ();
>rtx x, y;
>  
> +  /* Prevent scheduling of instruction(s) between computation of the guard's
> + address and setting of the canari to avoid any spill of the guard's
> + address if computed outside the setting of the canari.  */
> +  emit_insn (gen_blockage ());
>x = expand_normal (crtl->stack_protect_guard);
>if (guard_decl)
>  y = expand_normal (guard_decl);

Scheduling barriers should definitely make spilling less likely,
but I don't think they avoid it completely.  For that I think we'd need to:

(a) make sure that gen_stack_protect_set gets passed (mem (symbol_ref)),
which we can probably do by using DECL_RTL directly.  (Or failing that,
expand_expr with EXPAND_CONST_ADDRESS should work.)

(b) make the target pattern accept (mem (symbol_ref)) and only legitimise
it during a post-RA split.

The scheduling barriers would then be unnecessary, since the pattern
would be indivisible and self-contained until after RA.

On its own that would probably mean changing all backends to accept
the new style of mem.  One way of avoiding that would be to use the
maybe_expand_insn interface with targetm.code_for_stack_protect_set
instead of calling targetm.gen_stack_protect_set directly.
Arguably that's better anyway.

Thanks,
Richard


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-04-28 Thread Segher Boessenkool
Hi!

On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote:
> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
> loading its address first before loading its value from it as part of
> the stack_protect_set or stack_protect_check insn pattern. This creates
> the risk of spilling between the two.

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index deab929..c7ced8f 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)
>tree guard_decl = targetm.stack_protect_guard ();
>rtx x, y;
>  
> +  /* Prevent scheduling of instruction(s) between computation of the guard's
> + address and setting of the canari to avoid any spill of the guard's
> + address if computed outside the setting of the canari.  */
> +  emit_insn (gen_blockage ());
>x = expand_normal (crtl->stack_protect_guard);
>if (guard_decl)
>  y = expand_normal (guard_decl);

[ etc. ]

Why pessimise code for all targets (quite a lot), when it does not even
fix the problem on Arm completely (or not obviously, anyway)?

Instead, implement stack_protect_* and hide the memory accesses to the
stored canary value (and make sure its address is not left in a register
either!)

I doubt this can be done completely target-independent, it will always
be best effort that way, aka it won't really work.


Segher


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-01 Thread Thomas Preudhomme
Hi Segher,

As mentionned in the ticket this was my first thought but this means
making the pattern aware of all the possible way the address could be
access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
scratch registers are needed. I'd rather reuse the existing pattern as
much as possible to make sure they are well tested. Ideally I wanted a
way to mark a REG RTX so that it is never spilled and such that the
mark is propagated when the register is moved to another register or
propagated. But that is a bigger change so decided it should be an
improvement for later but needed another solution right now.

By the way about making sure the address is not left in a register, I
have a question regarding the current stack_protect_set and
stack_protect_check pattern and their requirements to have register
cleared afterwards: why is that necessary? Currently not all registers
are cleared and the guard is available in the canari before it is
overwritten anyway so I don't see how clearing the register adds any
extra security. What sort of attack is it protecting against?

Best regards,

Thomas

On 29 April 2018 at 00:33, Segher Boessenkool
 wrote:
> Hi!
>
> On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote:
>> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
>> loading its address first before loading its value from it as part of
>> the stack_protect_set or stack_protect_check insn pattern. This creates
>> the risk of spilling between the two.
>
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index deab929..c7ced8f 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)
>>tree guard_decl = targetm.stack_protect_guard ();
>>rtx x, y;
>>
>> +  /* Prevent scheduling of instruction(s) between computation of the guard's
>> + address and setting of the canari to avoid any spill of the guard's
>> + address if computed outside the setting of the canari.  */
>> +  emit_insn (gen_blockage ());
>>x = expand_normal (crtl->stack_protect_guard);
>>if (guard_decl)
>>  y = expand_normal (guard_decl);
>
> [ etc. ]
>
> Why pessimise code for all targets (quite a lot), when it does not even
> fix the problem on Arm completely (or not obviously, anyway)?
>
> Instead, implement stack_protect_* and hide the memory accesses to the
> stored canary value (and make sure its address is not left in a register
> either!)
>
> I doubt this can be done completely target-independent, it will always
> be best effort that way, aka it won't really work.
>
>
> Segher


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-03 Thread Segher Boessenkool
Hi!

On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:
> As mentionned in the ticket this was my first thought but this means
> making the pattern aware of all the possible way the address could be
> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
> scratch registers are needed. I'd rather reuse the existing pattern as
> much as possible to make sure they are well tested. Ideally I wanted a
> way to mark a REG RTX so that it is never spilled and such that the
> mark is propagated when the register is moved to another register or
> propagated. But that is a bigger change so decided it should be an
> improvement for later but needed another solution right now.

How would that work, esp. for pseudos?  If too many regs have such a
mark then the compiler will have to sorry() or similar, not a good
thing at all.

> By the way about making sure the address is not left in a register, I
> have a question regarding the current stack_protect_set and
> stack_protect_check pattern and their requirements to have register
> cleared afterwards: why is that necessary? Currently not all registers
> are cleared and the guard is available in the canari before it is
> overwritten anyway so I don't see how clearing the register adds any
> extra security. What sort of attack is it protecting against?

>From md.texi:

@item @samp{stack_protect_set}
This pattern, if defined, moves a @code{ptr_mode} value from the memory
in operand 1 to the memory in operand 0 without leaving the value in
a register afterward.  This is to avoid leaking the value some place
that an attacker might use to rewrite the stack guard slot after
having clobbered it.

(etc.)

Having the canary in a global variable makes it a lot easier for exploit
code to access it then if it is e.g. in TLS data.  Actually leaking a
pointer to it would make it extra easy...


Segher


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-03 Thread Jeff Law
On 05/03/2018 10:55 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:
>> As mentionned in the ticket this was my first thought but this means
>> making the pattern aware of all the possible way the address could be
>> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
>> scratch registers are needed. I'd rather reuse the existing pattern as
>> much as possible to make sure they are well tested. Ideally I wanted a
>> way to mark a REG RTX so that it is never spilled and such that the
>> mark is propagated when the register is moved to another register or
>> propagated. But that is a bigger change so decided it should be an
>> improvement for later but needed another solution right now.
> 
> How would that work, esp. for pseudos?  If too many regs have such a
> mark then the compiler will have to sorry() or similar, not a good
> thing at all.
> 
>> By the way about making sure the address is not left in a register, I
>> have a question regarding the current stack_protect_set and
>> stack_protect_check pattern and their requirements to have register
>> cleared afterwards: why is that necessary? Currently not all registers
>> are cleared and the guard is available in the canari before it is
>> overwritten anyway so I don't see how clearing the register adds any
>> extra security. What sort of attack is it protecting against?
> 
> From md.texi:
> 
> @item @samp{stack_protect_set}
> This pattern, if defined, moves a @code{ptr_mode} value from the memory
> in operand 1 to the memory in operand 0 without leaving the value in
> a register afterward.  This is to avoid leaking the value some place
> that an attacker might use to rewrite the stack guard slot after
> having clobbered it.
> 
> (etc.)
> 
> Having the canary in a global variable makes it a lot easier for exploit
> code to access it then if it is e.g. in TLS data.  Actually leaking a
> pointer to it would make it extra easy...
Yup.  And at least one guard (not the stack guard) has that properly.
It's stuffed into static storage.  Not good.

Though I must admit it was awful convenient when tracking down a bug in
how the guard was used.

jeff


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-04 Thread Segher Boessenkool
Hi Thomas,

On Fri, May 04, 2018 at 05:52:57AM +0100, Thomas Preudhomme wrote:
> >> As mentionned in the ticket this was my first thought but this means
> >> making the pattern aware of all the possible way the address could be
> >> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
> >> scratch registers are needed. I'd rather reuse the existing pattern as
> >> much as possible to make sure they are well tested. Ideally I wanted a
> >> way to mark a REG RTX so that it is never spilled and such that the
> >> mark is propagated when the register is moved to another register or
> >> propagated. But that is a bigger change so decided it should be an
> >> improvement for later but needed another solution right now.
> >
> > How would that work, esp. for pseudos?  If too many regs have such a
> > mark then the compiler will have to sorry() or similar, not a good
> > thing at all.
> 
> I'm missing something, there should be the same amount of pseudo with that
> mark as there is scratch in the new pattern doing memory address load(s) +
> set / check. I'm guessing this is not as easy to achieve as it sounds.

But this pattern is expanded all the way at the beginning of the RTL
pipeline, so you'll need to prevent anything copying this.  And if any
other pattern wants to use this do-not-spill feature as well, you'll
have a problem no matter what.

> >> By the way about making sure the address is not left in a register, I
> >> have a question regarding the current stack_protect_set and
> >> stack_protect_check pattern and their requirements to have register
> >> cleared afterwards: why is that necessary? Currently not all registers
> >> are cleared and the guard is available in the canari before it is
> >> overwritten anyway so I don't see how clearing the register adds any
> >> extra security. What sort of attack is it protecting against?
> >
> > From md.texi:
> >
> > @item @samp{stack_protect_set}
> > This pattern, if defined, moves a @code{ptr_mode} value from the memory
> > in operand 1 to the memory in operand 0 without leaving the value in
> > a register afterward.  This is to avoid leaking the value some place
> > that an attacker might use to rewrite the stack guard slot after
> > having clobbered it.
> >
> > (etc.)
> 
> I've read that doc but what I don't understand is why the guard value being
> leaked in a register would be a problem if modified. The pattern as they
> are guarantee the guard is always reloaded from its canonical location
> (e.g. TLS var). Because the patterns do not represent in RTL what they do
> the compiler could not reuse the value left in a register. Are we worrying
> about optimization the assembler could do?
> 
> > Having the canary in a global variable makes it a lot easier for exploit
> > code to access it then if it is e.g. in TLS data.  Actually leaking a
> > pointer to it would make it extra easy...
> 
> If an attacker can execute code to access and modify the guard, why would
> s/he bother doing a stack overflow instead of just executing the code he
> wants to directly?

The issue is leaking the value so the user can observe it, and then when
overwriting the stack write the expected value to the cookie again, so
that the protection isn't triggered.

You don't necessarily need to execute code of your choice to overwrite
a memory location of your choice, fwiw.  SSP does not prevent all attacks,
just very many.


Segher


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-21 Thread Jeff Law
On 05/02/2018 12:57 AM, Thomas Preudhomme wrote:
> Hi Segher,
> 
> As mentionned in the ticket this was my first thought but this means
> making the pattern aware of all the possible way the address could be
> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
> scratch registers are needed. I'd rather reuse the existing pattern as
> much as possible to make sure they are well tested. Ideally I wanted a
> way to mark a REG RTX so that it is never spilled and such that the
> mark is propagated when the register is moved to another register or
> propagated. But that is a bigger change so decided it should be an
> improvement for later but needed another solution right now.
> 
> By the way about making sure the address is not left in a register, I
> have a question regarding the current stack_protect_set and
> stack_protect_check pattern and their requirements to have register
> cleared afterwards: why is that necessary? Currently not all registers
> are cleared and the guard is available in the canari before it is
> overwritten anyway so I don't see how clearing the register adds any
> extra security. What sort of attack is it protecting against?
I'm not aware of any way to make a REG so that it's never spilled.  It's
a concept we simply don't have.

About the closest you get is a fixed register.  But you certainly don't
want to do that.

I really think you're going to have to address this primarily in the ARM
backend, probably making a fair amount of things opaque to the rest of
the compiler.

Jeff