RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-10-08 Thread Tamar Christina
Hi All,

I'm looking for permission to backport this patch to the GCC-8 branch
to fix PR86486.

OK for backport?

Thanks,
Tamar

> -Original Message-
> From: James Greenhalgh 
> Sent: Tuesday, September 11, 2018 16:56
> To: Tamar Christina 
> Cc: Richard Sandiford ; Jeff Law
> ; gcc-patches@gcc.gnu.org; nd ; Richard
> Earnshaw ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/7)]
> 
> On Fri, Sep 07, 2018 at 11:03:28AM -0500, Tamar Christina wrote:
> > Hi Richard,
> >
> > The 08/28/2018 21:58, Richard Sandiford wrote:
> > > Tamar Christina  writes:
> > > > +  HOST_WIDE_INT guard_used_by_caller =
> STACK_CLASH_CALLER_GUARD;
> > > > +  /* When doing the final adjustment for the outgoing argument size
> we can't
> > > > + assume that LR was saved at position 0.  So subtract it's offset 
> > > > from
> the
> > > > + ABI safe buffer so that we don't accidentally allow an adjustment
> that
> > > > + would result in an allocation larger than the ABI buffer without
> > > > + probing.  */
> > > > +  HOST_WIDE_INT min_probe_threshold
> > > > += final_adjustment_p
> > > > +  ? guard_used_by_caller - cfun->machine-
> >frame.reg_offset[LR_REGNUM]
> > > > +  : guard_size - guard_used_by_caller;
> > > [...]
> > > > +  if (residual)
> > > > +{
> > > > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > > > +  if (residual >= min_probe_threshold)
> > > > +   {
> > > > + if (dump_file)
> > > > +   fprintf (dump_file,
> > > > +"Stack clash AArch64 prologue residuals: "
> > > > +HOST_WIDE_INT_PRINT_DEC " bytes, probing will be
> required."
> > > > +"\n", residual);
> > > > + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > > > +  STACK_CLASH_CALLER_GUARD));
> > >
> > > reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
> > > 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
> > > It looks like the probe would then write above the region.
> > >
> > > Using >= rather than > means that the same thing could happen when
> > > LR_REGNUM is at position 0, if the residual is exactly
> > > STACK_CLASH_CALLER_GUARD.
> >
> > That's true. While addressing this we changed how the residuals are
> probed.
> >
> > To address a comment you raised offline about the saving of LR when
> > calling a no-return function using a tail call and
> > -fomit-frame-pointer, I think this should be safe as the code in
> frame_layout (line 4131-4136) would ensure that R30 is saved.
> >
> > I have added two new tests to check for this, so that if it does
> > change in the future they would fail.
> >
> > Attached is the updated patch and new changelog
> >
> > Ok for trunk?
> 
> I'm happy with this patch version; I'd have preferred a FORNOW comment on
> this:
> 
> > +  /* If SIZE is not large enough to require probing, just adjust the stack 
> > and
> > + exit.  */
> > +  if (!poly_size.is_constant (&size)
> > +  || known_lt (poly_size, min_probe_threshold)
> > +  || !flag_stack_clash_protection)
> 
> as you don't fix it until 2/7, but that is a minor point.
> 
> I'm happy with you responding to Richard S' request for an assert either in
> this patch, or tacked on as an 8/7.
> 
> OK.
> 
> Thanks,
> James
> 
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-09-07  Jeff Law  
> > Richard Sandiford 
> > Tamar Christina  
> >
> > PR target/86486
> > * config/aarch64/aarch64.md
> > (probe_stack_range): Add k (SP) constraint.
> > * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> > STACK_CLASH_MAX_UNROLL_PAGES): New.
> > * config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> Emit
> > stack probes for stack clash.
> > (aarch64_allocate_and_probe_stack_space): New.
> > (aarch64_expand_prologue): Use it.
> > (aarch64_expand_epilogue): Likewise and update IP regs re-use
> criteria.
> > (aarch64_sub_sp): Add emit_move_imm optional param.
> >
> > gcc/testsuit

Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-12 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, September 11, 2018 15:49
>> To: Tamar Christina 
>> Cc: Jeff Law ; gcc-patches@gcc.gnu.org; nd
>> ; James Greenhalgh ;
>> Richard Earnshaw ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
>> supporting 64k probes. [patch (1/7)]
>> 
>> Tamar Christina  writes:
>> > Hi Richard,
>> >
>> > The 08/28/2018 21:58, Richard Sandiford wrote:
>> >> Tamar Christina  writes:
>> >> > +  HOST_WIDE_INT guard_used_by_caller =
>> STACK_CLASH_CALLER_GUARD;
>> >> > + /* When doing the final adjustment for the outgoing argument size
>> >> > we can't
>> >> > + assume that LR was saved at position 0.  So subtract it's offset
>> >> > from the
>> >> > + ABI safe buffer so that we don't accidentally allow an adjustment
>> that
>> >> > + would result in an allocation larger than the ABI buffer without
>> >> > + probing.  */
>> >> > +  HOST_WIDE_INT min_probe_threshold
>> >> > += final_adjustment_p
>> >> > +  ? guard_used_by_caller - cfun->machine-
>> >frame.reg_offset[LR_REGNUM]
>> >> > +  : guard_size - guard_used_by_caller;
>> >> [...]
>> >> > +  if (residual)
>> >> > +{
>> >> > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
>> >> > +  if (residual >= min_probe_threshold)
>> >> > +   {
>> >> > + if (dump_file)
>> >> > +   fprintf (dump_file,
>> >> > +"Stack clash AArch64 prologue residuals: "
>> >> > +HOST_WIDE_INT_PRINT_DEC " bytes, probing will be
>> required."
>> >> > +"\n", residual);
>> >> > + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>> >> > +  STACK_CLASH_CALLER_GUARD));
>> >>
>> >> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
>> >> 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
>> >> It looks like the probe would then write above the region.
>> >>
>> >> Using >= rather than > means that the same thing could happen when
>> >> LR_REGNUM is at position 0, if the residual is exactly
>> >> STACK_CLASH_CALLER_GUARD.
>> >
>> > That's true. While addressing this we changed how the residuals are
>> probed.
>> >
>> > To address a comment you raised offline about the saving of LR when
>> > calling a no-return function using a tail call and
>> > -fomit-frame-pointer, I think this should be safe as the code in
>> > frame_layout (line 4131-4136) would ensure that R30 is saved.
>> 
>> That line number range doesn't seem to match up with current sources.
>> But my point was that "X is a non-leaf function" does not directly imply "X
>> saves R30_REGNUM".  It might happen to imply that indirectly for all cases at
>> the moment, but it's not something that the AArch64 code enforces itself.
>> AFAICT the only time we force R30 to be saved explicitly is when emitting a
>> chain:
>> 
>>   if (cfun->machine->frame.emit_frame_chain)
>> {
>>   /* FP and LR are placed in the linkage record.  */
>>   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
>>   cfun->machine->frame.wb_candidate1 = R29_REGNUM;
>>   cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
>>   cfun->machine->frame.wb_candidate2 = R30_REGNUM;
>>   offset = 2 * UNITS_PER_WORD;
>> }
>> 
>> Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
>> And in principle, it should be correct to use a sibcall pattern that doesn't
>> clobber R30 for a noreturn call even in functions that require a frame.  We
>> don't do that yet, and it probably doesn't make sense from a QoI perspective
>> on AArch64, but I don't think it's invalid in terms of rtl semantics.  
>> There's no
>> real reason why a noreturn function has to save the address of its caller
>> unless the target forces it to, such as for frame chains above.
>> 
>> In this patch we'

Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-11 Thread James Greenhalgh
On Fri, Sep 07, 2018 at 11:03:28AM -0500, Tamar Christina wrote:
> Hi Richard,
> 
> The 08/28/2018 21:58, Richard Sandiford wrote:
> > Tamar Christina  writes:
> > > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> > > +  /* When doing the final adjustment for the outgoing argument size we 
> > > can't
> > > + assume that LR was saved at position 0.  So subtract it's offset 
> > > from the
> > > + ABI safe buffer so that we don't accidentally allow an adjustment 
> > > that
> > > + would result in an allocation larger than the ABI buffer without
> > > + probing.  */
> > > +  HOST_WIDE_INT min_probe_threshold
> > > += final_adjustment_p
> > > +  ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> > > +  : guard_size - guard_used_by_caller;
> > [...]
> > > +  if (residual)
> > > +{
> > > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > > +  if (residual >= min_probe_threshold)
> > > + {
> > > +   if (dump_file)
> > > + fprintf (dump_file,
> > > +  "Stack clash AArch64 prologue residuals: "
> > > +  HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> > > +  "\n", residual);
> > > +   emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > > +STACK_CLASH_CALLER_GUARD));
> > 
> > reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
> > min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
> > like the probe would then write above the region.
> > 
> > Using >= rather than > means that the same thing could happen when
> > LR_REGNUM is at position 0, if the residual is exactly
> > STACK_CLASH_CALLER_GUARD.
> 
> That's true. While addressing this we changed how the residuals are probed.
> 
> To address a comment you raised offline about the saving of LR when calling
> a no-return function using a tail call and -fomit-frame-pointer, I think this 
> should
> be safe as the code in frame_layout (line 4131-4136) would ensure that R30 is 
> saved.
> 
> I have added two new tests to check for this, so that if it does change in 
> the future they
> would fail. 
> 
> Attached is the updated patch and new changelog
> 
> Ok for trunk?

I'm happy with this patch version; I'd have preferred a FORNOW comment on this:

> +  /* If SIZE is not large enough to require probing, just adjust the stack 
> and
> + exit.  */
> +  if (!poly_size.is_constant (&size)
> +  || known_lt (poly_size, min_probe_threshold)
> +  || !flag_stack_clash_protection)

as you don't fix it until 2/7, but that is a minor point.

I'm happy with you responding to Richard S' request for an assert either in
this patch, or tacked on as an 8/7.

OK.

Thanks,
James

> Thanks,
> Tamar
> 
> gcc/
> 2018-09-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.md
>   (probe_stack_range): Add k (SP) constraint.
>   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
>   STACK_CLASH_MAX_UNROLL_PAGES): New.
>   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
>   stack probes for stack clash.
>   (aarch64_allocate_and_probe_stack_space): New.
>   (aarch64_expand_prologue): Use it.
>   (aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
>   (aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-09-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-12.c: New.
>   * gcc.target/aarch64/stack-check-13.c: New.
>   * gcc.target/aarch64/stack-check-cfa-1.c: New.
>   * gcc.target/aarch64/stack-check-cfa-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-1.c: New.
>   * gcc.target/aarch64/stack-check-prologue-10.c: New.
>   * gcc.target/aarch64/stack-check-prologue-11.c: New.
>   * gcc.target/aarch64/stack-check-prologue-12.c: New.
>   * gcc.target/aarch64/stack-check-prologue-13.c: New.
>   * gcc.target/aarch64/stack-check-prologue-14.c: New.
>   * gcc.target/aarch64/stack-check-prologue-15.c: New.
>   * gcc.target/aarch64/stack-check-prologue-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-3.c: New.
>   * gcc.target/aarch64/stack-check-prologue-4.c: New.
>   * gcc.target/aarch64/stack-check-prologue-5.c: New.
>   * gcc.target/aarch64/stack-check-prologue-6.c: New.
>   * gcc.target/aarch64/stack-check-prologue-7.c: New.
>   * gcc.target/aarch64/stack-check-prologue-8.c: New.
>   * gcc.target/aarch64/stack-check-prologue-9.c: New.
>   * gcc.target/aarch64/stack-check-prologue.h: New.
>   * lib/target-supports.exp
>   (check_effective_target_supports_stack_clash_protection): Add AArch64.
> 


RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-11 Thread Tamar Christina
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, September 11, 2018 15:49
> To: Tamar Christina 
> Cc: Jeff Law ; gcc-patches@gcc.gnu.org; nd
> ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/7)]
> 
> Tamar Christina  writes:
> > Hi Richard,
> >
> > The 08/28/2018 21:58, Richard Sandiford wrote:
> >> Tamar Christina  writes:
> >> > +  HOST_WIDE_INT guard_used_by_caller =
> STACK_CLASH_CALLER_GUARD;
> >> > + /* When doing the final adjustment for the outgoing argument size
> >> > we can't
> >> > + assume that LR was saved at position 0.  So subtract it's offset
> >> > from the
> >> > + ABI safe buffer so that we don't accidentally allow an adjustment
> that
> >> > + would result in an allocation larger than the ABI buffer without
> >> > + probing.  */
> >> > +  HOST_WIDE_INT min_probe_threshold
> >> > += final_adjustment_p
> >> > +  ? guard_used_by_caller - cfun->machine-
> >frame.reg_offset[LR_REGNUM]
> >> > +  : guard_size - guard_used_by_caller;
> >> [...]
> >> > +  if (residual)
> >> > +{
> >> > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> >> > +  if (residual >= min_probe_threshold)
> >> > +{
> >> > +  if (dump_file)
> >> > +fprintf (dump_file,
> >> > + "Stack clash AArch64 prologue residuals: "
> >> > + HOST_WIDE_INT_PRINT_DEC " bytes, probing will be
> required."
> >> > + "\n", residual);
> >> > +  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> >> > +   STACK_CLASH_CALLER_GUARD));
> >>
> >> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
> >> 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
> >> It looks like the probe would then write above the region.
> >>
> >> Using >= rather than > means that the same thing could happen when
> >> LR_REGNUM is at position 0, if the residual is exactly
> >> STACK_CLASH_CALLER_GUARD.
> >
> > That's true. While addressing this we changed how the residuals are
> probed.
> >
> > To address a comment you raised offline about the saving of LR when
> > calling a no-return function using a tail call and
> > -fomit-frame-pointer, I think this should be safe as the code in
> > frame_layout (line 4131-4136) would ensure that R30 is saved.
> 
> That line number range doesn't seem to match up with current sources.
> But my point was that "X is a non-leaf function" does not directly imply "X
> saves R30_REGNUM".  It might happen to imply that indirectly for all cases at
> the moment, but it's not something that the AArch64 code enforces itself.
> AFAICT the only time we force R30 to be saved explicitly is when emitting a
> chain:
> 
>   if (cfun->machine->frame.emit_frame_chain)
> {
>   /* FP and LR are placed in the linkage record.  */
>   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
>   cfun->machine->frame.wb_candidate1 = R29_REGNUM;
>   cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
>   cfun->machine->frame.wb_candidate2 = R30_REGNUM;
>   offset = 2 * UNITS_PER_WORD;
> }
> 
> Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
> And in principle, it should be correct to use a sibcall pattern that doesn't
> clobber R30 for a noreturn call even in functions that require a frame.  We
> don't do that yet, and it probably doesn't make sense from a QoI perspective
> on AArch64, but I don't think it's invalid in terms of rtl semantics.  
> There's no
> real reason why a noreturn function has to save the address of its caller
> unless the target forces it to, such as for frame chains above.
> 
> In this patch we're relying on the link between the two concepts for a
> security feature, so I think we should either enforce it explicitly or add an
> assert like:
> 
>   gcc_assert (crtl->is_leaf
> || (cfun->machine->frame.reg_offset[R30_REGNUM]
> != SLOT_NOT_REQUIRED));
> 
> to aarch64_expand_prologue.  (I don't think we should make the assert
> dependent on stack-clash, since the point is that we're assuming this
> happens regardless of stack-clash.)

I agree that the assert would be a good idea, though I'm not sure enabling it 
always is
a good idea. I'm not sure what other languages that don't use 
stack-clash-protection and do their
own probing do. Like Ada or Go?

Regards,
Tamar

> 
> > I have added two new tests to check for this, so that if it does
> > change in the future they would fail.
> 
> Thanks.
> 
> Richard


Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-11 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Richard,
>
> The 08/28/2018 21:58, Richard Sandiford wrote:
>> Tamar Christina  writes:
>> > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
>> > + /* When doing the final adjustment for the outgoing argument size
>> > we can't
>> > + assume that LR was saved at position 0.  So subtract it's offset
>> > from the
>> > + ABI safe buffer so that we don't accidentally allow an adjustment 
>> > that
>> > + would result in an allocation larger than the ABI buffer without
>> > + probing.  */
>> > +  HOST_WIDE_INT min_probe_threshold
>> > += final_adjustment_p
>> > +  ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
>> > +  : guard_size - guard_used_by_caller;
>> [...]
>> > +  if (residual)
>> > +{
>> > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
>> > +  if (residual >= min_probe_threshold)
>> > +  {
>> > +if (dump_file)
>> > +  fprintf (dump_file,
>> > +   "Stack clash AArch64 prologue residuals: "
>> > +   HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
>> > +   "\n", residual);
>> > +emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>> > + STACK_CLASH_CALLER_GUARD));
>> 
>> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
>> min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
>> like the probe would then write above the region.
>> 
>> Using >= rather than > means that the same thing could happen when
>> LR_REGNUM is at position 0, if the residual is exactly
>> STACK_CLASH_CALLER_GUARD.
>
> That's true. While addressing this we changed how the residuals are probed.
>
> To address a comment you raised offline about the saving of LR when
> calling a no-return function using a tail call and
> -fomit-frame-pointer, I think this should be safe as the code in
> frame_layout (line 4131-4136) would ensure that R30 is saved.

That line number range doesn't seem to match up with current sources.
But my point was that "X is a non-leaf function" does not directly
imply "X saves R30_REGNUM".  It might happen to imply that indirectly
for all cases at the moment, but it's not something that the AArch64
code enforces itself.  AFAICT the only time we force R30 to be saved
explicitly is when emitting a chain:

  if (cfun->machine->frame.emit_frame_chain)
{
  /* FP and LR are placed in the linkage record.  */
  cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
  cfun->machine->frame.wb_candidate1 = R29_REGNUM;
  cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
  cfun->machine->frame.wb_candidate2 = R30_REGNUM;
  offset = 2 * UNITS_PER_WORD;
}

Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
And in principle, it should be correct to use a sibcall pattern that
doesn't clobber R30 for a noreturn call even in functions that require
a frame.  We don't do that yet, and it probably doesn't make sense from
a QoI perspective on AArch64, but I don't think it's invalid in terms
of rtl semantics.  There's no real reason why a noreturn function has to
save the address of its caller unless the target forces it to, such as
for frame chains above.

In this patch we're relying on the link between the two concepts for a
security feature, so I think we should either enforce it explicitly or
add an assert like:

  gcc_assert (crtl->is_leaf
  || (cfun->machine->frame.reg_offset[R30_REGNUM]
  != SLOT_NOT_REQUIRED));

to aarch64_expand_prologue.  (I don't think we should make the assert
dependent on stack-clash, since the point is that we're assuming this
happens regardless of stack-clash.)

> I have added two new tests to check for this, so that if it does
> change in the future they would fail.

Thanks.

Richard


Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-07 Thread Tamar Christina
Hi Richard,

The 08/28/2018 21:58, Richard Sandiford wrote:
> Tamar Christina  writes:
> > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> > +  /* When doing the final adjustment for the outgoing argument size we 
> > can't
> > + assume that LR was saved at position 0.  So subtract it's offset from 
> > the
> > + ABI safe buffer so that we don't accidentally allow an adjustment that
> > + would result in an allocation larger than the ABI buffer without
> > + probing.  */
> > +  HOST_WIDE_INT min_probe_threshold
> > += final_adjustment_p
> > +  ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> > +  : guard_size - guard_used_by_caller;
> [...]
> > +  if (residual)
> > +{
> > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > +  if (residual >= min_probe_threshold)
> > +   {
> > + if (dump_file)
> > +   fprintf (dump_file,
> > +"Stack clash AArch64 prologue residuals: "
> > +HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> > +"\n", residual);
> > + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > +  STACK_CLASH_CALLER_GUARD));
> 
> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
> min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
> like the probe would then write above the region.
> 
> Using >= rather than > means that the same thing could happen when
> LR_REGNUM is at position 0, if the residual is exactly
> STACK_CLASH_CALLER_GUARD.

That's true. While addressing this we changed how the residuals are probed.

To address a comment you raised offline about the saving of LR when calling
a no-return function using a tail call and -fomit-frame-pointer, I think this 
should
be safe as the code in frame_layout (line 4131-4136) would ensure that R30 is 
saved.

I have added two new tests to check for this, so that if it does change in the 
future they
would fail. 

Attached is the updated patch and new changelog

Ok for trunk?
Thanks,
Tamar

gcc/
2018-09-07  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md
(probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-09-07  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-12.c: New.
* gcc.target/aarch64/stack-check-prologue-13.c: New.
* gcc.target/aarch64/stack-check-prologue-14.c: New.
* gcc.target/aarch64/stack-check-prologue-15.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.

> 
> Thanks,
> Richard

-- 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c1218503bab19323eee1cca8b7e4bea8fbfcf573..bfb6d92f5e665b14926514b864489cb3f2793336 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -84,6 +84,14 @@
 
 #define LONG_DOUBLE_TYPE_SIZE	128
 
+/* This value is the amount of bytes a caller is allowed to drop the stack
+   before probing has to be done for stack clash protection.  */
+#define STACK_CLASH_CALLER_GUARD 1024
+
+/* This value controls how many pages we manually unroll the loop for when
+   generating stack clash probes.  */
+#define STACK_CLASH_MAX_UNROLL_PAGES 4
+
 /* The architecture reserves all bits of the address for hardware use,
so the vbit must go into the delta field of pointer

Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-08-28 Thread Richard Sandiford
Tamar Christina  writes:
> +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> +  /* When doing the final adjustment for the outgoing argument size we can't
> + assume that LR was saved at position 0.  So subtract it's offset from 
> the
> + ABI safe buffer so that we don't accidentally allow an adjustment that
> + would result in an allocation larger than the ABI buffer without
> + probing.  */
> +  HOST_WIDE_INT min_probe_threshold
> += final_adjustment_p
> +  ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> +  : guard_size - guard_used_by_caller;
[...]
> +  if (residual)
> +{
> +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> +  if (residual >= min_probe_threshold)
> + {
> +   if (dump_file)
> + fprintf (dump_file,
> +  "Stack clash AArch64 prologue residuals: "
> +  HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> +  "\n", residual);
> +   emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> +STACK_CLASH_CALLER_GUARD));

reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
like the probe would then write above the region.

Using >= rather than > means that the same thing could happen when
LR_REGNUM is at position 0, if the residual is exactly
STACK_CLASH_CALLER_GUARD.

Thanks,
Richard


RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-08-28 Thread Tamar Christina
Hi All,

As requested this patch series now contains basic SVE support, following that
I am updating this patch to remove the error/warnings generated when SVE is 
used.

The series now consists of 7 patches but I will only send updates for those 
that changed.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-28  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md
(probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-08-28  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Tamar Christina
> Sent: Tuesday, August 7, 2018 11:10
> To: Jeff Law 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: RE: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> Hi All,
> 
> This is  a re-spin of the patch to address review comments.
> It mostly just adds more comments and corrects typos.
> 
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-08-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.md (cmp,
>   probe_stack_range): Add k (SP) constraint.
>   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
>   STACK_CLASH_MAX_UNROLL_PAGES): New.
>   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> Emit
>   stack probes for stack clash.
>   (aarch64_allocate_and_probe_stack_space): New.
>   (aarch64_expand_prologue): Use it.
>   (aarch64_expand_epilogue): Likewise and update IP regs re-use
> criteria.
>   (aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-08-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-12.c: New.
>   * gcc.target/aarch64/stack-check-13.c: New.
>   * gcc.target/aarch64/stack-check-cfa-1.c: New.
>   * gcc.target/aarch64/stack-check-cfa-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-1.c: New.
>   * gcc.target/aarch64/stack-check-prologue-10.c: New.
>   * gcc.target/aarch64/stack-check-prologue-11.c: New.
>   * gcc.target/aarch64/stack-check-prologue-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-3.c: New.
>   * gcc.target/aarch64/stack-check-prologue-4.c: New.
>   * gcc.target/aarch64/stack-check-prologue-5.c: New.
>   * gcc.target/aarch64/stack-check-prologue-6.c: New.
>   * gcc.target/aarch64/stack-check-prologue-7.c: New.
>   * gcc.target/aarch64/stack-check-prologue-8.c: New.
>   * gcc.target/aarch64/stack-check-prologue-9.c: New.
>   * gcc.target/aarch64/stack-check-prologue.h: New.
>   * lib/target-supports.exp
>   (check_effective_target_supports_stack_clash_protection): Add
> AArch64.
> 
> > -Original Message-
> > From: Jeff Law 
> > Sent: Friday, August 3, 2018 19:05
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> > ; Richard Earnshaw
> > ; Marcus Shawcroft
> > 
> > Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> > supporting 64k probes. [patch (1/6)]
> >
> > On 07/25/2018 05:09 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > Attached is an updated patch that clarifies some of