I've finally built up enough courage to start getting my head around this...

I see one outstanding issue sitting on this patch version:

On Sat, Oct 28, 2017 at 05:08:54AM +0100, Jeff Law wrote:
> On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> > --param=stack-clash-protection-probe-interval=13
> > --param=stack-clash-protection-guard-size=12
> > 
> > So if there is a good reason to continue with 2 separate values, we must
> > force probe interval <= guard size!
> The param code really isn't designed to enforce values that are
> inter-dependent.  It has a min, max & default values.  No more, no less.
>  If you set up something inconsistent with the params, it's simply not
> going to work.
> 
> 
> > 
> > Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
> > crashes due to the offsets used in the probes - we don't need large offsets
> > as we want to probe close to the bottom of the stack.
> Not a surprise.  While I tried to handle larger intervals, I certainly
> didn't test them.  Given the ISA I wouldn't expect an interval > 12 to
> be useful or necessarily even work correctly.

Understood - weird behaviour with weird params don't concern me.

> > Functions with a large stack emit like alloca a lot of code, here I used
> > --param=stack-clash-protection-probe-interval=15:
> > 
> > int f1(int x)
> > {
> >   char arr[128*1024];
> >   return arr[x];
> > }
> > 
> > f1:
> >     mov     x16, 64512
> >     sub     sp, sp, x16
> >     .cfi_def_cfa_offset 64512
> >     mov     x16, -32768
> >     add     sp, sp, x16
> >     .cfi_def_cfa_offset -1024
> >     str     xzr, [sp, 32760]
> >     add     sp, sp, x16
> >     .cfi_def_cfa_offset -66560
> >     str     xzr, [sp, 32760]
> >     sub     sp, sp, #1024
> >     .cfi_def_cfa_offset -65536
> >     str     xzr, [sp, 1016]
> >     ldrb    w0, [sp, w0, sxtw]
> >     .cfi_def_cfa_offset 131072
> >     add     sp, sp, 131072
> >     .cfi_def_cfa_offset 0
> >     ret
> > 
> > Note the cfa offsets are wrong.
> Yes.  They definitely look wrong.  There's a clear logic error in
> setting up the ADJUST_CFA note when the probing interval is larger than
> 2**12.  That should be easily fixed.  Let me poke at it.

This one does concern me, how did you get on? Did it respond well to
prodding?

> > There is an odd mix of a big initial adjustment, then some 
> > probes+adjustments and
> > then a final adjustment and probe for the remainder. I can't see the point 
> > of having
> > both an initial and remainder adjustment. I would expect this:
> > 
> >     sub     sp, sp, 65536
> >     str     xzr, [sp, 1024]
> >     sub     sp, sp, 65536
> >     str     xzr, [sp, 1024]
> >     ldrb    w0, [sp, w0, sxtw]
> >     add     sp, sp, 131072
> >     ret
> I'm really not able to justify spending further time optimizing the
> aarch64 implementation.  I've done the best I can.  You can take the
> work as-is or improve it, but I really can't justify further time
> investment on that architecture.

Makes sense. Understood. And certainly not required to land this patch.

> > int f2(int x)
> > {
> >   char arr[128*1024];
> >   return arr[x];
> > }
> > 
> > f2:
> >     mov     x16, 64512
> >     sub     sp, sp, x16
> >     mov     x16, -65536
> >     movk    x16, 0xfffd, lsl 16
> >     add     x16, sp, x16
> > .LPSRL0:
> >     sub     sp, sp, 4096
> >     str     xzr, [sp, 4088]
> >     cmp     sp, x16
> >     b.ne    .LPSRL0
> >     sub     sp, sp, #1024
> >     str     xzr, [sp, 1016]
> >     ldrb    w0, [sp, w0, sxtw]
> >     add     sp, sp, 262144
> >     ret
> > 
> > The cfa entries are OK for this case. There is a mix of positive/negative 
> > offsets which
> > makes things confusing. Again there are 3 kinds of adjustments when for 
> > this size we
> > only need the loop.
> > 
> > Reusing the existing gen_probe_stack_range code appears a bad idea since
> > it ignores the probe interval and just defaults to 4KB. I don't see why it 
> > should be
> > any more complex than this:
> > 
> >     sub     x16, sp, 262144  // only need temporary if > 1MB
> > .LPSRL0:
> >     sub     sp, sp, 65536
> >     str     xzr, [sp, 1024]
> >     cmp     sp, x16
> >     b.ne    .LPSRL0
> >     ldrb    w0, [sp, w0, sxtw]
> >     add     sp, sp, 262144
> >     ret
> > 
> > Probe insertion if final adjustment >= 1024 also generates a lot of 
> > redundant
> > code - although this is more a theoretical issue given this is so rare.
> Again, if ARM wants this optimized, then ARM's engineers are going to
> have to take the lead here.  I've invested all I can reasonably invest
> in terms of trying optimize the probing for this target.

Likewise here - thanks for your work so far, I have no expectation of this
being fully optimized before I OK it to land.

Sorry for the big delay getting round to this patch, I hope to get serious
time to put in to it later this week, and it would be helpful to close out
the few remaining issues before I do.

Thanks,
James

Reply via email to