I've finally built up enough courage to start getting my head around this...
I see one outstanding issue sitting on this patch version: On Sat, Oct 28, 2017 at 05:08:54AM +0100, Jeff Law wrote: > On 10/13/2017 02:26 PM, Wilco Dijkstra wrote: > > --param=stack-clash-protection-probe-interval=13 > > --param=stack-clash-protection-guard-size=12 > > > > So if there is a good reason to continue with 2 separate values, we must > > force probe interval <= guard size! > The param code really isn't designed to enforce values that are > inter-dependent. It has a min, max & default values. No more, no less. > If you set up something inconsistent with the params, it's simply not > going to work. > > > > > > Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes > > crashes due to the offsets used in the probes - we don't need large offsets > > as we want to probe close to the bottom of the stack. > Not a surprise. While I tried to handle larger intervals, I certainly > didn't test them. Given the ISA I wouldn't expect an interval > 12 to > be useful or necessarily even work correctly. Understood - weird behaviour with weird params don't concern me. > > Functions with a large stack emit like alloca a lot of code, here I used > > --param=stack-clash-protection-probe-interval=15: > > > > int f1(int x) > > { > > char arr[128*1024]; > > return arr[x]; > > } > > > > f1: > > mov x16, 64512 > > sub sp, sp, x16 > > .cfi_def_cfa_offset 64512 > > mov x16, -32768 > > add sp, sp, x16 > > .cfi_def_cfa_offset -1024 > > str xzr, [sp, 32760] > > add sp, sp, x16 > > .cfi_def_cfa_offset -66560 > > str xzr, [sp, 32760] > > sub sp, sp, #1024 > > .cfi_def_cfa_offset -65536 > > str xzr, [sp, 1016] > > ldrb w0, [sp, w0, sxtw] > > .cfi_def_cfa_offset 131072 > > add sp, sp, 131072 > > .cfi_def_cfa_offset 0 > > ret > > > > Note the cfa offsets are wrong. > Yes. They definitely look wrong. There's a clear logic error in > setting up the ADJUST_CFA note when the probing interval is larger than > 2**12. That should be easily fixed. Let me poke at it. This one does concern me, how did you get on? Did it respond well to prodding? > > There is an odd mix of a big initial adjustment, then some > > probes+adjustments and > > then a final adjustment and probe for the remainder. I can't see the point > > of having > > both an initial and remainder adjustment. I would expect this: > > > > sub sp, sp, 65536 > > str xzr, [sp, 1024] > > sub sp, sp, 65536 > > str xzr, [sp, 1024] > > ldrb w0, [sp, w0, sxtw] > > add sp, sp, 131072 > > ret > I'm really not able to justify spending further time optimizing the > aarch64 implementation. I've done the best I can. You can take the > work as-is or improve it, but I really can't justify further time > investment on that architecture. Makes sense. Understood. And certainly not required to land this patch. > > int f2(int x) > > { > > char arr[128*1024]; > > return arr[x]; > > } > > > > f2: > > mov x16, 64512 > > sub sp, sp, x16 > > mov x16, -65536 > > movk x16, 0xfffd, lsl 16 > > add x16, sp, x16 > > .LPSRL0: > > sub sp, sp, 4096 > > str xzr, [sp, 4088] > > cmp sp, x16 > > b.ne .LPSRL0 > > sub sp, sp, #1024 > > str xzr, [sp, 1016] > > ldrb w0, [sp, w0, sxtw] > > add sp, sp, 262144 > > ret > > > > The cfa entries are OK for this case. There is a mix of positive/negative > > offsets which > > makes things confusing. Again there are 3 kinds of adjustments when for > > this size we > > only need the loop. > > > > Reusing the existing gen_probe_stack_range code appears a bad idea since > > it ignores the probe interval and just defaults to 4KB. I don't see why it > > should be > > any more complex than this: > > > > sub x16, sp, 262144 // only need temporary if > 1MB > > .LPSRL0: > > sub sp, sp, 65536 > > str xzr, [sp, 1024] > > cmp sp, x16 > > b.ne .LPSRL0 > > ldrb w0, [sp, w0, sxtw] > > add sp, sp, 262144 > > ret > > > > Probe insertion if final adjustment >= 1024 also generates a lot of > > redundant > > code - although this is more a theoretical issue given this is so rare. > Again, if ARM wants this optimized, then ARM's engineers are going to > have to take the lead here. I've invested all I can reasonably invest > in terms of trying optimize the probing for this target. Likewise here - thanks for your work so far, I have no expectation of this being fully optimized before I OK it to land. Sorry for the big delay getting round to this patch, I hope to get serious time to put in to it later this week, and it would be helpful to close out the few remaining issues before I do. Thanks, James