Hi, To continue the review of the AArch64 frame code I tried a few examples to figure out what it does now. For initial_adjust <= 63*1024 and final_adjust < 1024 there are no probes inserted as expected, ie. the vast majority of functions are unaffected. So that works perfectly.
For larger frames the first oddity is that there are now 2 separate params controlling how probes are generated: stack-clash-protection-guard-size (default 12, but set to 16 on AArch64) stack-clash-protection-probe-interval (default 12) I don't see how this makes sense. These values are closely related, so if one is different from the other, probing becomes ineffective/incorrect. For example we generate code that trivially bypasses the guard despite all the probing: --param=stack-clash-protection-probe-interval=13 --param=stack-clash-protection-guard-size=12 So if there is a good reason to continue with 2 separate values, we must force probe interval <= guard size! Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes crashes due to the offsets used in the probes - we don't need large offsets as we want to probe close to the bottom of the stack. Functions with a large stack emit like alloca a lot of code, here I used --param=stack-clash-protection-probe-interval=15: int f1(int x) { char arr[128*1024]; return arr[x]; } f1: mov x16, 64512 sub sp, sp, x16 .cfi_def_cfa_offset 64512 mov x16, -32768 add sp, sp, x16 .cfi_def_cfa_offset -1024 str xzr, [sp, 32760] add sp, sp, x16 .cfi_def_cfa_offset -66560 str xzr, [sp, 32760] sub sp, sp, #1024 .cfi_def_cfa_offset -65536 str xzr, [sp, 1016] ldrb w0, [sp, w0, sxtw] .cfi_def_cfa_offset 131072 add sp, sp, 131072 .cfi_def_cfa_offset 0 ret Note the cfa offsets are wrong. There is an odd mix of a big initial adjustment, then some probes+adjustments and then a final adjustment and probe for the remainder. I can't see the point of having both an initial and remainder adjustment. I would expect this: sub sp, sp, 65536 str xzr, [sp, 1024] sub sp, sp, 65536 str xzr, [sp, 1024] ldrb w0, [sp, w0, sxtw] add sp, sp, 131072 ret int f2(int x) { char arr[128*1024]; return arr[x]; } f2: mov x16, 64512 sub sp, sp, x16 mov x16, -65536 movk x16, 0xfffd, lsl 16 add x16, sp, x16 .LPSRL0: sub sp, sp, 4096 str xzr, [sp, 4088] cmp sp, x16 b.ne .LPSRL0 sub sp, sp, #1024 str xzr, [sp, 1016] ldrb w0, [sp, w0, sxtw] add sp, sp, 262144 ret The cfa entries are OK for this case. There is a mix of positive/negative offsets which makes things confusing. Again there are 3 kinds of adjustments when for this size we only need the loop. Reusing the existing gen_probe_stack_range code appears a bad idea since it ignores the probe interval and just defaults to 4KB. I don't see why it should be any more complex than this: sub x16, sp, 262144 // only need temporary if > 1MB .LPSRL0: sub sp, sp, 65536 str xzr, [sp, 1024] cmp sp, x16 b.ne .LPSRL0 ldrb w0, [sp, w0, sxtw] add sp, sp, 262144 ret Probe insertion if final adjustment >= 1024 also generates a lot of redundant code - although this is more a theoretical issue given this is so rare. Wilco