https://bugs.kde.org/show_bug.cgi?id=479996
Adhemerval Zanella <zatr...@gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |zatr...@gmail.com --- Comment #1 from Adhemerval Zanella <zatr...@gmail.com> --- I think the issue is how -fstack-check is implemented on aarch64 and I am not sure how to fix on valgrind without decrease the coverage somehow. On aarch64, the example code is translated as: --- a_function: sub x10, sp, #8192 str xzr, [x10] sub x10, x10, #4096 str xzr, [x10, 1536] mov x12, 10752 sub sp, sp, x12 nop add sp, sp, x12 ret main: sub x10, sp, #8192 str xzr, [x10, 4080] stp x29, x30, [sp, -16]! mov x29, sp bl a_function mov w0, 0 ldp x29, x30, [sp], 16 ret --- So the stack is probed *without* updating the stack pointer, and it is technically a stack overflow. Different than x86_64, where the stack pointer is updated and then adjusted after the probe: --- main: push rbp mov rbp, rsp sub rsp, 8216 or QWORD PTR [rsp], 0 add rsp, 8216 mov eax, 0 call a_function mov eax, 0 leave ret --- Both ABIs defined specific code generation (-fstack-check=specific), but x86_64 also internally defines STACK_CHECK_MOVING_SP (https://gcc.gnu.org/onlinedocs//gccint/Stack-Checking.html). I think it is because x86_64 frame generation code requires the SP to be updated to correctly generate FP and local variables access correctly, which is not required by aarch64 backend and thus gcc developers could turn it off for better code generation. The SP update is not strickly required, since for Linux and other OS, the kernel will guarantee a minimum stack size that is lazy allocated. Ideally the probe would either trigger lazy allocations through soft page faults or hit the guard page/invalid memory allocation. The problem is -fstack-check strategy has multiple corner cases, that's why recent gcc provides the -fstack-clash-protection instead. On aarch64 the -fstack-clash-protection will update the SP in the expected increments if the stack allocation requires the probing. Using -fstack-clash-protection --param stack-clash-protection-probe-interval=12 --param stack-clash-protection-guard-size=12: --- a_function: sub sp, sp, #4096 str xzr, [sp, 1024] sub sp, sp, #4096 str xzr, [sp, 1024] sub sp, sp, #2560 nop mov x12, 10752 add sp, sp, x12 ret main: stp x29, x30, [sp, -16]! mov x29, sp bl a_function mov w0, 0 ldp x29, x30, [sp], 16 ret --- So one option to proper fix would fix aarch64 code generation to always update and rollback SP, as x86_64. I am not sure if this is really worth once -fstack-clash-protection has a better strategy for stack probing. The ADA compiler uses -fstack-clash as default, but I am not sure the language runtime requirement and/or -fstack-clash-protection is used instead. As a side note, this issue happens not only for aarch64 but potentially for all architectures that might not set STACK_CHECK_MOVING_SP. For instance on powerpc64le: $ gcc -fstack-check example.c -o example && valgrind ./example [...] ==2197773== Invalid write of size 8 ==2197773== at 0x100005FC: main (in /home/azanella/projects/valgrind/bz479996/example-fstack-check) ==2197773== Address 0x1fff00a760 is on thread 1's stack ==2197773== 16496 bytes below stack pointer ==2197773== [...] SInce the stack probing is also doing without updating the SP (r0 is used as the scratch register): --- main: .LFB1: .cfi_startproc .LCF1: 0: addis 2,12,.TOC.-.LCF1@ha addi 2,2,.TOC.-.LCF1@l .localentry main,.-main std 0,-16496(1) mflr 0 std 0,16(1) std 31,-8(1) stdu 1,-112(1) .cfi_def_cfa_offset 112 .cfi_offset 65, 16 .cfi_offset 31, -8 mr 31,1 .cfi_def_cfa_register 31 bl a_function li 9,0 extsw 9,9 mr 3,9 addi 1,31,112 .cfi_def_cfa 1, 0 ld 0,16(1) mtlr 0 ld 31,-8(1) blr --- -- You are receiving this mail because: You are watching all bug changes.