https://bugs.kde.org/show_bug.cgi?id=479996

Adhemerval Zanella <zatr...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zatr...@gmail.com

--- Comment #1 from Adhemerval Zanella <zatr...@gmail.com> ---
I think the issue is how -fstack-check is implemented on aarch64, and I am not
sure how to fix it in valgrind without somehow decreasing the coverage. On
aarch64, the example code is translated as:

---
a_function:
        sub     x10, sp, #8192
        str     xzr, [x10]
        sub     x10, x10, #4096
        str     xzr, [x10, 1536]
        mov     x12, 10752
        sub     sp, sp, x12
        nop
        add     sp, sp, x12
        ret
main:
        sub     x10, sp, #8192
        str     xzr, [x10, 4080]
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      a_function
        mov     w0, 0
        ldp     x29, x30, [sp], 16
        ret
---

So the stack is probed *without* updating the stack pointer, which is
technically a stack overflow. This is different from x86_64, where the stack
pointer is updated first and then adjusted back after the probe:

---
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 8216
        or      QWORD PTR [rsp], 0
        add     rsp, 8216
        mov     eax, 0
        call    a_function
        mov     eax, 0
        leave
        ret
---
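
For reference, the kind of reproducer involved is presumably something along
these lines (a sketch under my assumptions; the actual example.c from the
report is not quoted in this comment):

---
/* Hypothetical reproducer: a frame larger than two pages, so that
   -fstack-check has to emit explicit probes.  Built with something like
   gcc -O2 -fstack-check example.c -o example */
void a_function (void)
{
  char buf[10000];                          /* large local, forces probing */
  __asm__ volatile ("nop" : : "r" (buf));   /* keep the buffer alive */
}

int main (void)
{
  a_function ();
  return 0;
}
---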

Both ABIs define specific code generation (-fstack-check=specific), but x86_64
also internally defines STACK_CHECK_MOVING_SP
(https://gcc.gnu.org/onlinedocs//gccint/Stack-Checking.html). I think this is
because the x86_64 frame generation code requires the SP to be updated to
correctly generate the FP and local variable accesses, which is not required by
the aarch64 backend, so the gcc developers could leave it off for better code
generation.

The SP update is not strictly required, since on Linux and other OSes the
kernel guarantees a minimum stack size that is lazily allocated. Ideally the
probe would either trigger lazy allocation through soft page faults or hit the
guard page/invalid memory. The problem is that the -fstack-check strategy has
multiple corner cases, which is why recent gcc provides
-fstack-clash-protection instead.

On aarch64, -fstack-clash-protection will update the SP in the expected
increments if the stack allocation requires probing. Using
-fstack-clash-protection --param stack-clash-protection-probe-interval=12
--param stack-clash-protection-guard-size=12:

---
a_function:
        sub     sp, sp, #4096
        str     xzr, [sp, 1024]
        sub     sp, sp, #4096
        str     xzr, [sp, 1024]
        sub     sp, sp, #2560
        nop
        mov     x12, 10752
        add     sp, sp, x12
        ret
main:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      a_function
        mov     w0, 0
        ldp     x29, x30, [sp], 16
        ret
---
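
Since every probe now lands at or above the already-updated SP, I would expect
valgrind not to flag the probe stores for a build along the lines of (same
hypothetical example.c as above; I have not pasted the output here):

---
$ gcc -fstack-clash-protection example.c -o example && valgrind ./example
---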

So one option for a proper fix would be to change the aarch64 code generation
to always update and roll back the SP, as x86_64 does. I am not sure this is
really worth it, since -fstack-clash-protection has a better strategy for stack
probing. The Ada compiler uses -fstack-clash as the default, but I am not sure
about the language runtime requirements and/or whether -fstack-clash-protection
is used instead.

As a side note, this issue happens not only on aarch64 but potentially on any
architecture that does not set STACK_CHECK_MOVING_SP. For instance, on
powerpc64le:

$ gcc -fstack-check example.c -o example && valgrind ./example
[...]
==2197773== Invalid write of size 8
==2197773==    at 0x100005FC: main (in
/home/azanella/projects/valgrind/bz479996/example-fstack-check)
==2197773==  Address 0x1fff00a760 is on thread 1's stack
==2197773==  16496 bytes below stack pointer
==2197773==
[...]

Since the stack probing is also done without updating the SP (r0 is used as
the scratch register):
---
main:
.LFB1:
        .cfi_startproc
.LCF1:
0:      addis 2,12,.TOC.-.LCF1@ha
        addi 2,2,.TOC.-.LCF1@l
        .localentry     main,.-main
        std 0,-16496(1)
        mflr 0
        std 0,16(1)
        std 31,-8(1)
        stdu 1,-112(1)
        .cfi_def_cfa_offset 112
        .cfi_offset 65, 16
        .cfi_offset 31, -8
        mr 31,1
        .cfi_def_cfa_register 31
        bl a_function
        li 9,0
        extsw 9,9
        mr 3,9
        addi 1,31,112
        .cfi_def_cfa 1, 0
        ld 0,16(1)
        mtlr 0
        ld 31,-8(1)
        blr
---
