Issue 178268
Summary [X86] Stack clash protection with loop-based probing omits varargs XMM register saves
Labels new issue
Assignees
Reporter philiptaron
## Summary

When `-fstack-clash-protection` triggers loop-based stack probing (stack frame ≥ 32KB on Linux x86-64), the XMM register save sequence for varargs functions is completely omitted. This causes `va_arg(ap, double)` to read uninitialized memory instead of the actual floating-point argument value.

I ran into this while compiling `mpr` in Nixpkgs using Clang.

## Environment

- **Clang version**: 21.1.8 (also reproduced with earlier versions)
- **Target**: x86_64-unknown-linux-gnu
- **OS**: Linux (NixOS, but reproducible on other distros)

## Minimal Reproducer

```c
// t32k.c - FAILS (reads uninitialized memory, prints 0)
#include <stdarg.h>
#include <stdio.h>

void test(const char *f, ...) {
    char b[32768];  // >= 32KB triggers loop-based stack probing
    va_list ap;
    va_start(ap, f);
    double d = va_arg(ap, double);
    sprintf(b, f, d);
    printf("%s\n", b);
    va_end(ap);
}

int main() {
    test("%e", -1.25);
    return 0;
}
```

```c
// t16k.c - WORKS (correctly prints -1.250000e+00)
#include <stdarg.h>
#include <stdio.h>

void test(const char *f, ...) {
    char b[16384];  // < 32KB uses inline probing, works correctly
    va_list ap;
    va_start(ap, f);
    double d = va_arg(ap, double);
    sprintf(b, f, d);
    printf("%s\n", b);
    va_end(ap);
}

int main() {
    test("%e", -1.25);
    return 0;
}
```

**Compile and run:**
```bash
$ clang -fstack-clash-protection -o t32k t32k.c && ./t32k
0.000000e+00   # WRONG - should be -1.250000e+00

$ clang -fstack-clash-protection -o t16k t16k.c && ./t16k
-1.250000e+00  # Correct
```

**Workaround:**
```bash
$ clang -fno-stack-clash-protection -o t32k t32k.c && ./t32k
-1.250000e+00  # Correct with stack clash protection disabled
```

## Root Cause Analysis

### Expected Behavior (16KB buffer - inline probing)

The x86-64 SysV ABI requires varargs functions to:
1. Check `%al`, which holds an upper bound on the number of vector (XMM) registers the caller used
2. If non-zero, save XMM0-XMM7 to the register save area

With a 16KB buffer, the generated prologue correctly includes:
```asm
test   %al,%al               # Check if any FP args were passed
je     .skip_xmm_save        # Skip if none
movaps %xmm0,-0x40b0(%rbp)   # Save XMM0
movaps %xmm1,-0x40a0(%rbp)   # Save XMM1
# ... (remaining saves, through %xmm7)
.skip_xmm_save:
```

### Actual Behavior (32KB buffer - loop-based probing)

With a 32KB buffer, stack clash protection switches to loop-based probing. The generated prologue **completely omits** the `test %al,%al` check and all XMM register saves:

```asm
# Loop-based stack probe
mov    %rsp,%r11
sub    $0x8000,%r11
.probe_loop:
sub    $0x1000,%rsp
test   %rsp,(%rsp)           # Probe the stack
cmp    %r11,%rsp
jne    .probe_loop

# GP register saves (present)
mov    %rsi,-0x80d8(%rbp)
mov    %rdx,-0x80d0(%rbp)
...

# Stack canary setup
mov    %fs:0x28,%rax

# NO test %al, NO movaps instructions!
# XMM register saves are completely missing
```

This causes `va_arg(ap, double)` to read from the uninitialized register save area, returning garbage (typically 0).

## Impact

This bug affects any varargs function that:
1. Has a stack frame ≥ 32KB (the threshold for loop-based probing on Linux)
2. Is compiled with `-fstack-clash-protection` (default in many hardened builds)
3. Receives floating-point arguments via varargs

Real-world impact: **MPFR 4.2.2** test suite fails (`tsprintf` test) when built with Clang and stack clash protection enabled, because `mpfr_vsprintf` uses a 65KB buffer internally.
