https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #3)
> lzma_sha256_update in sha256.c is miscompiled with -O3 -march=skylake.
> Correct code:
>
> L42:
> ...
>         vpshufb %ymm7, %ymm1, %ymm0
>         vmovdqa %ymm0, (%rsp)
>         leaq    64(%r13), %rdi
>         vpshufb %ymm7, %ymm2, %ymm0
>         movq    %rsp, %rsi
>         vmovdqa %ymm0, 32(%rsp)
>         call    transform
>         vmovdqa .LC0(%rip), %ymm7   <<< This is missing.
> .L49:
>         testq   %rbx, %rbx
>         je      .L69
> .L50:
> ...
>         cmpl    $32, %ecx
>         jb      .L65

vzeroupper clears the upper bits of %ymm7 when transform returns.  If
-mvzeroupper is used, the upper bits of vector registers are clobbered
upon callee return if any YMM/ZMM registers are used in the callee.