https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #3)
> lzma_sha256_update in sha256.c is miscompiled with -O3 -march=skylake.
> Correct code:
> 
> L42:
> ...
>         vpshufb %ymm7, %ymm1, %ymm0
>         vmovdqa %ymm0, (%rsp)
>         leaq    64(%r13), %rdi
>         vpshufb %ymm7, %ymm2, %ymm0
>         movq    %rsp, %rsi
>         vmovdqa %ymm0, 32(%rsp)
>         call    transform
>         vmovdqa .LC0(%rip), %ymm7  <<< This is missing.
> .L49:
>         testq   %rbx, %rbx
>         je      .L69
> .L50:
> ...
>         cmpl    $32, %ecx
>         jb      .L65

The vzeroupper executed before transform returns clears the upper bits
(bits 128 and above) of %ymm7, so the constant loaded from .LC0 must be
reloaded after the call.  With -mvzeroupper, the upper bits of all vector
registers are clobbered on return from any callee that uses YMM/ZMM
registers.
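
To make the clobber concrete, here is a minimal sketch in plain C.  It is
only a toy model, not real AVX code and not GCC's implementation: the
ymm_model type, the model_* functions and the LC0 values are all
hypothetical, made up for illustration.  It treats one YMM register as
four 64-bit lanes and shows that the value survives the call only if it
is reloaded afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of one 256-bit YMM register as four 64-bit lanes.
   Lanes 0-1 are the low 128 bits; lanes 2-3 are the upper 128 bits. */
typedef struct { uint64_t lane[4]; } ymm_model;

/* Hypothetical stand-in for the .LC0 constant; the values are made up.
   The upper lanes are nonzero so that a clobber is detectable. */
static const ymm_model LC0 = {{
    0x0405060700010203ULL, 0x0c0d0e0f08090a0bULL,
    0x0405060700010203ULL, 0x0c0d0e0f08090a0bULL
}};

/* Models the effect of vzeroupper on a single register:
   the low 128 bits are kept, the upper 128 bits become zero. */
void model_vzeroupper(ymm_model *r)
{
    r->lane[2] = 0;
    r->lane[3] = 0;
}

/* Models a callee that uses YMM registers and therefore executes
   vzeroupper before it returns (assumed here for illustration). */
void model_transform(ymm_model *ymm7)
{
    model_vzeroupper(ymm7);
}

/* Returns 1 if the modeled %ymm7 still equals LC0 after the call,
   0 if its upper bits were clobbered. */
int constant_intact_after_call(int reload_after_call)
{
    ymm_model ymm7 = LC0;          /* constant loaded before the call */
    model_transform(&ymm7);        /* call    transform */
    if (reload_after_call)
        ymm7 = LC0;                /* vmovdqa .LC0(%rip), %ymm7 */
    return memcmp(&ymm7, &LC0, sizeof ymm7) == 0;
}

/* Checks both cases: without the reload the constant is damaged,
   with the reload it is intact. */
void check_model(void)
{
    assert(!constant_intact_after_call(0));
    assert(constant_intact_after_call(1));
}
```

Calling check_model() exercises both paths.  The case without the reload
corresponds to the miscompiled code, where the vmovdqa after the call is
missing.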
