https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
I was looking at generated code (with -mtune=intel):

        vpbroadcastd    %edi, %zmm0     # 9     *avx512f_vec_dup_gprv16si/2     [length = 6]
        movl    %edi, %edi      # 12    *zero_extendsidi2/4     [length = 2]
        vmovq   %rdi, %xmm1     # 26    *movdi_internal/20      [length = 6]
        vpsrad  %xmm1, %zmm0, %zmm0     # 17    ashrv16si3/1    [length = 6]
        ret     # 29    simple_return_internal  [length = 1]

(insn 12) and (insn 26) could be merged into a single instruction:

        vmovd   %edx, %xmm0     # 13    *zero_extendsidi2/10    [length = 6]
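
The merge is valid because vmovd from a 32-bit GPR already clears the upper
bits of the destination vector register, so the separate zero-extension is
redundant. A minimal sketch of that equivalence, assuming an x86-64 target
with SSE2; the function names are hypothetical and are not part of GCC, and
the instruction noted in each comment is only what the compiler would
typically emit:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */
#include <stdint.h>
#include <string.h>

/* Two-step form: zero-extend in a GPR, then move 64 bits to the vector reg. */
static __m128i via_zext_then_movq(uint32_t x)
{
    uint64_t wide = (uint64_t)x;                /* cf. movl  %edi, %edi  */
    return _mm_cvtsi64_si128((long long)wide);  /* cf. vmovq %rdi, %xmm1 */
}

/* One-step form: movd zeroes bits 32..127 of the destination. */
static __m128i via_movd(uint32_t x)
{
    return _mm_cvtsi32_si128((int)x);           /* cf. vmovd %edi, %xmm1 */
}

/* Returns 1 when both forms produce the same 128-bit value. */
static int same_result(uint32_t x)
{
    __m128i a = via_zext_then_movq(x);
    __m128i b = via_movd(x);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The comparison covers all 128 bits, so it also checks that neither form
leaves stale data in the upper half of the register.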

The register allocator somehow avoids zero-extending directly into an SSE reg
in (insn 12) and instead generates an input reload (insn 26) for (insn 17):

    Inserting insn reload before:
   26: r107:DI=r103:DI
         ...
         Choosing alt 19 in insn 26:  (0) ?*Yi  (1) r {*movdi_internal}

The RA could choose the same (?*Yi, r) alternative for (insn 12).

The REE pass also doesn't merge (insn 12) and (insn 26).
