https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115116

            Bug ID: 115116
           Summary: [x86] rtx_cost is overestimated for big size memory.
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

typedef char v16qi __attribute__((vector_size(16)));


v16qi
__attribute__((noipa))
foo (v16qi a)
{
  v16qi c = __extension__(v16qi)
    { 0x1,0x2,0x3,0x4,0x5,0x6,0x7,0x8,
      0x8,0x7,0x6,0x5,0x4,0x3,0x2,0x1 };
  return a * c;
}

With -O2 -march=x86-64-v4, GCC generates:

        .cfi_startproc
        vpmovzxbw       .LC0(%rip), %ymm1
        vpmovzxbw       %xmm0, %ymm0
        vpmullw %ymm1, %ymm0, %ymm0
        vpmovwb %ymm0, %xmm0
        vzeroupper

but it could be optimized to:

        .cfi_startproc
        vpmovzxbw       %xmm0, %ymm0
        vpmullw .LC0(%rip), %ymm0, %ymm0
        vpmovwb %ymm0, %xmm0
        vzeroupper

but combine rejects this because of the cost comparison:

Successfully matched this instruction:
(set (reg:V16HI 104)
    (mem/u/c:V16HI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S32 A256]))
rejecting combination of insns 6 and 10
original costs 9 + 4 = 13
replacement cost 17

For bigger modes, rtx_cost uses factor = GET_MODE_SIZE / UNITS_PER_WORD and
returns cost = factor * COSTS_N_INSNS (1). That is too much for 256/512-bit
vectors, since they are probably loaded/stored with an SSE register.
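
As a rough illustration of the scaling described above, here is a minimal
standalone sketch. It is not the GCC source: the two helper functions are
hypothetical, COSTS_N_INSNS is assumed to expand to (N) * 4 as in rtl.h, and
UNITS_PER_WORD is assumed to be 8 as on a 64-bit x86 target. It only models
the default factor computation, not any target-specific adjustment.

```c
#include <stdio.h>

/* Assumptions: COSTS_N_INSNS as defined in GCC's rtl.h, and
   UNITS_PER_WORD for a 64-bit x86 target.  */
#define COSTS_N_INSNS(N) ((N) * 4)
#define UNITS_PER_WORD 8

/* Hypothetical helper modelling the default scaling: a mode N times
   wider than a word is costed as N simple instructions.  */
int
default_cost_for_mode_size (int mode_size)
{
  int factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
  return factor * COSTS_N_INSNS (1);
}

/* Hypothetical helper: print the modelled cost for word-sized,
   128-bit, 256-bit and 512-bit modes.  */
void
print_default_costs (void)
{
  static const int sizes[] = { 8, 16, 32, 64 };
  for (unsigned i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
    printf ("%2d bytes -> cost %d\n", sizes[i],
            default_cost_for_mode_size (sizes[i]));
}
```

Under these assumptions a 32-byte (V16HI) operand gets factor 4 and a
64-byte operand gets factor 8, so the modelled cost grows linearly with
vector width even where a single vector load would do.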
