https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970

            Bug ID: 111970
           Summary: [tree-optimization] SLP for non-IFN gathers result in
                    RISC-V test failure on gather
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pan2.li at intel dot com
  Target Milestone: ---

Created attachment 56197
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56197&action=edit
Within this commit

Hi Richard Biener,

Recently we found a regression in the RISC-V backend for gather autovec, namely
gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c.
I narrowed it down to the small piece of code below:

#include <stdint-gcc.h>

#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                      \
  void __attribute__ ((noinline, noclone))                                    \
  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
                                INDEX_TYPE *restrict index)                   \
  {                                                                           \
    for (int i = 0; i < 100; ++i)                                             \
      {                                                                       \
        y[i * 2] = x[index[i * 2]] + 1;                                       \
        y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                               \
      }                                                                       \
  }

TEST_LOOP (float, uint8_t)
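
For anyone who wants to check the result at run time, a scalar reference
comparison along the following lines may help. This is only a sketch and not
the code of gather_load_run-12.c: the helper name count_mismatches, the input
pattern and the use of <stdint.h> instead of <stdint-gcc.h> are assumptions
made for illustration. It assumes GCC (for the noclone attribute).

```c
#include <stdint.h>

/* Same loop as the reduced test case above.  */
#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                      \
  void __attribute__ ((noinline, noclone))                                    \
  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
                                INDEX_TYPE *restrict index)                   \
  {                                                                           \
    for (int i = 0; i < 100; ++i)                                             \
      {                                                                       \
        y[i * 2] = x[index[i * 2]] + 1;                                       \
        y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                               \
      }                                                                       \
  }

TEST_LOOP (float, uint8_t)

/* Hypothetical helper (not part of the GCC testsuite): runs the loop on
   known inputs and returns how many elements of y differ from the scalar
   reference.  */
int
count_mismatches (void)
{
  float x[256];          /* One entry for every possible uint8_t index.  */
  float y[200];          /* The loop writes 2 * 100 elements.  */
  uint8_t index[200];

  for (int i = 0; i < 256; ++i)
    x[i] = (float) i;    /* Small integers are exactly representable.  */

  for (int i = 0; i < 200; ++i)
    {
      index[i] = (uint8_t) ((i * 7) % 256);   /* Arbitrary index pattern.  */
      y[i] = 0;
    }

  f_float_uint8_t (y, x, index);

  int bad = 0;
  for (int i = 0; i < 100; ++i)
    {
      if (y[i * 2] != x[index[i * 2]] + 1)
        ++bad;
      if (y[i * 2 + 1] != x[index[i * 2 + 1]] + 2)
        ++bad;
    }
  return bad;
}
```

Calling count_mismatches () from main and aborting on a nonzero result gives
a simple pass/fail check. A correctly vectorized build is expected to report
zero mismatches.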

Commit beab5b95c581452adeb26efd59ae84a61fb3b429
(tree-optimization/111131 - SLP for non-IFN gathers) makes the tree passes
generate incorrect IR, as shown in the attachments.

The data array and the index array should have the same step after
vectorization, but we get an incorrect offset for the second iteration.

vector(32) float vect__11.11;
_209 = BIT_FIELD_REF <MEM <vector(64) unsigned char> [(uint8_t *)_163], 8, 16>;

The offsets are then updated for the second iteration:

ivtmp.35_613 = ivtmp.35_594 + 64; // should be ivtmp = ivtmp + 32
ivtmp.38_76 = ivtmp.38_620 + 256;

I have also uploaded the tree.optimized dumps from before and after this
commit, where you can check the details. If any more information is required,
please feel free to let me know.

Pan
