[Bug tree-optimization/104010] [12 regression] short loop no longer vectorized with Neon after r12-3362

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 14 Jan 2022 00:19:28 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-14
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |WAITING

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure if I can parse the assembly.  The quoted revision changes costing, so I
assume the rest is unchanged.  I see

t.c:5:20: missed:   not vectorized: relevant stmt not supported: _24 = _27 == _25;
t.c:5:13: note:   Building vector operands of 0x3411680 from scalars instead
t.c:5:13: note:   ==> examining statement: _22 = (int) _24;
t.c:5:13: missed:   type conversion to/from bit-precision unsupported.
t.c:5:20: missed:   not vectorized: relevant stmt not supported: _22 = (int) _24;
t.c:5:13: note:   Building vector operands of 0x34115f8 from scalars instead

and so we end up with

t.c:5:13: note: ***** Analysis succeeded with vector mode V8QI
t.c:5:13: note: SLPing BB part
t.c:5:13: note: Costing subgraph:
t.c:5:13: note: node 0x3411570 (max_nunits=4, refcnt=1)
t.c:5:13: note: op template: *dest_15(D) = _22;
t.c:5:13: note:         stmt 0 *dest_15(D) = _22;
t.c:5:13: note:         stmt 1 *_45 = _46;
t.c:5:13: note:         stmt 2 *_60 = _61;
t.c:5:13: note:         stmt 3 *_8 = _9;
t.c:5:13: note:         children 0x34115f8
t.c:5:13: note: node (external) 0x34115f8 (max_nunits=4, refcnt=1)
t.c:5:13: note:         stmt 0 _22 = (int) _24;
t.c:5:13: note:         stmt 1 _46 = (int) _44;
t.c:5:13: note:         stmt 2 _61 = (int) _59;
t.c:5:13: note:         stmt 3 _9 = (int) _7;
t.c:5:13: note:         children 0x3411680
t.c:5:13: note: node (external) 0x3411680 (max_nunits=4, refcnt=1)
t.c:5:13: note:         stmt 0 _24 = _27 == _25;
t.c:5:13: note:         stmt 1 _44 = _41 == _43;
t.c:5:13: note:         stmt 2 _59 = _56 == _58;
t.c:5:13: note:         stmt 3 _7 = _4 == _6;
t.c:5:13: note:         children 0x3411708 0x3411790
t.c:5:13: note: node 0x3411708 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _27 = *a_13(D);
t.c:5:13: note:         stmt 0 _27 = *a_13(D);
t.c:5:13: note:         stmt 1 _41 = *_40;
t.c:5:13: note:         stmt 2 _56 = *_55;
t.c:5:13: note:         stmt 3 _4 = *_3;
t.c:5:13: note: node 0x3411790 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _25 = *b_14(D);
t.c:5:13: note:         stmt 0 _25 = *b_14(D);
t.c:5:13: note:         stmt 1 _43 = *_42;
t.c:5:13: note:         stmt 2 _58 = *_57;
t.c:5:13: note:         stmt 3 _6 = *_5;
t.c:5:13: note: Cost model analysis:
_22 1 times scalar_store costs 1 in body
_46 1 times scalar_store costs 1 in body
_61 1 times scalar_store costs 1 in body
_9 1 times scalar_store costs 1 in body
_22 2 times unaligned_store (misalign -1) costs 2 in body
<unknown> 1 times vec_construct costs 2 in prologue
<unknown> 1 times vec_construct costs 2 in prologue
t.c:5:13: note: Cost model analysis for part in loop 0:
  Vector cost: 6
  Scalar cost: 4
t.c:5:13: missed: not vectorized: vectorization is not profitable.

but maybe I'm doing something wrong, since your assembler output has the compare
vectorized.
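
The arithmetic behind the "not profitable" verdict can be reproduced from the dump. The following is a minimal sketch, not GCC's actual implementation: the cost numbers are copied from the "Cost model analysis" lines above, while the function names and the simple "vector must be strictly cheaper than scalar" rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Costs copied from the dump; all names here are hypothetical. */
enum {
  SCALAR_STORES      = 4, /* _22, _46, _61, _9                          */
  SCALAR_STORE_COST  = 1, /* "scalar_store costs 1 in body" (each)      */
  VEC_STORE_COST     = 2, /* "2 times unaligned_store ... costs 2"      */
  VEC_CONSTRUCTS     = 2, /* two vec_construct entries                  */
  VEC_CONSTRUCT_COST = 2  /* "vec_construct costs 2 in prologue" (each) */
};

/* Sum of the four scalar stores. */
static int scalar_cost(void)
{
  return SCALAR_STORES * SCALAR_STORE_COST;
}

/* Vector stores in the body plus vector construction in the prologue. */
static int vector_cost(void)
{
  return VEC_STORE_COST + VEC_CONSTRUCTS * VEC_CONSTRUCT_COST;
}

/* Assumed rule: vectorize only if the vector form is strictly cheaper. */
static bool slp_profitable(void)
{
  return vector_cost() < scalar_cost();
}

/* Self-check against the totals printed in the dump. */
static void check_costs(void)
{
  assert(scalar_cost() == 4);
  assert(vector_cost() == 6);
  assert(!slp_profitable());
}
```

Because the compare and the conversion nodes are built from scalars (the two external nodes), the only vector work left is the store, and the two vec_construct operations needed to feed it already outweigh the four scalar stores.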

My cc1 is a cross compiler configured as

./src/trunk/configure --target=arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp1

and I invoke it as

> ./cc1 -quiet t.c -I include -mcpu=cortex-a9 -mfpu=neon -O3
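
For orientation, a loop of the following shape would be consistent with the statements in the dump (four stores of an int-converted equality compare of two loaded values). This is a hypothetical sketch, not the test case from the bug report: the function name, the element types, and the use of restrict are all assumptions.

```c
#include <assert.h>

/* Hypothetical reproducer shape; NOT the t.c attached to the bug.
   Each iteration matches the dump pattern:
     _27 = *a;  _25 = *b;  _24 = _27 == _25;  _22 = (int) _24;  *dest = _22;  */
void cmp4(int *restrict dest, const int *a, const int *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] == b[i];
}

/* Small self-check that the loop computes element-wise equality. */
static void check_cmp4(void)
{
  int a[4] = { 1, 2, 3, 4 };
  int b[4] = { 1, 0, 3, 0 };
  int d[4] = { -1, -1, -1, -1 };

  cmp4(d, a, b);
  assert(d[0] == 1 && d[1] == 0 && d[2] == 1 && d[3] == 0);
}
```

With a trip count of four, the loop is fully unrolled at -O3 before SLP runs, which would explain why the dump shows basic-block SLP ("SLPing BB part") rather than loop vectorization.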
