https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
   Last reconfirmed|                              |2022-01-14
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                   |WAITING

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure if I can parse the assembly.  The rev quoted changes costing, so I
assume the rest is the same.  I see

t.c:5:20: missed: not vectorized: relevant stmt not supported: _24 = _27 == _25;
t.c:5:13: note: Building vector operands of 0x3411680 from scalars instead
t.c:5:13: note: ==> examining statement: _22 = (int) _24;
t.c:5:13: missed: type conversion to/from bit-precision unsupported.
t.c:5:20: missed: not vectorized: relevant stmt not supported: _22 = (int) _24;
t.c:5:13: note: Building vector operands of 0x34115f8 from scalars instead

and so we end up with

t.c:5:13: note: ***** Analysis succeeded with vector mode V8QI
t.c:5:13: note: SLPing BB part
t.c:5:13: note: Costing subgraph:
t.c:5:13: note: node 0x3411570 (max_nunits=4, refcnt=1)
t.c:5:13: note: op template: *dest_15(D) = _22;
t.c:5:13: note:   stmt 0 *dest_15(D) = _22;
t.c:5:13: note:   stmt 1 *_45 = _46;
t.c:5:13: note:   stmt 2 *_60 = _61;
t.c:5:13: note:   stmt 3 *_8 = _9;
t.c:5:13: note:   children 0x34115f8
t.c:5:13: note: node (external) 0x34115f8 (max_nunits=4, refcnt=1)
t.c:5:13: note:   stmt 0 _22 = (int) _24;
t.c:5:13: note:   stmt 1 _46 = (int) _44;
t.c:5:13: note:   stmt 2 _61 = (int) _59;
t.c:5:13: note:   stmt 3 _9 = (int) _7;
t.c:5:13: note:   children 0x3411680
t.c:5:13: note: node (external) 0x3411680 (max_nunits=4, refcnt=1)
t.c:5:13: note:   stmt 0 _24 = _27 == _25;
t.c:5:13: note:   stmt 1 _44 = _41 == _43;
t.c:5:13: note:   stmt 2 _59 = _56 == _58;
t.c:5:13: note:   stmt 3 _7 = _4 == _6;
t.c:5:13: note:   children 0x3411708 0x3411790
t.c:5:13: note: node 0x3411708 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _27 = *a_13(D);
t.c:5:13: note:   stmt 0 _27 = *a_13(D);
t.c:5:13: note:   stmt 1 _41 = *_40;
t.c:5:13: note:   stmt 2 _56 = *_55;
t.c:5:13: note:   stmt 3 _4 = *_3;
t.c:5:13: note: node 0x3411790 (max_nunits=2, refcnt=1)
t.c:5:13: note: op template: _25 = *b_14(D);
t.c:5:13: note:   stmt 0 _25 = *b_14(D);
t.c:5:13: note:   stmt 1 _43 = *_42;
t.c:5:13: note:   stmt 2 _58 = *_57;
t.c:5:13: note:   stmt 3 _6 = *_5;
t.c:5:13: note: Cost model analysis:
_22 1 times scalar_store costs 1 in body
_46 1 times scalar_store costs 1 in body
_61 1 times scalar_store costs 1 in body
_9 1 times scalar_store costs 1 in body
_22 2 times unaligned_store (misalign -1) costs 2 in body
<unknown> 1 times vec_construct costs 2 in prologue
<unknown> 1 times vec_construct costs 2 in prologue
t.c:5:13: note: Cost model analysis for part in loop 0:
  Vector cost: 6
  Scalar cost: 4
t.c:5:13: missed: not vectorized: vectorization is not profitable.

but maybe I'm doing sth wrong since your assembler has the compare vectorized.
I'm doing, with a cc1 cross configured as

./src/trunk/configure --target=arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp1

> ./cc1 -quiet t.c -I include -mcpu=cortex-a9 -mfpu=neon -O3