[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

rguenther at suse dot de via Gcc-bugs Tue, 28 Sep 2021 00:09:51 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494


--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 28 Sep 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
> 
> --- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
> After adding support for v4hi reductions, the gimple for converting v8qi
> to v8hi does not seem optimal.
> 
>   vector(4) short int vect__21.36;
>   vector(4) unsigned short vect__2.31;
>   int16_t stmp_r_17.17;
>   vector(8) short int vect__16.15;
>   int16_t D.2229[8];
>   vector(8) short int _50;
>   vector(8) short int _51;
>   vector(8) short int _52;
>   vector(8) short int _53;
>   vector(8) short int _54;
>   vector(8) short int _55;
> 
>   <bb 2> [local count: 189214783]:
>   vect__2.31_97 = [vec_unpack_lo_expr] a_90(D);
>   vect__2.31_98 = [vec_unpack_hi_expr] a_90(D);
>   vect__21.36_105 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_97);
>   vect__21.36_106 = VIEW_CONVERT_EXPR<vector(4) short int>(vect__2.31_98);
>   MEM <vector(4) short int> [(short int *)&D.2229] = vect__21.36_105;
>   MEM <vector(4) short int> [(short int *)&D.2229 + 8B] = vect__21.36_106;

So the above could possibly use a V8QI -> V8HI conversion, though the
loop vectorizer isn't good at producing those.  And of course the
appropriate conversion optab has to exist.

>   vect__16.15_47 = MEM <vector(8) short int> [(short int *)&D.2229];

Here's a lack of "CSE": I do have patches somewhere to turn this into

  vect__16.15_47 = { vect__21.36_105, vect__21.36_106 };

but I'm not sure that's going to be profitable (well, the code as-is
will get an STLF (store-to-load forwarding) hit).

There's also store-merging, which could instead merge the stores
similarly (but then there's no CSE after store-merging, so the load
would remain).
