[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #10 from Robin Dapp  ---
From what I can tell with my barely working connection, there are no regressions
on x86, aarch64 or power10 with the adjusted check.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #9 from Robin Dapp  ---
Yes, that's from pattern recog:

slp.c:11:20: note:   === vect_pattern_recog ===
slp.c:11:20: note:   vect_recog_mask_conversion_pattern: detected: _5 = _2 & _4;
slp.c:11:20: note:   mask_conversion pattern recognized: patt_157 = patt_156 & _4;
slp.c:11:20: note:   extra pattern stmt: patt_156 = () _2;
slp.c:11:20: note:   vect_recog_bool_pattern: detected: _6 = (int) _5;
slp.c:11:20: note:   bool pattern recognized: patt_159 = (int) patt_158;
slp.c:11:20: note:   extra pattern stmt: patt_158 = _5 ? 1 : 0;
slp.c:11:20: note:   vect_recog_mask_conversion_pattern: detected: _11 = _8 & _10;
slp.c:11:20: note:   mask_conversion pattern recognized: patt_161 = patt_160 & _10;
slp.c:11:20: note:   extra pattern stmt: patt_160 = () _8;
...

In vect_recog_mask_conversion_pattern we arrive at

  if (TYPE_PRECISION (rhs1_type) < TYPE_PRECISION (rhs2_type))
    {
      vectype1 = get_mask_type_for_scalar_type (vinfo, rhs1_type);
      if (!vectype1)
        return NULL;
      rhs2 = build_mask_conversion (vinfo, rhs2, vectype1, stmt_vinfo);
    }
  else
    {
      vectype1 = get_mask_type_for_scalar_type (vinfo, rhs2_type);
      if (!vectype1)
        return NULL;
      rhs1 = build_mask_conversion (vinfo, rhs1, vectype1, stmt_vinfo);
    }

  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
  pattern_stmt = gimple_build_assign (lhs, rhs_code, rhs1, rhs2);


vectype1 is then e.g. vector([8,8]) .  Then
vect_recog_bool_pattern creates the COND_EXPR.
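
As a scalar sanity check of the rewrite the bool pattern performs (an
illustration added for clarity, not code from the PR): converting the mask
value to int and selecting 1 or 0 under the mask are equivalent, which is why
the value use of _5 can legally become a per-lane COND_EXPR.

#include <assert.h>

int
main (void)
{
  for (int a = 0; a < 4; a++)
    for (int b = 0; b < 4; b++)
      {
        /* Mirrors _5 = _2 & _4 from the dump: two comparisons ANDed.  */
        _Bool m = (a == 1) & (b == 2);
        /* (int) m is what the source computes; m ? 1 : 0 is what the
           bool pattern emits (patt_158 = _5 ? 1 : 0).  */
        assert ((int) m == (m ? 1 : 0));
      }
  return 0;
}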

Testsuites are running with your proposed change.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #8 from rguenther at suse dot de  ---
On Mon, 16 Oct 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
> 
> --- Comment #7 from Robin Dapp  ---
>   vectp.4_188 = x_50(D);
> vect__1.5_189 = MEM <vector(8) int> [(int *)vectp.4_188];
>   mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
>   mask_patt_156.7_191 = VIEW_CONVERT_EXPR >(mask__2.6_190);
>   _1 = *x_50(D);
>   _2 = _1 == 1;
>   vectp.9_192 = y_51(D);
> vect__3.10_193 = MEM <vector(8) short int> [(short int *)vectp.9_192];
>   mask__4.11_194 = { 2, 2, 2, 2, 2, 2, 2, 2 } == vect__3.10_193;
>   mask_patt_157.12_195 = mask_patt_156.7_191 & mask__4.11_194;
> vect_patt_158.13_196 = VEC_COND_EXPR <mask_patt_157.12_195, { 1, 1, 1, 1, 1,
> 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
>   vect_patt_159.14_197 = (vector(8) int) vect_patt_158.13_196;
> 
> 
> This yields the following assembly:
> vsetivli    zero,8,e32,m2,ta,ma
> vle32.v     v2,0(a0)
> vmv.v.i     v4,1
> vle16.v     v1,0(a1)
> vmseq.vv    v0,v2,v4
> vsetvli     zero,zero,e16,m1,ta,ma
> vmseq.vi    v1,v1,2
> vsetvli     zero,zero,e32,m2,ta,ma
> vmv.v.i     v2,0
> vmand.mm    v0,v0,v1
> vmerge.vvm  v2,v2,v4,v0
> vse32.v     v2,0(a0)
> 
> Apart from CSE'ing v4 this looks pretty good to me.  My connection is really
> poor at the moment so I cannot quickly compare what aarch64 does for that
> example.

That looks reasonable.  Note this then goes through
vectorizable_assignment as a no-op move.  The question is
if we can arrive here with signed bool : 2 vs. _Bool : 2
somehow (I wonder how we arrive with signed bool : 1 here - that's
from pattern recog, right?  why didn't that produce a
COND_EXPR for this?).

I think for more thorough testing the condition should change to

  /* But a conversion that does not change the bit-pattern is ok.  */
  && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
       && INTEGRAL_TYPE_P (TREE_TYPE (op))
       && (((TYPE_PRECISION (TREE_TYPE (scalar_dest))
             > TYPE_PRECISION (TREE_TYPE (op)))
            && TYPE_UNSIGNED (TREE_TYPE (op)))
           || (TYPE_PRECISION (TREE_TYPE (scalar_dest))
               == TYPE_PRECISION (TREE_TYPE (op)))))

rather than just doing >= which would be odd (why allow
to skip sign-extending from the unsigned MSB but not allow
to skip zero-extending from it).
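
To make the padding argument concrete (a standalone illustration, not code
from the thread): widening "true" from a 1-bit signed boolean sign-extends -1
into the padding bits, while widening "true" from a 1-bit unsigned boolean
leaves the padding at zero; only the unsigned-widening (or equal-precision)
case keeps the bit-pattern unchanged, which is what the condition above
encodes.

#include <stdio.h>

int
main (void)
{
  signed char from_signed_true = -1;    /* "true" in a 1-bit signed bool.  */
  unsigned char from_unsigned_true = 1; /* "true" in a 1-bit unsigned bool.  */
  /* Sign-extension fills the widened container with ones (0xff); zero
     extension leaves it at 0x01, so only the latter is a no-op copy.  */
  printf ("sign-extended:  0x%02x\n", (unsigned char) from_signed_true);
  printf ("zero-extended:  0x%02x\n", from_unsigned_true);
  return 0;
}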

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #7 from Robin Dapp  ---
  vectp.4_188 = x_50(D);
  vect__1.5_189 = MEM <vector(8) int> [(int *)vectp.4_188];
  mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
  mask_patt_156.7_191 = VIEW_CONVERT_EXPR>(mask__2.6_190);
  _1 = *x_50(D);
  _2 = _1 == 1;
  vectp.9_192 = y_51(D);
  vect__3.10_193 = MEM <vector(8) short int> [(short int *)vectp.9_192];
  mask__4.11_194 = { 2, 2, 2, 2, 2, 2, 2, 2 } == vect__3.10_193;
  mask_patt_157.12_195 = mask_patt_156.7_191 & mask__4.11_194;
  vect_patt_158.13_196 = VEC_COND_EXPR <mask_patt_157.12_195, { 1, 1, 1, 1, 1, 1, 1, 1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  vect_patt_159.14_197 = (vector(8) int) vect_patt_158.13_196;


This yields the following assembly:
vsetivli    zero,8,e32,m2,ta,ma
vle32.v     v2,0(a0)
vmv.v.i     v4,1
vle16.v     v1,0(a1)
vmseq.vv    v0,v2,v4
vsetvli     zero,zero,e16,m1,ta,ma
vmseq.vi    v1,v1,2
vsetvli     zero,zero,e32,m2,ta,ma
vmv.v.i     v2,0
vmand.mm    v0,v0,v1
vmerge.vvm  v2,v2,v4,v0
vse32.v     v2,0(a0)

Apart from CSE'ing v4 this looks pretty good to me.  My connection is really
poor at the moment so I cannot quickly compare what aarch64 does for that
example.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #6 from rguenther at suse dot de  ---
On Mon, 16 Oct 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
> 
> --- Comment #5 from Robin Dapp  ---
> Disregarding the reasons for the precision adjustment, for this case here, we
> seem to fail at:
> 
>   /* We do not handle bit-precision changes.  */
>   if ((CONVERT_EXPR_CODE_P (code)
>        || code == VIEW_CONVERT_EXPR)
>       && ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
>            && !type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
>           || (INTEGRAL_TYPE_P (TREE_TYPE (op))
>               && !type_has_mode_precision_p (TREE_TYPE (op))))
>       /* But a conversion that does not change the bit-pattern is ok.  */
>       && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
>            && INTEGRAL_TYPE_P (TREE_TYPE (op))
>            && (TYPE_PRECISION (TREE_TYPE (scalar_dest))
>                > TYPE_PRECISION (TREE_TYPE (op)))
>            && TYPE_UNSIGNED (TREE_TYPE (op))))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "type conversion to/from bit-precision "
>                          "unsupported.\n");
>       return false;
>     }
> 
> for the expression
>  patt_156 = () _2;
> where _2 (op) is of type _Bool (i.e. TYPE_MODE QImode) and patt_156
> (scalar_dest) is signed-boolean:1.  In that case the mode's precision (8) does
> not match the type's precision (1) for both op and scalar_dest.
> 
> The second part of the condition I don't fully get.  When does a conversion
> change the bit pattern?  When the source has higher precision than the dest we
> would need to truncate which we probably don't want.  When the dest has higher
> precision that's considered ok?  What about equality?
> 
> If both op and dest have precision 1 the padding could differ (or rather the 1
> could be at different positions) but do we even support that?  In other words,
> could we relax the condition to TYPE_PRECISION (TREE_TYPE (scalar_dest)) >=
> TYPE_PRECISION (TREE_TYPE (op)) (>= instead of >)?
> 
> FWIW bootstrap and testsuite unchanged with >= instead of > on x86, aarch64
> and power10 but we might not have a proper test for that?

It's about sign- vs. zero-extending into padding.  What kind of code
does the vectorizer emit?

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #5 from Robin Dapp  ---
Disregarding the reasons for the precision adjustment, for this case here, we
seem to fail at:

  /* We do not handle bit-precision changes.  */
  if ((CONVERT_EXPR_CODE_P (code)
       || code == VIEW_CONVERT_EXPR)
      && ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
           && !type_has_mode_precision_p (TREE_TYPE (scalar_dest)))
          || (INTEGRAL_TYPE_P (TREE_TYPE (op))
              && !type_has_mode_precision_p (TREE_TYPE (op))))
      /* But a conversion that does not change the bit-pattern is ok.  */
      && !(INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
           && INTEGRAL_TYPE_P (TREE_TYPE (op))
           && (TYPE_PRECISION (TREE_TYPE (scalar_dest))
               > TYPE_PRECISION (TREE_TYPE (op)))
           && TYPE_UNSIGNED (TREE_TYPE (op))))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "type conversion to/from bit-precision "
                         "unsupported.\n");
      return false;
    }

for the expression
 patt_156 = () _2;
where _2 (op) is of type _Bool (i.e. TYPE_MODE QImode) and patt_156
(scalar_dest) is signed-boolean:1.  In that case the mode's precision (8) does
not match the type's precision (1) for both op and scalar_dest.

The second part of the condition I don't fully get.  When does a conversion
change the bit pattern?  When the source has higher precision than the dest we
would need to truncate which we probably don't want.  When the dest has higher
precision that's considered ok?  What about equality?

If both op and dest have precision 1 the padding could differ (or rather the 1
could be at different positions) but do we even support that?  In other words,
could we relax the condition to TYPE_PRECISION (TREE_TYPE (scalar_dest)) >=
TYPE_PRECISION (TREE_TYPE (op)) (>= instead of >)?

FWIW bootstrap and testsuite unchanged with >= instead of > on x86, aarch64 and
power10 but we might not have a proper test for that?

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #4 from Robin Dapp  ---
Just to mention it here as well: as this seems to be yet another instance where
the adjust_precision thing comes back to bite us, I'm going to go back and check
whether the issue it was introduced for (DCE?) cannot be solved differently.  I'd
rather we did not deviate from other backends in such a central area as mode
precisions.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #3 from Richard Biener  ---
I'm failing to see the issue as with -march=rv64gcv I run into

t.c:4:8: missed:   not vectorized: relevant stmt not supported: *x_50(D) = _6;
t.c:4:8: note:   removing SLP instance operations starting from: *x_50(D) = _6;
t.c:4:8: missed:  not vectorized: bad operation in basic block.

but just guessing, the issue is bool pattern recognition and

t.c:12:1: note:   using normal nonmask vectors for _2 = _1 == 1;
t.c:12:1: note:   using normal nonmask vectors for _4 = _3 == 2;
t.c:12:1: note:   using normal nonmask vectors for _5 = _2 & _4;
...

?  To vectorize you'd want to see

t.c:12:1: note:   using boolean precision 32 for _2 = _1 == 1;
t.c:12:1: note:   using boolean precision 16 for _4 = _3 == 2;
t.c:12:1: note:   using boolean precision 16 for _5 = _2 & _4;
...

and a pattern used for the value use:

t.c:12:1: note:   extra pattern stmt: patt_62 = _5 ? 1 : 0;

You need to see why this doesn't work (it's a very delicate area).
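
For reference, a scalar loop of roughly this shape reproduces the statements
quoted above (a reconstruction from the SSA names in the dumps; the function
name, trip count and exact form are assumptions, not the PR's attached
testcase):

void
foo (int *x, short *y)
{
  for (int i = 0; i < 8; i++)
    /* Per element: compare, AND the two boolean results, store the 0/1
       value as int (cf. _2 = _1 == 1, _4 = _3 == 2, _5 = _2 & _4,
       _6 = (int) _5, *x_50(D) = _6 in the dumps).  */
    x[i] = (x[i] == 1) & (y[i] == 2);
}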

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #2 from JuzheZhong  ---
Note that the reason we adjust the mask mode precision here is the DSE bug
for "small" mask modes:

https://github.com/gcc-mirror/gcc/commit/247cacc9e381d666a492dfa4ed61b7b19e2d008f

This is the commit that shows why we adjust the precision.

[Bug c/111794] RISC-V: Missed SLP optimization due to mask mode precision

2023-10-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794

--- Comment #1 from JuzheZhong  ---
This is a RISC-V target-specific issue.

ARM SVE can vectorize it.