https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754

--- Comment #7 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 })
> 2 and VECTOR_CST_NPATTERNS == 1.  And the selector { 1, 0, 1, 2 } has
> npatterns == 1 and nelts-per-pattern == 3.
> 
>   /* (1) If SEL is a suitable mask as determined by
>      valid_mask_for_fold_vec_perm_cst_p, then:
>      res_npatterns = max of npatterns between ARG0, ARG1, and SEL
>      res_nelts_per_pattern = max of nelts_per_pattern between
>                              ARG0, ARG1 and SEL.
>      (2) If SEL is not a suitable mask, and TYPE is VLS then:
>      res_npatterns = nelts in result vector.
>      res_nelts_per_pattern = 1.
>      This exception is made so that VLS ARG0, ARG1 and SEL work as before. 
> */
>   if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
>     {
>       res_npatterns
>         = std::max (VECTOR_CST_NPATTERNS (arg0),
>                     std::max (VECTOR_CST_NPATTERNS (arg1),
>                               sel.encoding ().npatterns ()));
> 
>       res_nelts_per_pattern
>         = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
>                     std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
>                               sel.encoding ().nelts_per_pattern ()));
> 
>       res_nelts = res_npatterns * res_nelts_per_pattern;
> 
> this seems to be a case that doesn't fit, so the fix needs to be to
> valid_mask_for_fold_vec_perm_cst_p which really looks a bit
> unwieldly.
valid_mask_for_fold_vec_perm_cst_p incorrectly returns true here,
which is being addressed in the PR111648 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631926.html

Even if the vectors had integral element type:
arg0 = arg1 = (v4si){ 9, 0, 0, 0 }  // encoded as {9, 0, ...}
and sel = { 1, 0, 1, 2 }  // encoded as {1, 0, 1, ...}

The pattern in sel {1, 0, 1, ...} would select elements from arg0,
and res would get an incorrect stepped encoding with step = 0 - 9 = -9:
res = { arg0[1], arg0[0], arg0[1], ... }
    = { 0, 9, 0, ... }
So res[3] would be incorrectly computed as 0 + (-9) = -9 instead of
arg0[2] = 0.
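
To make the extrapolation concrete, here is a minimal sketch of how an
element beyond the encoded ones is derived from a stepped encoding. It is
a simplified standalone model, not GCC's code: EncodedVector and
element_at are hypothetical names made up for illustration, and elements
are plain longs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a pattern-encoded integer vector
// constant (illustration only, not GCC's tree representation).
struct EncodedVector {
  std::size_t npatterns;          // number of interleaved patterns
  std::size_t nelts_per_pattern;  // 1 (dup), 2 (a, b, b, ...) or 3 (stepped)
  std::vector<long> encoded;      // npatterns * nelts_per_pattern values
};

// Return element I of V, extrapolating past the encoded elements.
long element_at(const EncodedVector &v, std::size_t i) {
  std::size_t pattern = i % v.npatterns;  // which pattern I belongs to
  std::size_t count = i / v.npatterns;    // position within that pattern

  // Elements that are explicitly encoded are read back directly.
  if (count < v.nelts_per_pattern)
    return v.encoded[count * v.npatterns + pattern];

  // Otherwise start from the last encoded element of the pattern.
  long last = v.encoded[(v.nelts_per_pattern - 1) * v.npatterns + pattern];
  if (v.nelts_per_pattern < 3)
    return last;  // duplicated element, nothing to extrapolate

  // Stepped pattern: the step is the difference of the last two
  // encoded elements, applied once per position past the last one.
  long prev = v.encoded[(v.nelts_per_pattern - 2) * v.npatterns + pattern];
  long step = last - prev;
  return last + static_cast<long>(count - 2) * step;
}
```

In this model, arg0 = {1, 2, {9, 0}} gives element_at (arg0, 2) == 0 and
sel = {1, 3, {1, 0, 1}} gives element_at (sel, 3) == 2, while the wrongly
encoded res = {1, 3, {0, 9, 0}} gives element_at (res, 3) == -9.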

However, for floating element types, even if the encoding is correct,
I assume it will still ICE when trying to derive elements not present in
the encoding, since poly_int_cst can only deal with integral elements?
> 
> An assert that res_nelts is power-of-two would be nice to add.
Sorry, I don't understand. res_nelts for VLA need not be a power of 2,
since res_nelts_per_pattern can be 3. The encoding for res is chosen
to be the max of npatterns and the max of nelts_per_pattern between
arg0, arg1, and sel, and res_nelts is their product.
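
As a rough illustration with the encodings from this PR, the following
standalone sketch mirrors the max computation quoted above. Encoding,
result_encoding, encoded_nelts and is_power_of_two are hypothetical
helpers written for this example, not GCC functions.

```cpp
#include <algorithm>

// Hypothetical stand-in for the (npatterns, nelts_per_pattern) pair of
// a vector constant or selector.
struct Encoding {
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Result encoding: max of each field across ARG0, ARG1 and SEL.
Encoding result_encoding(Encoding arg0, Encoding arg1, Encoding sel) {
  return {
    std::max({arg0.npatterns, arg1.npatterns, sel.npatterns}),
    std::max({arg0.nelts_per_pattern, arg1.nelts_per_pattern,
              sel.nelts_per_pattern})
  };
}

// Number of encoded elements, i.e. res_nelts in the quoted code.
unsigned encoded_nelts(Encoding e) {
  return e.npatterns * e.nelts_per_pattern;
}

bool is_power_of_two(unsigned n) {
  return n != 0 && (n & (n - 1)) == 0;
}
```

With arg0 = arg1 = {1, 2} and sel = {1, 3}, the result is {1, 3}, so
res_nelts is 3, which is not a power of 2.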

Thanks,
Prathamesh
