https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121867

--- Comment #1 from Jeevitha <jeevitha at gcc dot gnu.org> ---
The modulo reduction for shift amounts in AltiVec’s vec_sl is already
implemented in the GIMPLE folding pass for PowerPC within the
rs6000_gimple_fold_builtin function. However, this folding is restricted by a
type check that excludes non-overflow-wrapping types:

if (INTEGRAL_TYPE_P (TREE_TYPE (arg0_type))
    && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0_type)))
  return false;

This check prevents signed types for the first argument (arg0, the vector to be
shifted) from reaching the modulo reduction logic.

For unsigned types, the folding should apply the modulo reduction, as shown
below:

_1 = { 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35 };
_2 = _1 % { 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 };
D.4059 = in << _2;
return D.4059;
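
To illustrate what the folded form computes, here is a minimal scalar sketch
in portable C. It is not GCC code: the names vslb_emulate and NELTS are made up
for this example, and it only mirrors the per-element "amount % 8, then shift"
sequence from the dump above.

```c
#include <stddef.h>

#define NELTS 16  /* hypothetical: 16 byte lanes, as in vector unsigned char */

/* Hypothetical helper: shift each byte left by its amount, reduced
   modulo the element width (8 bits), as the folded statements do.  */
void
vslb_emulate (const unsigned char in[NELTS],
              const unsigned char amt[NELTS],
              unsigned char out[NELTS])
{
  for (size_t i = 0; i < NELTS; i++)
    {
      unsigned int reduced = amt[i] % 8u;           /* _2 = _1 % 8 */
      out[i] = (unsigned char) (in[i] << reduced);  /* in << _2, kept to 8 bits */
    }
}
```

With an amount of 35 in every lane, the reduced amount is 35 % 8 = 3, so an
input byte of 1 becomes 8.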

However, in our case, even though arg0 is a vector unsigned char (which
satisfies TYPE_OVERFLOW_WRAPS), the modulo reduction is not applied. This is
because the input vector 'in' is unexpectedly cast to vector signed char in the
GIMPLE representation, as shown below:

{
  # DEBUG BEGIN STMT;
  return VIEW_CONVERT_EXPR<__vector unsigned char>(
      __builtin_altivec_vslb(
          VIEW_CONVERT_EXPR<__vector signed char>(in),
          {35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35}));
}
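
As a rough analogy for the two VIEW_CONVERT_EXPRs above, the following portable
C sketch reinterprets the same 16 bytes under a different element type without
changing any bits. The helper names and VEC_BYTES are hypothetical and exist
only for this illustration. The point is that the data is untouched while the
type the folder sees for arg0 becomes signed.

```c
#include <string.h>

#define VEC_BYTES 16  /* hypothetical: size of one 128-bit vector */

/* Hypothetical helper: view unsigned bytes as signed bytes.
   Only the type changes; the bit pattern is copied verbatim.  */
void
view_as_signed (const unsigned char in[VEC_BYTES], signed char out[VEC_BYTES])
{
  memcpy (out, in, VEC_BYTES);
}

/* Hypothetical helper: view signed bytes as unsigned bytes again.  */
void
view_as_unsigned (const signed char in[VEC_BYTES], unsigned char out[VEC_BYTES])
{
  memcpy (out, in, VEC_BYTES);
}
```

A round trip through both helpers returns the original bytes, which matches
the dump: the source-level value is unchanged, and only the argument type
handed to __builtin_altivec_vslb differs.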

The root cause lies in how the AltiVec built-ins are defined in
rs6000-builtins.def. The prototype for vslb is defined as:

const vsc __builtin_altivec_vslb (vsc, vuc);
    VSLB vashlv16qi3 {}

Here, vsc (vector signed char) is used for the first argument, while vuc
(vector unsigned char) is used for the second (shift amount). Despite overloads
defined in rs6000-overload.def for unsigned cases:

[VEC_SL, vec_sl, __builtin_vec_sl]
  vsc __builtin_vec_sl (vsc, vuc);
    VSLB  VSLB_VSC
  vuc __builtin_vec_sl (vuc, vuc);
    VSLB  VSLB_VUC

GCC prioritizes the rs6000-builtins.def definition, which casts the input (in)
to vector signed char when processing __builtin_vec_sl. As a result, the
unsigned overload (vuc) is effectively ignored, the type check in
rs6000_gimple_fold_builtin sees a signed element type and returns false, and
the modulo reduction is never applied.
