https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/optabs.cc.jj    2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc       2023-02-16 18:04:54.794871019 +0100
@@ -596,6 +596,16 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (superword_op1)),
+                                     GET_MODE (superword_op1));
+      superword_op1
+       = simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD and
      INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -617,6 +627,16 @@ expand_doubleword_shift_condmove (scalar
        return false;
     }

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (subword_op1)),
+                                     GET_MODE (subword_op1));
+      subword_op1
+       = simplify_expand_binop (op1_mode, and_optab, subword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the subword version directly in OUTOF_TARGET and INTO_TARGET.  */
   if (!expand_subword_shift (op1_mode, binoptab,
                             outof_input, into_input, subword_op1,
The above patch indeed fixes the miscompilation, but unfortunately, with e.g.
__attribute__((noipa)) __int128
foo (__int128 a, unsigned k)
{
  return a << k;
}

__attribute__((noipa)) __int128
bar (__int128 a, unsigned k)
{
  return a >> k;
}
it results in one extra insn in each of the functions.  The superword_op1
case is fine, because aarch64 (among other arches) has a pattern that catches
a shift with a masked count.  In the subword_op1 case that doesn't work,
because expand_subword_shift actually emits 3 shifts instead of just one: one
with (BITS_PER_WORD - 1) - op1 as the shift count and two with op1.  If the
op1 &= (BITS_PER_WORD - 1) masking is done in the caller, it can't easily be
merged with the shifts.
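To make the shift counts concrete, here is a minimal C model of the condmove strategy for a 128-bit left shift (the foo case above). It assumes 64-bit words (BITS_PER_WORD == 64), a count k in [0, 127], and superword_op1 = k - 64 before masking. The type u128_words and the function shl128_condmove are hypothetical illustrations, not GCC code. Both the subword and superword results are computed unconditionally and one is selected on k & 64, so both counts are first masked to [0, 63]. In C a 64-bit shift by 64 or more is undefined, so this model always masks, whereas the patch masks only when shift_mask < BITS_PER_WORD - 1. The subword path shows the three count-dependent shifts; modelling the carry as a shift by 1 followed by a shift by 63 - op1 is an assumption about how expand_subword_shift keeps that count in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 128-bit value split into two 64-bit words
   (BITS_PER_WORD == 64).  Not GCC code.  */
typedef struct { uint64_t lo, hi; } u128_words;

/* Left shift by K in [0, 127] using the condmove strategy: compute the
   subword (K < 64) and superword (K >= 64) results unconditionally, then
   select one of them on bit 6 of K.  */
static u128_words shl128_condmove(u128_words x, unsigned k)
{
    assert(k < 128);

    /* Mask both counts to [0, 63], as the patch does; assumes
       superword_op1 = k - 64 before masking.  */
    unsigned superword_op1 = (k - 64) & 63;
    unsigned subword_op1 = k & 63;

    /* Superword result: the low word moves entirely into the high word.  */
    uint64_t super_hi = x.lo << superword_op1;
    uint64_t super_lo = 0;

    /* Subword result: three shifts depend on the count.  The carries are
       shifted by 1 first so that the second count, 63 - subword_op1,
       stays in range even when the count is 0.  */
    uint64_t carries = (x.lo >> 1) >> (63 - subword_op1);
    uint64_t sub_hi = (x.hi << subword_op1) | carries;
    uint64_t sub_lo = x.lo << subword_op1;

    /* Conditional move on K & 64, i.e. K >= 64.  */
    u128_words r;
    r.hi = (k & 64) ? super_hi : sub_hi;
    r.lo = (k & 64) ? super_lo : sub_lo;
    return r;
}
```

Note that the masked subword_op1 feeds the subtraction 63 - subword_op1 as well as two shifts, which is why a single shift-with-masked-count pattern cannot absorb the AND in the subword path.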
We could also do the masking separately in expand_subword_shift, under some
new bool.  In that case, instead of using
  op1 &= (BITS_PER_WORD - 1);
  shift1 by ((BITS_PER_WORD - 1) - op1); shift2 by op1; shift3 by op1;
we would use
  tmp = ((BITS_PER_WORD - 1) - op1) & (BITS_PER_WORD - 1); shift1 by tmp;
  op1 &= (BITS_PER_WORD - 1); shift2 by op1; shift3 by op1;
but that would be larger code if the target doesn't have shift-with-masking
patterns that trigger on it.  Perhaps have some target hook?  Or try to recog
the combined instruction?
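The two subword sequences described above can be compared with another hypothetical C sketch, under the same assumptions (64-bit words, left shift, count k in [0, 127]). The function names subword_shl_mask_first and subword_shl_mask_each are illustrative only. Since 63 - (k & 63) lies in [0, 63] and is congruent to 63 - k modulo 64, it equals (63 - k) & 63, so both variants compute the same words. The only difference is where the ANDs sit. In the second variant, the reverse count is its own AND rather than 63 minus an already-masked value, which is the idea behind letting a target's masked-shift-count pattern absorb it. On a target without such patterns, it costs one extra AND.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the subword part of a 128-bit left shift with
   64-bit words (BITS_PER_WORD == 64) and a count K in [0, 127].
   Not GCC code; each function writes the low and high result words.  */

/* (a) Mask once, then derive the reverse count from the masked value:
   the result of the AND feeds the subtraction as well as two shifts.  */
static void subword_shl_mask_first(uint64_t lo, uint64_t hi, unsigned k,
                                   uint64_t *out_lo, uint64_t *out_hi)
{
    unsigned op1 = k & 63;                        /* op1 &= 63 */
    uint64_t carries = (lo >> 1) >> (63 - op1);   /* shift1 */
    *out_hi = (hi << op1) | carries;              /* shift2 */
    *out_lo = lo << op1;                          /* shift3 */
}

/* (b) The alternative: mask the reverse count on its own, so every
   shift count is directly the result of an AND with 63.  */
static void subword_shl_mask_each(uint64_t lo, uint64_t hi, unsigned k,
                                  uint64_t *out_lo, uint64_t *out_hi)
{
    unsigned tmp = (63 - k) & 63;                 /* tmp = (63 - op1) & 63 */
    unsigned op1 = k & 63;                        /* op1 &= 63 */
    assert(tmp == 63 - op1);                      /* same count as in (a) */
    uint64_t carries = (lo >> 1) >> tmp;          /* shift1 */
    *out_hi = (hi << op1) | carries;              /* shift2 */
    *out_lo = lo << op1;                          /* shift3 */
}
```

Both functions produce identical words for every k in [0, 127]; whether (b) actually saves an insn depends on the target's shift patterns, which is exactly the open question about a target hook or recog.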
