On 19/11/15 10:57, Segher Boessenkool wrote:
Hi Kyrill,

On Thu, Nov 19, 2015 at 10:26:25AM +0000, Kyrill Tkachov wrote:
In this PR we end up removing a widening multiplication. I suspect it's some path in combine that doesn't handle the case where we return a clobber of 0 properly.
That is troublesome.  Could you look deeper?

Yes.
So the bad case is when we're in subst, returning a CLOBBER of zero, with 'from' being (reg/v:SI 80 [ x ]) and 'to' being (zero_extend:SI (reg:HI 0 x0 [ x ])).
The call to subst comes from try_combine at line 3403:

  if (added_sets_1)
    {
      rtx t = i1pat;
      if (i0_feeds_i1_n)
        t = subst (t, i0dest, i0src_copy ? i0src_copy : i0src, 0, 0, 0);

      XVECEXP (newpat, 0, --total_sets) = t;
    }

It uses t after calling subst on it, without checking that subst didn't return a clobber.
If I change that snippet to check for a CLOBBER:
  if (added_sets_1)
    {
      rtx t = i1pat;
      if (i0_feeds_i1_n)
        t = subst (t, i0dest, i0src_copy ? i0src_copy : i0src, 0, 0, 0);

      if (GET_CODE (t) == CLOBBER)
        {
          undo_all ();
          return 0;
        }
      XVECEXP (newpat, 0, --total_sets) = t;
    }

The testcase gets fixed.
But shouldn't the "XVECEXP (newpat, 0, --total_sets) = t;" line create an unrecognizable rtx that would then get rejected by combine or something?

If we don't check for a clobber there and perform the "XVECEXP = ..." assignment, the resulting newpat looks like:
(parallel [
        (set (reg:CC 66 cc)
            (compare:CC (const_int 0 [0])
                (const_int 0 [0])))
        (nil)
        (clobber:DI (const_int 0 [0]))
    ])

Ah, so the clobber is put in a parallel with another insn and is thus accepted by recog?
So, should we check 't' for clobbers after subst, as above, or should this be fixed in some other place?
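For readers outside combine: a (clobber (const_int 0)) is the marker combine uses to say "no valid pattern could be produced", and the problem above is that such a marker can end up as one arm of a PARALLEL without anyone noticing. The following is a minimal standalone model of that check, not GCC's actual code; the toy rtx struct and the helper names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of RTL, only to illustrate the failure-marker convention.
   In combine, "substitution failed" is encoded as (clobber (const_int 0));
   the types and names here are illustrative, not GCC's.  */
enum rtx_code { SET, CLOBBER, CONST_INT, REG, PARALLEL };

struct rtx
{
  enum rtx_code code;
  long value;            /* used when code == CONST_INT */
  struct rtx *op[2];     /* operands, or parallel elements */
  size_t nops;
};

/* Does this expression encode the failure marker (clobber (const_int 0))?  */
static bool
is_failure_marker (const struct rtx *x)
{
  return x->code == CLOBBER
         && x->nops == 1
         && x->op[0]->code == CONST_INT
         && x->op[0]->value == 0;
}

/* A PARALLEL containing the failure marker is bogus even if its other
   arms look fine, so a careful caller must scan every element rather
   than recognizing the parallel as a whole.  */
static bool
parallel_is_bogus (const struct rtx *par)
{
  for (size_t i = 0; i < par->nops; i++)
    if (is_failure_marker (par->op[i]))
      return true;
  return false;
}
```

This is exactly the shape of the bad newpat shown above: a valid-looking SET sitting next to a (clobber (const_int 0)) inside one PARALLEL.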


This started with my recent combine patch r230326.
Changing the subst hunk from that patch to just return x when handling a no-op substitution fixes the testcase for me and doesn't regress the widening-mult cases that r230326 improved.
In fact, with this patch I'm seeing improved codegen for aarch64 with some widening multiplications combined with constant operands, ending up in bitfield move/insert instructions.
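The no-op-substitution behaviour described here can be sketched in a tiny standalone model (GCC's subst handles vastly more cases; the expression tree and names below are illustrative only). The point is the first early-return: when x is already identical to 'to', the substitution changes nothing, so we hand back x itself instead of flagging a failure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy expression tree, only to illustrate the no-op-substitution point.  */
enum code { REG, ZERO_EXTEND, MULT };

struct expr
{
  enum code code;
  const char *name;     /* register name, used when code == REG */
  struct expr *op[2];
};

static bool
expr_equal (const struct expr *a, const struct expr *b)
{
  if (a == b)
    return true;
  if (a == NULL || b == NULL || a->code != b->code)
    return false;
  if (a->code == REG)
    return strcmp (a->name, b->name) == 0;
  return expr_equal (a->op[0], b->op[0]) && expr_equal (a->op[1], b->op[1]);
}

/* Replace FROM with TO inside X.  Key behaviour: when X is already
   identical to TO, the substitution is a no-op, so return X unchanged
   rather than signalling failure the way the buggy path effectively
   did with a (clobber (const_int 0)).  */
static struct expr *
subst (struct expr *x, struct expr *from, struct expr *to)
{
  if (expr_equal (x, to))
    return x;                        /* no-op substitution */
  if (expr_equal (x, from))
    return to;
  for (int i = 0; i < 2; i++)
    if (x->op[i])
      x->op[i] = subst (x->op[i], from, to);
  return x;
}
```

In the PR's terms: 'to' is a zero_extend of a hard register, and the expression being substituted into is already that same zero_extend, so the substitution should be a quiet no-op.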

Bootstrapped and tested on aarch64, arm, x86_64.

Ok for trunk?
I'll have a look what it does for code quality on other targets.

Thanks. In light of the above, I think this patch happens to avoid the issue highlighted there, but shouldn't we fix that try_combine code separately?

Kyrill


Segher

