On Wed, Nov 17, 2021 at 07:52:38AM -0600, Bill Schmidt wrote:
> >>  - For int_128bit-runnable.c, I chose not to do gimple folding on the 
> >> 128-bit
> >>    comparison operations in the new implementation, because doing so 
> >> results in
> >>    bad code that splits things into two 64-bit values.  That needs separate
> >>    attention; but the point here is, when I did that, I started generating
> >>    more of the vcmpequq, vcmpgtsq, and vcmpgtuq instructions.
> > And you now get worse code (albeit in some cases no longer invalid)?
> 
> No, sorry that this wasn't more clear.  The "old" builtins code performs
> gimple folding on 128-bit compares.  This results in correct but very
> inefficient code.  The "new" builtins code has removed the gimple folding
> for 128-bit compares.  This results in directly generating vcmpequq and
> friends, which is the efficient code we're looking for.  This test case
> then needs modification to show we're doing better.  I'll submit this
> separately.

Hrm.  Folding should always be a good thing to do; and folding should
never split an operation on a 128-bit datum into two operations on
64-bit things.  That kind of optimisation cannot be sanely done on
Gimple level: the abstractions are not close enough to the hardware for
that, and the instruction stream is not close at all to what the
eventual machine insns will be.  We have an RTL pass that does this
("subreg"), it runs almost immediately after expand (and two more
times, even again after the split pass).

So there is a generic bug that you counteract with a target bug :-(

> >> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> >> @@ -14,7 +14,7 @@ get_exponent (double *p)
> >>  {
> >>    double source = *p;
> >>  
> >> -  return scalar_extract_exp (source);     /* { dg-error 
> >> "'__builtin_vec_scalar_extract_exp' is not supported in this compiler 
> >> configuration" } */
> >> +  return scalar_extract_exp (source);     /* { dg-error 
> >> "'__builtin_vsx_scalar_extract_exp' requires the" } */
> >>  }
> > The testcase uses __builtin_vec_scalar_extract_exp, so this is not okay.
> 
> Sorry, this is a case of my bad eyesight not identifying this had changed.
> As with the test case (cmpb-3.c) in the 32-bit patch, this error message
> isn't all that the user sees.  There is also a "note" diagnostic that ties
> the generic overload name to the specific underlying builtin name so that
> confusion is avoided.  I'll just submit these separately with a full
> explanation.

Can't you go just two inches further and report the actual builtin used
by the user (which even is documented!), and not cause any confusion?

> > It is not okay to blindly adjust the testcases to accept what the new
> > code does.  This is a regression.  It is okay to have it regressed for a
> > while.  It is also okay to xfail things, if there is no expectation it
> > can be fixed before the next release (or some other suitably big time
> > frame, this isn't an exact science).
> 
> This isn't really a regression, as I'll describe with each patch.

Looking forward to it :-)

> >> --- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> >> @@ -10,5 +10,5 @@
> >>  int
> >>  test_byte_in_set (unsigned char b, unsigned long long set_members)
> >>  {
> >> -  return __builtin_byte_in_set (b, set_members); /* { dg-warning 
> >> "implicit declaration of function" } */
> >> +  return __builtin_byte_in_set (b, set_members); /* { dg-error 
> >> "'__builtin_scalar_byte_in_set' requires the" } */
> >>  }
> > Huh.  How can the old warning ever have fired?  Was the builtin not
> > declared on 32-bit before?  Ouch.
> 
> I'll remind myself what changed here, but yes, that's what it looks like --
> an inadvertent problem with the old logic for 32-bit.

In general it is better to always have all builtins (and other
interfaces) declared internally, so that you can give much better error
messages (and so that you get errors if there are conflicts, etc.)

There can be exceptions, but this is not a case like that :-)  (So your
change is great :-) )

> >> --- a/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
> >> @@ -10,6 +10,6 @@ main ()
> >>    int mask;
> >>  
> >>    /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
> >> -  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error 
> >> {argument 3 must be in the range \[0, 15\]} } */
> >> +  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error 
> >> {argument 3 must be a 4-bit unsigned literal} } */
> >>    return 0;
> >>  }
> > Hrm, make this say "must be a literal between 0 and 15, inclusive" like
> > the other errors?
> 
> The "n-bit unsigned literal" is the usual case.  I'll provide more explanation
> in the separate patch.

We should use the same formulation always.  I like the more verbose more
exact less confusing and even *correct* formulation :-)

> Again, I'm sorry this was difficult to review.  Originally I thought it would 
> be easiest
> to keep all these together, but that clearly wasn't helpful.  I'll work on 
> breaking
> this up.

In general it is best to keep testcase changes together with the patch
that necessitates them.  You cannot use this now; this is one of the
reasons why it is much better to do the changes step by step (where
every change is immediately engaged!)  This is more work up front, but
it may well be less work in total.  It certainly is less frustrating :-)


Segher

Reply via email to