Hi!

On Mon, Jul 04, 2022 at 02:27:42PM +0800, HAO CHEN GUI wrote:
>   This patch fails TImode for all 128-bit logical operation expanders. So
> TImode splits to two DI registers during expand. Potential optimizations can
> be taken after expand pass. Originally, the TImode logical operations are
> split after reload pass. It's too late. The test case illustrates it.

Out of interest, did you see any performance win?  There is a lot of
opportunity for this to cause performance *loss*, on newer systems :-(

> ChangeLog
> 2022-07-04 Haochen Gui <guih...@linux.ibm.com>

Two spaces both before and after your name, in changelogs.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>  })
>  
>  ;; 128-bit logical operations expanders
> +;; Fail TImode in all 128-bit logical operations expanders and split it into
> +;; two DI registers.
> 
>  (define_expand "and<mode>3"
>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>       (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>                     (match_operand:BOOL_128 2 "vlogical_operand")))]
>    ""
> -  "")
> +{
> +  if (<MODE>mode == TImode)
> +    FAIL;
> +})

It is better to not FAIL it, but simply not have a pattern for the
TImode version at all.

Does nothing depend on the :TI version to exist?

What about the :PTI version?  Getting rid of that as well will allow
some nice optimisations.

Of course we *do* have instructions to do such TImode ops, on newer
CPUs, but in vector registers only.  It isn't obvious what is faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
> +/* { dg-final { scan-assembler-not {\mli\M} } } */
> +/* { dg-final { scan-assembler-not {\mor\M} } } */
> +
> +/* It just needs two std.  */
> +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long 
> lo)
> +{
> +   unsigned __int128 i = hi;
> +   i <<= 64;
> +   i |= lo;
> +   *res = i;
> +}

You can also just count the number of generated insns:
/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 3 } } */
(three, not two, because of the blr insn at the end).

If possible, we should simply not do :TI ops on older systems at all,
and only on the newer systems that have instructions for it (and that
does not fix PR100694 btw, the problems there have to be solved, not
side-stepped :-( )


Segher

Reply via email to