[Bug target/95704] PPC: int128 shifts should be implemented branchless

segher at gcc dot gnu.org Wed, 17 Jun 2020 05:57:17 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95704


Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |segher at gcc dot gnu.org
   Last reconfirmed|                            |2020-06-17

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jens Seifert from comment #0)
> PowerPC processors don't like branches and branch mispredicts lead to large
> overhead.

While that is of course true, the situation isn't worse than on
other CPUs.

The situation here is exactly analogous to 64-bit shifts with -m32.

Fixed distance shifts (and rotates) generate pretty much ideal code
already (sometimes it could save a "mr" insn, by reordering more --
that is because the rl*imi insns use a register as both input and
output).

> shift left/right unsigned __int128 can be implemented in 8 instructions which
> can be processed on 2 pipelines almost in parallel leading to ~5 cycle
> latency on Power 7 and 8.

> shift right algebraic __int128 can be implemented in 10 instructions.
> Overall comparable in latency of the branching code.

This can be done better, using the fact that shifts by 64..127
bits are well defined for the 64-bit Power shift insns (sld and srd
give zero for those shift amounts).
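
A minimal sketch of that idea in C, not necessarily the sequence GCC
would or should emit.  The helpers ppc_sld and ppc_srd are hypothetical
names that only model the insn semantics (shift amount taken modulo
128, amounts 64..127 giving zero); u128 and shl128 are likewise made up
for illustration.  The conditional inside the helpers is part of the
model only, the real insns do this without a branch.

```c
#include <stdint.h>

/* Hypothetical C models of the Power sld/srd insns: the low 7 bits of
   the shift amount are used, and amounts 64..127 produce zero. */
static uint64_t ppc_sld(uint64_t x, unsigned n)
{
    n &= 127;
    return n < 64 ? x << n : 0;
}

static uint64_t ppc_srd(uint64_t x, unsigned n)
{
    n &= 127;
    return n < 64 ? x >> n : 0;
}

/* Illustrative 128-bit value as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* 128-bit left shift for n in 0..127 with no branch on n.
   For each n at most one of the two cross terms is nonzero
   (for n == 64 both equal v.lo, which is harmless under OR). */
static u128 shl128(u128 v, unsigned n)
{
    u128 r;
    r.lo = ppc_sld(v.lo, n);          /* zero once n >= 64 */
    r.hi = ppc_sld(v.hi, n)           /* zero once n >= 64 */
         | ppc_srd(v.lo, 64u - n)     /* bits carried up when n is 1..64 */
         | ppc_sld(v.lo, n - 64u);    /* low half moved up when n >= 64 */
    return r;
}
```

Counting the two amount adjustments, three shifts for the high half,
two ORs and one shift for the low half gives eight operations, which is
in line with the count quoted above.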

> The rldicl 8,5,0,32 insns at the beginning of the routines are also
> unnecessary.

I see no rldicl?

Confirmed.
