Hi!

On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote:
> On 2021/6/3 06:20, Segher Boessenkool wrote:
> > On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
> >> On P8LE, extra rot64+rot64 load or store instructions are generated
> >> in float128 to vector __int128 conversion.
> >>
> >> This patch teaches the swaps pass to also handle such patterns, removing
> >> the extra swap instructions.
> > 
> > Did you check if this is already handled by simplify-rtx if the mode had
> > been TImode (not V1TImode)?  If not, why do you not handle it there?
> 
> I tried to do it in combine or peephole, but the later split2 or split3
> pass still splits it into rotate + rotate again, since we have a split
> after reload.  This pattern is also quite P8LE-specific, so I put it in
> the swaps pass.  simplify-rtx can already simplify
> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
> operations.

What mode are those subregs?  Abbreviated RTL printouts are very lossy.
Assuming those are TImode (please check), then yes, that is what I
asked, thanks.
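To illustrate why the generic folding is valid: a rotate by 64 of a 128-bit scalar just swaps its two doublewords, so applying it twice is the identity.  A minimal sketch, assuming GCC's `unsigned __int128` extension as a stand-in for TImode; the helper names (`rot64`, `make_u128`, `check_rot64_involution`) are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang extension on 64-bit targets; used here as a stand-in for TImode. */
typedef unsigned __int128 u128;

/* Hypothetical helper: rotate left by 64 bits, i.e. swap the two 64-bit
   doublewords.  This is the value-level effect of (rotate:TI x (const_int 64)). */
static u128 rot64(u128 x)
{
    return (x << 64) | (x >> 64);
}

/* Hypothetical helper: build a 128-bit value from its high and low doublewords. */
static u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* rot64 (rot64 (x)) == x for every x, which is why a back-to-back pair of
   these rotates can be folded away to a plain copy. */
static void check_rot64_involution(u128 x)
{
    assert(rot64(rot64(x)) == x);
}
```

For example, `rot64(make_u128(0x0123456789abcdefULL, 0xfedcba9876543210ULL))` yields the value with the two halves exchanged, and a second `rot64` gives back the original.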

> ;; The post-reload split requires that we re-permute the source
> ;; register in case it is still live.
> (define_split
>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
>         (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>    && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>   [(const_int 0)]
> {
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   DONE;
> })

Yes, that needs improvement itself.
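A rough value-level model of what that split emits, assuming that on P8LE the register-to-register permute is a doubleword swap (xxswapd) and that the swapping store (stxvd2x) writes the doublewords to memory in swapped order.  The names `vsx_reg`, `swap_dw`, `store_swapping` and `split_store` are hypothetical and only model the data movement, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register as two doublewords. */
typedef struct { uint64_t dw[2]; } vsx_reg;

/* Doubleword swap, modelling the register-to-register permute (xxswapd). */
static vsx_reg swap_dw(vsx_reg r)
{
    vsx_reg out = { { r.dw[1], r.dw[0] } };
    return out;
}

/* Model of a swapping store on little-endian: doublewords land swapped. */
static void store_swapping(uint64_t mem[2], vsx_reg r)
{
    mem[0] = r.dw[1];
    mem[1] = r.dw[0];
}

/* The three steps of the post-reload split, as modelled here. */
static void split_store(uint64_t mem[2], vsx_reg *src)
{
    *src = swap_dw(*src);       /* 1: permute the source register in place */
    store_swapping(mem, *src);  /* 2: swapping store undoes step 1 in memory */
    *src = swap_dw(*src);       /* 3: re-permute in case the source is still live */
}
```

In this model memory ends up with the original value and the register is restored, but step 3 is pure overhead whenever the source register is dead after the store.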

The thing to realise is that TImode is optimised by generic code just
fine (as all scalar integer modes are), but V1TImode is not.  We have
that mode because we really needed to not put TImode in vector registers
so much on older cpus, but that balance may have changed by now.  Worth
experimenting with: we can now do pretty much all normal operations in
vector registers!


Segher
