On 7/8/21 6:28 PM, Segher Boessenkool wrote: > It needs testing on BE. Will do.
>> +static bool consecutive_mem_locations (rtx, rtx); > > Please don't; just move functions to somewhere earlier than where they > are used. Will do. >> @@ -16841,8 +16843,35 @@ rs6000_split_multireg_move (rtx dst, rtx src) >> for (int i = 0; i < nvecs; i++) >> { >> int index = WORDS_BIG_ENDIAN ? i : nvecs - 1 - i; >> - rtx dst_i = gen_rtx_REG (reg_mode, reg + index); >> - emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i))); >> + int index_next = WORDS_BIG_ENDIAN ? index + 1 : index - 1; > > What does index_next mean? The machine instructions do the same thing > in any endianness. Yeah, I'm bad at coming up with names! :-) So "index" is the index into XVECEXP (src, 0, ...) which is the operand that is to be assigned to regno. "index_next" is the index into XVECEXP (src, 0, ...) which is the operand to be assigned to regno + 1 (ie, the next register of the even/odd register pair). Whether the "next index" is index+1 or index-1 is dependent on LE versus BE. >> + /* If we are loading an even VSX register and our memory location >> + is adjacent to the next register's memory location (if any), >> + then we can load them both with one LXVP instruction. */ >> + if ((regno & 1) == 0 >> + && VSX_REGNO_P (regno) >> + && MEM_P (XVECEXP (src, 0, index)) >> + && MEM_P (XVECEXP (src, 0, index_next))) >> + { >> + rtx base = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index) >> + : XVECEXP (src, 0, index_next); >> + rtx next = WORDS_BIG_ENDIAN ? XVECEXP (src, 0, index_next) >> + : XVECEXP (src, 0, index); > > Please get rid of index_next, if you still have to do different code for > LE here -- it doesn't make the code any clearer (in fact I cannot follow > it at all anymore :-( ) We do need different code for LE versus BE. So you want something like if (WORDS_BIG_ENDIAN) {...} else {...} ...instead? I can try that to see if the code is easier to read. > So this converts pairs of lxv to an lxvp in only a very limited case, > right? Can we instead do it more generically? And what about stxvp? Doing it more generically is my next TODO and that will cover both lxvp and stxvp. My thought was to write a simple pass run at about the same time as our swap optimization pass to look for adjacent lxv's and stxv's and convert them into lxvp and stxvp. However, that won't catch the above case, since the assemble/build pattern is not split until very late, so we still want the above change. Also, given the new pass will be more complicated than the above code, it will be a GCC 12 only change. Let me make the changes you want and I'll repost with what I come up with. Peter