Richard Sandiford wrote:
Tejas Belagod <tbela...@arm.com> writes:
Richard Sandiford wrote:
Tejas Belagod <tbela...@arm.com> writes:
+ /* This is big-endian-safe because the elements are kept in target
+ memory order. So, for eg. PARALLEL element value of 2 is the same in
+ either endian-ness. */
+ if (GET_CODE (src) == VEC_SELECT
+ && REG_P (XEXP (src, 0)) && REG_P (dst)
+ && REGNO (XEXP (src, 0)) == REGNO (dst))
+ {
+ rtx par = XEXP (src, 1);
+ int i;
+
+ for (i = 0; i < XVECLEN (par, 0); i++)
+ {
+ rtx tem = XVECEXP (par, 0, i);
+ if (!CONST_INT_P (tem) || INTVAL (tem) != i)
+ return 0;
+ }
+ return 1;
+ }
+
I think for big endian it needs to be:
INTVAL (tem) != i + base
where base is something like:
int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0);
E.g. a big-endian V4HI looks like:
msb          lsb
0000111122223333
and shortening it to say V2HI only gives the low 32 bits:
msb  lsb
22223333
But, in this case we want
msb  lsb
00001111
It depends on whether the result occupies a full register or not.
I was thinking of the case where it didn't, but I realise now you were
thinking of the case where it did. And yeah, my suggestion doesn't
cope with that...
I was under the impression that the const vector parallel for vec_select
represents the element indexes of the array in memory order.
Therefore, in big-endian,
        msb             lsb
        0000 1111 2222 3333
element a[0] a[1] a[2] a[3]
and in little-endian
        msb             lsb
        3333 2222 1111 0000
element a[3] a[2] a[1] a[0]
Right. But if an N-bit value is stored in a register, it's assumed to
occupy the lsb of the register and the N-1 bits above that. The other
bits in the register are don't-care.
E.g., leaving vectors to one side, if you have:
(set (reg:HI N) (truncate:HI (reg:SI N)))
on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this:
msb  lsb
01234567
    VVVV
xxxx4567
rather than:
msb  lsb
01234567
VVVV
0123xxxx
for both endiannesses. The same principle applies to vectors.
The lsb of the register is always assumed to be significant.
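The truncation case above can be modelled in plain C. This is only a sketch: significant_bits is a hypothetical helper, not anything in GCC, and it assumes a 32-bit register held in a uint32_t. It shows that the meaningful part of an N-bit value is the low N bits of the register, whatever the memory endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: model a 32-bit register that
   holds an N-bit value.  Only the N least significant bits are
   meaningful; the bits above them are don't-care and are masked off
   here so the result can be compared.  */
static uint32_t significant_bits(uint32_t reg, unsigned n)
{
    assert(n >= 1 && n <= 32);
    return n == 32 ? reg : reg & ((UINT32_C(1) << n) - 1);
}
```

With reg = 0x01234567 and n = 16 the helper keeps 0x4567, matching the "xxxx4567" picture rather than "0123xxxx".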
So maybe the original patch was correct for partial-register and
full-register results on little-endian, but only for full-register
results on big-endian.
Ah, ok! I think I get it. By eliminating
(set (reg:DI n) (vec_select:DI (reg:V2DI n) (parallel [(const_int 0)])))
using the check INTVAL (tem) != i, I'm essentially making subsequent operations
use (reg:V2DI n) in DI mode, which is a partial-register result, and this gives
me the wrong set of lanes in big-endian. So, if I want to use (reg n) in
partial-register mode, I have to make sure the correct elements coincide with
the lsb in big-endian...
Thanks for your input, I'll apply the offset correction for big-endian you
suggested. I'll respin the patch.
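A minimal standalone sketch of the check with the big-endian offset, for illustration only. It does not use GCC's RTL API: vec_select_is_noop and its parameters are hypothetical stand-ins, with sel playing the role of the PARALLEL's element values, nsel the role of XVECLEN (par, 0), and nunits the role of GET_MODE_NUNITS of the source mode. It covers only the partial-register case, where the result must sit in the least significant part of the register; the full-register case discussed earlier is not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the no-op test, not GCC code.
   sel[0..nsel-1]: selected lane indices in memory order.
   nunits:         number of lanes in the source vector.
   Returns true if the selected lanes already occupy the least
   significant part of the register, so a partial-register result
   needs no move.  */
static bool vec_select_is_noop(const int *sel, int nsel, int nunits,
                               bool big_endian)
{
    assert(nsel >= 1 && nsel <= nunits);

    /* On big-endian the last lanes are the ones nearest the lsb,
       so the expected indices start at nunits - nsel.  */
    int base = big_endian ? nunits - nsel : 0;

    for (int i = 0; i < nsel; i++)
        if (sel[i] != i + base)
            return false;
    return true;
}
```

For the V2DI example, selecting lane 0 is a no-op on little-endian but not on big-endian, where only lane 1 coincides with the lsb.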
Thanks,
Tejas.