https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104882
Christophe Lyon <clyon at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2022-03-16
--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
My understanding is that MVE's vmovn instructions do not work like Neon's.
If q0 = { 0x33333333, 0x22222222, 0x11111111, 0 } (4x32 bits, highest lane listed first)
and q1 = { 0x77777777, 0x66666666, 0x55555555, 0x44444444 },
With Neon:
vmovn.i32 d4, q0 gives:
d4 = { 0x3333, 0x2222, 0x1111, 0 } (4x16 bits)
vmovn.i32 d5, q1 gives:
d5 = { 0x7777, 0x6666, 0x5555, 0x4444 }
Thus, since d4 and d5 are the low and high halves of q2, q2 = { 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111, 0 }
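To make the lane ordering concrete, here is a scalar C model of the Neon sequence. It is only a sketch, not GCC's or the hardware's implementation: the function names are hypothetical, arrays are indexed lane 0 first (the reverse of the listings above, which put the highest lane first), and vmovn's truncation to the low 16 bits of each element is modelled with a cast.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Neon "vmovn.i32 Dd, Qm": each 32-bit lane i of src is truncated
   to its low 16 bits and written to 16-bit lane i of a D register. */
static void neon_vmovn_i32(uint16_t dst[4], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)src[i];
}

/* q2 aliases d4 (lanes 0-3) and d5 (lanes 4-7). */
void neon_pack(uint16_t q2[8], const uint32_t q0[4], const uint32_t q1[4])
{
    neon_vmovn_i32(&q2[0], q0); /* vmovn.i32 d4, q0 -> low half of q2 */
    neon_vmovn_i32(&q2[4], q1); /* vmovn.i32 d5, q1 -> high half of q2 */
}

/* The example from the comment, with lanes listed lane 0 first. */
void check_neon_example(void)
{
    const uint32_t q0[4] = { 0, 0x11111111, 0x22222222, 0x33333333 };
    const uint32_t q1[4] = { 0x44444444, 0x55555555, 0x66666666, 0x77777777 };
    const uint16_t expected[8] = { 0, 0x1111, 0x2222, 0x3333,
                                   0x4444, 0x5555, 0x6666, 0x7777 };
    uint16_t q2[8];

    neon_pack(q2, q0, q1);
    for (int i = 0; i < 8; i++)
        assert(q2[i] == expected[i]);
}
```

Read highest lane first, `expected` is exactly the q2 listed above: the narrowed elements of q0 end up contiguous in the low half, followed by those of q1.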
But with MVE:
vmovnb.i32 q2, q0 gives:
q2 = { 0x????, 0x3333, 0x????, 0x2222, 0x????, 0x1111, 0x????, 0 } (8x16 bits;
only the bottom 16 bits of each 32-bit element are updated)
vmovnt.i32 q2, q1 then gives:
q2 = { 0x7777, 0x3333, 0x6666, 0x2222, 0x5555, 0x1111, 0x4444, 0 } (only the
top 16 bits of each 32-bit element are updated)
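The same kind of scalar sketch for the MVE pair shows where the interleaving comes from: vmovnb writes only the even-numbered (bottom) 16-bit lanes and vmovnt only the odd-numbered (top) ones, leaving the other lanes untouched (the 0x???? lanes above). The function names are hypothetical, and arrays are again indexed lane 0 first.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MVE "vmovnb.i32 Qd, Qm": the low 16 bits of 32-bit lane i of src
   go to 16-bit lane 2*i of dst; odd lanes of dst keep their old contents. */
void mve_vmovnb_i32(uint16_t dst[8], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[2 * i] = (uint16_t)src[i];
}

/* Model of MVE "vmovnt.i32 Qd, Qm": same, but into odd lanes 2*i+1;
   even lanes of dst keep their old contents. */
void mve_vmovnt_i32(uint16_t dst[8], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[2 * i + 1] = (uint16_t)src[i];
}

/* The example from the comment, with lanes listed lane 0 first. */
void check_mve_example(void)
{
    const uint32_t q0[4] = { 0, 0x11111111, 0x22222222, 0x33333333 };
    const uint32_t q1[4] = { 0x44444444, 0x55555555, 0x66666666, 0x77777777 };
    const uint16_t expected[8] = { 0, 0x4444, 0x1111, 0x5555,
                                   0x2222, 0x6666, 0x3333, 0x7777 };
    uint16_t q2[8] = { 0 }; /* previous contents: the 0x???? lanes */

    mve_vmovnb_i32(q2, q0);
    mve_vmovnt_i32(q2, q1);
    for (int i = 0; i < 8; i++)
        assert(q2[i] == expected[i]);
}
```

Reversed, `expected` is the interleaved q2 shown above, which differs from the contiguous Neon result.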
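This means that the inputs should be shuffled before using MVE's vmovn[bt], so that
q0 = { 0x66666666, 0x44444444, 0x22222222, 0 }
q1 = { 0x77777777, 0x55555555, 0x33333333, 0x11111111 }
(q0 holds the elements destined for the bottom, even-numbered 16-bit lanes and q1
those for the top, odd-numbered lanes), since MVE's vmovn[bt] instructions do not
seem to map naturally to GCC's vec_pack_trunc.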
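A sketch of that shuffle, again as a scalar C model with hypothetical names. It assumes that vec_pack_trunc places the truncated elements of its first operand in the low half of the result and those of its second operand in the high half, i.e. the Neon result shown earlier. Under that assumption, splitting the eight source elements into even- and odd-indexed sets before vmovnb/vmovnt reproduces it, while feeding the original q0/q1 directly does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference semantics of vec_pack_trunc for 32->16-bit lanes:
   truncated lanes of a fill result lanes 0-3, those of b lanes 4-7. */
void pack_trunc_ref(uint16_t out[8], const uint32_t a[4], const uint32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint16_t)a[i];
        out[i + 4] = (uint16_t)b[i];
    }
}

/* Scalar models of vmovnb/vmovnt: write even / odd 16-bit lanes only. */
static void vmovnb_model(uint16_t dst[8], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[2 * i] = (uint16_t)src[i];
}

static void vmovnt_model(uint16_t dst[8], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[2 * i + 1] = (uint16_t)src[i];
}

/* The shuffle: take the 8 source elements (a then b) and send the
   even-indexed ones to vmovnb and the odd-indexed ones to vmovnt. */
void deinterleave(uint32_t even[4], uint32_t odd[4],
                  const uint32_t a[4], const uint32_t b[4])
{
    uint32_t all[8];
    for (int i = 0; i < 4; i++) {
        all[i] = a[i];
        all[i + 4] = b[i];
    }
    for (int i = 0; i < 4; i++) {
        even[i] = all[2 * i];
        odd[i] = all[2 * i + 1];
    }
}

void pack_trunc_mve(uint16_t out[8], const uint32_t a[4], const uint32_t b[4])
{
    uint32_t even[4], odd[4];
    deinterleave(even, odd, a, b);
    vmovnb_model(out, even); /* result lanes 0, 2, 4, 6 */
    vmovnt_model(out, odd);  /* result lanes 1, 3, 5, 7 */
}

/* The example from the comment, with lanes listed lane 0 first. */
void check_pack_example(void)
{
    const uint32_t q0[4] = { 0, 0x11111111, 0x22222222, 0x33333333 };
    const uint32_t q1[4] = { 0x44444444, 0x55555555, 0x66666666, 0x77777777 };
    uint32_t even[4], odd[4];
    uint16_t ref[8], naive[8], fixed[8];

    /* Shuffled inputs match the q0/q1 given above (reversed order here). */
    deinterleave(even, odd, q0, q1);
    assert(even[0] == 0 && even[3] == 0x66666666);
    assert(odd[0] == 0x11111111 && odd[3] == 0x77777777);

    pack_trunc_ref(ref, q0, q1);
    pack_trunc_mve(fixed, q0, q1);

    /* Without the shuffle the lanes come out interleaved. */
    vmovnb_model(naive, q0);
    vmovnt_model(naive, q1);

    for (int i = 0; i < 8; i++)
        assert(fixed[i] == ref[i]);
    assert(naive[1] != ref[1]); /* 0x4444 instead of 0x1111 */
}
```

In this model the deinterleave is free; on real hardware it would be extra work done before the narrowing, which is what makes vmovn[bt] an awkward fit for vec_pack_trunc.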