https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104882

Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2022-03-16

--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
My understanding is that MVE's vmovn instructions do not work like Neon's.

If q0 = { 0x33333333, 0x22222222, 0x11111111, 0 } (4x32 bits)
   q1 = { 0x77777777, 0x66666666, 0x55555555, 0x44444444 }

With Neon:
vmovn.i32 d4, q0 gives:
d4 = { 0x3333, 0x2222, 0x1111, 0 } (4x16 bits)
vmovn.i32 d5, q1 gives:
d5 = { 0x7777, 0x6666, 0x5555, 0x4444 }
thus q2 = { 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111, 0 }

But with MVE:
vmovnb.i32 q2, q0 gives:
q2 = { 0x????, 0x3333, 0x????, 0x2222, 0x????, 0x1111, 0x????, 0 } (8x16 bits,
only the bottom 16 bits of each 32-bit element are updated)
vmovnt.i32 q2, q1 then gives:
q2 = { 0x7777, 0x3333, 0x6666, 0x2222, 0x5555, 0x1111, 0x4444, 0 } (only the
top 16 bits of each 32-bit element are updated)
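
To make the difference concrete, here is a minimal C sketch that models the lane behaviour described above on plain arrays. It is an illustration only, not GCC code or real intrinsics: the function names are hypothetical, and the arrays are indexed with lane 0 as the least significant lane, so they appear reversed compared with the vectors written above (which list the highest lane first).

```c
#include <stdint.h>

/* Hypothetical model, lane 0 = least significant lane. */

/* Neon-style vmovn.i32: four 32-bit lanes are truncated into four
   contiguous 16-bit lanes of a D register. */
static void neon_vmovn_i32(uint16_t d[4], const uint32_t q[4])
{
    for (int i = 0; i < 4; i++)
        d[i] = (uint16_t)q[i];
}

/* MVE-style vmovnb.i32: truncated lane i is written to the bottom half
   of 32-bit element i (16-bit lane 2*i); the top halves are untouched. */
static void mve_vmovnb_i32(uint16_t qd[8], const uint32_t qm[4])
{
    for (int i = 0; i < 4; i++)
        qd[2 * i] = (uint16_t)qm[i];
}

/* MVE-style vmovnt.i32: truncated lane i is written to the top half
   of 32-bit element i (16-bit lane 2*i + 1); the bottom halves are untouched. */
static void mve_vmovnt_i32(uint16_t qd[8], const uint32_t qm[4])
{
    for (int i = 0; i < 4; i++)
        qd[2 * i + 1] = (uint16_t)qm[i];
}
```

With q0 and q1 as above, calling mve_vmovnb_i32 on q0 and then mve_vmovnt_i32 on q1 into the same destination interleaves the two inputs, whereas two neon_vmovn_i32 calls into the low and high halves of the destination concatenate them.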

This means that the inputs should be shuffled before using MVE's vmovn[bt],
so that we have:
q0 = { 0x66666666, 0x44444444, 0x22222222, 0 }
q1 = { 0x77777777, 0x55555555, 0x33333333, 0x11111111 }

since MVE's vmovn[bt] does not seem to map naturally onto GCC's
vec_pack_trunc.