https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104882
Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2022-03-16

--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
My understanding is that MVE's vmovn instructions do not work like Neon's.

Vectors below are written with the highest-numbered lane first, so the last value is lane 0. If

q0 = { 0x33333333, 0x22222222, 0x11111111, 0 } (4x32 bits)
q1 = { 0x77777777, 0x66666666, 0x55555555, 0x44444444 }

With Neon:

vmovn.i32 d4, q0
gives: d4 = { 0x3333, 0x2222, 0x1111, 0 } (4x16 bits)

vmovn.i32 d5, q1
gives: d5 = { 0x7777, 0x6666, 0x5555, 0x4444 }

thus, since q2 is d5:d4:
q2 = { 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111, 0 }

But with MVE:

vmovnb.i32 q2, q0
gives: q2 = { 0x????, 0x3333, 0x????, 0x2222, 0x????, 0x1111, 0x????, 0 }
(8x16 bits; only the bottom half of each 32-bit element is updated, and 0x???? marks lanes left unchanged)

vmovnt.i32 q2, q1
then gives: q2 = { 0x7777, 0x3333, 0x6666, 0x2222, 0x5555, 0x1111, 0x4444, 0 }
(only the top halves are updated)

This means that the inputs would have to be shuffled before using MVE's vmovn[bt], so that:

q0 = { 0x66666666, 0x44444444, 0x22222222, 0 }
q1 = { 0x77777777, 0x55555555, 0x33333333, 0x11111111 }

since MVE's vmovn[bt] does not seem to map naturally to GCC's vec_pack_trunc.
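The lane placement described above can be checked with a small Python model. This is only a sketch, not GCC or ACLE code: the helpers vmovn_i32, vmovnb_i32, vmovnt_i32, pack_trunc_neon, deinterleave and pack_trunc_mve are hypothetical names that model nothing more than which 16-bit lane each narrowed value lands in. Lists are indexed from lane 0 upward, and None stands for a lane whose previous contents are kept (the 0x???? lanes). The even/odd deinterleave is just one way to produce the shuffled q0/q1 from the comment; it is not necessarily how GCC would implement it.

```python
MASK16 = 0xFFFF  # narrowing keeps the low 16 bits of each 32-bit lane


def vmovn_i32(qm):
    """Neon-style vmovn.i32 model: four 32-bit lanes -> four 16-bit lanes, same order."""
    return [x & MASK16 for x in qm]


def vmovnb_i32(qd, qm):
    """MVE-style vmovnb.i32 model: lane i of qm goes to 16-bit lane 2*i of qd
    (bottom half of each 32-bit container); odd lanes of qd are left as they were."""
    out = list(qd)
    for i, x in enumerate(qm):
        out[2 * i] = x & MASK16
    return out


def vmovnt_i32(qd, qm):
    """MVE-style vmovnt.i32 model: lane i of qm goes to 16-bit lane 2*i+1 of qd
    (top half of each 32-bit container); even lanes of qd are left as they were."""
    out = list(qd)
    for i, x in enumerate(qm):
        out[2 * i + 1] = x & MASK16
    return out


def pack_trunc_neon(lo, hi):
    """Concatenating narrow, as with two Neon vmovn: lo -> lanes 0-3, hi -> lanes 4-7."""
    return vmovn_i32(lo) + vmovn_i32(hi)


def deinterleave(lo, hi):
    """Hypothetical pre-shuffle: even-numbered source lanes in one vector, odd in the other."""
    src = lo + hi
    return src[0::2], src[1::2]


def pack_trunc_mve(lo, hi, qd):
    """Same result as pack_trunc_neon, built from vmovnb/vmovnt after deinterleaving."""
    even, odd = deinterleave(lo, hi)
    return vmovnt_i32(vmovnb_i32(qd, even), odd)


def show(v):
    """Format lanes highest-first, as in the comment; None is printed as 0x????."""
    return "{ " + ", ".join("0x????" if x is None else f"{x:#06x}" for x in reversed(v)) + " }"


# Inputs from the comment, stored lane 0 first.
q0 = [0, 0x11111111, 0x22222222, 0x33333333]
q1 = [0x44444444, 0x55555555, 0x66666666, 0x77777777]
unknown = [None] * 8  # previous contents of q2

print(show(pack_trunc_neon(q0, q1)))
# { 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111, 0x0000 }
print(show(vmovnb_i32(unknown, q0)))
# { 0x????, 0x3333, 0x????, 0x2222, 0x????, 0x1111, 0x????, 0x0000 }
print(show(vmovnt_i32(vmovnb_i32(unknown, q0), q1)))
# { 0x7777, 0x3333, 0x6666, 0x2222, 0x5555, 0x1111, 0x4444, 0x0000 }
even, odd = deinterleave(q0, q1)
print(show(even))  # { 0x66666666, 0x44444444, 0x22222222, 0x0000 }
print(show(odd))   # { 0x77777777, 0x55555555, 0x33333333, 0x11111111 }
print(show(pack_trunc_mve(q0, q1, unknown)))
# { 0x7777, 0x6666, 0x5555, 0x4444, 0x3333, 0x2222, 0x1111, 0x0000 }
```

Because vmovnb and vmovnt together write all eight 16-bit lanes, the initial contents of q2 do not affect the final result; only the order of the 32-bit inputs does. Modelling lanes as plain lists keeps the check independent of any toolchain, but it says nothing about what the extra shuffle would cost on real hardware.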