http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #8 from ktkachov at gcc dot gnu.org ---
> arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
> sqrlen4D_16u8:
>     vmov        d18, r0, r1  @ v16qi
>     vmov        d19, r2, r3
>     vld1.64     {d16-d17}, [sp:64]
>     vabd.u8     q8, q9, q8
>     vmull.u8    q9, d16, d16
>     vmull.u8    q8, d17, d17
>     vuzp.32     q9, q8
>     vpaddl.u16  q9, q9
>     vmov        q10, q9  @ v4si
>     vpadal.u16  q10, q8
>     vmov        r0, r1, d20  @ v4si
>     vmov        r2, r3, d21
>     bx          lr

With current trunk I'm getting for the softfp case:

    push        {lr}              @ 40  *push_multi          [length = 2]
    vmov        d16, r0, r1  @ v16qi  @ 37  *neon_movv16qi/6  [length = 8]
    vmov        d17, r2, r3
    add         lr, sp, #4        @ 36  *arm_addsi3/5        [length = 4]
    vldr        d18, [sp, #4]     @ 3   *neon_movv16qi/4     [length = 8]
    vldr        d19, [sp, #12]
    vabd.u8     q9, q8, q9        @ 7   neon_vabdv16qi       [length = 4]
    vmull.u8    q8, d18, d18      @ 14  neon_vmullv8qi       [length = 4]
    vmull.u8    q9, d19, d19      @ 16  neon_vmullv8qi       [length = 4]
    vuzp.32     q8, q9            @ 18  *neon_vuzpv4si_insn  [length = 4]
    vpaddl.u16  q8, q8            @ 22  neon_vpaddlv8hi      [length = 4]
    vpadal.u16  q8, q9            @ 28  neon_vpadalv8hi      [length = 4]
    vmov        r0, r1, d16  @ v4si  @ 39  *neon_movv4si/5   [length = 8]
    vmov        r2, r3, d17
    ldr         pc, [sp], #4      @ 45  *ldr_with_return     [length = 4]

The move between the vpad*s is gone, but there's a couple of redundant loads
and some register spillage.
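For readers following the listings, here is a scalar C sketch of what the instruction sequence appears to compute, read off the assembly alone: a per-byte absolute difference (vabd.u8), a widening square (vmull.u8), and a pairwise reduction (vuzp.32, vpaddl.u16, vpadal.u16) that leaves one 32-bit sum per group of four adjacent bytes. The function name sqrlen4d_16u8_ref and its signature are hypothetical; the original C test case is not shown in this comment, so treat this as an assumption about its intent rather than the actual source.

```c
#include <stdint.h>

/* Hypothetical scalar model of the NEON sequence quoted above.
 * a and b stand for the two v16qi arguments; out stands for the v4si result.
 * Each output lane is the squared length of the difference between one
 * 4-component u8 vector in a and the matching one in b. */
void sqrlen4d_16u8_ref(const uint8_t a[16], const uint8_t b[16],
                       uint32_t out[4])
{
    for (int v = 0; v < 4; v++) {
        uint32_t sum = 0;
        for (int c = 0; c < 4; c++) {
            uint8_t x = a[4 * v + c];
            uint8_t y = b[4 * v + c];
            /* vabd.u8: unsigned absolute difference of each byte lane */
            uint32_t d = (x > y) ? (uint32_t)(x - y) : (uint32_t)(y - x);
            /* vmull.u8 squares the difference into 16 bits; the
             * vuzp.32 / vpaddl.u16 / vpadal.u16 steps then add the four
             * squares belonging to the same 32-bit output lane */
            sum += d * d;
        }
        out[v] = sum;
    }
}
```

The largest possible lane value is 4 * 255 * 255 = 260100, so the 32-bit accumulation cannot overflow, and each individual square (at most 65025) fits the 16-bit lanes that vmull.u8 produces.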