Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> James Greenhalgh wrote:
>
>> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>>
>> > I'd prefer more detail than this for a workaround; which test, why did it
>> > start to fail, why is this the right solution, etc.
>
> It was gcc.target/aarch64/vect_copy_lane_1.c generating:
>
> test_copy_laneq_f64:
>         umov    x0, v1.d[1]
>         fmov    d0, x0
>         ret
>
> For some reason returning a double uses DImode temporaries, so it's essential
> to prefer FP_REGS here and mark the lane copy correctly.

The "?" change seems to make intrinsic sense given the extra cost of the
GPR alternative.  But I think the real reason for this failure is that
we define no V1DF patterns, and target-independent code falls back to
using moves in the corresponding *integer* mode.  So for that function
we generate the rather ugly code:

(note 6 1 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 3 6 2 2 (clobber (reg/v:V1DF 92 [ aD.21157 ])) "vect_copy_lane_1.c":45 -1
     (nil))
(insn 2 3 4 2 (set (subreg:DI (reg/v:V1DF 92 [ aD.21157 ]) 0)
        (reg:DI 32 v0 [ aD.21157 ])) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 4 2 5 2 (set (reg/v:V2DF 93 [ bD.21158 ])
        (reg:V2DF 33 v1 [ bD.21158 ])) "vect_copy_lane_1.c":45 1063 
{*aarch64_simd_movv2df}
     (nil))
(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 (set (reg:DF 95)
        (vec_select:DF (reg/v:V2DF 93 [ bD.21158 ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) "./include/arm_neon.h":14441 1993 {aarch64_get_lanev2df}
     (nil))
(insn 9 8 11 2 (set (reg:DI 96)
        (subreg:DI (reg:DF 95) 0)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 11 9 10 2 (clobber (reg:V1DF 91 [ <retval> ])) "vect_copy_lane_1.c":45 -1
     (nil))
(insn 10 11 15 2 (set (subreg:DI (reg:V1DF 91 [ <retval> ]) 0)
        (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 15 10 16 2 (set (reg:DI 32 v0)
        (subreg:DI (reg:V1DF 91 [ <retval> ]) 0)) "vect_copy_lane_1.c":45 47 
{*movdi_aarch64}
     (nil))
(insn 16 15 0 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1
     (nil))

which by IRA gets optimised to:

(insn 9 8 15 2 (set (subreg:DF (reg:DI 96) 0)
        (vec_select:DF (reg:V2DF 33 v1 [ bD.21158 ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) "vect_copy_lane_1.c":45 1993 {aarch64_get_lanev2df}
     (expr_list:REG_DEAD (reg:V2DF 33 v1 [ bD.21158 ])
        (nil)))
(insn 15 9 16 2 (set (reg:DI 32 v0)
        (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 96)
        (nil)))
(insn 16 15 18 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1
     (nil))

with the move now being done purely in DImode.  This defeats the
heuristic in aarch64_ira_change_pseudo_allocno_class because the
pseudo appears to be a normal integer rather than a (float) vector.

Although the "?" fixes this particular instance, I think more
complicated V1DF code would still regress by being forced to
use GENERAL_REGS.  Of course, the fix is to add the move pattern
rather than disable the heuristic...

Thanks,
Richard

Reply via email to