https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560

--- Comment #5 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
We have too many tests checking the expected generated code.  We should
focus more on the overall effect of a change.  SPEC would be a good
criterion, although it is hard to run SPEC for each patch.

I've checked the generated code of arm8_2-fp16-move-1.c and found that in most
cases GCC generates better code with the patch:

@@ -80,7 +80,6 @@ test_load_store_1:
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        lsl     r1, r1, #1
-       vmov.f16        s0, r3  @ __fp16
        ldrh    r3, [r2, r1]    @ __fp16
        strh    r3, [r0, r1]    @ __fp16
        bx      lr
@@ -97,10 +96,11 @@ test_load_store_2:
        @ link register save eliminated.
        add     r1, r1, #2
        lsl     r1, r1, #1
+       add     r3, r2, r1
        add     r0, r0, r1
-       ldrh    r3, [r2, r1]    @ __fp16
+       vld1.16 {d0[0]}, [r3]
+       vmov.f16        r3, s0  @ __fp16
        strh    r3, [r0, #-4]   @ __fp16
-       vmov.f16        s0, r3  @ __fp16
        bx      lr
        .size   test_load_store_2, .-test_load_store_2
        .align  2
@@ -176,14 +176,13 @@ test_select_5:
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vcvtb.f32.f16   s0, s0
-       vcvtb.f32.f16   s14, s1
-       vmov    s15, s1 @ __fp16
-       vcmpe.f32       s0, s14
+       vcvtb.f32.f16   s15, s1
+       vcmpe.f32       s0, s15
        vmrs    APSR_nzcv, FPSCR
        bmi     .L17
-       vmov    s15, s2 @ __fp16
+       vmov    s1, s2  @ __fp16
 .L17:
-       vmov    s0, s15 @ __fp16
+       vmov    s0, s1  @ __fp16
        bx      lr
        .size   test_select_5, .-test_select_5
        .align  2
@@ -197,14 +196,13 @@ test_select_6:
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vcvtb.f32.f16   s0, s0
-       vcvtb.f32.f16   s14, s1
-       vmov    s15, s1 @ __fp16
-       vcmpe.f32       s0, s14
+       vcvtb.f32.f16   s15, s1
+       vcmpe.f32       s0, s15
        vmrs    APSR_nzcv, FPSCR
        bls     .L19
-       vmov    s15, s2 @ __fp16
+       vmov    s1, s2  @ __fp16
 .L19:
-       vmov    s0, s15 @ __fp16
+       vmov    s0, s1  @ __fp16
        bx      lr
        .size   test_select_6, .-test_select_6

The only regression is in test_load_store_2.  The culprit insn is

(insn 12 10 16 2 (set (mem:HF (plus:SI (reg/f:SI 121)
                (const_int -4 [0xfffffffffffffffc])) [1 *_6+0 S2 A16])
        (reg:HF 116 [ <retval> ])) "b.i":4:8 653 {*movhf_vfp_fp16}
     (expr_list:REG_DEAD (reg/f:SI 121)
        (nil)))

Before the patch p116 had class GENERAL_REGS; after the patch it has
VFP_LO_REGS.  The memory move cost for VFP_LO_REGS is small, but the insn
alternative for VFP_LO_REGS does not accept memory with this kind of
address (vld1/vst1 only take a plain register address).
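
For concreteness, here is roughly the source that produces this pattern.
It is reconstructed from the assembly above rather than copied from the
test file, so treat it as an approximation:

/* Hypothetical reconstruction of test_load_store_2 from the assembly
   above; not copied verbatim from arm8_2-fp16-move-1.c.  */
__fp16
test_load_store_2 (__fp16 *a, int i, __fp16 *b)
{
  a[i] = b[i + 2];
  return a[i];
}

Because the function also returns the stored value, p116 is the <retval>
pseudo, which is presumably why the patch makes a VFP register attractive
for it.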

There are a few ways to fix this:
  1. give the RA the correct cost for storing VFP_LO_REGS into the given
memory, which means adding a new hook that takes the specific memory rtx
(see the sketch after this list)
  2. check the insn constraints and modify the cost in some way
  3. try to reload the address instead of the memory itself, i.e. put the
invalid address into a fresh register so that the insn constraint is
satisfied, instead of spilling the stored value
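
To make the first option concrete, here is a rough sketch of what such a
hook could look like in the arm backend.  None of this exists in GCC
today: the hook name, the extra mem parameter, and the cost adjustment
are invented for illustration (the current TARGET_MEMORY_MOVE_COST hook
receives only the mode, the class, and the direction):

/* Hypothetical variant of the memory move cost hook that also gets
   the memory rtx, so the backend can inspect the address.  */
static int
arm_memory_move_cost_for_mem (machine_mode mode, reg_class_t rclass,
                              bool in, rtx mem)
{
  int cost = arm_memory_move_cost (mode, rclass, in);

  /* vld1/vst1 accept only a plain register address, so an HFmode VFP
     load/store from a reg+offset address needs an extra instruction
     to set up the address; charge for it.  */
  if (mode == HFmode
      && reg_classes_intersect_p (rclass, VFP_LO_REGS)
      && !REG_P (XEXP (mem, 0)))
    cost += COSTS_N_INSNS (1);

  return cost;
}

With something like this the RA would see that putting p116 in VFP_LO_REGS
makes the store in insn 12 more expensive, and could keep GENERAL_REGS for
it instead.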

I prefer the last solution, although it requires changes in a sensitive
part of LRA and will need a lot of testing on several targets.  So I am
working on the patch.

In any case, I don't think the PR should have priority P1.  P3 or even P4
would be a more accurate estimate (please see my earlier comments).
