On 6/1/23 20:38, Vineet Gupta wrote:
Hi Jeff,

I finally got around to collecting various observations on PR/109279, and more importantly on the state of large-constant handling in the RISC-V backend. Apologies in advance for the long email.

It seems the various commits in this area have improved the original test case of 0x1010101_01010101:

  Before 2e886eef7f2b      |   With 2e886eef7f2b   | With 0530254413f8     | With c104ef4b5eb1
Right. The handling of that constant shows a nice progression. On our architecture the latter two versions are probably equivalent from a latency standpoint, but the last is obviously best as it's smaller and probably better on in-order architectures as well.



But the same commits seem to have regressed Andrew's test from the same PR (which is the theme of this email).
The seemingly contrived test turned out to reveal much more than I'd expected.

    long long f(void)
    {
      unsigned t = 0x1010101;
      long long t1 = t;
      long long t2 = ((unsigned long long)t) << 32;
      asm("":"+r"(t1));
      return t1 | t2;
    }
[ ... ]
It may be more instructions, but I suspect they end up having the same performance for us across all three variants. Fusion and out-of-order execution save the day. But I realize there may be targets where the first is going to be preferred.




   Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
     (ideal code)       | define_insn_and_split  | "splitter relaxed new
                        |                        |  pseudos"
    li   a0,0x1010000   |    li   a5,0x1010000   |    li   a0,0x1010000
    addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
    slli a5,a0,32       |    mv   a0,a5          |    li   a5,0x1010000
    or   a0,a0,a5       |    slli a5,a5,32       |    slli a0,a0,32
    ret                 |    or   a0,a0,a5       |    addi a5,a5,0x101
                        |    ret                 |    or   a0,a5,a0
                                                 |    ret

As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:
[ ... ]
Right. Standard looking synthesis.




Prior to 2e886eef7f2b, cse1 could do its job: finding the oldest equivalent register for each fragment of the constant and reusing that register.
Right.  That's what I would expect.

[ ... ]



With 2e886eef7f2b, the define_insn_and_split "*mvconst_internal" gets recognized during cse1, collapsing the synthesis insns into a single set of a const_int.

    (insn 7 6 8 2 (set (reg:DI 137)
         (const_int [0x1010101])) {*mvconst_internal}
         (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
         (const_int [0x1010101_00000000])) {*mvconst_internal}
         (expr_list:REG_EQUAL (const_int  [0x1010101_00000000]) ))
Understood. Not ideal, but we generally don't have good ways to limit patterns to being available at different times during the optimization phase.

One thing you might want to try (which I thought we used at one point) would be to make the pattern conditional on cse_not_expected. The goal would be to avoid exposing the pattern until a later point in the optimizer pipeline. It may be that we dropped that over time during development; it's all getting fuzzy at this point.
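For concreteness, the suggestion amounts to something like the following sketch of the pattern in riscv.md; the only change is the insn condition. The operand predicates and the split body are illustrative here, not an exact copy of the upstream pattern:

```
(define_insn_and_split "*mvconst_internal"
  [(set (match_operand:GPR 0 "register_operand" "=r")
        (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
  "cse_not_expected"   ;; was "": keep the pattern out of recog until after the CSE passes
  "#"
  "&& 1"
  [(const_int 0)]
  ;; split body unchanged: synthesize the constant as before
)
```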


Eventually split1 breaks it up using the same mvconst_internal splitter, but the CSE opportunity has been lost.
Right. I'd have to look at the pass definitions, but I suspect the splitting pass where this happens is after the last standard CSE pass. So we don't get a chance to CSE the constant synthesis.


*This is now the baseline for large-constant handling in the RISC-V backend, which we all need to be aware of.*
Understood. Though it's not as bad as you might think :-) You can spend an inordinate amount of time improving constant synthesis and generate code that looks really good, but in the end it may not make a bit of difference in real performance. Been there, done that. I'm not saying we should give up, but we need to keep in mind that we're often better off trading a bit on the constant synthesis if doing so helps the code where those constants get used.




(2) Now on to the nuances as to why things get progressively worse after commit 0530254413f8.

It all seems to get down to register allocation passes:

sched1 before 0530254413f8

    ;;     0--> b  0: i  22 r140=0x1010000    :alu
    ;;     1--> b  0: i  20 r137=0x1010000    :alu
    ;;     2--> b  0: i  23 r140=r140+0x101   :alu
    ;;     3--> b  0: i  21 r137=r137+0x101   :alu
    ;;     4--> b  0: i  24 r140=r140<<0x20   :alu
    ;;     5--> b  0: i  25 r136=r137         :alu
    ;;     6--> b  0: i   8 r136=asm_operands :nothing
    ;;     7--> b  0: i  17 a0=r136|r140      :alu
    ;;     8--> b  0: i  18 use a0            :nothing

sched1 with 0530254413f8

    ;;     0--> b  0: i  22 r144=0x1010000    :alu
    ;;     1--> b  0: i  20 r143=0x1010000    :alu
    ;;     2--> b  0: i  23 r145=r144+0x101   :alu
    ;;     3--> b  0: i  21 r137=r143+0x101   :alu
    ;;     4--> b  0: i  24 r140=r145<<0x20   :alu
    ;;     5--> b  0: i  25 r136=r137         :alu
    ;;     6--> b  0: i   8 r136=asm_operands :nothing
    ;;     7--> b  0: i  17 a0=r136|r140      :alu
    ;;     8--> b  0: i  18 use a0            :nothing

The insn stream is the same; the only difference is register reuse (due to the splitter restriction) vs. fresh pseudos.

Next, IRA, for reasons I don't understand (and am not yet brave enough to dive into), decides to regenerate the const_int.
Sure. It's pretty standard practice. When IRA finds a register that has a known re-synthesizable value, it will often replace the register with the value. It can help in cases where register pressure is excessively high, by reducing the live range of the register holding the value.


And my guess is that, this being so late in the game, it gets rematerialized as-is in the end, causing the duplication.
Yup. Though there is a post-reload CSE pass. It's pretty limited in what it can do, and register assignments often make it impossible to do anything, but it's worth looking at to see why it's not helping here.

FWIW, for the latter case only, IRA emits additional REG_EQUIV notes, which could also be playing a role.
REG_EQUAL notes get promoted to REG_EQUIV notes in some cases. And when other equivalences are discovered it may create a REG_EQUIV note out of thin air.

The REG_EQUIV note essentially means that everywhere the register occurs you can validly (from a program semantics standpoint) replace the register with the value. It might require reloading, but it's a valid semantic transformation which may reduce register pressure -- especially for constants that were subject to LICM.

Contrast to REG_EQUAL which creates an equivalence at a particular point in the IL, but the equivalence may not hold elsewhere in the IL.




I naively tried gating mvconst_internal on !reload_completed, but that triggered an ICE.
I wouldn't expect that to help here.

I would first see whether using cse_not_expected in the splitter pattern helps. I would also look at reload_cse_regs, which should give us some chance at seeing the value reuse if/when IRA/LRA muck things up.

jeff
