Hi Jeff,
On 10/14/22 09:54, Jeff Law via Gcc wrote:
...
.L2:
xor a4,a4,a5
andi a4,a4,1
srli a3,a0,2
srli a5,a5,1
beq a4,zero,.L3
li a4,-24576 # 0xFFFF_A000
addi a4,a4,1 # 0xFFFF_A001
xor a5,a5,a4
zext.h a5,a5
.L3:
xor a3,a3,a5
andi a3,a3,1
srli a4,a0,3
srli a5,a5,1
beq a3,zero,.L4
li a3,-24576 # 0xFFFF_A000
addi a3,a3,1 # 0xFFFF_A001
...
...
I see that with small tests, cse1 is able to substitute the redundant
constant reg with the equivalent old reg.
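For instance, with a tiny straight-line variant along these lines (a
hand-written sketch, not the actual reduced test), there is no control flow
between the two uses and cse1 can substitute the register materialized for
the first use into the second:

/* Hypothetical small test: both uses of 0xA001 are in straight-line code,
   so cse1 can reuse the register loaded for the first use.  */
unsigned short two_uses(unsigned short a, unsigned short b)
{
    a ^= 0xA001;
    b ^= 0xA001;
    return (unsigned short)(a ^ b);
}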
I find it easier to reason about this stuff with a graphical CFG, so a
bit of ascii art...
2
/ \
3 ---> 4
/ \
5 ---> 6
Yeah, a picture is worth a thousand words :-)
Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation of
the constants occurs in BB3 and BB5.
And evaluation here means use of the constant (vs. its definition)?
CSE isn't going to catch this. The best way to think about CSE's
capabilities is that it can work on extended basic blocks. An
extended basic block can have jumps out, but not jumps in. There are 3
EBBs in this code: (2,3), (4,5) and (6). So BB4 is in a different EBB
than BB3, and the evaluation in BB3 can't be used by CSE in the EBB
containing BB4 and BB5.
Thanks for the detailed explanation.
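If I've followed the EBB argument, one unrolled step maps onto the picture
roughly like this (my own approximation of the code involved, not the exact
source):

/* Approximate shape of one unrolled step (an assumption on my part):
     BB2: compute the test bit and the shift, then branch
     BB3: materialize 0xA001 (li/addi) and xor, reached only on one edge
     BB4: the join point with two incoming edges, where the next step's
          test begins; a new EBB starts here, so CSE working on the EBB
          (BB4, BB5) never sees the 0xA001 already loaded in BB3.  */
unsigned short step(unsigned short crc, unsigned short bit)
{
    unsigned int lsb = (crc ^ bit) & 1;   /* BB2 */
    crc = (unsigned short)(crc >> 1);     /* BB2: shift happens on both paths */
    if (lsb)
        crc ^= 0xA001;                    /* BB3: constant evaluation */
    return crc;                           /* BB4: join */
}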
PRE/GCSE is better suited for this scenario, but it has a critical
constraint. In particular our PRE formulation is never allowed to put
an evaluation of an expression on a path that didn't have one before. So
while there is clearly a redundancy on the path 2->3->4->5 (BB3 and BB5),
there is nowhere we could put an evaluation that would reduce the number
of evaluations on that path without introducing an evaluation on paths
that didn't have one. So consider 2->4->6. On that path there are zero
evaluations, so we can't place an eval in BB2 because that would introduce
evaluations on 2->4->6, which previously had none.
OK. How does PRE calculate all the possible paths to consider, say your
examples 2-3-4-5 and 2-4-6? Are those just indicative, or are they actually
the paths PRE computes for this case? Would there be more?
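For my own reference, enumerating the simple paths through the CFG above
(BB3 and BB5 being the blocks that evaluate the constant):

    2->3->4->5->6 : 2 evaluations (the redundancy)
    2->3->4->6    : 1 evaluation
    2->4->5->6    : 1 evaluation
    2->4->6       : 0 evaluations

which, if I've understood the constraint, is why neither BB2 nor BB4 is a
legal placement point: both also lie on 2->4->6, which currently has none.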
There isn't a great place in GCC to handle this right now. If the
constraints were relaxed in PRE, then we'd have a chance, but getting
the cost model right is going to be tough.
It would have been better (for this specific case) if loop unrolling were
not done so early. The tree pass cunroll is flattening the loop out and
leaving it to the rest of the tree/RTL passes to pick up the pieces and
remove any redundancies, if at all. It obviously needs to happen early if we
are injecting 7x more instructions, but that seems like a lot to unravel.
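For context, the loop being flattened is roughly of this shape (my own
reduction; the 0xA001 constant looks like a reflected CRC-16 polynomial, but
that's an assumption, not necessarily the exact testcase):

/* Assumed shape of the source loop (hypothetical reduction).  cunroll peels
   all 8 iterations, so the conditional xor with 0xA001, and with it the
   li/addi pair that materializes the constant, is replicated in every
   unrolled copy of the body.  */
unsigned short crc16_update(unsigned short crc, unsigned char data)
{
    for (int i = 0; i < 8; i++) {
        if ((crc ^ (data >> i)) & 1)
            crc = (unsigned short)((crc >> 1) ^ 0xA001);
        else
            crc = (unsigned short)(crc >> 1);
    }
    return crc;
}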
FWIW -fno-unroll-loops only seems to have an effect at -O2; at -O3 the loop
is always unrolled. Is that expected?
If this seems worthwhile and you have ideas on how to do this better, I'd
be happy to work on it with some guidance.
Thx,
-Vineet