[Richard, combine question at the bottom for you. I've quoted Ulrich's whole message in order to provide a bit of context.]
"Ulrich Weigand" <uweig...@de.ibm.com> writes: > Richard Sandiford wrote: >> At the moment, fwprop will propagate constants and registers >> even if no further rtl simplifications are possible: >> >> if (REG_P (new_rtx) || CONSTANT_P (new_rtx)) >> flags |= PR_CAN_APPEAR; >> >> What do you think about extending this to subregs? > > I've been testing your patch (in its latest version including > the SUBREG test pointed out by H.J.), and ran into issues where > this additional propagation suppresses other optimizations. > > In particular, I'm seeing this testsuite regression on x86_64: > FAIL: gcc.target/i386/pr30315.c scan-assembler-times cmp 4 > > The problem is that the combined "compute result and set > condition code" insn patterns sometimes no longer match. > > The source code in question looks like: > > int c; > > unsigned char > decccc (unsigned char a, unsigned char b) > { > unsigned char difference = a - b; > if (difference > a) > c--; > return difference; > } > > > Before your patch, we generate insns like: > > (insn 3 4 5 2 (set (reg/v:QI 63 [ a ]) > (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal} > (expr_list:REG_DEAD (reg:SI 64 [ a ]) > (nil))) > > (insn 5 3 6 2 (set (reg/v:QI 65 [ b ]) > (subreg:QI (reg:SI 66 [ b ]) 0)) pr30315.i:5 66 {*movqi_internal} > (expr_list:REG_DEAD (reg:SI 66 [ b ]) > (nil))) > > (insn 9 6 10 2 (parallel [ > (set (reg/v:QI 59 [ difference ]) > (minus:QI (reg/v:QI 63 [ a ]) > (reg/v:QI 65 [ b ]))) > (clobber (reg:CC 17 flags)) > ]) pr30315.i:6 287 {*subqi_1} > (expr_list:REG_DEAD (reg/v:QI 65 [ b ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil)))) > > (insn 10 9 11 2 (set (reg:CC 17 flags) > (compare:CC (reg/v:QI 63 [ a ]) > (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1} > (expr_list:REG_DEAD (reg/v:QI 63 [ a ]) > (nil))) > > which get combined into: > > (insn 4 2 3 2 (set (reg:SI 66 [ b ]) > (reg:SI 4 si [ b ])) pr30315.i:5 64 {*movsi_internal} > (expr_list:REG_DEAD (reg:SI 4 si [ b ]) > (nil))) > > (insn 3 4 5 2 (set (reg/v:QI 63 [ a ]) > (subreg:QI (reg:SI 64 [ a ]) 0)) pr30315.i:5 66 {*movqi_internal} > (expr_list:REG_DEAD (reg:SI 64 [ a ]) > (nil))) > > (insn 10 9 11 2 (parallel [ > (set (reg:CCC 17 flags) > (compare:CCC (minus:QI (reg/v:QI 63 [ a ]) > (subreg:QI (reg:SI 66 [ b ]) 0)) > (reg/v:QI 63 [ a ]))) > (set (reg/v:QI 59 [ difference ]) > (minus:QI (reg/v:QI 63 [ a ]) > (subreg:QI (reg:SI 66 [ b ]) 0))) > ]) pr30315.i:7 322 {*subqi3_cc_overflow} > (expr_list:REG_DEAD (reg:SI 66 [ b ]) > (expr_list:REG_DEAD (reg/v:QI 63 [ a ]) > (nil)))) > > > Now, with your patch we have instead before combine: > > (insn 9 6 10 2 (parallel [ > (set (reg/v:QI 59 [ difference ]) > (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0) > (subreg:QI (reg:SI 66 [ b ]) 0))) > (clobber (reg:CC 17 flags)) > ]) pr30315.i:6 287 {*subqi_1} > (expr_list:REG_DEAD (reg:SI 66 [ b ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (nil)))) > > (insn 10 9 11 2 (set (reg:CC 17 flags) > (compare:CC (subreg:QI (reg:SI 64 [ a ]) 0) > (reg/v:QI 59 [ difference ]))) pr30315.i:7 4 {*cmpqi_1} > (expr_list:REG_DEAD (reg:SI 64 [ a ]) > (nil))) > > which combine is unfortunately unable to process: > > Trying 9 -> 10: > Failed to match this instruction: > (parallel [ > (set (reg:CC 17 flags) > (compare:CC (subreg:QI (minus:SI (reg:SI 64 [ a ]) > (reg:SI 66 [ b ])) 0) > (subreg:QI (reg:SI 64 [ a ]) 0))) > (set (reg/v:QI 59 [ difference ]) > (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0) > (subreg:QI (reg:SI 66 [ b ]) 0))) > ]) > > > The problem appears to be that an RTX expression like > > (minus:QI (subreg:QI (reg:SI 64 [ a ]) 0) > (subreg:QI (reg:SI 66 [ b ]) 0)) > > seems to be considered non-canonical by combine, and is instead > transformed into > > (subreg:QI (minus:SI (reg:SI 64 [ a ]) > (reg:SI 66 [ b ])) 0) > > This happens via combine.c:apply_distributive_law. > > > On the one hand, this seems odd, as SUBREGs with a non-trivial > expression inside will usually not match any insn pattern. On > the other hand, I guess this might still be useful behaviour > for combine on platforms that support only word arithmetic, > when the SUBREG might be considered a "split" point. (Of course, > this doesn't work for the compute-result-and-cc type patterns.) > > > In either case, it is always odd to have RTX in the insn stream > that isn't "stable" under common simplication ... Well, I suppose I see "common simplification" as what simplify-rtx.c does. The problem comes at times like this when combine has its own ideas. It looks like Paolo was thinking of making simplify-rtx.c go the other way (which sounds like a good idea on the face of it). > Do you have any suggestions on how to fix this? If we add the fwprop > patch, should we then disable apply_distributive_law for SUBREGs? No immediate suggestions, sorry. It looks like the combine case was added by this pre-egcs patch: Wed Mar 18 05:54:25 1998 Richard Kenner <ken...@vlsi1.ultra.nyu.edu> * combine.c (gen_binary): Don't make AND that does nothing. (simplify_comparison, case AND): Commute AND and SUBREG. so maybe Richard remembers (or has a record of) what this was designed to handle? Richard