RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
-Original Message- From: Kai Tietz [mailto:ktiet...@googlemail.com] Sent: Thursday, October 27, 2011 5:36 PM To: Jiangning Liu Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs 2011/10/27 Jiangning Liu jiangning@arm.com: -Original Message- From: Michael Matz [mailto:m...@suse.de] Sent: Wednesday, October 26, 2011 11:47 PM To: Kai Tietz Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc- patc...@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: So you would mean that memory dereferencing shouldn't be considered as side-effect at all? No. I haven't said this at all. Of course it's a side-effect, but we're allowed to remove existing ones (under some circumstances). We're not allowed to introduce new ones, which means that this ... So we would happily cause by code 'if (i *i != 0) an crash, as memory-dereference has for you no side-effect? ... is not allowed. But in the original example the memread was on the left side, hence occured always, therefore we can move it to the right side, even though it might occur less often. In you special case it might be valid that, if first (and C-fold- const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Hmm? I don't see why it's necessary to check if it's the left-most condition in a chain. If the left hand of '' is a memread it can always be moved to the right side (or the operator transformed into '' which can have the same effect), of course only if the original rhs is free of side effects, but then independed if the was part of a larger expression. The memread will possibly be done fewer times than originally, but as said, that's okay. Agree. The point is for the small case I gave RHS doesn't have side effect at all, so the optimization of changing it to AND doesn't violate C specification. We need to recover something for this case, although it did improve a lot for some particular benchmarks. Thanks, -Jiangning Ciao, Michael. Hmm, so we can allow merging to AND, if the left-hand-side might trap but has no-side-effects and rhs has neither trapping nor side-effects. As for the case that left-hand side has side-effects but right-hand not, we aren't allowed to do this AND/OR merge. For example 'if ((f = foo ()) != 0 f 24)' we aren't allowed to make this transformation. This shouldn't be that hard. We need to provide to simple_operand_p_2 an additional argument for checking trapping or not. Would it be OK if I file a tracker in bugzilla against this? Regards, Kai
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
-Original Message- From: Michael Matz [mailto:m...@suse.de] Sent: Wednesday, October 26, 2011 11:47 PM To: Kai Tietz Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: So you would mean that memory dereferencing shouldn't be considered as side-effect at all? No. I haven't said this at all. Of course it's a side-effect, but we're allowed to remove existing ones (under some circumstances). We're not allowed to introduce new ones, which means that this ... So we would happily cause by code 'if (i *i != 0) an crash, as memory-dereference has for you no side-effect? ... is not allowed. But in the original example the memread was on the left side, hence occured always, therefore we can move it to the right side, even though it might occur less often. In you special case it might be valid that, if first (and C-fold- const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Hmm? I don't see why it's necessary to check if it's the left-most condition in a chain. If the left hand of '' is a memread it can always be moved to the right side (or the operator transformed into '' which can have the same effect), of course only if the original rhs is free of side effects, but then independed if the was part of a larger expression. The memread will possibly be done fewer times than originally, but as said, that's okay. Agree. The point is for the small case I gave RHS doesn't have side effect at all, so the optimization of changing it to AND doesn't violate C specification. We need to recover something for this case, although it did improve a lot for some particular benchmarks. Thanks, -Jiangning Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/27 Jiangning Liu jiangning@arm.com: -Original Message- From: Michael Matz [mailto:m...@suse.de] Sent: Wednesday, October 26, 2011 11:47 PM To: Kai Tietz Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: So you would mean that memory dereferencing shouldn't be considered as side-effect at all? No. I haven't said this at all. Of course it's a side-effect, but we're allowed to remove existing ones (under some circumstances). We're not allowed to introduce new ones, which means that this ... So we would happily cause by code 'if (i *i != 0) an crash, as memory-dereference has for you no side-effect? ... is not allowed. But in the original example the memread was on the left side, hence occured always, therefore we can move it to the right side, even though it might occur less often. In you special case it might be valid that, if first (and C-fold- const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Hmm? I don't see why it's necessary to check if it's the left-most condition in a chain. If the left hand of '' is a memread it can always be moved to the right side (or the operator transformed into '' which can have the same effect), of course only if the original rhs is free of side effects, but then independed if the was part of a larger expression. The memread will possibly be done fewer times than originally, but as said, that's okay. Agree. The point is for the small case I gave RHS doesn't have side effect at all, so the optimization of changing it to AND doesn't violate C specification. We need to recover something for this case, although it did improve a lot for some particular benchmarks. Thanks, -Jiangning Ciao, Michael. Hmm, so we can allow merging to AND, if the left-hand-side might trap but has no-side-effects and rhs has neither trapping nor side-effects. As for the case that left-hand side has side-effects but right-hand not, we aren't allowed to do this AND/OR merge. For example 'if ((f = foo ()) != 0 f 24)' we aren't allowed to make this transformation. This shouldn't be that hard. We need to provide to simple_operand_p_2 an additional argument for checking trapping or not. Regards, Kai
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Michael Matz Sent: Tuesday, October 11, 2011 10:45 PM To: Kai Tietz Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non- comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. Yes, this part introduced different behavior for this small case, int f(char *i, int j) { if (*i j!=2) return *i; else return j; } Before the fix, we have D.4710 = *i; D.4711 = D.4710 != 0; D.4712 = j != 2; D.4713 = D.4711 D.4712; if (D.4713 != 0) goto D.4714; else goto D.4715; D.4714: D.4710 = *i; D.4716 = (int) D.4710; return D.4716; D.4715: D.4716 = j; return D.4716; After the fix, we have D.4711 = *i; if (D.4711 != 0) goto D.4712; else goto D.4710; D.4712: if (j != 2) goto D.4713; else goto D.4710; D.4713: D.4711 = *i; D.4714 = (int) D.4711; return D.4714; D.4710: D.4714 = j; return D.4714; Does this meet the original expectation? I find the code below in function fold_truth_andor makes difference, /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) into (A OR B). For sequence point consistancy, we need to check for trapping, and side-effects. */ else if (code == icode simple_operand_p_2 (arg0) simple_operand_p_2 (arg1)) return fold_build2_loc (loc, ncode, type, arg0, arg1); for *i != 0 simple_operand_p(*i) returns false. Originally this is not checked by the code. I don't see the patch originally wanted to cover this. Can this be explained reasonably? I'm not arguing this patch did worse thing, but only want to understand the rationale behind this and justify this patch doesn't hurt anything else. Actually on the contrary, I measured and this change accidently made some benchmarks significantly improved due to some other reasons. Thanks, -Jiangning Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Jiangning Liu jiangning@arm.com: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Michael Matz Sent: Tuesday, October 11, 2011 10:45 PM To: Kai Tietz Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard Henderson Subject: Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non- comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. Yes, this part introduced different behavior for this small case, int f(char *i, int j) { if (*i j!=2) return *i; else return j; } Before the fix, we have D.4710 = *i; D.4711 = D.4710 != 0; D.4712 = j != 2; D.4713 = D.4711 D.4712; if (D.4713 != 0) goto D.4714; else goto D.4715; D.4714: D.4710 = *i; D.4716 = (int) D.4710; return D.4716; D.4715: D.4716 = j; return D.4716; After the fix, we have D.4711 = *i; if (D.4711 != 0) goto D.4712; else goto D.4710; D.4712: if (j != 2) goto D.4713; else goto D.4710; D.4713: D.4711 = *i; D.4714 = (int) D.4711; return D.4714; D.4710: D.4714 = j; return D.4714; Does this meet the original expectation? I find the code below in function fold_truth_andor makes difference, /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) into (A OR B). For sequence point consistancy, we need to check for trapping, and side-effects. */ else if (code == icode simple_operand_p_2 (arg0) simple_operand_p_2 (arg1)) return fold_build2_loc (loc, ncode, type, arg0, arg1); for *i != 0 simple_operand_p(*i) returns false. Originally this is not checked by the code. I don't see the patch originally wanted to cover this. Can this be explained reasonably? I'm not arguing this patch did worse thing, but only want to understand the rationale behind this and justify this patch doesn't hurt anything else. Actually on the contrary, I measured and this change accidently made some benchmarks significantly improved due to some other reasons. Thanks, -Jiangning Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael. Well, as far as I understand C specification and sequence points, it makes indeed a difference to do branch-cost optimization on instructions might cause a signal traps. As signal could be handled in specifc handlers. You need to consider here that short-circuit optimization assumes associative law on operands. So right-hand side might be evaluaded before the left-hand-side. And this indeed breaks IMHO the sequences as mentioned in C specification about conditions. So a memory de-referencing (same as a floating-point trap) can never be treated as simple, as far as I understood this. So this patch - beside improving logic for branch-cost merging - fixes this latent issue. Regards, Kai
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Jiangning Liu wrote: - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non- comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. Yes, this part introduced different behavior for this small case, D.4710 = *i; D.4711 = D.4710 != 0; D.4712 = j != 2; D.4713 = D.4711 D.4712; if (D.4713 != 0) goto D.4714; else goto D.4715; After the fix, we have D.4711 = *i; if (D.4711 != 0) goto D.4712; else goto D.4710; D.4712: if (j != 2) goto D.4713; else goto D.4710; So, we have one more jump than originally, when the point of the patch was to emit less on targets with high branch costs. So, as speculated, the hunk was not useless. (It's nice that it caused a benchmark to improve significantly, but that should be done via a proper analysis and patch, not as a side effect of a supposed non-change). Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: Yes, this part introduced different behavior for this small case, int f(char *i, int j) { if (*i j!=2) return *i; else return j; } Well, as far as I understand C specification and sequence points, it makes indeed a difference to do branch-cost optimization on instructions might cause a signal traps. As signal could be handled in specifc handlers. You need to consider here that short-circuit optimization assumes associative law on operands. So right-hand side might be evaluaded before the left-hand-side. And this indeed breaks IMHO the sequences as mentioned in C specification about conditions. I'm not sure in this specific case. If it had been: if (*a *b) ... the you'd be right. The side-effect of dereferencing a must happen before *b, and hence transforming this into (*a!=0)(*b!=0) would be wrong. But in the case at hand we only have one potentially problematic (aka detectable) side-effect (*i), so I don't think you could construct a program that detects the difference. It would entail detecting that j!=2 was already evaluated, when (say) the segfault happens, but you can't as the variable is non-volatile. It also can't happen that the side-effect *i does not occur (which also would be a problem). So, I think there wasn't an actual problem before, and it had fewer jumps. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz m...@suse.de: Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: Yes, this part introduced different behavior for this small case, int f(char *i, int j) { if (*i j!=2) return *i; else return j; } Well, as far as I understand C specification and sequence points, it makes indeed a difference to do branch-cost optimization on instructions might cause a signal traps. As signal could be handled in specifc handlers. You need to consider here that short-circuit optimization assumes associative law on operands. So right-hand side might be evaluaded before the left-hand-side. And this indeed breaks IMHO the sequences as mentioned in C specification about conditions. I'm not sure in this specific case. If it had been: if (*a *b) ... the you'd be right. The side-effect of dereferencing a must happen before *b, and hence transforming this into (*a!=0)(*b!=0) would be wrong. But in the case at hand we only have one potentially problematic (aka detectable) side-effect (*i), so I don't think you could construct a program that detects the difference. It would entail detecting that j!=2 was already evaluated, when (say) the segfault happens, but you can't as the variable is non-volatile. It also can't happen that the side-effect *i does not occur (which also would be a problem). So, I think there wasn't an actual problem before, and it had fewer jumps. Ciao, Michael. the case can be produced quite easily. Eg: extern int global = 0; if (*a global) ... ... the issue is that by C-specification see here 5.1.2.2.3 about program-termination.The important point is here:: Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete *** and no side effects of subsequent evaluations shall have taken place *** Annex C describes sequence-points as 1 The following are the sequence points described in 5.1.2.3: — The call to a function, after the arguments have been evaluated (6.5.2.2). — The end of the first operand of the following operators: logical AND (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17). ... Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
I describe the sample more closely here extern int global = 0; extern int *a = NULL; void catchSigSegV( int sig ) { a = global; } int foo (int j) { signal (SIGSEGV, catchSigSegV); if (*a global) return 2; return 0; } I admit that in most cases such a scenario is not common. This sample seems to be a valid C program. So the conditions in IF shall be evaluted strict in order of sequence-points, as first argument might trap. It doesn't matter if second argument have side-effects or none. The point is the first and so it has to be separated from other conditions. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } the case can be produced quite easily. extern int global = 0; if (*a global) ... See? You had to change the program to prove the transformation to be invalid. But my point was that the function we discuss about was exactly as above. It didn't have globals, or two loads, or a volatile, or anything else you can come up with where the transformation would be detectable (and hence invalid). I claim that you can't construct a program that can distinguish between this function: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } and this one: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } And if you can't construct such a program, then the initial transformation before the fold-const.c change _for this specific situation_ was correct. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz m...@suse.de: Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } the case can be produced quite easily. extern int global = 0; if (*a global) ... See? You had to change the program to prove the transformation to be invalid. But my point was that the function we discuss about was exactly as above. It didn't have globals, or two loads, or a volatile, or anything else you can come up with where the transformation would be detectable (and hence invalid). I claim that you can't construct a program that can distinguish between this function: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } and this one: int f(char *i, int j) { if (*i j!=2) return *i; else return j; } And if you can't construct such a program, then the initial transformation before the fold-const.c change _for this specific situation_ was correct. Ciao, Michael. well, if such a function is used as inline and we know for it that j has value != 2, then we have here a big difference. For your first example, we still have to do the memory access to *i, even if we are not interested in result. See here point 4 of 5.1.2.3 of C-spec. For your second sample we don't need to do that, as the itself is no sequence-point and so we can eliminate the *i access without breaking anything. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: well, if such a function is used as inline and we know for it that j has value != 2, then we have here a big difference. For your first example, we still have to do the memory access to *i, even if we are not interested in result. Actually we don't have to preserve memory accesses. The interesting case is if the pointer has an invalid value. The behaviour of the access then is undefined, and it's okay to not do it at all. In case the pointer does point to an object the access (if it's value isn't needed) also isn't necessary. IOW: in void f(int *p) { int i = *p; } we can always remove the pointer read. So, I still maintain that the transformation on the original example was okay. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz m...@suse.de: Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: well, if such a function is used as inline and we know for it that j has value != 2, then we have here a big difference. For your first example, we still have to do the memory access to *i, even if we are not interested in result. Actually we don't have to preserve memory accesses. The interesting case is if the pointer has an invalid value. The behaviour of the access then is undefined, and it's okay to not do it at all. In case the pointer does point to an object the access (if it's value isn't needed) also isn't necessary. IOW: in void f(int *p) { int i = *p; } we can always remove the pointer read. So, I still maintain that the transformation on the original example was okay. Ciao, Michael. So you would mean that memory dereferencing shouldn't be considered as side-effect at all? So we would happily cause by code 'if (i *i != 0) an crash, as memory-dereference has for you no side-effect? In you special case it might be valid that, if first (and C-fold-const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Regards, Kai Hmm, not sure
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: So you would mean that memory dereferencing shouldn't be considered as side-effect at all? No. I haven't said this at all. Of course it's a side-effect, but we're allowed to remove existing ones (under some circumstances). We're not allowed to introduce new ones, which means that this ... So we would happily cause by code 'if (i *i != 0) an crash, as memory-dereference has for you no side-effect? ... is not allowed. But in the original example the memread was on the left side, hence occured always, therefore we can move it to the right side, even though it might occur less often. In you special case it might be valid that, if first (and C-fold-const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Hmm? I don't see why it's necessary to check if it's the left-most condition in a chain. If the left hand of '' is a memread it can always be moved to the right side (or the operator transformed into '' which can have the same effect), of course only if the original rhs is free of side effects, but then independed if the was part of a larger expression. The memread will possibly be done fewer times than originally, but as said, that's okay. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Fri, Oct 14, 2011 at 9:43 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, So I committed the gimplify patch separate. And here is the remaining fold-const patch. The important tests here are in gcc.dg/tree-ssa/builtin-expect[1-4].c, which cover the one special-case for branching. Also tree-ssa/20040204-1.c covers tests for branching code (on targets having high-engough BRANCH_COST and no special-casing - like MIPS, S/390, and AVR. ChangeLog 2011-10-14 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Ok with ... Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); recurse with simple_operand_p. + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) Move this check before the tcc_comparison check and remove the then redundant tree_could_trap_p check there. + return false; + + switch (code) + { + case SSA_NAME: + return true; Do not handle here, it's handled in simple_operand_p. + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; Remove the BIT_NOT_EXPR handling. Thus, simply change this switch to if (code == TRUTH_NOT_EXPR) return simple_operand_p_2 (TREE_OPERAND (exp, 0)); return simple_operand_p (exp); + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + default: + return simple_operand_p (exp); + } +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/17 Richard Guenther richard.guent...@gmail.com: On Fri, Oct 14, 2011 at 9:43 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, So I committed the gimplify patch separate. And here is the remaining fold-const patch. The important tests here are in gcc.dg/tree-ssa/builtin-expect[1-4].c, which cover the one special-case for branching. Also tree-ssa/20040204-1.c covers tests for branching code (on targets having high-engough BRANCH_COST and no special-casing - like MIPS, S/390, and AVR. ChangeLog 2011-10-14 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Ok with ... Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); recurse with simple_operand_p. No, as this again would reject simple operations and additionally wouldn't check for trapping. + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) Move this check before the tcc_comparison check and remove the then redundant tree_could_trap_p check there. Ok + return false; + + switch (code) + { + case SSA_NAME: + return true; Do not handle here, it's handled in simple_operand_p. Well, was more a short-cut here. + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; Remove the BIT_NOT_EXPR handling. Thus, simply change this switch to Why should we reject simple ~X operations from gimplified code here? I admit that from FE-code we won't see that, as always an integer-cast is done for foo (_Bool x) { ... if (~x) ... }, but from gimplified-code this is the general description of an boolean-typed != 0? if (code == TRUTH_NOT_EXPR) return simple_operand_p_2 (TREE_OPERAND (exp, 0)); return simple_operand_p (exp); + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + default: +
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 17, 2011 at 12:59 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/17 Richard Guenther richard.guent...@gmail.com: On Fri, Oct 14, 2011 at 9:43 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, So I committed the gimplify patch separate. And here is the remaining fold-const patch. The important tests here are in gcc.dg/tree-ssa/builtin-expect[1-4].c, which cover the one special-case for branching. Also tree-ssa/20040204-1.c covers tests for branching code (on targets having high-engough BRANCH_COST and no special-casing - like MIPS, S/390, and AVR. ChangeLog 2011-10-14 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Ok with ... Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); recurse with simple_operand_p. No, as this again would reject simple operations and additionally wouldn't check for trapping. ? Your code allows arbitrarily complex expressions. Also tree_could_trap_p obviously extents to operands. + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) Move this check before the tcc_comparison check and remove the then redundant tree_could_trap_p check there. Ok + return false; + + switch (code) + { + case SSA_NAME: + return true; Do not handle here, it's handled in simple_operand_p. Well, was more a short-cut here. + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; Remove the BIT_NOT_EXPR handling. Thus, simply change this switch to Why should we reject simple ~X operations from gimplified code here? Because this is FE triggered code. From gimple you won't ever see such complex expressions (never even the TRUTH_AND*_EXPR variants). I admit that from FE-code we won't see that, as always an
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/17 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 17, 2011 at 12:59 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/17 Richard Guenther richard.guent...@gmail.com: On Fri, Oct 14, 2011 at 9:43 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, So I committed the gimplify patch separate. And here is the remaining fold-const patch. The important tests here are in gcc.dg/tree-ssa/builtin-expect[1-4].c, which cover the one special-case for branching. Also tree-ssa/20040204-1.c covers tests for branching code (on targets having high-engough BRANCH_COST and no special-casing - like MIPS, S/390, and AVR. ChangeLog 2011-10-14 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Ok with ... Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); recurse with simple_operand_p. No, as this again would reject simple operations and additionally wouldn't check for trapping. ? Your code allows arbitrarily complex expressions. Also tree_could_trap_p obviously extents to operands. Ah, ok. I wasn't aware that it walks into tree. + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) Move this check before the tcc_comparison check and remove the then redundant tree_could_trap_p check there. Ok + return false; + + switch (code) + { + case SSA_NAME: + return true; Do not handle here, it's handled in simple_operand_p. Well, was more a short-cut here. + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; Remove the BIT_NOT_EXPR handling. Thus, simply change this switch to Why should we reject simple ~X operations from gimplified code here? Because this is FE triggered code. From gimple you won't ever see such complex
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 17, 2011 at 1:31 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/17 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 17, 2011 at 12:59 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/17 Richard Guenther richard.guent...@gmail.com: On Fri, Oct 14, 2011 at 9:43 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, So I committed the gimplify patch separate. And here is the remaining fold-const patch. The important tests here are in gcc.dg/tree-ssa/builtin-expect[1-4].c, which cover the one special-case for branching. Also tree-ssa/20040204-1.c covers tests for branching code (on targets having high-engough BRANCH_COST and no special-casing - like MIPS, S/390, and AVR. ChangeLog 2011-10-14 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Ok with ... Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); recurse with simple_operand_p. No, as this again would reject simple operations and additionally wouldn't check for trapping. ? Your code allows arbitrarily complex expressions. Also tree_could_trap_p obviously extents to operands. Ah, ok. I wasn't aware that it walks into tree. + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) Move this check before the tcc_comparison check and remove the then redundant tree_could_trap_p check there. Ok + return false; + + switch (code) + { + case SSA_NAME: + return true; Do not handle here, it's handled in simple_operand_p. Well, was more a short-cut here. + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; Remove the BIT_NOT_EXPR handling. Thus, simply change this switch to Why should we reject simple ~X operations from gimplified code here?
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Ok, I see. This might be profitable to do that. So fold_truth_op hunk looks like this @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } } /* See if the comparisons can be merged. Then get all the parameters for @@ -8380,13 +8400,77 @@ fold_truth_andor (location_t loc, enum t lhs is another similar operation, try to merge its rhs with our rhs. Then try to merge our lhs and rhs. */ if (TREE_CODE (arg0) == code - 0 != (tem = fold_truthop (loc, code, type, - TREE_OPERAND (arg0, 1), arg1))) + 0 != (tem = fold_truth_andor_1 (loc, code, type, +TREE_OPERAND (arg0, 1), arg1))) return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); - if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) + if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0) return tem; + if ((BRANCH_COST (optimize_function_for_speed_p (cfun), + false) = 2) + LOGICAL_OP_NON_SHORT_CIRCUIT + simple_operand_p_2 (arg1)) +{ + enum tree_code ncode; + + if (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) +{ + ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); + + /* Transform ((A AND-IF B) AND-IF C) into (A AND-IF (B AND C)), +or ((A OR-IF B) OR-IF C) into (A OR-IF (B OR C)) +We don't want to pack more than two leafs to a non-IF AND/OR +expression. +If tree-code of left-hand operand isn't an AND/OR-IF code and not +equal to CODE, then we don't want to add right-hand operand. +If the inner right-hand side of left-hand operand has +side-effects, or isn't simple, then we can't add to it, +as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == code + /* Needed for sequence points to handle trappings, and +side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), +arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) +into (A OR B). +For sequence point consistancy, we need to check for trapping, +and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); + } + else +{ + ncode = (code == TRUTH_AND_EXPR ? TRUTH_ANDIF_EXPR + : TRUTH_ORIF_EXPR); + /* Transform ((A AND-IF B) AND C) into (A AND-IF (B AND C)), +or ((A OR-IF B) OR C) into (A OR-IF (B OR C)) +We don't want to pack more than two leafs to a non-IF AND/OR +expression. +If tree-code of left-hand operand isn't an AND/OR-IF code and not +equal to NCODE, then we don't want to add right-hand operand. +If the inner right-hand side of left-hand operand has +side-effects, or isn't simple, then we can't add to it, +as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == ncode + /* Needed for sequence points to handle trappings, and +side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 1), +arg1); + return fold_build2_loc (loc, ncode, type, + TREE_OPERAND (arg0, 0), tem); + } + } + +} + return NULL_TREE; } Ok, with other changes you mentioned? Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 17, 2011 at 2:22 PM, Kai Tietz ktiet...@googlemail.com wrote: Ok, I see. This might be profitable to do that. So fold_truth_op hunk looks like this @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } } /* See if the comparisons can be merged. Then get all the parameters for @@ -8380,13 +8400,77 @@ fold_truth_andor (location_t loc, enum t lhs is another similar operation, try to merge its rhs with our rhs. Then try to merge our lhs and rhs. */ if (TREE_CODE (arg0) == code - 0 != (tem = fold_truthop (loc, code, type, - TREE_OPERAND (arg0, 1), arg1))) + 0 != (tem = fold_truth_andor_1 (loc, code, type, + TREE_OPERAND (arg0, 1), arg1))) return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); - if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) + if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0) return tem; + if ((BRANCH_COST (optimize_function_for_speed_p (cfun), + false) = 2) + LOGICAL_OP_NON_SHORT_CIRCUIT + simple_operand_p_2 (arg1)) + { + enum tree_code ncode; + + if (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + { + ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); + + /* Transform ((A AND-IF B) AND-IF C) into (A AND-IF (B AND C)), + or ((A OR-IF B) OR-IF C) into (A OR-IF (B OR C)) + We don't want to pack more than two leafs to a non-IF AND/OR + expression. + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to CODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has + side-effects, or isn't simple, then we can't add to it, + as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == code + /* Needed for sequence points to handle trappings, and + side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), + arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) + into (A OR B). + For sequence point consistancy, we need to check for trapping, + and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); + } + else + { + ncode = (code == TRUTH_AND_EXPR ? TRUTH_ANDIF_EXPR + : TRUTH_ORIF_EXPR); + /* Transform ((A AND-IF B) AND C) into (A AND-IF (B AND C)), + or ((A OR-IF B) OR C) into (A OR-IF (B OR C)) + We don't want to pack more than two leafs to a non-IF AND/OR + expression. + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to NCODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has + side-effects, or isn't simple, then we can't add to it, + as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == ncode + /* Needed for sequence points to handle trappings, and + side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 1), + arg1); + return fold_build2_loc (loc, ncode, type, + TREE_OPERAND (arg0, 0), tem); + } + } + + } + return NULL_TREE; } Ok, with other changes you mentioned? This can be done without so much code duplication. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Sure, Is simplier and also handles (A T[-IF] (B T-IF C) - (A T B) T-IF C case, which can happen by framing in conditions. @@ -8380,13 +8400,65 @@ fold_truth_andor (location_t loc, enum t lhs is another similar operation, try to merge its rhs with our rhs. Then try to merge our lhs and rhs. */ if (TREE_CODE (arg0) == code - 0 != (tem = fold_truthop (loc, code, type, - TREE_OPERAND (arg0, 1), arg1))) + 0 != (tem = fold_truth_andor_1 (loc, code, type, +TREE_OPERAND (arg0, 1), arg1))) return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); - if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) + if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0) return tem; + if ((BRANCH_COST (optimize_function_for_speed_p (cfun), + false) = 2) + LOGICAL_OP_NON_SHORT_CIRCUIT) +{ + enum tree_code ncode, icode; + + ncode = (code == TRUTH_ANDIF_EXPR || code == TRUTH_AND_EXPR) + ? TRUTH_AND_EXPR : TRUTH_OR_EXPR; + icode = ncode == TRUTH_AND_EXPR ? TRUTH_ANDIF_EXPR : TRUTH_ORIF_EXPR; + + /* Transform ((A AND-IF B) AND[-IF] C) into (A AND-IF (B AND C)), +or ((A OR-IF B) OR[-IF] C) into (A OR-IF (B OR C)) +We don't want to pack more than two leafs to a non-IF AND/OR +expression. +If tree-code of left-hand operand isn't an AND/OR-IF code and not +equal to IF-CODE, then we don't want to add right-hand operand. +If the inner right-hand side of left-hand operand has +side-effects, or isn't simple, then we can't add to it, +as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == icode + simple_operand_p_2 (arg1) + /* Needed for sequence points to handle trappings, and +side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), +arg1); + return fold_build2_loc (loc, icode, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Same as abouve but for (A AND[-IF] (B AND-IF C)) - ((A AND B) AND-IF C), + or (A OR[-IF] (B OR-IF C) - ((A OR B) OR-IF C). */ + else if (TREE_CODE (arg1) == icode + simple_operand_p_2 (arg0) + /* Needed for sequence points to handle trappings, and +side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg1, 0))) + { + tem = fold_build2_loc (loc, ncode, type, +arg0, TREE_OPERAND (arg1, 0)); + return fold_build2_loc (loc, icode, type, tem, + TREE_OPERAND (arg1, 1)); + } + /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) +into (A OR B). +For sequence point consistancy, we need to check for trapping, +and side-effects. */ + else if (code == icode simple_operand_p_2 (arg0) +simple_operand_p_2 (arg1)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); +} + return NULL_TREE; } Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 17, 2011 at 3:36 PM, Kai Tietz ktiet...@googlemail.com wrote: Sure, Is simplier and also handles (A T[-IF] (B T-IF C) - (A T B) T-IF C case, which can happen by framing in conditions. @@ -8380,13 +8400,65 @@ fold_truth_andor (location_t loc, enum t lhs is another similar operation, try to merge its rhs with our rhs. Then try to merge our lhs and rhs. */ if (TREE_CODE (arg0) == code - 0 != (tem = fold_truthop (loc, code, type, - TREE_OPERAND (arg0, 1), arg1))) + 0 != (tem = fold_truth_andor_1 (loc, code, type, + TREE_OPERAND (arg0, 1), arg1))) return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); - if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) + if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0) return tem; + if ((BRANCH_COST (optimize_function_for_speed_p (cfun), + false) = 2) + LOGICAL_OP_NON_SHORT_CIRCUIT) + { + enum tree_code ncode, icode; + + ncode = (code == TRUTH_ANDIF_EXPR || code == TRUTH_AND_EXPR) + ? TRUTH_AND_EXPR : TRUTH_OR_EXPR; + icode = ncode == TRUTH_AND_EXPR ? TRUTH_ANDIF_EXPR : TRUTH_ORIF_EXPR; + + /* Transform ((A AND-IF B) AND[-IF] C) into (A AND-IF (B AND C)), + or ((A OR-IF B) OR[-IF] C) into (A OR-IF (B OR C)) + We don't want to pack more than two leafs to a non-IF AND/OR + expression. + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to IF-CODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has + side-effects, or isn't simple, then we can't add to it, + as otherwise we might destroy if-sequence. */ + if (TREE_CODE (arg0) == icode + simple_operand_p_2 (arg1) + /* Needed for sequence points to handle trappings, and + side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg0, 1))) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), + arg1); + return fold_build2_loc (loc, icode, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Same as abouve but for (A AND[-IF] (B AND-IF C)) - ((A AND B) AND-IF C), + or (A OR[-IF] (B OR-IF C) - ((A OR B) OR-IF C). */ + else if (TREE_CODE (arg1) == icode + simple_operand_p_2 (arg0) + /* Needed for sequence points to handle trappings, and + side-effects. */ + simple_operand_p_2 (TREE_OPERAND (arg1, 0))) + { + tem = fold_build2_loc (loc, ncode, type, + arg0, TREE_OPERAND (arg1, 0)); + return fold_build2_loc (loc, icode, type, tem, + TREE_OPERAND (arg1, 1)); + } + /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) + into (A OR B). + For sequence point consistancy, we need to check for trapping, + and side-effects. */ + else if (code == icode simple_operand_p_2 (arg0) + simple_operand_p_2 (arg1)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); + } + return NULL_TREE; } Ok with the rest of the changes I requested. Richard. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 13, 2011 at 3:25 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this new version addresses the comments from you. On gimplify.c's gimplify_expr we didn't handled the case that operands for TRUTH-AND/OR/XOR expressions need to have same operand-size in case of transformation to bitwise-binary operation. This shows up for Fortran, as there are more than one boolean-kind type with different mode-sizes. I added a testcase for this, The gimplify.c bits and the new testcase is ok. They should have been submitted separately. Please re-submit the fold-const.c part. Thanks, Richard. ChangeLog 2011-10-13 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. * gimplify.c (gimplify_expr): Take care that for bitwise-binary transformation the operands have compatible types. 2011-10-13 Kai Tietz kti...@redhat.com * gfortran.fortran-torture/compile/logical-2.f90: New test. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) + return false; + + switch (code) + { + case SSA_NAME: + return true; + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + default: + return simple_operand_p (exp); + } +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, this new version addresses the comments from Michael and additional fixes an latent issue shown up by this rewrite in fold-const. On gimplify.c's gimple_boolify we didn't handled the case that operands for TRUTH-expressions need to have same operand-size for transformation to bitwise operation. This shows up for Fortran, as here are more then one boolean-kind type with different mode-sizes. I added a testcase for this, ChangeLog 2011-10-13 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. * gimplify.c (gimple_boolify): Take care that for bitwise-binary transformation the operands have compatible types. 2011-10-13 Kai Tietz kti...@redhat.com * gfortran.fortran-torture/compile/logical-2.f90: New test. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) +return (!tree_could_trap_p (exp) +simple_operand_p_2 (TREE_OPERAND (exp, 0)) +simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) +return false; + + switch (code) +{ +case SSA_NAME: + return true; +case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +default: + return simple_operand_p (exp); +} +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full width. If MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the extra bits. */ @@ -5025,8 +5065,8 @@ merge_truthop_with_opposite_arm (locatio
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 13, 2011 at 1:25 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this new version addresses the comments from Michael and additional fixes an latent issue shown up by this rewrite in fold-const. On gimplify.c's gimple_boolify we didn't handled the case that operands for TRUTH-expressions need to have same operand-size for transformation to bitwise operation. This shows up for Fortran, as here are more then one boolean-kind type with different mode-sizes. I added a testcase for this, ChangeLog 2011-10-13 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. * gimplify.c (gimple_boolify): Take care that for bitwise-binary transformation the operands have compatible types. 2011-10-13 Kai Tietz kti...@redhat.com * gfortran.fortran-torture/compile/logical-2.f90: New test. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) + return false; + + switch (code) + { + case SSA_NAME: + return true; + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + default: + return simple_operand_p (exp); + } +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/13 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 13, 2011 at 1:25 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this new version addresses the comments from Michael and additional fixes an latent issue shown up by this rewrite in fold-const. On gimplify.c's gimple_boolify we didn't handled the case that operands for TRUTH-expressions need to have same operand-size for transformation to bitwise operation. This shows up for Fortran, as here are more then one boolean-kind type with different mode-sizes. I added a testcase for this, ChangeLog 2011-10-13 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. * gimplify.c (gimple_boolify): Take care that for bitwise-binary transformation the operands have compatible types. 2011-10-13 Kai Tietz kti...@redhat.com * gfortran.fortran-torture/compile/logical-2.f90: New test. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (!tree_could_trap_p (exp) + simple_operand_p_2 (TREE_OPERAND (exp, 0)) + simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) + return false; + + switch (code) + { + case SSA_NAME: + return true; + case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); + default: + return simple_operand_p (exp); + } +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, this new version addresses the comments from you. On gimplify.c's gimplify_expr we didn't handled the case that operands for TRUTH-AND/OR/XOR expressions need to have same operand-size in case of transformation to bitwise-binary operation. This shows up for Fortran, as there are more than one boolean-kind type with different mode-sizes. I added a testcase for this, ChangeLog 2011-10-13 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. * gimplify.c (gimplify_expr): Take care that for bitwise-binary transformation the operands have compatible types. 2011-10-13 Kai Tietz kti...@redhat.com * gfortran.fortran-torture/compile/logical-2.f90: New test. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3678,7 +3678,7 @@ simple_operand_p (const_tree exp) STRIP_NOPS (exp); return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME + || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) +return (!tree_could_trap_p (exp) +simple_operand_p_2 (TREE_OPERAND (exp, 0)) +simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (TREE_SIDE_EFFECTS (exp) + || tree_could_trap_p (exp)) +return false; + + switch (code) +{ +case SSA_NAME: + return true; +case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +default: + return simple_operand_p (exp); +} +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full width. If MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the extra bits. */ @@ -5025,8 +5065,8 @@ merge_truthop_with_opposite_arm (locatio We return the simplified tree or 0 if no optimization
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 13 Oct 2011, Kai Tietz wrote: this new version addresses the comments from Michael and additional fixes an latent issue shown up by this rewrite in fold-const. On gimplify.c's gimple_boolify we didn't handled the case that operands for TRUTH-expressions need to have same operand-size for transformation to bitwise operation. The requirement comes from BIT_AND_EXPR, not from any of the TRUTH_* expressions, hence the point of generating the BIT_AND_EXPR is the point to fixup the types. Similar to this (fixes your testcase): Index: gimplify.c === --- gimplify.c (revision 179855) +++ gimplify.c (working copy) /* See if any simplifications can be done based on what the RHS is. */ @@ -7257,6 +7264,18 @@ gimplify_expr (tree *expr_p, gimple_seq { tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); + /* We are going to transform this into BIT operations, + which have stricter requirements on the operand types. */ + if (!useless_type_conversion_p +(orig_type, TREE_TYPE (TREE_OPERAND (*expr_p, 0 + TREE_OPERAND (*expr_p, 0) + = fold_convert_loc (input_location, orig_type, + TREE_OPERAND (*expr_p, 0)); + if (!useless_type_conversion_p +(orig_type, TREE_TYPE (TREE_OPERAND (*expr_p, 1 + TREE_OPERAND (*expr_p, 1) + = fold_convert_loc (input_location, orig_type, + TREE_OPERAND (*expr_p, 1)); if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { *expr_p = fold_convert_loc (input_location, orig_type, *expr_p); Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 12 Oct 2011, Kai Tietz wrote: And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A B) C) into (A (B C)). and Or (A B) into (A B). for this part: + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); Well to use here binary form of operation seems to me misleading. Hmm? What do you mean? Both operations are binary. ANDIF is '', AND is ''. In fold-const.c comments we usually use the C notations of the operations. It is right that the non-IF AND/OR has finally the same behavior as the binary form in gimple. Nevertheless it isn't the same on AST level. But sure I can Add comments for operations like (A OR/AND-IF B) OR/AND-IF C - (A OR/AND-IF (B OR/AND C and A OR/AND-IF C - (A OR/AND C) Too much noise, leave out the || variant, and just say once Same for ||. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/12 Michael Matz m...@suse.de: Hi, On Wed, 12 Oct 2011, Kai Tietz wrote: And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A B) C) into (A (B C)). and Or (A B) into (A B). for this part: + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); Well to use here binary form of operation seems to me misleading. Hmm? What do you mean? Both operations are binary. ANDIF is '', AND is ''. In fold-const.c comments we usually use the C notations of the operations. See TRUTH_AND_EXPR is in C-notation and TRUTH_ANDIF_EXPR is also . The transcription to binary is done in gimplifier. Btw I just noticed that by this patch a latent bug in gimplifier about boolification for TRUTH_NOT_EXPR/TRUTH_AND/OR... is present. On Fortran there are different boolean-kinds types with different precision. This makes them incompatible to eachother in gimple (as useless_type_conversion_p returns for them false). For gimplier needs to ensure that operands for those TRUTH_... expression met a compatible type of final expression type. I will sent a patch for this as soon I completed regression-testing for it. It is right that the non-IF AND/OR has finally the same behavior as the binary form in gimple. Nevertheless it isn't the same on AST level. But sure I can Add comments for operations like (A OR/AND-IF B) OR/AND-IF C - (A OR/AND-IF (B OR/AND C and A OR/AND-IF C - (A OR/AND C) Too much noise, leave out the || variant, and just say once Same for ||. Ciao, Michael. Cheers, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 12 Oct 2011, Kai Tietz wrote: Hmm? What do you mean? Both operations are binary. ANDIF is '', AND is ''. In fold-const.c comments we usually use the C notations of the operations. See TRUTH_AND_EXPR is in C-notation and TRUTH_ANDIF_EXPR is also . Ah right, confusing but there we are. A comment using ANDIF and AND it is then. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Mon, 10 Oct 2011, Kai Tietz wrote: To ensure that we use simple_operand_p in all cases, beside for branching AND/OR chains, in same way as before, I added to this function an additional argument, by which the looking into comparisons can be activated. Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, + tree lhs, tree rhs) { /* If this is the or of two comparisons, we can do something if the comparisons are NE_EXPR. If this is the and, we can do something @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + (BRANCH_COST (optimize_function_for_speed_p (cfun), +false) = 2) + !TREE_SIDE_EFFECTS (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + simple_operand_p (arg1, true)) +{ + enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR); + + /* We don't want to pack more then two leafs to an non-IF Missing continuation of the sentence? + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to CODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has side-effects, + or isn't simple, then we can't add to it, as otherwise we might + destroy if-sequence. */ + if (TREE_CODE (arg0) == code + /* Needed for sequence points to handle trappings, and + side-effects. */ +!TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) +simple_operand_p (TREE_OPERAND (arg0, 1), true)) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), + arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (!TREE_SIDE_EFFECTS (arg0) +simple_operand_p (arg0, true)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); +} + Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
So updated version for patch. It creates new simple_operand_p_2 function instead of modifying simple_operand_p function. ChangeLog 2011-10-11 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ (! TREE_STATIC (exp) || DECL_REGISTER (exp; } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + I addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) +return (!tree_could_trap_p (exp) +simple_operand_p_2 (TREE_OPERAND (exp, 0)) +simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (FLOAT_TYPE_P (TREE_TYPE (exp)) + tree_could_trap_p (exp)) +return false; + + switch (code) +{ +case SSA_NAME: + return true; +case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +default: + return simple_operand_p (exp); +} +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full width. If MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the extra bits. */ @@ -5025,8 +5065,8 @@ merge_truthop_with_opposite_arm (locatio We return the simplified tree or 0 if no optimization is possible. */ static tree -fold_truthop (location_t loc, enum tree_code code, tree truth_type, - tree lhs, tree rhs) +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, + tree lhs, tree rhs) { /* If this is the or of two comparisons, we can do something if the comparisons are NE_EXPR. If this is the and, we can do something @@ -5054,8 +5094,6 @@ fold_truthop (location_t loc, enum tree_ tree lntype, rntype, result; HOST_WIDE_INT first_bit, end_bit; int volatilep; - tree orig_lhs = lhs, orig_rhs = rhs; - enum tree_code orig_code = code; /* Start by getting the comparison codes. Fail if anything is volatile. If one operand is a BIT_AND_EXPR with the constant one, treat it as if @@ -5119,8 +5157,7 @@
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/11 Michael Matz m...@suse.de: Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. It is. Before we changed expression here and recurse here with the non-IF AND/OR expression later. So there is no need to do this recursion. Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: So updated version for patch. It creates new simple_operand_p_2 function instead of modifying simple_operand_p function. FWIW: I also can't think of a nice short name for that predicate function :) One thing: move the test for TREE_SIDE_EFFECTS to that new function, then the if()s in fold_truth_andor become nicer. I think the code then is okay, but I can't approve. Just one remark about the comment: + /* We don't want to pack more then two leafs to an non-IF AND/OR s/then/than/ s/an/a/ + expression. + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to CODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has side-effects, + or isn't simple, then we can't add to it, as otherwise we might + destroy if-sequence. */ And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A B) C) into (A (B C)). and Or (A B) into (A B). for this part: + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/7 Kai Tietz ktiet...@googlemail.com: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test So said, it is even required to use for right-hand and left-hand side of arguments, if one of them have side-effects or isn't simple. Means the check in my patch should use for + else if (TREE_CODE (arg0) != TRUTH_AND_EXPR + TREE_CODE (arg0) != TRUTH_OR_EXPR + TREE_CODE (arg0) != TRUTH_ANDIF_EXPR + TREE_CODE (arg0) != TRUTH_ORIF_EXPR + TREE_CODE (arg0) != TRUTH_XOR_EXPR + /* Needed for sequence points and trappings, or side-effects. */ + !TREE_SIDE_EFFECTS (arg0) + !tree_could_trap_p (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); instead if (!TREE_SIDE_EFFECTS (arg0) simple_operand_p (arg0)) instead. The cause for this are the consitancies of sequences in tree. I noticed that by tests in gcc.dg/tree-ssa about builitin_expect. for example we have extern int foo (void); /* foo modifies gbl1 */ int gbl1 = 0; int foo (int ns1) { if (ns1 foo () gbl1) return 1; return 0; } so chain of trees has to look like this: (ANDIF (ns1 (ANDIF foo () gbl1)) but if we don't check here for side-effects for left-hand chaining operand, then we end up with (AND ns1 (ANDIF foo () gbl1)) As AND and has associative property, tree says that right-hand and left-hand are exchangable, which is obviously wrong. Cheers, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/7 Kai Tietz ktiet...@googlemail.com: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test So said, it is even required to use for right-hand and left-hand side of arguments, if one of them have side-effects or isn't simple. Means the check in my patch should use for + else if (TREE_CODE (arg0) != TRUTH_AND_EXPR + TREE_CODE (arg0) != TRUTH_OR_EXPR + TREE_CODE (arg0) != TRUTH_ANDIF_EXPR + TREE_CODE (arg0) != TRUTH_ORIF_EXPR + TREE_CODE (arg0) != TRUTH_XOR_EXPR + /* Needed for sequence points and trappings, or side-effects. */ + !TREE_SIDE_EFFECTS (arg0) + !tree_could_trap_p (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); instead if (!TREE_SIDE_EFFECTS (arg0) simple_operand_p (arg0)) instead. The cause for this are the consitancies of sequences in tree. I noticed that by tests in gcc.dg/tree-ssa about builitin_expect. for example we have extern int foo (void); /* foo modifies gbl1 */ int gbl1 = 0; int foo (int ns1) { if (ns1 foo () gbl1) return 1; return 0; } so chain of trees has to look like this: (ANDIF (ns1 (ANDIF foo () gbl1)) but if we don't check here for side-effects for left-hand chaining operand, then we end up with (AND ns1 (ANDIF foo () gbl1)) No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects. As AND and has associative property, tree says that right-hand and left-hand are exchangable, which is obviously wrong. The poitn is that if the right-hand does not have side-effects it doesn't matter if we execute it before the left-hand (independent on whether that has side-effects or not). Richard. Cheers, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/7 Kai Tietz ktiet...@googlemail.com: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test So said, it is even required to use for right-hand and left-hand side of arguments, if one of them have side-effects or isn't simple. Means the check in my patch should use for + else if (TREE_CODE (arg0) != TRUTH_AND_EXPR + TREE_CODE (arg0) != TRUTH_OR_EXPR + TREE_CODE (arg0) != TRUTH_ANDIF_EXPR + TREE_CODE (arg0) != TRUTH_ORIF_EXPR + TREE_CODE (arg0) != TRUTH_XOR_EXPR + /* Needed for sequence points and trappings, or side-effects. */ + !TREE_SIDE_EFFECTS (arg0) + !tree_could_trap_p (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); instead if (!TREE_SIDE_EFFECTS (arg0) simple_operand_p (arg0)) instead. The cause for this are the consitancies of sequences in tree. I noticed that by tests in gcc.dg/tree-ssa about builitin_expect. for example we have extern int foo (void); /* foo modifies gbl1 */ int gbl1 = 0; int foo (int ns1) { if (ns1 foo () gbl1) return 1; return 0; } so chain of trees has to look like this: (ANDIF (ns1 (ANDIF foo () gbl1)) but if we don't check here for side-effects for left-hand chaining operand, then we end up with (AND ns1 (ANDIF foo () gbl1)) No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects. As AND and has associative property, tree says that right-hand and left-hand are exchangable, which is obviously wrong. The poitn is that if the right-hand does not have side-effects it doesn't matter if we execute it before the left-hand (independent on whether that has side-effects or not). Richard. thats just true as long as we don't make use of associative law for AND expressions. Otherwise we would fail for much simpler cases like entern int doo (); int foo () { int c, r = 0; if ((c = foo ()) != 0 c 20) r = 1; return 0; } as for this left-hand operand has side-effects, but as it is the first one, we would chain it as AND. So we get wrong sequence. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Sample had a typo. Correct sample has of course to return r. int foo () { int c, r = 0; if ((c = foo ()) != 0 c 20) r = 1; return r; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 10, 2011 at 12:47 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/7 Kai Tietz ktiet...@googlemail.com: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test So said, it is even required to use for right-hand and left-hand side of arguments, if one of them have side-effects or isn't simple. Means the check in my patch should use for + else if (TREE_CODE (arg0) != TRUTH_AND_EXPR + TREE_CODE (arg0) != TRUTH_OR_EXPR + TREE_CODE (arg0) != TRUTH_ANDIF_EXPR + TREE_CODE (arg0) != TRUTH_ORIF_EXPR + TREE_CODE (arg0) != TRUTH_XOR_EXPR + /* Needed for sequence points and trappings, or side-effects. */ + !TREE_SIDE_EFFECTS (arg0) + !tree_could_trap_p (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); instead if (!TREE_SIDE_EFFECTS (arg0) simple_operand_p (arg0)) instead. The cause for this are the consitancies of sequences in tree. I noticed that by tests in gcc.dg/tree-ssa about builitin_expect. for example we have extern int foo (void); /* foo modifies gbl1 */ int gbl1 = 0; int foo (int ns1) { if (ns1 foo () gbl1) return 1; return 0; } so chain of trees has to look like this: (ANDIF (ns1 (ANDIF foo () gbl1)) but if we don't check here for side-effects for left-hand chaining operand, then we end up with (AND ns1 (ANDIF foo () gbl1)) No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects. As AND and has associative property, tree says that right-hand and left-hand are exchangable, which is obviously wrong. The poitn is that if the right-hand does not have side-effects it doesn't matter if we execute it before the left-hand (independent on whether that has side-effects or not). Richard. thats just true as long as we don't make use of associative law for AND expressions. Otherwise we would fail for much simpler cases like entern int doo (); int foo () { int c, r = 0; if ((c = foo ()) != 0 c 20) r = 1; return 0; } as for this left-hand operand has side-effects, but as it is the first one, we would chain it as AND. So we get wrong sequence. Well, then the predicates checking for simplicity and side-effects should better match. Richard. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Fri, Oct 7, 2011 at 11:36 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test int getter (void); int foo (void) { int c, r = 0; while ((c = getter ()) != '' c = 0) r +=c; return r; } would give a warning about sequence-points. As left-hand operand has side-effects, but right-hand not. If we would combine it as AND, the operands are exchange-able. So right-hand operand needs to be another ANDIF expression instead. Same apply on trapping. In fold_truthop we still have the same (albeit more restricted transform), but guarded with if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2 too. Why not here? Please delete redundant code in fold_truthop. Well, in general this is the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT, so I missed that. As for some targets the macro LOGICAL_OP_NON_SHORT_CIRCUIT might be defined differently, it might make sense to check for BRANCH_COST again. It's also odd that this is only called from fold_truth_andor but has a more generic name, so maybe rename it to fold_truth_andor_1 on the way. I renamed it. Richard. ChangeLog 2011-10-07 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p): Make argument non-const and add floating-point trapping check, and special cases for comparisons, and logical-not's. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove here TRUTH-AND|OR_EXPR generation. (fold_truth_andor): Handle branching at one place. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/10 Richard Guenther richard.guent...@gmail.com: On Fri, Oct 7, 2011 at 11:36 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test int getter (void); int foo (void) { int c, r = 0; while ((c = getter ()) != '' c = 0) r +=c; return r; } would give a warning about sequence-points. As left-hand operand has side-effects, but right-hand not. If we would combine it as AND, the operands are exchange-able. So right-hand operand needs to be another ANDIF expression instead. Same apply on trapping. In fold_truthop we still have the same (albeit more restricted transform), but guarded with if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2 too. Why not here? Please delete redundant code in fold_truthop. Well, in general this is the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT, so I missed that. As for some targets the macro LOGICAL_OP_NON_SHORT_CIRCUIT might be defined differently, it might make sense to check for BRANCH_COST again. It's also odd that this is only called from fold_truth_andor but has a more generic name, so maybe rename it to fold_truth_andor_1 on the way. I renamed it. Richard. ChangeLog 2011-10-07 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p): Make argument non-const and add floating-point trapping check, and special cases for comparisons, and logical-not's. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove here TRUTH-AND|OR_EXPR generation. (fold_truth_andor): Handle branching at one place. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz ktiet...@googlemail.com wrote: Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF. It has to be checked that the LHS code is same as outer CODE, as otherwise we wouldn't apply different TRUTH-IF only on inner RHS of LHS, which is of course wrong. Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int -simple_operand_p (const_tree exp) +simple_operand_p (tree exp) { + enum tree_code code; /* Strip any conversions that don't change the machine mode. */ STRIP_NOPS (exp); + code = TREE_CODE (exp); + + /* Handle some trivials */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (tree_could_trap_p (exp) + simple_operand_p (TREE_OPERAND (exp, 0)) + simple_operand_p (TREE_OPERAND (exp, 1))); And that's still wrong. Stopped reading here. Richard. + if (FLOAT_TYPE_P (TREE_TYPE (exp)) + tree_could_trap_p (exp)) + return false; + + switch (code) + { + case SSA_NAME: + return true; + case TRUTH_NOT_EXPR: + return simple_operand_p (TREE_OPERAND (exp, 0)); + case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p (TREE_OPERAND (exp, 0)); + default: + break; + } + return (CONSTANT_CLASS_P (exp) - || TREE_CODE (exp) == SSA_NAME || (DECL_P (exp) ! TREE_ADDRESSABLE (exp) ! TREE_THIS_VOLATILE (exp) @@ -4888,7 +4913,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full width. If MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the extra bits. */ @@ -5025,8 +5050,8 @@ merge_truthop_with_opposite_arm (locatio We return the simplified tree or 0 if no optimization is possible. */ static tree -fold_truthop (location_t loc, enum tree_code code, tree truth_type, - tree lhs, tree rhs) +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, + tree lhs, tree rhs) { /* If this is the or of two comparisons, we can do something if the comparisons are NE_EXPR. If this is the and, we can do something @@ -5054,8 +5079,6 @@ fold_truthop (location_t loc, enum tree_ tree lntype, rntype, result; HOST_WIDE_INT first_bit, end_bit; int volatilep; - tree orig_lhs = lhs, orig_rhs = rhs; - enum tree_code orig_code = code; /* Start by getting the comparison codes. Fail if anything is volatile. If one operand is a BIT_AND_EXPR with the constant one, treat it as if @@ -5119,8 +5142,7 @@ fold_truthop (location_t loc, enum tree_ /* If the RHS can be evaluated unconditionally and its operands are simple, it wins to evaluate the RHS unconditionally on machines with expensive branches. In this case, this isn't a comparison - that can be merged. Avoid doing this if the RHS is a floating-point - comparison since those can trap. */ + that can be merged. */ if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) =
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz ktiet...@googlemail.com wrote: Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF. It has to be checked that the LHS code is same as outer CODE, as otherwise we wouldn't apply different TRUTH-IF only on inner RHS of LHS, which is of course wrong. Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int -simple_operand_p (const_tree exp) +simple_operand_p (tree exp) { + enum tree_code code; /* Strip any conversions that don't change the machine mode. */ STRIP_NOPS (exp); + code = TREE_CODE (exp); + + /* Handle some trivials */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (tree_could_trap_p (exp) + simple_operand_p (TREE_OPERAND (exp, 0)) + simple_operand_p (TREE_OPERAND (exp, 1))); And that's still wrong. Stopped reading here. Richard. Oh, there is a not missing. I didn't spot that, sorry. To the point why we need to handle comparisons within simple_operand_p. If we reject comparisons and logical not here, we won't have any branching optimization anymore, as this the patch moves into fold_truthandor. The result with rejecting in simple_operand_p compares and logic-not provides for the following example: extern int doo (void); extern int doo2 (void); int foo (int a, int b, int c, int d, int e, int f) { if (a b c d e f) return doo (); return doo2 (); } we get the following gimple dump (-fdump-tree-gimple): foo (int a, int b, int c, int d, int e, int f) { int D.2752; if (a != 0) goto D.2740; else goto D.2741; D.2740: if (b != 0) goto D.2742; else goto D.2743; D.2742: if (c != 0) goto D.2744; else goto D.2745; D.2744: if (d != 0) goto D.2746; else goto D.2747; D.2746: if (e != 0) goto D.2748; else goto D.2749; D.2748: if (f != 0) goto D.2750; else goto D.2751; D.2750: D.2752 = doo (); return D.2752; D.2751: D.2749: D.2747: D.2745: D.2743: D.2741: D.2752 = doo2 (); return D.2752; } So this result is caused by the fact that a logical-and/or is always a comparison. As we are rejecting comparisons/logical-not, we end up with a sequence of TRUTH_(AND|OR)_IFs. So the hole loop in fold_truth_andor for trying to pack simple and side-effect-less operands in tupels of two is useless. For the new patch (attached with proper ! fix for compares) we get for the same sample gimple output: foo (int a, int b, int c, int d, int e, int f) { _Bool D.2740; _Bool D.2741; _Bool D.2742; _Bool D.2745; _Bool D.2746; _Bool D.2747; _Bool D.2750; _Bool D.2751; _Bool D.2752; int D.2755; D.2740 = a != 0; D.2741 = b != 0; D.2742 = D.2740 D.2741; if (D.2742 != 0) goto D.2743; else goto D.2744; D.2743: D.2745 = c != 0; D.2746 = d != 0; D.2747 = D.2745 D.2746; if (D.2747 != 0) goto D.2748; else goto D.2749; D.2748: D.2750 = e != 0; D.2751 = f != 0; D.2752 = D.2750 D.2751; if (D.2752 != 0) goto D.2753; else goto D.2754; D.2753: D.2755 = doo (); return D.2755; D.2754: D.2749: D.2744: D.2755 = doo2 (); return D.2755; } which is the proper output for high branching cost, as it tries to avoid branches and not to tend them. -- Kai Index: gcc/gcc/fold-const.c === ---
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz ktiet...@googlemail.com wrote: Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF. It has to be checked that the LHS code is same as outer CODE, as otherwise we wouldn't apply different TRUTH-IF only on inner RHS of LHS, which is of course wrong. Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int -simple_operand_p (const_tree exp) +simple_operand_p (tree exp) { + enum tree_code code; /* Strip any conversions that don't change the machine mode. */ STRIP_NOPS (exp); + code = TREE_CODE (exp); + + /* Handle some trivials */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (tree_could_trap_p (exp) + simple_operand_p (TREE_OPERAND (exp, 0)) + simple_operand_p (TREE_OPERAND (exp, 1))); And that's still wrong. Stopped reading here. Richard. Oh, there is a not missing. I didn't spot that, sorry. To the point why we need to handle comparisons within simple_operand_p. If we reject comparisons and logical not here, we won't have any branching optimization anymore, as this the patch moves into fold_truthandor. The result with rejecting in simple_operand_p compares and logic-not provides for the following example: But you change what simple_operand_p accepts and thus change what fold_truthop accepts as operands to its simplifications. Richard. extern int doo (void); extern int doo2 (void); int foo (int a, int b, int c, int d, int e, int f) { if (a b c d e f) return doo (); return doo2 (); } we get the following gimple dump (-fdump-tree-gimple): foo (int a, int b, int c, int d, int e, int f) { int D.2752; if (a != 0) goto D.2740; else goto D.2741; D.2740: if (b != 0) goto D.2742; else goto D.2743; D.2742: if (c != 0) goto D.2744; else goto D.2745; D.2744: if (d != 0) goto D.2746; else goto D.2747; D.2746: if (e != 0) goto D.2748; else goto D.2749; D.2748: if (f != 0) goto D.2750; else goto D.2751; D.2750: D.2752 = doo (); return D.2752; D.2751: D.2749: D.2747: D.2745: D.2743: D.2741: D.2752 = doo2 (); return D.2752; } So this result is caused by the fact that a logical-and/or is always a comparison. As we are rejecting comparisons/logical-not, we end up with a sequence of TRUTH_(AND|OR)_IFs. So the hole loop in fold_truth_andor for trying to pack simple and side-effect-less operands in tupels of two is useless. For the new patch (attached with proper ! fix for compares) we get for the same sample gimple output: foo (int a, int b, int c, int d, int e, int f) { _Bool D.2740; _Bool D.2741; _Bool D.2742; _Bool D.2745; _Bool D.2746; _Bool D.2747; _Bool D.2750; _Bool D.2751; _Bool D.2752; int D.2755; D.2740 = a != 0; D.2741 = b != 0; D.2742 = D.2740 D.2741; if (D.2742 != 0) goto D.2743; else goto D.2744; D.2743: D.2745 = c != 0; D.2746 = d != 0; D.2747 = D.2745 D.2746; if (D.2747 != 0) goto D.2748; else goto D.2749; D.2748: D.2750 = e != 0; D.2751 = f != 0; D.2752 = D.2750 D.2751; if (D.2752 != 0) goto D.2753; else goto D.2754; D.2753: D.2755 = doo (); return D.2755; D.2754: D.2749: D.2744: D.2755 = doo2 (); return
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz ktiet...@googlemail.com wrote: Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF. It has to be checked that the LHS code is same as outer CODE, as otherwise we wouldn't apply different TRUTH-IF only on inner RHS of LHS, which is of course wrong. Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int -simple_operand_p (const_tree exp) +simple_operand_p (tree exp) { + enum tree_code code; /* Strip any conversions that don't change the machine mode. */ STRIP_NOPS (exp); + code = TREE_CODE (exp); + + /* Handle some trivials */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (tree_could_trap_p (exp) + simple_operand_p (TREE_OPERAND (exp, 0)) + simple_operand_p (TREE_OPERAND (exp, 1))); And that's still wrong. Stopped reading here. Richard. Oh, there is a not missing. I didn't spot that, sorry. To the point why we need to handle comparisons within simple_operand_p. If we reject comparisons and logical not here, we won't have any branching optimization anymore, as this the patch moves into fold_truthandor. The result with rejecting in simple_operand_p compares and logic-not provides for the following example: But you change what simple_operand_p accepts and thus change what fold_truthop accepts as operands to its simplifications. Richard. Well, not really. I assume you mean fold_truth_andor_1 (aka fold_truthop). It checks for ... if (TREE_CODE_CLASS (lcode) != tcc_comparison || TREE_CODE_CLASS (rcode) != tcc_comparison) return 0; ... before checking for simple_operand_p. So there is actual no change. It might be of some interest here to add in a different patch support for logic-not, but well, this is would be material for a different patch. So, it won't operate on anything else then comparisons as before. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Mon, 10 Oct 2011, Kai Tietz wrote: extern int foo (void); /* foo modifies gbl1 */ int gbl1 = 0; int foo (int ns1) { if (ns1 foo () gbl1) return 1; return 0; } so chain of trees has to look like this: (ANDIF (ns1 (ANDIF foo () gbl1)) Okay, indeed. I was wrong when I claimed that only the RHS needs to be side-effect-free for ANDIF-AND to be valid. It's true when the RHS can't depend on the LHS, but I only thought about the fact that the side-effect must occur, not that it possibly influences RHS. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Mon, Oct 10, 2011 at 5:07 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/10/10 Richard Guenther richard.guent...@gmail.com: On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz ktiet...@googlemail.com wrote: Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF. It has to be checked that the LHS code is same as outer CODE, as otherwise we wouldn't apply different TRUTH-IF only on inner RHS of LHS, which is of course wrong. Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. */ static int -simple_operand_p (const_tree exp) +simple_operand_p (tree exp) { + enum tree_code code; /* Strip any conversions that don't change the machine mode. */ STRIP_NOPS (exp); + code = TREE_CODE (exp); + + /* Handle some trivials */ + if (TREE_CODE_CLASS (code) == tcc_comparison) + return (tree_could_trap_p (exp) + simple_operand_p (TREE_OPERAND (exp, 0)) + simple_operand_p (TREE_OPERAND (exp, 1))); And that's still wrong. Stopped reading here. Richard. Oh, there is a not missing. I didn't spot that, sorry. To the point why we need to handle comparisons within simple_operand_p. If we reject comparisons and logical not here, we won't have any branching optimization anymore, as this the patch moves into fold_truthandor. The result with rejecting in simple_operand_p compares and logic-not provides for the following example: But you change what simple_operand_p accepts and thus change what fold_truthop accepts as operands to its simplifications. Richard. Well, not really. I assume you mean fold_truth_andor_1 (aka fold_truthop). It checks for ... if (TREE_CODE_CLASS (lcode) != tcc_comparison || TREE_CODE_CLASS (rcode) != tcc_comparison) return 0; ... before checking for simple_operand_p. So there is actual no change. It might be of some interest here to add in a different patch support for logic-not, but well, this is would be material for a different patch. So, it won't operate on anything else then comparisons as before. Sure, because simple_operand_p is checked on the comparison operands, not the comparison itself. Richard. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: Hi, I modified the patch so, that it always just converts two leafs of a TRUTH(AND|OR)IF chain into a TRUTH_(AND|OR) expression, if branch costs are high and leafs are simple without side-effects. Additionally I added some testcases for it. 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. 2011-10-06 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/ssa-ifbranch-1.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-2.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-3.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-4.c: New test. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options -O2 -fdump-tree-gimple } */ +/* { dg-options -O2 -fdump-tree-gimple -march=i586 { target { i?86-*-* ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c) +{ + if (a b c) + return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times if 2 gimple } } */ +/* { dg-final { cleanup-tree-dump gimple } } */ Index: gcc-head/gcc/fold-const.c === --- gcc-head.orig/gcc/fold-const.c +++ gcc-head/gcc/fold-const.c @@ -8387,6 +8387,45 @@ fold_truth_andor (location_t loc, enum t if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + /* floats might trap. */ + !FLOAT_TYPE_P (TREE_TYPE (arg1)) Floats don't trap on their own. If possibly trapping trees don't have TREE_SIDE_EFFECTS set then you want !tree_could_trap_p (arg1) here. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). + || ((TREE_CODE_CLASS (TREE_CODE (arg1)) == tcc_comparison + || TREE_CODE (arg1) == TRUTH_NOT_EXPR) + /* Float comparison might trap. */ + !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))) + simple_operand_p (TREE_OPERAND (arg1, 0) + { + /* We want to combine truth-comparison for + ((W TRUTH-ANDOR X) TRUTH-ANDORIF Y) TRUTH-ANDORIF Z, + if Y and Z are simple operands and have no side-effect to + ((W TRUTH-ANDOR X) TRUTH-IF (Y TRUTH-ANDOR Z). */ No we don't. See down-thread. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? In fold_truthop we still have the same (albeit more restricted transform), but guarded with if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2 too. Why not here? Please delete redundant code in fold_truthop. It's also odd that this is only called from fold_truth_andor but has a more generic name, so maybe rename it to fold_truth_andor_1 on the way. Richard. + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, this is the updated version with the suggestion 2011/10/7 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz kti...@redhat.com wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR + simple_operand_p (arg1)) As I said previously simple_operand_p already rejects covers comparisons and TRUTH_NOT_EXPR. Also arg1 had better TREE_SIDE_EFFECTS set if the comparison might trap, as it might just be hidden in something more complicated - so the simple check isn't enough anyway (and if simple_operand_p would cover it, the check would be better placed there). I reworked simple_operand_p so that it does this special-casing and additionally also checks for trapping. + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); All trees should be folded, don't use plain build without a good reason. Ok, done + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) Again, the checks you do for arg0 do not match those for arg1. OTOH it doesn't matter whether arg0 is simple or not or has side-effects or not for this transformation, so why check it at all? It is required. For left-hand operand, if it isn't a logical and/or/xor, we need to check for side-effects (and for trapping). I see that calling of simple_operand_p is wrong here, as it rejects too much. Nevertheless the check for side-effects is necessary for having valid sequence-points. Without that checking a simple test int getter (void); int foo (void) { int c, r = 0; while ((c = getter ()) != '' c = 0) r +=c; return r; } would give a warning about sequence-points. As left-hand operand has side-effects, but right-hand not. If we would combine it as AND, the operands are exchange-able. So right-hand operand needs to be another ANDIF expression instead. Same apply on trapping. In fold_truthop we still have the same (albeit more restricted transform), but guarded with if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2 too. Why not here? Please delete redundant code in fold_truthop. Well, in general this is the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT, so I missed that. As for some targets the macro LOGICAL_OP_NON_SHORT_CIRCUIT might be defined differently, it might make sense to check for BRANCH_COST again. It's also odd that this is only called from fold_truth_andor but has a more generic name, so maybe rename it to fold_truth_andor_1 on the way. I renamed it. Richard. ChangeLog 2011-10-07 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p): Make argument non-const and add floating-point trapping check, and special cases for comparisons, and logical-not's. (fold_truthop): Rename to (fold_truth_andor_1): function name. Additionally remove here TRUTH-AND|OR_EXPR generation. (fold_truth_andor): Handle branching at one place. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -111,14 +111,13 @@ static tree decode_field_reference (loca tree *, tree *); static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); -static int simple_operand_p (const_tree); +static int simple_operand_p (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c(revision 179592) +++ fold-const.c(working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + !FLOAT_TYPE_P (TREE_TYPE (arg1)) + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) +{ + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR), + type, arg0, arg1); +} + return NULL_TREE; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. And I'm missing a testcase. Richard. + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... Well, I used this check only for not doing this transformation for targets, which have low-cost branches. This is the same thing as in fold_truthop. It does this transformation only if LOGICAL_OP_NON_SHORT_CIRCUIT is true. + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. This can happen. Operands of TRUTH_AND|OR(IF)_EXPR aren't necessarily of integral type. After expansion in gimplifier, we have for sure comparisons, but not in c-tree. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. This check is the same as in fold_truthop. I used this check. The point here is that floats might trap. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). See inner of if condition for those checks. I moved those checks for arg1 out of the inner conditions to avoid double-checking. + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... See, here is the missing check. Point is that even if arg0 has side-effects and is a (AND|OR)IF expression, we might be able to associate with right-hand argument of arg0, if for it no side-effects are existing. Otherwise we wouldn't catch this case. We have here in maximum a recursion level of one. And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. This is pretty simple. The point is that only for comparisons this transformation is done. But in c-tree we don't have here necessarily for TRUTH_(AND|OR)[IF]_EXPR comparison arguments, not necessarily integral ones (see above). And I'm missing a testcase. Ok, I'll add one. Effect can be seen best after gimplification. Richard. + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; } Regards. Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The sideeffect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 3:49 PM, Michael Matz m...@suse.de wrote: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The sideeffect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. It seems to me it should then simply be if (!TREE_SIDE_EFFECTS (arg1) simple_operand_p (arg1)) return fold-the-not-and-variant (); Richard.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, I modified the patch so, that it always just converts two leafs of a TRUTH(AND|OR)IF chain into a TRUTH_(AND|OR) expression, if branch costs are high and leafs are simple without side-effects. Additionally I added some testcases for it. 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. 2011-10-06 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/ssa-ifbranch-1.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-2.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-3.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-4.c: New test. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options -O2 -fdump-tree-gimple } */ +/* { dg-options -O2 -fdump-tree-gimple -march=i586 { target { i?86-*-* ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c) +{ + if (a b c) +return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times if 2 gimple } } */ +/* { dg-final { cleanup-tree-dump gimple } } */ Index: gcc-head/gcc/fold-const.c === --- gcc-head.orig/gcc/fold-const.c +++ gcc-head/gcc/fold-const.c @@ -8387,6 +8387,45 @@ fold_truth_andor (location_t loc, enum t if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + /* floats might trap. */ + !FLOAT_TYPE_P (TREE_TYPE (arg1)) + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison +TREE_CODE (arg1) != TRUTH_NOT_EXPR +simple_operand_p (arg1)) + || ((TREE_CODE_CLASS (TREE_CODE (arg1)) == tcc_comparison + || TREE_CODE (arg1) == TRUTH_NOT_EXPR) + /* Float comparison might trap. */ + !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))) + simple_operand_p (TREE_OPERAND (arg1, 0) +{ + /* We want to combine truth-comparison for +((W TRUTH-ANDOR X) TRUTH-ANDORIF Y) TRUTH-ANDORIF Z, +if Y and Z are simple operands and have no side-effect to +((W TRUTH-ANDOR X) TRUTH-IF (Y TRUTH-ANDOR Z). */ + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), +tem); + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y +are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + !TREE_SIDE_EFFECTS (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR), + type, arg0, arg1); +} + return NULL_TREE; } Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options -O2 -fdump-tree-gimple } */ +/* { dg-options -O2 -fdump-tree-gimple -march=i586 { target { i?86-*-* ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c, int d) +{ + if (a b c d) +return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times if 2 gimple } } */ +/* { dg-final { cleanup-tree-dump gimple } } */ Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-*
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The sideeffect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael. That's not the hole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR are, that for TRUTH_(AND|OR)IF_EXPR gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the hole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR are, that for TRUTH_(AND|OR)IF_EXPR gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF - AND (when possible, i.e. no sideeffects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comments states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independend of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this association doesn't make sense anymore, as ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the hole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR are, that for TRUTH_(AND|OR)IF_EXPR gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF - AND (when possible, i.e. no sideeffects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comments states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independend of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this association doesn't make sense anymore, as Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and therefore it isn't transformed into and AND. ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long.I understand that point that it might not that good to always modify unconditionally to AND/OR chain. For example if (a1 a2 a3 a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed pretty well. But you are right, that from the point of reassociation optimization it could be in some cases more profitable to have packed all elements into on AND-chain. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long.I understand that point that it might not that good to always modify unconditionally to AND/OR chain. For example if (a1 a2 a3 a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed pretty well. But you are right, that from the point of reassociation optimization it could be in some cases more profitable to have packed all elements into on AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other side teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) 4U, although it could, if branch cost cause it to be broken up into several GIMPLE_COND stmts. Or if user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). Jakub
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this association doesn't make sense anymore, as Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and therefore it isn't transformed into and AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long.I understand that point that it might not that good to always modify unconditionally to AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes and RHS not too long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why to do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF - AND at the outermost level? Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 10/06/11 09:37, Jakub Jelinek wrote: On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long.I understand that point that it might not that good to always modify unconditionally to AND/OR chain. For example if (a1 a2 a3 a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed pretty well. But you are right, that from the point of reassociation optimization it could be in some cases more profitable to have packed all elements into on AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other side teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) 4U, although it could, if branch cost cause it to be broken up into several GIMPLE_COND stmts. Or if user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). I haven't followed this thread as closely as perhaps I should; what I'm seeing discussed now looks a lot like condition merging and I'm pretty sure there's some research in this area that might guide us. multi-variable condition merging is the term the authors used. jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJOjeYFAAoJEBRtltQi2kC7eFMIALjFM/GIg1DDryU59EoFQe5A x7pvx3FSlcjLWeyIlzYJvWF4wGybRNNXp5qziIedO6qp0Z/06VvCU07A10VoWSig /EFufo5l+Jef5s1d0mA6mBJ9A52HDL2ipOK8YDQbVzJWqHdaXLrrzUni3wGwcUVs v3UIi5OevjRhJ55fRVxBcReJKF6YAzxFDxqOnVGAbf9R3BEJ2T9JW2CLhIcd/T1L D9K+6YymHaN9eYh7B7gPKG88q+5JjcStHuMQODKSAegt3T4iP9CH/G5dV8u95Y+q 6mxo8gOGAwYR7N/U6fuXRaGJEzWSdrqRy2EBF5B7+Rt6lSWXdfzUEBusT24i67A= =HIrU -END PGP SIGNATURE-
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this association doesn't make sense anymore, as Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and therefore it isn't transformed into and AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point that it might not that good to always modify unconditionally to AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes and RHS not too long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why to do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF - AND at the outermost level? I don't do here reassociation in inner. See that patch calls build2_loc, and not fold_build2_loc anymore. So it doesn't retries to associate in inner anymore (which might be something of interest for the issue Jakub mentioned). There is no test actual failing AFAICS. I just noticed size-differences by this. Nevertheless it might be better to enhance reassociation pass to break-up (and repropagate) GIMPLE_CONDs with non-side-effect, as Jakub suggested. The other chance might be here to allow deeper chains then two elements within one AND/OR element, but this would be architecture dependent. For x86 -as example- used instruction cycles for a common for branching would suggest that it might be profitable to have here 3 or 4 leafs within one AND|OR chain. But for sure on other architectures the amount of leafs varies. Regards, Kai