Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote: On 10/15/13 10:53, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. You got the job! ;) Richard. jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/16/13 02:21, Richard Biener wrote: On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote: On 10/15/13 10:53, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. You got the job! Joys :-) What's the policy on the GDFL stuff. I've tried so hard to avoid having to worry about that rats nest that I have no idea what our policy is. Basically I just want to take the old docs for BRANCH_COST and re-use those. Is that considered kosher with all the GDFL nonsense? Jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Wed, 16 Oct 2013, Jeff Law wrote: What's the policy on the GDFL stuff. I've tried so hard to avoid having to worry about that rats nest that I have no idea what our policy is. Basically I just want to take the old docs for BRANCH_COST and re-use those. Is that considered kosher with all the GDFL nonsense? When moving documentation text from the manual into target.def (so it ends up in both target.def and tm.texi, with tm.texi.in just having an @hook line to show where the regeneration of tm.texi should put the text), CC one of the people listed as docstring relicensing maintainers and ask us to approve the GPL relicensing in that case. -- Joseph S. Myers jos...@codesourcery.com
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Monday, October 14, 2013 4:49 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote: @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, struct range_entry *otherrange, return true; } +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 X != 3 X != 10 X != 11 + will be transformed by the previous optimization into + (X - 2U) = 1U (X - 10U) = 1U + and this loop can transform that into + ((X ~8) - 2U) = 1U. */ + +static bool +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type, + tree lowi, tree lowj, tree highi, tree highj, + vecoperand_entry_t *ops, + struct range_entry *ranges) The function names are bad, you aren't transfering anything, but optimizing. Please rename try_transfer_range_tests to optimize_range_tests_1 and try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps better yet to optimize_range_tests_{xor,diff}. try_transfer_range_tests is changed to optimize_range_tests_1 try_transfer_range_tests_1 is changed to optimize_range_tests_xor try_transfer_range_tests_2 is changed to optimize_range_tests_diff Also, perhaps instead of passing ranges and i and j to these two functions you could pass struct range_entry *range1, struct range_entry *range2 (caller would pass ranges + i, ranges + j). Updated to rangei and rangej since we use i, j, lowi, lowj etc at other places. +/* It does some common checks for function try_transfer_range_tests_1 and + try_transfer_range_tests_2. Please adjust the comment for the renaming. Please change trans_option to bool optimize_xor. Updated. + if (trans_option == 1) + { + if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } + else if (trans_option == 2) + { + if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } I'd prefer if (optimize_xor) any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (any_changes) break; This is incorrect. The any_changes should be |= of the return values. In the updated patch, I changed the code segment as bool changes; ... if (optimize_xor) changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else changes = optimize_range_tests_diff (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (changes) { any_changes = true; break; } Is it OK? Thanks! -Zhenqiang Ok with those changes. Jakub reassoc-final.patch Description: Binary data
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote: Is it OK? Ok, except two comments apparently still need updating. +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 X != 3 X != 10 X != 11 + will be transformed by the previous optimization into + (X - 2U) = 1U (X - 10U) = 1U + and this loop can transform that into + ((X ~8) - 2U) = 1U. */ Here the example is using != and , so it is transformed into: !((X - 2U) = 1U || (X - 10U) = 1U) (everything is negated at the end, and note || instead of , with it could fold into !(0)) and finally into: !(((X ~8) - 2U) = 1U) Or alternatively change the first expression into: X == 2 || X == 3 || X == 10 || X == 11 and the second to: (X - 2U) = 1U || (X - 10U) = 1U and the third keep as is. +/* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. + Similarly for ranges. E.g. + X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 + || X == 75 || X == 45 + will be transformed by the previous optimization into + (X - 43U) = 3U (X - 75U) = 3U + and this loop can transform that into + ((X - 43U) ~(75U - 43U)) = 3U. */ Here the example is using == and ||, so the only wrong thing in there is that the second expression should be (X - 43U) = 3U || (X - 75U) = 3U Ok with those changes. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/11/13 20:11, Zhenqiang Chen wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Friday, October 11, 2013 1:20 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On 10/10/13 03:25, Zhenqiang Chen wrote: It comes from Coremark. The code is: if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') I should have guessed ;-) For ARM, there are three instructions rather than 4 (in thumb state). For some older gcc, I got performance improvement on ARM chromebook. But I can not reproduce the performance gain now. I will collect logs during GCC bootstrap. Thanks. It doesn't have to be anything particularly complex. When I'm looking at a transformation I usually just put a printf when it triggers and grep for the string in a make log. Thanks. I check the make log. There are 1906 occurrences. Wow. That's awsome. Thanks, jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/15/13 02:12, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote: Is it OK? Ok, except two comments apparently still need updating. +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 X != 3 X != 10 X != 11 + will be transformed by the previous optimization into + (X - 2U) = 1U (X - 10U) = 1U + and this loop can transform that into + ((X ~8) - 2U) = 1U. */ Here the example is using != and , so it is transformed into: !((X - 2U) = 1U || (X - 10U) = 1U) (everything is negated at the end, and note || instead of , with it could fold into !(0)) and finally into: !(((X ~8) - 2U) = 1U) Or alternatively change the first expression into: X == 2 || X == 3 || X == 10 || X == 11 and the second to: (X - 2U) = 1U || (X - 10U) = 1U and the third keep as is. +/* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. + Similarly for ranges. E.g. + X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 + || X == 75 || X == 45 + will be transformed by the previous optimization into + (X - 43U) = 3U (X - 75U) = 3U + and this loop can transform that into + ((X - 43U) ~(75U - 43U)) = 3U. */ Here the example is using == and ||, so the only wrong thing in there is that the second expression should be (X - 43U) = 3U || (X - 75U) = 3U Ok with those changes. I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/15/13 10:53, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/15/13 02:12, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote: Is it OK? Ok, except two comments apparently still need updating. +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 X != 3 X != 10 X != 11 + will be transformed by the previous optimization into + (X - 2U) = 1U (X - 10U) = 1U + and this loop can transform that into + ((X ~8) - 2U) = 1U. */ Here the example is using != and , so it is transformed into: !((X - 2U) = 1U || (X - 10U) = 1U) (everything is negated at the end, and note || instead of , with it could fold into !(0)) and finally into: !(((X ~8) - 2U) = 1U) Or alternatively change the first expression into: X == 2 || X == 3 || X == 10 || X == 11 and the second to: (X - 2U) = 1U || (X - 10U) = 1U and the third keep as is. +/* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. + Similarly for ranges. E.g. + X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 + || X == 75 || X == 45 + will be transformed by the previous optimization into + (X - 43U) = 3U (X - 75U) = 3U + and this loop can transform that into + ((X - 43U) ~(75U - 43U)) = 3U. */ Here the example is using == and ||, so the only wrong thing in there is that the second expression should be (X - 43U) = 3U || (X - 75U) = 3U Ok with those changes. I fixed up the comments ChangeLog entry and committed on Zhenqiang's behalf. jeff
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
Patch and test cases are updated according to your comments. Bootstrap and no make check regression on X86-64. ChangeLog: 2013-10-14 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (try_transfer_range_tests_1): New function, extracted from optimize_range_tests * (try_transfer_range_tests_2): New function. * (try_transfer_range_tests): New function, extracted from optimize_range_tests and calls try_transfer_range_tests_1 and try_transfer_range_tests_2, * (optimize_range_tests): Use try_transfer_range_tests. testsuite/ChangeLog: 2013-10-14 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/tree-ssa/reassoc-32.c: New test case. * gcc.dg/tree-ssa/reassoc-33.c: New test case. * gcc.dg/tree-ssa/reassoc-34.c: New test case. * gcc.dg/tree-ssa/reassoc-35.c: New test case. * gcc.dg/tree-ssa/reassoc-36.c: New test case. -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Saturday, October 12, 2013 3:40 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote: As you had mentioned, the transition in this patch does not reduce instructions. But the preexisting optimization does. So we prefer to do the preexisting optimization first. E.g. X == 10 || X == 12 || X == 26 Ok, that makes sense. @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode, } } + /* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. */ Mention here also similarly to the other comment that it works also with ranges. + if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2) +for (i = first; i length; i++) + { + tree lowi, highi, lowj, highj, type, tem1, tem2, mask; + + if (ranges[i].exp == NULL_TREE || ranges[i].in_p) + continue; + type = TREE_TYPE (ranges[i].exp); + if (!INTEGRAL_TYPE_P (type)) + continue; + lowi = ranges[i].low; + if (lowi == NULL_TREE) + continue; The other loop has here: if (lowi == NULL_TREE) lowi = TYPE_MIN_VALUE (type); instead, which is better, can you please change it? + highi = ranges[i].high; + if (highi == NULL_TREE) + continue; + for (j = i + 1; j length j i + 64; j++) + { + lowj = ranges[j].low; + if (lowj == NULL_TREE) + continue; + highj = ranges[j].high; + if (highj == NULL_TREE) + continue; The other loop has if (highj == NULL_TREE) highj = TYPE_MAX_VALUE (type); here instead. + if (ranges[j].exp == NULL_TREE || ranges[j].in_p + || (ranges[i].exp != ranges[j].exp)) + continue; Can you please move this test the lowj = assignment, and remove the ranges[j].exp == NULL_TREE test (both here and in the earlier loop)? I mean, we've checked at the beginning of for (i = first; i length; i++) loop that ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will be different than ranges[i].exp. So if (ranges[i].exp != ranges[j].exp || ranges[j].in_p) continue; (and please also collapse the two checks in the first loop, so that both look similar). + /* Check lowj highi. */ + tem1 = fold_binary (GT_EXPR, boolean_type_node, +lowj, highi); + if (tem1 == NULL_TREE || !integer_onep (tem1)) + continue; + /* Check highi - lowi == highj - lowj. */ + tem1 = fold_binary (MINUS_EXPR, type, highi, lowi); + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) + continue; + tem2 = fold_binary (MINUS_EXPR, type, highj, lowj); + if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST) + continue; + if (!tree_int_cst_equal (tem1, tem2)) + continue; + /* Check popcount (lowj - lowi) == 1. */ + tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi); + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) + continue; + if (tree_log2 (tem1) 0) + continue; + mask = fold_build1 (BIT_NOT_EXPR, type, tem1); + tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi); + tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask); + lowi = build_int_cst (type, 0); Please use lowj instead of lowi here. Because, if update_range_test fails, we continue looking for further matches in following ranges, and while lowj will be computed again, lowi will be assumed to contain the low bound of the first range. + if (update_range_test (ranges + i, ranges + j, 1, opcode, ops
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote: @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, struct range_entry *otherrange, return true; } +/* Optimize X == CST1 || X == CST2 + if popcount (CST1 ^ CST2) == 1 into + (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)). + Similarly for ranges. E.g. + X != 2 X != 3 X != 10 X != 11 + will be transformed by the previous optimization into + (X - 2U) = 1U (X - 10U) = 1U + and this loop can transform that into + ((X ~8) - 2U) = 1U. */ + +static bool +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type, + tree lowi, tree lowj, tree highi, tree highj, + vecoperand_entry_t *ops, + struct range_entry *ranges) The function names are bad, you aren't transfering anything, but optimizing. Please rename try_transfer_range_tests to optimize_range_tests_1 and try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps better yet to optimize_range_tests_{xor,diff}. Also, perhaps instead of passing ranges and i and j to these two functions you could pass struct range_entry *range1, struct range_entry *range2 (caller would pass ranges + i, ranges + j). +/* It does some common checks for function try_transfer_range_tests_1 and + try_transfer_range_tests_2. Please adjust the comment for the renaming. Please change trans_option to bool optimize_xor. + if (trans_option == 1) + { + if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } + else if (trans_option == 2) + { + if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj, + highi, highj, ops, ranges)) + { + any_changes = true; + break; + } + } I'd prefer if (optimize_xor) any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); else any_changes = optimize_range_tests_xor (opcode, type, lowi, lowj, highi, highj, ops, ranges + i, ranges + j); if (any_changes) break; Ok with those changes. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote: As you had mentioned, the transition in this patch does not reduce instructions. But the preexisting optimization does. So we prefer to do the preexisting optimization first. E.g. X == 10 || X == 12 || X == 26 Ok, that makes sense. @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode, } } + /* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. */ Mention here also similarly to the other comment that it works also with ranges. + if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2) +for (i = first; i length; i++) + { + tree lowi, highi, lowj, highj, type, tem1, tem2, mask; + + if (ranges[i].exp == NULL_TREE || ranges[i].in_p) + continue; + type = TREE_TYPE (ranges[i].exp); + if (!INTEGRAL_TYPE_P (type)) + continue; + lowi = ranges[i].low; + if (lowi == NULL_TREE) + continue; The other loop has here: if (lowi == NULL_TREE) lowi = TYPE_MIN_VALUE (type); instead, which is better, can you please change it? + highi = ranges[i].high; + if (highi == NULL_TREE) + continue; + for (j = i + 1; j length j i + 64; j++) + { + lowj = ranges[j].low; + if (lowj == NULL_TREE) + continue; + highj = ranges[j].high; + if (highj == NULL_TREE) + continue; The other loop has if (highj == NULL_TREE) highj = TYPE_MAX_VALUE (type); here instead. + if (ranges[j].exp == NULL_TREE || ranges[j].in_p + || (ranges[i].exp != ranges[j].exp)) + continue; Can you please move this test the lowj = assignment, and remove the ranges[j].exp == NULL_TREE test (both here and in the earlier loop)? I mean, we've checked at the beginning of for (i = first; i length; i++) loop that ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will be different than ranges[i].exp. So if (ranges[i].exp != ranges[j].exp || ranges[j].in_p) continue; (and please also collapse the two checks in the first loop, so that both look similar). + /* Check lowj highi. */ + tem1 = fold_binary (GT_EXPR, boolean_type_node, + lowj, highi); + if (tem1 == NULL_TREE || !integer_onep (tem1)) + continue; + /* Check highi - lowi == highj - lowj. */ + tem1 = fold_binary (MINUS_EXPR, type, highi, lowi); + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) + continue; + tem2 = fold_binary (MINUS_EXPR, type, highj, lowj); + if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST) + continue; + if (!tree_int_cst_equal (tem1, tem2)) + continue; + /* Check popcount (lowj - lowi) == 1. */ + tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi); + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST) + continue; + if (tree_log2 (tem1) 0) + continue; + mask = fold_build1 (BIT_NOT_EXPR, type, tem1); + tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi); + tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask); + lowi = build_int_cst (type, 0); Please use lowj instead of lowi here. Because, if update_range_test fails, we continue looking for further matches in following ranges, and while lowj will be computed again, lowi will be assumed to contain the low bound of the first range. + if (update_range_test (ranges + i, ranges + j, 1, opcode, ops, tem1, + ranges[i].in_p, lowi, tem2, And here too. Or, alternatively to avoid duplication, you could put the whole for (i = first; i length; i++) loop from the first optimization into another int i; for (pass = 0; pass (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2) + 1; pass++) { } loop, use whatever is common in the loop unconditionally, and just conditionalize the rest on pass == 0 vs. pass == 1. Or maybe even better just move this whole loop into a new function, with ranges, opcode, first, length arguments plus another (bool?) argument which of the two optimizations you want to perform, and return the bool any_changes. Then you wouldn't run into line length issues that you could run into with the extra loop. --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c @@ -0,0 +1,38 @@ +/* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*} } } */ + +/* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */ +/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */ + +int
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, October 10, 2013 7:13 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote: ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. + /* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. */ if (!any_changes + (opcode == BIT_IOR_EXPR)) Wouldn't it be better to put it into the same loop that handles the other merges, rather than separate? As you had mentioned, the transition in this patch does not reduce instructions. But the preexisting optimization does. So we prefer to do the preexisting optimization first. E.g. X == 10 || X == 12 || X == 26 The preexisting one will optimize X == 10 || X == 26. And the patch will transfer X == 10 || X == 12. So I prefer to separate them. Why the !any_changes guard? Why opcode == BIT_IOR_EXPR guard? Can't you optimize X != CST1 X != CST2 if popcount (CST2 - CST1) == 1 similarly into ((X - CST1) ~(CST2 - CST1)) != 0? They are not necessary. Patch is updated to check only the BRANCH_COST. Also, like the preexisting optimization has been generalized for ranges, can't you generalize this one too? I mean if you have say: (X - 43U) = 3U || (X - 75U) = 3U (originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 || X == 75 || X == 45) where popcount (75U - 43U) == 1, can't this be: ((X - 43U) ~(75U - 43U)) = 3U For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2] the requirements for this optimization would be 1) LOWCST2 HIGHCST1 2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2 3) popcount (LOWCST2 - LOWCST1) == 1 1) and 2) are the same requirements for the other optimization, while 3) would be specific to this and would be used only if popcount (LOWCST1 ^ LOWCST2) != 1. Because of 1), LOWCST2 - LOWCST1 should be bigger than HIGHCST1 - LOWCST1 (i.e. the value we = against in the end), thus IMHO it should work fine even after generalization. Patch is updated. +if (tree_log2 (tem) 0) + continue; This is I guess cheaper than what I was doing there: gcc_checking_assert (!integer_zerop (lowxor)); tem = fold_binary (MINUS_EXPR, type, lowxor, build_int_cst (type, 1)); if (tem == NULL_TREE) continue; tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem); if (tem == NULL_TREE || !integer_zerop (tem)) continue; to check for popcount (x) == 1. Can you replace it even in the preexisting code? if (tree_log2 (lowxor) 0) continue; Updated. Of course, if the two optimizations are merged into the same loop, then some of the continue will need to be replaced by just conditional code inside of the loop, handling the two different optimizations. Thanks for working on this. Jakub Bootstrap and no make check regression on X86-64 and ARM. Thanks! -Zhenqiang reassoc-new.patch Description: Binary data
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Friday, October 11, 2013 1:20 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On 10/10/13 03:25, Zhenqiang Chen wrote: It comes from Coremark. The code is: if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') I should have guessed ;-) For ARM, there are three instructions rather than 4 (in thumb state). For some older gcc, I got performance improvement on ARM chromebook. But I can not reproduce the performance gain now. I will collect logs during GCC bootstrap. Thanks. It doesn't have to be anything particularly complex. When I'm looking at a transformation I usually just put a printf when it triggers and grep for the string in a make log. Thanks. I check the make log. There are 1906 occurrences. Depending on what I'm doing, I may dig more deeply into the situations where its triggering to make sure I fully understand the primary and secondary effects of a transformation which often leads to tweaking the implementation. I don't think that level of rigor is needed here. Jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Thu, Oct 10, 2013 at 5:04 AM, Jeff Law l...@redhat.com wrote: On 08/05/13 02:08, Zhenqiang Chen wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. This is an interesting transformation. I suspect we'd want to gate it on something like BRANCH_COST. For a related example, see how we handle (a != 0) || (b != 0) - (a | b) != 0 in fold-const.c. I suspect the conditions under which we want to do your transformation are going to be similar if not the same as those transformations in fold-const.c. Note I've been suggesting the bits I'm referring to in fold-const.c move out into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look favorably upon a patch which moved them. In particular I strongly vote against putting more target dependent transformations into fold-const.c. Richard. As far as testing, the way to go will be to look at the reassoc1 dumps, but skip the test on targets with the wrong branch cost. tree-ssa/vrp87.c has a suitable condition to filter out the test on targets were the branch cost is too small. Out of curiosity, what prompted you to make this transformation? Was it mostly to deal with a codesize problem or is it a significant win on some hot code in a benchmark, or something else? Closely related, have you done anything like instrument the transformation to see how often it applies during a GCC bootstrap? jeff
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Thursday, October 10, 2013 11:05 AM To: Zhenqiang Chen; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On 08/05/13 02:08, Zhenqiang Chen wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. This is an interesting transformation. I suspect we'd want to gate it on something like BRANCH_COST. For a related example, see how we handle (a != 0) || (b != 0) - (a | b) != 0 in fold-const.c. I suspect the conditions under which we want to do your transformation are going to be similar if not the same as those transformations in fold-const.c. Thank you for the comments. I will update the patch to check BRANCH_COST. Note I've been suggesting the bits I'm referring to in fold-const.c move out into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look favorably upon a patch which moved them. The code is similar with the code (in tree-ssa-reassoc.c) for Optimize X == CST1 || X == CST2 if popcount (CST1 ^ CST2) == 1 into (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)) As far as testing, the way to go will be to look at the reassoc1 dumps, but skip the test on targets with the wrong branch cost. tree-ssa/vrp87.c has a suitable condition to filter out the test on targets were the branch cost is too small. I will update the test case. Out of curiosity, what prompted you to make this transformation? Was it mostly to deal with a codesize problem or is it a significant win on some hot code in a benchmark, or something else? Closely related, have you done anything like instrument the transformation to see how often it applies during a GCC bootstrap? It comes from Coremark. The code is: if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') For ARM, there are three instructions rather than 4 (in thumb state). For some older gcc, I got performance improvement on ARM chromebook. But I can not reproduce the performance gain now. I will collect logs during GCC bootstrap. Thanks! -Zhenqiang
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Thu, Oct 10, 2013 at 05:25:01PM +0800, Zhenqiang Chen wrote: Note I've been suggesting the bits I'm referring to in fold-const.c move out into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look favorably upon a patch which moved them. The code is similar with the code (in tree-ssa-reassoc.c) for Optimize X == CST1 || X == CST2 if popcount (CST1 ^ CST2) == 1 into (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)) Yeah. Though, that is one operation cheaper than this, the above is replacing two comparisons with constants plus one || with one arithmetic operation (with constant) plus one comparison of constant, I think that should be always a win (ok, it is one arithmetic operation more expensive if X == CST1 all the time, but otherwise it is likely cheaper). While in your case it is two arithmetic operations plus comparison with constant, so perhaps for very cheap BRANCH_COST it might not be beneficial. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Mon, Aug 05, 2013 at 01:39:50AM -0700, Andrew Pinski wrote: Seems like a better place to put this is inside tree-ssa-ifcombine.c which handles the case where if(a || b) is split up into if(a) else if(b). Moving it into tree-ssa-ifcombine.c allows for code to be optimized which was written using the if(a) else if(b) form. No, tree-ssa-reassoc.c now handles both intra-bb and inter-bb range test optimizations. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote: ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. + /* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. */ + if (!any_changes (opcode == BIT_IOR_EXPR)) Wouldn't it be better to put it into the same loop that handles the other merges, rather than separate? Why the !any_changes guard? Why opcode == BIT_IOR_EXPR guard? Can't you optimize X != CST1 X != CST2 if popcount (CST2 - CST1) == 1 similarly into ((X - CST1) ~(CST2 - CST1)) != 0? Also, like the preexisting optimization has been generalized for ranges, can't you generalize this one too? I mean if you have say: (X - 43U) = 3U || (X - 75U) = 3U (originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 || X == 75 || X == 45) where popcount (75U - 43U) == 1, can't this be: ((X - 43U) ~(75U - 43U)) = 3U For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2] the requirements for this optimization would be 1) LOWCST2 HIGHCST1 2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2 3) popcount (LOWCST2 - LOWCST1) == 1 1) and 2) are the same requirements for the other optimization, while 3) would be specific to this and would be used only if popcount (LOWCST1 ^ LOWCST2) != 1. Because of 1), LOWCST2 - LOWCST1 should be bigger than HIGHCST1 - LOWCST1 (i.e. the value we = against in the end), thus IMHO it should work fine even after generalization. + if (tree_log2 (tem) 0) + continue; This is I guess cheaper than what I was doing there: gcc_checking_assert (!integer_zerop (lowxor)); tem = fold_binary (MINUS_EXPR, type, lowxor, build_int_cst (type, 1)); if (tem == NULL_TREE) continue; tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem); if (tem == NULL_TREE || !integer_zerop (tem)) continue; to check for popcount (x) == 1. Can you replace it even in the preexisting code? if (tree_log2 (lowxor) 0) continue; ? Of course, if the two optimizations are merged into the same loop, then some of the continue will need to be replaced by just conditional code inside of the loop, handling the two different optimizations. Thanks for working on this. Jakub
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/10/13 03:25, Zhenqiang Chen wrote: It comes from Coremark. The code is: if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-') I should have guessed ;-) For ARM, there are three instructions rather than 4 (in thumb state). For some older gcc, I got performance improvement on ARM chromebook. But I can not reproduce the performance gain now. I will collect logs during GCC bootstrap. Thanks. It doesn't have to be anything particularly complex. When I'm looking at a transformation I usually just put a printf when it triggers and grep for the string in a make log. Depending on what I'm doing, I may dig more deeply into the situations where its triggering to make sure I fully understand the primary and secondary effects of a transformation which often leads to tweaking the implementation. I don't think that level of rigor is needed here. Jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/10/13 04:10, Jakub Jelinek wrote: On Thu, Oct 10, 2013 at 05:25:01PM +0800, Zhenqiang Chen wrote: Note I've been suggesting the bits I'm referring to in fold-const.c move out into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look favorably upon a patch which moved them. The code is similar with the code (in tree-ssa-reassoc.c) for Optimize X == CST1 || X == CST2 if popcount (CST1 ^ CST2) == 1 into (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)) Yeah. Though, that is one operation cheaper than this, the above is replacing two comparisons with constants plus one || with one arithmetic operation (with constant) plus one comparison of constant, I think that should be always a win (ok, it is one arithmetic operation more expensive if X == CST1 all the time, but otherwise it is likely cheaper). While in your case it is two arithmetic operations plus comparison with constant, so perhaps for very cheap BRANCH_COST it might not be beneficial. Wow, I had no idea we had that kind of transformation in tree-reassoc already. Jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/10/13 05:12, Jakub Jelinek wrote: On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote: ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. + /* Optimize X == CST1 || X == CST2 + if popcount (CST2 - CST1) == 1 into + ((X - CST1) ~(CST2 - CST1)) == 0. */ + if (!any_changes (opcode == BIT_IOR_EXPR)) Wouldn't it be better to put it into the same loop that handles the other merges, rather than separate? Why the !any_changes guard? Why opcode == BIT_IOR_EXPR guard? Can't you optimize X != CST1 X != CST2 if popcount (CST2 - CST1) == 1 similarly into ((X - CST1) ~(CST2 - CST1)) != 0? Also, like the preexisting optimization has been generalized for ranges, can't you generalize this one too? I mean if you have say: (X - 43U) = 3U || (X - 75U) = 3U (originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 || X == 75 || X == 45) where popcount (75U - 43U) == 1, can't this be: ((X - 43U) ~(75U - 43U)) = 3U For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2] the requirements for this optimization would be 1) LOWCST2 HIGHCST1 2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2 3) popcount (LOWCST2 - LOWCST1) == 1 1) and 2) are the same requirements for the other optimization, while 3) would be specific to this and would be used only if popcount (LOWCST1 ^ LOWCST2) != 1. Because of 1), LOWCST2 - LOWCST1 should be bigger than HIGHCST1 - LOWCST1 (i.e. the value we = against in the end), thus IMHO it should work fine even after generalization. + if (tree_log2 (tem) 0) + continue; This is I guess cheaper than what I was doing there: gcc_checking_assert (!integer_zerop (lowxor)); tem = fold_binary (MINUS_EXPR, type, lowxor, build_int_cst (type, 1)); if (tem == NULL_TREE) continue; tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem); if (tem == NULL_TREE || !integer_zerop (tem)) continue; to check for popcount (x) == 1. Can you replace it even in the preexisting code? if (tree_log2 (lowxor) 0) continue; ? Of course, if the two optimizations are merged into the same loop, then some of the continue will need to be replaced by just conditional code inside of the loop, handling the two different optimizations. Thanks for working on this. Clearly you should be the one reviewing this patch Jakub ;-) I'm handing it off to you. jeff
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 08/05/13 02:08, Zhenqiang Chen wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase. This is an interesting transformation. I suspect we'd want to gate it on something like BRANCH_COST. For a related example, see how we handle (a != 0) || (b != 0) - (a | b) != 0 in fold-const.c. I suspect the conditions under which we want to do your transformation are going to be similar if not the same as those transformations in fold-const.c. Note I've been suggesting the bits I'm referring to in fold-const.c move out into the tree-ssa optimizers. If they fit well into tree-ssa-reassoc.c I'd look favorably upon a patch which moved them. As far as testing, the way to go will be to look at the reassoc1 dumps, but skip the test on targets with the wrong branch cost. tree-ssa/vrp87.c has a suitable condition to filter out the test on targets were the branch cost is too small. Out of curiosity, what prompted you to make this transformation? Was it mostly to deal with a codesize problem or is it a significant win on some hot code in a benchmark, or something else? Closely related, have you done anything like instrument the transformation to see how often it applies during a GCC bootstrap? jeff
RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
-Original Message- From: Andrew Pinski [mailto:pins...@gmail.com] Sent: Monday, August 05, 2013 4:40 PM To: Zhenqiang Chen Cc: GCC Patches Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0 On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Seems like a better place to put this is inside tree-ssa-ifcombine.c which The patch is similar with the code for Optimize X == CST1 || X == CST2 if popcount (CST1 ^ CST2) == 1 into (X ~(CST1 ^ CST2)) == (CST1 ~(CST1 ^ CST2)) Your RFC: Gimple combine/folding interface is a good design. But before it, I think reusing code in tree-ssa-reassoc.c is the most easy way to implement it. handles the case where if(a || b) is split up into if(a) else if(b). Moving it into tree-ssa-ifcombine.c allows for code to be optimized which was written using the if(a) else if(b) form. Yes. There might have opportunities to optimized if(a) else if(b) form for targets which LOGICAL_OP_NON_SHORT_CIRCUIT is FALSE. If LOGICAL_OP_NON_SHORT_CIRCUIT is TRUE, a || b is converted to a | b in fold-const.c if they are simple operands. Both reassoc and ifcombine optimize short circuiting. But they handle different scenarios. So I think we need both. As you had mentioned, we need enhance ifcombine to optimize if(a) else if(b) form using the similar methods from reassoc pass. Then it would easier to detect this for all targets and easier to remove from fold-const.c the removal of the short circuiting. A unified interface as your RFC will be helpful for fold-const.c, tree-ssa-ifcombine.c and tree-ssa-reassoc.c. If we can detect the optimization opportunity in fold_truth_andor, we do not need to split if(a || b) into if(a) else if(b) at all. Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase.
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Seems like a better place to put this is inside tree-ssa-ifcombine.c which handles the case where if(a || b) is split up into if(a) else if(b). Moving it into tree-ssa-ifcombine.c allows for code to be optimized which was written using the if(a) else if(b) form. Then it would easier to detect this for all targets and easier to remove from fold-const.c the removal of the short circuiting. Thanks, Andrew Pinski Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase.
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Mon, Aug 5, 2013 at 1:39 AM, Andrew Pinski pins...@gmail.com wrote: On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: Hi The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. Bootstrap on x86-64 and ARM chromebook. No make check regression on x86-64 and panda board. For some targets/options, the (X == CST1 || X == CST2) might be converted to if (x == CST1) else if (x == CST2) at the beginning. In such case, the patch will not be executed. It is hard to write a reliable testcase to detect the transformation. So the testcase does not check the dump logs from reassoc1 pass. It just checks the runtime result. Is it OK for trunk? Seems like a better place to put this is inside tree-ssa-ifcombine.c which handles the case where if(a || b) is split up into if(a) else if(b). Moving it into tree-ssa-ifcombine.c allows for code to be optimized which was written using the if(a) else if(b) form. Then it would easier to detect this for all targets and easier to remove from fold-const.c the removal of the short circuiting. The other place where it make sense to do this optimization is the gimple combiner (right now that is really tree-ssa-forwprop.c). Though it would better sense if reassoc and ifcombine uses the same functions as the combiner (see my RFC which I don't have much time to work on right now). Thanks, Andrew Pinski Thanks, Andrew Pinski Thanks! -Zhenqiang ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * tree-ssa-reassoc.c (optimize_range_tests): Reasociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0. testsuite/ChangeLog 2013-08-05 Zhenqiang Chen zhenqiang.c...@arm.com * gcc.dg/reassoc1.c: New testcase.