Re: [IPCP] Remove unreachable code
On 25 October 2016 at 04:16, kugan wrote:
> Hi,
>
> I noticed that in ipcp_bits_lattice::meet_with we have:
>
>   else if (TREE_CODE_CLASS (code) == tcc_unary)
>
>   else if (code == NOP_EXPR)
>
>
> Since TREE_CODE_CLASS for NOP_EXPR is tcc_unary, if (code == NOP_EXPR) is
> unreachable and therefore removing it. I also don't think that we need any
> special casing for NOP_EXPR here. bit_value_unop already handles
> CASE_CONVERT.

Oops, sorry about that :/
IIUC NOP_EXPR is present when the pass-thru operation is "simple",
involving only the argument.
bit_value_unop() already handles NOP_EXPR under CASE_CONVERT, so I
suppose NOP_EXPR won't need any special casing.
Thanks for catching this!

Regards,
Prathamesh
>
> Is this OK if no regressions in bootstrap and regression testing.
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-10-25  Kugan Vivekanandarajah
>
>     * ipa-cp.c (ipcp_bits_lattice::meet_with): Remove unreachable code.
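To illustrate why the second test can never fire, here is a minimal sketch. The enums and the code_class() helper are hypothetical stand-ins for GCC's tree codes and TREE_CODE_CLASS, not the real definitions; only the shape of the if/else chain follows the mail.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's tree codes and code classes.  */
enum tree_code { NOP_EXPR, NEGATE_EXPR, PLUS_EXPR, NUM_CODES };
enum tree_code_class { tcc_unary, tcc_binary };
enum branch { BRANCH_BINARY, BRANCH_UNARY, BRANCH_NOP, BRANCH_OTHER };

/* Assumed classification: NOP_EXPR is unary, as stated in the mail.  */
static enum tree_code_class
code_class (enum tree_code code)
{
  assert (code < NUM_CODES);
  return code == PLUS_EXPR ? tcc_binary : tcc_unary;
}

/* Same chain shape as in meet_with: the NOP_EXPR test sits behind
   the tcc_unary test, so it is shadowed for every unary code.  */
static enum branch
dispatch (enum tree_code code)
{
  if (code_class (code) == tcc_binary)
    return BRANCH_BINARY;
  else if (code_class (code) == tcc_unary)
    return BRANCH_UNARY;
  else if (code == NOP_EXPR)
    return BRANCH_NOP;		/* never reached */
  return BRANCH_OTHER;
}
```

With this classification, dispatch (NOP_EXPR) takes the unary branch, so deleting the third arm cannot change behaviour.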
Re: RFC [1/3] divmod transform v2
On 25 October 2016 at 13:43, Richard Biener wrote:
> On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
> wrote:
>> Hi,
>> After approval from Bernd Schmidt, I committed the patch to remove
>> optab functions for
>> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> for divmod patch.
>>
>> This patch is mostly the same as previous one, except it drops
>> targeting __udivmoddi4() because
>> it gave undefined reference link error for calling __udivmoddi4() on
>> aarch64-linux-gnu.
>> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> isn't needed for the target
>> (it was a bug in my patch that called __udivmoddi4() even though
>> aarch64 supported hardware div).
>>
>> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> will be available for a target if it doesn't have hardware div and
>> divmod insn and doesn't have target-specific libfunc for
>> DImode divmod ? To be conservative, the attached patch doesn't
>> generate call to __udivmoddi4.
>>
>> Passes bootstrap+test on x86_64-unknown-linux.
>> Cross-tested on arm*-*-*, aarch64*-*-*.
>> Verified that there are no regressions with SPEC2006 on
>> x86_64-unknown-linux-gnu.
>> OK to commit ?
>
> I think the searching is still somewhat wrong - it's been some time
> since my last look at the
> patch so maybe I've said this already.  Please bail out early for
> stmt_can_throw_internal (stmt),
> otherwise the top stmt search might end up not working.  So
>
> +
> +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
> +    return false;
>
> can go.
>
> top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
> for another
> TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
> Which means you want a div_seen and a mod_seen, or simply record the top_stmt
> code and look for the opposite in the 2nd loop.

Um sorry I don't quite understand how we could end up without a
trunc_mod stmt ?
The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
checks if we have
come across at least a single trunc_div stmt (and we bail out if no
div is seen).

At 2nd loop I suppose we don't need mod_seen, because stmt is
guaranteed to be trunc_mod_expr.
In the 2nd loop the following condition will never trigger for stmt:
  if (stmt_can_throw_internal (use_stmt))
    continue;
since we checked before hand if stmt could throw and chose to bail out
in that case.

and the following condition would also not trigger for stmt:
  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
    {
      end_imm_use_stmt_traverse (&use_iter);
      return false;
    }
since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).

The case where top_stmt == stmt, we wouldn't reach the above
condition, since we have above it:
  if (top_stmt == stmt)
    continue;

So IIUC, top_stmt and stmt would always get added to stmts vector.
Am I missing something ?

Thanks,
Prathamesh
>
> +      switch (gimple_assign_rhs_code (use_stmt))
> +        {
> +        case TRUNC_DIV_EXPR:
> +          new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> +          break;
> +
> +        case TRUNC_MOD_EXPR:
> +          new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
> +          break;
> +
>
> why type of op1 and type of op2 in the other case?  Choose one for
> consistency.
>
> +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> +        cfg_changed = true;
>
> as you are rejecting all internally throwing stmts this shouldn't be
> necessary.
>
> The patch is ok with those changes.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> Prathamesh
Re: RFC [1/3] divmod transform v2
On 25 October 2016 at 16:17, Richard Biener wrote:
> On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 25 October 2016 at 13:43, Richard Biener
>> wrote:
>> > On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
>> > wrote:
>> >> Hi,
>> >> After approval from Bernd Schmidt, I committed the patch to remove
>> >> optab functions for
>> >> sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> >> for divmod patch.
>> >>
>> >> This patch is mostly the same as previous one, except it drops
>> >> targeting __udivmoddi4() because
>> >> it gave undefined reference link error for calling __udivmoddi4() on
>> >> aarch64-linux-gnu.
>> >> It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> >> isn't needed for the target
>> >> (it was a bug in my patch that called __udivmoddi4() even though
>> >> aarch64 supported hardware div).
>> >>
>> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> will be available for a target if it doesn't have hardware div and
>> >> divmod insn and doesn't have target-specific libfunc for
>> >> DImode divmod ? To be conservative, the attached patch doesn't
>> >> generate call to __udivmoddi4.
>> >>
>> >> Passes bootstrap+test on x86_64-unknown-linux.
>> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >> Verified that there are no regressions with SPEC2006 on
>> >> x86_64-unknown-linux-gnu.
>> >> OK to commit ?
>> >
>> > I think the searching is still somewhat wrong - it's been some time
>> > since my last look at the
>> > patch so maybe I've said this already.  Please bail out early for
>> > stmt_can_throw_internal (stmt),
>> > otherwise the top stmt search might end up not working.  So
>> >
>> > +
>> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> > +    return false;
>> >
>> > can go.
>> >
>> > top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only look
>> > for another
>> > TRUNC_DIV_EXPR later ... you may end up without a single TRUNC_MOD_EXPR.
>> > Which means you want a div_seen and a mod_seen, or simply record the
>> > top_stmt
>> > code and look for the opposite in the 2nd loop.
>> Um sorry I don't quite understand how we could end up without a trunc_mod
>> stmt ?
>> The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
>> checks if we have
>> come across at least a single trunc_div stmt (and we bail out if no
>> div is seen).
>>
>> At 2nd loop I suppose we don't need mod_seen, because stmt is
>> guaranteed to be trunc_mod_expr.
>> In the 2nd loop the following condition will never trigger for stmt:
>>   if (stmt_can_throw_internal (use_stmt))
>>     continue;
>> since we checked before hand if stmt could throw and chose to bail out
>> in that case.
>>
>> and the following condition would also not trigger for stmt:
>>   if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>>     {
>>       end_imm_use_stmt_traverse (&use_iter);
>>       return false;
>>     }
>> since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).
>>
>> The case where top_stmt == stmt, we wouldn't reach the above
>> condition, since we have above it:
>>   if (top_stmt == stmt)
>>     continue;
>>
>> So IIUC, top_stmt and stmt would always get added to stmts vector.
>> Am I missing something ?
>
> Ah, indeed.  Maybe add a comment then, it wasn't really obvious ;)
>
> Please still move the stmt_can_throw_internal (stmt) check up.

Sure, I will move that up and do the other suggested changes.

I was wondering if this condition in 2nd loop is too restrictive ?
  if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
    {
      end_imm_use_stmt_traverse (&use_iter);
      return false;
    }

Should we rather "continue" in this case by not adding use_stmt to
stmts vector rather than dropping the transform all-together if
gimple_bb (use_stmt) is not dominated by gimple_bb (top_stmt) ?

For instance if we have a test-case like:

if (cond)
  {
    t1 = x / y;
    t2 = x % y;
  }
else
  t3 = x % y;

and suppose stmt is "t2 = x % y", we would set top_stmt to "t1 = x / y";
In this case we would still want to do divmod transform in THEN block
even though "t3 = x % y" is not do
[ping * 3] PR35503 - warn for restrict
Pinging patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01545.html Thanks, Prathamesh
Re: RFC [3/3] divmod transform v2 - add test cases
On 24 October 2016 at 21:09, Prathamesh Kulkarni wrote:
> On 16 October 2016 at 11:31, Prathamesh Kulkarni
> wrote:
>> Hi,
>> This patch adds test-cases for divmod transform.
>> OK to commit ?
> ping https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01241.html

Hi Richard,
Could you please review this part:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01241.html
It only adds test-cases for the divmod transform.
Is it OK to commit ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
Re: RFC [2/3] divmod transform v2 - override expand_divmod_libfunc for ARM port
On 26 October 2016 at 18:51, Kyrill Tkachov wrote:
>
> On 16/10/16 07:00, Prathamesh Kulkarni wrote:
>>
>> Hi,
>> This patch overrides expand_divmod_libfunc hook for ARM port.
>> I separated the SImode tests into separate file from DImode tests
>> because certain arm configs (cortex-a15) have hardware div insn for
>> SImode but not for DImode, and for that config we want SImode tests to
>> be disabled but not DImode tests. The patch therefore has two
>> target-effective checks: divmod and divmod_simode.
>> Cross-tested on arm*-*-*.
>> OK to commit ?
>
>
> Looks ok to me, the implementation of the hook is straightforward though
> I have a question.
> arm_expand_divmod_libfunc is not supposed to ever be called for SImode
> TARGET_IDIV.
> It asserts it rather than just failing the expansion in some way.
> How does the midend know not to call TARGET_EXPAND_DIVMOD_LIBFUNC in that
> case, does it
> just check if the relevant sdiv optab is not available?

Yes. The divmod transform isn't enabled if target supports hardware
div in the same or wider mode even if divmod libfunc is available for
the given mode.
>
> If so, this is ok for trunk assuming a bootstrap and test run on
> arm-none-linux-gnueabihf
> shows no issues. Would be good to try one for --with-cpu=cortex-a15 and one
> with a !TARGET_IDIV
> target, say --with-cpu=cortex-a9.

Bootstrap+tested on arm-linux-gnueabihf --with-cpu=cortex-a15 and
--with-cpu=cortex-a9.
Also cross-tested on arm*-*-*.
OK to commit ?

Thanks,
Prathamesh
>
> Sorry for the delay.
>
> Thanks,
> Kyrill
>
>
>> Thanks,
>> Prathamesh
>
>
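The gating rule described in the reply above can be sketched roughly as follows. This is a simplified model, not the actual GCC code: the struct target capability table, the enum of modes and divmod_candidate_p() are all hypothetical names introduced for illustration, and modes are assumed to be ordered from narrow to wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode list, ordered from narrow to wide.  */
enum mode { QImode, HImode, SImode, DImode, NUM_MODES };

/* Hypothetical per-target capability table.  */
struct target
{
  int has_hw_div[NUM_MODES];		/* hardware div insn */
  int has_divmod_insn[NUM_MODES];	/* hardware divmod insn */
  int has_divmod_libfunc[NUM_MODES];	/* divmod library function */
};

/* Return nonzero if the divmod transform should be considered for MODE.  */
static int
divmod_candidate_p (const struct target *t, enum mode mode)
{
  assert (t != NULL && mode < NUM_MODES);

  /* A hardware divmod insn is always usable.  */
  if (t->has_divmod_insn[mode])
    return 1;

  /* Hardware div in the same or a wider mode disables the libfunc
     path, even if a divmod libfunc exists for MODE.  */
  for (int m = mode; m < NUM_MODES; m++)
    if (t->has_hw_div[m])
      return 0;

  return t->has_divmod_libfunc[mode];
}
```

Under these assumptions, a cortex-a15-like table (hardware div for SImode only, libfuncs for SImode and DImode) rejects SImode but accepts DImode, which is why the SImode and DImode tests need separate effective-target checks.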
Re: RFC [3/3] divmod transform v2 - add test cases
On 27 October 2016 at 18:58, Richard Biener wrote:
> On Thu, 27 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 24 October 2016 at 21:09, Prathamesh Kulkarni
>> wrote:
>> > On 16 October 2016 at 11:31, Prathamesh Kulkarni
>> > wrote:
>> >> Hi,
>> >> This patch adds test-cases for divmod transform.
>> >> OK to commit ?
>> > ping https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01241.html
>> Hi Richard,
>> Could you please review this part:
>> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01241.html
>> It only adds test-cases for the divmod transform.
>> Is it OK to commit ?
>
> Can't see where the divmod or divmod_simode effective targets are
> defined.  If a patch adding those has been approved then the new
> tests are fine.

Sorry, I had put lib/target-supports.exp hunk in the 2nd part:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01240.html
Does it look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Richard.
[arm.c] Use VAR_P
Hi,
This patch replaces TREE_CODE (x) == VAR_DECL by VAR_P (x) in arm.c.
Bootstrap+tested on arm-linux-gnueabihf.
OK to commit ?

Thanks,
Prathamesh

2016-10-28  Prathamesh Kulkarni

    * config/arm/arm.c (arm_const_not_ok_for_debug_p): Use VAR_P.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3c4c704..a39e64f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30150,9 +30150,9 @@ arm_const_not_ok_for_debug_p (rtx p)
           && GET_CODE (XEXP (p, 0)) == SYMBOL_REF
           && (decl_op0 = SYMBOL_REF_DECL (XEXP (p, 0))))
         {
-          if ((TREE_CODE (decl_op1) == VAR_DECL
+          if ((VAR_P (decl_op1)
                || TREE_CODE (decl_op1) == CONST_DECL)
-              && (TREE_CODE (decl_op0) == VAR_DECL
+              && (VAR_P (decl_op0)
                   || TREE_CODE (decl_op0) == CONST_DECL))
             return (get_variable_section (decl_op1, false)
                     != get_variable_section (decl_op0, false));
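For readers unfamiliar with the macro, the substitution is purely cosmetic: VAR_P (x) is shorthand for the TREE_CODE comparison it replaces. A minimal sketch, in which struct tree_node, the three-entry enum and var_or_const_decl_p() are simplified, hypothetical stand-ins rather than GCC's real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's tree node.  */
enum tree_code { VAR_DECL, CONST_DECL, FUNCTION_DECL };
struct tree_node { enum tree_code code; };
typedef struct tree_node *tree;

#define TREE_CODE(NODE) ((NODE)->code)
/* Shorthand for the comparison that the patch replaces.  */
#define VAR_P(NODE) (TREE_CODE (NODE) == VAR_DECL)

/* The shape of the condition used in arm_const_not_ok_for_debug_p.  */
static int
var_or_const_decl_p (tree decl)
{
  assert (decl != NULL);
  return VAR_P (decl) || TREE_CODE (decl) == CONST_DECL;
}
```

Both spellings accept and reject exactly the same nodes, so no behaviour change is expected from the patch.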
Re: RFC [1/3] divmod transform v2
On 26 October 2016 at 16:17, Richard Biener wrote:
> On Wed, 26 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 25 October 2016 at 18:47, Richard Biener wrote:
>> > On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 October 2016 at 16:17, Richard Biener wrote:
>> >> > On Tue, 25 Oct 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 25 October 2016 at 13:43, Richard Biener
>> >> >> wrote:
>> >> >> > On Sun, Oct 16, 2016 at 7:59 AM, Prathamesh Kulkarni
>> >> >> > wrote:
>> >> >> >> Hi,
>> >> >> >> After approval from Bernd Schmidt, I committed the patch to remove
>> >> >> >> optab functions for
>> >> >> >> sdivmod_optab and udivmod_optab in optabs.def, which removes the
>> >> >> >> block
>> >> >> >> for divmod patch.
>> >> >> >>
>> >> >> >> This patch is mostly the same as previous one, except it drops
>> >> >> >> targeting __udivmoddi4() because
>> >> >> >> it gave undefined reference link error for calling __udivmoddi4() on
>> >> >> >> aarch64-linux-gnu.
>> >> >> >> It appears aarch64 has hardware insn for DImode div, so
>> >> >> >> __udivmoddi4()
>> >> >> >> isn't needed for the target
>> >> >> >> (it was a bug in my patch that called __udivmoddi4() even though
>> >> >> >> aarch64 supported hardware div).
>> >> >> >>
>> >> >> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> >> >> will be available for a target if it doesn't have hardware div and
>> >> >> >> divmod insn and doesn't have target-specific libfunc for
>> >> >> >> DImode divmod ? To be conservative, the attached patch doesn't
>> >> >> >> generate call to __udivmoddi4.
>> >> >> >>
>> >> >> >> Passes bootstrap+test on x86_64-unknown-linux.
>> >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >> >> >> Verified that there are no regressions with SPEC2006 on
>> >> >> >> x86_64-unknown-linux-gnu.
>> >> >> >> OK to commit ?
>> >> >> >
>> >> >> > I think the searching is still somewhat wrong - it's been some time
>> >> >> > since my last look at the
>> >> >> > patch so maybe I've said this already.  Please bail out early for
>> >> >> > stmt_can_throw_internal (stmt),
>> >> >> > otherwise the top stmt search might end up not working.  So
>> >> >> >
>> >> >> > +
>> >> >> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> >> >> > +    return false;
>> >> >> >
>> >> >> > can go.
>> >> >> >
>> >> >> > top_stmt may end up as a TRUNC_DIV_EXPR so it's pointless to only
>> >> >> > look
>> >> >> > for another
>> >> >> > TRUNC_DIV_EXPR later ... you may end up without a single
>> >> >> > TRUNC_MOD_EXPR.
>> >> >> > Which means you want a div_seen and a mod_seen, or simply record the
>> >> >> > top_stmt
>> >> >> > code and look for the opposite in the 2nd loop.
>> >> >> Um sorry I don't quite understand how we could end up without a
>> >> >> trunc_mod stmt ?
>> >> >> The 2nd loop adds both trunc_div and trunc_mod to stmts vector, and
>> >> >> checks if we have
>> >> >> come across at least a single trunc_div stmt (and we bail out if no
>> >> >> div is seen).
>> >> >>
>> >> >> At 2nd loop I suppose we don't need mod_seen, because stmt is
>> >> >> guaranteed to be trunc_mod_expr.
>> >> >> In the 2nd loop the following condition will never trigger for stmt:
>> >> >>   if (stmt_can_throw_internal (use_stmt))
>> >> >>     continue;
>> >> >> since we checked before hand if stmt could throw and chose to bail out
>> >> >> in that case.
>> >> >>
>> >> >> and the following condition would also not trigger for stmt:
>> >> >>   if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>> >> >>     {
>> >> >>       end_imm_use_stmt_traverse (&use_iter);
>> >> >>       return false;
>> >> >>     }
>> >> >> since gimple_bb (stmt) is always dominated by gimple_bb (top_stmt).
>> >> >>
>> >> >> The case where top_stmt == stmt, we wouldn't reach the above
>> >> >> condition, since we have above it:
>> >> >>   if (top_stmt == stmt)
>> >> >>     continue;
>> >> >>
>> >> >> So IIUC, top_stmt and stmt would always get added to stmts vector.
>> >> >> Am I missing something ?
>> >> >
>> >> > Ah, indeed.  Maybe add a comment then, it wasn't really obvious ;)
>> >> >
>> >> > Please still move the stmt_can_throw_internal (stmt) check up.
>> >> Sure, I will move that up and do the other suggested changes.
>> >>
>> >> I was wondering if this condition in 2nd loop is too restrictive ?
>> >>   if (!dominated_by_p (CDI_DOMINATORS, gimple_bb (use_stmt), top_bb))
>> >>     {
>> >>       end_imm_use_stmt_traverse (&use_iter);
>> >>       return false;
>> >>     }
>> >>
>> >> Should we rather "continue" in this case by not adding use_stmt to
>> >> stmts vector rather than dropping
>> >> the transform all-together if gimple_bb (use_stmt) is not dominated by
>> >> gimple_bb (top_stmt) ?
>> >
>> > Ah, yes - didn't spot that.
>> Hi,
>> Is this version OK ?
>
> Yes.

Committed as r241660. Thanks a lot!

Regards,
Prathamesh
>
> Thanks,
> Richard.
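As a rough source-level picture of what skipping (rather than bailing out on) a non-dominated use buys, consider a division and a modulo sharing a THEN block while another modulo sits in the ELSE block. The sketch below is only an analogy: the pass works on GIMPLE and extracts the results with REALPART_EXPR/IMAGPART_EXPR, whereas struct divmod_result and divmod() here are hypothetical stand-ins for that single combined operation.

```c
#include <assert.h>

/* Hypothetical stand-in for the combined quotient/remainder result.  */
struct divmod_result { int quot; int rem; };

/* Hypothetical helper modelling a single DIVMOD operation.  */
static struct divmod_result
divmod (int x, int y)
{
  assert (y != 0);
  struct divmod_result r = { x / y, x % y };
  return r;
}

/* Before: div and mod in the THEN block, a lone mod in the ELSE block.  */
static int
before (int cond, int x, int y)
{
  if (cond)
    {
      int t1 = x / y;
      int t2 = x % y;
      return t1 + t2;
    }
  else
    {
      int t3 = x % y;
      return t3;
    }
}

/* After: only the THEN block is rewritten; the ELSE-block modulo is not
   dominated by the division, so it is simply left alone.  */
static int
after (int cond, int x, int y)
{
  if (cond)
    {
      struct divmod_result r = divmod (x, y);
      return r.quot + r.rem;
    }
  else
    return x % y;
}
```

Returning false for the whole candidate set would lose the THEN-block opportunity; skipping just the non-dominated use keeps it while leaving the other statement untouched.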
Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...
On 1 November 2016 at 23:41, Uros Bizjak wrote:
> On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek wrote:
>> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote:
>>> ... simplify some conditions and add i?86-*-* target where missing.
>>>
>>> 2016-11-01  Uros Bizjak
>>>
>>>     * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets.
>>>     Whitespace fixes.
>> ...
>>>     (check_effective_target_divmod): Add i?86-*-* target.
>>
>> This part likely broke
>> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> on i686-linux (i.e. 32-bit).
>
> No, this is expected (these tests already fail with x86_64 -m32
> multilib). These will be fixed by [1].

Oops, sorry for the breakage.
The tests are meant to check if the divmod transform triggered,
which is done by scanning DIVMOD in the widening_mul dump.
Apparently I only checked for the triplet "x86_64-*-*" in
check_effective_target_divmod() and it returned 1, which probably caused
the divmod DImode tests to fail with -m32.
In general, could I check in check_effective_target_*(), what options
are passed ?
So in case of -m32, I wanted to return 0 instead of 1 to make the tests
on 32-bit UNSUPPORTED.
Thanks for fixing the test-cases!

Thanks,
Prathamesh
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02483.html
>
> Uros.
>
>> Dunno what exactly the tests are meant to test, most likely they just
>> need extra guards or something.  Can be reproduced even on x86_64-linux
>> with
>> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=divmod*'
>>
>>> @@ -8110,7 +8090,7 @@
>>>      #TODO: Add checks for all targets that have either hardware divmod insn
>>>      # or define libfunc for divmod.
>>>      if { [istarget arm*-*-*]
>>> -         || [istarget x86_64-*-*] } {
>>> +         || [istarget i?86-*-*] || [istarget x86_64-*-*] } {
>>>          return 1
>>>      }
>>>      return 0
>>
>>
>> Jakub
[ping * 4] PR35503 - warn for restrict
Pinging patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01545.html Thanks, Prathamesh
Re: [ping * 4] PR35503 - warn for restrict
On 2 November 2016 at 18:29, Jason Merrill wrote:
> Then I'll approve the whole patch.

Thanks!
Trying the patch on kernel build (allmodconfig) reveals the following
couple of warnings:
http://pastebin.com/Sv2HFDUv

I think warning for str_error_r() is correct, however I am not sure if
warning for pager_preexec() is legit or a false positive:

pager.c: In function 'pager_preexec':
pager.c:35:12: warning: passing argument 2 to restrict-qualified
parameter aliases with argument 4 [-Wrestrict]
  select(1, &in, NULL, &in, NULL);
            ^~~~~~
Is the warning correct for the above call to select() syscall ?
I am a bit anxious about keeping the warning in Wall because it's
breaking the kernel.
Should we instead keep it in Wextra, or continue keeping it in Wall ?

Also is there a way to gracefully disable Werror for kernel builds ?
I tried:
make allmodconfig
make all KCFLAGS="-Wno-error=restrict" CFLAGS="-Wno-error=restrict" -j8
but that didn't work. I managed to workaround by manually modifying
Makefiles to not pass Werror, but I hope there's a better way.

Thanks,
Prathamesh
>
> On Wed, Nov 2, 2016 at 8:42 AM, Joseph Myers wrote:
>> The format-checking parts of the patch are OK.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
Re: [ping * 4] PR35503 - warn for restrict
On 2 November 2016 at 23:07, Jason Merrill wrote:
> On Wed, Nov 2, 2016 at 1:08 PM, Prathamesh Kulkarni
> wrote:
>> On 2 November 2016 at 18:29, Jason Merrill wrote:
>>> Then I'll approve the whole patch.
>> Thanks!
>> Trying the patch on kernel build (allmodconfig) reveals the following
>> couple of warnings:
>> http://pastebin.com/Sv2HFDUv
>>
>> I think warning for str_error_r() is correct
>
> It's accurate, but unhelpful; snprintf isn't going to use the contents
> of buf via the variadic argument, so this warning is just noise.

Ah, indeed, it's just printing address of buf, not using the contents.
>
>> however I am not sure if
>> warning for pager_preexec() is legit or a false positive:
>>
>> pager.c: In function 'pager_preexec':
>> pager.c:35:12: warning: passing argument 2 to restrict-qualified
>> parameter aliases with argument 4 [-Wrestrict]
>>   select(1, &in, NULL, &in, NULL);
>>             ^~~~~~
>> Is the warning correct for the above call to select() syscall ?
>
> The warning looks correct based on the prototype
>
> extern int select (int __nfds, fd_set *__restrict __readfds,
>                    fd_set *__restrict __writefds,
>                    fd_set *__restrict __exceptfds,
>                    struct timeval *__restrict __timeout);
>
> But passing the same fd_set to both readfds and exceptfds seems
> reasonable to me, so this also seems like a false positive.
>
> Looking at C11, I see this example:
>
> EXAMPLE 3 The function parameter declarations
> void h(int n, int * restrict p, int * restrict q, int * restrict r)
> {
>   int i;
>   for (i = 0; i < n; i++)
>     p[i] = q[i] + r[i];
> }
>
> illustrate how an unmodified object can be aliased through two
> restricted pointers. In particular, if a and b
> are disjoint arrays, a call of the form h(100, a, b, b) has defined
> behavior, because array b is not
> modified within function h.
>
> This is another example of well-defined code that your warning will
> complain about.

Yes, that's a limitation of the patch, it just looks at the prototype,
and not how the arguments are used in the function.
>
>> Should we instead keep it in Wextra, or continue keeping it in Wall ?
>
> It seems that it doesn't belong in -Wall.  I don't feel strongly about
> -Wextra.

Should I commit the patch by keeping Wrestrict "standalone",
ie, not including it in either Wall or Wextra ?

Thanks,
Prathamesh
>
> Jason
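The C11 example quoted above can be turned into a small runnable sketch. The function h follows the standard's EXAMPLE 3; the assert and the sum_of_doubles() driver, with its 4-element arrays, are additions made here for illustration only.

```c
#include <assert.h>

/* Adapted from C11 6.7.3.1, EXAMPLE 3: q and r are only read.  */
static void
h (int n, int *restrict p, int *restrict q, int *restrict r)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    p[i] = q[i] + r[i];
}

/* Illustrative driver: b is passed for both q and r.  Because b is
   never modified inside h, the aliasing is well defined.  */
static int
sum_of_doubles (void)
{
  int a[4] = { 0, 0, 0, 0 };
  int b[4] = { 1, 2, 3, 4 };

  h (4, a, b, b);
  return a[0] + a[1] + a[2] + a[3];
}
```

A check that only compares the argument expressions at the call site against the restrict-qualified parameters has no way to tell this call apart from a genuinely problematic one, which is the limitation acknowledged in the reply; a compiler with such a warning enabled may therefore diagnose the h (4, a, b, b) call even though it is valid.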
[match.pd] Fix for PR35691
Hi Richard,
The attached patch tries to fix PR35691, by adding the following two
transforms to match.pd:
(x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
it gets transformed to bit_and_expr
so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:

#if GENERIC
(for op (truth_andif truth_orif)
#elif GIMPLE
(for op (bit_and bit_ior)
#endif

Is that OK ?
Bootstrap+test running on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh

2016-11-03  Prathamesh Kulkarni

    PR middle-end/35691
    * match.pd: Add following two patterns:
    (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
    (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.

testsuite/
    * gcc.dg/pr35691-1.c: New test-case.
    * gcc.dg/pr35691-2.c: Likewise.
    * gcc.dg/pr35691-3.c: Likewise.
    * gcc.dg/pr35691-4.c: Likewise.

diff --git a/gcc/match.pd b/gcc/match.pd
index 48f7351..65930bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -519,6 +519,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (TYPE_UNSIGNED (type))
   (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)))))
 
+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+
+#if GENERIC
+(for op (truth_andif truth_orif)
+#elif GIMPLE
+(for op (bit_and bit_ior)
+#endif
+ cmp (eq ne)
+ (simplify
+  (op (cmp @0 integer_zerop) (cmp @1 integer_zerop))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+        && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+        && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); }))))
+
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
  (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c
new file mode 100644
index 0000000..25a7ace
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1)
+{
+  return (z0 == 0) && (z1 == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c
new file mode 100644
index 0000000..5211f815
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == 0);
+  int t1 = (z1 == 0);
+  int t2 = (t0 && t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-3.c b/gcc/testsuite/gcc.dg/pr35691-3.c
new file mode 100644
index 0000000..134bbdf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-3.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+int foo1(int z0, unsigned z1)
+{
+  return (z0 != 0) || (z1 != 0);
+}
+
+/* { dg-final { scan-tree-dump-not "z1.\[0-9\]*_\[0-9\]* = (int) z1" "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-4.c b/gcc/testsuite/gcc.dg/pr35691-4.c
new file mode 100644
index 0000000..90cbf6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != 0);
+  int t1 = (z1 != 0);
+  int t2 = (t0 || t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
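At the source level, the two patterns in the patch correspond to the pairs of functions below. This is only a hand-written sketch of the equivalence for int and unsigned (same precision); the function names are made up for illustration and the code is not output of the patch.

```c
#include <assert.h>

/* Original forms: two comparisons against zero joined by && or ||.  */
static int
both_zero (int x, unsigned y)
{
  return x == 0 && y == 0;
}

static int
any_nonzero (int x, unsigned y)
{
  return x != 0 || y != 0;
}

/* Transformed forms: (x | typeof(x)(y)) compared against zero once.
   The unsigned -> int conversion is implementation-defined for large
   values, but with equal precision it maps only 0 to 0 on the usual
   two's complement targets, which is all the comparison needs.  */
static int
both_zero_ior (int x, unsigned y)
{
  assert (sizeof x == sizeof y);
  return (x | (int) y) == 0;
}

static int
any_nonzero_ior (int x, unsigned y)
{
  assert (sizeof x == sizeof y);
  return (x | (int) y) != 0;
}
```

The TYPE_PRECISION equality test in the pattern matters: if y were wider than x, converting it to typeof(x) could discard the only set bits, and the combined comparison would then report zero for a nonzero y.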
Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC
On 3 November 2016 at 18:36, Uros Bizjak wrote: > On Thu, Nov 3, 2016 at 1:58 PM, Eric Botcazou wrote: >>> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already >>> implemented in libgcc. But the enablement of this function inside the >>> compiler has to be performed by each target. >> >> So can we do it generically instead of duplicating it ~50 times? > > I guess it can be done. Currently the expander goes: > > --cut here-- > /* Check if optab_handler exists for divmod_optab for given mode. */ > if (optab_handler (tab, mode) != CODE_FOR_nothing) > { > quotient = gen_reg_rtx (mode); > remainder = gen_reg_rtx (mode); > expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp); > } > > /* Generate call to divmod libfunc if it exists. */ > else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX) > targetm.expand_divmod_libfunc (libfunc, mode, op0, op1, >"ient, &remainder); > > else > gcc_unreachable (); > --cut here-- > > so, by declaring divmod libfunc, the target also has to provide target hook. > > Let's ask authors of the original divmod patch for the details. Yes, in the initial patch, the default version of the hook targeted generic divmod functions in libgcc, however since full set of divmod libfuncs wasn't available (__divmoddi4 for instance), we had to drop that. I have couple of concerns: a) I am not sure if __udivmoddi4() is generically available for all targets. For aarch64, I can confirm that __udivmoddi4() isn't available (it has hardware div, so the divmod transform should never generate call to __udivmoddi4() on aarch64). Please see: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01239.html Can we safely assume that a target would have generic divmod libfunc like __udivmoddi4 available if it doesn't support hardware div insn and doesn't define target-specific divmod libfuncs ? 
b) Targets like AVR, have their own divmod patterns in the backend for doing the divmod transform (and haven't registered target-specific divmod libfuncs via set_optab_libfunc() ). We would end up generating call to generic divmod libfuncs for these targets even though they have target-specific divmod libfuncs available. I wonder if these targets should now use the generic divmod transform instead ? Thanks, Prathamesh > > Uros.
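For readers following the thread, here is a minimal C sketch of the idea behind the divmod transform: a division and a modulo on the same operands are replaced by one operation that yields both results. The helper names (divmod_sketch, separate_div_mod, check_divmod_sketch) are hypothetical and are not GCC or libgcc interfaces; the sketch uses the standard lldiv() as a stand-in for a combined divmod libfunc, and it does not model how the expander or the target hook actually pass operands.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns the quotient and stores the remainder
   through REM, so a single call produces both results.  The caller must
   make sure D is nonzero.  */
long long
divmod_sketch (long long n, long long d, long long *rem)
{
  lldiv_t r = lldiv (n, d);
  *rem = r.rem;
  return r.quot;
}

/* The shape of the code before the transform: two separate operations
   on the same pair of operands.  */
void
separate_div_mod (long long n, long long d, long long *quot, long long *rem)
{
  *quot = n / d;
  *rem = n % d;
}

/* Check that both forms agree on a few sample operands.  */
void
check_divmod_sketch (void)
{
  static const long long ns[] = { 0, 7, -7, 1000000007LL };
  static const long long ds[] = { 1, 2, -3, 97 };
  for (unsigned i = 0; i < sizeof ns / sizeof ns[0]; i++)
    for (unsigned j = 0; j < sizeof ds / sizeof ds[0]; j++)
      {
        long long q1, r1, q2, r2;
        q1 = divmod_sketch (ns[i], ds[j], &r1);
        separate_div_mod (ns[i], ds[j], &q2, &r2);
        assert (q1 == q2 && r1 == r2);
      }
}
```

Both forms truncate toward zero, so they agree for negative operands as well.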
Re: [match.pd] Fix for PR35691
On 3 November 2016 at 16:13, Richard Biener wrote: > On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote: > >> Hi Richard, >> The attached patch tries to fix PR35691, by adding the following two >> transforms to match.pd: >> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0. >> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0. >> >> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE, >> it gets transformed to bit_and_expr >> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt: >> >> #if GENERIC >> (for op (truth_andif truth_orif) >> #elif GIMPLE >> (for op (bit_and bit_ior) >> #endif >> >> Is that OK ? > > As you are not removing the fold-const.c variant I'd say you should > simply not look for truth_* and only handle GIMPLE. Note that we > have tree-ssa-ifcombine.c which should handle the variant with > control-flow (but I guess it does not and your patch wouldn't help > it either). > > The transform would also work for vectors (element_precision for > the test but also a value-matching zero which should ensure the > same number of elements). Um sorry, I didn't get how to check vectors to be of equal length by a matching zero. Could you please elaborate on that ? Thanks! > > Richard.
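To make the two PR35691 transforms concrete, here is a small C sketch that compares the original and transformed forms for the int/unsigned case used in the test-cases. The function names are hypothetical. The sketch assumes both operands have the same precision; if y were wider than x, the conversion to typeof(x) could drop set bits (for example a 64-bit y equal to 1 << 32 converted to a 32-bit int becomes 0), which is presumably why a precision check is needed. The conversion of an out-of-range unsigned value to int is implementation-defined in ISO C; GCC defines it as reduction modulo 2^N.

```c
#include <assert.h>
#include <limits.h>

/* Original form: (x == 0 && y == 0).  */
int
both_zero_orig (int x, unsigned y)
{
  return (x == 0) && (y == 0);
}

/* Transformed form: (x | typeof(x)(y)) == 0.  */
int
both_zero_ior (int x, unsigned y)
{
  return (x | (int) y) == 0;
}

/* Original form: (x != 0 || y != 0).  */
int
either_nonzero_orig (int x, unsigned y)
{
  return (x != 0) || (y != 0);
}

/* Transformed form: (x | typeof(x)(y)) != 0.  */
int
either_nonzero_ior (int x, unsigned y)
{
  return (x | (int) y) != 0;
}

/* Compare both forms over a few boundary values.  */
void
check_pr35691_forms (void)
{
  static const int xs[] = { 0, 1, -1, INT_MIN, INT_MAX };
  static const unsigned ys[] = { 0u, 1u, UINT_MAX, (unsigned) INT_MAX + 1u };
  for (unsigned i = 0; i < sizeof xs / sizeof xs[0]; i++)
    for (unsigned j = 0; j < sizeof ys / sizeof ys[0]; j++)
      {
        assert (both_zero_orig (xs[i], ys[j]) == both_zero_ior (xs[i], ys[j]));
        assert (either_nonzero_orig (xs[i], ys[j])
                == either_nonzero_ior (xs[i], ys[j]));
      }
}
```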
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html Thanks, Prathamesh On 7 June 2016 at 13:56, Prathamesh Kulkarni wrote: > ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html > > Thanks, > Prathamesh > > On 25 May 2016 at 18:19, Prathamesh Kulkarni > wrote: >> On 23 May 2016 at 14:28, Prathamesh Kulkarni >> wrote: >>> Hi, >>> This patch overrides expand_divmod_libfunc for ARM port and adds test-cases. >>> I separated the SImode tests into separate file from DImode tests >>> because certain arm configs (cortex-15) have hardware div insn for >>> SImode but not for DImode, >>> and for that config we want SImode tests to be disabled but not DImode >>> tests. >>> The patch therefore has two target-effective checks: divmod and >>> divmod_simode. >>> Cross-tested on arm*-*-*. >>> Bootstrap+test on arm-linux-gnueabihf in progress. >>> Does this patch look OK ? >> Hi, >> This version adds couple of more test-cases and fixes typo in >> divmod-3-simode.c, divmod-4-simode.c >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Prathamesh
Re: move increase_alignment from simple to regular ipa pass
ping * 2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html Thanks, Prathamesh On 28 June 2016 at 14:49, Prathamesh Kulkarni wrote: > ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html > > Thanks, > Prathamesh > > On 23 June 2016 at 22:51, Prathamesh Kulkarni > wrote: >> On 17 June 2016 at 19:52, Prathamesh Kulkarni >> wrote: >>> On 14 June 2016 at 18:31, Prathamesh Kulkarni >>> wrote: >>>> On 13 June 2016 at 16:13, Jan Hubicka wrote: >>>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h >>>>>> index ecafe63..41ac408 100644 >>>>>> --- a/gcc/cgraph.h >>>>>> +++ b/gcc/cgraph.h >>>>>> @@ -1874,6 +1874,9 @@ public: >>>>>> if we did not do any inter-procedural code movement. */ >>>>>>unsigned used_by_single_function : 1; >>>>>> >>>>>> + /* Set if -fsection-anchors is set. */ >>>>>> + unsigned section_anchor : 1; >>>>>> + >>>>>> private: >>>>>>/* Assemble thunks and aliases associated to varpool node. */ >>>>>>void assemble_aliases (void); >>>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c >>>>>> index 4bfcad7..e75d5c0 100644 >>>>>> --- a/gcc/cgraphunit.c >>>>>> +++ b/gcc/cgraphunit.c >>>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl) >>>>>> it is available to notice_global_symbol. */ >>>>>>node->definition = true; >>>>>>notice_global_symbol (decl); >>>>>> + >>>>>> + node->section_anchor = flag_section_anchors; >>>>>> + >>>>>>if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl) >>>>>>/* Traditionally we do not eliminate static variables when not >>>>>>optimizing and when not doing toplevel reoder. */ >>>>>> diff --git a/gcc/common.opt b/gcc/common.opt >>>>>> index f0d7196..e497795 100644 >>>>>> --- a/gcc/common.opt >>>>>> +++ b/gcc/common.opt >>>>>> @@ -1590,6 +1590,10 @@ fira-algorithm= >>>>>> Common Joined RejectNegative Enum(ira_algorithm) >>>>>> Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization >>>>>> -fira-algorithm=[CB|priority] Set the used IRA algorithm. 
>>>>>> >>>>>> +fipa-increase_alignment >>>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization >>>>>> +Option to gate increase_alignment ipa pass. >>>>>> + >>>>>> Enum >>>>>> Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA >>>>>> algorithm %qs) >>>>>> >>>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) >>>>>> Init(1) Optimization >>>>>> Enable the dependent count heuristic in the scheduler. >>>>>> >>>>>> fsection-anchors >>>>>> -Common Report Var(flag_section_anchors) Optimization >>>>>> +Common Report Var(flag_section_anchors) >>>>>> Access data in the same section from shared anchor points. >>>>>> >>>>>> fsee >>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>>>>> index a0db3a4..1482566 100644 >>>>>> --- a/gcc/config/aarch64/aarch64.c >>>>>> +++ b/gcc/config/aarch64/aarch64.c >>>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void) >>>>>> >>>>>>aarch64_register_fma_steering (); >>>>>> >>>>>> + /* Enable increase_alignment pass. */ >>>>>> + flag_ipa_increase_alignment = 1; >>>>> >>>>> I would rather enable it always on targets that do support anchors. >>>> AFAIK aarch64 supports section anchors. >>>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c >>>>>> index ce9e146..7f09f3a 100644 >>>>>> --- a/gcc/lto/lto-symtab.c >>>>>> +++ b/gcc/lto/lto-symtab.c >>>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, >>>>>> symtab_node *entry) >>>>>> The type compatibility checks or the completing of type
fold x ^ y to 0 if x == y
Hi Richard, For the following test-case: int f(int x, int y) { int ret; if (x == y) ret = x ^ y; else ret = 1; return ret; } I was wondering if x ^ y should be folded to 0 since it's guarded by condition x == y ? optimized dump shows: f (int x, int y) { int iftmp.0_1; int iftmp.0_4; : if (x_2(D) == y_3(D)) goto ; else goto ; : iftmp.0_4 = x_2(D) ^ y_3(D); : # iftmp.0_1 = PHI return iftmp.0_1; } The attached patch tries to fold for above case. I am checking if op0 and op1 are equal using: if (bitmap_intersect_p (vr1->equiv, vr2->equiv) && operand_equal_p (vr1->min, vr1->max) && operand_equal_p (vr2->min, vr2->max)) { /* equal /* } I suppose intersection would check if op0 and op1 have equivalent ranges, and added operand_equal_p check to ensure that there is only one element within the range. Does that look correct ? Bootstrap+test in progress on x86_64-unknown-linux-gnu. Thanks, Prathamesh diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 4333d60..787d068 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -6965,6 +6965,59 @@ vrp_valueize_1 (tree name) return name; } +/* Try to fold op0 xor op1 == 0 if op0 == op1. */ +static tree +maybe_fold_xor (gassign *stmt) +{ + if (!stmt) +return NULL_TREE; + + enum tree_code code = gimple_assign_rhs_code (stmt); + if (code != BIT_XOR_EXPR) +return NULL_TREE; + + tree op0 = gimple_assign_rhs1 (stmt); + tree op1 = gimple_assign_rhs2 (stmt); + + if (TREE_CODE (op0) != SSA_NAME + || TREE_CODE (op1) != SSA_NAME) +return NULL_TREE; + + value_range *vr1 = get_value_range (op0); + value_range *vr2 = get_value_range (op1); + + if (vr1 == NULL || vr2 == NULL) +return NULL_TREE; + + if (vr1->type != VR_RANGE || vr2->type != VR_RANGE) +return NULL_TREE; + + if (! (symbolic_range_p (vr1) && symbolic_range_p (vr2))) +return NULL_TREE; + + if (! (TREE_CODE (vr1->min) == SSA_NAME && TREE_CODE (vr1->max) == SSA_NAME +&& TREE_CODE (vr2->min) == SSA_NAME && TREE_CODE (vr2->max) == SSA_NAME)) +return NULL_TREE; + + if (! 
(vr1->equiv && vr2->equiv)) +return NULL_TREE; + + /* check if op0 == op1. */ + if (bitmap_intersect_p (vr1->equiv, vr2->equiv) + && operand_equal_p (vr1->min, vr1->max, 0) + && operand_equal_p (vr2->min, vr2->max, 0) + && code == BIT_XOR_EXPR) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gimple_assign_set_rhs_from_tree (&gsi, integer_zero_node); + update_stmt (stmt); + return integer_zero_node; +} + + return NULL_TREE; +} + + /* Visit assignment STMT. If it produces an interesting range, record the SSA name in *OUTPUT_P. */ @@ -6990,8 +7043,11 @@ vrp_visit_assignment_or_call (gimple *stmt, tree *output_p) /* Try folding the statement to a constant first. */ tree tem = gimple_fold_stmt_to_constant_1 (stmt, vrp_valueize, vrp_valueize_1); + if (!tem) + tem = maybe_fold_xor (dyn_cast <gassign *> (stmt)); if (tem && is_gimple_min_invariant (tem)) set_value_range_to_value (&new_vr, tem, NULL); + /* Then dispatch to value-range extracting functions. */ else if (code == GIMPLE_CALL) extract_range_basic (&new_vr, stmt);
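For reference, the source-level effect of the proposed fold can be sketched in plain C. The function f is the test-case from the message; f_folded is a hypothetical name for what f reduces to once x ^ y is replaced by 0 inside the x == y branch. This only illustrates the intended result and says nothing about which pass should perform the fold.

```c
#include <assert.h>

/* Test-case from the message: the xor is guarded by x == y.  */
int
f (int x, int y)
{
  int ret;

  if (x == y)
    ret = x ^ y;
  else
    ret = 1;

  return ret;
}

/* Expected result of the fold: inside the guarded branch x ^ y is 0.  */
int
f_folded (int x, int y)
{
  return x == y ? 0 : 1;
}

/* Check that both versions agree on a few inputs.  */
void
check_xor_fold (void)
{
  static const int vals[] = { 0, 1, -1, 42, -42 };
  for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++)
    for (unsigned j = 0; j < sizeof vals / sizeof vals[0]; j++)
      assert (f (vals[i], vals[j]) == f_folded (vals[i], vals[j]));
}
```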
Re: [RFC][IPA-VRP] Add support for IPA VRP in ipa-cp/ipa-prop
On 15 July 2016 at 05:46, kugan wrote: > Hi, > > > > This patch extends ipa-cp/ipa-prop infrastructure to handle propagation of > VR. Hi Kugan, Just a small nit - perhaps you should modify ipa_print_node_jump_functions_for_edge () to pretty-print value ranges associated with the jump function. Thanks, Prathamesh > > > > Thanks, > > Kugan > > > > > > gcc/testsuite/ChangeLog: > > > > 2016-07-14 Kugan Vivekanandarajah > > > > * gcc.dg/ipa/vrp1.c: New test. > > * gcc.dg/ipa/vrp2.c: New test. > > * gcc.dg/ipa/vrp3.c: New test. > > > > > > gcc/ChangeLog: > > > > 2016-07-14 Kugan Vivekanandarajah > > > > * common.opt: New option -fipa-vrp. > > * ipa-cp.c (ipa_get_vr_lat): New. > > (ipcp_vr_lattice::print): Likewise. > > (print_all_lattices): Call ipcp_vr_lattice::print. > > (ipcp_vr_lattice::meet_with): New. > > (ipcp_vr_lattice::meet_with_1): Likewise. > > (ipcp_vr_lattice::top_p): Likewise. > > (ipcp_vr_lattice::bottom_p): Likewise. > > (ipcp_vr_lattice::set_to_bottom): Likewise. > > (set_all_contains_variable): Call VR set_to_bottom. > > (initialize_node_lattices): Init VR lattices. > > (propagate_vr_accross_jump_function): New. > > (propagate_constants_accross_call): Call > > propagate_vr_accross_jump_function. > > (ipcp_store_alignment_results): Rename to > > ipcp_store_alignment_and_vr_results and handle VR. > > * ipa-prop.c (ipa_set_jf_unknown): > > (ipa_compute_jump_functions_for_edge): Handle Value Range. > > (ipa_node_params_t::duplicate): Likewise. > > (ipa_write_jump_function): Likewise. > > (ipa_read_jump_function): Likewise. > > (write_ipcp_transformation_info): Likewise. > > (read_ipcp_transformation_info): Likewise. > > (ipcp_update_alignments): Rename to ipcp_update_vr_and_alignments > > and handle VR. > > > > >
Re: move increase_alignment from simple to regular ipa pass
ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html Thanks, Prathamesh On 5 July 2016 at 10:53, Prathamesh Kulkarni wrote: > ping * 2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html > > Thanks, > Prathamesh
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html Thanks, Prathamesh On 29 June 2016 at 22:09, Prathamesh Kulkarni wrote: > ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html > > Thanks, > Prathamesh > > On 7 June 2016 at 13:56, Prathamesh Kulkarni > wrote: >> ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html >> >> Thanks, >> Prathamesh >> >> On 25 May 2016 at 18:19, Prathamesh Kulkarni >> wrote: >>> On 23 May 2016 at 14:28, Prathamesh Kulkarni >>> wrote: >>>> Hi, >>>> This patch overrides expand_divmod_libfunc for ARM port and adds >>>> test-cases. >>>> I separated the SImode tests into separate file from DImode tests >>>> because certain arm configs (cortex-15) have hardware div insn for >>>> SImode but not for DImode, >>>> and for that config we want SImode tests to be disabled but not DImode >>>> tests. >>>> The patch therefore has two target-effective checks: divmod and >>>> divmod_simode. >>>> Cross-tested on arm*-*-*. >>>> Bootstrap+test on arm-linux-gnueabihf in progress. >>>> Does this patch look OK ? >>> Hi, >>> This version adds couple of more test-cases and fixes typo in >>> divmod-3-simode.c, divmod-4-simode.c >>> >>> Thanks, >>> Prathamesh >>>> >>>> Thanks, >>>> Prathamesh
Re: fold x ^ y to 0 if x == y
On 8 July 2016 at 12:29, Richard Biener wrote: > On Fri, 8 Jul 2016, Richard Biener wrote: > >> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote: >> >> > Hi Richard, >> > For the following test-case: >> > >> > int f(int x, int y) >> > { >> >int ret; >> > >> >if (x == y) >> > ret = x ^ y; >> >else >> > ret = 1; >> > >> >return ret; >> > } >> > >> > I was wondering if x ^ y should be folded to 0 since >> > it's guarded by condition x == y ? >> > >> > optimized dump shows: >> > f (int x, int y) >> > { >> > int iftmp.0_1; >> > int iftmp.0_4; >> > >> > : >> > if (x_2(D) == y_3(D)) >> > goto ; >> > else >> > goto ; >> > >> > : >> > iftmp.0_4 = x_2(D) ^ y_3(D); >> > >> > : >> > # iftmp.0_1 = PHI >> > return iftmp.0_1; >> > >> > } >> > >> > The attached patch tries to fold for above case. >> > I am checking if op0 and op1 are equal using: >> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv) >> >&& operand_equal_p (vr1->min, vr1->max) >> >&& operand_equal_p (vr2->min, vr2->max)) >> > { /* equal /* } >> > >> > I suppose intersection would check if op0 and op1 have equivalent ranges, >> > and added operand_equal_p check to ensure that there is only one >> > element within the range. Does that look correct ? >> > Bootstrap+test in progress on x86_64-unknown-linux-gnu. >> >> I think VRP is the wrong place to catch this and DOM should have but it >> does >> >> Optimizing block #3 >> >> 1>>> STMT 1 = x_2(D) le_expr y_3(D) >> 1>>> STMT 1 = x_2(D) ge_expr y_3(D) >> 1>>> STMT 1 = x_2(D) eq_expr y_3(D) >> 1>>> STMT 0 = x_2(D) ne_expr y_3(D) >> 0>>> COPY x_2(D) = y_3(D) >> 0>>> COPY y_3(D) = x_2(D) >> Optimizing statement ret_4 = x_2(D) ^ y_3(D); >> Replaced 'x_2(D)' with variable 'y_3(D)' >> Replaced 'y_3(D)' with variable 'x_2(D)' >> Folded to: ret_4 = x_2(D) ^ y_3(D); >> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D) >> >> heh, registering both reqivalencies is obviously not going to help... 
>> >> The 2nd equivalence is from doing >> >> /* We already recorded that LHS = RHS, with canonicalization, >> value chain following, etc. >> >> We also want to record RHS = LHS, but without any >> canonicalization >> or value chain following. */ >> if (TREE_CODE (rhs) == SSA_NAME) >> const_and_copies->record_const_or_copy_raw (rhs, lhs, >> SSA_NAME_VALUE (rhs)); >> >> generally recording both is not helpful. Jeff? This seems to be >> r233207 (fix for PR65917) which must have regressed this testcase. > > Just verified it works fine on the GCC 5 branch: > > Optimizing block #3 > > 0>>> COPY y_3(D) = x_2(D) > 1>>> STMT 1 = x_2(D) le_expr y_3(D) > 1>>> STMT 1 = x_2(D) ge_expr y_3(D) > 1>>> STMT 1 = x_2(D) eq_expr y_3(D) > 1>>> STMT 0 = x_2(D) ne_expr y_3(D) > Optimizing statement ret_4 = x_2(D) ^ y_3(D); > Replaced 'y_3(D)' with variable 'x_2(D)' > Applying pattern match.pd:240, gimple-match.c:11346 > gimple_simplified to ret_4 = 0; > Folded to: ret_4 = 0; I have reported it as PR71947. Could you help me point out how to fix this ? Thanks, Prathamesh > > Richard.
fix typo in comment in tree-ssa-strlen.c
Committed as obvious (r238588). Thanks, Prathamesh Index: tree-ssa-strlen.c === --- tree-ssa-strlen.c (revision 238587) +++ tree-ssa-strlen.c (working copy) @@ -2383,7 +2383,7 @@ }; /* Callback for walk_dominator_tree. Attempt to optimize various - string ops by remembering string lenths pointed by pointer SSA_NAMEs. */ + string ops by remembering string lengths pointed by pointer SSA_NAMEs. */ edge strlen_dom_walker::before_dom_children (basic_block bb)
[PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
Hi Richard, The attached patch tries to fix PR70920. It adds your pattern from comment 1 in the PR (with additional gating on INTEGRAL_TYPE_P to avoid regressing finalize_18.f90) and second pattern, which is reverse of the first transform. I needed to update ssa-dom-branch-1.c because with patch applied, jump threading removed the second if (i != 0B) block. The dumps with and without patch for ssa-dom-branch-1.c start to differ with forwprop1: before: : _1 = temp_16(D)->code; _2 = _1 == 42; _3 = (int) _2; _4 = (long int) _3; temp_17 = (struct rtx_def *) _4; if (temp_17 != 0B) goto ; else goto ; after: : _1 = temp_16(D)->code; _2 = _1 == 42; _3 = (int) _2; _4 = (long int) _2; temp_17 = (struct rtx_def *) _4; if (_1 == 42) goto ; else goto ; I suppose the transform is correct for above test-case ? Then vrp dump shows: Threaded jump 5 --> 9 to 13 Threaded jump 8 --> 9 to 13 Threaded jump 3 --> 9 to 13 Threaded jump 12 --> 9 to 14 Removing basic block 9 basic block 9, loop depth 0 pred: if (i1_10(D) != 0B) goto ; else goto ; succ: 10 11 So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file, and hence needed to update the test-case. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK to commit ? PS: Writing changelog entries for match.pd is a bit tedious. Should we add optional names for pattern so we can refer to them by names in the ChangeLog for the more complicated ones ? Or maybe just use comments: (simplify /* name */ ... ) -;) Thanks, Prathamesh diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..7c736be 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { CONSTRUCTOR_ELT (ctor, idx / k)->value; }) (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; } @1 { bitsize_int ((idx % k) * width); }) + +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. 
*/ + +(for cmp (ne eq) + (simplify + (cmp (convert@2 @0) INTEGER_CST@1) + (if (POINTER_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@2))) + (cmp @0 (convert @1) + +/* Reverse of the above case: + x has integral_type, CST is a pointer constant. + Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST. */ + +(for cmp (ne eq) + (simplify + (cmp (convert @0) @1) + (if (POINTER_TYPE_P (TREE_TYPE (@1)) + && INTEGRAL_TYPE_P (TREE_TYPE (@0))) +(cmp @0 (convert @1) diff --git a/gcc/testsuite/gcc.dg/pr70920-1.c b/gcc/testsuite/gcc.dg/pr70920-1.c new file mode 100644 index 000..9b7e2d0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr70920-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-gimple" } */ + +#include + +void f1(); +void f2(); + +void +foo (int *a) +{ + if ((intptr_t) a == 0) +{ + f1 (); + if (a) + f2 (); +} +} + +/* { dg-final { scan-tree-dump "if \\(a == 0B\\)" "gimple" } } */ diff --git a/gcc/testsuite/gcc.dg/pr70920-2.c b/gcc/testsuite/gcc.dg/pr70920-2.c new file mode 100644 index 000..2db9897 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr70920-2.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ + +#include + +void f1(); +void f2(); + +void +foo (int *a) +{ + int cst = 0; + if ((intptr_t) a == cst) +{ + f1 (); + if (a) + f2 (); +} +} + +/* { dg-final { scan-tree-dump "gimple_simplified to if \\(a_\[0-9\]*\\(D\\) == 0B\\)" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/pr70920-3.c b/gcc/testsuite/gcc.dg/pr70920-3.c new file mode 100644 index 000..71e0d8d --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr70920-3.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-Wno-int-to-pointer-cast -fdump-tree-gimple" } */ + +#include + +void f1(); +void f2(); + +void +foo (int a) +{ + if ((int *) a == 0) +{ + f1 (); + if (a) + f2 (); +} +} + +/* { dg-final { scan-tree-dump "if \\(a == 0\\)" "gimple" } } */ diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c 
b/gcc/testsuite/gcc.dg/pr70920-4.c new file mode 100644 index 000..f92c5a6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr70920-4.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details -Wno-int-to-pointer-cast" } */ + +#include + +void f1(); +void f2(); + +void +foo (int a) +{ + void *cst = 0; + if ((int *) a == cst) +{ + f1 (); + if (a) + f2 (); +} +} + +/* { dg-final { scan-tree-dump "gimple_simplified to if \\(a_\[0-9\]*\\(D\\) == 0\\)" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c index 18f9041..d38e3a8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-branch-1.c @@ -21,7 +21,7 @@ try_combine (rtx i1, rtx newpat) /* There should be three tests again
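As an illustration of the two PR70920 transforms, the C sketch below writes each comparison in its original form (with the cast) and in its transformed form (without it) and checks that they agree. The function names are hypothetical. It assumes, as is the case on the targets GCC supports, that a null pointer converts to the integer 0 and back; ISO C leaves the pointer/integer mapping implementation-defined. The second pair uses intptr_t for the integer operand rather than int, only to keep the sketch free of pointer-size cast warnings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First pattern, original form: (intptr_t) x eq/ne CST.  */
int
null_test_via_intptr (int *a)
{
  return (intptr_t) a == 0;
}

/* First pattern, transformed form: x eq/ne (typeof x) CST.  */
int
null_test_direct (int *a)
{
  return a == (int *) 0;
}

/* Reverse pattern, original form: integer converted to a pointer and
   compared against a pointer constant.  */
int
zero_test_via_pointer (intptr_t a)
{
  return (void *) a == NULL;
}

/* Reverse pattern, transformed form: compare the integer directly.  */
int
zero_test_direct (intptr_t a)
{
  return a == 0;
}

/* Check that each pair agrees for a null and a non-null input.  */
void
check_pr70920_forms (void)
{
  int object = 0;

  assert (null_test_via_intptr (NULL) == null_test_direct (NULL));
  assert (null_test_via_intptr (&object) == null_test_direct (&object));
  assert (zero_test_via_pointer (0) == zero_test_direct (0));
  assert (zero_test_via_pointer ((intptr_t) &object)
          == zero_test_direct ((intptr_t) &object));
}
```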
[PR71078] x / abs(x) -> copysign (1.0, x)
Hi, The attached patch tries to fix PR71078. I am not sure if I have got the converts right. I put (convert? @0) and (convert1? (abs @1)) to match for cases when operands's types may be different from outermost type like in pr71078-3.c test-case (included in patch). Bootstrap+test in progress on x86_64-unknown-linux-gnu. Thanks, Prathamesh diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..6c3d6ec 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -391,6 +391,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (mult (abs@1 @0) @1) (mult @0 @0)) +/* PR71078: x / abs(x) -> copysign (1.0, x) */ +(simplify + (rdiv:C (convert? @0) (convert1? (abs @0))) + (if (FLOAT_TYPE_P (type) + && ! HONOR_NANS (type) + && ! HONOR_INFINITIES (type)) + (switch +(if (type == float_type_node) + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0))) +(if (type == double_type_node) + (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0))) +(if (type == long_double_type_node) + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0)) + /* cos(copysign(x, y)) -> cos(x). Similarly for cosh. 
*/ (for coss (COS COSH) copysigns (COPYSIGN) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c new file mode 100644 index 000..6204c14 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = x / t1; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = x / t1; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); + long double t2 = x / t1; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c new file mode 100644 index 000..96485af --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = t1 / x; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = t1 / x; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); + long double t2 = t1 / x; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c new file mode 100644 index 000..8780b6a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-3.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math 
-fdump-tree-forwprop-details" } */ + +#include +double f(float f) +{ + double t1 = fabs(f); + double t2 = f / t1; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */
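The identity behind the PR71078 pattern can be checked numerically. The following is only a sketch, not part of the patch: the helper names ratio_form, copysign_form and check_identity are made up for illustration. It compares x / fabs(x) with copysign(1.0, x) on a few finite, nonzero inputs. For x = 0 or x = infinity the division yields NaN while copysign still yields +/-1.0, which is why the pattern is gated on !HONOR_NANS and !HONOR_INFINITIES.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not taken from the patch.  */

/* The expression the pattern matches.  */
static double ratio_form(double x)
{
    return x / fabs(x);
}

/* The expression the pattern produces.  */
static double copysign_form(double x)
{
    return copysign(1.0, x);
}

/* Both forms agree for finite, nonzero inputs.  */
static void check_identity(void)
{
    const double samples[] = { 3.5, -3.5, 1e-300, -1e300, 42.0 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ratio_form(samples[i]) == copysign_form(samples[i]));
}
```

Calling check_identity() passes silently. Evaluating ratio_form(0.0) gives NaN under IEEE arithmetic, whereas copysign_form(0.0) gives 1.0, so the two forms are only interchangeable under the -ffast-math style assumptions that the test-cases use.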
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 25 July 2016 at 14:32, Richard Biener wrote: > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: > >> Hi Richard, >> The attached patch tries to fix PR70920. >> It adds your pattern from comment 1 in the PR >> (with additional gating on INTEGRAL_TYPE_P to avoid regressing >> finalize_18.f90) >> and second pattern, which is reverse of the first transform. >> I needed to update ssa-dom-branch-1.c because with patch applied, >> jump threading removed the second if (i != 0B) block. >> The dumps with and without patch for ssa-dom-branch-1.c start >> to differ with forwprop1: >> >> before: >> : >> _1 = temp_16(D)->code; >> _2 = _1 == 42; >> _3 = (int) _2; >> _4 = (long int) _3; >> temp_17 = (struct rtx_def *) _4; >> if (temp_17 != 0B) >> goto ; >> else >> goto ; >> >> after: >> : >> _1 = temp_16(D)->code; >> _2 = _1 == 42; >> _3 = (int) _2; >> _4 = (long int) _2; >> temp_17 = (struct rtx_def *) _4; >> if (_1 == 42) >> goto ; >> else >> goto ; >> >> I suppose the transform is correct for above test-case ? >> >> Then vrp dump shows: >> Threaded jump 5 --> 9 to 13 >> Threaded jump 8 --> 9 to 13 >> Threaded jump 3 --> 9 to 13 >> Threaded jump 12 --> 9 to 14 >> Removing basic block 9 >> basic block 9, loop depth 0 >> pred: >> if (i1_10(D) != 0B) >> goto ; >> else >> goto ; >> succ: 10 >> 11 >> >> So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file, >> and hence needed to update the test-case. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. >> OK to commit ? > > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > { CONSTRUCTOR_ELT (ctor, idx / k)->value; }) > (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; } >@1 { bitsize_int ((idx % k) * width); }) > + > +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. 
> */ > + > +(for cmp (ne eq) > + (simplify > + (cmp (convert@2 @0) INTEGER_CST@1) > + (if (POINTER_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@2))) > > you can use @1 here and omit @2. > > + (cmp @0 (convert @1) > + > +/* Reverse of the above case: > + x has integral_type, CST is a pointer constant. > + Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST. */ > + > +(for cmp (ne eq) > + (simplify > + (cmp (convert @0) @1) > + (if (POINTER_TYPE_P (TREE_TYPE (@1)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@0))) > +(cmp @0 (convert @1) > > The second pattern lacks the INTEGER_CST on @1 so it doesn't match > its comment. Please do not add vertical space between pattern > comment and pattern. > > Please place patterns not at the end of match.pd but where similar > transforms are done. Like after > > /* Simplify pointer equality compares using PTA. */ > (for neeq (ne eq) > (simplify > (neeq @0 @1) > (if (POINTER_TYPE_P (TREE_TYPE (@0)) >&& ptrs_compare_unequal (@0, @1)) >{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; }))) > > please also share the (for ...) for both patterns or merge them > by changing the condition to > > (if ((POINTER_TYPE_P (TREE_TYPE (@0)) > && INTEGRAL_TYPE_P (TREE_TYPE (@1))) >|| (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >&& POINTER_TYPE_P (TREE_TYPE (@1 > Hi, Done suggested changes in this version. pr70920-4.c (test-case in patch) is now folded during ccp instead of forwprop after merging the two patterns. Passes bootstrap+test on x86_64-unknown-linux-gnu. OK for trunk ? Thanks, Prathamesh > Richard. > >> PS: Writing changelog entries for match.pd is a bit tedious. >> Should we add optional names for pattern so we can refer to them by names >> in the ChangeLog for the more complicated ones ? >> Or maybe just use comments: >> (simplify /* name */ ... 
) -;) > > That will add the fun of inventing names ;) > >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..6c2ec82 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2513,6 +2513,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && ptrs_compare_unequal (@0, @1)) { neeq == EQ_EXPR ? boolean_
warn for dead function calls [3/4] testsuite fallout
Hi,
The following test-cases broke due to the warning. However, I think the warning is right in all of these cases:

a) g++.dg/tree-ssa/invalid-dom.C: I believe the call from main() to E::bar() is a dead call?
b) libffi/testsuite/libffi.call/float.c: The call from main() to floating() is a dead call.
c) libffi/testsuite/libffi.call/float3.c: The calls from main() to floating_1() and floating_2() are dead calls.
d) libffi/testsuite/libffi.call/negint.c: The call from main() to checking () is a dead call.

Should I update the test-cases to pass -Wno-unused-value, or remove the calls?

Thanks,
Prathamesh
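To make "dead call" concrete, here is a minimal sketch of the kind of code the warning is aimed at. The functions square, caller and check_caller are hypothetical and do not come from any of the test-cases above. square has no side effects and depends only on its argument, so a compiler may infer that it is const, and a call whose result is discarded then has no observable effect.

```c
#include <assert.h>

/* Hypothetical example; not taken from the affected test-cases.  */

/* No side effects and no memory reads beyond the argument, so the
   function can be inferred to be const.  */
static int square(int x)
{
    return x * x;
}

static int caller(int v)
{
    square(v);              /* Result discarded: a dead call.  */
    return square(v) + 1;   /* Result used: not dead.  */
}

static void check_caller(void)
{
    /* Removing the dead call does not change the result.  */
    assert(caller(3) == 10);
}
```

Whether the proposed warning fires on exactly this input depends on when the pass runs relative to inlining, so treat the comments as a description of intent rather than of observed compiler output.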
[RFC] warn on dead function calls in ipa-pure-const [1/4]
Hi, The attached patch emits warnings for functions found to be pure or const by the ipa-pure-const pass. It does not warn for functions with unused return values that have been declared as pure or const by the user since this is already handled in C and C++ FE's. I have split it into parts to individually address fallouts observed during bootstrap+test. I still have to add more test-cases. Apart from that does the patch look OK ? Thanks, Prathamesh diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index a9570e4..3b8e774 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -226,6 +226,34 @@ warn_function_noreturn (tree decl) true, warned_about, "noreturn"); } +/* Emit diagnostic if the callers don't use return value. + Only to be called for pure/const function */ + +static void +warn_function_unused_ret (struct cgraph_node *node) +{ + tree decl = node->decl; + int flags = flags_from_decl_or_type (decl); + + if (flags & ECF_NORETURN) +return; + + if (!(flags & (ECF_CONST | ECF_PURE)) +|| (flags & ECF_LOOPING_CONST_OR_PURE)) +return; + + for (cgraph_edge *e = node->callers; e; e = e->next_caller) +{ + gcall *g = e->call_stmt; + if (g + && !VOID_TYPE_P (gimple_call_return_type (g)) + && gimple_call_lhs (g) == NULL) + warning_at (gimple_location (g), OPT_Wunused_value, + "Call from %s to %s has no effect", + e->caller->name (), e->callee->name ()); +} +} + /* Return true if we have a function state for NODE. */ static inline bool @@ -1475,6 +1503,8 @@ propagate_pure_const (void) /* Inline clones share declaration with their offline copies; do not modify their declarations since the offline copy may be different. */ + bool warn_unused_ret = false; + if (!w->global.inlined_to) switch (this_state) { @@ -1482,6 +1512,7 @@ propagate_pure_const (void) if (!TREE_READONLY (w->decl)) { warn_function_const (w->decl, !this_looping); + warn_unused_ret = true; if (dump_file) fprintf (dump_file, "Function found to be %sconst: %s\n", this_looping ? 
"looping " : "", @@ -1496,6 +1527,8 @@ propagate_pure_const (void) NULL, true); if (w->set_const_flag (true, this_looping)) { + if (warn_unused_ret) + warn_function_unused_ret (w); if (dump_file) fprintf (dump_file, "Declaration updated to be %sconst: %s\n", @@ -1509,6 +1542,7 @@ propagate_pure_const (void) if (!DECL_PURE_P (w->decl)) { warn_function_pure (w->decl, !this_looping); + warn_unused_ret = true; if (dump_file) fprintf (dump_file, "Function found to be %spure: %s\n", this_looping ? "looping " : "", @@ -1521,6 +1555,8 @@ propagate_pure_const (void) NULL, true); if (w->set_pure_flag (true, this_looping)) { + if (warn_unused_ret) + warn_function_unused_ret (w); if (dump_file) fprintf (dump_file, "Declaration updated to be %spure: %s\n", @@ -1808,11 +1844,14 @@ pass_local_pure_const::execute (function *fun) changed = true; } + bool warn_unused_ret = false; + switch (l->pure_const_state) { case IPA_CONST: if (!TREE_READONLY (current_function_decl)) { + warn_unused_ret = true; warn_function_const (current_function_decl, !l->looping); if (dump_file) fprintf (dump_file, "Function found to be %sconst: %s\n", @@ -1828,6 +1867,8 @@ pass_local_pure_const::execute (function *fun) } if (!skip && node->set_const_flag (true, l->looping)) { + if (warn_unused_ret) + warn_function_unused_ret (cgraph_node::get_create (current_function_decl)); if (dump_file) fprintf (dump_file, "Declaration updated to be %sconst: %s\n", l->looping ? "looping " : "", @@ -1840,6 +1881,7 @@ pass_local_pure_const::execute (function *fun) if (!DECL_PURE_P (current_function_decl)) { warn_function_pure (current_function_decl, !l->looping); + warn_unused_ret = true; if (dump_file) fprintf (dump_file, "Function found to be %spure: %s\n", l->looping ? "looping " : "", @@ -1854,6 +1896,8 @@ pass_local_pure_const::execute (function *fun) } if (!
warn for dead function calls [4/4] stor-layout.c fallout
The following is an interesting case which broke stor-layout.c. The patch warned that the following call from bit_field_mode_iterator::next_mode() to get_mode_alignment () is dead:

  /* Stop if the mode requires too much alignment. */
  if (GET_MODE_ALIGNMENT (m_mode) > m_align
      && SLOW_UNALIGNED_ACCESS (m_mode, m_align))
    break;

GET_MODE_ALIGNMENT (MODE) is just #defined as get_mode_alignment (MODE) in machmode.h.

SLOW_UNALIGNED_ACCESS (MODE, ALIGN) is #defined to STRICT_ALIGNMENT in defaults.h, and i386.h sets STRICT_ALIGNMENT to 0. So essentially it comes down to:

  if (get_mode_alignment (m_mode) > m_align && 0)
    break;

which clearly makes get_mode_alignment (m_mode) a dead call, since it's a pure function. However, if a target overrides SLOW_UNALIGNED_ACCESS (mode, align) and sets it to some runtime value, then the call won't be dead for that target.

Should we split the above into two different if conditions?

  if (GET_MODE_ALIGNMENT (m_mode) > m_align)
    if (SLOW_UNALIGNED_ACCESS (m_mode, m_align))
      break;

Thanks,
Prathamesh
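The situation can be reproduced outside GCC with a small stand-in. This is a sketch under assumptions: DEMO_SLOW_UNALIGNED_ACCESS, demo_mode_alignment, stop_combined and stop_split are invented names that only mimic the shape of the stor-layout.c code, and the alignment formula is arbitrary. With the macro defined to 0, both the combined and the split condition always evaluate to false, so the result of the pure call never influences anything.

```c
#include <assert.h>

/* Stand-in for a target macro; a target could define it to a
   runtime expression instead.  Hypothetical name.  */
#ifndef DEMO_SLOW_UNALIGNED_ACCESS
#define DEMO_SLOW_UNALIGNED_ACCESS(mode, align) 0
#endif

/* Stand-in for get_mode_alignment (): pure, arbitrary formula.  */
static unsigned demo_mode_alignment(int mode)
{
    return 8u << mode;
}

/* Shape of the original condition.  */
static int stop_combined(int mode, unsigned align)
{
    return demo_mode_alignment(mode) > align
           && DEMO_SLOW_UNALIGNED_ACCESS(mode, align);
}

/* Shape of the proposed split.  */
static int stop_split(int mode, unsigned align)
{
    if (demo_mode_alignment(mode) > align)
        if (DEMO_SLOW_UNALIGNED_ACCESS(mode, align))
            return 1;
    return 0;
}

static void check_forms_agree(void)
{
    for (int mode = 0; mode < 4; mode++)
        for (unsigned align = 8; align <= 64; align *= 2)
            assert(stop_combined(mode, align) == stop_split(mode, align));
}
```

The sketch only shows that the two forms compute the same value. It does not settle whether splitting changes what an optimizer sees after constant folding, since in the split form the inner condition is still a compile-time 0.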
warn on dead function calls [2/4] libsupc++/eh_alloc.cc fallout
Many dead-call warnings are emitted with the patch for calls to operator new in libsupc++/eh_alloc.cc. I am not sure whether they are correct or false positives, for instance:

/home/prathamesh.kulkarni/gcc-svn/trunk/libstdc++-v3/libsupc++/eh_alloc.cc:170:22: warning: Call from void* {anonymous}::pool::allocate(std::size_t) to void* operator new(std::size_t, void*) has no effect [-Wunused-value]
   new (f) free_entry;
   ^

It appears to me that new() is defined as follows in libsupc++/new:

// Default placement versions of operator new.
inline void* operator new(std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
{ return __p; }

So could it be considered a dead call, since new() doesn't have side effects and its return value is not assigned to any variable, or is the warning wrong for the above call?

Thanks,
Prathamesh
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 26 July 2016 at 17:28, Richard Biener wrote: > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: > >> On 25 July 2016 at 14:32, Richard Biener wrote: >> > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi Richard, >> >> The attached patch tries to fix PR70920. >> >> It adds your pattern from comment 1 in the PR >> >> (with additional gating on INTEGRAL_TYPE_P to avoid regressing >> >> finalize_18.f90) >> >> and second pattern, which is reverse of the first transform. >> >> I needed to update ssa-dom-branch-1.c because with patch applied, >> >> jump threading removed the second if (i != 0B) block. >> >> The dumps with and without patch for ssa-dom-branch-1.c start >> >> to differ with forwprop1: >> >> >> >> before: >> >> : >> >> _1 = temp_16(D)->code; >> >> _2 = _1 == 42; >> >> _3 = (int) _2; >> >> _4 = (long int) _3; >> >> temp_17 = (struct rtx_def *) _4; >> >> if (temp_17 != 0B) >> >> goto ; >> >> else >> >> goto ; >> >> >> >> after: >> >> : >> >> _1 = temp_16(D)->code; >> >> _2 = _1 == 42; >> >> _3 = (int) _2; >> >> _4 = (long int) _2; >> >> temp_17 = (struct rtx_def *) _4; >> >> if (_1 == 42) >> >> goto ; >> >> else >> >> goto ; >> >> >> >> I suppose the transform is correct for above test-case ? >> >> >> >> Then vrp dump shows: >> >> Threaded jump 5 --> 9 to 13 >> >> Threaded jump 8 --> 9 to 13 >> >> Threaded jump 3 --> 9 to 13 >> >> Threaded jump 12 --> 9 to 14 >> >> Removing basic block 9 >> >> basic block 9, loop depth 0 >> >> pred: >> >> if (i1_10(D) != 0B) >> >> goto ; >> >> else >> >> goto ; >> >> succ: 10 >> >> 11 >> >> >> >> So there remained two instances of if (i1_10 (D) != 0B) in dom2 dump file, >> >> and hence needed to update the test-case. >> >> >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. >> >> OK to commit ? 
>> > >> > --- a/gcc/match.pd >> > +++ b/gcc/match.pd >> > @@ -3408,3 +3408,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> > { CONSTRUCTOR_ELT (ctor, idx / k)->value; }) >> > (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; } >> >@1 { bitsize_int ((idx % k) * width); }) >> > + >> > +/* PR70920: Transform (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST. >> > */ >> > + >> > +(for cmp (ne eq) >> > + (simplify >> > + (cmp (convert@2 @0) INTEGER_CST@1) >> > + (if (POINTER_TYPE_P (TREE_TYPE (@0)) >> > + && INTEGRAL_TYPE_P (TREE_TYPE (@2))) >> > >> > you can use @1 here and omit @2. >> > >> > + (cmp @0 (convert @1) >> > + >> > +/* Reverse of the above case: >> > + x has integral_type, CST is a pointer constant. >> > + Transform (typeof CST)x eq/ne CST to x eq/ne (typeof x) CST. */ >> > + >> > +(for cmp (ne eq) >> > + (simplify >> > + (cmp (convert @0) @1) >> > + (if (POINTER_TYPE_P (TREE_TYPE (@1)) >> > + && INTEGRAL_TYPE_P (TREE_TYPE (@0))) >> > +(cmp @0 (convert @1) >> > >> > The second pattern lacks the INTEGER_CST on @1 so it doesn't match >> > its comment. Please do not add vertical space between pattern >> > comment and pattern. >> > >> > Please place patterns not at the end of match.pd but where similar >> > transforms are done. Like after >> > >> > /* Simplify pointer equality compares using PTA. */ >> > (for neeq (ne eq) >> > (simplify >> > (neeq @0 @1) >> > (if (POINTER_TYPE_P (TREE_TYPE (@0)) >> >&& ptrs_compare_unequal (@0, @1)) >> >{ neeq == EQ_EXPR ? boolean_false_node : boolean_true_node; }))) >> > >> > please also share the (for ...) for both patterns or merge them >> > by changing the condition to >> > >> > (if ((POINTER_TYPE_P (TREE_TYPE (@0)) >> > && INTEGRAL_TYPE_P (TREE_TYPE (@1))) >> >|| (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> >&& POINTER_TYPE_P (TREE_TYPE (@1 >> > >> Hi, >> Done suggested changes in this version. >> pr70920-4.c (test-case in patch) is now folded during ccp instead of >> forwprop after merging the >> two patterns. 
>> Passes bootstrap+test on x86_64-unknown-linux-gnu. >> OK for trunk ? > > (please paste in ChangeLog entries rather than attaching them). Will do henceforth. > > In gcc.dg/tree-ssa/ssa-dom-branch-1.c you need to adjust the comment > before the dump-scan you adjust. > > Ok with that change. Thanks, committed as r238754 after adjusting the comment in ssa-dom-branch-1.c. Thanks, Prathamesh > > Thanks, > Richard.
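The two directions of the PR70920 transform can be illustrated at the source level. This is a sketch, not taken from the patch: the function names are hypothetical, and it relies on GCC's implementation-defined pointer/integer conversions, where a null pointer converts to 0 and back. Each pair of functions compares the same value once through a cast and once directly against a converted constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; names are not from the patch.  */

/* (intptr_t) x eq CST ...  */
static int ptr_cmp_via_integer(const void *p)
{
    return (intptr_t) p == 0;
}

/* ... becomes x eq (typeof x) CST.  */
static int ptr_cmp_direct(const void *p)
{
    return p == (const void *) 0;
}

/* Reverse case: x is integral, CST is a pointer constant.  */
static int int_cmp_via_pointer(intptr_t i)
{
    return (void *) i == (void *) 0;
}

static int int_cmp_direct(intptr_t i)
{
    return i == 0;
}

static int demo_object;

static void check_pr70920_forms(void)
{
    assert(ptr_cmp_via_integer(NULL) == ptr_cmp_direct(NULL));
    assert(ptr_cmp_via_integer(&demo_object) == ptr_cmp_direct(&demo_object));
    assert(int_cmp_via_pointer(0) == int_cmp_direct(0));
    assert(int_cmp_via_pointer(1) == int_cmp_direct(1));
}
```

The checks only confirm that both spellings agree on a host where intptr_t round-trips pointers, which is the assumption the pattern's POINTER_TYPE_P / INTEGRAL_TYPE_P gating is built around.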
Re: warn for dead function calls [3/4] testsuite fallout
On 26 July 2016 at 17:06, Richard Biener wrote: > On Tue, 26 Jul 2016, Prathamesh Kulkarni wrote: > >> Hi, >> The following test-cases broke due to the warning. >> I think however the warning is right for all the cases: >> >> a) g++.dg/tree-ssa/invalid-dom.C: >> I believe the call from main() to E::bar() is dead call ? > > Can't find this testcase. oops, it's dom-invalid.C -;) > >> b) libffi/testsuite/libffi.call/float.c: >> Call from main() to floating() is dead call. >> >> c) libffi/testsuite/libffi.call/float3.c: >> Calls from main() to floating_1() and floating_2() are dead calls. >> >> d) libffi/testsuite/libffi.call/negint.c: >> Call from main() to checking () is dead call. > > Looks like dead calls in the above but libffi is maintained upstream > and just copied to GCC. Please raise the issue upstream. Ok I will raise the issue upstream. Thanks, Prathamesh > > Richard. > >> Should I update the test-cases to pass -Wno-unsued-value, >> or remove the calls ? >> >> Thanks, >> Prathamesh >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)
Re: warn for dead function calls [4/4] stor-layout.c fallout
On 26 July 2016 at 17:07, Richard Biener wrote: > On Tue, 26 Jul 2016, Prathamesh Kulkarni wrote: > >> The following is an interesting case which broke stor-layout.c. >> The patch warned for the following call to be dead from >> bit_field_mode_iterator::next_mode() to get_mode_alignment (): >> >> /* Stop if the mode requires too much alignment. */ >> if (GET_MODE_ALIGNMENT (m_mode) > m_align >> && SLOW_UNALIGNED_ACCESS (m_mode, m_align)) >> break; >> >> GET_MODE_ALIGNMENT (MODE) is just #defined as get_mode_alignment (MODE) >> in machmode.h >> >> SLOW_UNALIGNED_ACCESS (MODE, ALIGN) is #defined to STRICT_ALIGNMENT >> in defaults.h, and i386.h sets STRICT_ALIGNMENT to 0. >> So essentially it comes down to: >> >> if (get_mode_alignment (m_mode) > m_align && 0) >> break; >> >> which clearly makes get_mode_alignment(m_mode) a dead call >> since it's a pure function. >> However if a target overrides SLOW_UNALIGNED_ACCESS(mode, align) >> and sets it to some runtime value, then the call won't be dead for that >> target. >> >> Should we split the above in two different if conditions ? >> if (GET_MODE_ALIGNMENT (m_mode) > m_align) >> if (SLOW_UNALIGNED_ACCESS (m_mode, m_align)) >> break; > > I'm surprised it's only one case that you hit ;) Be prepared for > other targets to be broken similarly. > > This hints at the general issue of issueing warnings after optimization, > they can easily become false positives. Hmm, this would indeed give rise to such false positives :/ I wonder whether we should restrict the warning only for cases when the call is outermost expression ? Not sure how to go about that. Maybe add a new flag to tree_exp for CALL_EXPR say OUTERMOST_CALL_P, which is set by FE when the call is determined to be outermost expression ? Thanks, Prathamesh > > Richard.
Re: [PATCH] Replacing gcc's dependence on libiberty's fnmatch to gnulib's fnmatch
On 26 July 2016 at 19:21, ayush goel wrote: > On 26 July 2016 at 3:38:59 AM, Manuel López-Ibáñez > (lopeziba...@gmail.com) wrote: >> On 25 July 2016 at 18:18, ayush goel wrote: >> > On top of the previously filed patch for importing gnulib (the link >> > isn’t available on the archive yet, however this contains some of the >> > information: >> > http://gcc.1065356.n5.nabble.com/Importing-gnulib-into-the-gcc-tree-td1275807.html#a1279573) >> > now I have replaced another function from libiberty with the >> > corresponding version from gnulib. >> > Even though in both OSX and GNU/Linux, fnmatch is provided by the GNU >> > libc already, so the copy in libiberty is not used in your systems. >> > However since the objective is to replace whatever functions can be >> > leveraged by gnulib, these changes have been made. >> >> Why the change from "fnmatch.h" to ? > > Gnulib doesn’t contain a header for fnmatch. It itself relies on > glib’c fnmatch.h > >> >> Also, are the files in gnulib and libiberty semantically identical? >> The wiki page does not say anything about this. How did you check >> this? > > Well the online docs for libiberty and gnulib claim the same > definition for fnmatch. Apart from this I’ve manually gone through the > source code and they seem to be semantically similar. > Also the fact that the system builds fine and the tests also execute > fine could serve as a manifestation for the fact that they are > semantically same. >> >> GCC can run on other systems besides OSX and GNU/Linux, how can you >> test that your change does not break anything on those systems? >> > Well I have access to these two systems only. How would you suggest I > test my patches on all possible systems? Maybe building contrib/config-list.mk could help ? AFAIK, it will cross build the gcc tree for different targets, but not run the testsuite, however I could be wrong. Thanks, Prathamesh > >> Cheers, >> >> Manuel. >> > > -Ayush
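On the question of how semantic equivalence could be checked, one option is a small table of spot checks that is run against each implementation in turn. The sketch below is an assumption about how such a check might look, not something from the patch: the table and function names are hypothetical, and as written it exercises whatever fnmatch() the system's <fnmatch.h> provides. Comparing libiberty against gnulib would mean building the same table against each copy.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* One expected behaviour of fnmatch().  Hypothetical type.  */
struct fnmatch_case {
    const char *pattern;
    const char *string;
    int flags;
    int should_match;
};

/* Expectations follow the POSIX description of fnmatch().  */
static const struct fnmatch_case demo_cases[] = {
    { "*.c",    "foo.c",     0,            1 },
    { "*.c",    "foo.h",     0,            0 },
    { "*",      "a/b",       0,            1 },
    { "*",      "a/b",       FNM_PATHNAME, 0 },  /* '*' must not cross '/' */
    { "*.c",    ".hidden.c", FNM_PERIOD,   0 },  /* leading '.' is special */
    { "[a-c]?", "b9",        0,            1 },
};

/* Count table entries where the implementation disagrees.  */
static int count_mismatches(void)
{
    int mismatches = 0;

    for (size_t i = 0; i < sizeof demo_cases / sizeof demo_cases[0]; i++) {
        const struct fnmatch_case *c = &demo_cases[i];
        int matched = fnmatch(c->pattern, c->string, c->flags) == 0;

        if (matched != c->should_match)
            mismatches++;
    }
    return mismatches;
}

static void check_fnmatch_cases(void)
{
    assert(count_mismatches() == 0);
}
```

A table like this cannot prove two implementations identical, but a disagreement on any row would be a concrete counterexample worth reporting.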
Re: [PR71078] x / abs(x) -> copysign (1.0, x)
On 26 July 2016 at 17:41, Richard Biener wrote: > On Mon, 25 Jul 2016, Prathamesh Kulkarni wrote: > >> Hi, >> The attached patch tries to fix PR71078. >> I am not sure if I have got the converts right. >> I put (convert? @0) and (convert1? (abs @1)) >> to match for cases when operands's types may >> be different from outermost type like in pr71078-3.c > > Types of RDIV_EXPR have to be the same so as you have a > match on @0 the converts need to be either both present > or not present. > > + (if (FLOAT_TYPE_P (type) > > as you special-case several types below please use SCALAR_FLOAT_TYPE_P > here. > > + && ! HONOR_NANS (type) > + && ! HONOR_INFINITIES (type)) > + (switch > +(if (type == float_type_node) > + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0))) > > please use if (types_match (type, float_type_node)) instead of > pointer equality. I _think_ you can do better here by using > IFN_COPYSIGN but possibly only so if the target supports it. > Richard - this seems to be the first pattern in need of > generating a builtin where no other was there to match the type > to - any idea how we can safely use the internal function here? > I see those do not have an expander that would fall back to > expanding the regular builtin, correct? > > Please place the pattern next to > > /* Optimize -A / A to -1.0 if we don't care about >NaNs or Infinities. */ > (simplify > (rdiv:C @0 (negate @0)) > (if (FLOAT_TYPE_P (type) > && ! HONOR_NANS (type) > && ! HONOR_INFINITIES (type)) > { build_minus_one_cst (type); })) > > where it logically belongs. Hi, Is this version OK ? Bootstrap + test in progress on x86_64-unknown-linux-gnu. Thanks, Prathamesh > > Thanks, > Richard. > >> test-case (included in patch). >> Bootstrap+test in progress on x86_64-unknown-linux-gnu. 
>> >> Thanks, >> Prathamesh >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) 2016-07-27 Prathamesh Kulkarni PR middle-end/71078 * match.pd (x / abs(x) -> copysign(1.0, x)): New pattern. testsuite/ * gcc.dg/tree-ssa/pr71078-1.c: New test-case. * gcc.dg/tree-ssa/pr71078-2.c: Likewise. * gcc.dg/tree-ssa/pr71078-3.c: Likewise. diff --git a/gcc/match.pd b/gcc/match.pd index 21bf617..2fd898a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -195,6 +195,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && ! HONOR_INFINITIES (type)) { build_minus_one_cst (type); })) +/* PR71078: x / abs(x) -> copysign (1.0, x) */ +(simplify + (rdiv:C (convert? @0) (convert? (abs @0))) + (if (SCALAR_FLOAT_TYPE_P (type) + && ! HONOR_NANS (type) + && ! HONOR_INFINITIES (type)) + (switch +(if (types_match (type, float_type_node)) + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0))) +(if (types_match (type, double_type_node)) + (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0))) +(if (types_match (type, long_double_type_node)) + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0)) + /* In IEEE floating point, x/1 is not equivalent to x for snans. 
*/ (simplify (rdiv @0 real_onep) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c new file mode 100644 index 000..6204c14 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = x / t1; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = x / t1; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); + long double t2 = x / t1; + return t2; +} + +/* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ +/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c new file mode 100644 index 000..96485af --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-forwprop-details" } */ + +#include + +float f1(float x) +{ + float t1 = fabsf (x); + float t2 = t1 / x; + return t2; +} + +double f2(double x) +{ + double t1 = fabs (x); + double t2 = t1 / x; + return t2; +} + +long double f3 (long double x) +{ + long double t1 = fabsl (x); +
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 28 July 2016 at 15:58, Andreas Schwab wrote:
> On Mo, Jul 25 2016, Prathamesh Kulkarni
> wrote:
>
>> diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c
>> b/gcc/testsuite/gcc.dg/pr70920-4.c
>> new file mode 100644
>> index 000..dedb895
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr70920-4.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-ccp-details -Wno-int-to-pointer-cast" } */
>> +
>> +#include
>> +
>> +void f1();
>> +void f2();
>> +
>> +void
>> +foo (int a)
>> +{
>> + void *cst = 0;
>> + if ((int *) a == cst)
>> +{
>> + f1 ();
>> + if (a)
>> + f2 ();
>> +}
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* ==
>> 0\\)" "ccp1" } } */
>
> This fails on all ilp32 platforms.

Oops, sorry for the breakage. With -m32, the pattern is applied during forwprop1 rather than ccp1. I wonder, though, why ccp1 fails to fold the pattern with -m32? Looking at the dumps:

without -m32:
input to ccp1 pass:
:
cst_4 = 0B;
_1 = (long int) a_5(D);
_2 = (void *) _1;
if (cst_4 == _2)
goto ;
else
goto ;

ccp1 pass dump shows:
Substituting values and folding statements
Folding statement: _1 = (long int) a_5(D);
Not folded
Folding statement: _2 = (void *) _1;
Not folded
Folding statement: if (cst_4 == _2)
which is likely CONSTANT
Applying pattern match.pd:2537, gimple-match.c:6530
gimple_simplified to if (_1 == 0)
Folded into: if (_1 == 0)

with -m32:
input to ccp1 pass:
:
cst_3 = 0B;
a.0_1 = (void *) a_4(D);
if (cst_3 == a.0_1)
goto ;
else
goto ;

ccp1 pass dump shows:
Substituting values and folding statements
Folding statement: a.0_1 = (void *) a_4(D);
Not folded
Folding statement: if (cst_3 == a.0_1)
which is likely CONSTANT
Folded into: if (a.0_1 == 0B)

I am not able to understand why it doesn't fold it to if (a_4(D) == 0)? forwprop1 folds a.0_1 == 0B to a_4(D) == 0.

I suppose the test-case would need to scan ccp1 for non-ilp32 targets and forwprop1 for ilp32 targets. How do I update the test-case to reflect this?

Thanks, Prathamesh > > Andreas. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
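One way the question above could be approached is with target selectors on the dg-final directives. The following is only a sketch of that idea, not the change that was actually committed. The ccp1 regex is the one from the original test-case; the forwprop1 regex is an assumption derived from the -m32 dump text quoted above and would need to be checked against a real dump, as would the added -fdump-tree-forwprop-details option.

```c
/* { dg-do compile } */
/* Assumption: the forwprop dump must be requested as well.  */
/* { dg-options "-O2 -fdump-tree-ccp-details -fdump-tree-forwprop-details -Wno-int-to-pointer-cast" } */

/* ... body of the test-case unchanged ... */

/* Targets where the int is first widened before the pointer cast:
   the fold shows up in ccp1.  */
/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* == 0\\)" "ccp1" { target { ! ilp32 } } } } */

/* ilp32 targets: the fold shows up in forwprop1.
   The regex below is assumed, not verified.  */
/* { dg-final { scan-tree-dump "gimple_simplified to if \\(a_\[0-9\]*\\(D\\) == 0\\)" "forwprop1" { target ilp32 } } } */
```

Note that { ! ilp32 } also covers targets that are neither ilp32 nor lp64, so a stricter selector such as lp64 may be the better fit for the first directive.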
Re: RFC [1/2] divmod transform
On 8 June 2016 at 19:53, Richard Biener wrote: > On Fri, 3 Jun 2016, Jim Wilson wrote: > >> On Mon, May 30, 2016 at 12:45 AM, Richard Biener wrote: >> > Joseph - do you know sth about why there's not a full set of divmod >> > libfuncs in libgcc? >> >> Because udivmoddi4 isn't a libfunc, it is a helper function for the >> div and mov libfuncs. Since we can compute the signed div and mod >> results from udivmoddi4, there was no need to also add a signed >> version of it. It was given a libfunc style name so that we had the >> option of making it a libfunc in the future, but that never happened. >> There was no support for calling any divmod libfunc until it was added >> as a special case to call an ARM library (not libgcc) function. This >> happened here >> >> 2004-08-09 Mark Mitchell >> >> * config.gcc (arm*-*-eabi*): New target. >> * defaults.h (TARGET_LIBGCC_FUNCS): New macro. >> (TARGET_LIB_INT_CMP_BIASED): Likewise. >> * expmed.c (expand_divmod): Try a two-valued divmod function as a >> last resort. >> ... >> * config/arm/arm.c (arm_init_libfuncs): New function. >> (arm_compute_initial_eliminatino_offset): Return HOST_WIDE_INT. >> (TARGET_INIT_LIBFUNCS): Define it. >> ... >> >> Later, two ports added their own divmod libfuncs, but I don't see any >> evidence that they were ever used, since there is no support for >> calling divmod other than the expand_divmod last resort code that only >> triggers for ARM. >> >> It is only now that Prathamesh is adding gimple support for divmod >> operations that we need to worry about getting this right, without >> breaking the existing ARM library support or the existing udivmoddi4 >> support. > > Ok, so as he is primarily targeting the special arm divmod libcall > I suppose we can live with special-casing libcall handling to > udivmoddi3. It would be nice to not lie about divmod availablilty > as libcall though... 
- it looks like the libcall is also guarded > on TARGET_HAS_NO_HW_DIVIDE (unless it was available historically > like on x86). > > So not sure where to go from here. Hi, I have attached patch, which is rebased on trunk. Needed to update divmod-7.c, which now gets transformed to divmod thanks to your code-hoisting patch -;) We still have the issue of optab_libfunc() returning non-existent libcalls. As in previous patch, I am checking explicitly for "__udivmoddi4", with a FIXME note. I hope that's okay for now ? Bootstrapped and tested on x86_64-unknown-linux-gnu, armv8l-unknown-linux-gnueabihf. Bootstrap+test in progress on i686-linux-gnu. Cross-tested on arm*-*-*. Thanks, Prathamesh > > Richard. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 83bd9ab..e4815cf 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -7010,6 +7010,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to the hook implementation for how different fusion types are supported. @end deftypefn +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem}) +Define this hook if the port does not have hardware div and divmod insn for +the given mode but has divmod libfunc, which is incompatible +with libgcc2.c:__udivmoddi4 +@end deftypefn + @node Sections @section Dividing the Output into Sections (Texts, Data, @dots{}) @c the above section title is WAY too long. maybe cut the part between diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index a72c3d8..3efaf4d 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4864,6 +4864,8 @@ them: try the first ones in this list first. @hook TARGET_SCHED_FUSION_PRIORITY +@hook TARGET_EXPAND_DIVMOD_LIBFUNC + @node Sections @section Dividing the Output into Sections (Texts, Data, @dots{}) @c the above section title is WAY too long. 
maybe cut the part between diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 49f3495..18876ce 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2326,6 +2326,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, #define direct_mask_store_optab_supported_p direct_optab_supported_p #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p +/* Expand DIVMOD() using: + a) optab handler for udivmod/sdivmod if it is available. + b) If optab_handler doesn't exist, Generate call to +optab_libfunc for udivmod/sdivmod. */ + +static void +expand_DIVMOD (internal_fn, gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree arg0 = gimple_call_arg (stmt, 0); + tree arg1 = gimple_call_arg (stmt, 1); + + gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); + tree type = TREE_TYPE (TREE_TYPE (lhs)); + machine_mode mode = TYPE_MODE (type); + bool unsignedp = TYPE_UNSIGNED (type); + optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab; + + rtx op0 = expand_normal (arg0); + rtx op1 = expand_normal (arg1); + rtx target = exp
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
On 27 July 2016 at 18:56, Ramana Radhakrishnan wrote: > On Wed, May 25, 2016 at 1:49 PM, Prathamesh Kulkarni > wrote: >> On 23 May 2016 at 14:28, Prathamesh Kulkarni >> wrote: >>> Hi, >>> This patch overrides expand_divmod_libfunc for ARM port and adds test-cases. >>> I separated the SImode tests into separate file from DImode tests >>> because certain arm configs (cortex-15) have hardware div insn for >>> SImode but not for DImode, >>> and for that config we want SImode tests to be disabled but not DImode >>> tests. >>> The patch therefore has two target-effective checks: divmod and >>> divmod_simode. >>> Cross-tested on arm*-*-*. >>> Bootstrap+test on arm-linux-gnueabihf in progress. >>> Does this patch look OK ? >> Hi, >> This version adds couple of more test-cases and fixes typo in >> divmod-3-simode.c, divmod-4-simode.c >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Prathamesh > > From the patch (snipped out unnecessary parts) > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 201aeb4..3bbf11b 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > > > > + gcc_assert (quotient); > + gcc_assert (remainder); > + > > There's a trailing white space here. > > + *quot_p = quotient; > + *rem_p = remainder; > +} > > > > +# For ARM configs defining __ARM_ARCH_EXT_IDIV__, disable > divmod_simode test-cases > > Very unhelpful comment ... > > For versions of the architecture where there exists a DIV instruction, > the divmod helper function is not used, disable the software divmod > optimization. > > > + > +proc check_effective_target_arm_divmod_simode { } { > +return [check_no_compiler_messages arm_divmod assembly { > + #ifdef __ARM_ARCH_EXT_IDIV__ > + #error has div insn > + #endif > + int i; > +}] > +} > + > +proc check_effective_target_divmod { } { > > Missing comment above. > > +#TODO: Add checks for all targets that have either hardware divmod insn > +# or define libfunc for divmod. 
> +if { [istarget arm*-*-*] > +|| [istarget x86_64-*-*] } { > + return 1 > +} > +return 0 > +} > > > > > > The new helper functions need documentation in doc/sourcebuild.texi > > Please repost with the doc changes, otherwise this is OK from my side. Hi Ramana, Thanks for the review, I have updated the patch with your suggestions. Cross-tested on arm*-*-*. I came across following issue while bootstrapping on armv8l-unknown-linux-gnueabihf: All the divmod-*-simode.c tests which have /* { dg-require-effective-target divmod_simode } */ appear UNSUPPORTED. That's because this config appears to define __ARM_ARCH_EXT_IDIV__ however idiv appears not to be present. For instance __aeabi_div is called to perform division for the following test-case: int f(int x, int y) { int r = x / y; return r; } Compiling with -O2: f: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 push{r4, lr} bl __aeabi_idiv pop {r4, pc} I assumed if __ARM_ARCH_EXT_IDIV was defined, then there should have been idiv instead of call to __aeabi_div or am I missing something ? Um I had configured with --with-tune=cortex-a9. Is that incorrect for armv8l-unknown-linux-gnueabihf ? xgcc -v: Using built-in specs. COLLECT_GCC=armhf-bootstrap-build/gcc/xgcc Target: armv8l-unknown-linux-gnueabihf Configured with: ../gcc/configure --enable-languages=c,c++,fortran --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard --with-mode=thumb --enable-multiarch --with-tune=cortex-a9 --disable-multilib Thread model: posix gcc version 7.0.0 20160727 (experimental) (GCC) Thanks, Prathamesh > > Thanks, > Ramana diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 195de48..f449e46 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -61,6 +61,7 @@ #include "builtins.h" #include "tm-constrs.h" #include "rtl-iter.h" +#include "optabs-libfuncs.h" /* This file should be included last. 
*/ #include "target-def.h" @@ -299,6 +300,7 @@ static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void); static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*); static bool arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT, const_tree); +static void arm_expand_divmod_libfunc (bool, machine_mode, rtx, rtx, rtx *, rtx *); /* Table of machine attributes. */ @@ -729,6 +731,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_SCHED_FUSION_PRIORITY #define TARGET_SCHED_FUSION_PRIORITY arm_sch
divmod transform: add test-cases
Hi, The following patch adds test-cases for divmod transform. I separated the SImode tests into separate file from DImode tests because certain arm configs (cortex-15) have hardware div insn for SImode but not for DImode, and for that config we want SImode tests to be disabled but not DImode tests. The patch therefore has two target-effective checks: divmod and divmod_simode. Is it OK for trunk ? Thanks, Prathamesh diff --git a/gcc/testsuite/gcc.dg/divmod-1-simode.c b/gcc/testsuite/gcc.dg/divmod-1-simode.c new file mode 100644 index 000..7405f66 --- /dev/null +++ b/gcc/testsuite/gcc.dg/divmod-1-simode.c @@ -0,0 +1,22 @@ +/* { dg-require-effective-target divmod_simode } */ +/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */ +/* div dominates mod. */ + +extern int cond; +void foo(void); + +#define FOO(smalltype, bigtype, no) \ +bigtype f_##no(smalltype x, bigtype y) \ +{ \ + bigtype q = x / y; \ + if (cond) \ +foo (); \ + bigtype r = x % y; \ + return q + r; \ +} + +FOO(int, int, 1) +FOO(int, unsigned, 2) +FOO(unsigned, unsigned, 5) + +/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/divmod-1.c b/gcc/testsuite/gcc.dg/divmod-1.c new file mode 100644 index 000..40aec74 --- /dev/null +++ b/gcc/testsuite/gcc.dg/divmod-1.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target divmod } */ +/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */ +/* div dominates mod. 
*/ + +extern int cond; +void foo(void); + +#define FOO(smalltype, bigtype, no) \ +bigtype f_##no(smalltype x, bigtype y) \ +{ \ + bigtype q = x / y; \ + if (cond) \ +foo (); \ + bigtype r = x % y; \ + return q + r; \ +} + +FOO(int, long long, 3) +FOO(int, unsigned long long, 4) +FOO(unsigned, long long, 6) +FOO(unsigned, unsigned long long, 7) +FOO(long long, long long, 8) +FOO(long long, unsigned long long, 9) +FOO(unsigned long long, unsigned long long, 10) + +/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/divmod-2-simode.c b/gcc/testsuite/gcc.dg/divmod-2-simode.c new file mode 100644 index 000..7c8313b --- /dev/null +++ b/gcc/testsuite/gcc.dg/divmod-2-simode.c @@ -0,0 +1,22 @@ +/* { dg-require-effective-target divmod_simode } */ +/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */ +/* mod dominates div. */ + +extern int cond; +void foo(void); + +#define FOO(smalltype, bigtype, no) \ +bigtype f_##no(smalltype x, bigtype y) \ +{ \ + bigtype r = x % y; \ + if (cond) \ +foo (); \ + bigtype q = x / y; \ + return q + r; \ +} + +FOO(int, int, 1) +FOO(int, unsigned, 2) +FOO(unsigned, unsigned, 5) + +/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/divmod-2.c b/gcc/testsuite/gcc.dg/divmod-2.c new file mode 100644 index 000..6a2216c --- /dev/null +++ b/gcc/testsuite/gcc.dg/divmod-2.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target divmod } */ +/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */ +/* mod dominates div. 
*/ + +extern int cond; +void foo(void); + +#define FOO(smalltype, bigtype, no) \ +bigtype f_##no(smalltype x, bigtype y) \ +{ \ + bigtype r = x % y; \ + if (cond) \ +foo (); \ + bigtype q = x / y; \ + return q + r; \ +} + +FOO(int, long long, 3) +FOO(int, unsigned long long, 4) +FOO(unsigned, long long, 6) +FOO(unsigned, unsigned long long, 7) +FOO(long long, long long, 8) +FOO(long long, unsigned long long, 9) +FOO(unsigned long long, unsigned long long, 10) + +/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/divmod-3-simode.c b/gcc/testsuite/gcc.dg/divmod-3-simode.c new file mode 100644 index 000..6f0f63d --- /dev/null +++ b/gcc/testsuite/gcc.dg/divmod-3-simode.c @@ -0,0 +1,20 @@ +/* { dg-require-effective-target divmod_simode } */ +/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */ +/* div comes before mod in same bb. */ + +extern int cond; +void foo(void); + +#define FOO(smalltype, bigtype, no) \ +bigtype f_##no(smalltype x, bigtype y) \ +{ \ + bigtype q = x / y; \ + bigtype r = x % y; \ + return q + r; \ +} + +FOO(int, int, 1) +FOO(int, unsigned, 2) +FOO(unsigned, unsigne
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
On 28 July 2016 at 20:14, Ramana Radhakrishnan wrote: > >> appear UNSUPPORTED. >> That's because this config appears to define >> __ARM_ARCH_EXT_IDIV__ however idiv appears not to be present. >> >> For instance __aeabi_div is called to perform >> division for the following test-case: >> int f(int x, int y) >> { >> int r = x / y; >> return r; >> } >> >> Compiling with -O2: >> f: >> @ args = 0, pretend = 0, frame = 0 >> @ frame_needed = 0, uses_anonymous_args = 0 >> push{r4, lr} >> bl __aeabi_idiv >> pop {r4, pc} >> >> I assumed if __ARM_ARCH_EXT_IDIV was defined, then >> there should have been idiv instead of call to __aeabi_div >> or am I missing something ? >> >> Um I had configured with --with-tune=cortex-a9. Is that incorrect for >> armv8l-unknown-linux-gnueabihf ? > > --with-tune shouldn't make a difference to code generation settings. The code > generation you are showing is certainly odd for this testcase - and not > something I can reproduce on pristine trunk - so sounds like something's > broken by your patch . You should be seeing an sdiv in this case in the > output - Look at the .arch directive at the top of your file - maybe that > gives you a clue in terms of making sure that you had configured the > toolchain correctly. Hi, There is no .arch in the assembly however there's .cpu arm10dtmi at the top, full assembly: http://pastebin.com/6tzckiG0 With pristine trunk (r238800), I still get __aeabi_idiv for the above test-case. config opts: --enable-languages=c,c++ --target=armv8l-linux-gnueabihf --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard --with-mode=thumb --enable-multiarch --disable-multilib Tried with native stage-1 build and cross build. I verified that __ARM_ARCH_EXT_IDIV__ is defined, with following test-case, which fails to compile. #ifdef __ARM_ARCH_EXT_IDIV__ #error "has div insn" #endif int x; Thanks, Prathamesh > > > regards > Ramana > >> >> xgcc -v: >> Using built-in specs. 
>> COLLECT_GCC=armhf-bootstrap-build/gcc/xgcc >> Target: armv8l-unknown-linux-gnueabihf >> Configured with: ../gcc/configure --enable-languages=c,c++,fortran >> --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard >> --with-mode=thumb --enable-multiarch --with-tune=cortex-a9 >> --disable-multilib >> Thread model: posix >> gcc version 7.0.0 20160727 (experimental) (GCC) >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Ramana >
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
On 28 July 2016 at 20:39, Richard Earnshaw wrote:
> On 28/07/16 14:36, Prathamesh Kulkarni wrote:
>> Um I had configured with --with-tune=cortex-a9. Is that incorrect for
>> armv8l-unknown-linux-gnueabihf ?
>
> Why on earth would you want to generate code for ARMv8 and then tune for
> best performance on a core that can only run ARMv7?

Oops, I realized later that was a mistake, sorry about that.

Regards,
Prathamesh
>
> R.
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 28 July 2016 at 19:18, Richard Biener wrote: > On Thu, 28 Jul 2016, Prathamesh Kulkarni wrote: > >> On 28 July 2016 at 15:58, Andreas Schwab wrote: >> > On Mo, Jul 25 2016, Prathamesh Kulkarni >> > wrote: >> > >> >> diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c >> >> b/gcc/testsuite/gcc.dg/pr70920-4.c >> >> new file mode 100644 >> >> index 000..dedb895 >> >> --- /dev/null >> >> +++ b/gcc/testsuite/gcc.dg/pr70920-4.c >> >> @@ -0,0 +1,21 @@ >> >> +/* { dg-do compile } */ >> >> +/* { dg-options "-O2 -fdump-tree-ccp-details -Wno-int-to-pointer-cast" } >> >> */ >> >> + >> >> +#include >> >> + >> >> +void f1(); >> >> +void f2(); >> >> + >> >> +void >> >> +foo (int a) >> >> +{ >> >> + void *cst = 0; >> >> + if ((int *) a == cst) >> >> +{ >> >> + f1 (); >> >> + if (a) >> >> + f2 (); >> >> +} >> >> +} >> >> + >> >> +/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* == >> >> 0\\)" "ccp1" } } */ >> > >> > This fails on all ilp32 platforms. >> Oops, sorry for the breakage. >> With -m32, the pattern is applied during forwprop1 rather than ccp1. >> I wonder though why ccp1 fails to fold the pattern with -m32 ? 
>> Looking at the dumps: >> >> without -m32: >> input to ccp1 pass: >> : >> cst_4 = 0B; >> _1 = (long int) a_5(D); >> _2 = (void *) _1; >> if (cst_4 == _2) >> goto ; >> else >> goto ; >> >> cc1 pass dump shows: >> Substituting values and folding statements >> >> Folding statement: _1 = (long int) a_5(D); >> Not folded >> Folding statement: _2 = (void *) _1; >> Not folded >> Folding statement: if (cst_4 == _2) >> which is likely CONSTANT >> Applying pattern match.pd:2537, gimple-match.c:6530 >> gimple_simplified to if (_1 == 0) >> Folded into: if (_1 == 0) >> >> with -m32: >> input to ccp1 pass: >> : >> cst_3 = 0B; >> a.0_1 = (void *) a_4(D); >> if (cst_3 == a.0_1) >> goto ; >> else >> goto ; >> >> ccp1 pass dump shows: >> Substituting values and folding statements >> >> Folding statement: a.0_1 = (void *) a_4(D); >> Not folded >> Folding statement: if (cst_3 == a.0_1) >> which is likely CONSTANT >> Folded into: if (a.0_1 == 0B) >> >> I am not able to understand why it doesn't fold it to >> if (a_4(D) == 0) ? >> forwprop1 folds a.0_1 == 0B to a_4(D) == 0. > > It's because CCP folds with follow-single-use edges but the > match-and-simplify code uses a single callback to valueize and > decide whether its valid to follow the SSA edge. I did have some > old patches trying to fix that but never followed up on those. Thanks for the explanation. > >> I suppose the test-case would need to scan ccp1 for non-ilp targets >> and forwprop1 for >> ilp targets. How do update the test-case to reflect this ? > > It's simpler to verify that at some point (forwprop) we have the > expected IL rather than testing for the match debug prints. In forwprop dump, For m32, we have if (a_4(D) == 0) and without m32: if (_1 == 0) So need to match either a default def or anonymous name in the test-case, which I am having a bit of trouble writing regex for. In the patch i simply chose to match "== 0\\)", not sure if that's a good idea. 
Also how do I update the test-case so that it gets tested twice, once with -m32 and once without ? Thanks, Prathamesh > > Richard. > >> Thanks, >> Prathamesh >> > >> > Andreas. >> > >> > -- >> > Andreas Schwab, SUSE Labs, sch...@suse.de >> > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 >> > "And now for something completely different." >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c b/gcc/testsuite/gcc.dg/pr70920-4.c index dedb895..035c3cb 100644 --- a/gcc/testsuite/gcc.dg/pr70920-4.c +++ b/gcc/testsuite/gcc.dg/pr70920-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-ccp-details -Wno-int-to-pointer-cast" } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details -Wno-int-to-pointer-cast" } */ #include @@ -18,4 +18,4 @@ foo (int a) } } -/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* == 0\\)" "ccp1" } } */ +/* { dg-final { scan-tree-dump "== 0\\)" "forwprop1" } } */
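One way to think about the regex question above is to write an alternation that accepts either an anonymous SSA name (`_1`) or a default definition (`a_4(D)`). The sketch below checks such a pattern with POSIX extended regular expressions in C; it is only an illustration. The pattern and the helper name `matches_folded_cond` are assumptions, DejaGnu's `scan-tree-dump` uses Tcl regex syntax with its own escaping, and this is not the pattern that was committed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if LINE contains a comparison of an SSA
   name against 0 in either of the two forms seen in the dumps, else 0.  */
int
matches_folded_cond (const char *line)
{
  /* Alternative 1: anonymous SSA name such as _1.
     Alternative 2: default definition such as a_4(D).  */
  static const char pattern[]
    = "if \\((_[0-9]+|[A-Za-z_][A-Za-z0-9_.]*_[0-9]+\\(D\\)) == 0\\)";
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  int found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

A stricter pattern of this kind would still reject the unfolded form `if (a.0_1 == 0B)`, which the looser `"== 0\\)"` match also rejects only because of the trailing `B`.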
Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst
On 29 July 2016 at 12:42, Richard Biener wrote: > On Fri, 29 Jul 2016, Prathamesh Kulkarni wrote: > >> On 28 July 2016 at 19:18, Richard Biener wrote: >> > On Thu, 28 Jul 2016, Prathamesh Kulkarni wrote: >> > >> >> On 28 July 2016 at 15:58, Andreas Schwab wrote: >> >> > On Mo, Jul 25 2016, Prathamesh Kulkarni >> >> > wrote: >> >> > >> >> >> diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c >> >> >> b/gcc/testsuite/gcc.dg/pr70920-4.c >> >> >> new file mode 100644 >> >> >> index 000..dedb895 >> >> >> --- /dev/null >> >> >> +++ b/gcc/testsuite/gcc.dg/pr70920-4.c >> >> >> @@ -0,0 +1,21 @@ >> >> >> +/* { dg-do compile } */ >> >> >> +/* { dg-options "-O2 -fdump-tree-ccp-details >> >> >> -Wno-int-to-pointer-cast" } */ >> >> >> + >> >> >> +#include >> >> >> + >> >> >> +void f1(); >> >> >> +void f2(); >> >> >> + >> >> >> +void >> >> >> +foo (int a) >> >> >> +{ >> >> >> + void *cst = 0; >> >> >> + if ((int *) a == cst) >> >> >> +{ >> >> >> + f1 (); >> >> >> + if (a) >> >> >> + f2 (); >> >> >> +} >> >> >> +} >> >> >> + >> >> >> +/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* >> >> >> == 0\\)" "ccp1" } } */ >> >> > >> >> > This fails on all ilp32 platforms. >> >> Oops, sorry for the breakage. >> >> With -m32, the pattern is applied during forwprop1 rather than ccp1. >> >> I wonder though why ccp1 fails to fold the pattern with -m32 ? 
>> >> Looking at the dumps: >> >> >> >> without -m32: >> >> input to ccp1 pass: >> >> : >> >> cst_4 = 0B; >> >> _1 = (long int) a_5(D); >> >> _2 = (void *) _1; >> >> if (cst_4 == _2) >> >> goto ; >> >> else >> >> goto ; >> >> >> >> cc1 pass dump shows: >> >> Substituting values and folding statements >> >> >> >> Folding statement: _1 = (long int) a_5(D); >> >> Not folded >> >> Folding statement: _2 = (void *) _1; >> >> Not folded >> >> Folding statement: if (cst_4 == _2) >> >> which is likely CONSTANT >> >> Applying pattern match.pd:2537, gimple-match.c:6530 >> >> gimple_simplified to if (_1 == 0) >> >> Folded into: if (_1 == 0) >> >> >> >> with -m32: >> >> input to ccp1 pass: >> >> : >> >> cst_3 = 0B; >> >> a.0_1 = (void *) a_4(D); >> >> if (cst_3 == a.0_1) >> >> goto ; >> >> else >> >> goto ; >> >> >> >> ccp1 pass dump shows: >> >> Substituting values and folding statements >> >> >> >> Folding statement: a.0_1 = (void *) a_4(D); >> >> Not folded >> >> Folding statement: if (cst_3 == a.0_1) >> >> which is likely CONSTANT >> >> Folded into: if (a.0_1 == 0B) >> >> >> >> I am not able to understand why it doesn't fold it to >> >> if (a_4(D) == 0) ? >> >> forwprop1 folds a.0_1 == 0B to a_4(D) == 0. >> > >> > It's because CCP folds with follow-single-use edges but the >> > match-and-simplify code uses a single callback to valueize and >> > decide whether its valid to follow the SSA edge. I did have some >> > old patches trying to fix that but never followed up on those. >> Thanks for the explanation. >> > >> >> I suppose the test-case would need to scan ccp1 for non-ilp targets >> >> and forwprop1 for >> >> ilp targets. How do update the test-case to reflect this ? >> > >> > It's simpler to verify that at some point (forwprop) we have the >> > expected IL rather than testing for the match debug prints. 
>> In forwprop dump, >> For m32, we have if (a_4(D) == 0) >> and without m32: if (_1 == 0) >> So need to match either a default def or anonymous name >> in the test-case, which I am having a bit of trouble writing regex for. >> In the patch i simply chose
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
On 29 July 2016 at 05:40, Prathamesh Kulkarni wrote:
> On 28 July 2016 at 20:14, Ramana Radhakrishnan
> wrote:
>>
>>> appear UNSUPPORTED.
>>> That's because this config appears to define
>>> __ARM_ARCH_EXT_IDIV__ however idiv appears not to be present.
>>>
>>> For instance __aeabi_div is called to perform
>>> division for the following test-case:
>>> int f(int x, int y)
>>> {
>>> int r = x / y;
>>> return r;
>>> }
>>>
>>> Compiling with -O2:
>>> f:
>>> @ args = 0, pretend = 0, frame = 0
>>> @ frame_needed = 0, uses_anonymous_args = 0
>>> push {r4, lr}
>>> bl __aeabi_idiv
>>> pop {r4, pc}
>>>
>>> I assumed if __ARM_ARCH_EXT_IDIV was defined, then
>>> there should have been idiv instead of call to __aeabi_div
>>> or am I missing something ?
>>>
>>> Um I had configured with --with-tune=cortex-a9. Is that incorrect for
>>> armv8l-unknown-linux-gnueabihf ?
>>
>> --with-tune shouldn't make a difference to code generation settings. The
>> code generation you are showing is certainly odd for this testcase - and
>> not something I can reproduce on pristine trunk - so sounds like something's
>> broken by your patch . You should be seeing an sdiv in this case in the
>> output - Look at the .arch directive at the top of your file - maybe that
>> gives you a clue in terms of making sure that you had configured the
>> toolchain correctly.
> Hi,
> There is no .arch in the assembly however there's .cpu arm10dtmi at
> the top, full assembly: http://pastebin.com/6tzckiG0
> With pristine trunk (r238800), I still get __aeabi_idiv for the above
> test-case.
> config opts: --enable-languages=c,c++ --target=armv8l-linux-gnueabihf
> --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard
> --with-mode=thumb --enable-multiarch --disable-multilib
> Tried with native stage-1 build and cross build.
> I verified that __ARM_ARCH_EXT_IDIV__ is defined, with following
> test-case, which fails to compile.
> #ifdef __ARM_ARCH_EXT_IDIV__
> #error "has div insn"
> #endif
> int x;

Apparently I messed something up in my build :/
After re-building from scratch, I could get sdiv in the output ;-)
Verified that the patch does not regress on armv8l-unknown-linux-gnu
and cross-tested on arm*-*-*.
Ok for trunk ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>>
>> regards
>> Ramana
>>
>>>
>>> xgcc -v:
>>> Using built-in specs.
>>> COLLECT_GCC=armhf-bootstrap-build/gcc/xgcc
>>> Target: armv8l-unknown-linux-gnueabihf
>>> Configured with: ../gcc/configure --enable-languages=c,c++,fortran
>>> --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard
>>> --with-mode=thumb --enable-multiarch --with-tune=cortex-a9
>>> --disable-multilib
>>> Thread model: posix
>>> gcc version 7.0.0 20160727 (experimental) (GCC)
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Thanks,
>>>> Ramana
>>
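For readers following the thread, here is a small sketch of the two things being checked: whether the compiler advertises a hardware integer divide through `__ARM_ARCH_EXT_IDIV__`, and the source pattern (quotient and remainder of the same operands) that the divmod tests exercise. The names `has_hw_idiv`, `quot_and_rem`, and `struct qr` are made up for illustration and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical probe: 1 if the compiler defines __ARM_ARCH_EXT_IDIV__
   (the macro tested by the divmod_simode check), 0 otherwise.  */
int
has_hw_idiv (void)
{
#ifdef __ARM_ARCH_EXT_IDIV__
  return 1;
#else
  return 0;
#endif
}

struct qr
{
  int quot;
  int rem;
};

/* Both x / y and x % y are needed for the same operands, which is the
   shape of code the divmod test-cases are built around.  */
struct qr
quot_and_rem (int x, int y)
{
  assert (y != 0);
  struct qr res;
  res.quot = x / y;
  res.rem = x % y;
  return res;
}
```

On a configuration where `has_hw_idiv ()` returns 1, one would expect `sdiv` in the assembly for the division, as reported above after the rebuild.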
Re: [PATCH] Replacing gcc's dependence on libiberty's fnmatch to gnulib's fnmatch
On 30 July 2016 at 03:40, Joseph Myers wrote:
> On Tue, 26 Jul 2016, Prathamesh Kulkarni wrote:
>
>> >> GCC can run on other systems besides OSX and GNU/Linux, how can you
>> >> test that your change does not break anything on those systems?
>> >>
>> > Well I have access to these two systems only. How would you suggest I
>> > test my patches on all possible systems?
>> Maybe building contrib/config-list.mk could help ? AFAIK, it will cross build
>> the gcc tree for different targets, but not run the testsuite, however
>> I could be wrong.
>
> Building for different targets is fairly irrelevant here; the issue is
> building for different hosts, which is harder.
>
> (It's possible there are some portability interfaces only used on
> particular targets, but that's not the main use case for libiberty in
> GCC.)

Ah indeed, libiberty/gnulib would be built for hosts.
Thanks for pointing out!

Thanks,
Prathamesh
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
Re: [RFC] warn on dead function calls in ipa-pure-const [1/4]
On 31 July 2016 at 22:01, Jan Hubicka wrote:
>> On Tue, 26 Jul 2016, Prathamesh Kulkarni wrote:
>>
>> > + warning_at (gimple_location (g), OPT_Wunused_value,
>> > + "Call from %s to %s has no effect",
>> > + e->caller->name (), e->callee->name ());
>>
>> Diagnostics should not start with capital letters. Function names in
>> diagnostics should be quoted, so %qs. Also, what form is this name in?
>> If it's the internal UTF-8 form, you need to use identifier_to_locale on
>> it to produce something suitable for a diagnostic. And for C++ you need
>> to make sure the name is a proper pretty name (including classes /
>> namespaces / type etc.) as produced by the decl_printable_name langhook,
>> before passing it to identifier_to_locale.
>
> I think you just want to pass e->caller->decl (with corresponding % formatter)
> rather than name()

Hi,
Thanks for the reviews. However, after discussing with Richard, we decided
to drop this warning for now, because it can lead to potential false
positives, as in the following case in stor-layout.c:

  /* Stop if the mode requires too much alignment. */
  if (GET_MODE_ALIGNMENT (m_mode) > m_align
      && SLOW_UNALIGNED_ACCESS (m_mode, m_align))
    break;

On x86_64, SLOW_UNALIGNED_ACCESS is #defined to 0, so the condition
essentially becomes:

  if (get_mode_alignment (m_mode) > m_align && 0)
    break;

and the patch warns for the above dead call. However, the call might not
always be dead, since it depends on the conditionally defined macro
SLOW_UNALIGNED_ACCESS, which other targets may define as a run-time value.

Unfortunately I don't have any good ideas to address this issue.
We could restrict the warning to cases where the call is not a
sub-expression; however, I suppose we would need some help from the front
ends to determine whether the call_expr is the outermost expression ?
I thought of adding another flag to tree_exp for this purpose, but that
doesn't look like a good idea.
I would be grateful for suggestions for addressing this issue.

Thanks,
Prathamesh
>
> Honza
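The false positive described above can be reproduced in miniature. The sketch below uses made-up names (`SLOW_ACCESS`, `mode_alignment`, `should_stop`) standing in for the stor-layout.c code; it only illustrates why the call is dead on one configuration and live on another, and is not GCC source.

```c
#include <assert.h>

/* Hypothetical stand-in for a target macro: some configurations define it
   as the constant 0, others as an expression evaluated at run time.  */
#ifndef SLOW_ACCESS
#define SLOW_ACCESS(mode, align) 0
#endif

/* Hypothetical side-effect-free helper, playing the role of
   get_mode_alignment.  */
static int
mode_alignment (int mode)
{
  return mode * 8;
}

int
should_stop (int mode, int align)
{
  /* With SLOW_ACCESS defined to 0 the whole condition folds to 0 and the
     result of mode_alignment is never used, so the call is dead on this
     configuration only.  */
  return mode_alignment (mode) > align && SLOW_ACCESS (mode, align);
}
```

Compiling the same file with something like `-DSLOW_ACCESS(m,a)=((a)<(m))` makes the call live again, which is why a warning issued on the first configuration would be misleading.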
Re: [PR71078] x / abs(x) -> copysign (1.0, x)
On 30 July 2016 at 02:57, Joseph Myers wrote: > On Tue, 26 Jul 2016, Richard Sandiford wrote: > >> (which are really just extended tree codes). I suppose copysign is >> a special case since we can always open code it, but in general we >> shouldn't fall back to something that could generate a call. > > We can't always open code copysign (IBM long double, see PR 58797). Hi, Thanks for pointing that out. The attached patch doesn't transform x/abs(x) -> copysign(1.0, x) for long double. OK for trunk ? Thanks, Prathamesh > > -- > Joseph S. Myers > jos...@codesourcery.com 2016-08-01 Prathamesh Kulkarni * match.pd (x/abs(x) -> copysign(1.0, x)): Don't transform for long double. testsuite/ * gcc.dg/tree-ssa/pr71078-1.c: Remove f3. * gcc.dg/tree-ssa/pr71078-2.c: Likewise. diff --git a/gcc/match.pd b/gcc/match.pd index 2fd898a..3b6aaeb 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -198,17 +198,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* PR71078: x / abs(x) -> copysign (1.0, x) */ (simplify (rdiv:C (convert? @0) (convert? (abs @0))) + /* We can't always transform to copysign for long double. + See PR58797. */ (if (SCALAR_FLOAT_TYPE_P (type) + && ! types_match (type, long_double_type_node) && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)) (switch (if (types_match (type, float_type_node)) (BUILT_IN_COPYSIGNF { build_one_cst (type); } (convert @0))) (if (types_match (type, double_type_node)) - (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0))) -(if (types_match (type, long_double_type_node)) - (BUILT_IN_COPYSIGNL { build_one_cst (type); } (convert @0)) - + (BUILT_IN_COPYSIGN { build_one_cst (type); } (convert @0)) + /* In IEEE floating point, x/1 is not equivalent to x for snans. 
*/ (simplify (rdiv @0 real_onep) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c index 6204c14..4606b2b 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-1.c @@ -17,13 +17,5 @@ double f2(double x) return t2; } -long double f3 (long double x) -{ - long double t1 = fabsl (x); - long double t2 = x / t1; - return t2; -} - /* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ /* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ -/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c index 96485af..eaff4cc 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71078-2.c @@ -17,13 +17,5 @@ double f2(double x) return t2; } -long double f3 (long double x) -{ - long double t1 = fabsl (x); - long double t2 = t1 / x; - return t2; -} - /* { dg-final { scan-tree-dump "__builtin_copysignf" "forwprop1" } } */ /* { dg-final { scan-tree-dump "__builtin_copysign" "forwprop1" } } */ -/* { dg-final { scan-tree-dump "__builtin_copysignl" "forwprop1" } } */
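As a quick sanity check of the identity behind the transform, the sketch below compares `x / fabs (x)` with `copysign (1.0, x)` for `double`. The function names are illustrative only. The two agree for finite, nonzero `x`; they differ for zero, NaN, and infinities, which is consistent with the `HONOR_NANS` and `HONOR_INFINITIES` guards in the pattern.

```c
#include <assert.h>
#include <math.h>

/* The original expression: sign of x obtained by dividing by |x|.  */
double
sign_by_division (double x)
{
  return x / fabs (x);
}

/* The replacement expression used by the match.pd pattern.  */
double
sign_by_copysign (double x)
{
  return copysign (1.0, x);
}
```

For example, both functions return 1.0 for 2.5 and -1.0 for -3.0, while for 0.0 the division yields NaN and `copysign` yields 1.0.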
fold strlen (s) eq/ne 0 to *s eq/ne 0 on GIMPLE
Hi Richard,
The attached patch tries to fold strlen (s) eq/ne 0 to *s eq/ne 0 on GIMPLE.
I am not sure where the ideal place to put this transform is, and ended up
adding it to strlen_optimize_stmt().
Does that look OK ?

I needed to add TODO_update_ssa to the strlen pass, otherwise we hit the
following assert in execute_todo():

  if (flag_checking
      && cfun
      && need_ssa_update_p (cfun))
    gcc_assert (flags & TODO_update_ssa_any);

Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh

2016-08-01  Prathamesh Kulkarni

        * tree-ssa-strlen.c (strlen_optimize_stmt): Fold strlen (s) eq/ne 0 to
        *s eq/ne 0.
        Change todo_flags_finish for pass_data_strlen from 0 to TODO_update_ssa.

testsuite/
        * gcc.dg/strlenopt-30.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..da9732f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+__attribute__((noinline, no_icf))
+_Bool f1(const char *s)
+{
+  unsigned long len = __builtin_strlen (s);
+  _Bool ret = (len == 0);
+  return ret;
+}
+
+/* Check CONVERT_EXPR's get properly handled. */
+__attribute__((noinline, no_icf))
+_Bool f2(const char *s)
+{
+  unsigned len = __builtin_strlen (s);
+  return len == 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 9d7b4df..54f8109 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see
 #include "ipa-chkp.h"
 #include "tree-hash-traits.h"
 #include "builtins.h"
+#include "tree-pretty-print.h"
 
 /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive value
    is an index into strinfo vector, negative value stands for
@@ -2302,6 +2303,43 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
       else if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
         handle_pointer_plus (gsi);
     }
+  /* strlen (s) eq/ne 0 -> *s eq/ne 0. */
+  else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree rhs2 = gimple_assign_rhs2 (stmt);
+      tree_code code = gimple_assign_rhs_code (stmt);
+
+      if ((code == EQ_EXPR || code == NE_EXPR) && integer_zerop (rhs2))
+        {
+          tree rhs1 = gimple_assign_rhs1 (stmt);
+          if (TREE_CODE (rhs1) == SSA_NAME)
+            {
+              gimple *def_stmt = SSA_NAME_DEF_STMT (rhs1);
+              if (is_a <gassign *> (def_stmt)
+                  && (gimple_assign_rhs_code (def_stmt) == CONVERT_EXPR
+                      || gimple_assign_rhs_code (def_stmt) == NOP_EXPR)
+                  && TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME)
+                def_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def_stmt));
+
+              if (gcall *call_stmt = dyn_cast <gcall *> (def_stmt))
+                {
+                  tree callee = gimple_call_fndecl (call_stmt);
+                  if (valid_builtin_call (call_stmt)
+                      && DECL_FUNCTION_CODE (callee) == BUILT_IN_STRLEN)
+                    {
+                      tree arg = gimple_call_arg (call_stmt, 0);
+                      tree op = build2 (MEM_REF, char_type_node, arg, build_zero_cst (TREE_TYPE (arg)));
+                      tree temp = make_temp_ssa_name (TREE_TYPE (op), NULL, "strlen");
+                      gimple *memref_stmt = gimple_build_assign (temp, op);
+                      gimple_stmt_iterator call_gsi = gsi_for_stmt (call_stmt);
+                      gsi_insert_before (&call_gsi, memref_stmt, GSI_SAME_STMT);
+                      gassign *g = gimple_build_assign (gimple_call_lhs (call_stmt), CONVERT_EXPR, temp);
+                      gsi_replace (&call_gsi, g, true);
+                    }
+                }
+            }
+        }
+    }
   else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
     {
       tree type = TREE_TYPE (lhs);
@@ -2505,7 +2543,7 @@ const pass_data pass_data_strlen =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  0, /* todo_flags_finish */
+  TODO_update_ssa, /* todo_flags_finish */
 };
 
 class pass_strlen : public gimple_opt_pass
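At the source level, the fold proposed in this message corresponds to the equivalence sketched below. The function names are illustrative; the patch itself works on GIMPLE statements, not on C source.

```c
#include <assert.h>
#include <string.h>

/* What the user writes: an emptiness test through strlen.  */
int
empty_via_strlen (const char *s)
{
  return strlen (s) == 0;
}

/* What the fold aims for: a single load of the first character.  */
int
empty_via_first_char (const char *s)
{
  return *s == '\0';
}
```

Both functions return the same value for every valid NUL-terminated string, and the second never has to walk the string.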
PR78501
Hi,
The attached patch fixes ada bootstrap failure.
Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada
Cross-tested on aarch64*-*-*, arm*-*-* with --enable-languages=c,c++,fortran.
OK to commit ?

Thanks,
Prathamesh

2016-11-24  Jakub Jelinek
            Prathamesh Kulkarni

        PR middle-end/78501
        * tree-vrp.c (extract_range_basic): Check for ptrdiff_type_node to be
        non null and that its precision matches the precision of lhs's type.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 33e0a75..8bea4db 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4028,15 +4028,20 @@ extract_range_basic (value_range *vr, gimple *stmt)
             }
           return;
         case CFN_BUILT_IN_STRLEN:
-          {
-            tree type = TREE_TYPE (gimple_call_lhs (stmt));
-            tree max = vrp_val_max (ptrdiff_type_node);
-            wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
-            tree range_min = build_zero_cst (type);
-            tree range_max = wide_int_to_tree (type, wmax - 1);
-            set_value_range (vr, VR_RANGE, range_min, range_max, NULL);
-          }
-          return;
+          if (tree lhs = gimple_call_lhs (stmt))
+            if (ptrdiff_type_node
+                && (TYPE_PRECISION (ptrdiff_type_node)
+                    == TYPE_PRECISION (TREE_TYPE (lhs))))
+              {
+                tree type = TREE_TYPE (lhs);
+                tree max = vrp_val_max (ptrdiff_type_node);
+                wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
+                tree range_min = build_zero_cst (type);
+                tree range_max = wide_int_to_tree (type, wmax - 1);
+                set_value_range (vr, VR_RANGE, range_min, range_max, NULL);
+                return;
+              }
+          break;
         default:
           break;
         }
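The range that this hunk records for a strlen result is [0, max (ptrdiff_t) - 1]. A user-level restatement of that bound, under the assumption that `size_t` and `ptrdiff_t` have the same precision (the condition the patch now checks), might look like the following; the helper name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: 1 if strlen (s) lies within [0, PTRDIFF_MAX - 1],
   the range the patch assigns to the call's result.  */
int
strlen_in_recorded_range (const char *s)
{
  size_t len = strlen (s);
  return len <= (size_t) PTRDIFF_MAX - 1;
}
```

This only mirrors the arithmetic in the hunk (`wmax - 1`); it says nothing about front ends where `ptrdiff_type_node` is null, which is the case the patch guards against.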
Re: [tree-tailcall] Check if function returns it's argument
On 24 November 2016 at 18:08, Richard Biener wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 24 November 2016 at 17:48, Richard Biener wrote:
>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 24 November 2016 at 14:07, Richard Biener wrote:
>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> Hi,
>> >> >> Consider following test-case:
>> >> >>
>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> >> {
>> >> >>   __builtin_memcpy (a1, a2, a3);
>> >> >>   return a1;
>> >> >> }
>> >> >>
>> >> >> return a1 can be considered equivalent to return value of memcpy,
>> >> >> and the call could be emitted as a tail-call.
>> >> >> gcc doesn't emit the above call to memcpy as a tail-call,
>> >> >> but if it is changed to:
>> >> >>
>> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);
>> >> >> return t1;
>> >> >>
>> >> >> Then memcpy is emitted as a tail-call.
>> >> >> The attached patch tries to handle the former case.
>> >> >>
>> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
>> >> >> Cross tested on arm*-*-*, aarch64*-*-*
>> >> >> Does this patch look OK ?
>> >> >
>> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.
>> >> > */
>> >> > +tree
>> >> > +gimple_call_return_arg (gcall *call_stmt)
>> >> > +{
>> >> >
>> >> >
>> >> > Please just inline it at the single use - the name is not terribly
>> >> > informative.
>> >> >
>> >> > I'm not sure you can rely on code-generation working if you not
>> >> > effectively change the IL to
>> >> >
>> >> >   a1 = __builtin_memcpy (a1, a2, a3);
>> >> >   return a1;
>> >> >
>> >> > someone more familiar with RTL expansion plus tail call emission on
>> >> > RTL needs to chime in.
>> >> Well I was trying to copy-propagate function's argument into uses of
>> >> its return value if
>> >> function returned that argument, so the assignment to lhs of call
>> >> could be made redundant.
>> >>
>> >> eg:
>> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
>> >> {
>> >>   void *t1 = __builtin_memcpy (a1, a2, a3);
>> >>   return t1;
>> >> }
>> >>
>> >> After patch, copyprop transformed it into:
>> >> t1 = __builtin_memcpy (a1, a2, a3);
>> >> return a1;
>> >
>> > But that's a bad transform -- if we know that t1 == a1 then it's
>> > better to use t1 as that's readily available in the return register
>> > while the register for a1 might have been clobbered and thus we
>> > need to spill it for the later return.
>> Oh I didn't realize this could possibly pessimize RA.
>> For test-case:
>>
>> void *t1 = memcpy (dest, src, n);
>> if (t1 != dest)
>>   __builtin_abort ();
>>
>> we could copy-propagate t1 into cond_expr and make the condition redundant.
>> However I suppose this particular case could be handled with VRP instead
>> (t1 and dest should be marked equivalent) ?
>
> Yeah, exposing this to value-numbering in general can enable some
> optimizations (but I wouldn't put it in copyprop).  Note it's then
> difficult to avoid copy-propagating things...
>
> The user can also write
>
> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
> {
>   __builtin_memcpy (a1, a2, a3);
>   return a1;
> }
>
> so it's good to improve code-gen for that (for the tailcall issue).

For the tail-call issue, should we artificially create a lhs and use that
as return value (perhaps by a separate pass before tailcall) ?

__builtin_memcpy (a1, a2, a3);
return a1;

gets transformed to:
_1 = __builtin_memcpy (a1, a2, a3)
return _1;

So the tail-call optimization pass would see the IL in its expected form.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> >> But this now interferes with tail-call optimization, because it is not
>> >> able to emit memcpy
>> >> as tail-call anymore due to which the patch regressed 20050503-1.c.
>> >> I am not sure how to workaround this.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Richard.
>> >>
>> >
>> > --
>> > Richard Biener
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB
>> > 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB
> 21284 (AG Nuernberg)
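The two source forms discussed in this thread can be written side by side as a small standalone sketch. The function names `copy_return_arg` and `copy_return_result` are made up for illustration, and plain `memcpy` is used instead of `__builtin_memcpy`. Both functions are equivalent at the source level; whether either call is actually emitted as a tail call depends on the compiler version, options and target, so nothing is claimed about that here.

```c
#include <stddef.h>
#include <string.h>

/* Form 1: the call result is ignored and the first argument is returned.
   This is the form the thread says was not emitted as a tail call.  */
void *
copy_return_arg (void *a1, const void *a2, size_t a3)
{
  memcpy (a1, a2, a3);
  return a1;
}

/* Form 2: the value returned by the call is returned directly, which is
   the shape the proposed rewrite would produce.  */
void *
copy_return_result (void *a1, const void *a2, size_t a3)
{
  void *t1 = memcpy (a1, a2, a3);
  return t1;
}
```

Comparing the assembly of the two functions (for example at -O2) is one way to see whether a given compiler treats them the same.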
change initialization of ptrdiff_type_node
Hi,
This patch changes the initialization of ptrdiff_type_node in lto-lang.c,
based on Jakub's suggestion in PR78501 comment 12:
"The other uses of ptrdiff_type_node in the middle-end, which need fixing
anyway, would need something like your patch, but not sure if it is not a
waste of time to compute it if the C/C++ FE will immediately override it
anyway. So perhaps just compute it that way in the LTO FE? I mean, for the
*printf warning/length stuff, those calls shouldn't appear in
Ada/Go/Fortran code, they can in LTO or C-family."

For unsigned_ptrdiff_type_node, I removed its definition from c-common.h
and moved it to tree.h.
Is that OK ?

Thanks,
Prathamesh

2016-11-25  Prathamesh Kulkarni

    * tree-core.h (TI_UNSIGNED_PTRDIFF_TYPE): New enum value.
    * tree.h (unsigned_ptrdiff_type_node): New macro.

c-family/
    * c-common.h (CTI_UNSIGNED_PTRDIFF_TYPE): Remove.
    (unsigned_ptrdiff_type_node): Likewise.

lto/
    * lto-lang.c (lto_init): Change initialization of ptrdiff_type_node.
    Initialize unsigned_ptrdiff_type_node.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index a23193e..e93a65a 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -289,7 +289,6 @@ enum c_tree_index
     CTI_UNDERLYING_WCHAR_TYPE,
     CTI_WINT_TYPE,
     CTI_SIGNED_SIZE_TYPE, /* For format checking only.  */
-    CTI_UNSIGNED_PTRDIFF_TYPE, /* For format checking only.  */
     CTI_INTMAX_TYPE,
     CTI_UINTMAX_TYPE,
     CTI_WIDEST_INT_LIT_TYPE,
@@ -432,7 +431,6 @@ extern const unsigned int num_c_common_reswords;
 #define underlying_wchar_type_node c_global_trees[CTI_UNDERLYING_WCHAR_TYPE]
 #define wint_type_node c_global_trees[CTI_WINT_TYPE]
 #define signed_size_type_node c_global_trees[CTI_SIGNED_SIZE_TYPE]
-#define unsigned_ptrdiff_type_node c_global_trees[CTI_UNSIGNED_PTRDIFF_TYPE]
 #define intmax_type_node c_global_trees[CTI_INTMAX_TYPE]
 #define uintmax_type_node c_global_trees[CTI_UINTMAX_TYPE]
 #define widest_integer_literal_type_node c_global_trees[CTI_WIDEST_INT_LIT_TYPE]
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index a5f04ba..09b6d18 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -1271,8 +1271,30 @@ lto_init (void)
   gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node)
               == const_ptr_type_node);

-  ptrdiff_type_node = integer_type_node;
+  if (strcmp (PTRDIFF_TYPE, "int") == 0)
+    ptrdiff_type_node = integer_type_node;
+  else if (strcmp (PTRDIFF_TYPE, "long int") == 0)
+    ptrdiff_type_node = long_integer_type_node;
+  else if (strcmp (PTRDIFF_TYPE, "long long int") == 0)
+    ptrdiff_type_node = long_long_integer_type_node;
+  else if (strcmp (PTRDIFF_TYPE, "short int") == 0)
+    ptrdiff_type_node = short_integer_type_node;
+  else
+    {
+      ptrdiff_type_node = NULL_TREE;
+      for (int i = 0; i < NUM_INT_N_ENTS; i++)
+        if (int_n_enabled_p[i])
+          {
+            char name[50];
+            sprintf (name, "__int%d", int_n_data[i].bitsize);
+            if (strcmp (name, PTRDIFF_TYPE) == 0)
+              ptrdiff_type_node = int_n_trees[i].signed_type;
+          }
+      if (ptrdiff_type_node == NULL_TREE)
+        gcc_unreachable ();
+    }

+  unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node);
   lto_build_c_type_nodes ();
   gcc_assert (va_list_type_node);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index eec2d4f..6c52387 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -617,6 +617,7 @@ enum tree_index {
   TI_SIZE_TYPE,
   TI_PID_TYPE,
   TI_PTRDIFF_TYPE,
+  TI_UNSIGNED_PTRDIFF_TYPE,
   TI_VA_LIST_TYPE,
   TI_VA_LIST_GPR_COUNTER_FIELD,
   TI_VA_LIST_FPR_COUNTER_FIELD,
diff --git a/gcc/tree.h b/gcc/tree.h
index 62cd7bb..ae69d0d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum tree_code __code, int __i,
 #define size_type_node global_trees[TI_SIZE_TYPE]
 #define pid_type_node global_trees[TI_PID_TYPE]
 #define ptrdiff_type_node global_trees[TI_PTRDIFF_TYPE]
+#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE]
 #define va_list_type_node global_trees[TI_VA_LIST_TYPE]
 #define va_list_gpr_counter_field global_trees[TI_VA_LIST_GPR_COUNTER_FIELD]
 #define va_list_fpr_counter_field global_trees[TI_VA_LIST_FPR_COUNTER_FIELD]
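The lto_init hunk above selects a type by comparing the target's PTRDIFF_TYPE string against known spellings, then falls back to the enabled `__intN` types. A minimal standalone sketch of that lookup pattern follows. Everything in it is an assumption made for illustration: `enum type_id`, the `int_n_bitsize` table and `lookup_ptrdiff_type` are hypothetical and only imitate the shape of the GCC code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type ids standing in for GCC's type nodes.  */
enum type_id { TY_NONE, TY_SHORT, TY_INT, TY_LONG, TY_LONG_LONG, TY_INT_N };

/* Hypothetical list of enabled __intN widths (compare int_n_data).  */
static const int int_n_bitsize[] = { 20, 128 };

/* Map a PTRDIFF_TYPE-style name to a type id.  For an __intN match,
   *BITS receives N; otherwise it is set to 0.  TY_NONE means no match,
   the case where the patch calls gcc_unreachable ().  */
static enum type_id
lookup_ptrdiff_type (const char *name, int *bits)
{
  *bits = 0;
  if (strcmp (name, "int") == 0)
    return TY_INT;
  if (strcmp (name, "long int") == 0)
    return TY_LONG;
  if (strcmp (name, "long long int") == 0)
    return TY_LONG_LONG;
  if (strcmp (name, "short int") == 0)
    return TY_SHORT;

  for (size_t i = 0; i < sizeof int_n_bitsize / sizeof int_n_bitsize[0]; i++)
    {
      char buf[50];
      snprintf (buf, sizeof buf, "__int%d", int_n_bitsize[i]);
      if (strcmp (buf, name) == 0)
        {
          *bits = int_n_bitsize[i];
          return TY_INT_N;
        }
    }
  return TY_NONE;
}
```

The sketch returns TY_NONE instead of aborting so that the no-match path can be observed by a caller.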
Re: change initialization of ptrdiff_type_node
On 25 November 2016 at 13:43, Richard Biener wrote: > On Fri, 25 Nov 2016, Jakub Jelinek wrote: > >> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: >> > --- a/gcc/lto/lto-lang.c >> > +++ b/gcc/lto/lto-lang.c >> > @@ -1271,8 +1271,30 @@ lto_init (void) >> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) >> > == const_ptr_type_node); >> > >> > - ptrdiff_type_node = integer_type_node; >> > + if (strcmp (PTRDIFF_TYPE, "int") == 0) >> > +ptrdiff_type_node = integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) >> > +ptrdiff_type_node = long_integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) >> > +ptrdiff_type_node = long_long_integer_type_node; >> > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) >> > +ptrdiff_type_node = short_integer_type_node; >> > + else >> > +{ >> > + ptrdiff_type_node = NULL_TREE; >> > + for (int i = 0; i < NUM_INT_N_ENTS; i++) >> > + if (int_n_enabled_p[i]) >> > + { >> > + char name[50]; >> > + sprintf (name, "__int%d", int_n_data[i].bitsize); >> > + if (strcmp (name, PTRDIFF_TYPE) == 0) >> > + ptrdiff_type_node = int_n_trees[i].signed_type; >> > + } >> > + if (ptrdiff_type_node == NULL_TREE) >> > + gcc_unreachable (); >> > +} >> >> This looks ok to me. > > But I'd like to see this in build_common_tree_nodes alongside > the initialization of size_type_node (and thus removed from > c_common_nodes_and_builtins). This way you can simply remove > the lto-lang.c code as well. > > Please then also remove the ptrdiff_type_node re-set from > free_lang_data (). Hi Richard, Does this version look OK ? Validation in progress. Thanks, Prathamesh > >> > >> > + unsigned_ptrdiff_type_node = unsigned_type_for (ptrdiff_type_node); >> >lto_build_c_type_nodes (); >> >gcc_assert (va_list_type_node); >> >> But why this and the remaining hunks? Nothing in the middle-end >> needs it, IMHO it should be kept in c-family/. > > Yeah, this change looks unnecessary to me. 
> >> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h >> > index eec2d4f..6c52387 100644 >> > --- a/gcc/tree-core.h >> > +++ b/gcc/tree-core.h >> > @@ -617,6 +617,7 @@ enum tree_index { >> >TI_SIZE_TYPE, >> >TI_PID_TYPE, >> >TI_PTRDIFF_TYPE, >> > + TI_UNSIGNED_PTRDIFF_TYPE, >> >TI_VA_LIST_TYPE, >> >TI_VA_LIST_GPR_COUNTER_FIELD, >> >TI_VA_LIST_FPR_COUNTER_FIELD, >> > diff --git a/gcc/tree.h b/gcc/tree.h >> > index 62cd7bb..ae69d0d 100644 >> > --- a/gcc/tree.h >> > +++ b/gcc/tree.h >> > @@ -3667,6 +3667,7 @@ tree_operand_check_code (const_tree __t, enum >> > tree_code __code, int __i, >> > #define size_type_node global_trees[TI_SIZE_TYPE] >> > #define pid_type_node global_trees[TI_PID_TYPE] >> > #define ptrdiff_type_node global_trees[TI_PTRDIFF_TYPE] >> > +#define unsigned_ptrdiff_type_node global_trees[TI_UNSIGNED_PTRDIFF_TYPE] >> > #define va_list_type_node global_trees[TI_VA_LIST_TYPE] >> > #define va_list_gpr_counter_field >> > global_trees[TI_VA_LIST_GPR_COUNTER_FIELD] >> > #define va_list_fpr_counter_field >> > global_trees[TI_VA_LIST_FPR_COUNTER_FIELD] >> >> >> Jakub >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) 2016-11-25 Prathamesh Kulkarni * tree.c (build_common_tree_nodes): Initialize ptrdiff_type_node. (free_lang_data): Remove assignment to ptrdiff_type_node. c-family/ * c-common.c (c_common_nodes_and_builtins): Remove initialization of ptrdiff_type_node. lto/ * lto-lang.c (lto_init): Remove initialization of ptrdiff_type_node. 
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 62174a9..0749361 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4475,8 +4475,6 @@ c_common_nodes_and_builtins (void) default_function_type = build_varargs_function_type_list (integer_type_node, NULL_TREE); - ptrdiff_type_node -= TREE_TYPE (identifier_global_value (get_identifier (PTRDIFF_TYPE))); unsigned_ptrdiff_type_node = c_common_unsigned_type (ptrdiff_type_node);
Re: change initialization of ptrdiff_type_node
On 25 November 2016 at 14:48, Richard Biener wrote: > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote: > >> On 25 November 2016 at 13:43, Richard Biener wrote: >> > On Fri, 25 Nov 2016, Jakub Jelinek wrote: >> > >> >> On Fri, Nov 25, 2016 at 01:28:06PM +0530, Prathamesh Kulkarni wrote: >> >> > --- a/gcc/lto/lto-lang.c >> >> > +++ b/gcc/lto/lto-lang.c >> >> > @@ -1271,8 +1271,30 @@ lto_init (void) >> >> >gcc_assert (TYPE_MAIN_VARIANT (const_tm_ptr_type_node) >> >> > == const_ptr_type_node); >> >> > >> >> > - ptrdiff_type_node = integer_type_node; >> >> > + if (strcmp (PTRDIFF_TYPE, "int") == 0) >> >> > +ptrdiff_type_node = integer_type_node; >> >> > + else if (strcmp (PTRDIFF_TYPE, "long int") == 0) >> >> > +ptrdiff_type_node = long_integer_type_node; >> >> > + else if (strcmp (PTRDIFF_TYPE, "long long int") == 0) >> >> > +ptrdiff_type_node = long_long_integer_type_node; >> >> > + else if (strcmp (PTRDIFF_TYPE, "short int") == 0) >> >> > +ptrdiff_type_node = short_integer_type_node; >> >> > + else >> >> > +{ >> >> > + ptrdiff_type_node = NULL_TREE; >> >> > + for (int i = 0; i < NUM_INT_N_ENTS; i++) >> >> > + if (int_n_enabled_p[i]) >> >> > + { >> >> > + char name[50]; >> >> > + sprintf (name, "__int%d", int_n_data[i].bitsize); >> >> > + if (strcmp (name, PTRDIFF_TYPE) == 0) >> >> > + ptrdiff_type_node = int_n_trees[i].signed_type; >> >> > + } >> >> > + if (ptrdiff_type_node == NULL_TREE) >> >> > + gcc_unreachable (); >> >> > +} >> >> >> >> This looks ok to me. >> > >> > But I'd like to see this in build_common_tree_nodes alongside >> > the initialization of size_type_node (and thus removed from >> > c_common_nodes_and_builtins). This way you can simply remove >> > the lto-lang.c code as well. >> > >> > Please then also remove the ptrdiff_type_node re-set from >> > free_lang_data (). >> Hi Richard, >> Does this version look OK ? >> Validation in progress. > > Yes, patch is ok if testing succeeds. 
Thanks, the patch passes bootstrap+test on x86_64-unknown-linux-gnu with --enable-languages=all,ada and cross-tested on arm*-*-*, aarch64*-*-* with --enable-languages=c,c++,fortran. However LTO bootstrap fails with miscompares (attached) configured with: --disable-werror --enable-stage1-checking=release --with-build-config=bootstrap-lto I verified that the same miscompares happen without the patch too, and have committed it as r242888. Thanks, Prathamesh > > Thanks, > Richard. gcc/tree-ssa-phiopt.o differs gcc/sanopt.o differs gcc/tree-ssa-loop-ivcanon.o differs gcc/gcc.o differs gcc/lra.o differs gcc/tree-ssa-loop-manip.o differs gcc/tree.o differs gcc/tree-ssa-dce.o differs gcc/gcse.o differs gcc/gimple-ssa-strength-reduction.o differs gcc/ipa-split.o differs gcc/ipa.o differs gcc/cfgexpand.o differs gcc/recog.o differs gcc/tree-ssa-loop-niter.o differs gcc/loop-doloop.o differs gcc/combine.o differs gcc/predict.o differs gcc/dce.o differs gcc/graphds.o differs gcc/asan.o differs gcc/tree-ssa.o differs gcc/tree-ssa-loop-im.o differs gcc/ipa-devirt.o differs gcc/dbxout.o differs gcc/combine-stack-adj.o differs gcc/tree-ssa-live.o differs gcc/sched-rgn.o differs gcc/trans-mem.o differs gcc/tree-ssa-loop-unswitch.o differs gcc/haifa-sched.o differs gcc/tree-diagnostic.o differs gcc/tree-vect-stmts.o differs gcc/collect2.o differs gcc/tree-vect-data-refs.o differs gcc/tree-ssa-operands.o differs gcc/ipa-icf.o differs gcc/tree-ssa-sccvn.o differs gcc/tree-ssa-forwprop.o differs gcc/tsan.o differs gcc/gimple-ssa-store-merging.o differs gcc/tree-parloops.o differs gcc/tree-complex.o differs gcc/tracer.o differs gcc/tree-vect-slp.o differs gcc/diagnostic-show-locus.o differs gcc/hsa-gen.o differs gcc/hsa.o differs gcc/df-scan.o differs gcc/gcse-common.o differs gcc/tree-object-size.o differs gcc/build/genextract.o differs gcc/build/genpreds.o differs gcc/build/read-rtl.o differs gcc/build/gengtype-state.o differs gcc/build/genattr.o differs gcc/build/genopinit.o differs 
gcc/build/genmatch.o differs gcc/build/genrecog.o differs gcc/build/gensupport.o differs gcc/build/genautomata.o differs gcc/tree-loop-distribution.o differs gcc/gimplify.o differs gcc/symtab.o differs gcc/lto-wrapper.o differs gcc/rtlanal.o differs gcc/dse.o differs gcc/cfgrtl.o differs gcc/dwarf2out.o d
Re: [PR78365] ICE in determine_value_range, at tree-ssa-loo p-niter.c:413
On 28 November 2016 at 10:55, kugan wrote:
> Hi,
>
> On 24/11/16 19:48, Richard Biener wrote:
>>
>> On Wed, Nov 23, 2016 at 4:33 PM, Martin Jambor wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Nov 18, 2016 at 12:38:18PM +1100, kugan wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was relying on ipa_get_callee_param_type to get type of parameter and
>>>> then convert arguments to this type while computing jump functions.
>>>> However, in cases like shown in PR78365, ipa_get_callee_param_type,
>>>> instead of giving up, would return the wrong type.
>>>
>>>
>>> At what stage does this happen?  During analysis
>>> (ipa_compute_jump_functions_for_edge) or at WPA
>>> (propagate_constants_accross_call)?  Both?
>>
>>
>> Hmm, where does jump function compute require the callee type?
>> In my view the jump function should record
>>
>>   (expected-incoming-type) arg [OP X]
>>
>> for each call argument in its body.  Thus required conversions are
>> done at WPA propagation time.
>>
>>>> I think the current uses of ipa_get_callee_param_type are fine with
>>>> this.  Attached patch now uses callee's DECL_ARGUMENTS to get the type.
>>>> If it cannot be found, then I would give up and set the jump function
>>>> to varying.
>>>
>>>
>>> But DECL_ARGUMENTS is not available at WPA stage with LTO so your
>>> patch would make our IPA layer to optimize less with LTO.  This was
>>> the reason to go through the hoops of TYPE_ARG_TYPES in the first
>>> place.
>>>
>>> If TYPE_ARG_TYPES cannot be trusted, then I'm afraid we are back to
>>> square one and indeed need to put the correct type in jump functions.
>>
>>
>> If DECL_ARGUMENTS is not available at WPA stage then I see no other
>> way than to put the types on the jump functions.
>
>
> Here is a patch that does this. To fix PR78365, in
> ipa_get_callee_param_type, I am now checking DECL_ARGUMENTS first. I lto
> bootstrapped and regression tested on x86_64-linux-gnu and ppc64le-linux
> with no new regressions. I will build Firefox and measure the memory usage
> as Honza suggested based on the feedback.

Hi Kugan,
In your patch in ipa_get_callee_param_type():
+ tree t = e->callee ? DECL_ARGUMENTS (e->callee->decl) : NULL_TREE;

Perhaps this should be e->callee->function_symbol() ?

Thanks,
Prathamesh
>
> Thanks,
> Kugan
>
>
>
> gcc/ChangeLog:
>
> 2016-11-28  Kugan Vivekanandarajah
>
>     PR IPA/78365
>     * ipa-cp.c (propagate_vr_accross_jump_function): Remove param_type
>     argument and use the one set in jump_func.
>     (propagate_constants_accross_call): Likewise.
>     * ipa-prop.c (ipa_get_callee_param_type): Check DECL_ARGUMENTS
>     first.
>     (ipa_compute_jump_functions_for_edge): Set param_type for jump_func.
>     (ipa_write_jump_function): Stream param_type.
>     (ipa_read_jump_function): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2016-11-28  Kugan Vivekanandarajah
>
>     PR IPA/78365
>     * gcc.dg/torture/pr78365.c: New test.
>
>
>
>>> If just preferring DECL_ARGUMENTS is enough, then changing
>>> ipa_get_callee_param_type to use that if it is available, as Richi
>>> suggested, would indeed be preferable.  But if even falling back on it
>>> can cause errors, then I am not sure if it helps.
>>>
>>> In any event, thanks for diligently dealing with the fallout,
>>>
>>> Martin
>>>
>>>
>>>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>>>> regressions. Is this OK for trunk?
>>>>
>>>> Thanks,
>>>> Kugan
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2016-11-18  Kugan Vivekanandarajah
>>>>
>>>>     PR IPA/78365
>>>>     * gcc.dg/torture/pr78365.c: New test.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2016-11-18  Kugan Vivekanandarajah
>>>>
>>>>     PR IPA/78365
>>>>     * ipa-cp.c (propagate_constants_accross_call): get param type from
>>>>     callees DECL_ARGUMENTS if available.
>>>>     * ipa-prop.c (ipa_compute_jump_functions_for_edge): Likewise.
>>>>     (ipcp_update_vr): Remove now redundant conversion of precision for VR.
>>>>     * ipa-prop.h: Make ipa_get_callee_param_type local again.
PR78599
Hi,
As mentioned in the PR, the issue seems to be that in
propagate_bits_accross_jump_functions(), ipa_get_type() returns record_type
during WPA and hence we pass an invalid precision to
ipcp_bits_lattice::meet_with (value, mask, precision), which eventually
leads to a runtime error. The attached patch tries to fix that by bailing
out if the type of the param is not an integral or pointer type.
This happens for the edge from deque_test -> _Z4copyIPd1BEvT_S2_T0_.isra.0/9.

However I am not sure how ipcp_bits_lattice::meet_with (value, mask,
precision) gets called for this case. In
ipa_compute_jump_functions_for_edge(), we set jfunc->bits.known to true only
if parm's type satisfies INTEGRAL_TYPE_P or POINTER_TYPE_P. And
ipcp_bits_lattice::meet_with (value, mask, precision) is called only if
jfunc->bits.known is set to true. So I suppose it shouldn't really happen
that ipcp_bits_lattice::meet_with (value, mask, precision) gets called when
the callee parameter's type is record_type, since the corresponding
argument's type would also need to be record_type and jfunc->bits.known
would be set to false.

Without -flto, parm_type is reference_type so that satisfies POINTER_TYPE_P,
but with -flto it's appearing to be record_type. Is this possibly the same
issue of TYPE_ARG_TYPES returning bogus types during WPA ?

I verified the attached patch fixes the runtime error with ubsan-built gcc.
Bootstrap+tested on x86_64-unknown-linux-gnu.
Cross-tested on arm*-*-*, aarch64*-*-*.
LTO bootstrap on x86_64-unknown-linux-gnu in progress.
Is it OK to commit if it succeeds ?

Thanks,
Prathamesh

2016-12-01  Prathamesh Kulkarni

    PR ipa/78599
    * ipa-cp.c (propagate_bits_accross_jump_function): Check if parm_type
    is integral or pointer type.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 2ec671f..28eb74c 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1770,12 +1770,15 @@ propagate_bits_accross_jump_function (cgraph_edge *cs, int idx, ipa_jump_func *j
   tree parm_type = ipa_get_type (callee_info, idx);

   /* For K&R C programs, ipa_get_type() could return NULL_TREE.
-     Avoid the transform for these cases.  */
-  if (!parm_type)
+     Avoid the transform for these cases or if parm type is not
+     integral or pointer type.  */
+  if (!parm_type
+      || !(INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "Setting dest_lattice to bottom, because"
-                 " param %i type is NULL for %s\n", idx,
+                 " param %i type is %s for %s\n", idx,
+                 (parm_type == NULL) ? "NULL" : "non-integral",
                  cs->callee->name ());

       return dest_lattice->set_to_bottom ();
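The guard added by this patch can be modelled outside GCC as a simple predicate on the kind of a type. The sketch below is a simplified model, not the GCC implementation: `enum type_kind`, `struct param_type` and `bits_propagation_ok` are hypothetical names. A reference kind is accepted alongside pointers because the mail above notes that a reference_type satisfies POINTER_TYPE_P, and a NULL pointer models the K&R case where no parameter type is available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified type kinds; GCC's tree codes are far richer.  */
enum type_kind { KIND_INTEGER, KIND_POINTER, KIND_REFERENCE, KIND_RECORD };

/* Hypothetical parameter type descriptor.  */
struct param_type
{
  enum type_kind kind;
  unsigned precision;   /* Only meaningful for the accepted kinds.  */
};

/* Return true if known-bits propagation may rely on T's precision.
   A NULL T or a record type makes the caller fall back to the most
   conservative lattice value.  */
static bool
bits_propagation_ok (const struct param_type *t)
{
  if (t == NULL)
    return false;
  return (t->kind == KIND_INTEGER
          || t->kind == KIND_POINTER
          || t->kind == KIND_REFERENCE);
}
```

In this model the record-typed parameter seen during WPA is rejected before its precision is ever read, which is the effect the patch aims for.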
Re: [tree-tailcall] Check if function returns it's argument
On 25 November 2016 at 21:17, Jeff Law wrote:
> On 11/25/2016 01:07 AM, Richard Biener wrote:
>
>>> For the tail-call issue, should we artificially create a lhs and use that
>>> as return value (perhaps by a separate pass before tailcall) ?
>>>
>>> __builtin_memcpy (a1, a2, a3);
>>> return a1;
>>>
>>> gets transformed to:
>>> _1 = __builtin_memcpy (a1, a2, a3)
>>> return _1;
>>>
>>> So the tail-call optimization pass would see the IL in its expected form.
>>
>>
>> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> itself should do this rewrite.  But if this form is required to make
>> things work (I suppose you checked it _does_ actually work?) then
>> we'd need to make sure later passes do not undo it.  So it looks
>> fragile to me.  OTOH I seem to remember that the flags we set on
>> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> verified again there?
>
> So tail calling actually sits on the border between trees and RTL.
> Essentially it's an expand-time decision as we use information from trees as
> well as low level target information.
>
> I would not expect the former sequence to tail call.  The tail calling code
> does not know that the return value from memcpy will be a1.  Thus the tail
> calling code has to assume that it'll have to copy a1 into the return
> register after returning from memcpy, which obviously can't be done if we
> tail called memcpy.
>
> The second form is much more likely to turn into a tail call sequence
> because the return value from memcpy will be sitting in the proper register.
> This form ought to work for most calling conventions that allow tail calls.
>
> We could (in theory) try and exploit the fact that memcpy returns its first
> argument as a return value, but that would only be helpful on a target where
> the first argument and return value use the same register.  So I'd have a
> slight preference to rewriting per Prathamesh's suggestion above since it's
> more general.

Thanks for the suggestion. The attached patch creates an artificial lhs,
and returns it if the function returns its argument and that argument
is used as the return value.

eg:
f (void * a1, void * a2, long unsigned int a3)
{
  [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return a1_2(D);

}

is transformed to:
f (void * a1, void * a2, long unsigned int a3)
{
  void * _6;

  [0.0%]:
  # .MEM_5 = VDEF <.MEM_1(D)>
  _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
  # VUSE <.MEM_5>
  return _6;

}

While testing, I came across an issue with function f() defined
in tail-padding1.C:
struct X
{
  ~X() {}
  int n;
  char d;
};

X f()
{
  X nrvo;
  __builtin_memset (&nrvo, 0, sizeof(X));
  return nrvo;
}

input to the pass:
X f() ()
{
  [0.0%]:
  # .MEM_3 = VDEF <.MEM_1(D)>
  __builtin_memset (nrvo_2(D), 0, 8);
  # VUSE <.MEM_3>
  return nrvo_2(D);

}

verify_gimple_return failed with:
tail-padding1.C:13:1: error: invalid conversion in return statement
 }
 ^
struct X

struct X &

# VUSE <.MEM_3>
return _4;

It seems the return type of the function (struct X) differs from the type
of the return value (struct X&).
Not sure how this is possible ?
To work around that, I guarded the transform on:
useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
                           TREE_TYPE (retval)))

in the patch. Does that look OK ?

Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
Cross-tested on arm*-*-*, aarch64*-*-*.

Thanks,
Prathamesh
>
>
> Jeff

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..d46ca50 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;

   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+        ret_stmt = dyn_cast <greturn *> (stmt);

       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,37 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
         {
           call = as_a <gcall *> (stmt);
           ass_var = gimple_call_lhs (call);
+          if (!ass_var)
+            {
+              /* Check if function returns one if it's arguments
+
Re: [tree-tailcall] Check if function returns it's argument
On 1 December 2016 at 17:40, Richard Biener wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 21:17, Jeff Law wrote:
>> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >
>> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >>>
>> >>> __builtin_memcpy (a1, a2, a3);
>> >>> return a1;
>> >>>
>> >>> gets transformed to:
>> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >>> return _1;
>> >>>
>> >>> So tail-call optimization pass would see the IL in its expected form.
>> >>
>> >>
>> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> itself should do this rewrite.  But if this form is required to make
>> >> things work (I suppose you checked it _does_ actually work?) then
>> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> verified again there?
>> >
>> > So tail calling actually sits on the border between trees and RTL.
>> > Essentially it's an expand-time decision as we use information from trees
>> > as well as low level target information.
>> >
>> > I would not expect the former sequence to tail call.  The tail calling code
>> > does not know that the return value from memcpy will be a1.  Thus the tail
>> > calling code has to assume that it'll have to copy a1 into the return
>> > register after returning from memcpy, which obviously can't be done if we
>> > tail called memcpy.
>> >
>> > The second form is much more likely to turn into a tail call sequence
>> > because the return value from memcpy will be sitting in the proper
>> > register.
>> > This form ought to work for most calling conventions that allow tail calls.
>> >
>> > We could (in theory) try and exploit the fact that memcpy returns its first
>> > argument as a return value, but that would only be helpful on a target
>> > where the first argument and return value use the same register.  So I'd
>> > have a slight preference to rewriting per Prathamesh's suggestion above
>> > since it's more general.
>> Thanks for the suggestion. The attached patch creates artificial lhs,
>> and returns it if the function returns its argument and that argument
>> is used as return-value.
>>
>> eg:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   [0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return a1_2(D);
>>
>> }
>>
>> is transformed to:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   void * _6;
>>
>>   [0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return _6;
>>
>> }
>>
>> While testing, I came across an issue with function f() defined
>> in tail-padding1.C:
>> struct X
>> {
>>   ~X() {}
>>   int n;
>>   char d;
>> };
>>
>> X f()
>> {
>>   X nrvo;
>>   __builtin_memset (&nrvo, 0, sizeof(X));
>>   return nrvo;
>> }
>>
>> input to the pass:
>> X f() ()
>> {
>>   [0.0%]:
>>   # .MEM_3 = VDEF <.MEM_1(D)>
>>   __builtin_memset (nrvo_2(D), 0, 8);
>>   # VUSE <.MEM_3>
>>   return nrvo_2(D);
>>
>> }
>>
>> verify_gimple_return failed with:
>> tail-padding1.C:13:1: error: invalid conversion in return statement
>>  }
>>  ^
>> struct X
>>
>> struct X &
>>
>> # VUSE <.MEM_3>
>> return _4;
>>
>> It seems the return type of function (struct X) differs with the type
>> of return value (struct X&).
>> Not sure how this is possible ?
>
> You need to honor DECL_BY_REFERENCE of DECL_RESULT.
Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
resolved the error.
Does the attached version look OK ?
Validation in progress.

Thanks,
Prathamesh
>
>> To work around that, I guarded the transform on:
>> useless_type_conversion_p (TREE_TYPE (
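
At the source level, the two forms Jeff compares look roughly like the sketch below. The function names copy_then_return_arg and return_call_result are made up for illustration and do not appear in the patch. Whether either one is actually emitted as a tail call depends on the target, the calling convention and the optimization level, so the sketch only shows that both forms return the same pointer.

```c
#include <stddef.h>
#include <string.h>

/* First form: the call result is ignored and the argument is returned.
   Without knowing that memcpy returns its first argument, the compiler
   has to keep a1 available after the call.  */
void *
copy_then_return_arg (void *a1, const void *a2, size_t a3)
{
  memcpy (a1, a2, a3);
  return a1;
}

/* Second form: the call result is returned directly, which is the shape
   the transform in this thread tries to produce.  */
void *
return_call_result (void *a1, const void *a2, size_t a3)
{
  return memcpy (a1, a2, a3);
}
```

Since memcpy returns its first argument, both functions hand back the destination pointer; only the second makes that visible in the shape of the code.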
Re: [tree-tailcall] Check if function returns it's argument
On 1 December 2016 at 18:26, Richard Biener wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener wrote:
>> [...]
Re: [tree-tailcall] Check if function returns it's argument
On 1 December 2016 at 18:38, Richard Biener wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 18:26, Richard Biener wrote:
>> [...]
PR78629
Hi Richard,
I tested your fix for the patch with ubsan stage-1 built gcc, and it
fixes the error.
Is it OK to commit if bootstrap+test passes on x86_64-unknown-linux-gnu ?

Thanks,
Prathamesh

2016-12-01  Richard Biener
            Prathamesh Kulkarni

        PR middle-end/78629
        * vec.h (vec::quick_grow_cleared): Guard call to memset if
        len-oldlen != 0.
        (vec::safe_grow_cleared): Likewise.

diff --git a/gcc/vec.h b/gcc/vec.h
index 14fb2a6..aa93411 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1092,8 +1092,10 @@ inline void
 vec::quick_grow_cleared (unsigned len)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   quick_grow (len);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+    memset (&(address ()[oldlen]), 0, sz);
 }
 
 
@@ -1605,8 +1607,10 @@ inline void
 vec::safe_grow_cleared (unsigned len MEM_STAT_DECL)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   safe_grow (len PASS_MEM_STAT);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+    memset (&(address ()[oldlen]), 0, sz);
 }
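
The same guard can be sketched outside of vec.h in plain C. The struct int_vec and the function int_vec_grow_cleared below are hypothetical stand-ins, not GCC code. The assumption here (the message above only says the error showed up under ubsan) is that the complaint concerns calling memset on storage that may not exist yet when nothing is being added; passing a null pointer to memset has traditionally been undefined in C even with a zero length.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable vector of ints.  */
struct int_vec
{
  int *data;    /* NULL until something has been allocated */
  size_t len;   /* number of elements currently in use */
};

/* Grow V to LEN elements and zero the newly added tail.
   Returns 0 on success, -1 on shrink request or allocation failure.  */
int
int_vec_grow_cleared (struct int_vec *v, size_t len)
{
  size_t oldlen = v->len;
  if (len < oldlen)
    return -1;

  /* Size of the tail that has to be cleared, computed up front as in
     the patch.  */
  size_t sz = sizeof (int) * (len - oldlen);

  if (len > oldlen)
    {
      int *p = realloc (v->data, sizeof (int) * len);
      if (!p)
        return -1;
      v->data = p;
      v->len = len;
    }

  /* Guard: when sz is 0 the data pointer may still be NULL, so memset
     is skipped entirely instead of being called with a zero length.  */
  if (sz != 0)
    memset (v->data + oldlen, 0, sz);
  return 0;
}
```

Growing an empty vector to length 0 leaves data as NULL and never reaches memset; growing it further zeroes only the new elements.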
Re: [tree-tailcall] Check if function returns it's argument
On 2 December 2016 at 03:57, Jeff Law wrote:
> On 12/01/2016 06:22 AM, Richard Biener wrote:
>>>
>>> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
>>> differently:
>>>
>>> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
>>> DECL_BY_REFERENCE is set
>>>  }
>>>  ^
>>> while verifying SSA_NAME nrvo_4 in statement
>>> # .MEM_3 = VDEF <.MEM_1(D)>
>>> nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
>>> tail-padding1.C:13:1: internal compiler error: verify_ssa failed
>>
>>
>> Hmm, ok.  Not sure why we enforce this.
>
> I don't know either.  But I would start by looking at tree-nrv.c since it
> looks (based on the variable names) that the named-value-return optimization
> kicked in.
Um, the name nrvo was in the test-case itself. The transform takes
place in the tailr1 pass, which appears to run before nrv, so possibly
this is not related to nrv ?

The verify check seems to have been added in r161898 by Honza to fix
PR 44813, based on Richard's following suggestion from
https://gcc.gnu.org/ml/gcc-patches/2010-07/msg00358.html:

"We should never see a definition of a RESULT_DECL SSA name for
DECL_BY_REFERENCE RESULT_DECLs (that would
be a bug - we should add verification to the SSA verifier, can you do
add that?)."

The attached patch moves && ret_stmt together with !ass_var,
keeps the !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) check,
and adjusts the tailcall-9.c testcase to scan for
_\[0-9\]* = __builtin_memcpy in the tailr1 dump,
since that's where the transform takes place.
Is this version OK ?

Thanks,
Prathamesh
>
>>
>> Note that in the end this patch looks fishy -- iff we really need
>> the LHS on the assignment for correctness if we have the tailcall
>> flag set then what guarantees that later passes do not remove it
>> again?  So anybody removing a LHS would need to unset the tailcall flag?
>>
>> Saying again that I don't know enough about the RTL part of tailcall
>> expansion.
>
> The LHS on the assignment makes it easier to identify when a tail call is
> possible.  It's not needed for correctness.  Not having the LHS on the
> assignment just means we won't get an optimized tail call.
>
> Under what circumstances would the LHS possibly be removed?  We know the
> return statement references the LHS, so it's not going to be something that
> DCE will do.
>
> jeff

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..9c482f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailr-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump "_\[0-9\]* = __builtin_memcpy" "tailr1" } } */
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..64f624f 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+        ret_stmt = dyn_cast <greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,35 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
         {
           call = as_a <gcall *> (stmt);
           ass_var = gimple_call_lhs (call);
+          if (!ass_var && ret_stmt)
+            {
+              /* Check if function returns one if it's arguments
+                 and that argument is used as return value.
+                 In that case create an artificial lhs to call_stmt,
+                 and set it as the return value.  */
+
+              unsigned rf = gimple_call_return_flags (call);
+              if (rf & ERF_RETURNS_ARG)
+                {
+                  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+                  if (argnum < gimple_call_num_args (call))
+                    {
+                      tree arg = gimple_call_arg (call, argnum);
+                      tree retval = gimple_return_retval (ret_stmt);
+                      if (retval
+                          && TREE_CODE (retval) == SSA_NAME
+                          && operand_equal_p (retval, arg, 0)
+                          && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+                        {
+                          ass_var = copy_ssa_name (arg);
+                          gimple_call_set_lhs (call, ass_var);
+                          update_stmt (call);
+                          gimple_return_set_retval (ret_stmt, ass_var);
+                          update_stmt (ret_stmt);
+                        }
+                    }
+                }
+            }
           brea
Re: PR78599
On 3 December 2016 at 00:25, Martin Jambor wrote:
> Hi,
>
> On Thu, Dec 01, 2016 at 01:43:16PM +0100, Richard Biener wrote:
>> On Thu, Dec 1, 2016 at 11:07 AM, Prathamesh Kulkarni
>> wrote:
>> > Hi,
>> > As mentioned in PR, the issue seems to be that in
>> > propagate_bits_accross_jump_functions(),
>> > ipa_get_type() returns record_type during WPA and hence we pass
>> > invalid precision to
>> > ipcp_bits_lattice::meet_with (value, mask, precision) which eventually
>> > leads to runtime error.
>> > The attached patch tries to fix that, by bailing out if type of param
>> > is not integral or pointer type.
>> > This happens for the edge from deque_test ->
>> > _Z4copyIPd1BEvT_S2_T0_.isra.0/9.
>>
>> Feels more like a DECL_BY_REFERENCE mishandling and should be fixed
>> elsewhere.
>
> That is indeed what is happening.  Prathamesh, if you are going to save
> the type of arguments in the jump function, it should help you also
> with this issue.  I know it was me who suggested using the function
> type to get at them and am sorry, I did not realize there were potential
> issues with promotions and by_reference passing.
>
> By the way, please be careful not to introduce code style violations,
> especially lines exceeding 80 characters and adding trailing
> whitespace (propagate_bits_accross_jump_function has a few instances
> of both), I'd suggest setting your editor to highlight them.
Oops sorry about that, I will pay more attention to formatting henceforth.
Using the editor to highlight stray whitespace is indeed quite useful,
thanks for that suggestion.

Kugan has a patch for adding param type to jump function:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02732.html
Once that gets committed, I will send a patch to use jfunc->param_type
in propagate_bits_accross_jump_function().

Thanks,
Prathamesh
>
> Thanks,
>
> Martin
>
Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
Hi,
This patch folds strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if
strlen (t) is known.
One issue I came across was that forwprop1 reverses the order of operands
in eq_expr below:

eg test-case:
_Bool f(char *s, int cond)
{
  char *t1 = __builtin_strstr (s, "hello");
  _Bool t2 = (t1 == s);
  return t2;
}

forwprop1 dump:
f (char * s, int cond)
{
  _Bool t2;
  char * t1;

  [0.0%]:
  t1_3 = __builtin_strstr (s_2(D), "hello");
  t2_4 = s_2(D) == t1_3;
  return t2_4;

}

So I had to check if SSA_NAME_DEF_STMT (rhs2) was a call to strstr
rather than rhs1. I suppose that's OK ?

clang unconditionally transforms
strstr (s, t) == s to strncmp (s, t, strlen (t))
However I am not sure what algorithm glibc's strstr uses, so I didn't
attempt the transform if strlen (t) is unknown.
Should we do the transform even if strlen (t) is unknown ?

Thanks,
Prathamesh

2016-12-05  Prathamesh Kulkarni

        * tree-ssa-strlen.c (strlen_optimize_stmt): Fold strstr(s, t) == s to
        strcmp (s, t) == 0.
        (pass_data_strlen): Set todo_flags_finish to TODO_update_ssa.

testsuite/
        * gcc.dg/strlenopt-30.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..737f37d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+_Bool f1(char *s)
+{
+  char *t = "hello";
+  char *t1 = __builtin_strstr (s, t);
+  _Bool t2 = (t1 == s);
+  return t2;
+}
+
+_Bool f2(char *s)
+{
+  char *t = "hello";
+  char *t1 = __builtin_strstr (s, t);
+  _Bool t2 = (t1 != s);
+  return t2;
+}
+
+_Bool f3(char *s, char *t)
+{
+  char *t1 = __builtin_strstr (s, t);
+  _Bool t2 = (t1 == s);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_strcmp" 2 "strlen" } } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 339812e..8977e80 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -2302,6 +2302,55 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
           else if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
             handle_pointer_plus (gsi);
         }
+
+      /* Fold strstr (s, t) == s to strcmp (s, t) == 0.  if strlen (t)
+         is known.  */
+      else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+        {
+          enum tree_code code = gimple_assign_rhs_code (stmt);
+          if (code == EQ_EXPR || code == NE_EXPR)
+            {
+              tree rhs1 = gimple_assign_rhs1 (stmt);
+              tree rhs2 = gimple_assign_rhs2 (stmt);
+              if (TREE_CODE (rhs2) == SSA_NAME)
+                {
+                  gcall *call_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (rhs2));
+                  if (call_stmt
+                      && gimple_call_builtin_p (call_stmt, BUILT_IN_STRSTR))
+                    {
+                      tree arg0 = gimple_call_arg (call_stmt, 0);
+                      if (operand_equal_p (arg0, rhs1, 0))
+                        {
+                          /* Check if strlen(arg1) is known.  */
+                          tree arg1 = gimple_call_arg (call_stmt, 1);
+                          int idx = get_stridx (arg1);
+                          strinfo *si = NULL;
+                          if (idx)
+                            si = get_strinfo (idx);
+                          if ((idx < 0)
+                              || (si && (get_string_length (si) != NULL_TREE)))
+                            {
+                              gimple_stmt_iterator gsi = gsi_for_stmt (call_stmt);
+                              tree strcmp_decl = builtin_decl_explicit (BUILT_IN_STRCMP);
+                              gcall *strcmp_call = gimple_build_call (strcmp_decl, 2,
+                                                                 arg0, arg1);
+                              tree strcmp_lhs = make_ssa_name (integer_type_node);
+                              gimple_call_set_lhs (strcmp_call, strcmp_lhs);
+                              update_stmt (strcmp_call);
+                              gsi_remove (&gsi, true);
+                              gsi_insert_before (&gsi, strcmp_call, GSI_SAME_STMT);
+
+                              gsi = gsi_for_stmt (stmt);
+                              tree zero = build_zero_cst (TREE_TYPE (strcmp_lhs));
+                              gassign *ga = gimple_build_assign (lhs, code,
+                                                strcmp_lhs, zero);
+                              gsi_replace (&gsi, ga, false);
+                            }
+                        }
+                    }
+                }
+            }
+        }
       else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
         {
           tree type = TREE_TYPE (lhs);
@@ -2505,7 +2554,7 @@ const pass_data pass_da
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 5 December 2016 at 23:38, Bernd Schmidt wrote:
> On 12/05/2016 07:02 PM, Prathamesh Kulkarni wrote:
>>
>> This patch folds strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if
>> strlen (t) is known.
>
>
> That's not the same thing, is it?
>
> s = "hello world", t = "hello":
>   strstr (s, t) == s, but not strcmp (s, t) == 0.
>
> I think you'd want memcmp (s, t, strlen (t)) == 0.
Ah indeed! Dunno why I thought strstr (s, t) == strcmp (s, t) :(
Thanks for pointing out!
>
>
> Bernd
>
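
Bernd's counterexample can be checked with a small sketch. The three helper names below are invented for illustration. The bounded comparison uses strncmp (the form the original message attributes to clang) rather than memcmp, on the assumption that s may be shorter than t: strncmp stops at the first NUL, so it does not read past the end of a short s.

```c
#include <stdbool.h>
#include <string.h>

/* What the original expression asks: does S start with T?  */
bool
starts_with_strstr (const char *s, const char *t)
{
  return strstr (s, t) == s;
}

/* The incorrect fold: true only when S and T are equal as whole strings.  */
bool
equal_strcmp (const char *s, const char *t)
{
  return strcmp (s, t) == 0;
}

/* A prefix comparison bounded by strlen (t).  */
bool
starts_with_strncmp (const char *s, const char *t)
{
  return strncmp (s, t, strlen (t)) == 0;
}
```

For s = "hello world" and t = "hello", the first and third helpers agree while the strcmp version does not, which is exactly the mismatch pointed out above.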
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 5 December 2016 at 23:40, Prathamesh Kulkarni wrote:
> On 5 December 2016 at 23:38, Bernd Schmidt wrote:
>> On 12/05/2016 07:02 PM, Prathamesh Kulkarni wrote:
>>>
>>> This patch folds strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if
>>> strlen (t) is known.
>>
>>
>> That's not the same thing, is it?
>>
>> s = "hello world", t = "hello":
>>   strstr (s, t) == s, but not strcmp (s, t) == 0.
>>
>> I think you'd want memcmp (s, t, strlen (t)) == 0.
> Ah indeed! Dunno why I thought strstr (s, t) == strcmp (s, t) :(
Err, I meant folding strstr (s, t) == s to strcmp (s, t) == 0.
I will send a patch to fold strstr (s, t) == s to
memcmp (s, t, strlen (t)) == 0.
Thanks for the suggestions.

Regards,
Prathamesh
> Thanks for pointing out!
>>
>>
>> Bernd
>>
Re: [PATCH] Fix ICE due to IPA-VRP (PR tree-optimization/78681)
On 6 December 2016 at 14:50, Richard Biener wrote: > On Tue, 6 Dec 2016, Jakub Jelinek wrote: > >> On Tue, Dec 06, 2016 at 09:36:55AM +0100, Richard Biener wrote: >> > > As shown on the testcase, with K&R definitions and fn prototypes with >> > > promoted types, we can end up computing caller's value ranges in wider >> > > type than the parameter actually has in the function. >> > > The problem with that is that wide_int_storage::from can actually wrap >> > > around, so either as in the testcase we end up with invalid range >> > > (minimum >> > > larger than maximum), or just with a range that doesn't cover all the >> > > values >> > > the parameter can have. >> > > The patch punts if the range bounds cast to type aren't equal to the >> > > original values. Similarly (just theoretical), for pointers it only >> > > optimizes if the caller's precision as at most as wide as the pointer, >> > > if it would be wider, even ~[0, 0] range could actually be a NULL pointer >> > > (some multiple of ~(uintptr_t)0 + (uintmax_t) 1). >> > > >> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? >> > >> > Ok, but I wonder whether this also addresses PR78365 which has a >> > patch pending (to be reviewed by IPA maintainers) that makes propagation >> > across such calls more sensible by recording type information in >> > the jump functions. >> >> It is effectively a dup. So, my patch fixes the PR78365 testcase and I bet >> (though haven't tried, but it is extremely likely) that the other patch >> fixes PR78681 testcase. >> So, do you want me to add the pr78365.c testcase to my patch, or prefer >> the other patch? OT, in the other patch I've noticed incorrect formatting: >> +parm ? >> +TREE_TYPE (parm) : >> NULL_TREE); > > I prefer the other patch (well, the other approach, didn't look into > the patch in detail). We would also need param types to be saved in jump function for ipa-bits-propagation, to fix cases like PR78599. Thanks, Prathamesh > > Richard. 
> >> > > 2016-12-05 Jakub Jelinek >> > > >> > > PR tree-optimization/78681 >> > > * ipa-prop.c (ipcp_update_vr): Punt if vr[i].min precision is bigger >> > > then type's precision and vr[i].min or vr[i].max in type would wrap. >> > > >> > > * gcc.c-torture/compile/pr78681.c: New test. >> > > >> > > --- gcc/ipa-prop.c.jj 2016-11-25 18:11:05.0 +0100 >> > > +++ gcc/ipa-prop.c2016-12-05 18:48:48.853882864 +0100 >> > > @@ -5709,8 +5709,23 @@ ipcp_update_vr (struct cgraph_node *node >> > > { >> > > tree type = TREE_TYPE (ddef); >> > > unsigned prec = TYPE_PRECISION (type); >> > > + unsigned mprec = wi::get_precision (vr[i].min); >> > > + gcc_assert (mprec == wi::get_precision (vr[i].max)); >> > > if (INTEGRAL_TYPE_P (TREE_TYPE (ddef))) >> > > { >> > > + if (prec < mprec) >> > > + { >> > > + /* If there is a disagreement between callers and callee >> > > + on the argument type, e.g. when using K&R function >> > > + definitions, punt if vr[i].min or vr[i].max are outside >> > > + of type's precision. 
*/ >> > > + wide_int m = wi::ext (vr[i].min, prec, TYPE_SIGN (type)); >> > > + if (m != vr[i].min) >> > > + continue; >> > > + m = wi::ext (vr[i].max, prec, TYPE_SIGN (type)); >> > > + if (m != vr[i].max) >> > > + continue; >> > > + } >> > > if (dump_file) >> > > { >> > > fprintf (dump_file, "Setting value range of param %u ", i); >> > > @@ -5729,6 +5744,7 @@ ipcp_update_vr (struct cgraph_node *node >> > > } >> > > else if (POINTER_TYPE_P (TREE_TYPE (ddef)) >> > > && vr[i].type == VR_ANTI_RANGE >> > > +&& mprec <= prec >> > > && wi::eq_p (vr[i].min, 0) >> > > && wi::eq_p (vr[i].max, 0)) >> > > { >> > > --- gcc/testsuite/gcc.c-torture/compile/pr78681.c.jj 2016-12-05 >> > > 19:51:15.353646309 +0100 >> > > +++ gcc/testsuite/gcc.c-torture/compile/pr78681.c 2016-12-05 >> > > 19:50:57.0 +0100 >> > > @@ -0,0 +1,27 @@ >> > > +/* PR tree-optimization/78681 */ >> > > + >> > > +struct S { char b; }; >> > > +char d, e, f, l, m; >> > > +struct S n; >> > > +int bar (char, char); >> > > +static void foo (struct S *, int, int, int, int); >> > > + >> > > +static void >> > > +foo (x, g, h, i, j) >> > > + struct S *x; >> > > + char g, h, i, j; >> > > +{ >> > > + char k; >> > > + for (k = 0; k <= j; k++) >> > > +if (bar (g, k)) >> > > + for (; i; k++) >> > > + if (d) >> > > + x->b = g; >> > > +} >> > > + >> > > +void >> > > +baz (int q) >> > > +{ >> > > + foo (&n, m, l, f, 1); >> > > + foo (&n, m, e, f, e - 1); >> > > +} >> >> Jakub >> >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nu
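
The wrap-around Jakub describes can be sketched with plain 64-bit integers instead of GCC's wide_int. The helpers ext_to_prec and range_fits_precision are hypothetical names, and the sketch only mirrors the idea in the quoted hunk: extend each bound from the narrower precision and give up on the range if either bound changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep the low PREC bits of V and sign- or zero-extend them back to
   64 bits.  PREC must be in the range 1..63.  */
static int64_t
ext_to_prec (int64_t v, unsigned prec, bool is_signed)
{
  uint64_t mask = (UINT64_C (1) << prec) - 1;
  uint64_t low = (uint64_t) v & mask;
  /* If the sign bit of the narrow value is set, subtract 2^PREC.  */
  if (is_signed && (low >> (prec - 1)) != 0)
    return (int64_t) low - (int64_t) mask - 1;
  return (int64_t) low;
}

/* Return true if both bounds of [MIN, MAX] survive narrowing to PREC
   bits unchanged; otherwise the range should not be used.  */
bool
range_fits_precision (int64_t min, int64_t max, unsigned prec, bool is_signed)
{
  return ext_to_prec (min, prec, is_signed) == min
         && ext_to_prec (max, prec, is_signed) == max;
}
```

With an 8-bit signed parameter, a caller-side range of [100, 200] narrows to [100, -56], so the minimum ends up larger than the maximum; the check rejects it, while [0, 100] passes unchanged.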
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 5 December 2016 at 23:47, Jakub Jelinek wrote:
> On Mon, Dec 05, 2016 at 11:32:15PM +0530, Prathamesh Kulkarni wrote:
>> So I had to check if SSA_NAME_DEF_STMT (rhs2) was call to strstr
>> rather than rhs1.
>
> Then you need to test both whether it is strstr (s, t) == s or
> s == strstr (s, t).
>
>> +                              gassign *ga = gimple_build_assign (lhs, code,
>> +                                                strcmp_lhs, zero);
>
> The formatting is wrong here.
>
>> +                              gsi_replace (&gsi, ga, false);
>> +                            }
>> +                        }
>> +                    }
>> +                }
>> +            }
>> +        }
>>        else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>>          {
>>            tree type = TREE_TYPE (lhs);
>> @@ -2505,7 +2554,7 @@ const pass_data pass_data_strlen =
>>    0, /* properties_provided */
>>    0, /* properties_destroyed */
>>    0, /* todo_flags_start */
>> -  0, /* todo_flags_finish */
>> +  TODO_update_ssa, /* todo_flags_finish */
>
> No, please don't.  Just make sure to build proper SSA right away.
Hi,
Thanks for the suggestions, I have tried to modify the patch accordingly.
Does this version look OK ?
Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada
Cross tested on arm*-*-*, aarch64*-*-*.

Thanks,
Prathamesh
>
>       Jakub

2016-12-07  Prathamesh Kulkarni

        * tree-ssa-strlen.c (strlen_optimize_stmt): Fold strstr(s, t) == s to
        memcmp (s, t, strlen (t)) == 0.
        Include tree-into-ssa.h.

testsuite/
        * gcc.dg/strlenopt-30.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..603e23c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+__attribute__((no_icf))
+_Bool f1(char *s)
+{
+  return __builtin_strstr (s, "hello") == s;
+}
+
+__attribute__((no_icf))
+_Bool f2(char *s)
+{
+  return s == __builtin_strstr (s, "hello");
+}
+
+__attribute__((no_icf))
+_Bool f3(char *s)
+{
+  return s != __builtin_strstr (s, "hello");
+}
+
+__attribute__((no_icf))
+_Bool f4(char *s, char *t)
+{
+  return __builtin_strstr (s, t) == s;
+}
+
+/* Do not perform transform in this case, since
+   t1 doesn't have single use.  */
+
+__attribute__((no_icf))
+_Bool f5(char *s, char *t)
+{
+  void foo(char *);
+
+  char *t1 = __builtin_strstr (s, t);
+  foo (t1);
+  return (t1 == s);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_memcmp" 4 "strlen" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_strlen" 1 "strlen" } } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 339812e..b7f4cee 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-chkp.h"
 #include "tree-hash-traits.h"
 #include "builtins.h"
+#include "tree-into-ssa.h"
 
 /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive value
    is an index into strinfo vector, negative value stands for
@@ -2302,7 +2303,94 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
           else if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
             handle_pointer_plus (gsi);
         }
-      else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
+
+      /* Fold strstr (s, t) == s to memcmp (s, t, strlen (t)) == 0.
+         if var holding return value of strstr has single use.  */
+
+      else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+        {
+          enum tree_code code = gimple_assign_rhs_code (stmt);
+          if (code == EQ_EXPR || code == NE_EXPR)
+            {
+              tree rhs1 = gimple_assign_rhs1 (stmt);
+              tree rhs2 = gimple_assign_rhs2 (stmt);
+              if (TREE_CODE (rhs1) == SSA_NAME
+                  && TREE_CODE (rhs2) == SSA_NAME)
+                {
+                  gcall *call_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (rhs1));
+                  if (!call_stmt)
+                    {
+                      call_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (rhs2));
+                      tree tmp = rhs1;
+                      rhs1 = rhs2;
+                      rhs2 = tmp;
+                    }
+
+                  tree call_lhs;
+                  if (call_stmt
+                      && gimple_call_builtin_p (call_stmt, BUILT_IN_STRSTR)
+                      && (call_lhs = gimple_call_lhs (call_stmt))
+                      && has_single_use (call_lhs))
+                    {
+
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 7 December 2016 at 17:36, Jakub Jelinek wrote:
> On Wed, Dec 07, 2016 at 05:02:46PM +0530, Prathamesh Kulkarni wrote:
>> +                  if (arg1_len == NULL_TREE)
>> +                    {
>> +                      gimple_stmt_iterator gsi;
>> +                      tree strlen_decl;
>> +                      gimple *strlen_call;
>> +
>> +                      strlen_decl = builtin_decl_explicit (BUILT_IN_STRLEN);
>> +                      strlen_call = gimple_build_call (strlen_decl, 1,
>> +                                                       arg1);
>> +                      arg1_len = make_ssa_name (size_type_node);
>> +                      gimple_call_set_lhs (strlen_call, arg1_len);
>> +                      update_stmt (strlen_call);
>> +                      gsi = gsi_for_stmt (call_stmt);
>> +                      gsi_insert_before (&gsi, strlen_call, GSI_SAME_STMT);
>> +                    }
>
> Why?  If the strlen isn't readily available, do you really think it is
> always a win to replace one call with 2 calls?  The string you want to do
> strlen on can be huge, the haystack could be empty or very short, etc.
> I'd just punt if strlen isn't known.
>> +
>> +                  gimple_stmt_iterator gsi = gsi_for_stmt (call_stmt);
>> +                  tree memcmp_decl = builtin_decl_explicit (BUILT_IN_MEMCMP);
>> +                  gcall *memcmp_call
>> +                    = gimple_build_call (memcmp_decl, 3, arg0, arg1,
>> +                                         arg1_len);
>> +                  tree memcmp_lhs = make_ssa_name (integer_type_node);
>> +                  gimple_call_set_lhs (memcmp_call, memcmp_lhs);
>> +                  update_stmt (memcmp_call);
>> +                  gsi_remove (&gsi, true);
>> +                  gsi_insert_before (&gsi, memcmp_call, GSI_SAME_STMT);
>> +
>> +                  gsi = gsi_for_stmt (stmt);
>> +                  tree zero = build_zero_cst (TREE_TYPE (memcmp_lhs));
>> +                  gassign *ga = gimple_build_assign (lhs, code,
>> +                                                     memcmp_lhs, zero);
>> +                  gsi_replace (&gsi, ga, false);
>> +                  update_ssa (TODO_update_ssa);
>
> And this is certainly even more wrong than the old TODO_update_ssa at the
> end of the pass, now you'll do it for every single replacement in the
> function.  Why do you need it?  The old call stmt has gimple_vdef and
> gimple_vuse, so just copy those over, see how e.g.
> replace_call_with_call_and_fold in gimple-fold.c does that.
> If you don't add strlen, you need to move the vdef/vuse from stmt to
> memcmp_call, if you really want to add strlen (see above note though),
> then that call should have a vuse added (same vuse as the stmt originally
> had).

Hi,
Thanks for the suggestions. In the attached patch, I dropped the transform
if strlen (t) is unknown.
Since strstr is marked pure, IIUC call_stmt for strstr shouldn't
have a vdef associated with it ? (gimple_vdef for call_stmt returned NULL
for the test-cases I tried it with). Moving gimple_vuse from call_stmt to
memcmp_call worked for me.
Does the patch look OK ?
Bootstrapped and tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
Cross-tested on arm*-*-*, aarch64*-*-*.

Thanks,
Prathamesh
>
> Jakub

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..329bc25
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+__attribute__((no_icf))
+_Bool f1(char *s)
+{
+  return __builtin_strstr (s, "hello") == s;
+}
+
+__attribute__((no_icf))
+_Bool f2(char *s)
+{
+  return s == __builtin_strstr (s, "hello");
+}
+
+__attribute__((no_icf))
+_Bool f3(char *s)
+{
+  return s != __builtin_strstr (s, "hello");
+}
+
+/* Do not perform transform, since strlen (t)
+   is unknown.  */
+
+__attribute__((no_icf))
+_Bool f4(char *s, char *t)
+{
+  return __builtin_strstr (s, t) == s;
+}
+
+/* Do not perform transform in this case, since
+   t1 doesn't have single use.  */
+
+__attribute__((no_icf))
+_Bool f5(char *s)
+{
+  void foo(char *);
+
+  char *t1 = __builtin_strstr (s, "hello");
+  foo (t1);
+  return (t1 == s);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_memcmp" 3 "strlen" } } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 339812e..06b07b0 100644
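The fold discussed in this thread rests on a simple fact about NUL-terminated strings: `strstr (s, t) == s` holds exactly when `t` is a prefix of `s`. The sketch below checks that equivalence in plain C. It is only an illustration, not the GCC implementation: both function names are made up here, and it uses `strncmp` rather than the `memcmp` proposed in the patch, because in portable C `memcmp` could read past the terminator of an `s` that is shorter than `t`.

```c
#include <stdbool.h>
#include <string.h>

/* Reference form: strstr returns S itself only when T occurs at offset 0.
   (Hypothetical helper name, for illustration only.)  */
static bool
starts_with_strstr (const char *s, const char *t)
{
  return strstr (s, t) == s;
}

/* Folded form: compare the first strlen (T) characters directly.
   strncmp stops at the NUL of a short S, so no out-of-bounds read occurs.
   (Hypothetical helper name, for illustration only.)  */
static bool
starts_with_cmp (const char *s, const char *t)
{
  return strncmp (s, t, strlen (t)) == 0;
}
```

The folded form never scans beyond the first `strlen (t)` characters of `s`, whereas `strstr` may walk the whole haystack before reporting a match that is then rejected by the pointer comparison.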
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 9 December 2016 at 17:59, Jakub Jelinek wrote:
> On Fri, Dec 09, 2016 at 05:36:41PM +0530, Prathamesh Kulkarni wrote:
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -2302,7 +2302,81 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
>>            else if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
>>              handle_pointer_plus (gsi);
>>          }
>> -      else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>> +
>> +      /* Fold strstr (s, t) == s to memcmp (s, t, strlen (t)) == 0.
>> +         if strlen (t) is known and var holding return value of strstr
>> +         has single use.  */
>> +
>> +      else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
>> +        {
>> +          enum tree_code code = gimple_assign_rhs_code (stmt);
>> +          if (code == EQ_EXPR || code == NE_EXPR)
>
> This way you handle _8 = _5 == _7;, but not if (_5 == _7) bar ();.  Shouldn't you
> also handle GIMPLE_COND similarly (of course, the rhs1 and rhs2 grabbing
> and code grabbing is different for GIMPLE_COND).  But the rest should be
> the same, except again that you don't want to replace the GIMPLE_COND, but
> adjust it.  Maybe also COND_EXPR in gimple_assign (_9 = _5 == _7 ? _10 : _11;).
>
>> +            {
>> +              tree rhs1 = gimple_assign_rhs1 (stmt);
>> +              tree rhs2 = gimple_assign_rhs2 (stmt);
>> +              if (TREE_CODE (rhs1) == SSA_NAME
>> +                  && TREE_CODE (rhs2) == SSA_NAME)
>> +                {
>> +                  gcall *call_stmt = dyn_cast<gcall *> (SSA_NAME_DEF_STMT (rhs1));
>> +                  if (!call_stmt)
>> +                    {
>> +                      call_stmt = dyn_cast<gcall *> (SSA_NAME_DEF_STMT (rhs2));
>> +                      tree tmp = rhs1;
>> +                      rhs1 = rhs2;
>> +                      rhs2 = tmp;
>
> We use std::swap (rhs1, rhs2); in this case these days.
>
>> +                    }
>> +
>> +                  tree call_lhs;
>> +                  if (call_stmt
>> +                      && gimple_call_builtin_p (call_stmt, BUILT_IN_STRSTR)
>> +                      && (call_lhs = gimple_call_lhs (call_stmt))
>> +                      && has_single_use (call_lhs))
>
> This might not optimize if you have:
>   _5 = foo ();
>   _7 = __builtin_strstr (_5, "abcd");
>   _8 = _5 == _7;
>
> Or even you could have:
>   _5 = __builtin_strstr (...);
>   _7 = __builtin_strstr (_5, "abcd");
>   _8 = _5 == _7;
>
> So I wonder if you shouldn't do:
>   gimple *call_stmt = NULL;
>   for (int pass = 0; pass < 2; pass++)
>     {
>       gimple *g = SSA_NAME_DEF_STMT (rhs1);
>       if (gimple_call_builtin_p (g, BUILT_IN_STRSTR)
>           && gimple_call_lhs (g) == rhs1
>           && has_single_use (rhs1)
>           && gimple_call_arg (g, 0) == rhs2)
>         {
>           call_stmt = g;
>           break;
>         }
>       std::swap (rhs1, rhs2);
>     }
>   if (call_stmt)
>     ...
>
> I think you don't need operand_equal_p, because SSA_NAMEs should just
> be the same pointer if they are the same thing.
> The above way you handle both orderings.  Perhaps also it is big enough to
> be done in a separate function, which you call with the code/rhs1/rhs2 and
> stmt for the EQ/NE_EXPR is_gimple_assign as well as for COND_EXPR and
> GIMPLE_COND.

Hi Jakub,
Thanks for the suggestions. It didn't occur to me to check for gimple_cond.
I have tried to do the changes in the attached version.
I am not sure if I have handled cond_expr correctly.
IIUC, if gimple_assign has code cond_expr, then the condition is
stored in gimple_assign_rhs1,
however it's not a single operand but a tree of the form "op1 cond_code op2".
Is that correct ?

However I am not able to write a test-case that generates cond_expr in the IL.
I tried:
t1 = strstr (s, t);
(t1 == s) ? foo() : bar ();
and other such variants but it seems the ?: operator is getting
lowered to gimple_cond instead.

Bootstrap+tested on x86_64-unknown-linux-gnu
and cross-tested on arm*-*-*, aarch64*-*-*.
Does it look OK ?

Thanks,
Prathamesh
>
> Jakub

2016-12-13  Jakub Jelinek
            Prathamesh Kulkarni

        * tree-ssa-strlen.c (fold_strstr_to_memcmp): New function.
        (strlen_optimize_stmt): Call fold_strstr_to_memcmp.

testsuite/
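The two-pass check Jakub suggests above (test the operands in one order, swap, test again) can be modeled outside GCC. The following is a toy sketch under stated assumptions: `struct toy_name` and `find_strstr_result` are hypothetical stand-ins invented for this illustration and are not GCC's SSA data structures or API. It only shows why a single loop with a swap covers both `strstr (s, t) == s` and `s == strstr (s, t)`.

```c
#include <stddef.h>

/* Toy stand-in for an SSA name.  None of these fields exist in GCC;
   they only carry what the check below needs.  */
struct toy_name
{
  int defined_by_strstr;            /* nonzero if defined by a strstr call */
  const struct toy_name *haystack;  /* first argument of that call, if any */
  int num_uses;                     /* number of statements using the name */
};

/* Return whichever operand is a single-use strstr result whose haystack
   is the other operand, or NULL if neither ordering qualifies.  */
static const struct toy_name *
find_strstr_result (const struct toy_name *rhs1, const struct toy_name *rhs2)
{
  for (int pass = 0; pass < 2; pass++)
    {
      if (rhs1->defined_by_strstr
          && rhs1->num_uses == 1
          && rhs1->haystack == rhs2)
        return rhs1;

      /* Swap the operands and try the other ordering.  */
      const struct toy_name *tmp = rhs1;
      rhs1 = rhs2;
      rhs2 = tmp;
    }
  return NULL;
}
```

Comparing `haystack` with `==` mirrors the remark that SSA names denoting the same value are the same pointer, so no deeper equality test is needed in this model.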
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 13 December 2016 at 15:27, Jakub Jelinek wrote:
> On Tue, Dec 13, 2016 at 03:08:17PM +0530, Prathamesh Kulkarni wrote:
>> Thanks for the suggestions. It didn't occur to me to check for gimple_cond.
>> I have tried to do the changes in the attached version.
>> I am not sure if I have handled cond_expr correctly.
>> IIUC, if gimple_assign has code cond_expr, then the condition is
>> stored in gimple_assign_rhs1,
>> however it's not a single operand but a tree of the form "op1 cond_code op2".
>> Is that correct ?
>
> Yes.  gimple_assign_rhs1 will be in what you are looking for EQ_EXPR or
> NE_EXPR tree, its TREE_CODE will be this code you want to check, and
> TREE_OPERAND (exp, 0) and TREE_OPERAND (exp, 1) the rhs1 and rhs2 you use
> elsewhere.
>
>> However I am not able to write a test-case that generates cond_expr in the IL.
>> I tried:
>> t1 = strstr (s, t);
>> (t1 == s) ? foo() : bar ();
>> and other such variants but it seems the ?: operator is getting
>> lowered to gimple_cond instead.
>
> It is, but in some cases tree-if-conv.c turns them back into COND_EXPRs.
> I guess you need -ftree-loop-if-convert now, and it has to be in some loop
> where the addition of cond_expr would likely turn it into a single bb loop.
> You probably want constants or vars, not function calls in the ? :
> expressions though.
>
>> +/* Try to fold strstr (s, t) == s to memcmp (s, t, strlen (t)) == 0.  */
>> +
>> +static void
>> +fold_strstr_to_memcmp(enum tree_code code, tree rhs1, tree rhs2, gimple *stmt)
>
> Formatting, space before (.
>
>> +{
>> +  gimple *call_stmt = NULL;
>> +  for (int pass = 0; pass < 2; pass++)
>> +    {
>> +      gimple *g = SSA_NAME_DEF_STMT (rhs1);
>> +      if (g
>
> I think g should be always non-NULL (except for invalid IL), so probably no
> need to check it.

Ah indeed, thanks for pointing out. I assumed if ssa-var has default
definition, then SSA_NAME_DEF_STMT would be NULL, but it's GIMPLE_NOP.
>
>> +          && gimple_call_builtin_p (g, BUILT_IN_STRSTR)
>> +          && has_single_use (rhs1)
>> +          && gimple_call_arg (as_a<gcall *> (g), 0) == rhs2)
>
> I think gimple_call_arg works fine even with just gimple * argument.
> So you can avoid the as_a<gcall *> (g) uglification and just use g.
>
>> +      if (is_gimple_assign (stmt))
>> +        {
>> +          if (gimple_assign_rhs_code (stmt) == COND_EXPR)
>> +            {
>> +              tree cond = gimple_assign_rhs1 (stmt);
>> +              TREE_SET_CODE (cond, EQ_EXPR);
>
> This looks weird.  You are hardcoding EQ_EXPR, while for the
> other case below you use code.  So, do you handle properly both
> EQ_EXPR and NE_EXPR for this and gimple_cond cases?
> Also, for non-COND_EXPR assign you build a new stmt instead of reusing
> the existing one, why?
>
>> +              TREE_OPERAND (cond, 0) = memcmp_lhs;
>> +              TREE_OPERAND (cond, 1) = zero;
>> +              update_stmt (stmt);
>> +            }
>> +          else
>> +            {
>> +              gsi = gsi_for_stmt (stmt);
>> +              tree lhs = gimple_assign_lhs (stmt);
>> +              gassign *ga = gimple_build_assign (lhs, code, memcmp_lhs,
>> +                                                 zero);
>> +              gsi_replace (&gsi, ga, false);
>> +            }
>> +        }
>> +      else
>> +        {
>> +          gcond *cond = as_a<gcond *> (stmt);
>> +          gimple_cond_set_lhs (cond, memcmp_lhs);
>> +          gimple_cond_set_rhs (cond, zero);
>> +          gimple_cond_set_code (cond, EQ_EXPR);
>
> Likewise here.

Oops, sorry about that :/
Does this version look OK ?
Bootstrap+test in progress.

Thanks,
Prathamesh
>
> Jakub

2016-12-13  Jakub Jelinek
            Prathamesh Kulkarni

        * tree-ssa-strlen.c (fold_strstr_to_memcmp): New function.
        (strlen_optimize_stmt): Call fold_strstr_to_memcmp.

testsuite/
        * gcc.dg/strlenopt-30.c: New test-case.
diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..089b3a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+__attribute__((no_icf))
+_Bool f1(char *s)
+{
+  return __builtin_strstr (s, "hello") == s;
+}
+
+__attribute__((no_icf))
+_Bool f2(char *s)
+{
+  return s == __builtin_strstr (s, "hello");
Re: Fold strstr (s, t) eq/ne s to strcmp (s, t) eq/ne 0 if strlen (t) is known
On 13 December 2016 at 17:54, Jakub Jelinek wrote:
> On Tue, Dec 13, 2016 at 05:41:09PM +0530, Prathamesh Kulkarni wrote:
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -,6 +,90 @@ handle_char_store (gimple_stmt_iterator *gsi)
>>    return true;
>>  }
>>
>> +/* Try to fold strstr (s, t) eq/ne s to memcmp (s, t, strlen (t)) eq/ne 0.  */
>> +
>> +static void
>> +fold_strstr_to_memcmp (enum tree_code code, tree rhs1, tree rhs2, gimple *stmt)
>
> You can drop code argument here, see below.  And I'd say it is better to
> do the
>   if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != SSA_NAME)
>     return;
> here than repeat it in all the callers.
>
>> +          if (gimple_assign_rhs_code (stmt) == COND_EXPR)
>> +            {
>> +              tree cond = gimple_assign_rhs1 (stmt);
>> +              TREE_SET_CODE (cond, code);
>
> TREE_CODE (cond) is already code, so no need to set it again.
>
>> +          gcond *cond = as_a<gcond *> (stmt);
>> +          gimple_cond_set_lhs (cond, memcmp_lhs);
>> +          gimple_cond_set_rhs (cond, zero);
>> +          gimple_cond_set_code (cond, code);
>
> And gimple_cond_code (cond) == code here too.
>
>> +          update_stmt (cond);
>> +        }
>
> You can perhaps move the update_stmt (stmt); to a single spot after
> all the 3 cases are handled.
>
>> +          if (cond_code == EQ_EXPR || cond_code == NE_EXPR)
>> +            {
>> +              tree rhs1 = TREE_OPERAND (cond, 0);
>> +              tree rhs2 = TREE_OPERAND (cond, 1);
>
> While it is necessary to check cond_code here and in the other spots
> similarly, because otherwise you don't know if it has 2 arguments etc.,
> you can avoid the SSA_NAME tests here.
>
>> +              if (TREE_CODE (rhs1) == SSA_NAME
>> +                  && TREE_CODE (rhs2) == SSA_NAME)
>> +                fold_strstr_to_memcmp (cond_code, rhs1, rhs2, stmt);
>> +            }
>> +        }
>> +      else if (code == EQ_EXPR || code == NE_EXPR)
>> +        {
>> +          tree rhs1 = gimple_assign_rhs1 (stmt);
>> +          tree rhs2 = gimple_assign_rhs2 (stmt);
>> +
>> +          if (TREE_CODE (rhs1) == SSA_NAME
>> +              && TREE_CODE (rhs2) == SSA_NAME)
>
> And here.
>> +            fold_strstr_to_memcmp (code, rhs1, rhs2, stmt);
>> +        }
>> +    }
>> +  else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>>      {
>>        tree type = TREE_TYPE (lhs);
>>        if (TREE_CODE (type) == ARRAY_TYPE)
>> @@ -2316,6 +2427,17 @@ strlen_optimize_stmt (gimple_stmt_iterator *gsi)
>>          }
>>      }
>>    }
>> +  else if (gcond *cond = dyn_cast<gcond *> (stmt))
>> +    {
>> +      enum tree_code code = gimple_cond_code (cond);
>> +      tree lhs = gimple_cond_lhs (stmt);
>> +      tree rhs = gimple_cond_rhs (stmt);
>> +
>> +      if ((code == EQ_EXPR || code == NE_EXPR)
>> +          && TREE_CODE (lhs) == SSA_NAME
>> +          && TREE_CODE (rhs) == SSA_NAME)
>
> And here.
>> +        fold_strstr_to_memcmp (code, lhs, rhs, stmt);
>> +    }
>>
>>    if (gimple_vdef (stmt))
>>      maybe_invalidate (stmt);
>
> Otherwise LGTM, but it would be nice to cover also the COND_EXPR case by a
> testcase (can be done incrementally).

Hi Jakub,
Done the changes in attached version.
Bootstrap+tested on x86_64-unknown-linux-gnu with default languages
and cross-tested on arm*-*-*, aarch64*-*-* with c,c++,fortran.
Is it OK to commit ?
I am trying to come up with COND_EXPR test-case.

Thanks,
Prathamesh
>
> Jakub

2016-12-14  Jakub Jelinek
            Prathamesh Kulkarni

        * tree-ssa-strlen.c (fold_strstr_to_memcmp): New function.
        (strlen_optimize_stmt): Call fold_strstr_to_memcmp.

testsuite/
        * gcc.dg/strlenopt-30.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/strlenopt-30.c b/gcc/testsuite/gcc.dg/strlenopt-30.c
new file mode 100644
index 000..089b3a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-30.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+__attribute__((no_icf))
+_Bool f1(char *s)
+{
+  return __builtin_strstr (s, "hello") == s;
+}
+
+__attribute__((no_icf))
+_Bool f2(char *s)
+{
+  return s == __builtin_strstr (s, "hello");
+}
+
+__attribute__((no_icf))
+_Bool f3(char *s)
+{
+  return s != __builtin_strstr (s, "hello");
+}
+
+__attribute__((no_icf))
+_Bool f4()
+{
+  char *foo_f4(void);
+  char *t1 = fo
[gimplefe] reject invalid pass name in startwith
Hi Richard,
The attached patch attempts to reject an invalid pass name in startwith.
I verified that the gimplefe tests pass with the patch (not sure if bootstrap is required?)
Does it look OK ?

Thanks,
Prathamesh

2016-12-18  Prathamesh Kulkarni

c/
        * gimple-parser.c (c_parser_gimple_pass_list): Reject invalid pass
        name.

testsuite/
        * gcc.dg/gimplefe-19.c: New test-case.

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index ddecaec..ec1dbb3 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -1046,6 +1046,17 @@ c_parser_gimple_pass_list (c_parser *parser)
   if (! c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
     return NULL;
 
+  if (pass)
+    {
+      char *full_passname = (char *) xmalloc (strlen ("tree-") + strlen (pass) + 1);
+      strcpy (full_passname, "tree-");
+      strcat (full_passname, pass);
+      opt_pass *p = g->get_passes ()->get_pass_by_name (full_passname);
+      if (!p || p->type != GIMPLE_PASS)
+        error_at (c_parser_peek_token (parser)->location,
+                  "%s is not a valid GIMPLE pass\n", pass);
+      free (full_passname);
+    }
   return pass;
 }
 
diff --git a/gcc/testsuite/gcc.dg/gimplefe-19.c b/gcc/testsuite/gcc.dg/gimplefe-19.c
new file mode 100644
index 000..bb5be33
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-19.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple" } */
+
+void __GIMPLE (startwith ("combine")) foo () /* { dg-error "not a valid GIMPLE pass" } */
+{
+  return;
+}
Re: [gimplefe] reject invalid pass name in startwith
On 18 December 2016 at 18:02, Jakub Jelinek wrote:
> On Sun, Dec 18, 2016 at 05:41:23PM +0530, Prathamesh Kulkarni wrote:
>> --- a/gcc/c/gimple-parser.c
>> +++ b/gcc/c/gimple-parser.c
>> @@ -1046,6 +1046,17 @@ c_parser_gimple_pass_list (c_parser *parser)
>>    if (! c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
>>      return NULL;
>>
>> +  if (pass)
>> +    {
>> +      char *full_passname = (char *) xmalloc (strlen ("tree-") + strlen (pass) + 1);
>> +      strcpy (full_passname, "tree-");
>> +      strcat (full_passname, pass);
>
> Use
>   char *full_passname = concat ("tree-", pass, NULL);
> instead?

Thanks! Modified the patch to use concat().

Regards,
Prathamesh
>
> Jakub

2016-12-18  Prathamesh Kulkarni

c/
        * gimple-parser.c (c_parser_gimple_pass_list): Reject invalid pass
        name.

testsuite/
        * gcc.dg/gimplefe-19.c: New test-case.

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index ddecaec..68d2d74 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -1046,6 +1046,15 @@ c_parser_gimple_pass_list (c_parser *parser)
   if (! c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
     return NULL;
 
+  if (pass)
+    {
+      char *full_passname = concat ("tree-", pass, NULL);
+      opt_pass *p = g->get_passes ()->get_pass_by_name (full_passname);
+      if (!p || p->type != GIMPLE_PASS)
+        error_at (c_parser_peek_token (parser)->location,
+                  "%s is not a valid GIMPLE pass\n", pass);
+      free (full_passname);
+    }
   return pass;
 }
 
diff --git a/gcc/testsuite/gcc.dg/gimplefe-19.c b/gcc/testsuite/gcc.dg/gimplefe-19.c
new file mode 100644
index 000..bb5be33
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-19.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple" } */
+
+void __GIMPLE (startwith ("combine")) foo () /* { dg-error "not a valid GIMPLE pass" } */
+{
+  return;
+}
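The check in this patch has a simple shape: build the prefixed name, look it up, free the temporary. A standalone sketch of that shape follows, assuming nothing about GCC's pass manager. `prefixed_name` is a hypothetical two-argument stand-in for the libiberty `concat` call used in the patch, and the table of "known" names in `is_known_pass` is invented purely for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated PREFIX + NAME, or NULL on allocation failure.
   Hypothetical helper; the patch itself uses concat ("tree-", pass, NULL).  */
static char *
prefixed_name (const char *prefix, const char *name)
{
  size_t plen = strlen (prefix);
  size_t nlen = strlen (name);
  char *out = malloc (plen + nlen + 1);
  if (!out)
    return NULL;
  memcpy (out, prefix, plen);
  memcpy (out + plen, name, nlen + 1);  /* also copies the trailing NUL */
  return out;
}

/* Return nonzero if "tree-" + SHORT_NAME is in a made-up table of names.
   The table entries are placeholders, not real GCC pass names.  */
static int
is_known_pass (const char *short_name)
{
  static const char *const known[] = { "tree-alpha", "tree-beta" };
  char *full = prefixed_name ("tree-", short_name);
  int found = 0;

  if (!full)
    return 0;
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (strcmp (known[i], full) == 0)
      found = 1;
  free (full);
  return found;
}
```

Building the string in one helper removes the hand-computed `strlen` sum from the caller, which is the point of the `concat` suggestion.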
Re: increase alignment of global structs in increase_alignment pass
On 19 May 2016 at 13:19, Richard Biener wrote:
> On Thu, 19 May 2016, Prathamesh Kulkarni wrote:
>
>> On 18 May 2016 at 19:38, Richard Biener wrote:
>> > On Wed, 18 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 17 May 2016 at 18:36, Richard Biener wrote:
>> >> > On Wed, 11 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 6 May 2016 at 17:20, Richard Biener wrote:
>> >> >> >
>> >> >> > You can't simply use
>> >> >> >
>> >> >> > +      offset = int_byte_position (field);
>> >> >> >
>> >> >> > as it can be placed at variable offset which will make int_byte_position
>> >> >> > ICE.  Note it also returns a truncated byte position (bit position
>> >> >> > stripped) which may be undesirable here.  I think you want to use
>> >> >> > bit_position and if and only if DECL_FIELD_OFFSET and
>> >> >> > DECL_FIELD_BIT_OFFSET are INTEGER_CST.
>> >> >> oops, I didn't realize offsets could be variable.
>> >> >> Will that be the case only for VLA member inside struct ?
>> >> >
>> >> > And non-VLA members after such member.
>> >> >
>> >> >> > Your observation about the expensiveness of the walk still stands I guess
>> >> >> > and eventually you should at least cache the
>> >> >> > get_vec_alignment_for_record_decl cases.  Please make those workers
>> >> >> > _type rather than _decl helpers.
>> >> >> Done
>> >> >> >
>> >> >> > You seem to simply get at the maximum vectorized field/array element
>> >> >> > alignment possible for all arrays - you could restrict that to
>> >> >> > arrays with at least vector size (as followup).
>> >> >> Um sorry, I didn't understand this part.
>> >> >
>> >> > It doesn't make sense to align
>> >> >
>> >> > struct { int a; int b; int c; int d; float b[3]; int e; };
>> >> >
>> >> > because we have a float[3] member.  There is no vector size that
>> >> > would cover the float[3] array.
>> >> Thanks for the explanation.
>> >> So we do not want to align struct if sizeof (array_field) < sizeof (vector_type).
>> >> This issue is also present without patch for global arrays, so I modified
>> >> get_vec_alignment_for_array_type, to return 0 if sizeof (array_type) <
>> >> sizeof (vectype).
>> >> >
>> >> >> >
>> >> >> > +  /* Skip artificial decls like typeinfo decls or if
>> >> >> > +     record is packed.  */
>> >> >> > +  if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type))
>> >> >> > +    return 0;
>> >> >> >
>> >> >> > I think we should honor DECL_USER_ALIGN as well and not mess with those
>> >> >> > decls.
>> >> >> Done
>> >> >> >
>> >> >> > Given the patch now does quite some extra work it might make sense
>> >> >> > to split the symtab part out of the vect_can_force_dr_alignment_p
>> >> >> > predicate and call that early.
>> >> >> In the patch I call symtab_node::can_increase_alignment_p early.  I tried
>> >> >> moving that to its callers - vect_compute_data_ref_alignment and
>> >> >> increase_alignment::execute, however that failed some tests in vect, and
>> >> >> hence I didn't add the following hunk in the patch.  Did I miss some
>> >> >> check ?
>> >> >
>> >> > Not sure.
>> >> >
>> >> >> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> >> index 7652e21..2c1acee 100644
>> >> >> --- a/gcc/tree-vect-data-refs.c
>> >> >> +++ b/gcc/tree-vect-data-refs.c
>> >> >> @@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
>> >> >>        && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR)
>> >> >>      base = TREE_O
RFC [1/2] divmod transform
Hi,
I have updated my patch for divmod (attached), which was originally
based on Kugan's patch.
The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
having the same operands to a divmod representation, so we can CSE the
computation of mod.

t1 = a TRUNC_DIV_EXPR b;
t2 = a TRUNC_MOD_EXPR b
is transformed to:
complex_tmp = DIVMOD (a, b);
t1 = REALPART_EXPR (complex_tmp);
t2 = IMAGPART_EXPR (complex_tmp);

* New hook expand_divmod_libfunc
  The rationale for introducing the hook is that different targets have
  incompatible calling conventions for divmod libfunc.
  Currently three ports define divmod libfunc: c6x, spu and arm.
  c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
  return quotient and store remainder in argument passed as pointer,
  while the arm version takes two arguments and returns both
  quotient and remainder having mode double the size of the operand mode.
  The port should hence override the hook expand_divmod_libfunc
  to generate call to target-specific divmod.
  Ports should define this hook if:
  a) The port does not have divmod or div insn for the given mode.
  b) The port defines divmod libfunc for the given mode.
  The default hook default_expand_divmod_libfunc() generates call
  to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
  are of DImode.

Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
cross-tested on arm*-*-*.
Bootstrap+test in progress on arm-linux-gnueabihf.
Does this patch look OK ?

Thanks,
Prathamesh

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+                                   &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+                       make_tree (TREE_TYPE (arg0), quotient),
+                       make_tree (TREE_TYPE (arg1), remainder)),
+               target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
    the "type0" and "type1" fields of FN's direct_internal_fn_info
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e729d85..56a80f1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -194,6 +194,9 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AN
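At the source level, the idea behind the transform (compute quotient and remainder of the same operands with one operation instead of two) has a familiar analogue in the C standard library's `div`. The sketch below is only that analogue, not what the patch emits: `struct quot_rem`, `divmod_separate` and `divmod_combined` are names invented for this illustration.

```c
#include <stdlib.h>

/* Pair of results, playing the role of the complex temporary in the
   DIVMOD representation.  Hypothetical type, for illustration only.  */
struct quot_rem
{
  int quot;
  int rem;
};

/* Two separate operations on the same operands, as in
   t1 = a / b; t2 = a % b.  */
static struct quot_rem
divmod_separate (int a, int b)
{
  struct quot_rem r = { a / b, a % b };
  return r;
}

/* One combined operation yielding both values, using the standard
   div function from <stdlib.h>.  */
static struct quot_rem
divmod_combined (int a, int b)
{
  div_t d = div (a, b);
  struct quot_rem r = { d.quot, d.rem };
  return r;
}
```

Both helpers truncate toward zero, so they agree for negative operands as well, which is the TRUNC_DIV_EXPR/TRUNC_MOD_EXPR behaviour the patch pairs up.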
[RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
Hi,
This patch overrides expand_divmod_libfunc for the ARM port and adds test-cases.
I separated the SImode tests into a separate file from the DImode tests
because certain arm configs (cortex-15) have hardware div insn for
SImode but not for DImode,
and for that config we want SImode tests to be disabled but not DImode tests.
The patch therefore has two effective-target checks: divmod and divmod_simode.
Cross-tested on arm*-*-*.
Bootstrap+test on arm-linux-gnueabihf in progress.
Does this patch look OK ?

Thanks,
Prathamesh

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 12060ba..1310006 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -61,6 +61,7 @@
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "optabs-libfuncs.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -300,6 +301,7 @@ static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
 static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
 static bool arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT,
                                      const_tree);
+static void arm_expand_divmod_libfunc (bool, machine_mode, rtx, rtx, rtx *, rtx *);
 
 
 /* Table of machine attributes.  */
@@ -730,6 +732,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_SCHED_FUSION_PRIORITY
 #define TARGET_SCHED_FUSION_PRIORITY arm_sched_fusion_priority
 
+#undef TARGET_EXPAND_DIVMOD_LIBFUNC
+#define TARGET_EXPAND_DIVMOD_LIBFUNC arm_expand_divmod_libfunc
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Obstack for minipool constant handling.  */
@@ -30354,6 +30359,37 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
   return;
 }
 
+/* Expand call to __aeabi_[mode]divmod (op0, op1).  */
+
+static void
+arm_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
+                           rtx op0, rtx op1,
+                           rtx *quot_p, rtx *rem_p)
+{
+  if (mode == SImode)
+    gcc_assert (!TARGET_IDIV);
+
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+  rtx libfunc = optab_libfunc (tab, mode);
+  gcc_assert (libfunc);
+
+  machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode),
+                                                     MODE_INT);
+
+  rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                        libval_mode, 2,
+                                        op0, GET_MODE (op0),
+                                        op1, GET_MODE (op1));
+
+  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, GET_MODE_SIZE (mode));
+
+  gcc_assert (quotient);
+  gcc_assert (remainder);
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
 
 /* Construct and return a PARALLEL RTX vector with elements numbering the
    lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
diff --git a/gcc/testsuite/gcc.dg/divmod-1-simode.c b/gcc/testsuite/gcc.dg/divmod-1-simode.c
new file mode 100644
index 000..7405f66
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-1-simode.c
@@ -0,0 +1,22 @@
+/* { dg-require-effective-target divmod_simode } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div dominates mod.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no) \
+bigtype f_##no(smalltype x, bigtype y) \
+{ \
+  bigtype q = x / y; \
+  if (cond) \
+    foo (); \
+  bigtype r = x % y; \
+  return q + r; \
+}
+
+FOO(int, int, 1)
+FOO(int, unsigned, 2)
+FOO(unsigned, unsigned, 5)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 3 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-1.c b/gcc/testsuite/gcc.dg/divmod-1.c
new file mode 100644
index 000..40aec74
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/divmod-1.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target divmod } */
+/* { dg-options "-O2 -fdump-tree-widening_mul-details" } */
+/* div dominates mod.  */
+
+extern int cond;
+void foo(void);
+
+#define FOO(smalltype, bigtype, no) \
+bigtype f_##no(smalltype x, bigtype y) \
+{ \
+  bigtype q = x / y; \
+  if (cond) \
+    foo (); \
+  bigtype r = x % y; \
+  return q + r; \
+}
+
+FOO(int, long long, 3)
+FOO(int, unsigned long long, 4)
+FOO(unsigned, long long, 6)
+FOO(unsigned, unsigned long long, 7)
+FOO(long long, long long, 8)
+FOO(long long, unsigned long long, 9)
+FOO(unsigned long long, unsigned long long, 10)
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 7 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/divmod-2-simode.c b/gcc/testsuite/gcc.dg/divmod-2-simod
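The two library calling conventions contrasted in this RFC (remainder stored through a pointer, versus quotient and remainder returned together in a value twice the operand width) can be mimicked in portable C. This is a sketch under stated assumptions, not the real libgcc or ARM runtime routines: all four function names are hypothetical, and placing the quotient in the low half of the packed value is a choice made for the example only.

```c
#include <stdint.h>

/* Pointer convention: return the quotient and store the remainder
   through REM.  Hypothetical helper name.  */
static uint32_t
udivmod_ptr (uint32_t a, uint32_t b, uint32_t *rem)
{
  *rem = a % b;
  return a / b;
}

/* Packed convention: return both results in one double-width value.
   In this sketch the quotient occupies the low 32 bits and the
   remainder the high 32 bits (an assumption of the example).  */
static uint64_t
udivmod_packed (uint32_t a, uint32_t b)
{
  return ((uint64_t) (a % b) << 32) | (uint64_t) (a / b);
}

/* Extract the two halves of a packed result.  */
static uint32_t
packed_quot (uint64_t v)
{
  return (uint32_t) v;
}

static uint32_t
packed_rem (uint64_t v)
{
  return (uint32_t) (v >> 32);
}
```

A caller written against one convention cannot use a routine that follows the other without an adapter, which is the incompatibility the per-target hook is meant to absorb.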
Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
On 5 February 2016 at 18:40, Prathamesh Kulkarni wrote: > On 4 February 2016 at 16:31, Ramana Radhakrishnan > wrote: >> On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni >> wrote: >>> On 31 July 2015 at 15:04, Ramana Radhakrishnan >>> wrote: >>>> >>>> >>>> On 29/07/15 11:09, Prathamesh Kulkarni wrote: >>>>> Hi, >>>>> This patch tries to implement division with multiplication by >>>>> reciprocal using vrecpe/vrecps >>>>> with -funsafe-math-optimizations and -freciprocal-math enabled. >>>>> Tested on arm-none-linux-gnueabihf using qemu. >>>>> OK for trunk ? >>>>> >>>>> Thank you, >>>>> Prathamesh >>>>> >>>> >>>> I've tried this in the past and never been convinced that 2 iterations are >>>> enough to get to stability with this given that the results are only >>>> precise for 8 bits / iteration. Thus I've always believed you need 3 >>>> iterations rather than 2 at which point I've never been sure that it's >>>> worth it. So the testing that you've done with this currently is not >>>> enough for this to go into the tree. >>>> >>>> I'd like this to be tested on a couple of different AArch32 >>>> implementations with a wider range of inputs to verify that the results >>>> are acceptable as well as running something like SPEC2k(6) with atleast >>>> one iteration to ensure correctness. >>> Hi, >>> I got results of SPEC2k6 fp benchmarks: >>> a15: +0.64% overall, 481.wrf: +6.46% >>> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76% >>> a57: +0.35% overall, 481.wrf: +3.84% >>> The other benchmarks had (almost) identical results. >> >> Thanks for the benchmarking results - Please repost the patch with >> the changes that I had requested in my previous review - given it is >> now stage4 , I would rather queue changes like this for stage1 now. > Hi, > Please find the updated patch attached. > It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and > arm-none-eabi. 
> However the test-case added in the patch (neon-vect-div-1.c) fails to
> get vectorized at -O2
> for armeb-none-linux-gnueabihf.
> Charles suggested me to try with -O3, which worked.
> It appears the test-case fails to get vectorized with
> -fvect-cost-model=cheap (which is default enabled at -O2)
> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic
>
> I can't figure out why it fails -fvect-cost-model=cheap.
> From the vect dump (attached):
> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9

Hi,
I think I have some idea why the test-cases attached with the patch fail
to get vectorized on armeb with -O2.

Issue with big endian vectorizer:
The patch does not cause regressions on the big-endian vectorizer, but the
test-cases attached with the patch fail to vectorize there, while they
get vectorized on little-endian.
On armeb, vectorization fails with the following message in the dump:
note: not vectorized: unsupported unaligned load.*_9

The behavior of the big- and little-endian vectorizer seems to differ
in arm_builtin_support_vector_misalignment(), which overrides the hook
targetm.vectorize.support_vector_misalignment().

targetm.vectorize.support_vector_misalignment is called by
vect_supportable_dr_alignment (), which in turn is called by
verify_data_refs_alignment ().

Execution up to the following condition is common between arm and armeb
in vect_supportable_dr_alignment():

if ((TYPE_USER_ALIGN (type) && !is_packed)
    || targetm.vectorize.support_vector_misalignment (mode, type,
                                                      DR_MISALIGNMENT (dr),
                                                      is_packed))
  /* Can't software pipeline the loads, but can at least do them.  */
  return dr_unaligned_supported;

For the little-endian case:
arm_builtin_support_vector_misalignment() is called with
V2SF mode and misalignment == -1, and the following condition
becomes true:

/* If the misalignment is unknown, we should be able to handle the access
   so long as it is not to a member of a packed data structure.  */
if (misalignment == -1)
  return true;

Since the hook returned true, we enter the condition above in
vect_supportable_dr_alignment() and return dr_unaligned_supported.

For big-endian:
arm_builtin_support_vector_misalignment() is called with V2SF mode.
The following condition that gates the entire function body fails:
if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
and the default hook gets called with V2SF mod
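
[Editorial sketch] The endianness-dependent path described in this message can be modeled in a few lines of plain C. This is only a simplified model, not the GCC source: `struct target_model`, `default_hook_model` and `unknown_misalignment_supported` are hypothetical names, and the assumption that the default hook rejects the access is inferred from the "not vectorized" dump line rather than taken from the hook's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target flags tested by the gate
   (TARGET_NEON, BYTES_BIG_ENDIAN, unaligned_access).  */
struct target_model {
  bool neon;
  bool big_endian;
  bool unaligned_access;
};

/* Assumption: the default hook does not accept the access in this
   scenario, which matches the "unsupported unaligned load" dump.  */
static bool default_hook_model(void)
{
  return false;
}

/* Models only the misalignment == -1 (unknown) case from the message.  */
static bool unknown_misalignment_supported(const struct target_model *t)
{
  /* Gate around the whole ARM-specific body.  */
  if (t->neon && !t->big_endian && t->unaligned_access)
    return true;                 /* the "misalignment == -1" early return */

  /* Big-endian never enters the body and falls back to the default.  */
  return default_hook_model();
}
```

With the same NEON and unaligned-access settings, only the `big_endian` flag decides the result in this model, which mirrors the arm vs. armeb difference reported above.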
Re: RFC [1/2] divmod transform
On 23 May 2016 at 17:35, Richard Biener wrote: > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni > wrote: >> Hi, >> I have updated my patch for divmod (attached), which was originally >> based on Kugan's patch. >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR >> having same operands to divmod representation, so we can cse computation of >> mod. >> >> t1 = a TRUNC_DIV_EXPR b; >> t2 = a TRUNC_MOD_EXPR b >> is transformed to: >> complex_tmp = DIVMOD (a, b); >> t1 = REALPART_EXPR (complex_tmp); >> t2 = IMAGPART_EXPR (complex_tmp); >> >> * New hook divmod_expand_libfunc >> The rationale for introducing the hook is that different targets have >> incompatible calling conventions for divmod libfunc. >> Currently three ports define divmod libfunc: c6x, spu and arm. >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4: >> return quotient and store remainder in argument passed as pointer, >> while the arm version takes two arguments and returns both >> quotient and remainder having mode double the size of the operand mode. >> The port should hence override the hook expand_divmod_libfunc >> to generate call to target-specific divmod. >> Ports should define this hook if: >> a) The port does not have divmod or div insn for the given mode. >> b) The port defines divmod libfunc for the given mode. >> The default hook default_expand_divmod_libfunc() generates call >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and >> are of DImode. >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and >> cross-tested on arm*-*-*. >> Bootstrap+test in progress on arm-linux-gnueabihf. >> Does this patch look OK ? 
>
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 6b4601b..e4a021a 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
>    return true;
>  }
>
> +void
> +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> +                               rtx op0, rtx op1,
> +                               rtx *quot_p, rtx *rem_p)
>
> functions need a comment.
>
> ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> case we could avoid the target hook.
Well, I would prefer adding the hook because that's easier ;-)
Would it be OK to go with the hook for now?
>
> +  /* If target overrides expand_divmod_libfunc hook
> +     then perform divmod by generating call to the target-specifc divmod libfunc.  */
> +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> +    return true;
> +
> +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> +  return (mode == DImode && unsignedp);
>
> I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> but still restrict this to DImode && unsigned?  Also if
> targetm.expand_divmod_libfunc
> is not the default we expect the target to handle all modes?
Ah indeed, the check for DImode is unnecessary.
However, I suppose the check for unsignedp should stay, since we want to
generate a call to __udivmoddi4 only if the operands are unsigned?
>
> That said - I expected the above piece to be simply a 'return true;' ;)
>
> Usually we use some can_expand_XXX helper in optabs.c to query if the target
> supports a specific operation (for example SImode divmod would use DImode
> divmod by means of widening operands - for the unsigned case of course).
Thanks for pointing that out. So if a target does not support a divmod
libfunc for a mode but does for a wider mode, then we could zero-extend the
operands to the wider mode, perform divmod in the wider mode, and then cast
the result back to the original mode.
I haven't done that in this patch; would it be OK to do that as a follow-up?
>
> +  /* Disable the transform if either is a constant, since division-by-constant
> +     may have specialized expansion.  */
> +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> +    return false;
>
> please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>
> +  if (TYPE_OVERFLOW_TRAPS (type))
> +    return false;
>
> why's that?  Generally please first test cheap things (trapping,
> constant-ness)
> before checking expensive stuff (target_supports_divmod_p).
I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
"When looking at TRUNC_DIV_EXPR you should also exclude the case where
TYPE_OVERFLOW_TRAPS (type) as that should expand using the [su]divv optabs
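
[Editorial sketch] Two ideas from this exchange can be illustrated at the C level: the libgcc2.c:__udivmoddi4 calling convention (quotient returned, remainder stored through a pointer) and the widening fallback for a narrower unsigned mode. The helpers `udivmod64` and `udivmod32_via_64` are hypothetical illustrations, not libgcc or GCC functions, and the actual transform works on GIMPLE and RTL rather than on source code.

```c
#include <stdint.h>

/* Hypothetical helper following the __udivmoddi4 convention described
   above: return the quotient, store the remainder through REM.  */
static uint64_t udivmod64(uint64_t a, uint64_t b, uint64_t *rem)
{
  uint64_t q = a / b;
  *rem = a - q * b;     /* remainder derived from the single division */
  return q;
}

/* Widening fallback for the unsigned case: zero-extend the operands,
   do the divmod in the wider type, then truncate both results.  */
static uint32_t udivmod32_via_64(uint32_t a, uint32_t b, uint32_t *rem)
{
  uint64_t r64;
  uint64_t q64 = udivmod64((uint64_t) a, (uint64_t) b, &r64);
  *rem = (uint32_t) r64;
  return (uint32_t) q64;
}
```

The truncation is lossless here because, for unsigned operands, neither the quotient nor the remainder can exceed the original 32-bit dividend. This is presumably why the review restricts the widening remark to the unsigned case.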
Re: RFC [1/2] divmod transform
On 24 May 2016 at 17:42, Richard Biener wrote: > On Tue, 24 May 2016, Prathamesh Kulkarni wrote: > >> On 23 May 2016 at 17:35, Richard Biener wrote: >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni >> > wrote: >> >> Hi, >> >> I have updated my patch for divmod (attached), which was originally >> >> based on Kugan's patch. >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR >> >> having same operands to divmod representation, so we can cse computation >> >> of mod. >> >> >> >> t1 = a TRUNC_DIV_EXPR b; >> >> t2 = a TRUNC_MOD_EXPR b >> >> is transformed to: >> >> complex_tmp = DIVMOD (a, b); >> >> t1 = REALPART_EXPR (complex_tmp); >> >> t2 = IMAGPART_EXPR (complex_tmp); >> >> >> >> * New hook divmod_expand_libfunc >> >> The rationale for introducing the hook is that different targets have >> >> incompatible calling conventions for divmod libfunc. >> >> Currently three ports define divmod libfunc: c6x, spu and arm. >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4: >> >> return quotient and store remainder in argument passed as pointer, >> >> while the arm version takes two arguments and returns both >> >> quotient and remainder having mode double the size of the operand mode. >> >> The port should hence override the hook expand_divmod_libfunc >> >> to generate call to target-specific divmod. >> >> Ports should define this hook if: >> >> a) The port does not have divmod or div insn for the given mode. >> >> b) The port defines divmod libfunc for the given mode. >> >> The default hook default_expand_divmod_libfunc() generates call >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and >> >> are of DImode. >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and >> >> cross-tested on arm*-*-*. >> >> Bootstrap+test in progress on arm-linux-gnueabihf. >> >> Does this patch look OK ? 
>> > >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c >> > index 6b4601b..e4a021a 100644 >> > --- a/gcc/targhooks.c >> > +++ b/gcc/targhooks.c >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, >> > machine_mode, optimization_type) >> >return true; >> > } >> > >> > +void >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode, >> > + rtx op0, rtx op1, >> > + rtx *quot_p, rtx *rem_p) >> > >> > functions need a comment. >> > >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style? In >> > that >> > case we could avoid the target hook. >> Well I would prefer adding the hook because that's more easier -;) >> Would it be ok for now to go with the hook ? >> > >> > + /* If target overrides expand_divmod_libfunc hook >> > +then perform divmod by generating call to the target-specifc >> > divmod >> > libfunc. */ >> > + if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc) >> > + return true; >> > + >> > + /* Fall back to using libgcc2.c:__udivmoddi4. */ >> > + return (mode == DImode && unsignedp); >> > >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode' >> > but still restrict this to DImode && unsigned? Also if >> > targetm.expand_divmod_libfunc >> > is not the default we expect the target to handle all modes? >> Ah indeed, the check for DImode is unnecessary. >> However I suppose the check for unsignedp should be there, >> since we want to generate call to __udivmoddi4 only if operand is unsigned ? > > The optab libfunc for sdivmod should be NULL in that case. Ah indeed, thanks. > >> > >> > That said - I expected the above piece to be simply a 'return true;' ;) >> > >> > Usually we use some can_expand_XXX helper in optabs.c to query if the >> > target >> > supports a specific operation (for example SImode divmod would use DImode >> > divmod by means of widening operands - for the unsigned case of course). >> Thanks for pointing out. 
So if a target does not support divmod >> libfunc for a mode >> but for a wider mode, then we could zero
Re: RFC [1/2] divmod transform
On 24 May 2016 at 19:39, Richard Biener wrote: > On Tue, 24 May 2016, Prathamesh Kulkarni wrote: > >> On 24 May 2016 at 17:42, Richard Biener wrote: >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote: >> > >> >> On 23 May 2016 at 17:35, Richard Biener >> >> wrote: >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni >> >> > wrote: >> >> >> Hi, >> >> >> I have updated my patch for divmod (attached), which was originally >> >> >> based on Kugan's patch. >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR >> >> >> having same operands to divmod representation, so we can cse >> >> >> computation of mod. >> >> >> >> >> >> t1 = a TRUNC_DIV_EXPR b; >> >> >> t2 = a TRUNC_MOD_EXPR b >> >> >> is transformed to: >> >> >> complex_tmp = DIVMOD (a, b); >> >> >> t1 = REALPART_EXPR (complex_tmp); >> >> >> t2 = IMAGPART_EXPR (complex_tmp); >> >> >> >> >> >> * New hook divmod_expand_libfunc >> >> >> The rationale for introducing the hook is that different targets have >> >> >> incompatible calling conventions for divmod libfunc. >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm. >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4: >> >> >> return quotient and store remainder in argument passed as pointer, >> >> >> while the arm version takes two arguments and returns both >> >> >> quotient and remainder having mode double the size of the operand mode. >> >> >> The port should hence override the hook expand_divmod_libfunc >> >> >> to generate call to target-specific divmod. >> >> >> Ports should define this hook if: >> >> >> a) The port does not have divmod or div insn for the given mode. >> >> >> b) The port defines divmod libfunc for the given mode. >> >> >> The default hook default_expand_divmod_libfunc() generates call >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and >> >> >> are of DImode. 
>> >> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and >> >> >> cross-tested on arm*-*-*. >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf. >> >> >> Does this patch look OK ? >> >> > >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c >> >> > index 6b4601b..e4a021a 100644 >> >> > --- a/gcc/targhooks.c >> >> > +++ b/gcc/targhooks.c >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, >> >> > machine_mode, optimization_type) >> >> >return true; >> >> > } >> >> > >> >> > +void >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode, >> >> > + rtx op0, rtx op1, >> >> > + rtx *quot_p, rtx *rem_p) >> >> > >> >> > functions need a comment. >> >> > >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style? >> >> > In that >> >> > case we could avoid the target hook. >> >> Well I would prefer adding the hook because that's more easier -;) >> >> Would it be ok for now to go with the hook ? >> >> > >> >> > + /* If target overrides expand_divmod_libfunc hook >> >> > +then perform divmod by generating call to the target-specifc >> >> > divmod >> >> > libfunc. */ >> >> > + if (targetm.expand_divmod_libfunc != >> >> > default_expand_divmod_libfunc) >> >> > + return true; >> >> > + >> >> > + /* Fall back to using libgcc2.c:__udivmoddi4. */ >> >> > + return (mode == DImode && unsignedp); >> >> > >> >> > I don't understand this - we know optab_libfunc returns non-NULL for >> >> > 'mode' >> >> > but still restrict this to DImode && unsigned? Also if >> >> > targetm.expand_divmod_libfunc >> >> > is not the default we expect
Re: [match.pd] Fix for PR35691
On 4 November 2016 at 13:41, Richard Biener wrote: > On Thu, 3 Nov 2016, Marc Glisse wrote: > >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >> > > > > The transform would also work for vectors (element_precision for >> > > > > the test but also a value-matching zero which should ensure the >> > > > > same number of elements). >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a >> > > > matching zero. >> > > > Could you please elaborate on that ? >> > > >> > > He may have meant something like: >> > > >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> > >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the >> > point of the pattern. >> >> Oups, that's what I had written first, and then I somehow managed to confuse >> myself enough to remove it so as to remove the call to types_match :-( >> >> > > So the last operand is checked with operand_equal_p instead of >> > > integer_zerop. But the fact that we could compute bit_ior on the >> > > comparison results should already imply that the number of elements is >> > > the >> > > same. >> > >> > Though for equality compares we also allow scalar results IIRC. >> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one >> for a testcase, at least until the GIMPLE FE lands... >> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some >> > > vector mode on s390?), it would be cool to do the same for floats (most >> > > likely at the RTL level). >> > >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab >> > just allow bit-and/or on floats as much as we allow them on pointers. >> >> Would that generate sensible code on targets that do not have logic insns for >> floats? Actually, even on x86_64 that generates inefficient code, so there >> would be some work (for instance grep finds no gen_iordf3, only >> gen_iorv2df3). >> >> I am also a bit wary of doing those obfuscating optimizations too early... 
>> a==0 is something that other optimizations might use. long >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >> (and I am assuming that signaling NaNs don't make the whole transformation >> impossible, which might be wrong) > > Yeah. I also think it's not so much important - I just wanted to mention > vectors... > > Btw, I still think we need a more sensible infrastructure for passes > to gather, analyze and modify complex conditions. (I'm always pointing > to tree-affine.c as an, albeit not very good, example for handling > a similar problem) Thanks for mentioning the value-matching capture @@, I wasn't aware of this match.pd feature. The current patch keeps it restricted to only bitwise operators on integers. Bootstrap+test running on x86_64-unknown-linux-gnu. OK to commit if passes ? Thanks, Prathamesh > > Richard. 2016-11-04 Prathamesh Kulkarni PR middle-end/35691 * match.pd: Add following two patterns: (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. testsuite/ * gcc.dg/pr35691-1.c: New test-case. * gcc.dg/pr35691-2.c: Likewise. * gcc.dg/pr35691-3.c: Likewise. * gcc.dg/pr35691-4.c: Likewise. diff --git a/gcc/match.pd b/gcc/match.pd index 48f7351..4f74942 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -519,6 +519,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (TYPE_UNSIGNED (type)) (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1) +/* PR35691: Transform + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ + +(for bitop (bit_and bit_ior) + cmp (eq ne) + (simplify + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); } + /* Fold (A & ~B) - (A & B) into (A ^ B) - B. 
*/ (simplify (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1)) diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c new file mode 100644 index 000..5211f815 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr35691-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop-details" } */ + +int foo(int z0
Re: [match.pd] Fix for PR35691
On 7 November 2016 at 15:43, Richard Biener wrote: > On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > >> On 4 November 2016 at 13:41, Richard Biener wrote: >> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> > >> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >> >> >> > > > > The transform would also work for vectors (element_precision for >> >> > > > > the test but also a value-matching zero which should ensure the >> >> > > > > same number of elements). >> >> > > > Um sorry, I didn't get how to check vectors to be of equal length >> >> > > > by a >> >> > > > matching zero. >> >> > > > Could you please elaborate on that ? >> >> > > >> >> > > He may have meant something like: >> >> > > >> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >> > >> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was >> >> > the >> >> > point of the pattern. >> >> >> >> Oups, that's what I had written first, and then I somehow managed to >> >> confuse >> >> myself enough to remove it so as to remove the call to types_match :-( >> >> >> >> > > So the last operand is checked with operand_equal_p instead of >> >> > > integer_zerop. But the fact that we could compute bit_ior on the >> >> > > comparison results should already imply that the number of elements >> >> > > is the >> >> > > same. >> >> > >> >> > Though for equality compares we also allow scalar results IIRC. >> >> >> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate >> >> one >> >> for a testcase, at least until the GIMPLE FE lands... >> >> >> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe >> >> > > some >> >> > > vector mode on s390?), it would be cool to do the same for floats >> >> > > (most >> >> > > likely at the RTL level). >> >> > >> >> > On GIMPLE view-converts could come to the rescue here as well. Or we >> >> > cab >> >> > just allow bit-and/or on floats as much as we allow them on pointers. 
>> >> >> >> Would that generate sensible code on targets that do not have logic insns >> >> for >> >> floats? Actually, even on x86_64 that generates inefficient code, so there >> >> would be some work (for instance grep finds no gen_iordf3, only >> >> gen_iorv2df3). >> >> >> >> I am also a bit wary of doing those obfuscating optimizations too early... >> >> a==0 is something that other optimizations might use. long >> >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >> >> >> (and I am assuming that signaling NaNs don't make the whole transformation >> >> impossible, which might be wrong) >> > >> > Yeah. I also think it's not so much important - I just wanted to mention >> > vectors... >> > >> > Btw, I still think we need a more sensible infrastructure for passes >> > to gather, analyze and modify complex conditions. (I'm always pointing >> > to tree-affine.c as an, albeit not very good, example for handling >> > a similar problem) >> Thanks for mentioning the value-matching capture @@, I wasn't aware of >> this match.pd feature. >> The current patch keeps it restricted to only bitwise operators on integers. >> Bootstrap+test running on x86_64-unknown-linux-gnu. >> OK to commit if passes ? > > +/* PR35691: Transform > + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. > + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > + > > Please omit the vertical space > > +(for bitop (bit_and bit_ior) > + cmp (eq ne) > + (simplify > + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) > > if you capture the first integer_zerop as @2 then you can re-use it... > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE > (@1))) > +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); > > ... here inplace of the { build_zero_cst ... }. > > Ok with th
Re: [match.pd] Fix for PR35691
On 7 November 2016 at 23:06, Prathamesh Kulkarni wrote: > On 7 November 2016 at 15:43, Richard Biener wrote: >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >>> On 4 November 2016 at 13:41, Richard Biener wrote: >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >>> > >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >>> >> >>> >> > > > > The transform would also work for vectors (element_precision for >>> >> > > > > the test but also a value-matching zero which should ensure the >>> >> > > > > same number of elements). >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length >>> >> > > > by a >>> >> > > > matching zero. >>> >> > > > Could you please elaborate on that ? >>> >> > > >>> >> > > He may have meant something like: >>> >> > > >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >>> >> > >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which >>> >> > was the >>> >> > point of the pattern. >>> >> >>> >> Oups, that's what I had written first, and then I somehow managed to >>> >> confuse >>> >> myself enough to remove it so as to remove the call to types_match :-( >>> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the >>> >> > > comparison results should already imply that the number of elements >>> >> > > is the >>> >> > > same. >>> >> > >>> >> > Though for equality compares we also allow scalar results IIRC. >>> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate >>> >> one >>> >> for a testcase, at least until the GIMPLE FE lands... >>> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe >>> >> > > some >>> >> > > vector mode on s390?), it would be cool to do the same for floats >>> >> > > (most >>> >> > > likely at the RTL level). >>> >> > >>> >> > On GIMPLE view-converts could come to the rescue here as well. 
Or we >>> >> > cab >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. >>> >> >>> >> Would that generate sensible code on targets that do not have logic >>> >> insns for >>> >> floats? Actually, even on x86_64 that generates inefficient code, so >>> >> there >>> >> would be some work (for instance grep finds no gen_iordf3, only >>> >> gen_iorv2df3). >>> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too >>> >> early... >>> >> a==0 is something that other optimizations might use. long >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >>> >> >>> >> (and I am assuming that signaling NaNs don't make the whole >>> >> transformation >>> >> impossible, which might be wrong) >>> > >>> > Yeah. I also think it's not so much important - I just wanted to mention >>> > vectors... >>> > >>> > Btw, I still think we need a more sensible infrastructure for passes >>> > to gather, analyze and modify complex conditions. (I'm always pointing >>> > to tree-affine.c as an, albeit not very good, example for handling >>> > a similar problem) >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >>> this match.pd feature. >>> The current patch keeps it restricted to only bitwise operators on integers. >>> Bootstrap+test running on x86_64-unknown-linux-gnu. >>> OK to commit if passes ? >> >> +/* PR35691: Transform >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >> + >> >> Please omit the vertical space >> >> +(for bitop (bit_and bit_ior) >> + cmp (eq ne) >> + (simplify >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >> >> if you capture the first integer_zerop as @2 then you can re-use it... >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >> (@1))) >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >> >> ... 
here inplace of the { build_zero_cst ... }.
>>
>> Ok with that changes.
> Thanks, committed the attached version as r241915.
Ugh, the svn commit message has:
testsuite/
	* gcc.dg/pr35691-1.c: New test-case.
	* gcc.dg/pr35691-4.c: Likewise.

pr35691-4.c was a typo; it should be pr35691-2.c :/
However, testsuite/ChangeLog correctly has an entry for pr35691-2.c.
Is it possible to edit the commit message for r241915?
Sorry about this.

Regards,
Prathamesh
>
>>
>> Richard.
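
[Editorial sketch] The two rewrites discussed in this thread, (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0 and (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0, can be checked at the C level for integer operands of equal precision. The function names below are hypothetical, and the match.pd pattern itself operates on GIMPLE, so this is only a source-level illustration of the identity.

```c
/* x is unsigned and y is int: same precision, different signedness,
   which is the case the pattern's (convert @1) handles.  */

static int both_zero_before(unsigned x, int y)
{
  return (x == 0) & (y == 0);        /* two comparisons and a bit-and */
}

static int both_zero_after(unsigned x, int y)
{
  return (x | (unsigned) y) == 0;    /* one bit-ior and one comparison */
}

static int any_nonzero_before(unsigned x, int y)
{
  return (x != 0) | (y != 0);
}

static int any_nonzero_after(unsigned x, int y)
{
  return (x | (unsigned) y) != 0;
}
```

The identity holds because a bitwise OR is zero exactly when both operands have every bit clear, and converting between same-precision integer types preserves the bit pattern.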
Re: [match.pd] Fix for PR35691
On 8 November 2016 at 13:23, Richard Biener wrote: > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: > >> On 7 November 2016 at 23:06, Prathamesh Kulkarni >> wrote: >> > On 7 November 2016 at 15:43, Richard Biener wrote: >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >>> On 4 November 2016 at 13:41, Richard Biener wrote: >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> >>> > >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >>> >> >> >>> >> > > > > The transform would also work for vectors (element_precision >> >>> >> > > > > for >> >>> >> > > > > the test but also a value-matching zero which should ensure >> >>> >> > > > > the >> >>> >> > > > > same number of elements). >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal >> >>> >> > > > length by a >> >>> >> > > > matching zero. >> >>> >> > > > Could you please elaborate on that ? >> >>> >> > > >> >>> >> > > He may have meant something like: >> >>> >> > > >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >>> >> > >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which >> >>> >> > was the >> >>> >> > point of the pattern. >> >>> >> >> >>> >> Oups, that's what I had written first, and then I somehow managed to >> >>> >> confuse >> >>> >> myself enough to remove it so as to remove the call to types_match :-( >> >>> >> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the >> >>> >> > > comparison results should already imply that the number of >> >>> >> > > elements is the >> >>> >> > > same. >> >>> >> > >> >>> >> > Though for equality compares we also allow scalar results IIRC. >> >>> >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to >> >>> >> generate one >> >>> >> for a testcase, at least until the GIMPLE FE lands... 
>> >>> >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, >> >>> >> > > maybe some >> >>> >> > > vector mode on s390?), it would be cool to do the same for floats >> >>> >> > > (most >> >>> >> > > likely at the RTL level). >> >>> >> > >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. Or >> >>> >> > we cab >> >>> >> > just allow bit-and/or on floats as much as we allow them on >> >>> >> > pointers. >> >>> >> >> >>> >> Would that generate sensible code on targets that do not have logic >> >>> >> insns for >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so >> >>> >> there >> >>> >> would be some work (for instance grep finds no gen_iordf3, only >> >>> >> gen_iorv2df3). >> >>> >> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too >> >>> >> early... >> >>> >> a==0 is something that other optimizations might use. long >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >>> >> >> >>> >> (and I am assuming that signaling NaNs don't make the whole >> >>> >> transformation >> >>> >> impossible, which might be wrong) >> >>> > >> >>> > Yeah. I also think it's not so much important - I just wanted to >> >>> > mention >> >>> > vectors... >> >>> > >> >>> > Btw, I still think we need a more sensib
Re: [match.pd] Fix for PR35691
On 8 November 2016 at 16:46, Richard Biener wrote: > On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote: > >> On 8 November 2016 at 13:23, Richard Biener wrote: >> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni >> >> wrote: >> >> > On 7 November 2016 at 15:43, Richard Biener wrote: >> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >> >> >>> On 4 November 2016 at 13:41, Richard Biener wrote: >> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> >> >>> > >> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >> >>> >> >> >> >>> >> > > > > The transform would also work for vectors >> >> >>> >> > > > > (element_precision for >> >> >>> >> > > > > the test but also a value-matching zero which should >> >> >>> >> > > > > ensure the >> >> >>> >> > > > > same number of elements). >> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal >> >> >>> >> > > > length by a >> >> >>> >> > > > matching zero. >> >> >>> >> > > > Could you please elaborate on that ? >> >> >>> >> > > >> >> >>> >> > > He may have meant something like: >> >> >>> >> > > >> >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >> >>> >> > >> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 >> >> >>> >> > which was the >> >> >>> >> > point of the pattern. >> >> >>> >> >> >> >>> >> Oups, that's what I had written first, and then I somehow managed >> >> >>> >> to confuse >> >> >>> >> myself enough to remove it so as to remove the call to types_match >> >> >>> >> :-( >> >> >>> >> >> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on >> >> >>> >> > > the >> >> >>> >> > > comparison results should already imply that the number of >> >> >>> >> > > elements is the >> >> >>> >> > > same. >> >> >>> >> > >> >> >>> >> > Though for equality compares we also allow scalar results IIRC. 
>> >> >>> >> >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to >> >> >>> >> generate one >> >> >>> >> for a testcase, at least until the GIMPLE FE lands... >> >> >>> >> >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, >> >> >>> >> > > maybe some >> >> >>> >> > > vector mode on s390?), it would be cool to do the same for >> >> >>> >> > > floats (most >> >> >>> >> > > likely at the RTL level). >> >> >>> >> > >> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. >> >> >>> >> > Or we cab >> >> >>> >> > just allow bit-and/or on floats as much as we allow them on >> >> >>> >> > pointers. >> >> >>> >> >> >> >>> >> Would that generate sensible code on targets that do not have >> >> >>> >> logic insns for >> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, >> >> >>> >> so there >> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only >> >> >>> >> gen_iorv2df3). >> >> >>> >> >> >> >>> >> I am a
Re: [ping * 4] PR35503 - warn for restrict
On 2 November 2016 at 23:17, Prathamesh Kulkarni wrote: > On 2 November 2016 at 23:07, Jason Merrill wrote: >> On Wed, Nov 2, 2016 at 1:08 PM, Prathamesh Kulkarni >> wrote: >>> On 2 November 2016 at 18:29, Jason Merrill wrote: >>>> Then I'll approve the whole patch. >>> Thanks! >>> Trying the patch on kernel build (allmodconfig) reveals the following >>> couple of warnings: >>> http://pastebin.com/Sv2HFDUv >>> >>> I think warning for str_error_r() is correct >> >> It's accurate, but unhelpful; snprintf isn't going to use the contents >> of buf via the variadic argument, so this warning is just noise. > Ah, indeed, it's just printing address of buf, not using the contents. >> >>> however I am not sure if >>> warning for pager_preexec() is legit or a false positive: >>> >>> pager.c: In function 'pager_preexec': >>> pager.c:35:12: warning: passing argument 2 to restrict-qualified >>> parameter aliases with argument 4 [-Wrestrict] >>> select(1, &in, NULL, &in, NULL); >>> ^~~~~~ >>> Is the warning correct for the above call to select() syscall ? >> >> The warning looks correct based on the prototype >> >> extern int select (int __nfds, fd_set *__restrict __readfds, >>fd_set *__restrict __writefds, >>fd_set *__restrict __exceptfds, >>struct timeval *__restrict __timeout); >> >> But passing the same fd_set to both readfds and exceptfds seems >> reasonable to me, so this also seems like a false positive. >> >> Looking at C11, I see this example: >> >> EXAMPLE 3 The function parameter declarations >> void h(int n, int * restrict p, int * restrict q, int * restrict r) >> { >> int i; >> for (i = 0; i < n; i++) >> p[i] = q[i] + r[i]; >> } >> >> illustrate how an unmodified object can be aliased through two >> restricted pointers. In particular, if a and b >> are disjoint arrays, a call of the form h(100, a, b, b) has defined >> behavior, because array b is not >> modified within function h. 
>> >> This is another example of well-defined code that your warning will >> complain about. > Yes, that's a limitation of the patch, it just looks at the prototype, and > not how the arguments are used in the function. >> >>> Should we instead keep it in Wextra, or continue keeping it in Wall ? >> >> It seems that it doesn't belong in -Wall. I don't feel strongly about >> -Wextra. > Should I commit the patch by keeping Wrestrict "standalone", > i.e., not including it in either Wall or Wextra ? Hi, After Joseph and Jason's approval, I have committed a rebased version of the patch as r242366 after bootstrap+test on x86_64-unknown-linux-gnu, cross-test on arm*-*-*, aarch64*-*-* and verifying no warning is triggered on kernel build with make allmodconfig && make all. Because the patch only looks at the function prototype, there could be false positives with the warning (restrict example 3 in C11 std), and hence the warning isn't enabled by default, nor by -Wall or -Wextra. The warning is only enabled with the -Wrestrict option. Thanks, Prathamesh > > Thanks, > Prathamesh >> >> Jason
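The C11 example quoted above can be tried directly. What follows is a minimal sketch, not part of the patch: `h` is copied from the standard's EXAMPLE 3, while `sum_twice` is a hypothetical helper added only to make the aliasing call observable. The call `h(4, a, b, b)` has defined behavior because `b` is only read, yet a check that looks only at the prototype would still see two restrict-qualified parameters receiving the same argument.

```c
/* C11 6.7.3.1 EXAMPLE 3: q and r are only read inside h, so passing
   the same array for both is well defined. */
void h(int n, int * restrict p, int * restrict q, int * restrict r)
{
    int i;
    for (i = 0; i < n; i++)
        p[i] = q[i] + r[i];
}

/* Hypothetical helper (not from the thread): calls h with q and r
   aliasing the same array b, then returns element idx of the result. */
int sum_twice(int idx)
{
    int a[4] = {0, 0, 0, 0};
    int b[4] = {1, 2, 3, 4};

    h(4, a, b, b);   /* a and b are disjoint; b is never modified */
    return a[idx];
}
```

Each element of `a` ends up as twice the matching element of `b`, which is the kind of harmless call that motivated leaving the warning out of -Wall and -Wextra.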
Fix PR78154
Hi Richard, Following your suggestion in PR78154, the patch checks if stmt contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p and returns true in that case. Bootstrapped+tested on x86_64-unknown-linux-gnu. Cross-testing on arm*-*-*, aarch64*-*-* in progress. Would it be OK to commit this patch in stage-3 ? Thanks, Prathamesh 2016-11-17 Prathamesh Kulkarni * tree-vrp.c (gimple_str_nonzero_warnv_p): New function. (gimple_stmt_nonzero_warnv_p): Call gimple_str_nonzero_warnv_p. testsuite/ * gcc.dg/tree-ssa/pr78154.c: New test-case. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c b/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c new file mode 100644 index 000..d3463f4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void f (void *d, const void *s, __SIZE_TYPE__ n) +{ + if (__builtin_memcpy (d, s, n) == 0) +__builtin_abort (); + + if (__builtin_memmove (d, s, n) == 0) +__builtin_abort (); + + if (__builtin_memset (d, 0, n) == 0) +__builtin_abort (); + + if (__builtin_strcpy (d, s) == 0) +__builtin_abort (); + + if (__builtin_strcat (d, s) == 0) +__builtin_abort (); + + if (__builtin_strncpy (d, s, n) == 0) +__builtin_abort (); + + if (__builtin_strncat (d, s, n) == 0) +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index c2a4133..b563a7f 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -1069,6 +1069,34 @@ gimple_assign_nonzero_warnv_p (gimple *stmt, bool *strict_overflow_p) } } +/* Return true if STMT is known to contain call to a string-builtin function + that is known to return nonnull. 
*/ + +static bool +gimple_str_nonzero_warnv_p (gimple *stmt) +{ + if (!is_gimple_call (stmt)) +return false; + + tree fndecl = gimple_call_fndecl (stmt); + if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL) +return false; + + switch (DECL_FUNCTION_CODE (fndecl)) +{ + case BUILT_IN_MEMMOVE: + case BUILT_IN_MEMCPY: + case BUILT_IN_MEMSET: + case BUILT_IN_STRCPY: + case BUILT_IN_STRNCPY: + case BUILT_IN_STRCAT: + case BUILT_IN_STRNCAT: + return true; + default: + return false; +} +} + /* Return true if STMT is known to compute a non-zero value. If the return value is based on the assumption that signed overflow is undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change @@ -1097,7 +1125,7 @@ gimple_stmt_nonzero_warnv_p (gimple *stmt, bool *strict_overflow_p) lookup_attribute ("returns_nonnull", TYPE_ATTRIBUTES (gimple_call_fntype (stmt)))) return true; - return gimple_alloca_call_p (stmt); + return gimple_alloca_call_p (stmt) || gimple_str_nonzero_warnv_p (stmt); } default: gcc_unreachable ();
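The test case in this patch depends on a property of the listed builtins: each one returns its destination argument, so the result cannot be null when the destination is not. A minimal sketch in plain C that checks this at run time with the ordinary library functions; `returns_dest` is a hypothetical name used for illustration only and is not part of the patch.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if every call below handed back its
   destination pointer, 0 otherwise. */
int returns_dest(void)
{
    char d[16] = "";
    const char *s = "abc";
    int ok = 1;

    ok &= (memcpy(d, s, 4) == d);         /* copies "abc" plus the NUL */
    ok &= (memmove(d, s, 4) == d);
    ok &= (memset(d, 0, sizeof d) == d);
    ok &= (strcpy(d, s) == d);            /* d is now "abc" */
    ok &= (strcat(d, s) == d);            /* d is now "abcabc" */
    ok &= (strncpy(d, s, 4) == d);        /* d is "abc" again */
    ok &= (strncat(d, s, 3) == d);        /* d is "abcabc", 7 bytes with NUL */
    return ok;
}
```

Since `d` is a real array, every comparison against a null pointer in the test case is false, which is what the `scan-tree-dump-not "__builtin_abort"` directive expects the optimizer to prove.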
PR78319
Hi, As discussed in PR, this patch marks the test-case to xfail on arm-none-eabi. OK to commit ? Thanks, Prathamesh 2016-11-17 Prathamesh Kulkarni PR tree-optimization/78319 testsuite/ * gcc.dg/uninit-pred-8_a.c (foo): Mark dg-bogus test to xfail on arm-none-eabi. diff --git a/gcc/testsuite/gcc.dg/uninit-pred-8_a.c b/gcc/testsuite/gcc.dg/uninit-pred-8_a.c index 1b7c472..c45fba0 100644 --- a/gcc/testsuite/gcc.dg/uninit-pred-8_a.c +++ b/gcc/testsuite/gcc.dg/uninit-pred-8_a.c @@ -16,8 +16,9 @@ int foo (int n, int l, int m, int r) if (m) g++; else bar(); + /* marking this test as xfail on arm-none-eabi, see PR78319. */ if ( n || m || r || l) - blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */ + blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail arm-none-eabi } } */ if ( n ) blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */
Re: PR78319
On 17 November 2016 at 03:20, Jeff Law wrote: > On 11/16/2016 01:23 PM, Prathamesh Kulkarni wrote: >> >> Hi, >> As discussed in PR, this patch marks the test-case to xfail on >> arm-none-eabi. >> OK to commit ? > > You might check if Aldy's change to the uninit code helps your case > (approved earlier today, so hopefully in the tree very soon). I quickly > scanned the BZ. There's some overlap, but it might be too complex for > Aldy's enhancements to catch. Hi Jeff, I tried Aldy's patch [1], but it didn't catch the case in PR78319. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00225.html Thanks, Prathamesh > > jeff
Re: Fix PR78154
On 17 November 2016 at 14:21, Richard Biener wrote: > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: > >> Hi Richard, >> Following your suggestion in PR78154, the patch checks if stmt >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p >> and returns true in that case. >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> Cross-testing on arm*-*-*, aarch64*-*-* in progress. >> Would it be OK to commit this patch in stage-3 ? > > As people noted we have returns_nonnull for this and that is already > checked. So please make sure the builtins get this attribute instead. OK thanks, I will add the returns_nonnull attribute to the required string builtins. I noticed some of the string builtins don't have RET1 in builtins.def: strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF. Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar to entries for memmove, strcpy ? Thanks, Prathamesh > > Thanks, > Richard.
Re: Fix PR78154
On 17 November 2016 at 15:24, Richard Biener wrote: > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: > >> On 17 November 2016 at 14:21, Richard Biener wrote: >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi Richard, >> >> Following your suggestion in PR78154, the patch checks if stmt >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p >> >> and returns true in that case. >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress. >> >> Would it be OK to commit this patch in stage-3 ? >> > >> > As people noted we have returns_nonnull for this and that is already >> > checked. So please make sure the builtins get this attribute instead. >> OK thanks, I will add the returns_nonnull attribute to the required >> string builtins. >> I noticed some of the string builtins don't have RET1 in builtins.def: >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF. >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar >> to entries for memmove, strcpy ? > > Yes, I think so. Hi, In the attached patch I added returns_nonnull attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF, and changed few builtins like strcat, strncpy, strncat and corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF. Does the patch look correct ? Thanks, Prathamesh > > Richard. 
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def index 8dc59c9..da82da5 100644 --- a/gcc/builtin-attrs.def +++ b/gcc/builtin-attrs.def @@ -108,6 +108,7 @@ DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic") DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm") DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure") DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice") +DEF_ATTR_IDENT (ATTR_RETURNS_NONNULL, "returns_nonnull") DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL) @@ -195,8 +196,11 @@ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL) /* Nothrow leaf functions whose pointer parameter(s) are all nonnull, and which return their first argument. */ -DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF_1, ATTR_RETURNS_NONNULL, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ + ATTR_RET1_NOTHROW_NONNULL_LEAF_1) + /* Nothrow const leaf functions whose pointer parameter(s) are all nonnull. 
*/ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL_LEAF, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) diff --git a/gcc/builtins.def b/gcc/builtins.def index 219feeb..c697b0a 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -646,13 +646,13 @@ DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, "memchr", BT_FN_PTR_CONST_PTR_INT_SIZE, DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, "memcmp", BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMCPY, "memcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMMOVE, "memmove", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMSET, "memset", BT_FN_PTR_PTR_INT_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_RINDEX, "rindex", BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STRCASECMP, "strcasecmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING,
PR78153
Hi, As suggested by Martin in PR78153 strlen's return value cannot exceed PTRDIFF_MAX. So I set it's range to [0, PTRDIFF_MAX - 1] in extract_range_basic() in the attached patch. However it regressed strlenopt-3.c: Consider fn1() from strlenopt-3.c: __attribute__((noinline, noclone)) size_t fn1 (char *p, char *q) { size_t s = strlen (q); strcpy (p, q); return s - strlen (p); } The optimized dump shows the following: __attribute__((noclone, noinline)) fn1 (char * p, char * q) { size_t s; size_t _7; long unsigned int _9; : s_4 = strlen (q_3(D)); _9 = s_4 + 1; __builtin_memcpy (p_5(D), q_3(D), _9); _7 = 0; return _7; } which introduces the regression, because the test expects "return 0;" in fn1(). The issue seems to be in vrp2: Before the patch: Visiting statement: s_4 = strlen (q_3(D)); Found new range for s_4: VARYING Visiting statement: _1 = s_4; Found new range for _1: [s_4, s_4] marking stmt to be not simulated again Visiting statement: _7 = s_4 - _1; Applying pattern match.pd:111, gimple-match.c:27997 Match-and-simplified s_4 - _1 to 0 Intersecting [0, 0] and [0, +INF] to [0, 0] Found new range for _7: [0, 0] __attribute__((noclone, noinline)) fn1 (char * p, char * q) { size_t s; long unsigned int _1; long unsigned int _9; : s_4 = strlen (q_3(D)); _9 = s_4 + 1; __builtin_memcpy (p_5(D), q_3(D), _9); _1 = s_4; return 0; } After the patch: Visiting statement: s_4 = strlen (q_3(D)); Intersecting [0, 9223372036854775806] and [0, 9223372036854775806] to [0, 9223372036854775806] Found new range for s_4: [0, 9223372036854775806] marking stmt to be not simulated again Visiting statement: _1 = s_4; Intersecting [0, 9223372036854775806] EQUIVALENCES: { s_4 } (1 elements) and [0, 9223372036854775806] to [0, 9223372036854775806] EQUIVALENCES: { s_4 } (1 elements) Found new range for _1: [0, 9223372036854775806] marking stmt to be not simulated again Visiting statement: _7 = s_4 - _1; Intersecting ~[9223372036854775807, 9223372036854775809] and ~[9223372036854775807, 
9223372036854775809] to ~[9223372036854775807, 9223372036854775809] Found new range for _7: ~[9223372036854775807, 9223372036854775809] marking stmt to be not simulated again __attribute__((noclone, noinline)) fn1 (char * p, char * q) { size_t s; long unsigned int _1; size_t _7; long unsigned int _9; : s_4 = strlen (q_3(D)); _9 = s_4 + 1; __builtin_memcpy (p_5(D), q_3(D), _9); _1 = s_4; _7 = s_4 - _1; return _7; } Then forwprop4 turns _1 = s_4 _7 = s_4 - _1 into _7 = 0 and we end up with: _7 = 0 return _7 in optimized dump. Running ccp again after forwprop4 trivially solves the issue, however I am not sure if we want to run ccp again ? The issue is probably with extract_range_from_ssa_name(): For _1 = s_4 Before patch: VR for s_4 is set to varying. So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name. Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to s_4, and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using match.pd pattern x - x -> 0). After patch: VR for s_4 is set to [0, PTRDIFF_MAX - 1] And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1] so IIUC, we then lose the information that _1 is equal to s_4, and vrp doesn't transform _7 = s_4 - _1 to _7 = 0. forwprop4 does that because it sees that s_4 and _1 are equivalent. Does this sound correct ? I am not sure how to proceed with the patch, and would be grateful for suggestions. 
Thanks, Prathamesh diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78153-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr78153-1.c new file mode 100644 index 000..2530ba0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78153-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void f(const char *s) +{ + if (__PTRDIFF_MAX__ <= __builtin_strlen (s)) +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78153-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr78153-2.c new file mode 100644 index 000..de70450 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78153-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void f(const char *s) +{ + __PTRDIFF_TYPE__ n = __builtin_strlen (s); + if (n < 0) +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index c2a4133..d17b413 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4013,6 +4013,16 @@ extract_range_basic (value_range *vr, gimple *stmt) : vrp_val_max (type), NULL); } return; + case CFN_BUILT_IN_STRLEN: + { + tree type = TREE_TYPE (gimple_call_lhs (stmt)); + unsigned HOST_WIDE_INT max = + TREE_INT_CST_LOW (vrp_val_max (ptrdiff_type_node)) - 1; + +
Re: PR78153
On 20 November 2016 at 19:34, Jakub Jelinek wrote: > On Sun, Nov 20, 2016 at 07:20:20PM +0530, Prathamesh Kulkarni wrote: >> --- a/gcc/tree-vrp.c >> +++ b/gcc/tree-vrp.c >> @@ -4013,6 +4013,16 @@ extract_range_basic (value_range *vr, gimple *stmt) >> : vrp_val_max (type), NULL); >> } >> return; >> + case CFN_BUILT_IN_STRLEN: >> + { >> + tree type = TREE_TYPE (gimple_call_lhs (stmt)); >> + unsigned HOST_WIDE_INT max = >> + TREE_INT_CST_LOW (vrp_val_max (ptrdiff_type_node)) - 1; > > Wrong formatting, = should go on the next line, and should be indented only > 2 columns more than the previous line. Plus TREE_INT_CST_LOW really > shouldn't be used in new code. You should use tree_to_uhwi or tree_to_shwi > instead. Why the -1? Can you just > fold_convert (type, TYPE_MAX_VALUE (ptrdiff_type_node)); ? > Or, if you really want the -1, e.g. wide_int max = vrp_val_max > (ptrdiff_type_node); > wide_int_to_tree (type, max - 1); > or something similar. Hi Jakub, Thanks for the suggestions. Sorry I wrote misleading info in the patch. As per PR, strlen's return value should always be less than PTRDIFF_MAX, so I am setting the range to [0, PTRDIFF_MAX - 1]. I will use wide_int in the next version of patch. Thanks, Prathamesh >> + >> + set_value_range (vr, VR_RANGE, build_int_cst (type, 0), >> + build_int_cst (type, max), NULL); >> + } >> + return; >> default: >> break; >> } > > Jakub
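The range [0, PTRDIFF_MAX - 1] discussed in this thread can be illustrated outside the compiler. This is a minimal sketch under the assumption stated in the PR, namely that no object is larger than PTRDIFF_MAX bytes; `signed_len` is a hypothetical helper that mirrors the pr78153-2.c test case and is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the length of s as a signed value.
   Under the PR78153 assumption the length is strictly below
   PTRDIFF_MAX, so the conversion cannot produce a negative number. */
ptrdiff_t signed_len(const char *s)
{
    size_t n = strlen(s);

    /* Assumed upper bound from the thread: n is in [0, PTRDIFF_MAX - 1]. */
    assert(n < (size_t)PTRDIFF_MAX);
    return (ptrdiff_t)n;
}
```

With that bound known to VRP, a check such as `if (n < 0) __builtin_abort ();` on the converted value is dead code, which is what both new test cases look for in the evrp dump.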
Re: Fix PR78154
On 21 November 2016 at 15:34, Richard Biener wrote: > On Fri, 18 Nov 2016, Prathamesh Kulkarni wrote: > >> On 17 November 2016 at 15:24, Richard Biener wrote: >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 17 November 2016 at 14:21, Richard Biener wrote: >> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> Hi Richard, >> >> >> Following your suggestion in PR78154, the patch checks if stmt >> >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p >> >> >> and returns true in that case. >> >> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress. >> >> >> Would it be OK to commit this patch in stage-3 ? >> >> > >> >> > As people noted we have returns_nonnull for this and that is already >> >> > checked. So please make sure the builtins get this attribute instead. >> >> OK thanks, I will add the returns_nonnull attribute to the required >> >> string builtins. >> >> I noticed some of the string builtins don't have RET1 in builtins.def: >> >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF. >> >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar >> >> to entries for memmove, strcpy ? >> > >> > Yes, I think so. >> Hi, >> In the attached patch I added returns_nonnull attribute to >> ATTR_RET1_NOTHROW_NONNULL_LEAF, >> and changed few builtins like strcat, strncpy, strncat and >> corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF. >> Does the patch look correct ? > > Hmm, given you only change ATTR_RET1_NOTHROW_NONNULL_LEAF means that > the gimple_stmt_nonzero_warnv_p code is incomplete -- it should > infer returns_nonnull itself from RET1 (which is fnspec("1") basically) > and the nonnull attribute on the argument. So > > unsigned rf = gimple_call_return_flags (stmt); > if (rf & ERF_RETURNS_ARG) >{ > tree arg = gimple_call_arg (stmt, rf & ERF_RETURN_ARG_MASK); > if (range of arg is ! 
VARYING) >use range of arg; > else if (infer_nonnull_range_by_attribute (stmt, arg)) > ... nonnull ... > Hi, Thanks for the suggestions, modified gimple_stmt_nonzero_warnv_p accordingly in this version. For functions like stpcpy that return nonnull but not one of its arguments, I added a new enum ATTR_RETNONNULL_NOTHROW_LEAF. Is that OK ? Bootstrapped+tested on x86_64-unknown-linux-gnu. Cross-testing on arm*-*-*, aarch64*-*-* in progress. Thanks, Prathamesh > Richard. > >> Thanks, >> Prathamesh >> > >> > Richard. >> > > -- > Richard Biener > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg) 2016-11-22 Richard Biener Prathamesh Kulkarni * tree-vrp.c (gimple_stmt_nonzero_warnv_p): Return true if function returns its argument and the argument is nonnull. * builtin-attrs.def: Define ATTR_RETURNS_NONNULL, ATTR_RETNONNULL_NOTHROW_LEAF. * builtins.def (BUILT_IN_MEMPCPY): Change attribute to ATTR_RETNONNULL_NOTHROW_LEAF. (BUILT_IN_STPCPY): Likewise. (BUILT_IN_STPNCPY): Likewise. (BUILT_IN_MEMPCPY_CHK): Likewise. (BUILT_IN_STPCPY_CHK): Likewise. (BUILT_IN_STPNCPY_CHK): Likewise. (BUILT_IN_STRCAT): Change attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF. (BUILT_IN_STRNCAT): Likewise. (BUILT_IN_STRNCPY): Likewise. (BUILT_IN_MEMSET_CHK): Likewise. (BUILT_IN_STRCAT_CHK): Likewise. (BUILT_IN_STRCPY_CHK): Likewise. (BUILT_IN_STRNCAT_CHK): Likewise. (BUILT_IN_STRNCPY_CHK): Likewise. testsuite/ * gcc.dg/tree-ssa/pr78154.c: New test.
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def index 8dc59c9..94d0c62 100644 --- a/gcc/builtin-attrs.def +++ b/gcc/builtin-attrs.def @@ -108,6 +108,7 @@ DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic") DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm") DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure") DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice") +DEF_ATTR_IDENT (ATTR_RETURNS_NONNULL, "returns_nonnull") DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL) @@ -197,6 +198,9 @@ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \ and which return their first argument. */ DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \