RE: A problem about loop store motion
> -Original Message- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of > Richard Guenther > Sent: Tuesday, February 21, 2012 6:19 PM > To: Jiangning Liu > Cc: gcc@gcc.gnu.org > Subject: Re: A problem about loop store motion > > On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu > wrote: > > > > > >> -Original Message- > >> From: Richard Guenther [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, February 21, 2012 5:40 PM > >> To: Jiangning Liu > >> Cc: gcc@gcc.gnu.org > >> Subject: Re: A problem about loop store motion > >> > >> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu > > >> wrote: > >> >> The MEM form is more "canonical", so the loop SM machinery to > detect > >> >> equality should be adjusted accordingly. Alternatively you can > >> teach > >> >> PRE insertion to strip off the MEM if possible (though > >> >> fold_stmt_inplace should > >> >> arelady do this if possible). > >> > > >> > Richard, > >> > > >> > Thank you! You are right. I noticed on latest trunk the problem in > >> PRE was > >> > already fixed by invoking fold_stmt_inplace. > >> > > >> > Unfortunately for this small case, the latest trunk code still > can't > >> do SM > >> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is > true, > >> that > >> > is, pos has alias with l[pos]. > >> > > >> > I think alias analysis should be able to know they don't have > alias > >> with > >> > each other, unless there is an assignment statement like "l=&pos;". > >> > > >> > Can alias analysis fix the problem? > >> > >> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think > >> its address got taken. 'l' points to any global possibly address- > taken > >> variable. It get's the TREE_ADDRESSABLE flag via PRE insertion > which > >> re-gimplifies the tree it creates which marks the variable as > >> addressable. > >> > >> I'll have a look. > > > > Terrific! :-) Could you please let me know when you have a patch to > fix > > this, because I want to double check the big case, and there might be > other > > hidden problems? > > PR52324, I am testing the following patch (in reality the gimplifier > should not > mark things addressable unless it itself makes them so, but frontends > are > broken, so we do that ... ugh). > Richard, Now trunk works for the big case as well. Thanks a lot for your quick fix. BTW, why can't we fix front-end directly? Thanks, -Jiangning > Index: gcc/gimplify.c > === > --- gcc/gimplify.c (revision 184428) > +++ gcc/gimplify.c (working copy) > @@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq > ret = GS_OK; > break; > } > - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, > post_p, > - is_gimple_mem_ref_addr, fb_rvalue); > - if (ret == GS_ERROR) > - break; > + /* Avoid re-gimplifying the address operand if it is already > +in suitable form. */ > + if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) > + { > + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, > post_p, > + is_gimple_mem_ref_addr, fb_rvalue); > + if (ret == GS_ERROR) > + break; > + } > recalculate_side_effects (*expr_p); > ret = GS_ALL_DONE; > break; > > - /* Constants need not be gimplified. */ > + /* Constants need not be gimplified. */ > case INTEGER_CST: > case REAL_CST: > case FIXED_CST:
gcc-4.4-20120221 is now available
Snapshot gcc-4.4-20120221 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20120221/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.4 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 184449 You'll find: gcc-4.4-20120221.tar.bz2 Complete GCC MD5=df3598802125f5a24e70e0039d7b077b SHA1=efcdf9a007fca4524663704c417f91e02a8f4a58 Diffs from 4.4-20120214 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Inefficient end-of-loop value computation - missed optimization somewhere?
On Tue, Feb 21, 2012 at 1:27 AM, Richard Guenther wrote: > On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand wrote: >> Hello, >> >> we've noticed that the loop optimizer sometimes inserts weirdly >> inefficient code to compute the value of an induction variable >> at the end of the loop. >> >> As a test case stripped down to the core of the problem, consider: >> >> int test (int start, int end) >> { >> int i; >> >> for (i = start; i < end; i++) >> ; >> >> return i; >> } >> >> That function really ought to be equivalent to >> >> int test2 (int start, int end) >> { >> return start < end ? end : start; >> } >> >> And indeed on i386-linux-gnu, we get mostly identical code >> (except for the branch direction prediction). >> >> However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a), >> we see for the first function: >> >> test: >> cmp r0, r1 >> bge .L2 >> adds r3, r0, #1 >> mvns r0, r0 >> adds r1, r1, r0 >> adds r0, r3, r1 >> .L2: >> bx lr >> >> instead of simply (as for the second function): >> >> test2: >> cmp r1, r0 >> it ge >> movge r0, r1 >> bx lr >> >> >> >> Where does that weird addition-and-negation sequence come from? >> In 100t.dceloop we still have: >> >> : >> # i_9 = PHI >> i_5 = i_9 + 1; >> if (end_4(D) > i_5) >> goto ; >> else >> goto ; >> >> : >> goto ; >> >> : >> # i_1 = PHI >> >> >> But 102t.sccp has: >> >> : >> # i_9 = PHI >> i_5 = i_9 + 1; >> if (end_4(D) > i_5) >> goto ; >> else >> goto ; >> >> : >> goto ; >> >> : >> D.4123_10 = start_2(D) + 1; >> D.4124_11 = ~start_2(D); >> D.4125_12 = (unsigned int) D.4124_11; >> D.4126_13 = (unsigned int) end_4(D); >> D.4127_14 = D.4125_12 + D.4126_13; >> D.4128_15 = (int) D.4127_14; >> i_1 = D.4123_10 + D.4128_15; >> >> This is the sequence that ends up in the final assembler code. >> >> Note that this computes: >> >> i_1 = (start_2(D) + 1) >> + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D)) >> >> which is (since those casts are no-ops on the assembler level): >> >> i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D) >> >> which is due to two's-complement arithmetic: >> >> i_1 = start_2(D) - start_2(D) + end_4(D) >> >> that is simply: >> >> i_1 = end_4(D) >> >> >> Unfortunately, no tree-based optimization pass after 102t.sccp is able to >> clean up that mess. This is true even on *i386*: the only reason we get >> nice assembler there is due to *RTX* optimization, notably in combine. >> This hits on i386 because of an intermediate stage that adds two registers >> and the constant 1 (which matches the lea pattern). On arm, there is no >> instruction that would handle that intermediate stage, and combine is not >> powerful enough to perform the whole simplification in a single step. >> >> >> Now that sequence of gimple operations is generated in scev_const_prop >> in tree-scalar-evolution.c. First, the phi node corresponding to the >> loop exit value is evaluated: >> >> def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); >> >> which returns a chrec of the form >> >> { 1, +, (start + 1) } >> >> This is then further simplified by >> >> def = compute_overall_effect_of_inner_loop (ex_loop, def); >> >> which first computes the number of loop iterations: >> >> tree nb_iter = number_of_latch_executions (loop); >> >> which returns a tree representing >> >> (unsigned int) ~start + (unsigned int) end >> >> (which at this point seems to be the validly most efficient way to >> compute end - start - 1). >> >> The chrec is then evaluated via chrec_apply which basically computes >> >> (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end) >> >> which ends up being converted to the long gimple sequence seen above. >> >> >> Now I guess my questions are: >> >> - In the computation shown above, should that tree have been folded >> into just "end" to start with? The tree is constructed at its > > The issue is that (start + 1) + 1 * (int) ... is computed using signed > integer arithmetic which may invoke undefined behavior when it wraps. > Thus we cannot re-associate it. I suppose if you'd arrange the code > to do the arithmetic in unsigned and only cast the result back to the > desired type we might fold it ... > >> outermost level in chrec_fold_plus, which simply calls fold_build2. >> While simplify-rtx.c has code to detect sequences of operations >> that cancel out while folding a PLUS or MINUS RTX, there does >> not appear to be corresponding code in fold-const.c to handle >> the equivalent optimization at the tree level. > > There are. fold-const.c has some re-association logic to catch this > and we have tree-ssa-reassoc.c. All of those only handle same-sign > operands though I think. > >> - If not, should there be another tree pass later on that ought to >> clean this up? I notice there is code to handle plus/minus >> sequences in tree-ssa
Re: GCC: OpenMP posix pthread
On Tue, Feb 21, 2012 at 04:50:58PM +0100, erotavlas_tu...@libero.it wrote: > Nobody can answer to my question? You can configure gcc not to use futexes in libgomp (--disable-linux-futex), but the default is to use them for performance reasons. Jakub
Re: Some confuse about the pass of the arguments by gcc
On 02/21/2012 03:18 PM, 嘉谟 wrote: > I do a experiments to check how gcc pass the arguments. > here is the code > > #include > int main(int argc , char *argv[]){ > int a=3; > int b=3; > int c=3; > printf("%d %d\n",++a+c,a+c); > printf("%d %d\n",++b,b); > return 0; > } > > the anwer is > > 8 7 > 4 4 > > the piece of assembly language: gcc 4.6.2 > > movl$3, 28(%esp) > movl$3, 24(%esp) > movl$3, 20(%esp) > movl20(%esp), %eax > movl28(%esp), %edx > leal(%edx,%eax), %ecx > addl$1, 28(%esp) > movl20(%esp), %eax > movl28(%esp), %edx > addl%eax, %edx > movl$.LC0, %eax > movl%ecx, 8(%esp) > movl%edx, 4(%esp) > movl%eax, (%esp) > callprintf > addl$1, 24(%esp) > movl$.LC0, %eax > movl24(%esp), %edx > movl%edx, 8(%esp) > movl24(%esp), %edx > movl%edx, 4(%esp) > movl%eax, (%esp) > callprintf > > In the first case , gcc first compute the a+c to %ecx ,and pass it > stack , the compute ++a+c to %edx ,so the answer is 8 7 > > In the second case , why it didn't do the same thing like > compute b=3 and pass it to stack ,then compute ++b and pass it to > stack .to the result 4 3. However it first > addl$1, 24(%esp) ==> b+1 I think it compute the expression > b+1. the pass it to stack . the b which now is 4 and was passed to > stack. > > I was wondering why gcc handle the same mode in two ways. > > Is there some errors I had made ? http://c-faq.com/expr/evalorder2.html Andrew.
Some confuse about the pass of the arguments by gcc
I do a experiments to check how gcc pass the arguments. here is the code #include int main(int argc , char *argv[]){ int a=3; int b=3; int c=3; printf("%d %d\n",++a+c,a+c); printf("%d %d\n",++b,b); return 0; } the anwer is 8 7 4 4 the piece of assembly language: gcc 4.6.2 movl$3, 28(%esp) movl$3, 24(%esp) movl$3, 20(%esp) movl20(%esp), %eax movl28(%esp), %edx leal(%edx,%eax), %ecx addl$1, 28(%esp) movl20(%esp), %eax movl28(%esp), %edx addl%eax, %edx movl$.LC0, %eax movl%ecx, 8(%esp) movl%edx, 4(%esp) movl%eax, (%esp) callprintf addl$1, 24(%esp) movl$.LC0, %eax movl24(%esp), %edx movl%edx, 8(%esp) movl24(%esp), %edx movl%edx, 4(%esp) movl%eax, (%esp) callprintf In the first case , gcc first compute the a+c to %ecx ,and pass it stack , the compute ++a+c to %edx ,so the answer is 8 7 In the second case , why it didn't do the same thing like compute b=3 and pass it to stack ,then compute ++b and pass it to stack .to the result 4 3. However it first addl$1, 24(%esp) ==> b+1 I think it compute the expression b+1. the pass it to stack . the b which now is 4 and was passed to stack. I was wondering why gcc handle the same mode in two ways. Is there some errors I had made ? In my opinion ,for the reason of the concept of consistent ,it may be 8 8 4 4 thank you advance
Re: GCC: OpenMP posix pthread
Nobody can answer to my question? Thank you Salvatore Frandina
Re: GCC and x86intrin.h header
erotavlas_tu...@libero.it writes: > I read in the manual of GCC the following line: > // new intrinsic header file, it should be included before > using any IA-32/x86-64 intrinsics. > What does it mean? I have to explicitly include this library in my code if I > want to use the intrinsic functions like ssex or mcrc32 etc. This question would be more appropriate for the mailing list gcc-h...@gcc.gnu.org than for gcc@. The gcc@ mailing list is for discussions related to the development of gcc itself. Please take any followups to gcc-help. Thanks. To answer your question: the header file provides definitions for all intrinsic functions which are relevant to the target processor. The intrinsic functions are the ones with names like _mm_crc32_u8. You are not required to #include . But it is a more reliable mechanism than the older one of remembering that _mm_crc32_u8 is defined in . Ian
GCC and x86intrin.h header
Hi, I read in the manual of GCC the following line: // new intrinsic header file, it should be included before using any IA-32/x86-64 intrinsics. What does it mean? I have to explicitly include this library in my code if I want to use the intrinsic functions like ssex or mcrc32 etc. Thank you
Re: How to suppress frame pointer usage by default?
Konstantin Vladimirov writes: > What one must use in custom backend to suppress frame pointer usage by > default? > > Frame pointer is mentioned in ELIMINABLE_REGS: > > #define ELIMINABLE_REGS \ > { \ > {ARG_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ > {ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM}, \ > {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ > } > > TARGET_CAN_ELIMINATE is defined to be always true, > > TARGET_FRAME_POINTER_REQUIRED is defined to be always false > > Instead of this, gcc generates code with frame pointer usage, and I > need manually specify -fomit-frame-pointer to suppress this behavior. > What else may influence this? You may need to set flag_omit_frame_pointer in your TARGET_OPTION_OVERRIDE or TARGET_OPTION_OPTIMIZATION_TABLE or TARGET_OPTION_OVERRIDE (note that these have different names in earlier versions of gcc). Ian
How to suppress frame pointer usage by default?
Hi, What one must use in custom backend to suppress frame pointer usage by default? Frame pointer is mentioned in ELIMINABLE_REGS: #define ELIMINABLE_REGS \ { \ {ARG_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ {ARG_POINTER_REGNUM, FRAME_POINTER_REGNUM}, \ {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \ } TARGET_CAN_ELIMINATE is defined to be always true, TARGET_FRAME_POINTER_REQUIRED is defined to be always false Instead of this, gcc generates code with frame pointer usage, and I need manually specify -fomit-frame-pointer to suppress this behavior. What else may influence this? I feel, I missed something in the GCC internals. Thanks in advance for any suggestions. --- With best regards, Konstantin
Re: A problem about loop store motion
On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu wrote: > > >> -Original Message- >> From: Richard Guenther [mailto:richard.guent...@gmail.com] >> Sent: Tuesday, February 21, 2012 5:40 PM >> To: Jiangning Liu >> Cc: gcc@gcc.gnu.org >> Subject: Re: A problem about loop store motion >> >> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu >> wrote: >> >> The MEM form is more "canonical", so the loop SM machinery to detect >> >> equality should be adjusted accordingly. Alternatively you can >> teach >> >> PRE insertion to strip off the MEM if possible (though >> >> fold_stmt_inplace should >> >> arelady do this if possible). >> > >> > Richard, >> > >> > Thank you! You are right. I noticed on latest trunk the problem in >> PRE was >> > already fixed by invoking fold_stmt_inplace. >> > >> > Unfortunately for this small case, the latest trunk code still can't >> do SM >> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, >> that >> > is, pos has alias with l[pos]. >> > >> > I think alias analysis should be able to know they don't have alias >> with >> > each other, unless there is an assignment statement like "l=&pos;". >> > >> > Can alias analysis fix the problem? >> >> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think >> its address got taken. 'l' points to any global possibly address-taken >> variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which >> re-gimplifies the tree it creates which marks the variable as >> addressable. >> >> I'll have a look. > > Terrific! :-) Could you please let me know when you have a patch to fix > this, because I want to double check the big case, and there might be other > hidden problems? PR52324, I am testing the following patch (in reality the gimplifier should not mark things addressable unless it itself makes them so, but frontends are broken, so we do that ... ugh). Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 184428) +++ gcc/gimplify.c (working copy) @@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq ret = GS_OK; break; } - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, - is_gimple_mem_ref_addr, fb_rvalue); - if (ret == GS_ERROR) - break; + /* Avoid re-gimplifying the address operand if it is already +in suitable form. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0))) + { + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, + is_gimple_mem_ref_addr, fb_rvalue); + if (ret == GS_ERROR) + break; + } recalculate_side_effects (*expr_p); ret = GS_ALL_DONE; break; - /* Constants need not be gimplified. */ + /* Constants need not be gimplified. */ case INTEGER_CST: case REAL_CST: case FIXED_CST:
RE: A problem about loop store motion
> -Original Message- > From: Richard Guenther [mailto:richard.guent...@gmail.com] > Sent: Tuesday, February 21, 2012 5:40 PM > To: Jiangning Liu > Cc: gcc@gcc.gnu.org > Subject: Re: A problem about loop store motion > > On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu > wrote: > >> The MEM form is more "canonical", so the loop SM machinery to detect > >> equality should be adjusted accordingly. Alternatively you can > teach > >> PRE insertion to strip off the MEM if possible (though > >> fold_stmt_inplace should > >> arelady do this if possible). > > > > Richard, > > > > Thank you! You are right. I noticed on latest trunk the problem in > PRE was > > already fixed by invoking fold_stmt_inplace. > > > > Unfortunately for this small case, the latest trunk code still can't > do SM > > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, > that > > is, pos has alias with l[pos]. > > > > I think alias analysis should be able to know they don't have alias > with > > each other, unless there is an assignment statement like "l=&pos;". > > > > Can alias analysis fix the problem? > > The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think > its address got taken. 'l' points to any global possibly address-taken > variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which > re-gimplifies the tree it creates which marks the variable as > addressable. > > I'll have a look. Terrific! :-) Could you please let me know when you have a patch to fix this, because I want to double check the big case, and there might be other hidden problems? Thanks, -Jiangning > > Richard. > > > > > Thanks, > > -Jiangning > > > >> > >> Richard. > >> > > > > > >
Re: A problem about loop store motion
On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu wrote: >> The MEM form is more "canonical", so the loop SM machinery to detect >> equality should be adjusted accordingly. Alternatively you can teach >> PRE insertion to strip off the MEM if possible (though >> fold_stmt_inplace should >> arelady do this if possible). > > Richard, > > Thank you! You are right. I noticed on latest trunk the problem in PRE was > already fixed by invoking fold_stmt_inplace. > > Unfortunately for this small case, the latest trunk code still can't do SM > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that > is, pos has alias with l[pos]. > > I think alias analysis should be able to know they don't have alias with > each other, unless there is an assignment statement like "l=&pos;". > > Can alias analysis fix the problem? The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think its address got taken. 'l' points to any global possibly address-taken variable. It get's the TREE_ADDRESSABLE flag via PRE insertion which re-gimplifies the tree it creates which marks the variable as addressable. I'll have a look. Richard. > Thanks, > -Jiangning > >> >> Richard. >> > > >
Re: Inefficient end-of-loop value computation - missed optimization somewhere?
On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand wrote: > Hello, > > we've noticed that the loop optimizer sometimes inserts weirdly > inefficient code to compute the value of an induction variable > at the end of the loop. > > As a test case stripped down to the core of the problem, consider: > > int test (int start, int end) > { > int i; > > for (i = start; i < end; i++) > ; > > return i; > } > > That function really ought to be equivalent to > > int test2 (int start, int end) > { > return start < end ? end : start; > } > > And indeed on i386-linux-gnu, we get mostly identical code > (except for the branch direction prediction). > > However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a), > we see for the first function: > > test: > cmp r0, r1 > bge .L2 > adds r3, r0, #1 > mvns r0, r0 > adds r1, r1, r0 > adds r0, r3, r1 > .L2: > bx lr > > instead of simply (as for the second function): > > test2: > cmp r1, r0 > it ge > movge r0, r1 > bx lr > > > > Where does that weird addition-and-negation sequence come from? > In 100t.dceloop we still have: > > : > # i_9 = PHI > i_5 = i_9 + 1; > if (end_4(D) > i_5) > goto ; > else > goto ; > > : > goto ; > > : > # i_1 = PHI > > > But 102t.sccp has: > > : > # i_9 = PHI > i_5 = i_9 + 1; > if (end_4(D) > i_5) > goto ; > else > goto ; > > : > goto ; > > : > D.4123_10 = start_2(D) + 1; > D.4124_11 = ~start_2(D); > D.4125_12 = (unsigned int) D.4124_11; > D.4126_13 = (unsigned int) end_4(D); > D.4127_14 = D.4125_12 + D.4126_13; > D.4128_15 = (int) D.4127_14; > i_1 = D.4123_10 + D.4128_15; > > This is the sequence that ends up in the final assembler code. > > Note that this computes: > > i_1 = (start_2(D) + 1) > + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D)) > > which is (since those casts are no-ops on the assembler level): > > i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D) > > which is due to two's-complement arithmetic: > > i_1 = start_2(D) - start_2(D) + end_4(D) > > that is simply: > > i_1 = end_4(D) > > > Unfortunately, no tree-based optimization pass after 102t.sccp is able to > clean up that mess. This is true even on *i386*: the only reason we get > nice assembler there is due to *RTX* optimization, notably in combine. > This hits on i386 because of an intermediate stage that adds two registers > and the constant 1 (which matches the lea pattern). On arm, there is no > instruction that would handle that intermediate stage, and combine is not > powerful enough to perform the whole simplification in a single step. > > > Now that sequence of gimple operations is generated in scev_const_prop > in tree-scalar-evolution.c. First, the phi node corresponding to the > loop exit value is evaluated: > > def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL); > > which returns a chrec of the form > > { 1, +, (start + 1) } > > This is then further simplified by > > def = compute_overall_effect_of_inner_loop (ex_loop, def); > > which first computes the number of loop iterations: > > tree nb_iter = number_of_latch_executions (loop); > > which returns a tree representing > > (unsigned int) ~start + (unsigned int) end > > (which at this point seems to be the validly most efficient way to > compute end - start - 1). > > The chrec is then evaluated via chrec_apply which basically computes > > (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end) > > which ends up being converted to the long gimple sequence seen above. > > > Now I guess my questions are: > > - In the computation shown above, should that tree have been folded > into just "end" to start with? The tree is constructed at its The issue is that (start + 1) + 1 * (int) ... is computed using signed integer arithmetic which may invoke undefined behavior when it wraps. Thus we cannot re-associate it. I suppose if you'd arrange the code to do the arithmetic in unsigned and only cast the result back to the desired type we might fold it ... > outermost level in chrec_fold_plus, which simply calls fold_build2. > While simplify-rtx.c has code to detect sequences of operations > that cancel out while folding a PLUS or MINUS RTX, there does > not appear to be corresponding code in fold-const.c to handle > the equivalent optimization at the tree level. There are. fold-const.c has some re-association logic to catch this and we have tree-ssa-reassoc.c. All of those only handle same-sign operands though I think. > - If not, should there be another tree pass later on that ought to > clean this up? I notice there is code to handle plus/minus > sequences in tree-ssa-reassoc.c. But this doesn't seem to be > able to handle this particular case, possibly because the presence > of the casts (nop_exprs) ... > > - Anywhere else I'm overlooking right now? > > > I'd appreciate any tips / suggestions how to fix
RE: A problem about loop store motion
> The MEM form is more "canonical", so the loop SM machinery to detect > equality should be adjusted accordingly. Alternatively you can teach > PRE insertion to strip off the MEM if possible (though > fold_stmt_inplace should > arelady do this if possible). Richard, Thank you! You are right. I noticed on latest trunk the problem in PRE was already fixed by invoking fold_stmt_inplace. Unfortunately for this small case, the latest trunk code still can't do SM for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that is, pos has alias with l[pos]. I think alias analysis should be able to know they don't have alias with each other, unless there is an assignment statement like "l=&pos;". Can alias analysis fix the problem? Thanks, -Jiangning > > Richard. >