RE: A problem about loop store motion

2012-02-21 Thread Jiangning Liu


> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Richard Guenther
> Sent: Tuesday, February 21, 2012 6:19 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: A problem about loop store motion
> 
> On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: Tuesday, February 21, 2012 5:40 PM
> >> To: Jiangning Liu
> >> Cc: gcc@gcc.gnu.org
> >> Subject: Re: A problem about loop store motion
> >>
> >> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu
> 
> >> wrote:
> >> >> The MEM form is more "canonical", so the loop SM machinery to
> detect
> >> >> equality should be adjusted accordingly.  Alternatively you can
> >> teach
> >> >> PRE insertion to strip off the MEM if possible (though
> >> >> fold_stmt_inplace should
> >> >> arelady do this if possible).
> >> >
> >> > Richard,
> >> >
> >> > Thank you! You are right. I noticed on latest trunk the problem in
> >> PRE was
> >> > already fixed by invoking fold_stmt_inplace.
> >> >
> >> > Unfortunately for this small case, the latest trunk code still
> can't
> >> do SM
> >> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is
> true,
> >> that
> >> > is, pos has alias with l[pos].
> >> >
> >> > I think alias analysis should be able to know they don't have
> alias
> >> with
> >> > each other, unless there is an assignment statement like "l=&pos;".
> >> >
> >> > Can alias analysis fix the problem?
> >>
> >> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think
> >> its address got taken.  'l' points to any global possibly address-
> taken
> >> variable.  It get's the TREE_ADDRESSABLE flag via PRE insertion
> which
> >> re-gimplifies the tree it creates which marks the variable as
> >> addressable.
> >>
> >> I'll have a look.
> >
> > Terrific! :-) Could you please let me know when you have a patch to
> fix
> > this, because I want to double check the big case, and there might be
> other
> > hidden problems?
> 
> PR52324, I am testing the following patch (in reality the gimplifier
> should not
> mark things addressable unless it itself makes them so, but frontends
> are
> broken, so we do that ... ugh).
> 

Richard,

Now trunk works for the big case as well. Thanks a lot for your quick fix.

BTW, why can't we fix front-end directly?

Thanks,
-Jiangning

> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c  (revision 184428)
> +++ gcc/gimplify.c  (working copy)
> @@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq
>   ret = GS_OK;
>   break;
> }
> - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> post_p,
> -  is_gimple_mem_ref_addr, fb_rvalue);
> - if (ret == GS_ERROR)
> -   break;
> + /* Avoid re-gimplifying the address operand if it is already
> +in suitable form.  */
> + if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0)))
> +   {
> + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
> post_p,
> +  is_gimple_mem_ref_addr, fb_rvalue);
> + if (ret == GS_ERROR)
> +   break;
> +   }
>   recalculate_side_effects (*expr_p);
>   ret = GS_ALL_DONE;
>   break;
> 
> - /* Constants need not be gimplified.  */
> +   /* Constants need not be gimplified.  */
> case INTEGER_CST:
> case REAL_CST:
> case FIXED_CST:






gcc-4.4-20120221 is now available

2012-02-21 Thread gccadmin
Snapshot gcc-4.4-20120221 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20120221/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 184449

You'll find:

 gcc-4.4-20120221.tar.bz2 Complete GCC

  MD5=df3598802125f5a24e70e0039d7b077b
  SHA1=efcdf9a007fca4524663704c417f91e02a8f4a58

Diffs from 4.4-20120214 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Inefficient end-of-loop value computation - missed optimization somewhere?

2012-02-21 Thread Andrew Pinski
On Tue, Feb 21, 2012 at 1:27 AM, Richard Guenther
 wrote:
> On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand  wrote:
>> Hello,
>>
>> we've noticed that the loop optimizer sometimes inserts weirdly
>> inefficient code to compute the value of an induction variable
>> at the end of the loop.
>>
>> As a test case stripped down to the core of the problem, consider:
>>
>> int test (int start, int end)
>> {
>>  int i;
>>
>>  for (i = start; i < end; i++)
>>    ;
>>
>>  return i;
>> }
>>
>> That function really ought to be equivalent to
>>
>> int test2 (int start, int end)
>> {
>>  return start < end ? end : start;
>> }
>>
>> And indeed on i386-linux-gnu, we get mostly identical code
>> (except for the branch direction prediction).
>>
>> However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a),
>> we see for the first function:
>>
>> test:
>>        cmp     r0, r1
>>        bge     .L2
>>        adds    r3, r0, #1
>>        mvns    r0, r0
>>        adds    r1, r1, r0
>>        adds    r0, r3, r1
>> .L2:
>>        bx      lr
>>
>> instead of simply (as for the second function):
>>
>> test2:
>>        cmp     r1, r0
>>        it      ge
>>        movge   r0, r1
>>        bx      lr
>>
>>
>>
>> Where does that weird addition-and-negation sequence come from?
>> In 100t.dceloop we still have:
>>
>> :
>>  # i_9 = PHI 
>>  i_5 = i_9 + 1;
>>  if (end_4(D) > i_5)
>>    goto ;
>>  else
>>    goto ;
>>
>> :
>>  goto ;
>>
>> :
>>  # i_1 = PHI 
>>
>>
>> But 102t.sccp has:
>>
>> :
>>  # i_9 = PHI 
>>  i_5 = i_9 + 1;
>>  if (end_4(D) > i_5)
>>    goto ;
>>  else
>>    goto ;
>>
>> :
>>  goto ;
>>
>> :
>>  D.4123_10 = start_2(D) + 1;
>>  D.4124_11 = ~start_2(D);
>>  D.4125_12 = (unsigned int) D.4124_11;
>>  D.4126_13 = (unsigned int) end_4(D);
>>  D.4127_14 = D.4125_12 + D.4126_13;
>>  D.4128_15 = (int) D.4127_14;
>>  i_1 = D.4123_10 + D.4128_15;
>>
>> This is the sequence that ends up in the final assembler code.
>>
>> Note that this computes:
>>
>>  i_1 = (start_2(D) + 1)
>>         + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D))
>>
>> which is (since those casts are no-ops on the assembler level):
>>
>>  i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D)
>>
>> which is due to two's-complement arithmetic:
>>
>>  i_1 = start_2(D) - start_2(D) + end_4(D)
>>
>> that is simply:
>>
>>  i_1 = end_4(D)
>>
>>
>> Unfortunately, no tree-based optimization pass after 102t.sccp is able to
>> clean up that mess.  This is true even on *i386*: the only reason we get
>> nice assembler there is due to *RTX* optimization, notably in combine.
>> This hits on i386 because of an intermediate stage that adds two registers
>> and the constant 1 (which matches the lea pattern).  On arm, there is no
>> instruction that would handle that intermediate stage, and combine is not
>> powerful enough to perform the whole simplification in a single step.
>>
>>
>> Now that sequence of gimple operations is generated in scev_const_prop
>> in tree-scalar-evolution.c.  First, the phi node corresponding to the
>> loop exit value is evaluated:
>>
>>  def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL);
>>
>> which returns a chrec of the form
>>
>>  { 1, +, (start + 1) }
>>
>> This is then further simplified by
>>
>>  def = compute_overall_effect_of_inner_loop (ex_loop, def);
>>
>> which first computes the number of loop iterations:
>>
>>  tree nb_iter = number_of_latch_executions (loop);
>>
>> which returns a tree representing
>>
>>  (unsigned int) ~start + (unsigned int) end
>>
>> (which at this point seems to be the validly most efficient way to
>> compute end - start - 1).
>>
>> The chrec is then evaluated via chrec_apply which basically computes
>>
>>  (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end)
>>
>> which ends up being converted to the long gimple sequence seen above.
>>
>>
>> Now I guess my questions are:
>>
>> - In the computation shown above, should that tree have been folded
>>  into just "end" to start with?  The tree is constructed at its
>
> The issue is that (start + 1) + 1 * (int) ... is computed using signed
> integer arithmetic which may invoke undefined behavior when it wraps.
> Thus we cannot re-associate it.  I suppose if you'd arrange the code
> to do the arithmetic in unsigned and only cast the result back to the
> desired type we might fold it ...
>
>>  outermost level in chrec_fold_plus, which simply calls fold_build2.
>>  While simplify-rtx.c has code to detect sequences of operations
>>  that cancel out while folding a PLUS or MINUS RTX, there does
>>  not appear to be corresponding code in fold-const.c to handle
>>  the equivalent optimization at the tree level.
>
> There are.  fold-const.c has some re-association logic to catch this
> and we have tree-ssa-reassoc.c.  All of those only handle same-sign
> operands though I think.
>
>> - If not, should there be another tree pass later on that ought to
>>  clean this up?  I notice there is code to handle plus/minus
>>  sequences in tree-ssa

Re: GCC: OpenMP posix pthread

2012-02-21 Thread Jakub Jelinek
On Tue, Feb 21, 2012 at 04:50:58PM +0100, erotavlas_tu...@libero.it wrote:
> Nobody can answer to my question?

You can configure gcc not to use futexes in libgomp (--disable-linux-futex),
but the default is to use them for performance reasons.

Jakub


Re: Some confuse about the pass of the arguments by gcc

2012-02-21 Thread Andrew Haley
On 02/21/2012 03:18 PM, 嘉谟 wrote:
> I do a experiments to check how gcc pass the arguments.
> here is the code
> 
> #include 
> int main(int argc , char *argv[]){
> int a=3;
> int b=3;
> int c=3;
> printf("%d %d\n",++a+c,a+c);
> printf("%d %d\n",++b,b);
> return 0;
> }
> 
> the anwer is
> 
> 8 7
> 4 4
> 
> the piece of assembly language:  gcc 4.6.2
> 
> movl$3, 28(%esp)
> movl$3, 24(%esp)
> movl$3, 20(%esp)
> movl20(%esp), %eax
> movl28(%esp), %edx
> leal(%edx,%eax), %ecx
> addl$1, 28(%esp)
> movl20(%esp), %eax
> movl28(%esp), %edx
> addl%eax, %edx
> movl$.LC0, %eax
> movl%ecx, 8(%esp)
> movl%edx, 4(%esp)
> movl%eax, (%esp)
> callprintf
> addl$1, 24(%esp)
> movl$.LC0, %eax
> movl24(%esp), %edx
> movl%edx, 8(%esp)
> movl24(%esp), %edx
> movl%edx, 4(%esp)
> movl%eax, (%esp)
> callprintf
> 
> In the first case , gcc first compute the a+c to %ecx ,and pass it
> stack , the compute ++a+c to %edx ,so the answer is 8 7
> 
> In the second case , why it didn't do the same thing like
> compute b=3  and pass it to stack ,then compute ++b and pass it to
> stack .to the result 4 3. However it  first
> addl$1, 24(%esp)  ==>  b+1  I think it compute the expression
> b+1. the pass it to stack . the b which now is 4 and was passed to
> stack.
> 
> I was wondering why gcc handle  the same mode in two ways.
> 
> Is there some errors I had made ?

http://c-faq.com/expr/evalorder2.html

Andrew.


Some confuse about the pass of the arguments by gcc

2012-02-21 Thread 嘉谟
I do a experiments to check how gcc pass the arguments.
here is the code

#include 
int main(int argc , char *argv[]){
int a=3;
int b=3;
int c=3;
printf("%d %d\n",++a+c,a+c);
printf("%d %d\n",++b,b);
return 0;
}

the anwer is

8 7
4 4

the piece of assembly language:  gcc 4.6.2

movl$3, 28(%esp)
movl$3, 24(%esp)
movl$3, 20(%esp)
movl20(%esp), %eax
movl28(%esp), %edx
leal(%edx,%eax), %ecx
addl$1, 28(%esp)
movl20(%esp), %eax
movl28(%esp), %edx
addl%eax, %edx
movl$.LC0, %eax
movl%ecx, 8(%esp)
movl%edx, 4(%esp)
movl%eax, (%esp)
callprintf
addl$1, 24(%esp)
movl$.LC0, %eax
movl24(%esp), %edx
movl%edx, 8(%esp)
movl24(%esp), %edx
movl%edx, 4(%esp)
movl%eax, (%esp)
callprintf

In the first case , gcc first compute the a+c to %ecx ,and pass it
stack , the compute ++a+c to %edx ,so the answer is 8 7

In the second case , why it didn't do the same thing like
compute b=3  and pass it to stack ,then compute ++b and pass it to
stack .to the result 4 3. However it  first
addl$1, 24(%esp)  ==>  b+1  I think it compute the expression
b+1. the pass it to stack . the b which now is 4 and was passed to
stack.

I was wondering why gcc handle  the same mode in two ways.

Is there some errors I had made ?
In my opinion ,for the reason  of the concept of consistent ,it  may be
8  8
4  4

thank you advance


Re: GCC: OpenMP posix pthread

2012-02-21 Thread Erotavlas_turbo
Nobody can answer to my question?

Thank you

Salvatore Frandina




Re: GCC and x86intrin.h header

2012-02-21 Thread Ian Lance Taylor
erotavlas_tu...@libero.it writes:

> I read in the manual of GCC the following line:
>  // new intrinsic header file, it should be included before 
> using any IA-32/x86-64 intrinsics.
> What does it mean? I have to explicitly include this library in my code if I 
> want to use the intrinsic functions like ssex or mcrc32 etc.

This question would be more appropriate for the mailing list
gcc-h...@gcc.gnu.org than for gcc@.  The gcc@ mailing list is for
discussions related to the development of gcc itself.  Please take any
followups to gcc-help.  Thanks.

To answer your question: the  header file provides
definitions for all intrinsic functions which are relevant to the target
processor.  The intrinsic functions are the ones with names like
_mm_crc32_u8.  You are not required to #include .  But it
is a more reliable mechanism than the older one of remembering that
_mm_crc32_u8 is defined in .

Ian


GCC and x86intrin.h header

2012-02-21 Thread Erotavlas_turbo
Hi,

I read in the manual of GCC the following line:
 // new intrinsic header file, it should be included before using 
any IA-32/x86-64 intrinsics.
What does it mean? I have to explicitly include this library in my code if I 
want to use the intrinsic functions like ssex or mcrc32 etc.

Thank you




Re: How to suppress frame pointer usage by default?

2012-02-21 Thread Ian Lance Taylor
Konstantin Vladimirov  writes:

> What one must use in custom backend to suppress frame pointer usage by 
> default?
>
> Frame pointer is mentioned in ELIMINABLE_REGS:
>
> #define ELIMINABLE_REGS \
> {   \
>   {ARG_POINTER_REGNUM,   STACK_POINTER_REGNUM}, \
>   {ARG_POINTER_REGNUM,   FRAME_POINTER_REGNUM}, \
>   {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \
> }
>
> TARGET_CAN_ELIMINATE is defined to be always true,
>
> TARGET_FRAME_POINTER_REQUIRED is defined to be always false
>
> Instead of this, gcc generates code with frame pointer usage, and I
> need manually specify -fomit-frame-pointer to suppress this behavior.
> What else may influence this?

You may need to set flag_omit_frame_pointer in your
TARGET_OPTION_OVERRIDE or TARGET_OPTION_OPTIMIZATION_TABLE or
TARGET_OPTION_OVERRIDE (note that these have different names in earlier
versions of gcc).

Ian


How to suppress frame pointer usage by default?

2012-02-21 Thread Konstantin Vladimirov
Hi,

What one must use in custom backend to suppress frame pointer usage by default?

Frame pointer is mentioned in ELIMINABLE_REGS:

#define ELIMINABLE_REGS \
{   \
  {ARG_POINTER_REGNUM,   STACK_POINTER_REGNUM}, \
  {ARG_POINTER_REGNUM,   FRAME_POINTER_REGNUM}, \
  {FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}, \
}

TARGET_CAN_ELIMINATE is defined to be always true,

TARGET_FRAME_POINTER_REQUIRED is defined to be always false

Instead of this, gcc generates code with frame pointer usage, and I
need manually specify -fomit-frame-pointer to suppress this behavior.
What else may influence this?

I feel, I missed something in the GCC internals.

Thanks in advance for any suggestions.

---
With best regards, Konstantin


Re: A problem about loop store motion

2012-02-21 Thread Richard Guenther
On Tue, Feb 21, 2012 at 10:57 AM, Jiangning Liu  wrote:
>
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, February 21, 2012 5:40 PM
>> To: Jiangning Liu
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: A problem about loop store motion
>>
>> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu 
>> wrote:
>> >> The MEM form is more "canonical", so the loop SM machinery to detect
>> >> equality should be adjusted accordingly.  Alternatively you can
>> teach
>> >> PRE insertion to strip off the MEM if possible (though
>> >> fold_stmt_inplace should
>> >> arelady do this if possible).
>> >
>> > Richard,
>> >
>> > Thank you! You are right. I noticed on latest trunk the problem in
>> PRE was
>> > already fixed by invoking fold_stmt_inplace.
>> >
>> > Unfortunately for this small case, the latest trunk code still can't
>> do SM
>> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true,
>> that
>> > is, pos has alias with l[pos].
>> >
>> > I think alias analysis should be able to know they don't have alias
>> with
>> > each other, unless there is an assignment statement like "l=&pos;".
>> >
>> > Can alias analysis fix the problem?
>>
>> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think
>> its address got taken.  'l' points to any global possibly address-taken
>> variable.  It get's the TREE_ADDRESSABLE flag via PRE insertion which
>> re-gimplifies the tree it creates which marks the variable as
>> addressable.
>>
>> I'll have a look.
>
> Terrific! :-) Could you please let me know when you have a patch to fix
> this, because I want to double check the big case, and there might be other
> hidden problems?

PR52324, I am testing the following patch (in reality the gimplifier should not
mark things addressable unless it itself makes them so, but frontends are
broken, so we do that ... ugh).

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 184428)
+++ gcc/gimplify.c  (working copy)
@@ -7061,15 +7061,20 @@ gimplify_expr (tree *expr_p, gimple_seq
  ret = GS_OK;
  break;
}
- ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
-  is_gimple_mem_ref_addr, fb_rvalue);
- if (ret == GS_ERROR)
-   break;
+ /* Avoid re-gimplifying the address operand if it is already
+in suitable form.  */
+ if (!is_gimple_mem_ref_addr (TREE_OPERAND (*expr_p, 0)))
+   {
+ ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
+  is_gimple_mem_ref_addr, fb_rvalue);
+ if (ret == GS_ERROR)
+   break;
+   }
  recalculate_side_effects (*expr_p);
  ret = GS_ALL_DONE;
  break;

- /* Constants need not be gimplified.  */
+   /* Constants need not be gimplified.  */
case INTEGER_CST:
case REAL_CST:
case FIXED_CST:


RE: A problem about loop store motion

2012-02-21 Thread Jiangning Liu


> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, February 21, 2012 5:40 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: A problem about loop store motion
> 
> On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu 
> wrote:
> >> The MEM form is more "canonical", so the loop SM machinery to detect
> >> equality should be adjusted accordingly.  Alternatively you can
> teach
> >> PRE insertion to strip off the MEM if possible (though
> >> fold_stmt_inplace should
> >> arelady do this if possible).
> >
> > Richard,
> >
> > Thank you! You are right. I noticed on latest trunk the problem in
> PRE was
> > already fixed by invoking fold_stmt_inplace.
> >
> > Unfortunately for this small case, the latest trunk code still can't
> do SM
> > for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true,
> that
> > is, pos has alias with l[pos].
> >
> > I think alias analysis should be able to know they don't have alias
> with
> > each other, unless there is an assignment statement like "l=&pos;".
> >
> > Can alias analysis fix the problem?
> 
> The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think
> its address got taken.  'l' points to any global possibly address-taken
> variable.  It get's the TREE_ADDRESSABLE flag via PRE insertion which
> re-gimplifies the tree it creates which marks the variable as
> addressable.
> 
> I'll have a look.

Terrific! :-) Could you please let me know when you have a patch to fix
this, because I want to double check the big case, and there might be other
hidden problems?

Thanks,
-Jiangning

> 
> Richard.
> 
> 
> 
> > Thanks,
> > -Jiangning
> >
> >>
> >> Richard.
> >>
> >
> >
> >






Re: A problem about loop store motion

2012-02-21 Thread Richard Guenther
On Tue, Feb 21, 2012 at 9:54 AM, Jiangning Liu  wrote:
>> The MEM form is more "canonical", so the loop SM machinery to detect
>> equality should be adjusted accordingly.  Alternatively you can teach
>> PRE insertion to strip off the MEM if possible (though
>> fold_stmt_inplace should
>> arelady do this if possible).
>
> Richard,
>
> Thank you! You are right. I noticed on latest trunk the problem in PRE was
> already fixed by invoking fold_stmt_inplace.
>
> Unfortunately for this small case, the latest trunk code still can't do SM
> for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that
> is, pos has alias with l[pos].
>
> I think alias analysis should be able to know they don't have alias with
> each other, unless there is an assignment statement like "l=&pos;".
>
> Can alias analysis fix the problem?

The problem is that 'pos' is marked TREE_ADDRESSABLE, so we think
its address got taken.  'l' points to any global possibly address-taken
variable.  It get's the TREE_ADDRESSABLE flag via PRE insertion which
re-gimplifies the tree it creates which marks the variable as addressable.

I'll have a look.

Richard.



> Thanks,
> -Jiangning
>
>>
>> Richard.
>>
>
>
>


Re: Inefficient end-of-loop value computation - missed optimization somewhere?

2012-02-21 Thread Richard Guenther
On Mon, Feb 20, 2012 at 11:19 PM, Ulrich Weigand  wrote:
> Hello,
>
> we've noticed that the loop optimizer sometimes inserts weirdly
> inefficient code to compute the value of an induction variable
> at the end of the loop.
>
> As a test case stripped down to the core of the problem, consider:
>
> int test (int start, int end)
> {
>  int i;
>
>  for (i = start; i < end; i++)
>    ;
>
>  return i;
> }
>
> That function really ought to be equivalent to
>
> int test2 (int start, int end)
> {
>  return start < end ? end : start;
> }
>
> And indeed on i386-linux-gnu, we get mostly identical code
> (except for the branch direction prediction).
>
> However, on arm-linux-gnuabi (using -mthumb and -march=armv7-a),
> we see for the first function:
>
> test:
>        cmp     r0, r1
>        bge     .L2
>        adds    r3, r0, #1
>        mvns    r0, r0
>        adds    r1, r1, r0
>        adds    r0, r3, r1
> .L2:
>        bx      lr
>
> instead of simply (as for the second function):
>
> test2:
>        cmp     r1, r0
>        it      ge
>        movge   r0, r1
>        bx      lr
>
>
>
> Where does that weird addition-and-negation sequence come from?
> In 100t.dceloop we still have:
>
> :
>  # i_9 = PHI 
>  i_5 = i_9 + 1;
>  if (end_4(D) > i_5)
>    goto ;
>  else
>    goto ;
>
> :
>  goto ;
>
> :
>  # i_1 = PHI 
>
>
> But 102t.sccp has:
>
> :
>  # i_9 = PHI 
>  i_5 = i_9 + 1;
>  if (end_4(D) > i_5)
>    goto ;
>  else
>    goto ;
>
> :
>  goto ;
>
> :
>  D.4123_10 = start_2(D) + 1;
>  D.4124_11 = ~start_2(D);
>  D.4125_12 = (unsigned int) D.4124_11;
>  D.4126_13 = (unsigned int) end_4(D);
>  D.4127_14 = D.4125_12 + D.4126_13;
>  D.4128_15 = (int) D.4127_14;
>  i_1 = D.4123_10 + D.4128_15;
>
> This is the sequence that ends up in the final assembler code.
>
> Note that this computes:
>
>  i_1 = (start_2(D) + 1)
>         + (int)((unsigned int) ~start_2(D) + (unsigned int) end_4(D))
>
> which is (since those casts are no-ops on the assembler level):
>
>  i_1 = start_2(D) + 1 + ~start_2(D) + end_4(D)
>
> which is due to two's-complement arithmetic:
>
>  i_1 = start_2(D) - start_2(D) + end_4(D)
>
> that is simply:
>
>  i_1 = end_4(D)
>
>
> Unfortunately, no tree-based optimization pass after 102t.sccp is able to
> clean up that mess.  This is true even on *i386*: the only reason we get
> nice assembler there is due to *RTX* optimization, notably in combine.
> This hits on i386 because of an intermediate stage that adds two registers
> and the constant 1 (which matches the lea pattern).  On arm, there is no
> instruction that would handle that intermediate stage, and combine is not
> powerful enough to perform the whole simplification in a single step.
>
>
> Now that sequence of gimple operations is generated in scev_const_prop
> in tree-scalar-evolution.c.  First, the phi node corresponding to the
> loop exit value is evaluated:
>
>  def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL);
>
> which returns a chrec of the form
>
>  { 1, +, (start + 1) }
>
> This is then further simplified by
>
>  def = compute_overall_effect_of_inner_loop (ex_loop, def);
>
> which first computes the number of loop iterations:
>
>  tree nb_iter = number_of_latch_executions (loop);
>
> which returns a tree representing
>
>  (unsigned int) ~start + (unsigned int) end
>
> (which at this point seems to be the validly most efficient way to
> compute end - start - 1).
>
> The chrec is then evaluated via chrec_apply which basically computes
>
>  (start + 1) + 1 * (int)((unsigned int) ~start + (unsigned int) end)
>
> which ends up being converted to the long gimple sequence seen above.
>
>
> Now I guess my questions are:
>
> - In the computation shown above, should that tree have been folded
>  into just "end" to start with?  The tree is constructed at its

The issue is that (start + 1) + 1 * (int) ... is computed using signed
integer arithmetic which may invoke undefined behavior when it wraps.
Thus we cannot re-associate it.  I suppose if you'd arrange the code
to do the arithmetic in unsigned and only cast the result back to the
desired type we might fold it ...

>  outermost level in chrec_fold_plus, which simply calls fold_build2.
>  While simplify-rtx.c has code to detect sequences of operations
>  that cancel out while folding a PLUS or MINUS RTX, there does
>  not appear to be corresponding code in fold-const.c to handle
>  the equivalent optimization at the tree level.

There are.  fold-const.c has some re-association logic to catch this
and we have tree-ssa-reassoc.c.  All of those only handle same-sign
operands though I think.

> - If not, should there be another tree pass later on that ought to
>  clean this up?  I notice there is code to handle plus/minus
>  sequences in tree-ssa-reassoc.c.  But this doesn't seem to be
>  able to handle this particular case, possibly because the presence
>  of the casts (nop_exprs) ...
>
> - Anywhere else I'm overlooking right now?
>
>
> I'd appreciate any tips / suggestions how to fix

RE: A problem about loop store motion

2012-02-21 Thread Jiangning Liu
> The MEM form is more "canonical", so the loop SM machinery to detect
> equality should be adjusted accordingly.  Alternatively you can teach
> PRE insertion to strip off the MEM if possible (though
> fold_stmt_inplace should
> arelady do this if possible).

Richard,

Thank you! You are right. I noticed on latest trunk the problem in PRE was
already fixed by invoking fold_stmt_inplace.

Unfortunately for this small case, the latest trunk code still can't do SM
for variable pos, because refs_may_alias_p(*D.4074_10, pos) is true, that
is, pos has alias with l[pos].

I think alias analysis should be able to know they don't have alias with
each other, unless there is an assignment statement like "l=&pos;". 

Can alias analysis fix the problem?

Thanks,
-Jiangning

> 
> Richard.
>