[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-12 Thread pmatos at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

pmatos at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #24 from pmatos at gcc dot gnu.org ---
Closing as invalid. Thanks Richard.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-12 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #23 from rguenther at suse dot de  ---
On Wed, 12 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> 
> --- Comment #22 from Paulo J. Matos  ---
> After some thought, I am concluding this cannot actually be optimized and that
> GCC 4.5.4 was better because it was taking advantage of an undefined behaviour
> that doesn't exist.
> 
> The thought process is as follows. The whole process has to do with this type
> of loop:
> void foo (int loopCount)
> {
>   short i;
>   for (i = 0; (int)i < loopCount; i++)
> ...
> }
> 
> GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was
> done in type short. Then i was promoted to int through a sign_extend and
> compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an
> int scev for the loop.
> 
> In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have
> undefined behaviour. Due to the C integer promotion rules, i++ is
> i = (short) ((int) i + 1), which GCC validly simplifies to
> i = (short) ((unsigned short) i + 1).
> This is then sign extended to int for comparison. GCC cannot generate an int
> scev because it's not simple: (int) (short) {1, +, 1}_1.
> 
> This can validly loop forever if loopCount > SHORT_MAX.
> For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX and
> is incremented by one, the addition is fine because it is done in unsigned
> short, and the result is reduced modulo 2^16 (implementation-defined
> behaviour) when converted back to short. i therefore never reaches loopCount
> and the loop runs forever.
> 
> In RTL we get the following sequence:
> r4:SI <- [loopCount]
> r0:HI <- 0
> 
> code label...
> 
> ...
> 
> r2:HI <- r1:HI + 1
> r3:SI <- sign_extend r2:HI
> 
> p0:BI <- r3:SI < r4:SI
> loop to code label if p0:BI
> 
> I was tempted to simplify this to:
> r4:SI <- [loopCount]
> r0:SI <- 0
> 
> code label...
> 
> ...
> 
> r2:SI <- r1:SI + 1
> 
> p0:BI <- r2:SI < r4:SI
> loop to code label if p0:BI
> 
> However, unlike the original, this version never loops forever when
> r4:SI > SHORT_MAX (e.g. r4:SI == SHORT_MAX + 1), so I think that at least in
> this case the loop cannot be optimized this way.
> 
> I am tempted to close the bug report. Richard?

Yes.  That sounds correct.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-12 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #22 from Paulo J. Matos  ---
After some thought, I am concluding this cannot actually be optimized and that
GCC 4.5.4 was better because it was taking advantage of an undefined behaviour
that doesn't exist.

The thought process is as follows. The whole process has to do with this type
of loop:
void foo (int loopCount)
{
  short i;
  for (i = 0; (int)i < loopCount; i++)
...
}

GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was
done in type short. Then i was promoted to int through a sign_extend and
compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an
int scev for the loop.

In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have
undefined behaviour. Due to the C integer promotion rules, i++ is
i = (short) ((int) i + 1), which GCC validly simplifies to
i = (short) ((unsigned short) i + 1).
This is then sign extended to int for comparison. GCC cannot generate an int
scev because it's not simple: (int) (short) {1, +, 1}_1.

This can validly loop forever if loopCount > SHORT_MAX.
For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX and
is incremented by one, the addition is fine because it is done in unsigned
short, and the result is reduced modulo 2^16 (implementation-defined
behaviour) when converted back to short. i therefore never reaches loopCount
and the loop runs forever.

In RTL we get the following sequence:
r4:SI <- [loopCount]
r0:HI <- 0

code label...

...

r2:HI <- r1:HI + 1
r3:SI <- sign_extend r2:HI

p0:BI <- r3:SI < r4:SI
loop to code label if p0:BI

I was tempted to simplify this to:
r4:SI <- [loopCount]
r0:SI <- 0

code label...

...

r2:SI <- r1:SI + 1

p0:BI <- r2:SI < r4:SI
loop to code label if p0:BI

However, unlike the original, this version never loops forever when
r4:SI > SHORT_MAX (e.g. r4:SI == SHORT_MAX + 1), so I think that at least in
this case the loop cannot be optimized this way.

I am tempted to close the bug report. Richard?
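A minimal C sketch of the two loops above. The helper names short_iv_trips and int_iv_trips and the max_iters cap are illustrative, not from the report. It assumes a 16-bit short, a 32-bit int, and GCC's documented modulo 2^16 reduction when an out-of-range value is converted to short. Each function counts iterations and returns -1 once max_iters is exceeded, as a stand-in for "loops forever".

```c
#include <limits.h>

/* Original loop: HI-mode IV, incremented the way GCC >= 4.8 represents i++,
   then sign-extended for the comparison with the int bound. */
static long short_iv_trips(int limit, long max_iters)
{
    long n = 0;
    for (short i = 0; (int)i < limit; i = (short)((unsigned short)i + 1)) {
        if (++n > max_iters)
            return -1; /* still running: treat as non-terminating */
    }
    return n;
}

/* Tempting rewrite: SI-mode IV, no sign extension before the compare. */
static long int_iv_trips(int limit, long max_iters)
{
    long n = 0;
    for (int i = 0; i < limit; i++) {
        if (++n > max_iters)
            return -1;
    }
    return n;
}
```

Both return 32767 for limit == SHRT_MAX. For limit == SHRT_MAX + 1 the short version wraps from SHRT_MAX to SHRT_MIN and hits the cap (-1), while the int version stops after 32768 iterations, which is why the rewrite does not preserve the loop's behaviour.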


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-07 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #21 from rguenther at suse dot de  ---
On Fri, 7 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> 
> --- Comment #20 from Paulo J. Matos  ---
> OK, I was trying to make sense of all this and there are two things that stick
> out.
> 
> One is that you say the C integer promotion rules make i++ equivalent to
> i = (short)((int)i + 1). However, GCC is doing
> i = (short) ((unsigned short) i + 1). Am I missing something that allows this,
> or that makes the addition in int equivalent to the addition in unsigned short?

This is a valid shortening optimization GCC performs.
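One way to convince oneself that the shortening is exact is an exhaustive check over every short value. A minimal sketch, assuming a 16-bit short and GCC's modulo 2^16 conversion to short; the function names are illustrative only:

```c
#include <limits.h>
#include <stdbool.h>

/* i++ on a short as the C promotion rules define it: add in int,
   then convert back to short. */
static short inc_via_int(short i)
{
    return (short)((int)i + 1);
}

/* The shortened form: add in unsigned short, then convert to short.
   C would promote an unsigned short operand to int, so the explicit
   (unsigned short) cast models the GIMPLE unsigned short addition,
   which wraps modulo 2^16. */
static short inc_via_ushort(short i)
{
    return (short)(unsigned short)((unsigned short)i + 1u);
}

/* Compare both forms for every short value. */
static bool shortening_is_exact(void)
{
    for (int v = SHRT_MIN; v <= SHRT_MAX; v++) {
        if (inc_via_int((short)v) != inc_via_ushort((short)v))
            return false;
    }
    return true;
}
```

shortening_is_exact() returns true: the low 16 bits of a sum depend only on the low 16 bits of the operands, so adding in int and truncating gives the same result as adding in unsigned short and truncating.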

> Secondly, we still have a dangling sign_extend later on that we could
> possibly optimize. I find it hard to tell whether this can be done
> properly in expand, or whether a small pass like ree, run before
> zero-overhead loop generation, would be better. What do you think?

That entirely depends on where the extension is generated and what
information is present there ... if it can be avoided at expand
time then that's surely the best thing to do.  Maybe it can even
be avoided on the GIMPLE level.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-07 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #20 from Paulo J. Matos  ---
OK, I was trying to make sense of all this and there are two things that stick
out.

One is that you say the C integer promotion rules make i++ equivalent to
i = (short)((int)i + 1). However, GCC is doing i = (short) ((unsigned short) i + 1).
Am I missing something that allows this, or that makes the addition in int
equivalent to the addition in unsigned short?

Secondly, we still have a dangling sign_extend later on that we could possibly
optimize. I find it hard to tell whether this can be done properly in expand,
or whether a small pass like ree, run before zero-overhead loop generation,
would be better. What do you think?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #19 from rguenther at suse dot de  ---
On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> 
> --- Comment #16 from Paulo J. Matos  ---
> (In reply to rguent...@suse.de from comment #15)
> > Exactly the same problem.  C integral type promotion rules make
> > that i = (short)((int)i + 1) again.  Note that (int)i + 1
> > does not overflow, (short) ((int)i + 1) invokes implementation-defined
> > behavior which in our case is modulo-2 reduction.
> > 
> > Nothing guarantees that (short)i + 1 does not overflow.
> 
> OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
> It seems to be i = (short) ((unsigned short) i + 1)
> Later i is cast to int for comparison.
> 
> Before ivopts this is the end of the loop body:
>   i.7_19 = (unsigned short) i_26;
>   _20 = i.7_19 + 1;
>   i_21 = (short intD.8) _20;
>   _10 = (intD.1) i_21;
>   if (_10 < _25)
> goto ;
>   else
> goto ;
> 
> i is initially a short, then moved to unsigned short. The addition is performed
> and returned to short. Then cast to int for the comparison.
> 
> For GCC 4.5.4 the end of loop body is:
>   iD.2767_18 = iD.2767_26 + 1;
>   D.5046_9 = (intD.0) iD.2767_18;
>   if (D.5046_9 < D.5047_25)
> goto ;
>   else
> goto ;
> 
> Here the addition is made in short int and then there's only one cast to int.

Yes, and thus GCC 4.5 still contains the bug that i++ invokes undefined
behavior when overflowing (which it does not).


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #18 from rguenther at suse dot de  ---
On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> 
> --- Comment #17 from Paulo J. Matos  ---
> (In reply to rguent...@suse.de from comment #15)
> > On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:
> > 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> > > 
> > > --- Comment #14 from Paulo J. Matos  ---
> > > Something like this which looks much simpler hits the same problem:
> > > extern int arr[];
> > > 
> > > void
> > > foo32 (int limit)
> > > {
> > >   short i;
> > >   for (i = 0; (int)i < limit; i++)
> > >     arr[i] += 1;
> > > }
> > 
> > Exactly the same problem.  C integral type promotion rules make
> > that i = (short)((int)i + 1) again.  Note that (int)i + 1
> > does not overflow, (short) ((int)i + 1) invokes implementation-defined
> > behavior which in our case is modulo-2 reduction.
> > 
> > Nothing guarantees that (short)i + 1 does not overflow.
> 
> I am being thick... indeed I forgot to notice that i++ also invokes undefined
> behaviour. I guess then GCC sorts that out by casting i into unsigned short for
> the addition and all the remaining issues then unfold.

No, i++ doesn't invoke undefined behavior - that's the whole point
and GCC got this wrong until it was fixed (4.5 is still broken).
The whole point is that limit == SHORT_MAX + 1 and the loop being
endless is _valid_ (well, apart from arr[i] then overflowing - looks
like an opportunity to derive that i can _not_ overflow ... ;))


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #17 from Paulo J. Matos  ---
(In reply to rguent...@suse.de from comment #15)
> On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> > 
> > --- Comment #14 from Paulo J. Matos  ---
> > Something like this which looks much simpler hits the same problem:
> > extern int arr[];
> > 
> > void
> > foo32 (int limit)
> > {
> >   short i;
> >   for (i = 0; (int)i < limit; i++)
> >     arr[i] += 1;
> > }
> 
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

I am being thick... indeed I forgot to notice that i++ also invokes undefined
behaviour. I guess then GCC sorts that out by casting i into unsigned short for
the addition and all the remaining issues then unfold.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #16 from Paulo J. Matos  ---
(In reply to rguent...@suse.de from comment #15)
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
It seems to be i = (short) ((unsigned short) i + 1)
Later i is cast to int for comparison.

Before ivopts this is the end of the loop body:
  i.7_19 = (unsigned short) i_26;
  _20 = i.7_19 + 1;
  i_21 = (short intD.8) _20;
  _10 = (intD.1) i_21;
  if (_10 < _25)
goto ;
  else
goto ;

i is initially a short, then moved to unsigned short. The addition is performed
and returned to short. Then cast to int for the comparison.

For GCC 4.5.4 the end of loop body is:
  iD.2767_18 = iD.2767_26 + 1;
  D.5046_9 = (intD.0) iD.2767_18;
  if (D.5046_9 < D.5047_25)
goto ;
  else
goto ;

Here the addition is made in short int and then there's only one cast to int.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #15 from rguenther at suse dot de  ---
On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
> 
> --- Comment #14 from Paulo J. Matos  ---
> Something like this which looks much simpler hits the same problem:
> extern int arr[];
> 
> void
> foo32 (int limit)
> {
>   short i;
>   for (i = 0; (int)i < limit; i++)
>     arr[i] += 1;
> }

Exactly the same problem.  C integral type promotion rules make
that i = (short)((int)i + 1) again.  Note that (int)i + 1
does not overflow, (short) ((int)i + 1) invokes implementation-defined
behavior which in our case is modulo-2 reduction.

Nothing guarantees that (short)i + 1 does not overflow.
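The promotion step can be observed directly. A small C11 sketch (TYPE_CODE and increment_short are illustrative names): _Generic shows that i + 1 has type int when i is a short, and the static assertion records the assumption, true for the 16-bit short / 32-bit int target discussed here, that int is wider than short, so (int)i + 1 cannot overflow. Only the final conversion back to short can wrap.

```c
#include <limits.h>

/* Maps an expression's type to a small code: 1 = short, 2 = int. */
#define TYPE_CODE(x) _Generic((x), short: 1, int: 2, default: 0)

/* Assumption: int is wider than short, so (int)i + 1 never overflows. */
_Static_assert(SHRT_MAX < INT_MAX, "int must be wider than short");

/* Type code of i + 1 for a short i; integer promotion makes it int. */
static int type_code_of_short_plus_one(void)
{
    short i = 0;
    return TYPE_CODE(i + 1);
}

/* i++ on a short as C defines it: add in int, then convert back to short.
   The conversion is implementation-defined when the sum exceeds SHRT_MAX;
   GCC reduces it modulo 2^16. */
static short increment_short(short i)
{
    return (short)((int)i + 1);
}
```

With GCC, increment_short(SHRT_MAX) yields SHRT_MIN without any overflow having occurred: the int addition produces SHRT_MAX + 1 and the conversion wraps it.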


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #14 from Paulo J. Matos  ---
Something like this which looks much simpler hits the same problem:
extern int arr[];

void
foo32 (int limit)
{
  short i;
  for (i = 0; (int)i < limit; i++)
    arr[i] += 1;
}


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #13 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #12)
> 
> Note that {1, +, 1}_1 is unsigned.  The issue is that while i is short
> i++ is really i = (short)((int) i + 1) and thus only the operation in
> type 'int' is known to not overflow and thus the IV in short _can_
> overflow and the loop can loop infinitely for example for loopCount
> == SHORT_MAX + 1.
> 
> The fix to SCEV analysis was to still be able to analyze the evolution at
> all.
> 
> The testcase is simply very badly written (unsigned short upper bound,
> signed short IV and IV comparison against upper bound in signed int).

I thought no signed operation can overflow, regardless of its width, and that
therefore (short) (int + 1) shouldn't overflow.

I agree with you on the testcase; however, it's taken from customer code and,
even if badly written, it's acceptable C. GCC 4.5.4 generates the scalar
evolution {1, +, 1}_1 for the integer variable without the casts (therefore a
simple_iv). This allows GCC to use an int IV, which helps discard the sign
extend in the loop body and later allows the zero-overhead loop to be
generated. This pattern occurs again and again and causes a serious performance
regression on customer code.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #12 from Richard Biener  ---
(In reply to Paulo J. Matos from comment #10)
> (In reply to Paulo J. Matos from comment #8)
> > 
> > Made a mistake. With the attached test, the final gimple before expand for
> > the loop basic block is:
> > ;;   basic block 5, loop depth 0
> > ;;pred:   5
> > ;;4
> >   # i_26 = PHI 
> >   # ivtmp.24_18 = PHI 
> >   _28 = (void *) ivtmp.24_18;
> >   _13 = MEM[base: _28, offset: 0B];
> >   x.4_14 = x;
> >   _15 = _13 ^ x.4_14;
> >   MEM[base: _28, offset: 0B] = _15;
> >   ivtmp.24_12 = ivtmp.24_18 + 4;
> >   temp_ptr.5_17 = (Sample *) ivtmp.24_12;
> >   _11 = (unsigned short) i_26;
> >   _2 = _11 + 1;
> >   i_1 = (short int) _2;
> >   _10 = (int) i_1;
> >   if (_10 < _25)
> > goto ;
> >   else
> > goto ;
> > ;;succ:   5
> > ;;6
> > 
> > However, the point is the same. IVOPTS should probably generate an int IV
> > instead of a short int IV to avoid the sign extend since removing the sign
> > extend during RTL seems to be quite hard.
> > 
> > What do you think?
> 
> For >= 4.8 the scalar evolution of _10 is deemed not simple, because it
> looks like the following:
> [debug_tree dump, mostly lost in extraction. What survives: a nop_expr to a
> 32-bit signed integer type (precision 32, max 2147483647) whose arg 0 is a
> nop_expr to a 16-bit signed integer type (mode HI, precision 16, max 32767),
> whose arg 0 is in turn a polynomial_chrec.]
> 
> This is something like: (int) (short int) {1, +, 1}_1. Since these are
> signed integers, we can assume they don't overflow; so can't we simplify the
> scalar evolution to a polynomial_chrec over 32-bit integers and forget the
> nop_expr that represents the sign extend?

Note that {1, +, 1}_1 is unsigned.  The issue is that while i is short
i++ is really i = (short)((int) i + 1) and thus only the operation in
type 'int' is known to not overflow and thus the IV in short _can_
overflow and the loop can loop infinitely for example for loopCount
== SHORT_MAX + 1.
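To see where the two evolutions part ways, a small sketch can evaluate them iteration by iteration. chrec_ushort and first_divergence are hypothetical helpers, not GCC's SCEV interface; the sketch assumes a 16-bit short and modulo 2^16 conversion to short, as GCC implements it.

```c
#include <limits.h>

/* Value of {1, +, 1}_1 at iteration k, evaluated in unsigned short as in
   the GIMPLE above (wraps modulo 2^16). */
static unsigned short chrec_ushort(long k)
{
    return (unsigned short)(1 + k);
}

/* First iteration k at which (int)(short){1, +, 1}_1 stops matching the
   simple int evolution {1, +, 1}_1, or -1 if none up to max_k. */
static long first_divergence(long max_k)
{
    for (long k = 0; k <= max_k; k++) {
        int narrowed = (int)(short)chrec_ushort(k); /* may wrap to SHRT_MIN */
        long wide = 1 + k;                          /* never wraps here */
        if (narrowed != wide)
            return k;
    }
    return -1;
}
```

first_divergence(100000) returns 32767: at that iteration the unsigned short chrec holds 32768, which becomes SHRT_MIN after the conversion to short, while the int evolution continues to 32768.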

The fix to SCEV analysis was to still be able to analyze the evolution at all.

The testcase is simply very badly written (unsigned short upper bound,
signed short IV and IV comparison against upper bound in signed int).
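Where the source can be changed, one option (a suggestion, not something proposed in this thread) is to give the counter the same type as the bound, so the IV is a plain int with no truncation and no sign extension. A sketch of the foo32 testcase written that way; foo32_int_iv is a hypothetical name, and it takes the array as a parameter instead of using extern arr so the sketch is self-contained. It only behaves like the original while limit <= SHRT_MAX; for larger limits the original short loop wraps instead of terminating.

```c
/* foo32 with an int counter: a simple int IV, no (short) truncation and
   no sign extension before the comparison with limit. */
void foo32_int_iv(int *arr, int limit)
{
    for (int i = 0; i < limit; i++)
        arr[i] += 1;
}
```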


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #11 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #10)
> (In reply to Paulo J. Matos from comment #8)
> > 
> > Made a mistake. With the attached test, the final gimple before expand for
> > the loop basic block is:
> > ;;   basic block 5, loop depth 0
> > ;;pred:   5
> > ;;4
> >   # i_26 = PHI 
> >   # ivtmp.24_18 = PHI 
> >   _28 = (void *) ivtmp.24_18;
> >   _13 = MEM[base: _28, offset: 0B];
> >   x.4_14 = x;
> >   _15 = _13 ^ x.4_14;
> >   MEM[base: _28, offset: 0B] = _15;
> >   ivtmp.24_12 = ivtmp.24_18 + 4;
> >   temp_ptr.5_17 = (Sample *) ivtmp.24_12;
> >   _11 = (unsigned short) i_26;
> >   _2 = _11 + 1;
> >   i_1 = (short int) _2;
> >   _10 = (int) i_1;
> >   if (_10 < _25)
> > goto ;
> >   else
> > goto ;
> > ;;succ:   5
> > ;;6
> > 
> > However, the point is the same. IVOPTS should probably generate an int IV
> > instead of a short int IV to avoid the sign extend since removing the sign
> > extend during RTL seems to be quite hard.
> > 
> > What do you think?
> 
> For >= 4.8 the scalar evolution of _10 is deemed not simple, because it
> looks like the following:
> [debug_tree dump, mostly lost in extraction. What survives: a nop_expr to a
> 32-bit signed integer type (precision 32, max 2147483647) whose arg 0 is a
> nop_expr to a 16-bit signed integer type (mode HI, precision 16, max 32767),
> whose arg 0 is in turn a polynomial_chrec.]
> 
> This is something like: (int) (short int) {1, +, 1}_1. Since these are
> signed integers, we can assume they don't overflow; so can't we simplify the
> scalar evolution to a polynomial_chrec over 32-bit integers and forget the
> nop_expr that represents the sign extend?

This chain of nop_expr in the scalar evolution is due to Richard's fix for
PR53676. It is still not clear to me what the fix is for, and whether it needs
tweaking or a later pass needs to remove the widening from the loop.
I am investigating.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #10 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #8)
> 
> Made a mistake. With the attached test, the final gimple before expand for
> the loop basic block is:
> ;;   basic block 5, loop depth 0
> ;;pred:   5
> ;;4
>   # i_26 = PHI 
>   # ivtmp.24_18 = PHI 
>   _28 = (void *) ivtmp.24_18;
>   _13 = MEM[base: _28, offset: 0B];
>   x.4_14 = x;
>   _15 = _13 ^ x.4_14;
>   MEM[base: _28, offset: 0B] = _15;
>   ivtmp.24_12 = ivtmp.24_18 + 4;
>   temp_ptr.5_17 = (Sample *) ivtmp.24_12;
>   _11 = (unsigned short) i_26;
>   _2 = _11 + 1;
>   i_1 = (short int) _2;
>   _10 = (int) i_1;
>   if (_10 < _25)
> goto ;
>   else
> goto ;
> ;;succ:   5
> ;;6
> 
> However, the point is the same. IVOPTS should probably generate an int IV
> instead of a short int IV to avoid the sign extend since removing the sign
> extend during RTL seems to be quite hard.
> 
> What do you think?

For >= 4.8 the scalar evolution of _10 is deemed not simple, because it looks
like the following:
[debug_tree dump, mostly lost in extraction. What survives: a nop_expr to a
32-bit signed integer type (precision 32, max 2147483647) whose arg 0 is a
nop_expr to a 16-bit signed integer type (mode HI, precision 16, max 32767),
whose arg 0 is in turn a polynomial_chrec.]

This is something like: (int) (short int) {1, +, 1}_1. Since these are signed
integers, we can assume they don't overflow; so can't we simplify the scalar
evolution to a polynomial_chrec over 32-bit integers and forget the nop_expr
that represents the sign extend?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #9 from Paulo J. Matos  ---
Created attachment 32044
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32044&action=edit
Testcase


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #8 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #7)
> (In reply to Richard Biener from comment #5)
> > Apart from expand there is the redundant-extension-elimination, ree.c.
> 
> In expand we get the following gimple for the loop:
> ;;   basic block 4, loop depth 0
> ;;pred:   2
> ;;4
>   # i_15 = PHI <0(2), i_12(4)>
>   # _18 = PHI <0(2), _4(4)>
>   _6 = arr[_18];
>   _7 = _6 + 1;
>   arr[_18] = _7;
>   _17 = (unsigned short) i_15;
>   _13 = _17 + 1;
>   i_12 = (short int) _13;
>   _4 = (int) i_12;
>   if (_4 < limit_5(D))
> goto ;
>   else
> goto ;
> ;;succ:   4
> ;;3
> 
> 
> Here _13 is an unsigned short, and what we want to eliminate is this sign
> extend:
>   _4 = (int) i_12;
> 
> This doesn't seem trivial in the expand phase: to eliminate the sign
> extend, you have to promote i_12 to int and then promote a bunch of other
> variables whose insns have already been emitted by the time you get here.
> Shouldn't ivopts notice that generating an int IV saves a sign extend and
> is therefore better?

Made a mistake. With the attached test, the final gimple before expand for the
loop basic block is:
;;   basic block 5, loop depth 0
;;pred:   5
;;4
  # i_26 = PHI 
  # ivtmp.24_18 = PHI 
  _28 = (void *) ivtmp.24_18;
  _13 = MEM[base: _28, offset: 0B];
  x.4_14 = x;
  _15 = _13 ^ x.4_14;
  MEM[base: _28, offset: 0B] = _15;
  ivtmp.24_12 = ivtmp.24_18 + 4;
  temp_ptr.5_17 = (Sample *) ivtmp.24_12;
  _11 = (unsigned short) i_26;
  _2 = _11 + 1;
  i_1 = (short int) _2;
  _10 = (int) i_1;
  if (_10 < _25)
goto ;
  else
goto ;
;;succ:   5
;;6

However, the point is the same. IVOPTS should probably generate an int IV
instead of a short int IV to avoid the sign extend since removing the sign
extend during RTL seems to be quite hard.

What do you think?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #7 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #5)
> Apart from expand there is the redundant-extension-elimination, ree.c.

In expand we get the following gimple for the loop:
;;   basic block 4, loop depth 0
;;pred:   2
;;4
  # i_15 = PHI <0(2), i_12(4)>
  # _18 = PHI <0(2), _4(4)>
  _6 = arr[_18];
  _7 = _6 + 1;
  arr[_18] = _7;
  _17 = (unsigned short) i_15;
  _13 = _17 + 1;
  i_12 = (short int) _13;
  _4 = (int) i_12;
  if (_4 < limit_5(D))
goto ;
  else
goto ;
;;succ:   4
;;3


Here _13 is an unsigned short, and what we want to eliminate is this sign
extend:
  _4 = (int) i_12;

This doesn't seem trivial in the expand phase: to eliminate the sign extend,
you have to promote i_12 to int and then promote a bunch of other variables
whose insns have already been emitted by the time you get here. Shouldn't
ivopts notice that generating an int IV saves a sign extend and is therefore
better?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #6 from Paulo J. Matos  ---
Hmm, ree is no good because by then we have already missed the generation of
zero-overhead loops. Do you think this is something that could be added to
expand?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #5 from Richard Biener  ---
Apart from expand there is the redundant-extension-elimination, ree.c.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #4 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #3)
> Yes, I think that the IV choice merely shows that we fail to optimize the
> extension - which would be somewhere in the RTL opt pipeline.

Makes sense. My first instinct was to do it in expand, but expand handles one
gimple statement at a time, so this might be too much for it: it would probably
have to detect the sign extend and promote the type of the register if there
are no conflicting conditions.

If you suggest where to do this kind of thing I can give it a try.

Thanks.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
  Component|tree-optimization   |rtl-optimization

--- Comment #3 from Richard Biener  ---
Yes, I think that the IV choice merely shows that we fail to optimize the
extension - which would be somewhere in the RTL opt pipeline.