[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

pmatos at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #24 from pmatos at gcc dot gnu.org ---
Closing as invalid. Thanks Richard.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #23 from rguenther at suse dot de ---
On Wed, 12 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
>
> --- Comment #22 from Paulo J. Matos ---
> After some thought, I am concluding this cannot actually be optimized and
> that GCC 4.5.4 was better because it was taking advantage of an undefined
> behaviour that doesn't exist.
>
> The thought process is as follows. The whole process has to do with this
> type of loop:
> void foo (int loopCount)
> {
>   short i;
>   for (i = 0; (int)i < loopCount; i++)
>     ...
> }
>
> GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment
> was done in type short. Then i was promoted to int through a sign_extend and
> compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate
> an int scev for the loop.
>
> In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to
> have undefined behaviour. i++ due to C integer promotion rules is:
> i = (short) ((int) i + 1). GCC validly simplifies to
> i = (short) ((unsigned short)i + 1). This is then sign extended to int for
> comparison. GCC cannot generate an int scev because it's not simple:
> (int) (short) {1, +, 1}_1.
>
> This can validly loop forever if loopCount > SHORT_MAX.
> For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX
> and is incremented by one the addition is fine because it is done in
> (unsigned short) and then truncated using modulo 2^N reduction
> (implementation defined behaviour) to short, therefore never reaching
> loopCount and looping forever.
>
> In RTL we get the following sequence:
> r4:SI <- [loopCount]
> r0:HI <- 0
>
> code label...
>
> ...
>
> r2:HI <- r1:HI + 1
> r3:SI <- sign_extend r2:HI
>
> p0:BI <- r3:SI < r4:SI
> loop to code label if p0:BI
>
> I was tempted to simplify this to:
> r4:SI <- [loopCount]
> r0:SI <- 0
>
> code label...
>
> ...
>
> r2:SI <- r1:SI + 1
>
> p0:BI <- r2:SI < r4:SI
> loop to code label if p0:BI
>
> However this will never have an infinite loop behaviour if
> r4:SI == SHORT_MAX + 1, therefore I think that at least in this case this
> cannot be optimized.
>
> I am tempted to close the bug report. Richard?

Yes. That sounds correct.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #22 from Paulo J. Matos ---
After some thought, I am concluding this cannot actually be optimized and that GCC 4.5.4 was better because it was taking advantage of an undefined behaviour that doesn't exist.

The thought process is as follows. The whole process has to do with this type of loop:
void foo (int loopCount)
{
  short i;
  for (i = 0; (int)i < loopCount; i++)
    ...
}

GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was done in type short. Then i was promoted to int through a sign_extend and compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an int scev for the loop.

In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have undefined behaviour. i++ due to C integer promotion rules is: i = (short) ((int) i + 1). GCC validly simplifies to i = (short) ((unsigned short)i + 1). This is then sign extended to int for comparison. GCC cannot generate an int scev because it's not simple: (int) (short) {1, +, 1}_1.

This can validly loop forever if loopCount > SHORT_MAX.
For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX and is incremented by one the addition is fine because it is done in (unsigned short) and then truncated using modulo 2^N reduction (implementation defined behaviour) to short, therefore never reaching loopCount and looping forever.

In RTL we get the following sequence:
r4:SI <- [loopCount]
r0:HI <- 0

code label...

...

r2:HI <- r1:HI + 1
r3:SI <- sign_extend r2:HI

p0:BI <- r3:SI < r4:SI
loop to code label if p0:BI

I was tempted to simplify this to:
r4:SI <- [loopCount]
r0:SI <- 0

code label...

...

r2:SI <- r1:SI + 1

p0:BI <- r2:SI < r4:SI
loop to code label if p0:BI

However this will never have an infinite loop behaviour if r4:SI == SHORT_MAX + 1, therefore I think that at least in this case this cannot be optimized.

I am tempted to close the bug report. Richard?
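The wraparound argument in comment #22 can be tried out with a small sketch. The helper names next_short and loop_terminates are hypothetical and not part of the bug report, and the step budget exists only so the "endless" case returns. The sketch assumes a 16-bit short and GCC's documented modulo 2^N conversion to signed types.

```c
#include <limits.h>

/* Increment as described for GCC 4.8+: add in unsigned short, then convert
   back to short.  The conversion is implementation-defined; this sketch
   assumes reduction modulo 2^N as GCC documents it.  */
short next_short(short i)
{
    unsigned short u = (unsigned short)i;
    u = (unsigned short)(u + 1);
    return (short)u;
}

/* Hypothetical helper: run the loop header of foo() for at most max_steps
   iterations.  Returns 1 if the exit test fails (the loop ends) and 0 if
   the step budget runs out first.  */
int loop_terminates(int loop_count, long max_steps)
{
    short i = 0;
    long steps = 0;

    while ((int)i < loop_count) {
        if (steps++ >= max_steps)
            return 0;
        i = next_short(i);
    }
    return 1;
}
```

Under these assumptions, loop_terminates(SHRT_MAX, 100000L) should report that the loop ends, while loop_terminates(SHRT_MAX + 1, 200000L) should exhaust its budget because i wraps from SHRT_MAX to SHRT_MIN and never reaches the bound.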
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #21 from rguenther at suse dot de ---
On Fri, 7 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
>
> --- Comment #20 from Paulo J. Matos ---
> OK, I was trying to make sense of all this and there are two things that
> stick out.
>
> One is when you say that due to C integer promotion rules make i =
> (short)((int)i + 1). However GCC is doing i = (short) ((unsigned short)
> i + 1). Am I missing something that allows this or makes the addition in
> int equivalent to the addition in unsigned short?

This is a valid shortening optimization GCC performs.

> Secondly we still have a dangling sign_extend later on that we could
> possibly optimize. I find it hard to understand if this can be done
> properly in expand or if a small pass like ree but before zero overhead
> loop generation is better. What do you think?

That entirely depends on where the extension is generated and what
information is present there ... if it can be avoided at expand time
then that's surely the best thing to do. Maybe it can even be avoided
on the GIMPLE level.
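The shortening mentioned in comment #21 can be checked exhaustively for a 16-bit short. This is only an illustrative sketch: the function name shortening_is_equivalent is hypothetical, and it assumes GCC's documented modulo 2^N conversion to signed types.

```c
#include <limits.h>

/* Returns 1 if incrementing in int and incrementing in unsigned short give
   the same short result for every short value, 0 otherwise.  */
int shortening_is_equivalent(void)
{
    int v;

    for (v = SHRT_MIN; v <= SHRT_MAX; v++) {
        short i = (short)v;

        /* What the C promotion rules prescribe: add in int, convert back.  */
        short via_int = (short)((int)i + 1);

        /* What the GIMPLE shows: add in unsigned short, convert back.
           The inner cast keeps the sum truncated to unsigned short.  */
        unsigned short u = (unsigned short)i;
        short via_ushort = (short)(unsigned short)(u + 1);

        if (via_int != via_ushort)
            return 0;
    }
    return 1;
}
```

Both forms agree because converting to short discards everything above the low 16 bits, and those low bits are the same whether the addition was carried out in int or in unsigned short.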
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #20 from Paulo J. Matos --- OK, I was trying to make sense of all this and there are two things that stick out. One is when you say that due to C integer promotion rules make i = (short)((int)i + 1). However GCC is doing i = (short) ((unsigned short) i + 1). Am I missing something that allows this or makes the addition in int equivalent to the addition in unsigned short? Secondly we still have a dangling sign_extend later on that we could possibly optimize. I find it hard to understand if this can be done properly in expand or if a small pass like ree but before zero overhead loop generation is better. What do you think?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #19 from rguenther at suse dot de --- On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 > > --- Comment #16 from Paulo J. Matos --- > (In reply to rguent...@suse.de from comment #15) > > Exactly the same problem. C integral type promotion rules make > > that i = (short)((int)i + 1) again. Note that (int)i + 1 > > does not overflow, (short) ((int)i + 1) invokes implementation-defined > > behavior which in our case is modulo-2 reduction. > > > > Nothing guarantees that (short)i + 1 does not overflow. > > OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens. > It seems to be i = (short) ((unsigned short) i + 1) > Later i is cast to int for comparison. > > Before ivopts this is the end of the loop body: > i.7_19 = (unsigned short) i_26; > _20 = i.7_19 + 1; > i_21 = (short intD.8) _20; > _10 = (intD.1) i_21; > if (_10 < _25) > goto ; > else > goto ; > > i is initially a short, then moved to unsigned short. The addition is > performed > and returned to short. Then cast to int for the comparison. > > For GCC 4.5.4 the end of loop body is: > iD.2767_18 = iD.2767_26 + 1; > D.5046_9 = (intD.0) iD.2767_18; > if (D.5046_9 < D.5047_25) > goto ; > else > goto ; > > Here the addition is made in short int and then there's only one cast to int. Yes, and thus GCC 4.5 still contains the bug that i++ invokes undefined behavior when overflowing (which it does not).
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #18 from rguenther at suse dot de --- On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 > > --- Comment #17 from Paulo J. Matos --- > (In reply to rguent...@suse.de from comment #15) > > On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote: > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 > > > > > > --- Comment #14 from Paulo J. Matos --- > > > Something like this which looks much simpler hits the same problem: > > > extern int arr[]; > > > > > > void > > > foo32 (int limit) > > > { > > > short i; > > > for (i = 0; (int)i < limit; i++) > > > arr[i] += 1; > > > } > > > > Exactly the same problem. C integral type promotion rules make > > that i = (short)((int)i + 1) again. Note that (int)i + 1 > > does not overflow, (short) ((int)i + 1) invokes implementation-defined > > behavior which in our case is modulo-2 reduction. > > > > Nothing guarantees that (short)i + 1 does not overflow. > > I am being thick... indeed I forgot to notice that i++ also invokes undefined > behaviour. I guess then GCC sorts that out by casting i into unsigned short > for > the addition and all the remaining issues then unfold. No, i++ doesn't invoke undefined behavior - that's the whole point and GCC got this wrong until it was fixed (4.5 is still broken). The whole point is that limit == SHORT_MAX + 1 and the loop being endless is _valid_ (well, apart from arr[i] then overflowing - looks like an opportunity to derive that i can _not_ overflow ... ;))
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #17 from Paulo J. Matos --- (In reply to rguent...@suse.de from comment #15) > On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote: > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 > > > > --- Comment #14 from Paulo J. Matos --- > > Something like this which looks much simpler hits the same problem: > > extern int arr[]; > > > > void > > foo32 (int limit) > > { > > short i; > > for (i = 0; (int)i < limit; i++) > > arr[i] += 1; > > } > > Exactly the same problem. C integral type promotion rules make > that i = (short)((int)i + 1) again. Note that (int)i + 1 > does not overflow, (short) ((int)i + 1) invokes implementation-defined > behavior which in our case is modulo-2 reduction. > > Nothing guarantees that (short)i + 1 does not overflow. I am being thick... indeed I forgot to notice that i++ also invokes undefined behaviour. I guess then GCC sorts that out by casting i into unsigned short for the addition and all the remaining issues then unfold.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #16 from Paulo J. Matos ---
(In reply to rguent...@suse.de from comment #15)
> Exactly the same problem. C integral type promotion rules make
> that i = (short)((int)i + 1) again. Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
>
> Nothing guarantees that (short)i + 1 does not overflow.

OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
It seems to be i = (short) ((unsigned short) i + 1)
Later i is cast to int for comparison.

Before ivopts this is the end of the loop body:
  i.7_19 = (unsigned short) i_26;
  _20 = i.7_19 + 1;
  i_21 = (short intD.8) _20;
  _10 = (intD.1) i_21;
  if (_10 < _25)
    goto ;
  else
    goto ;

i is initially a short, then moved to unsigned short. The addition is performed and returned to short. Then cast to int for the comparison.

For GCC 4.5.4 the end of loop body is:
  iD.2767_18 = iD.2767_26 + 1;
  D.5046_9 = (intD.0) iD.2767_18;
  if (D.5046_9 < D.5047_25)
    goto ;
  else
    goto ;

Here the addition is made in short int and then there's only one cast to int.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #15 from rguenther at suse dot de ---
On Thu, 6 Feb 2014, pa...@matos-sorge.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
>
> --- Comment #14 from Paulo J. Matos ---
> Something like this which looks much simpler hits the same problem:
> extern int arr[];
>
> void
> foo32 (int limit)
> {
>   short i;
>   for (i = 0; (int)i < limit; i++)
>     arr[i] += 1;
> }

Exactly the same problem. C integral type promotion rules make
that i = (short)((int)i + 1) again. Note that (int)i + 1
does not overflow, (short) ((int)i + 1) invokes implementation-defined
behavior which in our case is modulo-2 reduction.

Nothing guarantees that (short)i + 1 does not overflow.
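The promotion rule in comment #15 can be observed directly with C11 _Generic, which selects on the type of an expression. This is a minimal sketch: the macro and function names are hypothetical, and it needs a C11-capable compiler.

```c
/* Evaluates to 1 when the expression has type int, 0 otherwise.  */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* The operand i is promoted before the addition, so i + 1 has type int
   even though i itself is a short.  */
int addition_is_done_in_int(void)
{
    short i = 0;
    return IS_INT(i + 1);
}

/* The variable keeps its declared type; only the arithmetic is widened.  */
int variable_is_still_short(void)
{
    short i = 0;
    return _Generic(i, short: 1, default: 0);
}
```

Since the sum is computed in int, the only place a value can fail to fit is the conversion back to short when the result is stored in i, which is the implementation-defined step the comment refers to.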
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #14 from Paulo J. Matos ---
Something like this which looks much simpler hits the same problem:
extern int arr[];

void
foo32 (int limit)
{
  short i;
  for (i = 0; (int)i < limit; i++)
    arr[i] += 1;
}
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #13 from Paulo J. Matos ---
(In reply to Richard Biener from comment #12)
>
> Note that {1, +, 1}_1 is unsigned. The issue is that while i is short
> i++ is really i = (short)((int) i + 1) and thus only the operation in
> type 'int' is known to not overflow and thus the IV in short _can_
> overflow and the loop can loop infinitely for example for loopCount
> == SHORT_MAX + 1.
>
> The fix to SCEV analysis was to still be able to analyze the evolution at
> all.
>
> The testcase is simply very badly written (unsigned short upper bound,
> signed short IV and IV comparison against upper bound in signed int).

I thought any signed operation cannot overflow, independently of its width, therefore (short) (int + 1) shouldn't overflow.

I agree with you on the testcase; however, it is taken from customer code and, even if badly written, it is acceptable C.

GCC 4.5.4 generates the scalar evolution for the integer variable: {1, +, 1}_1 without the casts (therefore a simple_iv). This allows GCC to use an int for an IV, which helps discard the sign extend in the loop body and later on allows the zero overhead loop to be generated.

This case happens again and again and causes serious performance regressions on customer code.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #12 from Richard Biener --- (In reply to Paulo J. Matos from comment #10) > (In reply to Paulo J. Matos from comment #8) > > > > Made a mistake. With the attached test, the final gimple before expand for > > the loop basic block is: > > ;; basic block 5, loop depth 0 > > ;;pred: 5 > > ;;4 > > # i_26 = PHI > > # ivtmp.24_18 = PHI > > _28 = (void *) ivtmp.24_18; > > _13 = MEM[base: _28, offset: 0B]; > > x.4_14 = x; > > _15 = _13 ^ x.4_14; > > MEM[base: _28, offset: 0B] = _15; > > ivtmp.24_12 = ivtmp.24_18 + 4; > > temp_ptr.5_17 = (Sample *) ivtmp.24_12; > > _11 = (unsigned short) i_26; > > _2 = _11 + 1; > > i_1 = (short int) _2; > > _10 = (int) i_1; > > if (_10 < _25) > > goto ; > > else > > goto ; > > ;;succ: 5 > > ;;6 > > > > However, the point is the same. IVOPTS should probably generate an int IV > > instead of a short int IV to avoid the sign extend since removing the sign > > extend during RTL seems to be quite hard. > > > > What do you think? > > For >= 4.8 the scalar evolution of _10 is deemed not simple, because it > looks like the following: > type size > unit size > align 32 symtab 0 alias set 3 canonical type 0x2ab16690 > precision 32 min max 0x2ab12fa0 2147483647> context D.2881> > pointer_to_this > > > arg 0 type HI > size > unit size > align 16 symtab 0 alias set 4 canonical type 0x2ab16540 > precision 16 min max 0x2ab12ee0 32767> > pointer_to_this > > > arg 0 > arg 1 arg 2 0x2acc9140 1>>> > > This is something like: (int) (short int) {1, +, 1}_1. Since these are > signed integers, we can assume they don't overflow, can't we simplify the > scalar evolution to a polynomial_chrec over 32bit integers and forget the > nop_expr that represents the sign extend? Note that {1, +, 1}_1 is unsigned. 
The issue is that while i is short i++ is really i = (short)((int) i + 1) and thus only the operation in type 'int' is known to not overflow and thus the IV in short _can_ overflow and the loop can loop infinitely for example for loopCount == SHORT_MAX + 1. The fix to SCEV analysis was to still be able to analyze the evolution at all. The testcase is simply very badly written (unsigned short upper bound, signed short IV and IV comparison against upper bound in signed int).
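One way to avoid the mixed-width pattern criticised in comment #12 is to give the counter the same type as the bound. The function below is a hypothetical variant of foo32 from comment #14, not code from the bug report. It takes the array as a parameter so the sketch is self-contained, and whether a given target then emits a zero overhead loop was not checked here.

```c
/* Hypothetical rewrite of foo32: the induction variable is an int, like
   the bound, so the loop needs no truncation to short and no sign
   extension back to int.

   Note: unlike the original, this version terminates when
   limit > SHRT_MAX, so it is not a drop-in replacement for code that
   relied on the short counter wrapping.  */
void foo32_int_counter(int *arr, int limit)
{
    int i;

    for (i = 0; i < limit; i++)
        arr[i] += 1;
}
```

With an int counter, signed overflow of i would be undefined behaviour, so the compiler may assume it does not happen and describe the loop with a plain int evolution. The short counter does not allow this, because its wraparound is well defined.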
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #11 from Paulo J. Matos --- (In reply to Paulo J. Matos from comment #10) > (In reply to Paulo J. Matos from comment #8) > > > > Made a mistake. With the attached test, the final gimple before expand for > > the loop basic block is: > > ;; basic block 5, loop depth 0 > > ;;pred: 5 > > ;;4 > > # i_26 = PHI > > # ivtmp.24_18 = PHI > > _28 = (void *) ivtmp.24_18; > > _13 = MEM[base: _28, offset: 0B]; > > x.4_14 = x; > > _15 = _13 ^ x.4_14; > > MEM[base: _28, offset: 0B] = _15; > > ivtmp.24_12 = ivtmp.24_18 + 4; > > temp_ptr.5_17 = (Sample *) ivtmp.24_12; > > _11 = (unsigned short) i_26; > > _2 = _11 + 1; > > i_1 = (short int) _2; > > _10 = (int) i_1; > > if (_10 < _25) > > goto ; > > else > > goto ; > > ;;succ: 5 > > ;;6 > > > > However, the point is the same. IVOPTS should probably generate an int IV > > instead of a short int IV to avoid the sign extend since removing the sign > > extend during RTL seems to be quite hard. > > > > What do you think? > > For >= 4.8 the scalar evolution of _10 is deemed not simple, because it > looks like the following: > type size > unit size > align 32 symtab 0 alias set 3 canonical type 0x2ab16690 > precision 32 min max 0x2ab12fa0 2147483647> context D.2881> > pointer_to_this > > > arg 0 type HI > size > unit size > align 16 symtab 0 alias set 4 canonical type 0x2ab16540 > precision 16 min max 0x2ab12ee0 32767> > pointer_to_this > > > arg 0 > arg 1 arg 2 0x2acc9140 1>>> > > This is something like: (int) (short int) {1, +, 1}_1. Since these are > signed integers, we can assume they don't overflow, can't we simplify the > scalar evolution to a polynomial_chrec over 32bit integers and forget the > nop_expr that represents the sign extend? This chain of nop_expr in the scalar evolution is due to Richards fix for PR53676. It is still not clear to me, what the fix is for and if it needs tweaking or if it needs for a later pass to remove the widening from the loop. 
I am investigating.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #10 from Paulo J. Matos --- (In reply to Paulo J. Matos from comment #8) > > Made a mistake. With the attached test, the final gimple before expand for > the loop basic block is: > ;; basic block 5, loop depth 0 > ;;pred: 5 > ;;4 > # i_26 = PHI > # ivtmp.24_18 = PHI > _28 = (void *) ivtmp.24_18; > _13 = MEM[base: _28, offset: 0B]; > x.4_14 = x; > _15 = _13 ^ x.4_14; > MEM[base: _28, offset: 0B] = _15; > ivtmp.24_12 = ivtmp.24_18 + 4; > temp_ptr.5_17 = (Sample *) ivtmp.24_12; > _11 = (unsigned short) i_26; > _2 = _11 + 1; > i_1 = (short int) _2; > _10 = (int) i_1; > if (_10 < _25) > goto ; > else > goto ; > ;;succ: 5 > ;;6 > > However, the point is the same. IVOPTS should probably generate an int IV > instead of a short int IV to avoid the sign extend since removing the sign > extend during RTL seems to be quite hard. > > What do you think? For >= 4.8 the scalar evolution of _10 is deemed not simple, because it looks like the following: unit size align 32 symtab 0 alias set 3 canonical type 0x2ab16690 precision 32 min max context pointer_to_this > arg 0 unit size align 16 symtab 0 alias set 4 canonical type 0x2ab16540 precision 16 min max pointer_to_this > arg 0 arg 1 arg 2 >> This is something like: (int) (short int) {1, +, 1}_1. Since these are signed integers, we can assume they don't overflow, can't we simplify the scalar evolution to a polynomial_chrec over 32bit integers and forget the nop_expr that represents the sign extend?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #9 from Paulo J. Matos --- Created attachment 32044 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32044&action=edit Testcase
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #8 from Paulo J. Matos --- (In reply to Paulo J. Matos from comment #7) > (In reply to Richard Biener from comment #5) > > Apart from expand there is the redundant-extension-elimination, ree.c. > > In expand we get the following gimple for the loop: > ;; basic block 4, loop depth 0 > ;;pred: 2 > ;;4 > # i_15 = PHI <0(2), i_12(4)> > # _18 = PHI <0(2), _4(4)> > _6 = arr[_18]; > _7 = _6 + 1; > arr[_18] = _7; > _17 = (unsigned short) i_15; > _13 = _17 + 1; > i_12 = (short int) _13; > _4 = (int) i_12; > if (_4 < limit_5(D)) > goto ; > else > goto ; > ;;succ: 4 > ;;3 > > > Where _13 is an unsigned short and what we want to eliminate is this sign > extend: > _4 = (int) i_12; > > This doesn't seem trivial in the expand phase because to eliminate the sign > expand, you promote i_12 to int and have then to promote a bunch of other > variables, whose insn have been already emitted when you get here. Shouldn't > this be ivopts noticing that if it generates an int IV, it saves a sign > extend and therefore is better? Made a mistake. With the attached test, the final gimple before expand for the loop basic block is: ;; basic block 5, loop depth 0 ;;pred: 5 ;;4 # i_26 = PHI # ivtmp.24_18 = PHI _28 = (void *) ivtmp.24_18; _13 = MEM[base: _28, offset: 0B]; x.4_14 = x; _15 = _13 ^ x.4_14; MEM[base: _28, offset: 0B] = _15; ivtmp.24_12 = ivtmp.24_18 + 4; temp_ptr.5_17 = (Sample *) ivtmp.24_12; _11 = (unsigned short) i_26; _2 = _11 + 1; i_1 = (short int) _2; _10 = (int) i_1; if (_10 < _25) goto ; else goto ; ;;succ: 5 ;;6 However, the point is the same. IVOPTS should probably generate an int IV instead of a short int IV to avoid the sign extend since removing the sign extend during RTL seems to be quite hard. What do you think?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #7 from Paulo J. Matos ---
(In reply to Richard Biener from comment #5)
> Apart from expand there is the redundant-extension-elimination, ree.c.

In expand we get the following gimple for the loop:
;; basic block 4, loop depth 0
;;pred: 2
;;4
  # i_15 = PHI <0(2), i_12(4)>
  # _18 = PHI <0(2), _4(4)>
  _6 = arr[_18];
  _7 = _6 + 1;
  arr[_18] = _7;
  _17 = (unsigned short) i_15;
  _13 = _17 + 1;
  i_12 = (short int) _13;
  _4 = (int) i_12;
  if (_4 < limit_5(D))
    goto ;
  else
    goto ;
;;succ: 4
;;3

Here _13 is an unsigned short, and what we want to eliminate is this sign extend:
  _4 = (int) i_12;

This doesn't seem trivial in the expand phase: to eliminate the sign extend, you promote i_12 to int and then have to promote a bunch of other variables whose insns have already been emitted by the time you get here. Shouldn't this be ivopts noticing that if it generates an int IV, it saves a sign extend and is therefore better?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #6 from Paulo J. Matos --- humm, ree is no good because by then we missed already the generation of zero overhead loops. Do you think this is something that could be added to expand?
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #5 from Richard Biener --- Apart from expand there is the redundant-extension-elimination, ree.c.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 --- Comment #4 from Paulo J. Matos --- (In reply to Richard Biener from comment #3) > Yes, I think that the IV choice merely shows that we miss to optimize the > extension - which would be somewhere in the RTL opt pipeline. Makes sense. My first instinct was to do it in expand but since expand does one gimple statement at a time it might be too much for it to handle since it probably has to detect the sign extend and promote the type of the register if there are no conflicting conditions. If you suggest where to do this kind of thing I can give it a try. Thanks.
[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Component|tree-optimization |rtl-optimization --- Comment #3 from Richard Biener --- Yes, I think that the IV choice merely shows that we miss to optimize the extension - which would be somewhere in the RTL opt pipeline.