[Bug target/36133] GCC creates suboptimal ASM : Code includes unneeded TST instructions
--- Comment #8 from gunnar at greyhound-data dot com 2008-11-10 12:54 ---
(In reply to comment #7)
> (In reply to comment #4)
> > There are two causes where GCC generates unneeded TST instructions.
> > A) General arithmetic
> >   lsr.l #1,D0
> >   tst.l d0
> >   jbne ...
> >
> > This tst instruction is unneeded as the LSR is setting the flags correctly
> > already.
>
> This is NOT correct. LSL does write to the condition codes, but not all of it.
> In particular, the bit involved in the not-equal test is not set.
>
> This TST *is* required.

What you say is not correct. The bit involved in the not-equal test is the Z bit. Both LSR and TST set the Z bit in exactly the same way, so the TST instruction in the example is unneeded and NOT required.

Please check the official 68K documentation for which flags the conditional branch instruction Bcc tests (in this case BNE), and verify the behavior of TST and LSR with regard to setting these bits.

Best Regards
Gunnar von Boehn

--
gunnar at greyhound-data dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
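A minimal sketch of the argument in comment #8, assuming a simplified model that tracks only the Z flag (the helper names are hypothetical and nothing here comes from GCC or an emulator). It shows that the Z flag left by `lsr.l` already equals the Z flag a following `tst.l` on the same register would produce, which is the only flag BNE looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Z flag after "lsr.l #count,d0": set when the shifted result is zero.
   Hypothetical helper; models only Z, for immediate counts 1..8. */
bool z_after_lsr(uint32_t d0, unsigned count, uint32_t *result)
{
    *result = d0 >> count;
    return *result == 0;
}

/* Z flag after "tst.l d0": set when the operand is zero. */
bool z_after_tst(uint32_t d0)
{
    return d0 == 0;
}

/* True when BNE decides identically with or without the extra TST. */
bool tst_is_redundant_for_bne(uint32_t d0, unsigned count)
{
    uint32_t shifted;
    bool z_lsr = z_after_lsr(d0, count, &shifted);
    return z_lsr == z_after_tst(shifted);
}

/* Self-check over a few boundary values and all immediate shift counts. */
void check_lsr_tst_agree(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned count = 1; count <= 8; count++)
            assert(tst_is_redundant_for_bne(samples[i], count));
}
```

The model deliberately ignores N, V, C and X, since the disagreement in this thread is only about the flag tested by BNE.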
[Bug middle-end/36770] PowerPC generated PTR code inefficiency
--- Comment #2 from gunnar at greyhound-data dot com 2008-07-10 09:18 ---
(In reply to comment #1)
> forward-propagate is causing some of the issues as shown by:
> int *test2(int *a ){
>   a[1]=a[0];
>   a++;
>   return a;
> }

Your example creates the following ASM code:

test2:
        mr 9,3
        addi 3,3,4
        lwz 0,0(9)
        stw 0,4(9)
        blr

Correct would be:

test2:
        lwz 0,0(3)
        stwu 0,4(3)
        blr

As you can see, the bad code generated is just the same. This is independent of the register pinning.

Can I take your comment as confirmation that forward propagation is broken in GCC/PPC?

Kind regards
Gunnar von Boehn

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36770
[Bug c/36772] New: GCC generates impossible BRANCH instruction
Andreas Schwab and Gunther Nikl have pointed out that GCC incorrectly, and "on purpose", creates impossible branch instructions.

Reason summary:
- GCC is able to simplify certain compares.
- GCC seems to be unable to correctly rewrite the corresponding branch instructions.
- GCC thereby generates branches whose conditions can by definition never be true.

Example:

void foo (unsigned long j)
{
  unsigned int i;
  for (i = 0; i < (j>>5); ++i)
    ;
}

In this example the generated code includes a compare with a variable that is ZERO. GCC correctly knows that it can simplify a compare against ZERO, but it fails to adapt the condition codes accordingly.

Background:
A compare of two unsigned variables can set the CARRY flag. A compare against ZERO can never set the CARRY flag. While GCC recognizes that it can rewrite a CMP with 0, it does not rewrite the branch that checks for the carry. GCC thereby creates branches that include conditions known at compile time to be impossible.

> The following link should be the thread about this issue
> http://gcc.gnu.org/ml/gcc/2003-10/msg01236.html

The problem described here has the following consequences:
GCC creates branches that include impossible conditions and does not remove those conditions. GCC wants to replace the CMP with 0 with more efficient code, but the 68K backend needs to forbid this.

Normally the CMP 0,variable could simply be omitted, and every branch testing a condition that a compare with 0 can actually set would still evaluate correctly. But because GCC emits branches that include known-impossible conditions, the CMP cannot be omitted: it is explicitly needed to ensure that the flags are cleared so that the branches on known-impossible conditions are not taken.

It would be great if you could fix the cause of this instead of curing the symptom, as this would allow the backend to really remove the unneeded CMP instructions, thereby generating smaller and faster code.

Many thanks in advance
Gunnar von Boehn

--
           Summary: GCC generates impossible BRANCH instruction
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gunnar at greyhound-data dot com
 GCC build triplet: m68k-linux-gnu
  GCC host triplet: m68k-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36772
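The background paragraph of this report can be sketched in C, assuming a simplified model of the C and Z flags after a 68k `cmp` (which computes destination minus source). The helper names are hypothetical, and the report does not say which branch mnemonic GCC actually emits, so BHI, BNE and BCS serve only as illustrations. The sketch shows that the carry condition can never hold when the source operand is zero, so an unsigned "higher" branch degenerates into a plain "not equal" branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags after "cmp src,dst", which computes dst - src.
   Carry is the unsigned borrow; Zero is set when both are equal. */
bool carry_after_cmp(uint32_t dst, uint32_t src) { return dst < src; }
bool zero_after_cmp(uint32_t dst, uint32_t src)  { return dst == src; }

/* BCS: taken when C is set. */
bool bcs_taken(uint32_t dst, uint32_t src)
{
    return carry_after_cmp(dst, src);
}

/* BHI: taken when C and Z are both clear (unsigned "higher"). */
bool bhi_taken(uint32_t dst, uint32_t src)
{
    return !carry_after_cmp(dst, src) && !zero_after_cmp(dst, src);
}

/* BNE: taken when Z is clear. */
bool bne_taken(uint32_t dst, uint32_t src)
{
    return !zero_after_cmp(dst, src);
}

/* Against zero, the carry part of the condition is dead. */
void check_compare_against_zero(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x7fffffffu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(!bcs_taken(samples[i], 0u));
        assert(bhi_taken(samples[i], 0u) == bne_taken(samples[i], 0u));
    }
}
```

Under this model, a compiler that simplifies the compare but keeps the carry-dependent branch still needs something to clear C, which matches the situation the report describes.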
[Bug c/36770] New: PowerPC generated PTR code ineffiency
GCC fails to generate efficient code for basic pointer operations.

Please have a look at this example:

*** test.c:
register int * src asm("r15");

int test( ){
  src[1]=src[0];
  src++;
}

main(){
}
***

Compiling the above with

  gcc -S -O3 test.c

shows us the following ASM output:

test:
        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)
        blr

Compiling with

  gcc -S -Os test.c

gives this output:

test:
        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)
        blr

As you can see, both -O3 and -Os produce the same output, and that output is far from optimal. For the simple pointer operation GCC generates this code:

        mr 9,15
        addi 15,15,4
        lwz 0,0(9)
        stw 0,4(9)

But GCC should rather generate this:

        lwz 0,0(15)
        stwu 0,4(15)

Two of the four instructions are unneeded. We have code here with literally thousands of unneeded instructions generated like this.

I very much hope that this information is helpful to you and that you can fix this.

Many thanks in advance
Gunnar von Boehn

--
           Summary: PowerPC generated PTR code ineffiency
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gunnar at greyhound-data dot com
 GCC build triplet: powerpc64-unknown-linux-gnu
  GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36770
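To make the proposed `lwz`/`stwu` pair easier to follow, here is a small C sketch. It assumes the usual meaning of a store-with-update instruction: the word is stored at base plus displacement, and that effective address is written back to the base register. The function names are hypothetical; the first function is the pointer operation from the report, the second spells out what the two-instruction sequence does step by step.

```c
#include <assert.h>

/* The source-level operation from the report: copy, then advance. */
int *copy_then_advance(int *src)
{
    src[1] = src[0];
    src++;
    return src;
}

/* Step-by-step model of "lwz 0,0(15)" followed by "stwu 0,4(15)". */
int *copy_with_update_store(int *src)
{
    int value = src[0];   /* lwz 0,0(15): load the word at r15 */
    int *ea = src + 1;    /* effective address: r15 + 4 bytes */
    *ea = value;          /* stwu: store the word at the effective address */
    return ea;            /* stwu: write the effective address back to r15 */
}

/* Both forms leave memory and the pointer in the same state. */
void check_update_store_model(void)
{
    int a[2] = { 7, 0 };
    int b[2] = { 7, 0 };
    assert(copy_then_advance(a) == a + 1);
    assert(copy_with_update_store(b) == b + 1);
    assert(a[1] == b[1]);
}
```

The displacement of 4 bytes corresponds to `src + 1` here because the sketch assumes a 4-byte `int`, as on the PowerPC target in the report.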
[Bug target/36135] GCC creates suboptimal ASM : suboptimal Adressing-Modes used
--- Comment #6 from gunnar at greyhound-data dot com 2008-06-13 13:34 ---
(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all.
> There is another bug about this somewhere I think for arm. PowerPC has the
> same issue too ...
>

Hi Andrew,

I want to make clear that the 68K backend used to be able to do this optimization in the GCC 2.9 days. Later, with 3.4 or 4.x, this optimization no longer worked and the code became worse.

Does this make sense in your opinion?

Cheers

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36135
[Bug target/36135] GCC creates suboptimal ASM : suboptimal Adressing-Modes used
--- Comment #5 from gunnar at greyhound-data dot com 2008-06-13 09:31 ---
(In reply to comment #4)
> This comes down to IV-OPTs not understanding {post,pre}_{dec,inc} at all.
> There is another bug about this somewhere I think for arm. PowerPC has the
> same issue too ...
>

If this affects so many platforms, it sounds like an important issue to me. Maybe someone should increase the priority and severity of the issue in this case?

Andrew, do you plan to fix this issue?

Cheers
Gunnar

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36135
[Bug target/36135] GCC creates suboptimal ASM : suboptimal Adressing-Modes used
--- Comment #3 from gunnar at greyhound-data dot com 2008-06-12 14:34 --- Andreas, What is your opinion to this? GCC 2.9 used to combine the move with increment in the combine step to something like this: *** (insn 32 30 33 (set (reg/v:SI 32) (mem:SI (post_inc:SI (reg/v:SI 34)) 0)) 42 {movsi+1} (nil) (expr_list:REG_INC (reg/v:SI 34) (nil))) *** So problem is that now GCC seems not to be able to do this anymore by itself With GCC 4.4 the output is: ** (insn 34 33 35 4 example2.c:11 (set (reg/v:SI 54 [ value ]) (mem:SI (reg/v/f:SI 52 [ src ]) [2 S4 A16])) 37 {*movsi_cf} (nil)) (insn 35 34 36 4 example2.c:12 (set (reg/v:SI 53 [ value2 ]) (mem:SI (plus:SI (reg/v/f:SI 52 [ src ]) (const_int 4 [0x4])) [2 S4 A16])) 37 {*movsi_cf} (nil)) (insn 36 35 38 4 example2.c:5 (set (reg/v/f:SI 52 [ src ]) (plus:SI (reg/v/f:SI 52 [ src ]) (const_int 8 [0x8]))) 133 {*addsi3_5200} (nil)) (insn 38 36 40 4 example2.c:10 (set (reg/v:SI 50 [ size.21 ]) (plus:SI (reg/v:SI 50 [ size.21 ]) (const_int -1 [0x]))) 133 {*addsi3_5200} (nil)) *** Any ideas about this? Kind regards Gunnar von Boehn -- gunnar at greyhound-data dot com changed: What|Removed |Added CC||schwab at suse dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36135
[Bug target/36134] GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
--- Comment #6 from gunnar at greyhound-data dot com 2008-06-12 14:27 --- Andreas, could you please have a look at this? Cheers Gunnar -- gunnar at greyhound-data dot com changed: What|Removed |Added CC||schwab at suse dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36134
[Bug target/36133] GCC creates suboptimal ASM : Code includes unneeded TST instructions
--- Comment #6 from gunnar at greyhound-data dot com 2008-06-12 14:26 --- Andreas, Could you have a look at this? Cheers Gunnar -- gunnar at greyhound-data dot com changed: What|Removed |Added CC||schwab at suse dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
[Bug target/25128] [m68k] Suboptimal comparisons against 65536
--- Comment #2 from gunnar at greyhound-data dot com 2008-06-10 16:02 ---
> Note that
>
>   cmp.l #65535,%d0
>   jbhi .L10
>
> can be replaced with
>
>   swap %d0
>   tst.w %d0
>   jbne .L10
>
> A similar trick can be applied to signed comparisons as well.

But this "trick" will run slower on the higher 68k CPUs. On the 68040, the 68060 or a superscalar Coldfire it is better to generate fewer instructions, and instructions that do not depend on each other.

I think "cmp.l #65535,%d0" is the code that should be generated at "-O2", as it is faster on many 68K models. The shorter swap/tst.w sequence might be an option for the compile option "-Os".

Kind regards
Gunnar von Boehn

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25128
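The equivalence behind the quoted trick can be checked with a short C sketch (function names are hypothetical). It assumes the standard behaviour of the two instructions: `swap` exchanges the upper and lower 16-bit halves of a data register, and `tst.w` then tests the low word, which is the former high word. An unsigned 32-bit value is above 65535 exactly when that word is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of "swap %d0": exchange the two 16-bit halves. */
uint32_t swap_words(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* "cmp.l #65535,%d0 ; jbhi": unsigned compare against 65535. */
bool above_65535_cmp(uint32_t x)
{
    return x > 65535u;
}

/* "swap %d0 ; tst.w %d0 ; jbne": test the former high word for non-zero. */
bool above_65535_swap(uint32_t x)
{
    return (uint16_t)swap_words(x) != 0;
}

/* Both forms agree around the 16-bit boundary and at the extremes. */
void check_swap_trick(void)
{
    const uint32_t samples[] = { 0u, 1u, 65535u, 65536u, 65537u,
                                 0x0001ffffu, 0xffff0000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(above_65535_cmp(samples[i]) == above_65535_swap(samples[i]));
}
```

The sketch only covers the result of the test. Unlike the compare, `swap` modifies the register, and the sequence forms a two-instruction dependency chain, which is the scheduling concern raised in the comment.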
[Bug c/36488] New: Generated 68K code bad for pipelining (case swap)
+++ This bug was initially created as a clone of Bug #36487 +++

The code generated by GCC 4.4 (trunk) for 68K/Coldfire will run slowly on the superscalar pipelines of the 68060 and Coldfire V4/V5 cores.

If you compile this example:

uint32_t fletcher( uint16_t *data, size_t len )
{
  uint32_t sum1 = 0x, sum2 = 0x;
  while (len) {
    unsigned tlen = len > 360 ? 360 : len;
    len -= tlen;
    do {
      sum1 += *data++;
      sum2 += sum1;
    } while (--tlen);
    sum1 = (sum1 & 0x) + (sum1 >> 16);
    sum2 = (sum2 & 0x) + (sum2 >> 16);
  }
  /* Second reduction step to reduce sums to 16 bits */
  sum1 = (sum1 & 0x) + (sum1 >> 16);
  sum2 = (sum2 & 0x) + (sum2 >> 16);
  return sum2 << 16 | sum1;
}

with "m68k-linux-gnu-gcc -mcpu=54470 -fomit-frame-pointer -O3 -S -o example.s example.c"

then you will see that this code is created:

1       clr.w %d3
2       swap %d3
3       clr.w %d4
4       swap %d4

Instruction 2 depends on instruction 1.
Instruction 4 depends on instruction 3.

Simply reordering the instructions as shown below would double the performance of this sequence, because superscalar designs such as the 68060 or the V5 Coldfire can then execute more instructions in parallel:

1       clr.w %d3
2       clr.w %d4
3       swap %d3
4       swap %d4

GCC does not try to reduce the instruction dependencies. The code that GCC generates does not follow the scheduling recommendations for the 68040/68060 and later superscalar CPUs.

Can you please be so kind and correct this?

--
           Summary: Generated 68K code bad for pipelining (case swap)
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gunnar at greyhound-data dot com
  GCC host triplet: m68k-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36488
[Bug c/36487] New: Generated 68K code bad for pipelining
The code generated by GCC 4.4 (trunk) for 68K/Coldfire will run slowly on the superscalar pipelines of the 68060 and Coldfire V4/V5 cores.

If you compile this example:

uint32_t fletcher( uint16_t *data, size_t len )
{
  uint32_t sum1 = 0x, sum2 = 0x;
  while (len) {
    unsigned tlen = len > 360 ? 360 : len;
    len -= tlen;
    do {
      sum1 += *data++;
      sum2 += sum1;
    } while (--tlen);
    sum1 = (sum1 & 0x) + (sum1 >> 16);
    sum2 = (sum2 & 0x) + (sum2 >> 16);
  }
  /* Second reduction step to reduce sums to 16 bits */
  sum1 = (sum1 & 0x) + (sum1 >> 16);
  sum2 = (sum2 & 0x) + (sum2 >> 16);
  return sum2 << 16 | sum1;
}

with "m68k-linux-gnu-gcc -mcpu=68060 -fomit-frame-pointer -O3 -S -o example.s example.c"

then you will see that this definition

  { uint32_t sum1 = 0x, sum2 = 0x; }

generates the code below:

        moveq #0,%d2
        not.w %d2
        move.l %d2,%d3

These are THREE dependent instructions in a row. Even with result forwarding, these THREE instructions will need 3 clocks to execute.

Instead of the three lines above, the compiler could have generated two lines like this:

        move.l #0x,%d2
        move.l #0x,%d3

Or the compiler could have put other, independent instructions between them.

GCC does not try to reduce the instruction dependencies. The code that GCC generates does not follow the scheduling recommendations for the 68040/68060 and later superscalar CPUs.

Please be so kind and correct this.

--
           Summary: Generated 68K code bad for pipelining
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: gunnar at greyhound-data dot com
  GCC host triplet: m68k-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36487
[Bug target/36134] GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
--- Comment #5 from gunnar at greyhound-data dot com 2008-06-10 15:24 --- (In reply to comment #4) > Could you please submit your patch to [EMAIL PROTECTED], including a > ChangeLog entry and stating how you tested it. > As requested I did send the email last week. Do you need anything else from me to work on this? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36134
[Bug target/36133] GCC creates suboptimal ASM : Code includes unneeded TST instructions
--- Comment #5 from gunnar at greyhound-data dot com 2008-06-05 12:07 ---
Please find below a proposed patch.

The patch makes GCC aware that the shift already sets the CC, so the TST is not needed in this case. The same approach could be used to make GCC aware of the CC set by other instructions.

Index: gcc/config/m68k/m68k.md
===
*** gcc/config/m68k/m68k.md.orig2008-05-30 10:00:55.0 +0200
--- gcc/config/m68k/m68k.md 2008-06-04 17:01:11.0 +0200
***
*** 5198,5203
--- 5198,5215
    [(set_attr "type" "shift")
     (set_attr "opy" "2")])
  
+ (define_insn "*lshrsi3_cc"
+   [(set (cc0)
+       (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+                    (match_operand:SI 2 "general_operand" "dI")))
+    (set (match_operand:SI 0 "register_operand" "=d")
+       (lshiftrt:SI (match_dup 1)
+                    (match_dup 2)))]
+   ""
+   "lsr%.l %2,%0"
+   [(set_attr "type" "shift")
+    (set_attr "opy" "2")])
+ 
  (define_insn "lshrhi3"
    [(set (match_operand:HI 0 "register_operand" "=d")
        (lshiftrt:HI (match_operand:HI 1 "register_operand" "0")

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
[Bug c/36433] mregeparm not supported on 68k / Coldfire
--- Comment #1 from gunnar at greyhound-data dot com 2008-06-04 09:54 ---
The parameter -mregparm is not supported on M68k / Coldfire.

As is known from the X86 platform, compiling with -mregparm improves the size and performance of the generated code. On X86 an overall improvement of 5%-7% is generally stated.

This parameter is unfortunately not supported for the M68k and Coldfire platform. This is a serious drawback, especially as there are operating systems on 68k that pass parameters in registers by default (e.g. AmigaOS).

Please be so kind and add the regparm feature to the 68k / Coldfire backend. It will certainly improve the generated code a lot.

Many thanks in advance
Gunnar von Boehn

--
gunnar at greyhound-data dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mregeparm                   |mregeparm not supported on
                   |                            |68k / Coldfire

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36433
[Bug c/36433] New: mregeparm
-- Summary: mregeparm Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: gunnar at greyhound-data dot com GCC build triplet: m68k-linux-gnu GCC host triplet: m68k-linux-gnu GCC target triplet: m68k-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36433
[Bug target/36133] GCC creates suboptimal ASM : Code includes unneeded TST instructions
--- Comment #4 from gunnar at greyhound-data dot com 2008-06-04 09:29 ---
I want to add that this wrong behavior is partly related to the compile option "-Os".

There are two causes where GCC generates unneeded TST instructions.

A) General arithmetic

        lsr.l #1,D0
        tst.l d0
        jbne ...

This tst instruction is unneeded as the LSR is setting the flags correctly already.

B)

        subq.l #1,D1
        tst.l d1
        jbne ...

This unneeded TST is related to the compile option used. If you compile the source with "-O2", then this second kind of unneeded TST instruction does not appear in the output.

It seems to me that an important general optimization step, which used to be part of "-Os" in GCC 2.9, was removed from "-Os", causing GCC to generate worse code now.

Can you please be so kind and correct this?

I believe that this issue is quite serious for the performance of the generated code.

1st: The unneeded TST instructions increase code size, which is important in embedded environments.

2nd: There are cases where the instruction that really set the condition codes in the first place is far enough away from the conditional branch, with no CC-trashing instruction in between, that the instruction fetcher can predict the branch 100% correctly and fold it away completely. The unneeded TST instruction makes branch folding impossible and requires the CPU to guess the branch instead. This will cause a serious performance impact if the branch is mispredicted.

It should be clear that the unneeded TST instruction does not only bloat the code; under the conditions mentioned above it can seriously degrade performance as well, depending on the CPU used, of course.

In the light of this, wouldn't it make sense to increase the Severity of this issue?

Regards
Gunnar

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
[Bug target/36134] GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
--- Comment #3 from gunnar at greyhound-data dot com 2008-05-29 12:50 ---
Created an attachment (id=15699)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15699&action=view)
Prefer 4 byte long LEA over 6 byte long ADD.L

Please include the attached patch for GCC.

The patch changes the case statement to prefer the 4-byte LEA over the 6-byte ADD.L for immediate add/sub instructions on address registers when the immediate operand fits in 16 bits. LEA is optimized for pipelining (with destination forwarding) and is shorter than ADD.L.

Regards
Gunnar von Boehn

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36134
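The selection rule described in this comment can be sketched as follows. This is not the attached patch and not GCC backend code; the function names are hypothetical, and the only facts taken from the comment are the two instruction lengths (4 bytes for LEA, 6 bytes for ADD.L) and the 16-bit limit on the immediate. The sketch assumes that the limit means a signed 16-bit displacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when delta fits the signed 16-bit displacement of "lea (d16,An),An".
   Assumption: the "16bit max" in the comment refers to a signed range. */
bool fits_lea_displacement(int32_t delta)
{
    return delta >= INT16_MIN && delta <= INT16_MAX;
}

/* Encoded size in bytes of the instruction the rule would pick for
   adding a constant to an address register. */
unsigned add_to_address_reg_size(int32_t delta)
{
    return fits_lea_displacement(delta) ? 4u   /* lea (delta,An),An */
                                        : 6u;  /* add.l #delta,An   */
}

/* The pointer increments from the test case (16) use the short form. */
void check_lea_rule(void)
{
    assert(add_to_address_reg_size(16) == 4u);
    assert(add_to_address_reg_size(-16) == 4u);
    assert(add_to_address_reg_size(32767) == 4u);
    assert(add_to_address_reg_size(32768) == 6u);
}
```

For the `add.l #16,%a2` and `add.l #16,%a3` instructions shown elsewhere in this bug, the rule would save 2 bytes per pointer increment.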
[Bug target/36136] GCC creates suboptimal ASM : constant work registers are set up inside work loops and not outside of the loop
--- Comment #2 from gunnar at greyhound-data dot com 2008-05-28 16:28 ---
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
>

I have verified this with the most current GCC source trunk (GCC 4.4 code snapshot 2008-05-23).

The problem still persists: GCC sets up its work registers inside the work loop.

write_32x4:
        link.w %fp,#0
        move.l 16(%fp),%d0
        move.l 8(%fp),%a0
        lsr.l #4,%d0
        jra .L50
.L51:
        moveq #1,%d1
        move.l %d1,(%a0)
        move.l %d1,4(%a0)
        move.l %d1,8(%a0)
        move.l %d1,12(%a0)
        lea (16,%a0),%a0
        subq.l #1,%d0
.L50:
        tst.l %d0
        jne .L51
        unlk %fp
        rts

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36136
[Bug target/36135] GCC creates suboptimal ASM : suboptimal Adressing-Modes used
--- Comment #2 from gunnar at greyhound-data dot com 2008-05-28 16:23 ---
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
>

I have verified this for you with the most current GCC source.
Verified with gcc version 4.4.0 20080523 (experimental) (GCC)

The problem that GCC uses bad addressing modes is still persistent.

Code generated by GCC 4.4:

copy_32x4:
        link.w %fp,#-12
        movem.l #3076,(%sp)
        move.l 16(%fp),%d2
        lsr.l #4,%d2
        move.l 8(%fp),%a3
        move.l 12(%fp),%a2
        jra .L6
.L7:
        move.l (%a2),%a1
        subq.l #1,%d2
        move.l 4(%a2),%d0
        move.l 8(%a2),%d1
        move.l 12(%a2),%a0
        add.l #16,%a2
        move.l %a1,(%a3)
        move.l %d0,4(%a3)
        move.l %d1,8(%a3)
        move.l %a0,12(%a3)
        add.l #16,%a3
.L6:
        tst.l %d2
        jne .L7
        movem.l (%sp),#3076
        unlk %fp
        rts

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36135
[Bug target/36134] GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
--- Comment #2 from gunnar at greyhound-data dot com 2008-05-28 16:18 ---
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
>

I've verified this with the latest GCC source, "version 4.4.0 20080523 (experimental) (GCC)".

The most current GCC source still has the problem that ADD.L instructions are used for incrementing pointers instead of the shorter LEA instruction.

Code generated by GCC 4.4 for the testcase:

copy_32x4:
        link.w %fp,#-12
        movem.l #3076,(%sp)
        move.l 16(%fp),%d2
        lsr.l #4,%d2
        move.l 8(%fp),%a3
        move.l 12(%fp),%a2
        jra .L6
.L7:
        move.l (%a2),%a1
        subq.l #1,%d2
        move.l 4(%a2),%d0
        move.l 8(%a2),%d1
        move.l 12(%a2),%a0
        add.l #16,%a2
        move.l %a1,(%a3)
        move.l %d0,4(%a3)
        move.l %d1,8(%a3)
        move.l %a0,12(%a3)
        add.l #16,%a3
.L6:
        tst.l %d2
        jne .L7
        movem.l (%sp),#3076
        unlk %fp
        rts

Regards
Gunnar

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36134
[Bug target/36133] GCC creates suboptimal ASM : Code includes unneeded TST instructions
--- Comment #3 from gunnar at greyhound-data dot com 2008-05-28 16:14 ---
(In reply to comment #1)
> It would have been nice to check at least gcc 4.3 (or better current trunk).
>

I've verified this with the latest GCC source, "version 4.4.0 20080523 (experimental) (GCC)".

The problem that GCC uses totally unneeded TST instructions is still present in the current source.

Regards
Gunnar

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
[Bug c/36136] New: GCC creates suboptimal ASM : constant work registers are set up inside work loops and not outside of the loop
+++ This bug was initially created as a clone of Bug #36133 +++ Hello, The ASM code created by GCC 4.2.1 for 68k/Coldfire platform is not optimal. Comparing ASM output created by GCC 2.9 with GCC 4.2, the generated code got partially much worse with GCC 4. One problem that was visible a lot was that GCC set ups constant work registers inside of working loops and not outside of them. At address (1c): the instruction moveq #1,%d1 to set up the work register is inside the working loop and will be unneeded executed with very iteration. Second problem: At address (16) the instruction movel #1,%a0@ uses the literal value #1 and not the work register that has the same value. The literal move.l #1 has a length of 6 bytes while using the work register would have 2 bytes only. Example: C-source Code: void * write_32x4(void *destparam, const void *srcparam, size_t size) { int value=1; int *dst = destparam; size = size / 16; for (; size; size--) { *dst++=value; *dst++=value; *dst++=value; *dst++=value; } } Compile option: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c Code generated by GCC 4.2: : 0a: 202f 000c movel %sp@(12),%d0 0e: 206f 0004 moveal %sp@(4),%a0 12: e888lsrl #4,%d0 14: 601cbras 32 16: 20bc 0001 movel #1,%a0@ 1c: 7201moveq #1,%d1 1e: 2141 0004 movel %d1,%a0@(4) 22: 2141 0008 movel %d1,%a0@(8) 26: 2141 000c movel %d1,%a0@(12) 2a: d1fc 0010 addal #16,%a0 30: 5380subql #1,%d0 32: 4a80tstl %d0 34: 66e0bnes 16 36: 4e75rts Generated code length = 46 Byte Length of Workloop: 9 instructions, 32 byte For comparison here is code that you would expect: 0a: 202f 000c movel %sp@(12),%d0 0e: 206f 0004 moveal %sp@(4),%a0 12: 7201moveq #1,%d1 14: e888lsrl #4,%d0 16: 601cbeqs 24 18: 21c0movel %d1,[EMAIL PROTECTED] 1a: 21c0movel %d1,[EMAIL PROTECTED] 1c: 21c0movel %d1,[EMAIL PROTECTED] 1e: 21c0movel %d1,[EMAIL PROTECTED] 20: 5380subql #1,%d0 22: 66e0bnes 18 24: 4e75rts Expected code length = 28 Byte Length of Workloop: 6 instructions, 12 
byte Compiler used: m68k-linux-gnu-gcc -v Using built-in specs. Target: m68k-linux-gnu Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin Thread model: posix gcc version 4.2.1 (Sourcery G++ Lite 4.2-47) I hope that this report help you to improve the quality of GCC. Kind regards Gunnar von Boehn -- P.S. I put the noticed issues in individual tickets for easier tracking. I hope that this is helpful to you. -- Summary: GCC creates suboptimal ASM : constant work registers are set up inside work loops and not outside of the loop Product: gcc Version: 4.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: gunnar at greyhound-data dot com GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: m68k-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36136
[Bug c/36135] New: GCC creates suboptimal ASM : suboptimal Adressing-Modes used
+++ This bug was initially created as a clone of Bug #36133 +++ Hello, The ASM code created by GCC 4.2.1 for 68k/Coldfire platform is not optimal. Comparing ASM output created by GCC 2.9 with GCC 4.2, the generated code got partially much worse with GCC 4. One problem that was visible a lot was that GCC uses suboptimal addressing modes. Please see the below example for details. In line 14 to line 2E this code was created: 14: 2290movel %a0@,%a1@ 16: 2368 0004 0004 movel %a0@(4),%a1@(4) 1c: 2368 0008 0008 movel %a0@(8),%a1@(8) 22: 2368 000c 000c movel %a0@(12),%a1@(12) 28: d3fc 0010 addal #16,%a1 2e: d1fc 0010 addal #16,%a0 Much shorter and more efficient would have been this: 14: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 16: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 18: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 1a: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] Example: C-source Code: void * copy_32x4a(void *destparam, const void *srcparam, size_t size) { int *dest = destparam; const int *src = srcparam; int size32; size32 = size / 16; for (; size32; size32--) { *dest++ = *src++; *dest++ = *src++; *dest++ = *src++; *dest++ = *src++; } } Compile option: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c Code generated by GCC 4.2: 04: 202f 000c movel %sp@(12),%d0 08: 226f 0004 moveal %sp@(4),%a1 0c: 206f 0008 moveal %sp@(8),%a0 10: e888lsrl #4,%d0 12: 6022bras 36 14: 2290movel %a0@,%a1@ 16: 2368 0004 0004 movel %a0@(4),%a1@(4) 1c: 2368 0008 0008 movel %a0@(8),%a1@(8) 22: 2368 000c 000c movel %a0@(12),%a1@(12) 28: d3fc 0010 addal #16,%a1 2e: d1fc 0010 addal #16,%a0 34: 5380subql #1,%d0 36: 4a80tstl %d0 38: 66dabnes 14 3a: 4e75rts For comparison here is code that you would expect: 04: 202f 000c movel %sp@(12),%d0 08: 226f 0004 moveal %sp@(4),%a1 0c: 206f 0008 moveal %sp@(8),%a0 10: e888lsrl #4,%d0 12: 6022beq 20 14: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 16: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 18: 
20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 1a: 20d9movel [EMAIL PROTECTED],[EMAIL PROTECTED] 1c: 5380subql #1,%d0 1e: 66dabnes 14 20: 4e75rts Compiler used: m68k-linux-gnu-gcc -v Using built-in specs. Target: m68k-linux-gnu Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin Thread model: posix gcc version 4.2.1 (Sourcery G++ Lite 4.2-47) I hope that this report help you to improve the quality of GCC. Kind regards Gunnar von Boehn -- P.S. I put the noticed issues in individual tickets for easier tracking. I hope that this is helpful to you. -- Summary: GCC creates suboptimal ASM : suboptimal Adressing-Modes used Product: gcc Version: 4.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: gunnar at greyhound-data dot com GCC build triplet: i686-pc-linux-gnu GCC host triplet: i686-pc-linux-gnu GCC target triplet: m68k-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36135
[Bug c/36134] New: GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
+++ This bug was initially created as a clone of Bug #36133 +++

Hello,

The ASM code created by GCC 4.2.1 for the 68k/Coldfire platform is not optimal. Compared with the ASM output of GCC 2.9, the code generated by GCC 4.2 is in places much worse.

One problem that showed up frequently is that GCC uses ADDA.L instead of the shorter LEA instruction.

Please see the example below for details. At addresses 28 and 2E the ADDA.L instruction is used twice where the shorter LEA instruction could have been used instead.

Example:

C source code:

void * copy_32x4a(void *destparam, const void *srcparam, size_t size)
{
        int *dest = destparam;
        const int *src = srcparam;
        int size32;
        size32 = size / 16;
        for (; size32; size32--) {
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
        }
}

Compile options:
m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c

Code generated by GCC 4.2:

  04: 202f 000c       movel %sp@(12),%d0
  08: 226f 0004       moveal %sp@(4),%a1
  0c: 206f 0008       moveal %sp@(8),%a0
  10: e888            lsrl #4,%d0
  12: 6022            bras 36
  14: 2290            movel %a0@,%a1@
  16: 2368 0004 0004  movel %a0@(4),%a1@(4)
  1c: 2368 0008 0008  movel %a0@(8),%a1@(8)
  22: 2368 000c 000c  movel %a0@(12),%a1@(12)
  28: d3fc 0010       addal #16,%a1
  2e: d1fc 0010       addal #16,%a0
  34: 5380            subql #1,%d0
  36: 4a80            tstl %d0
  38: 66da            bnes 14
  3a: 4e75            rts

For comparison, here is the code you would expect:

  04: 202f 000c       movel %sp@(12),%d0
  08: 226f 0004       moveal %sp@(4),%a1
  0c: 206f 0008       moveal %sp@(8),%a0
  10: e888            lsrl #4,%d0
  12: 6022            beq 20
  14: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  16: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  18: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  1a: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  1c: 5380            subql #1,%d0
  1e: 66da            bnes 14
  20: 4e75            rts

Compiler used:
m68k-linux-gnu-gcc -v
Using built-in specs.
Target: m68k-linux-gnu
Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin
Thread model: posix
gcc version 4.2.1 (Sourcery G++ Lite 4.2-47)

I hope that this report helps you to improve the quality of GCC.

Kind regards
Gunnar von Boehn

--
P.S. I put the noticed issues in individual tickets for easier tracking. I hope that this is helpful.

--
Summary: GCC creates suboptimal ASM : usage of ADDA.L where LEA could be used
Product: gcc
Version: 4.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gunnar at greyhound-data dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36134
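To make the size argument in the report above (bug 36134) concrete, here is a minimal sketch in C that encodes both candidate instructions and returns their lengths. The function names (put16, encode_adda_l_imm, encode_lea_d16) are hypothetical and not part of GCC or binutils. The opcode layouts are assumed from the documented 68k encodings: ADDA.L with an immediate operand carries a 32-bit immediate, while LEA with a 16-bit displacement carries a single extension word. This matches the listing, where addresses 28 and 2e are six bytes apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit word big-endian, as in the 68k instruction stream. */
void put16(uint8_t *p, uint16_t w)
{
    p[0] = (uint8_t)(w >> 8);
    p[1] = (uint8_t)(w & 0xffu);
}

/* adda.l #imm,%aN
   Opcode word 1101 nnn 111 111100, followed by a 32-bit immediate.
   Returns the encoded length in bytes (always 6). */
size_t encode_adda_l_imm(unsigned areg, uint32_t imm, uint8_t out[6])
{
    assert(areg < 8);
    put16(out, (uint16_t)(0xD1FCu | (areg << 9)));
    put16(out + 2, (uint16_t)(imm >> 16));
    put16(out + 4, (uint16_t)(imm & 0xffffu));
    return 6;
}

/* lea %aN@(disp),%aN
   Opcode word 0100 nnn 111 101 nnn, followed by a 16-bit displacement.
   Returns the encoded length in bytes (always 4). */
size_t encode_lea_d16(unsigned areg, int16_t disp, uint8_t out[4])
{
    assert(areg < 8);
    put16(out, (uint16_t)(0x41E8u | (areg << 9) | areg));
    put16(out + 2, (uint16_t)disp);
    return 4;
}
```

Under these assumptions, encode_adda_l_imm(1, 16, buf) produces the bytes d3 fc 00 00 00 10, and encode_lea_d16(1, 16, buf) produces 43 e9 00 10, so each of the two replacements would save two bytes. The sketch only covers increments that fit in a signed 16-bit displacement.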
[Bug c/36133] New: GCC creates suboptimal ASM : Code includes unneeded TST instructions
Hello,

The ASM code created by GCC 4.2.1 for the 68k/Coldfire platform is not optimal. Compared with the ASM output of GCC 2.9, the code generated by GCC 4.2 is in places much worse.

One problem visible in many places is that GCC emits unnecessary TST instructions.

Please see the example below for details. The TST.L instruction at address 36 is inside the loop and is not needed. Both the lsrl at address 10 and the subql #1,%d0 at address 34 already set the condition codes, so there is no need for an extra TST instruction.

Example:

C source code:

void * copy_32x4a(void *destparam, const void *srcparam, size_t size)
{
        int *dest = destparam;
        const int *src = srcparam;
        int size32;
        size32 = size / 16;
        for (; size32; size32--) {
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
                *dest++ = *src++;
        }
}

Compile options:
m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c

Code generated by GCC 4.2:

  04: 202f 000c       movel %sp@(12),%d0
  08: 226f 0004       moveal %sp@(4),%a1
  0c: 206f 0008       moveal %sp@(8),%a0
  10: e888            lsrl #4,%d0
  12: 6022            bras 36
  14: 2290            movel %a0@,%a1@
  16: 2368 0004 0004  movel %a0@(4),%a1@(4)
  1c: 2368 0008 0008  movel %a0@(8),%a1@(8)
  22: 2368 000c 000c  movel %a0@(12),%a1@(12)
  28: d3fc 0010       addal #16,%a1
  2e: d1fc 0010       addal #16,%a0
  34: 5380            subql #1,%d0
  36: 4a80            tstl %d0
  38: 66da            bnes 14
  3a: 4e75            rts

For comparison, here is the code you would expect:

  04: 202f 000c       movel %sp@(12),%d0
  08: 226f 0004       moveal %sp@(4),%a1
  0c: 206f 0008       moveal %sp@(8),%a0
  10: e888            lsrl #4,%d0
  12: 6022            beq 20
  14: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  16: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  18: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  1a: 20d9            movel [EMAIL PROTECTED],[EMAIL PROTECTED]
  1c: 5380            subql #1,%d0
  1e: 66da            bnes 14
  20: 4e75            rts

Compiler used:
m68k-linux-gnu-gcc -v
Using built-in specs.
Target: m68k-linux-gnu
Configured with: /scratch/shinwell/cf-fall-linux-lite/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=m68k-linux-gnu --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-arch=cf --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --enable-shared --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=Sourcery G++ Lite 4.2-47 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --prefix=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux --with-sysroot=/opt/freescale/usr/local/gcc-4.2.47-eglibc-2.5.47/m68k-linux/m68k-linux-gnu/libc --with-build-sysroot=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/libc --enable-poison-system-directories --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin --with-build-time-tools=/scratch/shinwell/cf-fall-linux-lite/install/m68k-linux-gnu/bin
Thread model: posix
gcc version 4.2.1 (Sourcery G++ Lite 4.2-47)

I hope that this report helps you to improve the quality of GCC.

Kind regards
Gunnar von Boehn

--
Summary: GCC creates suboptimal ASM : Code includes unneeded TST instructions
Product: gcc
Version: 4.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: gunnar at greyhound-data dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: m68k-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133
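The claim in the report above (bug 36133), that the loop behaves the same without the TST, can be sketched with a small model in C. This is an illustration under stated assumptions, not a description of GCC internals or a substitute for the processor manual. The names struct cpu, lsr_l, subq_l, tst_l, iterations_gcc42 and iterations_expected are hypothetical. The model assumes the documented rule that LSR, SUBQ (on a data register) and TST each set the Z flag exactly when the 32-bit result is zero. The two functions follow the control flow of the two listings and count how often the loop body runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal machine state: register d0 and the Z condition code. */
struct cpu {
    uint32_t d0;
    bool z;
};

/* lsr.l #count,%d0: logical shift right; Z is set when the result is zero. */
void lsr_l(struct cpu *c, unsigned count)
{
    assert(count >= 1 && count <= 8);
    c->d0 >>= count;
    c->z = (c->d0 == 0);
}

/* subq.l #imm,%d0: subtract a small immediate; Z is set when the result is zero. */
void subq_l(struct cpu *c, unsigned imm)
{
    assert(imm >= 1 && imm <= 8);
    c->d0 -= imm;
    c->z = (c->d0 == 0);
}

/* tst.l %d0: compare d0 with zero; Z is set when d0 is zero. */
void tst_l(struct cpu *c)
{
    c->z = (c->d0 == 0);
}

/* Control flow of the GCC 4.2 listing; returns the number of loop-body runs. */
unsigned iterations_gcc42(uint32_t size)
{
    struct cpu c = { size, false };
    unsigned n = 0;
    lsr_l(&c, 4);              /* 10: lsrl #4,%d0 */
    for (;;) {                 /* 12: bras 36 */
        tst_l(&c);             /* 36: tstl %d0 */
        if (c.z)               /* 38: bnes 14 falls through when Z is set */
            break;
        n++;                   /* 14..2e: copy four longwords, advance pointers */
        subq_l(&c, 1);         /* 34: subql #1,%d0 */
    }
    return n;
}

/* Control flow of the expected listing, which has no TST. */
unsigned iterations_expected(uint32_t size)
{
    struct cpu c = { size, false };
    unsigned n = 0;
    lsr_l(&c, 4);              /* 10: lsrl #4,%d0 already sets Z */
    if (c.z)                   /* 12: beq 20 */
        return 0;
    do {
        n++;                   /* 14..1a: copy four longwords */
        subq_l(&c, 1);         /* 1c: subql #1,%d0 already sets Z */
    } while (!c.z);            /* 1e: bnes 14 */
    return n;
}
```

Within this model both functions return size / 16 for any input, which is what the report argues. Whether the model's flag rule holds for a given core should still be checked against the official 68k or ColdFire documentation.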