[Bug target/29996] sh-elf: should enable -fomit-frame-pointer
--- Comment #1 from chrbr at gcc dot gnu dot org 2010-07-16 11:34 ---
Done since http://gcc.gnu.org/ml/gcc-patches/2010-01/msg01147.html. This needed ACCUMULATE_OUTGOING_ARGS to fix unwinding (the previous behavior can be restored with -mno-accumulate-outgoing-args -fno-omit-frame-pointer).

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|UNCONFIRMED|RESOLVED
Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29996
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #36 from chrbr at gcc dot gnu dot org 2010-02-10 12:02 ---
(In reply to comment #33)
> Your fix of the middle end looks plausible but I think the target shouldn't generate a CP at the eh landing pad anyway. I'll commit the hunk below anyway after your patch for pic problem is installed.

Done, you can commit your workaround.

> @@ -4654,6 +4654,13 @@ find_barrier (int num_mova, rtx mova, rt
>    if (last_got)
>      from = PREV_INSN (last_got);
>
> +  /* Don't insert the constant pool table at the position which
> +     may be the landing pad.  */
> +  if (flag_exceptions
> +      && CALL_P (from)
> +      && find_reg_note (from, REG_EH_REGION, NULL_RTX))
> +    from = PREV_INSN (from);
> +
>    /* Walk back to be just before any jump or label.
>       Putting it before a label reduces the number of times the branch
>       around the constant pool table will be hit.  Putting it before

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|REOPENED|RESOLVED
Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #34 from chrbr at gcc dot gnu dot org 2010-02-05 08:26 ---
(In reply to comment #33)
> Your fix of the middle end looks plausible but I think the target shouldn't generate a CP at the eh landing pad anyway. I'll commit the hunk below anyway after your patch for pic problem is installed.

OK. I didn't check the code quality difference between the middle-end fix and yours. Since there is no fallthrough into the landing pad, and locality with the following exception region is not important (assuming the exception handler is not on the critical path), I was expecting, on the contrary, that the landing pad would be a good place for the constant pool.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #32 from chrbr at gcc dot gnu dot org 2010-02-05 07:05 ---
> Looks smart and clean! One minor nit: I guess that the occurrences of gbr and GBR in the ChangeLog and comments should be replaced with GOT to avoid confusion with the GBR register of the SH CPU.

Thanks for catching this error in the comment. I meant GP, of course, which is even preferable to GOT (GOT is what we load, not what we compute).

(In reply to comment #31)
> When you propose it to the list, could you please separate the third hunk, which is for the original PR42841, as an independent patch? Also don't forget to update the copyright years in the first one.

OK, that was also my intention: to submit the 3rd hunk (the one that fixes the jump to the landing pad around the constant table, right?) as a separate patch, since it will require the approval of a middle-end maintainer. If it cannot go into trunk before the 4.5 freeze, I suggest that you commit your workaround (comment #23) so as not to block the regression fix. We can then revert it once the proper patch is discussed/accepted. (I'm a little bit late for that, sorry.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #28 from chrbr at gcc dot gnu dot org 2010-02-03 08:30 ---
Hello Kaj, thanks for the proposal. But I'm wondering whether preventing the scheduling of the mov.l and mova instructions isn't overkill. (sh_reorg comes after the scheduler, but even if it didn't, it should be OK to move instructions up; the R0 live range between the add and the load is another, more general problem.) Am I missing something? We only want to prevent the CP from being inserted between those 2 instructions; no further blockages are necessary.

I'm working on something that tracks the GOT loading accesses during the find_barrier walk and then, at the end, reverts to the latest safe place. It works on the example, but the full Linux distribution rebuild and validation is still ongoing.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
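The approach described above (remember where a GOT load starts while walking forward, then back up to the last safe point) can be illustrated with a toy model. This is a hypothetical sketch, not the find_barrier code in config/sh/sh.c: instructions are reduced to three made-up kinds, and `toy_pick_barrier` returns the index before which the constant pool would be dumped, given that it must go no later than index `limit`.

```c
#include <stddef.h>

/* Toy instruction kinds; these are hypothetical and are not GCC's RTL.  */
enum toy_insn
{
  TOY_OTHER,    /* any instruction that does not matter here */
  TOY_GOT_MOVL, /* mov.l .Lx,r12 : first half of the GOT load */
  TOY_GOT_MOVA  /* mova  .Lx,r0  : second half of the GOT load */
};

#define NO_GOT ((size_t) -1)

/* Return the index before which the constant pool is placed, given that it
   must go no later than index LIMIT.  If LIMIT falls between the mov.l that
   starts a GOT load and its matching mova, back up to just before the mov.l
   so that both instructions can reference the same pool entry.  */
static size_t
toy_pick_barrier (const enum toy_insn *insns, size_t n, size_t limit)
{
  size_t last_got = NO_GOT; /* start of a still-open GOT sequence, if any */
  size_t end = limit < n ? limit : n;

  for (size_t i = 0; i < end; i++)
    {
      if (insns[i] == TOY_GOT_MOVL)
        last_got = i;
      else if (insns[i] == TOY_GOT_MOVA)
        last_got = NO_GOT; /* sequence closed: any later point is safe */
    }

  return last_got != NO_GOT ? last_got : end;
}
```

With the sequence `OTHER, GOT_MOVL, OTHER, GOT_MOVA, OTHER`, a limit of 3 falls inside the GOT sequence, so the toy backs up to index 1; a limit of 4 is already past the mova and is kept.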
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #30 from chrbr at gcc dot gnu dot org 2010-02-03 13:12 ---
Created an attachment (id=19794)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19794&action=view)
patch to fix GOT access load with constant pool

Patch under validation.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #26 from chrbr at gcc dot gnu dot org 2010-02-01 16:30 ---
I'm afraid the unaligned-access SIGBUS regression is another latent bug merely exposed by the fix for the original PR :-(

What happens is that the GOT loading sequence is broken by a constant pool. We end up emitting:

        mov.l   .L542,r12   (X)
        bra     .L516
        nop
.L542:
        .align 2
        .long   _GLOBAL_OFFSET_TABLE_
        ...
.L516:
        mova    .L545,r0    (Y)
!       add     r0,r12
.L545:
        .long   _GLOBAL_OFFSET_TABLE_

The reason is that the second (mova) instruction is, unluckily, now out of range by 2 bytes (which could happen in any other situation, even without this patch).

IMHO we should forbid the duplication of a _GLOBAL_OFFSET_TABLE_ loading constant inside an UNSPEC_MOVA sequence. We should probably reduce si_limit in find_barrier when a

(set (reg:SI 0 r0) (unspec:SI [ (const:SI (unspec:SI [ (symbol_ref (*_GLOBAL_OFFSET_TABLE_))

is met and the next insn is

(set (reg:SI 12 r12) (const:SI (unspec:SI [ (symbol_ref (*_GLOBAL_OFFSET_TABLE_)

in PIC. I'm experimenting with a couple of different solutions in this direction. This PR was a really interesting bug finder!

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
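Why splitting the pool breaks the GOT pointer can be checked with a little arithmetic. The sketch below assumes that a `.long _GLOBAL_OFFSET_TABLE_` placed at address P is resolved by the assembler/linker to GOT - P (a GOTPC-style PC-relative relocation); under that assumption, `mov.l` loads GOT minus the address of its own slot, `mova` yields the address of its slot, and their sum is the GOT only if both refer to the same slot. The second function models the `mova @(disp,PC),R0` reach (unsigned 8-bit displacement scaled by 4, relative to `(PC & ~3) + 4`). Both functions are illustrative helpers, not GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of the SH PIC GOT-pointer computation:
     mov.l  .La,r12   ! r12 = GOT - &.La   (GOTPC-style constant)
     mova   .Lb,r0    ! r0  = &.Lb
     add    r0,r12    ! r12 = GOT - &.La + &.Lb
   The result is the GOT address only if .La and .Lb are the same slot.  */
static uint32_t
got_pointer (uint32_t got, uint32_t slot_movl, uint32_t slot_mova)
{
  uint32_t r12 = got - slot_movl; /* value stored in the slot used by mov.l */
  uint32_t r0 = slot_mova;        /* address computed by mova */
  return r12 + r0;
}

/* mova @(disp,PC),R0 computes (PC & ~3) + 4 + disp * 4 with an unsigned
   8-bit disp, so a target is reachable only if it is 4-byte aligned and
   lies 0..1020 bytes past (PC & ~3) + 4.  */
static bool
mova_reaches (uint32_t pc, uint32_t target)
{
  uint32_t base = (pc & ~(uint32_t) 3) + 4;
  return target % 4 == 0 && target >= base && target - base <= 255 * 4;
}
```

For example, with the GOT at 0x10000, using slots 0x400 for both instructions yields 0x10000, while slots 0x400 and 0x800 yield 0x10400, a bogus GOT pointer of the kind that can lead to the faults reported above.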
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #25 from chrbr at gcc dot gnu dot org 2010-01-29 08:59 ---
By the way, FYI, trying to explain the differences between your results and mine for sh4-linux: my build is configured with --enable-target-optspace, so all my runtime build tests are run with -Os, not -O2 like yours, which could make a huge difference in CP layout... I'll rerun with -O2 over the weekend.
Cheers

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #22 from chrbr at gcc dot gnu dot org 2010-01-28 13:09 ---
Hmm, looks like a latent bug. By accident, the CP is inserted before a compact_jump, which enables further redirect-jump optimisation. I don't think it is directly related to the fix, but let's work on it a little bit more. So, just before dbr, we have:

    jump_insn -> 2586
    a constant pool
    L2586
    jump_insn -> 3394
    L3394:
    ...

Then, in reorg_redirect_jump, we redirect the jump over the CP and call delete_related_insn, so the code between the CP and the jump becomes dead and we have:

    jump_insn -> 3394
    a constant pool
    L3394
    ...

But the label L2586 is used in the exception table... and thus remains undefined. Now my question: how can the exception table refer to a region delimited by deleted labels? It should be built after dbr, shouldn't it?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #24 from chrbr at gcc dot gnu dot org 2010-01-29 07:46 ---
Created an attachment (id=19747)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19747&action=view)
fixed removal of landing pad label rtx

The landing_pad label rtx was created and recorded in tree_inline (duplicate_eh_regions). It seems that reorg_redirect_jump or delete_insn should check for it before deciding that it can be removed. I'm testing this patch, which does this.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
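The idea behind the check (a label that is still referenced from outside the insn stream, here the exception table, must survive even when no jump targets it any more) can be sketched with a toy label record. The struct and both functions below are hypothetical stand-ins, not GCC's CODE_LABEL handling or the attached patch.

```c
#include <stdbool.h>

/* Hypothetical label record; not GCC's CODE_LABEL.  */
struct toy_label
{
  int jump_refs;       /* number of jumps that still target this label */
  bool is_landing_pad; /* label is referenced from the exception table */
};

/* A label may be deleted only when no jump targets it and nothing outside
   the instruction stream (here: the EH table) still refers to it.  */
static bool
toy_label_deletable (const struct toy_label *l)
{
  return l->jump_refs == 0 && !l->is_landing_pad;
}

/* Redirect one jump from OLD_L to NEW_L and report whether OLD_L
   (and the dead code after it) could now be removed.  */
static bool
toy_redirect_jump (struct toy_label *old_l, struct toy_label *new_l)
{
  old_l->jump_refs--;
  new_l->jump_refs++;
  return toy_label_deletable (old_l);
}
```

In the scenario from comment #22, redirecting the jump away from L2586 leaves it without jump references, but because it is a landing pad the toy refuses to delete it, which is the behavior the patch aims for.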
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #17 from chrbr at gcc dot gnu dot org 2010-01-27 12:50 ---
Strange, I didn't see that, not even the undefined symbol in the assembler. OK, I'll disable the fix until this is clarified. Let me recheck on silicon; I'll let you know.
-c

(In reply to comment #16)
> I've got some new libstdc++-v3 testsuite failures with the patch on my nightly sh4-linux tester:
>
> Running /exp/ldroot/dodes/ORIG/trunk/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
> FAIL: 23_containers/deque/requirements/exception/basic.cc (test for excess errors)
> WARNING: 23_containers/deque/requirements/exception/basic.cc compilation failed to produce executable
> FAIL: 23_containers/deque/requirements/exception/propagation_consistent.cc (test for excess errors)
> WARNING: 23_containers/deque/requirements/exception/propagation_consistent.cc compilation failed to produce executable
> FAIL: 30_threads/packaged_task/members/get_future.cc execution test
> FAIL: 30_threads/shared_future/members/get.cc execution test
>
> The first failure is
>
> /tmp/ccl5TCl4.s: Assembler messages:
> /tmp/ccl5TCl4.s:43070: Error: undefined symbol `.L3394' in operation
> FAIL: 23_containers/deque/requirements/exception/basic.cc (test for excess errors)
>
> The last 2 failures result from unaligned accesses. I saw
>
> Sending SIGBUS to get_future.exe due to unaligned access (PC 296554a8 PR 2965549a)
> Sending SIGBUS to get.exe due to unaligned access (PC 296554a8 PR 2965549a)
>
> on the target machine. With the first hunk of the patch reverted, these errors go away. Christian, could you please revert or disable the first hunk of the patch temporarily? Sorry I didn't catch this earlier.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #18 from chrbr at gcc dot gnu dot org 2010-01-27 13:24 ---
Subject: Bug 42841

Author: chrbr
Date: Wed Jan 27 13:24:40 2010
New Revision: 156282

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156282
Log:
temporarily revert fix for PR target/42841

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #19 from chrbr at gcc dot gnu dot org 2010-01-27 13:40 ---
To make sure we are in the same testing/configuration environment, could you please send me the preprocessed file for 23_containers/deque/requirements/exception/propagation_consistent.cc, as well as the compilation line from libstdc++.log that you used?
Many thanks,
Christian

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #21 from chrbr at gcc dot gnu dot org 2010-01-27 18:13 ---
This one is marked as unsupported in my sh-superh-elf log, but I can now reproduce it on sh4-linux (despite having rebuilt a whole distribution without seeing it :O). Anyway, I'm investigating. I'm reopening the bug and will revert on the branches as well if I don't find a quick solution.
Regards

(In reply to comment #20)
> Created an attachment (id=19729)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19729&action=view)
> A test case
>
> cc1plus -std=gnu++0x -O2 propagation_consistent.ii
> produces problematic code here.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|RESOLVED|REOPENED
Resolution|FIXED|

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #11 from chrbr at gcc dot gnu dot org 2010-01-26 07:20 ---
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:20:27 2010
New Revision: 156229

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156229
Log:
fix PR target/42841

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #12 from chrbr at gcc dot gnu dot org 2010-01-26 07:22 ---
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:21:57 2010
New Revision: 156230

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156230
Log:
fix PR target/42841

Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/sh/sh.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #13 from chrbr at gcc dot gnu dot org 2010-01-26 07:28 ---
Subject: Bug 42841

Author: chrbr
Date: Tue Jan 26 07:28:05 2010
New Revision: 156231

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156231
Log:
fix PR target/42841

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/config/sh/sh.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug target/42841] [4.3/4.4/4.5 Regression] SH: Assembler complains pcrel too far.
--- Comment #14 from chrbr at gcc dot gnu dot org 2010-01-26 07:29 ---
Fixed in 4.5, 4.3 and 4.4.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|NEW|RESOLVED
Resolution||FIXED
Target Milestone|---|4.5.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42841
[Bug treelang/41639] New: synchronisation primitives take unsigned as input and output values.
The current implementation of the synchronization builtins in GCC (from http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/intref_cls/common/intref_itanium_synchro_prim.htm) describes type as unsigned, although it is stated that type is either a 32-bit or 64-bit integer. Consequently, testsuite tests such as sync-2.c:

    if (__sync_sub_and_fetch(AI+13, 12) != (char)-12)
      abort ();

might fail (unless the target/runtime-dependent primitive implementation artificially changes the return type).

--
Summary: synchronisation primitives take unsigned as input and output values.
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: trivial
Priority: P3
Component: treelang
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41639
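The failure mode is plain integer promotion: if a 1-byte sub_and_fetch is typed as returning an unsigned value, 0 - 12 comes back as 244, which never compares equal to -12. The two functions below are hypothetical, non-atomic stand-ins used only to show the result values; they are not the GCC builtins. They use `signed char` explicitly so the example does not depend on whether plain `char` is signed on the host (it is signed on SH and x86).

```c
/* Non-atomic, hypothetical models of a 1-byte sub_and_fetch whose result
   type is unsigned (the reported behavior) or signed (the intended one).
   They only illustrate the value of the result, not atomicity.  */
static unsigned char
toy_sub_and_fetch_u8 (unsigned char *p, unsigned char v)
{
  *p = (unsigned char) (*p - v); /* 0 - 12 wraps to 244 */
  return *p;
}

static signed char
toy_sub_and_fetch_s8 (signed char *p, signed char v)
{
  *p = (signed char) (*p - v); /* 0 - 12 stays -12 */
  return *p;
}
```

Comparing the unsigned result against `(signed char) -12` promotes both sides to `int`, giving `244 != -12`, which is exactly the kind of check that makes the sync-2.c test call `abort ()`.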
[Bug treelang/41639] synchronisation primitives take unsigned as input and output values.
--- Comment #1 from chrbr at gcc dot gnu dot org 2009-10-09 07:12 ---
Created an attachment (id=18758)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18758&action=view)
Fix synchronisation parameter/output signedness

The attached patch gives the correct semantics, but it should be checked on targets that use them (pa/arm) for possible legacy regressions (tested on SH with a non-Linux, in-house runtime implementation).

2009-10-08  Christian Bruel  <christian.br...@st.com>

        * builtin-types.def (BT_I[1,2,4,8,16): Set signed.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41639
[Bug tree-optimization/41486] New: cselim is not dse aware
The cselim pass introduces a conditional store but does not remove the original one. If the original store is not removed by DSE, this results in worse code. Original thread: http://gcc.gnu.org/ml/gcc-patches/2009-09/msg01955.html. On machines without predicated stores, disabling this optimization is generally a win, but only as a workaround.

--
Summary: cselim is not dse aware
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41486
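The attached case is not reproduced here, so the following is only a hedged illustration of the general shape, assuming a store that precedes a conditional store to the same location. After a cselim-like rewrite, the value is selected first and then stored unconditionally. The earlier store becomes dead, and only DSE can remove it. The exact GIMPLE that GCC's pass produces may differ; the two functions just show that the shapes are equivalent and where the leftover store sits.

```c
/* Source shape: a store followed by a conditional store to the same place.  */
static void
store_before (int *p, int c, int x)
{
  *p = 0;
  if (c)
    *p = x;
}

/* Roughly the shape after conditional-store elimination (a sketch): the
   stored value is selected, then stored unconditionally.  The first store
   is now dead; if DSE does not remove it, every call performs two stores,
   whereas the original performed the second one only when c is nonzero.  */
static void
store_after (int *p, int c, int x)
{
  *p = 0;              /* dead store left behind for DSE */
  int tmp = c ? x : 0; /* value *p would hold on either path */
  *p = tmp;
}
```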
[Bug tree-optimization/41486] cselim is not dse aware
--- Comment #1 from chrbr at gcc dot gnu dot org 2009-09-28 10:53 ---
Created an attachment (id=18665)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18665&action=view)
case

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41486
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--- Comment #9 from chrbr at gcc dot gnu dot org 2009-03-12 15:04 ---
The attached patch improves SH code generation, but I noticed a small regression on ARM, which could previously make use of a shifted-constant addressing mode and so did not need an extra register for the value. A target description check should be done while expanding the canonicalization: it should be done only when a base+cst addressing mode exists and the cst would otherwise have to be shifted in a register. Any input on how to make this transformation target-conditional is welcome. No performance impact was noticed on i686, however.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--- Comment #10 from chrbr at gcc dot gnu dot org 2009-03-12 15:10 ---
Created an attachment (id=17447)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17447&action=view)
SH illustrative patch for feedback only. Win on SH, loss on ARM.

2009-03-12  Christian Bruel  <christian.br...@st.com>

        * fold-const.c (fold_plusminus_mult_expr): Move canonicalization of index+cst...
        * expr.c (expand_expr_real_1): ... here.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--- Comment #2 from chrbr at gcc dot gnu dot org 2009-03-11 08:46 ---
I observed some large performance regressions in 4.3 and later for many benchmarks on the SuperH targets. There are many causes, but the main one reduces to the indirect+offset access:

    int foo (int tab[], int index)
    {
      return tab [index+1];
    }

compiles (-O2 -fomit-frame-pointer) into

        mov     r5,r0
        add     #1,r0
        shll2   r0
        rts
        mov.l   @(r0,r4),r0

instead of

        shll2   r5
        add     r4,r5
        rts
        mov.l   @(4,r5),r0

Note that in more complex code the problem is amplified because only the r0 register class can be used as an index register, putting extra pressure on reload.

It seems that the problem lies in the way the constant index is now hidden by GIMPLE, so we now have

    return *(tab + ((unsigned int) index + 1) * 4)

instead of

    return *(tab + 4B + (int *) ((unsigned int) index * 4))

It seems easier to change GIMPLE, but this is a target-dependent transformation. On the other hand, RTL code generation should be able to redistribute the factorization, but that seems like extra work to undo what was done previously.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|RESOLVED|UNCONFIRMED
Resolution|INVALID|
Summary|[|[SH] performance regression: lost mov @(disp,Rn)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
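The two address forms compute the same value. What differs is whether the constant part survives as a separate term that fits an addressing mode. The sketch below spells that out for `tab[index + c]` with 4-byte `int`. It also models the SH `mov.l @(disp,Rn)` displacement, which is a 4-bit field scaled by 4, so only byte offsets 0..60 in steps of 4 fit. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* SH mov.l @(disp,Rn),Rm takes a 4-bit displacement scaled by 4,
   i.e. byte offsets 0, 4, ..., 60.  */
static bool
sh_movl_disp_ok (long byte_off)
{
  return byte_off >= 0 && byte_off <= 60 && byte_off % 4 == 0;
}

/* &tab[index + c] in the factored form: tab + (index + c) * 4.
   The constant is buried inside the product.  */
static uintptr_t
addr_factored (uintptr_t tab, long index, long c)
{
  return tab + (uintptr_t) ((index + c) * 4);
}

/* The distributed form: (tab + index * 4) + c * 4.  Here c * 4 is a
   separate constant that can become the displacement of mov.l.  */
static uintptr_t
addr_distributed (uintptr_t tab, long index, long c)
{
  return tab + (uintptr_t) (index * 4) + (uintptr_t) (c * 4);
}
```

For `tab[index+1]` the leftover constant is 4, which `sh_movl_disp_ok` accepts. That corresponds to the `mov.l @(4,r5),r0` sequence in the expected code above, while the factored form forces the r0-indexed `mov.l @(r0,r4),r0`.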
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
AssignedTo|unassigned at gcc dot gnu dot org|chrbr at gcc dot gnu dot org
Status|UNCONFIRMED|ASSIGNED
Ever Confirmed|0|1
Last reconfirmed date|-00-00 00:00:00|2009-03-11 08:52:58

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--- Comment #4 from chrbr at gcc dot gnu dot org 2009-03-11 09:30 ---
(In reply to comment #3)
> See http://gcc.gnu.org/ml/gcc-patches/2008-12/msg01134.html

Thanks. I tried your patch against a 4.3.3 base, but it didn't fix the problem: your patch canonicalizes, while what I need is a distribution, (base + 1) * 4 = base * 4 + 4.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug target/39423] [SH] performance regression: lost mov @(disp,Rn)
--- Comment #8 from chrbr at gcc dot gnu dot org 2009-03-11 14:07 ---
I have selectively disabled the canonicalization in fold_plusminus_mult_expr for identical constants that are powers of 2, so my mov @(disp, rn) is back :-(. For some reason, your patch let the base+index factorization through.

This is experimental for now, because expand_expr needs to be extended to repair expressions like return ((a * 4) + 4) that are not an indirect_ref (this works because we distinguish PLUS_EXPR from POINTER_PLUS_EXPR).

+++ fold-const.c    2009-03-11 13:49:40.0 +0100
@@ -7431,7 +7431,10 @@
   same = NULL_TREE;

   if (operand_equal_p (arg01, arg11, 0))
-    same = arg01, alt0 = arg00, alt1 = arg10;
+    {
+      if (code != PLUS_EXPR || exact_log2 (TREE_INT_CST_LOW (arg01)) == -1)
+        same = arg01, alt0 = arg00, alt1 = arg10;
+    }
   else if (operand_equal_p (arg00, arg10, 0))
     same = arg00, alt0 = arg01, alt1 = arg11;
   else if (operand_equal_p (arg00, arg11, 0))

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug target/39423] New: [
--
Summary: [
Product: gcc
Version: 4.3.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39423
[Bug c++/39391] New: argument dependant name lookup don't catch pointer to function
Ref. ISO/IEC C++ standard, section 3.4.2. GCC correctly reports an error when the argument is of a fundamental type and the set of associated namespaces is empty, as for the call to 'f' in the attached example. However, if the argument is a pointer to function, the associated namespace should be the one associated with the function. So it seems to me that the call to 'h' should not generate an error.

--
Summary: argument dependant name lookup don't catch pointer to function
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39391
[Bug c++/39391] argument dependant name lookup don't catch pointer to function
--- Comment #1 from chrbr at gcc dot gnu dot org 2009-03-06 14:05 ---
Created an attachment (id=17406)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17406&action=view)
Example

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39391
[Bug tree-optimization/35178] Misaligned Accesses on arrays of packed stucts
--- Comment #4 from chrbr at gcc dot gnu dot org 2008-02-19 07:53 ---
Fixed in mainline.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
URL||http://gcc.gnu.org/ml/gcc-patches/2008-02/msg00690.html
Status|UNCONFIRMED|RESOLVED
Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35178
[Bug middle-end/23868] [4.1/4.2/4.3 regression] builtin_apply uses wrong mode for multi-hard-register return values
--- Comment #11 from chrbr at gcc dot gnu dot org 2008-02-15 13:08 ---
If no one has started to do so, I'm resurrecting this patch for mainline; I can test builtin-apply4.c on sh4. BTW, builtin-apply4.c doesn't currently fail in the testsuite because it is restricted to { target { { i?86-*-* x86_64-*-* } ilp32 } }.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
CC||kkojima at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23868
[Bug c/35178] New: Misaligned Accesses on arrays of packed stucts
This bug was noticed on SH but potentially impacts other STRICT_ALIGNMENT targets. The attached test case reduces a misaligned field access from an array of packed structs. In this example, compiled with -O2, the field wMaxPacketSize is accessed (on sh4) with a mov.w instruction although it is only byte-aligned.

It seems that the tree-ssa-loop ivopts do not check packed struct offsets indexed by an induction variable. Thus, if the struct size is not a multiple of the field's alignment, the field becomes unaligned after the first iteration, even if it was aligned relative to the base of the structure. The proposed patch solves this problem by extending may_be_unaligned_p to check that a loop-carried offset is a multiple of the desired alignment.

--
Summary: Misaligned Accesses on arrays of packed stucts
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org
GCC target triplet: sh4-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35178
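The arithmetic behind the bug can be shown with an assumed layout, since the attached test case is not reproduced here. The struct below is a hypothetical 7-byte packed record modelled on a USB endpoint descriptor, with `wMaxPacketSize` at offset 4. The field is 2-byte aligned in element 0 but lands at odd offsets in odd-numbered elements, because the per-iteration step (7) is not a multiple of 2. `may_be_unaligned` only illustrates the kind of check described above; it is not GCC's `may_be_unaligned_p`. It assumes the array base itself is suitably aligned. The struct uses the GCC `packed` attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 7-byte packed layout (modelled on a USB endpoint
   descriptor); not the PR's actual test case.  */
struct __attribute__ ((packed)) ep_desc
{
  uint8_t bLength;
  uint8_t bDescriptorType;
  uint8_t bEndpointAddress;
  uint8_t bmAttributes;
  uint16_t wMaxPacketSize; /* offset 4 within each element */
  uint8_t bInterval;
};

/* Byte offset of arr[i].wMaxPacketSize from the start of the array.  */
static size_t
field_offset (size_t i)
{
  return i * sizeof (struct ep_desc)
         + offsetof (struct ep_desc, wMaxPacketSize);
}

/* An access at base_off + step * iv is aligned on every iteration only if
   both the initial offset and the per-iteration step are multiples of the
   required alignment.  */
static int
may_be_unaligned (size_t base_off, size_t step, size_t align)
{
  return base_off % align != 0 || step % align != 0;
}
```

Here `field_offset (0)` is 4 (aligned) but `field_offset (1)` is 11, which is why a `mov.w` is only safe on the first iteration.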
[Bug c/35178] Misaligned Accesses on arrays of packed stucts
--- Comment #1 from chrbr at gcc dot gnu dot org 2008-02-13 13:42 ---
Created an attachment (id=15138)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15138&action=view)
gcc testsuite case

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35178
[Bug c/35178] Misaligned Accesses on arrays of packed stucts
--- Comment #2 from chrbr at gcc dot gnu dot org 2008-02-13 13:45 ---
Created an attachment (id=15139)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15139&action=view)
Proposed patch

Regression tested on sh-superh-elf and sh4-linux-gnu, and bootstrapped on i686-pc-linux-gnu.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35178
[Bug target/34807] New: SH4 ‘R0_REGS’ spill failure when using asm
Building the attached file (extracted and reduced from uClibc) with -O[1,2,3,s] -fPIC fails:

test.c: In function ‘_start’:
test.c:14: error: unable to find a register to spill in class ‘R0_REGS’
test.c:14: error: this is the insn:
(insn 16 28 17 0 (set (mem/c/i:SI (plus:SI (reg:SI 12 r12)
                (reg/f:SI 1 r1 [160])) [0 buf+0 S4 A32])
        (reg/v:SI 0 r0 [ __sc0 ])) 179 {movsi_ie} (insn_list:REG_DEP_TRUE 13 (insn_list:REG_DEP_TRUE 11 (nil)))
    (expr_list:REG_DEAD (reg/f:SI 1 r1 [160])
        (expr_list:REG_DEAD (reg/v:SI 0 r0 [ __sc0 ])
            (nil))))
test.c:14: internal compiler error: in spill_failure

--
Summary: SH4 R0_REGS spill failure when using asm
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: chrbr at gcc dot gnu dot org
ReportedBy: chrbr at gcc dot gnu dot org
GCC target triplet: sh-superh-elf,sh4-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34807
[Bug target/34807] SH4 ‘R0_REGS’ spill failure when using asm
--- Comment #1 from chrbr at gcc dot gnu dot org 2008-01-16 08:47 ---
Created an attachment (id=14945)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14945&action=view)
Test case; build with sh-superh-elf-gcc -O1 -fPIC test.c -S

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34807
[Bug target/34807] SH4 ‘R0_REGS’ spill failure when using asm
--- Comment #2 from chrbr at gcc dot gnu dot org 2008-01-16 11:15 ---
Created an attachment (id=14946)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14946&action=view)
fails with 4.2.2 and 4.3.0

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34807
[Bug c++/19531] NRV is performed on volatile temporary
--- Comment #7 from chrbr at gcc dot gnu dot org 2007-10-31 07:56 ---
Subject: Bug 19531

Author: chrbr
Date: Wed Oct 31 07:55:46 2007
New Revision: 129792

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129792
Log:
fix PR c++/19531: NRV is performed on volatile temporary

Added:
    trunk/gcc/testsuite/g++.dg/opt/nrv8.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/typeck.c
    trunk/gcc/testsuite/ChangeLog

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19531
[Bug c++/19531] NRV is performed on volatile temporary
--- Comment #8 from chrbr at gcc dot gnu dot org 2007-10-31 08:01 ---
Fixed in check_return_expr.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|ASSIGNED|RESOLVED
Resolution||FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19531
[Bug rtl-optimization/15473] Sibcall optimization for libcalls.
--- Comment #4 from chrbr at gcc dot gnu dot org 2007-10-09 08:36 ---
*** Bug 32684 has been marked as a duplicate of this bug. ***

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
CC||pinskia at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15473
[Bug tree-optimization/32684] Missed tail call with sin/cos and sincos pass
--- Comment #2 from chrbr at gcc dot gnu dot org 2007-10-09 08:36 ---
I think this is a duplicate of #15473 (Sibcall optimization for libcalls).

*** This bug has been marked as a duplicate of 15473 ***

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|NEW|RESOLVED
Resolution||DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32684
[Bug tree-optimization/32684] Missed tail call with sin/cos and sincos pass
--- Comment #5 from chrbr at gcc dot gnu dot org 2007-10-09 13:12 ---
You are right, it's not a sibcall, my mistake. But even at the tree level I still don't see the builtin marked as a tail call. On a reduced case, when entering find_tail_calls I have

    D.1177_2 = __builtin_cos (phi_1(D));
    D.1176_3 = COMPLEX_EXPR <D.1177_2, 0.0>;
    return D.1176_3;

and this is not recognized as a tail-call candidate because operand 1 of the GIMPLE_MODIFY_STMT is a COMPLEX_EXPR, not a call. Note that in the absence of a COMPLEX_EXPR, e.g. with __builtin_memset, all is fine.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32684
[Bug tree-optimization/32684] Missed tail call with sin/cos and sincos pass
--- Comment #6 from chrbr at gcc dot gnu dot org 2007-10-09 13:15 ---
> you are right, it's not a sibcall, my mistake.

Typo: I meant libcall, not sibcall.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32684
[Bug c++/19531] NRV is performed on volatile temporary
--- Comment #6 from chrbr at gcc dot gnu dot org 2007-10-08 11:02 ---
Patch + testcase at http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01902.html

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19531
[Bug c++/19531] NRV is performed on volatile temporary
--- Comment #4 from chrbr at gcc dot gnu dot org 2007-09-24 07:10 ---
Created an attachment (id=14248)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14248&action=view)
volatile nrv patch

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
AssignedTo|unassigned at gcc dot gnu dot org|chrbr at gcc dot gnu dot org
Status|NEW|ASSIGNED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19531
[Bug c++/19531] NRV is performed on volatile temporary
--- Comment #5 from chrbr at gcc dot gnu dot org 2007-09-24 07:14 ---
The attached patch was hanging around in my sandbox. I will submit it along with a testsuite case.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19531
[Bug rtl-optimization/15473] Sibcall optimization for libcalls.
--- Comment #3 from chrbr at gcc dot gnu dot org 2007-09-03 13:24 ---
This report is quite old, but worth bringing up again: we found similar problems with implicit memory block copying when copying structs by value (frequent in C++). Softfloat architectures, which make very extensive use of libcalls, are also very sensitive to this lost optimisation (it is a performance regression, since the optimisation was done correctly with GCC 3.4.3).

At that time, the RTL was emitted both for normal calls and for sibling calls and stored in a placeholder, and which one to emit was decided after all the statements had been expanded. Since GCC 4.0 the placeholders have disappeared, so we lost the ability to optimise libcalls in the backend.

I will try to use the CFG information available in expand to decide whether we can pass BLOCK_OP_TAILCALL to emit_block_move. I expect that libcalls can share the same interface.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
CC||chrbr at gcc dot gnu dot org
Last reconfirmed date|2006-03-01 02:40:48|2007-09-03 13:24:57

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15473
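A minimal C case of the struct-copy situation described above might look like the sketch below. The struct size is an arbitrary assumption meant to be large enough that GCC copies it with a block move. Whether that block move actually becomes a call to `memcpy`, and whether that call is emitted as a sibling call (a jump) rather than a call followed by a return, depends on the target, the GCC version and the flags. On a softfloat target the same reasoning would apply to, e.g., a float addition in return position, which becomes a libcall to `__addsf3`.

```c
#include <string.h>

/* A struct large enough that GCC typically copies it with a block move
   (size chosen arbitrarily for illustration).  */
struct big
{
  char data[256];
};

/* The struct copy is the last thing this function does, so a memcpy
   libcall it may expand to is in tail position and could, in principle,
   be emitted as a sibling call instead of a call followed by a return.  */
void
copy_out (struct big *dst, const struct big *src)
{
  *dst = *src;
}
```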
[Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
--- Comment #7 from chrbr at gcc dot gnu dot org 2007-06-08 07:58 ---
Subject: Bug 29953

Author: chrbr
Date: Fri Jun 8 07:58:41 2007
New Revision: 125564

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125564
Log:
        PR target/29953
        * config/sh/sh.md (doloop_end): New pattern and splitter.
        * loop-iv.c (simple_rhs_p): Check for hardware registers.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.md
    trunk/gcc/loop-iv.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29953
[Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
--- Comment #8 from chrbr at gcc dot gnu dot org 2007-06-08 08:18 ---
doloop_optimize performs the IV inversion using the doloop_end insn support in the machine description.

--
chrbr at gcc dot gnu dot org changed:
What|Removed|Added
Status|ASSIGNED|RESOLVED
Resolution||FIXED
Target Milestone|---|4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29953
[Bug target/29953] [SH-4] Perfomance regression in loops. cmp/eq used instead of dt
--- Comment #6 from chrbr at gcc dot gnu dot org 2007-05-15 10:30 --- I dropped 4.1 and implemented a -finvert-loops option on the trunk. This option allows a basic induction variable to be decremented instead of incremented, so that the loop exit test can compare against 0. I'm validating a patch on Intel and SH. -- chrbr at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |chrbr at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED Last reconfirmed|2007-04-03 16:34:17 |2007-05-15 10:30:36 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29953
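The inversion described above can be illustrated at the source level. This is a sketch assuming a simple summation loop, and the function names are hypothetical. It shows the intended shape of the loop, not the literal output of -finvert-loops or doloop_optimize. With the up-counting form, the exit test compares the counter against n, which needs an explicit compare such as cmp/eq. With the down-counting form, the exit test compares against zero, which maps onto the SH dt (decrement and test) instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Up-counting form: the exit test is i < n, i.e. a compare against a
   value held in another register on every iteration. */
long sum_up(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Inverted form: a separate counter runs from n down to 0, so the exit
   test is "counter reached zero", which is what dt provides. The data
   pointer still advances forwards, so the memory access order is unchanged. */
long sum_down(const int *a, size_t n)
{
    long s = 0;
    for (size_t count = n; count != 0; count--)
        s += *a++;
    return s;
}
```

Both functions compute the same result. The transformation is only valid because the loop trip count is known on entry, which is also the condition doloop_optimize needs.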
[Bug target/31403] wrong branch instructions generated with -m2a on sh-elf
--- Comment #3 from chrbr at gcc dot gnu dot org 2007-04-23 07:59 --- Hi Kaj, The same problem seems to arise in the movsf_ie pattern for the sh2a-fpu, which also has 32-bit memory instructions, so your fix also applies there. Note that traditional SH memory move instructions can also have a length of 2, so your fix is conservative (but no more so than the previous code). Shouldn't the new 4-byte instructions be described later with a new memory constraint? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31403
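To see why an underestimated insn length leads to wrong branch instructions, here is a small sketch. It assumes the usual reach of an SH conditional branch (bt/bf): an 8-bit signed displacement scaled by 2 and measured from the branch address + 4. Treat those constants as an assumption to check against the SH-2A manual. The helper bt_reaches is hypothetical. If the length attribute says 2 bytes for what the assembler actually emits as a 4-byte SH-2A memory move, the compiler can believe a short bt/bf reaches its target when it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bt/bf reach: 8-bit signed displacement * 2, relative to the
   branch address + 4, i.e. -256..+254 bytes. */
#define BT_MIN_DISP (-256L)
#define BT_MAX_DISP 254L

/* Returns true if a 2-byte bt/bf immediately followed by n instructions
   of the given byte lengths can reach the instruction right after them.
   The lengths are the ones the compiler believes (its length attribute):
   if they are too small, a short branch is chosen for a target that is
   really out of range. */
bool bt_reaches(const int *lengths, size_t n)
{
    long span = 0;
    for (size_t i = 0; i < n; i++)
        span += lengths[i];

    /* Branch at address B, target at B + 2 + span, so the displacement
       from B + 4 is span - 2. */
    long disp = span - 2;
    return disp >= BT_MIN_DISP && disp <= BT_MAX_DISP;
}
```

For 120 moves, a believed length of 2 gives a 238-byte displacement (in range), while the real 4-byte encodings give 478 bytes (out of range). A conservative length of 4 can only overestimate the distance, so the compiler would pick a longer branch sequence instead of an unreachable bt/bf.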
[Bug target/31640] New: cache align alignment is too aggressive on sh-elf
The sh4 port aligns on a cache line the blocks that have no fallthrus and that are either frequently executed (JUMP_ALIGN) or preceded by a barrier (LABEL_ALIGN_AFTER_BARRIER). In theory this helps to avoid cache misses when a block would split over 2 cache lines. In practice it reduces cache locality and lengthens the distance between blocks. The number of issued instructions is also impacted: for example, the relative indirect address in jump tables needs a byte zero-extend instruction if the distance occupies 8 bits instead of 7 bits. I ran some experiments and benchmarked (EEMBC) with 2 strategies: 1) -falign-jumps=1; 2) align the block only if its size is bigger than a given threshold (empirically set to 16 bytes, half of the cache line size). See the attached illustrative patch. My conclusion is that at -O3 performance never degrades when this padding is removed (option 2 is a little better, even improving Dhrystone by 3%), and text size improves by ~15%. So I was not able to measure the benefit of the cache line padding, although its code size impact is big (even at -O2/-O3, code size bloat should be motivated by some performance improvement). Is there a motivating test case that justifies this micro-optimisation? In the illustrative patch I still align basic blocks on 4 bytes to allow better instruction fetch accesses. -- Summary: cache align alignment is too aggressive on sh-elf Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: chrbr at gcc dot gnu dot org GCC target triplet: sh-superh-elf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
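Strategy 2 can be sketched as follows. This is not the attached patch. block_alignment, padding_for and total_padding are hypothetical helpers, and the layout model assumes every block in the example qualifies for alignment (no fallthru, and either frequent or after a barrier). It also assumes a 32-byte SH-4 cache line. Alignments are returned in bytes for readability, whereas GCC's alignment macros work with log2 values.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 32            /* assumed SH-4 cache line size */
#define SMALL_BLOCK_THRESHOLD 16 /* half a cache line, as in the report */
#define MIN_ALIGN 4              /* kept for instruction fetch */

/* Strategy 2: cache-line align only blocks bigger than the threshold;
   smaller blocks keep 4-byte alignment. */
unsigned block_alignment(size_t block_size)
{
    return block_size > SMALL_BLOCK_THRESHOLD ? CACHE_LINE : MIN_ALIGN;
}

/* Padding needed to bring addr up to a multiple of align (a power of 2). */
size_t padding_for(size_t addr, unsigned align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Lay out n blocks back to back starting at address 0 and return the
   total padding inserted. With always_cache_align nonzero every block is
   cache-line aligned; otherwise block_alignment decides. */
size_t total_padding(const size_t *sizes, size_t n, int always_cache_align)
{
    size_t addr = 0, pad = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned align = always_cache_align ? CACHE_LINE
                                            : block_alignment(sizes[i]);
        size_t p = padding_for(addr, align);
        pad += p;
        addr += p + sizes[i];
    }
    return pad;
}
```

For blocks of 6, 10, 8 and 40 bytes, aligning every block on a cache line inserts 72 bytes of padding, while the threshold strategy inserts 8. This is the kind of text size difference, and the corresponding increase in distance between blocks, that the report measures.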
[Bug target/31640] cache block alignment is too aggressive on sh-elf
-- chrbr at gcc dot gnu dot org changed: What|Removed |Added Severity|normal |minor Summary|cache align alignment is too|cache block alignment is too |aggressive on sh-elf|aggressive on sh-elf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
[Bug target/31640] cache block alignment is too aggressive on sh-elf
--- Comment #1 from chrbr at gcc dot gnu dot org 2007-04-20 14:13 --- Created an attachment (id=13391) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13391&action=view) Illustrative patch to not align small basic blocks I used this patch to reduce the number of basic blocks aligned on cache lines. Not aligning blocks smaller than 16 bytes (I also tried 32 bytes) seems to give the best results. Note that never aligning doesn't degrade EEMBC performance either (similar to -falign-jumps=1). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
[Bug target/31640] cache block alignment is too aggressive on sh-elf
--- Comment #2 from chrbr at gcc dot gnu dot org 2007-04-20 15:51 --- Created an attachment (id=13393) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13393&action=view) testcase for new instruction introduced by increased distance In this example, the maximum distance between the jump table and the cases is artificially increased by the padding, even though each basic block is very small and has very little chance of spreading over several cache blocks. In addition, the extu.b r1,r1 instruction can be avoided. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640
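Why padding can cost an extra extu.b can be shown with a small model. This sketch assumes byte-sized jump table entries holding unscaled byte offsets, read with a sign-extending byte load such as SH's mov.b. It also assumes the case blocks are laid out right after the table base (the table size itself is ignored). All function names are hypothetical. An offset up to 127 survives the sign extension unchanged, while an offset of 128..255 comes back negative and must be zero-extended.

```c
#include <assert.h>
#include <stdbool.h>

/* Value obtained when a stored byte is read with a sign-extending byte
   load (computed portably here). */
int sign_extended_load(unsigned char stored)
{
    return stored < 128 ? (int)stored : (int)stored - 256;
}

/* Byte entries with offsets 0..127 need no fix-up; 128..255 need an
   extra zero extension (extu.b). Larger offsets do not fit a byte entry
   at all. */
bool byte_entry_needs_zero_extend(unsigned max_offset)
{
    return max_offset > 127 && max_offset <= 255;
}

/* Round x up to a multiple of align (a power of 2). */
static unsigned round_up(unsigned x, unsigned align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Offset of the last of n (n >= 1) case blocks of block_size bytes each
   when every case label is aligned to align bytes. */
unsigned last_case_offset(unsigned n, unsigned block_size, unsigned align)
{
    return (n - 1) * round_up(block_size, align);
}
```

With five 8-byte case blocks, 4-byte alignment puts the last case at offset 32, which fits in 7 bits. Cache-line (32-byte) alignment pushes it to 128, which a sign-extending load would read back as -128, so the extu.b becomes necessary even though none of the tiny blocks is at risk of spanning two cache lines.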