[Bug rtl-optimization/78255] [5/6 regression] Indirect sibling call causing wrong code generation for ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #17 from Andre Vieira ---
Yes, how do I change this to "verified"?
[Bug c++/77388] Reference to a packed structure member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

--- Comment #5 from Andre Vieira ---
I see, thank you! Oh and leaving out the const yields an error:

    t.cpp:28:16: error: cannot bind packed field '((B*)this)->B::s->test_struct::c' to 'short int&'
       return A (s->c);
[Bug c++/77388] Reference to a packed structure member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

--- Comment #3 from Andre Vieira ---
Thank you Richard! I have a follow up question. Why is this only a problem when passing by reference and not when passing a pointer? So say:

    #define PACKED __attribute__ ((packed))
    #define TYPE_C short
    typedef struct { TYPE_C c; } PACKED test_struct;
    class A {
      const TYPE_C * c;
    public:
      A (const TYPE_C * _c) : c(_c) {};
    };
    class B {
    public:
      B();
      A foo ();
    private:
      test_struct * s;
    };
    A B::foo () { return A (&(s->c)); }

Wouldn't there still be an alignment mismatch between A::c and s->c?
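One way to sidestep the alignment question raised above is to never form a typed pointer to the packed field at all and to copy its bytes instead. The following is only a sketch under assumptions: padded_struct, read_c and check_read_c are hypothetical names that do not appear in the report, and the extra 'pad' member was added here purely to push 'c' onto an odd offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define PACKED __attribute__ ((packed))

// Hypothetical variant of test_struct: the leading char forces 'c' onto an
// odd offset, so a plain 'short *' to it would be misaligned.
typedef struct { char pad; short c; } PACKED padded_struct;

// Read the field through a byte-wise copy instead of a typed pointer.
short read_c (const padded_struct *s)
{
  short value;
  const char *bytes = reinterpret_cast<const char *> (s);
  std::memcpy (&value, bytes + offsetof (padded_struct, c), sizeof value);
  return value;
}

// Small self-check of the helper above.
void check_read_c ()
{
  padded_struct s;
  s.pad = 0;
  s.c = 1234;
  assert (read_c (&s) == 1234);
}
```

The copy makes no assumption about the alignment of the source bytes, which is why it is often suggested for packed data; whether it is the right fix for the code in the launchpad question is not something this thread settles.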
[Bug c++/77388] New: Reference to a packed structure member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

Bug ID: 77388
Summary: Reference to a packed structure member
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

As initially reported by Michal on https://answers.launchpad.net/gcc-arm-embedded/+question/345145, gcc seems to behave oddly when passing a reference to a member of a packed structure. I was able to reduce the testcase presented in that launchpad ticket to the following program:

    $ cat t.cpp
    #define PACKED __attribute__ ((packed))
    #define TYPE_C short
    typedef struct { TYPE_C c; } PACKED test_struct;
    class A {
      const TYPE_C & c;
    public:
      A (const TYPE_C & _c) : c(_c) {};
    };
    class B {
    public:
      B();
      A foo ();
    private:
      test_struct * s;
    };
    A B::foo () { return A (s->c); }

Compiling this with

    $ arm-none-eabi-g++ -mcpu=cortex-m7 -mthumb -S -O1 t.cpp -fdump-tree-optimized

will yield the following dump:

    ;; Function A B::foo() (_ZN1B3fooEv, funcdef_no=3, decl_uid=4607, cgraph_uid=3, symbol_order=3)
    A B::foo() (struct B * const this)
    {
      const short int D.4636;
      struct A D.4650;
      :
      MEM[(struct A *)] = D.4636 ={v} {CLOBBER};
      return D.4650;
    }

As you can see, it will not load the struct's field. Changing the 'TYPE_C' define to 'char' will yield the following dump:

    ;; Function A B::foo() (_ZN1B3fooEv, funcdef_no=3, decl_uid=4607, cgraph_uid=3, symbol_order=3)
    A B::foo() (struct B * const this)
    {
      struct A D.4649;
      struct test_struct * _1;
      char * _2;
      :
      _1 = this_4(D)->s;
      _2 = &_1->c;
      MEM[(struct A *)] = _2;
      return D.4649;
    }

Now, when the type is 'char', it seems to be able to get the field's address. Can anyone shed some light on this for me? Is referencing a packed structure's member that is not guaranteed to be aligned (so not char) undefined behavior?
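To make the alignment concern in the report concrete, the sketch below compares a packed and an unpacked struct with the same members. The type names packed_demo and plain_demo and the 'pad' member are assumptions introduced for illustration; the report's own struct has a single field. The checks rely on the GCC-style packed attribute the report already uses.

```cpp
#include <cassert>
#include <cstddef>

#define PACKED __attribute__ ((packed))

// Same members, once packed and once with natural layout (hypothetical types).
typedef struct { char pad; short c; } PACKED packed_demo;
typedef struct { char pad; short c; } plain_demo;

void check_layout ()
{
  // Packing drops the struct's alignment requirement to one byte ...
  assert (alignof (packed_demo) == 1);
  // ... and removes the padding, so 'c' sits directly after 'pad'.
  assert (offsetof (packed_demo, c) == 1);
  assert (sizeof (packed_demo) == 1 + sizeof (short));
  // Without packing, 'c' is placed at the alignment of short.
  assert (offsetof (plain_demo, c) == alignof (short));
}
```

A field of type char has an alignment requirement of one byte anyway, which may be why the 'char' variant of the testcase behaves differently from the 'short' one.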
[Bug rtl-optimization/70164] [6/7 Regression] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164 --- Comment #16 from Andre Vieira --- Any progress on this one?
[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237 --- Comment #3 from Andre Vieira --- Created attachment 38576 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38576&action=edit Regular generation at -O2
[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237 --- Comment #2 from Andre Vieira --- Created attachment 38575 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38575&action=edit Assembly with the changed passes.def removing one pass of lim
[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

--- Comment #1 from Andre Vieira ---
So yes, disabling LIM will make the tests "PASS". I couldn't find an option to do this, though, so I disabled the pass by changing passes.def, which doesn't sound like a good way to test SCCP. However, it might be good to point out that, at least for arm-none-eabi and x86_64-pc-linux-gnu, these tests are no longer testing SCCP: SCCP will not change this code. I looked at the dumps and compared the assembly of -O2 with and without '-fno-tree-scev-cprop'.

On arm-none-eabi, it used to be IVOPTS that made the test pass: it would reuse the same ivtmp for computing the address used by the memory dereference and the a_p assignment. Now, due to the reordering of LIM, it no longer does this.

On x86_64 I see the following code coming out of the OPTIMIZED dump for the scev-4.c case:

    ...
    :
    # ivtmp.10_14 = PHI <_24(3), ivtmp.10_25(4)>
    i_11 = (int) ivtmp.10_14;
    MEM[symbol: a, index: ivtmp.10_14, step: 8, offset: 4B] = 100;
    ivtmp.10_25 = ivtmp.10_14 + _24;
    i_22 = (int) ivtmp.10_25;
    if (i_22 <= 999)
      goto ;
    else
      goto ;
    :
    _2 = (sizetype) i_11;
    _3 = _2 * 8;
    _10 = _3 + 4;
    _1 = + _10;
    a_p = _1;
    ...

Now yes, the scan-times will pass, but that's because the MEM is using symbol: a instead of base: Not sure this can be qualified as a proper PASS.

Disabling LIM here the same way I did before, that is, removing the pass_lim after pass_laddress and before pass_split_crit_edges, generates the following OPTIMIZED dump:

    ...
    :
    _16 = (sizetype) k_4(D);
    _15 = _16 * 8;
    _21 = _15 + 4;
    _22 = + _21;
    ivtmp.9_14 = (unsigned long) _22;
    :
    # i_11 = PHI
    # ivtmp.9_13 = PHI
    _1 = (int *) ivtmp.9_13;
    MEM[base: _1, offset: 0B] = 100;
    i_8 = k_4(D) + i_11;
    ivtmp.9_17 = ivtmp.9_13 + _15;
    if (i_8 <= 999)
      goto ;
    else
      goto ;
    :
    a_p = _1;
    ...

I prefer this output, since you lose the needless trailing address calculation. I am not so sure the eventually generated assembly is better in this case, though. I'll add both as attachments.
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #19 from Andre Vieira ---
> First of all please open a new bug for the FAILs. Second, the fix will
> be mostly adjusting the testcase expectations (eventually disabling LIM
> for example if we want to test SCCP abilities).

I opened a new ticket for this, PR71237; it makes sense to continue the discussion there. I also quoted your comment there.
[Bug tree-optimization/71237] New: scev tests failing after pass reorganization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

Bug ID: 71237
Summary: scev tests failing after pass reorganization
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Since the reorganization of the passes that moved lim before sink, these tests fail:

    FAIL: gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times optimized "" 1
    FAIL: gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times optimized "" 1
    FAIL: gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times optimized "" 1

This was first reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563#c11 for sparc*-*-solaris2.*; it also fails on arm-none-eabi. There is some further discussion on that thread. Quoting Richard:

rguent...@suse.de 2016-05-23 07:40:31 UTC
>On Fri, 20 May 2016, andre.simoesdiasvieira at arm dot com wrote:
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
>>
>> --- Comment #17 from Andre Vieira ---
>> Ah yes my bad, its not sccp doing it... got a bit confused there... It is
>> indeed sink that moves that sequence down. Sorry for the noise.
>>
>> Question remains on how to clean this up though. Ideally you would like to
>> end up with
>>
>> a_p = ;
>> outside of the loop.
>
>First of all please open a new bug for the FAILs. Second, the fix will
>be mostly adjusting the testcase expectations (eventually disabling LIM
>for example if we want to test SCCP abilities).
>
>As to your question it technically is a job of CSE though it may be a
>tough job to make it figure out the equivalence.
>
>Now, in the case of scev-5.c (the only regression I see on x86_64, with
>-m32), SCCP fails to do final value replacement for i_24:
>
> :
> # i_12 = PHI <i_10(3), i_9(5)>
> MEM[(int *)][i_12] = 100;
> i_9 = i_5 + i_12;
> if (i_9 <= 999)
>   goto ;
> else
>   goto ;
>
> :
> goto ;
>
> :
> # i_24 = PHI <i_12(4)>
> _2 = (sizetype) i_24;
> _3 = _2 * 4;
> _1 = + _3;
> a_p = _1;
>
>which may or may not be a good thing - this way IVOPTs can see the
>extra use of i_12 on the loop exit and it _could_ look for derived
>uses of that so it _may_ be able to replace the use of i_24 with
>something better.
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #17 from Andre Vieira ---
Ah yes, my bad, it's not sccp doing it... I got a bit confused there... It is indeed sink that moves that sequence down. Sorry for the noise.

The question remains how to clean this up, though. Ideally you would like to end up with

    a_p = ;

outside of the loop.
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #15 from Andre Vieira ---
So the code change for sccp moves the following sequence out of the loop:

    _2 = (sizetype) i_30;
    _3 = _2 * 8;
    _10 = _3 + 4;
    _1 = + _10;
    a_p = _1;

This is basically:

    *a_p = [last_i].y;

I agree that this makes sense, were it not for the fact that the sequence recomputes the address of a[i].y, which is already computed inside the loop for:

    MEM[(int *)][i_11].y = 100;

When IVOPTS comes around it creates a code sequence similar to this one to calculate the address it writes 100 to, and you end up with a needless recomputation of the address. Now I don't know which phase should be responsible for cleaning this up: whether it's sccp's responsibility to realize that the address computation is the same, whether there should be some sort of common sub-expression elimination step in between, or something else.
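For readers without the testsuite at hand, the shape of the loop under discussion can be sketched at source level. This is a reconstruction from the dumps quoted in this thread (8-byte elements, the store at offset 4, the bound 999 and the step k), not a copy of the scev tests; the names store_in_loop, store_then_assign and check_equivalence are hypothetical, and k is assumed to be positive.

```cpp
#include <cassert>

struct S { int x; int y; };

S a[1000];
int *a_p;

// Loop in the form the dumps suggest: the address is assigned to a_p on
// every iteration and then written through.
void store_in_loop (int k)
{
  for (int i = k; i < 1000; i += k)
    {
      a_p = &a[i].y;
      *a_p = 100;
    }
}

// Hand-written equivalent in which a_p is only assigned once, after the
// loop, from the last index that was actually visited.
void store_then_assign (int k)
{
  int last = -1;
  for (int i = k; i < 1000; i += k)
    {
      a[i].y = 100;
      last = i;
    }
  if (last >= 0)
    a_p = &a[last].y;
}

// Both variants leave a_p pointing at the same element.
void check_equivalence ()
{
  store_in_loop (7);
  int *from_loop = a_p;
  a_p = nullptr;
  store_then_assign (7);
  assert (a_p == from_loop);
  assert (a_p == &a[994].y);
}
```

The second function is roughly what moving the assignment out of the loop amounts to. The complaint in the comment is that the address of the final element ends up being computed a second time after the loop even though the loop body already had it.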
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #12 from Andre Vieira ---
Same regression observed on arm-none-eabi.
[Bug middle-end/71062] [7 regression] r235622 and restrict pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71062

Andre Vieira changed:

    What    | Removed                                  | Added
    Target  |                                          | arm
    Summary | [bugzilla] r235622 and restrict pointers | [7 regression] r235622 and restrict pointers

--- Comment #1 from Andre Vieira ---
The register keyword here is superfluous. It is all down to the restrict keyword.
[Bug middle-end/71062] New: [bugzilla] r235622 and restrict pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71062

Bug ID: 71062
Summary: [bugzilla] r235622 and restrict pointers
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Hi there,

I have encountered a new FAIL when testing newlib-nano for the arm-none-eabi toolchain, which I believe is caused by a change in code generation for freopen.c. After some investigation I was able to trace the issue back to a pointer comparison with a restrict-qualified pointer. See below a small piece of code that illustrates the issue.

    $ cat t.c
    extern const char bar;
    int foo (register char *__restrict p)
    {
      if (p == &bar)
        return 1;
      return 0;
    }

Since revision r235622, the pointer comparison here is evaluated to false during compilation and the whole basic block is optimized away. After some inner struggle and quite a few passes over the restrict definition in the C11 draft standard (Committee Draft -- April 12, 2011, N1570), I think the assumption that p and &bar can't point to the same object might be invalid. Yes, the C standard defines the '&' operator to yield a pointer to the object. However, the formal definition of restrict only seems to apply to the dereferencing of pointers. In this case, we do not dereference the pointer created by '&bar' and thus do not access the object that 'p' might be pointing to.

Does my reasoning make sense? I find it quite difficult to wrap my head around the definition of restrict.

Cheers,
Andre
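The behavior the reporter expects can be written down as a small check. The sketch below is an adaptation, not the reporter's file: it is C++ rather than C, it drops the register keyword, and it adds a stand-in definition of bar plus the hypothetical names other and check_foo so that it links on its own. The expected results follow the reporter's reading of restrict, under which comparing pointers without dereferencing them must still compare addresses.

```cpp
#include <cassert>

extern const char bar;
const char bar = 'b';   // stand-in definition; the report only declares it
char other = 'o';       // hypothetical second object for the negative case

// Same comparison as in the report, using GCC's __restrict spelling.
int foo (char *__restrict p)
{
  if (p == &bar)
    return 1;
  return 0;
}

void check_foo ()
{
  // Passing the address of bar itself should take the 'return 1' path.
  assert (foo (const_cast<char *> (&bar)) == 1);
  // Any other object should not.
  assert (foo (&other) == 0);
}
```

A compiler that folds the comparison to false, as described for r235622, would fail the first assertion once optimization is enabled.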
[Bug libstdc++/70379] c99_classification_macros_c++98.cc failing with newlib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70379

Andre Vieira changed:

    What       | Removed     | Added
    Status     | UNCONFIRMED | RESOLVED
    Resolution | ---         | FIXED

--- Comment #2 from Andre Vieira ---
Yaakov fixed this in newlib, commit hash b9bbe1bccb1254ce891fc92961be2ec3cd3f6e4a. Thanks, closing ticket.
[Bug libstdc++/70379] New: c99_classification_macros_c++98.cc failing with newlib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70379

Bug ID: 70379
Summary: c99_classification_macros_c++98.cc failing with newlib
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

26_numerics/headers/cmath/c99_classification_macros_c++98.cc fails for newlib on arm-none-eabi with the following errors (clipped to only include a few):

    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:38:16: error: macro "isgreater" requires 2 arguments, but only 1 given
    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:40:21: error: macro "isgreaterequal" requires 2 arguments, but only 1 given
    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:42:13: error: macro "isless" requires 2 arguments, but only 1 given
    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:44:18: error: macro "islessequal" requires 2 arguments, but only 1 given
    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:46:20: error: macro "islessgreater" requires 2 arguments, but only 1 given
    src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:48:18: error: macro "isunordered" requires 2 arguments, but only 1 given

This new failure is due to a change in newlib: with -std=c++98, math.h no longer exposes the C99 math functions, whereas with -std=gnu++98 it still does. As a result, _GLIBCXX98_USE_C99_MATH is not defined at configuration time, since it is set by a test compilation with -std=c++98. cmath uses this macro to know whether the C99 math functions are present; if so, it needs to undefine the ones that are macros.

This test uses -std=gnu++98, so the C99 macros are defined while _GLIBCXX98_USE_C99_MATH is not set, which leads to the errors above. This is now also an issue with at least gcc-5.
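The mechanism behind these errors can be reproduced in isolation. The sketch below is a minimal model, not libstdc++ or newlib code: demo_isgreater is a hypothetical stand-in for a two-argument function-like macro that a C header leaves defined, and the #undef plays the role of what the report says cmath does when it knows the C99 macros are present.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-argument comparison macro from a C header.
#define demo_isgreater(x, y) ((x) > (y))

// While the macro is visible, a one-argument use such as
//   demo_isgreater (value)
// stops in the preprocessor with
//   macro "demo_isgreater" requires 2 arguments, but only 1 given
#undef demo_isgreater

// After the #undef the name is an ordinary identifier again, so a
// declaration with a different number of parameters is accepted.
template <typename T>
bool demo_isgreater (T value)
{
  return value > T ();
}

void check_demo_isgreater ()
{
  assert (demo_isgreater (3));
  assert (!demo_isgreater (-1));
}
```

If the #undef is skipped, which is the effect of the configuration-time macro not being set, the template declaration itself no longer preprocesses.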
[Bug rtl-optimization/70278] [6 regression] LRA ICE on trunk for ARM Thumb1 with Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70278

Andre Vieira changed:

    What       | Removed | Added
    Status     | NEW     | RESOLVED
    Resolution | ---     | FIXED

--- Comment #3 from Andre Vieira ---
This fixes it on our end. Thank you Bernd. Marking this as RESOLVED FIXED.
[Bug rtl-optimization/70278] New: LRA ICE on trunk for ARM Thumb1 with Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70278

Bug ID: 70278
Summary: LRA ICE on trunk for ARM Thumb1 with Os
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Created attachment 37999 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37999&action=edit
reduced e_hypot.c

Hello,

We are running into an ICE in lra on trunk when compiling newlib for an ARM Thumb1 target with -Os. This happens when compiling newlib/libm/math/e_hypot.c. See the attached reduced source and the error when executing the command below:

    $ arm-none-eabi-gcc -march=armv4t -mthumb -Os -S besttry.c

The error:

    besttry.c: In function '__ieee754_hypot':
    besttry.c:27:1: internal compiler error: in lra_create_new_reg_with_unique_value, at lra.c:188
     }
     ^
    0x94d30b lra_create_new_reg_with_unique_value(machine_mode, rtx_def*, reg_class, char const*)
            ../src/gcc/lra.c:188
    0x94d358 lra_create_new_reg(machine_mode, rtx_def*, reg_class, char const*)
            ../src/gcc/lra.c:228
    0x9590e4 split_reg
            ../src/gcc/lra-constraints.c:5034
    0x95a2da split_if_necessary
            ../src/gcc/lra-constraints.c:5142
    0x95e08c inherit_in_ebb
            ../src/gcc/lra-constraints.c:5527
    0x95e08c lra_inheritance()
            ../src/gcc/lra-constraints.c:5813
    0x950900 lra(_IO_FILE*)
            ../src/gcc/lra.c:2312
    0x9082d1 do_reload
            ../src/gcc/ira.c:5408
    0x9082d1 execute
            ../src/gcc/ira.c:5592

My initial investigation into lra shows that this is due to split_reg calling lra_create_new_reg with mode = VOIDmode. Others with more lra experience might be able to spot the origin of the issue quicker, though.

A bisect shows that this seems to have been introduced with revision r234184, where reg split now calls lra_create_new_reg with a 'mode' that seems to be set by 'lra_reg_info[hard_regno].biggest_mode', which in our case will be SImode. This mode is passed down to lra_create_new_reg_with_unique_value, which asserts if it's not VOIDmode.

Cheers,
Andre

PS: Unrelated to this, the code in lra_create_new_reg_with_unique_value looks a bit odd; I think mode should have been initialized with VOIDmode there.
[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #5 from Andre Vieira ---
Ah yes, I forgot to mention, this is reproducible with:

    $ arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -Os -S pr45701-1.c
[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #59 from Andre Vieira ---
I believe PR70164 is related to this.
[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164 --- Comment #4 from Andre Vieira --- Revision r226901 is linked to PR64164, so I added Alexandre Oliva to the watch list.
[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164 --- Comment #3 from Andre Vieira --- Created attachment 37923 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37923&action=edit pre-patch reload dump
[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164 --- Comment #2 from Andre Vieira --- Created attachment 37922 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37922&action=edit pre-patch ira dump
[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164 --- Comment #1 from Andre Vieira --- Created attachment 37921 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37921&action=edit current reload dump
[Bug rtl-optimization/70164] New: Code/performance regression due to poor register allocation on Cortex-M0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

Bug ID: 70164
Summary: Code/performance regression due to poor register allocation on Cortex-M0
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Created attachment 37920 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37920&action=edit
current ira dump

After a quick investigation of the testcase in gcc/testsuite/gcc.target/arm/pr45701-1.c for cortex-m0 on trunk, I found out that the test case was failing due to a change in register allocation after revision r226901. Before this revision, register allocation would choose to load the global 'hist_verify' onto r6, representing 'old_verify', prior to the function call to pre_process_line. This old_verify is used after the function call. With the patch it decides to load it onto r3, a caller-saved register, which means it has to be spilled before the function call and reloaded after.

Before patch:

    history_expand_line_internal:
        push {r3, r4, r5, r6, r7, lr}
        ldr r3, .L5
        ldr r5, .L5+4
        ldr r4, [r3]
        movs r3, #0
        ldr r6, [r5] ; <--- load of 'hist_verify' onto r6
        movs r7, r0
        str r3, [r5]
        bl pre_process_line
        adds r6, r4, r6
        str r6, [r5]
        movs r4, r0
        cmp r7, r0
        bne .L2
        bl str_len
        adds r0, r0, #1
        bl x_malloc
        movs r1, r4
        bl str_cpy
        movs r4, r0
    .L2:
        movs r0, r4
        @ sp needed
        pop {r3, r4, r5, r6, r7, pc}

Current:

    history_expand_line_internal:
        push {r0, r1, r2, r4, r5, r6, r7, lr}
        ldr r3, .L3
        ldr r5, .L3+4
        ldr r6, [r3]
        ldr r3, [r5] ; <--- load of 'hist_verify' onto r3
        movs r7, r0
        str r3, [sp, #4] ; <--- Spill
        movs r3, #0
        str r3, [r5]
        bl pre_process_line
        ldr r3, [sp, #4] ; <--- Reload
        movs r4, r0
        adds r6, r6, r3
        str r6, [r5]
        cmp r7, r0
        bne .L1
        bl str_len
        adds r0, r0, #1
        bl x_malloc
        movs r1, r4
        bl str_cpy
        movs r4, r0
    .L1:
        movs r0, r4
        @ sp needed
        pop {r1, r2, r3, r4, r5, r6, r7, pc}

I have also attached the dumps for ira and reload for both pre-patch and current. In the current reload dump, insn 9 represents the load onto r3 and insn 62 the spill. In pre-patch ira/reload the load is in insn 10.

I am not familiar with RA in GCC, so I'm not entirely sure what code to blame for this sub-optimal allocation. Any comments or pointers would be most welcome.
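The source pattern behind this allocation decision is a value that is loaded before a call and used after it. The sketch below shows only that shape; it is not the pr45701-1.c testcase, and every name in it (demo_verify, demo_extra, demo_process, demo_expand, check_demo_expand) is a hypothetical stand-in.

```cpp
#include <cassert>

int demo_verify = 5;   // plays the role of the global read before the call
int demo_extra = 2;    // second global that is added back in afterwards

// Stand-in callee; it deliberately overwrites the global to show that the
// caller cannot simply re-read it after the call.
char *demo_process (char *line)
{
  demo_verify = 99;
  return line;
}

char *demo_expand (char *line)
{
  int old_verify = demo_verify;        // loaded before the call ...
  demo_verify = 0;
  char *result = demo_process (line);  // ... and live across it
  demo_verify = old_verify + demo_extra;
  return result;
}

void check_demo_expand ()
{
  char buffer[] = "x";
  assert (demo_expand (buffer) == buffer);
  assert (demo_verify == 7);
}
```

Because old_verify has to survive the call, keeping it in a register that the callee must preserve avoids the extra store and load around the call that the 'Current' listing shows.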
[Bug target/70063] msp430 stack corruption for naked functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70063

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #2 from Andre Vieira ---
I believe pr69979 reports a related issue with arm targets.
[Bug target/69979] ARM naked function attribute not handling structs bigger than 32 bits correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69979 --- Comment #1 from Andre Vieira --- I believe expand_function_start is responsible for this code. When it calls assign_parms it will generate RTL to copy the incoming struct parameter onto the stack.
[Bug target/69979] New: ARM naked function attribute not handling structs bigger than 32 bits correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69979

Bug ID: 69979
Summary: ARM naked function attribute not handling structs bigger than 32 bits correctly
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

As reported by Cory in https://bugs.launchpad.net/gcc-arm-embedded/+bug/1549542

It seems the naked function attribute for ARM is generating code for struct parameters being passed in registers. This code stores the structs passed in registers on the stack, using 'r3' as a scratch register. Apart from being suboptimal, this writes to 'r3' even though 'r3' might be used to hold a parameter!

For instance, the following C code:

    struct test { int a; int b; };
    int foo (struct test t, int a, int b)
    {
      __asm ("mov r0, r3\n\t"
             "bx lr");
    }

when compiled with

    $ arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S

will yield the following assembly:

    foo:
        @ Naked Function: prologue and epilogue provided by programmer.
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        mov r3, r7
        stm r3, {r0, r1}
        .syntax unified
        @ 9 "tnaked.c" 1
        mov r0, r3
        bx lr
        @ 0 "" 2
        .syntax unified
        nop
        mov r0, r3

As you can see, 'r3' will have been overwritten with the frame pointer before being moved to 'r0' for the return. Also, the last 'mov r0, r3' after the 'nop' looks a bit odd!

Something equally weird happens when returning such a struct:

    struct test bar (int a, int b, int c)
    {
      __asm ("stmia r0, {r2, r3}\n\t"
             "bx lr");
    }

One would naturally expect to be storing 'b' and 'c' into '[r0]', the place where the caller expects the return value to be written. However, the following assembly is generated, which overwrites r3 (which should contain argument 'c'):

    bar:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov r3, r0
        @ 16 "tnaked.c" 1
        stmia r0, {r2, r3}
        bx lr
        @ 0 "" 2
        .thumb
        mov r0, r3
        bx lr

Again with the unexpected epilogue code creeping in. I have observed this behavior for various ARM targets dating back to gcc 4.8 (I haven't tried earlier than that).
[Bug rtl-optimization/69752] New: Reload removing instruction with side-effect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69752

Bug ID: 69752
Summary: Reload removing instruction with side-effect
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

This behavior was caught when debugging the following fail for -mcpu=cortex-m0:

    FAIL: g++.dg/torture/vshuf-v2di.C -O2 execution test

After some debugging I noticed that reload would remove an insn that contained a post_inc, which would cause the shuffle to be off by 4 (the value of the post-increment). Using the exact same sources as the testsuite, if you compile with -fdump-rtl-all you can observe that before reload (ira) you encounter the following sequence of RTL:

    (insn 455 191 213 6 (set (reg/f:SI 267) (reg/f:SI 379)) 748 {*thumb1_movsi_insn} (nil))
    (insn 213 455 216 6 (set (reg:SI 266) (mem/u/c:SI (post_inc:SI (reg/f:SI 267)) [4 S4 A32])) 748 {*thumb1_movsi_insn} (expr_list:REG_EQUIV (const_int -1044200508 [0xc1c2c3c4]) (expr_list:REG_INC (reg/f:SI 267) (nil))))
    (insn 216 213 218 6 (set (reg:SI 268) (mem/u/c:SI (reg/f:SI 267) [4 S4 A64])) 748 {*thumb1_movsi_insn} (expr_list:REG_DEAD (reg/f:SI 267) (nil)))

Here pseudo register 267 is post-incremented in insn 213 and used in insn 216 right after.

After reload:

    ...
    Removing equiv init insn 443 (freq=107)
    443: r381:SI=sfp:SI+0x10 REG_EQUIV sfp:SI-0x40
    deleting insn with uid = 443.
    ...
    (insn 455 191 213 6 (set (reg/f:SI 5 r5 [267]) (reg/f:SI 2 r2 [379])) 748 {*thumb1_movsi_insn} (nil))
    (note 213 455 216 6 NOTE_INSN_DELETED)
    (insn 216 213 521 6 (set (reg:SI 5 r5 [268]) (mem/u/c:SI (reg/f:SI 5 r5 [267]) [4 S4 A64])) 748 {*thumb1_movsi_insn} (nil))

As you can see, pseudo register 268 (now in r5) will be loaded from mem (r5), which is still pointing to the old value of 267 and not the incremented one, causing an offset error of 4. I checked and pseudo register 748 is the same in both cases.

Also, adding -fno-auto-inc-dec to the test will yield a PASS result.
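The effect of losing the post-increment can be modelled at source level. The sketch below is an illustration only and has no connection to how reload is implemented; the names two_words, demo_table, load_with_post_inc, load_with_lost_inc and check_post_inc are hypothetical. The first table entry reuses the constant from the REG_EQUIV note above, the second is an arbitrary value.

```cpp
#include <cassert>
#include <cstdint>

struct two_words { std::uint32_t first; std::uint32_t second; };

const std::uint32_t demo_table[2] = { 0xc1c2c3c4u, 0x01020304u };

// Intended behavior: the first load also advances the pointer.
two_words load_with_post_inc (const std::uint32_t *p)
{
  two_words r;
  r.first = *p++;   // load, then step the pointer by one 4-byte word
  r.second = *p;    // reads the following word
  return r;
}

// Model of the faulty sequence: the first load is replaced by its known
// constant and the pointer update disappears together with it.
two_words load_with_lost_inc (const std::uint32_t *p)
{
  two_words r;
  r.first = 0xc1c2c3c4u;  // value taken from the equivalence, no load
  r.second = *p;          // pointer never advanced: re-reads the first word
  return r;
}

void check_post_inc ()
{
  assert (load_with_post_inc (demo_table).second == 0x01020304u);
  assert (load_with_lost_inc (demo_table).second == 0xc1c2c3c4u);
}
```

The second function returns the first word twice, which corresponds to the 4-byte offset error described in the report.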
[Bug rtl-optimization/69752] Reload removing instruction with side-effect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69752 --- Comment #1 from Andre Vieira --- Tried it with GCC 5.2.1 and 6.0; both show the same behavior. For 4.9 I couldn't reproduce the issue.
[Bug target/69538] New: gcc.dg/torture/stackalign/builtin-apply-4.c fails with flto for aarch32 targets with single precision FPU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69538

Bug ID: 69538
Summary: gcc.dg/torture/stackalign/builtin-apply-4.c fails with flto for aarch32 targets with single precision FPU
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

I am getting an execution failure for:

    gcc.dg/torture/stackalign/builtin-apply-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects

I have tried trunk, 5.2 and 4.9; all fail for various (cortex-m4/7, armv7-a) AArch32 targets when running with "-mfloat-abi=hard -mfpu=fpv{4,5}-sp-d16".

I inspected the RTL produced with the following command line:

    $ arm-none-eabi-gcc -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -fno-fat-lto-objects -fgnu89-inline -specs=rdimon.specs -Wa,-mno-warn-deprecated -Wl,-Ttext-segment=0x2100 builtin-apply-4.c -lm -mthumb -march=armv7-a -mfloat-abi=hard -mfpu=fpv5-sp-d16 -o builtin-apply-4_working.exe -save-temps -fdump-rtl-all

This command line doesn't use -flto and passes the execution test. It produces the following RTL for the call to bar:

    (call_insn/c/i:TI 8 7 38 2 (parallel [
        (set (reg:DF 16 s0)
            (call (mem:SI (symbol_ref:SI ("bar") [flags 0x3] ) [0 bar S4 A32])
                (const_int 0 [0])))
        (use (const_int 0 [0]))
        (clobber (reg:SI 14 lr))
    ]) builtin-apply-4.c:34 209 {*call_value_symbol}

Now the same command line with -flto fails the execution test and produces the following RTL for the call to bar:

    (call_insn/u/i:TI 8 7 39 2 (parallel [
        (set (reg:DF 0 r0)
            (call (mem:SI (symbol_ref:SI ("bar.constprop.0") [flags 0x3] ) [0 bar.constprop S4 A32])
                (const_int 0 [0])))
        (use (const_int 0 [0]))
        (clobber (reg:SI 14 lr))
    ]) builtin-apply-4.c:27 209 {*call_value_symbol}

Using the same float ABI, the LTO version expects the return value of bar in r0-r1, even though it's a double and, with the hard-float ABI, it should be returned in s0-s1 (d0), as is the case with the non-LTO version.
[Bug target/69227] FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227

Andre Vieira changed:

    What       | Removed     | Added
    Status     | UNCONFIRMED | RESOLVED
    Resolution | ---         | FIXED

--- Comment #3 from Andre Vieira ---
Test was changed to require C99 runtime in r232487.
[Bug c++/68385] [6 Regression] ICE building libstdc++ on arm-none-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68385 --- Comment #8 from Andre Vieira --- It did fix it for me, sorry for the late reply.
[Bug target/69227] FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227 --- Comment #2 from Andre Vieira --- I have decided to email the newlib mailing list to figure out which function classes we should and should not support for 'arm-none-eabi'. See https://sourceware.org/ml/newlib/2016/msg9.html
[Bug target/69227] New: FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227

Bug ID: 69227
Summary: FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andre.simoesdiasvieira at arm dot com
Target Milestone: ---

Commit r232191 causes the following fail on the arm-none-eabi target:

    FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)

as it no longer folds away __builtin_ceill for __builtin_fabsf. This is because Gerald's patch checks 'targetm.libc_has_function (function_c99_misc)' for a transformation used here and, for arm-none-eabi, TARGET_LIBC_HAS_FUNCTION is defined as 'no_c99_libc_has_function', which always returns false.

The question now is whether we should support function_c99_misc with 'arm-none-eabi', which comes with newlib. I believe newlib does not claim to fully support C99.
[Bug testsuite/68232] gcc.dg/ifcvt-4.c fails on some arm configurations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #3 from Andre Vieira ---
It also fails on every ARM M-profile arch/cpu combination I've tried (all with -mthumb):

    -march={armv6-m,armv7-m} or -mcpu=cortex-m{0,0plus,3,4,7}

It does pass for armv7-r and cortex-r4, with and without -mthumb. All of this is with target 'arm-none-eabi'.
[Bug c++/68385] [6 Regression] ICE building libstdc++ on arm-none-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68385

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #2 from Andre Vieira ---
Hi Jason,

I don't fully understand what is going wrong here, but when debugging I found that the tree it complains about comes from a call to convert_to_integer_nofold in ocp_convert; this line used to have a fold_if_not_in_template. I found that I no longer got the ICE after reverting the code there to fold 'converted'. I'm not sure this actually fixes it; I'd need to look further into your patch for that. Hopefully this saves you some debugging yourself.

The issue seemed to originate from a nop_expr around a param_declaration, and fold gets rid of it.

Hope this helps.

Cheers,
Andre
[Bug testsuite/67948] xor-and.c needs updating after r228661
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67948

Andre Vieira changed:

    What    | Removed | Added
    CC      |         | andre.simoesdiasvieira@arm.com

--- Comment #2 from Andre Vieira ---
I am working on this and proposed a fix on gcc-patches, see https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01899.html

I can't assign the bug to me as I don't have write access.