[Bug target/57748] ICE on ARM with -mfloat-abi=softfp -mfpu=neo
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philb at gnu dot org

--- Comment #4 from philb at gnu dot org ---
I was able to get Khem's testcase to provoke a crash at:

4761          gcc_assert (TREE_CODE (offset) == INTEGER_CST);

Apparently OFFSET is:

 <plus_expr 0x76380d48
    type <integer_type 0x76c6 sizetype public unsigned SI
        size <integer_cst 0x76c5c080 constant 32>
        unit size <integer_cst 0x76c5c0a0 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x76c6 precision 32
        min <integer_cst 0x76c5c0c0 0> max <integer_cst 0x76c4b000 4294967295>>

    arg 0 <mult_expr 0x76380d20 type <integer_type 0x76c6 sizetype>

        arg 0 <nop_expr 0x76381b80 type <integer_type 0x76c6 sizetype>

            arg 0 <ssa_name 0x76374900 type <integer_type 0x76c605e8 int>
                var <var_decl 0x7637a428 j>
                def_stmt j_22 = PHI <0(4), j_31(7)>
                version 22>>
        arg 1 <integer_cst 0x76c5c600 constant 16>>
    arg 1 <integer_cst 0x76c5c120 type <integer_type 0x76c6 sizetype> constant 8>>
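Read as ordinary arithmetic, the dumped tree appears to be (sizetype) j * 16 + 8. That depends on the loop variable j, so it cannot fold to an INTEGER_CST, which is what the assertion demands. A minimal C sketch of the same arithmetic, assuming the constants 16 and 8 are read correctly from the dump (the function name is made up for illustration and is not part of GCC or the testcase):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring plus_expr (mult_expr (nop_expr (j), 16), 8).
   The result varies with j, so no single constant can describe it. */
size_t element_offset(int j)
{
    assert(j >= 0);             /* keep the illustration to non-negative indices */
    return (size_t)j * 16 + 8;  /* nop_expr is the int -> sizetype conversion */
}
```

For example, element_offset(0) is 8 and element_offset(1) is 24, so the offset only becomes constant once j itself is known.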
[Bug target/49473] [arm] poor scheduling of loads
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

--- Comment #3 from philb at gnu dot org 2011-08-03 10:38:28 UTC ---
(In reply to comment #2)
> This looks like it might be to do with the latency of the call instruction at
> least for the LPIC0 case. The scheduler thinks that r0 isn't ready really till
> cycle 34 or so and hence the compiler can't hoist the mov r5, r0 above the add
> r4, pc, r4 .

That seems rather peculiar.  The worst case behaviour that the called function
is likely to have would be something like:

    ldr r0, [r1]
    bx lr

It's possible that the ldr might have a result latency of up to four cycles (if
it were an ARM1136 unaligned access), but the bx will take a minimum of four
cycles even if it was correctly predicted by the return stack, and hence the
result latency of the ldr will effectively be annulled.

So, as far as the scheduler is concerned, it seems as though the result latency
of the call instruction should be considered to be one.
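The argument above can be written as a small calculation. This is only a sketch of the reasoning, not code from GCC: the function name and the rule that the return branch "absorbs" the callee's load latency are assumptions made for illustration.

```c
#include <assert.h>

/* Simplified model (an assumption, not GCC's scheduler): the cycles spent in
   the return branch overlap with the result latency of the callee's last
   load, and a result is never available in less than one cycle. */
int effective_call_result_latency(int callee_load_latency,
                                  int return_branch_cycles)
{
    assert(callee_load_latency >= 1 && return_branch_cycles >= 0);
    int remaining = callee_load_latency - return_branch_cycles;
    return remaining > 1 ? remaining : 1;
}
```

With the figures quoted above (a load latency of up to four cycles and a bx that takes at least four), effective_call_result_latency(4, 4) comes out as one, which is nowhere near the 34 cycles the scheduler reportedly assumes.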
[Bug target/49422] [arm] unable to find a register to spill in class 'VFP_LO_REGS'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49422

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #2 from philb at gnu dot org 2011-06-22 13:46:51 UTC ---
I can't reproduce it now either.  I think I must have been testing against a
locally patched tree rather than the clean one by mistake.  I'll close this bug
until/unless I can reproduce the failure on a released version.
[Bug target/49473] New: [arm] poor scheduling of loads
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

           Summary: [arm] poor scheduling of loads
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org
            Target: arm-linux

The instruction scheduler doesn't seem to be doing a very good job of
accounting for the load delay slots on ARM1136JF-S.  See for example the
attached testcase:

$ ./cc1 -fPIC -O2 -mtune=arm1136jf-s -march=armv6 -mfpu=vfp -mfloat-abi=soft

which yields:

gst_mpegts_demux_sink_setcaps:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd   sp!, {r4, r5, r6, r7, r8, lr}
    sub     sp, sp, #16
    mov     r7, r1
    bl      gst_object_get_parent(PLT)
    mov     r1, #0
    ldr     r4, .L7
.LPIC0:
    add     r4, pc, r4
    mov     r5, r0
    mov     r0, r7
    bl      gst_caps_get_structure(PLT)
    ldr     r3, .L7+4
    ldr     r6, [r4, r3]
    ldr     r3, [r6, #0]
    cmp     r3, #3
    mov     r8, r0
    bls     .L5
    ldr     r3, .L7+8
    ldr     r1, .L7+12
.LPIC2:
    add     r3, pc, r3
    add     r2, r3, #64
    stmia   sp, {r1, r5}
    str     r2, [sp, #8]
    str     r7, [sp, #12]
    add     r2, r3, #12
    mov     r0, #0
    mov     r1, #4
    add     r3, r3, #32
    bl      gst_debug_log(PLT)
.L5:
    ldr     r4, .L7+16
    add     r2, r5, #32768
.LPIC1:
    add     r4, pc, r4
    mov     r0, r8
    mov     r1, r4
    add     r2, r2, #172
    bl      gst_structure_get_int(PLT)
    cmp     r0, #0
    bne     .L3
    ldr     r3, [r6, #0]
    cmp     r3, #3
    bls     .L3
    mov     r2, #484
    add     r3, r4, #88
    stmia   sp, {r2, r5}
    str     r3, [sp, #8]
    mov     r1, #4
    add     r2, r4, #12
    add     r3, r4, #32
    bl      gst_debug_log(PLT)
.L3:
    mov     r0, r5
    bl      gst_object_unref(PLT)
    mov     r0, #1
    add     sp, sp, #16
    ldmfd   sp!, {r4, r5, r6, r7, r8, pc}

Note that:

- the add at .LPIC0 will stall for two cycles because the preceding load has a
  result latency of three.  The two subsequent MOVs could have been scheduled
  in these slots since they don't have any data dependency on the ADD;

- the add at .LPIC1 will stall for one cycle for the same reason, and the same
  applies to the following MOV.

On this topic I noticed that arm1136jfs.md has:

;; An alu op can start sooner after a load, if that alu op does not
;; have an early register dependency on the load
(define_bypass 2 "11_load1"
               "11_alu_op")
(define_bypass 2 "11_load1"
               "11_alu_shift_op"
               "arm_no_early_alu_shift_value_dep")
(define_bypass 2 "11_load1"
               "11_alu_shift_reg_op"
               "arm_no_early_alu_shift_dep")

... which seems a little strange, since the result latency of LDR is three, not
two, according to the documentation.  The above bypasses look like they would
be correct for instructions where the dependency is a Late Reg, but that isn't
the case for alu_ops.
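The stall counts quoted in the notes follow from a simple rule, sketched here in C. This is a back-of-envelope model and not the scheduler's actual DFA; the function name and the "one cycle per independent instruction" assumption are mine.

```c
#include <assert.h>

/* Rough model (an assumption): a consumer issued immediately after its
   producer waits result_latency - 1 cycles, and every independent
   instruction placed in between hides one of those cycles. */
int stall_cycles(int result_latency, int independent_insns_between)
{
    assert(result_latency >= 1 && independent_insns_between >= 0);
    int stall = result_latency - 1 - independent_insns_between;
    return stall > 0 ? stall : 0;
}
```

Under this model stall_cycles(3, 0) gives the two-cycle stall at .LPIC0, stall_cycles(3, 1) gives the one-cycle stall at .LPIC1 (one unrelated add sits between the ldr and the add), and hoisting two independent MOVs into the gap gives stall_cycles(3, 2), i.e. no stall. If the scheduler is working from a latency of two, as the bypasses suggest, it would expect only stall_cycles(2, 0), one cycle, at .LPIC0.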
[Bug target/49473] [arm] poor scheduling of loads
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

--- Comment #1 from philb at gnu dot org 2011-06-20 11:43:48 UTC ---
Created attachment 24564
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24564
testcase
[Bug c++/49433] New: internal compiler error: in write_builtin_type, at cp/mangle.c:2167
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49433

           Summary: internal compiler error: in write_builtin_type, at
                    cp/mangle.c:2167
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org

Created attachment 24543
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24543
testcase

Due to some stray CFLAGS I found myself compiling libstdc++ with -flto turned
on, which yielded:

$ arm-oe-linux-gnueabi-g++ -O2 -flto -g -fpermissive -std=gnu++0x atomic.ii -S
In file included from /home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/functional:59:0,
                 from /home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/mutex:43,
                 from /home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/libstdc++-v3/src/atomic.cc:28:
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/bits/functional_hash.h: In instantiation of 'std::size_t std::hash<_Tp>::operator()(_Tp) const [with _Tp = long double, std::size_t = unsigned int]':
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/libstdc++-v3/src/atomic.cc:122:1:   instantiated from here
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/bits/functional_hash.h:184:5: internal compiler error: in write_builtin_type, at cp/mangle.c:2167
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$
[Bug target/49421] New: [arm] suboptimal choice of working regs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49421

           Summary: [arm] suboptimal choice of working regs
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org

If a leaf function requires one more working register than can be accommodated
in the call-clobbered set, gcc currently tends to push r4 and use that next.
However, in the specific case of a leaf function, it would be better to push lr
and use that as the working register, since then the return can be done with a
single pop.

Consider the made-up example:

int f(int *a, int *b, int *c, int *d)
{
  int i;
  for (i = 0; i < 4; i++)
    if (a[i] || b[i] || c[i] || d[i])
      return 1;
  return 0;
}

which compiles (-march=armv6 -mtune=arm1136jf-s -O2) to:

f:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    mov     ip, #0
    str     r4, [sp, #-4]!
.L3:
    ldr     r4, [r0, ip]
    cmp     r4, #0
    bne     .L7
    ldr     r4, [r1, ip]
    cmp     r4, #0
    bne     .L7
    ldr     r4, [r2, ip]
    cmp     r4, #0
    bne     .L7
    ldr     r4, [r3, ip]
    add     ip, ip, #4
    cmp     r4, #0
    bne     .L7
    cmp     ip, #16
    bne     .L3
    mov     r0, r4
.L2:
    ldmfd   sp!, {r4}
    bx      lr
.L7:
    mov     r0, #1
    b       .L2

If lr had been pushed instead of r4 then the return could have simply been a
pop of the saved lr value straight into pc (pop {pc}).

Also, since this is arm11, it is no more expensive to push two words than one.
If the compiler had stacked both r4 and lr, it would have freed up an extra
register for the loop, which would probably have allowed the loads to be
scheduled better.
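To make the saving concrete, here is a tiny C model of the return sequence for a leaf function that needs exactly one extra working register. It is purely illustrative: the enum and function names are hypothetical, and the instruction counts are taken from the listing above rather than from anything in GCC.

```c
#include <assert.h>

/* Hypothetical choice of which register a leaf function saves to gain
   one extra working register. */
enum extra_reg { EXTRA_R4, EXTRA_LR };

/* Instructions needed to restore the saved register and return.
   r4: "ldmfd sp!, {r4}" followed by "bx lr" (two instructions).
   lr: the saved return address is popped straight into pc (one). */
int epilogue_insns(enum extra_reg saved)
{
    assert(saved == EXTRA_R4 || saved == EXTRA_LR);
    return saved == EXTRA_R4 ? 2 : 1;
}
```

In the listing above the branch from .L7 back to .L2 exists only to reach the shared two-instruction return, so a one-instruction return might also let the compiler return directly from .L7; that second effect is a guess and is not modelled here.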
[Bug target/49422] New: [arm] unable to find a register to spill in class 'VFP_LO_REGS'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49422

           Summary: [arm] unable to find a register to spill in class
                    'VFP_LO_REGS'
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org
            Target: arm-linux

Created attachment 24536
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24536
testcase

$ arm-oe-linux-gnueabi-gcc -fPIC -mfpu=vfp -O2 s_span.i -march=armv6j -mtune=arm1136jf-s -mfloat-abi=softfp -ffast-math -S
swrast/s_span.c: In function '_swrast_write_rgba_span':
swrast/s_span.c:1297:1: error: unable to find a register to spill in class 'VFP_LO_REGS'
swrast/s_span.c:1297:1: error: this is the insn:
(insn 2389 2380 3422 269 (set (subreg:SI (reg:QI 2169) 0)
        (unsigned_fix:SI (fix:SF (reg/v:SF 78 s15 [orig:685 a ] [685]
        swrast/s_span.c:867 670 {fixuns_truncsfsi2}
     (expr_list:REG_DEAD (reg/v:SF 78 s15 [orig:685 a ] [685])
        (nil)))
swrast/s_span.c:1297: confused by earlier errors, bailing out
$
[Bug target/49423] New: [arm] internal compiler error: in push_minipool_fix
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

           Summary: [arm] internal compiler error: in push_minipool_fix
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org

$ arm-oe-linux-gnueabi-gcc -march=armv7-a -O2 -S -mfloat-abi=softfp -mfpu=vfp svga_tgsi_insn.i
svga_tgsi_insn.c: In function 'svga_shader_emit_instructions':
svga_tgsi_insn.c:2969:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:12138
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$
[Bug target/49423] [arm] internal compiler error: in push_minipool_fix
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

--- Comment #1 from philb at gnu dot org 2011-06-15 13:50:23 UTC ---
Created attachment 24537
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24537
testcase
[Bug target/49392] [arm] spurious EABI version mismatches when LTO enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49392

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #3 from philb at gnu dot org 2011-06-15 14:57:32 UTC ---
I just tried a different linker and that does seem to have made the problem go
away.  So I guess there is no gcc bug here.  Thanks.
[Bug target/49391] New: [arm] sp not accepted as input for alu operation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49391

           Summary: [arm] sp not accepted as input for alu operation
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org
            Target: arm-linux

$ cat t.c
#define THREAD_SIZE 8192

static inline struct thread_info *current_thread_info(void)
{
  register unsigned long sp asm ("sp");
  return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

int f()
{
  return (int)current_thread_info();
}
$ arm-linux-gnueabi-gcc -O2 -S t.c
$ cat t.s
    .cpu arm10tdmi
    .fpu softvfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 2
    .eabi_attribute 18, 4
    .file   "t.c"
    .text
    .align  2
    .global f
    .type   f, %function
f:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    mov     r3, sp
    bic     r0, r3, #8128
    bic     r0, r0, #63
    bx      lr
    .size   f, .-f
    .ident  "GCC: (GNU) 4.6.0"
    .section        .note.GNU-stack,"",%progbits

The mov r3, sp is redundant since sp could be used directly as the second
operand to BIC.  It wasn't immediately obvious to me from the predicates on
arm_andsi3_insn why combine wouldn't be accepting sp as an input operand to
that pattern, but apparently it isn't.

(This particular idiom of calculating from sp is used quite frequently in the
Linux kernel.)
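As a sanity check on the generated code, the two BIC instructions together clear the same bits as the single mask in the source, because 8128 + 63 = 8191 = THREAD_SIZE - 1. The mask is presumably split because 0x1FFF does not fit an ARM data-processing immediate (an 8-bit value rotated by an even amount), whereas 0x1FC0 and 0x3F do. A small C sketch of the equivalence, with helper names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u

/* Models the ARM BIC instruction: clear the bits of value that are set in mask. */
uint32_t bic(uint32_t value, uint32_t mask)
{
    return value & ~mask;
}

/* What the compiler emitted: bic #8128 followed by bic #63. */
uint32_t thread_base_two_bics(uint32_t sp)
{
    return bic(bic(sp, 8128u), 63u);
}

/* What the C source asks for: sp & ~(THREAD_SIZE - 1). */
uint32_t thread_base_mask(uint32_t sp)
{
    assert((THREAD_SIZE & (THREAD_SIZE - 1u)) == 0u); /* must be a power of two */
    return sp & ~(THREAD_SIZE - 1u);
}
```

Both functions round sp down to an 8 KiB boundary, so the only thing wrong with the emitted sequence is the extra mov; the first BIC could take sp as its source directly.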
[Bug target/49392] New: [arm] spurious EABI version mismatches when LTO enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49392

           Summary: [arm] spurious EABI version mismatches when LTO enabled
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ph...@gnu.org
            Target: arm-linux

Attempting to build even a trivial executable with -flto yields:

pb@lander:~$ cat t.c
#include <stdio.h>

int main()
{
  printf("Hello world");
}
pb@lander:~$ arm-oe-linux-gnueabi-gcc -flto t.c
/home/pb/oe/build-giga/tmp-eglibc/sysroots/x86_64-linux/libexec/armv6-oe-linux-gnueabi/gcc/arm-oe-linux-gnueabi/4.6.0/arm-oe-linux-gnueabi-ld: error: Source object /tmp/cc60ozAJ.o.ironly has EABI version 0, but target a.out has EABI version 5
/home/pb/oe/build-giga/tmp-eglibc/sysroots/x86_64-linux/libexec/armv6-oe-linux-gnueabi/gcc/arm-oe-linux-gnueabi/4.6.0/arm-oe-linux-gnueabi-ld: failed to merge target specific data of file /tmp/cc60ozAJ.o.ironly
collect2: ld returned 1 exit status
pb@lander:~$