[Bug target/57748] ICE on ARM with -mfloat-abi=softfp -mfpu=neo

2013-07-03 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philb at gnu dot org

--- Comment #4 from philb at gnu dot org ---
I was able to get Khem's testcase to provoke a crash at:

4761  gcc_assert (TREE_CODE (offset) == INTEGER_CST);

Apparently OFFSET is:

 <plus_expr 0x76380d48
    type <integer_type 0x76c6 sizetype public unsigned SI
        size <integer_cst 0x76c5c080 constant 32>
        unit size <integer_cst 0x76c5c0a0 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x76c6 precision 32 min <integer_cst 0x76c5c0c0 0> max <integer_cst 0x76c4b000 4294967295>>

    arg 0 <mult_expr 0x76380d20 type <integer_type 0x76c6 sizetype>

        arg 0 <nop_expr 0x76381b80 type <integer_type 0x76c6 sizetype>

            arg 0 <ssa_name 0x76374900 type <integer_type 0x76c605e8 int>
                var <var_decl 0x7637a428 j>def_stmt j_22 = PHI <0(4), j_31(7)>

                version 22>>
        arg 1 <integer_cst 0x76c5c600 constant 16>>
    arg 1 <integer_cst 0x76c5c120 type <integer_type 0x76c6 sizetype> constant 8>>


[Bug target/49473] [arm] poor scheduling of loads

2011-08-03 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

--- Comment #3 from philb at gnu dot org 2011-08-03 10:38:28 UTC ---
(In reply to comment #2)
> This looks like it might be to do with the latency of the call instruction at
> least for the LPIC0 case. The scheduler thinks that r0 isn't ready really till
> cycle 34 or so and hence the compiler can't hoist the mov r5, r0 above the add
> r4, pc, r4 .

That seems rather peculiar.  The worst case behaviour that the called function
is likely to have would be something like:

ldr r0, [r1]
bx lr

It's possible that the ldr might have a result latency of up to four cycles (if
it were an ARM1136 unaligned access), but the bx will take a minimum of four
cycles even if it was correctly predicted by the return stack and hence the
result latency of the ldr will effectively be annulled.  So, as far as the
scheduler is concerned, it seems as though the result latency of the call
instruction should be considered to be one.
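
To make that arithmetic concrete, here is a minimal sketch in C.  The helper
name and the model itself (load latency minus the cycles spent in the return
branch, floored at one) are assumptions made purely for illustration; nothing
here is taken from GCC's scheduler or from the ARM1136 documentation.

```c
#include <assert.h>

/* Hypothetical helper: how many cycles the caller still has to wait for r0
   once the call has returned.  LOAD_LATENCY is the result latency of the
   callee's final load; RETURN_CYCLES is the time taken by the "bx lr" that
   follows it.  */
static int
residual_call_latency (int load_latency, int return_cycles)
{
  int remaining = load_latency - return_cycles;

  /* The result can never be consumed sooner than the next cycle.  */
  return remaining > 1 ? remaining : 1;
}

/* The worst case discussed above: a four-cycle load hidden behind a
   four-cycle return leaves an effective latency of one.  */
static void
check_call_latency (void)
{
  assert (residual_call_latency (4, 4) == 1);
}
```

Under this model the latency seen by the caller only exceeds one if the
callee's last load takes longer than the return itself.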


[Bug target/49422] [arm] unable to find a register to spill in class 'VFP_LO_REGS'

2011-06-22 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49422

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #2 from philb at gnu dot org 2011-06-22 13:46:51 UTC ---
I can't reproduce it now either.  I think I must have been testing against a
locally patched tree rather than the clean one by mistake.  I'll close this bug
until/unless I can reproduce the failure on a released version.


[Bug target/49473] New: [arm] poor scheduling of loads

2011-06-20 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

   Summary: [arm] poor scheduling of loads
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org
Target: arm-linux


The instruction scheduler doesn't seem to be doing a very good job of
accounting for the load delay slots on ARM1136JF-S.  See for example the
attached testcase:

$ ./cc1 -fPIC -O2 -mtune=arm1136jf-s -march=armv6 -mfpu=vfp -mfloat-abi=soft

which yields:

gst_mpegts_demux_sink_setcaps:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, r5, r6, r7, r8, lr}
        sub     sp, sp, #16
        mov     r7, r1
        bl      gst_object_get_parent(PLT)
        mov     r1, #0
        ldr     r4, .L7
.LPIC0:
        add     r4, pc, r4
        mov     r5, r0
        mov     r0, r7
        bl      gst_caps_get_structure(PLT)
        ldr     r3, .L7+4
        ldr     r6, [r4, r3]
        ldr     r3, [r6, #0]
        cmp     r3, #3
        mov     r8, r0
        bls     .L5
        ldr     r3, .L7+8
        ldr     r1, .L7+12
.LPIC2:
        add     r3, pc, r3
        add     r2, r3, #64
        stmia   sp, {r1, r5}
        str     r2, [sp, #8]
        str     r7, [sp, #12]
        add     r2, r3, #12
        mov     r0, #0
        mov     r1, #4
        add     r3, r3, #32
        bl      gst_debug_log(PLT)
.L5:
        ldr     r4, .L7+16
        add     r2, r5, #32768
.LPIC1:
        add     r4, pc, r4
        mov     r0, r8
        mov     r1, r4
        add     r2, r2, #172
        bl      gst_structure_get_int(PLT)
        cmp     r0, #0
        bne     .L3
        ldr     r3, [r6, #0]
        cmp     r3, #3
        bls     .L3
        mov     r2, #484
        add     r3, r4, #88
        stmia   sp, {r2, r5}
        str     r3, [sp, #8]
        mov     r1, #4
        add     r2, r4, #12
        add     r3, r4, #32
        bl      gst_debug_log(PLT)
.L3:
        mov     r0, r5
        bl      gst_object_unref(PLT)
        mov     r0, #1
        add     sp, sp, #16
        ldmfd   sp!, {r4, r5, r6, r7, r8, pc}

Note that:

- the add at .LPIC0 will stall for two cycles because the preceding load has a
result latency of three.  The two subsequent MOVs could have been scheduled in
these slots since they don't have any data dependency on the ADD;

- the add at .LPIC1 will stall for one cycle for the same reason, and the same
applies to the following MOV.
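
The stall counts in the two notes above follow from a very simple model,
sketched here in C.  It assumes a single-issue, in-order pipeline in which a
consumer issued N instructions after its producer has already covered N cycles
of the producer's result latency; the function name is hypothetical and the
model ignores everything else the real pipeline does.

```c
#include <assert.h>

/* Hypothetical model: cycles lost when a consumer is issued ISSUE_DISTANCE
   instructions after a producer whose result latency is RESULT_LATENCY.
   A distance of 1 means the consumer immediately follows the producer.  */
static int
stall_cycles (int result_latency, int issue_distance)
{
  int stall = result_latency - issue_distance;

  return stall > 0 ? stall : 0;
}

static void
check_stalls (void)
{
  /* .LPIC0: the add directly follows the ldr, so it waits two cycles.  */
  assert (stall_cycles (3, 1) == 2);
  /* .LPIC1: one unrelated add sits in between, leaving a one-cycle stall.  */
  assert (stall_cycles (3, 2) == 1);
  /* With two independent MOVs hoisted in between, the stall disappears.  */
  assert (stall_cycles (3, 3) == 0);
}
```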

On this topic I noticed that arm1136jfs.md has:

;; An alu op can start sooner after a load, if that alu op does not
;; have an early register dependency on the load
(define_bypass 2 "11_load1"
   "11_alu_op")
(define_bypass 2 "11_load1"
   "11_alu_shift_op"
   "arm_no_early_alu_shift_value_dep")
(define_bypass 2 "11_load1"
   "11_alu_shift_reg_op"
   "arm_no_early_alu_shift_dep")

... which seems a little strange, since the result latency of LDR is three not
two according to the documentation.  The above bypasses look like they would be
correct for instructions where the dependency is a Late Reg, but that isn't the
case for alu_ops.


[Bug target/49473] [arm] poor scheduling of loads

2011-06-20 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473

--- Comment #1 from philb at gnu dot org 2011-06-20 11:43:48 UTC ---
Created attachment 24564
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24564
testcase


[Bug c++/49433] New: internal compiler error: in write_builtin_type, at cp/mangle.c:2167

2011-06-16 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49433

   Summary: internal compiler error: in write_builtin_type, at
cp/mangle.c:2167
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org


Created attachment 24543
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24543
testcase

Due to some stray CFLAGS I found myself compiling libstdc++ with -flto turned
on, which yielded:

$ arm-oe-linux-gnueabi-g++ -O2 -flto -g -fpermissive -std=gnu++0x atomic.ii -S
In file included from
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/functional:59:0,
 from
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/mutex:43,
 from
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/libstdc++-v3/src/atomic.cc:28:
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/bits/functional_hash.h:
In instantiation of 'std::size_t std::hash<_Tp>::operator()(_Tp) const [with
_Tp = long double, std::size_t = unsigned int]':
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/libstdc++-v3/src/atomic.cc:122:1:
  instantiated from here
/home/pb/oe/build-giga/tmp-eglibc/work/armv6-oe-linux-gnueabi/gcc-runtime-4.6.0-r4/gcc-4.6.0/build.arm-oe-linux-gnueabi.arm-oe-linux-gnueabi/libstdc++-v3/include/bits/functional_hash.h:184:5:
internal compiler error: in write_builtin_type, at cp/mangle.c:2167
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$


[Bug target/49421] New: [arm] suboptimal choice of working regs

2011-06-15 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49421

   Summary: [arm] suboptimal choice of working regs
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org


If a leaf function requires one more working register than can be accommodated
in the call-clobbered set, gcc currently tends to push r4 and use that next. 
However, in the specific case of a leaf function, it would be better to push lr
and use that as the working register, since then the return can be done with a
single pop.  Consider the made-up example:

int f(int *a, int *b, int *c, int *d)
{
  int i;
  for (i = 0; i < 4; i++)
    if (a[i] || b[i] || c[i] || d[i])
      return 1;

  return 0;
}

which compiles (-march=armv6 -mtune=arm1136jf-s -O2) to:

f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     ip, #0
        str     r4, [sp, #-4]!
.L3:
        ldr     r4, [r0, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r1, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r2, ip]
        cmp     r4, #0
        bne     .L7
        ldr     r4, [r3, ip]
        add     ip, ip, #4
        cmp     r4, #0
        bne     .L7
        cmp     ip, #16
        bne     .L3
        mov     r0, r4
.L2:
        ldmfd   sp!, {r4}
        bx      lr
.L7:
        mov     r0, #1
        b       .L2

If lr had been pushed instead of r4 then the return could have simply been pop
{pc}.

Also, since this is arm11, it is no more expensive to push two words than one. 
If the compiler had stacked both r4 and lr, it would have freed up an extra
register for the loop which would probably have allowed the loads to be
scheduled better.


[Bug target/49422] New: [arm] unable to find a register to spill in class 'VFP_LO_REGS'

2011-06-15 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49422

   Summary: [arm] unable to find a register to spill in class
'VFP_LO_REGS'
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org
Target: arm-linux


Created attachment 24536
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24536
testcase

$ arm-oe-linux-gnueabi-gcc -fPIC -mfpu=vfp -O2 s_span.i -march=armv6j
-mtune=arm1136jf-s -mfloat-abi=softfp -ffast-math -S
swrast/s_span.c: In function '_swrast_write_rgba_span':
swrast/s_span.c:1297:1: error: unable to find a register to spill in class
'VFP_LO_REGS'
swrast/s_span.c:1297:1: error: this is the insn:
(insn 2389 2380 3422 269 (set (subreg:SI (reg:QI 2169) 0)
(unsigned_fix:SI (fix:SF (reg/v:SF 78 s15 [orig:685 a ] [685]
swrast/s_span.c:867 670 {fixuns_truncsfsi2}
 (expr_list:REG_DEAD (reg/v:SF 78 s15 [orig:685 a ] [685])
(nil)))
swrast/s_span.c:1297: confused by earlier errors, bailing out
$


[Bug target/49423] New: [arm] internal compiler error: in push_minipool_fix

2011-06-15 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

   Summary: [arm] internal compiler error: in push_minipool_fix
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org


$ arm-oe-linux-gnueabi-gcc -march=armv7-a -O2 -S -mfloat-abi=softfp -mfpu=vfp
svga_tgsi_insn.i
svga_tgsi_insn.c: In function 'svga_shader_emit_instructions':
svga_tgsi_insn.c:2969:1: internal compiler error: in push_minipool_fix, at
config/arm/arm.c:12138
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
$


[Bug target/49423] [arm] internal compiler error: in push_minipool_fix

2011-06-15 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49423

--- Comment #1 from philb at gnu dot org 2011-06-15 13:50:23 UTC ---
Created attachment 24537
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24537
testcase


[Bug target/49392] [arm] spurious EABI version mismatches when LTO enabled

2011-06-15 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49392

philb at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #3 from philb at gnu dot org 2011-06-15 14:57:32 UTC ---
I just tried a different linker and that does seem to have made the problem go
away.  So I guess there is no gcc bug here.  Thanks.


[Bug target/49391] New: [arm] sp not accepted as input for alu operation

2011-06-13 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49391

   Summary: [arm] sp not accepted as input for alu operation
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org
Target: arm-linux


$ cat t.c
#define THREAD_SIZE 8192

static inline struct thread_info *current_thread_info(void)
{
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

int f()
{
  return (int)current_thread_info();
}
$ arm-linux-gnueabi-gcc -O2 -S t.c
$ cat t.s
        .cpu arm10tdmi
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 18, 4
        .file   "t.c"
        .text
        .align  2
        .global f
        .type   f, %function
f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     r3, sp
        bic     r0, r3, #8128
        bic     r0, r0, #63
        bx      lr
        .size   f, .-f
        .ident  "GCC: (GNU) 4.6.0"
        .section        .note.GNU-stack,"",%progbits

The mov r3, sp is redundant since sp could be used directly as the second
operand to BIC.  It wasn't immediately obvious to me from the predicates on
arm_andsi3_insn why combine wouldn't be accepting sp as an input operand to
that pattern, but apparently it isn't.

(This particular idiom of calculating from sp is used quite frequently in the
Linux kernel.)


[Bug target/49392] New: [arm] spurious EABI version mismatches when LTO enabled

2011-06-13 Thread philb at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49392

   Summary: [arm] spurious EABI version mismatches when LTO
enabled
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ph...@gnu.org
Target: arm-linux


Attempting to build even a trivial executable with -flto yields:

pb@lander:~$ cat t.c
#include <stdio.h>
int main() { printf("Hello world"); }

pb@lander:~$ arm-oe-linux-gnueabi-gcc -flto t.c
/home/pb/oe/build-giga/tmp-eglibc/sysroots/x86_64-linux/libexec/armv6-oe-linux-gnueabi/gcc/arm-oe-linux-gnueabi/4.6.0/arm-oe-linux-gnueabi-ld:
error: Source object /tmp/cc60ozAJ.o.ironly has EABI version 0, but target
a.out has EABI version 5
/home/pb/oe/build-giga/tmp-eglibc/sysroots/x86_64-linux/libexec/armv6-oe-linux-gnueabi/gcc/arm-oe-linux-gnueabi/4.6.0/arm-oe-linux-gnueabi-ld:
failed to merge target specific data of file /tmp/cc60ozAJ.o.ironly
collect2: ld returned 1 exit status
pb@lander:~$