[Bug c/98902] -fmerge-all-constants leaves dangling reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98902

Alexander Strange changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Alexander Strange ---
Interesting, this is a bug in the Compiler Explorer site. It's hiding .set lines from me.
[Bug c/98902] New: -fmerge-all-constants leaves dangling reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98902

Bug ID: 98902
Summary: -fmerge-all-constants leaves dangling reference
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: astrange at ithinksw dot com
Target Milestone: ---

This source:
--
#include <stdio.h>
static const int a1[] = {1};
static const int a2[] = {1};
int main (void) { printf("%p %p\n", a1, a2); return 0; }
--
produces code where it doesn't emit 'a2' but still references it:
--
.LC0:
        .string "%p %p\n"
main:
        sub     rsp, 8
        mov     edx, OFFSET FLAT:a2
        mov     esi, OFFSET FLAT:a1
        xor     eax, eax
        mov     edi, OFFSET FLAT:.LC0
        call    printf
        xor     eax, eax
        add     rsp, 8
        ret
a1:
        .long   1
--
with '-O2 -fmerge-all-constants'. Did not verify this locally, just in compiler explorer.
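A minimal sketch of how the merging could be observed from C rather than from the assembly listing. The helper names `arrays_share_storage` and `print_addresses` are made up for illustration, and whether the two arrays actually end up at one address depends on the compiler, the flags, and on whether the comparison is folded at compile time, so no particular result is claimed here.

```c
#include <stdio.h>

static const int a1[] = {1};
static const int a2[] = {1};

/* Hypothetical helper: returns 1 if both arrays ended up at the same
 * address (as may happen with -fmerge-all-constants), 0 otherwise.
 * The compiler is free to fold this comparison, so treat the result
 * as a hint only. */
int arrays_share_storage(void)
{
    return (const void *)a1 == (const void *)a2;
}

/* Hypothetical helper: prints both addresses, as the report's main() does. */
void print_addresses(void)
{
    printf("%p %p\n", (const void *)a1, (const void *)a2);
}
```

Building this with and without '-O2 -fmerge-all-constants' and inspecting the full assembler output (not a filtered view) is one way to check whether 'a2' is defined as an alias of 'a1'.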
[Bug tree-optimization/61515] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #3 from Alexander Strange astrange at ithinksw dot com --- Without checking, -O0 went from 8 to 5 minutes. I stopped the -Os compile at 29 minutes.
[Bug tree-optimization/61515] New: Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 Bug ID: 61515 Summary: Extremely long compile time for generated code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: astrange at ithinksw dot com /usr/local/gcc49/bin/gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc49/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc49/libexec/gcc/x86_64-apple-darwin13.2.0/4.10.0/lto-wrapper Target: x86_64-apple-darwin13.2.0 Configured with: ../../cc/gcc/configure --prefix=/usr/local/gcc49 --with-arch=native --with-tune=native --disable-nls --with-gmp=/sw --disable-bootstrap --with-isl=/sw --enable-languages=c,c++,lto,objc,obj-c++ --no-create --no-recursion Thread model: posix gcc version 4.10.0 20140615 (experimental) (GCC) For the attached source (C translation from a large BF program): - gcc -O0 takes 9 minutes - gcc -Os does not finish after 40 minutes
[Bug tree-optimization/61515] Extremely long compile time for generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515 --- Comment #1 from Alexander Strange astrange at ithinksw dot com --- Created attachment 32944 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32944&action=edit Preprocessed source
[Bug target/43225] Structure copies not vectorized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225

--- Comment #4 from Alexander Strange astrange at ithinksw dot com 2011-03-29 20:39:28 UTC ---
Better source:

#include <emmintrin.h>
struct a1 { char l[16];} __attribute__((aligned));
struct a2 { __m128i l; } __attribute__((aligned));
void f1(struct a1 *a, struct a1 *b) { *a = *b; }
void f2(struct a2 *a, struct a2 *b) { *a = *b; }
void f3(__m128i *a, __m128i *b) { *a = *b; }

Code is the same as above in svn. LLVM uses movaps for all three functions.
[Bug inline-asm/46615] New: [4.6 regression] possibly-invalid x86-64 inline asm miscompilation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46615

Summary: [4.6 regression] possibly-invalid x86-64 inline asm miscompilation
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com

gcc 4.6 miscompiles this source from ffmpeg on x86-64-apple-darwin10, whereas previous compilers worked. I'm not sure if the asm is legal, but it's existed in the wild for a long time.

const unsigned long long __attribute__((aligned(8))) ff_bgr24toUV[2][4] = {
    {0x3838DAC83838ULL, 0xECFFDAC8ECFFULL, 0xF6E4D0E3F6E4ULL, 0x3838D0E33838ULL},
    {0xECFFDAC8ECFFULL, 0x3838DAC83838ULL , 0x3838D0E33838ULL, 0xF6E4D0E3F6E4ULL},
};
static void bgr24ToUV_mmx_MMX2(int f)
{
    __asm__ volatile(
        "movq 24+%0, %%mm6 \n\t"
        :: "m"(ff_bgr24toUV[f == 0][0]));
}
void rgb24ToUV_MMX2() { bgr24ToUV_mmx_MMX2(1); }

gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.5.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.5.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46 --with-arch=native --with-tune=native --disable-nls --with-gmp=/sw --disable-bootstrap --enable-checking --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20101122 (experimental) (GCC)

gcc -O -o swscale-fails.s -S swscale.i
swscale.i: In function 'rgb24ToUV_MMX2':
swscale.i:10:2: warning: use of memory input without lvalue in asm operand 0 is deprecated [enabled by default]

Working asm (4.2):

_rgb24ToUV_MMX2:
        pushq   %rbp
        movq    %rsp, %rbp
        movq    24+_ff_bgr24toUV(%rip), %mm6
        leave
        ret
        .globl _ff_bgr24toUV
        .const
        .align 3
_ff_bgr24toUV:
        .quad   4050987868490315832
        .quad   -1369135209168966401
        .quad   -656399642184648988
        .quad   4051217538195929144
        .quad   -1369375758026740481
        .quad   4051228417348089912
        .quad   4050987868324313144
        .quad   -656169972313032988
        .section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

Non-working asm (4.6):

_rgb24ToUV_MMX2:
        movq    24+LC0(%rip), %mm6
        ret
        .globl _ff_bgr24toUV
        .const
        .align 3
_ff_bgr24toUV:
        .quad   4050987868490315832
        .quad   -1369135209168966401
        .quad   -656399642184648988
        .quad   4051217538195929144
        .quad   -1369375758026740481
        .quad   4051228417348089912
        .quad   4050987868324313144
        .quad   -656169972313032988
        .literal8
        .align 3
LC0:
        .quad   4050987868490315832
        .section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

24+_ff_bgr24toUV(%rip) is fine, but 24+LC0(%rip) is a pointer to nothing, and ld breaks:

ld: in /var/folders/MY/MYkVh2TwHgKZhNFIG8M3wU+++TI/-Tmp-//cc9dJIWa.o, in section __TEXT,__text reloc 0: local relocation for address 0x000C in section __text does not target section __literal8

I'm going to fix the asm since it looks fragile anyway, but that won't fix existing releases of ffmpeg. Note that creating LC0 is not even an optimization since it doesn't save any space (because the array is __attribute__((used))).
[Bug rtl-optimization/46248] New: 4.6 regression: crash+infinite recursion in combine
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46248 Summary: 4.6 regression: crash+infinite recursion in combine Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: astra...@ithinksw.com Created attachment 22210 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=22210 source gcc r166084 crashes compiling ffmpeg libpostproc on x86-64-apple-darwin10. Minimized-ish source attached. gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc46/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper Target: x86_64-apple-darwin10.4.0 Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46 --with-arch=native --with-tune=native --disable-nls --with-gmp=/sw --disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++ Thread model: posix gcc version 4.6.0 20101030 (experimental) (GCC) gcc -O3 -S postprocess.i gcc: internal compiler error: Segmentation fault (program cc1) Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. 
Backtrace: #0 0x00010031fc34 in if_then_else_cond (x=0x1425e14b0, ptrue=0x7fff5f400078, pfalse=0x7fff5f400068) at ../../../src/gcc/gcc/combine.c:8471 #1 0x00010031fd82 in if_then_else_cond (x=0x1425e1498, ptrue=0x7fff5f400118, pfalse=0x7fff5f400108) at ../../../src/gcc/gcc/combine.c:8507 #2 0x00010031fd82 in if_then_else_cond (x=0x1425e14b0, ptrue=0x7fff5f4001b8, pfalse=0x7fff5f4001a8) at ../../../src/gcc/gcc/combine.c:8507 #3 0x00010031fd82 in if_then_else_cond (x=0x1425e1498, ptrue=0x7fff5f400258, pfalse=0x7fff5f400248) at ../../../src/gcc/gcc/combine.c:8507 #4 0x00010031fd82 in if_then_else_cond (x=0x1425e14b0, ptrue=0x7fff5f4002f8, pfalse=0x7fff5f4002e8) at ../../../src/gcc/gcc/combine.c:8507 #5 0x00010031fd82 in if_then_else_cond (x=0x1425e1498, ptrue=0x7fff5f400398, pfalse=0x7fff5f400388) at ../../../src/gcc/gcc/combine.c:8507 #6 0x00010031fd82 in if_then_else_cond (x=0x1425e14b0, ptrue=0x7fff5f400438, pfalse=0x7fff5f400428) at ../../../src/gcc/gcc/combine.c:8507 ...
[Bug target/36503] x86 can use x >> -y for x >> 32-y
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503 --- Comment #8 from Alexander Strange astrange at ithinksw dot com 2010-10-21 04:39:36 UTC --- I built ffmpeg for x86-64 with --disable-asm with the attached patch and the regression tests failed. Reverting the patch fixes them. I saved the binaries but haven't investigated yet.
[Bug rtl-optimization/45788] New: -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788 Summary: -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: astra...@ithinksw.com gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc46/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper Target: x86_64-apple-darwin10.4.0 Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46 --with-arch=native --with-tune=native --disable-nls --with-gmp=/sw --disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++ Thread model: posix gcc version 4.6.0 20100924 (experimental) (GCC) gcc -O3 -fwhole-program -S eh_ice.ii eh_ice.ii: In function 'void _ZL9set_colorP9primitive7vectorXIfLi4EE.isra.3.constprop.5(texture**, color4)': eh_ice.ii:93:15: error: BB 3 can not throw but has an EH edge eh_ice.ii:93:15: internal compiler error: verify_flow_info failed Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. Removing -fwhole-program fixes it. -- Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug.
[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788 --- Comment #1 from Alexander Strange astrange at ithinksw dot com 2010-09-25 06:51:33 UTC --- BTW, I think the error would be a lot clearer if it printed the pre-cloning/etc function name.
[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788 --- Comment #4 from Alexander Strange astrange at ithinksw dot com 2010-09-25 19:50:29 UTC --- I (probably) definitely attached it. Is the attachment form on the new bugs page not working?
[Bug target/44474] GCC inserts redundant test instruction due to incorrect clobber
--- Comment #2 from astrange at ithinksw dot com 2010-08-29 06:39 --- Still happens with the new combine work (not that I really expected it to change). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474
[Bug target/44073] x86 constants could be unduplicated
--- Comment #5 from astrange at ithinksw dot com 2010-08-08 06:39 --- That commit doesn't reverse cleanly anymore, and I'm not sure how to update it. I don't have any pre-2005 gccs at the moment to test with. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073
[Bug target/44474] GCC inserts redundant test instruction due to incorrect clobber
--- Comment #1 from astrange at ithinksw dot com 2010-07-01 03:43 ---
The problem is combine. This:

int test2( int *b )
{
    int b_ = *b;
    b_--;
    if( b_ == 0 ) {
        *b = b_;
        return foo();
    }
    *b = b_;
    return 0;
}

works:

_test2:
LFB1:
        movl    (%rdi), %eax
        decl    %eax
        je      L7              <- uses decl
        movl    %eax, (%rdi)
        xorl    %eax, %eax
        ret
        .align 4,0x90
L7:
        movl    $0, (%rdi)
        xorl    %eax, %eax
        jmp     _foo

The original turns (*b)-- into load/dec/store/cmp - combine tries to combine dec/store which fails, but doesn't try dec/cmp.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474
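For comparison, here is a sketch of the two source shapes side by side. The in-place variant is only a guess at what the original testcase looked like, based on the comment's mention of (*b)--; the name `test_inplace` and the body of `foo` are placeholders invented for this example. Both functions are meant to behave identically, so any difference should be in the generated code only.

```c
/* Placeholder for the external function called in the report. */
static int foo(void)
{
    return 42;
}

/* Assumed shape of the original: decrement in memory, then compare. */
int test_inplace(int *b)
{
    (*b)--;
    if (*b == 0)
        return foo();
    return 0;
}

/* The variant from the comment: decrement a local copy, compare it,
 * and store it back on both paths. */
int test2(int *b)
{
    int b_ = *b;
    b_--;
    if (b_ == 0) {
        *b = b_;
        return foo();
    }
    *b = b_;
    return 0;
}
```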
[Bug target/44532] New: x86-64 unnecessary parameter extension
Source:

int f1(short a, int b) { return a * b; }
int f2(unsigned short a, int b) { return a * b; }

gcc -O3 -fomit-frame-pointer -S paramext.c

_f1:
LFB0:
        movl    %esi, %eax
        movswl  %di, %edi       <-
        imull   %edi, %eax
        ret
...
_f2:
LFB1:
        movl    %esi, %eax
        movzwl  %di, %edi       <-
        imull   %edi, %eax
        ret

AFAIK integer parameters should already be extended to int, so those instructions are redundant. llvm doesn't generate them.

-- Summary: x86-64 unnecessary parameter extension
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532
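At the C level the extension in question is the usual integer promotion: `a` is widened to int before the multiply, sign-extended for short and zero-extended for unsigned short. The sketch below only illustrates that semantics with the report's two functions; whether the caller or the callee is responsible for performing the extension is a calling-convention question that this example does not settle.

```c
/* Same functions as in the report. */
int f1(short a, int b)
{
    return a * b;   /* a is sign-extended to int before the multiply */
}

int f2(unsigned short a, int b)
{
    return a * b;   /* a is zero-extended to int before the multiply */
}
```

For instance, f1(-1, 3) yields -3, while f2(65535, 3) yields 196605, since 65535 is kept as a positive value.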
[Bug lto/44429] New: ltp ignoring __attribute__((used))
Source:

static const int __attribute__((used)) i = 1;
int main(void)
{
    int r;
    __asm__ ("movl _i(%%rip), %0" : "=r"(r));
    return r;
}

/usr/local/gcc46/bin/gcc -O3 -o attrused attrused.c
/usr/local/gcc46/bin/gcc -O3 -o attrused attrused.c -flto
Undefined symbols:
  _i, referenced from:
      _main in ccMflGRF.lto.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

Not sure how to construct a failing program that doesn't involve asm. This is the only thing left preventing ffmpeg (without asm disabled) from compiling under LTO.

-- Summary: ltp ignoring __attribute__((used))
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-apple-darwin10.3.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44429
[Bug lto/44090] lto ice in verify_stmts
--- Comment #3 from astrange at ithinksw dot com 2010-05-24 20:01 --- Fixed itself. Though lto still doesn't build ffmpeg, it's just a different bug now. -- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090
[Bug rtl-optimization/44223] New: segmentation fault with -g -fsched-pressure
gcc -O3 -g -fsched-pressure -fschedule-insns -S crash1m.i crash1m.i: In function 'ff_adts_write_frame_header': crash1m.i:35:2: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. Backtrace: (gdb) run Starting program: /usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/cc1 -fpreprocessed crash1m.i -march=core2 -mcx16 -msahf -maes -mpclmul -mpopcnt -msse4.2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=core2 -fPIC -feliminate-unused-debug-symbols -quiet -dumpbase crash1m.i -mmacosx-version-min=10.6.3 -auxbase crash1m -g -O3 -version -fsched-pressure -fschedule-insns -o crash1m.s Reading symbols for shared libraries .++. done GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1) compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version 4.3.1, MPFR version 2.4.2-p3, MPC version 0.8 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1) compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version 4.3.1, MPFR version 2.4.2-p3, MPC version 0.8 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: 5c588719ada4c17718f398d6d2dbd7a3 Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: 0x 0x0001004edc54 in dying_use_p (use=0x141720070) at ../../../src/gcc/gcc/haifa-sched.c:769 769 if (NONDEBUG_INSN_P (next-insn) (gdb) bt #0 0x0001004edc54 in dying_use_p (use=0x141720070) at ../../../src/gcc/gcc/haifa-sched.c:769 #1 0x0001004f055d in setup_insn_reg_pressure_info [inlined] () at /Users/astrange/Projects/src/gcc/gcc/haifa-sched.c:1130 #2 0x0001004f055d in ready_sort (ready=0x100b0b5e0) at ../../../src/gcc/gcc/haifa-sched.c:1502 #3 0x0001004f5e4b in schedule_block (target_bb=0x7fff5fbfe4e8) at ../../../src/gcc/gcc/haifa-sched.c:3203 #4 0x00010060c8bd in schedule_insns () at ../../../src/gcc/gcc/sched-rgn.c:3001 #5 0x00010060cd4f in rest_of_handle_sched () at ../../../src/gcc/gcc/sched-rgn.c:3512 #6 0x00010059cb3f in execute_one_pass (pass=0x100b99d40) at ../../../src/gcc/gcc/passes.c:1589 #7 0x00010059ce1d in execute_pass_list (pass=0x100b99d40) at ../../../src/gcc/gcc/passes.c:1644 #8 0x00010059ce2f in execute_pass_list (pass=0x100b98ec0) at ../../../src/gcc/gcc/passes.c:1645 #9 0x0001006cd1d0 in invoke_plugin_callbacks [inlined] () at /Users/astrange/Projects/src/gcc/gcc/plugin.h:413 #10 0x0001006cd1d0 in tree_rest_of_compilation (fndecl=0x14252f300) at ../../../src/gcc/gcc/tree-optimize.c:416 #11 0x000100898ef6 in cgraph_expand_function (node=0x14240cd20) at ../../../src/gcc/gcc/cgraphunit.c:1622 #12 0x00010089c07d in cgraph_expand_all_functions [inlined] () at /Users/astrange/Projects/src/gcc/gcc/cgraphunit.c:1701 #13 0x00010089c07d in cgraph_optimize () at ../../../src/gcc/gcc/cgraphunit.c:1957 #14 0x00010089c676 in cgraph_finalize_compilation_unit () at ../../../src/gcc/gcc/cgraphunit.c:1161 #15 0x0001f0f2 in c_write_global_declarations () at ../../../src/gcc/gcc/c-decl.c:9578 #16 0x0001006623c5 in do_compile () at ../../../src/gcc/gcc/toplev.c:1059 #17 0x000100662b1d in toplev_main (argc=32, argv=0x7fff5fbfe828) at ../../../src/gcc/gcc/toplev.c:2433 #18 0x00010f64 in start () -- Summary: segmentation 
fault with -g -fsched-pressure Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC host triplet: x86_64-apple-darwin10.3.1 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223
[Bug rtl-optimization/44223] segmentation fault with -g -fsched-pressure
--- Comment #1 from astrange at ithinksw dot com 2010-05-21 02:02 --- Created an attachment (id=20715) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20715&action=view) file -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223
[Bug target/44073] New: x86 constants could be unduplicated
void f1(int *a, int *b, int *c)
{
    int d = 0xE0E0E0E0;
    *a = *b = *c = d;
}

produces

_f1:
LFB0:
        movl    $-522133280, (%rdx)
        movl    $-522133280, (%rsi)
        movl    $-522133280, (%rdi)
        ret

on x86-64 at -Os. It would save instruction space and probably not be any slower to actually assign d to a register, but this is only done for 64-bit constants.

-- Summary: x86 constants could be unduplicated
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073
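As a small aid to reading the listing: the immediate $-522133280 is the same 32-bit pattern as 0xE0E0E0E0 interpreted as a signed int. The sketch below reuses the report's function and checks that equivalence; it assumes a 32-bit int and the wrapping conversion that GCC documents for out-of-range values.

```c
/* Same function as in the report. Converting 0xE0E0E0E0 to int is
 * implementation-defined in ISO C; GCC wraps it modulo 2^32. */
void f1(int *a, int *b, int *c)
{
    int d = 0xE0E0E0E0;
    *a = *b = *c = d;
}
```

Each of the three stores in the listing carries its own copy of the 4-byte immediate, which is the duplication the report is about.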
[Bug target/44073] x86 constants could be unduplicated
--- Comment #3 from astrange at ithinksw dot com 2010-05-11 10:36 --- It's propagated by vrp1, and then nothing removes it again. tree-uncprop doesn't change it - it looks like it doesn't have anything to handle this, actually. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073
[Bug lto/44090] New: lto ice in verify_stmts
/usr/local/gcc46/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.3.1
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46 --with-arch=native --with-tune=native --disable-nls --enable-lto --disable-bootstrap LDFLAGS=-L/sw/lib CPPFLAGS=-I/sw/include --enable-languages=c,c++,objc,obj-c++,lto
Thread model: posix
gcc version 4.6.0 20100511 (experimental) (GCC)

The attached files have two different definitions of MpegEncContext. -flto with checking gives an ice on it instead of a readable warning/error:

/usr/local/gcc46/bin/gcc -O3 -flto -c h263dec.i
/usr/local/gcc46/bin/gcc -O3 -flto -c ituh263dec.i
echo h263dec.o ituh263dec.o > test
/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto1 -O3 @test
Reading object files: h263dec.o ituh263dec.o
Reading the callgraph
Merging declarations
Reading summaries
Reading function bodies: ff_h263_decode_mb ff_h263_decode_init
Performing interprocedural optimizations whole-program
In function 'ff_h263_decode_init':
lto1: error: type mismatch in address expression
<unnamed-signed:32> (*T4a5) (struct MpegEncContext *, <unnamed-signed:16>[64] *)
<unnamed-signed:32> T4ac (struct MpegEncContext *, <unnamed-signed:16>[64] *)
# .MEM_5 = VDEF <.MEM_4(D)>
s_3->decode_mb = ff_h263_decode_mb;
lto1: internal compiler error: verify_stmts failed
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

It looks obviously invalid here, but building ffmpeg with -O3 -flto gives the same ice, and I can't see any bugs that would cause that. It's hard to debug it, though, since it doesn't print the origin files of the mismatched definitions or anything.
The original, completely unreduced version:

svn co -r23100 svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg
cd ffmpeg
./configure --cc=/usr/local/gcc46/bin/gcc --extra-cflags="-flto -O3" --extra-ldflags="-flto -O3" --enable-shared; make
... lots of lto "type of ... does not match original declaration" warnings that all seem to be wrong ...
s_4->decode_mb = ff_h263_decode_mb;
lto1: internal compiler error: verify_stmts failed

-- Summary: lto ice in verify_stmts
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-apple-darwin10.3.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090
[Bug lto/44090] lto ice in verify_stmts
--- Comment #1 from astrange at ithinksw dot com 2010-05-12 05:27 --- Created an attachment (id=20638) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20638&action=view) test file 1 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090
[Bug lto/44090] lto ice in verify_stmts
--- Comment #2 from astrange at ithinksw dot com 2010-05-12 05:27 --- Created an attachment (id=20639) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20639&action=view) test file 2 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090
[Bug tree-optimization/44063] [4.6 Regression]: build broken for libgcc cris-elf, ICE in cgraph_estimate_size_after_inlining, at ipa-inline
--- Comment #2 from astrange at ithinksw dot com 2010-05-11 03:38 --- Created an attachment (id=20623) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20623&action=view) testcase This happens building ffmpeg on x86-64 now. Minimal-ish testcase attached. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44063
[Bug target/43766] New: x86 prefetch doesn't use complex memory addressing
Source:

void p(int *a, int i) { __builtin_prefetch(&a[i]); }

gcc -O3 -fomit-frame-pointer -S prefetch.c

_p:
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rax
        prefetcht0 (%rax)
        ret

leaq and prefetch should be merged, so that the prefetch uses the (%rdi,%rsi,4) address directly.

-- Summary: x86 prefetch doesn't use complex memory addressing
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766
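For context, a sketch of the kind of loop where an indexed prefetch like this typically shows up. The function name `sum_with_prefetch` and the look-ahead distance `PREFETCH_AHEAD` are illustrative choices, not taken from the report; `__builtin_prefetch` is the GCC builtin used in the report's testcase.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements; not from the report. */
#define PREFETCH_AHEAD 8

/* Sums an array while hinting the cache about an element a few
 * iterations ahead. The bounds check keeps the address expression
 * inside the array. */
long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);
        sum += a[i];
    }
    return sum;
}
```

Because the prefetch address is base plus scaled index, a separate leaq per iteration costs an extra instruction and a register, which is why folding it into the prefetch operand matters in loops like this.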
[Bug target/43766] x86 prefetch doesn't use complex memory addressing
--- Comment #3 from astrange at ithinksw dot com 2010-04-16 21:19 ---
Works with x86-64. Checking -m32, the same thing happens with or without the patch:

_p:
        subl    $12, %esp
        movl    20(%esp), %eax
        sall    $2, %eax
        addl    16(%esp), %eax
        addl    $12, %esp
        prefetcht0 (%eax)
        ret

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766
[Bug rtl-optimization/43721] Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call
--- Comment #1 from astrange at ithinksw dot com 2010-04-12 03:54 ---
Still the case with 4.5.

arm-none-linux-gnueabi-gcc -Os -S divmod.c
cat divmod.s
        .cpu arm10tdmi
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 4
        .eabi_attribute 18, 4
        .file   "divmod.c"
        .global __aeabi_idivmod
        .global __aeabi_idiv
        .text
        .align  2
        .global divmod
        .type   divmod, %function
divmod:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        stmfd   sp!, {r4, r5, r6, lr}
        mov     r6, r0
        mov     r5, r1
        bl      __aeabi_idivmod
        mov     r0, r6
        mov     r4, r1
        mov     r1, r5
        bl      __aeabi_idiv
        add     r0, r4, r0
        ldmfd   sp!, {r4, r5, r6, pc}
        .size   divmod, .-divmod
        .ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
        .section        .note.GNU-stack,"",%progbits

-- astrange at ithinksw dot com changed:

           What    |Removed                     |Added
                 CC|                            |astrange at ithinksw dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721
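The testcase source is not quoted in this comment. Judging by the listing (one call for the remainder, one for the quotient, then an add), it is presumably something like the first function below; that reconstruction is an assumption. The second function shows the standard C `div()` call, which returns quotient and remainder together and is one way to express the combined operation in source.

```c
#include <stdlib.h>

/* Assumed shape of the testcase: quotient and remainder of the same
 * operands, combined with an add as in the listing. */
int divmod(int a, int b)
{
    return a / b + a % b;
}

/* Same result via the standard library's div(), which computes both
 * values in one call. The name divmod_lib is hypothetical. */
int divmod_lib(int a, int b)
{
    div_t r = div(a, b);
    return r.quot + r.rem;
}
```

For 7 and 2 both return 3 + 1 = 4. On a target with a combined divide/modulo helper, a single `__aeabi_idivmod` call would be enough for either form.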
[Bug target/43723] New: Some ARMs support unaligned
Source:

struct s { int i; } __attribute__((packed));
int a(struct s *s) { return s->i; }

Using 4.5:

/usr/local/gcc-arm/bin/arm-none-linux-gnueabi-gcc -Os -mcpu=cortex-a8 -S unaligned.c
cat unaligned.s
        .cpu cortex-a8
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 4
        .eabi_attribute 18, 4
        .file   "unaligned.c"
        .text
        .align  2
        .global a
        .type   a, %function
a:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldrb    r2, [r0, #1]    @ zero_extendqisi2
        ldrb    r3, [r0, #0]    @ zero_extendqisi2
        orr     r3, r3, r2, asl #8
        ldrb    r2, [r0, #2]    @ zero_extendqisi2
        ldrb    r0, [r0, #3]    @ zero_extendqisi2
        orr     r3, r3, r2, asl #16
        orr     r0, r3, r0, asl #24
        bx      lr
        .size   a, .-a
        .ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
        .section        .note.GNU-stack,"",%progbits

At least some configurations of cortex-a8 support unaligned access just fine, so it should be possible to use a single word load here. But it doesn't look like it is - there is no -mno-strict-align for arm. This would be a major code size reduction for FFmpeg.

-- Summary: Some ARMs support unaligned
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-unknown-linux-gnueabi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43723
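A sketch of the two common ways to express an unaligned 32-bit load in C: the packed struct from the report, and a `memcpy` into a local. The helper name `load_unaligned` is made up for this example. Both leave it to the compiler to pick byte loads or a single word load, depending on what it believes the target supports.

```c
#include <string.h>

/* Packed struct from the report: the int member has alignment 1. */
struct s { int i; } __attribute__((packed));

int a(struct s *s)
{
    return s->i;
}

/* Hypothetical alternative: copy the bytes into an aligned local. */
int load_unaligned(const void *p)
{
    int v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Either form should return the same value for the same address; the difference the report cares about is only in how many instructions the load expands to.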
[Bug target/43550] New: arm missing rev16
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
uint16_t s16(uint16_t v) { return v>>8|v<<8; }
uint32_t s32(uint32_t v) { return __builtin_bswap32(v); }

gcc -O3 -mcpu=cortex-a8 -S bswap.c

s16:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov     r3, r0, lsr #8
        orr     r0, r3, r0, asl #8
        uxth    r0, r0
        bx      lr
s32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        rev     r0, r0
        bx      lr

It generates 32-bit bswap using rev but not 16-bit using rev16. x86 can do both.

-- Summary: arm missing rev16
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43550
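A sketch of the byte swaps involved, with a quick equivalence check in mind. `s16` and `s32` follow the report; `s16_builtin` is an added, hypothetical variant using `__builtin_bswap16`, which newer GCC and Clang releases provide but which may not have been available in the compiler version discussed here.

```c
#include <stdint.h>

/* 16-bit swap written with shifts, as in the report. */
uint16_t s16(uint16_t v)
{
    return (uint16_t)(v >> 8 | v << 8);
}

/* Hypothetical variant using the builtin (assumes a compiler that has it). */
uint16_t s16_builtin(uint16_t v)
{
    return __builtin_bswap16(v);
}

/* 32-bit swap via the builtin, as in the report. */
uint32_t s32(uint32_t v)
{
    return __builtin_bswap32(v);
}
```

For example, 0x1234 swaps to 0x3412 and 0x12345678 swaps to 0x78563412. Ideally both 16-bit forms would compile to the same single instruction on a target that has one.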
[Bug lto/43373] whopr+linker plugin ICE compressed stream data error
--- Comment #2 from astrange at ithinksw dot com 2010-03-15 11:10 --- The last two commands were the source and testcase. Should have spaced it out more. I don't have enough memory allocated to this VM to build ffmpeg without whopr, so I thought I'd try the more experimental path first. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373
[Bug lto/43342] lto1: internal compiler error: failed to reclaim unneeded function
--- Comment #4 from astrange at ithinksw dot com 2010-03-14 23:33 --- This happens building ffmpeg --enable-shared with -fwhopr. I can make a testcase out of that if needed. -- astrange at ithinksw dot com changed: What|Removed |Added CC||astrange at ithinksw dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43342
[Bug lto/43372] New: lto ICE in strip_extension with linker plugin
Source:

a.c:
int a() { return 0; }

b.c:
extern int a();
int b() { a(); }

gcc -fwhopr -c a.c b.c
ar r liba.a a.o
gcc -fwhopr -fuse-linker-plugin -shared -o libb.so b.o liba.a
lto1: internal compiler error: in strip_extension, at lto/lto.c:910
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status
/usr/bin/ld: fatal error: lto-wrapper failed
collect2: ld returned 1 exit status

It fails trying to strip .o from liba.a. (I added an extra line to print that, so the ICE line number is off by 1.)

Using gcc 20100314 and gold from Ubuntu binutils-gold 2.20-0ubuntu2.

-- Summary: lto ICE in strip_extension with linker plugin
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43372
[Bug lto/43373] New: whopr+linker plugin ICE compressed stream data error
gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc45/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper Target: i686-pc-linux-gnu Configured with: ../gcc/configure --with-arch=native --with-tune=native --disable-bootstrap --with-mpc=/usr/local --enable-languages=c,c++,objc,lto --enable-gold --enable-lto --prefix=/usr/local/gcc45 Thread model: posix gcc version 4.5.0 20100314 (experimental) (GCC) ld --version GNU gold (GNU Binutils for Ubuntu 2.20) 1.9 Copyright 2008 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or (at your option) a later version. This program has absolutely no warranty. cat a.c int main(void) {return 0;} gcc -fwhopr -fuse-linker-plugin -o a a.c -save-temps lto1: internal compiler error: compressed stream: data error Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. lto1: fatal error: /usr/local/gcc45/bin/gcc terminated with status 256 compilation terminated. lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status /usr/bin/ld: fatal error: lto-wrapper failed collect2: ld returned 1 exit status Works without -fuse-linker-plugin. This prevents ffmpeg and x264 from configuring for me if I put -fwhopr -fuse-linker-plugin in the CFLAGS/LDFLAGS. -- Summary: whopr+linker plugin ICE compressed stream data error Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373
[Bug lto/43318] New: LTO ICE with minimal C++ program
Using svn r157325 on Ubuntu. /usr/local/gcc45/bin/g++ -v Using built-in specs. COLLECT_GCC=/usr/local/gcc45/bin/g++ COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper Target: i686-pc-linux-gnu Configured with: ../gcc/configure --enable-threads=posix --with-arch=native --with-tune=native --disable-nls --disable-bootstrap --prefix=/usr/local/gcc45 --with-mpc=/usr/local --enable-languages=c,c++,objc,lto --enable-lto --enable-gold Thread model: posix gcc version 4.5.0 20100309 (experimental) (GCC) Source: void a() { } /usr/local/gcc45/bin/g++ -flto -c a.cpp /usr/local/gcc45/bin/g++ -flto -O -r -nostdlib a.o a/0(-1) @0xb769b398 availability:available needed reachable body externally_visible finalized called by: calls: callgraph: a/0(-1) @0xb769b398 availability:available needed reachable body externally_visible finalized called by: calls: lto1: internal compiler error: in propagate, at ipa-reference.c:1244 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status collect2: lto-wrapper returned 1 exit status /usr/local/gcc45/bin/g++ -flto -O -fno-ipa-reference -r -nostdlib a.o lto1: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status collect2: lto-wrapper returned 1 exit status -- Summary: LTO ICE with minimal C++ program Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC host triplet: i686-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318
[Bug lto/43318] LTO ICE with minimal C++ program
--- Comment #1 from astrange at ithinksw dot com 2010-03-10 00:32 --- Actually, it doesn't work in C either. I find that unlikely, time to make sure I didn't build it wrong somehow... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318
[Bug lto/43318] LTO ICE with minimal C++ program
--- Comment #3 from astrange at ithinksw dot com 2010-03-10 00:37 --- *** This bug has been marked as a duplicate of 42402 *** -- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318
[Bug lto/42402] ICE in propagate, at ipa-reference.c:1244
--- Comment #2 from astrange at ithinksw dot com 2010-03-10 00:37 --- *** Bug 43318 has been marked as a duplicate of this bug. *** -- astrange at ithinksw dot com changed: What|Removed |Added CC||astrange at ithinksw dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42402
[Bug target/43233] New: x86 flags not combined across blocks
Source:

int g1,g2,g3;
int f1(int a, int b) { a &= 1; if (a) return g1; return g2; }
int f2(int a, int b) { a &= 1; if (b) g3++; if (a) return g1; return g2; }

Compiled with: gcc -O3 -fomit-frame-pointer -S and_flags.c

f1 is ok but f2 generates this:

_f2:
    andl    $1, %edi                      -- #1
    testl   %esi, %esi
    je      L7
    movq    _...@gotpcrel(%rip), %rax
    incl    (%rax)
L7:
    testl   %edi, %edi                    -- #2
    jne     L10
    movq    _...@gotpcrel(%rip), %rax
    movl    (%rax), %eax
    ret
    .align 4,0x90
L10:
    movq    _...@gotpcrel(%rip), %rax
    movl    (%rax), %eax
    ret

The andl (#1) and testl (#2) should be folded into one andl. Code is reduced from ffmpeg h264 decoder. It's easy to work around by reordering source lines, so not too important.

--
Summary: x86 flags not combined across blocks
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43233
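A sketch of the reordering workaround mentioned in the report. The name f2_reordered is hypothetical and this is not the ffmpeg code; the only idea shown is moving the mask of a next to the branch that tests it, so the flags set by the and can feed the jump directly. Whether a particular GCC version then emits a single andl has not been checked here.

```c
int g1, g2, g3;

/* Shape from the report: the mask is computed early and its result is
   tested again after an unrelated branch. */
int f2(int a, int b)
{
    a &= 1;
    if (b)
        g3++;
    if (a)
        return g1;
    return g2;
}

/* Hypothetical reordering: same behaviour, but the mask sits directly
   in front of the branch that consumes it. */
int f2_reordered(int a, int b)
{
    if (b)
        g3++;
    if (a & 1)
        return g1;
    return g2;
}
```

Both functions return the same value and have the same side effect on g3 for any inputs, so the change is purely about instruction placement.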
[Bug tree-optimization/43224] New: Constant load not raised out of loop
Source:

#include <string.h>

void dequant_lsps(double *lsps, int num, const unsigned short *values, int n_stages,
                  const unsigned char * __restrict table,
                  const double * __restrict mul_q,
                  const double * __restrict base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    int m;

    memset(lsps, 0, num * sizeof(*lsps));
    for (m = 0; m < num; m++)
        lsps[m] += base_q[0] + mul_q[0] * t_off[m];
}

/usr/local/gcc45/bin/gcc -O3 -S base_lsp.c

The inner loop:

L3:
    movzbl   (%r15), %edx
    incq     %r15
    cvtsi2sd %edx, %xmm0
    mulsd    0(%r13), %xmm0       <- constant (and 0 prefix)
    addsd    (%r14), %xmm0        <- constant
    addsd    (%rbx,%rax), %xmm0
    movsd    %xmm0, (%rbx,%rax)
    addq     $8, %rax
    cmpq     %rcx, %rax
    jne      L3

Rest of the output attached. base_q[0] and mul_q[0] should be loaded outside of the loop but aren't. I added __restrict to base_q/mul_q/table, but it had no effect. Code is reduced from FFmpeg WMA Voice decoder.

--
Summary: Constant load not raised out of loop
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-apple-darwin10.2.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224
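For comparison, the hoisting can be done by hand in the source. This is only a sketch: dequant_lsps_hoisted, base and mul are hypothetical names, and it assumes lsps never overlaps base_q or mul_q, which is exactly the property the compiler could not prove on its own.

```c
#include <string.h>

/* Variant of the reported function with the two loop-invariant loads
   copied into locals before the loop. Assumes lsps does not overlap
   base_q or mul_q. */
void dequant_lsps_hoisted(double *lsps, int num,
                          const unsigned short *values, int n_stages,
                          const unsigned char *table,
                          const double *mul_q, const double *base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    const double base = base_q[0];
    const double mul = mul_q[0];
    int m;

    (void)n_stages; /* unused in the reduced testcase */

    memset(lsps, 0, num * sizeof(*lsps));
    for (m = 0; m < num; m++)
        lsps[m] += base + mul * t_off[m];
}
```

With the values held in locals the loop body no longer contains any load that a store to lsps[m] could invalidate.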
[Bug tree-optimization/43224] Constant load not raised out of loop
--- Comment #1 from astrange at ithinksw dot com 2010-03-02 03:45 --- Created an attachment (id=20002) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20002&action=view) x86-64 asm output -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224
[Bug tree-optimization/43224] Constant load not raised out of loop
--- Comment #4 from astrange at ithinksw dot com 2010-03-02 04:00 --- Is it possible for aliased writes to affect a const pointer? I was assuming that it wasn't. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224
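For reference, in C the answer to that question is yes: const on the pointee only forbids writing through that particular pointer, it does not promise that the object never changes. A minimal sketch (read_around_store is a hypothetical name) that makes the effect observable:

```c
/* Reads through a pointer-to-const on both sides of a store through a
   plain pointer. If p and q point at the same int, the second read
   sees the new value even though p is "const int *". */
int read_around_store(const int *p, int *q)
{
    int before = *p;
    *q = before + 1;
    return *p - before; /* 1 if p and q alias, 0 otherwise */
}
```

Because the aliased call is legal, the compiler has to reload *p after the store unless something else (such as restrict, or its own alias analysis) rules the overlap out.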
[Bug target/43225] New: Structure copies not vectorized
Source:

#include <emmintrin.h>

struct a1 { char l[16]; };
struct a2 { __m128i l; };

void f1(struct a1 *a, struct a1 *b) { *a = *b; }
void f2(struct a2 *a, struct a2 *b) { *a = *b; }

/usr/local/gcc45/bin/gcc -O3 -fomit-frame-pointer -S copy_gcc.c

_f1:
    movq    (%rsi), %rax
    movq    %rax, (%rdi)
    movq    8(%rsi), %rax
    movq    %rax, 8(%rdi)
    ret
_f2:
    movdqa  (%rsi), %xmm0
    movdqa  %xmm0, (%rdi)
    ret

Both are appropriately aligned and should use movdqa. This might not show up in generic code, but I could have used it in an ffmpeg optimization.

--
Summary: Structure copies not vectorized
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC host triplet: x86_64-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225
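Until the compiler does this by itself, the 16-byte copy can be written explicitly. The following is a sketch under assumptions, not the ffmpeg code: copy_a1 is a hypothetical name, the struct is given an explicit aligned(16) attribute (the plain struct a1 above does not carry one), and the SSE2 path is guarded so the same source still builds on other targets.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* 16 bytes of char, explicitly aligned so an aligned vector access
   is valid (assumption added for this sketch). */
struct a1 { char l[16]; } __attribute__((aligned(16)));

void copy_a1(struct a1 *a, const struct a1 *b)
{
#if defined(__SSE2__)
    /* One aligned 128-bit load and one aligned 128-bit store. */
    _mm_store_si128((__m128i *)a, _mm_load_si128((const __m128i *)b));
#else
    /* Portable fallback for targets without SSE2. */
    memcpy(a, b, sizeof(*a));
#endif
}
```

_mm_load_si128 and _mm_store_si128 require 16-byte aligned addresses, which is why the alignment attribute matters here.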
[Bug target/43225] Structure copies not vectorized
--- Comment #2 from astrange at ithinksw dot com 2010-03-02 05:31 ---

-fdump-tree-slp-details:

copy_gcc.c:8: note: ===vect_slp_analyze_bb===
copy_gcc.c:8: note: === vect_analyze_data_refs ===
Creating dr for *b_2(D)
analyze_innermost: success.
    base_address: b_2(D)
    offset from base address: 0
    constant offset from base address: 0
    step: 0
    aligned to: 128
    base_object: *b_2(D)
Creating dr for *a_1(D)
analyze_innermost: success.
    base_address: a_1(D)
    offset from base address: 0
    constant offset from base address: 0
    step: 0
    aligned to: 128
    base_object: *a_1(D)
copy_gcc.c:8: note: not vectorized: no vectype for stmt: *a_1(D) = *b_2(D); scalar_type: struct a1
copy_gcc.c:8: note: not vectorized: unhandled data-ref in basic block.

f1 (struct a1 * a, struct a1 * b)
{
<bb 2>:
  *a_1(D) = *b_2(D);
  return;
}

Though I tried it with __attribute__((aligned)) too.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225
[Bug tree-optimization/42211] New: Segmentation fault with graphite -floop-interchange
gcc -v Using built-in specs. COLLECT_GCC=/usr/local/gcc45/bin/gcc COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/x86_64-apple-darwin10.2.0/4.5.0/lto-wrapper Target: x86_64-apple-darwin10.2.0 Configured with: ../gcc/configure --prefix=/usr/local/gcc45 --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --with-ppl=/sw --with-cloog=/sw --with-libelf=/sw --disable-nls --disable-bootstrap LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,lto,objc,obj-c++ Thread model: posix gcc version 4.5.0 20091129 (experimental) (GCC) Using r154734. With attached source: gcc -O3 -floop-interchange -S graphite_crash.i graphite_crash.i: In function 'border_mirror_480': graphite_crash.i:17:6: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. It doesn't happen reliably to me with -v -Q, so I can't check with gdb. Valgrind gives: ==12758== Invalid read of size 8 ==12758==at 0x1004AE4A3: lst_do_interchange_1 (graphite-interchange.c:709) ==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730) ==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734) ==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748) ==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260) ==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276) ==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300) ==12758==by 0x10057D522: execute_one_pass (passes.c:1522) ==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) ==12758==by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408) ==12758== Address 0x141c25210 is 16 bytes inside a block of size 24 free'd ==12758==at 0x140EB88DC: free (vg_replace_malloc.c:325) ==12758==by 0x1004AE00C: lst_try_interchange 
(graphite-poly.h:704) ==12758==by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710) ==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730) ==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734) ==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748) ==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260) ==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276) ==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300) ==12758==by 0x10057D522: execute_one_pass (passes.c:1522) ==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) ==12758== ==12758== Invalid read of size 8 ==12758==at 0x1004AE534: lst_do_interchange (graphite-interchange.c:732) ==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734) ==12758==by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748) ==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260) ==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276) ==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300) ==12758==by 0x10057D522: execute_one_pass (passes.c:1522) ==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) ==12758==by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408) ==12758==by 0x100866F56: cgraph_expand_function (cgraphunit.c:1178) ==12758== Address 0x141c25210 is 16 bytes inside a block of size 24 free'd ==12758==at 0x140EB88DC: free (vg_replace_malloc.c:325) ==12758==by 0x1004AE00C: lst_try_interchange (graphite-poly.h:704) ==12758==by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710) ==12758==by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730) ==12758==by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734) ==12758==by 0x1004AE5CA: 
scop_do_interchange (graphite-interchange.c:748) ==12758==by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260) ==12758==by 0x1004A01A1: graphite_transform_loops (graphite.c:276) ==12758==by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300) ==12758==by 0x10057D522: execute_one_pass (passes.c:1522) ==12758==by 0x10057D7CC: execute_pass_list (passes.c:1577) ==12758==by 0x10057D7DE: execute_pass_list (passes.c:1578) -- Summary: Segmentation fault with graphite -floop-interchange Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC build triplet: x86_64-apple
[Bug tree-optimization/42211] Segmentation fault with graphite -floop-interchange
--- Comment #1 from astrange at ithinksw dot com 2009-11-29 09:38 --- Created an attachment (id=19175) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19175&action=view) somewhat-reduced source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42211
[Bug c/42136] New: Inconsistent strict-aliasing warning with cast from char[]
Source:

typedef union u { unsigned i; unsigned short s[2]; unsigned char c[4]; } u;

char c[4] __attribute__((aligned));
short s[2] __attribute__((aligned));

int f1() { return ((union u*)s)->i; }
int f2() { return ((union u*)c)->i; }

Using gcc 4.5:

gcc -O3 -fstrict-aliasing -Wall -S wstrict_aliasing_char.c
wstrict_aliasing_char.c: In function 'f2':
wstrict_aliasing_char.c:13:17: warning: dereferencing type-punned pointer will break strict-aliasing rules

I would expect either both or neither of the functions to warn, since pointer casting to unions is given in the manual as something that violates strict-aliasing, although gcc doesn't seem to actually take advantage of this. Instead, it looks like the warning is hardcoded to apply to a cast from char (c-common.c:1746 in r1554411):

alias_set_type set1 = get_alias_set (TREE_TYPE (TREE_OPERAND (expr, 0)));
alias_set_type set2 = get_alias_set (TREE_TYPE (type));

if (set1 != set2 && set2 != 0
    && (set1 == 0 || !alias_sets_conflict_p (set1, set2)))
  {
    warning (OPT_Wstrict_aliasing, "dereferencing type-punned "
             "pointer will break strict-aliasing rules");
    return true;
  }

This came up during some x264 work, but it's taken care of now with some __attribute__((may_alias)).

--
Summary: Inconsistent strict-aliasing warning with cast from char[]
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42136
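The report does not show how may_alias was applied in x264, so the following is only an illustrative sketch with hypothetical names (unsigned_alias, read_may_alias, read_memcpy). It shows two ways to read an unsigned out of a char array without relying on the union cast: a may_alias typedef, which is a GCC extension, and memcpy, which is portable C. It assumes sizeof(unsigned) is 4.

```c
#include <string.h>

/* GCC extension: accesses through this type are treated like char
   accesses by alias analysis. */
typedef unsigned __attribute__((may_alias)) unsigned_alias;

char c[4] __attribute__((aligned));

/* Type-punned read via the may_alias typedef. */
unsigned read_may_alias(void)
{
    return *(const unsigned_alias *)c;
}

/* Portable alternative: copy the bytes into a properly typed object. */
unsigned read_memcpy(void)
{
    unsigned v;
    memcpy(&v, c, sizeof(v));
    return v;
}
```

Both functions read the same four bytes, so they return the same value whatever the byte order of the target.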
[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries
--- Comment #8 from astrange at ithinksw dot com 2009-11-07 09:03 --- Closing. -- astrange at ithinksw dot com changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646
[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries
--- Comment #7 from astrange at ithinksw dot com 2009-10-20 21:10 --- Tried with SVN today and it's fixed: L6: incb (%ebx) jmp L12 .align 4,0x90 Close if you want; I don't think it's worth finding when this happened. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646
[Bug inline-asm/11203] source doesn't compile with -O0 but they compile with -O3
--- Comment #40 from astrange at ithinksw dot com 2009-10-18 19:56 --- Linked from http://x264dev.multimedia.cx/?p=185, I'd forgotten all about the ridiculous flamewar in this one. Just as a note, the actual definitions of the four variables (from liba52): x2k = x + 2 * k; x3k = x2k + 2 * k; x4k = x3k + 2 * k; wB = wTB + 2 * k; Also, the asm inputs are silly - output 0 is the same as input 6 for no reason, and the same with output 2 and input 7. So change those to "+m" and change %6/%7 to %0/%2. That doesn't actually change anything, even though it should free two registers. It works with gcc 4.5 -O0 -fno-pic -fomit-frame-pointer, but not without one of those flags. Looks like that's because it's allocating 2 more registers for the unused fake inputs for the "+m" - change it to "=m" and it works with one flag removed, but still not both. So there's a specific bug. And of course it all works at -O1 because it doesn't have to use registers there. So maybe it should just do that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11203
[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
--- Comment #3 from astrange at ithinksw dot com 2009-08-08 16:44 --- Maybe the C version will be usable after everyone is using 4.4+, earlier versions tend to make a mess. Anyway, counting newlines for size estimation wouldn't pessimize anything. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
--- Comment #5 from astrange at ithinksw dot com 2009-08-07 03:04 --- Fixed with -O3 -fgraphite-identity. Why did I even bother checking that? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/40992] New: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
The attached file is a loop over the same function implemented in C and inline asm. When compiled with: gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S cabac_unroll.i cunroll thinks they're different sizes: size: 55-4, last_iteration: 55-4 Loop size: 55 Estimated size after unrolling: 442 size: 8-4, last_iteration: 8-4 Loop size: 8 Estimated size after unrolling: 34 and expands the asm loop all 13 times. This is reduced from ffmpeg decode_cabac_residual, where it apparently causes significant decoding slowdown. Besides that, cunroll seems to be hurting ffmpeg in general on x86-32 (http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), maybe we'll turn it down some. -- Summary: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992
[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
--- Comment #1 from astrange at ithinksw dot com 2009-08-07 04:25 --- Created an attachment (id=18315) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18315&action=view) the source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992
[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os
--- Comment #4 from astrange at ithinksw dot com 2009-06-05 04:31 ---

This bug must have been weaker than I remembered it; when I used 4 char fields instead of one char[4], 4.4 behaved properly too.

How about:

Alexander Strange <astra...@ithinksw.com>

    PR tree-optimization/36318
    * gcc.dg/tree-ssa/sra-7.c: New test.

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-sra-details" } */

typedef struct {char f[4];} __attribute__((aligned (4))) s;

void a(s *s1, s *s2)
{
    *s1 = *s2;
}

/* Struct copies should not be split into members */
/* { dg-final { scan-tree-dump "= \\\*s2" "sra"} } */
/* { dg-final { cleanup-tree-dump "sra" } } */

I checked sra instead of esra since it runs last and this is a negative test. Hopefully this is trivial?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318
[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os
--- Comment #2 from astrange at ithinksw dot com 2009-05-30 00:19 ---

Fixed with new SRA:

_foo1:
    subl    $12, %esp
    movl    20(%esp), %eax
    movl    (%eax), %edx
    movl    16(%esp), %eax
    movl    %edx, (%eax)
    addl    $12, %esp
    ret

-- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318
[Bug c/2803] casts in asm act as lvalues
--- Comment #12 from astrange at ithinksw dot com 2009-05-25 20:26 ---

I noticed this is still accepted by gcc 4.5; one stuck into ffmpeg and broke the build with another compiler. For instance, this only fails in c():

int as(int a) { asm ("" : : "m"((int)a)); }

int c(int a) { return *&((int)a); }

/usr/local/gcc45/bin/gcc -S test.c
test.c: In function 'c':
test.c:8: error: lvalue required as unary '&' operand

-- astrange at ithinksw dot com changed: What|Removed |Added CC||astrange at ithinksw dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=2803
[Bug target/39337] New: x86 use of VLA disables -fomit-frame-pointer
Using gcc 4.4.0 20090226 with -Os on:

int f(int a)
{
    if (!a) {
        return 0;
    } else {
        volatile int vla[a];
        vla[0] = 0;
        return vla[0];
    }
}

gives:

_f:
    pushl   %ebp
    xorl    %eax, %eax
    movl    %esp, %ebp
    subl    $8, %esp
    movl    8(%ebp), %edx
    testl   %edx, %edx
    je      L3
    movl    %esp, %ecx
    leal    30(,%edx,4), %eax
    andl    $-16, %eax
    subl    %eax, %esp
    leal    15(%esp), %eax
    andl    $-16, %eax
    movl    $0, (%eax)
    movl    (%eax), %eax
    movl    %ecx, %esp
L3:
    leave
    ret

Adding -fomit-frame-pointer gives the exact same result. ebp shouldn't be saved here, since esp is saved to and restored from ecx anyway, so it's not actually used for anything.

This isn't just a problem for crazy asm (gcc errors if an asm clobbers ebp in a function with VLAs); it also means that inlining a function with VLAs makes generated code worse, since the entire function loses one register.

--
Summary: x86 use of VLA disables -fomit-frame-pointer
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC build triplet: i?86-*-*
GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337
[Bug target/39337] x86 use of VLA disables -fomit-frame-pointer
--- Comment #3 from astrange at ithinksw dot com 2009-03-02 02:39 ---

> This is correct, vla and alloca always uses a frame pointer because there is no way to get back to the original offsets so the compiler needs a frame pointer.

It's not restoring from the frame pointer here, it's restoring from ecx. 'addl $8, %esp' would work just as well in the function epilogue, like it would if this function had no VLA.

Disabling inlining does fix that problem, though.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337
[Bug target/39329] New: x86 -Os could use mulw for (uint16 * uint16)>>16
Using 'gcc -Os -fomit-frame-pointer -march=core2 -mtune=core2' for

unsigned short mul_high_c(unsigned short a, unsigned short b)
{
    return (unsigned)(a * b) >> 16;
}

unsigned short mul_high_asm(unsigned short a, unsigned short b)
{
    unsigned short res;
    asm("mulw %w2" : "=d"(res), "+a"(a) : "rm"(b));
    return res;
}

I get

_mul_high_c:
    subl    $12, %esp
    movzwl  20(%esp), %eax
    movzwl  16(%esp), %edx
    addl    $12, %esp
    imull   %edx, %eax
    shrl    $16, %eax
    ret
_mul_high_asm:
    subl    $12, %esp
    movl    16(%esp), %eax
    mulw    20(%esp)
    addl    $12, %esp
    movl    %edx, %eax
    ret

mulw puts its outputs in dx:ax, and dx contains (dx:ax)>>16, so the shift is avoided.

Ignoring the weird Darwin stack adjustment code, the version with mulw is somewhat shorter and avoids a movzwl. I'm not sure what the performance difference is; mulw is listed in Agner's tables as fairly low latency, but requires a length changing prefix for memory.

This type of operation is useful in fixed-point math, such as embedded audio codecs or arithmetic coders.

--
Summary: x86 -Os could use mulw for (uint16 * uint16)>>16
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC build triplet: i?86-*-*
GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39329
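As a portable reference for what both versions are meant to compute, here is a small sketch of the high half of a 16x16 multiply. mul_high is a hypothetical name and the code assumes unsigned is at least 32 bits wide. The operand is converted to unsigned before the multiply, because a plain a * b is done in int after promotion and can overflow for large inputs when int is 32 bits.

```c
/* High 16 bits of the 32-bit product of two 16-bit values.
   Converting a to unsigned first keeps the multiply in unsigned
   arithmetic, so 65535 * 65535 cannot overflow a signed int. */
unsigned short mul_high(unsigned short a, unsigned short b)
{
    return (unsigned short)(((unsigned)a * b) >> 16);
}
```

For example, 256 * 256 is 0x00010000, whose high half is 1, and 65535 * 65535 is 0xFFFE0001, whose high half is 0xFFFE.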
[Bug target/39123] New: x86 asm *(a+b) input causes out of registers above -O0
Using gcc version 4.4.0 20090207 (experimental) (GCC) /usr/local/gcc44/bin/gcc -O0 -fno-pic -fomit-frame-pointer -S cabac-ret.i /usr/local/gcc44/bin/gcc -O1 -fno-pic -fomit-frame-pointer -S cabac-ret.i cabac-ret.i: In function 'get_cabac_minput': cabac-ret.i:24: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm' cabac-ret.i:24: error: 'asm' operand has impossible constraints This is an asm using 7 registers; above -O0 one of the inputs in the second version is combined into a complex memory operand, which uses 8 registers in one statement and fails to compile. It would be nice if it could fall back to a separate add for x86-32, since the memory clobber in the first version might cause suboptimal code. -- Summary: x86 asm *(a+b) input causes out of registers above -O0 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC build triplet: i?86-*-* GCC host triplet: i?86-*-* GCC target triplet: i?86-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123
[Bug target/39123] x86 asm *(a+b) input causes out of registers above -O0
--- Comment #1 from astrange at ithinksw dot com 2009-02-07 06:13 --- Created an attachment (id=17265) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17265&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123
[Bug target/32593] Missed optimization of 'y = constant - x' operation
--- Comment #4 from astrange at ithinksw dot com 2008-12-17 22:10 ---

Causes silly code on i386 with this:

void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride){
    int i;
    for(i=0; i<8; i++){
        int j;
        for (j=0; j<8; j++){
            pix[j] = pix[j-stride] + block[j];
        }
        pix+= stride;
        block+= 8;
    }
}

where it calculates and then spills each of [0-7] - stride to the stack, instead of just being able to keep -stride in a register and incrementing it.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593
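One way to take the 'constant - x' index arithmetic out of the inner loop at the source level is to keep a second pointer to the row above. This is a sketch only: pred8x8l_vertical_add_ptr and above are hypothetical names, and whether it actually avoids the spills with a given GCC version has not been measured here. The original function is repeated so the two can be compared.

```c
/* Function from the report, used as the reference. */
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride)
{
    int i, j;
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++)
            pix[j] = pix[j - stride] + block[j];
        pix += stride;
        block += 8;
    }
}

/* Hypothetical variant: 'above' tracks the previous row, so the inner
   loop indexes both rows with the same non-negative offset j. */
void pred8x8l_vertical_add_ptr(unsigned char *pix, const short *block, int stride)
{
    const unsigned char *above = pix - stride;
    int i, j;
    for (i = 0; i < 8; i++) {
        for (j = 0; j < 8; j++)
            pix[j] = above[j] + block[j];
        above += stride;
        pix += stride;
        block += 8;
    }
}
```

Both functions expect one valid row of pixels before pix, since the first output row is predicted from it.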
[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax
--- Comment #8 from astrange at ithinksw dot com 2008-12-05 20:08 ---

With some recent changes IRA makes better decisions now but they don't survive reload.

Using /gcc -O3 -fomit-frame-pointer -fno-pic -fdump-rtl-ira -S cabac-ret.i I get about the same asm and this in the IRA dump:

Allocnos coloring:
Loop 0 (parent -1, header bb0, depth 0)
  bbs: 2
  all: 0r64 1r58 2r62 3r59 4r60 5r63
  modified regnos: 58 59 60 62 63 64
  border:
  Pressure: GENERAL_REGS=6
  Reg 58 of GENERAL_REGS has 2 regs less
  Reg 62 of GENERAL_REGS has 2 regs less
  Reg 59 of GENERAL_REGS has 2 regs less
  Reg 60 of GENERAL_REGS has 2 regs less
  Reg 63 of GENERAL_REGS has 2 regs less
  Pushing a0(r64,l0)
  Pushing a3(r59,l0)(potential spill: pri=2857, cost=2)
  Pushing a1(r58,l0)
  Pushing a5(r63,l0)
  Pushing a2(r62,l0)
  Pushing a4(r60,l0)
  Popping a4(r60,l0) -- assign reg 3
  Popping a2(r62,l0) -- assign reg 4
  Popping a5(r63,l0) -- assign reg 0   <- r(state)
  Popping a1(r58,l0) -- assign reg 0   <- =r(bit)
  Popping a3(r59,l0) -- assign reg 5
  Popping a0(r64,l0) -- assign reg 0   <- returned bit&1

a1 and a5 should be conflicting, since a1 is an earlyclobber output and can't share a register with any of the inputs. reload fixes this by moving it to a worse register.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax
--- Comment #7 from astrange at ithinksw dot com 2008-09-18 01:29 --- Updated to 32-bit only. -- astrange at ithinksw dot com changed: What|Removed |Added Severity|normal |enhancement GCC target triplet|x86_64-*-* |i?86-*-* Summary|IRA doesn't allocate asm|IRA+i386 doesn't allocate |output being returned to eax|asm output being returned to ||eax http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax
--- Comment #5 from astrange at ithinksw dot com 2008-09-04 04:02 ---

It is fixed for me on x86-64. For i386 it's still suboptimal:

_get_cabac:
    subl    $28, %esp
    movl    %esi, 16(%esp)
    movl    %edi, 20(%esp)
    movl    %ebx, 12(%esp)
    movl    %ebp, 24(%esp)
    movl    32(%esp), %esi
    movl    36(%esp), %edi
    movl    (%esi), %eax
    movl    4(%esi), %ebx
# 16 "../cabac-ret.i" 1
    #%ebp %ebx %ax 16(%esi) %edi
# 0 "" 2
    movl    %eax, (%esi)
    movl    %ebx, 4(%esi)
    movl    %ebp, %eax
    movl    12(%esp), %ebx
    andl    $1, %eax
    movl    16(%esp), %esi
    movl    20(%esp), %edi
    movl    24(%esp), %ebp
    addl    $28, %esp
    ret

but not a regression (code is worse without IRA).

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug rtl-optimization/36673] IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389
--- Comment #5 from astrange at ithinksw dot com 2008-08-27 04:27 --- Fixed. -- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673
[Bug rtl-optimization/36672] IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829
--- Comment #4 from astrange at ithinksw dot com 2008-08-27 04:28 --- Fixed. -- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672
[Bug rtl-optimization/36663] IRA ICE in save_call_clobbered_regs at caller-save.c:1949
--- Comment #4 from astrange at ithinksw dot com 2008-08-27 04:28 --- Fixed. -- astrange at ithinksw dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663
[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax
--- Comment #3 from astrange at ithinksw dot com 2008-08-27 04:41 --- Now it is. -- astrange at ithinksw dot com changed: What|Removed |Added Summary|IRA doesn't allocate asm|[4.4 regression] IRA doesn't |output being returned to eax|allocate asm output being ||returned to eax http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug rtl-optimization/36663] New: IRA ICE in save_call_clobbered_regs at caller-save.c:1949
gcc -v Using built-in specs. Target: i386-apple-darwin9.3.0 Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc Thread model: posix gcc version 4.4.0 20080530 (experimental) (GCC) gcc -O3 -fira -S ira-ice.i ira-ice.i: In function 'avf_sdp_create': ira-ice.i:59: internal compiler error: in save_call_clobbered_regs, at caller-save.c:1949 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. This only happens at -O3, not anything below. -- Summary: IRA ICE in save_call_clobbered_regs at caller- save.c:1949 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC target triplet: i?86-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663
[Bug rtl-optimization/36663] IRA ICE in save_call_clobbered_regs at caller-save.c:1949
--- Comment #1 from astrange at ithinksw dot com 2008-06-29 07:14 --- Created an attachment (id=15828) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15828&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663
[Bug rtl-optimization/36672] New: IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829
gcc -v Using built-in specs. Target: i386-apple-darwin9.3.0 Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc Thread model: posix gcc version 4.4.0 20080530 (experimental) (GCC) gcc -O3 -fira -fno-pic -S ira-ice2.i ira-ice2.i:38: internal compiler error: in emit_swap_insn, at reg-stack.c:829 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. Happens with -O2, but not below that, and not without -fno-pic. -- Summary: IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC target triplet: i?86-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672
[Bug rtl-optimization/36672] IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829
--- Comment #1 from astrange at ithinksw dot com 2008-06-29 21:35 --- Created an attachment (id=15830) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15830&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672
[Bug rtl-optimization/36673] New: IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389
gcc -v Using built-in specs. Target: i386-apple-darwin9.3.0 Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc Thread model: posix gcc version 4.4.0 20080530 (experimental) (GCC) gcc -O3 -fira -fno-pic -S ira-ice.i ira-ice.i: In function 'MPV_motion_lowres': ira-ice.i:201: internal compiler error: in save_con_fun_n, at caller-save.c:1389 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. Without -fno-pic there's a different ICE, and with a lower -O it compiles. Besides these three ICEs there are several miscompiles of ffmpeg r14025, all of which cause it to crash on startup or enter infinite loops, so I guess I can't benchmark IRA for now. -- Summary: IRA -O3 -fno-pic ICE in save_con_fun_n, at caller- save.c:1389 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: astrange at ithinksw dot com GCC target triplet: i?86-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673
[Bug rtl-optimization/36673] IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389
--- Comment #1 from astrange at ithinksw dot com 2008-06-29 21:41 --- Created an attachment (id=15831) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15831&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673
[Bug target/36661] New: x86 asm +r operands cause unnecessary spills/copies
Compiling the attached source on i386 with:

gcc -O3 -fomit-frame-pointer -fno-pic -S asm-spills.i

produces:

        .text
        .align 4,0x90
.globl _get_cabac_noinline
_get_cabac_noinline:
        subl    $76, %esp
        movl    %esi, 64(%esp)
        movl    %edi, 68(%esp)
        movl    %ebp, 72(%esp)
        movl    %ebx, 60(%esp)
        movl    80(%esp), %esi
        movl    84(%esp), %edi
        movl    (%esi), %edx
        movl    4(%esi), %ebx
        movl    %edx, 28(%esp) # unused spill
        movl    %edx, %ebp # pointless move
# 24 ../strange-spills.i 1
        #%eax %bp %ebx 16(%esi) %edx (%edi)
# 0 2
        movl    %ebp, (%esi)
        movl    %ebx, 4(%esi)
        andl    $1, %eax
        movl    60(%esp), %ebx
        movl    %eax, 44(%esp) # unused spill
        movl    64(%esp), %esi
        movl    68(%esp), %edi
        movl    72(%esp), %ebp
        addl    $76, %esp
        ret
        .subsections_via_symbols

which has several unnecessary stack spills.

Reading through RTL dumps:
- everything is fine before asmcons
- asmcons inserts copies of c->low/range after they're loaded. There's no point to this, since the original is never used later, but I guess there isn't a problem as long as it's cleaned up.
- Somehow, the RA gets confused by the asmcons copy and the later one to copy the return value into eax. Instead of assigning both sides of the copy to the same register (which is obviously possible), or even using mov, it spills and reloads them into different registers.
- Later passes optimize away the reloads but keep the stores.

This isn't a regression (gcc 3.4 isn't much better) and still happens in the IRA branch. For some reason, changing =q(tmp) to =d improves IRA but not trunk.

This is about the same source as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539.

--
Summary: x86 asm +r operands cause unnecessary spills/copies
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36661
[Bug target/36661] x86 asm +r operands cause unnecessary spills/copies
--- Comment #1 from astrange at ithinksw dot com 2008-06-28 23:35 --- Created an attachment (id=15823) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15823&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36661
[Bug tree-optimization/36646] New: [4.4 regression] Unnecessary moves generated on loop boundaries
The attached source is a loop+switch statement, where only some of the switch cases change the variable 'val'. 4.4 generates moves for it in every case, even the ones where it's not mentioned, while 4.2 didn't; the difference is visible in tree dumps.

This part:

case Op_Inc1: (*tape)++; break;

with 4.2 at -O:

L3:;
  *tape = *tape + 1;
  goto bb 3 (L0);

L5:
        incb    (%edx)
        jmp     L13

SVN at -O:

L3:;
  *tape.17 = *tape.17 + 1;
  val.16 = val;
  goto bb 3 (L10);

L6:
        incb    (%esi)
        movl    %edx, %eax
        jmp     L10

Surprisingly, -O3 is worse:

L6:
        movl    %edx, %eax
        incb    (%esi)
        movl    %eax, %edx
        jmp     L2

IRA doesn't improve it.

This isn't from real-world code, so it's not really important, but I'd like to make a code-copying VM out of this.

--
Summary: [4.4 regression] Unnecessary moves generated on loop boundaries
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646
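For readers without the attachment, the shape of code the report describes can be sketched roughly as below. This is only an illustration: `Op_Inc1`, `tape` and `val` come from the report, while the other opcodes, the function name `run` and its signature are hypothetical and are not taken from the attached testcase. The point is that `val` is live across the loop but is touched by only one case, so the remaining cases should need no moves for it.

```c
#include <assert.h>
#include <stddef.h>

/* Only Op_Inc1 appears in the report; the other opcodes are hypothetical. */
enum op { Op_Inc1, Op_Dec1, Op_Get, Op_Halt };

/* Hypothetical interpreter loop: 'val' is carried across iterations,
   but only the Op_Get case assigns it. */
unsigned char run(const enum op *code, unsigned char *tape)
{
    unsigned char val = 0;

    assert(code != NULL && tape != NULL);
    for (;;) {
        switch (*code++) {
        case Op_Inc1: (*tape)++;   break;  /* does not mention val */
        case Op_Dec1: (*tape)--;   break;  /* does not mention val */
        case Op_Get:  val = *tape; break;  /* the only write to val */
        case Op_Halt: return val;
        }
    }
}
```

Compiling a function like this at -O and looking at the code for the `Op_Inc1` case is one way to check whether a given compiler version still emits the extra register copy; whether this sketch reproduces the regression exactly has not been verified.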
[Bug tree-optimization/36646] [4.4 regression] Unnecessary moves generated on loop boundaries
--- Comment #1 from astrange at ithinksw dot com 2008-06-27 04:57 --- Created an attachment (id=15818) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15818&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646
[Bug tree-optimization/36646] [4.4 regression] Unnecessary moves generated on loop boundaries
--- Comment #2 from astrange at ithinksw dot com 2008-06-27 05:04 --- Created an attachment (id=15819) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15819&action=view) svn 20080625 + -O compile -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646
[Bug target/36539] New: [4.4 regression] IRA doesn't allocate asm output being returned to eax
Using today's IRA branch (r136683), on the attached file.

gcc -O3 -fno-pic -fomit-frame-pointer -m64 -S cabac-ret.i -fira

_get_cabac:
LFB2:
        pushq   %rbx
LCFI0:
        movl    (%rdi), %eax
        movl    4(%rdi), %r8d
# 16 cabac-ret.i 1
        #%ebx %r8d %ax 24(%rdi) %rsi
# 0 2
        movl    %eax, (%rdi)
        movl    %r8d, 4(%rdi)
        movl    %ebx, %eax
        popq    %rbx
        andl    $1, %eax
        ret

with an unnecessary mov %ebx, %eax.

Without -fira:

        movl    (%rdi), %r8d
        movl    4(%rdi), %r9d
# 16 cabac-ret.i 1
        #%eax %r9d %r8w 24(%rdi) %rsi
# 0 2
        movl    %r8d, (%rdi)
        movl    %r9d, 4(%rdi)
        andl    $1, %eax
        ret

Both allocators don't allocate bit to eax in 32-bit mode, though all other compilers with inline asm support I tried did. gcc 3.3 does, as well, but no other version seemed to.

In this case it's not a problem, since changing the class to =a fixes it, but the function will be inlined a lot and I don't want to put unnecessary constraints on it.

--
Summary: [4.4 regression] IRA doesn't allocate asm output being returned to eax
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: x86_64-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax
--- Comment #1 from astrange at ithinksw dot com 2008-06-14 06:48 --- Created an attachment (id=15771) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15771&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539
[Bug target/36503] x86 can use x >> -y for x >> 32-y
--- Comment #4 from astrange at ithinksw dot com 2008-06-12 16:48 --- Maybe it seemed likely to cause a warning - I haven't checked that yet, though. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503
[Bug target/36502] New: i386/darwin generates unnecessary stack ops in every function
gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44 --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080611 (experimental) (GCC)

gcc changes esp in every function, even if it has no stack values. Given:

int a;
void f() {a++;}

gcc -O -fomit-frame-pointer -fno-pic -S add.c

_f:
        subl    $12, %esp
        incl    _a
        addl    $12, %esp
        ret

Apple's GCC doesn't do this and neither does 4.4 on other systems (as far as I know).

--
Summary: i386/darwin generates unnecessary stack ops in every function
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i386-apple-darwin*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36502
[Bug target/36503] New: x86 can use x >> -y for x >> 32-y
gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44 --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080611 (experimental) (GCC)

gcc compiles

int shift32(int i, int n) { return i >> (32 - n); }

to

_shift32:
        subl    $12, %esp
        movl    $32, %ecx
        subl    20(%esp), %ecx
        movl    16(%esp), %eax
        sarl    %cl, %eax
        addl    $12, %esp
        ret

Since all 286-and-up CPUs only use the low 5 bits of ecx when shifting, this can be:

_shift32:
        movl    8(%esp), %ecx
        movl    4(%esp), %eax
        negl    %ecx
        sarl    %cl, %eax
        ret

This is very common in bitstream readers, where it's used to read the top N bits from a word. ffmpeg already has an inline asm to do it, which I'd like to get rid of.

I'd guess this applies to some other architectures; it probably works on x86-64, but doesn't on PPC.

--
Summary: x86 can use x >> -y for x >> 32-y
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503
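The equivalence the report relies on can be checked in portable C, as a minimal sketch. In C a shift by a negative count (or by 32 or more on a 32-bit type) is undefined, so the sketch masks the count to its low 5 bits explicitly, which is what the x86 shift instructions do in hardware. The function names `top_bits_sub` and `top_bits_neg` are hypothetical, and unsigned values are used instead of the report's `int` so that the result does not depend on how the implementation right-shifts negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Read the top n bits of x (1 <= n <= 31) using the subtract form. */
uint32_t top_bits_sub(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return x >> ((32u - n) & 31u);
}

/* Same result using the negate form: -n and 32-n agree modulo 32,
   so their low 5 bits are identical. */
uint32_t top_bits_neg(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return x >> ((0u - n) & 31u);
}
```

For example, `top_bits_neg(0x80000000u, 1)` shifts by 31 and yields 1, the same as `top_bits_sub(0x80000000u, 1)`. Whether a particular compiler turns either form into `negl` followed by a shift is a separate question that this sketch does not answer.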
[Bug target/36503] x86 can use x >> -y for x >> 32-y
-- astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503
[Bug tree-optimization/36318] New: SRA pessimizes struct copies without -Os
/usr/local/gcc44/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44 --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080523 (experimental) (GCC)

and these options:

gcc -fno-pic -fomit-frame-pointer -O3 -S wc.c

For the attached source, gcc generates good code for global variable assignment:

_foo0:
        subl    $12, %esp
        movl    _b, %eax
        movl    %eax, _a
        addl    $12, %esp
        ret

but uses byte copies for pointer assignment:

_foo1:
        subl    $12, %esp
        movl    %ebx, 4(%esp)
        movl    %esi, 8(%esp)
        movl    20(%esp), %eax
        movl    16(%esp), %edx
        movzbl  (%eax), %esi
        movzbl  1(%eax), %ebx
        movzbl  2(%eax), %ecx
        movzbl  3(%eax), %eax
        movb    %cl, 2(%edx)
        movb    %al, 3(%edx)
        movb    %bl, 1(%edx)
        movl    %esi, %eax
        movb    %al, (%edx)
        movl    4(%esp), %ebx
        movl    8(%esp), %esi
        addl    $12, %esp
        ret

Using either -Os or -fno-tree-sra fixes it.

--
Summary: SRA pessimizes struct copies without -Os
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318
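The attached wc.c is not reproduced in the report, so the following is only a guess at its shape, inferred from the assembly: a 4-byte struct made of byte-sized members, copied once between the globals `a` and `b` and once through pointers. The struct name `quad` and its layout are assumptions; `foo0`, `foo1`, `a` and `b` are taken from the symbol names in the listing.

```c
#include <assert.h>

/* Assumed layout: four byte-sized members, 4 bytes in total. */
struct quad { unsigned char b0, b1, b2, b3; };
static_assert(sizeof(struct quad) == 4, "sketch assumes a 4-byte struct");

struct quad a, b;

/* Global-to-global copy: the report shows this as a single 32-bit move. */
void foo0(void)
{
    a = b;
}

/* Copy through pointers: the report shows this split into four byte moves. */
void foo1(struct quad *dst, const struct quad *src)
{
    *dst = *src;
}
```

Both functions perform the same 4-byte struct assignment, which is why the report expects the same single-move code for each. Comparing the output with and without -fno-tree-sra on a sketch like this would be one way to see whether the byte-wise expansion still occurs.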
[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os
--- Comment #1 from astrange at ithinksw dot com 2008-05-23 21:37 --- Created an attachment (id=15678) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15678&action=view) testcase -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
--- Comment #4 from astrange at ithinksw dot com 2008-05-07 17:36 --- Created an attachment (id=15592) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15592&action=view) minimal source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/36127] New: bad choice of loop IVs above -Os on x86
/usr/local/gcc44/bin/gcc -v
[..]
gcc version 4.4.0 20080503 (experimental) (GCC)

gcc -O3 -mfpmath=sse -fno-pic -fno-tree-vectorize -S himenoBMTxps.c

With -O2/-O3, the inner loop in jacobi() in this program ends up containing a lot of this:

        movss   _p-4(%edi,%edx,4), %xmm0
        movl    -96(%ebp), %edi
        subss   _p-4(%edi,%edx,4), %xmm0
        movl    -108(%ebp), %edi
        subss   _p-4(%edi,%edx,4), %xmm0
        movl    -92(%ebp), %edi
        addss   _p-4(%edi,%edx,4), %xmm0
        movl    -124(%ebp), %edi

At -O1 or -Os, it instead produces:

        movss   34056(%eax), %xmm0
        subss   33024(%eax), %xmm0
        subss   -33024(%eax), %xmm0
        addss   -34056(%eax), %xmm0

which is much better. On core 2 it claims to be 40% faster at -Os.

IIRC this isn't a problem on x86-64, but IRA+-O3 was much worse again.

--
Summary: bad choice of loop IVs above -Os on x86
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
--- Comment #1 from astrange at ithinksw dot com 2008-05-05 02:12 --- Created an attachment (id=15578) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15578&action=view) source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
-- astrange at ithinksw dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
--- Comment #2 from astrange at ithinksw dot com 2008-05-05 02:12 --- Created an attachment (id=15579) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15579&action=view) compiled at -O3 on darwin -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86
--- Comment #3 from astrange at ithinksw dot com 2008-05-05 02:13 --- Created an attachment (id=15580) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15580&action=view) and at -Os -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127
[Bug tree-optimization/33705] restrict doesn't improve char * aliasing
--- Comment #4 from astrange at ithinksw dot com 2008-04-20 23:48 --- Created an attachment (id=15502) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15502&action=view) source with __restrict (no change) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33705
[Bug target/35714] New: x86 poor code with pmaddwd
/usr/local/gcc44/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44 --enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw --with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080326 (experimental) (GCC)

/usr/local/gcc44/bin/gcc -Os -march=core2 -fno-pic -fomit-frame-pointer -flax-vector-conversions -S pmaddwd.c

generates:

_madd_swapped:
        subl    $12, %esp
        movaps  LC0, %xmm1
        addl    $12, %esp
        pmaddwd %xmm1, %xmm0
        ret
.globl _madd
_madd:
        subl    $12, %esp
        movaps  LC0, %xmm1
        addl    $12, %esp
        pmaddwd %xmm0, %xmm1
        movaps  %xmm1, %xmm0
        ret

Both of these should be:

_madd:
        pmaddwd LC0, %xmm0
        ret

since the stack isn't referenced and pmaddwd is commutative.

(the variable being renamed LC0 is PR 31043)

--
Summary: x86 poor code with pmaddwd
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC build triplet: i386-apple-darwin9.2.0
GCC host triplet: i386-apple-darwin9.2.0
GCC target triplet: i386-apple-darwin9.2.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714
[Bug target/35714] x86 poor code with pmaddwd
--- Comment #1 from astrange at ithinksw dot com 2008-03-27 01:02 --- Created an attachment (id=15384) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15384&action=view) source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714
[Bug other/31043] duplicated data in .rodata / .rodata.cst sections.
--- Comment #1 from astrange at ithinksw dot com 2008-03-22 04:28 --- I encountered this myself with 4.4.0 20080321. If the data is static, gcc generates LC0 but not the copy with the original name, which impedes debugging. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31043