[Bug c/98902] -fmerge-all-constants leaves dangling reference

2021-01-31 Thread astrange at ithinksw dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98902

Alexander Strange  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Alexander Strange  ---
Interesting, this is a bug in the Compiler Explorer site. It's hiding .set
lines from me.

[Bug c/98902] New: -fmerge-all-constants leaves dangling reference

2021-01-31 Thread astrange at ithinksw dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98902

Bug ID: 98902
   Summary: -fmerge-all-constants leaves dangling reference
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: astrange at ithinksw dot com
  Target Milestone: ---

This source:
--
#include <stdio.h>

static const int a1[] = {1};
static const int a2[] = {1};

int main (void)
{
printf("%p %p\n", a1, a2);
return 0;
}
--

produces code where it doesn't emit 'a2' but still references it:
--
.LC0:
.string "%p %p\n"
main:
sub rsp, 8
mov edx, OFFSET FLAT:a2
mov esi, OFFSET FLAT:a1
xor eax, eax
mov edi, OFFSET FLAT:.LC0
call printf
xor eax, eax
add rsp, 8
ret
a1:
.long   1
--

with '-O2 -fmerge-all-constants'. Did not verify this locally, just in compiler
explorer.
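
To check this outside Compiler Explorer, a small helper can report at run time whether the two arrays ended up at the same address. This is only a sketch and not part of the original testcase: `same_address` and `report_merge` are hypothetical names, and whether the addresses match depends on the compiler and flags (for example `-O2 -fmerge-all-constants`).

```c
#include <stdio.h>

static const int a1[] = {1};
static const int a2[] = {1};

/* Hypothetical helper: returns 1 when both pointers hold the same address. */
static int same_address(const void *p, const void *q)
{
    return p == q;
}

/* Prints both addresses and whether the two arrays share storage. */
static void report_merge(void)
{
    printf("%p %p merged=%d\n", (const void *)a1, (const void *)a2,
           same_address(a1, a2));
}
```

Reading the `-S` output for `.set` directives remains the more direct check, since those are the lines Compiler Explorer was hiding.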

[Bug tree-optimization/61515] Extremely long compile time for generated code

2014-06-16 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #3 from Alexander Strange <astrange at ithinksw dot com> ---
Without checking, -O0 went from 8 to 5 minutes.

I stopped the -Os compile at 29 minutes.


[Bug tree-optimization/61515] New: Extremely long compile time for generated code

2014-06-15 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

Bug ID: 61515
   Summary: Extremely long compile time for generated code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: astrange at ithinksw dot com

 /usr/local/gcc49/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc49/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc49/libexec/gcc/x86_64-apple-darwin13.2.0/4.10.0/lto-wrapper
Target: x86_64-apple-darwin13.2.0
Configured with: ../../cc/gcc/configure --prefix=/usr/local/gcc49
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --with-isl=/sw --enable-languages=c,c++,lto,objc,obj-c++
--no-create --no-recursion
Thread model: posix
gcc version 4.10.0 20140615 (experimental) (GCC) 

For the attached source (C translation from a large BF program):
- gcc -O0 takes 9 minutes
- gcc -Os does not finish after 40 minutes


[Bug tree-optimization/61515] Extremely long compile time for generated code

2014-06-15 Thread astrange at ithinksw dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61515

--- Comment #1 from Alexander Strange <astrange at ithinksw dot com> ---
Created attachment 32944
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32944&action=edit
Preprocessed source


[Bug target/43225] Structure copies not vectorized

2011-03-29 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225

--- Comment #4 from Alexander Strange <astrange at ithinksw dot com> 2011-03-29 20:39:28 UTC ---
Better source:

#include <emmintrin.h>

struct a1 { char l[16];} __attribute__((aligned));
struct a2 { __m128i l; } __attribute__((aligned));

void f1(struct a1 *a, struct a1 *b)
{
*a = *b;
}

void f2(struct a2 *a, struct a2 *b)
{
*a = *b;
}

void f3(__m128i *a, __m128i *b)
{
*a = *b;
}

Code is the same as above in svn. LLVM uses movaps for all three functions.


[Bug inline-asm/46615] New: [4.6 regression] possibly-invalid x86-64 inline asm miscompilation

2010-11-22 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46615

   Summary: [4.6 regression] possibly-invalid x86-64 inline asm
miscompilation
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: inline-asm
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


gcc 4.6 miscompiles this source from ffmpeg on x86-64-apple-darwin10, whereas
previous compilers worked. I'm not sure if the asm is legal, but it's existed
in the wild for a long time.

const unsigned long long __attribute__((aligned(8))) ff_bgr24toUV[2][4] =
{
{0x38380000DAC83838ULL, 0xECFFDAC80000ECFFULL, 0xF6E40000D0E3F6E4ULL,
0x3838D0E300003838ULL},
{0xECFF0000DAC8ECFFULL, 0x3838DAC800003838ULL, 0x38380000D0E33838ULL,
0xF6E4D0E30000F6E4ULL},
};

static void 
bgr24ToUV_mmx_MMX2(int f)
{
__asm__ volatile(
"movq 24+%0, %%mm6 \n\t"
:: "m"(ff_bgr24toUV[f == 0][0]));
}

void 
rgb24ToUV_MMX2()
{
bgr24ToUV_mmx_MMX2(1);
}

 gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.5.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.5.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-checking --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20101122 (experimental) (GCC) 
 gcc -O -o swscale-fails.s -S swscale.i 
swscale.i: In function 'rgb24ToUV_MMX2':
swscale.i:10:2: warning: use of memory input without lvalue in asm operand 0 is
deprecated [enabled by default]

Working asm (4.2):
_rgb24ToUV_MMX2:
pushq   %rbp
movq    %rsp, %rbp
movq 24+_ff_bgr24toUV(%rip), %mm6 
leave
ret
.globl _ff_bgr24toUV
.const
.align 3
_ff_bgr24toUV:
.quad   4050987868490315832
.quad   -1369135209168966401
.quad   -656399642184648988
.quad   4051217538195929144
.quad   -1369375758026740481
.quad   4051228417348089912
.quad   4050987868324313144
.quad   -656169972313032988
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

Non-working asm (4.6):
_rgb24ToUV_MMX2:
movq 24+LC0(%rip), %mm6 
ret
.globl _ff_bgr24toUV
.const
.align 3
_ff_bgr24toUV:
.quad   4050987868490315832
.quad   -1369135209168966401
.quad   -656399642184648988
.quad   4051217538195929144
.quad   -1369375758026740481
.quad   4051228417348089912
.quad   4050987868324313144
.quad   -656169972313032988
.literal8
.align 3
LC0:
.quad   4050987868490315832
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support

24+_ff_bgr24toUV(%rip) is fine, but 24+LC0(%rip) is a pointer to nothing, and
ld breaks:

ld: in /var/folders/MY/MYkVh2TwHgKZhNFIG8M3wU+++TI/-Tmp-//cc9dJIWa.o, in
section __TEXT,__text reloc 0: local relocation for address 0x000C in
section __text does not target section __literal8

I'm going to fix the asm since it looks fragile anyway, but that won't fix
existing releases of ffmpeg.

Note that creating LC0 is not even an optimization since it doesn't save any
space (because the array is __attribute__((used))).
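
One way to make the operand less fragile is to name the element that the `24+%0` displacement reaches instead of adding a byte offset inside the asm string. This is a sketch under assumptions, not necessarily the change that went into ffmpeg. With 8-byte elements, 24 bytes past `[row][0]` is `[row][3]`, so the operand could be written as `"m"(ff_bgr24toUV[f == 0][3])` with a plain `%0` in the template. The portable C below only checks that address arithmetic; `elem_by_offset` and `elem_by_index` are hypothetical names, and the alignment attribute is left out for brevity.

```c
/* Table values taken from the .quad listing above. */
const unsigned long long ff_bgr24toUV[2][4] = {
    {0x38380000DAC83838ULL, 0xECFFDAC80000ECFFULL,
     0xF6E40000D0E3F6E4ULL, 0x3838D0E300003838ULL},
    {0xECFF0000DAC8ECFFULL, 0x3838DAC800003838ULL,
     0x38380000D0E33838ULL, 0xF6E4D0E30000F6E4ULL},
};

/* What "24+%0" addresses: 24 bytes past element [row][0]. */
static const void *elem_by_offset(int f)
{
    return (const char *)&ff_bgr24toUV[f == 0][0] + 24;
}

/* The same location named directly, usable as the "m" operand itself. */
static const void *elem_by_index(int f)
{
    return &ff_bgr24toUV[f == 0][3];
}
```

Naming the element keeps the whole address inside the operand the compiler sees, so nothing depends on how the operand's symbol is printed.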


[Bug rtl-optimization/46248] New: 4.6 regression: crash+infinite recursion in combine

2010-10-31 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46248

   Summary: 4.6 regression: crash+infinite recursion in combine
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


Created attachment 22210
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=22210
source

gcc r166084 crashes compiling ffmpeg libpostproc on x86-64-apple-darwin10.

Minimized-ish source attached.

 gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.4.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20101030 (experimental) (GCC) 

 gcc -O3 -S postprocess.i 
gcc: internal compiler error: Segmentation fault (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Backtrace:

#0  0x00010031fc34 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f400078, pfalse=0x7fff5f400068) at
../../../src/gcc/gcc/combine.c:8471
#1  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400118, pfalse=0x7fff5f400108) at
../../../src/gcc/gcc/combine.c:8507
#2  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f4001b8, pfalse=0x7fff5f4001a8) at
../../../src/gcc/gcc/combine.c:8507
#3  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400258, pfalse=0x7fff5f400248) at
../../../src/gcc/gcc/combine.c:8507
#4  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f4002f8, pfalse=0x7fff5f4002e8) at
../../../src/gcc/gcc/combine.c:8507
#5  0x00010031fd82 in if_then_else_cond (x=0x1425e1498,
ptrue=0x7fff5f400398, pfalse=0x7fff5f400388) at
../../../src/gcc/gcc/combine.c:8507
#6  0x00010031fd82 in if_then_else_cond (x=0x1425e14b0,
ptrue=0x7fff5f400438, pfalse=0x7fff5f400428) at
../../../src/gcc/gcc/combine.c:8507
...


[Bug target/36503] x86 can use x >> -y for x >> 32-y

2010-10-20 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503

--- Comment #8 from Alexander Strange <astrange at ithinksw dot com> 2010-10-21 04:39:36 UTC ---
I built ffmpeg for x86-64 with --disable-asm with the attached patch and the
regression tests failed. Reverting the patch fixes them. I saved the binaries
but haven't investigated yet.


[Bug rtl-optimization/45788] New: -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-25 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

   Summary: -fwhole-program causes ICE error: BB 3 can not throw
but has an EH edge
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: astra...@ithinksw.com


 gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.4.0/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.4.0
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --with-gmp=/sw
--disable-bootstrap --enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.6.0 20100924 (experimental) (GCC) 

 gcc -O3 -fwhole-program -S eh_ice.ii
eh_ice.ii: In function 'void
_ZL9set_colorP9primitive7vectorXIfLi4EE.isra.3.constprop.5(texture**, color4)':
eh_ice.ii:93:15: error: BB 3 can not throw but has an EH edge
eh_ice.ii:93:15: internal compiler error: verify_flow_info failed
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Removing -fwhole-program fixes it.

-- 
Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.


[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-25 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

--- Comment #1 from Alexander Strange <astrange at ithinksw dot com> 2010-09-25 06:51:33 UTC ---
BTW, I think the error would be a lot clearer if it printed the pre-cloning/etc
function name.



[Bug rtl-optimization/45788] -fwhole-program causes ICE error: BB 3 can not throw but has an EH edge

2010-09-25 Thread astrange at ithinksw dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45788

--- Comment #4 from Alexander Strange <astrange at ithinksw dot com> 2010-09-25 19:50:29 UTC ---
I (probably) definitely attached it, is the attachment form in the new bugs
page not working?


[Bug target/44474] GCC inserts redundant test instruction due to incorrect clobber

2010-08-29 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-08-29 06:39 ---
Still happens with the new combine work (not that I really expected it to
change).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474



[Bug target/44073] x86 constants could be unduplicated

2010-08-08 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2010-08-08 06:39 ---
That commit doesn't reverse cleanly anymore, and I'm not sure how to update it.
I don't have any pre-2005 gccs at the moment to test with.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug target/44474] GCC inserts redundant test instruction due to incorrect clobber

2010-06-30 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-07-01 03:43 ---
The problem is combine.

This:

int test2( int *b )
{
int b_ = *b;
b_--;
if( b_ == 0 ) {
*b = b_;
return foo();
}
*b = b_;
return 0;
}

works:
_test2:
LFB1:
movl    (%rdi), %eax
decl    %eax
je      L7      <- uses decl
movl    %eax, (%rdi)
xorl    %eax, %eax
ret
.align 4,0x90
L7:
movl    $0, (%rdi)
xorl    %eax, %eax
jmp     _foo

The original turns (*b)-- into load/dec/store/cmp - combine tries to combine
dec/store which fails, but doesn't try dec/cmp.
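
For comparison, the pattern the bug is about presumably looks like `test1` below. That shape is an assumption reconstructed from the description, since the original testcase is not quoted in this comment, and `foo` is a stand-in defined here only so the snippet links. Both functions behave the same; only the generated code differs.

```c
/* Stand-in for the external function called in the report. */
static int foo_calls;

int foo(void)
{
    return ++foo_calls;
}

/* Assumed shape of the original: decrement in memory, then compare. */
int test1(int *b)
{
    (*b)--;
    if (*b == 0)
        return foo();
    return 0;
}

/* Workaround from the comment: decrement a local copy, branch, then store. */
int test2(int *b)
{
    int b_ = *b;
    b_--;
    if (b_ == 0) {
        *b = b_;
        return foo();
    }
    *b = b_;
    return 0;
}
```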


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44474



[Bug target/44532] New: x86-64 unnecessary parameter extension

2010-06-14 Thread astrange at ithinksw dot com
Source:
int f1(short a, int b)
{
return a * b;
}

int f2(unsigned short a, int b)
{
return a * b;
}

 gcc -O3 -fomit-frame-pointer -S paramext.c

_f1:
LFB0:
movl    %esi, %eax
movswl  %di, %edi <-
imull   %edi, %eax
ret
...
_f2:
LFB1:
movl    %esi, %eax
movzwl  %di, %edi <-
imull   %edi, %eax
ret

AFAIK integer parameters should already be extended to int, so those
instructions are redundant. llvm doesn't generate them.
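
At the C level, the conversion those `movswl`/`movzwl` instructions perform is the integer promotion of `a` to `int` before the multiply. The sketch below writes that promotion out explicitly; the `_explicit` names are hypothetical. Whether the caller or the callee is responsible for the extension is an ABI question that this snippet does not settle.

```c
int f1(short a, int b)
{
    return a * b;
}

int f2(unsigned short a, int b)
{
    return a * b;
}

/* Same computation with the promotion written out: sign-extend, multiply. */
int f1_explicit(short a, int b)
{
    int wide = (int)a;
    return wide * b;
}

/* unsigned short promotes to int (zero-extended) when int is wider. */
int f2_explicit(unsigned short a, int b)
{
    int wide = (int)a;
    return wide * b;
}
```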


-- 
   Summary: x86-64 unnecessary parameter extension
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44532



[Bug lto/44429] New: lto ignoring __attribute__((used))

2010-06-05 Thread astrange at ithinksw dot com
Source:
static const int __attribute__((used)) i = 1;

int main(void)
{
int r;
__asm__ ("movl _i(%%rip), %0" : "=r"(r));
return r;
}

 /usr/local/gcc46/bin/gcc -O3 -o attrused attrused.c 
 /usr/local/gcc46/bin/gcc -O3 -o attrused attrused.c -flto
Undefined symbols:
  _i, referenced from:
  _main in ccMflGRF.lto.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

Not sure how to construct a failing program that doesn't involve asm.
This is the only thing left preventing ffmpeg (without asm disabled) from
compiling under LTO.


-- 
   Summary: lto ignoring __attribute__((used))
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.3.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44429



[Bug lto/44090] lto ice in verify_stmts

2010-05-24 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-05-24 20:01 ---
Fixed itself. Though lto still doesn't build ffmpeg, it's just a different bug
now.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug rtl-optimization/44223] New: segmentation fault with -g -fsched-pressure

2010-05-20 Thread astrange at ithinksw dot com
 gcc -O3 -g -fsched-pressure -fschedule-insns -S crash1m.i
crash1m.i: In function 'ff_adts_write_frame_header':
crash1m.i:35:2: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Backtrace:
(gdb) run
Starting program:
/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/cc1 -fpreprocessed
crash1m.i -march=core2 -mcx16 -msahf -maes -mpclmul -mpopcnt -msse4.2 --param
l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072
-mtune=core2 -fPIC -feliminate-unused-debug-symbols -quiet -dumpbase crash1m.i
-mmacosx-version-min=10.6.3 -auxbase crash1m -g -O3 -version -fsched-pressure
-fschedule-insns -o crash1m.s
Reading symbols for shared libraries .++. done
GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1)
compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version
4.3.1, MPFR version 2.4.2-p3, MPC version 0.8
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.6.0 20100521 (experimental) (x86_64-apple-darwin10.3.1)
compiled by GNU C version 4.2.1 (Apple Inc. build 5659), GMP version
4.3.1, MPFR version 2.4.2-p3, MPC version 0.8
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 5c588719ada4c17718f398d6d2dbd7a3

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x0001004edc54 in dying_use_p (use=0x141720070) at
../../../src/gcc/gcc/haifa-sched.c:769
769 if (NONDEBUG_INSN_P (next->insn)
(gdb) bt
#0  0x0001004edc54 in dying_use_p (use=0x141720070) at
../../../src/gcc/gcc/haifa-sched.c:769
#1  0x0001004f055d in setup_insn_reg_pressure_info [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/haifa-sched.c:1130
#2  0x0001004f055d in ready_sort (ready=0x100b0b5e0) at
../../../src/gcc/gcc/haifa-sched.c:1502
#3  0x0001004f5e4b in schedule_block (target_bb=0x7fff5fbfe4e8) at
../../../src/gcc/gcc/haifa-sched.c:3203
#4  0x00010060c8bd in schedule_insns () at
../../../src/gcc/gcc/sched-rgn.c:3001
#5  0x00010060cd4f in rest_of_handle_sched () at
../../../src/gcc/gcc/sched-rgn.c:3512
#6  0x00010059cb3f in execute_one_pass (pass=0x100b99d40) at
../../../src/gcc/gcc/passes.c:1589
#7  0x00010059ce1d in execute_pass_list (pass=0x100b99d40) at
../../../src/gcc/gcc/passes.c:1644
#8  0x00010059ce2f in execute_pass_list (pass=0x100b98ec0) at
../../../src/gcc/gcc/passes.c:1645
#9  0x0001006cd1d0 in invoke_plugin_callbacks [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/plugin.h:413
#10 0x0001006cd1d0 in tree_rest_of_compilation (fndecl=0x14252f300) at
../../../src/gcc/gcc/tree-optimize.c:416
#11 0x000100898ef6 in cgraph_expand_function (node=0x14240cd20) at
../../../src/gcc/gcc/cgraphunit.c:1622
#12 0x00010089c07d in cgraph_expand_all_functions [inlined] () at
/Users/astrange/Projects/src/gcc/gcc/cgraphunit.c:1701
#13 0x00010089c07d in cgraph_optimize () at
../../../src/gcc/gcc/cgraphunit.c:1957
#14 0x00010089c676 in cgraph_finalize_compilation_unit () at
../../../src/gcc/gcc/cgraphunit.c:1161
#15 0x0001f0f2 in c_write_global_declarations () at
../../../src/gcc/gcc/c-decl.c:9578
#16 0x0001006623c5 in do_compile () at ../../../src/gcc/gcc/toplev.c:1059
#17 0x000100662b1d in toplev_main (argc=32, argv=0x7fff5fbfe828) at
../../../src/gcc/gcc/toplev.c:2433
#18 0x00010f64 in start ()


-- 
   Summary: segmentation fault with -g -fsched-pressure
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.3.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223



[Bug rtl-optimization/44223] segmentation fault with -g -fsched-pressure

2010-05-20 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-05-21 02:02 ---
Created an attachment (id=20715)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20715&action=view)
file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44223



[Bug target/44073] New: x86 constants could be unduplicated

2010-05-11 Thread astrange at ithinksw dot com
void f1(int *a, int *b, int *c)
{
int d = 0xE0E0E0E0;

*a = *b = *c = d;
}

produces
_f1:
LFB0:
movl    $-522133280, (%rdx)
movl    $-522133280, (%rsi)
movl    $-522133280, (%rdi)
ret

on x86-64 at -Os. It would save instruction space and probably not be any
slower to actually assign d to a register, but this is only done for 64-bit
constants.
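
The size argument can be made concrete with a little arithmetic. The byte counts below are assumptions about the usual x86-64 encodings (no REX prefix, plain base-register addressing with no displacement), not figures measured from the output above; the enum and function names are hypothetical.

```c
/* Assumed encoding sizes in bytes. */
enum {
    STORE_IMM32 = 6, /* movl $imm32, (%reg) : opcode + modrm + imm32 */
    LOAD_IMM32  = 5, /* movl $imm32, %eax   : opcode + imm32 */
    STORE_REG   = 2  /* movl %eax, (%reg)   : opcode + modrm */
};

/* Code size when every store carries its own copy of the immediate. */
static int size_duplicated(int stores)
{
    return stores * STORE_IMM32;
}

/* Code size when the constant is loaded into a register once. */
static int size_shared(int stores)
{
    return LOAD_IMM32 + stores * STORE_REG;
}
```

For the three stores in `f1` this gives 18 bytes with duplicated immediates against 11 bytes with a shared register, under the stated encoding assumptions.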


-- 
   Summary: x86 constants could be unduplicated
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug target/44073] x86 constants could be unduplicated

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-05-11 10:36 ---
It's propagated by vrp1, and then nothing removes it again. tree-uncprop
doesn't change it - it looks like it doesn't have anything to handle this,
actually.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44073



[Bug lto/44090] New: lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com
 /usr/local/gcc46/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc46/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10.3.1
Configured with: ../../src/gcc/configure --prefix=/usr/local/gcc46
--with-arch=native --with-tune=native --disable-nls --enable-lto
--disable-bootstrap LDFLAGS=-L/sw/lib CPPFLAGS=-I/sw/include
--enable-languages=c,c++,objc,obj-c++,lto
Thread model: posix
gcc version 4.6.0 20100511 (experimental) (GCC) 

The attached files have two different definitions of MpegEncContext. -flto with
checking gives an ice on it instead of a readable warning/error:

 /usr/local/gcc46/bin/gcc -O3 -flto -c h263dec.i
 /usr/local/gcc46/bin/gcc -O3 -flto -c ituh263dec.i
 echo h263dec.o ituh263dec.o > test
 /usr/local/gcc46/libexec/gcc/x86_64-apple-darwin10.3.1/4.6.0/lto1 -O3 @test   
  
Reading object files: h263dec.o ituh263dec.o
Reading the callgraph
Merging declarations
Reading summaries
Reading function bodies: ff_h263_decode_mb ff_h263_decode_init
Performing interprocedural optimizations
 whole-program
In function 'ff_h263_decode_init':
lto1: error: type mismatch in address expression
<unnamed-signed:32> (*<T4a5>) (struct MpegEncContext *, <unnamed-signed:16>[64]
*)

<unnamed-signed:32> <T4ac> (struct MpegEncContext *, <unnamed-signed:16>[64] *)

# .MEM_5 = VDEF <.MEM_4(D)>
s_3->decode_mb = ff_h263_decode_mb;

lto1: internal compiler error: verify_stmts failed
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

It looks obviously invalid here, but building ffmpeg with -O3 -flto gives the
same ice, and I can't see any bugs that would cause that. It's hard to debug
it, though, since it doesn't print the origin files of the mismatched
definitions or anything.

The original, absolutely not reduced version:
 svn co -r23100 svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg
 cd ffmpeg
 ./configure --cc=/usr/local/gcc46/bin/gcc --extra-cflags="-flto -O3"
 --extra-ldflags="-flto -O3" --enable-shared; make
...
<lots of lto "type of ... does not match original declaration" warnings that
all seem to be wrong>
...
s_4->decode_mb = ff_h263_decode_mb;

lto1: internal compiler error: verify_stmts failed


-- 
   Summary: lto ice in verify_stmts
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.3.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug lto/44090] lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-05-12 05:27 ---
Created an attachment (id=20638)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20638&action=view)
test file 1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug lto/44090] lto ice in verify_stmts

2010-05-11 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-05-12 05:27 ---
Created an attachment (id=20639)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20639&action=view)
test file 2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44090



[Bug tree-optimization/44063] [4.6 Regression]: build broken for libgcc cris-elf, ICE in cgraph_estimate_size_after_inlining, at ipa-inline

2010-05-10 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-05-11 03:38 ---
Created an attachment (id=20623)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20623&action=view)
testcase

This happens building ffmpeg on x86-64 now. Minimal-ish testcase attached.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44063



[Bug target/43766] New: x86 prefetch doesn't use complex memory addressing

2010-04-16 Thread astrange at ithinksw dot com
Source:
void p(int *a, int i)
{
__builtin_prefetch(&a[i]);
}

 gcc -O3 -fomit-frame-pointer -S prefetch.c
_p:
movslq  %esi, %rsi
leaq    (%rdi,%rsi,4), %rax
prefetcht0  (%rax)
ret

leaq and prefetch should be merged.


-- 
   Summary: x86 prefetch doesn't use complex memory addressing
   Product: gcc
   Version: 4.6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766



[Bug target/43766] x86 prefetch doesn't use complex memory addressing

2010-04-16 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-04-16 21:19 ---
Works with x86-64.

Checking -m32, the same thing happens with or without the patch:
_p:
subl    $12, %esp
movl    20(%esp), %eax
sall    $2, %eax
addl    16(%esp), %eax
addl    $12, %esp
prefetcht0  (%eax)
ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43766



[Bug rtl-optimization/43721] Failure to optimise (a/b) and (a%b) into single __aeabi_idivmod call

2010-04-11 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-04-12 03:54 ---
Still the case with 4.5.

 arm-none-linux-gnueabi-gcc -Os -S divmod.c
 cat divmod.s
.cpu arm10tdmi
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file   "divmod.c"
.global __aeabi_idivmod
.global __aeabi_idiv
.text
.align  2
.global divmod
.type   divmod, %function
divmod:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
stmfd   sp!, {r4, r5, r6, lr}
mov r6, r0
mov r5, r1
bl  __aeabi_idivmod
mov r0, r6
mov r4, r1
mov r1, r5
bl  __aeabi_idiv
add r0, r4, r0
ldmfd   sp!, {r4, r5, r6, pc}
.size   divmod, .-divmod
.ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
.section        .note.GNU-stack,"",%progbits
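
`divmod.c` itself is not quoted here. Judging from the assembly (one call for the remainder, one for the quotient, then an add), it is presumably equivalent to `divmod` below; that reconstruction is an assumption. `divmod_single` is a hypothetical variant using the standard `div()` from `<stdlib.h>`, which returns quotient and remainder together and so expresses the single-call form at the source level.

```c
#include <stdlib.h>

/* Assumed shape of the testcase: quotient and remainder of the same operands. */
int divmod(int a, int b)
{
    return a / b + a % b;
}

/* Same result via div(), which yields both values from one call. */
int divmod_single(int a, int b)
{
    div_t r = div(a, b);
    return r.quot + r.rem;
}
```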


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43721



[Bug target/43723] New: Some ARMs support unaligned

2010-04-11 Thread astrange at ithinksw dot com
Source:
struct s { int i; } __attribute__((packed));

int a(struct s *s)
{
return s->i;
}

Using 4.5:
 /usr/local/gcc-arm/bin/arm-none-linux-gnueabi-gcc -Os -mcpu=cortex-a8 -S 
 unaligned.c
 cat unaligned.s
.cpu cortex-a8
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 18, 4
.file   "unaligned.c"
.text
.align  2
.global a
.type   a, %function
a:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldrb    r2, [r0, #1]    @ zero_extendqisi2
ldrb    r3, [r0, #0]    @ zero_extendqisi2
orr r3, r3, r2, asl #8
ldrb    r2, [r0, #2]    @ zero_extendqisi2
ldrb    r0, [r0, #3]    @ zero_extendqisi2
orr r3, r3, r2, asl #16
orr r0, r3, r0, asl #24
bx  lr
.size   a, .-a
.ident  "GCC: (GNU) 4.5.0 20100325 (experimental)"
.section        .note.GNU-stack,"",%progbits

At least some configurations of cortex-a8 support unaligned access just fine,
so it should be possible to use it. But it doesn't look like it is - there is
no -mno-strict-align for arm. This would be a major code size reduction for
FFmpeg.
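
A common portable way to express an unaligned load is `memcpy` into a local, which leaves the compiler free to pick a single load where the target allows it. This is a sketch only: `load_unaligned` is a hypothetical name, and the code GCC emits for it on ARM has not been checked here.

```c
#include <string.h>

struct s { int i; } __attribute__((packed));

/* Field access through the packed struct, as in the report. */
int a(const struct s *s)
{
    return s->i;
}

/* Hypothetical alternative: copy the bytes out, no alignment assumed. */
int load_unaligned(const void *p)
{
    int v;
    memcpy(&v, p, sizeof v);
    return v;
}
```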


-- 
   Summary: Some ARMs support unaligned
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-unknown-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43723



[Bug target/43550] New: arm missing rev16

2010-03-26 Thread astrange at ithinksw dot com
typedef unsigned short uint16_t;
typedef unsigned int   uint32_t;

uint16_t s16(uint16_t v)
{
return v>>8|v<<8;
}

uint32_t s32(uint32_t v)
{
return __builtin_bswap32(v);
}

 gcc -O3 -mcpu=cortex-a8 -S bswap.c
s16:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r3, r0, lsr #8
orr r0, r3, r0, asl #8
uxth    r0, r0
bx  lr

s32:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
rev r0, r0
bx  lr

It generates 32-bit bswap using rev but not 16-bit using rev16. x86 can do
both.
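
On compilers that provide it, the 16-bit swap can also be requested directly with `__builtin_bswap16`. It is an assumption that your compiler has that builtin (later GCC releases and Clang do), so the shift form from the report is kept alongside for comparison. `s16_builtin` is a hypothetical name.

```c
#include <stdint.h>

/* Shift-and-or form from the report. */
uint16_t s16(uint16_t v)
{
    return (uint16_t)(v >> 8 | v << 8);
}

/* Builtin form; requires a compiler that provides __builtin_bswap16. */
uint16_t s16_builtin(uint16_t v)
{
    return __builtin_bswap16(v);
}

uint32_t s32(uint32_t v)
{
    return __builtin_bswap32(v);
}
```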


-- 
   Summary: arm missing rev16
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: arm-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43550



[Bug lto/43373] whopr+linker plugin ICE compressed stream data error

2010-03-15 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-15 11:10 ---
The last two commands were the source and testcase. Should have spaced it out
more.

I don't have enough memory allocated to this VM to build ffmpeg without whopr,
so I thought I'd try the more experimental path first.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373



[Bug lto/43342] lto1: internal compiler error: failed to reclaim unneeded function

2010-03-14 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2010-03-14 23:33 ---
This happens building ffmpeg --enable-shared with -fwhopr. I can make a
testcase out of that if needed.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43342



[Bug lto/43372] New: lto ICE in strip_extension with linker plugin

2010-03-14 Thread astrange at ithinksw dot com
Source:
a.c:
int a()
{
return 0;
}

b.c:
extern int a();
int b()
{
a();
}

 gcc -fwhopr -c a.c b.c
 ar r liba.a a.o
 gcc -fwhopr -fuse-linker-plugin -shared -o libb.so b.o liba.a
lto1: internal compiler error: in strip_extension, at lto/lto.c:910
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status
/usr/bin/ld: fatal error: lto-wrapper failed
collect2: ld returned 1 exit status

It fails trying to strip .o from liba.a. (I added an extra line to print
that, so the ICE line number is off by 1.)
Using gcc 20100314 and gold from Ubuntu binutils-gold 2.20-0ubuntu2.
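
For illustration, a suffix-stripping helper that tolerates names without the expected extension might look like the following. This is a hypothetical sketch, not the code in `lto/lto.c`: the name `strip_suffix`, its signature, and the choice to return an unchanged copy for non-matching names are all assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of fname with a trailing
   suffix removed, or an unchanged copy when fname does not end in it.
   Returns NULL on allocation failure; the caller frees the result. */
char *strip_suffix(const char *fname, const char *suffix)
{
    size_t len = strlen(fname);
    size_t slen = strlen(suffix);
    char *out;

    /* Only shorten the name when it really ends with the suffix. */
    if (len >= slen && strcmp(fname + len - slen, suffix) == 0)
        len -= slen;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, fname, len);
    out[len] = '\0';
    return out;
}
```

With this shape, an archive name such as `liba.a` passes through unchanged instead of tripping an assertion.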


-- 
   Summary: lto ICE in strip_extension with linker plugin
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43372



[Bug lto/43373] New: whopr+linker plugin ICE compressed stream data error

2010-03-14 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../gcc/configure --with-arch=native --with-tune=native
--disable-bootstrap --with-mpc=/usr/local --enable-languages=c,c++,objc,lto
--enable-gold --enable-lto --prefix=/usr/local/gcc45
Thread model: posix
gcc version 4.5.0 20100314 (experimental) (GCC) 
 ld --version
GNU gold (GNU Binutils for Ubuntu 2.20) 1.9
Copyright 2008 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
 cat a.c
int main(void) {return 0;}
 gcc -fwhopr -fuse-linker-plugin -o a a.c -save-temps
lto1: internal compiler error: compressed stream: data error
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
lto1: fatal error: /usr/local/gcc45/bin/gcc terminated with status 256
compilation terminated.
lto-wrapper: /usr/local/gcc45/bin/gcc returned 1 exit status
/usr/bin/ld: fatal error: lto-wrapper failed
collect2: ld returned 1 exit status

Works without -fuse-linker-plugin. This prevents ffmpeg and x264 from
configuring for me if I put -fwhopr -fuse-linker-plugin in the CFLAGS/LDFLAGS.


-- 
   Summary: whopr+linker plugin ICE compressed stream data error
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43373



[Bug lto/43318] New: LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com
Using svn r157325 on Ubuntu.

 /usr/local/gcc45/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../gcc/configure --enable-threads=posix --with-arch=native
--with-tune=native --disable-nls --disable-bootstrap --prefix=/usr/local/gcc45
--with-mpc=/usr/local --enable-languages=c,c++,objc,lto --enable-lto
--enable-gold
Thread model: posix
gcc version 4.5.0 20100309 (experimental) (GCC) 

Source:
void a()
{
}

 /usr/local/gcc45/bin/g++ -flto -c a.cpp
 /usr/local/gcc45/bin/g++ -flto -O -r -nostdlib a.o
a/0(-1) @0xb769b398 availability:available needed reachable body
externally_visible finalized
  called by: 
  calls: 
callgraph:

a/0(-1) @0xb769b398 availability:available needed reachable body
externally_visible finalized
  called by: 
  calls: 
lto1: internal compiler error: in propagate, at ipa-reference.c:1244
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status
collect2: lto-wrapper returned 1 exit status

 /usr/local/gcc45/bin/g++ -flto -O -fno-ipa-reference -r -nostdlib a.o
lto1: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.
lto-wrapper: /usr/local/gcc45/bin/g++ returned 1 exit status
collect2: lto-wrapper returned 1 exit status


-- 
   Summary: LTO ICE with minimal C++ program
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/43318] LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-03-10 00:32 ---
Actually, it doesn't work in C either. I find that unlikely, time to make sure
I didn't build it wrong somehow...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/43318] LTO ICE with minimal C++ program

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2010-03-10 00:37 ---


*** This bug has been marked as a duplicate of 42402 ***


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43318



[Bug lto/42402] ICE in propagate, at ipa-reference.c:1244

2010-03-09 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-10 00:37 ---
*** Bug 43318 has been marked as a duplicate of this bug. ***


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42402



[Bug target/43233] New: x86 flags not combined across blocks

2010-03-02 Thread astrange at ithinksw dot com
Source:
int g1,g2,g3;

int f1(int a, int b)
{
a &= 1;

if (a) return g1;
return g2;
}

int f2(int a, int b)
{
a &= 1;

if (b)
g3++;

if (a) return g1;
return g2;
}

Compiled with:
 gcc -O3 -fomit-frame-pointer -S and_flags.c

f1 is ok but f2 generates this:
_f2:
andl    $1, %edi <-- #1
testl   %esi, %esi
je      L7
movq    _...@gotpcrel(%rip), %rax
incl    (%rax)
L7:
testl   %edi, %edi <-- #2
jne     L10
movq    _...@gotpcrel(%rip), %rax
movl    (%rax), %eax
ret
.align 4,0x90
L10:
movq    _...@gotpcrel(%rip), %rax
movl    (%rax), %eax
ret

The andl and testl should be folded into one andl.

Code is reduced from ffmpeg h264 decoder. It's easy to work around by
reordering source lines, so not too important.
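
A sketch of the kind of reordering meant here, assuming the reduced source masks a with "a &= 1" (as the andl in the output suggests). Which reordering was actually used in ffmpeg is not stated, so f2_reordered is only one plausible form: the mask is moved next to the test that consumes it.

```c
int g1, g2, g3;

/* As in the report: mask first, then the unrelated conditional store. */
int f2(int a, int b)
{
    a &= 1;

    if (b)
        g3++;

    if (a) return g1;
    return g2;
}

/* Possible workaround: mask immediately before the branch that uses
   the result, so the and and the test sit in the same block. */
int f2_reordered(int a, int b)
{
    if (b)
        g3++;

    a &= 1;
    if (a) return g1;
    return g2;
}
```

Both functions compute the same result; only the position of the mask relative to the g3 update differs.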


-- 
   Summary: x86 flags not combined across blocks
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43233



[Bug tree-optimization/43224] New: Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com
Source:
#include <string.h>

void dequant_lsps(double *lsps, int num,
                  const unsigned short *values,
                  int n_stages, const unsigned char * __restrict table,
                  const double * __restrict mul_q,
                  const double * __restrict base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    int m;

    memset(lsps, 0, num * sizeof(*lsps));

    for (m = 0; m < num; m++)
        lsps[m] += base_q[0] + mul_q[0] * t_off[m];
}

 /usr/local/gcc45/bin/gcc -O3 -S base_lsp.c

The inner loop:
L3:
movzbl  (%r15), %edx
incq    %r15
cvtsi2sd        %edx, %xmm0
mulsd   0(%r13), %xmm0 <- constant (and 0 prefix)
addsd   (%r14), %xmm0 <- constant
addsd   (%rbx,%rax), %xmm0
movsd   %xmm0, (%rbx,%rax)
addq    $8, %rax
cmpq    %rcx, %rax
jne     L3

Rest of the output attached.
base_q and mul_q should be loaded outside of the loop but aren't. I added
__restrict to base_q/mul_q/table, but it didn't affect it.
Code is reduced from FFmpeg WMA Voice decoder.
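
A source-level workaround, sketched under the assumption that lsps never overlaps base_q or mul_q: copy the two constants into locals before the loop. The function name dequant_lsps_hoisted is made up for this example.

```c
#include <string.h>

/* Same computation as dequant_lsps, with the two loop-invariant loads
   done by hand before the loop.  Only equivalent when lsps does not
   alias base_q or mul_q. */
void dequant_lsps_hoisted(double *lsps, int num,
                          const unsigned short *values,
                          int n_stages, const unsigned char *table,
                          const double *mul_q, const double *base_q)
{
    const unsigned char *t_off = &table[values[0] * num];
    const double base = base_q[0];
    const double mul = mul_q[0];
    int m;

    (void)n_stages; /* unused in the reduced testcase */

    memset(lsps, 0, num * sizeof(*lsps));

    for (m = 0; m < num; m++)
        lsps[m] += base + mul * t_off[m];
}
```

Because the locals cannot be modified by stores through lsps, the compiler no longer has to prove anything about aliasing to keep them in registers.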


-- 
   Summary: Constant load not raised out of loop
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-apple-darwin10.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug tree-optimization/43224] Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2010-03-02 03:45 ---
Created an attachment (id=20002)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20002&action=view)
x86-64 asm output


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug tree-optimization/43224] Constant load not raised out of loop

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2010-03-02 04:00 ---
Is it possible for aliased writes to affect a const pointer? I was assuming
that it wasn't.
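
For reference, a small example of the situation being asked about. In C, const on the pointee only forbids writes through that particular pointer; the object itself can still change through another, non-const pointer to the same storage. The function name is made up for this sketch.

```c
/* Reads through a pointer-to-const twice, with a store through a
   non-const pointer in between, and returns the observed change. */
double read_around_store(const double *p, double *q)
{
    double before = *p;
    double after;

    *q = 2.0;   /* modifies *p as well if p and q point to the same object */
    after = *p; /* must be reloaded unless aliasing can be ruled out */
    return after - before;
}
```

Calling it with the same address for both arguments returns a nonzero difference, which is why the loads in the loop above cannot be hoisted on the strength of const alone.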


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43224



[Bug target/43225] New: Structure copies not vectorized

2010-03-01 Thread astrange at ithinksw dot com
Source:

#include <emmintrin.h>

struct a1 { char l[16];};
struct a2 { __m128i l; };

void f1(struct a1 *a, struct a1 *b)
{
*a = *b;
}

void f2(struct a2 *a, struct a2 *b)
{
*a = *b;
}

 /usr/local/gcc45/bin/gcc -O3 -fomit-frame-pointer -S copy_gcc.c
_f1:
movq    (%rsi), %rax
movq    %rax, (%rdi)
movq    8(%rsi), %rax
movq    %rax, 8(%rdi)
ret

_f2:
movdqa  (%rsi), %xmm0
movdqa  %xmm0, (%rdi)
ret

Both are appropriately aligned and should use movdqa. This might not show up in
generic code, but I could have used it in an ffmpeg optimization.


-- 
   Summary: Structure copies not vectorized
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
  GCC host triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225



[Bug target/43225] Structure copies not vectorized

2010-03-01 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2010-03-02 05:31 ---
-fdump-tree-slp-details:
copy_gcc.c:8: note: ===vect_slp_analyze_bb===

copy_gcc.c:8: note: === vect_analyze_data_refs ===
Creating dr for *b_2(D)
analyze_innermost: success.
base_address: b_2(D)
offset from base address: 0
constant offset from base address: 0
step: 0
aligned to: 128
base_object: *b_2(D)
Creating dr for *a_1(D)
analyze_innermost: success.
base_address: a_1(D)
offset from base address: 0
constant offset from base address: 0
step: 0
aligned to: 128
base_object: *a_1(D)

copy_gcc.c:8: note: not vectorized: no vectype for stmt: *a_1(D) = *b_2(D);
 scalar_type: struct a1
copy_gcc.c:8: note: not vectorized: unhandled data-ref in basic block.
f1 (struct a1 * a, struct a1 * b)
{
<bb 2>:
  *a_1(D) = *b_2(D);
  return;

}

Though I tried it with __attribute__((aligned)) too.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43225



[Bug tree-optimization/42211] New: Segmentation fault with graphite -floop-interchange

2009-11-29 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc45/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc45/libexec/gcc/x86_64-apple-darwin10.2.0/4.5.0/lto-wrapper
Target: x86_64-apple-darwin10.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc45
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --with-ppl=/sw --with-cloog=/sw --with-libelf=/sw --disable-nls
--disable-bootstrap LDFLAGS=/usr/lib/libiconv.dylib
--enable-languages=c,c++,lto,objc,obj-c++
Thread model: posix
gcc version 4.5.0 20091129 (experimental) (GCC) 

Using r154734.

With attached source:
 gcc -O3 -floop-interchange -S graphite_crash.i
graphite_crash.i: In function 'border_mirror_480':
graphite_crash.i:17:6: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

It doesn't happen reliably to me with -v -Q, so I can't check with gdb.
Valgrind gives:
==12758== Invalid read of size 8
==12758==    at 0x1004AE4A3: lst_do_interchange_1 (graphite-interchange.c:709)
==12758==    by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==    by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==    by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==    by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==    by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==    by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==    by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==    by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==    by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408)
==12758==  Address 0x141c25210 is 16 bytes inside a block of size 24 free'd
==12758==    at 0x140EB88DC: free (vg_replace_malloc.c:325)
==12758==    by 0x1004AE00C: lst_try_interchange (graphite-poly.h:704)
==12758==    by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710)
==12758==    by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==    by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==    by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==    by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==    by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==    by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==    by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==    by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758== 
==12758== Invalid read of size 8
==12758==    at 0x1004AE534: lst_do_interchange (graphite-interchange.c:732)
==12758==    by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==    by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==    by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==    by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==    by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==    by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==    by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)
==12758==    by 0x1006AAA80: tree_rest_of_compilation (tree-optimize.c:408)
==12758==    by 0x100866F56: cgraph_expand_function (cgraphunit.c:1178)
==12758==  Address 0x141c25210 is 16 bytes inside a block of size 24 free'd
==12758==    at 0x140EB88DC: free (vg_replace_malloc.c:325)
==12758==    by 0x1004AE00C: lst_try_interchange (graphite-poly.h:704)
==12758==    by 0x1004AE49F: lst_do_interchange_1 (graphite-interchange.c:710)
==12758==    by 0x1004AE525: lst_do_interchange (graphite-interchange.c:730)
==12758==    by 0x1004AE58A: lst_do_interchange (graphite-interchange.c:734)
==12758==    by 0x1004AE5CA: scop_do_interchange (graphite-interchange.c:748)
==12758==    by 0x1004AF4C7: apply_poly_transforms (graphite-poly.c:260)
==12758==    by 0x1004A01A1: graphite_transform_loops (graphite.c:276)
==12758==    by 0x100736B09: graphite_transforms (tree-ssa-loop.c:300)
==12758==    by 0x10057D522: execute_one_pass (passes.c:1522)
==12758==    by 0x10057D7CC: execute_pass_list (passes.c:1577)
==12758==    by 0x10057D7DE: execute_pass_list (passes.c:1578)


-- 
   Summary: Segmentation fault with graphite -floop-interchange
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: x86_64-apple

[Bug tree-optimization/42211] Segmentation fault with graphite -floop-interchange

2009-11-29 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-11-29 09:38 ---
Created an attachment (id=19175)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19175&action=view)
somewhat-reduced source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42211



[Bug c/42136] New: Inconsistent strict-aliasing warning with cast from char[]

2009-11-21 Thread astrange at ithinksw dot com
Source:
typedef union u { unsigned i; unsigned short s[2]; unsigned char c[4]; } u;

char c[4] __attribute__((aligned));
short s[2] __attribute__((aligned));

int f1()
{
return ((union u*)s)->i;
}

int f2()
{
return ((union u*)c)->i;
}

Using gcc 4.5:

 gcc -O3 -fstrict-aliasing -Wall -S wstrict_aliasing_char.c

wstrict_aliasing_char.c: In function 'f2':
wstrict_aliasing_char.c:13:17: warning: dereferencing type-punned pointer will
break strict-aliasing rules

I would expect either both or neither of the functions to warn, since pointer
casting to unions is given in the manual as something that violates
strict-aliasing, although gcc doesn't seem to actually take advantage of this.

Instead, it looks like the warning is hardcoded to apply to a cast from char
(c-common.c:1746 in r1554411):
  alias_set_type set1 =
    get_alias_set (TREE_TYPE (TREE_OPERAND (expr, 0)));
  alias_set_type set2 = get_alias_set (TREE_TYPE (type));

  if (set1 != set2 && set2 != 0
      && (set1 == 0 || !alias_sets_conflict_p (set1, set2)))
    {
      warning (OPT_Wstrict_aliasing, "dereferencing type-punned "
               "pointer will break strict-aliasing rules");
      return true;
    }

This came up during some x264 work, but it's taken care of now with some
__attribute__((may_alias)).
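
A minimal sketch of the may_alias approach, next to the memcpy alternative. The typedef name, buffer name, and alignment are illustrative and are not taken from x264.

```c
#include <string.h>

/* GCC extension: accesses through this type may alias objects of any
   other type, much like accesses through a character type. */
typedef unsigned __attribute__((may_alias)) unsigned_alias;

/* Illustrative buffer, aligned so the word-sized read is valid. */
static unsigned char buf[sizeof(unsigned)]
    __attribute__((aligned(sizeof(unsigned))));

/* Type-punned read that does not trigger -Wstrict-aliasing. */
unsigned read_punned(void)
{
    return *(const unsigned_alias *)buf;
}

/* Portable alternative: copy the bytes into an object of the target type. */
unsigned read_memcpy(void)
{
    unsigned v;
    memcpy(&v, buf, sizeof v);
    return v;
}
```

Both readers return the same value for the same buffer contents; the memcpy form needs no compiler extension and is usually compiled to a single load at -O1 and above.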


-- 
   Summary: Inconsistent strict-aliasing warning with cast from
char[]
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42136



[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries

2009-11-07 Thread astrange at ithinksw dot com


--- Comment #8 from astrange at ithinksw dot com  2009-11-07 09:03 ---
Closing.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug tree-optimization/36646] [4.3/4.4/4.5 Regression] Unnecessary moves generated on loop boundaries

2009-10-20 Thread astrange at ithinksw dot com


--- Comment #7 from astrange at ithinksw dot com  2009-10-20 21:10 ---
Tried with SVN today and it's fixed:

L6:
incb    (%ebx)
jmp L12
.align 4,0x90

Close if you want; I don't think it's worth finding when this happened.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug inline-asm/11203] source doesn't compile with -O0 but they compile with -O3

2009-10-18 Thread astrange at ithinksw dot com


--- Comment #40 from astrange at ithinksw dot com  2009-10-18 19:56 ---
Linked from http://x264dev.multimedia.cx/?p=185, I'd forgotten all about the
ridiculous flamewar in this one.

Just as a note, the actual definitions of the four variables (from liba52):
  x2k = x + 2 * k;
  x3k = x2k + 2 * k;
  x4k = x3k + 2 * k;
  wB = wTB + 2 * k;

Also, the asm inputs are silly - output 0 is the same as input 6 for no reason,
and the same with output 2 and input 7. So change those to +m and change
%6/%7 to %0/%2.

That doesn't actually change anything, even though it should free two
registers. It works with gcc 4.5 -O0 -fno-pic -fomit-frame-pointer, but not
without one of those flags. Looks like that's because it's allocating 2 more
registers for the unused fake inputs for the +m - change it to =m and it
works with one flag removed, but still not both. So there's a specific bug.

And of course it all works at -O1 because it doesn't have to use registers
there. So maybe it should just do that.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11203



[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-08 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2009-08-08 16:44 ---
Maybe the C version will be usable once everyone is using 4.4+; earlier
versions tend to make a mess.

Anyway, counting newlines for size estimation wouldn't pessimize anything.
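
A rough sketch of what such an estimate could look like. This is not GCC's implementation; the function name and the choice to treat ';' as a separator alongside '\n' are assumptions made for the example.

```c
#include <stddef.h>

/* Illustrative only.  Estimates the number of instructions in an asm
   template by counting separators ('\n' and ';') and adding one for
   the last line.  An empty or NULL template counts as zero.  A trailing
   separator is counted as one extra line, which only makes the
   estimate more conservative. */
size_t asm_template_insn_estimate(const char *tmpl)
{
    size_t count = 0;
    const char *p;

    if (tmpl == NULL || *tmpl == '\0')
        return 0;

    for (p = tmpl; *p != '\0'; p++)
        if (*p == '\n' || *p == ';')
            count++;

    return count + 1;
}
```

An unroller could then weigh an asm statement by this count instead of treating every asm as a single small instruction.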


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2009-08-06 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2009-08-07 03:04 ---
Fixed with -O3 -fgraphite-identity. Why did I even bother checking that?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/40992] New: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-06 Thread astrange at ithinksw dot com
The attached file is a loop over the same function implemented in C and inline
asm. 

When compiled with:
gcc -O3 -fno-pic -fomit-frame-pointer -fdump-tree-cunroll-details -S
cabac_unroll.i
cunroll thinks they're different sizes:

size: 55-4, last_iteration: 55-4
  Loop size: 55
  Estimated size after unrolling: 442

size: 8-4, last_iteration: 8-4
  Loop size: 8
  Estimated size after unrolling: 34

and expands the asm loop all 13 times.

This is reduced from ffmpeg decode_cabac_residual, where it apparently causes
significant decoding slowdown.

Besides that, cunroll seems to be hurting ffmpeg in general on x86-32
(http://multimedia.cx/eggs/last-performance-smackdown-for-awhile/), maybe we'll
turn it down some.


-- 
   Summary: [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size
   Product: gcc
   Version: 4.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/40992] [4.2/4.3/4.4/4.5 Regression] cunroll ignoring asm size

2009-08-06 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-08-07 04:25 ---
Created an attachment (id=18315)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18315&action=view)
the source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40992



[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os

2009-06-04 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2009-06-05 04:31 ---
This bug must have been weaker than I remembered it; when I used 4 char fields
instead of one char[4], 4.4 behaved properly too.

How about:
Alexander Strange astra...@ithinksw.com

PR tree-optimization/36318
* gcc.dg/tree-ssa/sra-7.c: New test.

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-sra-details" } */

typedef struct {char f[4];} __attribute__((aligned (4))) s;

void a(s *s1, s *s2)
{
*s1 = *s2;
}

/* Struct copies should not be split into members */
/* { dg-final { scan-tree-dump "= \\\*s2" "sra"} } */
/* { dg-final { cleanup-tree-dump "sra" } } */

I checked sra instead of esra since it runs last and this is a negative test.
Hopefully this is trivial?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os

2009-05-29 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2009-05-30 00:19 ---
Fixed with new SRA:
_foo1:
subl    $12, %esp
movl    20(%esp), %eax
movl    (%eax), %edx
movl    16(%esp), %eax
movl    %edx, (%eax)
addl    $12, %esp
ret


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug c/2803] casts in asm act as lvalues

2009-05-25 Thread astrange at ithinksw dot com


--- Comment #12 from astrange at ithinksw dot com  2009-05-25 20:26 ---
I noticed this is still accepted by gcc 4.5; one stuck into ffmpeg and broke
the build with another compiler.

For instance, this only fails in c():

int as(int a)
{
asm ("" : : "m"((int)a));
}

int c(int a)
{
return *&((int)a);
}

 /usr/local/gcc45/bin/gcc -S test.c
test.c: In function 'c':
test.c:8: error: lvalue required as unary '&' operand


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 CC||astrange at ithinksw dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=2803



[Bug target/39337] New: x86 use of VLA disables -fomit-frame-pointer

2009-03-01 Thread astrange at ithinksw dot com
Using gcc 4.4.0 20090226 with -Os on:

int f(int a)
{
if (!a) {
return 0;
} else {
volatile int vla[a];
vla[0] = 0;
return vla[0];
}
}

gives:
_f:
pushl   %ebp
xorl    %eax, %eax
movl    %esp, %ebp
subl    $8, %esp
movl    8(%ebp), %edx
testl   %edx, %edx
je      L3
movl    %esp, %ecx
leal    30(,%edx,4), %eax
andl    $-16, %eax
subl    %eax, %esp
leal    15(%esp), %eax
andl    $-16, %eax
movl    $0, (%eax)
movl    (%eax), %eax
movl    %ecx, %esp
L3:
leave
ret

Adding -fomit-frame-pointer gives the exact same result. ebp shouldn't be saved
here, since esp is saved to and restored from ecx anyway, so it's not actually
used for anything.

This isn't just a problem for crazy asm (gcc errors if an asm clobbers ebp in
a function with VLAs); it also means that inlining a function with VLAs makes
generated code worse, since the entire function loses one register.


-- 
   Summary: x86 use of VLA disables -fomit-frame-pointer
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337



[Bug target/39337] x86 use of VLA disables -fomit-frame-pointer

2009-03-01 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2009-03-02 02:39 ---
> This is correct, vla and alloca always uses a frame pointer because there is
> no way to get back to the original offsets so the compiler needs a frame
> pointer.

It's not restoring from the frame pointer here, it's restoring from ecx. 'addl
$8, %esp' would work just as well in the function epilogue, like it would if
this function had no VLA.

Disabling inlining does fix that problem, though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39337



[Bug target/39329] New: x86 -Os could use mulw for (uint16 * uint16)>>16

2009-02-28 Thread astrange at ithinksw dot com
Using 'gcc -Os -fomit-frame-pointer -march=core2 -mtune=core2' for

unsigned short mul_high_c(unsigned short a, unsigned short b)
{
return (unsigned)(a * b) >> 16;
}

unsigned short mul_high_asm(unsigned short a, unsigned short b)
{
unsigned short res;
asm("mulw %w2" : "=d"(res), "+a"(a) : "rm"(b));
return res;
}

I get

_mul_high_c:
subl    $12, %esp
movzwl  20(%esp), %eax
movzwl  16(%esp), %edx
addl    $12, %esp
imull   %edx, %eax
shrl    $16, %eax
ret
_mul_high_asm:
subl    $12, %esp
movl    16(%esp), %eax
mulw    20(%esp)
addl    $12, %esp
movl    %edx, %eax
ret

mulw puts its outputs in dx:ax, and dx contains (dx:ax)>>16, so the shift is
avoided.

Ignoring the weird Darwin stack adjustment code, the version with mulw is
somewhat shorter and avoids a movzwl. I'm not sure what the performance
difference is; mulw is listed in Agner's tables as fairly low latency, but
requires a length changing prefix for memory.

This type of operation is useful in fixed-point math, such as embedded audio
codecs or arithmetic coders.
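
For comparison, a portable way to write the same high-half multiply. The name mul_high_u16 is made up for this sketch; the explicit widening matters because unsigned short operands are promoted to int, and 0xFFFF * 0xFFFF does not fit in a 32-bit int.

```c
/* High 16 bits of the 32-bit product of two 16-bit values.
   The cast widens before the multiply so the product is computed in
   an unsigned type of at least 32 bits and cannot overflow. */
unsigned short mul_high_u16(unsigned short a, unsigned short b)
{
    unsigned long product = (unsigned long)a * b;
    return (unsigned short)(product >> 16);
}
```

In Q16 fixed point this scales a by the fraction b/65536, for example mul_high_u16(0x8000, 0x8000) is 0x4000 (one half times one half is one quarter).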


-- 
   Summary: x86 -Os could use mulw for (uint16 * uint16)>>16
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39329



[Bug target/39123] New: x86 asm *(a+b) input causes out of registers above -O0

2009-02-06 Thread astrange at ithinksw dot com
Using gcc version 4.4.0 20090207 (experimental) (GCC) 

 /usr/local/gcc44/bin/gcc -O0 -fno-pic -fomit-frame-pointer -S cabac-ret.i
 /usr/local/gcc44/bin/gcc -O1 -fno-pic -fomit-frame-pointer -S cabac-ret.i
cabac-ret.i: In function 'get_cabac_minput':
cabac-ret.i:24: error: can't find a register in class 'GENERAL_REGS' while
reloading 'asm'
cabac-ret.i:24: error: 'asm' operand has impossible constraints

This is an asm using 7 registers; above -O0 one of the inputs in the second
version is combined into a complex memory operand, which uses 8 registers in
one statement and fails to compile. It would be nice if it could fall back to a
separate add for x86-32, since the memory clobber in the first version might
cause suboptimal code.


-- 
   Summary: x86 asm *(a+b) input causes out of registers above -O0
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i?86-*-*
  GCC host triplet: i?86-*-*
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123



[Bug target/39123] x86 asm *(a+b) input causes out of registers above -O0

2009-02-06 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2009-02-07 06:13 ---
Created an attachment (id=17265)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17265&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39123



[Bug target/32593] Missed optimization of 'y = constant - x' operation

2008-12-17 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-12-17 22:10 ---
Causes silly code on i386 with this:
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride){
    int i;
    for(i=0; i<8; i++){
        int j;
        for (j=0; j<8; j++){
            pix[j] = pix[j-stride] + block[j];
        }
        pix+= stride;
        block+= 8;
    }
}

where it calculates and then spills each of [0-7] - stride to the stack,
instead of just being able to keep -stride in a register and incrementing it.
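
One way to express the intended addressing at the source level, as a sketch: keep a pointer to the row above instead of indexing with j-stride. The function name is made up for this example, and whether it changes the generated code depends on the compiler version.

```c
/* Same result as pred8x8l_vertical_add_c, but the row above is
   addressed through its own pointer, so the inner loop needs only
   a single index. */
void pred8x8l_vertical_add_rowptr(unsigned char *pix, const short *block,
                                  int stride)
{
    int i, j;

    for (i = 0; i < 8; i++) {
        const unsigned char *above = pix - stride; /* previous row */

        for (j = 0; j < 8; j++)
            pix[j] = above[j] + block[j];

        pix += stride;
        block += 8;
    }
}
```

Each output row is the row above plus the residual, so rows accumulate down the block exactly as in the original loop.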


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32593



[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax

2008-12-05 Thread astrange at ithinksw dot com


--- Comment #8 from astrange at ithinksw dot com  2008-12-05 20:08 ---
With some recent changes IRA makes better decisions now but they don't survive
reload.

Using
 /gcc -O3 -fomit-frame-pointer -fno-pic -fdump-rtl-ira -S cabac-ret.i

I get about the same asm and this in the IRA dump:
 Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r64 1r58 2r62 3r59 4r60 5r63
modified regnos: 58 59 60 62 63 64
border:
Pressure: GENERAL_REGS=6
Reg 58 of GENERAL_REGS has 2 regs less
Reg 62 of GENERAL_REGS has 2 regs less
Reg 59 of GENERAL_REGS has 2 regs less
Reg 60 of GENERAL_REGS has 2 regs less
Reg 63 of GENERAL_REGS has 2 regs less
  Pushing a0(r64,l0)
  Pushing a3(r59,l0)(potential spill: pri=2857, cost=2)
  Pushing a1(r58,l0)
  Pushing a5(r63,l0)
  Pushing a2(r62,l0)
  Pushing a4(r60,l0)
  Popping a4(r60,l0)  -- assign reg 3
  Popping a2(r62,l0)  -- assign reg 4
  Popping a5(r63,l0)  -- assign reg 0 <- "r"(state)
  Popping a1(r58,l0)  -- assign reg 0 <- "=&r"(bit)
  Popping a3(r59,l0)  -- assign reg 5
  Popping a0(r64,l0)  -- assign reg 0 <- returned bit&1

a1 and a5 should be conflicting, since a1 is an earlyclobber output and can't
share a register with any of the inputs. reload fixes this by moving it to a
worse register. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36539] IRA+i386 doesn't allocate asm output being returned to eax

2008-09-17 Thread astrange at ithinksw dot com


--- Comment #7 from astrange at ithinksw dot com  2008-09-18 01:29 ---
Updated to 32-bit only.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

   Severity|normal  |enhancement
 GCC target triplet|x86_64-*-*  |i?86-*-*
Summary|IRA doesn't allocate asm|IRA+i386 doesn't allocate
   |output being returned to eax|asm output being returned to
   ||eax


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-09-03 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2008-09-04 04:02 ---
It is fixed for me on x86-64. For i386 it's still suboptimal:
_get_cabac:
subl    $28, %esp
movl    %esi, 16(%esp)
movl    %edi, 20(%esp)
movl    %ebx, 12(%esp)
movl    %ebp, 24(%esp)
movl    32(%esp), %esi
movl    36(%esp), %edi
movl    (%esi), %eax
movl    4(%esi), %ebx
# 16 "../cabac-ret.i" 1
#%ebp %ebx %ax 16(%esi) %edi
# 0 "" 2
movl    %eax, (%esi)
movl    %ebx, 4(%esi)
movl    %ebp, %eax
movl    12(%esp), %ebx
andl    $1, %eax
movl    16(%esp), %esi
movl    20(%esp), %edi
movl    24(%esp), %ebp
addl    $28, %esp
ret

but not a regression (code is worse without IRA).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug rtl-optimization/36673] IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #5 from astrange at ithinksw dot com  2008-08-27 04:27 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673



[Bug rtl-optimization/36672] IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-08-27 04:28 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672



[Bug rtl-optimization/36663] IRA ICE in save_call_clobbered_regs at caller-save.c:1949

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-08-27 04:28 ---
Fixed.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663



[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-08-26 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2008-08-27 04:41 ---
Now it is.


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

Summary|IRA doesn't allocate asm|[4.4 regression] IRA doesn't
   |output being returned to eax|allocate asm output being
   ||returned to eax


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug rtl-optimization/36663] New: IRA ICE in save_call_clobbered_regs at caller-save.c:1949

2008-06-29 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
Target: i386-apple-darwin9.3.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080530 (experimental) (GCC)

 gcc -O3 -fira -S ira-ice.i
ira-ice.i: In function 'avf_sdp_create':
ira-ice.i:59: internal compiler error: in save_call_clobbered_regs, at
caller-save.c:1949
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

This only happens at -O3, not anything below.


-- 
   Summary: IRA ICE in save_call_clobbered_regs at caller-
save.c:1949
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663



[Bug rtl-optimization/36663] IRA ICE in save_call_clobbered_regs at caller-save.c:1949

2008-06-29 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-29 07:14 ---
Created an attachment (id=15828)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15828&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36663



[Bug rtl-optimization/36672] New: IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829

2008-06-29 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
Target: i386-apple-darwin9.3.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080530 (experimental) (GCC)

 gcc -O3 -fira -fno-pic -S ira-ice2.i
ira-ice2.i:38: internal compiler error: in emit_swap_insn, at reg-stack.c:829
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Happens with -O2, but not below that, and not without -fno-pic.


-- 
   Summary: IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672



[Bug rtl-optimization/36672] IRA + -fno-pic ICE in emit_swap_insn, at reg-stack.c:829

2008-06-29 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-29 21:35 ---
Created an attachment (id=15830)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15830&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36672



[Bug rtl-optimization/36673] New: IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389

2008-06-29 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
Target: i386-apple-darwin9.3.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44-ira
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080530 (experimental) (GCC)

 gcc -O3 -fira -fno-pic -S ira-ice.i
ira-ice.i: In function 'MPV_motion_lowres':
ira-ice.i:201: internal compiler error: in save_con_fun_n, at
caller-save.c:1389
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Without -fno-pic there's a different ICE, and with a lower -O it compiles.
Besides these three ICEs there are several miscompiles of ffmpeg r14025, all of
which cause it to crash on startup or enter infinite loops, so I guess I can't
benchmark IRA for now.


-- 
   Summary: IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-
save.c:1389
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673



[Bug rtl-optimization/36673] IRA -O3 -fno-pic ICE in save_con_fun_n, at caller-save.c:1389

2008-06-29 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-29 21:41 ---
Created an attachment (id=15831)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15831&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36673



[Bug target/36661] New: x86 asm +r operands cause unnecessary spills/copies

2008-06-28 Thread astrange at ithinksw dot com
Compiling the attached source on i386 with:
 gcc -O3  -fomit-frame-pointer -fno-pic -S asm-spills.i
produces:
.text
.align 4,0x90
.globl _get_cabac_noinline
_get_cabac_noinline:
subl    $76, %esp
movl    %esi, 64(%esp)
movl    %edi, 68(%esp)
movl    %ebp, 72(%esp)
movl    %ebx, 60(%esp)
movl    80(%esp), %esi
movl    84(%esp), %edi
movl    (%esi), %edx
movl    4(%esi), %ebx
movl    %edx, 28(%esp) # unused spill
movl    %edx, %ebp # pointless move
# 24 ../strange-spills.i 1
#%eax %bp %ebx 16(%esi) %edx (%edi)
# 0  2
movl    %ebp, (%esi)
movl    %ebx, 4(%esi)
andl    $1, %eax
movl    60(%esp), %ebx
movl    %eax, 44(%esp) # unused spill
movl    64(%esp), %esi
movl    68(%esp), %edi
movl    72(%esp), %ebp
addl    $76, %esp
ret
.subsections_via_symbols

which has several unnecessary stack spills.

Reading through RTL dumps:
- everything is fine before asmcons
- asmcons inserts copies of c->low/range after they're loaded. There's no point
to this, since the original is never used later, but I guess there isn't a
problem as long as it's cleaned up.
- Somehow, the RA gets confused by the asmcons copy and the later one to copy
the return value into eax. Instead of assigning both sides of the copy to the
same register (which is obviously possible), or even using mov, it spills and
reloads them into different registers.
- Later passes optimize away the reloads but keep the stores.

This isn't a regression (gcc 3.4 isn't much better) and still happens in the
IRA branch. For some reason, changing =q(tmp) to =d improves IRA but not
trunk.

This is about the same source as
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539.


-- 
   Summary: x86 asm +r operands cause unnecessary spills/copies
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36661



[Bug target/36661] x86 asm +r operands cause unnecessary spills/copies

2008-06-28 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-28 23:35 ---
Created an attachment (id=15823)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15823&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36661



[Bug tree-optimization/36646] New: [4.4 regression] Unnecessary moves generated on loop boundaries

2008-06-26 Thread astrange at ithinksw dot com
The attached source is a loop+switch statement, where only some of the switch
cases change the variable 'val'. 4.4 generates moves for it in every case, even
the ones where it's not mentioned, while 4.2 didn't; the difference is visible
in tree dumps.

This part:
case Op_Inc1: (*tape)++; break;

with 4.2 at -O:

<L3>:;
  *tape = *tape + 1;
  goto <bb 3> (<L0>);

L5:
incb    (%edx)
jmp     L13

SVN at -O:
<L3>:;
  *tape.17 = *tape.17 + 1;
  val.16 = val;
  goto <bb 3> (<L10>);

L6:
incb    (%esi)
movl    %edx, %eax
jmp     L10

Surprisingly, -O3 is worse:
L6:
movl    %edx, %eax
incb    (%esi)
movl    %eax, %edx
jmp     L2

IRA doesn't improve it.
This isn't from real-world code, so it's not really important, but I'd like to
make a code-copying VM out of this.
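
The attachment isn't inlined here, so as a rough sketch of the shape being described (a loop around a switch where only some cases write 'val'), something like the following should do. Everything except Op_Inc1 and the names 'tape' and 'val' is made up for illustration and is not the attached testcase:

```c
#include <stddef.h>

/* Hypothetical opcode set; only Op_Inc1 appears in the report. */
enum op { Op_Inc1, Op_Dec1, Op_SetVal, Op_Halt };

/* Runs a tiny program against one tape cell and returns the final val.
   Only Op_SetVal writes val, so the other cases should not need any
   move for it on the way back to the loop head. */
int run(const enum op *code, unsigned char *tape)
{
    int val = 0;
    for (size_t pc = 0; ; pc++) {
        switch (code[pc]) {
        case Op_Inc1:   (*tape)++;   break;
        case Op_Dec1:   (*tape)--;   break;
        case Op_SetVal: val = *tape; break;
        case Op_Halt:   return val;
        }
    }
}
```

Compiling this with -O and dumping the trees is where a copy like "val.16 = val" would be expected to show up in the Op_Inc1 arm, if the sketch reproduces the problem at all.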


-- 
   Summary: [4.4 regression] Unnecessary moves generated on loop
boundaries
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug tree-optimization/36646] [4.4 regression] Unnecessary moves generated on loop boundaries

2008-06-26 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-27 04:57 ---
Created an attachment (id=15818)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15818&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug tree-optimization/36646] [4.4 regression] Unnecessary moves generated on loop boundaries

2008-06-26 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2008-06-27 05:04 ---
Created an attachment (id=15819)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15819&action=view)
svn 20080625 + -O compile


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36646



[Bug target/36539] New: [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-06-14 Thread astrange at ithinksw dot com
Using today's IRA branch (r136683), on the attached file.

 gcc -O3 -fno-pic -fomit-frame-pointer -m64 -S cabac-ret.i -fira
_get_cabac:
LFB2:
pushq   %rbx
LCFI0:
movl    (%rdi), %eax
movl    4(%rdi), %r8d
# 16 cabac-ret.i 1
#%ebx %r8d %ax 24(%rdi) %rsi
# 0  2
movl    %eax, (%rdi)
movl    %r8d, 4(%rdi)
movl    %ebx, %eax
popq    %rbx
andl    $1, %eax
ret

with an unnecessary mov %ebx, %eax. Without -fira:
movl    (%rdi), %r8d
movl    4(%rdi), %r9d
# 16 cabac-ret.i 1
#%eax %r9d %r8w 24(%rdi) %rsi
# 0  2
movl    %r8d, (%rdi)
movl    %r9d, 4(%rdi)
andl    $1, %eax
ret

Neither allocator allocates bit to eax in 32-bit mode, though all other
compilers with inline asm support I tried did. gcc 3.3 does as well, but no
other version seemed to.

In this case it's not a problem, since changing the class to =a fixes it,
but the function will be inlined a lot and I don't want to put unnecessary
constraints on it.
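
To make the constraint trade-off concrete, here is a minimal sketch. It is not the attached cabac-ret.i; low_bit() and its two-instruction asm body are hypothetical stand-ins for an asm whose output is returned right away:

```c
/* Hypothetical stand-in for the attached function: the asm produces a
   value that the function returns immediately. */
int low_bit(unsigned x)
{
    int bit;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "=r" leaves the register choice to the allocator, which ideally
       picks eax because bit is the return value. Writing "=a" instead
       forces eax, but also constrains every inlined copy. */
    __asm__ ("movl %1, %0\n\t"
             "andl $1, %0"
             : "=r" (bit)
             : "r" (x)
             : "cc");
#else
    bit = (int)(x & 1u);   /* portable fallback so the sketch builds elsewhere */
#endif
    return bit;
}
```

With "=r" the hope is that no extra mov into eax shows up before the ret; whether it does depends on the allocator, which is the point of this report.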


-- 
   Summary: [4.4 regression] IRA doesn't allocate asm output being
returned to eax
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: x86_64-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36539] [4.4 regression] IRA doesn't allocate asm output being returned to eax

2008-06-14 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-06-14 06:48 ---
Created an attachment (id=15771)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15771&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36539



[Bug target/36503] x86 can use x >> -y for x >> 32-y

2008-06-12 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-06-12 16:48 ---
Maybe it seemed likely to cause a warning - I haven't checked that yet, though.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503



[Bug target/36502] New: i386/darwin generates unnecessary stack ops in every function

2008-06-11 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
--enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080611 (experimental) (GCC) 

gcc changes esp in every function, even if it has no stack values. Given:
int a;
void f() {a++;}

 gcc -O -fomit-frame-pointer -fno-pic -S add.c
_f:
subl    $12, %esp
incl    _a
addl    $12, %esp
ret

Apple's GCC doesn't do this and neither does 4.4 on other systems (as far as I
know).


-- 
   Summary: i386/darwin generates unnecessary stack ops in every
function
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i386-apple-darwin*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36502



[Bug target/36503] New: x86 can use x >> -y for x >> 32-y

2008-06-11 Thread astrange at ithinksw dot com
 gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
--enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080611 (experimental) (GCC) 

gcc compiles

int shift32(int i, int n)
{
    return i >> (32 - n);
}

to

_shift32:
subl    $12, %esp
movl    $32, %ecx
subl    20(%esp), %ecx
movl    16(%esp), %eax
sarl    %cl, %eax
addl    $12, %esp
ret

Since all 286-and-up CPUs only use the low 5 bits of ecx when shifting, this
can be:

_shift32:
movl    8(%esp), %ecx
movl    4(%esp), %eax
negl    %ecx
sarl    %cl, %eax
ret

This is very common in bitstream readers, where it's used to read the top N
bits from a word. ffmpeg already has an inline asm to do it, which I'd like to
get rid of.

I'd guess this applies to some other architectures; it probably works on
x86-64, but doesn't on PPC.
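
For reference, a small sketch of the equivalence in C. The function names are made up, and it uses unsigned operands (as a bitstream reader typically would) rather than the int of shift32. C leaves a negative shift count undefined, so the second form spells out the mod-32 reduction that the hardware performs implicitly:

```c
#include <stdint.h>

/* Top n bits of a 32-bit word, written the obvious way.
   Meaningful for 1 <= n <= 32. */
uint32_t top_bits(uint32_t word, unsigned n)
{
    return word >> (32 - n);
}

/* Same result with the count reduced modulo 32. On a target whose shift
   instruction already masks the count to 5 bits, the "& 31" costs
   nothing and only the negation remains. */
uint32_t top_bits_neg(uint32_t word, unsigned n)
{
    return word >> (-n & 31);
}
```

Both agree for n from 1 to 32, since (32 - n) and -n are congruent modulo 32.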


-- 
   Summary: x86 can use x >> -y for x >> 32-y
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503



[Bug target/36503] x86 can use x >> -y for x >> 32-y

2008-06-11 Thread astrange at ithinksw dot com


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

   Severity|normal  |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36503



[Bug tree-optimization/36318] New: SRA pessimizes struct copies without -Os

2008-05-23 Thread astrange at ithinksw dot com
 /usr/local/gcc44/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.2
Configured with: ../gcc/configure --prefix=/usr/local/gcc44
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
--enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080523 (experimental) (GCC) 

and these options:
 gcc -fno-pic -fomit-frame-pointer -O3 -S wc.c

For the attached source, gcc generates good code for global variable
assignment:
_foo0:
subl    $12, %esp
movl    _b, %eax
movl    %eax, _a
addl    $12, %esp
ret

but uses byte copies for pointer assignment:
_foo1:
subl    $12, %esp
movl    %ebx, 4(%esp)
movl    %esi, 8(%esp)
movl    20(%esp), %eax
movl    16(%esp), %edx
movzbl  (%eax), %esi
movzbl  1(%eax), %ebx
movzbl  2(%eax), %ecx
movzbl  3(%eax), %eax
movb    %cl, 2(%edx)
movb    %al, 3(%edx)
movb    %bl, 1(%edx)
movl    %esi, %eax
movb    %al, (%edx)
movl    4(%esp), %ebx
movl    8(%esp), %esi
addl    $12, %esp
ret

Using either -Os or -fno-tree-sra fixes it.
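
The attached wc.c isn't reproduced in this message. A sketch that matches the two listings above (a 4-byte struct, one global-to-global copy, one copy through pointers) could look like this; the struct layout and its field names are assumptions:

```c
/* Hypothetical 4-byte struct; the field names are made up. */
struct quad { unsigned char b0, b1, b2, b3; };

struct quad a, b;

/* Global-to-global copy: the case that compiles to a single 32-bit move. */
void foo0(void)
{
    a = b;
}

/* Copy through pointers: the case that gets split into four byte moves. */
void foo1(struct quad *dst, const struct quad *src)
{
    *dst = *src;
}
```

The argument order for foo1 follows the listing, where 16(%esp) is the destination and 20(%esp) the source.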


-- 
   Summary: SRA pessimizes struct copies without -Os
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug tree-optimization/36318] SRA pessimizes struct copies without -Os

2008-05-23 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-05-23 21:37 ---
Created an attachment (id=15678)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15678&action=view)
testcase


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36318



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2008-05-07 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-05-07 17:36 ---
Created an attachment (id=15592)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15592&action=view)
minimal source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/36127] New: bad choice of loop IVs above -Os on x86

2008-05-04 Thread astrange at ithinksw dot com
 /usr/local/gcc44/bin/gcc -v
[..]
gcc version 4.4.0 20080503 (experimental) (GCC)
 gcc -O3 -mfpmath=sse -fno-pic -fno-tree-vectorize -S himenoBMTxps.c

With -O2/-O3, the inner loop in jacobi() in this program ends up containing a
lot of this:
movss   _p-4(%edi,%edx,4), %xmm0
movl    -96(%ebp), %edi
subss   _p-4(%edi,%edx,4), %xmm0
movl    -108(%ebp), %edi
subss   _p-4(%edi,%edx,4), %xmm0
movl    -92(%ebp), %edi
addss   _p-4(%edi,%edx,4), %xmm0
movl    -124(%ebp), %edi

At -O1 or -Os, it instead produces:
movss   34056(%eax), %xmm0
subss   33024(%eax), %xmm0
subss   -33024(%eax), %xmm0
addss   -34056(%eax), %xmm0

which is much better. On a Core 2 the benchmark reports being 40% faster at -Os.

IIRC this isn't a problem on x86-64, but IRA+-O3 was much worse again.
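
To illustrate what the -Os code is doing: the four displacements are 33540 +/- 516 bytes, which fits a float grid with 129 elements per row and 65 rows per plane. A sketch under that assumption (the extents are inferred from the displacements, not taken from the source, and the function names are made up):

```c
#include <stddef.h>

/* Assumed extents: 65 x 129 floats per plane gives a plane stride of
   33540 bytes and a row stride of 516 bytes. MI is arbitrary. */
enum { MI = 4, MJ = 65, MK = 129 };
enum { ROW = MK, PLANE = MJ * MK };   /* strides in elements */

float p[MI * MJ * MK];

/* Flat index of element (i, j, k). */
size_t idx(int i, int j, int k)
{
    return ((size_t)i * MJ + (size_t)j) * MK + (size_t)k;
}

/* The cross term written with three separate indices. */
float cross_indexed(int i, int j, int k)
{
    return p[idx(i + 1, j + 1, k)] - p[idx(i + 1, j - 1, k)]
         - p[idx(i - 1, j + 1, k)] + p[idx(i - 1, j - 1, k)];
}

/* The same term addressed from one pointer to element (i, j, k) with
   constant offsets, which is the shape of the -Os listing. */
float cross_single_iv(const float *c)
{
    return c[PLANE + ROW] - c[PLANE - ROW]
         - c[-PLANE + ROW] + c[-PLANE - ROW];
}
```

The -O3 code instead keeps a separate base per neighbour and reloads each one from the stack, which is where the extra movl instructions come from.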


-- 
   Summary: bad choice of loop IVs above -Os on x86
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
GCC target triplet: i?86-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2008-05-04 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-05-05 02:12 ---
Created an attachment (id=15578)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15578&action=view)
source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2008-05-04 Thread astrange at ithinksw dot com


-- 

astrange at ithinksw dot com changed:

   What|Removed |Added

   Severity|normal  |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2008-05-04 Thread astrange at ithinksw dot com


--- Comment #2 from astrange at ithinksw dot com  2008-05-05 02:12 ---
Created an attachment (id=15579)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15579&action=view)
compiled at -O3 on darwin


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/36127] bad choice of loop IVs above -Os on x86

2008-05-04 Thread astrange at ithinksw dot com


--- Comment #3 from astrange at ithinksw dot com  2008-05-05 02:13 ---
Created an attachment (id=15580)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15580&action=view)
and at -Os


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36127



[Bug tree-optimization/33705] restrict doesn't improve char * aliasing

2008-04-20 Thread astrange at ithinksw dot com


--- Comment #4 from astrange at ithinksw dot com  2008-04-20 23:48 ---
Created an attachment (id=15502)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15502&action=view)
source with __restrict (no change)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33705



[Bug target/35714] New: x86 poor code with pmaddwd

2008-03-26 Thread astrange at ithinksw dot com
 /usr/local/gcc44/bin/gcc -v
Using built-in specs.
Target: i386-apple-darwin9.2.0
Configured with: ../gcc/configure --prefix=/usr/local/gcc44
--enable-threads=posix --with-arch=core2 --with-tune=core2 --with-gmp=/sw
--with-mpfr=/sw --disable-nls --disable-bootstrap --enable-checking=yes,rtl
CFLAGS=-g LDFLAGS=/usr/lib/libiconv.dylib --enable-languages=c,c++,objc
Thread model: posix
gcc version 4.4.0 20080326 (experimental) (GCC)
 /usr/local/gcc44/bin/gcc -Os -march=core2 -fno-pic -fomit-frame-pointer 
 -flax-vector-conversions -S pmaddwd.c

generates:
_madd_swapped:
subl    $12, %esp
movaps  LC0, %xmm1
addl    $12, %esp
pmaddwd %xmm1, %xmm0
ret
.globl _madd
_madd:
subl    $12, %esp
movaps  LC0, %xmm1
addl    $12, %esp
pmaddwd %xmm0, %xmm1
movaps  %xmm1, %xmm0
ret

Both of these should be:
_madd:
pmaddwd LC0, %xmm0
ret

since the stack isn't referenced and pmaddwd is commutative. (the variable
being renamed LC0 is PR 31043)
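
As a sanity check on the commutativity claim, here is a scalar model of what pmaddwd computes on 128-bit operands. This is a sketch for illustration, not the attached pmaddwd.c, and the function name is made up:

```c
#include <stdint.h>

/* Scalar model of pmaddwd: eight signed 16-bit lanes in each operand,
   four 32-bit lanes out, each the sum of two adjacent products. */
void pmaddwd_model(int32_t out[4], const int16_t x[8], const int16_t y[8])
{
    for (int i = 0; i < 4; i++) {
        uint32_t lo = (uint32_t)((int32_t)x[2 * i] * y[2 * i]);
        uint32_t hi = (uint32_t)((int32_t)x[2 * i + 1] * y[2 * i + 1]);
        /* Unsigned add so the one overflowing case wraps like the hardware. */
        out[i] = (int32_t)(lo + hi);
    }
}
```

Swapping x and y only swaps the factors of each product, so the result is unchanged and either operand can be the memory operand.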


-- 
   Summary: x86 poor code with pmaddwd
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: astrange at ithinksw dot com
 GCC build triplet: i386-apple-darwin9.2.0
  GCC host triplet: i386-apple-darwin9.2.0
GCC target triplet: i386-apple-darwin9.2.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714



[Bug target/35714] x86 poor code with pmaddwd

2008-03-26 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-03-27 01:02 ---
Created an attachment (id=15384)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15384&action=view)
source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35714



[Bug other/31043] duplicated data in .rodata / .rodata.cst sections.

2008-03-21 Thread astrange at ithinksw dot com


--- Comment #1 from astrange at ithinksw dot com  2008-03-22 04:28 ---
I encountered this myself with 4.4.0 20080321.

If the data is static, gcc generates LC0 but not the copy with the original
name, which impedes debugging.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31043


