Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-02-08 Thread H. J. Lu
On Wed, Feb 07, 2007 at 05:32:28PM +0300, Vladimir Sysoev wrote:
> Hi!
> I created a test to reproduce the issue with cpu2006/454.calculix.
> See attached. File e_c3d.f contains a subroutine cut out of calculix,
> and tr535.f is the main entry point of the test. You can use the go
> script as a reference for how I got these results. The find_stall.pl
> script finds the problematic instruction combinations.
>
> The problem is that the new compiler generates a read instruction right
> after a write. See some dumps below.
 

A bug report has been opened:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735


H.J.


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-02-07 Thread Vladimir Sysoev

Hi!
I created a test to reproduce the issue with cpu2006/454.calculix.
See attached. File e_c3d.f contains a subroutine cut out of calculix,
and tr535.f is the main entry point of the test. You can use the go
script as a reference for how I got these results. The find_stall.pl
script finds the problematic instruction combinations.

The problem is that the new compiler generates a read instruction right
after a write. See some dumps below.

This is the inner loop near line 42, generated by the rev. 119759 compiler:
.L13:
.LBB22:
.loc 1 42 0
movapd  %xmm2, %xmm0
leaq    (%rdx,%rbx), %rax
.loc 1 38 0
addl    $1, %edi
addq    $24, %rdx
.loc 1 42 0
mulsd   72(%rcx), %xmm0
.loc 1 38 0
addq    $72, %rcx
cmpl    $4, %edi
.loc 1 42 0
mulsd   %xmm3, %xmm0
mulsd   -8(%rax,%r9,8), %xmm0
mulsd   %xmm4, %xmm0
addsd   %xmm0, %xmm1
.loc 1 38 0
jne     .L13

This is the code for line 42, generated by the rev. 119760 compiler:
.L13:
.LBB23:
.loc 1 42 0
movsd   72(%rdx), %xmm0
movq    80(%rsp), %rax
addq    $72, %rdx
mulsd   -8(%r9,%r15,8), %xmm0
addq    %rdi, %rax
addq    $24, %rdi
.loc 1 38 0
cmpq    $72, %rdi
.loc 1 42 0
mulsd   -8(%r11,%r14,8), %xmm0
mulsd   -8(%rax,%r13,8), %xmm0
movq    440(%rsp), %rax
mulsd   (%rax), %xmm0
addsd   (%rsi,%r10,8), %xmm0  -|
movsd   %xmm0, (%rsi,%r10,8)  -+- problems
.loc 1 38 0
jne     .L13
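To make the difference between the two dumps concrete, here is a simplified, hypothetical C analogue of the line-42 loop. The e_c3d.f source is not reproduced in this thread, so the function and variable names are made up and the product has fewer factors than the real one. The rev. 119759 code keeps the running sum in %xmm1, while the rev. 119760 code reloads and re-stores it at (%rsi,%r10,8) on every iteration, so each load comes right after the previous iteration's store to the same address.

```c
/* Hypothetical, simplified analogue of the inner loop at line 42 of e_c3d.f.
   Names are illustrative only; the real Fortran source is not shown. */

/* Shape of the rev. 119760 code: the running sum lives in memory, so every
   iteration loads *acc right after the previous iteration stored to it. */
void accumulate_in_memory(double *acc, const double *a, const double *b,
                          const double *c, double s, int n)
{
    for (int i = 0; i < n; i++)
        *acc += s * a[i] * b[i] * c[i];   /* load, add, store on every trip */
}

/* Shape of the rev. 119759 code: the running sum is kept in a local
   (a register, like %xmm1 in the dump) and written back once. */
void accumulate_in_register(double *acc, const double *a, const double *b,
                            const double *c, double s, int n)
{
    double sum = *acc;                    /* single load before the loop */
    for (int i = 0; i < n; i++)
        sum += s * a[i] * b[i] * c[i];
    *acc = sum;                           /* single store after the loop */
}
```

The two forms agree only when acc does not point into the input arrays. With a = {1, 2, 3}, b = {1, 1, 2}, c = {2, 1, 1}, s = 0.5 and a separate accumulator starting at 1, both return 6; with acc = &a[2], the in-memory version yields 10 and the register version 8. Keeping the sum in a register therefore requires the compiler to prove that the store does not alias the loads. It seems plausible that this is what changed with the mem-ssa merge, but that is an assumption, not something established in this thread.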



My output is:
real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)

Line 42
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)

Feel free to ask if any problems with reproducing it occur.

-Vladimir


--
   * From: Grigory Zagorodnev grigory_zagorodnev at linux dot intel dot com
   * To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
   * Cc: H. J. Lu hjl at lucon dot org
   * Date: Mon, 15 Jan 2007 17:59:31 +0300
   * Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!
There is a huge regression of gcc 4.3 performance detected on
cpu2006/454.calculix benchmark at -O2 optimization level on
x86_64-redhat-linux.

The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760


PS: I'm trying to get a small reproducer
- Grigory


test_calculix.tar.bz2
Description: BZip2 compressed data


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-18 Thread H. J. Lu
On Tue, Jan 16, 2007 at 07:05:34PM +0300, Grigory Zagorodnev wrote:
> Toon Moene wrote:
>> Calculix is a combined C/Fortran program.  Did you try to compile the
>> Fortran parts with --param max-aliased-vops=something higher than the
>> default 50 ?
> Right, calculix is a combined program. Gprof says the regression is in
> the e_c3d routine, which is 1k lines of Fortran code.
>
> Different max-aliased-vops values make no difference for calculix:
>

Is it possible to extract a smaller testcase?


H.J.


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-18 Thread Grigory Zagorodnev

H. J. Lu wrote:

> Is it possible to extract a smaller testcase?
I'm working on a small reproducer. It will take some time because
of the benchmark's complexity.


- Grigory


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-16 Thread Grigory Zagorodnev

Toon Moene wrote:
> Calculix is a combined C/Fortran program.  Did you try to compile the
> Fortran parts with --param max-aliased-vops=something higher than the
> default 50 ?
Right, calculix is a combined program. Gprof says the regression is in
the e_c3d routine, which is 1k lines of Fortran code.


Different max-aliased-vops values make no difference for calculix:

default (assumed to be --param max-aliased-vops=50)
1780 sec.

--param max-aliased-vops=80
1789 sec.

--param max-aliased-vops=20
1780 sec.

Setting max-aliased-vops to a value greater than 80 gives an ICE:
allocation.f: In function 'allocation':
allocation.f:19: internal compiler error: in ssa_operand_alloc, at tree-ssa-operands.c:365

- Grigory


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-15 Thread H. J. Lu
On Mon, Jan 15, 2007 at 09:47:34PM +0100, Toon Moene wrote:
> Grigory,
>
> Calculix is a combined C/Fortran program.  Did you try to compile the
> Fortran parts with --param max-aliased-vops=something higher than the
> default 50 ?
>
> Diego upped the default from 10 to 50 because one (or more) of the
> (Fortran) Polyhedron benchmarks showed a dramatic performance regression.
>

I added --param max-aliased-vops=50 to Fortran. It doesn't make
a difference to Calculix.


H.J.