Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-02-08 Thread H. J. Lu
On Wed, Feb 07, 2007 at 05:32:28PM +0300, Vladimir Sysoev wrote:
> Hi!
> I've created a test that reproduces the issue with cpu2006/454.calculix
> (see attached). The file e_c3d.f contains a subroutine cut out of calculix,
> and tr535.f is the main entry point of the test. You can use the go script
> as a reference for how I got these results. The find_stall.pl script finds
> the problematic instruction combinations.
> 
> The problem is that the new compiler generates a read instruction right
> after a write. See some dumps below.
> 

A bug report is opened:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735


H.J.


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-02-07 Thread Vladimir Sysoev

Hi!
I've created a test that reproduces the issue with cpu2006/454.calculix
(see attached). The file e_c3d.f contains a subroutine cut out of calculix,
and tr535.f is the main entry point of the test. You can use the go script
as a reference for how I got these results. The find_stall.pl script finds
the problematic instruction combinations.

The problem is that the new compiler generates a read instruction right
after a write. See some dumps below.

This is the inner loop near line #42, generated by the rev. 119759 compiler:
.L13:
.LBB22:
.loc 1 42 0
movapd  %xmm2, %xmm0
leaq    (%rdx,%rbx), %rax
.loc 1 38 0
addl    $1, %edi
addq    $24, %rdx
.loc 1 42 0
mulsd   72(%rcx), %xmm0
.loc 1 38 0
addq    $72, %rcx
cmpl    $4, %edi
.loc 1 42 0
mulsd   %xmm3, %xmm0
mulsd   -8(%rax,%r9,8), %xmm0
mulsd   %xmm4, %xmm0
addsd   %xmm0, %xmm1
.loc 1 38 0
jne .L13

This is the same loop (line 42) generated by the rev. 119760 compiler:
.L13:
.LBB23:
.loc 1 42 0
movsd   72(%rdx), %xmm0
movq    80(%rsp), %rax
addq    $72, %rdx
mulsd   -8(%r9,%r15,8), %xmm0
addq    %rdi, %rax
addq    $24, %rdi
.loc 1 38 0
cmpq    $72, %rdi
.loc 1 42 0
mulsd   -8(%r11,%r14,8), %xmm0
mulsd   -8(%rax,%r13,8), %xmm0
movq    440(%rsp), %rax
mulsd   (%rax), %xmm0
addsd   (%rsi,%r10,8), %xmm0 <-|
movsd   %xmm0, (%rsi,%r10,8) <-+- problems
.loc 1 38 0
jne .L13
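The two dumps differ mainly in where the running sum lives: rev. 119759 keeps it in %xmm1 (addsd %xmm0, %xmm1), while rev. 119760 loads (%rsi,%r10,8), adds to it and stores it back on every iteration, so each iteration's load depends on the previous iteration's store. A minimal C sketch of the two loop shapes follows; the function names and the simplified product term are hypothetical stand-ins, not the calculix source.

```c
#include <stddef.h>

/* Illustrative only: a simplified stand-in for the line-42 reduction,
 * acc[k] += a[j] * b[j] * c * d.  Names and the exact term are
 * hypothetical; this is not the calculix source. */

/* Shape of the rev. 119760 loop: the running sum lives in memory, so each
 * iteration loads acc[k], adds a term and stores it back.  The next
 * iteration's load then has to wait on that store. */
static void sum_in_memory(double *acc, size_t k, const double *a,
                          const double *b, double c, double d, size_t n)
{
    for (size_t j = 0; j < n; j++)
        acc[k] += a[j] * b[j] * c * d;
}

/* Shape of the rev. 119759 loop: the sum is carried in a local variable
 * (a register such as %xmm1) and written to memory once after the loop.
 * This rewrite is only valid if acc[k] is not modified through a or b,
 * which is the assumption the restrict qualifiers state here. */
static void sum_in_register(double *restrict acc, size_t k,
                            const double *restrict a,
                            const double *restrict b,
                            double c, double d, size_t n)
{
    double s = acc[k];
    for (size_t j = 0; j < n; j++)
        s += a[j] * b[j] * c * d;
    acc[k] = s;
}
```

Both functions compute the same result for non-overlapping arrays; whether a compiler turns the first shape into the second on its own depends on what its alias analysis can prove about acc, a and b.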



My output is:
real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)

Line 42
   addsd   (%rsi,%r10,8), %xmm0
   movsd   %xmm0, (%rsi,%r10,8)
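For scale, assuming the first `time` block is the rev. 119759 build and the second is rev. 119760 (the same order as the dumps above), the slowdown on this reduced test works out to:

```latex
\[
\frac{t_{\text{user}}(\text{r119760})}{t_{\text{user}}(\text{r119759})}
  = \frac{5.948\,\text{s}}{3.776\,\text{s}} \approx 1.575,
\qquad
\frac{t_{\text{real}}(\text{r119760})}{t_{\text{real}}(\text{r119759})}
  = \frac{5.956\,\text{s}}{3.781\,\text{s}} \approx 1.575
\]
```

That is roughly 57% more run time for the small kernel, which need not match the 27% measured on the whole benchmark.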

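The find_stall.pl script itself is not shown in the thread. As a hypothetical sketch of the kind of check its output suggests, the C code below flags a pair of adjacent AT&T-syntax instructions where the first reads a memory operand and the second writes that same operand back. The helper names are made up, and the parser is deliberately naive: it compares operand text literally and only treats the first operand of the first instruction as the memory source.

```c
#include <ctype.h>
#include <string.h>

/* Copy operand number i (0-based) of an AT&T-syntax instruction such as
 * "addsd   (%rsi,%r10,8), %xmm0" into out.  Commas inside parentheses do
 * not separate operands.  Returns 1 if the operand exists, 0 otherwise. */
static int operand(const char *insn, int i, char *out, size_t cap)
{
    const char *p = insn;
    int idx = 0, depth = 0;
    size_t n = 0;

    while (*p && isspace((unsigned char)*p)) p++;   /* leading blanks */
    while (*p && !isspace((unsigned char)*p)) p++;  /* skip the mnemonic */

    for (;; p++) {
        char ch = *p;
        if (ch == '\0' || (ch == ',' && depth == 0)) {
            if (idx == i) {
                while (n > 0 && isspace((unsigned char)out[n - 1]))
                    n--;                             /* trailing blanks */
                out[n] = '\0';
                return n > 0;
            }
            if (ch == '\0')
                return 0;
            idx++;                                   /* next operand */
            n = 0;
            continue;
        }
        if (ch == '(') depth++;
        if (ch == ')') depth--;
        /* collect characters of the wanted operand, skipping leading blanks */
        if (idx == i && n + 1 < cap && (n > 0 || !isspace((unsigned char)ch)))
            out[n++] = ch;
    }
}

/* Hypothetical check: first reads a memory operand (taken to be its first
 * operand) and second stores to the very same operand.  AT&T syntax puts
 * the destination last. */
static int read_then_write_same_mem(const char *first, const char *second)
{
    char src[64], dst[64], tmp[64];
    int last = 0;

    if (!operand(first, 0, src, sizeof src) || strchr(src, '(') == NULL)
        return 0;                     /* first operand is not memory */
    while (operand(second, last + 1, tmp, sizeof tmp))
        last++;                       /* find the last operand */
    if (!operand(second, last, dst, sizeof dst))
        return 0;
    return strcmp(src, dst) == 0;
}
```

Run over each adjacent pair of the two line-42 dumps (with the `<-` annotations stripped), this check flags only the addsd/movsd pair from rev. 119760.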
Feel free to ask if any problems with reproducing it occur.

-Vladimir


--
   * From: Grigory Zagorodnev 
   * To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
   * Cc: "H. J. Lu" 
   * Date: Mon, 15 Jan 2007 17:59:31 +0300
   * Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!
A huge regression in gcc 4.3 performance has been detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on
x86_64-redhat-linux.

The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760


PS: I'm trying to get a small reproducer
- Grigory


test_calculix.tar.bz2
Description: BZip2 compressed data


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-18 Thread Grigory Zagorodnev

H. J. Lu wrote:
> Is it possible to extract a smaller testcase?

I'm working on a small reproducer. It will take some time because of
the benchmark's complexity.


- Grigory


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-18 Thread H. J. Lu
On Tue, Jan 16, 2007 at 07:05:34PM +0300, Grigory Zagorodnev wrote:
> Toon Moene wrote:
> >Calculix is a combined C/Fortran program.  Did you try to compile the 
> >Fortran parts with --param max-aliased-vops=<value greater than the default of 50>?
> Right, calculix is a combined program. Gprof says the regression is in 
> the e_c3d routine, which is 1k lines of Fortran code.
> 
> Varying max-aliased-vops makes no difference for calculix:
> 

Is it possible to extract a smaller testcase?


H.J.


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-16 Thread Grigory Zagorodnev

Toon Moene wrote:
> Calculix is a combined C/Fortran program.  Did you try to compile the 
> Fortran parts with --param max-aliased-vops=<value greater than the default of 50>?

Right, calculix is a combined program. Gprof says the regression is in 
the e_c3d routine, which is 1k lines of Fortran code.


Varying max-aliased-vops makes no difference for calculix:

default (i.e. --param max-aliased-vops=50):  1780 sec.
--param max-aliased-vops=80:                 1789 sec.
--param max-aliased-vops=20:                 1780 sec.

Setting max-aliased-vops to a value greater than 80 gives an ICE:
allocation.f: In function 'allocation':
allocation.f:19: internal compiler error: in ssa_operand_alloc, at tree-ssa-operands.c:365

- Grigory


Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-15 Thread H. J. Lu
On Mon, Jan 15, 2007 at 09:47:34PM +0100, Toon Moene wrote:
> Grigory,
> 
> Calculix is a combined C/Fortran program.  Did you try to compile the 
> Fortran parts with --param max-aliased-vops=<value greater than the default of 50>?
> 
> Diego upped the default from 10 to 50 because one (or more) of the 
> (Fortran) Polyhedron benchmarks showed a dramatic performance regression.
> 

I added --param max-aliased-vops=50 to the Fortran compile options. It
doesn't make a difference for Calculix.


H.J.


27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-15 Thread Toon Moene

Grigory,

Calculix is a combined C/Fortran program.  Did you try to compile the 
Fortran parts with --param max-aliased-vops=<value greater than the default of 50>?


Diego upped the default from 10 to 50 because one (or more) of the 
(Fortran) Polyhedron benchmarks showed a dramatic performance regression.


(Note: I've sent Diego a 900 line Fortran subroutine that crashes the 
compiler if you give it --param max-aliased-vops=200 or higher).


Kind regards,

--
Toon Moene - e-mail: [EMAIL PROTECTED] - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran: 
http://gcc.gnu.org/ml/gcc/2007-01/msg00059.html


27% regression of gcc 4.3 performance on cpu2k6/calculix

2007-01-15 Thread Grigory Zagorodnev

Hi!
A huge regression in gcc 4.3 performance has been detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on 
x86_64-redhat-linux.


The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer
- Grigory