Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
On Wed, Feb 07, 2007 at 05:32:28PM +0300, Vladimir Sysoev wrote:
> Hi!
>
> I created a test to reproduce the issue with cpu2006/454.calculix.
> See attached. File e_c3d.f contains a cut-down subroutine from calculix;
> tr535.f is the main entry point of the test. You can use the go-script as
> a reference for how I got these results. The find_stall.pl script finds
> the problem instruction combinations.
>
> The problem is that the new compiler generates a read instruction right
> after a write. See some dumps below.
>

A bug report is opened: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30735

H.J.
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
Hi!

I created a test to reproduce the issue with cpu2006/454.calculix.
See attached. File e_c3d.f contains a cut-down subroutine from calculix;
tr535.f is the main entry point of the test. You can use the go-script as
a reference for how I got these results. The find_stall.pl script finds
the problem instruction combinations.

The problem is that the new compiler generates a read instruction right
after a write. See some dumps below.

This is the inner loop near line #42 generated by the rev. 119759 compiler:

.L13:
.LBB22:
        .loc 1 42 0
        movapd  %xmm2, %xmm0
        leaq    (%rdx,%rbx), %rax
        .loc 1 38 0
        addl    $1, %edi
        addq    $24, %rdx
        .loc 1 42 0
        mulsd   72(%rcx), %xmm0
        .loc 1 38 0
        addq    $72, %rcx
        cmpl    $4, %edi
        .loc 1 42 0
        mulsd   %xmm3, %xmm0
        mulsd   -8(%rax,%r9,8), %xmm0
        mulsd   %xmm4, %xmm0
        addsd   %xmm0, %xmm1
        .loc 1 38 0
        jne     .L13

This is the code for line 42 generated by the rev. 119760 compiler:

.L13:
.LBB23:
        .loc 1 42 0
        movsd   72(%rdx), %xmm0
        movq    80(%rsp), %rax
        addq    $72, %rdx
        mulsd   -8(%r9,%r15,8), %xmm0
        addq    %rdi, %rax
        addq    $24, %rdi
        .loc 1 38 0
        cmpq    $72, %rdi
        .loc 1 42 0
        mulsd   -8(%r11,%r14,8), %xmm0
        mulsd   -8(%rax,%r13,8), %xmm0
        movq    440(%rsp), %rax
        mulsd   (%rax), %xmm0
        addsd   (%rsi,%r10,8), %xmm0    <-|
        movsd   %xmm0, (%rsi,%r10,8)    <-+- problems
        .loc 1 38 0
        jne     .L13

My output is:

real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)
Line 42
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)

Feel free to ask if any problems with reproducing it occur.

-Vladimir

--
* From: Grigory Zagorodnev
* To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
* Cc: "H. J. Lu"
* Date: Mon, 15 Jan 2007 17:59:31 +0300
* Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!

There is a huge regression of gcc 4.3 performance detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on
x86_64-redhat-linux. The regression is caused by the mem-ssa merge of
12/12/2006 (revision 119760).

http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer

- Grigory

Attachment: test_calculix.tar.bz2
Description: BZip2 compressed data
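The attached find_stall.pl is not reproduced in this thread. As a rough illustration of the kind of check Vladimir describes, the following is a minimal Python sketch that scans AT&T-syntax assembly for an instruction that reads a memory operand and is immediately followed by a store to the same operand. The names `find_read_then_store`, `parse` and `split_operands`, and the matching rule itself, are assumptions made for this example; they are not taken from the actual script.

```python
def split_operands(text):
    """Split 'src, dst' on the last comma that is outside parentheses."""
    depth = 0
    for i in range(len(text) - 1, -1, -1):
        ch = text[i]
        if ch == ')':
            depth += 1
        elif ch == '(':
            depth -= 1
        elif ch == ',' and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    return None


def parse(line):
    """Return (mnemonic, src, dst) for a two-operand instruction, else None."""
    line = line.strip()
    # Skip blank lines, directives such as '.loc', and labels such as '.L13:'.
    if not line or line.startswith('.') or line.endswith(':'):
        return None
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    ops = split_operands(parts[1])
    if ops is None or not ops[1]:
        return None
    src, dst = ops
    # Drop trailing annotations such as '<-|' that follow the destination.
    return parts[0], src, dst.split()[0]


def find_read_then_store(asm_lines):
    """List (read, store) pairs where the store hits the operand just read."""
    hits = []
    prev = None       # previous parsed instruction
    prev_text = None  # its original text, for reporting
    for line in asm_lines:
        insn = parse(line)
        if insn is None:
            continue  # directives and labels do not break adjacency here
        if prev is not None:
            prev_src = prev[1]
            dst = insn[2]
            if '(' in prev_src and prev_src == dst:
                hits.append((prev_text, line.strip()))
        prev, prev_text = insn, line.strip()
    return hits


# Tail of the rev. 119760 loop from the message above.
new_loop = [
    "mulsd (%rax), %xmm0",
    "addsd (%rsi,%r10,8), %xmm0",
    "movsd %xmm0, (%rsi,%r10,8)",
    ".loc 1 38 0",
    "jne .L13",
]
for read, store in find_read_then_store(new_loop):
    print(read)
    print(store)
```

A purely textual comparison like this only catches identical operand strings; it would miss the same address expressed through different registers, so treat it as a starting point rather than a reliable detector.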
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
H. J. Lu wrote:
> Is it possible to extract a smaller testcase?

I'm working on the small reproducer. That will take some time because of
the benchmark's complexity.

- Grigory
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
On Tue, Jan 16, 2007 at 07:05:34PM +0300, Grigory Zagorodnev wrote:
> Toon Moene wrote:
> >Calculix is a combined C/Fortran program. Did you try to compile the
> >Fortran parts with --param max-aliased-vops=<... default 50> ?
> Right, calculix is a combined program. Gprof says the regression is in
> the e_c3d routine, which is 1k lines of Fortran code.
>
> Various max-aliased-vops values make no difference for calculix:
>

Is it possible to extract a smaller testcase?

H.J.
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
Toon Moene wrote:
> Calculix is a combined C/Fortran program. Did you try to compile the
> Fortran parts with --param max-aliased-vops=<... default 50> ?

Right, calculix is a combined program. Gprof says the regression is in the
e_c3d routine, which is 1k lines of Fortran code.

Various max-aliased-vops values make no difference for calculix:

  default (assume --param max-aliased-vops=50)   1780 sec.
  --param max-aliased-vops=80                    1789 sec.
  --param max-aliased-vops=20                    1780 sec.

Setting max-aliased-vops to a value greater than 80 gives an ICE:

  allocation.f: In function 'allocation':
  allocation.f:19: internal compiler error: in ssa_operand_alloc, at tree-ssa-operands.c:365

- Grigory
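To put the run times above in perspective, the relative change between settings can be computed directly. This is a small sketch; the helper name `slowdown_percent` is made up for the example, and the inputs are the three timings Grigory reports.

```python
def slowdown_percent(baseline_sec, measured_sec):
    """Relative change of measured vs. baseline run time, in percent."""
    return (measured_sec - baseline_sec) / baseline_sec * 100.0


# Run times in seconds, keyed by the max-aliased-vops setting used.
timings = {"default (50)": 1780, "80": 1789, "20": 1780}
baseline = timings["default (50)"]

for setting, seconds in timings.items():
    print(f"max-aliased-vops={setting}: {slowdown_percent(baseline, seconds):+.2f}%")
```

The largest difference among the three settings is about half a percent, which is small next to the 27% regression under discussion.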
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
On Mon, Jan 15, 2007 at 09:47:34PM +0100, Toon Moene wrote:
> Grigory,
>
> Calculix is a combined C/Fortran program. Did you try to compile the
> Fortran parts with --param max-aliased-vops=<... default 50> ?
>
> Diego up'd the default from 10 to 50 because one (or more) of the
> (Fortran) Polyhedron benchmarks showed a dramatic performance regression.
>

I added --param max-aliased-vops=50 to Fortran. It doesn't make a
difference to Calculix.

H.J.
27% regression of gcc 4.3 performance on cpu2k6/calculix
Grigory,

Calculix is a combined C/Fortran program. Did you try to compile the
Fortran parts with --param max-aliased-vops=<... default 50> ?

Diego up'd the default from 10 to 50 because one (or more) of the
(Fortran) Polyhedron benchmarks showed a dramatic performance regression.

(Note: I've sent Diego a 900 line Fortran subroutine that crashes the
compiler if you give it --param max-aliased-vops=200 or higher).

Kind regards,

--
Toon Moene - e-mail: [EMAIL PROTECTED] - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran: http://gcc.gnu.org/ml/gcc/2007-01/msg00059.html
27% regression of gcc 4.3 performance on cpu2k6/calculix
Hi!

There is a huge regression of gcc 4.3 performance detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on
x86_64-redhat-linux. The regression is caused by the mem-ssa merge of
12/12/2006 (revision 119760).

http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer

- Grigory