Re: food for optimizer developers

2010-08-12 Thread Steven Bosscher
On Thu, Aug 12, 2010 at 8:46 AM, Ralf W. Grosse-Kunstleve
 wrote:
> Hi Vladimir,
>
> Thanks for the feedback! Very interesting.
>
>
>> Intel optimization compiler team (besides researchers) is much bigger than
>>whole GCC community.
>
> That's a surprise to me. I have to say that the GCC community has done amazing
> work, as you came within factor 1.4 (gfortran) and 1.6 (g++ compiling
> converted code) of ifort performance, which is close enough for our purposes,
> and I think those of many people.

Well, I think a ratio of gfortran/ifort=1.4 isn't so great, really. If
you look at one of the popular Fortran benchmarks (Polyhedron,
http://www.polyhedron.com/pb05-linux-f90bench_p40html), the ratio was
less than 1.2 for gfortran 4.3 vs. ifort 11 on an Intel Core i7.

Can you tell how you obtained the performance numbers you are using?
There may be a few compiler flags you could add to reduce that ratio
of 1.4 to something better.

Ciao!
Steven


Re: food for optimizer developers

2010-08-12 Thread David Brown

On 11/08/2010 23:04, Vladimir Makarov wrote:

On 08/10/2010 09:51 PM, Ralf W. Grosse-Kunstleve wrote:

I wrote a Fortran to C++ conversion program that I used to convert
selected LAPACK sources. Comparing runtimes with different compilers I get:

                  absolute  relative
ifort 11.1.072     1.790s     1.00
gfortran 4.4.4     2.470s     1.38
g++ 4.4.4          2.922s     1.63


To get a full picture, it would be nice to see icc times too.

This is under Fedora 13, 64-bit, 12-core Opteron 2.2GHz

All files needed to easily reproduce the results are here:

http://cci.lbl.gov/lapack_fem/

See the README file or the example commands below.

Questions:

- Is there a way to make the g++ version as fast as ifort?



I think it is more important (and harder) to make gfortran closer to ifort.

I cannot say anything about your fragment of LAPACK. But about 15 years
ago I worked on manual LAPACK optimization for an Alpha processor. As I
remember, LAPACK is a quite memory-bound benchmark. The hottest spot was
matrix multiplication, which is used in many places in LAPACK. The matrix
multiplication in LAPACK is already moderately optimized by using a
temporary variable, and that makes it 1.5 times faster (if the cache is
not big enough to hold the matrices) than the normal algorithm. But proper
loop optimizations (mostly tiling) could improve it by another factor of 4.

So I guess and hope the graphite project will finally improve LAPACK by
implementing tiling.
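
As a rough illustration of the tiling idea mentioned above, here is a
minimal sketch in C++. It is not LAPACK's code and not what graphite
generates; the function name matmul_tiled, the row-major layout and the
default tile size of 32 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C += A * B for n x n row-major matrices, processed
// in tile x tile blocks so that each block can stay resident in cache.
// The tile size must be greater than zero; 32 is an assumed default that
// would need tuning for a real cache hierarchy.
void matmul_tiled(const std::vector<double>& a,
                  const std::vector<double>& b,
                  std::vector<double>& c,
                  std::size_t n,
                  std::size_t tile = 32)
{
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile) {
                // Clamp block bounds so n need not be a multiple of tile.
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        // Temporary reused across the innermost loop.
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

The caller is expected to zero-initialize c when a plain product is
wanted. The temporary aik plays roughly the role of the temporary
variable described above; the three outer loops are what tiling adds.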

After solving the memory-bound problem, loop vectorization is another
important optimization which could improve LAPACK. Unfortunately, GCC
vectorizes fewer loops than ifort (about half as many the last time I
checked). I did not analyze the reason for this.

After solving the vectorization problem, another important lower-level
loop optimization is modulo scheduling (even though modern x86/x86_64
processors are out of order), because OOO processors can look ahead only
through a few branches. And as I remember, the Intel compiler does modulo
scheduling frequently. GCC's modulo scheduling is quite constrained.

Those are my thoughts, but I might be wrong because I have no time to
confirm my speculations. If you really want to help GCC developers, you
could do a comparative analysis of the code generated by ifort and
gfortran and find which optimizations GCC misses. GCC has few resources,
and the developers who could solve these problems are very busy. The Intel
optimization compiler team (besides researchers) is much bigger than the
whole GCC community. Taking this into account, and that they have much
more information about their processors, I don't think gfortran will
generate better or equal code for floating point benchmarks in the near
future.



This is a little out of my league (being neither a FORTRAN programmer 
nor a gcc developer).


However, I note that in the code translated from Fortran to C++, the 
two-dimensional array accesses are all changed into manual address 
calculations done as integer arithmetic.  My understanding of the 
vectorisation, loop optimisation and more advanced code transformations 
from graphite is that they work best when given standard C array 
constructs.  This gives the compiler the most information, and thus it 
can generate the best code.
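
To make the contrast concrete, here is a small sketch of the same
matrix-vector product written both ways. It is not taken from the
converted LAPACK sources; the function names, the fixed size N = 4 and
the row-major layout (Fortran itself is column-major) are assumptions
chosen only to keep the example short.

```cpp
#include <cstddef>

constexpr std::size_t N = 4;  // fixed size, assumed only for this illustration

// Style resembling the converted code: one flat buffer, with the element
// address computed by hand as integer arithmetic.
void matvec_flat(const double* a, std::size_t lda,
                 const double* x, double* y)
{
    for (std::size_t i = 0; i < N; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < N; ++j)
            sum += a[i * lda + j] * x[j];  // manual index calculation
        y[i] = sum;
    }
}

// Same computation with standard array constructs: the bounds and the
// two-dimensional access pattern are visible in the types.
void matvec_array(const double (&a)[N][N],
                  const double (&x)[N],
                  double (&y)[N])
{
    for (std::size_t i = 0; i < N; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < N; ++j)
            sum += a[i][j] * x[j];
        y[i] = sum;
    }
}
```

Both functions compute the same result. Whether a given GCC version
actually optimizes the second form better is something that would have
to be checked by looking at the generated code.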







RE: Link error

2010-08-12 Thread Ian Bolton
Phung Nguyen wrote:
> I am trying to build cross compiler for xc16x. I built successfully
> binutils 2.18; gcc 4.0 and newlib 1.18. Everything is fine when
> compiling a simple C file without any library call. It is also fine
> when making a simple call to printf like printf("Hello world").
> However, i got error message from linker when call printf("i=%i",i);

I don't know the answer, but I think you are more likely to get one
if you post to gcc-h...@gcc.gnu.org.  The gcc@gcc.gnu.org list is for
people developing gcc, rather than only building or using it.

I hope you find your answer soon.

Best regards,
Ian





Re: food for optimizer developers

2010-08-12 Thread Steve Kargl
On Thu, Aug 12, 2010 at 09:51:42AM +0200, Steven Bosscher wrote:
> On Thu, Aug 12, 2010 at 8:46 AM, Ralf W. Grosse-Kunstleve
>  wrote:
> > Hi Vladimir,
> >
> > Thanks for the feedback! Very interesting.
> >
> >
> >> Intel optimization compiler team (besides researchers) is much bigger than
> >>whole GCC community.
> >
> > That's a surprise to me. I have to say that the GCC community has done
> > amazing work, as you came within factor 1.4 (gfortran) and 1.6 (g++
> > compiling converted code) of ifort performance, which is close enough for
> > our purposes, and I think those of many people.
> 
> Well, I think a ratio of gfortran/ifort=1.4 isn't so great, really. If
> you look at one of the popular Fortran benchmarks (Polyhedron,
> http://www.polyhedron.com/pb05-linux-f90bench_p40html), the ratio was
> less than 1.2 for gfortran 4.3 vs. ifort 11 on an Intel Core i7.
> 
> Can you tell how you obtained the performance numbers you are using?
> There may be a few compiler flags you could add to reduce that ratio
> of 1.4 to something better.
> 

Without knowing the compiler options, the results of any benchmark
are meaningless.  For various versions of gfortran, I find the
following average of 5 executions in seconds:

#             A      B      C      D      E      F      G      H
# gfc43   9.808  9.374  9.314  9.832  9.620  9.526  9.022  9.156
# gfc44   9.806  9.440  9.222  9.810  9.414  9.320  8.980  9.152
# gfc45   9.672  9.530  9.250  9.744  9.400  9.204  8.960  8.992
# gfc4x   9.814  9.358  8.622  9.810  Note1  9.172  8.958  9.022
#
# A = -march=native -O
# B = -march=native -O2
# C = -march=native -O3
# D = -march=native -O  -ffast-math
# E = -march=native -O2 -ffast-math
# F = -march=native -O  -funroll-loops
# G = -march=native -O2 -funroll-loops
# H = -march=native -O3 -funroll-loops
#
# Note 1:  STOP DLAMC1 failure (10)
#
# gfc43 --> 4.3.6 20100728 (prerelease)
# gfc44 --> 4.4.5 20100728 (prerelease)
# gfc45 --> 4.5.1 20100728 (prerelease)
# gfc4x --> 4.6.0 20100810 (experimental)
#

I'll note that G is my normal FFLAGS setting along with
-ftree-vectorize.  Columns D and E above highlight why
I consider -ffast-math to be an evil option.  For this
benchmark the math is neither fast nor is it particularly
safe with gfc4x.

-- 
Steve


Re: food for optimizer developers

2010-08-12 Thread Toon Moene

Steve Kargl wrote:


> # gfc4x   9.814  9.358  8.622  9.810  Note1  9.172  8.958  9.022


Column 5 compiled with -march=native -O2 -ffast-math


> # Note 1:  STOP DLAMC1 failure (10)


That's probably because a standard compile of the LAPACK sources only 
compiles {S|D}LAM* with -O0.


The code is simply not written for any higher optimization (i.e., it 
assumes the compiler more or less compiles it "literally").
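
To illustrate what "compiled literally" can mean here, the following is
a hypothetical sketch of a probing loop in the spirit of the LAPACK
machine-parameter routines. It is not the actual DLAMC1 code; the
function name probe_mantissa_bits is made up, and the volatile
qualifiers are an assumption used to force every intermediate result to
be stored as a double.

```cpp
// Hypothetical sketch, not the LAPACK routine: count how many times 1.0
// can be doubled before adding 1.0 no longer changes the value. On IEEE
// 754 double precision this yields the number of mantissa bits.
int probe_mantissa_bits()
{
    volatile double a = 1.0;    // volatile: keep each step a real double store
    volatile double sum = 0.0;
    volatile double c = 1.0;
    int bits = 0;
    while (c == 1.0) {
        a = a * 2.0;
        ++bits;
        sum = a + 1.0;          // rounds back to a once a is large enough
        c = sum - a;            // 1.0 while a + 1.0 is exactly representable
    }
    return bits;
}
```

The loop only terminates because (a + 1.0) - a is evaluated exactly as
written. Under -ffast-math the compiler is permitted to reassociate
floating point expressions, so a version without volatile may have this
expression simplified to 1.0, which is one plausible way for such a
self-check to fail.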


Cheers,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


Re: food for optimizer developers

2010-08-12 Thread Steve Kargl
On Thu, Aug 12, 2010 at 08:47:34PM +0200, Toon Moene wrote:
> Steve Kargl wrote:
> 
> ># gfc4x   9.814  9.358  8.622  9.810  Note1  9.172  8.958  9.022
> 
> Column 5 compiled with -march=native -O2 -ffast-math
> 
> ># Note 1:  STOP DLAMC1 failure (10)
> 
> That's probably because a standard compile of the LAPACK sources only 
> compiles {S|D}LAM* with -O0.
> 
> The code is simply not written for any higher optimization (i.e., it 
> assumes the compiler more or less compiles it "literally").
> 

Your observation reinforces the notion that doing
benchmarks properly is difficult.  I forgot about
the LAPACK inquiry routines.  One would think that,
some 20+ years after F90, Dongarra and colleagues
would use the intrinsic numeric inquiry functions.
Although the accumulated time is small, DLAMCH() is
called 2642428 times during execution.  Everything
returned by DLAMCH() can be reduced to a compile-time
constant.
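
For readers following the C++ side of this thread, a rough analogue of
the Fortran numeric inquiry intrinsics (EPSILON, TINY, HUGE, RADIX,
DIGITS) is std::numeric_limits. The sketch below is only an
illustration: the constant names are made up, IEEE 754 doubles are
assumed, and the mapping to the individual DLAMCH() queries is
approximate rather than exact.

```cpp
#include <limits>

// Assumption for this sketch: double is an IEEE 754 binary64 type.
static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 doubles");

// Hypothetical names. Every value is a compile-time constant, so no
// function call remains at run time.
constexpr double kEpsilon = std::numeric_limits<double>::epsilon();  // gap between 1.0 and the next double
constexpr double kTiny    = std::numeric_limits<double>::min();      // smallest positive normal value
constexpr double kHuge    = std::numeric_limits<double>::max();      // largest finite value
constexpr int    kRadix   = std::numeric_limits<double>::radix;      // base of the representation
constexpr int    kDigits  = std::numeric_limits<double>::digits;     // mantissa digits in that base
```

Because these are constants, a compiler can fold them directly into the
code that uses them instead of calling a routine millions of times.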

-- 
Steve


2010 GCC and GNU Toolchain Developers' Summit

2010-08-12 Thread Andrew J. Hutton
The annual GCC & GNU Toolchain Developers’ Summit brings together the 
core development team of the GNU Compiler Collection with those working 
on the other toolchain components to discuss the state of the art. We 
focus on providing a vendor neutral environment to encourage open 
dialog, technology demonstrations, as well as long term technical 
roadmap development.


The 2010 Summit will be taking place in Ottawa, Canada from October 25th 
to 27th and located in the SITE building at the uOttawa campus in the 
central core of the city.


The CFP is now available at http://www.gccsummit.org/2010/cfp.php and 
additional information is available on the Summit website at 
http://www.gccsummit.org.


Please feel free to contact me directly by email ajh gccsummit.org with 
any questions or comments.





Re: 2010 GCC and GNU Toolchain Developers' Summit

2010-08-12 Thread Toon Moene

Andrew J. Hutton wrote:

> The annual GCC & GNU Toolchain Developers’ Summit brings together the
> core development team of the GNU Compiler Collection with those working
> on the other toolchain components to discuss the state of the art. We
> focus on providing a vendor neutral environment to encourage open
> dialog, technology demonstrations, as well as long term technical
> roadmap development.
>
> The 2010 Summit will be taking place in Ottawa, Canada from October 25th
> to 27th and located in the SITE building at the uOttawa campus in the
> central core of the city.
>
> The CFP is now available at http://www.gccsummit.org/2010/cfp.php and
> additional information is available on the Summit website at
> http://www.gccsummit.org.


Thanks for the invitation.  I'd like to join - unfortunately, I do not 
have a good subject for a talk.


Perhaps someone from the gfortran effort would like to give a talk.  As far
as I can see, I can support one person financially (round trip from Europe
to Ottawa and a stay in "Les Suites").


Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


gcc-4.5-20100812 is now available

2010-08-12 Thread gccadmin
Snapshot gcc-4.5-20100812 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100812/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 163207

You'll find:

gcc-4.5-20100812.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.5-20100812.tar.bz2       C front end and core compiler

gcc-ada-4.5-20100812.tar.bz2        Ada front end and runtime

gcc-fortran-4.5-20100812.tar.bz2    Fortran front end and runtime

gcc-g++-4.5-20100812.tar.bz2        C++ front end and runtime

gcc-java-4.5-20100812.tar.bz2       Java front end and runtime

gcc-objc-4.5-20100812.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.5-20100812.tar.bz2  The GCC testsuite

Diffs from 4.5-20100805 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: food for optimizer developers

2010-08-12 Thread Ralf W. Grosse-Kunstleve
Hi Steve,

> > Can you tell how you obtained the performance numbers you are using?
> > There may be a few compiler flags you could add to reduce that ratio
> > of 1.4 to something better.
> > 
>
> Without knowing the compiler options, the results of any benchmark
> are meaningless.

I used

  gfortran -o dsyev_test_gfortran -O3 -ffast-math dsyev_test.f

as per this script (same directory as the .f file) which lists all compilation
commands (ifort, etc.):

  http://cci.lbl.gov/lapack_fem/lapack_fem_001/compile_dsyev_tests.sh

> For various versions of gfortran, I find the
> following average of 5 executions in seconds:

#             A      B      C      D      E      F      G      H
# gfc43   9.808  9.374  9.314  9.832  9.620  9.526  9.022  9.156
# gfc44   9.806  9.440  9.222  9.810  9.414  9.320  8.980  9.152
# gfc45   9.672  9.530  9.250  9.744  9.400  9.204  8.960  8.992
# gfc4x   9.814  9.358  8.622  9.810  Note1  9.172  8.958  9.022
#
# A = -march=native -O
# B = -march=native -O2
# C = -march=native -O3
# D = -march=native -O  -ffast-math
# E = -march=native -O2 -ffast-math
# F = -march=native -O  -funroll-loops
# G = -march=native -O2 -funroll-loops
# H = -march=native -O3 -funroll-loops
#
# Note 1:  STOP DLAMC1 failure (10)
#
# gfc43 --> 4.3.6 20100728 (prerelease)
# gfc44 --> 4.4.5 20100728 (prerelease)
# gfc45 --> 4.5.1 20100728 (prerelease)
# gfc4x --> 4.6.0 20100810 (experimental)

Very useful!
I'm adding a column "I" with "-O3 -ffast-math" (which I've been using forever...).
I'm also trying with ("fc13-n") and without ("fc13") -march=native; I'm
embarrassed to admit this option has escaped me before.
On my FC13 machine with gcc 4.4.4 (12-core Opteron 2.2GHz):

#            A      B      C      D      E      F      G      H      I
# fc13   3.309  2.755  2.462  3.234  2.787  2.956  2.366  2.381  2.296
# fc13-n 3.176  2.742  2.037  3.310  2.730  2.899  2.447  1.982  1.894

For comparison, the ifort -O time was 1.790. Which means gfortran is
only 6% slower!
My original table revised after adding -march=native:

                  absolute  relative
ifort 11.1.072     1.790s     1.00
gfortran 4.4.4     1.894s     1.06
g++ 4.4.4          2.772s     1.55
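
For anyone who wants to recompute the "relative" column from their own
timings, it is simply each runtime divided by the ifort baseline and
rounded to two decimals. A tiny sketch follows; the function name
relative_runtime is made up for illustration.

```cpp
#include <cmath>

// Runtime relative to a baseline, rounded to two decimals as in the table.
// Both arguments are wall-clock times in seconds; the baseline must be > 0.
double relative_runtime(double seconds, double baseline_seconds)
{
    return std::round(100.0 * seconds / baseline_seconds) / 100.0;
}
```

With the numbers above, relative_runtime(1.894, 1.790) gives 1.06 and
relative_runtime(2.772, 1.790) gives 1.55.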

Ralf


Re: gcc-4.4-20100803 is now available

2010-08-12 Thread Ian Lance Taylor
Mihai Donțu  writes:

> Is there a page somewhere which details the list of changes made to gcc for
> every release? I don't seem to be able to find it anywhere on gcc.gnu.org.
> I'm not referring to things like http://gcc.gnu.org/gcc-4.5/changes.html but
> something more like:
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.34.2

See the various ChangeLog files in the gcc source code.

Ian


Re: x86 assembler syntax

2010-08-12 Thread Ian Lance Taylor
"Rick C. Hodgin"  writes:

> Is there an Intel-syntax compatible option for GCC or G++?  And if not,
> why not?  It's so much cleaner than AT&T's.

-masm=intel

This question would have been more appropriate on the gcc-help mailing
list.

Ian


Re: BUILT_IN_FRONTEND - how did this work?

2010-08-12 Thread Ian Lance Taylor
Steven Bosscher  writes:

> It seems that there once was support for builtin functions defined by
> a front end. This is still a useful idea (see e.g. PR24777) but it
> looks like there are no frontend built-in functions anymore. At least,
> a grep for BUILT_IN_FRONTEND gives no meaningful results. There's a
> hint that it may have worked, long ago, in ChangeLog-2000. But there
> is no documentation whatsoever about how BUILT_IN_FRONTEND is supposed
> to work.
>
> Does anyone know how BUILT_IN_FRONTEND should work, what one has to do
> to define a front-end built-in function? Examples? Any idea if there
> ever was a front end that used this feature?

It worked by using a langhook to expand the BUILT_IN_FRONTEND function
to RTL.  But that was only meaningful if you could pass language
specific trees to expand_expr.  That capability was dropped when
tree-ssa came in; language specific trees were converted to GENERIC
rather than to RTL.

BUILT_IN_FRONTEND no longer makes much sense.  We would now instead say
that the frontend should generate appropriate GENERIC for the builtin
function.  In the current world frontends don't generate RTL directly.

BUILT_IN_FRONTEND was used for printf for a while.  That usage was
removed in http://gcc.gnu.org/ml/gcc-patches/2003-07/msg02011.html .

Ian


Re: x86 assembler syntax

2010-08-12 Thread Rick C. Hodgin
> "Rick C. Hodgin"  writes:
> > Is there an Intel-syntax compatible option for GCC or G++?  And if not,
> > why not?  It's so much cleaner than AT&T's.
> -masm=intel
> This question would have been more appropriate on the gcc-help mailing
> list. -Ian Lance Taylor

My apologies to everyone.  I did not know such a list existed.

- Rick C. Hodgin




Re: Remove "asssertions" support from libcpp

2010-08-12 Thread Ian Lance Taylor
Steven Bosscher  writes:

> Assertions in libcpp have been deprecated since r135264:
>
> 2008-05-13  Tom Tromey  
>
> PR preprocessor/22168:
> * expr.c (eval_token): Warn for use of assertions.
>
> Can this feature be removed for GCC 4.6?

It was officially deprecated in the 4.4 release notes, so I think it can
be removed in 4.6.

Ian


Re: Usage of sizeof in testsuite/g++.dg/cpp0x/rv[1..8]p.C

2010-08-12 Thread Ian Lance Taylor
Uros Bizjak  writes:

> A problem arises with the code in testsuite/g++.dg/cpp0x/rv[1..8]p.C.
> These tests use "sizeof(..character array...) == ", but sizeof char
> array depends heavily on the value of #define STRUCTURE_SIZE_BOUNDARY.
> Targets that define this value to i.e. 32 (for performance reasons,
> instead of default BITS_PER_UNIT) will fail all these checks.
>
> Would it be acceptable to change all these checks from
>
> "sa t1;"
>
> to
>
> "sa t1;" ?

Sounds fine to me.

Ian