[Bug middle-end/21628] GCC much slower than ICL. Lack of inlining?

2008-09-28 Thread laurent at ient dot rwth-aachen dot de


--- Comment #9 from laurent at ient dot rwth-aachen dot de  2008-09-28 07:59 ---
(In reply to comment #8)
> Try
>   #define inline inline __attribute__((always_inline))
> instead.  The inline keyword changes linkage, so you have to keep it.
> If you keep having problems open a new bugreport please, the performance
> issue seems to be still solved.


Thank you! It works. 
Sorry for my question.

Earlier I had noticed another problem with
__attribute__((always_inline)).
I will write a bug report in a few days, as soon as I can reproduce it with a
small test case.
The bug occurs at a few places in the STL, with an error message
something like "sorry unimplemented, could not inline". I could temporarily
work around it with a few changes to the STL.

Would you be interested if I posted a new report about a performance issue in
comparison with ICL? It is the only issue I know of that still prevents me
from preferring GCC to ICL.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21628



[Bug middle-end/21628] GCC much slower than ICL. Lack of inlining?

2008-09-27 Thread laurent at ient dot rwth-aachen dot de


--- Comment #7 from laurent at ient dot rwth-aachen dot de  2008-09-27 11:40 ---
Hello,

I am reopening the discussion because I noticed a problem related to
__attribute__((__always_inline__)) when I tried to compile my library as a
DLL.

GCC now forces inlining well, and is now as fast as ICL for my generic
implementation of the FFT (Fast Fourier Transform).
So I would like to progressively make GCC my preferred compiler.

GCC works fine if I use inline as usual (but my program is slow).
But I get hundreds of error messages if I use the macro #define inline
__attribute__((__always_inline__)). For example:

obj\Release\src\copy.o:copy.cpp:(.text+0x0): first defined here
obj\Release\src\dct.o:dct.cpp:(.text+0xe): multiple definition of `_ferror'
obj\Release\src\copy.o:copy.cpp:(.text+0xe): first defined here
obj\Release\src\dct.o:dct.cpp:(.text+0x1c): multiple definition of `operator
new(unsigned int, void*)'
obj\Release\src\copy.o:copy.cpp:(.text+0x1c): first defined here

I use the Code::Blocks environment and MinGW GCC 4.3.1 (downloaded from
http://www.tdragon.net/recentgcc/).
My problem might be related to the following bug report:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37121

Any clue?


Paolo asked me above whether I could provide a test case where ICL is faster
than GCC. I will publish the new version 2.2 of my library in a couple of days
(http://www.ient.rwth-aachen.de/~laurent/genial/genial.html).
If you are still interested, I could then provide a small test case using my
library where ICL is much faster than GCC and VC2008 (but I don't care much
about VC2008).


-- 

laurent at ient dot rwth-aachen dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |





[Bug middle-end/21628] GCC much slower than ICL. Lack of inlining?

2007-11-16 Thread laurent at ient dot rwth-aachen dot de


--- Comment #2 from laurent at ient dot rwth-aachen dot de  2007-11-16 17:46 ---
(In reply to comment #1)
> What does -Winline say?
>
> Have you tried with always_inline? Example:
>
>  /* Prototype.  */
>  inline void foo (const char) __attribute__((always_inline));

Wow, it has been a while since I posted this report!

When I posted it, GCC was at version 3.x.
-Winline said that many functions were not inlined despite the presence of
the 'inline' keyword.
Yes, I did try __attribute__((__always_inline__)).

But since version 4.2, GCC at least seems to respect this attribute!
This was a great improvement for me; I had really been waiting for this
feature.

I once found a page where a very important person in the Linux world (I cannot
remember who now, Linux Toward probably) complained about the lack of inlining
in the Linux kernel, that there was no way to force GCC, etc.
I am glad that this person was heard by the GCC developers.

It improved the performance of my library compiled with GCC a lot.
But honestly, ICL (Intel Compiler for Windows) is still much better at
optimisation.





[Bug middle-end/21628] GCC much slower than ICL. Lack of inlining?

2007-11-16 Thread laurent at ient dot rwth-aachen dot de


--- Comment #6 from laurent at ient dot rwth-aachen dot de  2007-11-16 20:42 ---
> Note that for completely inlining kernels you can use the
> __attribute__((flatten))
> on the *calling* function.  Usually with expression templates that is the
> function containing the loops, like
> void __attribute__((flatten)) doit()
> {
>   for (;;)
>     lots_of_calls_to_inline ();
> }
> and it will make sure to inline all calls done in doit (recursively, so no
> calls will be left in the final version).  Also starting with GCC 4.2 (and
> much improved on trunk which will become 4.3) using profile-feedback will
> improve inline performance a lot (use -fprofile-generate, run, -fprofile-use).
Good to know! Thanks for the advice!

> But we are better in freedom. ;-)
Much better!

> OK. So this is fixed. Thanks for the report nonetheless. And sorry for the
> delay.
No problemo. Thanks to all of you.

> I don't think all the inlining improvements (many) can be traced back to any
> specific individual complaining (not even Linus Torvalds ;)
(Oops! Sorry for having misspelled the name of Linus Torvalds!)
You are most probably right. I was nevertheless happy to notice that I was not
alone in complaining about the problem.

> Details would be certainly welcome. Ideally, a reduced snippet, to persuade
> the optimization people to take action reasonably quickly...
Hmm, difficult.
I just sometimes compare the execution speed of numerical calculations across
different compilers (ICL, VC2005, GCC), and ICL is often faster by maybe 10%.
If I find more specific and simpler examples, I'll post them.

I especially appreciate the way GCC reports compilation errors from deeply
nested templates. I could not have programmed deeply nested template
expressions with the complicated error messages from ICL or VC2005.

I have to say that ICL has evidently no longer respected the __forceinline
directive since versions 9 and 10; for me this is a clear regression.
I do not know exactly what changed in these latest versions, but I do not want
to give up my good old version 8.1.

Thanks





[Bug c++/21628] New: GCC much slower than ICL. Lack of inlining?

2005-05-17 Thread laurent at ient dot rwth-aachen dot de
I first posted this problem at [EMAIL PROTECTED] and was advised to post it
here.

I have a program with a great many inline template functions.
It is essential for execution speed that every (or almost every) function
marked as inline really is inlined by the compiler.

I have already compiled the program with the Intel Compiler (ICL) on Visual
C++, and it works fine and fast. I verified that the functions are then really
inlined.

But with GCC 3.4.X (Linux and Cygwin) the same program is much slower (5-20
times) than the version compiled with ICL. GCC's '-Winline' option shows me
that many functions are not inlined as they should be.

The compiler treats the 'inline' keyword as a hint, but does not follow it.
I tried setting various GCC options, but nothing has been satisfactory so far:
-finline-limit 1 --param large-function-growth=100 --param
max-inline-insns-single=100 ...

I am convinced that the poor performance is due to the lack of inlining,
because I also get slow execution with ICL when the functions are not marked
as 'inline'. With GCC's '-Winline' option, I can see every function that is
not inlined.

Also, the SSE mode of the following test program should be much faster than
the version without SIMD, but it requires much more inlining. ICL manages it;
GCC does not at all.

Do you know a way to force GCC to obey the inline specifier, or to raise
the limits that the compiler has internally? Or do you have an alternative?


It is not possible to give a small test program. If you want to test it
yourself, I suggest you download my library from this address and compile the
following test. (There is no need to compile the library; it is STL-like.)
http://www.ient.rwth-aachen.de/team/laurent/genial/genial.html

#define FFT_LEVEL 32
#include <signal/fft.h>
int main()
{
  DenseVector<complex<float> >::self X(32,0);
  DenseVector<complex<float> >::self Y(X.size(),0);
  double t0=get_time();
  for (int i=0; i<100; ++i)
    fft(X,Y);
  cout << get_time()-t0 << endl;
}

Execution time on a Pentium 4, 3.2 GHz:
With ICL on Windows:
- No SIMD: 0.368 s
- SSE: 0.126 s
- SSE3: 0.112 s
With GCC 3.4 on Cygwin/Linux (-O3 -msse3 -UWIN32 -ftemplate-depth-36 -lstlport):
- No SIMD: 0.969 s
- SSE: 2.069 s

For more information, contact me by email (see home page).

Thanks

-- 
           Summary: GCC much slower than ICL. Lack of inlining?
           Product: gcc
           Version: 3.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: laurent at ient dot rwth-aachen dot de
                CC: gcc-bugs at gcc dot gnu dot org

