Hi Michael,

Can you provide some details on the compiler flags you used? If you can give me the commands required to reproduce your results, I will submit a bug report to ICC (I work for Intel).
I suspect that ICC is using a different (more conservative) inlining heuristic than GCC. Failure to inline probably explains the 13x slowdown. However, that doesn't mean that GCC is right and ICC is wrong, as there is no perfect inlining heuristic (see e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 for a contrary position on GCC heuristics). Nonetheless, it may be useful to make the ICC inlining heuristic aware of Eigen use cases, since Eigen is rather popular. Fortunately, it seems that there is a good solution already, which is to use EIGEN_STRONG_INLINE, which evidently causes ICC to inline much more aggressively.

Best,

Jeff

PS: In the unlikely event that EIGEN_STRONG_INLINE isn't sufficient, you may find the following ICC options useful.

$ icpc -help inline

Inlining
--------

-inline-level=<n>
          control inline expansion:
            n=0  disable inlining
            n=1  inline functions declared with __inline, and perform C++ inlining
            n=2  inline any function, at the compiler's discretion
-f[no-]inline
          inline functions declared with __inline, and perform C++ inlining
-f[no-]inline-functions
          inline any function at the compiler's discretion
-finline-limit=<n>
          set maximum number of statements a function can have and still be considered for inlining
-fgnu89-inline
          use C89 semantics for "inline" functions when in C99 mode
-inline-min-size=<n>
          set size limit for inlining small routines
-no-inline-min-size
          no size limit for inlining small routines
-inline-max-size=<n>
          set size limit for inlining large routines
-no-inline-max-size
          no size limit for inlining large routines
-inline-max-total-size=<n>
          maximum increase in size for inline function expansion
-no-inline-max-total-size
          no size limit for inline function expansion
-inline-max-per-routine=<n>
          maximum number of inline instances in any function
-no-inline-max-per-routine
          no maximum number of inline instances in any function
-inline-max-per-compile=<n>
          maximum number of inline instances in the current compilation
-no-inline-max-per-compile
          no maximum number of inline instances in the current compilation
-inline-factor=<n>
          set inlining upper limits by n percentage
-no-inline-factor
          do not set set inlining upper limits
-inline-forceinline
          treat inline routines as forceinline
-inline-calloc
          directs the compiler to inline calloc() calls as malloc()/memset()
-inline-min_caller-growth=<n>
          set lower limit on caller growth due to inlining a single routine
-no-inline-min-caller-growth
          no lower limit on caller growth due to inlining a single routine

On Wed, Mar 13, 2019 at 11:34 AM Michael Riesch wrote:

> Hello all,
>
> Thank you very much for your work on Eigen. We found it very useful for
> our simulation software mbsolve [1] (BTW maybe you would like to add it
> to the list of projects that use the Eigen library).
>
> The code I am working on at the moment consists mostly of dense
> matrix-matrix and matrix-vector multiplications. I compiled the code
> with both Intel compiler 19 and gcc 6.3.0 and found that there is a
> strange performance difference. Unless I define
>
> #define EIGEN_STRONG_INLINE inline
>
> the binary compiled by icc is ~13x slower. The gcc binary performance
> remains the same, as inline seems to be the standard setting of this
> macro for gcc.
>
> Why can this behavior occur? Or, alternatively, which possible
> anti-pattern could be the cause of this performance difference?
>
> Any hints are welcome. If you need more information, please let me know.
>
> Thanks in advance and best regards,
> Michael
>
> [1] https://github.com/mriesch-tum/mbsolve

-- 
Jeff Hammond
http://jeffhammond.github.io/
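For readers unfamiliar with the mechanism Michael used: overriding the macro works only because a library header supplies its default when the macro is not already defined. The sketch below mimics that pattern with a hypothetical macro, DEMO_STRONG_INLINE (not an Eigen name), so it compiles without Eigen. The assumption that Eigen's Macros.h guards EIGEN_STRONG_INLINE with #ifndef and defaults to __forceinline on MSVC/ICC (plain inline elsewhere) should be checked against the Eigen version in use.

```cpp
#include <cassert>
#include <cstring>

// Stringification helpers, used only to report which spelling the macro got.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

// (1) User-side override, placed BEFORE the "library header" below.
//     With Eigen, the analogous line would be
//     #define EIGEN_STRONG_INLINE inline   (before #include <Eigen/Dense>)
#define DEMO_STRONG_INLINE inline

// (2) Hypothetical "library header": it only supplies a default if the user
//     has not already defined the macro. __forceinline is the MSVC/ICC keyword.
#ifndef DEMO_STRONG_INLINE
#  if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#    define DEMO_STRONG_INLINE __forceinline
#  else
#    define DEMO_STRONG_INLINE inline
#  endif
#endif

// A small kernel annotated with the macro, as library code would be.
DEMO_STRONG_INLINE double dot3(const double* a, const double* b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Returns the tokens the macro expanded to, e.g. "inline".
const char* strong_inline_spelling() { return DEMO_STR(DEMO_STRONG_INLINE); }

// True when the macro expanded to plain inline (on GCC that is also the default).
bool override_took_effect() {
  return std::strcmp(strong_inline_spelling(), "inline") == 0;
}
```

With the override line removed, the same file would select the __forceinline branch under MSVC or ICC and plain inline elsewhere. The sketch only shows which definition wins; it does not reproduce the 13x difference, which would need the actual Eigen kernels and compiler flags.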
