Re: [eigen] Performance difference icc <-> gcc, EIGEN_STRONG_INLINE

Jeff Hammond Thu, 14 Mar 2019 06:57:57 -0700

Hi Michael,

Can you provide some details on the compiler flags you used?  If you can
give me the commands required to reproduce your results, I will submit a
bug report to ICC (I work for Intel).


I suspect that ICC is using a different (more conservative) inlining
heuristic than GCC.  Failure to inline probably explains the 13x slowdown.
However, that doesn't mean that GCC is right and ICC is wrong, as there is no
perfect inlining heuristic (see e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194 for a contrary position
on GCC heuristics).  Nonetheless, it may be useful to make the ICC inlining
heuristic aware of Eigen use cases, since Eigen is rather popular.
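As a rough illustration of why a missed inline can be so expensive in
Eigen-style code (a sketch, not Eigen's actual implementation): such libraries
route element access through very small member functions, and the hot loop is
only fast once those calls are inlined away.  The `DenseMatrix` type and
`matvec` function below are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense matrix (not Eigen's API): every element access
// goes through a tiny member function, as in expression-template libraries.
struct DenseMatrix {
    std::size_t rows, cols;
    std::vector<double> data;  // column-major storage

    DenseMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0) {}

    // Tiny accessors: when inlined, the index arithmetic folds into the loop
    // below; when not inlined, each access is a real call, which typically
    // also prevents the loop from being vectorized.
    inline double coeff(std::size_t i, std::size_t j) const {
        return data[j * rows + i];
    }
    inline double& coeffRef(std::size_t i, std::size_t j) {
        return data[j * rows + i];
    }
};

// y = A * x, written naively in terms of coeff(): one accessor call per
// multiply-add in the innermost loop.
inline std::vector<double> matvec(const DenseMatrix& A,
                                  const std::vector<double>& x) {
    assert(x.size() == A.cols);
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t j = 0; j < A.cols; ++j)
        for (std::size_t i = 0; i < A.rows; ++i)
            y[i] += A.coeff(i, j) * x[j];
    return y;
}
```

Whether a given compiler actually inlines `coeff()` here depends on its
heuristics and flags; inspecting the generated assembly (e.g. with `-S`) is
the reliable way to check.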

Fortunately, there already seems to be a good solution: use
EIGEN_STRONG_INLINE, which evidently causes ICC to inline much more
aggressively.
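A minimal sketch of the pattern behind a macro like EIGEN_STRONG_INLINE,
using a hypothetical macro name `MY_STRONG_INLINE` (this is not Eigen's actual
definition; check Eigen's headers for its exact per-compiler defaults): a
compiler-specific default guarded by `#ifndef`, so that a user can substitute
plain `inline`, as Michael does in the message quoted below, either on the
command line or before the header is included.

```cpp
#include <cassert>

// Hypothetical macro (not Eigen's): the #ifndef guard lets the user override
// the default, e.g. with -DMY_STRONG_INLINE=inline.
#ifndef MY_STRONG_INLINE
#  if defined(_MSC_VER)
     // MSVC (and ICC on Windows) spell a forced inline as __forceinline.
#    define MY_STRONG_INLINE __forceinline
#  elif defined(__GNUC__)
     // GCC, Clang and ICC on Linux accept the always_inline attribute.
#    define MY_STRONG_INLINE inline __attribute__((always_inline))
#  else
#    define MY_STRONG_INLINE inline
#  endif
#endif

// A small helper of the kind that should disappear into its caller.
MY_STRONG_INLINE double madd(double acc, double a, double b) {
    return acc + a * b;
}

// Dot product built from the helper; once madd() is inlined this is a
// plain multiply-add loop.
inline double dot(const double* a, const double* b, int n) {
    assert(n >= 0);
    double acc = 0.0;
    for (int i = 0; i < n; ++i) acc = madd(acc, a[i], b[i]);
    return acc;
}
```

Compiling with `-DMY_STRONG_INLINE=inline` replaces the forced inline with an
ordinary `inline` hint and leaves the decision to the compiler's heuristic.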

Best,

Jeff

PS In the unlikely event that EIGEN_STRONG_INLINE isn't sufficient, you may
find the following ICC options useful.

$ icpc -help inline

Inlining
--------

-inline-level=<n>
          control inline expansion:
            n=0  disable inlining
            n=1  inline functions declared with __inline, and perform C++
                 inlining
            n=2  inline any function, at the compiler's discretion
-f[no-]inline
          inline functions declared with __inline, and perform C++ inlining
-f[no-]inline-functions
          inline any function at the compiler's discretion
-finline-limit=<n>
          set maximum number of statements a function can have and still be
          considered for inlining
-fgnu89-inline
          use C89 semantics for "inline" functions when in C99 mode
-inline-min-size=<n>
          set size limit for inlining small routines
-no-inline-min-size
          no size limit for inlining small routines
-inline-max-size=<n>
          set size limit for inlining large routines
-no-inline-max-size
          no size limit for inlining large routines
-inline-max-total-size=<n>
          maximum increase in size for inline function expansion
-no-inline-max-total-size
          no size limit for inline function expansion
-inline-max-per-routine=<n>
          maximum number of inline instances in any function
-no-inline-max-per-routine
          no maximum number of inline instances in any function
-inline-max-per-compile=<n>
          maximum number of inline instances in the current compilation
-no-inline-max-per-compile
          no maximum number of inline instances in the current compilation
-inline-factor=<n>
          set inlining upper limits by n percentage
-no-inline-factor
          do not set set inlining upper limits
-inline-forceinline
          treat inline routines as forceinline
-inline-calloc
          directs the compiler to inline calloc() calls as malloc()/memset()
-inline-min_caller-growth=<n>
          set lower limit on caller growth due to inlining a single routine
-no-inline-min-caller-growth
          no lower limit on caller growth due to inlining a single routine



On Wed, Mar 13, 2019 at 11:34 AM Michael Riesch wrote:

> Hello all,
>
> Thank you very much for your work on Eigen. We found it very useful for
> our simulation software mbsolve [1] (BTW, maybe you would like to add it
> to the list of projects that use the Eigen library).
>
> The code I am working on at the moment consists mostly of dense
> matrix-matrix and matrix-vector multiplications. I compiled the code
> with both Intel compiler 19 and gcc 6.3.0 and found a strange
> performance difference. Unless I define
>
> #define EIGEN_STRONG_INLINE inline
>
> the binary compiled by icc is ~13x slower. The performance of the gcc
> binary is unchanged either way, as inline seems to be the default
> setting of this macro for gcc.
>
> Why might this behavior occur? Or, alternatively, which anti-pattern
> could be causing this performance difference?
>
> Any hints are welcome. If you need more information, please let me know.
>
> Thanks in advance and best regards,
> Michael
>
> [1] https://github.com/mriesch-tum/mbsolve

-- 
Jeff Hammond
http://jeffhammond.github.io/