[Bug ipa/98265] [10/11 Regression] gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2021-01-11 Thread kartikmohta at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

--- Comment #3 from Kartik Mohta  ---
This is a simple example to demonstrate the problem I've noticed in a bigger
program. Do the inlining limits depend on the size of the TU?

[Bug ipa/98265] gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2020-12-13 Thread kartikmohta at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

--- Comment #1 from Kartik Mohta  ---
Created attachment 49755
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49755&action=edit
preprocessed file

[Bug ipa/98265] New: gcc-10 has significantly worse code generated with -O2 compared to -O1 (or gcc-9 -O2) when using the Eigen C++ library

2020-12-13 Thread kartikmohta at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98265

            Bug ID: 98265
           Summary: gcc-10 has significantly worse code generated with -O2
                    compared to -O1 (or gcc-9 -O2) when using the Eigen
                    C++ library
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kartikmohta at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

I was checking the generated assembly for the following simple code using the
Eigen C++ library:

#include 

Eigen::Matrix f(float x, float y, float z, float scale)
{
  return Eigen::Matrix(x, y, z) * scale;
}

The Eigen::Matrix structure can be thought of as just a wrapper around an array
in this case. The preprocessed file is attached.


When compiled with -O1, this generates the reasonable-looking code I expected,
but with -O2 the generated code is much worse. gcc-9.3 with -O2 also generates
good code. A quick comparison of the code generated by gcc 10.2 and gcc 9.3
can be seen at https://godbolt.org/z/186c19.


Upon bisecting this, it seems like the offending changes are from the pair of
commits which changed some of the inliner params:

commit 1e83bd7003e03160b7d71fb959111e89b53446ab
Author: Jan Hubicka 
Date:   Sat Nov 23 05:13:23 2019

    Convert inliner to new param infrastructure

commit 9340d34599e6d5e7a6f3614de44b2c578b180c1b
Author: Jan Hubicka 
Date:   Sat Nov 23 05:11:25 2019

    Convert inliner to function specific param infrastructure


By playing with the optimization parameters, I saw that using either
-fno-partial-inlining or --param early-inlining-insns=14 fixes the generated
code with -O2.

Upon further investigation, it looks like there used to be two separate
early-inlining-insns params, one for -O3 and one for -O2, but the
consider_split function (ipa-split.c) always used the -O3 value, irrespective
of the optimization level. This code was not updated when the inliner param
changes were made, so consider_split now sees a much smaller value of
param_early_inlining_insns at -O2, which may be causing the inlining problems
in this case.
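
To make the suspicion concrete, here is a toy model of how a threshold shared
between the early inliner and the splitting heuristic could change a split
decision. This is not the code in ipa-split.c. The function name, the rule
"refuse to split off a tail no larger than half of early-inlining-insns", the
smaller threshold of 6 and the tail size of 5 are all assumptions made up for
illustration. Only the value 14 comes from the workaround mentioned above.

```cpp
// Toy model only, NOT the actual consider_split logic from ipa-split.c.
// Hypothetical rule: a tail is considered too small to split off when its
// size is at most half of the early-inlining-insns threshold.
constexpr bool refuse_small_tail(unsigned split_size,
                                 unsigned early_inlining_insns)
{
  return split_size <= early_inlining_insns / 2;
}

// With the larger threshold (14, the value that fixes the code for me),
// a hypothetical tail of size 5 is refused, so the function stays whole.
static_assert(refuse_small_tail(5, 14), "5 <= 14 / 2, split refused");

// With a hypothetical smaller threshold (6 is an assumed value), the same
// tail is no longer refused, so the function may now get split.
static_assert(!refuse_small_tail(5, 6), "5 > 6 / 2, split allowed");
```

If something like this is happening, a function that was previously left
intact would start being split at -O2, which would be consistent with
-fno-partial-inlining also fixing the generated code.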