[Bug target/84280] [6/7/8 Regression] Performance regression in g++-7 with Eigen for non-AVX2 CPUs

2018-03-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84280

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P2

--- Comment #19 from Richard Biener  ---
Ok, so this means it is coalescing related.  We still don't know which
coalescing is good/bad though.

2018-03-14 Thread marxin at gcc dot gnu.org

--- Comment #18 from Martin Liška  ---
(In reply to Patrik Huber from comment #14)
> It even seems a few percent slower after the FDO stuff. But the
> `-fprofile-use` is a bit weird. If there is no .gcda file, it doesn't
> complain. If you give it a file that doesn't exist (e.g. -fprofile-use=foo),
> then it doesn't complain either. So how can I check whether it really ran
> the FDO?

Yep, maybe having an option that will cause failure would be a good idea.
Anyway, you can use -fdump-ipa-profile and check *.065i.profile file where you
should see something like:

...
Read edge from 0 to 2, count:1
1 edge counts read
...

Note that -fprofile-use=foo tells the compiler to search in *folder* foo for
corresponding gcda files.

2018-03-14 Thread marxin at gcc dot gnu.org

--- Comment #17 from Martin Liška  ---
Created attachment 43654
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43654&action=edit
optimized dump after the revision

2018-03-14 Thread marxin at gcc dot gnu.org

--- Comment #16 from Martin Liška  ---
Created attachment 43653
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43653&action=edit
optimized dump before the revision

2018-03-14 Thread marxin at gcc dot gnu.org

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |aoliva at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org

--- Comment #15 from Martin Liška  ---
I can confirm that on my Haswell machine:
model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

I see a regression with -march=core2 -mtune=core2 -O3 starting from r226901
(first time in GCC 5.x).

Time difference is:
0:00:14.975390
vs
0:00:11.889274

2018-03-06 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |6.5

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #14 from Patrik Huber  ---
It even seems a few percent slower after the FDO stuff. But the
`-fprofile-use` is a bit weird. If there is no .gcda file, it doesn't complain.
If you give it a file that doesn't exist (e.g. -fprofile-use=foo), then it
doesn't complain either. So how can I check whether it really ran the FDO?

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #13 from Patrik Huber  ---
>> Did you try with FDO?  (-fprofile-generate, run, -fprofile-use)

I just tried this with g++-7. It didn't help: the final executable has the same
slower run time as in the attached log without the FDO.

2018-02-08 Thread rguenth at gcc dot gnu.org

--- Comment #12 from Richard Biener  ---
Hmm, the preprocessed source(s) are hard to work with given the eigen headers
seem to have conditional code on the enabled ISAs.

From a quick look it seems to be inlining related?  My past experience says
that compute kernels in C++ should have the flatten attribute attached to
them...
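
As a rough illustration of the flatten suggestion, here is a minimal sketch using GCC's `flatten` function attribute. The kernel `gemm_kernel` and the helper `mul_add` are hypothetical names made up for this example and are not taken from Eigen or from the attached test case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper that the kernel calls in its inner loop.
static double mul_add(double acc, double a, double b) {
    return acc + a * b;
}

// The flatten attribute asks GCC to inline, where possible, every call
// made inside the body of this function.
__attribute__((flatten))
void gemm_kernel(const std::vector<double>& a, const std::vector<double>& b,
                 std::vector<double>& c, std::size_t n) {
    // Naive n x n row-major matrix product: c = a * b.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                acc = mul_add(acc, a[i * n + k], b[k * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
}
```

Whether this helps in the Eigen case is untested here; the attribute only changes inlining decisions inside the annotated function.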

Did you try with FDO?  (-fprofile-generate, run, -fprofile-use)

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #10 from Patrik Huber  ---
Created attachment 43367
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43367&action=edit
gcc5_gemm_test.ii

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #11 from Patrik Huber  ---
Created attachment 43368
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43368&action=edit
gcc7_gemm_test.ii

2018-02-08 Thread glisse at gcc dot gnu.org

--- Comment #9 from Marc Glisse  ---
(In reply to Patrik Huber from comment #6)
> I could also upload you the .ii files but they are 5 MB, which the
> bugtracker doesn't allow (1 MB limit).

Preprocessed sources are the .ii files (you can use compression).

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #8 from Patrik Huber  ---
Created attachment 43366
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43366&action=edit
full_log.txt

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #7 from Patrik Huber  ---
Created attachment 43365
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43365&action=edit
gemm_test.cpp

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #6 from Patrik Huber  ---
I could also upload you the .ii files but they are 5 MB, which the bugtracker
doesn't allow (1 MB limit).

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #5 from Patrik Huber  ---
Created attachment 43364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43364&action=edit
gcc7_gemm_test.s

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #4 from Patrik Huber  ---
Created attachment 43363
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43363&action=edit
gcc5_gemm_test.s

2018-02-08 Thread patrikhuber at gmail dot com

--- Comment #3 from Patrik Huber  ---
@Richard: I'm not 100% sure what you mean by "preprocessed source", but I
googled it and you probably mean the output of compiling with "-c -save-temps".

Please see attached.

2018-02-08 Thread glisse at gcc dot gnu.org

Marc Glisse  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |5.5.0
            Summary|Performance regression in   |[6/7/8 Regression]
                   |g++-7 with Eigen for        |Performance regression in
                   |non-AVX2 CPUs               |g++-7 with Eigen for
                   |                            |non-AVX2 CPUs
      Known to fail|                            |6.4.0, 7.2.0

--- Comment #2 from Marc Glisse  ---
The difference seems to be between gcc-5 and gcc-6.