Hi Steve,

On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>>
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share? Compiler versions, options used,
> hardware, etc. Also, were there any tests that stood out in terms of
> icc outperforming GCC?
Mostly AMD Ryzen, GCC 8 vs. ICC 18. We were comparing a few combinations
of options. When we compared ICC's -Ofast with ours (with or without
-march=native/-mtune=native for GCC, and with a set of ICC options that
hopefully generate the best code for Ryzen), we found that without
LTO/IPO, GCC is actually slightly ahead of ICC on integer benchmarks
(both SPEC 2006 and 2017). Floating-point results were a more mixed bag
(mostly because ICC performed surprisingly poorly without IPO on a few
of them), but at least on SPEC 2017, ICC's results were clearly
better... with a caveat, see my comment about wrf below.

With LTO/IPO, ICC can perform a few memory-reorganization tricks that
push it quite a bit ahead of us, but I'm not convinced it can apply
these transformations to much source code that does not happen to be a
well-known benchmark. So I'd recommend always looking at non-IPO
numbers too.

>
> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

Please try with -Ofast too. The main reason is that -O3 does not imply
-ffast-math, and the performance gain from it is often very large (I
suspect the 525.x264_r difference is due to that). Alternatively, if
your own workloads require high-precision floating-point math, you have
to force ICC to provide it as well to get a fair comparison.

-Ofast also turns on the Fortran-specific -fno-protect-parens and
-fstack-arrays, which help a few benchmarks a lot, but note that you
may need to set a large stack ulimit to keep them from crashing (ICC
does the same thing, as far as we know).

>
> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference,
> around 20%.
> 521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.
>

Keep in mind that when discussing FP benchmarks, the math library used
can be (almost) as important as the compiler. In the case of 481.wrf,
we found that the performance of GCC 8 + glibc 2.26 (i.e. the
"out-of-the-box" GNU toolchain) is about 70% of ICC's. Simply linking
against AMD's libm got us to 83%. When we instructed GCC to generate
calls to Intel's SVML library and linked against it, we got to 91%.
Using both SVML and AMD's libm, we reached 93%. That means there is
likely still 7% to be gained from cleverer optimizations in GCC, but
the real problem is in GNU libm. 481.wrf is perhaps the most extreme
example, but definitely not the only one.

Martin
