Re: How to get GCC on par with ICC?
On 11/06/18 11:05, Martin Jambor wrote:
>> The int rate numbers (running 1 copy only) were not too bad, GCC was
>> only about 2% slower and only 525.x264_r seemed way slower with GCC.
>> The fp rate numbers (again only 1 copy) showed a larger difference,
>> around 20%. 521.wrf_r was more than twice as slow when compiled with
>> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
>> significant slowdowns when compiled with GCC vs. ICC.
>
> Keep in mind that when discussing FP benchmarks, the used math library
> can be (almost) as important as the compiler. In the case of 481.wrf,
> we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU)
> performance is about 70% of ICC's. When we just linked against AMD's
> libm, we got to 83%. When we instructed GCC to generate calls to Intel's
> SVML library and linked against it, we got to 91%. Using both SVML and
> AMD's libm, we achieved 93%.

i think glibc 2.27 should outperform amd's libm on wrf (since i upstreamed the single precision code from https://github.com/ARM-software/optimized-routines/ ). the 83% -> 93% diff is because gcc fails to vectorize math calls in fortran to libmvec calls.

> That means that there likely still is 7% to be gained from more clever
> optimizations in GCC but the real problem is in GNU libm. And 481.wrf
> is perhaps the most extreme example but definitely not the only one.

there is no longer a problem in gnu libm for the most common single precision calls, and if things go well then glibc 2.28 will get double precision improvements too. but gcc has to learn how to use libmvec in fortran.
Re: How to get GCC on par with ICC?
On Wed, 2018-06-20 at 17:11 -0400, NightStrike wrote:
> If I could perhaps jump in here for a moment... Just today I hit upon
> a series of small (in lines of code) loops that gcc can't vectorize,
> and intel vectorizes like a madman. They all involve a lot of heavy
> use of std::vector>. Comparisons were with gcc
> 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running
> as sched_FIFO, mlockall, affinity set to its own core, and all
> interrupts vectored off that core. So, as close to not-noisy as
> possible.

There are quite a number of bugzilla reports with examples where GCC does not vectorize a loop. I wonder if this example is related to PR 61247.

Steve Ellcey
Re: How to get GCC on par with ICC?
On Wed, Jun 20, 2018 at 11:12 PM NightStrike wrote: > > On Wed, Jun 6, 2018 at 11:57 AM, Joel Sherrill wrote: > > > > On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel < > > pmenzel+gcc.gnu@molgen.mpg.de> wrote: > > > > > Dear GCC folks, > > > > > > > > > Some scientists in our organization still want to use the Intel compiler, > > > as they say, it produces faster code, which is then executed on clusters. > > > Some resources on the Web [1][2] confirm this. (I am aware, that it’s > > > heavily dependent on the actual program.) > > > > > > > Do they have specific examples where icc is better for them? Or can point > > to specific GCC PRs which impact them? > > > > > > GCC versions? > > > > Are there specific CPU model variants of concern? > > > > What flags are used to compile? Some times a bit of advice can produce > > improvements. > > > > Without specific examples, it is hard to set goals. > > If I could perhaps jump in here for a moment... Just today I hit upon > a series of small (in lines of code) loops that gcc can't vectorize, > and intel vectorizes like a madman. They all involve a lot of heavy > use of std::vector>. Comparisons were with gcc Ick - C++ ;) > 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running > as sched_FIFO, mlockall, affinity set to its own core, and all > interrupts vectored off that core. So, as close to not-noisy as > possible. > > I was surprised at the results results, but using each compiler's methods of > dumping vectorization info, intel wins on two points: > > 1) It actually vectorizes > 2) It's vectorizing output is much more easily readable > > Options were: > > gcc -Wall -ggdb3 -std=gnu++17 -flto -Ofast -march=native > > vs: > > icc -Ofast -std=gnu++14 > > > So, not exactly exact, but pretty close. 
> > > So here's an example of a chunk of code (not very readable, sorry > about that) that intel can vectorize, and subsequently make about 50% > faster: > > std::size_t nLayers { input.nn.size() }; > //std::size_t ySize = std::max_element(input.nn.cbegin(), > input.nn.cend(), [](auto a, auto b){ return a.size() < b.size(); > })->size(); > std::size_t ySize = 0; > for (auto const & nn: input.nn) > ySize = std::max(ySize, nn.size()); > > float yNorm[ySize]; > for (auto & y: yNorm) > y = 0.0f; > for (std::size_t i = 0; i < xSize; ++i) > yNorm[i] = xNorm[i]; > for (std::size_t layer = 0; layer < nLayers; ++layer) { > auto & nn = input.nn[layer]; > auto & b = nn.back(); > float y[ySize]; > for (std::size_t i = 0; i < nn[0].size(); ++i) { > y[i] = b[i]; > for (std::size_t j = 0; j < nn.size() - 1; ++j) > y[i] += nn.at(j).at(i) * yNorm[j]; > } > for (std::size_t i = 0; i < ySize; ++i) { > if (layer < nLayers - 1) > y[i] = std::max(y[i], 0.0f); > yNorm[i] = y[i]; > } > } > > > If I was better at godbolt, I could show the asm, but I'm not. I'm > willing to learn, though. A compilable testcase would be more useful - just file a bugzilla. Richard.
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 11:57 AM, Joel Sherrill wrote:
> On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel <pmenzel+gcc.gnu@molgen.mpg.de> wrote:
>> Dear GCC folks,
>>
>> Some scientists in our organization still want to use the Intel compiler,
>> as they say, it produces faster code, which is then executed on clusters.
>> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
>> heavily dependent on the actual program.)
>
> Do they have specific examples where icc is better for them? Or can point
> to specific GCC PRs which impact them?
>
> GCC versions?
>
> Are there specific CPU model variants of concern?
>
> What flags are used to compile? Sometimes a bit of advice can produce
> improvements.
>
> Without specific examples, it is hard to set goals.

If I could perhaps jump in here for a moment... Just today I hit upon a series of small (in lines of code) loops that gcc can't vectorize, and intel vectorizes like a madman. They all involve a lot of heavy use of std::vector>. Comparisons were with gcc 8.1, intel 2018.u1, an AMD Opteron 6386 SE, with the program running as sched_FIFO, mlockall, affinity set to its own core, and all interrupts vectored off that core. So, as close to not-noisy as possible.

I was surprised at the results, but using each compiler's methods of dumping vectorization info, intel wins on two points:

1) It actually vectorizes
2) Its vectorization output is much more easily readable

Options were:

    gcc -Wall -ggdb3 -std=gnu++17 -flto -Ofast -march=native

vs:

    icc -Ofast -std=gnu++14

So, not exactly exact, but pretty close.
So here's an example of a chunk of code (not very readable, sorry about that) that intel can vectorize, and subsequently make about 50% faster:

    std::size_t nLayers { input.nn.size() };
    //std::size_t ySize = std::max_element(input.nn.cbegin(), input.nn.cend(),
    //    [](auto a, auto b){ return a.size() < b.size(); })->size();
    std::size_t ySize = 0;
    for (auto const & nn: input.nn)
        ySize = std::max(ySize, nn.size());

    float yNorm[ySize];
    for (auto & y: yNorm)
        y = 0.0f;
    for (std::size_t i = 0; i < xSize; ++i)
        yNorm[i] = xNorm[i];
    for (std::size_t layer = 0; layer < nLayers; ++layer) {
        auto & nn = input.nn[layer];
        auto & b = nn.back();
        float y[ySize];
        for (std::size_t i = 0; i < nn[0].size(); ++i) {
            y[i] = b[i];
            for (std::size_t j = 0; j < nn.size() - 1; ++j)
                y[i] += nn.at(j).at(i) * yNorm[j];
        }
        for (std::size_t i = 0; i < ySize; ++i) {
            if (layer < nLayers - 1)
                y[i] = std::max(y[i], 0.0f);
            yNorm[i] = y[i];
        }
    }

If I was better at godbolt, I could show the asm, but I'm not. I'm willing to learn, though.
Re: How to get GCC on par with ICC?
On Fri, 15 Jun 2018, Jeff Law wrote: > And resolution on -fno-math-errno as the default. Setting errno can be > more expensive than people realize. I don't think I saw any version of the -fno-math-errno patch proposal that included the testsuite updates I'd expect. Certainly gcc.dg/torture/pr68264.c tests libm functions setting errno and would need to use -fmath-errno explicitly, but it seems likely there are other tests involving built-in functions that in fact only test what they're intended to test given -fmath-errno; tests using libm functions without explicit -ffast-math / -fmath-errno / -fno-math-errno would need review (and there should be new tests for optimizations that are only valid given -fno-math-errno). -- Joseph S. Myers jos...@codesourcery.com
Re: How to get GCC on par with ICC?
On 06/15/2018 05:39 AM, Wilco Dijkstra wrote: > Martin wrote: > >> Keep in mind that when discussing FP benchmarks, the used math library >> can be (almost) as important as the compiler. In the case of 481.wrf, >> we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU) >> performance is about 70% of ICC's. When we just linked against AMD's >> libm, we got to 83%. When we instructed GCC to generate calls to Intel's >> SVML library and linked against it, we got to 91%. Using both SVML and >> AMD's libm, we achieved 93%. >> >> That means that there likely still is 7% to be gained from more clever >> optimizations in GCC but the real problem is in GNU libm. And 481.wrf >> is perhaps the most extreme example but definitely not the only one. > > You really should retry with GLIBC 2.27 since several key math functions were > rewritten from scratch by Szabolcs Nagy (all in generic C code), resulting in > huge > performance gains on all targets (eg. wrf improved over 50%). > > I fixed several double precision functions in current GLIBC to avoid > extremely bad > performance which had been complained about for years. There are more math > functions on the way, so the GNU libm will not only catch up, but become the > fastest > math library available. And resolution on -fno-math-errno as the default. Setting errno can be more expensive than people realize. Jeff
Re: How to get GCC on par with ICC?
Martin wrote: > Keep in mind that when discussing FP benchmarks, the used math library > can be (almost) as important as the compiler. In the case of 481.wrf, > we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU) > performance is about 70% of ICC's. When we just linked against AMD's > libm, we got to 83%. When we instructed GCC to generate calls to Intel's > SVML library and linked against it, we got to 91%. Using both SVML and > AMD's libm, we achieved 93%. > > That means that there likely still is 7% to be gained from more clever > optimizations in GCC but the real problem is in GNU libm. And 481.wrf > is perhaps the most extreme example but definitely not the only one. You really should retry with GLIBC 2.27 since several key math functions were rewritten from scratch by Szabolcs Nagy (all in generic C code), resulting in huge performance gains on all targets (eg. wrf improved over 50%). I fixed several double precision functions in current GLIBC to avoid extremely bad performance which had been complained about for years. There are more math functions on the way, so the GNU libm will not only catch up, but become the fastest math library available. Wilco
Re: How to get GCC on par with ICC?
Hi Steve,

On Fri, Jun 08 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share? Compiler versions, options used,
> hardware, etc. Also, were there any tests that stood out in terms of
> icc outperforming GCC?

Mostly AMD Ryzen, GCC 8 vs ICC 18. We were comparing a few combinations of options. When we compared ICC's and our -Ofast (with or without native GCC march/mtune, and a set of ICC options that hopefully generate the best code for Ryzen), we found out that without LTO/IPO, GCC is actually slightly ahead of ICC on integer benchmarks (both SPEC 2006 and 2017). Floating-point results were a more mixed bag (mostly because ICC performed surprisingly poorly without IPO on a few) but at least on SPEC 2017, they were clearly better... with a caveat, see below my comment about wrf.

With LTO/IPO, ICC can perform a few memory-reorg tricks that push them quite a bit ahead of us, but I'm not convinced they can perform these transformations on much source code that happens not to be a well-known benchmark. So I'd recommend always looking at non-IPO numbers too.

> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

Please try with -Ofast too.
The main reason is that -O3 does not imply -ffast-math and the performance gain from it is often very big (and I suspect the 525.x264_r difference is because of that). Alternatively, if your own workloads require high-precision floating-point math, you have to force ICC to use it to get a fair comparison. -Ofast also turns on -fno-protect-parens and -fstack-arrays that also help a few benchmarks a lot but note that you may need to set large stack ulimit for them not to crash (but ICC does the same thing, as far as we know). > > The int rate numbers (running 1 copy only) were not too bad, GCC was > only about 2% slower and only 525.x264_r seemed way slower with GCC. > The fp rate numbers (again only 1 copy) showed a larger difference, > around 20%. 521.wrf_r was more than twice as slow when compiled with > GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed > significant slowdowns when compiled with GCC vs. ICC. > Keep in mind that when discussing FP benchmarks, the used math library can be (almost) as important as the compiler. In the case of 481.wrf, we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU) performance is about 70% of ICC's. When we just linked against AMD's libm, we got to 83%. When we instructed GCC to generate calls to Intel's SVML library and linked against it, we got to 91%. Using both SVML and AMD's libm, we achieved 93%. That means that there likely still is 7% to be gained from more clever optimizations in GCC but the real problem is in GNU libm. And 481.wrf is perhaps the most extreme example but definitely not the only one. Martin
Re: How to get GCC on par with ICC?
On Fri, 8 Jun 2018, Steve Ellcey wrote:
> On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
>> When we do our own comparisons of GCC vs. ICC on benchmarks
>> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
>> (in fact it even trails in some benchmarks) unless you get to
>> "SPEC tricks" like data structure re-organization optimizations that
>> probably never apply in practice on real-world code (and people
>> should fix such things at the source level being pointed at them
>> via actually profiling their codes).
>
> Richard,
>
> I was wondering if you have any more details about these comparisons
> you have done that you can share? Compiler versions, options used,
> hardware, etc. Also, were there any tests that stood out in terms of
> icc outperforming GCC?
>
> I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and
> a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4).
> I used '-xHost -O3' for icc and '-march=native -mtune=native -O3'
> for gcc.

You should use -Ofast for gcc. As mentioned earlier in the discussion, ICC has some equivalent of -ffast-math by default.

> The int rate numbers (running 1 copy only) were not too bad, GCC was
> only about 2% slower and only 525.x264_r seemed way slower with GCC.
> The fp rate numbers (again only 1 copy) showed a larger difference,
> around 20%. 521.wrf_r was more than twice as slow when compiled with
> GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed
> significant slowdowns when compiled with GCC vs. ICC.

--
Marc Glisse
Re: How to get GCC on par with ICC?
On Thu, 2018-06-07 at 12:01 +0200, Richard Biener wrote:
> When we do our own comparisons of GCC vs. ICC on benchmarks
> like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC
> (in fact it even trails in some benchmarks) unless you get to
> "SPEC tricks" like data structure re-organization optimizations that
> probably never apply in practice on real-world code (and people
> should fix such things at the source level being pointed at them
> via actually profiling their codes).

Richard,

I was wondering if you have any more details about these comparisons you have done that you can share? Compiler versions, options used, hardware, etc. Also, were there any tests that stood out in terms of icc outperforming GCC?

I did a compare of SPEC 2017 rate using GCC 8.* (pre release) and a recent ICC (2018.0.128?) on my desktop (Xeon CPU E5-1650 v4). I used '-xHost -O3' for icc and '-march=native -mtune=native -O3' for gcc.

The int rate numbers (running 1 copy only) were not too bad, GCC was only about 2% slower and only 525.x264_r seemed way slower with GCC. The fp rate numbers (again only 1 copy) showed a larger difference, around 20%. 521.wrf_r was more than twice as slow when compiled with GCC instead of ICC and 503.bwaves_r and 510.parest_r also showed significant slowdowns when compiled with GCC vs. ICC.

Steve Ellcey
sell...@cavium.com
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 5:52 PM Paul Menzel wrote:
> Dear GCC folks,
>
> Some scientists in our organization still want to use the Intel
> compiler, as they say, it produces faster code, which is then executed
> on clusters. Some resources on the Web [1][2] confirm this. (I am aware,
> that it’s heavily dependent on the actual program.)
>
> My question is, is it realistic, that GCC could catch up and that the
> scientists will start to use it over Intel’s compiler? Or will Intel
> developers always have the lead, because they have secret documentation
> and direct contact with the processor designers?

They will of course have an edge in timing when supporting a new architecture because they have access to NDA material and hardware. For example the OSS community doesn't yet have access to any AVX512 capable machine (speaking of the GNU compile-farm), and those are prohibitively expensive for a private contributor. Similar stories apply to access to proprietary benchmarks, or simply to having the resources to continuously work with folks in HPC to make sure ICC works great for their codes.

> If it is realistic, how can we get there? Would first the program be
> written, and then the compiler be optimized for that? Or are just more
> GCC developers needed?

I think a big part of the story is perception and training. This means that, for example, a coherent and up-to-date source of information on how to use GCC in an HPC environment (optimizing your code, recommended compiler options, pitfalls to avoid, etc.) is desperately missing.

When we do our own comparisons of GCC vs. ICC on benchmarks like SPEC CPU 2006/2017 ICC doesn't have a big lead over GCC (in fact it even trails in some benchmarks) unless you get to "SPEC tricks" like data structure re-organization optimizations that probably never apply in practice on real-world code (and people should fix such things at the source level being pointed at them via actually profiling their codes).
In my own experience which dates back nearly 15 years now ICC is buggy (generates wrong-code / simulation results) and cannot compile a "simple" C++ program ;) This made me start working on GCC. Note that the very best strength of GCC is the first-class high-quality (insert more buzzwords here) support infrastructure if you actually run into issues with the compiler! Even when using paid ICC I never got timely fixes (if at all) for wrong-code issues I reported to them! I've separately replied to specific points in other posts where ICC has an edge over GCC. Richard. > > Kind regards, > > Paul > > > [1]: https://colfaxresearch.com/compiler-comparison/ > [2]: > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf >
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 8:31 PM Ryan Burn wrote:
> One case where ICC can generate much faster code sometimes is by using
> the nontemporal pragma [https://software.intel.com/en-us/node/524559]
> with loops.
>
> AFAIK, there's no such equivalent pragma in gcc
> [https://gcc.gnu.org/ml/gcc/2012-01/msg00028.html].
>
> When I tried this simple example
> https://github.com/rnburn/square_timing/blob/master/bench.cpp that
> measures times for this loop:
>
> void compute(const double* x, index_t N, double* y) {
>     #pragma vector nontemporal
>     for (index_t i = 0; i < N; ++i)
>         y[i] = x[i] * x[i];
> }
>
> with and without nontemporal I got these times (N = 1,000,000)
>
> Temporal       1,042,080
> Non-Temporal     538,842
>
> So running with the non-temporal pragma was nearly twice as fast.
>
> An equivalent non-temporal pragma for GCC would, IMO, certainly be a
> very good feature to add.

GCC has robust infrastructure for loop pragmas now, but the set of pragmas available isn't very big. It would be interesting to know which ICC ones people use regularly so we can support those in GCC as well.

Note that using #pragmas is very much hand-optimizing the code for the compiler you use - something that is possible for GCC as well.

Richard.

> On Wed, Jun 6, 2018 at 12:22 PM, Dmitry Mikushin wrote:
> > Dear Paul,
> >
> > The opinion you've mentioned is common in scientific community. However, in
> > more detail it often surfaces that the used set of GCC compiler options
> > simply does not correspond to that "fast" version of Intel. For instance,
> > when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> > -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> > introduces significant performance gap.
> > > > Kind regards, > > - Dmitry Mikushin | Applied Parallel Computing LLC | > > https://parallel-computing.pro > > > > > > 2018-06-06 18:51 GMT+03:00 Paul Menzel : > > > >> Dear GCC folks, > >> > >> > >> Some scientists in our organization still want to use the Intel compiler, > >> as they say, it produces faster code, which is then executed on clusters. > >> Some resources on the Web [1][2] confirm this. (I am aware, that it’s > >> heavily dependent on the actual program.) > >> > >> My question is, is it realistic, that GCC could catch up and that the > >> scientists will start to use it over Intel’s compiler? Or will Intel > >> developers always have the lead, because they have secret documentation and > >> direct contact with the processor designers? > >> > >> If it is realistic, how can we get there? Would first the program be > >> written, and then the compiler be optimized for that? Or are just more GCC > >> developers needed? > >> > >> > >> Kind regards, > >> > >> Paul > >> > >> > >> [1]: https://colfaxresearch.com/compiler-comparison/ > >> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679 > >> .1280&rep=rep1&type=pdf > >> > >>
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 11:10 PM Zan Lynx wrote:
> On 06/06/2018 10:22 AM, Dmitry Mikushin wrote:
> > The opinion you've mentioned is common in scientific community. However, in
> > more detail it often surfaces that the used set of GCC compiler options
> > simply does not correspond to that "fast" version of Intel. For instance,
> > when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> > -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> > introduces significant performance gap.
>
> Please note that if your compute cluster uses different models of CPU,
> be extremely careful with -march=native.
>
> I've been bitten by it in VMs, several times. Unless you always run on
> the same system that did the build, you are running a risk of illegal
> instructions.

Yes. Note this is where ICC has an advantage because it supports automagically doing runtime versioning based on the CPU instruction set for vectorized loops. We only support that in an awkward explicit way (the manual talks about this in the 'Function Multiversioning' section).

But in the end it's just a "detail" that can be worked around with a little inconvenience ;) (I've yet to see a heterogeneous cluster where the instruction set differences make a performance difference over choosing the lowest common one)

Richard.

> --
> Knowledge is Power -- Power Corrupts
> Study Hard -- Be Evil
Re: How to get GCC on par with ICC?
On 06/06/2018 10:22 AM, Dmitry Mikushin wrote: > The opinion you've mentioned is common in scientific community. However, in > more detail it often surfaces that the used set of GCC compiler options > simply does not correspond to that "fast" version of Intel. For instance, > when you do "-O3" for Intel it actually corresponds to (at least) "-O3 > -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously > introduces significant performance gap. > Please note that if your compute cluster uses different models of CPU, be extremely careful with -march=native. I've been bitten by it in VMs, several times. Unless you always run on the same system that did the build, you are running a risk of illegal instructions. -- Knowledge is Power -- Power Corrupts Study Hard -- Be Evil
Re: How to get GCC on par with ICC?
One case where ICC can generate much faster code sometimes is by using the nontemporal pragma [https://software.intel.com/en-us/node/524559] with loops.

AFAIK, there's no such equivalent pragma in gcc [https://gcc.gnu.org/ml/gcc/2012-01/msg00028.html].

When I tried this simple example https://github.com/rnburn/square_timing/blob/master/bench.cpp that measures times for this loop:

    void compute(const double* x, index_t N, double* y) {
        #pragma vector nontemporal
        for (index_t i = 0; i < N; ++i)
            y[i] = x[i] * x[i];
    }

with and without nontemporal I got these times (N = 1,000,000)

    Temporal       1,042,080
    Non-Temporal     538,842

So running with the non-temporal pragma was nearly twice as fast.

An equivalent non-temporal pragma for GCC would, IMO, certainly be a very good feature to add.

On Wed, Jun 6, 2018 at 12:22 PM, Dmitry Mikushin wrote:
> Dear Paul,
>
> The opinion you've mentioned is common in scientific community. However, in
> more detail it often surfaces that the used set of GCC compiler options
> simply does not correspond to that "fast" version of Intel. For instance,
> when you do "-O3" for Intel it actually corresponds to (at least) "-O3
> -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously
> introduces significant performance gap.
>
> Kind regards,
> - Dmitry Mikushin | Applied Parallel Computing LLC |
> https://parallel-computing.pro
>
> 2018-06-06 18:51 GMT+03:00 Paul Menzel :
>
>> Dear GCC folks,
>>
>> Some scientists in our organization still want to use the Intel compiler,
>> as they say, it produces faster code, which is then executed on clusters.
>> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
>> heavily dependent on the actual program.)
>>
>> My question is, is it realistic, that GCC could catch up and that the
>> scientists will start to use it over Intel’s compiler? Or will Intel
>> developers always have the lead, because they have secret documentation and
>> direct contact with the processor designers?
>>
>> If it is realistic, how can we get there? Would first the program be
>> written, and then the compiler be optimized for that? Or are just more GCC
>> developers needed?
>> >> >> Kind regards, >> >> Paul >> >> >> [1]: https://colfaxresearch.com/compiler-comparison/ >> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679 >> .1280&rep=rep1&type=pdf >> >>
Re: How to get GCC on par with ICC?
Dear Paul,

The opinion you've mentioned is common in the scientific community. However, on closer inspection it often turns out that the set of GCC compiler options used simply does not correspond to that "fast" configuration of Intel. For instance, "-O3" for Intel actually corresponds to (at least) "-O3 -ffast-math -march=native" of GCC. Omitting "-ffast-math" obviously introduces a significant performance gap.

Kind regards,
- Dmitry Mikushin | Applied Parallel Computing LLC | https://parallel-computing.pro

2018-06-06 18:51 GMT+03:00 Paul Menzel :
> Dear GCC folks,
>
> Some scientists in our organization still want to use the Intel compiler,
> as they say, it produces faster code, which is then executed on clusters.
> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
> heavily dependent on the actual program.)
>
> My question is, is it realistic, that GCC could catch up and that the
> scientists will start to use it over Intel’s compiler? Or will Intel
> developers always have the lead, because they have secret documentation and
> direct contact with the processor designers?
>
> If it is realistic, how can we get there? Would first the program be
> written, and then the compiler be optimized for that? Or are just more GCC
> developers needed?
>
> Kind regards,
>
> Paul
>
> [1]: https://colfaxresearch.com/compiler-comparison/
> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 3:51 PM, Paul Menzel wrote:
> Dear GCC folks,
>
> Some scientists in our organization still want to use the Intel compiler, as
> they say, it produces faster code, which is then executed on clusters. Some
> resources on the Web [1][2] confirm this. (I am aware, that it’s heavily
> dependent on the actual program.)
>
> My question is, is it realistic, that GCC could catch up and that the
> scientists will start to use it over Intel’s compiler? Or will Intel
> developers always have the lead, because they have secret documentation and
> direct contact with the processor designers?
>
> If it is realistic, how can we get there? Would first the program be
> written, and then the compiler be optimized for that? Or are just more GCC
> developers needed?

There are developers actually working on performance optimization in GCC, so you are not the only one :). As an open-source compiler project we do lack resources, so more developers are always good. As Joel pointed out, typical/reduced workloads showing the performance gap are very important for our developers, as well as for attracting new developers. We can probably open a meta-bug for tracking these if you have many such example workloads.

Thanks,
bin

> Kind regards,
>
> Paul
>
> [1]: https://colfaxresearch.com/compiler-comparison/
> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf
Re: How to get GCC on par with ICC?
Dear Joel,

Thank you for your quick reply.

On 06/06/18 17:57, Joel Sherrill wrote:
> On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel wrote:
>> Some scientists in our organization still want to use the Intel compiler,
>> as they say, it produces faster code, which is then executed on clusters.
>> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
>> heavily dependent on the actual program.)
>
> Do they have specific examples where icc is better for them? Or can point
> to specific GCC PRs which impact them?
>
> GCC versions?
>
> Are there specific CPU model variants of concern?
>
> What flags are used to compile? Sometimes a bit of advice can produce
> improvements.
>
> Without specific examples, it is hard to set goals.

I could get such examples, but it will take some time, as it’s from other institutes. The clusters use exclusively Intel processors. (Hopefully, that will change.)

I also found the article from the German Linux-Magazin in an English version at the ADMIN Magazine [3]. The German article made an even stronger statement, namely that they use the Intel compilers for performance reasons.

>> My question is, is it realistic, that GCC could catch up and that the
>> scientists will start to use it over Intel’s compiler? Or will Intel
>> developers always have the lead, because they have secret documentation
>> and direct contact with the processor designers?
>>
>> If it is realistic, how can we get there? Would first the program be
>> written, and then the compiler be optimized for that? Or are just more
>> GCC developers needed?
>
> For sure examples are needed so there are test cases to use for reference.
>
> If you want anything improved in any free software project, sponsoring
> developers is always a good thing. If you sponsor the right developers. :)

That’s what I hoped for, but didn’t ask here. If you could point me to a list of possible contractors, that would be great. Please keep in mind, that in my organization certain decisions are made *very* slowly.
I’ll try to get answers quickly, but procuring finances might take longer (half a year or much longer). Kind regards, Paul [1]: https://colfaxresearch.com/compiler-comparison/ [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf [3] http://www.admin-magazine.com/HPC/Articles/Selecting-Compilers-for-a-Supercomputer "HPC Compilers"
Re: How to get GCC on par with ICC?
On Wed, Jun 6, 2018 at 10:51 AM, Paul Menzel <pmenzel+gcc.gnu@molgen.mpg.de> wrote:
> Dear GCC folks,
>
> Some scientists in our organization still want to use the Intel compiler,
> as they say, it produces faster code, which is then executed on clusters.
> Some resources on the Web [1][2] confirm this. (I am aware, that it’s
> heavily dependent on the actual program.)

Do they have specific examples where icc is better for them? Or can point to specific GCC PRs which impact them?

GCC versions?

Are there specific CPU model variants of concern?

What flags are used to compile? Sometimes a bit of advice can produce improvements.

Without specific examples, it is hard to set goals.

> My question is, is it realistic, that GCC could catch up and that the
> scientists will start to use it over Intel’s compiler? Or will Intel
> developers always have the lead, because they have secret documentation and
> direct contact with the processor designers?
>
> If it is realistic, how can we get there? Would first the program be
> written, and then the compiler be optimized for that? Or are just more GCC
> developers needed?

For sure examples are needed so there are test cases to use for reference.

If you want anything improved in any free software project, sponsoring developers is always a good thing. If you sponsor the right developers. :)

I'm not discouraging you. I'm just trying to turn this into something actionable.

--joel sherrill

> Kind regards,
>
> Paul
>
> [1]: https://colfaxresearch.com/compiler-comparison/
> [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf
How to get GCC on par with ICC?
Dear GCC folks, Some scientists in our organization still want to use the Intel compiler, as they say, it produces faster code, which is then executed on clusters. Some resources on the Web [1][2] confirm this. (I am aware, that it’s heavily dependent on the actual program.) My question is, is it realistic, that GCC could catch up and that the scientists will start to use it over Intel’s compiler? Or will Intel developers always have the lead, because they have secret documentation and direct contact with the processor designers? If it is realistic, how can we get there? Would first the program be written, and then the compiler be optimized for that? Or are just more GCC developers needed? Kind regards, Paul [1]: https://colfaxresearch.com/compiler-comparison/ [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.679.1280&rep=rep1&type=pdf