For the curious, I ran benchmarks for both gcc-8 and clang-8 on my
laptop for the ARROW-9197 patch:
* Clang: https://github.com/apache/arrow/pull/7506#issuecomment-647633470
* GCC: https://github.com/apache/arrow/pull/7506#issuecomment-647649670

It's apparent in this particular instance that GCC was applying some
optimizations well outside of what Clang was doing. The baseline
performance was 5-6x faster with GCC than with Clang for certain
cases. Very strange indeed.
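To make that kind of divergence concrete, here is a minimal standalone
Google Benchmark sketch of the sort of tight reduction loop where
gcc-8 and clang-8 can emit very different code (the kernel and names
here are hypothetical, not taken from the ARROW-9197 patch):

  // Hypothetical micro-benchmark sketch, not the actual ARROW-9197 code.
  // Build the same file with g++-8 -O3 and clang++-8 -O3 and compare
  // the reported items_per_second to see compiler-dependent codegen gaps.
  #include <cstdint>
  #include <vector>

  #include <benchmark/benchmark.h>

  static void BM_SumInt64(benchmark::State& state) {
    // Input initialized to 1 so the expected sum is known.
    std::vector<int64_t> values(state.range(0), 1);
    for (auto _ : state) {
      int64_t sum = 0;
      for (int64_t v : values) {
        sum += v;  // simple reduction; prime target for auto-vectorization
      }
      benchmark::DoNotOptimize(sum);  // keep the loop from being elided
    }
    state.SetItemsProcessed(state.iterations() * state.range(0));
  }
  BENCHMARK(BM_SumInt64)->Arg(1 << 20);

  BENCHMARK_MAIN();

Whether a compiler auto-vectorizes a loop like this varies by compiler
and version, which is roughly the kind of effect visible in the numbers
linked above.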
On Mon, Jun 22, 2020 at 11:48 AM Neal Richardson
<neal.p.richard...@gmail.com> wrote:
>
> FTR, by default R on macOS uses system clang, not homebrew; for
> Windows, it is gcc (unless you're adventurous like Uwe and can hack
> it to work with clang ;); and on Linux, CRAN checks both gcc and
> clang.
>
> Is it unreasonable to check *both* gcc and clang when there's a
> potentially significant performance change? It sounded like there
> have recently been cases where a change makes a big speedup with one
> compiler but a slowdown with the other, so if we only test with one,
> we'll be degrading with the other over time. But if we test both
> (simplifying of course, since there are multiple versions, etc.),
> then at least we'd be more aware of what the tradeoff is.
>
> Neal
>
>
> On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques
> <fsaintjacq...@gmail.com> wrote:
>
> > We should aim to improve the performance of the most widely used
> > *default* packages, which are python pip, python conda, and R (all
> > platforms). AFAIK, both pip (manylinux) and conda use gcc on Linux
> > by default. R uses gcc on Linux and mingw (gcc) on Windows. I
> > suppose (haven't checked) that clang is used on OSX via brew.
> > Thus, by default, almost all users are going to use a gcc-compiled
> > version of arrow on Linux.
> >
> > François
> >
> > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > Based on some of my performance work recently, I'm growing
> > > uncomfortable with using gcc as the performance baseline since
> > > the results can be significantly different (sometimes 3-4x or
> > > more on certain fast algorithms) from clang and MSVC. The perf
> > > results on https://github.com/apache/arrow/pull/7506 were really
> > > surprising -- some benchmarks that showed 2-5x performance
> > > improvements on both clang and MSVC showed small regressions
> > > (20-30%) with gcc.
> > >
> > > I don't think we need a hard-and-fast rule about whether to
> > > accept PRs based on benchmarks, but there are a few guiding
> > > criteria:
> > >
> > > * How much binary size does the new code add? I think many of us
> > > would agree that a 20% performance increase on some algorithm
> > > might not be worth adding 500KB to libarrow.so
> > > * Is the code generally faster across the major compiler targets
> > > (gcc, clang, MSVC)?
> > >
> > > I think that using clang as a baseline for informational
> > > benchmarks would be good, but ultimately we need to be
> > > systematically collecting data on all the major compilers. Some
> > > time ago I proposed building a Continuous Benchmarking framework
> > > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > for use with Arrow (and outside of Arrow, too), so I hope that
> > > this will be able to help.
> > >
> > > - Wes
> > >
> > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yibo....@arm.com> wrote:
> > > >
> > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > >
> > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > >> There has been significant effort recently trying to
> > > > >> optimize our C++ code. One thing that seems to come up
> > > > >> frequently is different benchmark results between GCC and
> > > > >> Clang. Even different versions of the same compiler can
> > > > >> yield significantly different results on the same code.
> > > > >>
> > > > >> I would like to propose that we choose a specific compiler
> > > > >> and version on Linux for evaluating performance-related
> > > > >> PRs. PRs would only be accepted if they improve the
> > > > >> benchmarks under the selected version.
> > > > >
> > > > > Would this be a hard rule or just a guideline? There are
> > > > > many ways in which benchmark numbers can be improved or
> > > > > deteriorated by a PR, and in some cases that doesn't matter
> > > > > (benchmarks are not always realistic, and they are not
> > > > > representative of every workload).
> > > > >
> > > >
> > > > I agree that microbenchmarks are not always useful; focusing
> > > > too much on improving microbenchmark results gives me a
> > > > feeling of "overfitting" (to some specific microarchitecture,
> > > > compiler, or use case).
> > > >
> > > > > Regards
> > > > >
> > > > > Antoine.