I think there is a lot of good discussion on this thread.  Let me try to
summarize and give my thoughts:

1.  Will this be a hard rule?
No, based on the feedback, I think there will always be some subjectivity
(e.g. the tradeoff against increased code size and reduced
maintainability).  My intent here was to reduce cognitive load on reviewers
and give contributors clear guidance on how performance PRs will be
evaluated.

2.  Why don't we track all of them?
We should be tracking all of them.  I think the question is what happens
when compilers diverge on benchmarks or real-world use: how do we decide
whether the PR should be merged?  This goes back to point #1; having some
basic guidelines is useful.  I would imagine we might already be "nerfing"
users on older Linux distros with older C++ toolchains.  Going back to the
point above, it would be good to have a set of criteria for how much effort
we invest in avoiding or fixing regressions in that case.

I would hope that modern compilers will converge over time.  Looking at
GCC/Clang specifically, GCC 8 vs. Clang 8 is perhaps the wrong comparison:
Clang 8 appears to have been released 8 months after GCC 8, while GCC 9 was
released 2 months after Clang 8.

3.  Microbenchmarks don't tell the whole story? (we shouldn't overfit)
I agree, but we should have at least a few more macro-style benchmarks that
can help judge overall impact (a rough sketch of what I mean is below).
This is also where the other criteria come into play in deciding whether to
merge.
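
To make that concrete, here is a minimal sketch (Google Benchmark style,
which our C++ benchmarks already use) of the micro- vs. macro-style pair I
have in mind.  The Sum kernel and the scan pipeline are hypothetical
placeholders, not existing Arrow APIs:

// micro_vs_macro_benchmark.cc -- illustrative only; link against
// Google Benchmark (benchmark::benchmark).
#include <cstdint>
#include <numeric>
#include <vector>

#include <benchmark/benchmark.h>

// Hypothetical hot kernel: sum a contiguous buffer of int64 values.
static int64_t Sum(const std::vector<int64_t>& values) {
  return std::accumulate(values.begin(), values.end(), int64_t{0});
}

// Micro-style: isolates one kernel, so the result is very sensitive to
// what a particular compiler does with this single loop.
static void BM_SumKernel(benchmark::State& state) {
  std::vector<int64_t> values(state.range(0), 1);
  for (auto _ : state) {
    benchmark::DoNotOptimize(Sum(values));
  }
  state.SetItemsProcessed(state.iterations() * state.range(0));
}
BENCHMARK(BM_SumKernel)->Arg(1 << 20);

// Macro-style: a small end-to-end pipeline (materialize "batches", then
// aggregate), so no single compiler optimization dominates the number.
static void BM_ScanAndSum(benchmark::State& state) {
  const int64_t num_batches = 64;
  std::vector<std::vector<int64_t>> batches(
      num_batches, std::vector<int64_t>(state.range(0), 1));
  for (auto _ : state) {
    int64_t total = 0;
    for (const auto& batch : batches) {
      total += Sum(batch);
    }
    benchmark::DoNotOptimize(total);
  }
  state.SetItemsProcessed(state.iterations() * num_batches * state.range(0));
}
BENCHMARK(BM_ScanAndSum)->Arg(1 << 16);

BENCHMARK_MAIN();

If a PR moves the micro number but not the macro one (or vice versa), that
gap is itself useful input for the merge decision.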

4.  Which compiler to choose?
I agree with the sentiment that we should choose a compiler that reflects
the performance of our prepackaged binaries.  Based on this thread, I'm a
little confused as to whether we can build more of our code with a single
compiler across the distribution channels (it sounds like that might be
viable with clang).  But IIUC most of our builds, on Linux at least, use
GCC.  Are we even using consistent versions of GCC across our packages?  (A
tiny probe like the sketch below could help audit that.)  I think it is a
bad outcome for our users (and potentially a headache for us) if we
announce that X features improved by Y% and users actually see a
regression.
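
On the consistency question, one low-tech way to audit it (just a sketch,
nothing Arrow-specific) would be to compile and run a tiny probe with each
packaging toolchain and record the output alongside the build logs:

// toolchain_probe.cc -- hypothetical probe: build it with the same
// toolchain each packaging job uses so we can see which compiler and
// version actually produced each distribution channel.
#include <cstdio>

int main() {
#if defined(__clang__)
  std::printf("clang %d.%d.%d\n", __clang_major__, __clang_minor__,
              __clang_patchlevel__);
#elif defined(__GNUC__)
  std::printf("gcc %d.%d.%d\n", __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
  std::printf("msvc %d\n", _MSC_VER);
#else
  std::printf("unknown compiler\n");
#endif
  return 0;
}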

Thanks,
Micah

On Mon, Jun 22, 2020 at 10:08 AM Wes McKinney <wesmck...@gmail.com> wrote:

> For the curious, I ran benchmarks for both gcc-8 and clang-8 on my
> laptop for the ARROW-9197 patch
>
> * Clang https://github.com/apache/arrow/pull/7506#issuecomment-647633470
> * GCC https://github.com/apache/arrow/pull/7506#issuecomment-647649670
>
> It's apparent in this particular instance that GCC was applying some
> optimizations pretty well outside of what Clang was doing. The
> baseline performance was 5-6x faster on GCC than on Clang for certain
> cases. Very strange indeed.
>
> On Mon, Jun 22, 2020 at 11:48 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> >
> > FTR, by default R on macOS uses system clang, not homebrew; for Windows, it
> > is gcc (unless you're adventurous like Uwe and can hack it to work with
> > clang ;); and on Linux, CRAN checks both gcc and clang.
> >
> > Is it unreasonable to check *both* gcc and clang when there's a potentially
> > significant performance change? It sounded like recently there have been
> > cases where a change makes a big speedup with one compiler but a slowdown
> > with the other, so if we only test with one, we'll be degrading with the
> > other over time. But if we test both (simplifying of course, since there
> > are multiple versions, etc.), then at least we'd be more aware of what the
> > tradeoff is.
> >
> > Neal
> >
> >
> > On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
> >
> > > We should aim to improve the performance of the most widely used
> > > *default* packages, which are python pip, python conda and R (all
> > > platforms). AFAIK, both pip (manylinux) and conda use gcc on Linux by
> > > default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> > > (haven't checked) that clang is used on OSX via brew. Thus, by
> > > default, almost all users are going to use a gcc compiled version of
> > > arrow on Linux.
> > >
> > > François
> > >
> > > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > Based on some of my performance work recently, I'm growing
> > > > uncomfortable with using gcc as the performance baseline since the
> > > > results can be significantly different (sometimes 3-4x or more on
> > > > certain fast algorithms) from clang and MSVC. The perf results on
> > > > https://github.com/apache/arrow/pull/7506 were really surprising --
> > > > some benchmarks that showed 2-5x performance improvement on both clang
> > > > and MSVC showed small regressions (20-30%) with gcc.
> > > >
> > > > I don't think we need a hard-and-fast rule about whether to accept PRs
> > > > based on benchmarks, but there are a few guiding criteria:
> > > >
> > > > * How much binary size does the new code add? I think many of us would
> > > > agree that a 20% performance increase on some algorithm might not be
> > > > worth adding 500KB to libarrow.so
> > > > * Is the code generally faster across the major compiler targets (gcc,
> > > > clang, MSVC)?
> > > >
> > > > I think that using clang as a baseline for informational benchmarks
> > > > would be good, but ultimately we need to be systematically collecting
> > > > data on all the major compilers. Some time ago I proposed building a
> > > > Continuous Benchmarking framework
> > > > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > > for use with Arrow (and outside of Arrow, too) so I hope that this
> > > > will be able to help.
> > > >
> > > > - Wes
> > > >
> > > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yibo....@arm.com> wrote:
> > > > >
> > > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > > >
> > > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > > >> There has been significant effort recently trying to optimize our C++
> > > > > >> code.  One thing that seems to come up frequently is different benchmark
> > > > > >> results between GCC and Clang.  Even different versions of the same
> > > > > >> compiler can yield significantly different results on the same code.
> > > > > >>
> > > > > >> I would like to propose that we choose a specific compiler and version on
> > > > > >> Linux for evaluating performance related PRs.  PRs would only be accepted
> > > > > >> if they improve the benchmarks under the selected version.
> > > > > >
> > > > > > Would this be a hard rule or just a guideline?  There are many ways in
> > > > > > which benchmark numbers can be improved or deteriorated by a PR, and in
> > > > > > some cases that doesn't matter (benchmarks are not always realistic, and
> > > > > > they are not representative of every workload).
> > > > > >
> > > > >
> > > > > I agree that microbenchmarks are not always useful; focusing too much on
> > > > > improving microbenchmark results gives me a feeling of "overfit" (to some
> > > > > specific microarchitecture, compiler, or use case).
> > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > >
>
