FTR: We can use the latest(!) clang on all platforms for conda and wheels. It probably isn't even that complicated a setup.
On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> We should aim to improve the performance of the most widely used
> *default* packages, which are python pip, python conda and R (all
> platforms). AFAIK, both pip (manylinux wheels) and conda use gcc on
> Linux by default. R uses gcc on Linux and mingw (gcc) on Windows. I
> suppose (haven't checked) that clang is used on OSX via brew. Thus,
> by default, almost all users are going to use a gcc-compiled version
> of arrow on Linux.
>
> François
>
> On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > Based on some of my performance work recently, I'm growing
> > uncomfortable with using gcc as the performance baseline, since the
> > results can differ significantly (sometimes 3-4x or more on certain
> > fast algorithms) from clang and MSVC. The perf results on
> > https://github.com/apache/arrow/pull/7506 were really surprising --
> > some benchmarks that showed a 2-5x performance improvement on both
> > clang and MSVC show small regressions (20-30%) with gcc.
> >
> > I don't think we need a hard-and-fast rule about whether to accept
> > PRs based on benchmarks, but there are a few guiding criteria:
> >
> > * How much binary size does the new code add? I think many of us
> > would agree that a 20% performance increase on some algorithm might
> > not be worth adding 500KB to libarrow.so.
> > * Is the code generally faster across the major compiler targets
> > (gcc, clang, MSVC)?
> >
> > I think that using clang as a baseline for informational benchmarks
> > would be good, but ultimately we need to be systematically
> > collecting data on all the major compilers. Some time ago I proposed
> > building a Continuous Benchmarking framework
> > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > for use with Arrow (and outside of Arrow, too), so I hope that this
> > will be able to help.
> >
> > - Wes
> >
> > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yibo....@arm.com> wrote:
> > >
> > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > >
> > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > >> There has been significant effort recently trying to optimize
> > > >> our C++ code. One thing that seems to come up frequently is
> > > >> different benchmark results between GCC and Clang. Even
> > > >> different versions of the same compiler can yield significantly
> > > >> different results on the same code.
> > > >>
> > > >> I would like to propose that we choose a specific compiler and
> > > >> version on Linux for evaluating performance-related PRs. PRs
> > > >> would only be accepted if they improve the benchmarks under the
> > > >> selected version.
> > > >
> > > > Would this be a hard rule or just a guideline? There are many
> > > > ways in which benchmark numbers can be improved or degraded by a
> > > > PR, and in some cases that doesn't matter (benchmarks are not
> > > > always realistic, and they are not representative of every
> > > > workload).
> > >
> > > I agree that microbenchmarks are not always useful; focusing too
> > > much on improving microbenchmark results gives me a feeling of
> > > "overfitting" (to some specific microarchitecture, compiler, or
> > > use case).
> > >
> > > > Regards
> > > >
> > > > Antoine.
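
P.S. To make the divergence Wes describes concrete, here is a minimal,
hypothetical sketch using Google Benchmark (which the Arrow C++
benchmarks already depend on). The kernel is a made-up stand-in, not
code from the Arrow tree: a branchy reduction is exactly the sort of
loop where one compiler if-converts and autovectorizes while another
keeps the branch, so the same source can benchmark very differently
under gcc, clang, and MSVC.

// sum_if_positive_benchmark.cc
// Hypothetical example; not from the Arrow codebase.
#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// A branchy reduction. Whether the compiler if-converts the branch and
// vectorizes the loop varies by compiler and version, which is one
// common source of large cross-compiler benchmark gaps.
static int64_t SumIfPositive(const std::vector<int64_t>& values) {
  int64_t sum = 0;
  for (int64_t v : values) {
    if (v > 0) sum += v;
  }
  return sum;
}

static void BM_SumIfPositive(benchmark::State& state) {
  std::vector<int64_t> values(state.range(0));
  for (size_t i = 0; i < values.size(); ++i) {
    // Mix of negative, zero, and positive values so the branch is live.
    values[i] = static_cast<int64_t>(i % 3) - 1;
  }
  for (auto _ : state) {
    int64_t result = SumIfPositive(values);
    benchmark::DoNotOptimize(result);
  }
  state.SetItemsProcessed(state.iterations() * state.range(0));
}
BENCHMARK(BM_SumIfPositive)->Arg(1 << 20);

BENCHMARK_MAIN();

Building the same file with g++ -O3 and clang++ -O3 and comparing the
two binaries' numbers is the quickest way to reproduce this kind of
gap locally, before any shared baseline or continuous-benchmarking
infrastructure enters the picture.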