Re: [DISCUSS][C++] Performance work and compiler standardization for linux
On Wed, Jun 24, 2020 at 9:48 PM Micah Kornfield wrote:
> In that case I would propose the following:
> 1. Standardize on clang for generating performance numbers for
> performance-related PRs
> 2. Adjust our binary artifact builds to use clang where feasible (I think
> this should wait until after our next release).

Note that the manylinux* standards mandate gcc, so we'll be stuck on gcc for
the Python wheels and perhaps some other builds. Otherwise I'd be in favor of
using clang when possible.

> 3. Add a summary of the discussion to the contributors guide.
>
> Sound reasonable?
>
> Thanks,
> Micah
>
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
In that case I would propose the following:

1. Standardize on clang for generating performance numbers for
performance-related PRs
2. Adjust our binary artifact builds to use clang where feasible (I think
this should wait until after our next release).
3. Add a summary of the discussion to the contributors guide.

Sound reasonable?

Thanks,
Micah

On Tue, Jun 23, 2020 at 12:53 AM Uwe L. Korn wrote:
> FTR: We can use the latest(!) clang for all platforms for conda and wheels.
> It probably isn't even that complicated a setup.
>
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
FTR: We can use the latest(!) clang for all platforms for conda and wheels.
It probably isn't even that complicated a setup.

On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> We should aim to improve the performance of the most widely used *default*
> packages, which are Python pip, Python conda, and R (all platforms).
>
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
I think there is a lot of good discussion on this thread. Let me try to
summarize and give my thoughts:

1. Will this be a hard rule? No. Based on the feedback, I think there will
always be some subjectivity (e.g. tradeoffs in code size and
maintainability). My intent here was to reduce cognitive load on reviewers
and give contributors clear guidance on how performance PRs will be
evaluated.

2. Why don't we track all of them? We should be tracking all of them. I
think the question is what happens when they diverge on benchmarks/real-world
use? How do we decide whether the PR should be merged? This goes back to
point #1: having some basic guidelines is useful. I would imagine we might
already be "nerfing" users on older Linux distros with older C++ toolchains.
Going back to the point above, it would be good to have a set of criteria for
how much effort we invest in avoiding or fixing regressions in that case. I
would hope that modern compilers will likely converge. Looking at GCC/Clang
specifically, GCC 8 vs. Clang 8 is perhaps the wrong comparison: Clang 8 was
released 8 months after GCC 8, and GCC 9 was released 2 months after Clang 8.

3. Microbenchmarks don't tell the whole story? (we shouldn't overfit) I
agree, but we should have at least a few more macro-style benchmarks that can
help judge overall impact. This is also where the other criteria come into
play when deciding whether to merge.

4. Which compiler to choose? I agree with the sentiment that we should choose
a compiler that reflects performance in our prepackaged binaries. Based on
this thread, I'm a little confused as to whether we can build more of our
code with a single compiler for the distribution channels (it sounds like it
might be viable with clang). But IIUC most of our builds, on Linux at least,
use GCC. Are we even using consistent versions of GCC across our packages?

I think it is a bad outcome for our users (and potentially a headache for us)
if we announce X features improved by Y% and users actually see a regression.

Thanks,
Micah

On Mon, Jun 22, 2020 at 10:08 AM Wes McKinney wrote:
> [...]
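The guideline sketched in points #1 and #2 could be made concrete along these lines. This is a minimal sketch only: the choice of clang as primary, the 10% tolerance, and the use of a geometric-mean time ratio are illustrative assumptions, not settled project policy.

```python
# Hypothetical gating guideline: clang is the primary compiler for
# performance numbers; other tracked compilers can only block a merge on a
# large regression. Names and thresholds are illustrative assumptions.
PRIMARY = "clang"
REGRESSION_TOLERANCE = 0.10  # secondary compilers may regress up to 10%

def evaluate(time_ratios):
    """time_ratios: compiler -> geometric-mean new/old benchmark time ratio
    (below 1.0 is an improvement). Returns a review verdict string."""
    if time_ratios[PRIMARY] >= 1.0:
        return "reject: no improvement under the primary compiler"
    worst = max((r for c, r in time_ratios.items() if c != PRIMARY),
                default=1.0)
    if worst > 1.0 + REGRESSION_TOLERANCE:
        return "needs discussion: large regression under a secondary compiler"
    return "accept"
```

For example, a PR that is 30% faster under clang and 5% slower under gcc would pass, while one that is 20% faster under clang but 30% slower under gcc would be flagged for discussion rather than auto-accepted.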
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
For the curious, I ran benchmarks for both gcc-8 and clang-8 on my laptop for
the ARROW-9197 patch:

* Clang: https://github.com/apache/arrow/pull/7506#issuecomment-647633470
* GCC: https://github.com/apache/arrow/pull/7506#issuecomment-647649670

It's apparent in this particular instance that GCC was applying some
optimizations well outside of what Clang was doing. The baseline performance
was 5-6x faster on GCC for certain cases than on Clang. Very strange indeed.

On Mon, Jun 22, 2020 at 11:48 AM Neal Richardson wrote:
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
FTR, by default R on macOS uses system clang, not Homebrew's; for Windows, it
is gcc (unless you're adventurous like Uwe and can hack it to work with clang
;); and on Linux, CRAN checks both gcc and clang.

Is it unreasonable to check *both* gcc and clang when there's a potentially
significant performance change? It sounded like recently there have been
cases where a change makes a big speedup with one compiler but a slowdown
with the other, so if we only test with one, we'll be degrading with the
other over time. But if we test both (simplifying of course, since there are
multiple versions, etc.), then at least we'd be more aware of what the
tradeoff is.

Neal

On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
We should aim to improve the performance of the most widely used *default*
packages, which are Python pip, Python conda, and R (all platforms). AFAIK,
both pip (manylinux) and conda use gcc on Linux by default. R uses gcc on
Linux and MinGW (gcc) on Windows. I suppose (haven't checked) that clang is
used on OSX via brew. Thus, by default, almost all users are going to use a
gcc-compiled version of Arrow on Linux.

François

On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney wrote:
> [...]
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
Based on some of my performance work recently, I'm growing uncomfortable with
using gcc as the performance baseline, since the results can be significantly
different (sometimes 3-4x or more on certain fast algorithms) from clang and
MSVC. The perf results on https://github.com/apache/arrow/pull/7506 were
really surprising -- some benchmarks that showed 2-5x performance
improvements on both clang and MSVC showed small regressions (20-30%) with
gcc.

I don't think we need a hard-and-fast rule about whether to accept PRs based
on benchmarks, but there are a few guiding criteria:

* How much binary size does the new code add? I think many of us would agree
that a 20% performance increase on some algorithm might not be worth adding
500KB to libarrow.so
* Is the code generally faster across the major compiler targets (gcc,
clang, MSVC)?

I think that using clang as a baseline for informational benchmarks would be
good, but ultimately we need to be systematically collecting data on all the
major compilers. Some time ago I proposed building a Continuous Benchmarking
framework
(https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md) for
use with Arrow (and outside of Arrow, too), so I hope that this will be able
to help.

- Wes

On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai wrote:
> [...]
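The "generally faster across the major compiler targets" criterion above could be checked mechanically. A minimal sketch, assuming Google Benchmark's JSON output layout (a top-level "benchmarks" list with "name" and "real_time" fields, as produced by --benchmark_out_format=json); the 5% threshold and the benchmark names in the docstrings are hypothetical:

```python
import json

def load_times(path):
    """Map benchmark name -> real_time from a Google Benchmark JSON file."""
    with open(path) as f:
        return {b["name"]: b["real_time"] for b in json.load(f)["benchmarks"]}

def compare(baseline, contender):
    """Per-benchmark contender/baseline time ratio; below 1.0 means faster."""
    return {name: contender[name] / t
            for name, t in baseline.items() if name in contender}

def divergent(ratios_a, ratios_b, threshold=0.05):
    """Benchmarks that improved under one compiler but regressed under the
    other, beyond the given noise threshold."""
    flagged = []
    for name in sorted(ratios_a.keys() & ratios_b.keys()):
        low, high = sorted((ratios_a[name], ratios_b[name]))
        if low < 1 - threshold and high > 1 + threshold:
            flagged.append(name)
    return flagged
```

For instance, a benchmark at ratio 0.5 under clang (2x faster) but 1.2 under gcc (20% slower) would be flagged for a closer look, which is exactly the ARROW-9197-style divergence described above.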
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> On 22/06/2020 at 06:27, Micah Kornfield wrote:
>> There has been significant effort recently trying to optimize our C++
>> code. One thing that seems to come up frequently is different benchmark
>> results between GCC and Clang. Even different versions of the same
>> compiler can yield significantly different results on the same code.
>>
>> I would like to propose that we choose a specific compiler and version on
>> Linux for evaluating performance-related PRs. PRs would only be accepted
>> if they improve the benchmarks under the selected version.
>
> Would this be a hard rule or just a guideline? There are many ways in
> which benchmark numbers can be improved or deteriorated by a PR, and in
> some cases that doesn't matter (benchmarks are not always realistic, and
> they are not representative of every workload).

I agree that microbenchmarks are not always useful; focusing too much on
improving microbenchmark results gives me a feeling of "overfit" (to some
specific microarchitecture, compiler, or use case).

> Regards
>
> Antoine.
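One small guard against over-reading a single microbenchmark number is to require that a reported change exceed the benchmark's own run-to-run noise. A minimal sketch; the 3-sigma cutoff and the sample timings are assumptions for illustration, not a recommended policy:

```python
# Illustrative check: treat a microbenchmark delta as meaningful only if it
# is large relative to run-to-run variation across repeated runs.
import statistics

def is_significant(before_runs, after_runs, sigmas=3.0):
    """True if the mean shift between the two sets of timings exceeds
    `sigmas` times the larger of the two run-to-run standard deviations."""
    shift = abs(statistics.mean(after_runs) - statistics.mean(before_runs))
    noise = max(statistics.stdev(before_runs), statistics.stdev(after_runs))
    return shift > sigmas * noise
```

This only addresses measurement noise, not the deeper "overfit" concern about unrepresentative workloads, but it filters out the cheapest kind of false signal before a reviewer spends time on it.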
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
Also, in my experience, it is much easier to install another clang version than another gcc version on Linux (gcc is more or less married to a given libstdc++ version, AFAICT).

Regards

Antoine.

On 22/06/2020 09:06, Uwe L. Korn wrote:
> With my conda-forge background, I would suggest using clang as a performance
> baseline, because it's currently the only compiler that works reliably on all
> platforms. Most Linux distributions are nowadays built with gcc, which is also
> a strong argument, but on OSX and Windows the picture is a bit different.
>
> Linux: One can use gcc or clang; as long as you use the right flags, they are
> interchangeable.
> macOS: There are gcc builds, but recently I have seen people solely
> using clang, either the one coming with Xcode or a newer build.
> Windows (MSVC toolchain): You can use MSVC or clang; while most people prefer
> MSVC, clang is well-tested (e.g. Chrome uses it).
> Windows (MSYS2 toolchain): Mostly uses GCC, but clang is also supported. There
> were some informal mentions that clang performs better here when using AVX2.
>
> Cheers
> Uwe
>
> On Mon, Jun 22, 2020, at 6:27 AM, Micah Kornfield wrote:
>> There has been significant effort recently trying to optimize our C++
>> code. One thing that seems to come up frequently is different benchmark
>> results between GCC and Clang. Even different versions of the same
>> compiler can yield significantly different results on the same code.
>>
>> I would like to propose that we choose a specific compiler and version on
>> Linux for evaluating performance related PRs. PRs would only be accepted
>> if they improve the benchmarks under the selected version. Other
>> performance related PRs would still be acceptable if they improve
>> benchmarks for a different compiler as long as they don't make the primary
>> one worse and don't hurt maintainability of the code. I also don't think
>> this should change our CI matrix in any way.
>> >> There are other variables that potentially come into play for evaluating >> benchmarks (architecture, OS Version, etc) but if we need to limit these in >> a similar way, I think they should be handled as separate discussions. >> >> I'm not clear on the limitations that our distributions place on compilers, >> but ideally we'd pick a compiler that can produce most/all of our binary >> artifacts. >> >> Could someone more familiar with the release toolchain comment on good >> options to consider? >> >> Any other criteria we should consider? Other thoughts? >> >> Thanks, >> Micah >>
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
On 22/06/2020 06:27, Micah Kornfield wrote:
> There has been significant effort recently trying to optimize our C++
> code. One thing that seems to come up frequently is different benchmark
> results between GCC and Clang. Even different versions of the same
> compiler can yield significantly different results on the same code.
>
> I would like to propose that we choose a specific compiler and version on
> Linux for evaluating performance related PRs. PRs would only be accepted
> if they improve the benchmarks under the selected version.

Would this be a hard rule or just a guideline? There are many ways in which benchmark numbers can be improved or deteriorated by a PR, and in some cases that doesn't matter (benchmarks are not always realistic, and they are not representative of every workload).

Regards

Antoine.
Re: [DISCUSS][C++] Performance work and compiler standardization for linux
With my conda-forge background, I would suggest using clang as a performance baseline, because it's currently the only compiler that works reliably on all platforms. Most Linux distributions are nowadays built with gcc, which is also a strong argument, but on OSX and Windows the picture is a bit different.

Linux: One can use gcc or clang; as long as you use the right flags, they are interchangeable.
macOS: There are gcc builds, but recently I have seen people solely using clang, either the one coming with Xcode or a newer build.
Windows (MSVC toolchain): You can use MSVC or clang; while most people prefer MSVC, clang is well-tested (e.g. Chrome uses it).
Windows (MSYS2 toolchain): Mostly uses GCC, but clang is also supported. There were some informal mentions that clang performs better here when using AVX2.

Cheers
Uwe

On Mon, Jun 22, 2020, at 6:27 AM, Micah Kornfield wrote:
> There has been significant effort recently trying to optimize our C++
> code. One thing that seems to come up frequently is different benchmark
> results between GCC and Clang. Even different versions of the same
> compiler can yield significantly different results on the same code.
>
> I would like to propose that we choose a specific compiler and version on
> Linux for evaluating performance related PRs. PRs would only be accepted
> if they improve the benchmarks under the selected version. Other
> performance related PRs would still be acceptable if they improve
> benchmarks for a different compiler as long as they don't make the primary
> one worse and don't hurt maintainability of the code. I also don't think
> this should change our CI matrix in any way.
>
> There are other variables that potentially come into play for evaluating
> benchmarks (architecture, OS Version, etc) but if we need to limit these in
> a similar way, I think they should be handled as separate discussions.
> > I'm not clear on the limitations that our distributions place on compilers, > but ideally we'd pick a compiler that can produce most/all of our binary > artifacts. > > Could someone more familiar with the release toolchain comment on good > options to consider? > > Any other criteria we should consider? Other thoughts? > > Thanks, > Micah >