On 21/07/2022 16:34, Wes McKinney wrote:
Based on the discussion in https://github.com/apache/arrow/pull/13661,
it seems that one major issue with switching to -O2 is that
auto-vectorization (which we rely on in places) and perhaps some other
optimization passes would have to be manually enabled in gcc.
This benchmark run just completed, but it does not have
auto-vectorization enabled, so the benchmark differences appear to be
caused by that:
https://conbench.ursa.dev/compare/runs/e938638743e84794ad829524fae04fbd...20727b1b390e4b30be10f49db7f06f3f/
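For illustration, a hedged sketch (not Arrow code) of the kind of loop this is about: with GCC, -O3 turns on the tree vectorizer passes, while plain -O2 on GCC releases before 12 generally leaves them off, so a flag such as -ftree-vectorize would have to be added on top of -O2 to keep such loops vectorized.

    // Minimal sketch, not Arrow code: a reduction loop that relies on the
    // GCC auto-vectorizer.  With -O3 (or -O2 -ftree-vectorize on older GCC)
    // the loop body is turned into SIMD code; with plain -O2 it typically
    // stays scalar on GCC releases before 12, e.g.:
    //   g++ -O3 -c sum_kernel.cc
    //   g++ -O2 -ftree-vectorize -c sum_kernel.cc
    #include <cstddef>
    #include <cstdint>

    int64_t SumInt32(const int32_t* values, std::size_t length) {
      int64_t sum = 0;
      for (std::size_t i = 0; i < length; ++i) {
        sum += values[i];  // candidate for SIMD once the vectorizer runs
      }
      return sum;
    }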
My inclination is that we should leave things as they are and keep an
eye on symbol sizes in case we have other instances of runaway code
size. We can easily do that with the cpp/tools/binary_symbol_explore.py
tool that I wrote -- it computes diffs right now, but could easily be
modified/extended to also print the largest symbols in a shared library.
The benchmark results above are a mixed bag, but not strongly in favour
of -O3.
I would like to hear more opinions, but from the feedback so far it
seems that switching to -O2 would be more robust. If so, then I would
be in favour of doing it.
Regards
Antoine.
On Thu, Jul 21, 2022 at 8:11 AM Yaron Gvili <rt...@hotmail.com> wrote:
"selectively enable -O3 only on source files that can be demonstrated to
benefit from it"
Unfortunately, actual benefits from -O3 are application-dependent. As
https://www.linuxjournal.com/article/7269 explains:
"Although -O3 can produce fast code, the increase in the size of the image can have
adverse effects on its speed. For example, if the size of the image exceeds the size of
the available instruction cache, severe performance penalties can be observed. Therefore,
it may be better simply to compile at -O2 to increase the chances that the image fits in
the instruction cache."
The image size of a hot spot in a specific application that uses Arrow
code compiled with -O3 could exceed the instruction cache size because of
that code, even if the same Arrow code showed better performance in Arrow
benchmarks comparing -O2 with -O3 compilation.
In the short term, I second Wes' suggestion of trying to compile everything
with -O2 and checking that no existing benchmark suffers too much. Hopefully
none would, and that would justify a switch to -O2. In the longer term, I'd
suggest building a bisection tool for selecting the best optimization flags
for Arrow modules in the context of application-specific benchmarks.
Yaron.
________________________________
From: Sasha Krassovsky <krassovskysa...@gmail.com>
Sent: Wednesday, July 20, 2022 5:55 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] Moving from -O3 to -O2 optimization level in release builds
I’d +1 this - in my past experience I’ve mostly seen -O2. It would make
sense to default to -O2 and selectively enable -O3 only on source files that
can be demonstrated to benefit from it (if anyone actually spends the time to
look into it).
Sasha
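One hedged, GCC-specific way to get this kind of selectivity at the function
(rather than source-file) level is GCC's optimize attribute; the function below
is a hypothetical example, not Arrow code, and other compilers handle the
attribute differently or ignore it.

    // Hedged sketch, GCC-specific and hypothetical (not actual Arrow code):
    // keep the default build at -O2, but opt a single hot function into
    // -O3-style optimization through GCC's `optimize` function attribute.
    // Per-source-file flags in the build system would be the coarser-grained
    // variant of the same idea.
    #include <cstddef>
    #include <cstdint>

    __attribute__((optimize("O3")))
    int64_t SumSquares(const int32_t* values, std::size_t length) {
      int64_t sum = 0;
      for (std::size_t i = 0; i < length; ++i) {
        sum += static_cast<int64_t>(values[i]) * values[i];
      }
      return sum;
    }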
On Jul 20, 2022, at 2:10 PM, Wes McKinney <wesmck...@gmail.com> wrote:
hi all,
Antoine and I were digging into a weird issue where gcc at -O3
generated ~40KB of optimized code for a function that was less than
2KB at -O2, and where a "leaner" implementation (in PR 13654) was
faster and smaller still. You can see some of the discussion at
https://github.com/apache/arrow/pull/13654
-O3 is known to have some other issues in addition to occasional
runaway code size -- I opened
https://github.com/apache/arrow/pull/13661
to explore changing our release optimization level to -O2 and to see
what the performance implications are in our benchmarks (and likely make
builds meaningfully faster). If anyone has any thoughts about this, let
us know!
Thanks,
Wes