On 21/07/2022 at 16:34, Wes McKinney wrote:
Based on the discussion in https://github.com/apache/arrow/pull/13661,
it seems that one major issue with switching to -O2 is that
auto-vectorization (which we rely on in places) and perhaps some other
optimization passes would have to be manually enabled in gcc.
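
As a rough illustration (a sketch, not actual Arrow code), this is the kind of
loop we count on GCC to auto-vectorize; as far as I can tell, GCC 11 and
earlier only enable this at -O3, so an -O2 build would need the flags in the
comments added back explicitly:

    // Sketch only, not Arrow code: a loop of this shape is what we rely on
    // GCC to auto-vectorize. With GCC 11 and earlier, loop/SLP vectorization
    // is enabled by default only at -O3, so an -O2 build would need something
    // like:
    //   g++ -O2 -ftree-vectorize ...
    // (or the finer-grained -ftree-loop-vectorize / -ftree-slp-vectorize).
    #include <cstddef>
    #include <cstdint>

    void AddInt32(const int32_t* left, const int32_t* right, int32_t* out,
                  std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = left[i] + right[i];  // candidate for SIMD code generation
      }
    }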

This benchmark run just completed, but it does not have
auto-vectorization enabled, so the benchmark differences appear to be
caused by that:

https://conbench.ursa.dev/compare/runs/e938638743e84794ad829524fae04fbd...20727b1b390e4b30be10f49db7f06f3f/

My inclination is that we should leave things as is and keep an eye on
symbol sizes (which we can easily do using the
cpp/tools/binary_symbol_explore.py tool that I wrote -- it computes
diffs right now but can be easily modified/extended to also print the
largest symbols in a shared library) in case we have other instances
of runaway code size.

The benchmark results above are a mixed bag, but not strongly in favour of -O3.

I would like to receive more opinions, but from the feedback so far it seems that switching to -O2 would be the more robust choice. If so, I would be in favour of doing it.

Regards

Antoine.



On Thu, Jul 21, 2022 at 8:11 AM Yaron Gvili <rt...@hotmail.com> wrote:

selectively enable -O3 only on source files that can be demonstrated to 
benefit from it

Unfortunately, the actual benefits of -O3 are application-dependent. As 
https://www.linuxjournal.com/article/7269 explains:

"Although -O3 can produce fast code, the increase in the size of the image can have 
adverse effects on its speed. For example, if the size of the image exceeds the size of 
the available instruction cache, severe performance penalties can be observed. Therefore, 
it may be better simply to compile at -O2 to increase the chances that the image fits in 
the instruction cache."

In a specific application, a hot spot that includes Arrow code compiled with 
-O3 could exceed the instruction cache size because of that code, even if the 
same Arrow code showed better performance in Arrow benchmarks comparing -O2 
with -O3.

In the short term, I second Wes's suggestion of trying to compile everything with 
-O2 and checking that no existing benchmark suffers too much. Hopefully none 
would, and that would justify a switch to -O2. In the longer term, I'd suggest 
building a bisection tool for selecting the best optimization flags for Arrow 
modules in the context of application-specific benchmarks.


Yaron.
________________________________
From: Sasha Krassovsky <krassovskysa...@gmail.com>
Sent: Wednesday, July 20, 2022 5:55 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: [C++] Moving from -O3 to -O2 optimization level in release builds

I’d +1 this - in my past experience I’ve mostly seen -O2. It would make 
sense to default to -O2 and selectively enable -O3 only on source files that 
can be demonstrated to benefit from it (if anyone actually spends the time to 
look into it).
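
As a purely illustrative sketch (nothing like this exists in the Arrow build,
and the function name is made up), GCC also allows per-function overrides via
the optimize attribute, though the GCC manual cautions it is mainly intended
for debugging; per-source-file compile options in CMake would be the more
conventional mechanism:

    // Hypothetical example: a hot kernel that measurements show benefits
    // from -O3, while the rest of the translation unit stays at -O2.
    // GCC's optimize attribute overrides the command-line level per function.
    #include <cstddef>
    #include <cstdint>

    __attribute__((optimize("O3")))
    void HypotheticalHotKernel(const int32_t* in, int32_t* out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 2;
      }
    }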

Sasha

On Jul 20, 2022, at 2:10 PM, Wes McKinney <wesmck...@gmail.com> wrote:

hi all,

Antoine and I were digging into a weird issue where gcc at -O3
generated ~40KB of optimized code for a function that was less than
2KB at -O2, and where a "leaner" implementation (in PR 13654) was
faster and smaller still. You can see some of the discussion at

https://github.com/apache/arrow/pull/13654

-O3 is known to have some other issues in addition to occasional
runaway code size -- I opened

https://github.com/apache/arrow/pull/13661

to explore changing our release optimization level to -O2 and see what
the performance implications are in our benchmarks (this would likely
also make builds meaningfully faster). If anyone has any thoughts about
this, let us know!

Thanks,
WES
