On Thu, Jun 27, 2019 at 2:19 PM Bill Schmidt <[email protected]> wrote:
>
> On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> > On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
> >> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt <[email protected]>
> >> wrote:
> >>> We've done some experimenting and realized that the subject option almost
> >>> always provides improved performance for Power when the loop unroller is
> >>> enabled. So this patch turns that flag on by default for us.
> >> I guess it creates more freedom for combine (more single-uses) and register
> >> allocation. I wonder in which cases this might pessimize things? I guess
> >> the pre-RA scheduler might make the RA's life harder by creating overlapping
> >> live ranges.
> >>
> >> I guess you didn't actually investigate the nature of the improvements you
> >> saw?
> > It breaks the length of dependency chains by a factor equal to the unroll
> > factor. I do not know why this doesn't help a lot everywhere. It of
> > course raises register pressure, maybe that is just it?
>
> Right, it's all about breaking dependencies to more efficiently exploit
> the microarchitecture. By default, variable expansion in GCC is quite
> conservative, creating only two reduction streams out of one, so it's
> pretty rare for it to cause spills. This can be adjusted upwards with
> --param max-variable-expansions-in-unroller=n. Our experiments show
> that raising n to 4 starts to cause some minor degradations, which are
> almost certainly due to register pressure, so the default setting looks
> appropriate.

But it's probably only an issue for targets which enable pre-RA scheduling
by default? It might also increase RA compile-time (more allocnos).

Richard.

> >> Do we want to adjust the flags documentation, saying whether this is
> >> enabled by default depends on the target (or even list them)?
> > Good idea, thanks.
>
> OK, I'll update the docs and make the change that Segher requested.
> Thanks for the reviews!
>
> Bill
>
> >
> >
> > Segher
>
