Hi Thomas,

Very many thanks for your help investigating this problem.

> > This patch addresses the "increased register pressure" regression
> > on nvptx-none caused by my change to transition the backend to
> > a STORE_FLAG_VALUE = 1 target.
> 
> Yes, "addresses", but unfortunately doesn't "resolve".  ;-|

Doh!

> I'm confirming the improved code generation (less registers used, less
> instructions emitted) in cases where it triggers -- but unfortunately it
> doesn't in the PR104345 'libgomp.oacc-c-c++-common/reduction-cplx-dbl.c'
> scenario.

Looking over the nvptx code currently generated for reduction-cplx-dbl.c,
it appears nearly optimal, and it's difficult to see what could have
regressed.  [It makes almost no use of Boolean types, so is relatively
unaffected by a STORE_FLAG_VALUE change.]  One remaining possibility is
that the "register usage" regression is not in reduction-cplx-dbl.c itself
but in __muldc3 in libgcc.a.  [I believe kernel resource usage is computed
including all called functions.]

Would this be easy to test on your configuration, by moving libgcc.a from
one build to another?

If it is __muldc3 regressing, then the other nvptx patches mentioned
previously, and perhaps even improvements to isnan and isinf, may help.
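
For reference, here is a rough sketch of the shape such a routine takes.
It follows the complex multiplication example in C99 Annex G, which, as
far as I recall, libgcc's __muldc3 is modelled on; it is not libgcc's
actual source, and the names sketch_muldc and sketch_usage are made up
for illustration.  The point is only that the slow path is a chain of
isnan/isinf tests, each producing a Boolean result.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Hypothetical name; follows the C99 Annex G example, not libgcc's source.
   Computes (a + ib) * (c + id).  Uses the C11 CMPLX macro.  */
double complex sketch_muldc(double a, double b, double c, double d)
{
    double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
    double x = ac - bd;   /* real part */
    double y = ad + bc;   /* imaginary part */

    /* Slow path: entered only when both parts came out as NaN.  */
    if (isnan(x) && isnan(y)) {
        int recalc = 0;
        if (isinf(a) || isinf(b)) {
            /* First operand is infinite: "box" it to (+-1 or +-0).  */
            a = copysign(isinf(a) ? 1.0 : 0.0, a);
            b = copysign(isinf(b) ? 1.0 : 0.0, b);
            if (isnan(c)) c = copysign(0.0, c);
            if (isnan(d)) d = copysign(0.0, d);
            recalc = 1;
        }
        if (isinf(c) || isinf(d)) {
            /* Second operand is infinite: same treatment.  */
            c = copysign(isinf(c) ? 1.0 : 0.0, c);
            d = copysign(isinf(d) ? 1.0 : 0.0, d);
            if (isnan(a)) a = copysign(0.0, a);
            if (isnan(b)) b = copysign(0.0, b);
            recalc = 1;
        }
        if (!recalc
            && (isinf(ac) || isinf(bd) || isinf(ad) || isinf(bc))) {
            /* Infinity arose from overflow of a partial product.  */
            if (isnan(a)) a = copysign(0.0, a);
            if (isnan(b)) b = copysign(0.0, b);
            if (isnan(c)) c = copysign(0.0, c);
            if (isnan(d)) d = copysign(0.0, d);
            recalc = 1;
        }
        if (recalc) {
            x = INFINITY * (a * c - b * d);
            y = INFINITY * (a * d + b * c);
        }
    }
    return CMPLX(x, y);
}

/* Usage: an ordinary finite product never enters the slow path.  */
void sketch_usage(void)
{
    double complex z = sketch_muldc(1.0, 2.0, 3.0, 4.0);
    assert(creal(z) == -5.0 && cimag(z) == 10.0);
}
```

If the real routine has this shape, then any change in how comparison
results are materialised would show up mainly in that slow path.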

Again, apologies that the "use nvptx set.?32 for cond ? -1 : 0" patch, which
catches many of the issues observed in your initial PR analysis, doesn't
actually address the root cause of this particular case.
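
For anyone following along, the idiom that patch targets looks like this
at the source level.  This is only an illustrative sketch: the function
names are made up, and which instructions are actually emitted is up to
the backend.  In C a comparison yields 0 or 1, matching a
STORE_FLAG_VALUE = 1 view of the world, so the all-ones mask has to be
derived from it either by a select or by a negation.

```c
#include <assert.h>

/* Hypothetical helper names, for illustration only.  */

/* Mask written as a conditional: -1 (all bits set) if a < b, else 0.  */
int mask_select(int a, int b)
{
    return (a < b) ? -1 : 0;
}

/* The same mask written as the negation of the 0/1 comparison result.  */
int mask_negate(int a, int b)
{
    return -(a < b);
}

/* Usage: both forms agree for true and false comparisons.  */
void mask_usage(void)
{
    assert(mask_select(1, 2) == mask_negate(1, 2));
    assert(mask_select(2, 1) == mask_negate(2, 1));
}
```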

Thanks again,
Roger