On 02/10/2016 05:23 PM, Thomas Schwinge wrote:
> Why?  A user of GCC has no intrinsic interest in getting OpenACC kernels
> constructs' code offloaded; the user wants his code to execute as fast
> as possible.
>
> If you consider the whole of OpenACC kernels code offloading as a
> compiler optimization, then it's fine for GCC to abort this
> "optimization" if it's reasonably clear that this transformation (code
> offloading) will not be profitable -- just like what GCC does with other
> possible code optimizations/transformations.

Yes, but if a single kernel (which might not even get executed at
run-time) can inhibit offloading for the whole program, then we're not
making an intelligent decision, and IMO we're violating user
expectations.  IIUC it's also disabling offloading for parallel
constructs rather than just kernels, which we previously said shouldn't
happen.
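
To illustrate the distinction: "kernels" regions depend on the
compiler's parallelization analysis, while "parallel" regions are
parallelized exactly as the user wrote them.  A minimal sketch (the
function and code are hypothetical, assuming both constructs end up in
one translation unit):

    void f (float *a, float *b, int n)
    {
      int i;

      /* "kernels": parloops must prove the loop parallel.  Here it
         cannot, because of the loop-carried dependence on a[i-1].  */
    #pragma acc kernels
      for (i = 1; i < n; i++)
        a[i] = a[i-1] + 1.0f;

      /* "parallel": the user asserts parallelism; no analysis is
         required.  Disabling offloading for the whole program punishes
         this region for the failure above.  */
    #pragma acc parallel loop
      for (i = 0; i < n; i++)
        b[i] = 2.0f * b[i];
    }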

> As I've said before, profiling the execution times of several
> real-world codes has shown that, under the assumption that parloops
> fails to parallelize one kernel (one out of possibly many), this one
> kernel has always been a "hot spot", and avoiding offloading in this
> case has always helped prevent performance degradation below
> host-fallback performance.

IMO a warning for the specific kernel that's problematic would be
better, so that users can selectively apply -fopenacc to the files where
it is profitable.
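
Something along these lines (file names are hypothetical; -fopenacc is
also needed at link time so that libgomp gets linked in):

    $ gcc -c -fopenacc hot.c      # offloading known to be profitable
    $ gcc -c cold.c               # plain host code, no offloading
    $ gcc -o app hot.o cold.o -fopenacc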

> It's of course unfortunate that we have to disable our offloading
> machinery for a lot of codes using OpenACC kernels, but given the
> current state of OpenACC kernels parallelization analysis (parloops),
> doing so is still profitable for a user, compared to regressed
> performance with single-threaded offloaded execution.

How often does this occur on real-world code?  Will we end up supporting
OpenACC by not doing offloading at all in the usual case?  The way you
describe it, it sounds like we should recommend that -fopenacc not be
used in gcc-6, and restore the previous invoke.texi language that marks
it as experimental.


Bernd
