On 02/10/2016 05:23 PM, Thomas Schwinge wrote:
> Why?  A user of GCC has no intrinsic interest in getting OpenACC kernels
> constructs' code offloaded; the user wants his code to execute as fast
> as possible.
>
> If you consider the whole of OpenACC kernels code offloading as a
> compiler optimization, then it's fine for GCC to abort this
> "optimization" if it's reasonably clear that this transformation (code
> offloading) will not be profitable -- just like what GCC does with other
> possible code optimizations/transformations.

Yes, but if a single kernel (which might not even get executed at
run-time) can inhibit offloading for the whole program, then we're not
making an intelligent decision, and IMO we're violating user
expectations.  IIUC it's also disabling offloading for parallel
constructs rather than just kernels, which we previously said shouldn't
happen.
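
To illustrate the distinction: "kernels" regions depend on the
compiler's parallelization analysis, while "parallel" regions are
parallelized exactly as the user wrote them.  A minimal sketch (the
function and code are hypothetical, assuming both constructs end up in
one translation unit):

    void f (float *a, float *b, int n)
    {
      int i;

      /* "kernels": parloops must prove the loop parallel.  Here it
         cannot, because of the loop-carried dependence on a[i-1].  */
    #pragma acc kernels
      for (i = 1; i < n; i++)
        a[i] = a[i-1] + 1.0f;

      /* "parallel": the user asserts parallelism; no analysis is
         required.  Disabling offloading for the whole program punishes
         this region for the failure above.  */
    #pragma acc parallel loop
      for (i = 0; i < n; i++)
        b[i] = 2.0f * b[i];
    }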

> As I've said before, profiling the execution times of several
> real-world codes has shown that, under the assumption that parloops
> fails to parallelize one kernel (one out of possibly many), this one
> kernel has always been a "hot spot", and avoiding offloading in this
> case has always helped prevent performance degradation below
> host-fallback performance.

IMO a warning for the specific kernel that's problematic would be
better, so that users can selectively apply -fopenacc to the files where
it is profitable.
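
Something along these lines (file names are hypothetical; -fopenacc is
also needed at link time so that libgomp gets linked in):

    $ gcc -c -fopenacc hot.c      # offloading known to be profitable
    $ gcc -c cold.c               # plain host code, no offloading
    $ gcc -o app hot.o cold.o -fopenacc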

> It's of course unfortunate that we have to disable our offloading
> machinery for a lot of codes using OpenACC kernels, but given the
> current state of OpenACC kernels parallelization analysis (parloops),
> doing so is still profitable for a user, compared to regressed
> performance with single-threaded offloaded execution.

How often does this occur on real-world code?  Will we end up supporting
OpenACC by not doing offloading at all in the usual case?  The way you
describe it, it sounds like we should recommend that -fopenacc not be
used in gcc-6, and restore the previous invoke.texi language that marks
it as experimental.


Bernd
