On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
<frede...@codesourcery.com> wrote:
>
>
> Hi,
>
> as a first step towards enabling the use of Graphite for optimizing
> OpenACC loops this patch moves the OpenACC device lowering after the
> Graphite pass.  This means that the device lowering now takes place
> after some crucial optimization passes. Thus new instances of those
> passes are added inside of a new pass pass_oacc_functions which ensures
> that they run on OpenACC functions only. The choice of the new position
> for pass_oacc_device_lower is further constrained by the need to
> execute it before pass_vectorize.  This means that
> pass_oacc_device_lower now runs inside of pass_tree_loop. A further
> instance of the pass that handles functions without loops is added
> inside of pass_tree_no_loop. Yet another pass instance that executes if
> optimizations are disabled is included inside of a new
> pass_no_optimizations.
>
> The patch has been bootstrapped on x86_64-linux-gnu and tested with the
> GCC testsuite and with the libgomp testsuite with nvptx and gcn
> offloading.
>
> The patch should have no impact on non-OpenACC user code. However, the
> new pass instances have changed the pass instance numbering and hence
> the dump scanning commands in several tests had to be adjusted. I hope

What's on my TODO list (or on the list of things to explore) is to make
the dump file names/suffixes explicit in passes.def like via

  NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")

and we'd get a dump named .ccp_oacc or so.  Or stick with explicit
numbers by passing ", 5" instead.  If just the number is fixed, this
could eventually be done with just tweaks to gen-pass-instances.awk.

Now, what does oacc_device_lower actually do that you need to
re-run complex lowering?  What does cunrolli do at this point that
the complete_unroll pass later does not do?

What's special about oacc_device_lower that doesn't also apply
to omp_device_lower?

Is all this targeted at code compiled exclusively for the offload
target?  Thus we're in lto1 here?  Would it eventually make more
sense to have a completely custom pass pipeline for the
offload compilation?  Maybe even per offload target?  See how
we have a custom pipeline for -Og (pass_all_optimizations_g).

> that I found all that needed adjustment, but it is quite possible that I
> missed some tests that execute for particular targets or non-default
> languages only. The resulting UNRESOLVED tests are usually easily fixed
> by appending a pass number to the name of a pass that previously had no
> number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
> number (e.g. "dce6" becomes "dce7") in a dump scanning command.
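
For illustration, the kind of adjustment described above might look like
this in a test file.  This is only a sketch: the scanned pattern is a
hypothetical placeholder, and only the dump-name change ("cunrolli" to
"cunrolli1") is taken from the description.

  /* Before: the pass had a single instance, so the dump name carried
     no instance number.  "some pattern" is a placeholder.  */
  /* { dg-final { scan-tree-dump "some pattern" "cunrolli" } } */

  /* After: a second instance exists, so the dump name needs an
     explicit instance number.  */
  /* { dg-final { scan-tree-dump "some pattern" "cunrolli1" } } */
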
>
> The patch leads to several new unresolved tests in the libgomp testsuite
> which are caused by the combination of torture testing, missing cleanup
> of the offload dump files, and the new pass numbering.  If a test that
> uses, for instance, "-foffload=fdump-tree-oaccdevlow" gets compiled with
> "-O0" and afterwards with "-O2", each run of the test executes different
> instances of pass_oacc_device_lower and produces dumps whose names
> differ only in the pass instance number.  The dump scanning command in
> the second run fails, because the dump files do not get removed after
> the first run and the command consequently matches two different dump
> files.  This seems to be a known issue.  I am going to submit a patch
> that implements the cleanup of the offload dumps soon.
>
> I have tried to rule out performance regressions by running different
> benchmark suites with nvptx and gcn offloading. Nevertheless, I think
> that it makes sense to keep an eye on OpenACC performance in the near
> future and revisit the optimizations that run on the device-lowered
> functions if necessary.
>
> Ok to include the patch in master?
>
> Best regards,
> Frederik
>
>
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter
