On Fri, Nov 6, 2020 at 12:18 PM Frederik Harwath <frede...@codesourcery.com> wrote:
>
>
> Hi Richard,
>
> Richard Biener <richard.guent...@gmail.com> writes:
>
> > On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
>
> > What's on my TODO list (or on the list of things to explore) is to make
> > the dump file names/suffixes explicit in passes.def like via
> >
> >   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
> >
> > and we'd get a dump named .ccp_oacc or so.
>
> That would be very helpful for avoiding the drudgery of adapting those
> pass numbers!
>
> > Now, what does oacc_device_lower actually do that you need to
> > re-run complex lowering?  What does cunrolli do at this point that
> > the complete_unroll pass later does not do?
> >
>
> Good spot, "cunrolli" seems to be unnecessary. The complex lowering is
> necessary to handle the code that gets created by the OpenACC reduction
> lowering during oaccdevlow. I have attached a test case (a reduced
> version of
> libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
> shows that the complex instructions are created by
> pass_oacc_device_lower and which leads to an ICE if compiled without the
> new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
> The problem is an unlowered addition. This is from a diff of the dump of
> the pass following oaccdevlow1 (ccp4) with disabled and with enabled
> tree-cplxlower2:
>
> <  _91 = VIEW_CONVERT_EXPR<complex float>(_1);
> <  _92 = reduction_var_2 + _91;
> ---
> >  _104 = REALPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >  _105 = IMAGPART_EXPR <VIEW_CONVERT_EXPR<complex float>(_1)>;
> >  _91 = COMPLEX_EXPR <_104, _105>;
> >  _106 = reduction_var$real_100 + _104;
> >  _107 = reduction_var$imag_101 + _105;
> >  _92 = COMPLEX_EXPR <_106, _107>;
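For readers following the dump diff above: the lowering replaces one complex addition with two scalar additions on the real and imaginary parts. The following is only a minimal sketch of that arithmetic in plain C11, not GCC code. The function name `add_lowered` is hypothetical, and the GIMPLE names in the comments are an approximate mapping to the dump.

```c
#include <complex.h>

/* Hypothetical helper (not part of GCC): adds two complex floats
   component-wise, the way the lowered GIMPLE in the dump does.  */
static float complex
add_lowered (float complex acc, float complex x)
{
  float x_re = crealf (x);              /* roughly _104 = REALPART_EXPR <...> */
  float x_im = cimagf (x);              /* roughly _105 = IMAGPART_EXPR <...> */
  float sum_re = crealf (acc) + x_re;   /* roughly _106 = reduction_var$real + _104 */
  float sum_im = cimagf (acc) + x_im;   /* roughly _107 = reduction_var$imag + _105 */
  return CMPLXF (sum_re, sum_im);       /* roughly _92 = COMPLEX_EXPR <_106, _107> */
}
```

For finite inputs, `add_lowered (acc, x)` gives the same value as the unlowered `acc + x`, which corresponds to the single `reduction_var_2 + _91` statement on the "disabled" side of the diff.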
I wonder if oacc device lowering could handle this itself rather than
requiring another cplxlower pass for presumably just complex add?

> > What's special about oacc_device lower that doesn't also apply
> > to omp_device_lower?
>
> The passes do different things. The goal is to optimize OpenACC
> loops using Graphite. The relevant lowering of the internal OpenACC
> function calls happens in pass_oacc_device_lower.
>
> > Is all this targeted at code compiled exclusively for the offload
> > target?  Thus we're in lto1 here?
>
> The OpenACC outlined functions also get compiled for the host.
>
> > Does it make eventually more sense to have a completely custom pass
> > pipeline for the offload compilation?  Maybe even per offload target?
> > See how we have a custom pipeline for -Og (pass_all_optimizations_g).
>
> What would be the main benefits of a separate pipeline? Avoiding
> (re-)running passes unnecessarily, less unwanted interactions
> in the test suite (but your suggestion above regarding the fixed
> pass names would also solve this)?

Mainly to avoid (re-)running passes unnecessarily and to make tuning
towards offload targets easier without affecting non-offload code too much.

Can I somehow make you work on that dump-file idea? ;)

Richard.

> >> Ok to include the patch in master?
>
> Best regards,
> Frederik
>
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter