On Sat, 15 Nov 2014, Tom de Vries wrote: > On 15-11-14 13:14, Tom de Vries wrote: > > Hi, > > > > I'm submitting a patch series with initial support for the oacc kernels > > directive. > > > > The patch series uses pass_parallelize_loops to implement parallelization of > > loops in the oacc kernels region. > > > > The patch series consists of these 8 patches: > > ... > > 1 Expand oacc kernels after pass_build_ealias > > 2 Add pass_oacc_kernels > > 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > > 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > > 5 Add pass_loop_im to pass_oacc_kernels > > 6 Add pass_ccp to pass_oacc_kernels > > 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > > 8 Do simple omp lowering for no address taken var > > ... > > This patch lowers integer variables that do not have their address taken as > local variable. We use a copy at region entry and exit to copy the value in > and out. > > In the context of reduction handling in a kernels region, this allows the > parloops reduction analysis to recognize the reduction, even after oacc > lowering has been done in pass_lower_omp. > > In more detail, without this patch, the omp_data_i load and stores are > generated in place (in this case, in the loop): > ... > { > .omp_data_iD.2201 = &.omp_data_arr.15D.2220; > { > unsigned intD.9 iD.2146; > > iD.2146 = 0; > goto <D.2207>; > <D.2208>: > D.2216 = .omp_data_iD.2201->cD.2203; > c.9D.2176 = *D.2216; > D.2177 = (long unsigned intD.10) iD.2146; > D.2178 = D.2177 * 4; > D.2179 = c.9D.2176 + D.2178; > D.2180 = *D.2179; > D.2217 = .omp_data_iD.2201->sumD.2205; > D.2218 = *D.2217; > D.2217 = .omp_data_iD.2201->sumD.2205; > D.2219 = D.2180 + D.2218; > *D.2217 = D.2219; > iD.2146 = iD.2146 + 1; > <D.2207>: > if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>; > <D.2209>: > } > ... > > With this patch, the omp_data_i load and stores for sum are generated at entry > and exit: > ... > { > .omp_data_iD.2201 = &.omp_data_arr.15D.2218; > D.2216 = .omp_data_iD.2201->sumD.2205; > sumD.2206 = *D.2216; > { > unsigned intD.9 iD.2146; > > iD.2146 = 0; > goto <D.2207>; > <D.2208>: > D.2217 = .omp_data_iD.2201->cD.2203; > c.9D.2176 = *D.2217; > D.2177 = (long unsigned intD.10) iD.2146; > D.2178 = D.2177 * 4; > D.2179 = c.9D.2176 + D.2178; > D.2180 = *D.2179; > sumD.2206 = D.2180 + sumD.2206; > iD.2146 = iD.2146 + 1; > <D.2207>: > if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>; > <D.2209>: > } > *D.2216 = sumD.2206; > #pragma omp return > } > ... > > > So, without the patch the reduction operation looks like this: > ... > *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x > ... > > And with this patch the reduction operation is simply: > ... > sumD.2206 = sumD.2206 + x: > ... > > OK for trunk?
I presume the reason you are trying to do that here is that otherwise it happens too late? What you do is what loop store motion would do. Now - I can see how that is easily confused by the static chain being address-taken. But I also remember that Eric did some preparatory work to fix that, for nested functions, that is, possibly setting DECL_NONADDRESSABLE_P? Don't remember exactly. That said - the gimple_seq_ior_addresses_taken_op callback looks completely broken. Consider &a.x which you'd fail to mark as address-taken. It looks like the body is not yet in CFG form when you apply all this? That said - the functions do not belong to gimple.[ch] at least as they are not going to work in general. I also question why they are necessary - you do + if (gimple_code (stmt) == GIMPLE_OACC_KERNELS + && !bitmap_bit_p (addresses_taken, DECL_UID (var)) + && INTEGRAL_TYPE_P (TREE_TYPE (var))) but why don't you simply check TREE_ADDRESSABLE (var)? TREE_ADDRESSABLE is conservative correct here. And the above won't help for float reductions. So if, then you should probably test is_gimple_reg_type (TREE_TYPE (var)) instead of INTEGRAL_TYPE_P and you definitely should limit the number of vars treated this way. Oh - and the optimization should be somewhere more general - after all it applies to all nested functions (thus move it to tree-nested.c?) and to autopar loops as well. Not sure how much code the omp lowering shares with unnesting - but hopefully enough. Richard. -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany