On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, May 6, 2014 at 6:44 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: > >>> Hi, >>> I split the patch into two and updated the test case. >>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm >>> cortex-m3. >>> Is it OK? >>> >>> Thanks, >>> bin >>> >>> PATCH 1: >>> >>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>> >>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>> core part of address expressions. >> >> No gcc/ in the changelog >> >> Simplify that by using aff_combination_add_cst: >> >> + if (TREE_CODE (core) == MEM_REF) >> + { >> + aff_combination_add_cst (comb, mem_ref_offset (core)); >> + core = TREE_OPERAND (core, 0); >> >> patch 1 is ok with that change. > > Installed with below change because of wide-int merge: > - core = build_fold_addr_expr (core); > + if (TREE_CODE (core) == MEM_REF) > + { > + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1))); > + core = TREE_OPERAND (core, 0); > >> >>> PATCH 2: >>> >>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>> >>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >> >> So this "lowers" addresses(?) which are based on &<not-decl>, >> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR > &a+4 is already in its lowered form, what we want to handle is &EXPR + > 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. > >> when originally introducing address lowering (in rev. 204497) I was >> concerned about the cost? > Yes, you did. I still think the iv number is relative small for each > loop, thus the change won't cause compilation time issue considering > the existing tree<->affine transformation in ivopt. > I would like to collect more accurate time information for ivopt in > gcc bootstrap. Should I use "-ftime-report" then add all time slices > in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? > Thanks. > >> >> Now I wonder why we bother to convert the lowered form back >> to trees to store it in ->base and not simply keep (and always >> compute) the affine expansion only. Thus, change struct iv >> to have aff_tree base instead of tree base? >> >> Can you see what it takes to do such change? At the point >> we replace uses we go into affine representation again anyway. > Good idea, I may have a look. After going through work flow of ivopt, I think it's practical to make such change and can help compilation time. Ivopt calls tree_to_aff_combination to convert iv->base into affine_tree whenever it tries to represent an iv_use by an iv_cand, i.e., N*M times for computing cost of each iv_use represented by each iv_cand, and M times for rewriting each iv_use. Here, N and M stands for number of iv_use and iv_cand. Also strip_offset (which is a recursive function) is called for iv->base also at complexity of O(NM), so it may be improved too. To make a start, it's possible to store both tree and affine_tree of base in struct iv, and use either of them whenever needed.
It seems to me there are two potential issues of this change. First, though affine_tree of base is stored, we can't operate on it directly, which means we have to copy it before using it. Second, ivopt sometimes computes affine_tree of base in different type other than TREE_TYPE(base), we may need to do additional calls to aff_combination_convert to get wanted type. All these will introduce additional computation and compromise benefit of the change. At last, I did some experiments on how many additional calls to tree_to_aff_combination are introduced by this patch. The data of bootstrap shows it's less than 3% comparing to calls made by current implementation of ivopt, which confirms that this patch shouldn't have issue of compilation time. Any comments? Thanks, bin -- Best Regards.