On Tue, 19 Jul 2011, William J. Schmidt wrote:

> I've been distracted by other things, but got back to this today...
>
> On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:
> > Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> > issue boils down to get_inner_reference canonicalizing the offset
> > according to what fold-const.c implements, and we simply emit
> > the offset in that form (if you look at the normal_inner_ref: case
> > in expand_expr_real*).  Thus with the above form at expansion
> > time the matching would be applied there (or during our RTL
> > address-generation in fwprop.c ...).
>
> I put together a simple patch to match the specific pattern from the
> original problem report in expand_expr_real_1.  This returns the code
> generation for that specific pattern to what it was a few years ago, but
> has little effect on SPEC cpu2000 results.  A small handful of
> benchmarks are improved, but most results are in the noise range.  So
> solving the original problem alone doesn't account for very much of the
> improvement we're getting with the full address lowering.
>
> Note that this only converted the basic pattern; I did not attempt to do
> the extra "zero-offset mem-ref" optimization, which I would have had to
> rewrite for an RTL phase.  I don't see much point in pursuing that,
> given these results.
>
> > Another idea is of course to lower all handled_component_p
> > memory accesses - something I did on the old mem-ref branch
> > and I believe I also suggested at some point.  Without such lowering
> > all the address-generation isn't exposed to the GIMPLE optimizers.
> >
> > The simplest way of doing the lowering is to do sth like
> >
> >   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
> >   base = fold_build2 (POINTER_PLUS_EXPR, ..., base, offset);
> >   base = force_gimple_operand (base, ...
> >                                is_gimple_mem_ref_addr);
> >   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs), base,
> >                      build_int_cst (get_alias_ptr_type (rhs), bitpos));
> >
> > at some point.  I didn't end up doing that because of the loss of
> > alias information.
>
> My next experiment will be to try something simple along these lines.
> I'll look at the old mem-ref branch to see what you were doing there.
> It will be interesting to see whether the gains generally outweigh the
> small loss of TBAA information, as we saw using the affine-combination
> approach.
>
> As an FYI, this is the patch I used for my experiment.  Let me know if
> you see anything horrifically wrong that might have impacted the
> results.  I didn't make any attempt to figure out whether the pattern
> rewrite is appropriate for the target machine, since this was just an
> experiment.
I wonder if the code below triggers at all, since we expand from SSA and
no longer see the larger trees in place; you have to look them up via
their SSA defs using get_gimple_for_ssa_name (or the helper
get_def_for_expr).  So I expect the check for a MULT_EXPR offset will
only trigger if the memory reference was still in the form of a variable
ARRAY_REF (in which case get_inner_reference will produce an offset of
the form i * sizeof (element)).

I think what we also want to catch here is the case of *(ptr + i), which
will not show up as an ARRAY_REF but as MEM[ptr + 0], where ptr is an
SSA name whose defining statement computes ptr' + i * sizeof (element).

Richard.

> Thanks,
> Bill
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c	(revision 176422)
> +++ gcc/expr.c	(working copy)
> @@ -7152,7 +7152,64 @@ expand_constructor (tree exp, rtx target, enum exp
>    return target;
>  }
>  
> +/* Look for specific patterns in the inner reference BASE and its
> +   OFFSET expression.  If found, return a single tree of type EXPTYPE
> +   that reassociates the base and offset to combine constants.
> +   */
> +static tree
> +reassociate_base_and_offset (tree base, tree offset, tree exptype)
> +{
> +  tree mult_op0, mult_op1, t1, t2;
> +  tree mult_expr, add_expr, mem_ref;
> +  tree offset_type;
> +  HOST_WIDE_INT c1, c2, c3, c;
> +
> +  /* Currently we look for the following pattern, where Tj is an
> +     arbitrary tree, and Ck is an integer constant:
> +
> +       base:   MEM_REF (T1, C1)
> +       offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +
> +     This is converted to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + C2*C3)
> +  */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST_LOW (mult_op1);
> +  c = c1 + c2 * c3;
> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = build2 (MULT_EXPR, sizetype, t2,
> +                      build_int_cst (offset_type, c3));
> +
> +  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +
> +  mem_ref = build2 (MEM_REF, exptype, add_expr,
> +                    build_int_cst (offset_type, c));
> +
> +  return mem_ref;
> +}
> +
> +
>  /* expand_expr: generate code for computing expression EXP.
>     An rtx for the computed value is returned.  The value is never null.
>     In the case of a void EXP, const0_rtx is returned.
> @@ -9034,7 +9091,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>      {
>        enum machine_mode mode1, mode2;
>        HOST_WIDE_INT bitsize, bitpos;
> -      tree offset;
> +      tree offset, tem2;
>        int volatilep = 0, must_force_mem;
>        bool packedp = false;
>        tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> @@ -9051,6 +9108,15 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>  	  && DECL_PACKED (TREE_OPERAND (exp, 1)))
>  	packedp = true;
>  
> +      /* Experimental: Attempt to reassociate the base and offset
> +	 to combine constants.  */
> +      if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
> +	  != NULL_TREE)
> +	{
> +	  tem = tem2;
> +	  offset = NULL_TREE;
> +	}
> +
>        /* If TEM's type is a union of variable size, pass TARGET to the inner
>  	 computation, since it will need a temporary and TARGET is known
>  	 to have to do.  This occurs in unchecked conversion in Ada.  */


-- 
Richard Guenther <rguent...@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer