On Thu, 12 Jan 2012, Martin Jambor wrote:

> Hi,
>
> On Thu, Jan 12, 2012 at 03:23:31PM +0100, Richard Guenther wrote:
> > On Thu, 12 Jan 2012, Martin Jambor wrote:
> >
> > > Hi,
> > >
> > > I tend to believe that this is the SRA part of a fix for PR 50444
> > > (and possibly some strict-alignment SRA-related bugs too).  What it
> > > does is decrease the alignment of the built MEM_REF type when that
> > > seems necessary, i.e. when the alignment of the type as it arrived
> > > in the parameter is larger than the alignment of the base we are
> > > building the MEM_REF on top of or of the MEM_REF we are replacing,
> > > or when it is larger than the biggest power of two that divides the
> > > offset we are adding (if that offset is non-zero).
> > >
> > > This patch alone fixes the testcase on i686; on x86_64 the testcase
> > > stops segfaulting but the final comparison does not go well and the
> > > testcase returns a non-zero exit code.  However, I believe that is
> > > most probably an expansion problem rather than SRA's.  Perhaps it
> > > should also be noted that if I revert SRA's build_ref_for_model to
> > > what it was in revision 164280, so that SRA creates only a MEM_REF
> > > rather than a COMPONENT_REF on top of a MEM_REF as it does now, the
> > > assignment expansion takes the "optab_handler (movmisalign_optab,
> > > mode)" route and the testcase is compiled well, with both the
> > > segfault gone and the comparison yielding the correct result.  I
> > > plan to have a look at what goes on in expansion with the current
> > > build_ref_for_model shortly, but thought I'd post this for review
> > > and comments before that.
> > >
> > > A similar patch has passed bootstrap and testsuite on x86_64-linux;
> > > this exact one is undergoing the same as I'm writing this.  Is this
> > > the right direction, and should I commit the patch (to trunk and
> > > the 4.6 branch) if it passes?
> >
> > I don't think this is a suitable place to discover the alignment
> > for the access.
> > In fact what we should do is use the same alignment that was used
> > for the access we replicate/decompose (or is the 'base' argument to
> > build_ref_for_offset always the original access?)
>
> No, often it is not.  And I am not sure what that "same alignment"
> above is.  Let me explain the crux of the problem with the testcase.
> The data structures look like this:
>
>   typedef struct {
>     uint32_t v[4];
>   } a4x32;
>
>   typedef struct {
>     __m128i m;
>   } a1xm128i;
>
>   typedef struct {
>     a1xm128i key;
>     a4x32 c;
>     size_t elem;
>     a4x32 v;
>   } Engine;
>
> And we have a local aggregate of type a4x32 which is accessed as
> __m128i, therefore its scalar replacement has type __m128i.  Any store
> of that type requires 128-bit alignment.
I don't see a4x32 accessed as __m128i - that is what SRA generates, but
it is not what is present in the IL before SRA.

> The problem is that the original a4x32 aggregate is stored into the v
> field of a structure of type Engine which is pointed to by a pointer
> parameter.  Field v has offset 40 and therefore any store to it in the
> __m128i type without fiddling with its alignment is illegal (the
> Engine structure itself is 128-bit aligned because of field key).
>
> Specifically, before SRA it all looks like this (D.4781 and D.4792
> are local a4x32 structures, e points to a struct Engine):
>
>   MEM[(struct *)&D.4792].m = vector_ssa_name;
>   D.4781 = D.4792;
>   e_1(D)->v = D.4781;
>
> If I want to replace the local aggregates D.4781 and D.4792 with
> vectors and make them go away completely (and I do believe people
> would complain if I did not, especially if they are used in more
> computations elsewhere in the function), I have to store the vector to
> the unaligned position.
>
> If we want to make D.4781 disappear completely, there is no original
> access we are replicating/decomposing and we have to take into account
> the offset 40, which however is a property of this particular memory
> access and nothing else.  Therefore I thought this was the proper
> place to deal with it.

I see it that already the

  MEM[(char * {ref-all})&c4x32] = MEM[(char * {ref-all})&c1x128];
  D.1760 = c4x32;

to

  MEM[(struct *)&D.1760].m = c4x32$m_15;

replacement SRA performs is bogus.  D.1760 is a4x32 but SRA generates a
c1x128 store into it.

Richard.

> > Also ...
> >
> > > Thanks,
> > >
> > > Martin
> > >
> > >
> > > 2012-01-12  Martin Jambor  <mjam...@suse.cz>
> > >
> > > 	PR tree-optimization/50444
> > > 	* tree-sra.c (build_ref_for_offset): Decrease the alignment of the
> > > 	type of the created MEM_REF if necessary.
> > >
> > > Index: src/gcc/tree-sra.c
> > > ===================================================================
> > > --- src.orig/gcc/tree-sra.c
> > > +++ src/gcc/tree-sra.c
> > > @@ -1458,9 +1458,10 @@ build_ref_for_offset (location_t loc, tr
> > >  		      tree exp_type, gimple_stmt_iterator *gsi,
> > >  		      bool insert_after)
> > >  {
> > > +  HOST_WIDE_INT base_offset, final_offset;
> > > +  unsigned int align;
> > >    tree prev_base = base;
> > >    tree off;
> > > -  HOST_WIDE_INT base_offset;
> > >
> > >    gcc_checking_assert (offset % BITS_PER_UNIT == 0);
> > >
> > > @@ -1488,24 +1489,34 @@ build_ref_for_offset (location_t loc, tr
> > >        gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > >        update_stmt (stmt);
> > >
> > > +      final_offset = offset / BITS_PER_UNIT;
> > >        off = build_int_cst (reference_alias_ptr_type (prev_base),
> > > -			   offset / BITS_PER_UNIT);
> > > +			   final_offset);
> > >        base = tmp;
> > > +      align = get_object_alignment (prev_base);
> > >      }
> > >    else if (TREE_CODE (base) == MEM_REF)
> > >      {
> > > -      off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
> > > -			   base_offset + offset / BITS_PER_UNIT);
> > > +      final_offset = base_offset + offset / BITS_PER_UNIT;
> > > +      off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
> > > +			   final_offset);
> > >        off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
> > > +      align = get_object_or_type_alignment (base);
> > >        base = unshare_expr (TREE_OPERAND (base, 0));
> > >      }
> > >    else
> > >      {
> > > -      off = build_int_cst (reference_alias_ptr_type (base),
> > > -			   base_offset + offset / BITS_PER_UNIT);
> > > +      final_offset = base_offset + offset / BITS_PER_UNIT;
> > > +      off = build_int_cst (reference_alias_ptr_type (base),
> > > +			   final_offset);
> > > +      align = get_object_alignment (base);
> >
> > I think you always want to use get_object_alignment_1 to be able to
> > factor in offset and thus knowledge about explicit misalignment,
> > otherwise you'd pessimize quite some cases.
> > Then the question is whether to factor in TYPE_ALIGN of the original
> > access (see get_object_or_type_alignment).
>
> I think the MEM_REF case needs to use get_object_or_type_alignment
> like my patch does because quite likely its base is a pointer
> SSA_NAME.  I do not know how I could make any use of the value
> returned in get_object_alignment_1's bitposp other than exactly what
> get_object_alignment does.  Can you give me a further hint?  What
> would be the reason for pessimizing things?
>
> Thanks a lot,
>
> Martin
>
> >
> > Richard.
> >
> > >        base = build_fold_addr_expr (unshare_expr (base));
> > >      }
> > >
> > > +  if (final_offset != 0)
> > > +    align = MIN (align,
> > > +		 (1u << ctz_hwi (absu_hwi (final_offset))) * BITS_PER_UNIT);
> > > +  if (align < TYPE_ALIGN (exp_type))
> > > +    exp_type = build_aligned_type (exp_type, align);
> > > +
> > >    return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > >  }

-- 
Richard Guenther <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer