On Thu, 12 Jan 2012, Martin Jambor wrote:

> Hi,
> 
> On Thu, Jan 12, 2012 at 03:23:31PM +0100, Richard Guenther wrote:
> > On Thu, 12 Jan 2012, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > I tend to believe that this is the SRA part of a fix for PR 50444 (and
> > > possibly for some strict-alignment SRA-related bugs too).  What it does
> > > is decrease the alignment of the built MEM_REF type if that seems
> > > necessary: that is, in cases when the alignment of the type as it
> > > arrived in the parameter is larger than the alignment of the base we
> > > are building the MEM_REF on top of or of the MEM_REF we are replacing,
> > > or when it is larger than the biggest power of two that divides the
> > > offset we are adding (if that offset is non-zero).
> > > 
> > > This patch alone fixes the testcase on i686; on x86_64 the testcase
> > > stops segfaulting but the final comparison fails and the testcase
> > > returns a non-zero exit code.  However, I believe that is most
> > > probably an expansion problem rather than SRA's.  Perhaps it should
> > > also be noted that if I revert SRA's build_ref_for_model to what it
> > > was in revision 164280, so that SRA creates only a MEM_REF rather
> > > than the COMPONENT_REF on top of a MEM_REF which it creates now, the
> > > assignment expansion takes the "optab_handler (movmisalign_optab,
> > > mode)" route and the testcase is compiled correctly, with both the
> > > segfault gone and the comparison yielding the correct result.  I plan
> > > to have a look at what goes on in expansion with the current
> > > build_ref_for_model shortly, but thought I'd post this for review and
> > > comments before that.
> > > 
> > > A similar patch has passed bootstrap and testsuite on x86_64-linux,
> > > this exact one is undergoing the same as I write this.  Is this the
> > > right direction, and should I commit the patch (to trunk and the 4.6
> > > branch) if it passes?
> > 
> > I don't think this is a suitable place to discover the alignment
> > for the access.  In fact what we should do is use the same alignment
> > that was used for the access we replicate/decompose (or is the
> > 'base' argument to build_ref_for_offset always the original access?)
> 
> No, it often is not.  And I am not sure what that same alignment
> above is.  Let me explain the crux of the problem with the testcase.
> The data structures look like this:
> 
>   typedef struct {
>       uint32_t v[4];
>   } a4x32;
>   
>   typedef struct {
>       __m128i m;
>   } a1xm128i;
> 
>   typedef struct {
>       a1xm128i key;
>       a4x32 c;
>       size_t elem;
>       a4x32 v;
>   } Engine;
> 
> And we have a local aggregate of type a4x32 which is accessed as
> __m128i, and therefore its scalar replacement has type __m128i.  Any
> store of that type requires 128-bit alignment.
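
[For reference, a minimal compilable sketch of the situation described
above - reconstructed after the shape of the PR 50444 testcase, not the
exact code; set_v and its body are illustrative only:]

  #include <emmintrin.h>
  #include <stdint.h>
  #include <stddef.h>

  typedef struct { uint32_t v[4]; } a4x32;
  typedef struct { __m128i m; } a1xm128i;

  typedef struct {
    a1xm128i key;   /* 16 bytes; forces 128-bit alignment of Engine */
    a4x32 c;        /* offset 16 */
    size_t elem;    /* offset 32 on x86_64 */
    a4x32 v;        /* offset 40 on x86_64 - not 128-bit aligned */
  } Engine;

  void set_v (Engine *e, __m128i x)
  {
    a4x32 tmp;
    /* The local is written through an __m128i view, so SRA scalarizes
       it into a single __m128i replacement...  */
    *(__m128i *) &tmp = x;
    /* ...and making tmp go away then requires a 128-bit store to the
       only 64-bit aligned e->v.  */
    e->v = tmp;
  }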

I don't see the a4x32 being accessed as __m128i - that is what SRA
generates, but it is not what is present in the IL before SRA.

> The problem is that the original a4x32 aggregate is stored into the v
> field of a structure of type Engine which is pointed to by a pointer
> parameter.  Field v has offset 40, and since the largest power of two
> dividing 40 is 8, any store to it in the __m128i type without fiddling
> with its alignment is illegal (the Engine structure itself is 128-bit
> aligned because of field key).
> 
> Specifically, before SRA it all looks like this (D.4781 and D.4792
> are local a4x32 structures; e points to a struct Engine):
> 
>   MEM[(struct  *)&D.4792].m = vector_ssa_name;
>   D.4781 = D.4792;
>   e_1(D)->v = D.4781;
> 
> If I want to replace the local aggregates D.4781 and D.4792 with
> vectors and make them go away completely (and I do believe people
> would complain if I did not, especially if they are used in more
> computations elsewhere in the function), I have to store the vector
> to the unaligned position.
> 
> If we want to make D.4781 disappear completely, there is no original
> access we are replicating/decomposing and we have to take into account
> the offset 40, which is, however, a property of this particular memory
> access and nothing else.  Therefore I thought this was the proper
> place to deal with it.
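
[At the source level, the store that must come out of this corresponds
to an unaligned-store intrinsic rather than a plain vector store - a
hand-written equivalent, assuming SSE2 and the layout above:]

  #include <emmintrin.h>
  #include <stdint.h>
  #include <stddef.h>

  typedef struct { uint32_t v[4]; } a4x32;
  typedef struct { __m128i m; } a1xm128i;
  typedef struct { a1xm128i key; a4x32 c; size_t elem; a4x32 v; } Engine;

  void store_v (Engine *e, __m128i x)
  {
    /* &e->v is guaranteed only 64-bit alignment (offset 40 = 8 * 5),
       so an aligned store (movdqa) may fault; the compiler has to
       take the movmisalign route (movdqu) instead.  */
    _mm_storeu_si128 ((__m128i *) &e->v, x);
  }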

The way I see it, already the

  MEM[(char * {ref-all})&c4x32] = MEM[(char * {ref-all})&c1x128];
  D.1760 = c4x32;

to

  MEM[(struct  *)&D.1760].m = c4x32$m_15;

replacement that SRA performs is bogus.  D.1760 is an a4x32, but SRA
generates a c1x128 store into it.

Richard.
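
[A rough C analogy of that before/after IL difference - a paraphrase
for illustration, not code from the testcase:]

  #include <emmintrin.h>
  #include <stdint.h>
  #include <string.h>

  typedef struct { uint32_t v[4]; } a4x32;
  typedef struct { __m128i m; } a1xm128i;

  void before_sra (a4x32 *d, const a1xm128i *src)
  {
    /* The pre-SRA IL is an untyped {ref-all} block copy - valid for
       any alignment and any effective type.  */
    memcpy (d, src, sizeof *d);
  }

  void after_sra (a4x32 *d, __m128i m_15)
  {
    /* The post-SRA IL is a typed 128-bit store into an object whose
       declared type (a4x32) guarantees only 32-bit alignment.  */
    ((a1xm128i *) d)->m = m_15;
  }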

> > 
> > Also ...
> > 
> > > Thanks,
> > > 
> > > Martin
> > > 
> > > 
> > > 2012-01-12  Martin Jambor  <mjam...@suse.cz>
> > > 
> > >   PR tree-optimization/50444
> > >   * tree-sra.c (build_ref_for_offset): Decrease the alignment of the
> > >   type of the created MEM_REF if necessary.
> > > 
> > > Index: src/gcc/tree-sra.c
> > > ===================================================================
> > > --- src.orig/gcc/tree-sra.c
> > > +++ src/gcc/tree-sra.c
> > > @@ -1458,9 +1458,10 @@ build_ref_for_offset (location_t loc, tr
> > >                 tree exp_type, gimple_stmt_iterator *gsi,
> > >                 bool insert_after)
> > >  {
> > > +  HOST_WIDE_INT base_offset, final_offset;
> > > +  unsigned int align;
> > >    tree prev_base = base;
> > >    tree off;
> > > -  HOST_WIDE_INT base_offset;
> > >  
> > >    gcc_checking_assert (offset % BITS_PER_UNIT == 0);
> > >  
> > > @@ -1488,24 +1489,34 @@ build_ref_for_offset (location_t loc, tr
> > >   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> > >        update_stmt (stmt);
> > >  
> > > +      final_offset = offset / BITS_PER_UNIT;
> > >        off = build_int_cst (reference_alias_ptr_type (prev_base),
> > > -                    offset / BITS_PER_UNIT);
> > > +                    final_offset);
> > >        base = tmp;
> > > +      align = get_object_alignment (prev_base);
> > >      }
> > >    else if (TREE_CODE (base) == MEM_REF)
> > >      {
> > > -      off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
> > > -                    base_offset + offset / BITS_PER_UNIT);
> > > +      final_offset = base_offset + offset / BITS_PER_UNIT;
> > > +      off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), final_offset);
> > >        off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
> > > +      align = get_object_or_type_alignment (base);
> > >        base = unshare_expr (TREE_OPERAND (base, 0));
> > >      }
> > >    else
> > >      {
> > > -      off = build_int_cst (reference_alias_ptr_type (base),
> > > -                    base_offset + offset / BITS_PER_UNIT);
> > > +      final_offset = base_offset + offset / BITS_PER_UNIT;
> > > +      off = build_int_cst (reference_alias_ptr_type (base), final_offset);
> > > +      align = get_object_alignment (base);
> > 
> > I think you always want to use get_object_alignment_1 to be able to
> > factor in the offset and thus knowledge about explicit misalignment;
> > otherwise you'd pessimize quite a few cases.  Then the question
> > is whether to factor in TYPE_ALIGN of the original access
> > (see get_object_or_type_alignment).
> 
> I think the MEM_REF case needs to use get_object_or_type_alignment
> like my patch does because quite likely its base is a pointer
> SSA_NAME.  I do not know how I could make any use of the value
> returned in get_object_alignment_1's bitposp other than exactly what
> get_object_alignment does.  Can you give me a further hint?  What
> would be the reason for pessimizing things?
> 
> Thanks a lot,
> 
> Martin
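
[One possible reading of the get_object_alignment_1 suggestion,
sketched under the assumption stated in the comment; the helper's exact
signature should be checked against the sources:]

  /* Sketch only - this assumes get_object_alignment_1 returns the
     alignment of the ultimate base and stores the bit offset of the
     access within it into *bitposp.  */
  unsigned HOST_WIDE_INT bitpos;
  unsigned int align = get_object_alignment_1 (base, &bitpos);

  /* Fold the offset SRA adds into the bit position before capping.
     Capping once by the combined offset can preserve alignment that
     two separate caps lose: bitpos 64 plus an offset of 64 bits is
     128-bit aligned, while capping each separately yields only 64
     bits.  */
  bitpos += (unsigned HOST_WIDE_INT) final_offset * BITS_PER_UNIT;
  if (bitpos != 0)
    align = MIN (align, (unsigned int) (bitpos & -bitpos));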
> 
> 
> > 
> > Richard.
> > 
> > >        base = build_fold_addr_expr (unshare_expr (base));
> > >      }
> > >  
> > > +  if (final_offset != 0)
> > > +    align = MIN (align,
> > > +                 (1u << ctz_hwi (absu_hwi (final_offset))) * BITS_PER_UNIT);
> > > +  if (align < TYPE_ALIGN (exp_type))
> > > +    exp_type = build_aligned_type (exp_type, align);
> > > +
> > >    return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > >  }
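
[For readers without the sources at hand, the capping step in the hunk
above boils down to the following standalone paraphrase:]

  #include <stdlib.h>

  #define BITS_PER_UNIT 8

  /* The MEM_REF type may not claim more alignment than both the base
     and the byte offset guarantee.  __builtin_ctzl and labs stand in
     for GCC's ctz_hwi and absu_hwi.  */
  static unsigned int
  cap_align (unsigned int align, long final_offset)
  {
    if (final_offset != 0)
      {
        /* Largest power of two dividing the offset, in bits.  */
        unsigned int off_align
          = (1u << __builtin_ctzl (labs (final_offset))) * BITS_PER_UNIT;
        if (off_align < align)
          align = off_align;
      }
    return align;
  }

  /* cap_align (128, 40) == 64: the store to offset 40 gets a 64-bit
     aligned type, forcing a misaligned-store sequence at expansion.  */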
> 
> 

-- 
Richard Guenther <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
