store

Thomas Preud'homme Wed, 16 Apr 2014 22:20:08 -0700

> From: Richard Biener [mailto:richard.guent...@gmail.com]
> 
> With handling only the outermost handled-component and then only a
> selected subset you'll catch many but not all cases.  Why not simply
> use get_inner_reference () here (plus stripping the constant offset
> from an innermost MEM_REF) and get the best of both worlds (not
> duplicate parts of its logic and handle more cases)?  Eventually
> using tree-affine.c and get_inner_reference_aff is even more appropriate
> so you can compute the address differences without decomposing
> them yourselves.


Why does the constant offset from an innermost MEM_REF need to be
stripped? Shouldn't that be part of the offset in the symbolic number?

> 
> +             /*  Compute address to load from and cast according to the size
> +                 of the load.  */
> +             load_ptr_type = build_pointer_type (load_type);
> +             addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src);
> +             addr_tmp = make_temp_ssa_name (load_ptr_type, NULL,
> "load_src");
> +             addr_stmt = gimple_build_assign_with_ops
> +                        (NOP_EXPR, addr_tmp, addr_expr, NULL);
> +             gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT);
> +
> +             /* Perform the load.  */
> +             load_offset_ptr = build_int_cst (load_ptr_type, 0);
> +             val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst");
> +             val_expr = build2 (MEM_REF, load_type, addr_tmp, 
> load_offset_ptr);
> +             load_stmt = gimple_build_assign_with_ops
> +                        (MEM_REF, val_tmp, val_expr, NULL);
> 
> this is unnecessarily complex and has TBAA issues.  You don't need to
> create a "correct" pointer type, so doing
> 
>     addr_expr = fold_build_addr_expr (bswap_src);
> 
> is enough.  Now, to fix the TBAA issues you either need to remember
> and "combine" the reference_alias_ptr_type of each original load and
> use that for the load_offset_ptr value or decide that isn't worth it and
> use alias-set zero (use ptr_type_node).

Sorry this is only my second patch [1] to gcc so it's not all clear to me. The 
TBAA
issue you mention comes from val_expr referring to a memory area that
overlap with the smaller memory area used in the bitwise OR operation, am I
right? Now, I have no idea about how to do the combination of the values
returned by reference_alias_ptr_type () for each individual small memory
area. Can you advise me on this? And what are the effect of not doing it and
using ptr_type_node for the alias-set?

[1] First one was a fix on the existing implementation of the bswap pass.

> 
> Can you also expand the comment about "size vs. range"?  Is it
> that range can be bigger than size if you have (short)a[0] |
> ((short)a[3] << 1) sofar
> where size == 2 but range == 3?  Thus range can also be smaller than size
> for example for (short)a[0] | ((short)a[0] << 1) where range would be 1 and
> size == 2?  I suppose adding two examples like this to the comment, together
> with the expected value of 'n' would help here.

You understood correctly. I will add the suggested example.

> Otherwise the patch looks good.  Now we're only missing the addition
> of trying to match to a VEC_PERM_EXPR with a constant mask
> using can_vec_perm_p ;)

Is that the vector shuffle engine you were mentioning in PR54733? If I
understand correctly it is a generalization of the check again CMPNOP and
CMPXCHG in find_bswap in this new patchset. I will look if ARM could
Benefit from this and if yes I might take a look (yep, two conditions).

Thanks a lot for such quick and detailed comments after my ping.

Best regards,

Thomas

RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store

Reply via email to