> From: Richard Biener [mailto:richard.guent...@gmail.com] > > With handling only the outermost handled-component and then only a > selected subset you'll catch many but not all cases. Why not simply > use get_inner_reference () here (plus stripping the constant offset > from an innermost MEM_REF) and get the best of both worlds (not > duplicate parts of its logic and handle more cases)? Eventually > using tree-affine.c and get_inner_reference_aff is even more appropriate > so you can compute the address differences without decomposing > them yourselves.
Why does the constant offset from an innermost MEM_REF need to be stripped? Shouldn't that be part of the offset in the symbolic number? > > + /* Compute address to load from and cast according to the size > + of the load. */ > + load_ptr_type = build_pointer_type (load_type); > + addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src); > + addr_tmp = make_temp_ssa_name (load_ptr_type, NULL, > "load_src"); > + addr_stmt = gimple_build_assign_with_ops > + (NOP_EXPR, addr_tmp, addr_expr, NULL); > + gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT); > + > + /* Perform the load. */ > + load_offset_ptr = build_int_cst (load_ptr_type, 0); > + val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst"); > + val_expr = build2 (MEM_REF, load_type, addr_tmp, > load_offset_ptr); > + load_stmt = gimple_build_assign_with_ops > + (MEM_REF, val_tmp, val_expr, NULL); > > this is unnecessarily complex and has TBAA issues. You don't need to > create a "correct" pointer type, so doing > > addr_expr = fold_build_addr_expr (bswap_src); > > is enough. Now, to fix the TBAA issues you either need to remember > and "combine" the reference_alias_ptr_type of each original load and > use that for the load_offset_ptr value or decide that isn't worth it and > use alias-set zero (use ptr_type_node). Sorry this is only my second patch [1] to gcc so it's not all clear to me. The TBAA issue you mention comes from val_expr referring to a memory area that overlap with the smaller memory area used in the bitwise OR operation, am I right? Now, I have no idea about how to do the combination of the values returned by reference_alias_ptr_type () for each individual small memory area. Can you advise me on this? And what are the effect of not doing it and using ptr_type_node for the alias-set? [1] First one was a fix on the existing implementation of the bswap pass. > > Can you also expand the comment about "size vs. range"? Is it > that range can be bigger than size if you have (short)a[0] | > ((short)a[3] << 1) sofar > where size == 2 but range == 3? Thus range can also be smaller than size > for example for (short)a[0] | ((short)a[0] << 1) where range would be 1 and > size == 2? I suppose adding two examples like this to the comment, together > with the expected value of 'n' would help here. You understood correctly. I will add the suggested example. > Otherwise the patch looks good. Now we're only missing the addition > of trying to match to a VEC_PERM_EXPR with a constant mask > using can_vec_perm_p ;) Is that the vector shuffle engine you were mentioning in PR54733? If I understand correctly it is a generalization of the check again CMPNOP and CMPXCHG in find_bswap in this new patchset. I will look if ARM could Benefit from this and if yes I might take a look (yep, two conditions). Thanks a lot for such quick and detailed comments after my ping. Best regards, Thomas