On Thu, 25 Aug 2022, Andre Vieira (lists) wrote:

> 
> On 17/08/2022 13:49, Richard Biener wrote:
> > Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> > access - that's the offset within the representative (by construction
> > both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
> >> So instead I change bitpos such that:
> >> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> >> bitpos -= bitpos.to_constant () / align_of_representative *
> >> align_of_representative;
> > ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically doing what
> you described above.
> > Not sure what you are saying but "yes", all shifting and masking should
> > happen in the type of the representative.
> >
> > +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
> >
> > for your convenience there's bitsize_int (bitpos) you can use.
> >
> > I don't think you are using the correct bitpos though, you fail to
> > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean? I do adjust it; I've changed it now so it
> should hopefully be clearer.
> >
> > +                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
> >
> > the size of the bitfield reference is DECL_SIZE of the original
> > FIELD_DECL - it might be bigger than the precision of its type.
> > You probably want to double-check it's equal to the precision
> > (because of the insert but also because of all the masking) and
> > refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE of a
> bitfield FIELD_DECL be different from its type's precision?

It's probably not possible to create a C testcase, but I don't see
what makes it impossible in general for a bitfield
object to have padding.

> >
> > +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
> > +   with data structures representing these bitfields.  */
> > +
> > +static bool
> > +bitfields_to_lower_p (class loop *loop,
> > +                     vec <gassign *> &reads_to_lower,
> > +                     vec <gassign *> &writes_to_lower)
> > +{
> > +  basic_block *bbs = get_loop_body (loop);
> > +  gimple_stmt_iterator gsi;
> >
> > as said I'd prefer to do this walk as part of the other walks we
> > already do - if only because get_loop_body () is a DFS
> > walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs. The reason I'd rather not share the
> walk over them is that it becomes quite complex to separate the decision
> not to lower ifs because there are none (in which case we still want to
> lower bitfields) from the decision not to lower ifs because they are
> there but aren't lowerable (in which case we forgo lowering bitfields,
> since we will not vectorize this loop anyway).
> >
> > +      value = fold_build1 (NOP_EXPR, load_type, value);
> >
> > fold_convert (load_type, value)
> >
> > +      if (!CONSTANT_CLASS_P (value))
> > +       {
> > +         pattern_stmt
> > +           = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
> > +                                  value);
> > +         value = gimple_get_lhs (pattern_stmt);
> >
> > there's in principle
> >
> >       gimple_seq stmts = NULL;
> >       value = gimple_convert (&stmts, load_type, value);
> >       if (!gimple_seq_empty_p (stmts))
> >         {
> >           pattern_stmt = gimple_seq_first_stmt (stmts);
> >           append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> >         }
> >
> > though an append_pattern_def_seq helper to add a convenience sequence
> > would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely here.
> > You probably want to double-check your lowering code by
> > bootstrapping / testing with -ftree-loop-if-convert.
> Done, this led me to find a new failure mode, where the type of the first
> operand of BIT_FIELD_REF was an FP type (TF mode), which then led to
> failures when constructing the masking and shifting. I ended up adding a
> nop-conversion to an INTEGER type of the same width first if necessary.

You want a VIEW_CONVERT (aka bit-cast) here.
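
I.e. something like this (just a sketch, the variable names are
illustrative and int_type is the same-width integer type you already
build):

      /* Reinterpret the bits rather than value-convert them; a NOP_EXPR
         from a floating-point type is not what we want here.  */
      op = fold_build1 (VIEW_CONVERT_EXPR, int_type, op);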

> Also did a follow-up
> bootstrap with the addition of `-ftree-vectorize` and `-fno-vect-cost-model`
> to further test the codegen. All seems to be working on aarch64-linux-gnu.

+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+                 tree *struct_expr)
...
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;

you can use

  compare_tree_int (DECL_SIZE (field_decl), TYPE_PRECISION (...)) != 0

which avoids caring for the case where the size isn't a uhwi ...
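
i.e. the bail-out could look like this (untested sketch, with the
precision expression spelled out as in your hunk above):

  if (compare_tree_int (DECL_SIZE (field_decl),
                        TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))))
      != 0)
    return NULL_TREE;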

+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+                                             new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;

you can use gimple_move_vops (new_stmt, stmt); here
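
i.e. (sketch):

      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
                                              new_val);
      /* Moves the vuse/vdef from the old store to the new one.  */
      gimple_move_vops (new_stmt, stmt);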

+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+                        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+                        bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+                                                    new_val));

I'm curious, why the push to redundant_ssa_names?  That could use
a comment ...

+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+                                                 writes_to_lower);

do we want to conditionalize this on flag_tree_loop_vectorize?  That is,
I think the lowering should for now happen only on the loop version
guarded by .IFN_VECTORIZED.  There's

  if ((need_to_predicate || any_complicated_phi)
      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize))
    goto cleanup;

for the cases that will force versioning, but I think we should
simply not lower bitfields in the

         ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize)

case?
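
That is, something along these lines (a sketch of what I mean, not
tested):

  need_to_lower_bitfields
    = ((flag_tree_loop_vectorize || loop->force_vectorize)
       && !loop->dont_vectorize
       && bitfields_to_lower_p (loop, reads_to_lower, writes_to_lower));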

+      if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+         || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+       return NULL;

the first || goes to a new line

+      bf_stmt = static_cast <gassign *> (second_stmt);

"nicer" and shorter is

       bf_stmt = dyn_cast <gassign *> (second_stmt);
       if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
         return NULL;

+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);

just when reading, generic variables like 'lhs' are not helpful
(when they are not an actual lhs even less so ...).
You have nice docs on top of the function - when you use
actual names for the _2 = BIT_FIELD_REF (_1, ...) variables you can
even use them in the code so docs and code match up nicely.
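
For example (purely illustrative, the name is made up):

  /* The loaded value the bits are extracted from, _1 in the
     comment on top of the function.  */
  tree load = TREE_OPERAND (bf_ref, 0);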

+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+       = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+                                         true);

so you probably run into this from code that's not lowered from
original bitfield reads?  Note you should use TYPE_SIZE here,
definitely not TYPE_PRECISION on arbitrary types (if it's a vector
type then that will yield the number of units, for example).
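
So something like (sketch; assumes the size is a compile-time constant
that fits a uhwi, which you'd want to guard for):

      tree int_type
        = build_nonstandard_integer_type
            (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (lhs))), true);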

+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();

is there anything that prevents this from running on VLA vector extractions?
I think it would be nice to test constantness at the start of the
function.
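
E.g. (sketch):

  unsigned HOST_WIDE_INT shift_n, mask_width;
  if (!bit_field_offset (bf_ref).is_constant (&shift_n)
      || !bit_field_size (bf_ref).is_constant (&mask_width))
    return NULL;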

+         pattern_stmt
+           = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+                                                           NULL),

eh, seeing this multiple times - vect_recog_temp_ssa_var should get
a defaulted NULL second argument ...
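
i.e. give its declaration a default, something like:

  extern tree vect_recog_temp_ssa_var (tree, gimple * = NULL);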

Note I fear we will have endianness issues when translating
bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
for memory and register operations do not match up (IIRC, I repeatedly
ran into issues here myself).  The testcases all look like they
won't catch this - I think an example would be something like
struct X { unsigned a : 23; unsigned b : 9; }; can you see to do
testing on a big-endian target?
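
A testcase sketch along those lines (illustrative only; the interesting
check is whether the generated shifts extract the correct bits on a
big-endian target, ideally with a runtime-checked variant):

  struct X { unsigned a : 23; unsigned b : 9; };

  unsigned
  sum_b (struct X *x, int n)
  {
    unsigned res = 0;
    for (int i = 0; i < n; ++i)
      /* Bitfield read that gets lowered and then vectorized.  */
      res += x[i].b;
    return res;
  }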

Otherwise the patch looks good, so there are only minor things to
fix up (in case the endianness issue turns out to be a non-issue).

Sorry for the delay in reviewing.

Thanks,
Richard.
