On Thu, 25 Aug 2022, Andre Vieira (lists) wrote:

> On 17/08/2022 13:49, Richard Biener wrote:
> > Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET
> > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
> > access - that's the offset within the representative (by construction
> > both fields share DECL_FIELD_OFFSET).
> Doh! That makes sense...
>
> >> So instead I change bitpos such that:
> >>   align_of_representative = TYPE_ALIGN (TREE_TYPE (representative));
> >>   bitpos -= bitpos.to_constant () / align_of_representative
> >>             * align_of_representative;
> > ?  Not sure why alignment comes into play here?
> Yeah just forget about this... it was my ill attempt at basically doing
> what you described above.
>
> > Not sure what you are saying but "yes", all shifting and masking should
> > happen in the type of the representative.
> >
> > +  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
> >
> > for your convenience there's bitsize_int (bitpos) you can use.
> >
> > I don't think you are using the correct bitpos though, you fail to
> > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR.
> Not sure I understand what you mean?  I do adjust it, I've changed it now
> so it should hopefully be clearer.
>
> > +        build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
> >
> > the size of the bitfield reference is DECL_SIZE of the original
> > FIELD_DECL - it might be bigger than the precision of its type.
> > You probably want to double-check it's equal to the precision
> > (because of the insert but also because of all the masking) and
> > refuse to lower if not.
> I added a check for this but out of curiosity, how can the DECL_SIZE of a
> bitfield FIELD_DECL be different than it's type's precision?

It's probably not possible to create a C testcase, but I don't see what
would make it impossible in general for a bitfield object to contain
padding.

> >
> > +/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
> > +   with data structures representing these bitfields.  */
> > +
> > +static bool
> > +bitfields_to_lower_p (class loop *loop,
> > +                      vec <gassign *> &reads_to_lower,
> > +                      vec <gassign *> &writes_to_lower)
> > +{
> > +  basic_block *bbs = get_loop_body (loop);
> > +  gimple_stmt_iterator gsi;
> >
> > as said I'd prefer to do this walk as part of the other walks we
> > already do - if and if only because get_loop_body () is a DFS
> > walk over the loop body (you should at least share that).
> I'm now sharing the use of ifc_bbs.  The reason why I'd rather not share
> the walk over them is because it becomes quite complex to split out the
> decision to not lower if's because there are none, for which we will
> still want to lower bitfields, versus not lowering if's when they are
> there but aren't lowerable at which point we will forego lowering
> bitfields since we will not vectorize this loop anyway.
>
> > +  value = fold_build1 (NOP_EXPR, load_type, value);
> >
> > fold_convert (load_type, value)
> >
> > +  if (!CONSTANT_CLASS_P (value))
> > +    {
> > +      pattern_stmt
> > +        = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
> > +                               value);
> > +      value = gimple_get_lhs (pattern_stmt);
> >
> > there's in principle
> >
> >   gimple_seq stmts = NULL;
> >   value = gimple_convert (&stmts, load_type, value);
> >   if (!gimple_seq_empty_p (stmts))
> >     {
> >       pattern_stmt = gimple_seq_first_stmt (stmts);
> >       append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
> >     }
> >
> > though a append_pattern_def_seq helper to add a convenience sequence
> > would be nice to have here.
> Ended up using the existing 'vect_convert_input', seems to do nicely here.
>
> > You probably want to double-check your lowering code by
> > bootstrapping / testing with -ftree-loop-if-convert.
> Done, this lead me to find a new failure mode, where the type of the
> first operand of BIT_FIELD_REF was a FP type (TF mode), which then lead
> to failures when constructing the masking and shifting.  I ended up
> adding a nop-conversion to an INTEGER type of the same width first if
> necessary.

You want a VIEW_CONVERT (aka bit-cast) here.

> Also did a follow-up bootstrap with the addition of `-ftree-vectorize`
> and `-fno-vect-cost-model` to further test the codegen.  All seems to be
> working on aarch64-linux-gnu.

+static tree
+get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
+                  tree *struct_expr)
...
+  /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's
+     precision.  */
+  unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl));
+  if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size)
+    return NULL_TREE;

you can use

  compare_tree_int (DECL_SIZE (field_decl), TYPE_PRECISION (...)) != 0

which avoids caring for the case the size isn't a uhwi ...
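i.e. something along the lines of this (untested) sketch, reusing the
field_decl and stmt names from the hunk above:

  /* Sketch only: refuse to lower when the field's DECL_SIZE does not
     match the precision of the bitfield's type.  compare_tree_int also
     copes with a DECL_SIZE that does not fit an unsigned HWI.  */
  tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt));
  if (compare_tree_int (DECL_SIZE (field_decl),
                        TYPE_PRECISION (bf_type)) != 0)
    return NULL_TREE;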

+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+                                              new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;

you can use

  gimple_move_vops (new_stmt, stmt);

here

+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+                         build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
+                         bitpos);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt),
+                                                     new_val));

I'm curious, why the push to redundant_ssa_names?  That could use
a comment ...

+  need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower,
+                                                  writes_to_lower);

do we want to conditionalize this on flag_tree_loop_vectorize?  That is,
I think the lowering should for now happen only on the loop version
guarded by .IFN_VECTORIZED.  There's

  if ((need_to_predicate || any_complicated_phi)
      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
          || loop->dont_vectorize))
    goto cleanup;

for the cases that will force versioning, but I think we should simply
not lower bitfields in the
((!flag_tree_loop_vectorize && !loop->force_vectorize) || loop->dont_vectorize)
case?

+  if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN
+      || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF)
+    return NULL;

the first || goes to a new line

+  bf_stmt = static_cast <gassign *> (second_stmt);

"nicer" and shorter is

  bf_stmt = dyn_cast <gassign *> (second_stmt);
  if (!bf_stmt
      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
    return NULL;

+  tree lhs = TREE_OPERAND (bf_ref, 0);
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)))
+    return NULL;
+
+  gimple *use_stmt, *pattern_stmt;
+  use_operand_p use_p;
+  tree ret = gimple_assign_lhs (first_stmt);

just when reading, generic variable names like 'lhs' are not helpful
(even less so when they are not an actual lhs ...).  You have nice docs
on top of the function - when you use the actual names from
_2 = BIT_FIELD_REF (_1, ...) for these variables you can even use them
in the code so docs and code match up nicely.

+  /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert
+     it to one of the same width so we can perform the necessary masking and
+     shifting.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
+    {
+      tree int_type
+        = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)),
+                                          true);

so you probably run into this from code that's not lowered from
original bitfield reads?  Note you should use TYPE_SIZE here, and
definitely not TYPE_PRECISION on arbitrary types (if it's a vector
type that will yield the number of units, for example).

+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();

is there anything that prevents this from running on VLA vector
extractions?  I think it would be nice to test constantness at the
start of the function.

+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+                                                    NULL),

eh, seeing that multiple times - vect_recog_temp_ssa_var wants a
defaulted NULL second argument ...

Note I fear we will have endianness issues when translating bit-field
accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules for memory
and register operations do not match up (IIRC, I have repeatedly run
into issues here myself).

The testcases all look like they won't catch this - I think an example
would be something like

  struct X { unsigned a : 23; unsigned b : 9; };

can you see to doing some testing on a big-endian target?

Otherwise the patch looks good, so there are only minor things to fix up
(in case the endianness issue turns out to be a non-issue).

Sorry for the delay in reviewing.

Thanks,
Richard.