On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
>
> This patch introduces various improvements to the logic that merges
> field compares.
>
> Before the patch, we could merge:
>
> (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)
>
> into something like:
>
> (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions. Constants may be used
> instead of the object B.
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types. We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges. This patch
> introduces handlers for several cases involving these.
>
> Other merging opportunities were missed because of association. The
> existing logic would only succeed in merging a pair of consecutive
> compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
> e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D. I've generalized the
> handling of the rightmost compare in the left-hand operand, going for
> the leftmost compare in the right-hand operand, and then onto trying
> to merge compares pairwise, one from each operand, even if they are
> not consecutive, taking care to avoid merging operations with
> intervening side effects, including volatile accesses.
>
> When it is the second of a non-consecutive pair of compares that first
> accesses a word, we may merge the first compare with part of the
> second compare that refers to the same word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
>
> Handling compares with non-constant fields was somewhat generalized,
> now handling non-adjacent fields. When a field of one object crosses
> an alignment boundary but the other doesn't, we issue the same load in
> both compares; gimple optimizers will later turn it into a single
> load, without our having to handle SAVE_EXPRs at this point.
>
> The logic for issuing split loads and compares, and ordering them, is
> now shared between all cases of compares with constants and with
> another object.
>
> The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
>
>   if ((flag_sanitize & SANITIZE_ADDRESS)
>       && !FRAME_GROWS_DOWNWARD)
>
> and rs6000.h's:
>
> #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 \
>                               || (flag_sanitize & SANITIZE_ADDRESS) != 0)
>
> The mutually exclusive conditions involving flag_sanitize are now
> noticed and reported by fold-const.c's:
>
>   warning (0,
>            "%<and%> of mutually exclusive equal-tests"
>            " is always 0");
>
> This patch enables over 12k compare-merging opportunities that we used
> to miss in a GCC bootstrap.
>
> Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu. Ok to install?
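To make the shape of the merge concrete, here is a minimal sketch; the struct, field, and function names below are illustrative only and do not come from the patch or its testcases:

struct S
{
  unsigned char x1;
  unsigned char pad1;
  unsigned char y1;
  unsigned char pad2;
} __attribute__ ((aligned (4)));

/* Before the transform: two single-byte loads and two compares.  */
int
f (struct S *a, struct S *b)
{
  return a->x1 == b->x1 && a->y1 == b->y1;
}

/* Conceptually, the merged form loads one aligned word from each
   object and compares under a mask that keeps only the bytes of x1
   and y1, roughly
     (*(unsigned int *) &a->x1 & MASK) == (*(unsigned int *) &b->x1 & MASK)
   i.e. the (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
   shape quoted above.  The patch further extends this to fields that
   straddle an alignment boundary (e.g. in packed structs), splitting
   the merged access into two such compares instead of giving up.  */
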
Sorry for throwing a wrench in here but doing this kind of transform during GENERIC folding is considered a bad thing. There are better places during GIMPLE opts to replicate (and enhance) what fold_truth_andor does, namely ifcombine and reassoc, to mention the two passes that do similar transforms (they are by no means exact matches, otherwise fold_truth_andor would be gone already). Richard. > > for gcc/ChangeLog > > * fold-const.c (prepare_xor): New. > (decode_field_reference): Handle xor, shift, and narrowing > conversions. > (all_ones_mask_p): Remove. > (compute_split_boundary_from_align): New. > (build_split_load, reuse_split_load): New. > (fold_truth_andor_1): Add recursion to combine pairs of > non-neighboring compares. Handle xor compared with zero. > Handle fields straddling across alignment boundaries. > Generalize handling of non-constant rhs. > (fold_truth_andor): Leave sub-expression handling to the > recursion above. > * config/rs6000/t-rs6000 (toplev.o-warn): Disable errors. > > for gcc/testsuite/ChangeLog > > * gcc.dg/field-merge-1.c: New. > * gcc.dg/field-merge-2.c: New. > * gcc.dg/field-merge-3.c: New. > * gcc.dg/field-merge-4.c: New. > * gcc.dg/field-merge-5.c: New. > --- > gcc/config/rs6000/t-rs6000 | 4 > gcc/fold-const.c | 818 > ++++++++++++++++++++++++++++------ > gcc/testsuite/gcc.dg/field-merge-1.c | 64 +++ > gcc/testsuite/gcc.dg/field-merge-2.c | 31 + > gcc/testsuite/gcc.dg/field-merge-3.c | 36 + > gcc/testsuite/gcc.dg/field-merge-4.c | 40 ++ > gcc/testsuite/gcc.dg/field-merge-5.c | 40 ++ > 7 files changed, 882 insertions(+), 151 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c > create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c > create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c > create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c > create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c > > diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 > index 1ddb572..516486d 100644 > --- a/gcc/config/rs6000/t-rs6000 > +++ b/gcc/config/rs6000/t-rs6000 > @@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: > $(srcdir)/config/rs6000/genopt.sh \ > $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ > $(srcdir)/config/rs6000/rs6000-tables.opt > > +# FRAME_GROWS_DOWNWARD tests flag_sanitize in a way that rules out a > +# test in toplev.c. > +toplev.o-warn = -Wno-error > + > # The rs6000 backend doesn't cause warnings in these files. > insn-conditions.o-warn = > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > index 0cc80ad..bfbf88c 100644 > --- a/gcc/fold-const.c > +++ b/gcc/fold-const.c > @@ -4603,6 +4603,41 @@ optimize_bit_field_compare (location_t loc, enum > tree_code code, > return lhs; > } > > +/* If *R_ARG is a constant zero, and L_ARG is a possibly masked > + BIT_XOR_EXPR, return 1 and set *r_arg to l_arg. > + Otherwise, return 0. > + > + The returned value should be passed to decode_field_reference for it > + to handle l_arg, and then doubled for r_arg. 
*/ > +static int > +prepare_xor (tree l_arg, tree *r_arg) > +{ > + int ret = 0; > + > + if (!integer_zerop (*r_arg)) > + return ret; > + > + tree exp = l_arg; > + STRIP_NOPS (exp); > + > + if (TREE_CODE (exp) == BIT_AND_EXPR) > + { > + tree and_mask = TREE_OPERAND (exp, 1); > + exp = TREE_OPERAND (exp, 0); > + STRIP_NOPS (exp); STRIP_NOPS (and_mask); > + if (TREE_CODE (and_mask) != INTEGER_CST) > + return ret; > + } > + > + if (TREE_CODE (exp) == BIT_XOR_EXPR) > + { > + *r_arg = l_arg; > + return 1; > + } > + > + return ret; > +} > + > /* Subroutine for fold_truth_andor_1: decode a field reference. > > If EXP is a comparison reference, we return the innermost reference. > @@ -4625,6 +4660,10 @@ optimize_bit_field_compare (location_t loc, enum > tree_code code, > > *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any. > > + XOR_WHICH is 1 or 2 if EXP was found to be a (possibly masked) > + BIT_XOR_EXPR compared with zero. We're to take the first or second > + operand thereof if so. It should be zero otherwise. > + > Return 0 if this is not a component reference or is one that we can't > do anything with. */ > > @@ -4632,7 +4671,7 @@ static tree > decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize, > HOST_WIDE_INT *pbitpos, machine_mode *pmode, > int *punsignedp, int *preversep, int *pvolatilep, > - tree *pmask, tree *pand_mask) > + tree *pmask, tree *pand_mask, int xor_which) > { > tree exp = *exp_; > tree outer_type = 0; > @@ -4640,6 +4679,7 @@ decode_field_reference (location_t loc, tree *exp_, > HOST_WIDE_INT *pbitsize, > tree mask, inner, offset; > tree unsigned_type; > unsigned int precision; > + HOST_WIDE_INT shiftrt = 0; > > /* All the optimizations using this function assume integer fields. > There are problems with FP fields since the type_for_size call > @@ -4664,13 +4704,55 @@ decode_field_reference (location_t loc, tree *exp_, > HOST_WIDE_INT *pbitsize, > return NULL_TREE; > } > > + if (xor_which) > + { > + gcc_checking_assert (TREE_CODE (exp) == BIT_XOR_EXPR); > + exp = TREE_OPERAND (exp, xor_which - 1); > + STRIP_NOPS (exp); > + } > + > + if (CONVERT_EXPR_P (exp) > + || TREE_CODE (exp) == NON_LVALUE_EXPR) > + { > + if (!outer_type) > + outer_type = TREE_TYPE (exp); > + exp = TREE_OPERAND (exp, 0); > + STRIP_NOPS (exp); > + } > + > + if (TREE_CODE (exp) == RSHIFT_EXPR > + && TREE_CODE (TREE_OPERAND (exp, 1)) == INTEGER_CST > + && tree_fits_shwi_p (TREE_OPERAND (exp, 1))) > + { > + tree shift = TREE_OPERAND (exp, 1); > + STRIP_NOPS (shift); > + shiftrt = tree_to_shwi (shift); > + if (shiftrt > 0) > + { > + exp = TREE_OPERAND (exp, 0); > + STRIP_NOPS (exp); > + } > + else > + shiftrt = 0; > + } > + > + if (CONVERT_EXPR_P (exp) > + || TREE_CODE (exp) == NON_LVALUE_EXPR) > + { > + if (!outer_type) > + outer_type = TREE_TYPE (exp); > + exp = TREE_OPERAND (exp, 0); > + STRIP_NOPS (exp); > + } > + > poly_int64 poly_bitsize, poly_bitpos; > inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset, > pmode, punsignedp, preversep, pvolatilep); > + > if ((inner == exp && and_mask == 0) > || !poly_bitsize.is_constant (pbitsize) > || !poly_bitpos.is_constant (pbitpos) > - || *pbitsize < 0 > + || *pbitsize <= shiftrt > || offset != 0 > || TREE_CODE (inner) == PLACEHOLDER_EXPR > /* Reject out-of-bound accesses (PR79731). */ > @@ -4679,6 +4761,21 @@ decode_field_reference (location_t loc, tree *exp_, > HOST_WIDE_INT *pbitsize, > *pbitpos + *pbitsize) < 0)) > return NULL_TREE; > > + if (shiftrt) > + { > + if (!*preversep ? 
!BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > + *pbitpos += shiftrt; > + *pbitsize -= shiftrt; > + } > + > + if (outer_type && *pbitsize > TYPE_PRECISION (outer_type)) > + { > + HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type); > + if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > + *pbitpos += excess; > + *pbitsize -= excess; > + } > + > unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1); > if (unsigned_type == NULL_TREE) > return NULL_TREE; > @@ -4709,27 +4806,6 @@ decode_field_reference (location_t loc, tree *exp_, > HOST_WIDE_INT *pbitsize, > return inner; > } > > -/* Return nonzero if MASK represents a mask of SIZE ones in the low-order > - bit positions and MASK is SIGNED. */ > - > -static bool > -all_ones_mask_p (const_tree mask, unsigned int size) > -{ > - tree type = TREE_TYPE (mask); > - unsigned int precision = TYPE_PRECISION (type); > - > - /* If this function returns true when the type of the mask is > - UNSIGNED, then there will be errors. In particular see > - gcc.c-torture/execute/990326-1.c. There does not appear to be > - any documentation paper trail as to why this is so. But the pre > - wide-int worked with that restriction and it has been preserved > - here. */ > - if (size > precision || TYPE_SIGN (type) == UNSIGNED) > - return false; > - > - return wi::mask (size, false, precision) == wi::to_wide (mask); > -} > - > /* Subroutine for fold: determine if VAL is the INTEGER_CONST that > represents the sign bit of EXP's type. If EXP represents a sign > or zero extension, also test VAL against the unextended type. > @@ -6120,6 +6196,139 @@ merge_truthop_with_opposite_arm (location_t loc, tree > op, tree cmpop, > return NULL_TREE; > } > > +/* Return the one bitpos within bit extents L or R that is at an > + ALIGN-bit alignment boundary, or -1 if there is more than one such > + boundary, if there isn't any, or if there is any such boundary > + between the extents. L and R are given by bitpos and bitsize. If > + it doesn't return -1, there are two consecutive ALIGN-bit words > + that contain both extents, and at least one of the extents > + straddles across the returned alignment boundary. */ > +static inline HOST_WIDE_INT > +compute_split_boundary_from_align (HOST_WIDE_INT align, > + HOST_WIDE_INT l_bitpos, > + HOST_WIDE_INT l_bitsize, > + HOST_WIDE_INT r_bitpos, > + HOST_WIDE_INT r_bitsize) > +{ > + HOST_WIDE_INT amask = ~(align - 1); > + > + HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos); > + HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize); > + > + HOST_WIDE_INT boundary = (end_bit - 1) & amask; > + > + /* Make sure we're crossing no more than one alignment boundary. > + > + ??? We don't have logic to recombine loads of two adjacent > + fields that each crosses a different alignment boundary, so > + as to load the middle word only once, if other words can't be > + otherwise recombined. */ > + if (boundary - first_bit > align) > + return -1; > + > + HOST_WIDE_INT l_start_word = l_bitpos & amask; > + HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask; > + > + HOST_WIDE_INT r_start_word = r_bitpos & amask; > + HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask; > + > + /* If neither field straddles across an alignment boundary, it's no > + use to even try to merge them. 
*/ > + if (l_start_word == l_end_word && r_start_word == r_end_word) > + return -1; > + > + return boundary; > +} > + > +/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at > + LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2, > + respectively, starting at BIT_POS, using reversed endianness if > + REVERSEP. Also initialize BITPOS (the starting position of each > + part into INNER), BITSIZ (the bit count starting at BITPOS), > + TOSHIFT[1] (the amount by which the part and its mask are to be > + shifted right to bring its least-significant bit to bit zero) and > + SHIFTED (the amount by which the part, by separate loading, has > + already been shifted right, but that the mask needs shifting to > + match). */ > +static inline void > +build_split_load (tree /* out */ ln_arg[2], > + HOST_WIDE_INT /* out */ bitpos[2], > + HOST_WIDE_INT /* out */ bitsiz[2], > + HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2], > + HOST_WIDE_INT /* out */ shifted[2], > + location_t loc, tree inner, tree orig_inner, > + scalar_int_mode mode, scalar_int_mode mode2, > + HOST_WIDE_INT bit_pos, bool reversep) > +{ > + bitsiz[0] = GET_MODE_BITSIZE (mode); > + bitsiz[1] = GET_MODE_BITSIZE (mode2); > + > + for (int i = 0; i < 2; i++) > + { > + tree type = lang_hooks.types.type_for_size (bitsiz[i], 1); > + bitpos[i] = bit_pos; > + ln_arg[i] = make_bit_field_ref (loc, inner, orig_inner, > + type, bitsiz[i], > + bit_pos, 1, reversep); > + bit_pos += bitsiz[i]; > + } > + > + toshift[1] = toshift[0]; > + if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > + { > + shifted[0] = bitsiz[1]; > + shifted[1] = 0; > + toshift[0] = 0; > + } > + else > + { > + shifted[1] = bitsiz[0]; > + shifted[0] = 0; > + toshift[1] = 0; > + } > +} > + > +/* Make arrangements to split at bit BOUNDARY a single loaded word > + (with REVERSEP bit order) LN_ARG[0], to be shifted right by > + TOSHIFT[0] to bring the field of interest to the least-significant > + bit. The expectation is that the same loaded word will be > + propagated from part 0 to part 1, with just different shifting and > + masking to extract both parts. MASK is not expected to do more > + than masking out the bits that belong to the other part. See > + build_split_load for more information on the other fields. */ > +static inline void > +reuse_split_load (tree /* in[0] out[1] */ ln_arg[2], > + HOST_WIDE_INT /* in[0] out[1] */ bitpos[2], > + HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2], > + HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2], > + HOST_WIDE_INT /* out */ shifted[2], > + tree /* out */ mask[2], > + HOST_WIDE_INT boundary, bool reversep) > +{ > + ln_arg[1] = ln_arg[0]; > + bitpos[1] = bitpos[0]; > + bitsiz[1] = bitsiz[0]; > + shifted[1] = shifted[0] = 0; > + > + tree basemask = build_int_cst_type (TREE_TYPE (ln_arg[0]), -1); > + > + if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > + { > + toshift[1] = toshift[0]; > + toshift[0] = bitpos[0] + bitsiz[0] - boundary; > + mask[0] = const_binop (LSHIFT_EXPR, basemask, > + bitsize_int (toshift[0])); > + mask[1] = const_binop (BIT_XOR_EXPR, basemask, mask[0]); > + } > + else > + { > + toshift[1] = boundary - bitpos[1]; > + mask[1] = const_binop (LSHIFT_EXPR, basemask, > + bitsize_int (toshift[1])); > + mask[0] = const_binop (BIT_XOR_EXPR, basemask, mask[1]); > + } > +} > + > /* Find ways of folding logical expressions of LHS and RHS: > Try to merge two comparisons to the same innermost item. > Look for range tests like "ch >= '0' && ch <= '9'". 
> @@ -6142,11 +6351,19 @@ merge_truthop_with_opposite_arm (location_t loc, tree > op, tree cmpop, > TRUTH_TYPE is the type of the logical operand and LHS and RHS are its > two operands. > > + SEPARATEP should be NULL if LHS and RHS are adjacent in > + CODE-chained compares, and a non-NULL pointer to NULL_TREE > + otherwise. If the "words" accessed by RHS are already accessed by > + LHS, this won't matter, but if RHS accesses "words" that LHS > + doesn't, then *SEPARATEP will be set to the compares that should > + take RHS's place. By "words" we mean contiguous bits that do not > + cross a an TYPE_ALIGN boundary of the accessed object's type. > + > We return the simplified tree or 0 if no optimization is possible. */ > > static tree > fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, > - tree lhs, tree rhs) > + tree lhs, tree rhs, tree *separatep) > { > /* If this is the "or" of two comparisons, we can do something if > the comparisons are NE_EXPR. If this is the "and", we can do something > @@ -6157,6 +6374,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > convert EQ_EXPR to NE_EXPR so we need not reject the "wrong" > comparison for one-bit fields. */ > > + enum tree_code orig_code = code; > enum tree_code wanted_code; > enum tree_code lcode, rcode; > tree ll_arg, lr_arg, rl_arg, rr_arg; > @@ -6168,13 +6386,16 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp; > int ll_reversep, lr_reversep, rl_reversep, rr_reversep; > machine_mode ll_mode, lr_mode, rl_mode, rr_mode; > - scalar_int_mode lnmode, rnmode; > + scalar_int_mode lnmode, lnmode2, rnmode; > tree ll_mask, lr_mask, rl_mask, rr_mask; > tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask; > tree l_const, r_const; > tree lntype, rntype, result; > HOST_WIDE_INT first_bit, end_bit; > int volatilep; > + bool l_split_load; > + > + gcc_checking_assert (!separatep || !*separatep); > > /* Start by getting the comparison codes. Fail if anything is volatile. > If one operand is a BIT_AND_EXPR with the constant one, treat it as if > @@ -6202,7 +6423,116 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > > if (TREE_CODE_CLASS (lcode) != tcc_comparison > || TREE_CODE_CLASS (rcode) != tcc_comparison) > - return 0; > + { > + tree separate = NULL; > + > + /* Check for the possibility of merging component references. > + If any of our operands is another similar operation, recurse > + to try to merge individual operands, but avoiding double > + recursion: recurse to each leaf of LHS, and from there to > + each leaf of RHS, but don't bother recursing into LHS if RHS > + is neither a comparison nor a compound expr, nor into RHS if > + the LHS leaf isn't a comparison. In case of no successful > + merging, recursion depth is limited to the sum of the depths > + of LHS and RHS, and the non-recursing code below will run no > + more times than the product of the leaf counts of LHS and > + RHS. If there is a successful merge, we (recursively) > + further attempt to fold the result, so recursion depth and > + merge attempts are harder to compute. */ > + if (TREE_CODE (lhs) == code && TREE_TYPE (lhs) == truth_type > + && (TREE_CODE_CLASS (rcode) == tcc_comparison > + || (TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type))) > + { > + if ((result = fold_truth_andor_1 (loc, code, truth_type, > + TREE_OPERAND (lhs, 1), rhs, > + separatep > + ? 
separatep > + : NULL)) != 0) > + { > + /* We have combined the latter part of LHS with RHS. If > + they were separate, the recursion already placed any > + remains of RHS in *SEPARATEP, otherwise they are all > + in RESULT, so we just have to prepend to result the > + former part of LHS. */ > + result = fold_build2_loc (loc, code, truth_type, > + TREE_OPERAND (lhs, 0), result); > + return result; > + } > + if ((result = fold_truth_andor_1 (loc, code, truth_type, > + TREE_OPERAND (lhs, 0), rhs, > + separatep > + ? separatep > + : &separate)) != 0) > + { > + /* We have combined the former part of LHS with RHS. If > + they were separate, the recursive call will have > + placed remnants of RHS in *SEPARATEP. If they > + wereń't, they will be in SEPARATE instead. Append > + the latter part of LHS to the result, and then any > + remnants of RHS that we haven't passed on to the > + caller. */ > + result = fold_build2_loc (loc, code, truth_type, > + result, TREE_OPERAND (lhs, 1)); > + if (separate) > + result = fold_build2_loc (loc, code, truth_type, > + result, separate); > + return result; > + } > + } > + else if (TREE_CODE_CLASS (lcode) == tcc_comparison > + && TREE_CODE (rhs) == code && TREE_TYPE (rhs) == truth_type) > + { > + if ((result = fold_truth_andor_1 (loc, code, truth_type, > + lhs, TREE_OPERAND (rhs, 0), > + separatep > + ? &separate > + : NULL)) != 0) > + { > + /* We have combined LHS with the former part of RHS. If > + they were separate, have any remnants of RHS placed > + in separate, so that we can combine them with the > + latter part of RHS, and then send them back for the > + caller to handle. If they were adjacent, we can just > + append the latter part of RHS to the RESULT. */ > + if (!separate) > + separate = TREE_OPERAND (rhs, 1); > + else > + separate = fold_build2_loc (loc, code, truth_type, > + separate, TREE_OPERAND (rhs, 1)); > + if (separatep) > + *separatep = separate; > + else > + result = fold_build2_loc (loc, code, truth_type, > + result, separate); > + return result; > + } > + if ((result = fold_truth_andor_1 (loc, code, truth_type, > + lhs, TREE_OPERAND (rhs, 1), > + &separate)) != 0) > + { > + /* We have combined LHS with the latter part of RHS. > + They're definitely not adjacent, so we get the > + remains of RHS in SEPARATE, and then prepend the > + former part of RHS to it. If LHS and RHS were > + already separate to begin with, we leave the remnants > + of RHS for the caller to deal with, otherwise we > + append them to the RESULT. 
*/ > + if (!separate) > + separate = TREE_OPERAND (rhs, 0); > + else > + separate = fold_build2_loc (loc, code, truth_type, > + TREE_OPERAND (rhs, 0), separate); > + if (separatep) > + *separatep = separate; > + else > + result = fold_build2_loc (loc, code, truth_type, > + result, separate); > + return result; > + } > + } > + > + return 0; > + } > > ll_arg = TREE_OPERAND (lhs, 0); > lr_arg = TREE_OPERAND (lhs, 1); > @@ -6278,22 +6608,24 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > > ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0; > volatilep = 0; > + int l_xor = prepare_xor (ll_arg, &lr_arg); > ll_inner = decode_field_reference (loc, &ll_arg, > &ll_bitsize, &ll_bitpos, &ll_mode, > &ll_unsignedp, &ll_reversep, &volatilep, > - &ll_mask, &ll_and_mask); > + &ll_mask, &ll_and_mask, l_xor); > lr_inner = decode_field_reference (loc, &lr_arg, > &lr_bitsize, &lr_bitpos, &lr_mode, > &lr_unsignedp, &lr_reversep, &volatilep, > - &lr_mask, &lr_and_mask); > + &lr_mask, &lr_and_mask, 2 * l_xor); > + int r_xor = prepare_xor (rl_arg, &rr_arg); > rl_inner = decode_field_reference (loc, &rl_arg, > &rl_bitsize, &rl_bitpos, &rl_mode, > &rl_unsignedp, &rl_reversep, &volatilep, > - &rl_mask, &rl_and_mask); > + &rl_mask, &rl_and_mask, r_xor); > rr_inner = decode_field_reference (loc, &rr_arg, > &rr_bitsize, &rr_bitpos, &rr_mode, > &rr_unsignedp, &rr_reversep, &volatilep, > - &rr_mask, &rr_and_mask); > + &rr_mask, &rr_and_mask, 2 * r_xor); > > /* It must be true that the inner operation on the lhs of each > comparison must be the same if we are to be able to do anything. > @@ -6349,6 +6681,72 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > return 0; > } > > + /* This will be bumped to 2 if any of the field pairs crosses an > + alignment boundary, so the merged compare has to be done in two > + parts. */ > + int parts = 1; > + /* Set to true if the second combined compare should come first, > + e.g., because the second original compare accesses a word that > + the first one doesn't, and the combined compares access those in > + cmp[0]. */ > + bool first1 = false; > + /* Set to true if the first original compare is not the one being > + split. */ > + bool maybe_separate = false; > + > + /* The following 2-dimensional arrays use the first index to > + identify left(0)- vs right(1)-hand compare operands, and the > + second one to identify merged compare parts. */ > + /* The memory loads or constants to be compared. */ > + tree ld_arg[2][2]; > + /* The first bit of the corresponding inner object that the > + corresponding LD_ARG covers. */ > + HOST_WIDE_INT bitpos[2][2]; > + /* The bit count starting at BITPOS that the corresponding LD_ARG > + covers. */ > + HOST_WIDE_INT bitsiz[2][2]; > + /* The number of bits by which LD_ARG has already been shifted > + right, WRT mask. */ > + HOST_WIDE_INT shifted[2][2]; > + /* The number of bits by which both LD_ARG and MASK need shifting to > + bring its least-significant bit to bit zero. */ > + HOST_WIDE_INT toshift[2][2]; > + /* An additional mask to be applied to LD_ARG, to remove any bits > + that may have been loaded for use in another compare, but that > + don't belong in the corresponding compare. */ > + tree xmask[2][2] = {}; > + > + /* The combined compare or compares. 
*/ > + tree cmp[2]; > + > + /* Consider we're comparing two non-contiguous fields of packed > + structs, both aligned at 32-bit boundaries: > + > + ll_arg: an 8-bit field at offset 0 > + lr_arg: a 16-bit field at offset 2 > + > + rl_arg: an 8-bit field at offset 1 > + rr_arg: a 16-bit field at offset 3 > + > + We'll have r_split_load, because rr_arg straddles across an > + alignment boundary. > + > + We'll want to have: > + > + bitpos = { { 0, 0 }, { 0, 32 } } > + bitsiz = { { 32, 32 }, { 32, 8 } } > + > + And, for little-endian: > + > + shifted = { { 0, 0 }, { 0, 32 } } > + toshift = { { 0, 24 }, { 0, 0 } } > + > + Or, for big-endian: > + > + shifted = { { 0, 0 }, { 8, 0 } } > + toshift = { { 8, 0 }, { 0, 0 } } > + */ > + > /* See if we can find a mode that contains both fields being compared on > the left. If we can't, fail. Otherwise, update all constants and masks > to be relative to a field of that size. */ > @@ -6357,11 +6755,41 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, > TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, > volatilep, &lnmode)) > - return 0; > + { > + /* Consider the possibility of recombining loads if any of the > + fields straddles across an alignment boundary, so that either > + part can be loaded along with the other field. */ > + HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); > + HOST_WIDE_INT boundary = compute_split_boundary_from_align > + (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); > + > + if (boundary < 0 > + || !get_best_mode (boundary - first_bit, first_bit, 0, 0, > + align, BITS_PER_WORD, volatilep, &lnmode) > + || !get_best_mode (end_bit - boundary, boundary, 0, 0, > + align, BITS_PER_WORD, volatilep, &lnmode2)) > + return 0; > + > + l_split_load = true; > + parts = 2; > + if (ll_bitpos >= boundary) > + maybe_separate = first1 = true; > + else if (ll_bitpos + ll_bitsize <= boundary) > + maybe_separate = true; > + } > + else > + l_split_load = false; > > lnbitsize = GET_MODE_BITSIZE (lnmode); > lnbitpos = first_bit & ~ (lnbitsize - 1); > + if (l_split_load) > + lnbitsize += GET_MODE_BITSIZE (lnmode2); > lntype = lang_hooks.types.type_for_size (lnbitsize, 1); > + if (!lntype) > + { > + gcc_checking_assert (l_split_load); > + lntype = build_nonstandard_integer_type (lnbitsize, 1); > + } > xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos; > > if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > @@ -6414,19 +6842,58 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > || ll_reversep != lr_reversep > /* Make sure the two fields on the right > correspond to the left without being swapped. */ > - || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos) > + || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos > + || lnbitpos < 0) > return 0; > > + bool r_split_load; > + scalar_int_mode rnmode2; > + > first_bit = MIN (lr_bitpos, rr_bitpos); > end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); > if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, > TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD, > volatilep, &rnmode)) > - return 0; > + { > + /* Consider the possibility of recombining loads if any of the > + fields straddles across an alignment boundary, so that either > + part can be loaded along with the other field. 
*/ > + HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner)); > + HOST_WIDE_INT boundary = compute_split_boundary_from_align > + (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize); > + > + if (boundary < 0 > + /* If we're to split both, make sure the split point is > + the same. */ > + || (l_split_load > + && (boundary - lr_bitpos > + != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos)) > + || !get_best_mode (boundary - first_bit, first_bit, 0, 0, > + align, BITS_PER_WORD, volatilep, &rnmode) > + || !get_best_mode (end_bit - boundary, boundary, 0, 0, > + align, BITS_PER_WORD, volatilep, &rnmode2)) > + return 0; > + > + r_split_load = true; > + parts = 2; > + if (lr_bitpos >= boundary) > + maybe_separate = first1 = true; > + else if (lr_bitpos + lr_bitsize <= boundary) > + maybe_separate = true; > + } > + else > + r_split_load = false; > > rnbitsize = GET_MODE_BITSIZE (rnmode); > rnbitpos = first_bit & ~ (rnbitsize - 1); > + if (r_split_load) > + rnbitsize += GET_MODE_BITSIZE (rnmode2); > rntype = lang_hooks.types.type_for_size (rnbitsize, 1); > + if (!rntype) > + { > + gcc_checking_assert (r_split_load); > + rntype = build_nonstandard_integer_type (rnbitsize, 1); > + } > xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos; > > if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN) > @@ -6442,133 +6909,190 @@ fold_truth_andor_1 (location_t loc, enum tree_code > code, tree truth_type, > rntype, rr_mask), > size_int (xrr_bitpos)); > > - /* Make a mask that corresponds to both fields being compared. > - Do this for both items being compared. If the operands are the > - same size and the bits being compared are in the same position > - then we can do this by masking both and comparing the masked > - results. */ > - ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask); > lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask); > - if (lnbitsize == rnbitsize > - && xll_bitpos == xlr_bitpos > - && lnbitpos >= 0 > - && rnbitpos >= 0) > - { > - lhs = make_bit_field_ref (loc, ll_inner, ll_arg, > - lntype, lnbitsize, lnbitpos, > - ll_unsignedp || rl_unsignedp, > ll_reversep); > - if (! all_ones_mask_p (ll_mask, lnbitsize)) > - lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask); > - > - rhs = make_bit_field_ref (loc, lr_inner, lr_arg, > - rntype, rnbitsize, rnbitpos, > - lr_unsignedp || rr_unsignedp, > lr_reversep); > - if (! all_ones_mask_p (lr_mask, rnbitsize)) > - rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask); > - > - return build2_loc (loc, wanted_code, truth_type, lhs, rhs); > - } > - > - /* There is still another way we can do something: If both pairs of > - fields being compared are adjacent, we may be able to make a wider > - field containing them both. > - > - Note that we still must mask the lhs/rhs expressions. Furthermore, > - the mask must be shifted to account for the shift done by > - make_bit_field_ref. 
*/ > - if (((ll_bitsize + ll_bitpos == rl_bitpos > - && lr_bitsize + lr_bitpos == rr_bitpos) > - || (ll_bitpos == rl_bitpos + rl_bitsize > - && lr_bitpos == rr_bitpos + rr_bitsize)) > - && ll_bitpos >= 0 > - && rl_bitpos >= 0 > - && lr_bitpos >= 0 > - && rr_bitpos >= 0) > - { > - tree type; > - > - lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype, > - ll_bitsize + rl_bitsize, > - MIN (ll_bitpos, rl_bitpos), > - ll_unsignedp, ll_reversep); > - rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype, > - lr_bitsize + rr_bitsize, > - MIN (lr_bitpos, rr_bitpos), > - lr_unsignedp, lr_reversep); > - > - ll_mask = const_binop (RSHIFT_EXPR, ll_mask, > - size_int (MIN (xll_bitpos, xrl_bitpos))); > - lr_mask = const_binop (RSHIFT_EXPR, lr_mask, > - size_int (MIN (xlr_bitpos, xrr_bitpos))); > - > - /* Convert to the smaller type before masking out unwanted bits. */ > - type = lntype; > - if (lntype != rntype) > + > + toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos); > + shifted[1][0] = 0; > + > + if (!r_split_load) > + { > + bitpos[1][0] = rnbitpos; > + bitsiz[1][0] = rnbitsize; > + ld_arg[1][0] = make_bit_field_ref (loc, lr_inner, lr_arg, > + rntype, rnbitsize, rnbitpos, > + lr_unsignedp || rr_unsignedp, > + lr_reversep); > + } > + > + if (parts > 1) > + { > + if (r_split_load) > + build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1], > + shifted[1], loc, lr_inner, lr_arg, > + rnmode, rnmode2, rnbitpos, lr_reversep); > + else > + reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1], > + shifted[1], xmask[1], > + lnbitpos + GET_MODE_BITSIZE (lnmode) > + - ll_bitpos + lr_bitpos, lr_reversep); > + } > + } > + else > + { > + /* Handle the case of comparisons with constants. If there is > + something in common between the masks, those bits of the > + constants must be the same. If not, the condition is always > + false. Test for this to avoid generating incorrect code > + below. */ > + result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask); > + if (! integer_zerop (result) > + && simple_cst_equal (const_binop (BIT_AND_EXPR, > + result, l_const), > + const_binop (BIT_AND_EXPR, > + result, r_const)) != 1) > + { > + if (wanted_code == NE_EXPR) > { > - if (lnbitsize > rnbitsize) > - { > - lhs = fold_convert_loc (loc, rntype, lhs); > - ll_mask = fold_convert_loc (loc, rntype, ll_mask); > - type = rntype; > - } > - else if (lnbitsize < rnbitsize) > - { > - rhs = fold_convert_loc (loc, lntype, rhs); > - lr_mask = fold_convert_loc (loc, lntype, lr_mask); > - type = lntype; > - } > + warning (0, > + "%<or%> of unmatched not-equal tests" > + " is always 1"); > + return constant_boolean_node (true, truth_type); > } > + else > + { > + warning (0, > + "%<and%> of mutually exclusive equal-tests" > + " is always 0"); > + return constant_boolean_node (false, truth_type); > + } > + } > > - if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize)) > - lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask); > + if (lnbitpos < 0) > + return 0; > > - if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize)) > - rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask); > + /* The constants are combined so as to line up with the loaded > + field, so use the same parameters. 
*/ > + ld_arg[1][0] = const_binop (BIT_IOR_EXPR, l_const, r_const); > + toshift[1][0] = MIN (xll_bitpos, xrl_bitpos); > + shifted[1][0] = 0; > + bitpos[1][0] = lnbitpos; > + bitsiz[1][0] = lnbitsize; > > - return build2_loc (loc, wanted_code, truth_type, lhs, rhs); > - } > + if (parts > 1) > + reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1], > + shifted[1], xmask[1], > + lnbitpos + GET_MODE_BITSIZE (lnmode), > + lr_reversep); > > - return 0; > + lr_mask = build_int_cst_type (TREE_TYPE (ld_arg[1][0]), -1); > + > + /* If the compiler thinks this is used uninitialized below, it's > + because it can't realize that parts can only be 2 when > + comparing wiht constants if l_split_load is also true. This > + just silences the warning. */ > + rnbitpos = 0; > } > > - /* Handle the case of comparisons with constants. If there is something in > - common between the masks, those bits of the constants must be the same. > - If not, the condition is always false. Test for this to avoid > generating > - incorrect code below. */ > - result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask); > - if (! integer_zerop (result) > - && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const), > - const_binop (BIT_AND_EXPR, result, r_const)) != 1) > + ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask); > + toshift[0][0] = MIN (xll_bitpos, xrl_bitpos); > + shifted[0][0] = 0; > + > + if (!l_split_load) > { > - if (wanted_code == NE_EXPR) > + bitpos[0][0] = lnbitpos; > + bitsiz[0][0] = lnbitsize; > + ld_arg[0][0] = make_bit_field_ref (loc, ll_inner, ll_arg, > + lntype, lnbitsize, lnbitpos, > + ll_unsignedp || rl_unsignedp, > + ll_reversep); > + } > + > + if (parts > 1) > + { > + if (l_split_load) > + build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0], > + shifted[0], loc, ll_inner, ll_arg, > + lnmode, lnmode2, lnbitpos, ll_reversep); > + else > + reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0], > + shifted[0], xmask[0], > + rnbitpos + GET_MODE_BITSIZE (rnmode) > + - lr_bitpos + ll_bitpos, ll_reversep); > + } > + > + for (int i = 0; i < parts; i++) > + { > + tree op[2] = { ld_arg[0][i], ld_arg[1][i] }; > + tree mask[2] = { ll_mask, lr_mask }; > + > + for (int j = 0; j < 2; j++) > { > - warning (0, "%<or%> of unmatched not-equal tests is always 1"); > - return constant_boolean_node (true, truth_type); > + /* Mask out the bits belonging to the other part. */ > + if (xmask[j][i]) > + mask[j] = const_binop (BIT_AND_EXPR, mask[j], xmask[j][i]); > + > + if (shifted[j][i]) > + { > + tree shiftsz = bitsize_int (shifted[j][i]); > + mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz); > + } > + mask[j] = fold_convert_loc (loc, TREE_TYPE (op[j]), mask[j]); > } > - else > + > + HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]); > + > + if (shift) > { > - warning (0, "%<and%> of mutually exclusive equal-tests is always > 0"); > - return constant_boolean_node (false, truth_type); > + int j; > + if (shift > 0) > + j = 0; > + else > + { > + j = 1; > + shift = -shift; > + } > + > + tree shiftsz = bitsize_int (shift); > + op[j] = fold_build2_loc (loc, RSHIFT_EXPR, TREE_TYPE (op[j]), > + op[j], shiftsz); > + mask[j] = const_binop (RSHIFT_EXPR, mask[j], shiftsz); > } > - } > > - if (lnbitpos < 0) > - return 0; > + /* Convert to the smaller type before masking out unwanted > + bits. 
*/ > + tree type = TREE_TYPE (op[0]); > + if (type != TREE_TYPE (op[1])) > + { > + int j = (TYPE_PRECISION (type) > + < TYPE_PRECISION (TREE_TYPE (op[1]))); > + if (!j) > + type = TREE_TYPE (op[1]); > + op[j] = fold_convert_loc (loc, type, op[j]); > + mask[j] = fold_convert_loc (loc, type, mask[j]); > + } > > - /* Construct the expression we will return. First get the component > - reference we will make. Unless the mask is all ones the width of > - that field, perform the mask operation. Then compare with the > - merged constant. */ > - result = make_bit_field_ref (loc, ll_inner, ll_arg, > - lntype, lnbitsize, lnbitpos, > - ll_unsignedp || rl_unsignedp, ll_reversep); > + for (int j = 0; j < 2; j++) > + if (! integer_all_onesp (mask[j])) > + op[j] = build2_loc (loc, BIT_AND_EXPR, type, > + op[j], mask[j]); > > - ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask); > - if (! all_ones_mask_p (ll_mask, lnbitsize)) > - result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask); > + cmp[i] = build2_loc (loc, wanted_code, truth_type, op[0], op[1]); > + } > + > + if (first1) > + std::swap (cmp[0], cmp[1]); > + > + if (parts == 1) > + result = cmp[0]; > + else if (!separatep || !maybe_separate) > + result = build2_loc (loc, orig_code, truth_type, cmp[0], cmp[1]); > + else > + { > + result = cmp[0]; > + *separatep = cmp[1]; > + } > > - return build2_loc (loc, wanted_code, truth_type, result, > - const_binop (BIT_IOR_EXPR, l_const, r_const)); > + return result; > } > > /* T is an integer expression that is being multiplied, divided, or taken a > @@ -9166,15 +9690,7 @@ fold_truth_andor (location_t loc, enum tree_code code, > tree type, > return fold_build2_loc (loc, code, type, arg0, tem); > } > > - /* Check for the possibility of merging component references. If our > - lhs is another similar operation, try to merge its rhs with our > - rhs. Then try to merge our lhs and rhs. */ > - if (TREE_CODE (arg0) == code > - && (tem = fold_truth_andor_1 (loc, code, type, > - TREE_OPERAND (arg0, 1), arg1)) != 0) > - return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); > - > - if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0) > + if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1, NULL)) != 0) > return tem; > > bool logical_op_non_short_circuit = LOGICAL_OP_NON_SHORT_CIRCUIT; > diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c > b/gcc/testsuite/gcc.dg/field-merge-1.c > new file mode 100644 > index 00000000..1818e104 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-1.c > @@ -0,0 +1,64 @@ > +/* { dg-do run } */ > +/* { dg-options "-O -save-temps -fdump-tree-optimized" } */ > + > +/* Check that field loads compared with constants are merged, even if > + tested out of order, and when fields straddle across alignment > + boundaries. 
*/ > + > +struct TL { > + unsigned char p; > + unsigned int a; > + unsigned char q; > + unsigned int b; > + unsigned char r; > + unsigned int c; > + unsigned char s; > +} __attribute__ ((packed, aligned (4), scalar_storage_order > ("little-endian"))); > + > +struct TB { > + unsigned char p; > + unsigned int a; > + unsigned char q; > + unsigned int b; > + unsigned char r; > + unsigned int c; > + unsigned char s; > +} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian"))); > + > +#define vc 0xaa > +#define vi 0x12345678 > + > +struct TL vL = { vc, vi, vc, vi, vc, vi, vc }; > +struct TB vB = { vc, vi, vc, vi, vc, vi, vc }; > + > +void f (void) { > + /* Which words of | vL | vB | */ > + /* are accessed by |0123|0123| */ > + if (0 /* the tests? | | | */ > + || vL.p != vc /* |* | | */ > + || vB.p != vc /* | |* | */ > + || vL.s != vc /* | *| | */ > + || vB.q != vc /* | | * | */ > + || vL.a != vi /* |^* | | */ > + || vB.b != vi /* | | ^* | */ > + || vL.c != vi /* | *^| | */ > + || vB.c != vi /* | | ^*| */ > + || vL.b != vi /* | ^^ | | */ > + || vL.q != vc /* | ^ | | */ > + || vB.a != vi /* | |^^ | */ > + || vB.r != vc /* | | ^ | */ > + || vB.s != vc /* | | ^| */ > + || vL.r != vc /* | ^ | | */ > + ) > + __builtin_abort (); > +} > + > +int main () { > + f (); > + return 0; > +} > + > +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */ > +/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* > } } } } */ > +/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || > x86_64-*-* } } } } */ > +/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || > rs6000-*-* } } } } */ > diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c > b/gcc/testsuite/gcc.dg/field-merge-2.c > new file mode 100644 > index 00000000..80c573b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-2.c > @@ -0,0 +1,31 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +struct TL { > + unsigned short a; > + unsigned short b; > +} __attribute__ ((packed, aligned (8))); > + > +struct TB { > + unsigned char p; > + unsigned short a; > + unsigned short b; > +} __attribute__ ((packed, aligned (8))); > + > +#define vc 0xaa > + > +struct TL vL = { vc, vc }; > +struct TB vB = { vc, vc, vc }; > + > +void f (void) { > + if (0 > + || vL.b != vB.b > + || vL.a != vB.a > + ) > + __builtin_abort (); > +} > + > +int main () { > + f (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c > b/gcc/testsuite/gcc.dg/field-merge-3.c > new file mode 100644 > index 00000000..8fdbb9a3 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-3.c > @@ -0,0 +1,36 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__); > + > +struct T1 { > + unsigned char p[2]; > + unsigned short a; > + unsigned int z; > +} __attribute__((__aligned__(8))); > + > +struct T2 { > + unsigned short p; > + unsigned short a; > + unsigned int z; > +} __attribute__((__aligned__(8))); > + > +#define vc 0xaa > +#define vi 0x12345678 > + > +struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi }; > +struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi }; > + > +void f (void) { > + if (0 > + || v1.p[!BIG_ENDIAN_P] != v2.p >> 8 > + || v1.a != v2.a > + || (v1.z ^ v2.z) & 0xff00ff00 != 0 > + || (v1.z ^ v2.z) & 0x00ff00ff != 0) > + __builtin_abort (); > +} > + > +int main () { > + f (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c > b/gcc/testsuite/gcc.dg/field-merge-4.c > 
new file mode 100644 > index 00000000..c629069 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-4.c > @@ -0,0 +1,40 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +struct T1 { > + unsigned int zn; > + unsigned char p; > + unsigned char qn; > + unsigned short a; > + unsigned int z; > +} __attribute__((__packed__, __aligned__(4))); > + > +struct T2 { > + unsigned int zn; > + unsigned char rn; > + unsigned char p; > + unsigned char qn; > + unsigned short a; > + unsigned int z; > +} __attribute__((__packed__, __aligned__(4))); > + > +#define vc 0xaa > +#define vs 0xccdd > +#define vi 0x12345678 > + > +struct T1 v1 = { -1, vc, 1, vs, vi }; > +struct T2 v2 = { -1, 0, vc, 1, vs, vi }; > + > +void f (void) { > + if (0 > + || v1.p != v2.p > + || v1.a != v2.a > + || v1.z != v2.z > + ) > + __builtin_abort (); > +} > + > +int main () { > + f (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c > b/gcc/testsuite/gcc.dg/field-merge-5.c > new file mode 100644 > index 00000000..1580b14bc > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/field-merge-5.c > @@ -0,0 +1,40 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +struct T1 { > + unsigned int zn; > + unsigned char p; > + unsigned char qn; > + unsigned short a; > + unsigned int z; > +} __attribute__((__packed__, __aligned__(8))); > + > +struct T2 { > + unsigned int zn; > + unsigned char rn; > + unsigned char p; > + unsigned char qn; > + unsigned short a; > + unsigned int z; > +} __attribute__((__packed__, __aligned__(8))); > + > +#define vc 0xaa > +#define vs 0xccdd > +#define vi 0x12345678 > + > +struct T1 v1 = { -1, vc, 1, vs, vi }; > +struct T2 v2 = { -1, 0, vc, 1, vs, vi }; > + > +void f (void) { > + if (0 > + || v1.p != v2.p > + || v1.a != v2.a > + || v1.z != v2.z > + ) > + __builtin_abort (); > +} > + > +int main () { > + f (); > + return 0; > +} > > > -- > Alexandre Oliva, happy hacker > https://FSFLA.org/blogs/lxo/ > Free Software Activist > GNU Toolchain Engineer