Re: [aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
On Wed, Apr 24, 2019 at 12:16:39PM -0600, Jeff Law wrote:
> On 4/24/19 7:05 AM, Segher Boessenkool wrote:
> > On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
> >> First, it re-uses combine's make_field_assignment in the aarch64
> >> backend.
> >
> > That's not really acceptable.  make_field_assignment depends intimately
> > on make_extraction, which is very much combine-specific (and wrong in so
> > many ways it's hard to tell).
> I think the structure of the pattern and its condition avoid the most
> problematic issues.  But yea, it's not ideal.

Oh, I'm not saying it does not work, I trust you tested it well.  But
there are so many ways this can completely break (with ICEs and
everything), this really isn't good.

You say all the problematic cases cannot happen...  So make a clone of
this function without any of those cases?  :-)

> > But it _is_ a big problem in other contexts: you should not create
> > non-canonical RTL.  In your specific case, where you want to run this
> > from a splitter, you have to make sure you get only correct, recognised
> > patterns for your machine.  make_field_extraction cannot guarantee you
> > that.
> Even in the case where the form and operands are limited to what's in
> the new pattern?  My worry with make_extraction is those cases where it
> gets too smart for its own good and returns something even simpler than
> extraction.  I'm not sure that can happen in this case or not.
>
> I'm not keen on duplicating the logic from make_field_assignment and
> make_extraction.  Though again, given the limited forms we accept it may
> not be too bad.

Just praying that the RTL it creates is the RTL you prefer to see, or
even just RTL the backend can handle, doesn't sound too great.  (And any
time this breaks it is a target bug, not combine.)
> > You also have an insn condition that can become invalid by the time the
> > pattern is split (the !reload_completed part), which means the pattern
> > will then *not* be split, which won't work here (the template is "#").
> The pattern is split by the splitting pass before register allocation
> and reload.  It's never supposed to exist beyond that splitting pass.
> I'm pretty sure we've done this kind of thing regularly in the past.

Are you sure this code can never be created between the split pass and
when reload is finished?  Like maybe during LRA itself.  It's hard to
convince myself of this.

> >> +(define_insn_and_split ""
> >
> > Give this a name?  Starting with * if you don't want a gen_* for it.
> Didn't figure it was worth the trouble.  But it's easy enough to do.

It helps debugging (and it helps naming stuff in the changelog :-) )

> >> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> >> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> >> +			  (match_operand:GPI 2 "const_int_operand" "n"))
> >> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
> >> +  "(!reload_completed
> >> +    /* make_field_assignment doesn't handle subregs well.  */
> >> +    && REG_P (operands[0])
> >
> > Another reason to not use make_field_assignment, I'm afraid :-(
> Perhaps.  I'd probably want to avoid SUBREGs here regardless :-)

You get a subreg for using a 64-bit pseudo as 32-bit (a pseudo has only
one mode).  This happens a lot for insert insns, from what I see.


Segher
Re: [aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
On 4/24/19 7:05 AM, Segher Boessenkool wrote:
> Hi Jeff,
>
> On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
>> First, it re-uses combine's make_field_assignment in the aarch64
>> backend.
>
> That's not really acceptable.  make_field_assignment depends intimately
> on make_extraction, which is very much combine-specific (and wrong in so
> many ways it's hard to tell).
I think the structure of the pattern and its condition avoid the most
problematic issues.  But yea, it's not ideal.

>> So we don't have to duplicate any of that logic.  I scanned
>> make_field_assignment and its children to make sure it didn't use any
>> data that was only valid during the combine pass, but I could have
>> missed something.  This is one of those cases where encapsulating the
>> pass specific bits in a class really helps avoid problems...  Just
>> saying
>
> It isn't the data that is the problem.  combine has almost no private
> data.  But the problem is all the hidden assumptions combine makes, and
> for example that it often makes **non-canonical** RTL; this is no
> problem in combine, because that code will just not recog() (except of
> course that you then get less well optimised code, in general).
> (Often, combine makes that RTL canonical later, fwiw.)
>
> But it _is_ a big problem in other contexts: you should not create
> non-canonical RTL.  In your specific case, where you want to run this
> from a splitter, you have to make sure you get only correct, recognised
> patterns for your machine.  make_field_extraction cannot guarantee you
> that.
Even in the case where the form and operands are limited to what's in
the new pattern?  My worry with make_extraction is those cases where it
gets too smart for its own good and returns something even simpler than
an extraction.  I'm not sure whether that can happen in this case or not.

I'm not keen on duplicating the logic from make_field_assignment and
make_extraction.  Though again, given the limited forms we accept it may
not be too bad.
>
>> Second, it relaxes the main insv pattern to allow an immediate in the
>> operand predicate, but forces it into a register during LRA.  I don't
>> generally like having predicates that are looser than the constraints,
>> but it really helps here.  Basically we want to see the constant field
>> we're going to insert.
>
> You also have an insn condition that can become invalid by the time the
> pattern is split (the !reload_completed part), which means the pattern
> will then *not* be split, which won't work here (the template is "#").
The pattern is split by the splitting pass before register allocation
and reload.  It's never supposed to exist beyond that splitting pass.
I'm pretty sure we've done this kind of thing regularly in the past.

>
>> -static int
>> +int
>>  get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
>
> If you want to reuse this, please give it a better name, and move it to
> rtlanal or some header file or similar?
If we go forward, yea, makes sense.

>
> It may be nicer to make a more machine-specific routine for this, see
> rs6000_is_valid_mask for example.
>
>> +;; This must be split before register allocation and reloading as
>> +;; we need to verify it's actually a bitfield insertion by
>> +;; examining both immediate operands.  After reload we will lose
>> +;; knowledge of operands[3] constant status.
>
> I don't understand what this means?  After reload operands[3] is still a
> const_int?
It has to be an integer prior to register allocation.  Otherwise we
can't verify we have a valid field assignment.  The pattern should never
exist past the insn splitting pass.

The comment is obsolete.  It should always be split prior to register
allocation into a field assignment.

>
>> +(define_insn_and_split ""
>
> Give this a name?  Starting with * if you don't want a gen_* for it.
Didn't figure it was worth the trouble.  But it's easy enough to do.
>
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
>> +			  (match_operand:GPI 2 "const_int_operand" "n"))
>> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
>> +  "(!reload_completed
>> +    /* make_field_assignment doesn't handle subregs well.  */
>> +    && REG_P (operands[0])
>
> Another reason to not use make_field_assignment, I'm afraid :-(
Perhaps.  I'd probably want to avoid SUBREGs here regardless :-)

jeff
Re: [aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
Hi Jeff,

On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
> First, it re-uses combine's make_field_assignment in the aarch64
> backend.

That's not really acceptable.  make_field_assignment depends intimately
on make_extraction, which is very much combine-specific (and wrong in so
many ways it's hard to tell).

If you now use this in other places than combine, we will *never* be
able to fix it.  It's also shoddy architecture, of course.

> So we don't have to duplicate any of that logic.  I scanned
> make_field_assignment and its children to make sure it didn't use any
> data that was only valid during the combine pass, but I could have
> missed something.  This is one of those cases where encapsulating the
> pass specific bits in a class really helps avoid problems...  Just
> saying

It isn't the data that is the problem.  combine has almost no private
data.  But the problem is all the hidden assumptions combine makes, and
for example that it often makes **non-canonical** RTL; this is no
problem in combine, because that code will just not recog() (except of
course that you then get less well optimised code, in general).
(Often, combine makes that RTL canonical later, fwiw.)

But it _is_ a big problem in other contexts: you should not create
non-canonical RTL.  In your specific case, where you want to run this
from a splitter, you have to make sure you get only correct, recognised
patterns for your machine.  make_field_extraction cannot guarantee you
that.

> Second, it relaxes the main insv pattern to allow an immediate in the
> operand predicate, but forces it into a register during LRA.  I don't
> generally like having predicates that are looser than the constraints,
> but it really helps here.  Basically we want to see the constant field
> we're going to insert.
You also have an insn condition that can become invalid by the time the
pattern is split (the !reload_completed part), which means the pattern
will then *not* be split, which won't work here (the template is "#").

> -static int
> +int
>  get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)

If you want to reuse this, please give it a better name, and move it to
rtlanal or some header file or similar?

It may be nicer to make a more machine-specific routine for this, see
rs6000_is_valid_mask for example.

> +;; This must be split before register allocation and reloading as
> +;; we need to verify it's actually a bitfield insertion by
> +;; examining both immediate operands.  After reload we will lose
> +;; knowledge of operands[3] constant status.

I don't understand what this means?  After reload operands[3] is still a
const_int?

> +(define_insn_and_split ""

Give this a name?  Starting with * if you don't want a gen_* for it.

> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +			  (match_operand:GPI 2 "const_int_operand" "n"))
> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
> +  "(!reload_completed
> +    /* make_field_assignment doesn't handle subregs well.  */
> +    && REG_P (operands[0])

Another reason to not use make_field_assignment, I'm afraid :-(


Segher
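[For readers following the review: the helper being discussed decodes a mask whose set bits form one contiguous run into a bit position and length.  The idea can be sketched standalone as below; `pos_from_mask` is a hypothetical name and this is an approximation of the concept, not combine's actual implementation.]

```c
#include <stdint.h>

/* Sketch: for a mask M whose set bits form one contiguous run, return
   the bit position of the lowest set bit and, via *PLEN when non-null,
   the run length; return -1 if M is zero or its set bits are not
   contiguous.  Hypothetical helper, not GCC's get_pos_from_mask.  */
static int
pos_from_mask (uint64_t m, unsigned *plen)
{
  if (m == 0)
    return -1;

  int pos = __builtin_ctzll (m);	/* position of first 1 from the right */
  uint64_t run = m >> pos;

  /* The 1 bits of M are contiguous iff RUN has the form 0...01...1,
     i.e. RUN + 1 is a power of two.  */
  if (run & (run + 1))
    return -1;

  if (plen)
    *plen = 64 - __builtin_clzll (run);	/* length of the run of 1s */
  return pos;
}
```

With that shape, a condition like `get_pos_from_mask (~mask, NULL) >= 0` in the proposed pattern amounts to asking whether the zero bits of the AND mask form one contiguous field.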
Re: [aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
On 4/22/19 6:39 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,
>
> [...]
>
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  "{
> +    /* If we do not have an RMW operand, then copy the input
> +       to the output before this insn.  Also modify the existing
> +       insn in-place so we can have make_field_assignment actually
> +       generate a suitable extraction.  */
> +    if (!rtx_equal_p (operands[0], operands[1]))
> +      {
> +	emit_move_insn (operands[0], operands[1]);
> +	XEXP (XEXP (SET_SRC (PATTERN (curr_insn)), 0), 0) = copy_rtx (operands[0]);
> +      }
> +
> +    rtx make_field_assignment (rtx);
> +    rtx newpat = make_field_assignment (PATTERN (curr_insn));
> +    gcc_assert (newpat);
> +    emit_insn (newpat);
> +    DONE;
>
> It seems that make_field_assignment returns a new pattern only if it
> succeeds and returns the same pattern otherwise.  So I am wondering if
> it is worth simplifying the above.  Like removing the assert and
> checking/inserting the move only when a new pattern is returned?
Thanks, this is something I meant to clean up and just totally forgot.

If the assert ever triggers it means something has gone horribly wrong.
Essentially the insn's condition wasn't sufficiently tight.  Catching it
with the assert at this point is much easier to debug.  The failure
modes when you don't catch the failure to create a field assignment here
occur much later in the pipeline, and it won't be obvious why things
went wrong (I know that from experience :-).

So please consider the assert to be

  gcc_assert (newpat != PATTERN (curr_insn));

Jeff
Re: [aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
Hi Jeff,

[...]

+  "#"
+  "&& 1"
+  [(const_int 0)]
+  "{
+    /* If we do not have an RMW operand, then copy the input
+       to the output before this insn.  Also modify the existing
+       insn in-place so we can have make_field_assignment actually
+       generate a suitable extraction.  */
+    if (!rtx_equal_p (operands[0], operands[1]))
+      {
+	emit_move_insn (operands[0], operands[1]);
+	XEXP (XEXP (SET_SRC (PATTERN (curr_insn)), 0), 0) = copy_rtx (operands[0]);
+      }
+
+    rtx make_field_assignment (rtx);
+    rtx newpat = make_field_assignment (PATTERN (curr_insn));
+    gcc_assert (newpat);
+    emit_insn (newpat);
+    DONE;

It seems that make_field_assignment returns a new pattern only if it
succeeds and returns the same pattern otherwise.  So I am wondering if
it is worth simplifying the above.  Like removing the assert and
checking/inserting the move only when a new pattern is returned?

Thanks,
Kugan

>
> Jeff
[aarch64][RFA][rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64
This is a generalized version of the patch I sent last week.  This
variant fixes the remainder of the insv_1 and insv_2 failures on
aarch64.  In combination with an updated patch from Steve this should
be enough to fix 87763.

A couple notes.

First, it re-uses combine's make_field_assignment in the aarch64
backend.  So we don't have to duplicate any of that logic.  I scanned
make_field_assignment and its children to make sure it didn't use any
data that was only valid during the combine pass, but I could have
missed something.  This is one of those cases where encapsulating the
pass specific bits in a class really helps avoid problems...  Just
saying

Second, it relaxes the main insv pattern to allow an immediate in the
operand predicate, but forces it into a register during LRA.  I don't
generally like having predicates that are looser than the constraints,
but it really helps here.  Basically we want to see the constant field
we're going to insert.

Primarily looking for feedback from the aarch64 maintainers on the
pattern and Segher on the re-use of make_field_assignment.

I've bootstrapped and regression tested on aarch64 and aarch64_be as
well as x86_64, ppc64le, ppc64, i686, etc.  It's also been regression
tested on a nice variety of *-elf targets.

Segher, Richard/James OK for the trunk?
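[In plain terms, the splitter below treats `(x & mask) | imm` as a bitfield insertion when the zero bits of `mask` form one contiguous run and `imm` only sets bits inside that run.  A minimal standalone sketch of that check; `is_bitfield_insert` is a hypothetical name, not a function in the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the condition the new define_insn_and_split tests:
   (x & MASK) | IMM acts as a bitfield insertion iff the zero bits of
   MASK form one contiguous run (the "hole") and IMM fits inside it.  */
static bool
is_bitfield_insert (uint64_t mask, uint64_t imm)
{
  uint64_t hole = ~mask;		/* the bits being replaced */
  if (hole == 0)
    return false;

  int pos = __builtin_ctzll (hole);
  uint64_t run = hole >> pos;
  if (run & (run + 1))			/* zeros in MASK not contiguous */
    return false;

  return (mask & imm) == 0;		/* IMM must lie inside the hole */
}
```

For example, mask 0xFFFFFFFFFFFF00FF with imm 0x1200 inserts the 8-bit value 0x12 at bit 8, while imm 0x10000 falls outside the hole and is rejected.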
Jeff

diff --git a/gcc/combine.c b/gcc/combine.c
index 5616e6b1bac..03be306f5bc 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -449,8 +449,6 @@ static rtx expand_compound_operation (rtx);
 static const_rtx expand_field_assignment (const_rtx);
 static rtx make_extraction (machine_mode, rtx, HOST_WIDE_INT, rtx,
 			    unsigned HOST_WIDE_INT, int, int, int);
-static int get_pos_from_mask (unsigned HOST_WIDE_INT,
-			      unsigned HOST_WIDE_INT *);
 static rtx canon_reg_for_combine (rtx, rtx);
 static rtx force_int_to_mode (rtx, scalar_int_mode, scalar_int_mode,
 			      scalar_int_mode, unsigned HOST_WIDE_INT, int);
@@ -459,7 +457,6 @@ static rtx force_to_mode (rtx, machine_mode,
 static rtx if_then_else_cond (rtx, rtx *, rtx *);
 static rtx known_cond (rtx, enum rtx_code, rtx, rtx);
 static int rtx_equal_for_field_assignment_p (rtx, rtx, bool = false);
-static rtx make_field_assignment (rtx);
 static rtx apply_distributive_law (rtx);
 static rtx distribute_and_simplify_rtx (rtx, int);
 static rtx simplify_and_const_int_1 (scalar_int_mode, rtx,
@@ -8569,7 +8566,7 @@ make_compound_operation (rtx x, enum rtx_code in_code)
    *PLEN is set to the length of the field.  */

-static int
+int
 get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
 {
   /* Get the bit number of the first 1 bit from the right, -1 if none.  */
@@ -8584,7 +8581,8 @@ get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
   if (len <= 0)
     pos = -1;

-  *plen = len;
+  if (plen)
+    *plen = len;
   return pos;
 }

@@ -9745,7 +9743,7 @@ rtx_equal_for_field_assignment_p (rtx x, rtx y, bool widen_x)

    We only handle the most common cases.  */

-static rtx
+rtx
 make_field_assignment (rtx x)
 {
   rtx dest = SET_DEST (x);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a1894063a1..a2a0a86cb51 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5473,7 +5473,7 @@
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
 			  (match_operand 2 "const_int_operand" "n"))
-	(match_operand:GPI 3 "register_operand" "r"))]
+	(match_operand:GPI 3 "aarch64_reg_or_imm" "r"))]
   "!(UINTVAL (operands[1]) == 0
      || (UINTVAL (operands[2]) + UINTVAL (operands[1])
 	 > GET_MODE_BITSIZE (<MODE>mode)))"
@@ -5481,6 +5481,43 @@
   [(set_attr "type" "bfm")]
 )

+;; This must be split before register allocation and reloading as
+;; we need to verify it's actually a bitfield insertion by
+;; examining both immediate operands.  After reload we will lose
+;; knowledge of operands[3] constant status.
+;;
+(define_insn_and_split ""
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+			  (match_operand:GPI 2 "const_int_operand" "n"))
+		 (match_operand:GPI 3 "const_int_operand" "n")))]
+  "(!reload_completed
+    /* make_field_assignment doesn't handle subregs well.  */
+    && REG_P (operands[0])
+    && get_pos_from_mask (~UINTVAL (operands[2]), NULL) >= 0
+    && (UINTVAL (operands[2]) & UINTVAL (operands[3])) == 0)"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  "{
+    /* If we do not have an RMW operand, then copy the input
+       to the output before this insn.  Also modify the existing
+       insn in-place so we can have make_field_assignment actually
+       generate a suitable extraction.  */
+    if (!rtx_equal_p (operands[0], operands[1]))
+      {