Re: [PATCH 000/236] Introduce rtx subclasses

David Malcolm Tue, 12 Aug 2014 17:30:06 -0700

On Tue, 2014-08-12 at 14:39 -0600, Jeff Law wrote:
> On 08/06/14 11:19, David Malcolm wrote:
> >
> > The aim of the patch series is to improve the type-safety and
> > readability of the backend by introducing subclasses of rtx (actually
> > rtx_def) for *instructions*, and also for EXPR_LIST, INSN_LIST, SEQUENCE.
> >
> > That way we can document directly in the code the various places that
> > manipulate insn chains vs other kinds of rtx node.
> Right.  And by catching this stuff at compile time developers working on 
> RTL bits should become at least a tiny bit more efficient.
> >
> > The class hierarchy looks like this (using indentation to show
> > inheritance, and indicating the invariants):
> >
> > class rtx_def;
> >    class rtx_expr_list;           /* GET_CODE (X) == EXPR_LIST */
> >    class rtx_insn_list;           /* GET_CODE (X) == INSN_LIST */
> >    class rtx_sequence;            /* GET_CODE (X) == SEQUENCE */
> >    class rtx_insn;                /* INSN_CHAIN_CODE_P (GET_CODE (X)) */
> >      class rtx_real_insn;         /* INSN_P (X) */
> >        class rtx_debug_insn;      /* DEBUG_INSN_P (X) */
> >        class rtx_nonjump_insn;    /* NONJUMP_INSN_P (X) */
> >        class rtx_jump_insn;       /* JUMP_P (X) */
> >        class rtx_call_insn;       /* CALL_P (X) */
> >      class rtx_jump_table_data;   /* JUMP_TABLE_DATA_P (X) */
> >      class rtx_barrier;           /* BARRIER_P (X) */
> >      class rtx_code_label;        /* LABEL_P (X) */
> >      class rtx_note;              /* NOTE_P (X) */
> So I think the only real question here is do we want to distinguish 
> between rtx_real_insn and the others?
> 
> When I see "rtx_real_insn" my first thought is something that's going to 
> turn into an instruction.  But that's not true in this hierarchy due to 
> rtx_debug_insn.
> 
> So throughout the work-to-date, did you find much use for rtx_real_insn?


Essentially zero in the work-to-date.

Of the 13 new subclasses of rtx, I only make major use of about half of
them; here are the frequencies (as reported by grep -w in my current
working copy):

      class rtx_def;   /* ~26000 for "rtx" */
        class rtx_expr_list;           /* 73 */
        class rtx_insn_list;           /* 130 */
        class rtx_sequence;            /* 71 */
        class rtx_insn;                /* ~4300 */
          class rtx_real_insn;         /* 17 */
            class rtx_debug_insn;      /* 9 */
            class rtx_nonjump_insn;    /* 5 */
            class rtx_jump_insn;       /* 9 */
            class rtx_call_insn;       /* 25 */
          class rtx_jump_table_data;   /* 41 */
          class rtx_barrier;           /* 25 */
          class rtx_code_label;        /* 176 */
          class rtx_note;              /* 63 */

i.e. rtx_real_insn is basically unused, as are rtx_debug_insn,
rtx_nonjump_insn and rtx_jump_insn.

I think there's a case for keeping the concrete subclasses (those last
three), but we can drop rtx_real_insn.

I did some analysis of the operands of the various subclasses:

     class rtx_def;
       class rtx_expr_list;           /* "ee" */
       class rtx_insn_list;           /* "ue" */
       class rtx_sequence;            /* "E"  */
       class rtx_insn;
         class rtx_real_insn;
                                      /*  01234567 */
           class rtx_debug_insn;      /* "uuBeiie" */
           class rtx_nonjump_insn;    /* "uuBeiie" */
           class rtx_jump_insn;       /* "uuBeiie0" */
           class rtx_call_insn;       /* "uuBeiiee" */
         class rtx_jump_table_data;   /* "uuBe0000" */
         class rtx_barrier;           /* "uu00000"  */
         class rtx_code_label;        /* "uuB00is"  */
         class rtx_note;              /* "uuB0ni"  */
/*                                        01234567
                                          ││││││││
            XEXP (insn, 0):   PREV_INSN()─┘│││││││
            XEXP (insn, 1):    NEXT_INSN()─┘││││││
          XBBDEF (insn, 2):BLOCK_FOR_INSN()─┘│││││
            XEXP (insn, 3):        PATTERN()─┘││││
           XUINT (insn, 4):   INSN_LOCATION()─┘│││
            XINT (insn, 5):        INSN_CODE()─┘││
            XEXP (insn, 6):         REG_NOTES()─┘│
  XEXP(INSN, 7): CALL_INSN_FUNCTION_USAGE(INSN) ─┘ */

with an rtx_real_insn you're guaranteed at least a "uuBeiie".  But
nothing is using that in the patches as they stand, so we can simply
drop the class.

Perhaps the class hierarchy diagram in coretypes.h should gain the above
operand annotation?


> > The patch series also contains some cleanups using inline methods:
> >
> >    * being able to get rid of this boilerplate everywhere that jump tables
> >      are handled:
> >
> >        if (GET_CODE (PATTERN (table)) == ADDR_VEC)
> >          vec = XVEC (PATTERN (table), 0);
> >        else
> >          vec = XVEC (PATTERN (table), 1);
> >
> >     in favor of a helper method (inlined):
> >
> >        vec = table->get_labels ();
> Right.  I suspect there's other cleanups like this all over the place 
> that we could do and probably should in the interest of readability.

(nods)

> >    * having a subclass for EXPR_LIST allows for replacing this kind of
> >      thing:
> >
> >        for (x = forced_labels; x; x = XEXP (x, 1))
> >          if (XEXP (x, 0))
> >             set_label_offsets (XEXP (x, 0), NULL_RTX, 1);
> >
> >      with the following, which captures that it's an EXPR_LIST, and makes
> >      it clearer that we're simply walking a singly-linked list:
> >
> >        for (rtx_expr_list *x = forced_labels; x; x = x->next ())
> >          if (x->element ())
> >            set_label_offsets (x->element (), NULL_RTX, 1);
> And would could argue here that we really just want to utilize some 
> standard list container.  I'm not sure what we're getting by rolling our 
> own.  When we discussed this a few weeks ago, I probably should have 
> pushed a bit harder on that.  But I think we can/should table that 
> discussion.  What you've done is a clear step forward and I think should 
> be reviewed on those merits.  If someone wants to go another step 
> forward and use standard containers, that can be dealt with independently.

(nods)

> >
> > There are some surface details to the patches:
> >
> >    * class names.  The subclass names correspond to the lower_case name
> >      from rtl.def, with an "rtx_" prefix.  "rtx_insn" and "rtx_real_insn"
> >      don't correspond to concrete node kinds, and hence I had to invent
> >      the names.  (In an earlier version of the patches these were
> >      "rtx_base_insn" and "rtx_insn" respectively, but the former occurred
> >      much more than the latter and so it seemed better to use the shorter
> >      spelling for the common case).
> I can live with the existing class names.

(nods)

> >
> >    * there's a NULL_RTX define in rtl.h.   In an earlier version of the
> >      patch I added a NULL_INSN define, but in this version I simply use
> >      NULL, and I'm in two minds about whether a NULL_INSN is desirable
> >      (would we have a NULL_FOO for all of the subclasses?).  I like having
> >      a strong distinction between arbitrary RTL nodes vs instructions,
> >      so maybe there's a case for NULL_INSN, but not for the subclasses?
> No strong opinion here.  I think we added NULL_TREE/NULL_RTX.  I could 
> possibly see extending that to the overall concept of "insn chain 
> things", but I think doing one for each subclass would probably be overkill.

In the absence of strong opinions, maybe we should proceed with the
patches as written i.e. *without* a NULL_INSN define.


> >    * I added an "rtx_real_insn" subclass for the INSN_P predicate, adding
> >      the idea of a PATTERN, a basic_block, and a location - but I hardly
> >      use this anywhere.  That said, it seems to be a real concept in the
> >      code, so I added it.
> So DEBUG_INSN has a pattern and that's how it got into rtx_real_insn. 
> Hmmm.  Would agree it's a real concept in the code, but as much as I 
> hate bikeshedding on names, this is one that just doesn't fit well.  If 
> you didn't use it much, maybe it's fixable without an enormous amount of 
> renaming work.

I now think my comment above is incorrect, based on re-reading the
operands in rtl.def and my chart above.  So let's just drop
rtx_real_insn; it shouldn't affect the patches much.

> >      So in the following patches the pointerness is explicit: the patches
> >      refer to:
> >         rtx_insn *insn;
> >      rather than just:
> >         rtx_insn insn;
> >      and so one can write:
> >         const rtx_insn *insn
> >      and the "constness" applies to the insn, not to the pointer.
> Seems like the right direction to me.

(nods)

> >    * Should as_a <rtx_insn *> accept a NULL pointer?  It's possible to make
> >      the is_a_helper cope with NULL, but this adds an extra conditional.
> >      I instead added an as_a_nullable<> cast, so that you can explicitly
> >      add a check against NULL before checking the code of the rtx node.
> >      But maybe it's cleaner to simply have is_a_helper<rtx_insn *> cope
> >      with NULL?
> I'd think not.

FWIW I seem to recall Andrew MacLeod saying that in at least one
iteration of his tree separation patches that he'd written an
is_a_helper<T> for some T that coped with NULL.

> > Some deeper questions:
> >
> >    * should rtx_insn eventually be a separate class from rtx, separating
> >      insn chain nodes from rtx nodes?  I don't know if that's a worthwhile
> >      longterm goal, but this patch series gets us somewhere closer to
> >      being able to achieve that.  (actually getting there would be a much
> >      more invasive set of patches).
> I think so and it generally matches what we've done with gimple.  

OK

> One could even argue that the insn chain itself really should just be a 
> standard container class, but I think that's much further out.

Interesting.

> > To keep this reviewable, and to try to mitigate bitrot, I've chopped it up
> > into numerous relatively small patches.  The aim is that at every patch,
> > the code correctly builds on all supported configurations.  That said,
> > there's a complicated dependency graph of types in gcc's code, so to
> > tame that, the patch series is divided into 6 phases:
> So from a staging standpoint I think you should install as we go rather 
> than waiting for the entire set to be approved.  My worry is that given 
> the size, by the time we get it reviewed & updated, there'll have been 
> more bitrot and you end up iterating quite a bit just because it takes 
> time to fixup details in your patchkit and the code is continually 
> changing under you.

OK.


> >    * phase 1 adds "scaffolding": in various places, strengthen the return
> >      types from internal APIs and macros so as to promise an rtx_insn *
> >      rather than a plain rtx.  For example, the DEP_PRO/DEP_CON macros
> >      in sched-int.h which lookup fields within struct _dep become inline
> >      functions that return rtx_insn * (using checked casts).  In this way,
> >      stronger type information can be used by subsequent patches whilst
> >      avoiding the chicken-and-egg issue since writes to the fields can
> >      still be arbitrary rtx nodes, according to the type system at least.
> Right.
> 
> >
> >    * phase 2: an alphabetical tour of the backend: a series of patches,
> >      from alias.c through web.c, each patch touching one file,
> >      strengthening the types within it as much as we can at that point.
> >      The patches sometimes make use of the alphabetic ordering in order to
> >      use APIs that have already been strengthened to work on rtx_insn.
> OK.  Good to know that patch later patch in this part may depend on the 
> API guarantees from an earlier patch.
> 
> >
> >    * phase 3: similar to phase 2, but for the various config
> >      subdirectories.
> Right.  I'd think in phase 3, each backend ought to be independent.  So 
> various backend changes ought to be able to go in, even if there's 
> something contentious in a different config directory.

Yes: I *think* all of these ones are independent from each other.

> >    * phase 4: tears down the scaffolding, replacing checked casts as much
> >      as possible by strengthening core fields of core types.  For example,
> >      we eventually reach the point in sched-int.h where the fields "pro"
> >      and "con" within struct _dep can become rtx_insn *, and hence
> >      DEP_PRO/DEP_CON can be converted back to plain macros, without needing
> >      the checked cast.
> Right.
> 
> 
> >
> >    * phase 5: all of the above was for instructions; this phase adds three
> >      subclasses for other node kinds.  I experimented with subclasses for
> >      various node kinds; these three seemed most appropriate: EXPR_LIST,
> >      INSN_LIST, SEQUENCE.  I kept these as a separate phase as Jeff asked
> >      me to separate them from the instruction patches, to avoid
> >      complicating things, but I think these three are also a worthwhile
> >      cleanup with relatively little complexity.
> Ok.  Wished we'd talked more about this, but it's not the end of the world.

The impression I got of the mood in the room at my Cauldron talk was
that people were broadly happy with these three classes; albeit perhaps
as a stepping stone to using other container classes...

> >    * phase 6: (new since my Cauldron talk): this phase freely uses
> >      EXPR_LIST, INSN_LIST, SEQUENCE to do cleanups that were awkward
> >      without them.  In particular, by the end of this phase, NEXT_INSN()
> >      and PREV_INSN() require rtx_insn * rather than plain rtx.
> Woo Wooo!

...which is why I used them in the big push for the params of
NEXT_INSN/PREV_INSN in phase 6.

> So out of curiosity, any thoughts on what other big things are out there 
> that need to be fixed.  I'm keen to keep this stuff moving as much as 
> possible.

Arguably PATTERN() should require an rtx_insn * rather than a plain rtx,
but it would be an involved patch.

Other followups would be to reduce the number of as_a <rtx_insn *> in
the code; for example grepping for "uncast_" shows quite a few, a
pattern where I strengthen the type of a parameter as seen within the
function without needing to strengthen the param itself, where:

   void foo (rtx insn)
   {
        /* do things with insn */
   }

becomes:

   void foo (rtx uncast_insn)
   {
        rtx_insn *insn = as_a_nullable <rtx_insn *> (uncast_insn);
        /* do things with insn */
   }

which is something of a shortcut to doing it "properly" as:

   void foo (rtx_insn *insn)
   {
        /* do things with insn */
   }

Major (ab)users of the above right now are e.g. the emit_*_{before|
after}() functions in emit-rtl.c.


> > The patches (and my control for testing) has been on top of r211061, which
> > being from 2014-05-29 is now two months old, but hopefully is still
> > reviewable; naturally I'll perform bootstrap and regression testing for
> > each patch in turn before committing.
> It's still reviewable, though you'll likely find some minor changes will 
> be needed just due to churn.  The iterator/for_each_rtx work comes 
> immediately to mind, but I'm sure there'll be others.

That one occurred to me also.

> A part of me really wonders if the 6 phases above should have been 
> submitted independently.  ie, once the scaffolding was done why not go 
> ahead and get that reviewed & installed, then move on to phase2 patches. 
>   I bring it up more for future work of a similar nature.
> 
> I realize that you had to do a fair amount of the later work to ensure 
> the scaffolding was right and so that we could see what the end result 
> would likely look like.  But something feels like it could be staged better.

FWIW What you're seeing is the end result of a *lot* of 
"git rebase -i", where I was splitting, combining, and reordering
patches: the phases didn't exist as fully-formed entities until I was
ready to send the patches.

Though I appreciate things were suboptimal here.

> Anyway...
> 
> 
> >
> > Performance
> > ===========
> >
> > I tested the performance with --enable-checking=release using
> > two large files (kdecore.cc, bigcode.c), comparing a control build
> > to a patched build.
> >
> > There were no significant differences in compilation time:
> OK.  A bit of a surprise, but then again perhaps not because we don't 
> enable RTL checking by default.  And I suspect a goodly amount of the 
> RTL checking overhead is in the operands, not on the chain itself.
> 
> 
> >>
> >
> > As for the performance of a regular build i.e. with as_a<>
> > checks *enabled*; looking at the wallclock time taken for a bootstrap and
> > regression test, for my s390 builds (with -j3) I saw:
> >
> > s390 control:
> >    "make" time: 68 mins
> >    "make check" time: 122 mins
> >    total time: 190 mins
> >
> > s390 experiment:
> >    "make" time: 70 mins
> >    "make check" time: 126 mins
> >    total time: 196 mins
> >
> > showing a 3% increase, presumably due to the as_a and as_a_nullable
> > checks.
> You want to be real careful benchmarking on ppc/s390 hardware as they're 
> partitioned and what goes on in a different partition can affect your 
> partition.

Ah.  This suggests my testing may need redoing.  I'll try to redo this
on a dedicated x86_64 box.

> > i.e. a release build shows no change in performance; a debug build shows
> > a 3% increase in time taken to bootstrap and regression test.  I believe
> > the debug build could be sped up with further patches to eliminate the
> > checked casts.
> Yea.  As I've mentioned, my vision is to try and push out the casts as 
> much as reasonably possible.

(nods)

Thanks
Dave

Re: [PATCH 000/236] Introduce rtx subclasses

Reply via email to