On Wed, 2024-02-28 at 08:58 +0100, Richard Biener wrote:
> On Tue, Feb 27, 2024 at 10:20 PM Robert Dubner <rdub...@symas.com>
> wrote:
> > 
> > Richard,
> > 
> > Thank you very much for your comments.
> > 
> > When I set out to create the capability, I had a "specification" in
> > mind.
> > 
> > I didn't have a clue how to create a GENERIC tree that could be fed
> > to the
> > middle end in a way that would successfully result in an
> > executable.  And I
> > needed to be able to do that in order to proceed with the project
> > of
> > creating a COBOL front end.
> > 
> > So, I came up with the idea of using GCC to compile simple
> > programs, and to
> > hook into the compiler to examine the trees fed to the middle end,
> > and to
> > display those trees in the human-readable format I needed to
> > understand
> > them.  And that's what I did.
> > 
> > My first incarnation generated pure text files, and I used that to
> > get
> > going.
> > 
> > After a while I realized that when I used the output file, I was
> > spending a
> > lot of time searching through the text files.  And I had the
> > brainstorm!
> > Hyperlinks!  HTML files!  We have the technology!  So, I created
> > the .HTML
> > files as well.
> > 
> > I found this useful to the point of necessity in order to learn how
> > to
> > generate the GENERIC trees.  I believe it would be equally useful
> > to the
> > next developer who, for whatever reason, needs to understand, on a
> > "You need
> > to learn the alphabet before you can learn how to read" level, what
> > the
> > middle end requires from a GENERIC tree generated by a front end.
> > 
> > But I've never used it on a complex program. I've used it only to
> > learn how
> > to create the GENERIC nodes for very particular things, and so I
> > would use
> > the -fdump-generic-nodes feature on a very simple C program that
> > demonstrated, in isolation, the feature I needed.  Once I figured
> > it out, I
> > would create front end C routines or macros that used the
> > tree.h/tree.cc
> > features to build those GENERIC trees, and then I would move on.
> > 
> > I decided to offer it up here, in order to learn how to create
> > patches and to get to know the people and the process, as well as
> > from the desire to share it.
> > And instantly I got the "How about a machine-readable format?"
> > comments.
> > Which are reasonable.  So, because it wasn't hard, I hacked at the
> > existing
> > code to create a JSON output.  (But I remind you that up until now,
> > nobody
> > seems to have needed a JSON representation.)
> > 
> > And your observation that the human readable representation could
> > be made
> > from the JSON representation is totally accurate.
> > 
> > But that wasn't my specification.  My specification was "A tool so
> > that a
> > human being can examine a simple GENERIC tree to learn how it's
> > done."
> > 
> > But it seems to me that we are now moving into the realm of a new
> > specification.
> > 
> > Said another way:  To go from "A human readable representation of a
> > simple
> > GENERIC tree" to "A machine readable JSON representation of an
> > arbitrarily
> > complex GENERIC tree, from which a human readable representation
> > can be
> > created" means, in effect, starting over on a different project
> > that I don't
> > need.  I already *have* a project that I am working on -- the COBOL
> > front
> > end.
> > 
> > The complexity of GENERIC trees is, in my experienced opinion, an
> > obstacle
> > for the creation of front ends.  The GCC Internals document has a
> > lot of
> > information, but to go from it to a front end is like using the
> > maintenance
> > manual for an F16 fighter to try to learn to fly the aircraft.
> > 
> > The program "main(){}" generates a tree with over seventy nodes.  I
> > see no
> > way to document why that's true; it's all arbitrary in the sense
> > that "this
> > is how GCC works".  -fdump-generic-nodes made it possible for me to
> > figure
> > out how those nodes are connected and, thus, how to create a new
> > front end.
> > I figure that other developers might find it useful, as well.
> > 
> > I guess I am saying that I am not, at this time, able to work on a
> > whole
> > different tool.  I think what I have done so far does something
> > useful that
> > doesn't seem to otherwise exist in GCC.
> > 
> > I suppose the question for you is, "Is it useful enough?"
> > 
> > I won't be offended if the answer is "No" and I hope you won't be
> > offended
> > by my not having the bandwidth to address your very thoughtful and
> > valid
> > observations about how it could be better.
> 
> No offense taken - I did realize how useful this was to you (and
> specifically
> the hyper-linking even looked very useful to me!).  I often lament
> the lack
> of domain-specific visualization tools for the various data
> structures GCC
> has - having something for GENERIC would be very welcome.
> 
> We have for example ways to dump graphviz .dot format graphs of the
> CFG
> and some other data structures and do that natively, not via JSON
> indirection.

FWIW for GCC 15 I've been experimenting with adding a
text_art::tree_widget class; with that, the analyzer can visualize an
ana::program_state instance like this (potentially with colorization in
suitable terminals):

State
├─ Region Model
│  ├─ Current Frame: frame: ‘test_7’@1
│  ├─ Store
│  │  ╰─ root region
│  │     ╰─ (*INIT_VAL(a_10(D)))
│  │        ╰─ bytes 12-15: ‘int’ {(int)42}
│  ╰─ Constraints
│     ├─ Equivalence class ec0
│     │  ├─ (void *)0B
│     │  ╰─ ‘0B’
│     ├─ Equivalence class ec1
│     │  ╰─ INIT_VAL(a_10(D))
│     ├─ Equivalence class ec2
│     │  ╰─ (INIT_VAL(a_10(D))+(sizetype)12)
│     ├─ ec1: {INIT_VAL(a_10(D))} != ec0: {(void *)0B == [m_constant]‘0B’}
│     ╰─ ec2: {(INIT_VAL(a_10(D))+(sizetype)12)} != ec0: {(void *)0B == [m_constant]‘0B’}
╰─ ‘malloc’ state machine
   ╰─ 0x62082e0: (INIT_VAL(a_10(D))+(sizetype)12): assumed-non-null (in frame: ‘test_7’@1)

and such visualizations could be added for other hierarchical data
structures.  Also, because it uses text_art::widget, the content at a
tree node doesn't have to be purely textual, and we could do things
like the following (which is a mockup):

State
├─ Region Model           Bound value │ Effective value
│  ├─ Stack                           │
│  │  ├─ frame@0 'foo'                │
│  │  │  ├─ 'i'           (int)42     │ 
│  │  │  ╰─ 'j'                       │ UNINIT(int)
│  │  ╰─ frame@1 'bar'                │
│  │     ╰─ 'k'                       │
│  │        ╰─ [3]                    │
│  │           ├─ .x      INIT_VAL(p) │
│  │           ╰─ .y      INIT_VAL(q) │
│  ╰─ Globals                         │
│     ╰─ 'baz'                        │
├─ Constraints
│  ╰─ (etc)
╰─ 'malloc' state
   ╰─ CONJURED_VALUE('ptr') unchecked('free')


That said, our "tree" structure is arguably a directed graph rather
than a tree (consider e.g. types).
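
To illustrate the graph-versus-tree point, here is a minimal sketch of
rendering a directed graph in the tree style above, expanding each node
only once and printing a back-reference on later visits.  Everything in
it is hypothetical: gnode, render and render_as_tree are made-up names
for illustration, and it uses neither GCC's "tree" type nor the
text_art::tree_widget API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical graph node; a stand-in, not GCC's `tree`.
struct gnode
{
  std::string label;
  std::vector<const gnode *> edges;   // outgoing edges, may form cycles
};

// Append the children of N to OUT, one per line, indented by PREFIX.
// SEEN holds every node already printed in full; such nodes are
// printed again as a one-line back-reference and not expanded.
static void
render (const gnode &n, const std::string &prefix,
        std::unordered_set<const gnode *> &seen, std::string &out)
{
  for (std::size_t i = 0; i < n.edges.size (); ++i)
    {
      const gnode *child = n.edges[i];
      assert (child != nullptr);
      const bool last = (i + 1 == n.edges.size ());
      out += prefix + (last ? "╰─ " : "├─ ") + child->label;
      if (!seen.insert (child).second)
        {
          // Already expanded elsewhere (shared node or cycle).
          out += " (see above)\n";
          continue;
        }
      out += "\n";
      render (*child, prefix + (last ? "   " : "│  "), seen, out);
    }
}

// Render the graph reachable from ROOT as an indented tree.
std::string
render_as_tree (const gnode &root)
{
  std::unordered_set<const gnode *> seen { &root };
  std::string out = root.label + "\n";
  render (root, "", seen, out);
  return out;
}
```

With a type node shared between two parents, the second occurrence is
printed as a single line marked "(see above)"; a hyperlinked dump could
turn that marker into a link to the first occurrence.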

> 
> Incidentally this looks like something fit for a Google Summer of
> Code project.
> Ideally it would hook into print-tree.cc providing an alternate
> structured output.
> It currently prints in the style
> 
>  <function_decl 0x7ffff71bc600 bswap16
>     type <function_type 0x7ffff71ba5e8
>         type <integer_type 0x7ffff702b540 short unsigned int public unsigned HI
>             size <integer_cst 0x7ffff702d108 constant 16>
>             unit-size <integer_cst 0x7ffff702d120 constant 2>
>             align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff702b540 precision:16 min <integer_cst 0x7ffff702d138 0> max <integer_cst 0x7ffff702d0f0 65535>>
>         QI
>         size <integer_cst 0x7ffff702d048 constant 8>
>         unit-size <integer_cst 0x7ffff702d060 constant 1>
> ...
> 
> where you can see it follows tree -> tree edges up to some depth
> (and avoids repeated expansion).  When debugging that's all I have
> and I have to follow edges by matching up the raw addresses printed,
> re-dumping those that didn't get expanded.  HTML would be indeed
> so much nicer here (and a more complete output).
> 
> From a maintenance point of view I think it's important to have
> "dump a tree node"
> once, so when fields are added or deemed useful for presenting in a
> dump
> you don't have to chase down more than one place.  Maintenance is
> also
> the reason to not simply accept your contribution as-is.

Presumably we'd want some kind of "visitor" code that captures visiting
the fields of each type in one place, allowing for different consumers
of this data (HTML generation, JSON generation etc).  Though I'm not
sure how best to express this double-dispatch (by templates, vfunc, or
whatnot).
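
As a rough illustration of the idea, here is a minimal sketch using
virtual functions: one function enumerates a node's fields, and each
output format is a separate consumer.  All names here (node,
field_visitor, visit_fields, text_dumper, json_dumper) are hypothetical
and none of this is existing GCC code.  It also sidesteps the harder
part, dispatching on the node kind, by having only one kind of node.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a GENERIC node; not GCC's real `tree`.
struct node
{
  unsigned id = 0;              // made-up identifier used for references
  std::string code;             // e.g. "integer_type"
  unsigned precision = 0;
  const node *size = nullptr;   // edge to another node, may be null
};

// Interface implemented once per output format.
class field_visitor
{
public:
  virtual ~field_visitor () = default;
  virtual void on_string (const char *name, const std::string &value) = 0;
  virtual void on_unsigned (const char *name, unsigned value) = 0;
  virtual void on_edge (const char *name, const node *target) = 0;
};

// The single place that knows which fields a node has.
void
visit_fields (const node &n, field_visitor &v)
{
  v.on_string ("code", n.code);
  v.on_unsigned ("precision", n.precision);
  if (n.size)
    v.on_edge ("size", n.size);
}

// Consumer 1: one "name: value" line per field.
class text_dumper : public field_visitor
{
public:
  void on_string (const char *name, const std::string &value) override
  { m_out += std::string (name) + ": " + value + "\n"; }
  void on_unsigned (const char *name, unsigned value) override
  { m_out += std::string (name) + ": " + std::to_string (value) + "\n"; }
  void on_edge (const char *name, const node *target) override
  {
    assert (target != nullptr);
    m_out += std::string (name) + ": -> #" + std::to_string (target->id) + "\n";
  }
  const std::string &str () const { return m_out; }
private:
  std::string m_out;
};

// Consumer 2: a flat JSON object; edges become {"ref": id}.
class json_dumper : public field_visitor
{
public:
  void on_string (const char *name, const std::string &value) override
  { add (name, "\"" + escape (value) + "\""); }
  void on_unsigned (const char *name, unsigned value) override
  { add (name, std::to_string (value)); }
  void on_edge (const char *name, const node *target) override
  {
    assert (target != nullptr);
    add (name, "{\"ref\": " + std::to_string (target->id) + "}");
  }
  std::string str () const { return "{" + m_body + "}"; }
private:
  // Escapes only quotes and backslashes; control characters are not handled.
  static std::string escape (const std::string &s)
  {
    std::string result;
    for (char c : s)
      {
        if (c == '"' || c == '\\')
          result += '\\';
        result += c;
      }
    return result;
  }
  void add (const char *name, const std::string &json_value)
  {
    if (!m_body.empty ())
      m_body += ", ";
    m_body += "\"" + std::string (name) + "\": " + json_value;
  }
  std::string m_body;
};

std::string
dump_text (const node &n)
{
  text_dumper d;
  visit_fields (n, d);
  return d.str ();
}

std::string
dump_json (const node &n)
{
  json_dumper d;
  visit_fields (n, d);
  return d.str ();
}
```

In this arrangement, adding a field means touching only visit_fields,
and adding an HTML consumer means writing one more field_visitor
subclass.  A template-based variant would trade the virtual calls for
compile-time dispatch.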

We already have some of this in the form of gengtype and what it
generates, but I suspect we don't want to add further reliance on
gengtype.

> 
> I do hope this eventually gets picked up.  I've added a project idea
> to https://gcc.gnu.org/wiki/SummerOfCode
> and would be willing to mentor it.

FWIW you added it to the "Other projects and project ideas" below
"Other Project Ideas", where it's much less likely to get noticed than
under the "Selected Project Ideas".
> 
> Oh, and I'm looking forward to the actual Cobol work!

Likewise

[...snip...]

Dave
