On Tue, Feb 13, 2024 at 8:47 PM Robert Dubner <rdub...@symas.com> wrote:
>
> I have not contributed to GCC before, so I am not totally sure how to go
> about it.
>
> So, I am letting you know what I want to do, so that I can get advice on a
> good way to do it. I have read https://gcc.gnu.org/contribute.html, and I
> have reviewed the Gnu Coding Standards and the GCC additional coding
> standards, so I have some idea of what's needed. But there is a gulf
> between theory and practice, and I am hoping for guidance.
>
> Jim Lowden and I have been developing a COBOL front end for GCC. He's
> primarily been parsing the language. It's been my task to generate the
> GENERIC/GIMPLE trees for the parsed code. We've been working at this for
> a couple of years. We have reached the point where we want to start
> submitting patches for the community to evaluate.
>
> I figured I would start small, where "small" means mainly one new source
> code file of 1,580 lines.
>
> When I first started trying to generate GIMPLE trees to implement
> functions, it became clear to me that I needed to be able to
> reverse-engineer known good trees generated by the C front end. Oh, I
> could see what other front ends were doing in their source code. But I
> didn't know what the goal was. I wanted to see not just individual nodes,
> but how they all related to each other.
>
> There didn't seem to be any such functionality in GCC. I found a routine
> in print-tree.cc which printed out a single node, but I needed to
> understand the entire tree of nodes for a function. And I very quickly
> got tired -- very tired -- of trying to figure out the relationships
> between nodes, and I wanted more information than the print-tree routines
> were providing.
>
> So, I created the gcc/dump-gimple-nodes.cc source code, which implements
> the dump_gimple_nodes() function, which is controlled by the new
> -fdump-gimple-nodes GCC command-line option. That option hooks into the
> top of the gimplify_function_tree() function in gcc/gimplify.cc.

A first comment is that you seem to dump the GENERIC graph the frontend
feeds to the gimplifier. This isn't GIMPLE just yet, so the function
should possibly be called dump_generic_nodes (). We dump a textual
representation at a similar state with -fdump-tree-original. There's a
-raw modifier that, for example for C, streams

;; Function main (null)
;; enabled by -tree-original

@1      statement_list   0   : @2       1   : @3
@2      bind_expr        type: @4       body: @5
@3      return_expr      type: @4       expr: @6
@4      void_type        name: @7       algn: 8
@5      statement_list
@6      modify_expr      type: @8       op 0: @9       op 1: @10
@7      type_decl        name: @11      type: @4
@8      integer_type     name: @12      size: @13      algn: 32
                         prec: 32       sign: signed   min : @14
                         max : @15
...

I didn't track down where the C frontend triggers this or what utility
it uses in the end. It is also somewhat frontend specific, likely
running before genericization.

I agree with Andi that these days something more structured might be
preferable (but your html example might be good to parse and click
through for a human).

> The dump_gimple_nodes() function does a depth-first walk of the specified
> function_decl, outputting each node once in a readable format. Each node
> gets an arbitrary identifying number. There are two output files; the
> first, "func_name.nodes", is pure text. After I got tired of endlessly
> searching through the text file for the next node of interest, I created
> the "func_name.nodes.html" file, which is the same information with
> internal hyperlinks between the nodes.
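
For readers following along, the walk described above can be sketched
roughly as follows. This is only an illustration in Python, not the code
in dump-gimple-nodes.cc: the Node class and the number_nodes() and dump()
helpers are hypothetical stand-ins for GCC's tree nodes, and numbering
the nodes in a first pass before printing them is an assumption made so
that forward references (like NodeNumber6410 from NodeNumber0) can be
resolved.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass(eq=False)  # identity-based hashing, like comparing tree pointers
class Node:
    """Hypothetical stand-in for a GCC tree node (not the real 'tree' type)."""
    code: str
    attrs: dict = field(default_factory=dict)  # attribute name -> Node or plain value


def number_nodes(root):
    """Depth-first walk; each node gets an arbitrary number on first visit."""
    numbers = {}  # Node -> number; doubles as the visited set
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in numbers:
            continue  # shared or cyclic reference: already numbered
        numbers[node] = len(order)
        order.append(node)
        children = [v for v in node.attrs.values() if isinstance(v, Node)]
        stack.extend(reversed(children))  # visit children in attribute order
    return order, numbers


def dump(root):
    """Return (text, html): the same lines, with hyperlinks in the HTML form."""
    order, numbers = number_nodes(root)
    text, html = [], []
    for node in order:
        label = f"NodeNumber{numbers[node]}"
        text.append(f"***This is {label}")
        html.append(f'<a id="{label}">***This is {label}</a>')
        text.append(f"tree_code: {node.code}")
        html.append(f"tree_code: {escape(node.code)}")
        for name, value in node.attrs.items():
            if isinstance(value, Node):
                # Show the referenced node's number plus a one-word summary.
                ref = f"NodeNumber{numbers[value]}"
                text.append(f"{name}: {ref} {value.code}")
                html.append(f'{escape(name)}: <a href="#{ref}">{ref}</a> '
                            f"{escape(value.code)}")
            else:
                text.append(f"{name}: {value}")
                html.append(f"{escape(name)}: {escape(str(value))}")
    return "\n".join(text), "<pre>\n" + "\n".join(html) + "\n</pre>"


# Tiny made-up graph with a shared node and a self-reference.
int_type = Node("integer_type", {"precision": 32})
fn_type = Node("function_type", {"type": int_type})
fn_type.attrs["canonical"] = fn_type
main = Node("function_decl",
            {"type": fn_type, "result": Node("result_decl", {"type": int_type})})
text, html = dump(main)
print(text)
```

The visited check is what keeps shared nodes (the integer_type here) and
self-references (canonical) from being printed twice or looping forever.
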
>
> Here are the first two nodes of a typical simple function:
>
> ***********************************This is NodeNumber0
> (0x7f12e13b0d00) NodeNumber0
> tree_code: function_decl
> tree_code_class: tcc_declaration
> base_flags: static public
> type: NodeNumber1 function_type
> name: NodeNumber6410 identifier_node "main"
> context: NodeNumber107 translation_unit_decl "bigger.c"
> source_location: bigger.c:7:5
> uid: 3663
> initial(bindings): NodeNumber6411 block
> machine_mode: QI(15)
> align: 8
> warn_if_not_align: 0
> pt_uid: 3663
> raw_assembler_name: NodeNumber6410 identifier_node "main"
> visibility: default
> result: NodeNumber6412 result_decl
> function(pointer): 0x7f12e135d508
> arguments: NodeNumber6413 parm_decl "argc"
> saved_tree(function_body): NodeNumber6417 statement_list
> function_code: 0
> function_flags: public no_instrument_function_entry_exit
> ***********************************This is NodeNumber1
> (0x7f12e13b3d20) NodeNumber1
> tree_code: function_type
> tree_code_class: tcc_type
> machine_mode: QI(15)
> type: NodeNumber2 integer_type
> address_space:0
> size(in bits): NodeNumber55 uint128 8
> size_unit(in bytes): NodeNumber12 uint64 1
> uid: 1515
> precision: 0
> contains_placeholder: 0
> align: 8
> warn_if_not_align: 0
> alias_set_type: -1
> canonical: NodeNumber1 function_type
> main_variant: NodeNumber1 function_type
> values: NodeNumber6408 tree_list
> ***********************************
>
> Note how even when an attribute points to another node, e.g.,
>
> arguments: NodeNumber6413 parm_decl "argc"
>
> the output routine goes down another level or two in an attempt to make it
> more meaningful. The attribute points just to NodeNumber6413, but the
> output shows that node to be a parm_decl, and there is additional code
> that recognizes that a parm_decl has an identifier_node with the value
> "argc".
>
> An example of a complete dump is available at
> https://www.dubner.com/main.nodes.html. The C source code that generated
> it is available at the end of
> https://cobolworx.com/pages/dump-gimple-nodes.html
>
> I found this feature to be absolutely necessary when figuring out how
> working front ends built valid GIMPLE trees for functions. I am hopeful
> other developers can see the utility.
>
> Does this require any further discussion? Or is my next step to start
> developing the series of patches that will create the dump-gimple-nodes
> source code, and that will modify Makefile.in, gimplify.cc, and common.opt
> to incorporate it?
>
> Thanks so much for any suggestions and guidance,
>
> Bob Dubner
>