Hi, On Thu, 10 Mar 2016, Richard Biener wrote:
> Then I'd like to be able to re-construct SSA without jumping through > hoops (usually you can get close but if you require copies propagated in > a special way you are basically lost for example). > > Thus my proposal to make the GSoC student attack the unit-testing > problem by doing modifications to the pass manager and "extending" an > existing frontend (C for simplicity). I think it's wrong to try to shoehorn the gimple FE into the C FE. C is fundamentally different from gimple and you'd have to sprinkle gimple_dialect_p() all over the place, and maintaining that while developing future C improvements will turn out to be much work. Some differences of C and gimple: * C has recursive expressions, gimple is n-op stmts, no expressions at all * C has type promotions, gimple is explicit * C has all other kinds of automatic conversion (e.g. pointer decay) * C has scopes, gimple doesn't (well, global and local only), i.e. symbol lookup is much more complicated * C doesn't have exceptions * C doesn't have class types, gimple has * C doesn't have SSA (yes, I'm aware of your suggestions for that) * C doesn't have self-referential types * C FE generates GENERIC, not GIMPLE (so you'd need to go through the gimplifier and again would feed gimple directly into the passes) I really don't think changing the C FE to accept gimple is a useful way forward. I agree with others that the gimple FE is rather more similar to a textual form of LTO bytecode. In difference to the byte code its syntax has to be defined and kept stable, and yes, it's sensible to get inspiration from C syntax. > It's true that in the ideal world a "GIMPLE frontend" has to work > differently but then to get a 100% GIMPLE frontend you indeed arrive at > what LTO does and then I agree we should attack that from the LTO side > and not develop another thing besides it. But this is a _much_ larger > task and I don't see anyone finishing that. So, you think the advantage of starting with the C FE would be faster return on investment? Even if it were true (and I'm not sure it is), what would it help if it ultimately wouldn't be acceptable for inclusion in GCC proper? > Also keep in mind that the LTO side will likely simplify a lot by > stripping out the ability to stream things that are only required for > debug information (I hopefully will finish LTO early debug for GCC 7). Btw. I'm not suggesting to fit the gimple FE into the LTO infrastructure, that's also leading to a mess IMO. But I do think creating a gimple FE from scratch is easier than fitting it into the C FE. Taking some infrastructure from the LTO frontend (namely all the tree/gimple building routines) and some from the C frontend (namely parts of the parsers, at least for types). > So - can we please somehow focus on the original question about a GSoC > project around the "GIMPLE frontend"? Of course you now can take Davids > promise of spending some development time on this into consideration. > Try drafting something that can be reasonably accomplished with giving > us something to actually use in the GCC 7 timeframe. Okay, here's a rough plan: 1) take the LTO frontend, remove all bytecode routines, partitioning stuff, lto symtab merging stuff, tree merging stuff, WPA stuff; i.e. retain only the few pieces needed for creating tree and those that every frontend needs to provide. Now that frontend should be able to generate an empty compilation unit. 2) implement two stub functions: parsetype and lookuptype, the first always generates a type named 'int32' with the obvious INTEGER_TYPE tree, the second always gives back that type. Now hack on the frontend long enough that this simple input can be parsed and the obvious asm file is generated: "int32 i;". Then "int32 i = 0;". Now the frontend knows global decls and initializers, but the type parsing problem is deferred. 3) implement function parsing. Hack on the frontend long enough that this program is accepted and generates the obvious gimple: int32 main(int32 argc) { int32 a; a_1 = argc_0; _2 = a_1 - argc_0; return _2; } While doing this, think about how to represent default defs, and generally SSA names. 4) implement some more operators for one and two-op statements 5) implement syntax for conditional code, for instance: { L0: if (argc_0 == 1) goto L1; else goto L2; L1: $preds (L0) return 1; L2: $preds (L0) return 2; } 5) think about and implement syntax for PHI nodes, e.g.: { int a; L0: if (argc_0 == 1) goto L1; else goto L2; L1: $preds (L0) a_1 = 1; goto L3; L2: $preds (L0) a_2 = 42 + argc_0; goto L3; L3: $preds (L1, L2) a_3 = $phi (a_1, a_2); return a_3; } See for instance how it's important above that all destinations that are reached by multiple gotos make the order of incoming edges explicit. It can't be left implicit (or needs very strict and checkable rules). 6) implement function calls, requires parsing global function decls (still only using 'int' types) and parsing/generating gcall insns. 7) implement more scalar base types, including pointers, add parsing/generating conversion statements 8) think about and implement syntax for _specifying_ layout of scalar types, as well as attributes (i.e. how to say that type "T" in an signed integer type of 47 bits, aligned 8 bit, size 64 bits) 9) think about and implement syntax for aggregate types 10) implement more parsing for accessing elements of aggregate types 11) hack on the whole think long enough that this can be parsed and generates the obvious gimple: type T1 *char; type T2 const C1; type T3 (C2, ...) : int; // function type decl T3 *printf; type T {int a[10];} T x; int main(int argc, char *argv[]) { T *t; t_1 = &x; t_1->a[0] = argc_0; printf ("you passed %d arguments\n", t1->a[0]); return 0; } Some of the later steps can be done in different order. I think this would give enough to do for GSoC, but with mentoring (especially with the first part, stripping down the LTO frontend to something trivial) it seems achievable. Bonus: implement exceptions, implement generic mean to annotate either statements or types or decls, implement parsing the various GCC builtins. Obviously our gimple dumper should be amended to also be able to emit whatever syntax is decided in the course of implementation. In the process focus should be on generatic a grammar that's easy to parse while retaining some visual pleasance (not all of the above might be good examples). At least the gimple statement syntax should probably be quite similar to our current dump format. At least during development we shouldn't fear revamping the syntax even severely if it furthers those goals, even if testcases then need to be thrown away. Ciao, Michael.