Re: Initial feedback on PAST-pm, or Partridge
On Wed, Dec 06, 2006 at 10:33:45PM -0800, Allison Randal wrote: - In PGE grammars, what is the { ... } at the end of every proto declaration supposed to do? [...] But in the end, I didn't allow simple semicolon terminators simply because it wasn't valid Perl 6 syntax, and in many cases I think that having subtle differences isn't ideal as people may get confused about what is allowed where. But I don't have a large objection to modifying the PGE::Grammar compiler to represent empty declarations with semicolons as well as yada-yada-yada blocks. Excellent. Excellent as in ...? [ ] Go ahead and allow semicolons, since you don't have a large objection. [ ] Your explanation is excellent, stick with the yadas to avoid the subtle contrasts to Perl 6. I prefer option (A), allowing semicolons. The tricky thing is that we're adopting syntax from one use case into another use case. The yadas make perfect sense in the context of a Perl 6 program (where the yada means that the code body will later be filled in), but they make no sense as part of a Parrot parser (where the yada can't be filled in, and is just an artifact). IIUC, PGE's use of yada is actually the same use case as Perl 6. The yadas in Perl 6 can be stubs to be filled in later, but S03 and S06 indicate that yadas are also used as the body in function prototypes, i.e., where the function is actually to be defined somewhere else. To me that feels exactly like what we have here -- the grammar file is prototyping operator functions that are defined somewhere else. (And, for several of the existing compilers, they really *are* function prototypes, in that the function body comes from a PIR function.) Not an immediate priority, though. And, maybe Perl 6 will change and solve the problem for us before we get there. ;) Sounds good to me. It's an easy switch to allow the semicolons when/if we decide to do that. Pm
Re: Initial feedback on PAST-pm, or Partridge
Patrick R. Michaud wrote:

But come to think of it, if we had something like Capture PMCs available as a standard type (and an easy way to generate them in PIR), then the existing :vtable('init') would be quite sufficient. To steal from Perl 6's C<\(...)> capture syntax:

    $P0 = new 'Foo::Bar', \(param1, param2, 'abc'=>param3)

    .sub 'init' :vtable
        .param pmc args
        # initialize self based on array/hash components of args pmc
        # ...

It's a reasonable solution. Have to think a bit more about the syntax for creating them. We have talked about giving PIR some short-cut syntax for creating data structures, as syntactic sugar for the basic 'push' and keyed set operations. It hasn't come to anything yet, but this could be tied into it. Largely, it's the fundamental question of "Is PIR an assembly language, or an MLL for humans?" The answer is probably "Both."

We can avoid modifying PIR's fundamental syntax by requiring the initializer argument to be created separately:

    $P0 = new 'SigHash'
    $P0.set(param1, param2, 'abc'=>param3)
    $P1 = new 'Foo::Bar', $P0

But, that's one more step than what you're doing now:

    $P0 = new 'Foo::Bar'
    $P0.init(param1, param2, 'abc'=>param3)

An improvement might be through changes to the OO model:

    $P0 = find_type 'Foo::Bar'                      # returns a class object
    $P1 = $P0.new(param1, param2, 'abc'=>param3)    # new is a class method

My point is simply that it's far easier to go from an MLL (whatever syntax) to PIR method calls than to generate specific Parrot opcodes, because method calls have a very regular syntax that Parrot opcodes don't.

I would have disagreed a couple months ago, as opcodes were simpler to generate in the old PAST/POST. But with the new implementation I agree.

- One more comment in this department: move PIR generation out of the POST node objects. A tree-grammar that outputs PIR code strings isn't a final solution, but it's a more maintainable intermediate step than mingled syntax tree representation and code generation (remember P6C?).
I never really dealt with P6C. :-) Lucky you. :) It was great in the early days, and allowed for rapid prototyping, but it grew...um...organically. Still, I can see about moving the code generation out of the POST node objects; I may do it as a lower priority though, since I don't think that aspect is driving many design or implementation decisions for us at this point. Yes, a lower priority is fine. I suspect that Pheme will drive the development of POST, since the Pheme compiler will be working with it directly, rather than treating it as an invisible background step. - In PGE grammars, what is the { ... } at the end of every proto declaration supposed to do? [...] But in the end, I didn't allow simple semicolon terminators simply because it wasn't valid Perl 6 syntax, and in many cases I think that having subtle differences isn't ideal as people may get confused about what is allowed where. But I don't have a large objection to modifying the PGE::Grammar compiler to represent empty declarations with semicolons as well as yada-yada-yada blocks. Excellent. Excellent as in ...? [ ] Go ahead and allow semicolons, since you don't have a large objection. [ ] Your explanation is excellent, stick with the yadas to avoid the subtle contrasts to Perl 6. I prefer option (A), allowing semicolons. The tricky thing is that we're adopting syntax from one use case into another use case. The yadas make perfect sense in the context of a Perl 6 program (where the yada means that the code body will later be filled in), but they make no sense as part of a Parrot parser (where the yada can't be filled in, and is just an artifact). Not an immediate priority, though. And, maybe Perl 6 will change and solve the problem for us before we get there. ;) Allison
Re: Initial feedback on PAST-pm, or Partridge
Patrick R. Michaud wrote: On Mon, Nov 27, 2006 at 09:20:08PM -0800, Allison Randal wrote: chromatic's suggestion is to replace the series of manual calls in HLLCompiler's 'compile' method with an iterator over an array of compiler tasks. I very much agree with chromatic -- indeed, this is mainly why I didn't go with putting ostgrammar methods into the HLLCompiler object before. Having HLLCompiler effectively hardcode a sequence of parser-astgrammar-ostgrammar feels a bit heavy-handed to me, almost saying that we really expect you to always have exactly the sequence source-parse-ast-ost-pir-bytecode, and you're definitely using TGE for the intermediate steps. The patch I sent is the first step toward making chromatic's suggestion work. The problem with the current implementation is that each stage decides what the next stage will be. If the PAST-to-POST transformation calls the POST-to-PIR transformation before returning, then you can't easily insert an additional stage between the two. I guess if we expect a lot of compilers to be making language-specific derivations or replacements of the ast-ost stage then putting the ost specifications into HLLCompiler makes some sense, but I totally agree with chromatic that a more generic approach is needed here. And what I had been aiming for in terms of array of compiler tasks was something like array of compiler stages, where each compiler stage is itself a compiler (in the compreg and HLL compiler sense) that does the transformation to the next item in the list. And each compiler stage knows the details of how it performs its transformation, whether that's using TGE or some other method. I completely agree on the idea of giving each stage its own compiler, and making that compiler aware of everything it needs to know to perform its own stage of compilation. I also completely agree on putting as little code as possible for performing the compilation into the HLLCompiler module. 
Where we diverge is that I don't want the compiler for one stage to know anything about the next stage. Each stage should operate independently, and only the HLLCompiler should control the order of stages.

Part of me really wishes that each compiler task would end up being a standardized 'apply' or 'compile' subroutine or method of each stage. In other words, to have compilation effectively become a sequence like:

    .local pmc code

    # source to parse tree
    $P0 = get_hll_global ['Perl6::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)

    # parse tree to ast
    $P0 = get_hll_global ['Perl6::PAST::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)

    # ast to ost
    $P0 = get_hll_global ['POST::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)

    # ost to result
    $P0 = get_hll_global ['POST::Compiler'], 'apply'
    code = $P0(code, adverbs :flat :named)

Here the 'apply' functions in Perl6::PAST::Grammar and POST::Grammar are simply imported from TGE and do the steps of creating the builder object and then applying the grammar. The 'apply' function in Perl6::Grammar would just be a standardized start rule for the parser grammar (and can be directly specified as such in the .pg file).

If we could standardize at this level, then a compiler simply specifies the sequence of things to be applied, and the above instructions could be implemented with a simple iterator over the sequence.

This is _really_ what I was attempting to get at by having separate compiler objects for PAST, POST, and friends, except that instead of calling the standard function 'apply' I was using 'compile'.

Hm, actually, I like this a lot better than registering a compiler for POST and retrieving it by 'compreg'. I would push it one step farther, though. Instead of setting 'astgrammar' in HLLCompiler's 'init' method, set 'astcompiler'.
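As a rough illustration of the standardized-'apply' idea above, here is a minimal sketch in Python rather than PIR, so that it can be run standalone. The four stage functions and the toy data they pass along are invented for this example and are not part of PGE, TGE, or HLLCompiler; the only point is that when every stage has the same call shape, the controller reduces to a loop and no stage needs to know what comes next.

```python
from typing import Any, Callable, Dict, List, Tuple


def parse(code: str, **adverbs: Any) -> List[str]:
    """Hypothetical 'source to parse tree' stage: split into tokens."""
    return code.split()


def to_ast(tokens: List[str], **adverbs: Any) -> Dict[str, Any]:
    """Hypothetical 'parse tree to ast' stage for a single infix expression."""
    left, op, right = tokens
    return {"op": op, "args": [left, right]}


def to_ost(ast: Dict[str, Any], **adverbs: Any) -> List[Tuple[str, str]]:
    """Hypothetical 'ast to ost' stage: flatten into (opcode, operand) pairs."""
    return [("load", arg) for arg in ast["args"]] + [("call", ast["op"])]


def to_text(ost: List[Tuple[str, str]], **adverbs: Any) -> str:
    """Hypothetical 'ost to result' stage: render the pairs as text."""
    return "\n".join(f"{name} {arg}" for name, arg in ost)


def compile_with(stages: List[Callable[..., Any]], code: Any, **adverbs: Any) -> Any:
    """Controller: feed each stage's output to the next, passing adverbs through."""
    for apply in stages:
        code = apply(code, **adverbs)
    return code


# The compiler writer only specifies the sequence of things to be applied.
STAGES = [parse, to_ast, to_ost, to_text]
```

Calling compile_with(STAGES, "1 + 2") runs all four stages in order; passing a shorter list such as STAGES[:2] stops at the AST without any stage having to be changed.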
The revised method for a stage (using the parse-tree-to-AST as an example) would be as follows, where the method only performs error checks to make sure that it got a valid class name, creates a compiler object for that stage, and calls 'compile'. (Here I'm using the naming scheme from below.)

    .sub 'compile_parse_tree' :method
        .param pmc source
        .param pmc adverbs :slurpy :named
        .local string ptcompiler_name
        .local pmc ptcompiler
        ptcompiler_name = self.'ptcompiler'()
        unless ptcompiler_name goto err_no_ptcompiler
        $I0 = find_type ptcompiler_name
        ptcompiler = new $I0
        .return ptcompiler.'compile'(source)
      err_no_ptcompiler:
        $P0 = new .Exception
        $P0['_message'] = 'Missing ptcompiler in compiler'
        throw $P0
    .end

For now, we create a separate compiler object for each tree grammar, but ultimately TGE could generate the appropriate 'compile' method in each generated tree grammar class.

Part of me thinks that 'apply' and 'compile' are pretty much the same thing, in the sense that both refer to using some sort of transformer thing to
Re: Initial feedback on PAST-pm, or Partridge
On Tuesday, 28 November 2006, at 08:51, Patrick R. Michaud wrote:

I'm half-way inclined to see that as a limitation in Parrot that needs to be fixed rather than a problem with these classes. Having dealt with this in both PGE and at least two PAST implementations, I certainly see it as a Parrot limitation.

This has already been discussed more than once. The last time was, IMHO: http://groups.google.at/group/perl.perl6.internals/browse_frm/thread/e68dc0a0a96585b7/b536997757a3043b?lnk=gstq=instantiate+toetsch+newrnum=2#b536997757a3043b

leo
Re: Initial feedback on PAST-pm, or Partridge
On Tuesday, 28 November 2006, at 08:51, Patrick R. Michaud wrote:

But come to think of it, if we had something like Capture PMCs available as a standard type (and an easy way to generate them in PIR), then the existing :vtable('init') would be quite sufficient.

Another note. Yes, a core Capture PMC would help *in combination* with re-coding the calling conventions' internals. These internals are a bit suboptimal currently, as they are using two 'arrays' of information: the variable-sized opcode part (holding involved registers and constants) and the signature PMC (with other call signature details). Unifying these and improving the latter into a Capture would speed up the argument passing code and simplify such Capture-based new/init vtables.

Please blame me for the current implementation ;)

leo
Re: Initial feedback on PAST-pm, or Partridge
I'll split my replies into separate threads to make it easier to wrap our brains around individual chunks.

Patrick R. Michaud wrote:

Clear boundaries between components: (Fuzzy boundaries of abstraction make it difficult to allow for other implementations of the AST/OST or customization of the compiler object.)

- The 'compile' method doesn't belong in the PAST object, it belongs in HLLCompiler. ...

After a lot of thought and false starts, I ended up taking a different approach to compilation than having the HLLCompiler specify the complete sequence of transformations. Essentially I've taken the approach that a compiler is simply something that transforms a source data structure into a target data structure, and so what we really have is a sequence of compilers. To this end, I really wanted to call my compiler base class 'Compiler' and not 'HLLCompiler', but unfortunately that classname is already used by Parrot for something else and so 'HLLCompiler' is what I chose until that could be resolved. The 'HLL' probably implies more than I intended to imply.

So, the 'Abc' compiler really is just something that converts the 'bc' language into a PAST structure; after doing that it simply hands the result off to the 'PAST-pm' compiler. Similarly, the 'PAST' compiler translates into POST and hands the result off to the POST compiler, and POST simply does its thing and returns a PIR or executable result.

Let's take a couple steps back. The compiler module is really like Test::Builder. It's the infrastructure code that provides standard functionality to all compiler writers. Standardization is good, it means we don't have 500 incompatible implementations of 'ok'. (Actually, we still have non-standard implementations of 'ok' floating around, and they're a major headache. All the more reason to standardize the compiler tools early on.)

With tests, each test file does one thing (tests a chunk of code, says 'ok' or 'not ok' multiple times).
The individual tests don't need to each duplicate the infrastructure code. Test::Harness provides the infrastructure, progresses through all the tests, maintains meta-information as it goes, and summarizes at the end. With compiler modules, the individual PGE and TGE modules each do one thing, take in the source code in one form and output it in another form. There's no need to re-write the infrastructure code into the syntax tree modules for every stage of compilation. Let Compiler::Builder (or Compiler::Harness, or whatever we call it) handle the infrastructure. - The 'compile' method also doesn't belong in the main compiler executable, it belongs in HLLCompiler. - Merge them into one 'compile' method in HLLCompiler. - Customization of HLLCompiler should be handled by creating a subclass of HLLCompiler. (The current 'register' strategy is somewhat fragile.) I don't have any problem with having each language subclass HLLCompiler and override the 'compile' method in each, I'll work on that soon. Of course, the method still ends up one way or another in the main compiler executable, it may simply change the namespace. The point is that 99% of compiler writers shouldn't need to write any code for the 'compile' method at all. - Provide an 'init' method for HLLCompiler that lets the compiler writer set which modules HLLCompiler will use for each stage of compilation. This will cover the majority of compilers without requiring each compiler writer to define their own 'compile' routine. Because of the multi-stage approach I've taken, the compile routines are already fairly short, and to me they're not at all onerous for a compiler writer to create. For each of languages/abc/, languages/APL/, and languages/perl6/ the 'compile' method is less than 30 lines of PIR. (And it will only require a couple of lines of code to abstract the existing call to 'compile' methods of PAST/POST to instead use PAST/POST compilers.) 
a) Most compilers will simply cut-n-paste an existing 'compile' routine from an existing compiler. Cut-n-paste programming is a code smell and a maintenance headache.

b) Why require the compiler writer to write 30 lines of code when they could write one? The entire core executable for a compiler could consist of nothing but:

    .sub '__onload' :load :init
        # load your modules
        $P1 = new [ 'HLLCompiler' ]
        $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser', 'ast_grammar'=>'Punie::AST::Grammar')
    .end

    .sub 'main' :main
        .param pmc args
        $P0 = compreg 'punie'
        $P1 = $P0.'command_line'(args)
        .return ($P1)
    .end

That's a great selling point to new compiler writers. (And I'd be even happier if we could export the 'main' routine from HLLCompiler instead of cut-n-pasting it.)

I also think that many compilers may end up with compiler-specific option flags or other items that need to be taken care of, and it seems to me that this is more easily handled by a method definition than a module
Re: Initial feedback on PAST-pm, or Partridge
Patrick R. Michaud wrote: Also, out of curiosity, which high-level constructs in punie aren't working? What I've found so far are: - The top-level AST structure is off: my temporary hack to replace PAST::Stmt and PAST::Exp with PAST::Stmts is producing extra temporary variables in the PIR output. I need to refactor the top few tiers of transformation rules, and maybe refactor the Punie parser grammar. - Conditionals are handled completely differently in the new PAST, so Punie needs some replumbing in the AST transformation for those. - Comma lists are also handled completely differently. So, it's not a matter of missing features (aside from PAST::Label), it's just a matter of adapting the code to a different way of thinking. I'll work through these in the next few days and let you know what I find as I go. Allison
Re: Initial feedback on PAST-pm, or Partridge
On Mon, Nov 27, 2006 at 10:52:13AM -0800, Allison Randal wrote:

Patrick R. Michaud wrote: Also, out of curiosity, which high-level constructs in punie aren't working?

What I've found so far are:

- The top-level AST structure is off: my temporary hack to replace PAST::Stmt and PAST::Exp with PAST::Stmts is producing extra temporary variables in the PIR output. I need to refactor the top few tiers of transformation rules, and maybe refactor the Punie parser grammar.

I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all useful. Just because they're there doesn't mean a compiler has to use them. :-)

- Comma lists are also handled completely differently.

PAST itself doesn't know anything about comma lists -- it just thinks of comma as being an operator like any other operator. In perl6 the infix:<,> operator has 'list' associativity, so that it ends up with a variable arity. However, I recognize that some languages might need to keep the notion that commas are left-associative with arity 2, so perhaps we need some form of 'list' pasttype that would combine the operands together somehow?

So, it's not a matter of missing features (aside from PAST::Label), it's just a matter of adapting the code to a different way of thinking. I'll work through these in the next few days and let you know what I find as I go.

That'd be great. I'm working on some refactors of HLLCompiler and PAST right now; I don't think any of these will break existing code.

Pm
Re: Initial feedback on PAST-pm, or Partridge
On Mon, Nov 27, 2006 at 01:13:52AM -0800, Allison Randal wrote:

    .sub '__onload' :load :init
        # load your modules
        $P1 = new [ 'HLLCompiler' ]
        $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser', 'ast_grammar'=>'Punie::AST::Grammar')
    .end

    .sub 'main' :main
        .param pmc args
        $P0 = compreg 'punie'
        $P1 = $P0.'command_line'(args)
        .return ($P1)
    .end

[...] Standardized infrastructure code good. Make Ogg-itect happy. :)

We definitely want Ogg-itect to remain happy. :-)

Now implemented in r15882 as shown above, sans the helper 'init' method (which I'll add later tonight). Examples are in languages/perl6/ and languages/abc/ .

Time permitting tonight I will also refactor the monolithic 'command_line' method of HLLCompiler into separate shorter methods.

Pm
Re: Initial feedback on PAST-pm, or Partridge
This fragment of response is about types, layers of abstraction and tracking information as the stages of compilation progress. And, I probably haven't said it enough yet, but the work you've done here is absolutely wonderful, Patrick. There's nothing like a solid chunk of working code to push the design to the next stage of evolution. :) Patrick R. Michaud wrote: On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote: - Is there no way to indicate what type of variable a PAST::Var is? Scalar/Array/Hash? (high-level types, not low-level types) Sure, that's what 'vtype' is -- it indicates the type of value that the variable ought to hold. My plan has been to follow the Perl6 concept of implementation types and value types within PAST. Thus far I've only put in the support for the value types, as the vtype attribute (and vtype can be any high-level type the language happens to support). I'm expecting to add an itype attribute at some point when we're a bit farther along; I'm still working out the details. Hrm... you've really got two HLL types: the container type (scalar/array/hash) and the value type (Str, Int, Foo::Bar, Array, Hash, Matrix, Custom::Hash, etc). You've also essentially got two PIR types: the container type (int/num/str/pmc) and the value type (int, num, str, or some pmc type). By implementation type do you mean the PIR value type? A YAML config file to map HLL value types to PIR value types for a particular compiler would be another nice addition. PAST doesn't need to know anything about PIR types. - In PAST nodes, the attribute 'ctype' isn't actually storing a C language type. Better name? It really stands for constant type, and is one of 'i', 'n', or 's' depending on whether it can be treated as an int, num, or string when being handled as a constant in PIR. Okay, 'const_type' is a better name. - The attribute 'vtype' is both variable type in POST::Var and value type in POST::Val. 
Handy generalization, but it's not clear from the name that 'vtype' is either of those things. I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var or POST::Val. Indeed I did. Though, why isn't there a POST::Var or POST::Val? POST has both variables and values. But 'vtype' really stands for value type in both cases -- it's the type of value returned by either a PAST::Var or PAST::Val node. Hmm... If a PAST::Var is, say, an integer constant, will it have the same 'value_type' as an integer PAST::Val? (Definitely go with the longer name instead of 'vtype'.) - The values for both 'ctype' and 'vtype' are obscure. Better to establish a general system for representing types, than to include raw Parrot types or 1-letter codes in the AST. Ultimately I expect that the types that appear in 'vtype' will be the types defined by the HLL itself. For example, in perl6 one would see 'vtype'='Str' to indicate a Perl 6 string constant. Unfortunately it's been difficult to illustrate this in real code because of the HLL classname conflicts that I've been reporting in other contexts. What bug # is that? It's hard to imagine how an HLL type name that's only stored in an AST would conflict with a Parrot class name. Or, are you assuming that the HLL type names have to be the same as the Parrot class names? Shouldn't need to be the same, you just need a config file mapping between the two. I agree the values and name for 'ctype' are a bit obscure, and will gladly accept any suggestions for improving it. The 'ctype' attribute is really just code optimization in the final output, and it does assume some knowledge of the target. If no ctype is specified, past-pm assumes that the constant value must first be placed into a PMC in order to be useful. With a ctype present, then past-pm can match up the (PIR) opcode contexts in which the value can be directly used as an int/num/string in an operation. 
It's the difference between

    # $b + 2                    # $b + 2
    get_global $P0, '$b'        get_global $P0, '$b'
    new $P2, .Undef             new $P1, .Integer
    add $P2, $P0, 2             assign $P1, 2
                                new $P2, .Undef
                                add $P2, $P0, $P1

or

    # say 3, 4, 5               # say 3, 4, 5
    say(3, 4, 5)                new $P1, .Integer
                                assign $P1, 3
                                new $P2, .Integer
                                assign $P2, 4
                                new $P3, .Integer
                                assign $P3, 5
                                say($P1, $P2, $P3)

Okay, if ctype is an optimization hint, then you don't actually need to list the specific types (i/n/s) in the PAST nodes. All you need is the name of the HLL value type, and a small bit of config info for that type name.
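To make the "small bit of config info" idea concrete, here is a minimal sketch in Python (not PIR, and not how past-pm is implemented). The table TYPE_CONFIG, its keys, and the function constant_operand are all hypothetical names made up for this example; the PIR fragments it emits are copied from the shapes shown in the message above. The node carries only the HLL value type name, and the table decides whether the constant may be used inline or must first be placed into a PMC.

```python
from typing import Dict, List, Tuple

# Hypothetical per-compiler configuration: HLL value type name -> target details.
# "pmc" is the PMC type used when the constant must be boxed; "inline" says
# whether the literal may appear directly as an operand.
TYPE_CONFIG: Dict[str, Dict[str, object]] = {
    "Int": {"pmc": ".Integer", "inline": True},
    "Num": {"pmc": ".Float", "inline": True},
    "Str": {"pmc": ".String", "inline": True},
}


def constant_operand(
    value_type: str, literal: str, register: str, allow_inline: bool = True
) -> Tuple[List[str], str]:
    """Return (setup_lines, operand) for a constant of the given HLL type.

    If the configuration allows it, the literal itself is the operand and no
    setup is needed; otherwise the constant is first placed into a PMC.
    """
    config = TYPE_CONFIG[value_type]
    if allow_inline and config["inline"]:
        return [], literal
    setup = [f"new {register}, {config['pmc']}", f"assign {register}, {literal}"]
    return setup, register


# With the hint: the literal is used directly, as in 'add $P2, $P0, 2'.
inline_setup, inline_operand = constant_operand("Int", "2", "$P1")

# Without it: the constant goes through a PMC register first.
boxed_setup, boxed_operand = constant_operand("Int", "2", "$P1", allow_inline=False)
```

The design choice being illustrated is only that the i/n/s codes live in the configuration rather than in the tree; whether that table is YAML, PIR, or something else is left open in the thread.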
Re: Initial feedback on PAST-pm, or Partridge
Patrick R. Michaud wrote: I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all useful. Just because they're there doesn't mean a compiler has to use them. :-) Well, I came to the conclusion that PAST::Exp was useless a while ago. (Its entire point of existence was as a dummy node to be factored out at the PAST-to-POST stage.) I do think PAST::Stmt is useful, but I want to take a stab at refactoring it out first. Oh, I should have mentioned that the patch I sent in to remove the dummy 'root' rule from the POST::Grammar was part of what was making Punie work (because Punie's top-level node isn't a PAST::Block, it's a PAST::Stmts). I can refactor that out, but in this case it seemed to make more sense to refactor the compiler tool (since the other languages still worked with the change). I'm working on some refactors of HLLCompiler and PAST right now, I don't think any of these will break existing code. Break away. I'm fine with the implementation shifting under the Punie port, since it means progress. Allison
Re: Initial feedback on PAST-pm, or Partridge
On Mon, Nov 27, 2006 at 05:28:59PM -0800, Allison Randal wrote: Patrick R. Michaud wrote: I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all useful. Just because they're there doesn't mean a compiler has to use them. :-) Well, I came to the conclusion that PAST::Exp was useless a while ago. (Its entire point of existence was as a dummy node to be factored out at the PAST-to-POST stage.) I do think PAST::Stmt is useful, but I want to take a stab at refactoring it out first. Excellent. Let me know when/if you want PAST::Stmt added in, and any attributes you want it to have. Oh, I should have mentioned that the patch I sent in to remove the dummy 'root' rule from the POST::Grammar was part of what was making Punie work (because Punie's top-level node isn't a PAST::Block, it's a PAST::Stmts). I can refactor that out, but in this case it seemed to make more sense to refactor the compiler tool (since the other languages still worked with the change). POST really needs to have a POST::Sub at the top of the tree, so the purpose of the 'root' rule in POST::Grammar is (going to be) to create a POST::Sub for the tree if the lower transformations don't happen to return one. I'll add that code shortly, and then things should work properly even if the top-level node in PAST isn't a PAST::Block. Pm
Re: Initial feedback on PAST-pm, or Partridge
Patrick R. Michaud wrote:

Now implemented in r15882 as shown above, sans the helper 'init' method (which I'll add later tonight). Examples are in languages/perl6/ and languages/abc/ .

Definitely an improvement. Hmm... okay, I see what you're going for. Creating a subclass of HLLCompiler for every stage of compilation is heavyweight, but it's definitely nice to be able to say "give me a compiler for this tree".

So, with a thumbs up on that modification, I've attached a patch that does two things: a) keeps strict functionality boundaries so the controller object does the controlling, and the compiler objects for PAST and POST do only compiling; and b) makes it possible to override the grammar used for the PAST-to-POST transformation. ABC passes all its tests, and Perl6 doesn't fail any more tests than it was failing before. (I made it a patch because it's a refactor that's easy to show but convoluted to explain.)

chromatic's suggestion is to replace the series of manual calls in HLLCompiler's 'compile' method with an iterator over an array of compiler tasks. Then, a compiler-writer can insert another task (perhaps a tree-based optimizer between the PAST and POST stages), by calling a method to specify that the new task is 'before' or 'after' another task (much like the precedence levels of PGE rules). His idea is a good next step, but I wanted to keep the change set small, so didn't implement it here.

Allison

Index: runtime/parrot/library/Parrot/HLLCompiler.pir
===
--- runtime/parrot/library/Parrot/HLLCompiler.pir (revision 15893)
+++ runtime/parrot/library/Parrot/HLLCompiler.pir (working copy)
@@ -17,6 +17,7 @@
 $P0 = newclass [ 'HLLCompiler' ]
 addattribute $P0, '$parsegrammar'
 addattribute $P0, '$astgrammar'
+addattribute $P0, '$ostgrammar'
 addattribute $P0, '$!compsub'
 .end
@@ -71,6 +72,10 @@
 Accessor for the 'astgrammar' attribute.
+=item ostgrammar([string grammar])
+
+Accessor for the 'ostgrammar' attribute.
+
 =cut
 .sub 'parsegrammar' :method
@@ -86,7 +91,13 @@
 .return self.'attr'('$astgrammar', value, has_value)
 .end
+.sub 'ostgrammar' :method
+.param string value :optional
+.param int has_value :opt_flag
+.return self.'attr'('$ostgrammar', value, has_value)
+.end
+
 =item compile(pmc code [, adverbs :slurpy :named])
 Compile C<source> according to any options given by
@@ -113,10 +124,13 @@
 .local pmc result
 result = self.'parse'(source, adverbs :flat :named)
 if target == 'parse' goto have_result
-result = self.'ast'(result, adverbs :flat :named)
+result = self.'astcompile'(result, adverbs :flat :named)
 if target == 'past' goto have_result
-$P0 = compreg 'PAST'
-result = $P0.'compile'(result, adverbs :flat :named)
+result = self.'ostcompile'(result, adverbs :flat :named)
+if target == 'post' goto have_result
+result = self.'pircompile'(result, adverbs :flat :named)
+if target == 'pir' goto have_result
+result = self.'pirrun'(result, adverbs :flat :named)
 have_result:
 .return (result)
 .end
@@ -147,7 +161,7 @@
 .end
-=item ast(source [, adverbs :slurpy :named])
+=item astcompile(source [, adverbs :slurpy :named])
 Transform C<source> using the compiler's C<astgrammar>
 according to any options given by C<adverbs>, and return the
@@ -155,7 +169,7 @@
 =cut
-.sub 'ast' :method
+.sub 'astcompile' :method
 .param pmc source
 .param pmc adverbs :slurpy :named
 .local string astgrammar_name
@@ -173,7 +187,50 @@
 throw $P0
 .end
+=item ostcompile(source [, adverbs :slurpy :named])
+Transform C<source> using the compiler's C<ostgrammar>
+according to any options given by C<adverbs>, and return the
+resulting ost.
+
+=cut
+
+.sub 'ostcompile' :method
+.param pmc source
+.param pmc adverbs :slurpy :named
+.local string ostgrammar_name
+.local pmc ostgrammar, ostbuilder
+ostgrammar_name = self.'ostgrammar'()
+unless ostgrammar_name goto default_ostgrammar
+$I0 = find_type ostgrammar_name
+ostgrammar = new $I0
+ostbuilder = ostgrammar.'apply'(source)
+.return ostbuilder.'get'('post')
+
+ default_ostgrammar:
+$P0 = compreg 'PAST'
+.return $P0.'compile'(source, adverbs :flat :named)
+.end
+
+.sub 'pircompile' :method
+.param pmc source
+.param pmc adverbs :slurpy :named
+
+$P0 = compreg 'POST'
+$P1 = $P0.'compile'(source, adverbs :flat :named)
+.return ($P1)
+.end
+
+.sub 'pirrun' :method
+.param pmc source
+.param pmc adverbs :slurpy :named
+
+$P0 = compreg 'PIR'
+$P1 = $P0(source)
+.return ($P1)
+.end
+
+
 =item register(string name, pmc compsub) # DEPRECATED
 (Deprecated.) Registers this compiler object as C<name> and
Index: compilers/past-pm/POST/Compiler.pir
===
---
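chromatic's suggestion in this message (an ordered array of compiler tasks, with new tasks placed 'before' or 'after' an existing one) was explicitly left unimplemented in the patch. Purely as an illustration of the shape it might take, here is a minimal Python sketch; the class name Pipeline, its methods, and the toy stages are all hypothetical and not part of HLLCompiler. The 'target' argument mimics the "if target == ... goto have_result" checks in the patch by stopping after the named stage.

```python
from typing import Any, Callable, List, Optional, Tuple


class Pipeline:
    """Hypothetical controller holding an ordered list of named compiler tasks."""

    def __init__(self) -> None:
        self._stages: List[Tuple[str, Callable[[Any], Any]]] = []

    def names(self) -> List[str]:
        return [name for name, _ in self._stages]

    def add(
        self,
        name: str,
        func: Callable[[Any], Any],
        before: Optional[str] = None,
        after: Optional[str] = None,
    ) -> None:
        """Append a task, or place it relative to an existing task."""
        if before is not None and after is not None:
            raise ValueError("give 'before' or 'after', not both")
        if before is None and after is None:
            self._stages.append((name, func))
            return
        anchor = before if before is not None else after
        if anchor not in self.names():
            raise KeyError(anchor)
        index = self.names().index(anchor)
        if after is not None:
            index += 1
        self._stages.insert(index, (name, func))

    def run(self, code: Any, target: Optional[str] = None) -> Any:
        """Run tasks in order, stopping after the task named by 'target'."""
        for name, func in self._stages:
            code = func(code)
            if name == target:
                break
        return code


# Toy stages, invented for the example.
pipeline = Pipeline()
pipeline.add("parse", lambda source: source.split())
pipeline.add("past", lambda tokens: [("val", token) for token in tokens])
pipeline.add("post", lambda nodes: [f"push {value}" for _, value in nodes])

# A compiler writer slots a tree-based optimizer between PAST and POST
# without editing either neighbouring stage.
pipeline.add(
    "optimize",
    lambda nodes: [node for node in nodes if node[1] != "noop"],
    after="past",
)
```

Because only the controller knows the order, inserting the optimizer touches neither the 'past' nor the 'post' task, which is the boundary the patch is trying to keep.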
Re: Initial feedback on PAST-pm, or Partridge
This fragment of a reply is the random bits that didn't make it into other topic-centered replies.

Patrick R. Michaud wrote: On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:

Excellent. Just as a general overall response -- PAST-pm is by no means finished, so many of the items that seem to be missing are simply cases of "I haven't gotten to them yet so they aren't implemented yet."

Understood, it's a work in progress. Which makes this the perfect time to influence its future. :)

- There's no PAST::Label node type? How do you represent labels in the HLL source?

I just haven't gotten to this part yet.

Okay, I'll add it when I need it, if you haven't already added it by then.

I don't have a problem with switching it to 'value', I went with 'name' primarily because every PAST::Node has a name and so it just made sense to use it there. But let me make another weak argument in favor of 'name'. If a HLL programmer writes

    $a = 1.23456789E1;

then the rhs becomes a PAST::Val node. How should we represent the value? The parse-to-past translation could evaluate the contents of 1.23456789E1 and store the result in 'value' as (.Float) 12.3456789, but unfortunately when we convert that .Float back into a string for use as PIR code it comes out as 12.3457 -- i.e., the code looks like:

    $P0 = new .Float
    $P0 = 12.3457
    set_global '$a', $P0

I decided that in a number of cases like this, what we really want to retain in PAST::Val is a precise string representation of the value that goes in the resulting output, and not a native representation that may lose precision in translation through POST/PIR. So, what we're really storing is the value's name and not its value. (I did say it was a weak argument.)

Fair. And agreed that PAST::Val should store the raw parsed constant, not an evaluated form.

Anyway, we can switch to 'value' if that's ultimately better; I was just thinking that 'name' might be equally appropriate.

Yeah, let's go with 'value'.
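The precision argument above can be reproduced outside Parrot. The following Python sketch is only an analogy: it assumes that the .Float-to-string conversion behaves like formatting with six significant digits (which matches the 12.3457 shown in the message, but is an assumption about Parrot, not something checked here). The helper emit_assignment is a made-up name that just pastes text into the PIR shape quoted above.

```python
def emit_assignment(value_text: str) -> str:
    """Hypothetical emitter: paste the constant's text into the PIR from the message."""
    return f"$P0 = new .Float\n$P0 = {value_text}\nset_global '$a', $P0"


literal = "1.23456789E1"          # what the HLL programmer wrote
evaluated = float(literal)        # what an evaluating parse-to-past step would store

# Assumed stand-in for stringifying a .Float: six significant digits.
lossy_text = format(evaluated, ".6g")

# Option 1: evaluate, then stringify for code generation (digits are lost).
from_evaluated = emit_assignment(lossy_text)

# Option 2: keep the raw parsed constant and emit it unchanged.
from_literal = emit_assignment(literal)

round_trip_lossy = float(lossy_text)
round_trip_literal = float(literal)
```

Here lossy_text is '12.3457', so round_trip_lossy no longer equals the evaluated value, while round_trip_literal does. That is the case for storing the raw parsed constant in PAST::Val, whatever the attribute ends up being called.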
The only case I can think of that might need to use a value as a name is Ruby, where you can call a method on a literal:

    2.class

But in that case, I think you'd end up representing '2' as a constant Var named '2' anyway (perhaps with a PIR value type of RubyLiteralInt).

'ismy' means 'isdeclaration' here, and I can go ahead and change it.

Excellent!

Currently Parrot uses '__init' as the method for initializing new objects, thus I think 'init' is at least consistent with Parrot.

Where it's inconsistent is in the arguments each takes, so you can't use the current 'init' methods as :vtable('init') methods. I'm half-way inclined to see that as a limitation in Parrot that needs to be fixed rather than a problem with these classes.

I've also thought about doing 'push' as a :vtable entry, and we can still easily do that, but there are at least two items in favor of keeping a method-based approach: (1) :vtable in subclassed items still has some issues to be addressed (e.g., RT #40626), and

Yes, hold off on this fix until :vtable works, but put it into the draft PDD.

(2) when we get a high-level transformation language into TGE, it's very likely that the operations on nodes will be method-based and not opcode-based.

Well, the operations will be in a middle-level-language syntax. Whether the MLL uses a methody syntax or a procedural syntax doesn't matter, since either can be translated to either syntax in PIR. Besides, using :vtable we can get both a method and a :vtable entry for the price of one method definition.

Clear boundaries between components: (Fuzzy boundaries of abstraction make it difficult to allow for other implementations of the AST/OST or customization of the compiler object.)

- One more comment in this department: move PIR generation out of the POST node objects. A tree-grammar that outputs PIR code strings isn't a final solution, but it's a more maintainable intermediate step than mingled syntax tree representation and code generation (remember P6C?).
A clear boundary between the OST and PIR generation will also push us closer to the final solution.

- Provide distinct errors/exceptions for failures at each stage of compilation to make it easy to figure out which stage is failing.

Agreed -- however, exception handling in Parrot still needs implementation and better fleshing out (this is what prompted my question about the status of exception handling implementation in last week's #parrotsketch, and my comment that I'm likely to need them fairly soon.)

Yes, exceptions need work, and soon.

- In PGE grammars, what is the { ... } at the end of every proto declaration supposed to do?

[...] But in the end, I didn't allow simple semicolon terminators simply because it wasn't valid Perl 6 syntax, and in many cases I think that having subtle differences isn't ideal as people may get confused about what is allowed where. But I don't have a large objection to modifying the PGE::Grammar
Re: Initial feedback on PAST-pm, or Partridge
On Mon, Nov 27, 2006 at 09:20:08PM -0800, Allison Randal wrote: Patrick R. Michaud wrote: Now implemented in r15882 as shown above, sans the helper 'init' method (which I'll add later tonight). Examples are in languages/perl6/ and languages/abc/ . So, with a thumbs up on that modification, I've attached a patch that does two things: a) keeps strict functionality boundaries so the controller object does the controlling, and the compiler objects for PAST and POST do only compiling; and b) makes it possible to override the grammar used for the PAST-to-POST transformation. ABC passes all its tests, and Perl6 doesn't fail any more tests than it was failing before. (I made it a patch because it's a refactor that's easy to show but convoluted to explain.) chromatic's suggestion is to replace the series of manual calls in HLLCompiler's 'compile' method with an iterator over an array of compiler tasks. I very much agree with chromatic -- indeed, this is mainly why I didn't go with putting ostgrammar methods into the HLLCompiler object before. Having HLLCompiler effectively hardcode a sequence of parser-astgrammar-ostgrammar feels a bit heavy-handed to me, almost saying that we really expect you to always have exactly the sequence source-parse-ast-ost-pir-bytecode, and you're definitely using TGE for the intermediate steps. I guess if we expect a lot of compilers to be making language-specific derivations or replacements of the ast-ost stage then putting the ost specifications into HLLCompiler makes some sense, but I totally agree with chromatic that a more generic approach is needed here. And what I had been aiming for in terms of array of compiler tasks was something like array of compiler stages, where each compiler stage is itself a compiler (in the compreg and HLL compiler sense) that does the transformation to the next item in the list. And each compiler stage knows the details of how it performs its transformation, whether that's using TGE or some other method. 
Putting transformation details like the ostbuilder and apply steps into HLLCompiler still feels wrong to me somehow, although I did come around to agreeing with the idea that the commonly repeated details for source-parse and parse-ast belong in the default 'compile' method for compiler objects.

Part of me really wishes that each compiler task would end up being a standardized 'apply' or 'compile' subroutine or method of each stage. In other words, to have compilation effectively become a sequence like:

    .local pmc code
    # source to parse tree
    $P0 = get_hll_global ['Perl6::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)
    # parse tree to ast
    $P0 = get_hll_global ['Perl6::PAST::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)
    # ast to ost
    $P0 = get_hll_global ['POST::Grammar'], 'apply'
    code = $P0(code, adverbs :flat :named)
    # ost to result
    $P0 = get_hll_global ['POST::Compiler'], 'apply'
    code = $P0(code, adverbs :flat :named)

Here the 'apply' functions in Perl6::PAST::Grammar and POST::Grammar are simply imported from TGE and do the steps of creating the builder object and then applying the grammar. The 'apply' function in Perl6::Grammar would just be a standardized start rule for the parser grammar (and can be directly specified as such in the .pg file).

If we could standardize at this level, then a compiler simply specifies the sequence of things to be applied, and the above instructions could be implemented with a simple iterator over the sequence.

This is _really_ what I was attempting to get at by having separate compiler objects for PAST, POST, and friends, except that instead of calling the standard function 'apply' I was using 'compile'. Part of me thinks that 'apply' and 'compile' are pretty much the same thing, in the sense that both refer to using some sort of transformer thing to change from a source representation into an equivalent target.
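The "simple iterator over the sequence" idea can be sketched as follows. This is a Python illustration, not PIR and not the HLLCompiler API: compile_with_stages and the four toy stage functions are made-up names, and the only thing taken from the message is the shape, where every stage exposes the same apply-style call and receives the previous stage's output plus the adverbs.

```python
from typing import Any, Callable, Sequence

# A stage is anything callable as stage(code, **adverbs) -> transformed code.
Stage = Callable[..., Any]


def compile_with_stages(source: Any, stages: Sequence[Stage], **adverbs: Any) -> Any:
    """Feed the output of each stage into the next one, in order."""
    code = source
    for apply in stages:
        code = apply(code, **adverbs)
    return code


# Toy stages standing in for parse grammar, AST grammar, OST grammar
# and final code generation. Each one ignores the adverbs it does not need.
def parse(src: str, **adverbs: Any) -> list:
    return src.split()


def to_ast(tokens: list, **adverbs: Any) -> tuple:
    return ("stmts", tokens)


def to_ost(ast: tuple, **adverbs: Any) -> list:
    return [("say", tok) for tok in ast[1]]


def to_pir(ost: list, **adverbs: Any) -> str:
    return "\n".join("%s '%s'" % (op, arg) for op, arg in ost)


result = compile_with_stages("a b", [parse, to_ast, to_ost, to_pir], target="pir")
print(result)
```

With this shape, a language that wants a different AST-to-OST step only swaps one entry in the list; the driver loop itself does not need to know whether a stage uses TGE or something else.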
- At any rate, even if we go with the approach outlined in the patch, I have to say that I'm not at all keen on the method names 'astcompile', 'ostcompile', etc. in the patch. When I read 'astcompile' it sounds to me like it's a method to compile an ast into something else, when in fact the method in the patch is compiling some source into an ast. (By analogy, we speak of Perl 6 compiler and PIR compiler as being things that consume Perl 6 and PIR, not the things that produce Perl 6 or PIR.) So at the very least I'd prefer to have those methods called 'get_ast' or 'make_ast' or something much less likely to cause confusion. Indeed, the reason why I went with simple 'parse' and 'ast' method names in the original is because the method name tells me what it is that I'm getting back, much like an accessor.

Pm
Re: Initial feedback on PAST-pm, or Partridge
On Mon, Nov 27, 2006 at 10:13:21PM -0800, Allison Randal wrote: This fragment of a reply is the random bits that didn't make it into other topic-centered replies.

...and some quick responses before turning in for the night...

Currently Parrot uses '__init' as the method for initializing new objects, thus I think 'init' is at least consistent with Parrot.

Where it's inconsistent is in the arguments each takes, so you can't use the current 'init' methods as :vtable('init') methods. I'm half-way inclined to see that as a limitation in Parrot that needs to be fixed rather than a problem with these classes.

Having dealt with this in both PGE and at least two PAST implementations, I certainly see it as a Parrot limitation. Ultimately I want to have a method that can accept variable arguments so that I can initialize a newly created object. I chose 'init' because it seemed like the natural/obvious name for such a method, but if there's a better name I'll gladly switch. I haven't found the Parrot :vtable('init') to be all that useful, since there's not a parameterized version of it beyond passing a single PMC. And getting arguments into a single PMC isn't all that fun or useful.

But come to think of it, if we had something like Capture PMCs available as a standard type (and an easy way to generate them in PIR), then the existing :vtable('init') would be quite sufficient. To steal from Perl 6's \(...) capture syntax:

    $P0 = new 'Foo::Bar', \(param1, param2, 'abc'=>param3)

    .sub 'init' :vtable
        .param pmc args
        # initialize self based on array/hash components of args pmc
        # ...

I've also thought about doing 'push' as a :vtable entry, and we can still easily do that, but there are at least two items in favor of keeping a method-based approach: (2) when we get a high-level transformation language into TGE, it's very likely that the operations on nodes will be method-based and not opcode-based.

Well, the operations will be in a middle-level-language syntax.
Whether the MLL uses a methody syntax or a procedural syntax doesn't matter, since either can be translated to either syntax in PIR.

My point is simply that it's far easier to go from a MLL (whatever syntax) to PIR method calls than to generate specific Parrot opcodes, because method calls have a very regular syntax that Parrot opcodes don't.

- One more comment in this department: move PIR generation out of the POST node objects. A tree-grammar that outputs PIR code strings isn't a final solution, but it's a more maintainable intermediate step than mingled syntax tree representation and code generation (remember P6C?).

I never really dealt with P6C. :-) Still, I can see about moving the code generation out of the POST node objects; I may do it as a lower priority though, since I don't think that aspect is driving many design or implementation decisions for us at this point.

- In PGE grammars, what is the { ... } at the end of every proto declaration supposed to do?

[...] But in the end, I didn't allow simple semicolon terminators simply because it wasn't valid Perl 6 syntax, and in many cases I think that having subtle differences isn't ideal as people may get confused about what is allowed where. But I don't have a large objection to modifying the PGE::Grammar compiler to represent empty declarations with semicolons as well as yada-yada-yada blocks.

Excellent.

Excellent as in ...?

    [ ] Go ahead and allow semicolons, since you don't have a large objection.
    [ ] Your explanation is excellent, stick with the yadas to avoid the subtle contrasts to Perl 6.

Pm
Initial feedback on PAST-pm, or Partridge
Overall, the POST implementation is usable and I really like the new HLL compiler module. I've got Punie working with the new toolchain to the point that it's generating valid PIR code for many low-level constructs, but some of the high-level constructs that worked under the previous toolchain I still don't have working. I've done everything I can do with straightforward translations of the existing code, and am now to the point where I'll have to do major conceptual refactors to fit with the new toolchain. I've already accumulated a good quantity of feedback for Patrick, so I figured I'd go ahead and send it out now. (Especially since some of my comments may result in changes that will make it much easier to finish porting Punie.)

I had to poke into the guts of HLLCompiler, the new PAST, and the new POST a fair bit in the process of getting Punie to work with them, so my comments here are a mixture of user experience and implementation details. I've grouped my comments into general categories.

-- Available node types:

- There's no PAST::Stmt node type? I only see PAST::Stmts and PAST::Op. But statements are composed of multiple ops. So, everything is an op? I was using PAST::Stmt and PAST::Exp for a similar purpose to what POST::Ops performs. I've hacked it to use PAST::Stmts for this purpose, but it doesn't quite work.

- There's no PAST::Label node type? How do you represent labels in the HLL source?

- Is there no way to indicate what type of variable a PAST::Var is? Scalar/Array/Hash? (high-level types, not low-level types)

--- Meaningful naming: (Be kind to your compiler writers.)

- In the PAST nodes, I grok 'name' as the operator/function name of a PAST::Op and as the HLL variable name of a PAST::Var, but making it the value of a PAST::Val is going too far. It was 'value' in the old PAST, which makes more sense. You're passing named parameters into 'init', so I can't see a reason not to use a more meaningful name for the attribute.
- In PAST nodes, the attribute 'ctype' isn't actually storing a C language type. Better name?

- The attribute 'vtype' is both variable type in POST::Var and value type in POST::Val. Handy generalization, but it's not clear from the name that 'vtype' is either of those things.

- The values for both 'ctype' and 'vtype' are obscure. Better to establish a general system for representing types, than to include raw Parrot types or 1-letter codes in the AST.

- In PAST nodes, consider the audience when choosing attribute names like 'ismy' (PAST::Var). Something like 'islexical' or 'isdeclaration' (I'm not sure which you mean) is friendlier to non-Perl users, and actually clearer even for Perl users.

- In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op) represents. Is it the literal name of a PIR opcode, or a generic representation of standard low-level operations? I'm more in favor of the latter. Better still, give compiler-writers a standard format lookup table they can write to allow the PAST to POST transformation to select the right PIR operation from the HLL op name. (See the comments on boundaries of abstraction.)

- In PAST nodes, the 'clone' method is now 'init'. 'clone' was a terrible name, I agree, but 'init' isn't quite right either.

- In PAST nodes, the 'add_child' method is now 'push'. I liked 'add_child' better, but, maybe what we really want is not a method at all, but a :vtable entry for an array push? Seems likely, since there's really not any other array-like behavior the syntax-tree nodes need to have.

- On module naming, I quickly regretted the naming of past2post.tg and past2post_gen.pir (and all the related names) and changed them to POSTGrammar.tg, POSTGrammar.pir, etc. in Punie. The .tg files are modules, they're just modules written in a different language, so we should standardize on module-style naming.
Consider names like POST/Grammar.tg and POST/Grammar.pir, or Partridge/Compiler/AST.tg and Partridge/Compiler/AST.pir (looking at it from the perspective of the compilation source rather than the compilation result).

--- Clear boundaries between components: (Fuzzy boundaries of abstraction make it difficult to allow for other implementations of the AST/OST or customization of the compiler object.)

- The 'compile' method doesn't belong in the PAST object, it belongs in HLLCompiler.

- The 'compile' method also doesn't belong in the main compiler executable, it belongs in HLLCompiler.

- Merge them into one 'compile' method in HLLCompiler.

- Provide an 'init' method for HLLCompiler that lets the compiler writer set which modules HLLCompiler will use for each stage of compilation. This will cover the majority of compilers without requiring each compiler writer to define their own 'compile' routine.

- Customization of HLLCompiler should be handled by creating a subclass of HLLCompiler. (The current 'register' strategy is somewhat fragile.)

- It would
Re: Initial feedback on PAST-pm, or Partridge
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote: I had to poke into the guts of HLLCompiler, the new PAST, and the new POST a fair bit in the process of getting Punie to work with them, so my comments here are a mixture of user experience and implementation details. I've grouped my comments into general categories. Excellent. Just as a general overall response -- PAST-pm is by no means finished, so many of the items that seem to be missing are simply cases of I haven't gotten to them yet so they aren't implemented yet. Available node types: - There's no PAST::Stmt node type? I only see PAST::Stmts and PAST::Op. But statements are composed of multiple ops. So, everything is an op? At present there's no PAST::Stmt node type, but one can be easily added. I thought about putting one in based on Punie's use of PAST::Stmt, but I hadn't quite figured out exactly _why_ it's important so I thought I'd leave it out until I actually needed it somewhere. In many ways ops are already composed of multiple ops, so a statement can be considered just another op. (But I do see why someone would want a PAST::Stmt abstraction -- on the other hand, I didn't see how it changed the resulting POST/PIR output.) - There's no PAST::Label node type? How do you represent labels in the HLL source? I just haven't gotten to this part yet. - Is there no way to indicate what type of variable a PAST::Var is? Scalar/Array/Hash? (high-level types, not low-level types) Sure, that's what 'vtype' is -- it indicates the type of value that the variable ought to hold. My plan has been to follow the Perl6 concept of implementation types and value types within PAST. Thus far I've only put in the support for the value types, as the vtype attribute (and vtype can be any high-level type the language happens to support). I'm expecting to add an itype attribute at some point when we're a bit farther along; I'm still working out the details. --- Meaningful naming: (Be kind to your compiler writers.) 
I totally agree, and I'm not yet wedded to any particular naming scheme.

- In the PAST nodes, I grok 'name' as the operator/function name of a PAST::Op and as the HLL variable name of a PAST::Var, but making it the value of a PAST::Val is going too far. It was 'value' in the old PAST, which makes more sense. You're passing named parameters into 'init', so I can't see a reason not to use a more meaningful name for the attribute.

I don't have a problem with switching it to 'value', I went with 'name' primarily because every PAST::Node has a name and so it just made sense to use it there. But let me make another weak argument in favor of 'name'. If a HLL programmer writes

    $a = 1.23456789E1;

then the rhs becomes a PAST::Val node. How should we represent the value? The parse-to-past translation could evaluate the contents of 1.23456789E1 and store the result in 'value' as (.Float) 12.3456789, but unfortunately when we convert that .Float back into a string for use as PIR code it comes out as 12.3457 -- i.e., the code looks like:

    $P0 = new .Float
    $P0 = 12.3457
    set_global '$a', $P0

I decided that in a number of cases like this, what we really want to retain in PAST::Val is a precise string representation of the value that goes in the resulting output, and not a native representation that may lose precision in translation through POST/PIR. So, what we're really storing is the value's name and not its value. (I did say it was a weak argument.) Anyway, we can switch to 'value' if that's ultimately better; I was just thinking that 'name' might be equally appropriate.

- In PAST nodes, the attribute 'ctype' isn't actually storing a C language type. Better name?

It really stands for constant type, and is one of 'i', 'n', or 's' depending on whether it can be treated as an int, num, or string when being handled as a constant in PIR.

- The attribute 'vtype' is both variable type in POST::Var and value type in POST::Val.
Handy generalization, but it's not clear from the name that 'vtype' is either of those things.

I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var or POST::Val. But 'vtype' really stands for value type in both cases -- it's the type of value returned by either a PAST::Var or PAST::Val node.

- The values for both 'ctype' and 'vtype' are obscure. Better to establish a general system for representing types, than to include raw Parrot types or 1-letter codes in the AST.

Ultimately I expect that the types that appear in 'vtype' will be the types defined by the HLL itself. For example, in perl6 one would see 'vtype'=>'Str' to indicate a Perl 6 string constant. Unfortunately it's been difficult to illustrate this in real code because of the HLL classname conflicts that I've been reporting in other contexts. I agree the values and name for 'ctype' are a bit obscure, and will gladly accept any suggestions for improving it. The 'ctype'
Re: Initial feedback on PAST-pm, or Partridge
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote: Overall, the POST implementation is usable and I really like the new HLL compiler module. I've got Punie working with the new toolchain to the point that it's generating valid PIR code for many low-level constructs, but some of the high-level constructs that worked under the previous toolchain I still don't have working. Also, out of curiosity, which high-level constructs in punie aren't working? Pm