On Thu, Dec 08, 2005 at 11:51:52AM +0200, Yuval Kogman wrote:
: On Wed, Dec 07, 2005 at 16:48:11 -0500, Peter Schwenn wrote:
: > Dear Perl6 Language,
: >
: > I am a Perl user from near year 0. For me the easiest way to learn (,
: > track, and get to the point of contributing to) Perl6 would be a Perl
: > grammar (a regex rule set in, preferably, Perl6) that transforms any
: > Perl5 script into a Perl6. Of course, besides learning Perl6 for a
: > regex'r or Perl5'r such as myself, and tracking, and documenting 6, it
: > would have huge use for Perl5 users making or considering the
: > transition.
:
: IMHO machine translation is not that good a way to start learning -
: the real benefit of Perl 6 is in the features which have no perl 5
: equivalents and solve problems much more elegantly.

Except it would be lovely to have a smart enough refactoring translator
that it could recognize where those elegant solutions are possible and
at least give the option of attempting them. Or at least a hint that
there might be a better way.

: The best thing to do is to hang out on #perl6 and get involved with
: the test suite, as well as reading the synopses.
:
: Perhaps writing a toy program or something like that could also
: help.

Sure, but some of our toys are bigger than others. :-)

: > Is there such a Perl5->Perl6 translator underway?
:
: Larry Wall is working on using the perl (5) interpreter to create
: compiled output (as opposed to just something that executes in
: memory) that can then be read by a translator without actually
: parsing perl 5.

Yes, I have a version of 5.9.2 that dumps out some *very* strange XML
that represents, as closely as possible, the exact meaning of the code
to Perl 5, along with all the syntactic bits. I then filter that
strange XML back into something approximating an AST. I am in the
process of proving to myself that I'm getting enough information out of
this to recreate the original Perl 5, so I jokingly call this my
Perl5-to-Perl5 translator.

As of today, I'm able to translate 76.57% of the t/*/*.t files that
come with the Perl distribution. Considering that last week this number
was down at about 5%, it would seem that I've been making a lot of
progress. But most of the work went into that first 5%, and a lot of
work will likely go into the last 5% as well.

To get that first 5% I basically had to completely refactor the lexer
and the grammar without changing anything, which is of course
impossible. The Perl 5 parser forgets or misplaces an astounding
variety of information that the translator needs, and you can't just go
and tell it to turn off the optimizations, because in fact most of
those optimizations are deeply interwingled with semantic analysis and
transformations as well.
Basically, every skipspace() in toke.c and every op_free() in op.c and
every rule reduction in perly.y loses necessary information.

To attempt to do what I'm currently doing you would have to be
completely insane like me. It's a total nightmare. If I were Catholic
I'd be hoping this all counts as penance for my past sins, and gets me
out of 100 million years of Purgatory or so. But being a Protestant,
I'm merely repenting of my past sins, and thinking about maybe
repenting my future ones. And if I were Jewish I'd've said "Oy vey"
many times over. :-)

Anyway, once I get to 100% of the t/ files, I'll make it translate all
of CPAN back to itself. And at some point I'll take a first whack at
the Perl5-to-Perl6 translator, then open it up for community
participation. It's still just a bit too early for that, though,
because there's such a delicate interplay between refactoring bits of
perl without changing anything vs trying to guess whether we are
getting enough type and structural information out to recreate the
original in the backend. There are already more than 10000 lines of
code in the backend just to undo the damage done by the Perl 5 engine.

: Before this happens this will be very very hard - the high level
: language has vast amounts of implications on execution etc, but the
: opcode tree is much simpler to predict (for a computer).

Right. But my intent is to write a really good translator, and that
implies that it has to be a multi-level translation. That involves
keeping track of all the subtle semantic and pragmatic information as
well as the basic syntactic information. Otherwise we might as well
just feed Perl 5 to babblefish and see if Perl 6 comes out...

: > p.s. The developing form of such a grammar could likely lead to
: > a grammar package which facilitates rule sets for languages in
: > other domains, in terms of illuminating means of choosing among modes
: > for rule ordering, collecting, scoping, re-application, recursion,
: > exclusion and so forth.
:
: Since perl 5's actual parser and tokenizer will be used for this it
: won't be very extensible, but this is important because perl is
: reaaaaaaaaaaaaaaaaaaaaaaaaallly hard to parse.

And it's oh about 20 times harder to tell if you've parsed it correctly
from a semantic point of view. Consider that every little
instrumentation tweak I've made to the lexer has had about a 50% chance
of inducing strange distortions in the meaning of "Perl". I would be
completely lost without the existing regression tests. That is why I
went for minimal instrumentation and try to undo most of the damage in
the back end. (Or I guess it'll be a "middle end" once we start
translating to Perl 6.)

As for the original question, I think that the Perl 6 grammar will be a
much better example for how to parse other languages than a Perl 5
grammar would be, since one of the underlying design currents from the
beginning has been that Perl 6 had to be a language that was amenable
to parsing by Perl 6 rules (with a little help from a bottom-up
operator-precedence expression parser).

Larry
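
The round-trip idea described above (prove you can regenerate the
original source exactly before attempting any translation) can be
illustrated with a toy sketch. This is not the instrumented 5.9.2
parser or its XML output. The token pattern and the names lex, unlex,
lossy_unlex and round_trip_rate are hypothetical, and Python is used
only to keep the example short. The point it tries to show is that a
lexer which keeps the whitespace and comments it would normally skip
can recreate its input, while one that discards them cannot.

```python
import re

# Hypothetical toy token pattern. It is NOT Perl's real lexer (toke.c);
# it only demonstrates keeping the "trivia" a lexer would normally skip.
TOKEN = re.compile(r"""
    (?P<trivia>(?:\s+|\#[^\n]*)*)     # whitespace and comments before a token
    (?P<text>[A-Za-z_]\w*|\d+|\S)?    # an identifier, a number, or one character
""", re.VERBOSE)


def lex(source):
    """Split source into (trivia, text) pairs without dropping any character."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)   # always consumes at least one character
        tokens.append((m.group("trivia"), m.group("text") or ""))
        pos = m.end()
    return tokens


def unlex(tokens):
    """Regenerate the source from tokens that kept their trivia."""
    return "".join(trivia + text for trivia, text in tokens)


def lossy_unlex(tokens):
    """Regenerate the source after the trivia has been thrown away."""
    return " ".join(text for _, text in tokens if text)


def round_trip_rate(sources):
    """Percentage of inputs that are reproduced character for character."""
    ok = sum(1 for s in sources if unlex(lex(s)) == s)
    return 100.0 * ok / len(sources)
```

Under these assumptions, round_trip_rate plays the role of the
percentage quoted for the t/*/*.t files: it counts how many inputs come
back unchanged. Replacing unlex with lossy_unlex in that check makes
almost every non-trivial input fail, which is the kind of information
loss the message attributes to skipspace().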