Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Leon Brocard wrote: Bradley M. Kuhn sent the following bits through the ether: It should be noted that in Larry's speech on Friday, he said that he wanted to write the Lexer and Parser for Perl in some subset of Perl. :)

Is there a writeup somewhere for those who couldn't attend? Hmmm, I wonder what kind of subset would be necessary - surely the most useful constructs are also the most complicated...

We could learn quite a bit by looking through the code from Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing (including parsing Perl) with Perl, so this would be a good place to start.

In terms of bootstrapping, however, we either need to:

- Write the Perl subset in C (or some other portable language), or
- Use Perl 5 as the 'Perl subset', and distribute that with Perl 6.

The 2nd of these options seems unlikely to be practical... Maybe however the bootstrapper could be a subset of Perl 5 stolen fairly directly from the existing code. Maybe this would then also become the Perl for small/embedded devices.
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 03:56:20AM -0400, Adam Turoff wrote: We could learn quite a bit by looking through the code from Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing (including parsing Perl) with Perl, so this would be a good place to start. It's time to drag out my quote of the week: Recursive-descent, or predictive, parsing ONLY works on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use. (Appel, emphasis mine.) Gisle and I were talking about this tonight, and it *might* be possible to write the Perl tokenizer in a Perl[56] regex, which is more easily parsable in C. All of a sudden, toke.c is replaced by toke.re, which would be much more legible to this community (which is more of a strike against toke.c instead of a benefit of some toke.re). That would certainly qualify as implementing the Perl grammar in Perl, and might even be achievable. (*gasp!*) This would have to take account of the fact that Perl's tokeniser is aware of what's going on in the rest of perl. Consider print foo; What should the tokeniser return for "foo"? Is it a bareword? Is it a subroutine call? Is it a class? Is it - heaven forbid - a filehandle? Well, it could be any of these things. You have to choose. So, while I don't doubt that, with the state of Perl's regexes these days, it's possible to create something with enough sentience to tokenize Perl, I've really got to wonder whether it's sane. -- BEWARE! People acting under the influence of human nature.
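To make the "print foo;" problem concrete, here is a minimal C sketch of a word classifier that has to consult parser state before it can say what "foo" is. Everything in it is an assumption made for illustration: the names tok_kind, parse_state and classify_word are hypothetical, and the lookup order (filehandle, then sub, then class) is an arbitrary choice, not the rule toke.c actually applies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_BAREWORD, TOK_SUB_CALL, TOK_CLASS, TOK_FILEHANDLE } tok_kind;

/* Hypothetical snapshot of what the rest of the compiler knows so far.
 * Each list is NULL-terminated; a NULL list means "nothing declared". */
typedef struct {
    const char *const *subs;
    const char *const *classes;
    const char *const *filehandles;
} parse_state;

/* Return 1 if word appears in the NULL-terminated list, else 0. */
static int contains(const char *const *list, const char *word)
{
    if (list == NULL)
        return 0;
    for (; *list != NULL; list++)
        if (strcmp(*list, word) == 0)
            return 1;
    return 0;
}

/* The same spelling yields a different token depending on parse state.
 * The priority order below is illustrative only. */
tok_kind classify_word(const parse_state *st, const char *word)
{
    assert(st != NULL && word != NULL);
    if (contains(st->filehandles, word)) return TOK_FILEHANDLE;
    if (contains(st->subs, word))        return TOK_SUB_CALL;
    if (contains(st->classes, word))     return TOK_CLASS;
    return TOK_BAREWORD;
}
```

The point of the sketch is only that classify_word cannot be a pure function of the input characters: with an empty state "foo" is a bareword, and once a sub named foo has been declared the same text becomes a call.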
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 11:00:35AM +0100, Simon Cozens wrote: On Tue, Oct 17, 2000 at 10:37:24AM +0100, Simon Cozens wrote: What should the tokeniser return for "foo"? Uh, tokenizer != lexer. Insert coffee. Yes, writing a tokeniser in a regexp should be very doable. To allow the lexer to influence the tokeniser, what characters are we going to use in (? ) for smoke and mirrors extensions? (?s) and (?m) are already taken. [Seriously, I was under the impression that the perl tokenizer was influenced by the state of the lexer] Nicholas Clark
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 11:22:02AM +0100, Nicholas Clark wrote: [Seriously, I was under the impression that the perl tokenizer was influenced by the state of the lexer] Currently, the tokeniser and the lexer are a combined entity. It doesn't have to be this way, though. At least, I don't think it does, until you're allowed to define your own special variables which I sincerely hope won't happen. (Quick, how would you parse ""?) To be perfectly honest, my preferred solution would be to have the tokenizer, lexer and parser as a single, hand-crafted LR(k) monstrosity. -- "So i get the chance to reread my postings to asr at times, with a corresponding conservation of the almighty leviam00se, Kai Henningsen." -- Megahal (trained on asr), 1998-11-06
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Simon Cozens wrote: Currently, the tokeniser and the lexer are a combined entity.

Yes, in the vast majority of languages; so people get used to thinking that it has to be this way.

my preferred solution would be to have the tokenizer, lexer and parser as a single, hand-crafted LR(k) monstrosity.

This is a case of me agreeing with Simon 1000%. I was going to just let it go by, but I thought it might be nice to add my <aol/> for a change. -- John Porter
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Simon Cozens wrote: It's time to drag out my quote of the week: Recursive-descent, or predictive, parsing ONLY works on grammars where the first terminal symbol of each subexpression provides enough information to choose which production to use. Recursive-descent parsers are nice because they are *much* easier to generate errors with. They are also much easier to generate segmented grammars which is nice for something like Perl because there are so many quiet shifts into several different sub-languages. The only real problem is prediction and that is *easily* solved with look-ahead and/or back-tracking. IMHO back-tracking is preferable, especially if there are cut-points where the search tree can be pruned. I think it's very powerful to think of a grammar as a declarative program which searches for the best-fit between itself and the input stream. So, while I don't doubt that, with the state of Perl's regexes these days, it's possible to create something with enough sentience to tokenize Perl, I've really got to wonder whether it's sane. I think the goal would be to increase the power of the regexes to handle Perl grammar. This could be the coolest language tool since yacc. (I'm intentionally not comparing Perl's regex to lex. We shouldn't make the same stupid mistake as lex/yacc by splitting a language into a token specification and a grammar with incompatible syntax.) - Ken
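The back-tracking idea can be sketched in a few lines of C. This is a toy under stated assumptions, not a proposal for the Perl grammar: the two-production grammar (item := NAME '(' ')' | NAME) and the names parse_item, parse_call and item_kind are invented for the example. Both alternatives begin with NAME, so the parser tries the first, and on failure restores the saved input position before trying the second.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { ITEM_NONE, ITEM_CALL, ITEM_BAREWORD } item_kind;

/* NAME := one or more letters. Advances *p only on success. */
static int parse_name(const char **p)
{
    const char *s = *p;
    while (isalpha((unsigned char)*s))
        s++;
    if (s == *p)
        return 0;
    *p = s;
    return 1;
}

/* Match a single literal character. Advances *p only on success. */
static int parse_lit(const char **p, char c)
{
    if (**p != c)
        return 0;
    (*p)++;
    return 1;
}

/* call := NAME '(' ')' */
static int parse_call(const char **p)
{
    const char *save = *p;      /* remember where this attempt started */
    if (parse_name(p) && parse_lit(p, '(') && parse_lit(p, ')'))
        return 1;
    *p = save;                  /* back-track: undo any partial progress */
    return 0;
}

/* item := call | NAME, and the whole input must be consumed. */
item_kind parse_item(const char *input)
{
    const char *p;
    assert(input != NULL);

    p = input;
    if (parse_call(&p) && *p == '\0')
        return ITEM_CALL;

    p = input;                  /* first alternative failed: start over */
    if (parse_name(&p) && *p == '\0')
        return ITEM_BAREWORD;

    return ITEM_NONE;
}
```

A cut-point, in these terms, would be a place where a production commits and stops restoring the saved position, which prunes the alternatives that would otherwise be retried.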
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
At 10:22 AM 10/17/00 -0400, John Porter wrote: Simon Cozens wrote: Currently, the tokeniser and the lexer are a combined entity. Yes, in the vast majority of languages; so people get used to thinking that it has to be this way. I'd just as soon we thought a bit differently. I'm not sure we want to split the lexer and tokenizer out, but I don't want to rule out the possibility. It's looking like a goodly portion of the lexing/tokenizing/parsing bit of perl 6 will be written in perl, so I'm not sure how things are going to split out just yet. I don't suppose anyone's got code to translate a perl regex into C? Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote: The other down-side is that we'd be doing a whole lot of custom work designed just for parsing Perl instead of creating something more general and powerful that can be used for other problems as well. For example, I'd imagine the PDL folks would much rather extend a recursive-descent parser with back-tracking than an LR(k) monstrosity. Not that I know anything about how to write a parser. But anecdotal evidence from all the syntax highlighting editors etc. is that perl is the hardest thing to parse. Hence if perl6 contains a generic parser powerful enough to parse perl (if this can be done), to me this would suggest that it would allow a lot of other people to use it to rapidly implement parsers for just about anything else. Nicholas Clark
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote: Those are hard to understand because so much extra work has to be done to compensate for lack of top-down state when doing a bottom-up match. I haven't found this to be true. Since Perl is much more difficult than C++ to parse... Perl is essentially a natural language processing problem, and so I'd think twice before hitting it with a pure computer science solution. Come on, guys, take a look at toke.c; there's a *probabilistic* part-of-speech tagger in there. This is NLP, and we do things differently. Have you read this? ftp://ftp.cs.titech.ac.jp/pub/TR/93/TR93-0003.ps.gz -- This process can check if this value is zero, and if it is, it does something child-like. -- Forbes Burkowski, CS 454, University of Washington
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Adam Turoff wrote: to write the Perl tokenizer in a Perl[56] regex, which is more easily parsable in C. All of a sudden, toke.c is replaced by toke.re, which would be much more legible to this community (which is more of a strike against toke.c instead of a benefit of some toke.re). Larry brought this up in his talk. Of course, I believe that Larry was sleep-deprived at the time, too. ;) It was late though. Might have been sleep deprivation talking. -- Bradley M. Kuhn - http://www.ebb.org/bkuhn
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 07:18:54PM -0400, Bradley M. Kuhn wrote: Adam Turoff wrote: to write the Perl tokenizer in a Perl[56] regex, which is more easily parsable in C. All of a sudden, toke.c is replaced by toke.re, which would be much more legible to this community (which is more of a strike against toke.c instead of a benefit of some toke.re). Larry brought this up in his talk. Of course, I believe that Larry was sleep-deprived at the time, too. ;) It was late though. Might have been sleep deprivation talking. Dammit, I'm not finding the message in the thread, but someone casually mentioned writing the important bits of parsing Perl in Perl5, generating bytecode, and starting Perl6 by writing the bytecode loader. (Apologies for not finding the attribution. Please stand up and elucidate if you've had this idea as well.) That approach does have a significant amount of merit. Smalltalk, FORTH, Lisp (etc.), and Java work in that manner. That would pose a bootstrapping problem if there were no Perl5 to start with. That should also aid in the testing effort. Z.
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
On Tue, Oct 17, 2000 at 08:57:43PM -0400, Dan Sugalski wrote: On Tue, 17 Oct 2000, Adam Turoff wrote: Dammit, I'm not finding the message in the thread, but someone casually mentioned writing the important bits of parsing Perl in Perl5, generating bytecode, and starting Perl6 by writing the bytecode loader. (Apologies for not finding the attribution. Please stand up and elucidate if you've had this idea as well.) That would be me. I wasn't necessarily thinking of emitting p6 bytecode, though that's certainly possible. What's wrong with bootstrapping Perl6 with the Perl5 bytecode (or the most interesting subset of Perl5 bytecode)? That lets Perl6 start out where the parser is read in as bytecode (or compiled to C from Perl or bytecode) and modify those bytecodes as the need progresses. Voila. No bootstrapping problem. (e.g. start writing the Perl6 parser in Perl5). That should also aid in the testing effort. I hadn't thought about that, but it would. I'm thinking we need to set up a bunch of performance benchmarks for p6 development too, though that can go in as part of the general QA. (Not much Q there if we run slower...) This came up before in another thread many moons ago (don't have time to find the reference). IIRC, this was deemed an exercise in premature optimization (i.e. EVIL!). Z.
Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Bradley M. Kuhn sent the following bits through the ether: It should be noted that in Larry's speech on Friday, he said that he wanted to write the Lexer and Parser for Perl in some subset of Perl. :) Is there a writeup somewhere for those who couldn't attend? Hmmm, I wonder what kind of subset would be necessary - surely the most useful constructs are also the most complicated... Leon -- Leon Brocard: http://www.astray.com/ yapc::Europe: http://yapc.org/Europe/ ... Where has all that spare time just come from? ;-)
Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)
Dan Sugalski wrote: At 07:48 PM 10/12/00 +, John van V wrote: * It also means we can write bits of perl in Perl, and similarly not have to care about this fact. Granted, some developers are thick as a brick... If you are writing perl in Perl, then, presumably, you would know this. But perl won't be written in Perl. It'll be written in C, most likely. It should be noted that in Larry's speech on Friday, he said that he wanted to write the Lexer and Parser for Perl in some subset of Perl. :) (I was the only one who clapped, which means either: (a) this is not a popular idea, (b) there weren't many Perl6 hackers in the crowd, or (c) I am extremely over-excited about the prospect of writing the lexer and parser in Perl. :) -- Bradley M. Kuhn - http://www.ebb.org/bkuhn
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote: Dan Sugalski [EMAIL PROTECTED] writes: C's vararg handling sucks in many sublime and profound ways. It does, though, work. If we declare in advance that all C-visible perl functions have an official parameter list of (...), then we can make it work. The calling program would just fetch function pointers from us somehow, and do the call in. Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...). (Perl_Interpreter *, ...) surely? [Having seen the pTHX_ appear all over perl5] Nicholas Clark
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
At 10:16 AM 10/13/00 +0100, Nicholas Clark wrote: On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote: Dan Sugalski [EMAIL PROTECTED] writes: C's vararg handling sucks in many sublime and profound ways. It does, though, work. If we declare in advance that all C-visible perl functions have an official parameter list of (...), then we can make it work. The calling program would just fetch function pointers from us somehow, and do the call in. Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...). (Perl_Interpreter *, ...) surely? That'd be a good one, except we're trying to avoid that if we can... :( Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
Dan Sugalski wrote:
At 08:57 PM 10/12/00 +0100, Simon Cozens wrote:
On Thu, Oct 12, 2000 at 03:43:07PM -0400, Dan Sugalski wrote:
Doing this also means someone writing an app with an embedded perl interpreter can call into perl code the same way as they call into any C library.

Of course, the problem comes that we can't have anonymous functions in C.

Sure we do. You can get a pointer to a function, and then call that function through the pointer. (Though argument handling's rather dodgy)

That is, if we want to call Perl sub "foo", we'll really need to call something like

    call_perl("foo", ..args... );

whereas we'd much rather do this:

    foo(..args..)

(Especially since C's handling of varargs is, well, unpleasant.)

cat ENDEND rfc334.h:

    /* based on http://www.eskimo.com/~scs/C-faq/q15.4.html */
    void *call_perl(char *PerlFuncName, ...);

ENDEND

Which then makes the RFC121 -oh output simple and easy: perl routines which have been marked with a RFC334 attribute indicating their C calling convention get two lines written to standard output (or the designated header file). One is a macro, which will call call_perl directly with the name and the args,

    #define fooDIRECT(A,B,C) call_perl("foo", A,B,C)

and the other is a wrapper:

    perlval *foo(int A, perlval *B, char *C);

-- David Nicol 816.235.1187 [EMAIL PROTECTED] "After jotting these points down, we felt better."
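A rough C sketch of the macro-plus-wrapper scheme follows. It is only an illustration under several assumptions: there is no Perl interpreter here, so a small table of C functions stands in for Perl subs; the sub name add3, the perl_sub_t type and the long return type are made up for the example; and -1 as the "unknown sub" result is a simplification, since it cannot be told apart from a real -1.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a Perl sub: it pulls its own arguments off a va_list. */
typedef long (*perl_sub_t)(va_list ap);

/* Hypothetical sub "add3": expects exactly three int arguments. */
long perl_add3(va_list ap)
{
    int a = va_arg(ap, int);
    int b = va_arg(ap, int);
    int c = va_arg(ap, int);
    return (long)a + b + c;
}

/* Name-to-sub table standing in for the interpreter's symbol table. */
static const struct { const char *name; perl_sub_t sub; } sub_table[] = {
    { "add3", perl_add3 },
};

/* Look the sub up by name and hand it the remaining arguments.
 * Returns -1 when the name is unknown (a simplification). */
long call_perl(const char *name, ...)
{
    va_list ap;
    long result = -1;
    size_t i;

    assert(name != NULL);
    va_start(ap, name);
    for (i = 0; i < sizeof sub_table / sizeof sub_table[0]; i++) {
        if (strcmp(sub_table[i].name, name) == 0) {
            result = sub_table[i].sub(ap);
            break;
        }
    }
    va_end(ap);
    return result;
}

/* Generated-style macro: supplies the name, caller passes only the args. */
#define add3DIRECT(A, B, C) call_perl("add3", (A), (B), (C))

/* Generated-style wrapper: gives callers a real, type-checked prototype. */
long add3(int a, int b, int c)
{
    return call_perl("add3", a, b, c);
}
```

The wrapper is the safer of the two forms, because the compiler checks and converts the arguments against its prototype; the macro passes whatever it is given straight into the variadic call.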
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
At 07:48 PM 10/12/00 +, John van V wrote: * It also means we can write bits of perl in Perl, and similarly not have to care about this fact. Granted, some developers are thick as a brick... If you are writing perl in Perl, then, presumably, you would know this. But perl won't be written in Perl. It'll be written in C, most likely. It'll be written in a modular and overridable way, though. And that means that if we want to override things with perl code, that perl code needs to be callable the same way that any C function is. Doing this also means someone writing an app with an embedded perl interpreter can call into perl code the same way as they call into any C library. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
At 08:57 PM 10/12/00 +0100, Simon Cozens wrote: On Thu, Oct 12, 2000 at 03:43:07PM -0400, Dan Sugalski wrote: Doing this also means someone writing an app with an embedded perl interpreter can call into perl code the same way as they call into any C library. Of course, the problem comes that we can't have anonymous functions in C. Sure we do. You can get a pointer to a function, and then call that function through the pointer. (Though argument handling's rather dodgy) That is, if we want to call Perl sub "foo", we'll really need to call something like call_perl("foo", ..args... ); whereas we'd much rather do this: foo(..args..) (Especially since C's handling of varargs is, well, unpleasant.) C's vararg handling sucks in many sublime and profound ways. It does, though, work. If we declare in advance that all C-visible perl functions have an official parameter list of (...), then we can make it work. The calling program would just fetch function pointers from us somehow, and do the call in. Granted this will be a pain on our side (since I expect the C vararg stuff is very different from platform to platform, even more so (and more annoyingly) than plain function call stuff) but it is doable. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
Dan Sugalski [EMAIL PROTECTED] writes: C's vararg handling sucks in many sublime and profound ways. It does, though, work. If we declare in advance that all C-visible perl functions have an official parameter list of (...), then we can make it work. The calling program would just fetch function pointers from us somehow, and do the call in. Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...). -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
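For reference, a small C sketch of why the named parameter matters: va_start has to be given the last named parameter, so a prototype of just (...) leaves it nothing to anchor on under C89 and C99. The interp_t type and sum_ints function are hypothetical names for this example; the leading pointer plays the role of the (void *, ...) or (Perl_Interpreter *, ...) argument discussed in the thread.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical interpreter handle, for illustration only. */
typedef struct { int calls; } interp_t;

/* Two named parameters, then the variable part. The callee has no way
 * to discover the argument count or types by itself, so the caller
 * passes a count and promises that every extra argument is an int. */
int sum_ints(interp_t *interp, int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(interp != NULL && count >= 0);
    interp->calls++;

    va_start(ap, count);            /* anchored on the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(&in, 3, 1, 2, 3) then works as expected, but nothing stops a caller from passing the wrong count or the wrong types, which is part of what makes varargs unpleasant as a public calling convention.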
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote: Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...). Argh. Can't we just use a stack? I like stacks. Stacks make sense. -- ..you could spend *all day* customizing the title bar. Believe me. I speak from experience." (By Matt Welsh)
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
At 12:07 AM 10/13/00 +0100, Simon Cozens wrote:
On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote:
Can't. ISO C requires that all variadic functions take at least one named parameter. The best you can do is something like (void *, ...).

Well, damn. And I mean that sincerely. :(

Argh. Can't we just use a stack? I like stacks. Stacks make sense.

Stacks cost time and programmer effort. Parameters are generally passed in registers, (how many depends on your architecture, but sane ones have a bunch) while stacks require smacking data into memory somewhere and messing with the stack pointer. Even if you have a macro to push, it means doing this:

    PUSH(i); PUSH(j); PUSH(k); call_foo();

instead of:

    call_foo(i, j, k);

Stacks are OK for internal work (though I'd prefer a register file, if even a virtual one) but for externally exposed stuff we're better off taking parameters if we can.

Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
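The two conventions can be put side by side in C. This is a sketch with invented details: the fixed-size arg_stack, the PUSH and POP macros and the function bodies are assumptions for the example, and it says nothing about how Perl's own argument stack is laid out.

```c
#include <assert.h>

#define STACK_MAX 16

/* Hypothetical explicit argument stack, held in ordinary memory. */
static int arg_stack[STACK_MAX];
static int arg_top = 0;

/* Every push and pop is a memory access plus an index update. */
#define PUSH(v) (assert(arg_top < STACK_MAX), arg_stack[arg_top++] = (v))
#define POP()   (assert(arg_top > 0), arg_stack[--arg_top])

/* Stack convention: the callee pops its arguments, last pushed first. */
int call_foo_stack(void)
{
    int k = POP();
    int j = POP();
    int i = POP();
    return i * 100 + j * 10 + k;
}

/* Parameter convention: the same computation with ordinary arguments,
 * which the compiler is free to pass in registers. */
int call_foo(int i, int j, int k)
{
    return i * 100 + j * 10 + k;
}
```

With the stack version the caller writes PUSH(1); PUSH(2); PUSH(3); call_foo_stack(); and the compiler cannot check the argument count or types, whereas call_foo(1, 2, 3) is checked against its prototype.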
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
Dan Sugalski wrote: Well, damn. And I mean that sincerely. :( I don't think it's that big a deal. Easy enough to wrap in a macro. -- John Porter
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote: Dan Sugalski wrote: Well, damn. And I mean that sincerely. :( I don't think it's that big a deal. Easy enough to wrap in a macro. I thought (hoped) that the plan was to avoid cpp like the plague and cancer it is. The massive overdose of cpp magic is one of the main reasons why understanding/debugging/developing Perl 5 core is so much fun. This example is slightly off because most of the "fun" is courtesy of gcc, not perl, but I think it illustrates the point quite nicely. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-08/msg01810.html -- John Porter -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
Jarkko Hietaniemi wrote: On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote: I don't think it's that big a deal. Easy enough to wrap in a macro. I thought (hoped) that the plan was to avoid cpp like the plague and cancer it is. Well, yes, definitely; but we're just adding an argument to the front of the arg list. Is that so bad? -- John Porter
Re: RFC 334 (v1) I'm {STILL} trying to understand this...
At 11:21 PM 10/12/00 -0400, John Porter wrote: Jarkko Hietaniemi wrote: On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote: I don't think it's that big a deal. Easy enough to wrap in a macro. I thought (hoped) that the plan was to avoid cpp like the plague and cancer it is. Well, yes, definitely; but we're just adding an argument to the front of the arg list. Is that so bad? Yes, it is. Extra arguments cost. It's not free to pass them in by any means--you can see hits up to 10% in some extreme cases. If the arguments are used it's one thing, but if they're dummy (as they would be in those cases where the indirect routine you're calling *isn't* a perl one) then that cost can really add up. CPP won't help us either, since we'll be calling these routines indirectly, via a function pointer in C. There's no way we can reasonably get CPP involved, even if we wanted to. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk