Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Jeremy Howard

Leon Brocard wrote:
 Bradley M. Kuhn sent the following bits through the ether:

  It should be noted that in Larry's speech on Friday, he said that he wanted
  to write the Lexer and Parser for Perl in some subset of Perl.  :)

 Is there a writeup somewhere for those who couldn't attend?

 Hmmm, I wonder what kind of subset would be necessary - surely the
 most useful constructs are also the most complicated...

We could learn quite a bit by looking through the code from
Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing
(including parsing Perl) with Perl, so this would be a good place to start.

In terms of bootstrapping, however, we either need to:
 - Write the Perl subset in C (or some other portable language), or
 - Use Perl 5 as the 'Perl subset', and distribute that with Perl 6.

The 2nd of these options seems unlikely to be practical... Maybe however the
bootstrapper could be a subset of Perl 5 stolen fairly directly from the
existing code. Maybe this would then also become the Perl for small/embedded
devices.





Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 03:56:20AM -0400, Adam Turoff wrote:
  We could learn quite a bit by looking through the code from
  Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing
  (including parsing Perl) with Perl, so this would be a good place to start.

It's time to drag out my quote of the week:

Recursive-descent, or predictive, parsing ONLY works on grammars
where the first terminal symbol of each subexpression provides
enough information to choose which production to use.

(Appel, emphasis mine.)
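
To make the quote concrete, here is a minimal sketch in C of a predictive parser. The grammar is a hypothetical toy, not anything from Perl, and the names parse_expr and eval_string are made up for illustration. Every alternative begins with a different terminal, which is exactly the property the quote says recursive descent depends on.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy grammar (an assumption, not Perl's):
 *   expr := DIGITS | '-' expr | '(' expr '+' expr ')'
 * Each alternative starts with a different terminal, so looking at one
 * character is enough to choose the production. */

/* Parse one expr at *p, advance *p past it, store its value in *out.
 * Returns 1 on success and 0 on a syntax error. */
int parse_expr(const char **p, long *out)
{
    assert(p != NULL && *p != NULL && out != NULL);

    if (isdigit((unsigned char)**p)) {      /* expr := DIGITS */
        long v = 0;
        while (isdigit((unsigned char)**p))
            v = v * 10 + (*(*p)++ - '0');
        *out = v;
        return 1;
    }
    if (**p == '-') {                       /* expr := '-' expr */
        long v;
        (*p)++;
        if (!parse_expr(p, &v))
            return 0;
        *out = -v;
        return 1;
    }
    if (**p == '(') {                       /* expr := '(' expr '+' expr ')' */
        long a, b;
        (*p)++;
        if (!parse_expr(p, &a)) return 0;
        if (**p != '+') return 0;
        (*p)++;
        if (!parse_expr(p, &b)) return 0;
        if (**p != ')') return 0;
        (*p)++;
        *out = a + b;
        return 1;
    }
    return 0;                               /* no production starts here */
}

/* Parse a whole string; succeeds only if all input is consumed. */
int eval_string(const char *s, long *out)
{
    const char *p = s;
    return parse_expr(&p, out) && *p == '\0';
}
```

If two alternatives began with the same terminal, the first-character test above could no longer pick between them, and the parser would need more lookahead or backtracking.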

 Gisle and I were talking about this tonight, and it *might* be possible
 to write the Perl tokenizer in a Perl[56] regex, which is more easily 
 parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
 would be much more legible to this community (which is more of a strike
 against toke.c instead of a benefit of some toke.re).  That would certainly
 qualify as implementing the Perl grammar in Perl, and might even be
 achievable.   (*gasp!*)

This would have to take account of the fact that Perl's tokeniser is
aware of what's going on in the rest of perl. Consider

print foo;

What should the tokeniser return for "foo"? Is it a bareword? Is it a
subroutine call? Is it a class? Is it - heaven forbid - a filehandle? 
Well, it could be any of these things. You have to choose.
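
A rough sketch in C of what "aware of what's going on in the rest of perl" means for the tokeniser. The token kinds, the symbol struct and classify_word are all hypothetical names invented here; this is not how toke.c is written, only an illustration that the same spelling can come back as different tokens depending on what has been declared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds and symbol table entry; these are NOT the
 * names or the logic used in perl's toke.c. */
enum tok_kind { TOK_BAREWORD, TOK_SUBCALL, TOK_CLASS, TOK_FILEHANDLE };

struct symbol {
    const char   *name;
    enum tok_kind kind;   /* what the rest of the compiler knows it as */
};

/* Classify an identifier by consulting what has been declared so far.
 * The same word yields different tokens for different tables. */
enum tok_kind classify_word(const char *word,
                            const struct symbol *syms, size_t nsyms)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, word) == 0)
            return syms[i].kind;
    return TOK_BAREWORD;  /* nothing known: fall back to a bareword */
}
```

A pure regex sees only the characters "foo"; the table lookup is the part that needs state from outside the tokeniser.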

So, while I don't doubt that, with the state of Perl's regexes these
days, it's possible to create something with enough sentience to
tokenize Perl, I've really got to wonder whether it's sane.

-- 
BEWARE!  People acting under the influence of human nature.



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Nicholas Clark

On Tue, Oct 17, 2000 at 11:00:35AM +0100, Simon Cozens wrote:
 On Tue, Oct 17, 2000 at 10:37:24AM +0100, Simon Cozens wrote:
  What should the tokeniser return for "foo"? 
 
 Uh, tokenizer != lexer. Insert coffee. Yes, writing a tokeniser in a regexp
 should be very doable.

To allow the lexer to influence the tokeniser, what characters are we
going to use in (? ) for smoke and mirrors extensions? (?s) and (?m) are
already taken.

[Seriously, I was under the impression that the perl tokenizer was
influenced by the state of the lexer]

Nicholas Clark



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 11:22:02AM +0100, Nicholas Clark wrote:
 [Seriously, I was under the impression that the perl tokenizer was
 influenced by the state of the lexer]

Currently, the tokeniser and the lexer are a combined entity. It doesn't have
to be this way, though. At least, I don't think it does, until you're allowed
to define your own special variables which I sincerely hope won't happen.

(Quick, how would you parse ""?)

To be perfectly honest, my preferred solution would be to have the tokenizer,
lexer and parser as a single, hand-crafted LR(k) monstrosity.

-- 
"So i get the chance to reread my postings to asr at times, with a
corresponding conservation of the almighty leviam00se, Kai Henningsen."
-- Megahal (trained on asr), 1998-11-06



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread John Porter

Simon Cozens wrote:
 
 Currently, the tokeniser and the lexer are a combined entity. 

Yes, in the vast majority of languages; so people get used to thinking
that it has to be this way.


 my preferred solution would be to have the tokenizer,
 lexer and parser as a single, hand-crafted LR(k) monstrosity.

This is a case of me agreeing with Simon 1000%.
I was going to just let it go by, but I thought it might 
be nice to add my <aol/> for a change.

-- 
John Porter




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Ken Fox

Simon Cozens wrote:
 It's time to drag out my quote of the week:
 
 Recursive-descent, or predictive, parsing ONLY works on grammars
 where the first terminal symbol of each subexpression provides
 enough information to choose which production to use.

Recursive-descent parsers are nice because they are *much* easier to
generate errors with. They are also much easier to generate segmented
grammars which is nice for something like Perl because there are so
many quiet shifts into several different sub-languages.

The only real problem is prediction and that is *easily* solved with
look-ahead and/or back-tracking. IMHO back-tracking is preferable,
especially if there are cut-points where the search tree can be pruned.
I think it's very powerful to think of a grammar as a declarative
program which searches for the best-fit between itself and the input
stream.
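
A minimal sketch in C of back-tracking with a cut-point, under a made-up rule (the rule, match_lit and parse_item are assumptions for illustration only). Alternatives are tried in order; a failed alternative restores the input position, and once the cut is passed the remaining alternatives are pruned.

```c
#include <assert.h>
#include <string.h>

/* Match the literal `lit` at *p; advance *p on success. */
int match_lit(const char **p, const char *lit)
{
    size_t n = strlen(lit);

    assert(p != NULL && *p != NULL);
    if (strncmp(*p, lit, n) != 0)
        return 0;
    *p += n;
    return 1;
}

/* Hypothetical rule, tried in order:
 *   item := "ab" "c"          (alternative 1)
 *         | "ab" "d"          (alternative 2)
 *         | "x" CUT "y"       (alternative 3)
 *         | "xz"              (alternative 4)
 * Returns the number of the alternative that matched, or 0. */
int parse_item(const char **p)
{
    const char *save = *p;          /* remember where we started */

    if (match_lit(p, "ab") && match_lit(p, "c"))
        return 1;
    *p = save;                      /* back-track: undo the partial match */

    if (match_lit(p, "ab") && match_lit(p, "d"))
        return 2;
    *p = save;

    if (match_lit(p, "x")) {
        /* cut-point: having seen "x" we commit to alternative 3 */
        if (match_lit(p, "y"))
            return 3;
        *p = save;
        return 0;                   /* pruned: alternative 4 is never tried */
    }
    *p = save;

    if (match_lit(p, "xz"))
        return 4;
    *p = save;
    return 0;
}
```

With this ordering, the input "xz" fails even though alternative 4 would match it. That is the trade a cut makes: less searching, in exchange for the grammar author promising the pruned branches do not matter.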

 So, while I don't doubt that, with the state of Perl's regexes these
 days, it's possible to create something with enough sentience to
 tokenize Perl, I've really got to wonder whether it's sane.

I think the goal would be to increase the power of the regexes to
handle Perl grammar. This could be the coolest language tool since
yacc. (I'm intentionally not comparing Perl's regex to lex. We shouldn't
make the same stupid mistake as lex/yacc by splitting a language into
a token specification and a grammar with incompatible syntax.)

- Ken



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Dan Sugalski

At 10:22 AM 10/17/00 -0400, John Porter wrote:
Simon Cozens wrote:
 
  Currently, the tokeniser and the lexer are a combined entity.

Yes, in the vast majority of languages; so people get used to thinking
that it has to be this way.

I'd just as soon we thought a bit differently. I'm not sure we want to 
split the lexer and tokenizer out, but I don't want to rule out the 
possibility. It's looking like a goodly portion of the 
lexing/tokenizing/parsing bit of perl 6 will be written in perl, so I'm not 
sure how things are going to split out just yet.

I don't suppose anyone's got code to translate a perl regex into C?

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Nicholas Clark

On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote:
 The other down-side is that we'd be doing a whole lot of custom work designed
 just for parsing Perl instead of creating something more general and powerful
 that can be used for other problems as well. For example, I'd imagine the PDL
 folks would much rather extend a recursive-descent parser with back-tracking
 than an LR(k) monstrosity.

Not that I know anything about how to write a parser. But anecdotal evidence
from all the syntax highlighting editors etc. is that perl is the hardest
thing to parse. Hence if perl6 contains a generic parser powerful enough to
parse perl (if this can be done), to me this would suggest that it would
allow a lot of other people to use it to rapidly implement parsers for just
about anything else.

Nicholas Clark



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote:
 Those are hard to understand because so much extra work has to be done to
 compensate for lack of top-down state when doing a bottom-up match.

I haven't found this to be true.

 Since Perl is much more difficult than C++ to parse...

Perl is essentially a natural language processing problem, and so I'd think
twice before hitting it with a pure computer science solution.

Come on, guys, take a look at toke.c; there's a *probabilistic* part-of-speech
tagger in there. This is NLP, and we do things differently.

Have you read this?
ftp://ftp.cs.titech.ac.jp/pub/TR/93/TR93-0003.ps.gz

-- 
This process can check if this value is zero, and if it is, it does
something child-like.
-- Forbes Burkowski, CS 454, University of Washington



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Bradley M. Kuhn

Adam Turoff wrote:

 to write the Perl tokenizer in a Perl[56] regex, which is more easily 
 parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
 would be much more legible to this community (which is more of a strike
 against toke.c instead of a benefit of some toke.re).

Larry brought this up in his talk.  Of course, I believe that Larry was
sleep-deprived at the time, too.  ;)

 It was late though.  Might have been sleep deprivation talking.


-- 
Bradley M. Kuhn  -  http://www.ebb.org/bkuhn



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Adam Turoff

On Tue, Oct 17, 2000 at 07:18:54PM -0400, Bradley M. Kuhn wrote:
 Adam Turoff wrote:
  to write the Perl tokenizer in a Perl[56] regex, which is more easily 
  parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
  would be much more legible to this community (which is more of a strike
  against toke.c instead of a benefit of some toke.re).
 
 Larry brought this up in his talk.  Of course, I believe that Larry was
 sleep-deprived at the time, too.  ;)
 
  It was late though.  Might have been sleep deprivation talking.

Dammit, I'm not finding the message in the thread, but someone casually
mentioned writing the important bits of parsing Perl in Perl5, generating
bytecode, and starting Perl6 by writing the bytecode loader.  (Apologies
for not finding the attribution.  Please stand up and elucidate if
you've had this idea as well.)

That approach does have a significant amount of merit.  Smalltalk, FORTH,
Lisp (etc.), and Java work in that manner.  That would pose a bootstrapping 
problem if there were no Perl5 to start with.  That should also aid in the
testing effort.

Z.




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Adam Turoff

On Tue, Oct 17, 2000 at 08:57:43PM -0400, Dan Sugalski wrote:
 On Tue, 17 Oct 2000, Adam Turoff wrote:
  Dammit, I'm not finding the message in the thread, but someone casually
  mentioned writing the important bits of parsing Perl in Perl5, generating
  bytecode, and starting Perl6 by writing the bytecode loader.  (Apologies
  for not finding the attribution.  Please stand up and elucidate if
  you've had this idea as well.)
 
 That would be me. I wasn't necessarily thinking of emitting p6 bytecode,
 though that's certainly possible. 

What's wrong with bootstrapping Perl6 with the Perl5 bytecode (or the
most interesting subset of Perl5 bytecode)?  That lets Perl6 start out
where the parser is read in as bytecode (or compiled to C from Perl or
bytecode) and modify those bytecodes as the need progresses.  Voila.  
No bootstrapping problem.  (e.g. start writing the Perl6 parser in Perl5).

  That should also aid in the testing effort.
 
 I hadn't thought about that, but it would. I'm thinking we need to set up
 a bunch of performance benchmarks for p6 development too, though that can
 go in as part of the general QA. (Not much Q there if we run slower...)

This came up before in another thread many moons ago (don't have
time to find the reference). IIRC, this was deemed an exercise in
premature optimization (i.e. EVIL!).

Z.




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-16 Thread Leon Brocard

Bradley M. Kuhn sent the following bits through the ether:

 It should be noted that in Larry's speech on Friday, he said that he wanted
 to write the Lexer and Parser for Perl in some subset of Perl.  :)

Is there a writeup somewhere for those who couldn't attend?

Hmmm, I wonder what kind of subset would be necessary - surely the
most useful constructs are also the most complicated...

Leon
-- 
Leon Brocard    http://www.astray.com/
yapc::Europe    http://yapc.org/Europe/

... Where has all that spare time just come from? ;-)



Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-15 Thread Bradley M. Kuhn

Dan Sugalski wrote:
 At 07:48 PM 10/12/00 +, John van V wrote:
 
 
   * It also means we can write bits of perl in Perl, and similarly not have
   to care about this fact.
 
 Granted, some developers are thick as a brick...
 If you are writing perl in Perl, then, presumably, you would know this.
 
 But perl won't be written in Perl. It'll be written in C, most likely. 

It should be noted that in Larry's speech on Friday, he said that he wanted
to write the Lexer and Parser for Perl in some subset of Perl.  :)

(I was the only one who clapped, which either means:

   (a) this is not a popular idea
   (b) there weren't many Perl6 hackers in the crowd
   (c) I am extremely over-excited about the prospect of writing the lexer
   and parser in Perl.  :)

-- 
Bradley M. Kuhn  -  http://www.ebb.org/bkuhn



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-13 Thread Nicholas Clark

On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote:
 Dan Sugalski [EMAIL PROTECTED] writes:
 
  C's vararg handling sucks in many sublime and profound ways. It does,
  though, work. If we declare in advance that all C-visible perl functions
  have an official parameter list of (...), then we can make it work. The
  calling program would just fetch function pointers from us somehow, and
  do the call in.
 
 Can't.  ISO C requires that all variadic functions take at least one named
 parameter.  The best you can do is something like (void *, ...).

(Perl_Interpreter *, ...)
surely?

[Having seen the pTHX_ appear all over perl5]

Nicholas Clark



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-13 Thread Dan Sugalski

At 10:16 AM 10/13/00 +0100, Nicholas Clark wrote:
On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote:
  Dan Sugalski [EMAIL PROTECTED] writes:
 
   C's vararg handling sucks in many sublime and profound ways. It does,
   though, work. If we declare in advance that all C-visible perl functions
   have an official parameter list of (...), then we can make it work. The
   calling program would just fetch function pointers from us somehow, and
   do the call in.
 
  Can't.  ISO C requires that all variadic functions take at least one named
  parameter.  The best you can do is something like (void *, ...).

(Perl_Interpreter *, ...)
surely?

That'd be a good one, except we're trying to avoid that if we can... :(

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-13 Thread David L. Nicol

Dan Sugalski wrote:
 
 At 08:57 PM 10/12/00 +0100, Simon Cozens wrote:
 On Thu, Oct 12, 2000 at 03:43:07PM -0400, Dan Sugalski wrote:
   Doing this also means someone writing an app with an embedded perl
   interpreter can call into perl code the same way as they call into any C
   library.
 
 Of course, the problem comes that we can't have anonymous functions in C.
 
 Sure we do. You can get a pointer to a function, and then call that
 function through the pointer. (Though argument handling's rather dodgy)
 
 That is, if we want to call Perl sub "foo", we'll really need to call
 something like
 
  call_perl("foo", ..args... );
 
 whereas we'd much rather do this:
 
  foo(..args..)
 
 (Especially since C's handling of varargs is, well, unpleasant.)

cat <<ENDEND > rfc334.h


/* based on http://www.eskimo.com/~scs/C-faq/q15.4.html */
void *call_perl(char *PerlFuncName, ...);

ENDEND



Which then makes the RFC121 -oh output simple and easy:  perl routines
which have been marked with a RFC334 attribute indicating their C calling
convention get two lines written to standard output (or the designated
header file):

one is a macro, which will call call_perl directly with the name and the args,

#define fooDIRECT(A,B,C) call_perl("foo", A,B,C)

and the other is a wrapper.

perlval *foo(int A, perlval *B, char *C);
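
A sketch in C of how the macro and the wrapper could sit on top of call_perl. Everything here is an assumption for illustration: the stub body stands in for the interpreter, the sub name "add3" is invented, the return type is a plain long rather than perlval, and the name parameter is const char * (the header above says char *) so that string literals pass cleanly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the interpreter entry point. This stub only knows one
 * hypothetical "Perl sub", add3, which sums three ints; a real
 * call_perl would look the name up and run Perl code. The static
 * result slot makes the stub non-reentrant. */
static long result_slot;

void *call_perl(const char *PerlFuncName, ...)
{
    va_list ap;
    void *ret = NULL;

    assert(PerlFuncName != NULL);
    va_start(ap, PerlFuncName);
    if (strcmp(PerlFuncName, "add3") == 0) {
        int a = va_arg(ap, int);
        int b = va_arg(ap, int);
        int c = va_arg(ap, int);
        result_slot = (long)a + b + c;
        ret = &result_slot;
    }
    va_end(ap);
    return ret;                     /* NULL means "no such sub" */
}

/* Macro form: expands to a direct call_perl call by name. */
#define add3DIRECT(A, B, C) call_perl("add3", (A), (B), (C))

/* Wrapper form: gives C callers an ordinary typed prototype. */
long add3(int a, int b, int c)
{
    long *r = call_perl("add3", a, b, c);
    return r ? *r : 0;
}
```

The wrapper is the form that gets type-checked at the call site; the macro and call_perl itself accept whatever arguments they are given.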






-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
"After jotting these points down, we felt better."



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Dan Sugalski

At 07:48 PM 10/12/00 +, John van V wrote:


  * It also means we can write bits of perl in Perl, and similarly not have
  to care about this fact.

Granted, some developers are thick as a brick...
If you are writing perl in Perl, then, presumably, you would know this.

But perl won't be written in Perl. It'll be written in C, most likely. 
It'll be written in a modular and overridable way, though. And that means 
that if we want to override things with perl code, that perl code needs to 
be callable the same way that any C function is.

Doing this also means someone writing an app with an embedded perl 
interpreter can call into perl code the same way as they call into any C 
library.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Dan Sugalski

At 08:57 PM 10/12/00 +0100, Simon Cozens wrote:
On Thu, Oct 12, 2000 at 03:43:07PM -0400, Dan Sugalski wrote:
  Doing this also means someone writing an app with an embedded perl
  interpreter can call into perl code the same way as they call into any C
  library.

Of course, the problem comes that we can't have anonymous functions in C.

Sure we do. You can get a pointer to a function, and then call that 
function through the pointer. (Though argument handling's rather dodgy)
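
For reference, a small C sketch of fetching a function pointer by name and calling through it. The table, fetch_function and the two toy functions are hypothetical names for illustration, not a proposed perl API, and the fixed (int, int) signature sidesteps the argument-handling problem rather than solving it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One fixed signature; the "dodgy" part in practice is that real subs
 * would not all share it. */
typedef int (*binop_fn)(int, int);

int add_impl(int a, int b) { return a + b; }
int mul_impl(int a, int b) { return a * b; }

/* Hypothetical lookup: return the function registered under `name`,
 * or NULL if there is none. */
binop_fn fetch_function(const char *name)
{
    static const struct { const char *name; binop_fn fn; } table[] = {
        { "add", add_impl },
        { "mul", mul_impl },
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn;
    return NULL;
}
```

A caller would write something like `binop_fn f = fetch_function("add");` and then `f(2, 3)`, checking for NULL first.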

That is, if we want to call Perl sub "foo", we'll really need to call
something like

 call_perl("foo", ..args... );

whereas we'd much rather do this:

 foo(..args..)

(Especially since C's handling of varargs is, well, unpleasant.)

C's vararg handling sucks in many sublime and profound ways. It does, 
though, work. If we declare in advance that all C-visible perl functions 
have an official parameter list of (...), then we can make it work. The 
calling program would just fetch function pointers from us somehow, and do 
the call in.

Granted this will be a pain on our side (since I expect the C vararg stuff 
is very different from platform to platform, even more so (and more 
annoyingly) than plain function call stuff) but it is doable.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Russ Allbery

Dan Sugalski [EMAIL PROTECTED] writes:

 C's vararg handling sucks in many sublime and profound ways. It does,
 though, work. If we declare in advance that all C-visible perl functions
 have an official parameter list of (...), then we can make it work. The
 calling program would just fetch function pointers from us somehow, and
 do the call in.

Can't.  ISO C requires that all variadic functions take at least one named
parameter.  The best you can do is something like (void *, ...).
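
A short C sketch of why the named parameter matters under C89/C99: va_start takes the last named parameter as its second argument, so there has to be one. The struct interp handle and sum_ints are hypothetical, chosen only to show the shape.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical interpreter handle filling the named slot. */
struct interp { int calls; };

/* Sum `count` int arguments. va_start is anchored on `count`, the
 * last named parameter; with a bare (...) there would be nothing to
 * anchor it on. */
long sum_ints(struct interp *in, int count, ...)
{
    va_list ap;
    long total = 0;
    int i;

    assert(in != NULL && count >= 0);
    in->calls++;
    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Note that the callee has no way to learn the number or types of the variable arguments on its own; here the count is passed explicitly.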

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Simon Cozens

On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote:
 Can't.  ISO C requires that all variadic functions take at least one named
 parameter.  The best you can do is something like (void *, ...).

Argh. Can't we just use a stack? I like stacks. Stacks make sense.

-- 
..you could spend *all day* customizing the title bar.  Believe me.  I
speak from experience."
(By Matt Welsh)



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Dan Sugalski

At 12:07 AM 10/13/00 +0100, Simon Cozens wrote:
On Thu, Oct 12, 2000 at 03:24:23PM -0700, Russ Allbery wrote:
  Can't.  ISO C requires that all variadic functions take at least one named
  parameter.  The best you can do is something like (void *, ...).

Well, damn. And I mean that sincerely. :(

Argh. Can't we just use a stack? I like stacks. Stacks make sense.

Stacks cost time and programmer effort. Parameters are generally passed in 
registers, (how many depends on your architecture, but sane ones have a 
bunch) while stacks require smacking data into memory somewhere and messing 
with the stack pointer. Even if you have a macro to push, it means doing this:

   PUSH(i);
   PUSH(j);
   PUSH(k);
   call_foo();

instead of:

   call_foo(i, j, k);

Stacks are OK for internal work (though I'd prefer a register file, if even 
a virtual one) but for externally exposed stuff we're better off taking 
parameters if we can.
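
The two call shapes, spelled out as a runnable C sketch. The stack, the PUSH and POP macros and both call_foo variants are hypothetical; the sketch shows only the difference in programmer effort, and does not measure the register-versus-memory cost.

```c
#include <assert.h>

/* A tiny explicit argument stack (an assumption for illustration). */
#define STACK_MAX 16
static int arg_stack[STACK_MAX];
static int arg_sp;

#define PUSH(v) (assert(arg_sp < STACK_MAX), arg_stack[arg_sp++] = (v))
#define POP()   (assert(arg_sp > 0), arg_stack[--arg_sp])

int stack_depth(void) { return arg_sp; }

/* Stack style: the callee pops its arguments in reverse push order. */
int call_foo_stack(void)
{
    int k = POP();
    int j = POP();
    int i = POP();
    return i * 100 + j * 10 + k;
}

/* Parameter style: the compiler handles the passing and checks the
 * argument count and types at the call site. */
int call_foo_args(int i, int j, int k)
{
    return i * 100 + j * 10 + k;
}
```

In the stack version a caller who pushes two values instead of three compiles without complaint and fails only at run time; in the parameter version the same mistake is a compile error.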

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread John Porter

Dan Sugalski wrote:
 
 Well, damn. And I mean that sincerely. :(

I don't think it's that big a deal.  Easy enough to wrap in a macro.

-- 
John Porter




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Jarkko Hietaniemi

On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote:
 Dan Sugalski wrote:
  
  Well, damn. And I mean that sincerely. :(
 
 I don't think it's that big a deal.  Easy enough to wrap in a macro.

I thought (hoped) that the plan was to avoid the cpp like the plague
and cancer it is.  The massive overdose of cpp magic is one of the
main reasons why understanding/debugging/developing Perl 5 core is
so much fun.

This example is slightly off because most of the "fun" is courtesy
gcc, not perl, but I think it illustrates the point quite nicely.

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-08/msg01810.html

 -- 
 John Porter

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread John Porter

Jarkko Hietaniemi wrote:
 On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote:
  
  I don't think it's that big a deal.  Easy enough to wrap in a macro.
 
 I thought (hoped) that the plan was to avoid the cpp like the plague
 and cancer it is.  

Well, yes, definitely; but we're just adding an argument to the front
of the arg list.  Is that so bad?

-- 
John Porter




Re: RFC 334 (v1) I'm {STILL} trying to understand this...

2000-10-12 Thread Dan Sugalski

At 11:21 PM 10/12/00 -0400, John Porter wrote:
Jarkko Hietaniemi wrote:
  On Thu, Oct 12, 2000 at 10:55:52PM -0400, John Porter wrote:
  
   I don't think it's that big a deal.  Easy enough to wrap in a macro.
 
  I thought (hoped) that the plan was to avoid the cpp like the plague
  and cancer it is.

Well, yes, definitely; but we're just adding an argument to the front
of the arg list.  Is that so bad?

Yes, it is.

Extra arguments cost. It's not free to pass them in by any means--you can 
see hits up to 10% in some extreme cases. If the arguments are used it's 
one thing, but if they're dummy (as they would be in those cases where the 
indirect routine you're calling *isn't* a perl one) then that cost can 
really add up.

CPP won't help us either, since we'll be calling these routines indirectly, 
via a function pointer in C. There's no way we can reasonably get CPP 
involved, even if we wanted to.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk