Re: Let the hacking commence!

Luke Blanshard Sat, 08 Jan 2005 09:15:19 -0800

Luke Palmer wrote:

[By the way, shouldn't this grammar be called "Perl" rather than "Perl6::Grammar"?...

Grammars and classes share a namespace, so I think Perl::Grammar is correct...

I got the name Perl for the grammar from S05, which also gives this example:

   given $source_code {
       $parsetree = m/<Perl.prog>/;
   }

# Whitespace definition for Perl code.
rule ws() {
     # Case 1: Unicode space characters, comments, or POD blocks, or
     # any combination thereof.
     [ \s | «comment» | «pod» ]+

I changed your «comment» and «pod» to <comment> and <pod>. We don't have a policy yet on what we're capturing and how, so I'm just leaving all the angle brackets single. Once we decide how our resultant data structure should look, we can go back and change them.

Good idea.

     # Case 2: We're looking at a non-word-constituent or EOF,
     # meaning zero-width counts as whitespace.
 | <before \W> | $
     # Case 3: We must be looking at a word constituent. We match
     # whitespace at BOF or after a non-word-constituent.
 | ^ | <after \W>

I'm going to kill these last two cases. The rules for where whitespace is optional are more complex than whether you're on a word constituent or not. The user of the ws rule is going to know whether whitespace is optional or required in a particular position, so he can put <ws> or <ws>? as he needs to. Also, if we're being good little boys, we'll be putting backtracking colons after our identifier matches, so a <ws> rule will never show up in the middle of an identifier.
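
A rough sketch of the two points above: the caller chooses between required and optional whitespace, and an identifier that does not backtrack cannot be split by optional whitespace. This uses Python's `re` module rather than Perl 6 rules, so it is only an analogy. The names `WS`, `WS_OPT`, `IDENT`, and `IDENT_ATOMIC` are made up for the example, and the lookahead-plus-backreference idiom stands in for a backtracking colon.

```python
import re

# Hypothetical stand-ins; none of these names come from the grammar itself.
WS = r"\s+"                  # plays the role of <ws>: whitespace required
WS_OPT = r"\s*"              # plays the role of <ws>?: whitespace optional
IDENT = r"[A-Za-z_]\w*"      # ordinary identifier, free to give characters back
# Lookahead captures the longest identifier, then \1 consumes exactly that
# text, so the identifier cannot be shortened later by backtracking.
IDENT_ATOMIC = r"(?=(" + IDENT + r"))\1"

# Two identifiers separated by optional whitespace.
loose = re.compile(IDENT + WS_OPT + IDENT)
strict = re.compile(IDENT_ATOMIC + WS_OPT + IDENT)
# Same shape, but the author of the rule decided whitespace is required here.
required = re.compile(IDENT_ATOMIC + WS + IDENT)

print(bool(loose.fullmatch("foobar")))    # backtracking splits the identifier
print(bool(strict.fullmatch("foobar")))   # the atomic identifier cannot be split
print(bool(strict.fullmatch("foo bar")))
```

With the non-backtracking identifier, optional whitespace can only match between two real tokens, which is the property being relied on in the paragraph above.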

I'm not sure this will work, unless you get rid of the :w's everywhere in this grammar. My understanding of how :w works (from S05) is that it puts <ws> in place of every whitespace sequence in the rule. This means that <ws> has to be smart enough to match the empty string at particular places. These two cases are my take on where those particular places should be for Perl code -- though I may well be missing something!
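
To make the concern concrete, here is a minimal sketch of the substitution described above, written with Python's `re` module as an analogy rather than with Perl 6 rules. Everything named here (`apply_w`, `WS_SPACE_ONLY`, `WS_SMART`, the sample rule) is hypothetical, and the whitespace patterns only approximate the three cases quoted in this message (POD is left out).

```python
import re

# Hypothetical stand-ins for the <ws> rule; not the grammar's actual definition.
COMMENT = r"#[^\n]*(?:\n|\Z)"
WS_SPACE_ONLY = r"(?:\s|" + COMMENT + r")+"      # Case 1 alone
WS_SMART = (r"(?:" + WS_SPACE_ONLY               # Case 1
            + r"|(?=\W)|\Z"                      # Case 2: before non-word or EOF
            + r"|\A|(?<=\W))")                   # Case 3: at BOF or after non-word

def apply_w(rule: str, ws: str) -> re.Pattern:
    """Replace every whitespace run in `rule` with `ws`, roughly as :w is described."""
    return re.compile(ws.join(rule.split()))

RULE = r"my \$\w+ = \d+ ;"
smart = apply_w(RULE, WS_SMART)
space_only = apply_w(RULE, WS_SPACE_ONLY)

print(bool(smart.fullmatch("my $x=1;")))        # zero-width ws lets this through
print(bool(space_only.fullmatch("my $x=1;")))   # Case 1 alone demands real spaces
```

Under these assumptions, the whitespace-only version rejects compact input such as `my $x=1;`, while the version with the zero-width cases accepts it and still refuses to split a word (`myx` does not match `my \w+`).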

}

# Comment definition for Perl code.
rule comment() {
     # A hash ("#"), then everything through the next newline or EOF.
     <'#'> .*? [ \n | $ ]
}

I factored <'#'> out into <comment_introducer>. We're putting all token characters into their own rules so it's easy for extenders to change them.

Also a good idea, though of course the fact that comment is a rule means that extenders can already do this with a little more work.
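
The effect of factoring out the introducer can be sketched in Python's `re` as an analogy. `COMMENT_INTRODUCER` and `comment_pattern` are hypothetical names for this example, not part of the grammar, and `re.DOTALL` is used so that `.` can cross newlines as it is assumed to in the rule.

```python
import re

# Hypothetical stand-in for <comment_introducer>; an extender overrides only this.
COMMENT_INTRODUCER = r"#"

def comment_pattern(introducer: str = COMMENT_INTRODUCER) -> re.Pattern:
    """Introducer, then lazily everything through the next newline or EOF."""
    return re.compile(introducer + r".*?(?:\n|\Z)", re.DOTALL)

perl_comment = comment_pattern()
slash_comment = comment_pattern(r"//")   # an extender swapping the token

print(repr(perl_comment.match("# hi\nmy $x;").group()))
print(repr(slash_comment.match("// hi\nmy $x;").group()))
```

Overriding the whole comment rule would also work, as noted above, but then the extender has to restate the "through newline or EOF" part as well.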

...

Okay, it's in. I can't say it's correct, since I've never been very good at writing regexes, and this is certainly more like a regex than like a grammar. When I wasn't sure how something worked, I just assumed you did it right.

Ouch -- I hope somebody (Larry?) gives it a once-over. I'm not a regex guru either; I find myself writing them only every year or two. Nowhere near often enough for me to assume I've done it right, anyway.

However, we'd like to eventually make the POD rule less like a match and more like a parse. The POD sections are going to be stored as metadata for the program to grab if it needs to. Right now, it just pretends it's all a comment.

That makes sense. So the plan is to change POD syntax to not require a blank line before each command line? I think that will help a lot. Maybe I'll take a crack at expanding this to more completely parse the POD, while it's still fresh in my mind.

Luke
