Luke Palmer wrote:

[By the way, shouldn't this grammar be called "Perl" rather than
"Perl6::Grammar"?...


Grammars and classes share a namespace, so I think Perl::Grammar is
correct...


I got the name Perl for the grammar from S05, which also gives this example:

   given $source_code {
       $parsetree = m/<Perl.prog>/;
   }


# Whitespace definition for Perl code.
rule ws() {
# Case 1: Unicode space characters, comments, or POD blocks, or
# any combination thereof.
[ \s | «comment» | «pod» ]+


I changed your «comment» and «pod» to <comment> and <pod>. We don't
have a policy yet on what we're capturing and how, so I'm just leaving
all the angle brackets single. Once we decide how our resultant data
structure should look, we can go back and change them.


Good idea.

# Case 2: We're looking at a non-word-constituent or EOF,
# meaning zero-width counts as whitespace.
| <before \W> | $

# Case 3: We must be looking at a word constituent. We match
# whitespace at BOF or after a non-word-constituent.
| ^ | <after \W>



I'm going to kill these last two cases. The rules for where whitespace
is optional are more complex than whether you're on a word constituent
or not. The user of the ws rule is going to know whether whitespace is
optional or required in a particular position, so he can put <ws> or
<ws>? as he needs to. Also, if we're being good little boys, we'll be
putting backtracking colons after our identifier matches, so a <ws> rule
will never show up in the middle of an identifier.


I'm not sure this will work, unless you get rid of the :w's everywhere in this grammar. My understanding of how :w works (from S05) is that it puts <ws> in place of every whitespace sequence in the rule. This means that <ws> has to be smart enough to match the empty string at particular places. These two cases are my take on where those particular places should be for Perl code -- though I may well be missing something!

}
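
To make the :w concern concrete, here is a rough sketch of the three cases using Python's re module as a stand-in for Perl 6 rules. Everything in it is an assumption made for illustration: the names WS_SRC, ws_end and with_w are hypothetical, POD blocks are left out, and joining token patterns with the ws pattern only approximates what S05 says :w does.

```python
import re

# Hypothetical stand-in for the <ws> rule, written as a Python regex.
# POD blocks are omitted; only whitespace and "#" comments are handled.
WS_SRC = (
    r"(?:"
    r"(?:\s|#[^\n]*(?:\n|\Z))+"  # Case 1: spaces and/or comments
    r"|(?=\W)|\Z"                # Case 2: before a non-word char, or at EOF
    r"|\A|(?<=\W)"               # Case 3: at BOF, or after a non-word char
    r")"
)
WS = re.compile(WS_SRC)


def ws_end(text, pos):
    """Return the offset where a ws match starting at pos ends, or None."""
    m = WS.match(text, pos)
    return m.end() if m else None


def with_w(*tokens):
    """Approximate :w by placing the ws pattern between token patterns."""
    return re.compile(WS_SRC.join(tokens))


if __name__ == "__main__":
    # Between two word characters, ws must consume real whitespace.
    print(bool(with_w("if", "x").fullmatch("if x")))   # separated by a space
    print(bool(with_w("if", "x").fullmatch("ifx")))    # no boundary here
    # Next to an operator, the zero-width cases let ws match nothing.
    print(bool(with_w("x", r"\+", "y").fullmatch("x+y")))
```

With cases 2 and 3 removed from WS_SRC, the last pattern would no longer match "x+y", which is the situation I am worried about if :w stays on.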

# Comment definition for Perl code.
rule comment() {
# A hash ("#"), then everything through the next newline or EOF.
<'#'> .*? [ \n | $ ]
}


I factored <'#'> out into <comment_introducer>. We're putting all token
characters into their own rules so it's easy for extenders to change
them.


Also a good idea, though of course the fact that comment is a rule means that extenders can already do this with a little more work.

...


Okay, it's in. I can't say it's correct, since I've never been very
good at writing regexes, and this is certainly more like a regex than
like a grammar. When I wasn't sure how something worked, I just assumed
you did it right.


Ouch -- I hope somebody (Larry?) gives it a once-over. I'm not a regex guru either; I find myself writing them only every year or two, which is nowhere near often enough for me to assume I've done it right.

However, we'd like to eventually make the POD rule less like a match and
more like a parse. The POD sections are going to be stored as metadata
for the program to grab if it needs to. Right now, it just pretends
it's all a comment.


That makes sense. So the plan is to change POD syntax to not require a blank line before each command line? I think that will help a lot. Maybe I'll take a crack at expanding this to more completely parse the POD, while it's still fresh in my mind.

Luke
