Re: Let the hacking commence!

2005-01-08 Thread Luke Blanshard
Luke Palmer wrote:
> > [By the way, shouldn't this grammar be called "Perl" rather than
> > "Perl6::Grammar"?...
>
> Grammars and classes share a namespace, so I think Perl::Grammar is
> correct...

I got the name Perl for the grammar from S05, which also gives this example:
    given $source_code {
        $parsetree = m//;
    }

> > # Whitespace definition for Perl code.
> > rule ws() {
> >   # Case 1: Unicode space characters, comments, or POD blocks, or
> >   # any combination thereof.
> >     [ \s | «comment» | «pod» ]+
>
> I changed your «comment» and «pod» to <comment> and <pod>.  We don't
> have a policy yet on what we're capturing and how, so I'm just leaving
> all the angle brackets single.  Once we decide how our resultant data
> structure should look, we can go back and change them.

Good idea.

> >   # Case 2: We're looking at a non-word-constituent or EOF,
> >   # meaning zero-width counts as whitespace.
> >   |  | $
> >   # Case 3: We must be looking at a word constituent.  We match
> >   # whitespace at BOF or after a non-word-constituent.
> >   | ^ | 
>
> I'm going to kill these last two cases.  The rules for where whitespace
> is optional are more complex than whether you're on a word constituent
> or not.  The user of the ws rule is going to know whether whitespace is
> optional or required in a particular position, so he can put <ws> or
> <ws>? as he needs to.  Also, if we're being good little boys, we'll be
> putting backtracking colons after our identifier matches, so a <ws> rule
> will never show up in the middle of an identifier.

I'm not sure this will work, unless you get rid of the :w's everywhere
in this grammar.  My understanding of how :w works (from S05) is that it
puts <ws> in place of every whitespace sequence in the rule.  This means
that <ws> has to be smart enough to match the empty string at particular
places.  These two cases are my take on where those particular places
should be for Perl code -- though I may well be missing something!
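
To make the zero-width cases concrete, here is a rough Python sketch of the three cases as the rule's comments describe them. It is an illustration only: the names WS and ws_end are hypothetical, Python's re module stands in for Perl 6 rules, and POD blocks are left out.

```python
import re

# Hypothetical stand-in for the ws rule; POD blocks are omitted.
WS = re.compile(
    r"(?:\s|#[^\n]*(?:\n|\Z))+"   # Case 1: spaces and '#' comments, one or more
    r"|(?=\W)|\Z"                 # Case 2: next char is a non-word char, or EOF
    r"|\A|(?<=\W)"                # Case 3: at BOF, or just after a non-word char
)

def ws_end(text, pos):
    """Return where ws stops if it matches at pos, or None if it cannot match."""
    m = WS.match(text, pos)
    return None if m is None else m.end()
```

With this sketch, ws_end("a+b", 1) succeeds without consuming anything, while ws_end("ab", 1) fails because the position is inside an identifier. That is the behaviour :w would rely on if it substitutes the ws rule for every whitespace sequence.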

> > }
> > # Comment definition for Perl code.
> > rule comment() {
> >   # A hash ("#"), then everything through the next newline or EOF.
> >     <'#'> .*? [ \n | $ ]
> > }
>
> I factored <'#'> out into .  We're putting all token
> characters into their own rules so it's easy for extenders to change
> them.

Also a good idea, though of course the fact that comment is a rule means 
that extenders can already do this with a little more work.

...
   

> Okay, it's in.  I can't say it's correct, since I've never been very
> good at writing regexes, and this is certainly more like a regex than
> like a grammar.  When I wasn't sure how something worked, I just assumed
> you did it right.

Ouch -- I hope somebody (Larry?) gives it a once-over.  I'm not a regex 
guru either, I find myself writing them every year or two.  Nowhere near 
often enough for me to assume I've done it right, anyway.

> However, we'd like to eventually make the POD rule less like a match and
> more like a parse.  The POD sections are going to be stored as metadata
> for the program to grab if it needs to.  Right now, it just pretends
> it's all a comment.

That makes sense.  So the plan is to change POD syntax to not require a 
blank line before each command line?  I think that will help a lot.  
Maybe I'll take a crack at expanding this to more completely parse the 
POD, while it's still fresh in my mind.

Luke


Re: Let the hacking commence!

2005-01-08 Thread Luke Palmer
Luke Blanshard writes:
> Luke Palmer wrote:
> >This list is for people interested in building the Perl 6 compiler.  Now
> >you have your first real task!  
> >
> >We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
> >so the task seems better done incrementally by the community...
> >
> >Send patches to this list.
> 
> OK, I'll bite.  In contrast to Luke's 50-thousand-foot level, I'm
> diving down into the goriest of details.  At the end of this message
> is a rule for whitespace within Perl code, and supporting rules for
> comments and pod.

Excellent! 

> I'm not posting this as a diff, because I have the faint suspicion
> that others might have been hacking on this file offline.  But I
> gather these rules should go in the "TOKENS" section.

Well, you might think that, but it wouldn't be true.  I haven't touched
it since I sent it out.  I *should* have, so your suspicion is
well-founded. :-)

> [By the way, shouldn't this grammar be called "Perl" rather than
> "Perl6::Grammar"?  Also, is this file now available in some repository
> somewhere?]

Grammars and classes share a namespace, so I think Perl::Grammar is
correct.  That's what it's currently called in the repository, which is
at:

https://svn.perl.org/perl6/grammar/trunk

> I'd like reviewers to pay special attention to the pod stuff.  It's
> not clear to me what the precise rules are or should be for blank
> lines preceding pod commands.  I got from S02 the idea that we should
> allow standalone =begin/=end sections (and that they should nest).
> But does the =end line have to be preceded by a blank line?

Nope.

> As far as I can tell, the =begin line does not.  In the interest of
> symmetry, I have written the rules to not require a blank line before
> the closing =end either.  Even though this appears to violate the
> usual rules for pod.

Yeah, we're changing the blank line requirement, mostly to make
bulleted lists take up less vertical space.

> (Another guy called) Luke
> 
> 
> 
> 
> # Whitespace definition for Perl code.
> rule ws() {
>   # Case 1: Unicode space characters, comments, or POD blocks, or
>   # any combination thereof.
> [ \s | «comment» | «pod» ]+

I changed your «comment» and «pod» to <comment> and <pod>.  We don't
have a policy yet on what we're capturing and how, so I'm just leaving
all the angle brackets single.  Once we decide how our resultant data
structure should look, we can go back and change them.

> 
>   # Case 2: We're looking at a non-word-constituent or EOF,
>   # meaning zero-width counts as whitespace.
>   |  | $
> 
>   # Case 3: We must be looking at a word constituent.  We match
>   # whitespace at BOF or after a non-word-constituent.
>   | ^ | 

I'm going to kill these last two cases.  The rules for where whitespace
is optional are more complex than whether you're on a word constituent
or not.  The user of the ws rule is going to know whether whitespace is
optional or required in a particular position, so he can put <ws> or
<ws>? as he needs to.  Also, if we're being good little boys, we'll be
putting backtracking colons after our identifier matches, so a <ws> rule
will never show up in the middle of an identifier.

> }
> 
> # Comment definition for Perl code.
> rule comment() {
>   # A hash ("#"), then everything through the next newline or EOF.
> <'#'> .*? [ \n | $ ]
> }

I factored <'#'> out into .  We're putting all token
characters into their own rules so it's easy for extenders to change
them.

> # A POD block, as extended for P6.  This is a =begin/=end pair, a =for
> # paragraph, or a standard =/=cut block.
> rule pod() {
>   # Case 1: a =begin/=end block, in its own rule so it can
>   # recurse.
> «pod_begin_end_block»
> 
>   # Case 2: a =for paragraph.  "=for" at BOL, plus any space
>   # character, starts it, and the first blank line (or EOF) ends
>   # it.
>   | ^^=for \s :: .*? [ \n \h* \n | $ ]
> 
>   # Case 3: any arbitrary POD block.  Starts with "=" at BOL,
>   # followed by a letter, ends with "=cut" at BOL or at EOF.
>   | ^^=<+> :: .*? [ \n =cut [ \s | $ ] | $ ]
> }

Factored = out into 

> # A (recursive) =begin/=end POD block.
> rule pod_begin_end_block() {
>   # Starts with "=begin" at BOL, followed by an optional name
>   # which we save to match with the corresponding "=end".
> ^^=begin [ \h+ $ := (\S+) | \h* \n ]
> 
>   # Next comes any number of single characters or nested =begin/
>   # =end blocks -- but the smallest number that will match...
> [ . | «pod_begin_end_block» ]*?

Reversed as you requested, and added an alternative between them that
speeds things up.  That's right, I'm preprematurely optimizing.

> 
>   # ...an "=end" at BOL followed by the name saved above, or
>   # followed by nothing if there wasn't one.  If we make it to EOF
>   # without finding the "=end" line, we blow up.
> [
>   ^^=end [ <( $ )> :: \h+ $ |  ] \h

Re: Let the hacking commence!

2005-01-08 Thread Luke Blanshard
Luke Blanshard wrote:
>   # Next comes any number of single characters or nested =begin/
>   # =end blocks -- but the smallest number that will match...
>     [ . | «pod_begin_end_block» ]*?

Actually I think that alternation needs to be in the other order,
doesn't it?  (This is within rule pod_begin_end_block.)


Re: Let the hacking commence!

2005-01-08 Thread Luke Blanshard
Luke Palmer wrote:
> This list is for people interested in building the Perl 6 compiler.  Now
> you have your first real task!
>
> We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
> so the task seems better done incrementally by the community...
>
> Send patches to this list.

OK, I'll bite.  In contrast to Luke's 50-thousand-foot level, I'm
diving down into the goriest of details.  At the end of this message
is a rule for whitespace within Perl code, and supporting rules for
comments and pod.

I'm not posting this as a diff, because I have the faint suspicion
that others might have been hacking on this file offline.  But I
gather these rules should go in the "TOKENS" section.

[By the way, shouldn't this grammar be called "Perl" rather than
"Perl6::Grammar"?  Also, is this file now available in some repository
somewhere?]

I'd like reviewers to pay special attention to the pod stuff.  It's
not clear to me what the precise rules are or should be for blank
lines preceding pod commands.  I got from S02 the idea that we should
allow standalone =begin/=end sections (and that they should nest).
But does the =end line have to be preceded by a blank line?  As far as
I can tell, the =begin line does not.  In the interest of symmetry, I
have written the rules to not require a blank line before the closing
=end either.  Even though this appears to violate the usual rules for
pod.

(Another guy called) Luke

# Whitespace definition for Perl code.
rule ws() {
  # Case 1: Unicode space characters, comments, or POD blocks, or
  # any combination thereof.
    [ \s | «comment» | «pod» ]+

  # Case 2: We're looking at a non-word-constituent or EOF,
  # meaning zero-width counts as whitespace.
  |  | $

  # Case 3: We must be looking at a word constituent.  We match
  # whitespace at BOF or after a non-word-constituent.
  | ^ | 
}

# Comment definition for Perl code.
rule comment() {
  # A hash ("#"), then everything through the next newline or EOF.
    <'#'> .*? [ \n | $ ]
}

# A POD block, as extended for P6.  This is a =begin/=end pair, a =for
# paragraph, or a standard =/=cut block.
rule pod() {
  # Case 1: a =begin/=end block, in its own rule so it can
  # recurse.
    «pod_begin_end_block»

  # Case 2: a =for paragraph.  "=for" at BOL, plus any space
  # character, starts it, and the first blank line (or EOF) ends
  # it.
  | ^^=for \s :: .*? [ \n \h* \n | $ ]

  # Case 3: any arbitrary POD block.  Starts with "=" at BOL,
  # followed by a letter, ends with "=cut" at BOL or at EOF.
  | ^^=<+> :: .*? [ \n =cut [ \s | $ ] | $ ]
}

# A (recursive) =begin/=end POD block.
rule pod_begin_end_block() {
  # Starts with "=begin" at BOL, followed by an optional name
  # which we save to match with the corresponding "=end".
    ^^=begin [ \h+ $ := (\S+) | \h* \n ]

  # Next comes any number of single characters or nested =begin/
  # =end blocks -- but the smallest number that will match...
    [ . | «pod_begin_end_block» ]*?

  # ...an "=end" at BOL followed by the name saved above, or
  # followed by nothing if there wasn't one.  If we make it to EOF
  # without finding the "=end" line, we blow up.
    [
      ^^=end [ <( $ )> :: \h+ $ |  ] \h* [ \n | $ ]
    |
      $  { fail "Unterminated =begin/=end block" }
    ]
}
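
For readers who want to see the nesting logic outside rule syntax, here is a minimal line-based sketch in Python of what pod_begin_end_block is meant to do. The function names directive and skip_block are hypothetical, and it simplifies the rule above: it works on whole lines, ignores anything after the name on a directive line, and treats an =end with a different name as ordinary text.

```python
def directive(line):
    """Return (command, name) for a line starting with =begin or =end, else None."""
    words = line.split()
    if line[:1] == "=" and words and words[0] in ("=begin", "=end"):
        name = words[1] if len(words) > 1 else None
        return words[0], name
    return None

def skip_block(lines, start=0):
    """Return the index just past the =begin/=end block opening at lines[start]."""
    opener = directive(lines[start])
    if opener is None or opener[0] != "=begin":
        raise ValueError("block must start with =begin")
    name = opener[1]
    i = start + 1
    while i < len(lines):
        found = directive(lines[i])
        if found == ("=end", name):
            return i + 1                  # matching =end closes this block
        if found is not None and found[0] == "=begin":
            i = skip_block(lines, i)      # nested block: recurse past it
        else:
            i += 1                        # ordinary line inside the block
    raise SyntaxError("Unterminated =begin/=end block")
```

Checking for a nested =begin before falling back to "any other line" mirrors the ordering question raised elsewhere in this thread: if the catch-all alternative comes first, the nested block is never recognized as a unit.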


Re: Let the hacking commence!

2004-12-22 Thread Luke Palmer
Luke Palmer writes:
> Also, don't use rule parameters, conditionals, or code blocks.  Those
> things require us to know Perl 6 before we're done defining Perl 6.
> Keep it essentially BNF with Perl 6 syntax (it's okay to use groups and
> quantifiers though, since those can always be converted to formal BNF).

Change in plan on this one:  We're going to shoot for the grammar as
Perl will see it, and then factor it down into a bootstrappable grammar
later.  So whatever hook hashes we're going to use should be in there
(though they may not be named properly).  Basically, pretend Perl 6
exists and write it for Perl 6.

Luke


Re: Let the hacking commence!

2004-12-21 Thread Luke Palmer
Patrick R. Michaud writes:
> > rule identifier() { <> \w* }
> 
> Does Perl 6 allow leading underscores in identifiers?  If so,
> shouldn't this be
> 
> rule identifier() { <++[_]> \w* }
> 
> ?

Yeah, it should. There was an error anyway:

rule identifier() { <+> \w* }

Fixed.
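
As a rough cross-check of the intent discussed here (a letter or underscore first, then any word characters), the same shape can be written as a Python regular expression. This is only an analogy: IDENTIFIER and is_identifier are hypothetical names, and Python's \w and \d classes are assumed to be close enough to the Perl 6 ones for illustration.

```python
import re

# Hypothetical Python rendering: roughly a letter or underscore,
# followed by any number of word characters.
IDENTIFIER = re.compile(r"[^\W\d]\w*")

def is_identifier(text):
    """True if the whole string is a single identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this sketch "_foo" and "foo_1" are accepted, while "1foo" is rejected because a digit cannot start an identifier.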

> 
> > rule open_expression_grouping() { \( }
> > rule close_expression_grouping() { \) }
> > rule open_argument_list() { \( }
> > rule close_argument_list() { \) }
> 
> I'm not sure I agree with expression_grouping being defined in this
> way-- it seems to me that parens (and brackets and braces and dots)
> are being treated as operators (S03, S04), perhaps even
> "postcircumfix" operators if I understand what that means (A12).  So
> we need to be a bit careful here.

Parens are plain old "circumfix".  We could stick that into the
operator-precedence parser, and in fact they probably belong there.  But
the grammar is supposed to be so extensible that if we try to define
things in terms of hooks from the beginning, we'll never get anywhere.

You're right about the argument_list forms.  Keep in mind that these are
just the token definitions.  The rules for using them are up higher.
Again, my reasoning for including them was the same: we have to include
something.  And I figure it's easier to take stuff out than to put stuff
in.

> 
> In addition to reviewing what's been done so far, I'll take a stab
> at writing the rules for P6 rules.  :-)

Eexcellent.

Luke


Re: Let the hacking commence!

2004-12-20 Thread Patrick R. Michaud
A few initial questions/comments on some small things -- I'll get
to the bigger constructs a bit later.  I'm an "outside-in" designer,
so I tend to work on the macro and micro levels until I meet in the
middle.

> rule identifier() { <> \w* }

Does Perl 6 allow leading underscores in identifiers?  If so,
shouldn't this be

rule identifier() { <++[_]> \w* }

?


> rule open_expression_grouping() { \( }
> rule close_expression_grouping() { \) }
> rule open_argument_list() { \( }
> rule close_argument_list() { \) }

I'm not sure I agree with expression_grouping being defined in this way--
it seems to me that parens (and brackets and braces and dots) are being 
treated as operators (S03, S04), perhaps even "postcircumfix" operators 
if I understand what that means (A12).  So we need to be a bit careful
here.

In addition to reviewing what's been done so far, I'll take a stab
at writing the rules for P6 rules.  :-)

Pm


Re: Let the hacking commence!

2004-12-20 Thread Patrick R. Michaud
On Mon, Dec 20, 2004 at 03:32:31PM -0700, Luke Palmer wrote:
> We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
> so the task seems better done incrementally by the community.  The
> current version can be seen temporarily at
> http://luqui.org/perl6/Grammar.perl6 until the svn repository is fully
> set up.  

And the svn repository is now set up, at https://svn.perl.org/perl6.
The grammar itself lives in grammar/trunk/Grammar.perl6, although this
and the overall repository structure will certainly change over time.
At least we're using subversion from the outset so that this will
hopefully be easier.

I agree with Luke that our initial goal is to just get the rules written,
and to look for complete coverage of the language.  Of course, those
of you following perl6-language know that Perl 6 is continuing to
evolve, so we're going to be following along as best we can.  When 
there's a doubt about how to do something, we'll follow whatever is 
written in the most recent Synopsis/Apocalypse/whatever, and if it's
a major language design issue we'll kick it back to perl6-language for
discussion.

Also, as most of you know, Perl 6 is designed so that the grammar can
be modified/extended from within Perl itself, so we need to make sure
the grammar is understandable as well as workable.  (Usually these two
goals work together anyway.)  So, while I highly encourage any and
all contributions to the grammar, some "code-only" patches may be
delayed until we have suitable descriptions to go along with it.
And if something isn't clear to someone, please say so.

In particular, consistent with good programming practice, we want the
names of our rules and other constructs to be chosen carefully and
consistent with the terms found in the language description.  So, this
is one of our reasons to go with small careful steps at first, to
make sure the nouns and verbs we use in the grammar match the
nouns and verbs we want to use when describing the language elsewhere.

We're still working out some of the details of parsing, including
operator precedence.  As a result some things may be handwavy at first 
-- that's normal.  We may even have a few false starts here and there, 
and that's okay too.  (After all, there's been a lot of false starts 
on p6l, so we're allowed a few in p6c also. :-)  

Most of all, by building the grammar publicly and in small steps, and
then doing the same with the compiler, I'm hoping to increase the
number of people who can help with building and maintaining the p6
compiler, as well as provide a path for others to follow along later.

And so, on to the rules!

Pm


Let the hacking commence!

2004-12-20 Thread Luke Palmer
This list is for people interested in building the Perl 6 compiler.  Now
you have your first real task!  

We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
so the task seems better done incrementally by the community.  The
current version can be seen temporarily at
http://luqui.org/perl6/Grammar.perl6 until the svn repository is fully
set up.  I've also attached the initial revision to this message.

It's written in a top-down fashion so far (because that's how my brain
works with grammars), but feel free to work on it bottom-up, left-right
(though recall that we have to make an LL grammar :-), inside-out,
whatever.  Let's just get rules written.  Also, document your work as
much as possible.  We won't be accepting new rules without explanation
unless they are *really* trivial.

Send patches to this list.

Patrick will shortly write an introduction explaining our larger design
goals.  I'll just be focusing on the more technical stuff.  With that in
mind, here are a couple of technical notes:

This grammar should be suitable for both the bootstrap and the main
language.  That means that if there's something that should go in one
but not the other, it shouldn't go into this grammar.

Also, don't use rule parameters, conditionals, or code blocks.  Those
things require us to know Perl 6 before we're done defining Perl 6.
Keep it essentially BNF with Perl 6 syntax (it's okay to use groups and
quantifiers though, since those can always be converted to formal BNF).

Keep the grammar LL(k), which means no left recursion.  If you need to
write a rule left-recursively, you probably need to change that into a
repeat quantifier one level higher.
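
To illustrate the rewrite, take a toy grammar (not part of the Perl 6 grammar) whose natural form is left-recursive: sum := sum ('+' | '-') number | number. A recursive-descent parser would loop forever on that, so it becomes sum := number [ ('+' | '-') number ]*. The Python sketch below, with hypothetical names tokenize_sum and parse_sum, shows the repeat-quantifier form and how it still builds a left-associative tree.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+])")

def tokenize_sum(text):
    """Split text into integer and +/- tokens (toy lexer for this example)."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_sum(tokens):
    """sum := number [ ('+' | '-') number ]*   (no left recursion)."""
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a number")
        value = int(tokens[pos])
        pos += 1
        return value

    tree = number()
    # The loop plays the role of the repeat quantifier; folding each new
    # operand into the tree on the left keeps the result left-associative.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, number())
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

For example, parse_sum(tokenize_sum("1 - 2 - 3")) groups as (1 - 2) - 3, the same shape the left-recursive rule would have produced.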

We're going to be sandwiching an operator-precedence parser in between
two recursive descents to achieve a balance of speed and dynamism.  That
means that we don't need to worry about that huge list of operators.  We
only need to worry about the ones that take non-standard operands, like
==> or ~~.  The others will be encoded in a computer-readable table
somewhere.
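
A small sketch of the idea, in Python rather than Perl 6: a table-driven operator-precedence loop (precedence climbing) sits between a recursive-descent caller and a recursive-descent term parser. Everything here is assumed for illustration, including the operator table OPERATORS, its precedence numbers, and the names tokenize_expr, ExprParser and parse_expr; it is not the planned Perl 6 design.

```python
import re

# Hypothetical operator table: token -> (precedence, associativity).
OPERATORS = {
    "+": (10, "left"),
    "-": (10, "left"),
    "*": (20, "left"),
    "/": (20, "left"),
    "**": (30, "right"),
}

TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/()])")

def tokenize_expr(text):
    """Split text into number, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def term(self):
        """Recursive-descent layer: numbers and parenthesized expressions."""
        tok = self.advance()
        if tok == "(":
            tree = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expression(self, min_prec):
        """Operator-precedence layer, driven entirely by the OPERATORS table."""
        left = self.term()
        while True:
            op = self.peek()
            if op not in OPERATORS:
                break
            prec, assoc = OPERATORS[op]
            if prec < min_prec:
                break
            self.advance()
            # Left-associative operators must not re-admit their own level.
            next_min = prec + 1 if assoc == "left" else prec
            left = (op, left, self.expression(next_min))
        return left

def parse_expr(text):
    parser = ExprParser(tokenize_expr(text))
    tree = parser.expression(0)
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Adding an ordinary binary operator in this sketch means adding one table row (plus a token pattern), which is the property the paragraph above is after; only operators with unusual operands would need hand-written rules.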

Most of all, hack liberally!  Don't be afraid to change stuff around.
These patches are easy to write, and they can be as small or as large as
you feel necessary.  It's also helpful to make suggestions even if you
don't have a patch.

Have fun,
Luke

grammar Perl::Grammar;

## MAIN LANGUAGE
# Everything down until 

rule program() {  }

# a statement_list usually is enclosed in curlies, except for in 
# the main program
rule statement_list() :w { 
?
  |  ?
  |  [   ]?
}

# sub, method, class, grammar, etc.
rule declaration() {...}

# if, while, loop, etc.
rule statement_construct() {...}

## OPERATOR PRECEDENCE PARSER
# The only operators that need attention here are those that take "nonstandard"
# operands; i.e. ==> which takes a sub call, or ~~ which takes a lot of
# different things.  Operators whose operands are other expressions will be
# handled automatically.  It might be useful to inline an operator precedence
# table here.

## LEXER STUFF
# Definition of  down to basic syntax, as well as basic token
# names.  Don't inline tokens above; instead, name them here and refer
# to them with rules. 

# <term_hook> is called upon when expecting a term.
rule term_hook() {  }

rule term() :w { 
  
  | 
  | 
}

# <operator_hook> is called upon when expecting an operator.
rule operator_hook() {# notice no :w

  | ? 
}

# Handle the equivalence of $foo{bar} and $foo .{bar}, but not
# $foo {bar}.
rule dot_subscript() {# again, no :w

  | ?  
}

# Anything nonalphabetic that comes after a dot, like <>, {}, [], etc.
rule subscript_non_method() {...}

# Anything at all that comes after a dot.
# XXX: Still need to handle listop method calls: $foo.bar: 1,2,3;
rule subscript() {
 ?
  | 
}

rule enclosed_argument_list() {
  
}

# The inside of the argument list of a sub call, not including parentheses.
# Also, if it fits, the end of a listop, which ought to be the same thing
# without the parentheses.
rule argument_list() {...}

# XXX still need to handle symbolics: $Foo::(expression)::bar
rule qualified_identifier() {
[  <'::'> ]* 
}

# This is intentionally not defined.  This will be defined dynamically,
# and in different ways for the bootstrap and the main grammar.
rule standard_operator() {...}

## TOKENS
rule identifier() { <> \w* }
rule statement_separator() { ; }
rule open_expression_grouping() { \( }
rule close_expression_grouping() { \) }
rule open_argument_list() { \( }
rule close_argument_list() { \) }
rule dot() { \. }

## RULES
# This is the metagrammar for Perl 6 rules, from <pattern> down to
# basic syntax.

# <pattern> is the inside of a rule, excluding the initial slash or
# braces or whatever the delimiter is.
# XXX: How do we handle varying delimiters?
rule pattern() { * }

# Any piece of a pattern: atoms, groups, quantified components, code blocks,
# etc.
rule pattern_component() {...}

## TYPE SYSTEM
# Definition of the type s