Let the hacking commence!

Luke Palmer Mon, 20 Dec 2004 14:20:01 -0800

This list is for people interested in building the Perl 6 compiler.  Now
you have your first real task!


We have to make a formal grammar for Perl 6.  Perl 6 is a huge language,
so the task seems better done incrementally by the community.  The
current version can be seen temporarily at
http://luqui.org/perl6/Grammar.perl6 until the svn repository is fully
set up.  I've also attached the initial revision to this message.

It's written in a top-down fashion so far (because that's how my brain
works with grammars), but feel free to work on it bottom-up, left-right
(though recall that we have to make an LL grammar :-), inside-out,
whatever.  Let's just get rules written.  Also, document your work as
much as possible.  We won't be accepting new rules without explanation
unless they are *really* trivial.

Send patches to this list.

Patrick will shortly write an introduction explaining our larger design
goals.  I'll just be focusing on the more technical stuff.  With that in
mind, here are a couple of technical notes:

This grammar should be suitable for both the bootstrap and the main
language.  That means that if there's something that should go in one
but not the other, it shouldn't go into this grammar.

Also, don't use rule parameters, conditionals, or code blocks.  Those
things require us to know Perl 6 before we're done defining Perl 6.
Keep it essentially BNF with Perl 6 syntax (it's okay to use groups and
quantifiers though, since those can always be converted to formal BNF).
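To illustrate why groups and quantifiers are harmless, here is a small sketch in Python (nothing can run the grammar notation itself yet). It rewrites a rule of the shape "before [ repeated ]* after" into plain BNF alternatives by introducing one helper nonterminal. The function name, the list-of-lists representation, and the "_rest" naming scheme are all assumptions made up for this example.

```python
def expand_star(name, before, repeated, after):
    """Rewrite  name := before [ repeated ]* after  as plain BNF.

    Each rule maps to a list of alternatives; each alternative is a list
    of symbols, and the empty list stands for the empty alternative.
    """
    helper = f"{name}_rest"  # hypothetical name for the new nonterminal
    return {
        name: [before + [helper] + after],
        # helper := repeated helper | <empty>   (right-recursive, not left)
        helper: [repeated + [helper], []],
    }


# The shape of  [ <identifier> <'::'> ]* <identifier>
bnf = expand_star("qualified_identifier", [], ["identifier", "'::'"], ["identifier"])
```

The helper rule recurses on its right end only, so the conversion does not introduce left recursion.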

Keep the grammar LL(k), which among other things means no left
recursion.  If you find yourself writing a rule left-recursively, you
probably need to turn it into a repeat quantifier one level higher.
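As a rough illustration of that rewrite, sketched in Python: a left-recursive rule such as "sum := sum '+' term | term" would make a recursive-descent parser call itself without consuming any input, while the equivalent "sum := term [ '+' term ]*" turns into an ordinary loop. The sum and term rules and the function names are invented for the example and are not part of the grammar.

```python
# Hypothetical rules:  sum := term [ '+' term ]*     (repeat quantifier)
# instead of:          sum := sum '+' term | term    (left-recursive)

def parse_term(tokens, pos):
    """term := NUMBER.  Returns (value, next position)."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected a number at token {pos}")
    return int(tokens[pos]), pos + 1


def parse_sum(tokens, pos=0):
    """sum := term [ '+' term ]*, with the repetition written as a loop."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        # Fold to the left, giving the same tree shape the
        # left-recursive rule would have described.
        tree = ("+", tree, right)
    return tree, pos
```

Calling parse_sum("1 + 2 + 3".split()) builds a left-nested tree, so the left associativity is kept even though the rule is no longer left-recursive.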

We're going to be sandwiching an operator-precedence parser in between
two recursive descents to achieve a balance of speed and dynamism.  That
means that we don't need to worry about that huge list of operators.  We
only need to worry about the ones that take non-standard operands, like
==> or ~~.  The others will be encoded in a computer-readable table
somewhere.
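A minimal sketch of such a sandwich, in Python. It uses precedence climbing, which is one common way to write an operator-precedence parser and not necessarily the algorithm the compiler will use. The operator table, its binding powers, and every function name here are assumptions for illustration only.

```python
import re

# Hypothetical operator table: operator -> (binding power, right-associative?).
# This stands in for the computer-readable table mentioned above.
OPERATORS = {"+": (10, False), "-": (10, False),
             "*": (20, False), "/": (20, False),
             "**": (30, True)}


def tokenize(text):
    # Toy tokenizer: anything it does not recognize is silently skipped.
    return re.findall(r"\d+|\*\*|[-+*/()]", text)


def parse_statement(tokens):
    """Outer recursive descent: here a statement is just one expression."""
    tree, pos = parse_expression(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def parse_expression(tokens, pos, min_power):
    """Operator-precedence layer, driven entirely by the OPERATORS table."""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        op = tokens[pos]
        power, right_assoc = OPERATORS[op]
        if power < min_power:
            break
        # A left-associative operator must not swallow an equal-power
        # operator to its right; a right-associative one may.
        next_min = power if right_assoc else power + 1
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse_term(tokens, pos):
    """Inner recursive descent: term := NUMBER | '(' expression ')'."""
    if pos >= len(tokens):
        raise SyntaxError("expected a term, found end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expression(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"expected a term, found {tok!r}")
```

Adding an ordinary binary operator in this sketch means adding one table entry; none of the parsing functions change, which is the point of keeping the operator list out of the grammar.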

Most of all, hack liberally!  Don't be afraid to change stuff around.
These patches are easy to write, and they can be as small or as large as
you feel necessary.  It's also helpful to make suggestions even if you
don't have a patch.

Have fun,
Luke

grammar Perl::Grammar;

## MAIN LANGUAGE
# Everything down until <expression>

rule program() { <statement_list> }

# A statement_list is usually enclosed in curlies, except in
# the main program.
rule statement_list() :w { 
        <declaration>         <statement_list>?
      | <statement_construct> <statement_list>?
      | <expression> [ <statement_separator> <statement_list> ]?
}

# sub, method, class, grammar, etc.
rule declaration() {...}

# if, while, loop, etc.
rule statement_construct() {...}

## OPERATOR PRECEDENCE PARSER
# The only operators that need attention here are those that take "nonstandard"
# operands; i.e. ==> which takes a sub call, or ~~ which takes a lot of
# different things.  Operators whose operands are other expressions will be
# handled automatically.  It might be useful to inline an operator precedence
# table here.

## LEXER STUFF
# Definition of <term> down to basic syntax, as well as basic token
# names.  Don't inline tokens above; instead, name them here and refer
# to them with rules. 

# <term_hook> is called upon when expecting a term.
rule term_hook() { <term> }

rule term() :w { 
        <open_expression_grouping> <expression> <close_expression_grouping>
      | <variable>
      | <literal>
}

# <operator_hook> is called upon when expecting an operator.
rule operator_hook() {    # notice no :w
        <dot_subscript>
      | <ws>? <standard_operator>
}

# Handle the equivalence of $foo{bar} and $foo .{bar}, but not
# $foo {bar}.
rule dot_subscript() {    # again, no :w
        <subscript_non_method>
      | <ws>? <dot> <subscript>
}

# Anything nonalphabetic that comes after a dot, like <>, {}, [], etc.
rule subscript_non_method() {...}

# Anything at all that comes after a dot.
# XXX: Still need to handle listop method calls: $foo.bar: 1,2,3;
rule subscript() {
        <qualified_identifier> <enclosed_argument_list>?
      | <subscript_non_method>
}

rule enclosed_argument_list() {
        <open_argument_list> <argument_list> <close_argument_list>
}

# The inside of the argument list of a sub call, not including parentheses.
# Also, if it fits, the end of a listop, which ought to be the same thing
# without the parentheses.
rule argument_list() {...}

# XXX still need to handle symbolics: $Foo::(expression)::bar
rule qualified_identifier() {
        [ <identifier> <'::'> ]* <identifier>
}

# This is intentionally not defined.  This will be defined dynamically,
# and in different ways for the bootstrap and the main grammar.
rule standard_operator() {...}

## TOKENS
rule identifier() { <<alpha>> \w* }
rule statement_separator() { ; }
rule open_expression_grouping() { \( }
rule close_expression_grouping() { \) }
rule open_argument_list() { \( }
rule close_argument_list() { \) }
rule dot() { \. }

## RULES
# This is the metagrammar for Perl 6 rules, from <pattern> down to
# basic syntax.

# <pattern> is the inside of a rule, excluding the initial slash or
# braces or whatever the delimiter is.
# XXX: How do we handle varying delimiters?
rule pattern() { <pattern_component>* }

# Any piece of a pattern: atoms, groups, quantified components, code blocks,
# etc.
rule pattern_component() {...}

## TYPE SYSTEM
# Definition of the type syntax, from <type> down to tokens.  This is too "low"
# for signatures/siglets, which ought to be up in the main language section.

# Any type: things that come between <my> and the variable, things that
# introduce a parameter in a signature, etc.  Use <qualified_identifier> for
# now in place of a previously declared type name.
rule type() {...}
