I'm working on a Perl 5 module that parses a Perl 6 rule into a tree structure. Specifically, I'm subclassing Regexp::Parser to create Perl6::Rule::Parser. The module is designed ONLY to PARSE the contents of a rule; it is concerned with the syntax of the new things Perl 6 rules will offer, not with their implementation. Once it is done, I'll work on a slightly broader module that handles the exterior of the rule (the m:xyz:abc('def')/.../ part, rather than the contents of the rule itself).

To do this effectively, I need an exhaustive list of all tokens that can appear in a Perl 6 rule. By "token", I mean a single unit of purpose, such as ^^ and <after ...> and **{3..6}. I have looked through the latest revisions of Apo05 and Syn05 (from Dec 2004) and come up with the following list:

  http://japhy.perlmonk.org/perl6/rules.txt

The list is split up by leading character. I think it's complete, but I'm probably wrong, which is why I need more eyes to look over it and tell me what I've missed.
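
To make "token" and "split up by leading character" concrete, here is a rough sketch of the kind of dispatch I have in mind. It is written in Python purely for illustration (the real module is Perl 5 on top of Regexp::Parser), and the table, the token names, and the tokenize() function are all made up for this example. It covers only the three tokens mentioned above and treats everything else as a literal.

```python
import re

# Hypothetical, deliberately incomplete table: leading character -> list of
# (token name, pattern) candidates, tried in order, longer forms first.
TOKENS_BY_LEAD = {
    "^": [("line_start", re.compile(r"\^\^")),
          ("string_start", re.compile(r"\^"))],
    # Simplification: does not handle nested <...> inside the assertion.
    "<": [("after", re.compile(r"<after\s+[^>]*>"))],
    "*": [("repeat_range", re.compile(r"\*\*\{\d+\.\.\d+\}")),
          ("star", re.compile(r"\*"))],
}

def tokenize(rule):
    """Split a rule body into (name, text) pairs, one per token."""
    pos = 0
    tokens = []
    while pos < len(rule):
        lead = rule[pos]
        for name, pattern in TOKENS_BY_LEAD.get(lead, []):
            match = pattern.match(rule, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No candidate for this leading character: treat it as a literal.
            tokens.append(("literal", lead))
            pos += 1
    return tokens

if __name__ == "__main__":
    for token in tokenize("^^ab**{3..6}"):
        print(token)
```

The point is only that each leading character selects a small set of candidate tokens, which is why the list is organized that way.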

I just got an email back from Damian which will help me move in the right direction, but I'd like this to be open to as many knowledgeable minds as possible.

The part which needs a bit of clarification right now, in my opinion, is character classes. From what I can gather, these are character classes:

  <[a-z] +<digit>>
  <+<alpha> -[aeiouAEIOU]>

but I want to be sure. I'm also curious about whitespace. Is "<[" one token, or can I write "< [a-z] >" and have it be a character class?
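
To show why the whitespace question matters to a parser, here is a small sketch of the two readings. Again it is Python for illustration only, and the grammar it encodes is my guess from the two examples above, not anything taken from Apo05 or Syn05: a class is "<", then one or more components (each an optional + or - followed by either [...] or <name>), then ">". The function name parse_char_class and the allow_leading_space switch are hypothetical.

```python
import re

# One component: optional sign, then either an enumerated set [...]
# or a named class <name>. Leading whitespace between components is skipped.
COMPONENT = re.compile(r"\s*([+-]?)(?:\[([^\]]*)\]|<(\w+)>)")

def parse_char_class(text, allow_leading_space=False):
    """Return a list of (sign, kind, body) tuples, or None if text does not
    fit the guessed grammar. With allow_leading_space=False, "<" must be
    followed immediately by the first component (the '"<[" is one token'
    reading)."""
    if not (text.startswith("<") and text.endswith(">")):
        return None
    inner = text[1:-1]
    if not allow_leading_space and inner[:1].isspace():
        return None
    parts = []
    pos = 0
    while pos < len(inner):
        if inner[pos:].strip() == "":
            break  # only trailing whitespace left
        match = COMPONENT.match(inner, pos)
        if not match:
            return None
        sign, enumerated, named = match.groups()
        if enumerated is not None:
            parts.append((sign, "enum", enumerated))
        else:
            parts.append((sign, "named", named))
        pos = match.end()
    return parts or None

if __name__ == "__main__":
    print(parse_char_class("<[a-z] +<digit>>"))
    print(parse_char_class("<+<alpha> -[aeiouAEIOU]>"))
    print(parse_char_class("< [a-z] >"))
    print(parse_char_class("< [a-z] >", allow_leading_space=True))
```

Under the strict reading "< [a-z] >" is rejected as a character class; under the lenient one it is accepted. Which of the two is correct is exactly what I'm asking.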

Thanks for your help.  Unless you're difficult.

--
Jeff "japhy" Pinyan         %  How can we ever be the sold short or
RPI Acacia Brother #734     %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %    -- Meister Eckhart
