I'm working on a Perl 5 module that parses a Perl 6 rule into a tree
structure; specifically, I'm subclassing/extending Regexp::Parser into
Perl6::Rule::Parser. This module is designed ONLY to PARSE the contents
of a rule: it is not concerned with implementing all the new features
Perl 6 rules will offer, merely with their syntax. Once this module is
done, I'll work on a slightly broader one that handles the exterior of
the rule (the m:xyz:abc('def')/.../ part, rather than the contents of the
rule itself).

To do this effectively, I need an exhaustive list of all the tokens that
can appear in a Perl 6 rule. By "token" I mean a single unit of meaning,
such as ^^, <after ...>, or **{3..6}. I have looked through the latest
revisions of Apo05 and Syn05 (from Dec 2004) and come up with the
following list:

  http://japhy.perlmonk.org/perl6/rules.txt

The list is grouped by leading character. I think it's complete, but I'm
probably wrong, which is why I'd like more eyes to look it over and tell
me what I've missed.

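To make the notion of a token concrete, here is a minimal Perl 5 sketch of a tokenizer that recognizes only the three example tokens above (^^, <after ...>, and **{3..6}) and treats every other character as a one-character token. The pattern list and tokenize() are hypothetical illustrations, not part of Regexp::Parser or Perl6::Rule::Parser, and the token set is deliberately incomplete.

```perl
use strict;
use warnings;

# Hypothetical token patterns, tried in order at the current position.
# Only the three example token shapes are covered; everything else
# falls through to the single-character fallback.
my @TOKEN_PATTERNS = (
    qr/\^\^/,                 # ^^        : start-of-line anchor
    qr/\*\*\{\d+\.\.\d+\}/,   # **{3..6}  : repetition range
    qr/<after\s[^>]*>/,       # <after ...> : naive, stops at the first '>'
    qr/./s,                   # fallback  : any single character
);

# Split a rule body into a flat list of token strings.
sub tokenize {
    my ($rule) = @_;
    my @tokens;
    pos($rule) = 0;
    TOKEN: while (pos($rule) < length $rule) {
        for my $re (@TOKEN_PATTERNS) {
            # \G anchors at the current position; /c keeps pos on failure.
            if ($rule =~ /\G($re)/gc) {
                push @tokens, $1;
                next TOKEN;
            }
        }
        die "unreachable: the fallback pattern always matches\n";
    }
    return @tokens;
}
```

Patterns are tried in order at the current position, so multi-character tokens must come before the single-character fallback. A token list grouped by leading character maps naturally onto such an ordered table.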
I just got an email back from Damian which will help me move in the right
direction, but I'd like this to be open to as many knowledgeable minds as
possible.

The part that needs a bit of clarification right now, in my opinion, is
character classes. From what I can gather, these are both character
classes:

  <[a-z] +<digit>>
  <+<alpha> -[aeiouAEIOU]>

but I want to be sure. I'm also curious about whitespace: is "<[" a single
token, or can I write "< [a-z] >" and still have it be a character class?

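Under the reading proposed above (an optional + or - before each component, where a component is either a bracketed set or a named class such as <digit>), a character class decomposes into a flat list of signed operands. The following Perl 5 sketch assumes exactly that reading; parse_charclass() is a hypothetical helper, not part of Regexp::Parser, and treating an unsigned component as + is an assumption here, not something taken from Apo05 or Syn05.

```perl
use strict;
use warnings;

# Split a <...> character-class token into [sign, operand] pairs.
# ASSUMPTIONS (not from the spec): each component is an optional +/-
# followed by [...] or <name>; an unsigned component counts as '+';
# whitespace between components is skipped.
sub parse_charclass {
    my ($token) = @_;
    $token =~ /\A<(.*)>\z/s or die "not a <...> token: $token\n";
    my $body = $1;
    my @parts;
    # Bracketed set allows backslash escapes; named class is <\w+>.
    while ($body =~ /\G\s*([+-]?)(\[(?:\\.|[^\]\\])*\]|<\w+>)/gc) {
        push @parts, [ $1 eq '' ? '+' : $1, $2 ];
    }
    # Anything left over (other than whitespace) is a parse error.
    $body =~ /\G\s*\z/gc or die "unparsed text in: $token\n";
    return @parts;
}
```

Because it skips whitespace between components, this sketch would also accept "< [a-z] >"; whether Perl 6 actually permits that is the open question above, so the \s* is a placeholder decision rather than an answer.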
Thanks for your help. Unless you're difficult.
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart