Hello, I am very interested in chromatic's pheme language. I have been reading through the code and looking at your TODO list. I thought I would tackle some of the easier issues to get a handle on PIR and help out a bit.
Questions:
1. Are you targeting R5RS or R6RS? I think R6RS would be a better fit for Parrot
myself. In particular, the spec for (library foo), aka namespaces, would
help pheme integrate better with Parrot and other languages.
I decided to start with something easy: whitespace.
I looked up R6RS, which has a nice BNF grammar that is a useful starting point.
I came up with the rules below:
rule ignore { [ <comment> | <delimiter> ]* }
token comment { ; \N* <eol> }
rule delimiter { <blank> | <eol> }
token blank { <[\ \t]>+ }
token eol { \n\r? }
I know almost nothing about PGE and am still reading the docs, but basically what
I would like to do is build tokens out of tokens. Ideally I would like
to make both "ignore" and "delimiter" tokens, not rules.
This is mostly a writing convenience. In a difficult sed script I once wrote
to convert a bunch of broken C++ declarations, I used
shell variables to store regex building blocks and then assembled those
building blocks into higher-level expressions with basic string interpolation.
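To make the building-block idea concrete, here is a minimal sketch in Python (not PGE) that assembles the whitespace rules above from smaller patterns by string interpolation. The Python translations of the tokens and the helper name skip_ignored are assumptions made for illustration; they say nothing about how PGE composes tokens.

```python
import re

# Building blocks named after the rules in this message. The Python
# patterns are assumed equivalents, not PGE semantics.
BLANK = r"[ \t]+"
EOL = r"\n\r?"
COMMENT = rf";[^\n]*{EOL}"
DELIMITER = rf"(?:{BLANK}|{EOL})"
IGNORE = rf"(?:{COMMENT}|{DELIMITER})*"

ignore_re = re.compile(IGNORE)

def skip_ignored(text, pos=0):
    """Return the index of the first character after comments and whitespace."""
    # The outer '*' lets the pattern match the empty string, so match()
    # never returns None here.
    return ignore_re.match(text, pos).end()

source = "  ; a comment\n\t(foo)"
print(source[skip_ignored(source):])
```

Each higher-level pattern is just a string built from the lower-level ones, which is the same trick as the shell-variable approach in the sed script.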
Questions:
1. Can tokens be used to build tokens? In Perl 5 I would compile a regex with
string interpolation to get this sort of functionality. If so, is there a
name for this feature in PGE?
2. token eol { \n\r? }
This is pretty clearly for handling Windows line terminators. This is the
sort of thing that should be pushed down into Parrot. A special builtin "eol" or
"end-of-line" token could help get rid of this stuff. Is this
RT-worthy? Something like this would definitely fit with the "conservation
of cruft" principle.
3. Is there a tool for pretty-printing an AST dump? I am thinking of dumping
the AST using dump, then using a classic tree-drawing algorithm to
draw the tree as SVG. Something like that could probably be done
easily in Perl 5. Is there already a tool like this?
4. How do you debug the AST? Are there any recommended tools?
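As a rough illustration of the tree-drawing idea in question 3, here is a small Python sketch that renders a tree as indented text rather than SVG. The (label, children) node shape and the sample tree are hypothetical; this is not the structure that PGE or Parrot actually dumps.

```python
def render_tree(node, prefix="", is_last=True, is_root=True):
    """Return a list of lines drawing `node` as an ASCII tree.

    A node is a hypothetical (label, children) pair.
    """
    label, children = node
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        lines = [prefix + ("'-- " if is_last else "|-- ") + label]
        # Keep a vertical bar running while siblings still follow.
        child_prefix = prefix + ("    " if is_last else "|   ")
    for i, child in enumerate(children):
        last_child = i == len(children) - 1
        lines.extend(render_tree(child, child_prefix, last_child, False))
    return lines

# Made-up tree for something like (define (x) 42).
ast = ("list", [
    ("atom: define", []),
    ("list", [("atom: x", [])]),
    ("atom: 42", []),
])
print("\n".join(render_tree(ast)))
```

The same recursive walk could emit SVG elements instead of text lines once each node has been assigned coordinates.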
Atom handling:
I noticed that atom handling looked very alpha. It looks like you want to
distinguish between symbols ("foo" | "foo-bar") and literal values ("#t" | "#f"). This is
really nasty to do at a lexical level.
A nicer way to do this would be to form a string token like this:
token string { [ <!reserved> . ]+ }
token reserved {
# r5 reserved
# future reserved
<[
\( \) \# '
\[ \] \{ \}
]>
}
At this point most languages are going to need post-lexical analysis of the
string to distinguish literal values from symbols. A syntax like this would be
nice:
token truth:string {
\# <[tf]>
}
token integral:string {
\d+
}
token symbol:string fallback
This syntax would indicate that after the token string has been lexed,
it is analyzed again by a regex and converted to a truth value, an
integral value, or, if all else fails, a symbol.
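The intended behaviour can be sketched in Python (again, not PGE): each already-lexed atom is tried against a list of patterns in order, with symbol as the fallback. The names CLASSIFIERS and classify are hypothetical, and the sketch assumes the lexer hands over the whole atom text, including any leading "#".

```python
import re

# Hypothetical classifiers, tried in order: (kind, pattern, converter).
CLASSIFIERS = [
    ("truth", re.compile(r"#[tf]"), lambda s: s == "#t"),
    ("integral", re.compile(r"\d+"), int),
]

def classify(text):
    """Return (kind, value) for an atom that has already been lexed."""
    for kind, pattern, convert in CLASSIFIERS:
        # fullmatch: the whole atom must fit, so "42abc" is not integral.
        if pattern.fullmatch(text):
            return kind, convert(text)
    # Nothing matched, so fall back to treating the atom as a symbol.
    return "symbol", text

for atom in ["#t", "42", "foo-bar"]:
    print(atom, classify(atom))
```

Keeping the classifiers in an ordered list mirrors the proposed syntax: more specific token types come first and the fallback comes last.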
If this is not already implemented, I would like to create a TODO RT for it.
Thanks for any comments/suggestions.
Cheers,
Mike Mattie - [EMAIL PROTECTED]
