Hi,
Thanks for the comments :)
I'll rewrite most of my rules as tokens, then. As for the
definition of <?ws>, the engine I mentioned is Pugs::Compiler::Rule,
so if what PGE does is considered the correct behavior, I will
change P::C::Rule to match. By the way, I would feel more comfortable
if someone could add it to S05.
Shu-Chun Weng
On Fri, Jun 02, 2006 at 09:03:18AM -0500, Patrick R. Michaud wrote:
On Fri, Jun 02, 2006 at 02:17:25PM +0800, Shu-chun Weng wrote:
1. Spaces at the beginning and end of rule blocks should be ignored,
since the space before and after the current rule is most likely
defined in the rules that use it.
1a. I'm not sure whether it can be defined cleanly this way, but the
spaces around rule-level alternatives could also be ignored.
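A rough Python sketch of the difference points 1 and 1a would make, under a simplified model in which every run of whitespace in a rule body becomes a <?ws> call (real :sigspace handling is more nuanced; `sigspace`, `trim_ends` and `trim_alts` are hypothetical names for illustration only):

```python
import re


def sigspace(body: str, trim_ends: bool = False, trim_alts: bool = False) -> str:
    """Rewrite each whitespace run in a rule body as an explicit <?ws> call.

    trim_ends models point 1 (ignore whitespace at the start/end of the block).
    trim_alts models point 1a (also ignore whitespace around alternatives).
    """
    if trim_alts:
        # Naive: splits on every '|', including any inside [...] groups.
        return "|".join(sigspace(alt, trim_ends=True) for alt in body.split("|"))
    if trim_ends:
        body = body.strip()
    return re.sub(r"\s+", "<?ws>", body)


body = " <Variable> | <Constants> "
print(sigspace(body))                  # every space becomes <?ws>
print(sigspace(body, trim_ends=True))  # point 1
print(sigspace(body, trim_alts=True))  # points 1 and 1a
```

The naive split on '|' is exactly where nested alternations inside [...] would go wrong.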
At one point I had been exploring along similar lines, but at the
moment I'd say we don't want to do this. See below for an example...
For instance, look at the rule FunctionAppExpr defined in the
MiniPerl6 grammar file.
    rule FunctionAppExpr
    {<Variable>|<Constants>|<ArrayRef>|<FunctionName>[<?ws>?'('<?ws>?<Parameters>')']?}
FWIW, I'd go ahead and write this as a token statement instead of
a rule:
    token FunctionAppExpr {
        | <Variable>
        | <Constants>
        | <ArrayRef>
        | <FunctionName> [ <?ws> \( <?ws> <Parameters> \) ]?
    }
In fact, now that I've written the above I'm more inclined to say
it's not a good idea to ignore some whitespace in rule definitions
but not others. Consider:
    rule FunctionAppExpr {
        | <Variable>
        | <Constants>
        | <ArrayRef>
        | <FunctionName>[ \( <Parameters> \) ]?
    }
Can we quickly determine where the <?ws> calls are being generated?
What if the [...] portion had an alternation in it?
(And, if we ignore leading/trailing whitespace in rule blocks, do
we also ignore leading/trailing whitespace in subpatterns?)
In a couple of grammars I've developed already (especially the
one used for pgc.pir), having whitespace at the beginning of rules
and around alternations become <?ws> is useful and important.
In these cases, ignoring such whitespace would mean adding explicit
<?ws> calls to the rules to get things to work. At that point it feels like
waterbed theory -- by improving things for the FunctionAppExpr
rule above we're pushing the complexity somewhere else.
In general I'd say that in a production such as FunctionAppExpr,
where there are just a few places that need <?ws>, it's
better to use 'token' and explicitly indicate the allowed
whitespace.
(Side observation: in
    ...|<FunctionName>[<?ws>?'('<?ws>?<Parameters>')']?}
above, there's no whitespace between Parameters and the closing paren.
Why not?)
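To make the allowed whitespace in the token version concrete, here is a rough Python approximation of its last alternative only. The subrule patterns are hypothetical stand-ins, not the MiniPerl6 definitions. Both <?ws> calls sit next to "(", a non-word character, so under PGE's ws behavior they reduce to \s*:

```python
import re

# Hypothetical stand-ins for the MiniPerl6 subrules (not the real grammar).
FUNCTION_NAME = r"[A-Za-z_]\w*"
PARAMETERS = r"\w+"
# Each <?ws> borders "(", a non-word character, so ws behaves like \s* here.
WS = r"\s*"

# <FunctionName> [ <?ws> \( <?ws> <Parameters> \) ]?
FUNCTION_CALL = re.compile(rf"{FUNCTION_NAME}(?:{WS}\({WS}{PARAMETERS}\))?")

for src in ["foo", "foo(x)", "foo ( x)", "foo ( x )"]:
    print(repr(src), bool(FUNCTION_CALL.fullmatch(src)))
```

"foo ( x )" does not match: whitespace is allowed only where a <?ws> is written, so the gap before the closing paren is rejected.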
2. I am not sure what the default ws rule is; I couldn't find it in
S05. Currently the engine uses :P5/\s+/, but I would like it to
be :P5/\s*/ when it's before or after non-word characters and to
remain the same (\s+) otherwise.
PGE explicitly does \s* before or after non-words and \s+ otherwise
in its ws rule, which is written in PIR. (Being able to write
subrules procedurally is really nice.)
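A procedural version of that ws behavior, sketched in Python rather than PIR. This is not PGE's actual code; the function name and the convention of returning the end position (or None on failure) are assumptions:

```python
from typing import Optional


def is_word(ch: str) -> bool:
    # Same notion of "word character" as \w: alphanumerics plus underscore.
    return ch.isalnum() or ch == "_"


def ws(text: str, pos: int) -> Optional[int]:
    """Match ws at pos; return the end position, or None on failure."""
    end = pos
    while end < len(text) and text[end].isspace():
        end += 1
    if end > pos:
        # At least one whitespace character satisfies both \s* and \s+.
        return end
    # Zero whitespace (the \s* case) is only allowed if the previous or
    # the next character is a non-word character (or absent).
    prev_nonword = pos == 0 or not is_word(text[pos - 1])
    next_nonword = pos == len(text) or not is_word(text[pos])
    return pos if prev_nonword or next_nonword else None
```

So ws("foo bar", 3) consumes the space, ws("foo(", 3) succeeds without consuming anything, and ws("foobar", 3) fails.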
In P5 it'd probably be something like
    (?:(?<!\w)|(?!\w))\s*|\s+
or maybe better is
    (?:(?<!\w)|(?!\w)|\s)\s*
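A quick Python check of both P5-style patterns; Python's re accepts the same lookbehind (?<!\w) and lookahead (?!\w) syntax. `ws_end` is a hypothetical helper for illustration:

```python
import re
from typing import Optional

# (?<!\w): the previous character is not a word character (or pos is 0).
# (?!\w):  the next character is not a word character (or pos is the end).
WS_FIRST = re.compile(r"(?:(?<!\w)|(?!\w))\s*|\s+")
WS_SECOND = re.compile(r"(?:(?<!\w)|(?!\w)|\s)\s*")


def ws_end(pattern: "re.Pattern[str]", text: str, pos: int) -> Optional[int]:
    # Pattern.match(text, pos) anchors the match at pos, but the
    # lookbehind can still inspect the character just before pos.
    m = pattern.match(text, pos)
    return m.end() if m else None


cases = [("foo bar", 3), ("foobar", 3), ("foo(", 3), ("foo  (", 3)]
for text, pos in cases:
    print(repr(text), pos, ws_end(WS_FIRST, text, pos), ws_end(WS_SECOND, text, pos))
```

On these inputs the two patterns agree: whitespace is required between two word characters and optional next to a non-word character.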
Pm