Hi,

Thanks for the comments :)
I'll rewrite most of my rules as tokens, then. As for the definition of
<?ws>: the "engine" I mentioned is Pugs::Compiler::Rule, so if what PGE
does is considered the "correct" behavior, I will change P::C::Rule to
match it. By the way, I would feel more comfortable if someone could add
this to S05.

Shu-Chun Weng

On Fri, Jun 02, 2006 at 09:03:18AM -0500, Patrick R. Michaud wrote:
> On Fri, Jun 02, 2006 at 02:17:25PM +0800, Shu-chun Weng wrote:
> > 1. Spaces at the beginning and end of rule blocks should be ignored,
> >    since the space before and after the current rule is most likely
> >    defined in the rules that use it.
> > 1a. I'm not sure if it's "clear" to define it this way, but the spaces
> >    around rule-level alternatives could also be ignored.
>
> At one point I had been exploring along similar lines, but at the
> moment I'd say we don't want to do this.  See below for an example...
>
> > For instance, look at the rule FunctionAppExpr defined in the
> > MiniPerl6 grammar file.
> >
> >   rule FunctionAppExpr
> >   {<Variable>|<Constants>|<ArrayRef>|<FunctionName>[<?ws>?<'('><?ws>?<Parameters><')'>]?}
>
> FWIW, I'd go ahead and write this as a token statement instead of
> a rule:
>
>     token FunctionAppExpr {
>         | <Variable>
>         | <Constants>
>         | <ArrayRef>
>         | <FunctionName> [ <?ws> \( <?ws> <Parameters> \) ]?
>     }
>
> In fact, now that I've written the above I'm more inclined to say
> it's not a good idea to ignore some whitespace in rule definitions
> but not others.  Consider:
>
>     rule FunctionAppExpr {
>         | <Variable>
>         | <Constants>
>         | <ArrayRef>
>         | <FunctionName>[ \( <Parameters> \) ]?
>     }
>
> Can we quickly determine where the <?ws> are being generated?
> What if the [...] portion had an alternation in it?
>
> (And, if we ignore leading/trailing whitespace in rule blocks, do
> we also ignore leading/trailing whitespace in subpatterns?)
>
> In a couple of grammars I've developed already (especially the
> one used for pgc.pir), having whitespace at the beginning of rules
> and around alternations become <?ws> is useful and important.
> In these cases, ignoring such whitespace would mean adding explicit
> <?ws> in the rule to get things to work.  At that point it feels like
> waterbed theory -- by "improving" things for the FunctionAppExpr
> rule above we're pushing the complexity somewhere else.
>
> In general I'd say that in a production such as FunctionAppExpr
> where there are just a few places that need <?ws>, then it's
> better to use 'token' and explicitly indicate the allowed
> whitespace.
>
> (Side observation: in
>     ...|<FunctionName>[<?ws>?<'('><?ws>?<Parameters><')'>]?}
> above, there's no whitespace between <Parameters> and the closing paren.
> Why not?)
>
> > 2. I am not sure what the default rule for <ws> is; I couldn't find
> >    it in S05.  Currently the engine uses :P5/\s+/ but I would like it
> >    to be :P5/\s*/ when it's before or after non-words and to remain
> >    the same (\s+) otherwise.
>
> PGE does the "\s* when before or after non-words and \s+ otherwise"
> explicitly in its <ws> rule, which is written in PIR.  (Being able
> to write subrules procedurally is I<really> nice.)
>
> In P5 it'd probably be something like
>
>     (?:(?<!\w)|(?!\w))\s*|\s+
>
> or maybe better is
>
>     (?:(?<!\w)|(?!\w)|\s)\s*
>
> Pm
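For anyone comparing engines, here is a rough sketch of the two P5 patterns Patrick gives for <ws>, using Python's `re` module as a stand-in for Perl 5. This is an assumption of convenience: the lookbehind/lookahead syntax is the same, but Python's `\w` is Unicode-aware on `str` by default, so non-ASCII behavior may differ from P5 and from PGE. The `joins` helper and the `FUNCTION_NAME` and `PARAMETERS` patterns are hypothetical, made up for illustration; the real MiniPerl6 rules are not shown in this thread.

```python
import re

# Patrick's first P5 suggestion for <ws>: zero or more spaces when the
# position is next to a non-word character (or a string edge), otherwise
# at least one space (i.e. between two word characters).
WS_A = r"(?:(?<!\w)|(?!\w))\s*|\s+"

# His second, more compact form: the group either asserts that one side
# is not a word character or consumes one whitespace character first.
WS_B = r"(?:(?<!\w)|(?!\w)|\s)\s*"


def joins(left: str, ws: str, right: str, text: str) -> bool:
    """True if `text` is exactly `left`, then a <ws> match, then `right`.

    `left` and `right` are regex fragments. The <ws> pattern is wrapped
    in a non-capturing group so its top-level alternation stays local.
    """
    pattern = re.compile(left + "(?:" + ws + ")" + right)
    return pattern.fullmatch(text) is not None


# Hypothetical stand-ins for <FunctionName> and <Parameters>.
FUNCTION_NAME = r"[A-Za-z_]\w*"
PARAMETERS = r"\w+(?:\s*,\s*\w+)*"

# Rough analogue of the token version:
#   <FunctionName> [ <?ws> \( <?ws> <Parameters> \) ]?
# Whitespace is accepted only where <?ws> appears explicitly, so there
# is no whitespace allowed before the closing paren.
FUNCTION_APP = re.compile(
    FUNCTION_NAME
    + r"(?:(?:" + WS_B + r")\((?:" + WS_B + r")" + PARAMETERS + r"\))?"
)

if __name__ == "__main__":
    for ws in (WS_A, WS_B):
        print(joins("foo", ws, "bar", "foo bar"))  # word/word with a space
        print(joins("foo", ws, "bar", "foobar"))   # word/word, no space
        print(joins("foo", ws, r"\(", "foo("))     # word/non-word, no space
    print(bool(FUNCTION_APP.fullmatch("f (x, y)")))
    print(bool(FUNCTION_APP.fullmatch("f(x )")))
```

The last check illustrates Patrick's side observation: since the token has no <?ws> between <Parameters> and `\)`, input like `f(x )` is rejected.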