Rather than answer each message in this thread individually, I'll try to aggregate my replies here. Disclaimer: These are just my interpretations of how rules are defined; I'm not the one who decides how they *should* be defined.
On Wed, May 25, 2005 at 10:55:59AM -0400, Jeff 'japhy' Pinyan wrote:
> On May 25, Mark A. Biggar said:
> >Jonathan Scott Duff wrote:
> >>On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:
> >>>I wish <!prop X> was allowed. I don't see why <!...> has to be confined
> >>>to zero-width assertions.

<!...> isn't confined to use with zero-width assertions, but <!...>
always acts as a zero-width assertion. In essence, since we're
requiring a negative match, nothing is consumed by that negative
match. In some senses <!subrule> is the same as <!before <subrule> >.

> >Now <prop X> is a character class just like <+digit> and so
> >under the new character class syntax, would probably be written
> ><+prop X> or if the white space is a problem, then maybe <+prop:X>
> >(or <+prop(X)> as Larry gets the colon :-), but that is a pretty
> >adverbial case so ':' maybe okay) with the complemented case being
> ><-prop:X>.

The whitespace itself isn't a problem, but it means that whatever
follows is parsed using rules syntax and not a string constant.
Thus we probably want <prop:Lu> or <prop("Lu")> and not <prop Lu>.

And to be a little pedantic in terminology, I call <prop:Lu> a
capturing subrule, not a character class match (although that subrule
probably does match and capture just a single character). The
character class match would be <+prop:Lu> or something like that.
However, we do get into a parsing issue with <+prop:Lu+prop:Ll>,
which would probably have to be written as <+prop('Lu')+prop('Ll')>,
unless we treat the + as "special". (AFAIK, the :-argument form of
subrule calls isn't well defined yet -- it's only briefly
mentioned/proposed in A05.)

> >Actually the 'prop' may be unnecessary at all, as we know
> >we're in the character class sub-language because we saw the '<+', '<-'
> >or '<[', so we could just define the various Unicode character property
> >codes (I.e., Lu, Ll, Zs, etc) as pre-defined character class names just
> >like 'digit' or 'letter'.
I like this.

> Yeah, that was going to be my next step, except that the unknowing person
> might make a sub-rule of their own called, say, "Zs", and then which would
> take precedence? Perhaps <prop:X> is a good way of writing it.

Well, it works out the same as if someone creates their own "digit"
or "alpha" rule. One can always get to the built-in definition by
explicit scoping using <Grammar::digit> (or wherever the built-ins
end up being defined).

Pm
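
To make the zero-width versus consuming distinction discussed in this message concrete, here is a rough sketch in Python rather than in rules syntax, since the rules syntax in question isn't settled. The helper names in_prop, match_not_prop and match_neg_class are hypothetical, made up for illustration; Python's (?!...) lookahead and unicodedata.category stand in for <!...> and for the Unicode property lookup. It is only an analogy for how something like <!prop:Lu> and <-prop:Lu> might differ, not a statement of how either is defined.

```python
import re
import unicodedata

# Python's negative lookahead (?!...) is the closest analogue to <!...>:
# it succeeds only if the inner pattern fails, and it consumes nothing.
lookahead = re.match(r"(?!\d)", "abc")
print(lookahead.span())  # start == end: zero characters consumed


def in_prop(ch, *props):
    """Hypothetical helper: is ch's Unicode general category one of props?

    Passing several props models a union such as <+prop('Lu')+prop('Ll')>.
    """
    return unicodedata.category(ch) in props


def match_not_prop(text, pos, prop):
    """Analogue of a negated assertion: zero-width.

    Returns pos unchanged on success, None on failure. Succeeds at end of
    input too, because there is no character there that has the property.
    """
    if pos < len(text) and in_prop(text[pos], prop):
        return None
    return pos


def match_neg_class(text, pos, prop):
    """Analogue of a complemented character class: consumes one character.

    Returns pos + 1 on success, None on failure (including end of input).
    """
    if pos < len(text) and not in_prop(text[pos], prop):
        return pos + 1
    return None


print(match_not_prop("aB", 0, "Lu"))   # position does not advance
print(match_neg_class("aB", 0, "Lu"))  # position advances by one
```

The two helpers agree on whether position 0 of "aB" is acceptable; they differ only in whether the position moves afterwards, which is the difference between an assertion and a character class match.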