Rather than answer each message in this thread individually, I'll
try to aggregate them here. Disclaimer: These are just my
interpretations of how rules are defined; I'm not the one who
decides how they *should* be defined.
On Wed, May 25, 2005 at 10:55:59AM -0400, Jeff 'japhy' Pinyan wrote:
> On May 25, Mark A. Biggar said:
> >Jonathan Scott Duff wrote:
> >>On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:
> >>>I wish <!prop X> was allowed. I don't see why <!...> has to be confined
> >>>to zero-width assertions.
<!...> isn't confined to use with zero-width assertions, but <!...>
itself always acts as a zero-width assertion. In essence, since we're
requiring that the subrule *not* match, there is nothing for the
negative match to consume.
In some senses <!subrule> is the same as <!before <subrule> >.
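To illustrate the zero-width point with something runnable today, here
is a rough analogy using Perl 5-style lookahead in Python's re module.
This is only a sketch of the general idea: (?!...) is standing in for
<!...>, and nothing here says how rules will actually be implemented.

```python
import re

# (?!\d) is a negative lookahead: it succeeds only where a digit does
# NOT follow, and it consumes no characters either way.
m = re.match(r"(?!\d)", "abc")
negative_span = m.span()          # zero-length match at position 0

# Because nothing was consumed, the following \w still matches the 'a'.
m2 = re.match(r"(?!\d)(\w)", "abc")
first_char = m2.group(1)

# Where a digit does follow, the assertion fails and so does the match.
m3 = re.match(r"(?!\d)\w", "1bc")
```

The negation by itself never advances the match position; whatever
comes after it starts from the same place.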
> >Now <prop X> is a character class just like <+digit> and so
> >under the new character class syntax, would probably be written
> ><+prop X> or if the white space is a problem, then maybe <+prop:X>
> >(or <+prop(X)> as Larry gets the colon :-), but that is a pretty
> >adverbial case so ':' maybe okay) with the complemented case being
> ><-prop:X>.
The whitespace itself isn't a problem, but it means that whatever
follows it is parsed as rule syntax rather than as a string constant.
Thus we probably want <prop:Lu> or <prop("Lu")> and not <prop Lu>.
And to be a little pedantic in terminology, I call <prop:Lu> a capturing
subrule, not a character class match (although that subrule probably does
match and capture just a single character). The character class
match would be <+prop:Lu> or something like that. However, we do
get into a parsing issue with <+prop:Lu+prop:Ll>, which would probably
have to be written as <+prop('Lu')+prop('Ll')>, unless we treat the +
as "special". (AFAIK, the :-argument form of subrule calls isn't well
defined yet -- it's only briefly mentioned/proposed in A05.)
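For what it's worth, the intended meaning of something like
<+prop('Lu')+prop('Ll')> is just a union of two property tests. A
minimal sketch in Python, assuming that reading: the names prop and
union are made up for illustration, and only unicodedata.category is a
real library call.

```python
import unicodedata

def prop(code):
    """Return a predicate that is true for a character whose Unicode
    general category is exactly `code` (hypothetical helper)."""
    return lambda ch: unicodedata.category(ch) == code

def union(*preds):
    """Combine predicates the way '+' would combine character classes
    (hypothetical helper)."""
    return lambda ch: any(p(ch) for p in preds)

# Roughly what <+prop('Lu')+prop('Ll')> would test for one character.
cased = union(prop("Lu"), prop("Ll"))

upper_ok = cased("A")    # general category Lu
lower_ok = cased("a")    # general category Ll
digit_ok = cased("1")    # general category Nd
space_ok = cased(" ")    # general category Zs
```

The parsing question above is only about how to spell the arguments;
the semantics of the combined class are not in dispute.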
> >Actually the 'prop' may be unnecessary at all, as we know
> >we're in the character class sub-language because we saw the '<+', '<-'
> >or '<[', so we could just define the various Unicode character property
> >codes (I.e., Lu, Ll, Zs, etc) as pre-defined character class names just
> >like 'digit' or 'letter'.
I like this.
> Yeah, that was going to be my next step, except that the unknowing person
> might make a sub-rule of their own called, say, "Zs", and then which would
> take precedence? Perhaps <prop:X> is a good way of writing it.
Well, it works out the same as if someone creates their own "digit" or
"alpha" rule. One can always get to the built-in definition by explicit
scoping using <Grammar::digit> (or wherever the built-ins end up
being defined).
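The lookup order I have in mind can be sketched like this in Python.
It is an assumption about how name resolution would behave, not a
description of any existing implementation, and the dictionaries and
their contents are invented for the example.

```python
from collections import ChainMap

# Hypothetical built-in class names, standing in for wherever the
# built-ins end up being defined.
builtin_rules = {"digit": "built-in digit", "Zs": "built-in Zs"}

# A user grammar that happens to define its own "Zs".
user_rules = {"Zs": "user-defined Zs"}

# Unqualified lookup searches the user's definitions first.
scope = ChainMap(user_rules, builtin_rules)

unqualified_zs = scope["Zs"]        # the user's rule shadows the built-in
unqualified_digit = scope["digit"]  # not shadowed, so the built-in is found
qualified_zs = builtin_rules["Zs"]  # explicit scoping reaches the built-in
```

So a user-defined "Zs" behaves no differently from a user-defined
"digit": it shadows the built-in, and the qualified name still works.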
Pm