Rather than answer each message in this thread individually, I'll
try to aggregate my replies here.  Disclaimer:  These are just my
interpretations of how rules are defined; I'm not the one who
decides how they *should* be defined.

On Wed, May 25, 2005 at 10:55:59AM -0400, Jeff 'japhy' Pinyan wrote:
> On May 25, Mark A. Biggar said:
> >Jonathan Scott Duff wrote:
> >>On Tue, May 24, 2005 at 11:24:50PM -0400, Jeff 'japhy' Pinyan wrote:
> >>>I wish <!prop X> was allowed.  I don't see why <!...> has to be confined 
> >>>to zero-width assertions.

<!...> isn't confined to use with zero-width assertions, but <!...>
itself always acts as a zero-width assertion.  Because it succeeds only
when the enclosed pattern fails to match, there is nothing for it to
consume.

In some senses, <!subrule> is the same as <!before <subrule> >.
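
For anyone who wants to see the "zero-width" point in something runnable
today, here is a rough analogy using Python's re module, where (?!foo)
plays the role of <!before foo>.  This is only a sketch: the pattern
"foo" and the helper consumed_length() are made up for illustration, and
none of this is Perl 6 rule syntax.

```python
import re

# (?!foo) is Python's negative lookahead, used here as a loose analogue
# of <!before foo>.  It is NOT Perl 6 rule syntax.
NOT_FOO = re.compile(r"(?!foo)")
NOT_FOO_THEN_WORD = re.compile(r"(?!foo)\w+")


def consumed_length(pattern, text):
    """Return how many characters `pattern` consumes when anchored at
    position 0 of `text`, or None if it does not match there."""
    m = pattern.match(text)
    return None if m is None else m.end() - m.start()


# The negated assertion succeeds on "bar" but consumes nothing.
print(consumed_length(NOT_FOO, "bar"))            # 0
# It fails outright where "foo" does match.
print(consumed_length(NOT_FOO, "foo"))            # None
# Anything consumed comes from what follows the assertion, not from it.
print(consumed_length(NOT_FOO_THEN_WORD, "bar"))  # 3
```

The last call is the interesting one: the three characters are consumed
by \w+, not by the negated assertion in front of it.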

> >Now <prop X> is a character class just like <+digit> and so
> >under the new character class syntax, would probably be written
> ><+prop X> or if the white space is a problem, then maybe <+prop:X>
> >(or <+prop(X)> as Larry gets the colon :-), but that is a pretty
> >adverbial case so ':' maybe okay) with the complemented case being
> ><-prop:X>.  

The whitespace itself isn't a problem, but it means that whatever
follows is parsed as rule syntax rather than as a string constant.
Thus we probably want <prop:Lu> or <prop("Lu")> and not <prop Lu>.

And to be a little pedantic in terminology, I call <prop:Lu> a capturing
subrule, not a character class match (although that subrule probably does
match and capture just a single character).  The character class
match would be <+prop:Lu> or something like that.  However, we do
get into a parsing issue with <+prop:Lu+prop:Ll>, which would probably
have to be written as <+prop('Lu')+prop('Ll')>, unless we treat the +
as "special".  (AFAIK, the :-argument form of subrule calls isn't well 
defined yet -- it's only briefly mentioned/proposed in A05.)
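
To make the intended meaning of something like <+prop('Lu')+prop('Ll')>
concrete, here is a small Python sketch of "a character class that is
the union of two Unicode general categories".  The names in_props() and
match_run() are hypothetical helpers of my own; only
unicodedata.category() is a real library call.  It says nothing about
how the rules engine would actually implement or spell this.

```python
import unicodedata


def in_props(ch, *props):
    """True if the Unicode general category of `ch` is one of `props`
    (for example "Lu", "Ll", "Zs")."""
    return unicodedata.category(ch) in props


def match_run(text, *props):
    """Return the longest prefix of `text` whose characters all belong
    to the union of the given general categories."""
    end = 0
    while end < len(text) and in_props(text[end], *props):
        end += 1
    return text[:end]


# Union of uppercase and lowercase letters, roughly <+Lu+Ll>.
print(match_run("Hello, world", "Lu", "Ll"))  # Hello
# Uppercase only: the run stops at the first lowercase letter.
print(match_run("Hello, world", "Lu"))        # H
```

Note that the categories are passed as plain strings here, which is the
same reason <prop("Lu")> is unambiguous where <prop Lu> is not.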

> >Actually the 'prop' may be unnecessary at all, as we know
> >we're in the character class sub-language because we saw the '<+', '<-'
> >or '<[', so we could just define the various Unicode character property
> >codes (I.e., Lu, Ll, Zs, etc) as pre-defined character class names just
> >like 'digit' or 'letter'.

I like this.

> Yeah, that was going to be my next step, except that the unknowing person 
> might make a sub-rule of their own called, say, "Zs", and then which would 
> take precedence?  Perhaps <prop:X> is a good way of writing it.

Well, it works out the same as if someone creates their own "digit" or
"alpha" rule.  One can always get to the built-in definition by explicit
scoping using <Grammar::digit> (or wherever the built-ins end up
being defined).

Pm
