Re: Useful task -- Character properties

Patrick R. Michaud Wed, 04 May 2005 11:28:36 -0700

On Wed, May 04, 2005 at 12:30:48PM -0400, Dan Sugalski wrote:
> At 10:21 AM -0500 5/4/05, Patrick R. Michaud wrote:
> >Actually, overnight I realized there's a relatively good-sized
> >project that needs figuring out -- identifying character properties
> >such as isalpha, islower, isprint, etc.  Here I'll briefly sketch
> >how I'd like it to work, and maybe someone enterprising can take
> >things from
> 
> I'd planned on everything else going into constructed character 
> classes. I'd figured the named classes would correspond to the major 
> regex classes (things represented by \X sequences) while the 
> constructed classes would handle everything else and more or less 
> correspond to [] style sequences.


Makes sense.  But somehow the named class versions of the ops
don't give me quite as much coverage as I'd like -- for example,
I can use "find_digit" to measure off a sequence of non-digit
characters (e.g., rx { \D* } ), but there's not a corresponding
"find_non_digit" opcode to let me measure off a set of digits
(e.g., rx { \d* } ).  
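
To illustrate the asymmetry, here is a rough sketch in Python rather than
PIR. The two functions are hypothetical stand-ins, not the actual Parrot
opcode signatures; in particular, returning the string length when nothing
is found is an assumption made for this example.

```python
def is_ascii_digit(ch):
    # ASCII digits only; a real implementation would consult the charset.
    return "0" <= ch <= "9"

def find_digit(s, start=0):
    """Index of the first digit at or after start, or len(s) if none.

    Skipping ahead to the next digit is what measures off \\D*.
    """
    i = start
    while i < len(s) and not is_ascii_digit(s[i]):
        i += 1
    return i

def find_non_digit(s, start=0):
    """Index of the first non-digit at or after start, or len(s) if none.

    This is the missing counterpart: it measures off \\d*.
    """
    i = start
    while i < len(s) and is_ascii_digit(s[i]):
        i += 1
    return i

text = "abc123def"
end_of_non_digits = find_digit(text, 0)                  # end of \D* from 0
end_of_digits = find_non_digit(text, end_of_non_digits)  # end of \d* after it
```

With only the first function available, \d* has to be matched one
character at a time instead of with a single scan.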

We'll still need a way to make constructed character classes
for <upper>, <lower>, and the like.  But I (or someone else) can
probably build that component in PIR for now, hardcoding the ASCII
or Latin-1 tables until we come up with something better.
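
A hardcoded table of that kind could look roughly like the following,
again sketched in Python rather than PIR. The flag names, the table
layout, and the in_class helper are all hypothetical. The ranges cover
only the Latin-1 letters; whether to count odd cases such as the micro
sign (0xB5) is deliberately left out.

```python
UPPER = 0x01
LOWER = 0x02

def build_latin1_table():
    """Build a 256-entry table of class flags indexed by code point."""
    table = [0] * 256
    ranges = (
        (0x41, 0x5A, UPPER),  # A-Z
        (0x61, 0x7A, LOWER),  # a-z
        (0xC0, 0xD6, UPPER),  # Latin-1 capitals before the multiplication sign
        (0xD8, 0xDE, UPPER),  # Latin-1 capitals after the multiplication sign
        (0xDF, 0xF6, LOWER),  # sharp s and small letters before the division sign
        (0xF8, 0xFF, LOWER),  # small letters after the division sign
    )
    for first, last, flag in ranges:
        for code_point in range(first, last + 1):
            table[code_point] |= flag
    return table

LATIN1_CLASSES = build_latin1_table()

def in_class(ch, flag):
    """True if the single character ch carries the given class flag."""
    code_point = ord(ch)
    # Anything outside Latin-1 is simply "not in the class" for now.
    return code_point < 256 and bool(LATIN1_CLASSES[code_point] & flag)
```

A constructed class such as <upper> then becomes a table lookup per
character, and the table can be swapped out later without touching the
callers.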

Pm
