On Wed, May 04, 2005 at 12:30:48PM -0400, Dan Sugalski wrote:
> At 10:21 AM -0500 5/4/05, Patrick R. Michaud wrote:
> >Actually, overnight I realized there's a relatively good-sized
> >project that needs figuring out -- identifying character properties
> >such as isalpha, islower, isprint, etc. Here I'll briefly sketch
> >how I'd like it to work, and maybe someone enterprising can take
> >things from
>
> I'd planned on everything else going into constructed character
> classes. I'd figured the named classes would correspond to the major
> regex classes (things represented by \X sequences) while the
> constructed classes would handle everything else and more or less
> correspond to [] style sequences.

Makes sense. But somehow the named class versions of the ops don't
give me quite as much coverage as I'd like -- for example, I can use
"find_digit" to measure off a sequence of non-digit characters
(e.g., rx { \D* } ), but there's no corresponding "find_non_digit"
opcode to let me measure off a sequence of digits (e.g., rx { \d* } ).

We'll still need a way to make constructed character classes for
<upper>, <lower>, and the like. But I (or someone else) can probably
build that component in PIR for now, just hardcoding the ASCII or
Latin-1 tables until we come up with something else.

Pm
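
For illustration only, here is a rough Python sketch of the table-driven
approach described above: a hardcoded membership table per class, plus a
scan that can look for either the first member or the first non-member.
All names here (table_from_ranges, find_class, and the Python versions of
find_digit and find_non_digit) are hypothetical stand-ins, not Parrot
opcodes or PIR, and the "return the string length when nothing matches"
convention is an assumption made for the example. Only the ASCII ranges
are filled in.

```python
# Illustrative sketch: hypothetical helpers, not Parrot's actual ops.

def table_from_ranges(*ranges, size=256):
    """Build a membership table indexed by code point (0..size-1)."""
    table = [False] * size
    for lo, hi in ranges:
        for cp in range(ord(lo), ord(hi) + 1):
            table[cp] = True
    return table

# Hardcoded ASCII tables; Latin-1 letters could be added as extra ranges.
DIGIT = table_from_ranges(("0", "9"))
UPPER = table_from_ranges(("A", "Z"))
LOWER = table_from_ranges(("a", "z"))

def find_class(table, s, start=0, want=True):
    """Index of the first char at or after start whose membership == want.

    Returns len(s) if there is none (an assumed convention), so that
    result - start is always the length of the run that was skipped.
    """
    for i in range(start, len(s)):
        cp = ord(s[i])
        member = cp < len(table) and table[cp]
        if member == want:
            return i
    return len(s)

def find_digit(s, start=0):
    # Skips a run of non-digits, i.e. measures rx { \D* }.
    return find_class(DIGIT, s, start, want=True)

def find_non_digit(s, start=0):
    # Skips a run of digits, i.e. measures rx { \d* }.
    return find_class(DIGIT, s, start, want=False)
```

With these definitions, find_digit("abc123") and find_non_digit("123abc")
both return 3, and the same find_class call handles <upper> and <lower>
by passing the UPPER or LOWER table with want=False.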