On Wed, Nov 10, 2010 at 01:03:26PM -0500, Chase Albert wrote:
> Sorry if this is the wrong forum. I was wondering if there was a way to
> specify unicode categories
> <http://www.fileformat.info/info/unicode/category/index.htm> in
> a regular expression (and hence a grammar), or if there would be any
> consideration for adding support for that (requiring some kind of special
> syntax).

Unicode categories are done using assertion syntax with "is" followed by
the category name. Thus <isLu> (uppercase letter), <isNd> (decimal digit),
<isZs> (space separator), etc. This even works in Rakudo today:

    $ ./perl6
    > say 'abcdEFG' ~~ / <isLu> /
    E

They can also be combined, as in +isLu+isLt (uppercase+titlecase).

The relevant section of the spec is in Synopsis 5; search for
"Unicode properties are always available with a prefix".

Hope this helps!

Pm