On Wed, Nov 10, 2010 at 01:03:26PM -0500, Chase Albert wrote:
> Sorry if this is the wrong forum. I was wondering if there was a way to
> specify unicode categories
> <http://www.fileformat.info/info/unicode/category/index.htm> in
> a regular expression (and hence a grammar), or if there would be any
> consideration for adding support for that (requiring some kind of special
> syntax).

Unicode categories are matched using assertion syntax: "is" followed by
the category name.  Thus <isLu> (uppercase letter), <isNd> (decimal digit),
<isZs> (space separator), etc.

This even works in Rakudo today:

    $ ./perl6
    > say 'abcdEFG' ~~ / <isLu> /
    E

They can also be combined, as in <+isLu+isLt> (uppercase or titlecase).
The relevant section of the spec is in Synopsis 5; search for "Unicode
properties are always available with a prefix".

Hope this helps!

Pm
