Patrick wrote:

: > * Almost. E.g. isL would be nice to have as well.
:
: Those exist also:
:
:     $ ./perl6
:     > say 'abCD34' ~~ / <isL> /
:     a
:     > say 'abCD34' ~~ / <isN> /
:     3
:     >

They may exist, but I'm not certain it's a good idea to encourage the
Is_XXX approach on *anything* except Script=XXX properties. They certainly
don't work on everything, you know.

Also, I can't for the life of me see why one would ever write <isL> when
<Letter> is so much more obvious; similarly for <isN> over <Number>.
Just because you can do so doesn't mean you necessarily should.

    http://unicode.org/reports/tr18/#Categories

    The recommended names for UCD properties and property values are in
    PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
    There are both abbreviated names and longer, more descriptive names.
    It is strongly recommended that both names be recognized, and that
    loose matching of property names be used, whereby the case
    distinctions, whitespace, hyphens, and underbar are ignored.

Furthermore, be aware that the Number property is *NOT* the same as the
Decimal_Number property. In perl5, if one wants [0-9], then one expresses
it exactly that way, since that's a lot shorter than writing
(?=\p{ASCII})\p{Nd}, where Nd can also be spelled Decimal_Number.

Again, please note that Number is far broader than even Decimal_Number,
which is itself almost certainly broader than you're thinking.

Here's a trio of little programs specifically designed to help scout
out Unicode characters and their properties. They work best on 5.12+,
but should be ok on 5.10, too.

--tom

unitrio.tar.gz
Description: application/tar
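
For anyone without the tarball handy, here is a minimal perl5 sketch of
the Number / Decimal_Number / [0-9] distinction drawn above. It is not
one of the three attached programs. The helper name classify() and the
four sample code points are assumptions chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from unitrio): report which digit-like
# classes a single character falls into.
sub classify {
    my ($ch) = @_;
    return {
        number      => ($ch =~ /\A\p{Number}\z/          ? 1 : 0),
        decimal     => ($ch =~ /\A\p{Decimal_Number}\z/  ? 1 : 0),
        ascii_digit => ($ch =~ /\A[0-9]\z/               ? 1 : 0),
        # The long-winded spelling of [0-9] mentioned in the message.
        ascii_nd    => ($ch =~ /\A(?=\p{ASCII})\p{Nd}\z/ ? 1 : 0),
    };
}

# Sample code points (illustrative choices):
#   U+0037 DIGIT SEVEN                 general category Nd
#   U+0663 ARABIC-INDIC DIGIT THREE    general category Nd
#   U+2167 ROMAN NUMERAL EIGHT         general category Nl
#   U+2155 VULGAR FRACTION ONE FIFTH   general category No
for my $cp (0x0037, 0x0663, 0x2167, 0x2155) {
    my $r = classify(chr $cp);
    printf "U+%04X  Number=%d  Decimal_Number=%d  [0-9]=%d\n",
        $cp, @{$r}{qw(number decimal ascii_digit)};
}
```

All four samples should match \p{Number}, only the first two should match
\p{Decimal_Number}, and only the first should match [0-9]. That is the
whole point: reach for [0-9] when ASCII digits are what you mean.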