Tom Christiansen wrote:
Patrick wrote:
: > * Almost. E.g. isL would be nice to have as well.
:
: Those exist also:
:
: $ ./perl6
: > say 'abCD34' ~~ / <isL> /
: a
: > say 'abCD34' ~~ / <isN> /
: 3
: >
They may exist, but I'm not certain it's a good idea to encourage
the Is_XXX approach on *anything* except Script=XXX properties.
They certainly don't work on everything, you know.
Also, I can't for the life of me see why one would ever write <isL> when
<Letter> is so much more obvious; similarly for <isN> over <Number>.
Just because you can do so doesn't mean you necessarily should.
http://unicode.org/reports/tr18/#Categories
The recommended names for UCD properties and property values are in
PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
There are both abbreviated names and longer, more descriptive names.
It is strongly recommended that both names be recognized, and that
loose matching of property names be used, whereby the case
distinctions, whitespace, hyphens, and underbar are ignored.
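To make the loose-matching recommendation concrete, here is a minimal perl5 sketch, assuming 5.12 or later. The hash %letter_spelling and the helper first_match() are names made up for this illustration. Each entry is a different spelling of the same General_Category=Letter property, and each one should find the same first letter in the sample string.

```perl
use strict;
use warnings;

# Several spellings of one property, General_Category=Letter.
# (%letter_spelling is a hypothetical name used only in this sketch.)
my %letter_spelling = (
    'L'                       => qr/\p{L}/,
    'Letter'                  => qr/\p{Letter}/,
    'letter'                  => qr/\p{letter}/,
    'IsL'                     => qr/\p{IsL}/,
    'gc=L'                    => qr/\p{gc=L}/,
    'General_Category=Letter' => qr/\p{General_Category=Letter}/,
);

# Return the first substring of $text matched by $re, or undef if none.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

for my $name (sort keys %letter_spelling) {
    my $hit = first_match($letter_spelling{$name}, '34abCD');
    printf "%-24s => %s\n", $name, defined $hit ? $hit : '(no match)';
}
```

The long names cost a few keystrokes but say what they mean, which is the point being argued here.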
Furthermore, be aware that the Number property is *NOT* the same
as the Decimal_Number property. In perl5, if one wants [0-9], then
one expresses it exactly that way, since that's a lot shorter than
writing (?=\p{ASCII})\p{Nd}, where Nd can also be Decimal_Number.
Again, please note that Number is far broader than even Decimal_Number,
which is itself almost certainly broader than you're thinking.
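As a rough illustration of how far apart these three notions are, the following perl5 sketch (assuming 5.12 or later) tests a few code points against \p{Number}, \p{Decimal_Number}, and [0-9]. The helper classify() and the sample list are invented for this example; the code points were picked as one representative each of ASCII digits, non-ASCII decimal digits (Nd), letter numbers (Nl), and other numbers (No).

```perl
use strict;
use warnings;

# Report which of the three "digit-ish" classes a single character falls in.
# (classify() is a hypothetical helper for this sketch.)
sub classify {
    my ($char) = @_;
    return {
        number  => ($char =~ /\A\p{Number}\z/         ? 1 : 0),
        decimal => ($char =~ /\A\p{Decimal_Number}\z/ ? 1 : 0),
        ascii   => ($char =~ /\A[0-9]\z/              ? 1 : 0),
    };
}

my @samples = (
    [ 'DIGIT SEVEN'                => "7"        ],  # Nd, ASCII
    [ 'ARABIC-INDIC DIGIT THREE'   => "\x{0663}" ],  # Nd, not ASCII
    [ 'ROMAN NUMERAL EIGHT'        => "\x{2167}" ],  # Nl
    [ 'VULGAR FRACTION ONE THIRD'  => "\x{2153}" ],  # No
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $c = classify($char);
    printf "U+%04X %-26s Number=%d Decimal_Number=%d [0-9]=%d\n",
        ord($char), $name, $c->{number}, $c->{decimal}, $c->{ascii};
}
```

Only the first sample satisfies all three tests; the Roman numeral and the fraction are Numbers without being Decimal_Numbers at all.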
Here's a trio of little programs specifically designed to help scout
out Unicode characters and their properties. They work best on 5.12+,
but should be ok on 5.10, too.
--tom
The 'Is' prefix can be used on any property in 5.12 for which there is
no naming conflict. The only naming conflicts are certain of the block
properties, such as Arabic. IsArabic means the Arabic script. InArabic
means the base Arabic block. Personally, I find Is and In unintuitive,
and prefer to write sc=arabic or blk=arabic instead.
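The script/block distinction can be sketched as follows, assuming perl 5.12 or later and using only the explicit sc= and blk= forms. The helper script_and_block() and the three sample code points are chosen for this illustration and are not from the thread: one character is in both the Arabic script and the base Arabic block, one is in the block only, and one is in the script only.

```perl
use strict;
use warnings;

# Test one character against the Arabic script and the base Arabic block.
# (script_and_block() is a hypothetical helper for this sketch.)
sub script_and_block {
    my ($char) = @_;
    return {
        script => ($char =~ /\A\p{sc=arabic}\z/  ? 1 : 0),
        block  => ($char =~ /\A\p{blk=arabic}\z/ ? 1 : 0),
    };
}

my @samples = (
    [ 'ARABIC LETTER ALEF'                     => "\x{0627}" ],  # script and block
    [ 'ARABIC COMMA'                           => "\x{060C}" ],  # block only (script is Common)
    [ 'ARABIC LETTER ALEF WASLA ISOLATED FORM' => "\x{FB50}" ],  # script only (Presentation Forms-A)
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $r = script_and_block($char);
    printf "U+%04X %-40s sc=arabic:%d blk=arabic:%d\n",
        ord($char), $name, $r->{script}, $r->{block};
}
```

Written this way, there is no need to remember which of Is and In means which.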
When Unicode proposed to add some properties in 5.2 that started with
'Is', there was significant enough protest that they backed off, and
promised never to do it again, adding a stability policy to 6.0 to that
effect. Apparently a number of languages use 'Is' as a prefix.