Karl Williamson wrote:
> I'm presuming you need this not for a one-time only thing, but to be
> able to run this program over and over.
Yes -- this is for a module that will be usable in a number of
situations. See
http://search.cpan.org/~bhallissy/Text-Unicode-Equivalents-0.05/.
The current implementation cheats by accessing unicore/Decomposition.pl
exactly the same way Unicode::UCD does.
> You can always download UnicodeData.txt from the Unicode web site.
Yes, I can -- and I certainly have done so for my personal use. But
including that file (or some derivative of it) in a general-purpose
module would mean that its data wouldn't necessarily match the Unicode
version of the Perl installation into which my module is installed. And
besides, the information I need is already in the Perl core -- though
supposedly not usable directly.
> In a regular expression,
> \p{Dt= can} (Decomposition_Type=Canonical) will match all characters
> that you want.
Yes, I understand that I can test a character to see whether it has a
particular decomposition type, but I'm not sure I understand how to use
a regex to generate a complete list of characters with decompositions.
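One way to turn the regex into a list is simply to test every code point
against it. Here is a minimal sketch that assumes brute-force enumeration
is acceptable: it performs roughly 1.1 million matches, so it is slow, but
it relies only on the running perl's own tables, so the result always
matches that perl's Unicode version.

```perl
use strict;
use warnings;
no warnings 'utf8';    # older perls may warn about noncharacters such as U+FFFE

# Brute-force sketch: walk every Unicode code point and keep those whose
# Decomposition_Type is Canonical according to this perl's tables.
my @canonical;
for my $cp (0 .. 0x10FFFF) {
    # Skip UTF-16 surrogates; they are not characters in their own right.
    next if $cp >= 0xD800 && $cp <= 0xDFFF;
    push @canonical, $cp if chr($cp) =~ /\p{Dt=Can}/;
}

printf "%d code points have a canonical decomposition\n", scalar @canonical;
printf "first few: %s\n",
    join ' ', map { sprintf 'U+%04X', $_ } @canonical[0 .. 4];
```

The same loop works for any other property by changing what is inside
the \p{...}.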
> I'm thinking that 5.16 will have the stringification
> of that regex include the list you want, but not in 5.14, and
> stringification is not necessarily fixed either.
>
> I could easily write a new function for UCD that returns a list of
> all code points that have a given property.
That is an interesting offer, and I think it should be given serious
consideration. I'm sure my little module isn't the only one that would
benefit from such a function in the future.
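To illustrate the kind of interface such a function might have, here is
a hypothetical sketch that returns matching code points as compact
[start, end] ranges instead of a flat list. The name
code_points_with_property is made up for this example and is not part of
Unicode::UCD. It also still works by brute-force regex matching rather
than by reading the tables directly, which a real UCD function would
presumably avoid.

```perl
use strict;
use warnings;
no warnings 'utf8';    # older perls may warn about noncharacters

# Hypothetical helper (NOT part of Unicode::UCD): return the code points
# that match the given property as a list of [start, end] ranges.
sub code_points_with_property {
    my ($prop) = @_;
    my $pattern = '\p{' . $prop . '}';
    my $re      = qr/$pattern/;

    my @ranges;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        next unless chr($cp) =~ $re;
        if (@ranges && $ranges[-1][1] == $cp - 1) {
            $ranges[-1][1] = $cp;                  # extend the current range
        }
        else {
            push @ranges, [ $cp, $cp ];            # start a new range
        }
    }
    return @ranges;
}

my @dt_can_ranges = code_points_with_property('Dt=Can');
printf "%d ranges; first is %04X..%04X\n",
    scalar @dt_can_ranges, @{ $dt_can_ranges[0] };
```

Returning ranges keeps the result small for properties that cover large
contiguous blocks, such as the Hangul syllables.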
Thanks for your reply, Karl.
Bob