In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/53597948a6d0cb346f2b9bcc354471f29df31309?hp=2e2d70f2b1c8c037ccde21d4de658efaa0008b49>
- Log -----------------------------------------------------------------
commit 53597948a6d0cb346f2b9bcc354471f29df31309
Author: H.Merijn Brand <[email protected]>
Date:   Mon Nov 14 13:15:25 2016 +0100

    Additional warning of Name.pl going away

M       lib/Unicode/UCD.pm

commit 816b0b097b90ca037852f7c0f7c670799afdf23b
Author: H.Merijn Brand <[email protected]>
Date:   Mon Nov 14 12:37:18 2016 +0100

    Unicode::UCD documentation for reading Name.pl as encouraged practice

M       lib/Unicode/UCD.pm
-----------------------------------------------------------------------

Summary of changes:
 lib/Unicode/UCD.pm | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 990e86f..c281490 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -1211,7 +1211,7 @@ sub bidi_types {
 =head2 B<compexcl()>
 
 WARNING: Unicode discourages the use of this function or any of the
-alternative mechanisms listed in this section (the documention of
+alternative mechanisms listed in this section (the documentation of
 C<compexcl()>), except internally in implementations of the Unicode
 Normalization Algorithm. You should be using L<Unicode::Normalize> directly
 instead of these. Using these will likely lead to half-baked results.
@@ -3155,11 +3155,48 @@ return C<undef> if called with one of those.
 The returned values for the Perl extension properties, such as C<Any> and
 C<Greek> are somewhat misleading. The values are either C<"Y"> or C<"N>".
 All Unicode properties are bipartite, so you can actually use the C<"Y"> or
-C<"N>" in a Perl regular rexpression for these, like C<qr/\p{ID_Start=Y/}> or
+C<"N>" in a Perl regular expression for these, like C<qr/\p{ID_Start=Y/}> or
 C<qr/\p{Upper=N/}>. But the Perl extensions aren't specified this way, only
 like C</qr/\p{Any}>, I<etc>. You can't actually use the C<"Y"> and C<"N>" in
 them.
 
+=head3 Getting every available name
+
+Instead of reading the Unicode Database directly from files, as you were able
+to do for a long time, you are encouraged to use the supplied functions. So,
+instead of reading C<Name.pl> - which may disappear without notice in the
+future - directly, as with
+
+  my (%name, %cp);
+  for (split m/\s*\n/ => do "unicore/Name.pl") {
+      my ($cp, $name) = split m/\t/ => $_;
+      $cp{$name} = $cp;
+      $name{$cp} = $name unless $cp =~ m/ /;
+  }
+
+You ought to use L</prop_invmap> like this:
+
+  my (%name, %cp, %cps, $n);
+  # All codepoints
+  foreach my $cat (qw( Name Name_Alias )) {
+      my ($codepoints, $names, $format, $default) = prop_invmap($cat);
+      # $format => "n", $default => ""
+      foreach my $i (0 .. @$codepoints - 2) {
+          my ($cp, $n) = ($codepoints->[$i], $names->[$i]);
+          # If $n is a ref, the same codepoint has multiple names
+          foreach my $name (ref $n ? @$n : $n) {
+              $name{$cp} //= $name;
+              $cp{$name} //= $cp;
+          }
+      }
+  }
+  # Named sequences
+  {   my %ns = namedseq();
+      foreach my $name (sort { $ns{$a} cmp $ns{$b} } keys %ns) {
+          $cp{$name} //= [ map { ord } split "" => $ns{$name} ];
+      }
+  }
+
 =cut
 
 # User-defined properties could be handled with some changes to utf8_heavy.pl;

--
Perl5 Master Repository
