[perl.git] branch blead, updated. v5.25.6-246-g5359794

H.Merijn Brand Mon, 14 Nov 2016 04:17:34 -0800

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/53597948a6d0cb346f2b9bcc354471f29df31309?hp=2e2d70f2b1c8c037ccde21d4de658efaa0008b49>


- Log -----------------------------------------------------------------
commit 53597948a6d0cb346f2b9bcc354471f29df31309
Author: H.Merijn Brand <[email protected]>
Date:   Mon Nov 14 13:15:25 2016 +0100

    Additional warning of Name.pl going away

M       lib/Unicode/UCD.pm

commit 816b0b097b90ca037852f7c0f7c670799afdf23b
Author: H.Merijn Brand <[email protected]>
Date:   Mon Nov 14 12:37:18 2016 +0100

    Unicode::UCD documentation for reading Name.pl as encouraged practice

M       lib/Unicode/UCD.pm
-----------------------------------------------------------------------

Summary of changes:
 lib/Unicode/UCD.pm | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/lib/Unicode/UCD.pm b/lib/Unicode/UCD.pm
index 990e86f..c281490 100644
--- a/lib/Unicode/UCD.pm
+++ b/lib/Unicode/UCD.pm
@@ -1211,7 +1211,7 @@ sub bidi_types {
 =head2 B<compexcl()>
 
 WARNING: Unicode discourages the use of this function or any of the
-alternative mechanisms listed in this section (the documention of
+alternative mechanisms listed in this section (the documentation of
 C<compexcl()>), except internally in implementations of the Unicode
 Normalization Algorithm.  You should be using L<Unicode::Normalize> directly
 instead of these.  Using these will likely lead to half-baked results.
@@ -3155,11 +3155,48 @@ return C<undef> if called with one of those.
 The returned values for the Perl extension properties, such as C<Any> and
 C<Greek> are somewhat misleading.  The values are either C<"Y"> or C<"N>".
 All Unicode properties are bipartite, so you can actually use the C<"Y"> or
-C<"N>" in a Perl regular rexpression for these, like C<qr/\p{ID_Start=Y/}> or
+C<"N>" in a Perl regular expression for these, like C<qr/\p{ID_Start=Y/}> or
 C<qr/\p{Upper=N/}>.  But the Perl extensions aren't specified this way, only
 like C</qr/\p{Any}>, I<etc>.  You can't actually use the C<"Y"> and C<"N>" in
 them.
 
+=head3 Getting every available name
+
+Instead of reading the Unicode Database directly from files, as you were able
+to do for a long time, you are encouraged to use the supplied functions. So,
+instead of reading C<Name.pl> - which may disappear without notice in the
+future - directly, as with
+
+  my (%name, %cp);
+  for (split m/\s*\n/ => do "unicore/Name.pl") {
+      my ($cp, $name) = split m/\t/ => $_;
+      $cp{$name} = $cp;
+      $name{$cp} = $name unless $cp =~ m/ /;
+  }
+
+You ought to use L</prop_invmap> like this:
+
+  my (%name, %cp, %cps, $n);
+  # All codepoints
+  foreach my $cat (qw( Name Name_Alias )) {
+      my ($codepoints, $names, $format, $default) = prop_invmap($cat);
+      # $format => "n", $default => ""
+      foreach my $i (0 .. @$codepoints - 2) {
+          my ($cp, $n) = ($codepoints->[$i], $names->[$i]);
+          # If $n is a ref, the same codepoint has multiple names
+          foreach my $name (ref $n ? @$n : $n) {
+              $name{$cp} //= $name;
+              $cp{$name} //= $cp;
+          }
+      }
+  }
+  # Named sequences
+  {   my %ns = namedseq();
+      foreach my $name (sort { $ns{$a} cmp $ns{$b} } keys %ns) {
+          $cp{$name} //= [ map { ord } split "" => $ns{$name} ];
+      }
+  }
+
 =cut
 
 # User-defined properties could be handled with some changes to utf8_heavy.pl;

--
Perl5 Master Repository
