-----BEGIN PGP SIGNED MESSAGE-----

Moin,

currently, charnames::viacode() does only take integers and strings, but 
stumbles over hex strings, even though the code tries to handle them (but 
fails due to a thinko):

 # perl -Mcharnames=:full -le 'print charnames::viacode(0x1234)'
 ETHIOPIC SYLLABLE SEE
 # perl -Mcharnames=:full -le 'print charnames::viacode(4660)'
 ETHIOPIC SYLLABLE SEE
 # perl -Mcharnames=:full -le 'print charnames::viacode("4660")'
 ETHIOPIC SYLLABLE SEE
 # perl -Mcharnames=:full -le 'print charnames::viacode("0x1234")'
 Argument "0x1234" isn't numeric in sprintf
 at /usr/local/lib/perl5/5.8.6/charnames.pm line 270.
 NULL

The attached patch corrects that mistake, and also optimizes the function. 
It inlines the subroutine _getcode(), which was used only once, and 
streamlines the operations a bit.

A benchmark with both versions was also done. The new one was named 
viacode2, and in the original one I replaced sprintf "%04X", $arg with 
sprintf "%04X", $code to fix the hexstring bug to compare the 
optimizations vs. straight bugfix:

Benchmark: running v("0x1234"), v(0x1234), v2("0x1234"), v2(0x1234) for at 
least 5 CPU seconds...

 v("0x1234"):  4 (5.07 usr + 0.05 sys = 5.12 CPU) @ 113076/s (n=578953)
 v(0x1234):    6 (5.05 usr + 0.05 sys = 5.10 CPU) @ 173617/s (n=885448)

 v2("0x1234"): 5 (5.16 usr + 0.03 sys = 5.19 CPU) @ 158199/s (n=821057)
 v2(0x1234):   5 (5.12 usr + 0.02 sys = 5.14 CPU) @ 221021/s (n=1136050)

The benchmark benched the cached version (once you looked up 
viacode(1234), charnames will remember the result internally). Uncached 
lookups are *much* slower, my system makes about 400 calls/s. The very 
first calls is again _much_ slower, because charnames includes a 500K 
file via do "". Both uncached and very-first-call szenarios are unlikely 
speed up, but not hurt, either.

Although all tests pass, the patch does not contain new tests for the now 
working calling-types. These will come when I rewrote charnames.t to use 
is() instead of printing ok or not ok (because it is quite hard to insert 
a test in the middle, where it belongs).

Btw, can Carp::croak() ever return? I think the return statements after 
carp() are superflouse and can be removed.

Best wishes,

Tels

- -- 
 Signed on Wed Jul  6 20:59:00 2005 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "The campaign should combat the messages of pornography by putting signs
 on buses saying sex with children is not OK." -- Mary Anne Layden in
 ttp://tinyurl.com/6a9cy

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQEVAwUBQswtCHcLPEOTuEwVAQGNRgf+M83ORhpuUbgzA+XWcHlZVWqZNvD2L91c
Vh0dFmphRvZuOczEnr3BiOSr3nqNiT7JBGdqx1JZZ1MURju90spkNBPBtGvdwC+7
x08+iCkaPpGY5XtFK/l+IBCIFTVupRsmRnRwwLcT/dGYnTasrLO9qE/VW+rh1lH2
smRVVZAUgqJw0222fMJ/3icWIlo+2zpMevwoM7Lzefn/D/erW+eLqi6tJZIjpa9O
lBf436MG7TsCHML2Wqkvzwyi2dnTVsjxIOku1ymjW7aN0/e/gLSnRMbr47Fs0Eg7
U5J13bJ+sxw9qfdxSZqPbcKFEzQ8UxnnhSj4jrkdz7BvRZ8GypJDpw==
=96E8
-----END PGP SIGNATURE-----
diff -ruN blead/lib/charnames.pm blead.charnames/lib/charnames.pm
--- blead/lib/charnames.pm      2005-07-02 18:40:54.000000000 +0200
+++ blead.charnames/lib/charnames.pm    2005-07-06 20:52:11.000000000 +0200
@@ -2,7 +2,7 @@
 use strict;
 use warnings;
 use File::Spec;
-our $VERSION = '1.04_1';
+our $VERSION = '1.05';
 
 use bytes ();          # for $bytes::hint_bits
 $charnames::hint_bits = 0x20000; # HINT_LOCALIZE_HH
@@ -247,42 +247,29 @@
   }
 } # import
 
-# this comes actually from Unicode::UCD, but it avoids the
-# overhead of loading it
-sub _getcode {
-    my $arg = shift;
-
-    if ($arg =~ /^[1-9]\d*$/) {
-       return $arg;
-    } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
-       return hex($1);
-    }
-
-    return;
-}
-
 my %viacode;
 
 sub viacode
 {
   if (@_ != 1) {
     carp "charnames::viacode() expects one argument";
-    return ()
+    return;
   }
 
   my $arg = shift;
-  my $code = _getcode($arg);
 
   my $hex;
-
-  if (defined $code) {
+  if ($arg =~ /^[1-9]\d*$/) {
     $hex = sprintf "%04X", $arg;
+  } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) {
+    $hex = $1;
   } else {
     carp("unexpected arg \"$arg\" to charnames::viacode()");
     return;
   }
 
-  if ($code > 0x10FFFF) {
+  # checking the length first is slightly faster
+  if (length($hex) > 5 && hex($hex) > 0x10FFFF) {
     carp sprintf "Unicode characters only allocated up to U+10FFFF (you asked 
for U+%X)", $hex;
     return;
   }
@@ -291,11 +278,9 @@
 
   $txt = do "unicore/Name.pl" unless $txt;
 
-  if ($txt =~ m/^$hex\t\t(.+)/m) {
-    return $viacode{$hex} = $1;
-  } else {
-    return;
-  }
+  return unless $txt =~ m/^$hex\t\t(.+)/m;
+
+  $viacode{$hex} = $1;
 } # viacode
 
 my %vianame;

Reply via email to