-----BEGIN PGP SIGNED MESSAGE----- Moin,
currently, charnames::viacode() does only take integers and strings, but stumbles over hex strings, even though the code tries to handle them (but fails due to a thinko): # perl -Mcharnames=:full -le 'print charnames::viacode(0x1234)' ETHIOPIC SYLLABLE SEE # perl -Mcharnames=:full -le 'print charnames::viacode(4660)' ETHIOPIC SYLLABLE SEE # perl -Mcharnames=:full -le 'print charnames::viacode("4660")' ETHIOPIC SYLLABLE SEE # perl -Mcharnames=:full -le 'print charnames::viacode("0x1234")' Argument "0x1234" isn't numeric in sprintf at /usr/local/lib/perl5/5.8.6/charnames.pm line 270. NULL The attached patch corrects that mistake, and also optimizes the function. It inlines the subroutine _getcode(), which was used only once, and streamlines the operations a bit. A benchmark with both versions was also done. The new one was named viacode2, and in the original one I replaced sprintf "%04X", $arg with sprintf "%04X", $code to fix the hexstring bug to compare the optimizations vs. straight bugfix: Benchmark: running v("0x1234"), v(0x1234), v2("0x1234"), v2(0x1234) for at least 5 CPU seconds... v("0x1234"): 4 (5.07 usr + 0.05 sys = 5.12 CPU) @ 113076/s (n=578953) v(0x1234): 6 (5.05 usr + 0.05 sys = 5.10 CPU) @ 173617/s (n=885448) v2("0x1234"): 5 (5.16 usr + 0.03 sys = 5.19 CPU) @ 158199/s (n=821057) v2(0x1234): 5 (5.12 usr + 0.02 sys = 5.14 CPU) @ 221021/s (n=1136050) The benchmark benched the cached version (once you looked up viacode(1234), charnames will remember the result internally). Uncached lookups are *much* slower, my system makes about 400 calls/s. The very first calls is again _much_ slower, because charnames includes a 500K file via do "". Both uncached and very-first-call szenarios are unlikely speed up, but not hurt, either. Although all tests pass, the patch does not contain new tests for the now working calling-types. These will come when I rewrote charnames.t to use is() instead of printing ok or not ok (because it is quite hard to insert a test in the middle, where it belongs). Btw, can Carp::croak() ever return? I think the return statements after carp() are superflouse and can be removed. Best wishes, Tels - -- Signed on Wed Jul 6 20:59:00 2005 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email. "The campaign should combat the messages of pornography by putting signs on buses saying sex with children is not OK." -- Mary Anne Layden in ttp://tinyurl.com/6a9cy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iQEVAwUBQswtCHcLPEOTuEwVAQGNRgf+M83ORhpuUbgzA+XWcHlZVWqZNvD2L91c Vh0dFmphRvZuOczEnr3BiOSr3nqNiT7JBGdqx1JZZ1MURju90spkNBPBtGvdwC+7 x08+iCkaPpGY5XtFK/l+IBCIFTVupRsmRnRwwLcT/dGYnTasrLO9qE/VW+rh1lH2 smRVVZAUgqJw0222fMJ/3icWIlo+2zpMevwoM7Lzefn/D/erW+eLqi6tJZIjpa9O lBf436MG7TsCHML2Wqkvzwyi2dnTVsjxIOku1ymjW7aN0/e/gLSnRMbr47Fs0Eg7 U5J13bJ+sxw9qfdxSZqPbcKFEzQ8UxnnhSj4jrkdz7BvRZ8GypJDpw== =96E8 -----END PGP SIGNATURE-----
diff -ruN blead/lib/charnames.pm blead.charnames/lib/charnames.pm --- blead/lib/charnames.pm 2005-07-02 18:40:54.000000000 +0200 +++ blead.charnames/lib/charnames.pm 2005-07-06 20:52:11.000000000 +0200 @@ -2,7 +2,7 @@ use strict; use warnings; use File::Spec; -our $VERSION = '1.04_1'; +our $VERSION = '1.05'; use bytes (); # for $bytes::hint_bits $charnames::hint_bits = 0x20000; # HINT_LOCALIZE_HH @@ -247,42 +247,29 @@ } } # import -# this comes actually from Unicode::UCD, but it avoids the -# overhead of loading it -sub _getcode { - my $arg = shift; - - if ($arg =~ /^[1-9]\d*$/) { - return $arg; - } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) { - return hex($1); - } - - return; -} - my %viacode; sub viacode { if (@_ != 1) { carp "charnames::viacode() expects one argument"; - return () + return; } my $arg = shift; - my $code = _getcode($arg); my $hex; - - if (defined $code) { + if ($arg =~ /^[1-9]\d*$/) { $hex = sprintf "%04X", $arg; + } elsif ($arg =~ /^(?:[Uu]\+|0[xX])?([[:xdigit:]]+)$/) { + $hex = $1; } else { carp("unexpected arg \"$arg\" to charnames::viacode()"); return; } - if ($code > 0x10FFFF) { + # checking the length first is slightly faster + if (length($hex) > 5 && hex($hex) > 0x10FFFF) { carp sprintf "Unicode characters only allocated up to U+10FFFF (you asked for U+%X)", $hex; return; } @@ -291,11 +278,9 @@ $txt = do "unicore/Name.pl" unless $txt; - if ($txt =~ m/^$hex\t\t(.+)/m) { - return $viacode{$hex} = $1; - } else { - return; - } + return unless $txt =~ m/^$hex\t\t(.+)/m; + + $viacode{$hex} = $1; } # viacode my %vianame;