Re: range operator vs. unicode
On Thu, 8 Jun 2006 11:03:42 +0200, "Rafael Garcia-Suarez" <[EMAIL PROTECTED]> wrote:

> Sure, we can extend the magic to ensure that the increment of a
> variable that holds "\N{omega}" is "\N{alpha}\N{alpha}". But I feel
> that dragons might be dormant here...

If ranges of Greek letters are to be supported, the treatment of final
sigma (U+03C2), which sits between rho (U+03C1) and sigma (U+03C3),
should be considered carefully. For capital letters, the corresponding
code point (U+03A2) is reserved (i.e. unassigned).

And Greek-aware people may complain that stigma (U+03D[AB]) or digamma
(U+03D[CD]) isn't treated as the sixth letter following epsilon.

P.S. In the Greek numeral system, 11 is written iota-alpha, not kappa.
cf. http://en.wikipedia.org/wiki/Greek_numerals

Regards,
SADAHIRO Tomoyuki
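Both gaps SADAHIRO describes show up as soon as Greek letters are enumerated by code point, the ord/chr approach used elsewhere in this thread. The following is a minimal sketch, assuming a reasonably modern perl 5; the `\p{Assigned}` filter is just one possible way to drop the unassigned capital, not something proposed in the thread.

```perl
use strict;
use warnings;
use charnames ':full';

# Build both ranges over code points with ord/chr, not the magic increment.
my @small   = map { chr } ( ord("\N{GREEK SMALL LETTER ALPHA}")
                            .. ord("\N{GREEK SMALL LETTER OMEGA}") );
my @capital = map { chr } ( ord("\N{GREEK CAPITAL LETTER ALPHA}")
                            .. ord("\N{GREEK CAPITAL LETTER OMEGA}") );

# Each range holds 25 code points for a 24-letter alphabet: final sigma
# (U+03C2) sits between rho and sigma among the small letters, and the
# unassigned U+03A2 sits at the same offset among the capitals.
printf "small: %d code points, capital: %d code points\n",
    scalar @small, scalar @capital;
printf "small[17] = U+%04X, capital[17] = U+%04X\n",
    ord $small[17], ord $capital[17];

# One possible way to drop the unassigned hole from the capitals.
my @capital_assigned = grep { /\p{Assigned}/ } @capital;
printf "assigned capitals: %d\n", scalar @capital_assigned;
```

Note that filtering on assignment does not help with final sigma, which is a real, assigned letter; a Greek-aware sequence would have to exclude it explicitly.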
Re: range operator vs. unicode
Dan Kogai wrote:

> I found that ('a'..'z') works only for alphanumerals.

Just as documented. But your definition of 'alphanumeral' is stale:
news:[EMAIL PROTECTED]

-- 
Anyway, Ruud

"Ordinary is a tiger."
Re: range operator vs. unicode
On Thu, Jun 08, 2006 at 05:03:15PM +0900, Dan Kogai wrote:
> I found that ('a'..'z') works only for alphanumerals. Try the code
> below:
>
> use strict;
> use warnings;
> #use utf8;
> use charnames ':full';
> binmode STDOUT, ':utf8';
> # works
> print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");
> # (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
> print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");

Right.

> # does not work
> print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");

The above should print a, ..., z, and does do so. The next in the
series after z is aa, which is longer than LEFT CURLY BRACKET, so the
range ends with z. Since magical string increment doesn't recognize
any of the starting characters below, the next three ranges should
just return the starting element.

> print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
> print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
> print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}")
> __END__
>
> There is an easy workaround, however.
>
> my @katakana = map { chr } ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");

Did you mean:

    ord("\N{KATAKANA LETTER SMALL A}") .. ord("\N{KATAKANA LETTER VO}");

?

> Since we have a workaround above, I don't consider this range
> implementation a bug -- after all we would be rather surprised if
> ('\x0' .. '\x{10}') worked. But the following should be fixed so
> Greeks are not confused with the consequence of ("\N{GREEK CAPITAL
> LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"), Japanese are not
> confused with ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER
> VO}") and so forth.

Which part should be fixed?

> perldoc perlop
> > The range operator (in list context) makes use of the magical
> > auto-increment algorithm if the operands are strings. You can say

The key part is that magical auto-increment is defined earlier as only
working for strings matching "/^[a-zA-Z]*[0-9]*\z/".

> >
> >     @alphabet = ('A' .. 'Z');
> >
> > to get all normal letters of the English alphabet, or
> >
> >     $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
> >
> > to get a hexadecimal digit, or
> >
> >     @z2 = ('01' .. '31'); print $z2[$mday];
> >
> > to get dates with leading zeros. If the final value specified is not
> > in the sequence that the magical increment would produce, the sequence
> > goes until the next value would be longer than the final value
> > specified.
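Yitzchak's two points can be checked with a short script: the first range stops at z because the next magic value, aa, is longer than the one-character end point, and the corrected workaround ranges over code points before mapping back with chr. This is a minimal sketch assuming a reasonably modern perl 5; the variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ':full';
binmode STDOUT, ':encoding(UTF-8)';

# Magic string increment: after 'z' comes 'aa', which is longer than '{',
# so the range stops at 'z' instead of reaching '{'.
my @lower = ('a' .. '{');
printf "%d elements, last is %s\n", scalar @lower, $lower[-1];

# The workaround ranges over the code points, then maps them back to characters.
my @katakana = map { chr } ( ord("\N{KATAKANA LETTER SMALL A}")
                             .. ord("\N{KATAKANA LETTER VO}") );
printf "%d katakana, starting with %s\n",
    scalar @katakana, join ' ', @katakana[0 .. 4];
```

The katakana range U+30A1..U+30FA is contiguous and fully assigned, so no filtering is needed there, unlike the Greek capitals.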
Re: range operator vs. unicode
On 08/06/06, Dan Kogai <[EMAIL PROTECTED]> wrote:
> On the other hand, ranges in regexp and C<tr///> work. You may
> consider this inconsistent but the range operator must accept variables
> like ($start .. $end) while character ranges in regexp are constant.

I don't find this particularly inconsistent. Ranges in tr/// and in []
are between characters, but the range operator operates on strings:

[EMAIL PROTECTED] ~]$ perl -l
print for "zy" .. "aab"
__END__
zy
zz
aaa
aab

Sure, we can extend the magic to ensure that the increment of a
variable that holds "\N{omega}" is "\N{alpha}\N{alpha}". But I feel
that dragons might be dormant here...
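To make Rafael's "extend the magic" idea concrete, here is a minimal sketch of what a Greek-aware increment could look like if written in pure Perl. `greek_inc` is hypothetical (nothing like it exists in core): it treats the 24 small letters as the digits of an odometer, so omega is followed by alpha-alpha, and it has to skip final sigma explicitly, which is exactly the kind of decision SADAHIRO flags.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical alphabet: U+03B1 (alpha) .. U+03C9 (omega), minus U+03C2 (final sigma).
my @ALPHABET = map { chr } grep { $_ != 0x3C2 } 0x3B1 .. 0x3C9;
my %POSITION = map { $ALPHABET[$_] => $_ } 0 .. $#ALPHABET;

# Increment like an odometer: bump the last letter, carrying leftwards on omega.
sub greek_inc {
    my ($str) = @_;
    my @chars = split //, $str;
    for (my $i = $#chars; $i >= 0; $i--) {
        my $pos = $POSITION{ $chars[$i] };
        die sprintf("U+%04X is not in the alphabet\n", ord $chars[$i])
            unless defined $pos;
        if ($pos < $#ALPHABET) {
            $chars[$i] = $ALPHABET[$pos + 1];
            return join '', @chars;
        }
        $chars[$i] = $ALPHABET[0];    # omega wraps to alpha, carry continues
    }
    return $ALPHABET[0] . join('', @chars);    # every position carried
}

print greek_inc("\x{3C9}"), "\n";    # input: omega
print greek_inc("\x{3C1}"), "\n";    # input: rho
```

Even this toy has to hard-code choices (no final sigma, no stigma or digamma, small letters only) that not every Greek reader would agree with.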
[PATCH] Re: range operator vs. unicode
On Thu, Jun 08, 2006 at 05:56:13PM +0900, Dan Kogai wrote:
> On Jun 08, 2006, at 17:34 , Yitzchak Scott-Thoennes wrote:
> >Which part should be fixed?
>
> The limitation of the magic, namely
>
> >The key part is that magical auto-increment is defined earlier as
> >only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".
>
> Which is described in "Auto-increment and Auto-decrement", though
> "Range Operator" does mention.
>
> > perldoc perlop
> > The range operator (in list context) makes use of the magical
> > auto-increment algorithm if the operands are strings.
>
> This would make lawyers happy enough but not (Uni)?coders like
> myself. With the advent of Unicode support more people would attempt
> things like ("\N{alpha}" .. "\N{omega}") and wonder why it does not
> work like ("a".."z"). So we should add something like:
>
> =head2 CAVEAT
>
> Note that the range operator cannot apply magic beyond C<[a-zA-Z0-9]>.
> Therefore
>
>   use charnames 'greek';
>   my @greek_small = ("\N{alpha}" .. "\N{omega}");
>
> does not work. If you want non-ASCII ranges, try
>
>   my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
>
> On the other hand, ranges in regexp and C<tr///> work. You may
> consider this inconsistent but the range operator must accept variables
> like ($start .. $end) while character ranges in regexp are
> constant.

Hmm, we don't seem to document even what something like "+" .. "-"
does. How does this look:

--- perl/pod/perlop.pod.orig	2006-05-15 09:48:33.0 -0700
+++ perl/pod/perlop.pod	2006-06-08 02:30:45.5 -0700
@@ -648,10 +648,22 @@
 
     @z2 = ('01' .. '31'); print $z2[$mday];
 
-to get dates with leading zeros. If the final value specified is not
-in the sequence that the magical increment would produce, the sequence
-goes until the next value would be longer than the final value
-specified.
+to get dates with leading zeros.
+
+If the final value specified is not in the sequence that the magical
+increment would produce, the sequence goes until the next value would
+be longer than the final value specified.
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, matching "/^[a-zA-Z]*[0-9]*\z/"), only the initial
+value will be returned. So the following will only return an alpha:
+
+    use charnames 'greek';
+    my @greek_small = ("\N{alpha}" .. "\N{omega}");
+
+Use this instead:
+
+    my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
 
 Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
 return two elements in list context.
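The new rules in the patched text can be exercised directly. This is a minimal sketch assuming a reasonably modern perl 5 without the `unicode_strings` feature enabled; `is_magic` is a hypothetical helper that simply reuses the pattern quoted in the patch. Warnings are switched off around the Greek range only in case the perl in use complains about the failed string increment.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: the pattern the patch quotes for magic-increment strings.
sub is_magic { return $_[0] =~ /^[a-zA-Z]*[0-9]*\z/ }

my $alpha = "\N{GREEK SMALL LETTER ALPHA}";
my $omega = "\N{GREEK SMALL LETTER OMEGA}";

printf "'zz9' magic: %s, alpha magic: %s\n",
    is_magic('zz9') ? 'yes' : 'no', is_magic($alpha) ? 'yes' : 'no';

# A non-magic initial value: only the initial element comes back.
my @greek;
{
    no warnings;    # some perls warn when the increment falls back to numbers
    @greek = ($alpha .. $omega);
}
printf "alpha .. omega gives %d element(s)\n", scalar @greek;

# Numeric operands are evaluated as integers, as the context lines say.
my @nums = (2.18 .. 3.14);
print "@nums\n";
```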
Re: range operator vs. unicode
On Jun 08, 2006, at 17:34 , Yitzchak Scott-Thoennes wrote:
>Which part should be fixed?

The limitation of the magic, namely

>The key part is that magical auto-increment is defined earlier as
>only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".

Which is described in "Auto-increment and Auto-decrement", though
"Range Operator" does mention.

> perldoc perlop
> The range operator (in list context) makes use of the magical
> auto-increment algorithm if the operands are strings.

This would make lawyers happy enough but not (Uni)?coders like
myself. With the advent of Unicode support more people would attempt
things like ("\N{alpha}" .. "\N{omega}") and wonder why it does not
work like ("a".."z"). So we should add something like:

=head2 CAVEAT

Note that the range operator cannot apply magic beyond C<[a-zA-Z0-9]>.
Therefore

  use charnames 'greek';
  my @greek_small = ("\N{alpha}" .. "\N{omega}");

does not work. If you want non-ASCII ranges, try

  my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );

On the other hand, ranges in regexp and C<tr///> work. You may
consider this inconsistent but the range operator must accept variables
like ($start .. $end) while character ranges in regexp are constant.

=cut

Dan the Range (?:Ar)ranger
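Dan's remark that character ranges in regexps and tr/// already handle non-ASCII letters can be illustrated as follows. A minimal sketch, assuming a reasonably modern perl 5 and using `\x{...}` escapes for the code points. These ranges are also over code points, so the small-letter range covers final sigma, and a position-by-position tr/// mapping would turn a final sigma into the unassigned U+03A2; real case conversion should use `uc`.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

my $word = "\x{3B1}\x{3B2}\x{3B3}";    # alpha beta gamma

# A bracketed character class ranges over single code points.
my $all_greek = $word =~ /^[\x{3B1}-\x{3C9}]+\z/ ? 1 : 0;

# tr/// with an empty replacement list counts matching characters.
my $count = ($word =~ tr/\x{3B1}-\x{3C9}//);

# tr/// ranges map position by position: small alpha..omega to capitals.
(my $upper = $word) =~ tr/\x{3B1}-\x{3C9}/\x{391}-\x{3A9}/;

print "all greek: $all_greek, count: $count, upper: $upper\n";
```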
range operator vs. unicode
Porters,

I found that ('a'..'z') works only for alphanumerals. Try the code
below:

  use strict;
  use warnings;
  #use utf8;
  use charnames ':full';
  binmode STDOUT, ':utf8';
  # works
  print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");
  # (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
  print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");
  # does not work
  print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");
  print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
  print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
  print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}")
  __END__

There is an easy workaround, however.

  my @katakana = map { chr } ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");

Since we have a workaround above, I don't consider this range
implementation a bug -- after all we would be rather surprised if
('\x0' .. '\x{10}') worked. But the following should be fixed so
Greeks are not confused with the consequence of ("\N{GREEK CAPITAL
LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"), Japanese are not
confused with ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER
VO}") and so forth.

perldoc perlop

  The range operator (in list context) makes use of the magical
  auto-increment algorithm if the operands are strings. You can say

      @alphabet = ('A' .. 'Z');

  to get all normal letters of the English alphabet, or

      $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

  to get a hexadecimal digit, or

      @z2 = ('01' .. '31'); print $z2[$mday];

  to get dates with leading zeros. If the final value specified is not
  in the sequence that the magical increment would produce, the sequence
  goes until the next value would be longer than the final value
  specified.

Dan the Man with Too Many Characters to Squeeze in the Range