Re: range operator vs. unicode

2006-06-08 Thread SADAHIRO Tomoyuki


On Thu, 8 Jun 2006 11:03:42 +0200, "Rafael Garcia-Suarez" <[EMAIL PROTECTED]> 
wrote

> Sure, we can extend the magic to ensure that the increment of a
> variable that holds "\N{omega}" is "\N{alpha}\N{alpha}". But I feel
> that dragons might be dormant here...

If ranges of Greek letters are to be supported, the treatment of
final sigma (U+03C2), which sits between rho (U+03C1) and sigma
(U+03C3), should be considered carefully.
For capital letters, the corresponding code point (U+03A2) is
reserved (i.e. unassigned).

And Greek-aware people may complain that stigma (U+03D[AB])
or digamma (U+03D[CD]) isn't treated as the sixth letter
following epsilon.

P.S. 11 is represented by iota-alpha, not by kappa,
in the Greek numeral system.
cf. http://en.wikipedia.org/wiki/Greek_numerals

Regards,
SADAHIRO Tomoyuki
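
The final-sigma point can be made concrete with a plain code-point range. This is only a minimal sketch: the helper names greek_small and greek_capital and the skip_final_sigma option are hypothetical, and stigma and digamma are not handled at all.

```perl
use strict;
use warnings;

# Hypothetical helper: Greek small letters alpha (U+03B1) .. omega (U+03C9)
# by code point. Final sigma (U+03C2) lies inside that range, between rho
# and sigma, so the caller has to decide whether to keep it.
sub greek_small {
    my (%opt) = @_;
    my @cp = (0x03B1 .. 0x03C9);
    @cp = grep { $_ != 0x03C2 } @cp if $opt{skip_final_sigma};
    return map { chr } @cp;
}

# Hypothetical helper: Greek capital letters. U+03A2 is unassigned,
# so it is always dropped from the range.
sub greek_capital {
    return map { chr } grep { $_ != 0x03A2 } (0x0391 .. 0x03A9);
}
```

With skip_final_sigma set, both helpers return the 24 letters of the modern alphabet; without it, greek_small returns 25 characters.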

Re: range operator vs. unicode

2006-06-08 Thread Dr.Ruud

Dan Kogai wrote:

> I found that ('a'..'z') works only for alphanumerals.

Just like it is documented. But your definition of 'alphanumeral'
is stale:

  news:[EMAIL PROTECTED]

-- 
Affijn, Ruud

"Gewoon is een tijger."

Re: range operator vs. unicode

2006-06-08 Thread Yitzchak Scott-Thoennes

On Thu, Jun 08, 2006 at 05:03:15PM +0900, Dan Kogai wrote:
> I found that ('a'..'z') works only for alphanumerals.  Try the code  
> below;
> 
> use strict;
> use warnings;
> #use utf8;
> use charnames ':full';
> binmode STDOUT, ':utf8';
> # works
> print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");
> # (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
> print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");

Right.

> # does not work
> print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");

The above should print a, ..., z, and does do so.  The next in the
series after z is aa, which is longer than LEFT CURLY BRACKET, so the
range is ended with z.

Since magical string increment doesn't recognize any of the below
starting characters, the next three ranges should just return the
starting element.

> print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
> print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
> print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}")
> __END__
> 
> There is an easy workaround, however.
> 
> my @katakana = map { chr } ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");

Did you mean:
 ord("\N{KATAKANA LETTER SMALL A}") .. ord("\N{KATAKANA LETTER VO}");
?
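
Spelled out in full, the ord-based form of the workaround might look like the sketch below. char_range is a hypothetical helper name, not something Perl provides, and it assumes both arguments are single characters.

```perl
use strict;
use warnings;

# Hypothetical helper: every character from $from to $to by code point.
# Both arguments are assumed to be single characters.
sub char_range {
    my ($from, $to) = @_;
    return map { chr } ( ord($from) .. ord($to) );
}

# KATAKANA LETTER SMALL A (U+30A1) .. KATAKANA LETTER VO (U+30FA)
my @katakana = char_range("\x{30A1}", "\x{30FA}");
```

Unlike the string range, this takes variables for both endpoints and never involves the magical increment.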

> Since we have a workaround above, I don't consider this range
> implementation a bug -- after all we would be rather surprised if
> ('\x0' .. '\x{10}') worked.  But the following should be fixed so
> Greeks are not confused by the result of
> ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"),
> Japanese are not confused by
> ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}") and so forth.

Which part should be fixed?
 
> perldoc perlop
> >   The range operator (in list context) makes use of the magical
> >   auto-increment algorithm if the operands are strings.  You can say

The key part is that magical auto-increment is defined earlier as
only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".

> >
> >   @alphabet = ('A' .. 'Z');
> >
> >   to get all normal letters of the English alphabet, or
> >
> >   $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
> >
> >   to get a hexadecimal digit, or
> >
> >   @z2 = ('01' .. '31');  print $z2[$mday];
> >
> >   to get dates with leading zeros.  If the final value specified
> >   is not in the sequence that the magical increment would produce,
> >   the sequence goes until the next value would be longer than the
> >   final value specified.
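
The two rules discussed in this message (the length cutoff and the /^[a-zA-Z]*[0-9]*\z/ restriction) can be put together in a few lines. This is a sketch of the documented behaviour for string operands, not the actual implementation: magic_range is a hypothetical name, numeric operands are ignored, and both operands are assumed to be non-empty.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the string case of the range operator.
sub magic_range {
    my ($from, $to) = @_;
    my @out;
    my $s = $from;
    # Stop as soon as the next value would be longer than the final value.
    while (length($s) <= length($to)) {
        push @out, $s;
        last if $s eq $to;
        # Magical increment only applies to this pattern; anything else
        # yields just the starting element.
        last unless $s =~ /^[a-zA-Z]*[0-9]*\z/;
        $s++;    # magical string increment: "z" becomes "aa"
    }
    return @out;
}

my @letters = magic_range('a', '{');                # a .. z
my @alpha   = magic_range("\x{3b1}", "\x{3c9}");    # alpha only
```

The first call stops at z because aa is longer than the one-character end point; the second returns a single element because alpha does not match the pattern.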

Re: range operator vs. unicode

2006-06-08 Thread Rafael Garcia-Suarez


On 08/06/06, Dan Kogai <[EMAIL PROTECTED]> wrote:

> On the other hand, ranges in regexp and C works.  You may
> consider this inconsistent but range operator must accept variables
> like ($start .. $end) while character ranges in regexp is
> constant.


I don't find this particularly inconsistent. Ranges in tr/// and in []
are between characters, but the range operator operates on strings :

[EMAIL PROTECTED] ~]$ perl -l
print for "zy" .. "aab"
__END__
zy
zz
aaa
aab

Sure, we can extend the magic to ensure that the increment of a
variable that holds "\N{omega}" is "\N{alpha}\N{alpha}". But I feel
that dragons might be dormant here...
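
For illustration only, an alphabet-aware increment might look roughly like the sketch below. It is not a proposal for the core: inc_in_alphabet is a hypothetical name, the alphabet is simply passed in as a list of characters, and none of the dragons (final sigma, case, mixed scripts) are dealt with.

```perl
use strict;
use warnings;

# Hypothetical helper: increment $str over an arbitrary ordered alphabet,
# carrying the way "z" becomes "aa" in Perl's magical increment.
sub inc_in_alphabet {
    my ($str, @alphabet) = @_;
    my %pos;
    @pos{@alphabet} = (0 .. $#alphabet);
    my @chars = split //, $str;
    for (my $i = $#chars; $i >= 0; $i--) {
        my $p = $pos{ $chars[$i] };
        die "character not in alphabet\n" unless defined $p;
        if ($p < $#alphabet) {
            $chars[$i] = $alphabet[ $p + 1 ];
            return join('', @chars);
        }
        $chars[$i] = $alphabet[0];    # wrap around and carry to the left
    }
    # Carried past the first character: the string grows by one.
    return $alphabet[0] . join('', @chars);
}

my @greek = map { chr } (0x03B1 .. 0x03C9);    # includes final sigma
my $next  = inc_in_alphabet("\x{3c9}", @greek);    # omega becomes alpha alpha
```

Passing ('a' .. 'z') as the alphabet gives the same results as the built-in increment for lowercase strings, for example "zz" becomes "aaa".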

[PATCH] Re: range operator vs. unicode

2006-06-08 Thread Yitzchak Scott-Thoennes

On Thu, Jun 08, 2006 at 05:56:13PM +0900, Dan Kogai wrote:
> On Jun 08, 2006, at 17:34 , Yitzchak Scott-Thoennes wrote:
> >Which part should be fixed?
> 
> The limitation of the magic, namely
> >
> >The key part is that magical auto-increment is defined earlier as
> >only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".
> 
> Which is described in "Auto-increment and Auto-decrement", though
> "Range Operator" does mention it.
> 
> perldoc perlop
> >   The range operator (in list context) makes use of the magical
> >   auto-increment algorithm if the operands are strings.
> 
> This would make lawyers happy enough but not (Uni)?coders like  
> myself.  With the advent of Unicode support more people would attempt  
> things like ("\N{alpha}" .. "\N{omega}") and wonder why it does not  
> work like ("a".."z").  So we should add something like;
> 
> =head2 CAVEAT
> 
> Note that the range operator cannot apply magic beyond
> C<[a-zA-Z0-9]>.  Therefore
> 
>   use charnames 'greek';
>   my @greek_small =  ("\N{alpha}" .. "\N{omega}");
> 
> Does not work.  If you want non-ascii ranges, try
> 
>   my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
> 
> On the other hand, ranges in regexp and C works.  You may  
> consider this inconsistent but range operator must accept variables  
> like ($start .. $end) while character ranges in regexp is  
> constant.

Hmm, we don't seem to document even what something like "+" .. "-"
does.  How does this look:

--- perl/pod/perlop.pod.orig	2006-05-15 09:48:33.0 -0700
+++ perl/pod/perlop.pod 2006-06-08 02:30:45.5 -0700
@@ -648,10 +648,22 @@
 
 @z2 = ('01' .. '31');  print $z2[$mday];
 
-to get dates with leading zeros.  If the final value specified is not
-in the sequence that the magical increment would produce, the sequence
-goes until the next value would be longer than the final value
-specified.
+to get dates with leading zeros.
+
+If the final value specified is not in the sequence that the magical
+increment would produce, the sequence goes until the next value would
+be longer than the final value specified.
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, matching "/^[a-zA-Z]*[0-9]*\z/"), only the initial
+value will be returned.  So the following will only return an alpha:
+
+use charnames 'greek';
+my @greek_small =  ("\N{alpha}" .. "\N{omega}");
+
+Use this instead:
+
+my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
 
 Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
 return two elements in list context.

Re: range operator vs. unicode

2006-06-08 Thread Dan Kogai


On Jun 08, 2006, at 17:34 , Yitzchak Scott-Thoennes wrote:
> Which part should be fixed?

The limitation of the magic, namely

> The key part is that magical auto-increment is defined earlier as
> only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".


Which is described in "Auto-increment and Auto-decrement", though
"Range Operator" does mention it.


perldoc perlop
>   The range operator (in list context) makes use of the magical
>   auto-increment algorithm if the operands are strings.


This would make lawyers happy enough but not (Uni)?coders like  
myself.  With the advent of Unicode support more people would attempt  
things like ("\N{alpha}" .. "\N{omega}") and wonder why it does not  
work like ("a".."z").  So we should add something like;


=head2 CAVEAT

Note that the range operator cannot apply magic beyond
C<[a-zA-Z0-9]>.  Therefore


  use charnames 'greek';
  my @greek_small =  ("\N{alpha}" .. "\N{omega}");

Does not work.  If you want non-ascii ranges, try

  my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );


On the other hand, ranges in regexp and C works.  You may  
consider this inconsistent but range operator must accept variables  
like ($start .. $end) while character ranges in regexp is  
constant.


=cut

Dan the Range (?:Ar)ranger
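
The regexp side of that comparison can be shown with a character class, where the range is by code point. A minimal sketch; is_greek_small is a hypothetical helper name, and the range deliberately includes final sigma because it lies between the two end points.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the argument is exactly one character in
# the code-point range U+03B1 (alpha) .. U+03C9 (omega).
sub is_greek_small {
    my ($ch) = @_;
    return $ch =~ /\A[\x{3b1}-\x{3c9}]\z/ ? 1 : 0;
}

my $final_sigma_ok = is_greek_small("\x{3c2}");    # inside the range
my $latin_ok       = is_greek_small("a");          # outside the range
```

The end points here are fixed when the pattern is written, which is the contrast with ($start .. $end) drawn above.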

range operator vs. unicode

2006-06-08 Thread Dan Kogai


Porters,

I found that ('a'..'z') works only for alphanumerals.  Try the code  
below;


use strict;
use warnings;
#use utf8;
use charnames ':full';
binmode STDOUT, ':utf8';
# works
print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");

# (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");
# does not work
print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");
print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}")

__END__

There is an easy workaround, however.

my @katakana = map { chr } ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");



Since we have a workaround above, I don't consider this range
implementation a bug -- after all we would be rather surprised if
('\x0' .. '\x{10}') worked.  But the following should be fixed so
Greeks are not confused by the result of
("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"),
Japanese are not confused by
("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}") and so forth.


perldoc perlop
>   The range operator (in list context) makes use of the magical
>   auto-increment algorithm if the operands are strings.  You can say
>
>   @alphabet = ('A' .. 'Z');
>
>   to get all normal letters of the English alphabet, or
>
>   $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];
>
>   to get a hexadecimal digit, or
>
>   @z2 = ('01' .. '31');  print $z2[$mday];
>
>   to get dates with leading zeros.  If the final value specified
>   is not in the sequence that the magical increment would produce,
>   the sequence goes until the next value would be longer than the
>   final value specified.


Dan the Man with Too Many Characters to Squeeze in the Range
