Re: encoding...

2003-11-02 Thread John Delacour
At 3:36 pm -0800 2/11/03, Jan Dubois wrote:

>Should work if you initialize the variable in a BEGIN block:
>
>BEGIN { $source = 'MacRoman'; }
>use encoding $source, STDOUT => 'utf-8';
Ah!

>Yes, put single quotes around your EOT marker:
>
>$text = <<'EOT';
>$ome$tuff
>$ome$tuff
>$ome$tuff
>EOT
>#
Hey! and I'd been wondering why people bothered to do that!

Terrific.  Thanks!
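
For the record, a minimal sketch of the difference between the two heredoc forms. The variable names ($name, $double, $single) are made up for illustration and are not from the original script:

```perl
use strict;
use warnings;

my $name = 'interpolated';    # hypothetical variable, for illustration only

# Bare (or double-quoted) marker: the body is interpolated like a "..." string.
my $double = <<EOT;
value: $name
EOT

# Single-quoted marker: the body is taken literally, so "$" needs no escaping.
my $single = <<'EOT';
value: $name
$ome$tuff
EOT

print $double;
print $single;
```

With the single-quoted marker, text such as $ome$tuff stays as written even under "use strict", because nothing in the body is treated as a variable.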

JD




Re: encoding...

2003-11-02 Thread Jan Dubois
On Sun, 2 Nov 2003 23:24:41 +, John Delacour <[EMAIL PROTECTED]> wrote:

>Question 1.
>
>In this script I would like for convenience' sake to use variables in 
>the second line, but I don't seem to be able to do so.  Am I missing 
>something or is it simply not possible?
>
>
>$source = 'MacRoman';  # I want to use this in the next line
>use encoding qw(  MacRoman  ), STDOUT => qw(  utf-8 ) ;

Should work if you initialize the variable in a BEGIN block:

BEGIN { $source = 'MacRoman'; }
use encoding $source, STDOUT => 'utf-8';

"use" is executed at compile time, so variables initialized at runtime
won't be usable.
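
A small sketch of that ordering, without involving the encoding pragma at all. The second BEGIN block merely stands in for a "use" line, and the variable names are invented for this illustration:

```perl
use strict;
use warnings;

# Hypothetical package variables, used only to record what happens when.
our ($runtime_source, $begin_source, @seen);

$runtime_source = 'MacRoman';            # assigned at run time
BEGIN { $begin_source = 'MacRoman'; }    # assigned at compile time

# This BEGIN block stands in for a "use" line: it runs during compilation,
# before any of the ordinary statements above have executed.
BEGIN {
    @seen = (
        defined $runtime_source ? 'set' : 'undef',
        defined $begin_source   ? 'set' : 'undef',
    );
}

print "runtime-initialized variable, as seen at compile time: $seen[0]\n";
print "BEGIN-initialized variable, as seen at compile time:   $seen[1]\n";
```

Only the variable assigned inside a BEGIN block has a value by the time the stand-in for "use" looks at it.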

>$text = "café" ;
>print $text ;
>
>
>Question 2
>
>Is there a way, without using q(), to single-quote a block of text as 
>one can double-quote it this way:
>
>$text = <<EOT;
>$ome$tuff
>$ome$tuff
>$ome$tuff
>EOT
>#
>
>I want to be able to quote a block of JIS-encoded stuff (which 
>contains lots of $)

Yes, put single quotes around your EOT marker:

$text = <<'EOT';

Cheers,
-Jan



encoding...

2003-11-02 Thread John Delacour
Question 1.

In this script I would like for convenience' sake to use variables in 
the second line, but I don't seem to be able to do so.  Am I missing 
something or is it simply not possible?

$source = 'MacRoman';  # I want to use this in the next line
use encoding qw(  MacRoman  ), STDOUT => qw(  utf-8 ) ;
$text = "café" ;
print $text ;

Question 2

Is there a way, without using q(), to single-quote a block of text as 
one can double-quote it this way:

$text = <<EOT;
$ome$tuff
$ome$tuff
$ome$tuff
EOT
#

I want to be able to quote a block of JIS-encoded stuff (which 
contains lots of $)


Re: 5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}

2003-11-02 Thread Jarkko Hietaniemi
> I just happened to notice that the perlre man page describes the 
> POSIX "[:punct:]" character class as being equivalent to the Unicode 
> "\p{IsPunct}" character class.
> 
> I haven't tried to track down the respective standards documents for
> POSIX and Unicode to see whether these classes are _supposed_ to be
> equivalent over the printable ASCII character set, but when I test them

AFAIK there are currently no existing standards defining those
equivalences.  There has been some discussion about that in Unicode
consortium mailing lists, but in fact there are some doubts about the
wisdom of stating anything about such equivalences (because the C
standards where the :foo: originate have frankly no clue about the
more complex property structure of Unicode).

The closest upcoming standard is the proposed update to the TR18:
http://www.unicode.org/reports/tr18/tr18-8.html, see Annex C.

If you say :punct: on non-Unicode data, you are doing an _operating_
_system_ _dependent_ AND _locale_ _dependent_ operation.  :punct: and
\p{Punct} are (supposed to be) equivalent with Unicode data.

> in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
> demonstrate:

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen


5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}

2003-11-02 Thread David Graff

I just happened to notice that the perlre man page describes the 
POSIX "[:punct:]" character class as being equivalent to the Unicode 
"\p{IsPunct}" character class.

I haven't tried to track down the respective standards documents for
POSIX and Unicode to see whether these classes are _supposed_ to be
equivalent over the printable ASCII character set, but when I test them
in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
demonstrate:

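use strict;
use warnings;

# Test every printable ASCII character against both classes.
for my $x ( 0x20 .. 0x7e ) {
    my $char = chr( $x );
    my $res  = ( $char =~ /[[:punct:]]/ ) ? "matches  :punct:" : "is not a :punct:";
    $res    .= ( $char =~ /\p{IsPunct}/ ) ? " matches  {IsPunct}" : " fails on {IsPunct}";
    # Report only characters that match at least one of the two classes.
    printf( " 0x%x (%3d.) %s %s\n", $x, $x, $char, $res ) if $res =~ /matches/;
}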

The differences involve these nine characters:  $ + < = > ^ ` | ~

Except for the back-tick (`), I wouldn't be surprised if POSIX and 
Unicode are supposed to differ on these points, so maybe it's just a 
matter of fixing the perlre man page.  (I'm not sure yet what the 
behavior of [:punct:] is supposed to be on non-ASCII punctuation 
characters in Unicode -- maybe the man page should clarify this too.)
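
One way to look at the nine characters is by Unicode general category. As far as I know, all nine are classed as symbols (Sc, Sm or Sk) rather than punctuation (the P categories), which would account for \p{IsPunct} rejecting them. The sketch below checks this. The %category hash is just a helper invented for the example, and results on a particular Perl build may differ:

```perl
use strict;
use warnings;

# The nine characters reported above as matching [:punct:] but not \p{IsPunct}.
my @chars = split //, '$+<=>^`|~';

my %category;    # hypothetical helper: character => general category label
for my $c (@chars) {
    $category{$c} = $c =~ /\p{Sc}/ ? 'Sc (currency symbol)'
                  : $c =~ /\p{Sm}/ ? 'Sm (math symbol)'
                  : $c =~ /\p{Sk}/ ? 'Sk (modifier symbol)'
                  :                  'other';
    printf "%s  %-20s  IsPunct: %s\n", $c, $category{$c},
        ( $c =~ /\p{IsPunct}/ ? 'yes' : 'no' );
}
```

By that reading the back-tick is no exception: it is GRAVE ACCENT, a modifier symbol (Sk), the same category as the circumflex.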

Dave Graff