Re: encoding...
At 3:36 pm -0800 2/11/03, Jan Dubois wrote:

>Should work if you initialize the variable in a BEGIN block:
>
>BEGIN { $source = 'MacRoman'; }
>use encoding $source, STDOUT => 'utf-8';

Ah!

>Yes, put single quotes around your EOT marker:
>
>$text = <<'EOT';
>$ome$tuff
>$ome$tuff
>$ome$tuff
>EOT
>#

Hey! and I'd been wondering why people bothered to do that!

Terrific. Thanks!

JD
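A minimal sketch of the quoting difference discussed above, for anyone reading the archive later. The values given to $ome and $tuff are hypothetical and exist only so that interpolation has a visible effect. An unquoted marker behaves like a double-quoted string, while a single-quoted marker leaves every $ in the body untouched.

```perl
use strict;
use warnings;

# Hypothetical values, present only to make interpolation visible
my $ome  = 'X';
my $tuff = 'Y';

# Unquoted marker: the body interpolates, as in a "..." string
my $interpolated = <<EOT;
$ome$tuff
EOT

# Single-quoted marker: the body is literal, as in a '...' string
my $literal = <<'EOT';
$ome$tuff
EOT

print "interpolated: $interpolated";
print "literal:      $literal";
```

The single-quoted form is the one that suits text containing many $ characters, since nothing in the body is treated as a variable.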
Re: encoding...
On Sun, 2 Nov 2003 23:24:41 +, John Delacour <[EMAIL PROTECTED]> wrote:

>Question 1.
>
>In this script I would like for convenience' sake to use variables in
>the second line, but I don't seem to be able to do so. Am I missing
>something or is it simply not possible?
>
>
>$source = 'MacRoman'; # I want to use this in the next line
>use encoding qw( MacRoman ), STDOUT => qw( utf-8 ) ;

Should work if you initialize the variable in a BEGIN block:

BEGIN { $source = 'MacRoman'; }
use encoding $source, STDOUT => 'utf-8';

"use" is executed at compile time, so variables initialized at runtime
won't be usable.

>$text = "café" ;
>print $text ;
>
>
>Question 2
>
>Is there a way, without using q(), to single-quote a block of text as
>one can double-quote it this way:
>
>$text = <<EOT;
>$ome$tuff
>$ome$tuff
>$ome$tuff
>EOT
>#
>
>I want to be able to quote a block of JIS-encoded stuff (which
>contains lots of $)

Yes, put single quotes around your EOT marker:

$text = <<'EOT';
$ome$tuff
$ome$tuff
$ome$tuff
EOT
#

Cheers,
-Jan
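The compile-time point above can be checked without touching the encoding pragma at all. The following is only a sketch: My::Recorder is a hypothetical stand-in package (not a real module) whose import() records whatever argument "use" hands it, and $late and $early are illustrative names.

```perl
use strict;
use warnings;

our ( $late, $early, @seen );

# Hypothetical stand-in for a pragma: import() only records its argument.
BEGIN {
    package My::Recorder;
    sub import {
        my ( $class, $arg ) = @_;
        push @main::seen, defined $arg ? $arg : 'undef';
    }
    # Mark the package as already loaded so "use" does not search @INC
    $INC{'My/Recorder.pm'} = __FILE__;
}

$late = 'MacRoman';             # ordinary assignment, happens at run time
use My::Recorder $late;         # runs at compile time, before that assignment

BEGIN { $early = 'MacRoman'; }  # assignment forced to compile time
use My::Recorder $early;        # the value is already there

print "without BEGIN: $seen[0]\n";
print "with BEGIN:    $seen[1]\n";
```

The first "use" receives an undefined value because the assignment to $late has not run yet when the statement is compiled; the second receives 'MacRoman' because the BEGIN block has already executed.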
encoding...
Question 1.

In this script I would like for convenience' sake to use variables in
the second line, but I don't seem to be able to do so. Am I missing
something or is it simply not possible?


$source = 'MacRoman'; # I want to use this in the next line
use encoding qw( MacRoman ), STDOUT => qw( utf-8 ) ;
$text = "café" ;
print $text ;


Question 2

Is there a way, without using q(), to single-quote a block of text as
one can double-quote it this way:

$text = <<EOT;
$ome$tuff
$ome$tuff
$ome$tuff
EOT
#

I want to be able to quote a block of JIS-encoded stuff (which
contains lots of $)
Re: 5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}
> I just happened to notice that the perlre man page describes the
> POSIX "[:punct:]" character class as being equivalent to the unicode
> "\p{IsPunct}" character class.
>
> I haven't tried to track down the respective standards documents for
> POSIX and Unicode to see whether these classes are _supposed_ to be
> equivalent over the printable ASCII character set, but when I test them

AFAIK there are currently no existing standards defining those
equivalences. There has been some discussion about that in Unicode
consortium mailing lists, but in fact there are some doubts about the
wisdom of stating anything about such equivalences (because the C
standards where the :foo: classes originate have frankly no clue about
the more complex property structure of Unicode). The closest upcoming
standard is the proposed update to the TR18:
http://www.unicode.org/reports/tr18/tr18-8.html, see Annex C.

If you say :punct: on non-Unicode data, you are doing an
_operating_ _system_ _dependent_ AND _locale_ _dependent_ operation.
:punct: and \p{Punct} are (supposed to be) equivalent with Unicode data.

> in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
> demonstrate:

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen
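To try the "equivalent with Unicode data" case for yourself, here is a small sketch. The choice of U+2014 EM DASH as the test character is an assumption made for illustration: it is punctuation well outside ASCII, so Perl has to treat the string as Unicode data. What gets printed depends on the Perl version in use.

```perl
use strict;
use warnings;

# U+2014 EM DASH, chosen only as an example of non-ASCII punctuation
my $dash = "\x{2014}";

my $posix_hit   = ( $dash =~ /[[:punct:]]/ ) ? 1 : 0;
my $unicode_hit = ( $dash =~ /\p{IsPunct}/ ) ? 1 : 0;

printf "U+%04X  [:punct:]=%d  \\p{IsPunct}=%d\n",
    ord($dash), $posix_hit, $unicode_hit;
```

If the two classes agree on Unicode data, both columns show the same value.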
5.8.1 perlre man page: [:punct:] vs. \p{IsPunct}
I just happened to notice that the perlre man page describes the
POSIX "[:punct:]" character class as being equivalent to the unicode
"\p{IsPunct}" character class.

I haven't tried to track down the respective standards documents for
POSIX and Unicode to see whether these classes are _supposed_ to be
equivalent over the printable ASCII character set, but when I test them
in Perl 5.8.1, they are _not_ equivalent, as the following snippet will
demonstrate:

for $x ( 0x20 .. 0x7e ) {
    $_ = chr( $x );
    $res = ( /[[:punct:]]/ ) ? "matches :punct:" : "is not a :punct:";
    $res .= ( /\p{IsPunct}/ ) ? " matches {IsPunct}" : " fails on {IsPunct}";
    printf( " 0x%x (%3d.) %s %s\n", $x, $x, $_, $res ) if ( $res =~ /matches/ );
}

The differences involve these nine characters:

$ + < = > ^ ` | ~

Except for the back-tick (`), I wouldn't be surprised if POSIX and
Unicode are supposed to differ on these points, so maybe it's just a
matter of fixing the perlre man page. (I'm not sure yet what the
behavior of [:punct:] is supposed to be on non-ASCII punctuation
characters in Unicode -- maybe the man page should clarify this too.)

Dave Graff
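The snippet above prints every character that matches either class, so the disagreements have to be picked out by eye. A variant that reports only the characters on which the two classes disagree might look like the sketch below; the names @differ, $posix and $uni are illustrative, and the result depends on the Perl version (the report above is for 5.8.1).

```perl
use strict;
use warnings;

my @differ;
for my $code ( 0x20 .. 0x7e ) {
    my $char  = chr $code;
    my $posix = ( $char =~ /[[:punct:]]/ ) ? 1 : 0;
    my $uni   = ( $char =~ /\p{IsPunct}/ ) ? 1 : 0;

    # Skip characters on which both classes agree
    next if $posix == $uni;

    push @differ, $char;
    printf "0x%02x %s  [:punct:]=%d  \\p{IsPunct}=%d\n",
        $code, $char, $posix, $uni;
}
print scalar(@differ), " characters differ\n";
```

Keeping the two results as separate 0/1 flags also shows the direction of each mismatch, which the combined $res string in the original snippet makes harder to see.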