Regular maintenance work. :-)

Thanks,
/Autrijus/

--- encoding.pm.orig Sat Mar 13 21:46:21 2004
+++ encoding.pm Sat Mar 13 22:02:58 2004
@@ -158,11 +158,11 @@
 =item *
 
 Changing PerlIO layers of C<STDIN> and C<STDOUT> to the encoding
- specified.
+specified.
 
 =back
 
-=head2 Literal Conversions
+=head2 Literal conversions
 
 You can write code in EUC-JP as follows:
 
@@ -246,9 +246,9 @@
 
 Sets the script encoding to I<ENCNAME>. And unless ${^UNICODE}
 exists and non-zero, PerlIO layers of STDIN and STDOUT are set to
-":encoding(I<ENCNAME>)".
+C<:encoding(I<ENCNAME>)>.
 
-Note that STDERR WILL NOT be changed.
+Note that STDERR will I<not> be changed.
 
 Also note that non-STD file handles remain unaffected. Use C<use
 open> or C<binmode> to change layers of those.
@@ -279,7 +279,7 @@
 =item no encoding;
 
 Unsets the script encoding. The layers of STDIN, STDOUT are
-reset to ":raw" (the default unprocessed raw stream of bytes).
+reset to C<:raw> (the default unprocessed raw stream of bytes).
 
 =back
 
@@ -291,7 +291,7 @@
 in UTF-8 -- or use a source filter. That's what 'Filter=>1' does.
 
 What does this mean? Your source code behaves as if it is written in
-UTF-8 with 'use utf8' in effect. So even if your editor only supports
+UTF-8 with C<use utf8> in effect. So even if your editor only supports
 Shift_JIS, for example, you can still try examples in Chapter 15 of
 C<Programming Perl, 3rd Ed.>. For instance, you can use UTF-8
 identifiers.
@@ -327,12 +327,12 @@
 B<use encoding> can appear as many times as you want in a given script.
 The multiple use of this pragma is discouraged.
 
-By the same reason, the use this pragma inside modules is also
-discouraged (though not as strongly discouranged as the case above.
-See below).
+By the same reason, the use of this pragma inside modules is also
+discouraged, although not as strongly discouraged as the case above
+(see below).
 
 If you still have to write a module with this pragma, be very careful
-of the load order. See the codes below;
+of the load order. A common mistake is shown below:
 
   # called module
   package Module_IN_BAR;
@@ -345,16 +345,16 @@
   use Module_IN_BAR;
   # surprise! use encoding "bar" is in effect.
 
-The best way to avoid this oddity is to use this pragma RIGHT AFTER
-other modules are loaded. i.e.
+The best way to avoid this oddity is to use this pragma I<right after>
+other modules are loaded, like this:
 
   use Module_IN_BAR;
   use encoding "foo";
 
 =head2 DO NOT MIX MULTIPLE ENCODINGS
 
-Notice that only literals (string or regular expression) having only
-legacy code points are affected: if you mix data like this
+This pragma only affects literals (string or regular expression) composed
+solely of legacy code points. If you mix data like this:
 
   \xDF\x{100}
 
@@ -363,39 +363,39 @@
 
   "\xDF" =~ /\x{3af}/
 
-but this will not
+but this will not:
 
   "\xDF\x{100}" =~ /\x{3af}\x{100}/
 
-since the C<\xDF> (ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on
-the left will B<not> be upgraded to C<\x{3af}> (Unicode GREEK SMALL
-LETTER IOTA WITH TONOS) because of the C<\x{100}> on the left. You
-should not be mixing your legacy data and Unicode in the same string.
+Because of the C<\x{100}> on the right side, C<\xDF> (ISO 8859-7 GREEK
+SMALL LETTER IOTA WITH TONOS) on the left will B<not> be upgraded to
+C<\x{3af}> (Unicode GREEK SMALL LETTER IOTA WITH TONOS). You should
+not be mixing your legacy data and Unicode in the same string.
 
-This pragma also affects encoding of the 0x80..0xFF code point range:
-normally characters in that range are left as eight-bit bytes (unless
+This pragma also affects encoding of the C<0x80>..C<0xFF> code point range.
+Characters in that range are normally left as eight-bit bytes (unless
 they are combined with characters with code points 0x100 or larger,
-in which case all characters need to become UTF-8 encoded), but if
-the C<encoding> pragma is present, even the 0x80..0xFF range always
-gets UTF-8 encoded.
+in which case all characters will be upgraded to Unicode), but if
+the C<encoding> pragma is present, code points in the C<0x80>..C<0xFF>
+range will always be decoded into Unicode strings, with the specified encoding.
 
 After all, the best thing about this pragma is that you don't have to
-resort to \x{....} just to spell your name in a native encoding.
+resort to C<\x{....}> just to spell your name in a native encoding.
 So feel free to put your strings in your encoding in quotes and
 regexes.
 
 =head2 tr/// with ranges
 
 The B<encoding> pragma works by decoding string literals in
-C<q//,qq//,qr//,qw///, qx//> and so forth. In perl 5.8.0, this
-does not apply to C<tr///>. Therefore,
+C<q//>, C<qq//>, C<qr//>, C<qw//>, C<qx//> and so forth. In perl
+5.8.0, this did not apply to C<tr///>. Therefore,
 
   use encoding 'euc-jp';
   #....
   $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
   #           -------- -------- -------- --------
 
-Does not work as
+did not work as
 
   $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
 
@@ -414,7 +414,7 @@
 
 This counterintuitive behavior has been fixed in perl 5.8.1.
 
-=head3 workaround to tr///;
+=head3 Workaround to tr///;
 
 In perl 5.8.0, you can work around as follows;
 
@@ -469,7 +469,7 @@
 
 =over
 
-=item literals in regex that are longer than 127 bytes
+=item Literals in regex that are longer than 127 bytes
 
 For native multibyte encodings (either fixed or variable length),
 the current implementation of the regular expressions may introduce
@@ -481,7 +481,7 @@
 (Porters who are willing and able to remove this limitation are
 welcome.)
 
-=item format
+=item Format
 
 This pragma doesn't work well with format because PerlIO does not
 get along very well with it. When format contains non-ascii