[PATCH] encoding.pm POD cleanup

Autrijus Tang Sat, 13 Mar 2004 06:06:05 -0800

Regular maintenance work. :-)

Thanks,
/Autrijus/


--- encoding.pm.orig    Sat Mar 13 21:46:21 2004
+++ encoding.pm Sat Mar 13 22:02:58 2004
@@ -158,11 +158,11 @@
 =item *
 
 Changing PerlIO layers of C<STDIN> and C<STDOUT> to the encoding
- specified.
+specified.
 
 =back
 
-=head2 Literal Conversions
+=head2 Literal conversions
 
 You can write code in EUC-JP as follows:
 
@@ -246,9 +246,9 @@
 
 Sets the script encoding to I<ENCNAME>.  And unless ${^UNICODE} 
 exists and non-zero, PerlIO layers of STDIN and STDOUT are set to
-":encoding(I<ENCNAME>)".
+C<:encoding(I<ENCNAME>)>.
 
-Note that STDERR WILL NOT be changed.
+Note that STDERR will I<not> be changed.
 
 Also note that non-STD file handles remain unaffected.  Use C<use
 open> or C<binmode> to change layers of those.
@@ -279,7 +279,7 @@
 =item no encoding;
 
 Unsets the script encoding. The layers of STDIN, STDOUT are
-reset to ":raw" (the default unprocessed raw stream of bytes).
+reset to C<:raw> (the default unprocessed raw stream of bytes).
 
 =back
 
@@ -291,7 +291,7 @@
 in UTF-8 -- or use a source filter.  That's what 'Filter=>1' does.
 
 What does this mean?  Your source code behaves as if it is written in
-UTF-8 with 'use utf8' in effect.  So even if your editor only supports
+UTF-8 with C<use utf8> in effect.  So even if your editor only supports
 Shift_JIS, for example, you can still try examples in Chapter 15 of
 C<Programming Perl, 3rd Ed.>.  For instance, you can use UTF-8
 identifiers.
@@ -327,12 +327,12 @@
 B<use encoding> can appear as many times as you want in a given script. 
 The multiple use of this pragma is discouraged.
 
-By the same reason, the use this pragma inside modules is also
-discouraged (though not as strongly discouranged as the case above.  
-See below).
+For the same reason, the use of this pragma inside modules is also
+discouraged, although not as strongly as in the case above
+(see below).
 
 If you still have to write a module with this pragma, be very careful
-of the load order.  See the codes below;
+of the load order.  A common mistake is shown below:
 
   # called module
   package Module_IN_BAR;
@@ -345,16 +345,16 @@
   use Module_IN_BAR;
   # surprise! use encoding "bar" is in effect.
 
-The best way to avoid this oddity is to use this pragma RIGHT AFTER
-other modules are loaded.  i.e.
+The best way to avoid this oddity is to use this pragma I<right after>
+other modules are loaded, like this:
 
   use Module_IN_BAR;
   use encoding "foo";
 
 =head2 DO NOT MIX MULTIPLE ENCODINGS
 
-Notice that only literals (string or regular expression) having only
-legacy code points are affected: if you mix data like this
+This pragma only affects literals (string or regular expression) composed
+solely of legacy code points.  If you mix data like this:
 
        \xDF\x{100}
 
@@ -363,39 +363,39 @@
 
        "\xDF" =~ /\x{3af}/
 
-but this will not
+but this will not:
 
        "\xDF\x{100}" =~ /\x{3af}\x{100}/
 
-since the C<\xDF> (ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on
-the left will B<not> be upgraded to C<\x{3af}> (Unicode GREEK SMALL
-LETTER IOTA WITH TONOS) because of the C<\x{100}> on the left.  You
-should not be mixing your legacy data and Unicode in the same string.
+Because of the C<\x{100}> in the same string, C<\xDF> (ISO 8859-7 GREEK
+SMALL LETTER IOTA WITH TONOS) on the left will B<not> be upgraded to
+C<\x{3af}> (Unicode GREEK SMALL LETTER IOTA WITH TONOS).  You should
+not be mixing your legacy data and Unicode in the same string.
 
-This pragma also affects encoding of the 0x80..0xFF code point range:
-normally characters in that range are left as eight-bit bytes (unless
+This pragma also affects encoding of the C<0x80>..C<0xFF> code point range.
+Characters in that range are normally left as eight-bit bytes (unless
 they are combined with characters with code points 0x100 or larger,
-in which case all characters need to become UTF-8 encoded), but if
-the C<encoding> pragma is present, even the 0x80..0xFF range always
-gets UTF-8 encoded.
+in which case all characters will be upgraded to Unicode), but if
+the C<encoding> pragma is present, code points in the C<0x80>..C<0xFF> range
+will always be decoded into Unicode strings using the specified encoding.
 
 After all, the best thing about this pragma is that you don't have to
-resort to \x{....} just to spell your name in a native encoding.
+resort to C<\x{....}> just to spell your name in a native encoding.
 So feel free to put your strings in your encoding in quotes and
 regexes.
 
 =head2 tr/// with ranges
 
 The B<encoding> pragma works by decoding string literals in
-C<q//,qq//,qr//,qw///, qx//> and so forth.  In perl 5.8.0, this
-does not apply to C<tr///>.  Therefore,
+C<q//>, C<qq//>, C<qr//>, C<qw//>, C<qx//> and so forth.  In perl
+5.8.0, this did not apply to C<tr///>.  Therefore,
 
   use encoding 'euc-jp';
   #....
   $kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
   #           -------- -------- -------- --------
 
-Does not work as
+did not work as
 
   $kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
 
@@ -414,7 +414,7 @@
 
 This counterintuitive behavior has been fixed in perl 5.8.1.
 
-=head3 workaround to tr///;
+=head3 Workaround to tr///;
 
 In perl 5.8.0, you can work around as follows;
 
@@ -469,7 +469,7 @@
 
 =over
 
-=item literals in regex that are longer than 127 bytes
+=item Literals in regex that are longer than 127 bytes
 
 For native multibyte encodings (either fixed or variable length),
 the current implementation of the regular expressions may introduce
@@ -481,7 +481,7 @@
 (Porters who are willing and able to remove this limitation are
 welcome.)
 
-=item format
+=item Format
 
 This pragma doesn't work well with format because PerlIO does not
 get along very well with it.  When format contains non-ascii

pgp00000.pgp
Description: PGP signature
