Colin Watson <[EMAIL PROTECTED]> writes:

> For me, this fixed the case where a 0xA0 byte is embedded essentially
> accidentally in the middle of a UTF-8 stream (as happened with debconf's
> Russian translations), but it broke the case where 0xA0 is actually
> being used as a non-breaking space. Note that I'm using the new 'pod2man
> --utf8' option, although presumably so is Gerfried since perldoc now
> uses that option automatically.
>
> I've attached debconf.fr.1.pod, which reproduces this problem. Run
> 'pod2man -c Debconf -r '' --utf8 --section=1 debconf.fr.1.pod', and look
> carefully at the line matching "purge". It looks like this:
>
>   soient bons et pour que les commandes «?purge?» et «?unregister?» soient
>
> The two characters marked as "?" here are the byte 0xA0. The characters
> around it are encoded in UTF-8. 0xA0 doesn't decode as UTF-8 so man
> assumes that this page must be ISO-8859-1, which means the whole page
> comes out misencoded.
>
> Is this because Pod::Man hasn't been told about the encoding of the
> input data, perhaps? The input files pretty much have to be in UTF-8 if
> you're using --utf8, so do we have to tell perl that with binmode?

Hi Colin,

You got it exactly right.  Basically, podlators has been papering over
this bug incorrectly, but in a way that happens to do the right thing with
a common POD problem.

Most POD authors from the pre-Unicode days of Perl don't realize this, but
if you use Unicode characters in POD, you have to declare the input
encoding in the POD in order for the results to be reliable and
consistent.  This is actually mentioned in perlpod, but if you're like
me, you haven't read that recently.  :)  I just discovered this myself.

       "=encoding encodingname"
           This command is used for declaring the encoding of a document.
           Most users won’t need this; but if your encoding isn’t US-ASCII or
           Latin-1, then put a "=encoding encodingname" command early in the
           document so that pod formatters will know how to decode the
           document.  For encodingname, use a name recognized by the
           Encode::Supported module.

So if you're using UTF-8, starting the POD with:

    =encoding UTF-8

is required.  If you add that, the current version of Pod::Man (and
previous versions, as it turns out, mostly by chance) will do the right
thing.
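
To make the byte-level problem concrete, here is a minimal Python sketch.
It is not part of podlators, and the helper names is_valid_utf8 and
declared_encoding are hypothetical, made up for illustration.  It shows
that a non-breaking space properly encoded in UTF-8 is the two bytes
0xC2 0xA0, while a bare 0xA0 byte makes the stream fail to decode as
UTF-8, and it does a simplified line-by-line check for an =encoding
command (a real POD parser works on paragraphs, so treat this as an
approximation).

```python
# Illustrative only: these helpers are hypothetical, not podlators code.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def declared_encoding(pod: bytes):
    """Return the name from the first '=encoding' line, or None.

    Simplification: scans line by line instead of parsing POD paragraphs.
    """
    for line in pod.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == b"=encoding":
            return parts[1].decode("ascii", "replace")
    return None


nbsp = "\u00a0"
# Properly encoded: each non-breaking space becomes 0xC2 0xA0.
good = f"«{nbsp}purge{nbsp}»".encode("utf-8")
# Mixed: UTF-8 guillemets around raw 0xA0 bytes, as in the generated page.
bad = "«".encode("utf-8") + b"\xa0purge\xa0" + "»".encode("utf-8")

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
print(declared_encoding(b"=encoding UTF-8\n\n=head1 NAME\n"))
```

A check like this could be run over POD sources before calling pod2man
--utf8, to catch files that contain non-ASCII text but never declare
their encoding.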

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>


