Colin Watson <[EMAIL PROTECTED]> writes:

> For me, this fixed the case where a 0xA0 byte is embedded essentially
> accidentally in the middle of a UTF-8 stream (as happened with debconf's
> Russian translations), but it broke the case where 0xA0 is actually
> being used as a non-breaking space. Note that I'm using the new 'pod2man
> --utf8' option, although presumably so is Gerfried since perldoc now
> uses that option automatically.
>
> I've attached debconf.fr.1.pod, which reproduces this problem. Run
> 'pod2man -c Debconf -r '' --utf8 --section=1 debconf.fr.1.pod', and look
> carefully at the line matching "purge". It looks like this:
>
>   soient bons et pour que les commandes «?purge?» et «?unregister?» soient
>
> The two characters marked as "?" here are the byte 0xA0. The characters
> around it are encoded in UTF-8. 0xA0 doesn't decode as UTF-8 so man
> assumes that this page must be ISO-8859-1, which means the whole page
> comes out misencoded.
>
> Is this because Pod::Man hasn't been told about the encoding of the
> input data, perhaps? The input files pretty much have to be in UTF-8 if
> you're using --utf8, so do we have to tell perl that with binmode?

Hi Colin,

You got it exactly right. Basically, podlators has been papering over
this bug incorrectly, but in a way that happens to do the right thing
with a common POD problem.

Most POD authors from the pre-Unicode days of Perl don't realize this,
but if you use Unicode characters in POD, you have to declare the input
encoding in the POD in order for the results to be reliable and
consistent. This is actually mentioned in perlpod, but if you were like
me, you haven't read that recently. :) I just discovered this myself.

    "=encoding encodingname"
        This command is used for declaring the encoding of a document.
        Most users won’t need this; but if your encoding isn’t US-ASCII
        or Latin-1, then put a "=encoding encodingname" command early in
        the document so that pod formatters will know how to decode the
        document. For encodingname, use a name recognized by the
        Encode::Supported module.

So if you're using UTF-8, starting the POD with:

    =encoding UTF-8

is required. If you add that, the current version of Pod::Man (and
previous versions, as it turns out, mostly by chance) will do the right
thing.

-- 
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>
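
For anyone who wants to see the failure mode from the quoted report in
isolation, here is a minimal Python sketch. It assumes man's behavior is
roughly "try UTF-8, otherwise treat the whole page as ISO-8859-1", as
described above; decode_like_man is a hypothetical helper written for
this illustration, not man's actual code, and the sample bytes are made
up to mirror the "purge" line.

```python
def decode_like_man(data):
    """Simplified model (an assumption): try UTF-8, else fall back to Latin-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A single undecodable byte makes the whole input be read as ISO-8859-1.
        return data.decode("iso-8859-1"), "iso-8859-1"


# Guillemets (U+00AB, U+00BB) encoded as UTF-8, but each no-break space
# left as a raw 0xA0 byte, as in the generated page from the report.
mixed = (
    "\u00ab".encode("utf-8") + b"\xa0" + b"purge" + b"\xa0" + "\u00bb".encode("utf-8")
)

# The same text with U+00A0 encoded as UTF-8 (the two bytes 0xC2 0xA0).
clean = "\u00ab\u00a0purge\u00a0\u00bb".encode("utf-8")

mixed_text, mixed_encoding = decode_like_man(mixed)
clean_text, clean_encoding = decode_like_man(clean)

# ascii() keeps the output readable regardless of the terminal's encoding.
print(mixed_encoding, ascii(mixed_text))
print(clean_encoding, ascii(clean_text))
```

In the first case the lone 0xA0 byte forces the Latin-1 fallback, so each
guillemet picks up a stray U+00C2 in front of it. In the second case the
no-break space is encoded the same way as the characters around it and
the text decodes cleanly as UTF-8.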