Grant McLean <gr...@mclean.net.nz> writes:

> OK, so I went ahead and implemented both the warning and the heuristic
> to guess Latin-1 vs UTF-8 (only when no encoding was specified). The
> resulting patch is here:
>
> https://github.com/theory/pod-simple/pull/26

This patch forces authors to add an "=encoding UTF-8" line to declare
that the document is, indeed, UTF-8 encoded.

Wouldn't it be far better to treat all POD documents as UTF-8 encoded
Unicode and fall back to Latin-1 if invalid UTF-8 sequences are
detected? In other words, do not require the author to add
"=encoding UTF-8", since that would be the default, and only require
"=encoding ISO8859-1" for Latin-1 encoded documents?

Since most POD documents are currently ASCII, they won't be affected.
POD docs that are Latin-1 or something similar must get an explicit
encoding line added. These are precisely the documents affected by
your patch.

-- Johan
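
For illustration only, the "UTF-8 by default, Latin-1 as fallback" rule suggested above could be sketched roughly as follows. This is Python rather than Perl, the function name decode_pod and its parameters are hypothetical, and it is not the code from the pull request. It assumes the raw bytes of the whole document are available and that any "=encoding" value has already been extracted by the caller.

```python
from typing import Optional, Tuple


def decode_pod(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw POD bytes and report which encoding was used.

    Hypothetical helper: `declared` stands for the value of an
    "=encoding" line, or None when the document has no such line.
    """
    if declared is not None:
        # An explicit declaration always wins; no guessing is done.
        return raw.decode(declared), declared
    try:
        # Default assumption: UTF-8. Pure ASCII is valid UTF-8,
        # so ASCII-only documents take this branch unchanged.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Invalid UTF-8 sequence found: fall back to Latin-1,
        # which maps every byte value and therefore cannot fail.
        return raw.decode("latin-1"), "latin-1"
```

Note that a guess of this kind can still be wrong in one direction: some Latin-1 byte sequences happen to be valid UTF-8, so a Latin-1 document without an explicit encoding line may occasionally be read as UTF-8.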