On 01/10/2015 11:35 PM, David E. Wheeler wrote:
On Jan 10, 2015, at 5:48 PM, Sean Burke <sbu...@cpan.org> wrote:

Helleu, Pod pals!
Short version about "Re: Assume CP1252"-- I advise: yes, assume CP1252 where 
technically you were expecting Latin-1.

Thanks for chiming in, Sean.

I agree completely, go for it!

Yes:
* assume that input is CP1252 in the absence of any encoding being declared
* assume that input is CP1252 if the declared encoding is Latin-1
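
A minimal sketch of those two rules, written in Python purely for illustration. The helper name `effective_encoding` and the alias set are hypothetical and are not taken from any Pod parser; only the standard `cp1252` codec is assumed.

```python
# Hypothetical helper illustrating the two rules above; not from any Pod parser.
LATIN1_ALIASES = {"latin-1", "latin1", "iso-8859-1", "iso8859-1"}

def effective_encoding(declared=None):
    """Return the codec to decode with, given an optional =encoding value."""
    if declared is None:
        # Rule 1: no encoding declared, so assume CP1252.
        return "cp1252"
    name = declared.strip()
    if name.lower() in LATIN1_ALIASES:
        # Rule 2: a Latin-1 declaration is construed as CP1252.
        return "cp1252"
    # Any other declaration is used as given.
    return name

# Bytes 0x93 and 0x94 are curly quotes in CP1252 but C1 controls in Latin-1.
text = b"caf\xe9 \x93quoted\x94".decode(effective_encoding("latin1"))
```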

As far as I know, that amicable bait-and-switch (i.e., construing Latin-1 to 
actually mean the superset CP1252) means in practice that everybody wins, and 
nobody loses, and DWIM prevails yet again.

Right, I vaguely remember you telling me this before. I forgot about #2 (and 
the HTML 5 precedent).

I think I oppose overruling someone's =encoding line. The reason that 1252 is effectively a superset of Latin-1 is that it reuses the C1 control positions to mean something else, and we don't expect those controls to actually appear in a pod document. That expectation is probably sound, with one exception: NEL, U+0085, which is the usual line separator on some platforms, notably os390 (1252 maps that position to the horizontal ellipsis).
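
To make the NEL case concrete, here is a small Python illustration (Python chosen only for brevity) of how the single byte 0x85 comes out under each encoding. It shows the code-point difference only and does not model os390's native byte encoding.

```python
nel_byte = b"\x85"

as_latin1 = nel_byte.decode("latin-1")  # U+0085, the C1 control NEL
as_cp1252 = nel_byte.decode("cp1252")   # U+2026, HORIZONTAL ELLIPSIS

# Outside the C1 range (0x80-0x9F) the two codecs agree byte for byte.
same_elsewhere = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)

# NEL counts as a line boundary for str.splitlines(); the ellipsis does not.
lines_latin1 = b"first\x85second".decode("latin-1").splitlines()
lines_cp1252 = b"first\x85second".decode("cp1252").splitlines()
```

Decoded as Latin-1, the input splits into two lines; decoded as 1252, it stays one line with an ellipsis in the middle.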

It strikes me as wrong anyway to say we know better than the coder. There needs to be a way for a coder to specify the encoding and not have that specification ignored by us. We do not have the foresight to know every circumstance in which Latin-1 is the correct value and 1252 is not. We could be wrong, and we should provide an easy workaround for our wrongness. The most straightforward one, and the one that will cause the least resentment when we are wrong, is simply not to second-guess what the coder has said.

os390 is proof that there is at least one platform that Perl runs on where 1252 is not a superset of Latin-1. That platform could be special-cased, but if we're wrong there, we could be wrong elsewhere. It just seems a bad idea to think we know better than the coder.
