On 9 Jan 2014 22:25, "Kristján Valur Jónsson" <krist...@ccpgames.com> wrote:
>
> > -----Original Message-----
> > From: Victor Stinner [mailto:victor.stin...@gmail.com]
> > Sent: 9. janúar 2014 13:51
> > To: Kristján Valur Jónsson
> > Cc: Antoine Pitrou; python-dev@python.org
> > Subject: Re: [Python-Dev] Python3 "complexity"
> >
> > 2014/1/9 Kristján Valur Jónsson <krist...@ccpgames.com>:
> > > This definition is funny, because according to Wikipedia, it is a
> > > "superset" of 8859-1 (latin1)
> >
> > Bytes 0x80..0x9f are unassigned in ISO/IEC 8859-1... but are assigned in
> > (IANA's) ISO-8859-1.
> >
> > Python implements the latter, ISO-8859-1.
> >
> > Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from
> > the IANA's ISO-8859-1".
> >
>
> Thanks. That's entirely non-confusing :)
> "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429."
>
> So anyway, yes, Python's "latin1" encoding does cover the entire 256 range. But on Windows we use cp1252 instead, which does not,
> but instead defines useful and common Windows characters in many of the control character slots.
> Hence the need for "surrogateescape" to be able to roundtrip characters.
>
> Again, this is non-obvious, and knowing from my experience with cp1252, I had no way of guessing that the "subset", i.e. latin1, would indeed cover all the range. Two things then I have learned since my initial foray into parsing ASCII files with python3: Surrogateescapes and "latin1 in python == IANA's ISO-8859-1 which does indeed define the whole 8 bit range".
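For anyone who wants to check the two points above, here is a minimal sketch, assuming CPython's bundled "latin-1" and "cp1252" codecs. The helper names undecodable_bytes and roundtrip are made up for illustration and are not part of any library. It lists which byte values each codec rejects in strict mode, then shows that "surrogateescape" lets all 256 byte values survive a decode/encode cycle through cp1252.

```python
def undecodable_bytes(encoding):
    """Return the byte values that `encoding` rejects when decoding strictly."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def roundtrip(data, encoding):
    """Decode then re-encode, carrying undecodable bytes as lone surrogates."""
    text = data.decode(encoding, errors="surrogateescape")
    return text.encode(encoding, errors="surrogateescape")


every_byte = bytes(range(256))

# latin-1 maps each byte straight to the code point with the same number,
# so nothing is rejected; cp1252 leaves a handful of slots undefined.
latin1_gaps = undecodable_bytes("latin-1")
cp1252_gaps = undecodable_bytes("cp1252")
print("latin-1 gaps:", [hex(b) for b in latin1_gaps])
print("cp1252 gaps: ", [hex(b) for b in cp1252_gaps])

# Byte 0x80 is a C1 control character in latin-1 but the euro sign in cp1252.
c1_control = b"\x80".decode("latin-1")
euro_sign = b"\x80".decode("cp1252")
print(repr(c1_control), repr(euro_sign))

# With surrogateescape, the undefined cp1252 slots still round-trip.
cp1252_survives = roundtrip(every_byte, "cp1252") == every_byte
print("cp1252 round-trips with surrogateescape:", cp1252_survives)
```

Decoding as latin-1 never fails, but it may silently give the wrong characters for cp1252 data in the 0x80..0x9f range, so it is mainly useful when only the ASCII part of the file matters.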
http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html is currently linked from the Unicode HOWTO. However, I'd be happy to offer it for direct inclusion to help make it more discoverable.

Cheers,
Nick.

>
> K
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com