Antoine Pitrou wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>>
>> It's rather common to exchange text files between users... and
>> in form of XML and CSV files, these also tend to get used as
>> input for programs every now and then
>
> For XML files, encoding should be taken care of by the XML layer, not the IO
> layer. That is, they will be read and written as binary files.
>
> For CSV files, the situation is in my experience hopeless. Even on an UTF-8
> system some software will interpret them as latin-1 by default. The "solution"
> I've come with is to first decode them as UTF-8 and fall back on latin-1 if it
> fails.
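The UTF-8-then-latin-1 fallback described in the quote can be sketched roughly as follows. This is a minimal illustration assuming Python 3.9+; `decode_with_fallback` and `parse_csv_bytes` are hypothetical helper names, not an existing API. Because latin-1 maps every byte 0x00-0xFF to a code point, the fallback never raises, but it can silently produce mojibake for data in other encodings (including the UTF-16-LE exports mentioned below, which need an explicit encoding).

```python
import csv
import io


def decode_with_fallback(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to latin-1.

    latin-1 decodes any byte sequence, so the fallback cannot fail,
    though the result may be wrong if the data was in another encoding.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def parse_csv_bytes(data: bytes) -> list[list[str]]:
    """Hypothetical helper: decode raw CSV bytes, then parse them.

    Working on bytes means the locale's default text encoding is never
    consulted, so the result does not depend on the user's settings.
    """
    text = decode_with_fallback(data)
    # newline="" leaves line endings untouched so the csv module can
    # handle \r\n and newlines embedded in quoted fields itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

A caller would read the file with `open(path, "rb")` and pass the bytes to `parse_csv_bytes`, keeping the text-mode default encoding out of the picture entirely.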
... and then there are the UTF-16-LE CSV files that Excel exports.

In any case, the encodings of these files have nothing to do with the
user's locale setting, which is why guessing based on this setting is
less than ideal.

Now, we cannot easily remove this guessing, since we're in stable mode
again with 3.1. Perhaps we should at least add a way to switch off this
guessing, so that applications can be tested in a predictable way rather
than depending on the test runner's locale settings?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2010)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev