On Friday, 08 January 2010 at 10:10:23, Martin v. Löwis wrote:
> > Builtin open() function is unable to open an UTF-16/32 file starting with
> > a BOM if the encoding is not specified (raise an unicode error). For an
> > UTF-8 file starting with a BOM, read()/readline() returns also the BOM
> > whereas the BOM should be "ignored".
>
> It depends. If you use the utf-8-sig encoding, it *will* ignore the
> UTF-8 signature.

Sure, but it means that you only use UTF-8+BOM files. If you get both
UTF-8 and UTF-8+BOM files, you have to detect the encoding (not an easy
job) or remove the BOM after the first read (much harder if you use a
module like ConfigParser or csv).

> > Since my proposition changes the result TextIOWrapper.read()/readline()
> > for files starting with a BOM, we might introduce an option to open() to
> > enable the new behaviour. But is it really needed to keep the backward
> > compatibility?
>
> Absolutely. And there is no need to produce a new option, but instead
> use the existing options: define an encoding that auto-detects the
> encoding from the family of BOMs. Maybe you call it encoding="sniff".

Good idea, I chose open(filename, encoding="BOM").

--
Victor Stinner
http://www.haypocalc.com/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
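
For illustration only, here is a minimal sketch of the kind of BOM sniffing discussed above, written as a standalone helper rather than as the proposed encoding="BOM" option. The names sniff_bom and open_sniff_bom are hypothetical and not part of Python, and falling back to UTF-8 when no BOM is found is an assumption of this sketch, not something decided in this thread.

```python
import codecs

# Known BOMs and a codec that consumes each of them when decoding.
# The UTF-32 entries come first because BOM_UTF32_LE starts with the
# same two bytes as BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(filename, default="utf-8"):
    """Return an encoding name guessed from the BOM (hypothetical helper).

    If the file does not start with a known BOM, return `default`
    (an assumption of this sketch).
    """
    with open(filename, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default


def open_sniff_bom(filename, default="utf-8"):
    """Open a text file for reading, skipping a leading BOM if present."""
    return open(filename, encoding=sniff_bom(filename, default))
```

With such a helper, read() and readline() return the text without the BOM for UTF-8, UTF-16 and UTF-32 files, and files without a BOM are decoded with the default encoding.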