Martin v. Löwis wrote:
> Stephen J. Turnbull wrote:
>
>> So there is a standard for the UTF-8 signature, and I know of
>> applications which produce it. While I agree with you that Python's
>> codecs shouldn't produce it (by default), providing an option to strip
>> is a good idea.
>
> I would personally like to see an "utf-8-bom" codec (perhaps better
> named "utf-8-sig"), which strips the BOM on reading (if present)
> and generates it on writing.

+1.

>> However, this option should be part of the initialization of an IO
>> stream which produces Unicodes, _not_ an operation on arbitrary
>> internal strings (whether raw or Unicode).
>
> With the UTF-8-SIG codec, it would apply to all operation modes of
> the codec, whether stream-based or from strings. Whether or not to
> use the codec would be the application's choice.

I'd suggest using the same mode of operation as we have in the UTF-16
codec: it removes the BOM on the first call to the StreamReader
.decode() method and writes a BOM on the first call to .encode() on a
StreamWriter.

Note that the UTF-16 codec is strict with respect to the presence of
the BOM: you get a UnicodeError if a stream does not start with a BOM.
For the UTF-8-SIG codec, this should probably be relaxed so that the
BOM is not required.

-- 
Marc-Andre Lemburg
eGenix.com
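
To make the proposed behaviour concrete, here is a minimal sketch. It assumes a later Python 3 interpreter, where a codec named "utf-8-sig" is available in the standard library; the sample strings and variable names are purely illustrative and not part of the original discussion.

```python
import codecs
import io

# Illustrative input: the same text with and without a UTF-8 signature.
data_with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
data_without_bom = "héllo".encode("utf-8")

# Decoding strips the signature if present and tolerates its absence.
text_a = data_with_bom.decode("utf-8-sig")
text_b = data_without_bom.decode("utf-8-sig")

# The plain utf-8 codec leaves the signature in place as U+FEFF.
text_c = data_with_bom.decode("utf-8")

# Encoding a string with utf-8-sig prepends the signature.
encoded = "héllo".encode("utf-8-sig")

# A StreamWriter emits the signature on the first write only.
buf = io.BytesIO()
writer = codecs.getwriter("utf-8-sig")(buf)
writer.write("a")
writer.write("b")
stream_bytes = buf.getvalue()
```

Whether to use this codec rather than plain "utf-8" remains the application's choice, as suggested in the quoted text.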