Re: [Python-Dev] Unicode byte order mark decoding

Walter Dörwald Tue, 05 Apr 2005 03:31:19 -0700

M.-A. Lemburg wrote:

[...]
With the UTF-8-SIG codec, it would apply to all operation modes of
the codec, whether stream-based or from strings. Whether or not to
use the codec would be the application's choice.


I'd suggest to use the same mode of operation as we have in
the UTF-16 codec: it removes the BOM mark on the first call
to the StreamReader .decode() method and writes a BOM mark
on the first call to .encode() on a StreamWriter.

Note that the UTF-16 codec is strict w/r to the presence
of the BOM mark: you get a UnicodeError if a stream does
not start with a BOM mark. For the UTF-8-SIG codec, this
should probably be relaxed to not require the BOM.

I've started writing such a codec. Making the BOM optional on decoding definitely simplifies the implementation.

Bye,
   Walter Dörwald
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Unicode byte order mark decoding

Reply via email to