Re: [Python-Dev] Unicode byte order mark decoding

"Martin v. Löwis" Tue, 05 Apr 2005 01:03:19 -0700

Stephen J. Turnbull wrote:

> So there is a standard for the UTF-8 signature, and I know of
> applications which produce it.  While I agree with you that Python's
> codecs shouldn't produce it (by default), providing an option to strip
> is a good idea.


I would personally like to see a "utf-8-bom" codec (perhaps better
named "utf-8-sig"), which strips the BOM on reading (if present)
and generates it on writing.
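
A minimal sketch of that behavior, assuming a codec registered under the
name "utf-8-sig" (the name under which Python's standard library later
shipped such a codec); the sample string is arbitrary:

```python
import codecs

text = "hello"

# Writing: the codec prepends the UTF-8 signature (EF BB BF).
encoded = text.encode("utf-8-sig")

# Reading: the signature is stripped if present ...
decoded_with_bom = encoded.decode("utf-8-sig")

# ... and input without a signature is decoded unchanged.
decoded_without_bom = b"hello".decode("utf-8-sig")

# For contrast, the plain "utf-8" codec keeps the signature as U+FEFF.
decoded_plain = encoded.decode("utf-8")
```

The plain "utf-8" codec is unaffected, so producing the signature stays
opt-in.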

> However, this option should be part of the initialization of an IO
> stream which produces Unicodes, _not_ an operation on arbitrary
> internal strings (whether raw or Unicode).


With the UTF-8-SIG codec, it would apply to all operation modes of
the codec, whether stream-based or from strings. Whether or not to
use the codec would be the application's choice.
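
To illustrate what "all operation modes" could look like, here is a
hedged sketch using the "utf-8-sig" codec as it exists in later Python
releases. The sample data and the chunk boundary are assumptions chosen
for the example:

```python
import codecs
import io

data = codecs.BOM_UTF8 + "naïve".encode("utf-8")

# String mode: decode a complete byte string in one call.
from_bytes = data.decode("utf-8-sig")

# Stream mode: wrap a byte stream in the codec's StreamReader.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(data))
from_stream = reader.read()

# Incremental mode: the signature may be split across chunks; the
# decoder buffers until it can tell whether a signature is present.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
pieces = [decoder.decode(data[:2]), decoder.decode(data[2:], final=True)]
from_chunks = "".join(pieces)
```

In each mode the application opts in simply by naming the codec.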

Regards,
Martin
