On Apr 5, 2005, at 15:33, Walter Dörwald wrote:
The stateful decoder has a little problem: At least three bytes
have to be available from the stream until the StreamReader
decides whether these bytes are a BOM that has to be skipped.
This means that if the file only contains "ab", the user will
never see these two characters.

Shouldn't the decoder be capable of doing a partial match and quitting early? After all, "ab" is encoded in UTF8 as <61> <62> but the BOM is <ef> <bb> <bf>. If it did this type of partial matching, this issue would be avoided except in rare situations.


A solution for this would be to add an argument named final to
the decode and read methods that tells the decoder that the
stream has ended and the remaining buffered bytes have to be
handled now.

This functionality is provided by a flush() method on similar objects, such as the zlib compression objects.


Evan Jones

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to