Ezio Melotti <ezio.melo...@gmail.com> added the comment:

Lars, I think the situation can still be improved. If tarfile works with byte 
strings, it should accept only byte strings or unicode strings that can be 
encoded as ASCII, and it should encode them as soon as it receives them.
In the problem reported by Peter, he was passing u".", which is an ASCII-only 
unicode string. Later in the program this string gets mixed with a byte 
string, which causes an implicit decoding: the byte string is converted to 
unicode (and this possibly fails if the filename is non-ASCII). Even if the 
decoding succeeds, tarfile eventually has to convert the unicode string back 
to a byte string.
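
To illustrate the implicit decoding described above, here is a minimal sketch. 
It is not tarfile code: implicit_mix is a hypothetical helper that spells out 
explicitly what Python 2 does behind the scenes when a unicode string and a 
byte string are concatenated, assuming the default ASCII codec.

```python
def implicit_mix(unicode_part, byte_part):
    """Hypothetical helper: emulate Python 2's unicode + str concatenation.

    Python 2 decodes the byte string with the default (ASCII) codec
    before joining; here that step is written out explicitly.
    """
    return unicode_part + byte_part.decode("ascii")


# ASCII-only bytes are silently promoted to unicode.
joined = implicit_mix(u"./", b"file.txt")

# Non-ASCII bytes make the decode step fail.
try:
    implicit_mix(u"./", b"caf\xc3\xa9.txt")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

In the first case the result is a unicode string, so it still has to be 
encoded again before it can be written out as bytes.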

A better approach would be to encode every unicode string that is passed in, 
using the ASCII codec.
If the unicode strings are ASCII-only (like the u"." Peter was passing), they 
can be encoded without problems. When they are later mixed with other strings, 
everything is a byte string, so no implicit decoding happens.
If the unicode strings are non-ASCII, the encoding fails immediately, which 
tells the user to encode the unicode string before passing it to the 
function.
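
A rough sketch of what I mean, assuming a hypothetical helper named 
ensure_bytes that would be called on every name as soon as it is received 
(this helper does not exist in tarfile; it only illustrates the idea):

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def ensure_bytes(name):
    """Hypothetical helper: return name as a byte string.

    Unicode input is encoded with the ASCII codec, so non-ASCII names
    raise UnicodeEncodeError right away instead of failing later.
    """
    if isinstance(name, text_type):
        return name.encode("ascii")
    if isinstance(name, bytes):
        return name
    raise TypeError("expected a byte string or an ASCII-only unicode string")


# The ASCII-only unicode string from Peter's report becomes a byte string.
dot = ensure_bytes(u".")

# Byte strings pass through untouched, whatever their content.
raw = ensure_bytes(b"caf\xc3\xa9")
```

With this in place, a call such as ensure_bytes(u"caf\xe9") raises 
UnicodeEncodeError at the point where the name is passed in.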

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7693>
_______________________________________