Martin Panter added the comment:
I suspect this is caused by TextIOWrapper guessing whether it is writing at
the start of a file or in the middle, and being confused by “seekable”
returning False. GzipFile implements some “seek” calls in write mode, but
LZMAFile and BZ2File do not.
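To illustrate the mechanism I have in mind, here is a minimal sketch using the UTF-16 incremental encoder directly. It assumes the text wrapper suppresses the BOM by calling setstate(0) on the encoder when it believes it is not at the start of the stream, which is how I read the pure-Python implementation. The helper name encode_utf16 and its at_start flag are made up for this example.

```python
import codecs


def encode_utf16(text, at_start):
    """Encode text roughly as a text wrapper might, depending on position.

    Hypothetical helper for illustration only; not part of the io module.
    """
    encoder = codecs.getincrementalencoder('utf-16')()
    if not at_start:
        # State 0 tells the UTF-16 encoder that the byte order is already
        # settled (native order), so it does not emit a BOM.
        encoder.setstate(0)
    return encoder.encode(text)


with_bom = encode_utf16('HI', at_start=True)
without_bom = encode_utf16('HI', at_start=False)

# codecs.BOM_UTF16 is the BOM in the platform's native byte order.
print('fresh encoder writes BOM:', with_bom.startswith(codecs.BOM_UTF16))
print('after setstate(0) writes BOM:', without_bom.startswith(codecs.BOM_UTF16))
```

If that reading is right, the only question is which branch a wrapper takes when it cannot call “tell” at all.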
Using this test class:
import io
import _pyio
from io import BufferedIOBase

class Writer(BufferedIOBase):
    def writable(self):
        return True
    def __init__(self, offset):
        # offset is the value reported by tell(); None simulates a
        # stream that is not seekable.
        self.offset = offset
    def seekable(self):
        result = self.offset is not None
        print('seekable ->', result)
        return result
    def tell(self):
        print('tell ->', self.offset)
        return self.offset
    def write(self, data):
        print('write', repr(data))
        return len(data)
a BOM is inserted when “tell” returns zero:
>>> t = io.TextIOWrapper(Writer(0), 'utf-16')
seekable -> True
tell -> 0
>>> t.write('HI'); t.flush() # Writes BOM
2
write b'\xff\xfeH\x00I\x00'
and not when “tell” returns a positive number:
>>> t = io.TextIOWrapper(Writer(1), 'utf-16')
seekable -> True
tell -> 1
>>> t.write('HI'); t.flush() # Omits BOM
2
write b'H\x00I\x00'
However the “io” and “_pyio” behaviours differ when “seekable” returns False:
>>> t = io.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush() # io omits BOM
2
write b'H\x00I\x00'
>>> t = _pyio.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush() # _pyio writes BOM
write b'\xff\xfeH\x00I\x00'
2
IMO the “_pyio” behaviour is more sensible: write a BOM because that’s what the
UTF-16 codec produces.
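For anyone who wants to re-run the comparison outside the interactive prompt, the following is a rough sketch of a script that does the same checks. CapturingWriter and writes_bom are hypothetical names introduced here; the class is a variant of the Writer class above that records the bytes instead of printing them. The result for “io” with a non-seekable stream is printed but deliberately not assumed, since that is the behaviour in question.

```python
import codecs
import io
import _pyio


class CapturingWriter(io.BufferedIOBase):
    """Hypothetical variant of Writer that records the bytes written."""

    def __init__(self, offset):
        self.offset = offset  # None simulates a non-seekable stream
        self.chunks = []

    def writable(self):
        return True

    def seekable(self):
        return self.offset is not None

    def tell(self):
        return self.offset

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


def writes_bom(module, offset):
    """Return True if module.TextIOWrapper emits a UTF-16 BOM."""
    raw = CapturingWriter(offset)
    text = module.TextIOWrapper(raw, encoding='utf-16')
    text.write('HI')
    text.flush()
    return b''.join(raw.chunks).startswith(codecs.BOM_UTF16)


# Keys are (module name, offset); values say whether a BOM was written.
results = {
    (module.__name__, offset): writes_bom(module, offset)
    for module in (io, _pyio)
    for offset in (0, 1, None)
}
for key, value in results.items():
    print(key, '-> BOM written:', value)
```

The two implementations should agree for offsets 0 and 1; the (io, None) row shows whether the C implementation still omits the BOM on a given Python version.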
----------
nosy: +martin.panter
_______________________________________
Python tracker
<https://bugs.python.org/issue36304>
_______________________________________