[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-18 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-18 Thread Gianluca


Gianluca  added the comment:

In case the file is not seekable, we could decide based on the file mode:
- if mode='w', write the BOM
- if mode='a', don't write the BOM

Of course, mode "a" doesn't guarantee that we are in the middle of the file, 
but omitting the BOM whenever we are "appending" to the file seems like 
consistent behavior.
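
A minimal sketch of the rule proposed above, for illustration only. The helper 
name should_write_bom and the FakeStream stand-in are hypothetical; neither 
exists in the io module, and this is not how TextIOWrapper is implemented.

```python
def should_write_bom(raw, mode):
    """Hypothetical helper: decide whether a BOM should be emitted."""
    if raw.seekable():
        # Seekable stream: a BOM belongs only at offset zero.
        return raw.tell() == 0
    # Non-seekable stream: fall back to the file mode, as proposed above.
    return "a" not in mode


class FakeStream:
    """Minimal stand-in for a binary stream (illustration only)."""

    def __init__(self, offset):
        self.offset = offset  # None means "not seekable"

    def seekable(self):
        return self.offset is not None

    def tell(self):
        return self.offset


for offset, mode in [(None, "w"), (None, "a"), (0, "a"), (5, "w")]:
    print(offset, mode, should_write_bom(FakeStream(offset), mode))
```

With this rule, a seekable stream is still judged by its position alone; the 
mode is consulted only when the position cannot be determined.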

--




[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-15 Thread Martin Panter

Martin Panter  added the comment:

I suspect this is caused by TextIOWrapper guessing whether it is writing at the 
start of a file or in the middle, and being confused by “seekable” returning 
False. GzipFile implements some “seek” calls in write mode, but LZMAFile and 
BZ2File do not.
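
The difference described above can be checked directly. This is a small sketch 
that opens each compressed writer over an in-memory buffer and records what 
seekable() reports in write mode; the function name write_mode_seekable is made 
up for this example.

```python
import bz2
import gzip
import io
import lzma


def write_mode_seekable():
    """Report seekable() for each compressed file object in write mode."""
    results = {}
    openers = (("gzip", gzip.open), ("bz2", bz2.open), ("lzma", lzma.open))
    for name, opener in openers:
        # Binary write mode, no TextIOWrapper involved yet.
        with opener(io.BytesIO(), "wb") as f:
            results[name] = f.seekable()
    return results


print(write_mode_seekable())
```

On CPython, GzipFile is expected to report True here, while BZ2File and 
LZMAFile report False because they are only seekable when opened for reading.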

Using this test class:

import io, _pyio
from io import BufferedIOBase

class Writer(BufferedIOBase):
    def writable(self):
        return True
    def __init__(self, offset):
        # offset=None simulates a non-seekable stream
        self.offset = offset
    def seekable(self):
        result = self.offset is not None
        print('seekable ->', result)
        return result
    def tell(self):
        print('tell ->', self.offset)
        return self.offset
    def write(self, data):
        print('write', repr(data))

a BOM is inserted when “tell” returns zero:

>>> t = io.TextIOWrapper(Writer(0), 'utf-16')
seekable -> True
tell -> 0
>>> t.write('HI'); t.flush()  # Writes BOM
2
write b'\xff\xfeH\x00I\x00'

and not when “tell” returns a positive number:

>>> t = io.TextIOWrapper(Writer(1), 'utf-16')
seekable -> True
tell -> 1
>>> t.write('HI'); t.flush()  # Omits BOM
2
write b'H\x00I\x00'

However the “io” and “_pyio” behaviours differ when “seekable” returns False:

>>> t = io.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush()  # io omits BOM
2
write b'H\x00I\x00'
>>> t = _pyio.TextIOWrapper(Writer(None), 'utf-16')
seekable -> False
>>> t.write('HI'); t.flush()  # _pyio writes BOM
write b'\xff\xfeH\x00I\x00'
2

IMO the “_pyio” behaviour is more sensible: write a BOM because that’s what the 
UTF-16 codec produces.
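
To illustrate what "the UTF-16 codec produces" means here, the following sketch 
uses only the codecs module, without TextIOWrapper. It shows that the generic 
'utf-16' codec emits a BOM on the first chunk only, and that calling 
setstate(0) on a fresh incremental encoder suppresses the BOM, which is the 
mechanism _pyio uses when “tell” returns a nonzero position. The variable names 
are arbitrary.

```python
import codecs

# One-shot encoding: the generic 'utf-16' codec prepends a BOM
# in the platform's native byte order.
data = "HI".encode("utf-16")
print(data[:2] == codecs.BOM_UTF16)

# Incremental encoding: only the first chunk carries the BOM.
enc = codecs.getincrementalencoder("utf-16")()
first = enc.encode("H")
second = enc.encode("I")
print(first, second)

# setstate(0) tells the encoder that the stream has already started,
# so no BOM is emitted at all.
fresh = codecs.getincrementalencoder("utf-16")()
fresh.setstate(0)
no_bom = fresh.encode("HI")
print(no_bom)
```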

--
nosy: +martin.panter




[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-15 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
nosy: +benjamin.peterson, ezio.melotti, lemburg, vstinner




[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-15 Thread Gianluca


Gianluca  added the comment:

As the Stack Overflow answer explains, using _pyio.TextIOWrapper works as 
expected, so this looks like a bug in _io.TextIOWrapper.

--




[issue36304] When using bz2 and lzma in mode 'wt', the BOM is not written

2019-03-15 Thread Gianluca


New submission from Gianluca :

When bz2 and lzma files are opened for writing in text mode (and are therefore 
wrapped in a TextIOWrapper), the BOM of encodings such as utf-16 and utf-32 is 
not written. The gzip module works as expected (it writes the BOM).

The code that demonstrates this behavior (tested with Python 3.7) is attached 
here and can also be found on Stack Overflow: 
https://stackoverflow.com/questions/55171439/python-bz2-and-lzma-in-mode-wt-dont-write-the-bom-while-gzip-does-why?noredirect=1#comment97103212_55171439
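
The attached script is not reproduced in this message. As a rough stand-in, the 
sketch below is one way to check the reported behavior using in-memory buffers; 
the helper name bom_written is hypothetical and is not taken from 
demonstrate_BOM_issue.py.

```python
import bz2
import codecs
import gzip
import io
import lzma


def bom_written(opener, decompress, encoding="utf-16"):
    """Write 'HI' in text mode and report whether the output starts with a BOM."""
    buf = io.BytesIO()
    with opener(buf, "wt", encoding=encoding) as f:
        f.write("HI")
    # Closing the text wrapper flushes the compressor but leaves buf open.
    raw = decompress(buf.getvalue())
    return raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


results = {
    "gzip": bom_written(gzip.open, gzip.decompress),
    "bz2": bom_written(bz2.open, bz2.decompress),
    "lzma": bom_written(lzma.open, lzma.decompress),
}
print(results)
```

The gzip entry should be True. According to this report, the bz2 and lzma 
entries are False on Python 3.7; the result may differ on versions where the 
behavior has changed.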

--
components: IO
files: demonstrate_BOM_issue.py
messages: 337987
nosy: janluke
priority: normal
severity: normal
status: open
title: When using bz2 and lzma in mode 'wt', the BOM is not written
type: behavior
versions: Python 3.7
Added file: https://bugs.python.org/file48209/demonstrate_BOM_issue.py
