New submission from Daniel Pope <lord.ma...@gmail.com>:

A tarfile.TarFile object open for writing may silently write corrupt tar files 
if it is destroyed before being closed.

While explicitly calling close() or using the object as a context manager is 
recommended, I would not expect silent corruption in basic usage.
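
For reference, the recommended pattern looks roughly like the sketch below. The 
helper name write_archive and the file names are made up for illustration; only 
tarfile.open(), TarFile.add() and TarFile.getnames() come from the standard 
library.

```python
import os
import tarfile
import tempfile


def write_archive(path, payload_path):
    # Leaving the with-block calls close(), even if add() raises.
    with tarfile.open(path, "w") as tf:
        tf.add(payload_path, arcname="payload.txt")


# Create a small file to archive (illustrative data only).
tmpdir = tempfile.mkdtemp()
payload = os.path.join(tmpdir, "payload.txt")
with open(payload, "w") as f:
    f.write("hello\n")

archive = os.path.join(tmpdir, "out.tar")
write_archive(archive, payload)

# Read the archive back to confirm it is well formed.
with tarfile.open(archive) as tf:
    names = tf.getnames()
```

With this pattern the archive is finalised deterministically rather than at 
garbage collection time.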

There are two steps needed for a TarFile to be closed properly:

* According to https://github.com/python/cpython/blob/3.7/Lib/tarfile.py#L1726, 
two zero blocks must be written (though GNU tar seems to work even if these are 
absent)
* The underlying fileobj (an io.BufferedWriter) must then be flushed
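
The first step can be observed with an in-memory archive. This is only a 
sketch: it uses io.BytesIO in place of a real file so that the bytes can be 
inspected before and after close(), and the member name "hello.txt" is 
arbitrary.

```python
import io
import tarfile

buf = io.BytesIO()
tf = tarfile.open(fileobj=buf, mode="w")

# Add one small member from memory.
data = b"hello"
info = tarfile.TarInfo(name="hello.txt")
info.size = len(data)
tf.addfile(info, io.BytesIO(data))

before = buf.getvalue()  # header and data blocks only
tf.close()               # appends the end-of-archive marker plus padding
after = buf.getvalue()

# Everything close() appended: at least two 512-byte zero blocks.
trailer = after[len(before):]
```

Here trailer consists entirely of NUL bytes and is at least 1024 bytes long; 
an archive that was never closed lacks it.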

A BufferedWriter is flushed in its __del__(); the problem is that TarFile 
objects form a reference cycle with their TarInfo members due to this line, 
which has the comment "Not Needed": 
https://github.com/python/cpython/blob/3.7/Lib/tarfile.py#L1801

Under PEP 442, when the TarFile becomes unreferenced, the following cyclic 
isolate is formed:

    TarInfo <=> TarFile -> BufferedWriter -> FileIO

Finalisers for these objects are run in an undefined order. If the FileIO 
finaliser is run before the BufferedWriter finaliser, then the fd is closed, 
buffered data in the BufferedWriter is not committed to disk, and the tar file 
is corrupt.
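
The buffering half of this can be shown in isolation. The sketch below does not 
reproduce the finaliser ordering (which is not deterministic); it only 
illustrates that a small write sits in the BufferedWriter until it is flushed, 
so closing the fd first would lose it. The temporary file is illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "wb")   # an io.BufferedWriter wrapping an io.FileIO
f.write(b"hello")      # small write: held in the buffer, not yet on disk
size_before_flush = os.path.getsize(path)

f.flush()              # pushes the buffered bytes down to the FileIO
size_after_flush = os.path.getsize(path)

f.close()
os.remove(path)
```

Before the flush the file on disk is still empty; after it, the five bytes are 
present.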

Additionally, while ResourceWarning is issued if the BufferedWriter or FileIO 
is left unclosed, no such warning is emitted by the TarFile.
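
As a rough idea of what such a warning could look like, here is a sketch using 
a hypothetical subclass. WarningTarFile is not part of the standard library and 
this is not a proposed patch. It also does not address the finaliser ordering 
problem above when the TarFile opened the file itself; here the caller keeps 
the fileobj alive.

```python
import gc
import io
import tarfile
import warnings


class WarningTarFile(tarfile.TarFile):
    """Hypothetical subclass that warns when finalised while still open."""

    def __del__(self):
        # 'closed' may be missing if __init__ failed early, so use getattr.
        if not getattr(self, "closed", True):
            warnings.warn(f"unclosed tar file {self.name!r}", ResourceWarning)
            self.close()


buf = io.BytesIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tf = WarningTarFile(fileobj=buf, mode="w")
    del tf        # dropped without close()
    gc.collect()  # also covers the case where a reference cycle exists

warned = [w for w in caught if issubclass(w.category, ResourceWarning)]
```

In this sketch one ResourceWarning is recorded and the end-of-archive marker is 
still written to buf.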

----------
components: Library (Lib)
messages: 325266
nosy: lordmauve
priority: normal
severity: normal
status: open
title: tarfile.TarFile may write corrupt files if not closed
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34662>
_______________________________________