Lars Gustäbel added the comment:

I have written a test for the issue, so that we have a basis for discussion.

There are four different scenarios where an unexpected EOF can occur: inside a 
metadata block, directly after a metadata block, inside a data segment, or 
directly after a data segment (i.e. a missing end-of-archive marker).
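
To make the four cases concrete, here is a rough sketch that computes one 
cut-off offset per case for the first member of an archive. The helper names 
(build_archive, truncation_points) are made up for illustration and are not 
taken from the attached test. It assumes a plain ustar archive with 512-byte 
blocks whose first member is a regular file.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def build_archive(payload=b"x" * 600):
    """Hypothetical helper: one regular file in an in-memory ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("a.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def truncation_points(archive):
    """Map case number (1-4) to a byte offset at which to cut the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.getmembers()[0]
    # Data segments are padded with NULs up to a multiple of the block size.
    padded = -(-member.size // BLOCK) * BLOCK
    return {
        1: member.offset + BLOCK // 2,             # inside the metadata block
        2: member.offset_data,                     # directly after the metadata block
        3: member.offset_data + member.size // 2,  # inside the data segment
        4: member.offset_data + padded,            # directly after the data segment
    }
```

Slicing the archive at one of these offsets, e.g. archive[:truncation_points(archive)[3]], 
gives an input for the corresponding case.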

Case #1 is taken care of (TruncatedHeaderError).

Case #4 is merely a violation of the standard, which is negligible.

Cases #2 and #3 are essentially the same. If a data segment is empty or 
incomplete, this means data was lost when the archive was created, which should 
not go unnoticed when reading it (see _FileInFile.read() for the code in 
question).

The problem is that, even after we have fixed cases #2 and #3, we have no 
reliable way to detect an incomplete data segment unless we read it and count 
the bytes. If we simply iterate over the TarFile (e.g. with TarFile.list()), the 
archive will appear intact. That is because TarFile.next() seeks from one 
metadata block to the next, and we cannot easily detect that we have sought 
beyond the end of the archive, unless we insist on the premise that every tar 
we read is standards-compliant and comes with an end-of-archive marker (see 
case #4), which we probably should not.
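
As an illustration of "read it and count the bytes", here is a minimal sketch 
of a user-level check. The names find_truncated_members and build_archive are 
hypothetical and not part of tarfile. The sketch treats a member as damaged 
either when fewer bytes come back than the header announces or when tarfile 
raises ReadError while reading, since which of the two happens depends on the 
Python version.

```python
import io
import tarfile


def build_archive():
    """Hypothetical helper: two 600-byte regular files in a ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in ("a.txt", "b.txt"):
            info = tarfile.TarInfo(name)
            info.size = 600
            tar.addfile(info, io.BytesIO(b"x" * 600))
    return buf.getvalue()


def find_truncated_members(fileobj):
    """Return the names of regular files whose data segment is incomplete."""
    damaged = []
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        try:
            for member in tar:
                if not member.isreg():
                    continue
                count = 0
                try:
                    extracted = tar.extractfile(member)
                    while True:
                        chunk = extracted.read(64 * 1024)
                        if not chunk:
                            break
                        count += len(chunk)
                except tarfile.ReadError:
                    # Some versions raise instead of returning short data.
                    damaged.append(member.name)
                    continue
                if count != member.size:
                    damaged.append(member.name)
        except tarfile.ReadError:
            # Advancing to the next header failed; nothing more to inspect.
            pass
    return damaged
```

With the layout produced by build_archive(), cutting the archive 100 bytes 
into the second data segment (archive[:2148]) should report only "b.txt", 
while the intact archive should report nothing.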

Three possible options come to my mind:

1. Add a warning to the documentation that, in order to test the integrity of an 
archive, the user has to read through all the data segments.
2. Instead of using seek() in TarFile.next(), use read() to advance the file 
pointer. This has a negative impact on performance in most cases.
3. Insist on an end-of-archive marker. This has the disadvantage that users may 
get an exception although everything is fine.
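
For option 3, the check could look roughly like the sketch below, written as a 
user-level function rather than as a change to tarfile itself. The names 
has_end_of_archive_marker and build_archive are hypothetical. It assumes 
512-byte blocks and that the last member is a regular file or a member without 
data; sparse files and other special types are not handled. A strict reader 
would raise where this sketch returns False.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def build_archive():
    """Hypothetical helper: one 600-byte regular file in a ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("a.txt")
        info.size = 600
        tar.addfile(info, io.BytesIO(b"x" * 600))
    return buf.getvalue()


def has_end_of_archive_marker(fileobj):
    """Return True if two zero-filled blocks follow the last member."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        members = tar.getmembers()  # may raise ReadError on a damaged archive
    end = 0
    if members:
        last = members[-1]
        end = last.offset_data
        if last.isreg():
            # Skip the data segment, rounded up to a whole number of blocks.
            end += -(-last.size // BLOCK) * BLOCK
    fileobj.seek(end)
    return fileobj.read(2 * BLOCK) == bytes(2 * BLOCK)
```

An archive cut directly after its last data segment (case #4) fails this check 
even though no file data is missing, which is exactly the disadvantage named 
in option 3.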

----------
assignee:  -> lars.gustaebel
keywords: +patch
Added file: http://bugs.python.org/file39528/01-issue24259-test.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24259>
_______________________________________