[Bug-tar] Incorrect listing of sparse files with more than 8G of real data

Niessen, Chris Sat, 25 Oct 2014 14:19:28 -0700

If a sparse file with more than 8G of real data is stored in a POSIX format 
archive (which is done correctly in 1.28), listing the contents of the archive 
will fail.


Archiving a sparse file with more than 8G of real data results in two extended 
header entries being written: GNU.sparse.realsize and size.
When the archive is listed, both of those values are read and end up being 
stored in stat.st_size, which causes whichever value came first (happens to be 
GNU.sparse.realsize) to be lost.  (This is because size_decoder and 
sparse_size_decoder currently do exactly the same thing.)
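
To illustrate the overwrite described above, here is a minimal sketch in C.
The struct and the sketch_* functions are simplified stand-ins invented for
this example, not tar's actual definitions; only the behavior (both decoders
storing into the same field) is taken from the description.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for tar's per-member state. */
struct member_info {
    int64_t st_size;           /* stands in for stat.st_size */
    int64_t archive_file_size; /* stands in for stat_info->archive_file_size */
};

/* Stand-in for sparse_size_decoder: stores GNU.sparse.realsize in st_size. */
static void sketch_sparse_size_decoder(struct member_info *m, const char *value)
{
    m->st_size = strtoll(value, NULL, 10);
}

/* Stand-in for size_decoder: stores "size" in the very same field. */
static void sketch_size_decoder(struct member_info *m, const char *value)
{
    m->st_size = strtoll(value, NULL, 10);
}

/* Decode both records in the order the report observes
   (GNU.sparse.realsize first, then size) and return what is left. */
int64_t sketch_decode_both(const char *realsize, const char *size)
{
    struct member_info m = { 0, 0 };
    sketch_sparse_size_decoder(&m, realsize);
    sketch_size_decoder(&m, size); /* overwrites the realsize value */
    return m.st_size;
}
```

Calling sketch_decode_both("17179869184", "9663676416") returns the "size"
value; the realsize that was decoded first is no longer available anywhere.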

If the file has less than 8G of real data, then the amount of real data in the 
archive, which gets put in stat.st_size when the file header is first read, 
gets stashed in stat_info->archive_file_size in list.c:692 prior to the 
extended headers getting parsed.  Then, after the extended headers are parsed, 
and stat.st_size gets updated with the value in GNU.sparse.realsize by 
sparse_size_decoder, both values are available, and tar successfully lists the 
contents of the archive.
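
The order of operations in the working (under 8G) case can be sketched as
follows. This is a simplified model under the assumptions stated in the
comments; the struct and function names are hypothetical and do not come from
the tar sources.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for tar's per-member state. */
struct member_info {
    int64_t st_size;           /* stands in for stat.st_size */
    int64_t archive_file_size; /* stands in for stat_info->archive_file_size */
};

/* header_size: size field of the file header (real data in the archive)
   realsize:    value of the GNU.sparse.realsize extended header */
struct member_info sketch_list_small_sparse(int64_t header_size,
                                            int64_t realsize)
{
    struct member_info m;

    /* 1. Reading the file header fills st_size. */
    m.st_size = header_size;

    /* 2. That value is stashed before the extended headers are parsed
          (the step the report places at list.c:692). */
    m.archive_file_size = m.st_size;

    /* 3. sparse_size_decoder then replaces st_size with the real size. */
    m.st_size = realsize;

    return m;
}
```

After these three steps both numbers survive: archive_file_size holds the
amount of data to skip in the archive, and st_size holds the size to display.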

However, if the file has more than 8G of real data, then the value that gets 
stashed in stat_info->archive_file_size is the value from the file header, 
which was written as zero because the actual data size doesn't fit in the 
POSIX header field; the actual data size is put in a "size" extended header 
instead.  Because that extended header hasn't been read yet when list.c:692 
runs, the actual size of the data never makes it into archive_file_size.  The 
listing operation then fails: tar cannot skip to the next member and displays 
errors.
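
For reference, the 8G threshold comes from the ustar header's size field,
which is 12 bytes wide and holds at most 11 octal digits.  The sketch below
models what ends up stashed in that case.  The function names are hypothetical,
and the "written as zero" behavior is taken from the description above rather
than from the tar sources.

```c
#include <stdint.h>

/* Largest value that fits in 11 octal digits: 8589934591, i.e. 8G - 1. */
#define USTAR_SIZE_MAX INT64_C(077777777777)

/* Nonzero if data_size can be stored directly in the header size field. */
int sketch_fits_in_header(int64_t data_size)
{
    return data_size >= 0 && data_size <= USTAR_SIZE_MAX;
}

/* Value found in the header size field, following the report: the real
   size when it fits, otherwise zero (the real size then travels in a
   "size" extended header instead). */
int64_t sketch_header_size_field(int64_t data_size)
{
    return sketch_fits_in_header(data_size) ? data_size : 0;
}

/* What gets stashed in archive_file_size before the extended headers
   are parsed: only the header field has been seen at that point. */
int64_t sketch_stashed_archive_file_size(int64_t data_size)
{
    return sketch_header_size_field(data_size);
}
```

For a member with 9G of data the stashed value is zero, so the lister has no
usable byte count for skipping over the member's data.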

A patch to address this was submitted against 1.27:
http://www.mail-archive.com/bug-tar%40gnu.org/msg03905.html
but it doesn't seem to have made it into 1.28.

Before finding that patch, I generated my own that modifies size_decoder to put 
the value of the "size" extended header into archive_file_size.  If 
archive_file_size and stat.st_size have the same value (meaning stat.st_size 
hasn't been updated by a previously parsed extended header), then the "size" 
value also gets put into stat.st_size.  That way, stat.st_size will be updated 
properly for non-sparse files, but will not be clobbered for sparse ones.
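
A rough sketch of that approach, for illustration only.  This is not the
actual patch: the struct and function names are hypothetical stand-ins, and
doing the comparison before archive_file_size is updated is one reading of
the description above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for tar's per-member state. */
struct member_info {
    int64_t st_size;           /* stands in for stat.st_size */
    int64_t archive_file_size; /* stands in for stat_info->archive_file_size */
};

/* Stand-in for sparse_size_decoder: stores GNU.sparse.realsize in st_size. */
void sketch_sparse_size_decoder(struct member_info *m, const char *value)
{
    m->st_size = strtoll(value, NULL, 10);
}

/* Stand-in for the modified size_decoder. */
void sketch_patched_size_decoder(struct member_info *m, const char *value)
{
    int64_t size = strtoll(value, NULL, 10);

    /* Equal values are taken to mean that no earlier extended header
       has changed st_size since the file header was read. */
    int st_size_untouched = (m->archive_file_size == m->st_size);

    /* The "size" record always describes the data held in the archive. */
    m->archive_file_size = size;

    /* Only touch st_size when nothing else has set it. */
    if (st_size_untouched)
        m->st_size = size;
}
```

Starting from a header size of zero, a sparse member that decodes
GNU.sparse.realsize and then size keeps the real size in st_size and gets the
archived data size in archive_file_size, while a non-sparse member that only
has a size record ends up with both fields set to that value.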

I can provide that patch if desired, but it's only two lines.
Thanks-
-Chris Niessen
