Paul Eggert wrote:
> This discussion suggests the need for a new, easy-to-use format option,
> which is like '-Hpax' except that it omits atime and ctime, and omits
> the subseconds part of mtime. Using this format would mean that pax
> extensions won't be used unless they're needed (a file with a long name,
> a timestamp past the year 2246, etc.) and so the tarball would be more
> portable to platforms with older or buggy tarball extractors.
>
> Maybe we could call this the 'art' format, for "archive reproducible
> tarball", so that people could use 'tar -Hart' for it. Like 'ustar',
> 'art' format would be a strict subset of 'pax' format, so it would be
> POSIX-conforming.
>
> We could introduce the new option in the next release of GNU tar, and
> think about changing the default format to it in a later release.
> What do you think?
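As a rough illustration of the rule proposed above (emit pax records only when ustar cannot represent a value), here is a minimal Python sketch. It is not GNU tar code, and the function names are hypothetical. It covers only two cases: a path that fits neither ustar's 100-byte name field nor a split at a '/' into the 155-byte prefix field, and an mtime outside what the 11 octal digits of ustar's 12-byte mtime field can hold. Other cases (large sizes, large uid/gid, etc.) are left out. atime and ctime are never emitted, and the subseconds of mtime are dropped by rounding toward negative infinity, which is an assumption.

```python
import math

USTAR_NAME_MAX = 100          # width of the ustar 'name' field in bytes
USTAR_PREFIX_MAX = 155        # width of the ustar 'prefix' field in bytes
USTAR_MTIME_MAX = 8**11 - 1   # largest value in 11 octal digits (12-byte field)


def fits_ustar_name(name: bytes) -> bool:
    """True if name fits in 'name', or can be split at a '/' into prefix/name."""
    if len(name) <= USTAR_NAME_MAX:
        return True
    # Try each '/' as the split point; the slash itself is not stored.
    for i, byte in enumerate(name):
        if byte == ord("/"):
            prefix, rest = name[:i], name[i + 1:]
            if 0 < len(rest) <= USTAR_NAME_MAX and len(prefix) <= USTAR_PREFIX_MAX:
                return True
    return False


def pax_record(keyword: str, value: str) -> bytes:
    """Encode one pax record '<len> <keyword>=<value>\\n'; <len> counts itself."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body) + 1
    # Adjust until the decimal length prefix accounts for its own digits.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def art_pax_records(name: str, mtime: float) -> bytes:
    """Hypothetical 'art' rule: pax records only where ustar can't hold the value.

    atime and ctime are never emitted; mtime loses its subsecond part.
    """
    records = b""
    if not fits_ustar_name(name.encode("utf-8")):
        records += pax_record("path", name)
    whole_mtime = math.floor(mtime)  # drop the subsecond part
    if not 0 <= whole_mtime <= USTAR_MTIME_MAX:
        records += pax_record("mtime", str(whole_mtime))
    return records


if __name__ == "__main__":
    print(art_pax_records("a.txt", 1700000000.75))  # short name, ordinary time
    print(art_pax_records("x" * 150, 0))            # name too long for ustar
```

With this rule, a typical file yields no extended header at all, which is what makes the result readable by extractors that only understand ustar.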

I like the idea. In fact, it is what I do in tarlz. I would only suggest making
'tar -Hart' protect the extended records with a CRC, as tarlz does. See
http://www.nongnu.org/lzip/manual/tarlz_manual.html#Amendments-to-pax-format
and
http://www.nongnu.org/lzip/manual/tarlz_manual.html#key_005fcrc32
GNU.crc32
CRC32-C (Castagnoli) of the extended header data, excluding the 8 bytes
that represent the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big-endian order, for example
'22 GNU.crc32=00000000\n'. The keyword of the CRC record is itself
protected by the CRC to guarantee that corruption is always detected
(except in the case of a CRC collision). A CRC was chosen because a
checksum is too weak for a potentially large list of variable-sized
records: a checksum can't detect simple errors like the swapping of two
bytes.
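The layout of the GNU.crc32 record and the point about swapped bytes can be illustrated with a small Python sketch. It assumes (the text above does not say so) that the CRC record is the last record of the extended header and that the hex digits are written in uppercase; the function names are hypothetical. Python's zlib.crc32 computes the IEEE CRC-32, not CRC32-C, so a bitwise CRC32-C using the reflected Castagnoli polynomial is written out here.

```python
CASTAGNOLI_REFLECTED = 0x82F63B78  # reflected form of the CRC32-C polynomial
CRC_KEY = b"22 GNU.crc32="         # 22 = length of the whole record
CRC_RECORD_LEN = 22


def crc32c(data: bytes) -> int:
    """Bitwise (slow but simple) CRC32-C."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CASTAGNOLI_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def add_crc_record(records: bytes) -> bytes:
    """Append a GNU.crc32 record (assumed last) covering everything but its value."""
    head = records + CRC_KEY  # all bytes before the 8 hex digits, keyword included
    tail = b"\n"              # the byte after the 8 hex digits
    value = crc32c(head + tail)
    return head + b"%08X" % value + tail


def crc_record_ok(header: bytes) -> bool:
    """Check an extended header whose last record is the GNU.crc32 record."""
    record = header[-CRC_RECORD_LEN:]
    if (len(record) != CRC_RECORD_LEN or not record.startswith(CRC_KEY)
            or not record.endswith(b"\n")):
        return False
    digits = record[len(CRC_KEY):-1]
    if not all(c in b"0123456789ABCDEFabcdef" for c in digits):
        return False
    covered = header[:-9] + b"\n"  # drop the 8 digits, keep the final newline
    return crc32c(covered) == int(digits, 16)


def additive_checksum(data: bytes) -> int:
    """A plain byte sum, for comparison: blind to byte order."""
    return sum(data) & 0xFFFFFFFF


if __name__ == "__main__":
    header = add_crc_record(b"12 path=abc\n")
    swapped = header.replace(b"abc", b"bac")  # swap two adjacent bytes
    print(crc_record_ok(header))    # True
    print(crc_record_ok(swapped))   # False
    print(additive_checksum(b"abc") == additive_checksum(b"bac"))  # True
```

A swap of two adjacent, distinct bytes is an error burst of at most 16 bits, and a 32-bit CRC detects every burst of up to 32 bits, while the byte sum stays the same. A real implementation would use a table-driven or hardware-assisted CRC32-C instead of the bitwise loop.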
Antonio.