I have to correct myself regarding --compare doing seeks. It looks like
it is doing checksums on both tar file contents and sometimes also on
the file system. Of course that means reading the complete archive. My
testing was flawed in that I --compared a single non-archived file. As
tar seeks through all the archive headers unsuccessfully, I misread the
error message.
I want to turn that into a feature request for a --no-checksum option,
with which tar uses only file dates and sizes to decide on file
differences and skips the checksums.
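To make the request concrete, here is a minimal sketch of a metadata-only comparison. The struct archived_meta and the function metadata_differs are hypothetical names of mine, not anything in the tar sources; the sketch only assumes that the size and mtime from the archive header are available next to the stat() result for the file on disk.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record of what the archive header says about a member. */
struct archived_meta {
    off_t  size;   /* size field from the header */
    time_t mtime;  /* modification time from the header */
};

/* Return true if the file on disk looks different from the archived
   copy, judging only by size and modification time. No file content
   is read, so equal size and mtime are taken on trust. */
bool metadata_differs(const struct archived_meta *am, const struct stat *st)
{
    return am->size != st->st_size || am->mtime != st->st_mtime;
}
```

The trade-off is the usual one: a file whose content changed while size and mtime stayed the same would go unnoticed.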
On 24.10.21 at 17:50, Johannes Nieß wrote:
Hi,
While working on a backup script that --updates the tar file (on a
disk) with --multi-volume, I discovered that tar does not seek through
the archive and speed is much lower than expected. Are there any
technical reasons for that, other than outdated silent assumptions?
While trying to read the code and documentation, I stumbled upon this
code in buffer.c:

    if (!multi_volume_option && !use_compress_program_option
        && fstat (archive, &st) == 0)
      seekable_archive = S_ISREG (st.st_mode);
    else
      seekable_archive = false;

The --multi-volume option is not documented as making the archive
non-seekable (see below). Is this just a silent and incorrect
assumption that --multi-volume always implies non-seekable tapes?
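For illustration, this is roughly what the check could look like if seekability were decided per volume from the file itself rather than from the option. It is only a sketch: volume_is_seekable is a hypothetical helper, not a function in tar, and I do not know whether other parts of the multi-volume code rely on the current behavior.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical per-volume check: a volume counts as seekable when it
   is not read through a compression program and the descriptor refers
   to a regular file. Pipes, FIFOs and tape devices are rejected by the
   S_ISREG test, independent of whether --multi-volume was given. */
bool volume_is_seekable(int fd, bool uses_compress_program)
{
    struct stat st;

    if (uses_compress_program)
        return false;
    if (fstat(fd, &st) != 0)
        return false;
    return S_ISREG(st.st_mode);
}
```

With a check like this, a multi-volume archive stored as regular files on disk would be treated as seekable, while volumes on a tape device would not.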
From the man page:
    -n, --seek
        Assume the archive is seekable. Normally tar determines
        automatically whether the archive can be seeked or not.
        This option is intended for use in cases when such
        recognition fails. It takes effect only if the archive is
        open for reading (e.g. with --list or --extract options).
As far as I understand it, --update is mostly a combination of
--compare (which is a seeking read operation) and --append in case
file size and date differ. According to my tests, the --compare part
of --update does not seek between headers (even without --multi-volume
and for an uncompressed .tar file). Can we please get a huge
performance boost in --update by making it jump from header to header
(=seek) in the compare phase? The streamed file contents seem not to
be needed for anything and slow down the process.
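To show what I mean by jumping from header to header, here is a minimal sketch, not taken from the tar sources. It assumes a plain, uncompressed ustar archive whose size fields are octal ASCII (12 bytes at offset 124 of each 512-byte header); it ignores base-256 size encoding and counts extension headers as ordinary members. The function name count_members_by_seeking is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/* Walk an archive from header to header, seeking over member data
   instead of reading it. Returns the number of headers found, or -1
   if a seek fails or the input ends before the end-of-archive marker. */
long count_members_by_seeking(FILE *archive)
{
    static const unsigned char zero[TAR_BLOCK];   /* all-zero block */
    unsigned char hdr[TAR_BLOCK];
    long members = 0;

    while (fread(hdr, 1, TAR_BLOCK, archive) == TAR_BLOCK) {
        /* An all-zero block marks the end of the archive. */
        if (memcmp(hdr, zero, TAR_BLOCK) == 0)
            return members;

        /* Size field: 12 bytes at offset 124, octal ASCII. */
        char field[13];
        memcpy(field, hdr + 124, 12);
        field[12] = '\0';
        unsigned long long size = strtoull(field, NULL, 8);

        /* Member data is padded to a whole number of 512-byte blocks. */
        unsigned long long padded =
            (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;

        /* Skip the data without reading it. fseek takes a long, so
           this sketch is limited to members that fit in a long. */
        if (fseek(archive, (long) padded, SEEK_CUR) != 0)
            return -1;
        members++;
    }
    return -1;
}
```

Only one 512-byte block is read per member, so the cost depends on the number of members rather than on the size of the archive.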
Best regards,
Johannes Nieß