I have to correct myself regarding --compare doing seeks. It looks like it compares checksums of the archived file contents and sometimes also of the files on the file system. Of course that means reading the complete archive. My testing was flawed in that I ran --compare on a single file that was not in the archive. Because tar seeks through all the archive headers without finding a match, I misread the error message.

I want to turn that into a feature request for a --no-checksum option, with which tar uses only file dates and sizes to decide whether files differ and skips the checksums.
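
To illustrate what I mean by a date-and-size-only check, here is a minimal sketch. It is not taken from tar; the struct member_info and the function differs_by_metadata are hypothetical names, and I assume the archived size and modification time have already been read from the member header.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of what an archive header says about a member. */
struct member_info {
    off_t size;    /* archived file size in bytes */
    time_t mtime;  /* archived modification time */
};

/* Return true if the file at `path` differs from the archived member,
   judged by size and modification time only. File contents are never read. */
bool differs_by_metadata(const char *path, const struct member_info *archived)
{
    struct stat st;

    assert(path != NULL && archived != NULL);
    if (stat(path, &st) != 0)
        return true;  /* missing or unreadable: treat as different */
    return st.st_size != archived->size || st.st_mtime != archived->mtime;
}
```

The trade-off is the usual one: a file whose contents changed while size and mtime stayed the same would go unnoticed.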


On 24.10.21 at 17:50, Johannes Nieß wrote:

Hi,

While working on a backup script that --updates the tar file (on a disk) with --multi-volume, I discovered that tar does not seek through the archive and speed is much lower than expected. Are there any technical reasons for that, other than outdated silent assumptions?

While trying to read the code and documentation, I stumbled upon this code in buffer.c:

    if (!multi_volume_option
        && !use_compress_program_option
        && fstat (archive, &st) == 0)
      seekable_archive = S_ISREG (st.st_mode);
    else
      seekable_archive = false;

The documentation does not say that --multi-volume makes the archive non-seekable (see below). Is this just a silent and incorrect assumption that --multi-volume always implies non-seekable tapes?
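
For anyone who wants to try the check in isolation, here is a small standalone sketch that mirrors the condition quoted from buffer.c. The function names archive_is_seekable and fd_accepts_lseek are my own and not part of tar; the second function is an alternative runtime probe I added for comparison, not something tar is known to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the quoted condition: the archive counts as seekable only if
   neither option is set and the descriptor refers to a regular file. */
bool archive_is_seekable(int fd, bool multi_volume, bool use_compress_program)
{
    struct stat st;

    assert(fd >= 0);
    if (!multi_volume && !use_compress_program && fstat(fd, &st) == 0)
        return S_ISREG(st.st_mode);
    return false;
}

/* Alternative probe (an assumption of this sketch): ask the kernel whether
   the descriptor accepts lseek at all. Pipes and sockets fail with ESPIPE. */
bool fd_accepts_lseek(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t) -1;
}
```

With a regular file on disk both functions return true as long as multi_volume is false, which is exactly the case my question is about.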

From the man page:

       -n, --seek
              Assume the archive is seekable.  Normally tar determines
              automatically whether the archive can be seeked or not.  This
              option is intended for use in cases when such recognition
              fails.  It takes effect only if the archive is open for
              reading (e.g. with --list or --extract options).

As far as I understand it, --update is mostly a combination of --compare (which is a seeking read operation) and --append in case file size and date differ. According to my tests, the --compare part of --update does not seek between headers (even without --multi-volume and for an uncompressed .tar file). Can we please get a huge performance boost in --update by making it jump from header to header (=seek) in the compare phase? The streamed file contents do not seem to be needed for anything and slow down the process.
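
To make "jump from header to header" concrete, here is a minimal sketch of what I have in mind. It is not tar's code: count_members_by_seeking is a hypothetical helper, it assumes an uncompressed archive on a seekable file descriptor, it does not verify header checksums, and it ignores GNU extensions such as base-256 sizes and long-name headers (which it would simply count as members).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* tar block size */
#define SIZE_OFFSET 124  /* offset of the octal size field in the header */
#define SIZE_LENGTH 12   /* length of the size field */

/* Return 1 if the block contains only zero bytes (end-of-archive marker). */
static int is_zero_block(const unsigned char *block)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        if (block[i] != 0)
            return 0;
    return 1;
}

/* Count archive members by reading each header and seeking over the data.
   Returns the member count, or -1 on a short read or seek error. */
long count_members_by_seeking(int fd)
{
    unsigned char header[BLOCK_SIZE];
    long members = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, header, BLOCK_SIZE);
        if (n == 0)
            break;                  /* physical end of file */
        if (n != BLOCK_SIZE)
            return -1;
        if (is_zero_block(header))
            break;                  /* logical end of archive */

        /* The size field is an octal string; copy and terminate it. */
        char size_field[SIZE_LENGTH + 1];
        memcpy(size_field, header + SIZE_OFFSET, SIZE_LENGTH);
        size_field[SIZE_LENGTH] = '\0';
        unsigned long long size = strtoull(size_field, NULL, 8);

        /* Member data is padded to a whole number of blocks; skip it. */
        off_t padded = (off_t) ((size + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
        if (lseek(fd, padded, SEEK_CUR) == (off_t) -1)
            return -1;
        members++;
    }
    return members;
}
```

The point is that only one 512-byte block per member is read; the file contents in between are skipped with lseek instead of being streamed.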

Best regards,

Johannes Nieß
