Re: Question about coreutils 'cp' - does it check target file after copying?

Bob Proulx Sat, 03 Oct 2009 08:43:44 -0700

Alex wrote:
> Does the GNU coreutils 'cp' utility guarantee that the target file
> after copying is the same as the source one?


Mostly it is a correct-by-construction process.  GNU 'cp' will report
any error that occurs during the copy.  If no error occurs then the
copy was correct.

> If it doesn't then I'll need to make my own diff-ing or checksums
> verifying, right? Or maybe, all the copying is implemented via 100%-
> reliable low-level calls, so all the checking I'm talking about is
> redundant?

The 'cp' command reads and writes files using the kernel system calls.
The only way to have a file that isn't identical to the source is if
the kernel is buggy and incorrectly reports success when in actuality
it had failed.  Otherwise if the read and write calls both return
success then the file will be successfully copied.  Therefore in the
'cp' command itself there isn't a need to do an additional comparison
check, and indeed, especially on large files, such a check would be a
severe performance penalty.
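
To make the "check every call" idea concrete, here is a minimal sketch
in Python.  It is not the coreutils source (that is C and does a good
deal more); the name copy_file and the buffer size are my own choices
for illustration.  Python turns a failed system call into an OSError,
so any read, write or close error stops the copy and is reported to
the caller.

```python
import os

def copy_file(src, dst, bufsize=64 * 1024):
    """Copy src to dst, raising OSError if any system call fails."""
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                chunk = os.read(src_fd, bufsize)
                if not chunk:              # an empty read means end of file
                    break
                view = memoryview(chunk)
                while view:                # write() may accept fewer bytes than offered
                    written = os.write(dst_fd, view)
                    view = view[written:]
        finally:
            os.close(dst_fd)               # close() can report an error too
    finally:
        os.close(src_fd)
```

If copy_file() returns without raising, every call along the way
reported success, which is the same basis on which 'cp' reports
success.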

Note that "sparse" files are somewhat of a special case and can be
expanded or preserved depending upon the options used for the copy.
But I don't think that is what you are talking about.
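
For the curious, a rough way to tell whether a file is stored sparsely
is to compare its apparent size with the space actually allocated for
it.  The sketch below is only a heuristic under stated assumptions:
the function names are mine, st_blocks is assumed to count 512-byte
units (as on Linux), and filesystems that compress or inline data can
also report less allocated space than the apparent size.

```python
import os

def allocated_less_than_size(size_bytes, blocks_512):
    """True if fewer bytes are allocated than the file's apparent size."""
    return blocks_512 * 512 < size_bytes

def looks_sparse(path):
    """Heuristic sparse-file check based on stat() results."""
    st = os.stat(path)
    # st_blocks is assumed to be in 512-byte units; it is not available on Windows.
    return allocated_less_than_size(st.st_size, st.st_blocks)
```

Running looks_sparse() on both the source and the copy shows whether
the holes were preserved or expanded by the copy.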

In summary I don't think you need to do an additional integrity check
if the 'cp' reports success.
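
If you decide you want an independent check anyway, it is not much
code.  A minimal sketch, assuming SHA-256 digests are an acceptable
way to compare the two files; the function names are hypothetical.
Note that it reads both files in full, which is exactly the cost
described above for large files.

```python
import hashlib

def sha256_of(path, bufsize=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(a, b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(a) == sha256_of(b)
```

Be aware that a read-back done right after the copy may be served
from the kernel's cache rather than from the disk.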

There are times when it is useful to be able to deduce whether a
/previous/ run of 'cp' was successful.  For example if the 'cp'
command was prevented from
finishing because power was lost to the system.  Obviously no success
or failure was reported and the calling process also didn't run and
the files might not be identical.  There may be a partially written
file on disk in that case.  Even if you added a post copy check you
could be in this condition since the post copy check couldn't run with
the power off either.  The 'rsync' tool is very useful in such
situations for two reasons.  One is that it re-syncs files only if
they differ, which makes recovery efficient, and it does nothing when
nothing needs to be done.  The other is that rsync copies each file
to a temporary location and renames it into place only once the full
file is available, so that a partial file never appears under the
final name.
However even that venerable technique fails on some newer buggy
filesystem implementations that try to optimize too much and reorder
actions.  (Trying to avoid starting a long discussion about it here
but people who recognize what I am referring to will know the
arguments on both sides.)  In any case using 'rsync' is useful if you
need to be able to run the same command repeatedly and want to avoid
unnecessary copies.
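
The copy-to-a-temporary-name-then-rename technique is easy to apply in
your own scripts as well.  A minimal sketch, not a description of
rsync's internals: the name copy_atomically and the ".tmp-" prefix are
mine, and the fsync before the rename is my own addition, aimed at the
reordering problem mentioned above.

```python
import os
import shutil
import tempfile

def copy_atomically(src, dst):
    """Copy src to a temporary file next to dst, then rename it into place."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temporary file must be on the same filesystem as dst for the rename.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())     # push the data to disk before renaming
        os.replace(tmp, dst)           # atomic on POSIX within one filesystem
    except BaseException:
        try:
            os.unlink(tmp)             # do not leave a partial temporary file behind
        except FileNotFoundError:
            pass
        raise
```

A reader of dst sees either the old file or the complete new one.
Note that mkstemp creates the file readable and writable by the owner
only, so adjust the permissions afterwards if that matters to you.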

Bob
