On 9/18/25 3:19 AM, Paul Eggert wrote:
I also see that (at least on my machine) changing the xattrs does not affect the file's modification or change times

Well *that's* weird. I would not expect the system calls to do that. This whole xattr business is a bit of an undocumented zoo, I'm afraid.

Hmm, looks like my bad here: trying this now, the "change" timestamp is being updated by changes to the xattrs (or ACLs too). (Interestingly, it affects the timestamp on both files hardlinked to the same inode, but maybe that's obvious to everyone else.) Code:
touch foo ; sleep 1 ; setfattr -n user.creator -v bar foo ; stat foo

I'm happy to have been wrong in this case, as this simplifies both the apparent issues and the potential solutions. As far as I can tell, a change to any one of the attributes that can be included with the --preserve flag will result in a new "change" timestamp, and so I don't think there's any need to look beyond the most intuitive implementation of the --update=older option: If the timestamps on the target are newer than the timestamps on the source, don't do anything.
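
The same check can be made on a filesystem without user xattr support, since a mode change is also a metadata-only change. The following is a minimal sketch, assuming GNU stat (for -c %Z) and mktemp; the file names and the 2-second sleep are arbitrary choices, not anything taken from cp itself:

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)
touch "$dir/foo"
ln "$dir/foo" "$dir/bar"          # bar is a second name for foo's inode

before=$(stat -c %Z "$dir/bar")   # change time, in seconds since the epoch
sleep 2
chmod 600 "$dir/foo"              # metadata-only change, made through foo
after=$(stat -c %Z "$dir/bar")    # read back through the other link

echo "ctime of bar: before=$before after=$after"
rm -r "$dir"
```

Because the change time is stored in the inode rather than in the directory entry, the second value should be larger even though bar itself was never named in the chmod.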

If I add --no-preserve=mode to the command, this does prevent the extra call to chmod, but adding --no-preserve=xattr still results in a call to chmod, which seems odd based on your explanation.

I've lost context. How do you reproduce the bug? I just now tried "cp -p --no-preserve=mode --no-preserve=xattr abc def" on two files in /tmp on Ubuntu 25.04, and saw no call to chmod, just to fsetxattr with "system.posix_acl_access". The "abc" file was permission 0664 and my umask was 002.

Sorry, yes, it got a little abstract. I'm only seeing extra calls to chmod being applied to directories, so the point here was simply that the call to chmod is not being triggered by --preserve=xattr, and so I don't think we need to worry about xattrs for this problem. The chmod I'm referring to should be reproducible like this:
mkdir foo
mkdir bar
cp --recursive --preserve=mode --update=older --target-directory="./bar" "foo"
strace cp --recursive --preserve=mode --update=older --target-directory="./bar" "foo"

I finally learned how to use gdb correctly, and so I can say that the call to chmod seems to originate here: https://github.com/coreutils/coreutils/blob/master/src/copy.c#L2691 , i.e., near the end of the copy_internal function in copy.c, where several attributes are copied as needed. Interestingly, just a few lines before that, before the corresponding block for preserving the ownership, there is already the comment /* Avoid calling chown if we know it's not necessary.  */ , so it seems reasonable that the behavior of preserving the mode should be the same.
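
To illustrate the "avoid the call if we know it's not necessary" idea applied to the mode, here is a rough shell sketch. It is not how copy.c does it (any real change would be in C, inside copy_internal), and the helper name sync_mode_if_needed is hypothetical. It assumes GNU stat and compares only the permission bits, ignoring ACLs:

```shell
#!/bin/sh
# Hypothetical helper, not part of coreutils: copy the permission bits
# from $1 to $2 only when they differ, so that an already-correct
# destination is not touched and keeps its change time.
sync_mode_if_needed() {
    src_mode=$(stat -c %a "$1") || return 1
    dst_mode=$(stat -c %a "$2") || return 1
    if [ "$src_mode" = "$dst_mode" ]; then
        return 0                  # modes already match: no chmod call
    fi
    chmod "$src_mode" "$2"
}

# Small demonstration in a scratch directory.
dir=$(mktemp -d)
mkdir "$dir/foo" "$dir/bar"
chmod 750 "$dir/foo"
sync_mode_if_needed "$dir/foo" "$dir/bar"
mode_after_sync=$(stat -c %a "$dir/bar")

ctime_before=$(stat -c %Z "$dir/bar")
sleep 2
sync_mode_if_needed "$dir/foo" "$dir/bar"   # second run: nothing to do
ctime_after=$(stat -c %Z "$dir/bar")

echo "mode=$mode_after_sync ctime_before=$ctime_before ctime_after=$ctime_after"
rm -r "$dir"
```

The second call costs one extra read of each mode but leaves the destination's change time alone, which is the behavior one would hope for when --update=older finds nothing to update.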

another good solution here seems like it could be to remove the overlap in these two, so that it would be possible to specify copying the mode/ACLs without the xattrs,

Can one even do that at the syscall level? POSIX ACLs are xattrs, no?

I'm not seeing any xattrs when I set ACLs, but I don't know much about POSIX; in any case I think it's beside the point now.

The first odd thing would seem to be that these files are being relinked every time that cp is run, even when they're already linked.

I would expect this in the normal case. What's the scenario and why don't you expect it in your case?

Again, to make this more concrete let's consider this code:
touch foo
cp -l foo bar
mkdir qux
cp --preserve=links --update=older --target-directory="./qux" *
strace cp --preserve=links --update=older --target-directory="./qux" *

Here, with the --update=older option, it seems unexpected that ./qux/foo and ./qux/bar are being relinked the second time they are copied from ".", when what the manual describes is "--update[=UPDATE]    control which existing files are updated" and "'older' [...] results in files being replaced if they're older than the corresponding source file."

I think two hypothetical scenarios show the kinds of situations that could reasonably be accounted for, and that have reasonably clear ideal behaviors with respect to cp:

1. Two files, A and B, are hardlinked in directory SRC, and copied to
   DEST with the options --preserve=links and --update=older. Here,
   clearly DEST/A and DEST/B should be hardlinked to each other. Then,
   later, SRC/A and SRC/B are unlinked, and become two independent
   files; then, next time cp is run with --preserve=links and
   --update=older, since the files in SRC are newer than those in DEST,
   DEST/A and DEST/B should be overwritten with the new versions from SRC.
2. Two files, A and B, are hardlinked in directory SRC, and copied to
   DEST with the options --preserve=links and --update=older. Here,
   clearly DEST/A and DEST/B should be hardlinked to each other.
   (Everything is the same up to here.) Then, later, DEST/A and DEST/B
   are unlinked, and become independent files; then, next time cp is
   run with --preserve=links and --update=older, what should be done?
   To me it seems clear the "--update=older" means that in the case of
   having new data in DEST, they should not be overwritten, or
   modified. However, the current behavior of cp is rather strange: It
   would link DEST/A and DEST/B without worrying about their contents
   at all. DEST/A and DEST/B could be two files containing,
   independently, arbitrary data, and cp would simply pick one at
   random (or perhaps the second one listed on the command line?) to
   overwrite with the other, while also not ensuring that the contents
   of these two files have anything to do with the data in SRC/A or
   SRC/B. Clearly, this is a rather unlikely situation, but to me the
   most logical behavior would be to not randomly destroy the data in
   one of the files in DEST when the "--update=older" option appears to
   be requesting that cp do exactly the opposite.

Code to show the current behavior of cp for scenario 2:
touch A
cp -l A B
echo 1 > B
cat A
mkdir DEST
cp --preserve=links --update=older --target-directory="./DEST" *
cat DEST/A
rm DEST/B
echo 2 > DEST/A
echo 3 > DEST/B
cp --preserve=links --update=older --target-directory="./DEST" *
cat DEST/A
cat DEST/B

Here, both DEST/A and DEST/B will have "2" in them, at the end.

But it gets stranger: starting with an empty target directory, if I run this command: cp --no-preserve=links --update=older --target-directory="/home/kye/test" ./* then the first time I run it, the files hardlinked in the source directory are not hardlinked in the target (seemingly correct), but when running it the second time (and afterwards) the two files then do become hardlinked, apparently contradicting the --no-preserve=links option.

So, it makes sense that hardlinked files were being hardlinked in the target directory when I had --preserve=all, but maybe it doesn't make sense to relink them over and over when they are already linked. And it seems like --update=older is somehow forcing the files to be hardlinked, but only when they already exist in the target directory.

Yes, it's all fairly mysterious. Perhaps someone with more time can look into it.

So, similar to the previous one, this can be reproduced as follows:
touch foo
cp -l foo bar
mkdir qux
cp --no-preserve=links --update=older --target-directory="./qux" *
ls -i qux
strace cp --no-preserve=links --update=older --target-directory="./qux" *
ls -i qux

From gdb, the call (responsible for the behavior of both this example and the one above) appears to originate from https://github.com/coreutils/coreutils/blob/master/src/copy.c#L1823 , which is in a block of copy_internal specific to the combination of the "--update=older" option with a destination file that is newer than the source. This line even comes with the helpful comment /* Note we currently replace DST_NAME unconditionally, even if it was a newer separate file.  */ , so actually this behavior is somehow intentional, but I don't see any justification given for this exception to the expected behavior of --update=older or --no-preserve=links.

To me, this line in copy.c seems to simply be a mistake: If the files in the target directory are newer than those in the source, and the user has requested --update=older, I don't see any reason to be messing around with links.
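
The rule being argued for can be sketched in shell as follows. This is only an illustration of the intended decision, not cp's actual logic; the helper name needs_update is hypothetical, it compares modification times only, and it assumes a shell whose test supports -nt (bash and dash both do) plus GNU touch for the -d option:

```shell
#!/bin/sh
# Hypothetical helper: decide whether an "update only if older" copy
# should touch the destination at all.
#   $1 = source, $2 = destination
needs_update() {
    [ ! -e "$2" ] && return 0     # no destination yet: copy
    [ "$1" -nt "$2" ] && return 0 # source modified more recently: copy
    return 1                      # destination is as new or newer: leave it
}

dir=$(mktemp -d)
touch -d '2020-01-01' "$dir/src"  # an old source file
touch "$dir/dst"                  # a destination modified just now

if needs_update "$dir/src" "$dir/dst"; then
    decision=copy
else
    decision=skip
fi

if needs_update "$dir/src" "$dir/missing"; then
    decision_missing=copy
else
    decision_missing=skip
fi

echo "existing newer destination: $decision"
echo "missing destination: $decision_missing"
rm -r "$dir"
```

Under this rule a newer destination is skipped entirely, so no link, unlink, or rename would be issued for it regardless of the --preserve=links setting.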

I have to admit I'm a bit puzzled by
unlinkat(3, "./CuYCWUCU", 0)            = 0
which would appear to be successfully removing a filename that no longer exists.

I agree, seems a bit fishy.

It's not fishy. POSIX says that rename ("CuYCWUCU", "dest") must do nothing and return 0 if "CuYCWUCU" and "dest" are both hard links to the same file. This is a bug in POSIX, and Linux unfortunately conforms to POSIX here. The workaround for the bug is to unlink ("CuYCWUCU") after renaming it to "dest"; the unlink succeeds if and only if the bug was triggered.

Well, good to know, thanks for the info!

On 9/18/25 4:08 AM, Paul Eggert wrote:
Yes, it's all fairly mysterious. Perhaps someone with more time can look into it.

Following up on that a bit. I think part of the problem here is that cp is designed to copy things as efficiently as possible. It's not designed to make minimal changes to the destination (which it sounds like is what you want), so it's perhaps not the best tool for the job you're trying to do. You might want to look into alternative tools (rsync, maybe?) that are more aligned to your needs.

Yeah, it's something I'm considering, but for the moment we do seem to have found some bugs, so I'll reevaluate the situation once these details with cp are clarified a bit.

Regarding the design of cp, I agree that in the typical situation it makes more sense to make a syscall to write something that is definitely correct, rather than making one syscall to see if anything needs to be changed and then (almost definitely) making another to write what was already known to be correct. But in the case of the user having requested "--update=older", the second case seems to be exactly what is wanted, and it seems safe to assume that making a lot of "reads" and a few "writes" will be more efficient than making a lot of "writes".

Thanks,

-Kye

--
Kye E. Hunter
PGP: 6859 E2DE D598 49EA 9319  10CD DEF2 BA03 A6BE 3062
--

