On 2026-06-06 20:25, Collin Funk wrote:
I guess it might be be possible to move the open earlier, but it would be a bit of work. Currently, copy_internal stat's (well, fstatat, in this case) the source file to determine how to operate on it based on it's type. To make this safe, in copy_reg we check that the dev & inode don't change inbetween stating and opening the file.
Yes, and that entire idea dates back to 7th Edition Unix, when it was dangerous to try to open a file without checking its type first, because the file could be a tape drive that automatically rewound after closing and if you didn't want have side effects on reads then you had to stat the file before opening it to make sure it wasn't a tape drive. stat+open is racy, though, and that is why copy_internal checks dev+ino afterwards, and even this test isn't guaranteed to be race-free if a file gets deleted and recreated but I suppose it's the best we can do. (Hey! Now that I think about it we could also check the file type - that'd be an improvement over the current check.)
Nowadays on GNU/Linux there's a better way: open the file with O_PATH, use fstat to check its type, then use openat2 (fd, "", ...) with O_RDONLY to get the file descriptor you really want. No races involving file types are possible, though unfortunately this behavior is undocumented (!) and the Gnulib openat2 emulation doesn't support this behavior (and it wouldn't be easy to add, for non-directories anyway). FreeBSD 13+ has a similar feature with openat (fd, "", O_RDONLY | O_EMPTY_PATH). It's too bad that GNU/Linux didn't follow FreeBSD's lead here, as the FreeBSD approach is simpler and better-documented, but the Linux kernel developers felt that they had run out of flags.
I suppose copy_internal should use the O_PATH approach if on a new-enough GNU/Linux or FreeBSD system. Admittedly doing it this way would be nontrivial. And it wouldn't fix the problems on macOS. And openat2 will almost surely change in Linux 7.2 so this stuff is a bit bleeding-edge anyway.
The issue is that we get a bogus dev/ino from path based stat operations when invoking 'install' with standard input as the source, for example.
Yes, but let's go back to the original question: why is that dev+ino check there at all? What would go wrong if we removed it? Or, can we remove the check only in the cases where it's misguided, as is the case here?
