Re: Confused as to why rsync thinks time, owner and group of many files differ
Hi Kevin,

On Thu, Feb 03, 2022 at 05:38:41PM -0500, Kevin Korb via rsync wrote:
> Are you using the same source and target each time?

Yes.

> I ask because the only discrepancy I see is the link count which
> shows that there are 11 more instances of that inode on the source
> than the target. Maybe instances in other snapshots are being
> updated/re-linked?

I haven't yet let rsync run all the way through the whole source
filesystem so it probably hasn't yet sent over some of the hardlinks
that it knows about for this file.

There's only ever one rsync going at once, because this is a one-off
thing I am doing by hand.

> The only other thing to mention is that when you abort rsync (with -P or
> --inplace) incomplete files are left. Rsync doesn't fix the owner+group
> until it is done with a directory and it doesn't fix the timestamp until it
> is done with a file. This would be why you shouldn't mix those options with
> --update since the truncated file will be newer than the source file.

Okay, but:

- it's thousands of files that are reported as having differing t/o/g,
  not just whichever one was being worked on when I hit ctrl-c. I'm
  only hitting ctrl-c because rsync sees thousands of changes that I
  can't explain.

- they don't have differing t/o/g when you look at them.

- their contents are identical anyway as confirmed by sha256sum and
  also as confirmed by the fact that rsync isn't sending the file
  contents over.

- if I use "-I --checksum" to skip mtime checking and force checksum,
  rsync doesn't try to sync these files (it does still for the ones it
  thinks o/g are different). This partial workaround isn't very useful
  anyway as --checksum takes forever. Point is, it definitely thinks
  there are changes of mtime, uid and/or gid.

So I am still really confused.

If I remove the --inplace I think the spurious t/o/g detection will
still happen, and also that rsync will create a temp file to rename
over each file, so blowing up the hardlinks that it has already sent
across.

This would be mere curiosity if it did this once and then was happy
that it had set the mtime/uid/gid, but it doesn't, it does it every
time, which is making things really slow.

I am trying to build a newer rsync for use on the sender to see if
that makes any difference but am also running into bizarre problems
there, which is perhaps for another thread. Illegal instruction
somewhere inside libcrypto. The same libcrypto that the packaged rsync
is linked against. Goes away if I use --cc=none, but happens for md4
or md5. Really not my night!

I am tempted to blow away the btrfs filesystem and just do xfs to xfs,
to rule out weird issues there. It would be a shame though as I was
hoping to use btrfs's compression here.

Cheers,
Andy

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
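
For anyone wanting to repeat the "look at them" check from the list above, one way is to print the raw values rsync compares and diff them between the two hosts. This is only a sketch: it assumes GNU coreutils stat on both ends, and the helper name show_tog is made up for illustration.

```shell
# Hypothetical helper: print mtime (epoch seconds, then the full
# nanosecond-resolution form), numeric uid, numeric gid and hard-link
# count for each path given. Assumes GNU coreutils stat.
show_tog() {
    stat -c '%n mtime=%Y (%y) uid=%u gid=%g links=%h' -- "$@"
}

# Example use (the path is a placeholder, not one from this transfer):
#   show_tog /data/backup/rsnapshot/daily.0/cacti/some/file
# Run the same stat command on the target and compare the two lines.
```

Printing numeric ids matches the --numeric-ids transfer described in the thread, and the nanosecond form of the mtime makes any sub-second difference between the two filesystems visible.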
Confused as to why rsync thinks time, owner and group of many files differ
Hi,

I am at the moment using rsync to move quite a big set of backups from
one machine to another. The source filesystem is xfs; the target
filesystem is btrfs.

For various reasons I have been stopping the rsync part way through
and re-starting. I have noticed that a large number of files are
transferred over and over and I can't work out why.

Example:

    sudo rsync -iPva \
        --inplace \
        --numeric-ids \
        --delete \
        /data/backup/rsnapshot/daily.0/cacti/ \
        root@koff:/data/backup/rsnapshot/daily.0/cacti/

...

    http://rsync.samba.org/
    Capabilities:
        64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
        socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
        append, ACLs, xattrs, iconv, symtimes, prealloc

Destination:

    $ rsync --version
    rsync version 3.2.3 protocol version 31
    Copyright (C) 1996-2020 by Andrew Tridgell, Wayne Davison, and others.
    Web site: https://rsync.samba.org/
    Capabilities:
        64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
        socketpairs, hardlinks, hardlink-specials, symlinks, IPv6, atimes,
        batchfiles, inplace, append, ACLs, xattrs, optional protect-args,
        iconv, symtimes, prealloc, stop-at, no crtimes
    Optimizations:
        SIMD, asm, openssl-crypto
    Checksum list:
        xxh128 xxh3 xxh64 (xxhash) md5 md4 none
    Compress list:
        zstd lz4 zlibx zlib none

What am I missing?

Thanks,
Andy
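
The -i in the command above makes rsync print an itemized change string for every file it touches, and that string is where a time, owner or group difference is reported. Below is a small sketch for picking those three flags out of such a string. It assumes the 11-character YXcstpoguax layout described in the rsync man page; the function name itemize_tog and the sample strings are illustrative only, not output from this transfer.

```shell
# Hypothetical helper: report which of mtime/owner/group rsync flagged
# in an -i (--itemize-changes) string such as ".f..t.og...".
# Assumed layout: YXcstpoguax, so position 5 = t, 7 = o, 8 = g.
itemize_tog() {
    flags=$1
    out=""
    [ "$(printf '%s' "$flags" | cut -c5)" = "t" ] && out="$out mtime"
    [ "$(printf '%s' "$flags" | cut -c7)" = "o" ] && out="$out owner"
    [ "$(printf '%s' "$flags" | cut -c8)" = "g" ] && out="$out group"
    # Strip the leading space and print the result (empty if none matched).
    printf '%s\n' "${out# }"
}

# Example:
#   itemize_tog '.f..t.og...'
```

A leading "." in the string means no content is being sent, so a line flagged only for t, o or g is an attribute-only update.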
Re: How to manage root<-->root rsync keeping permissions?
Hello,

On Tue, Aug 03, 2021 at 03:05:27PM +0100, Chris Green via rsync wrote:
> Remember, as I said, this is all Debianland with no real root login,
> while I could add one I'd prefer not to.

Your system already has a root user and if you added an SSH public key
to its authorized_keys file (and allowed root login by public key only
in sshd_config) then SSH login would work. The only form of login you
would have added is "by this specific ssh key". The account could
still remain password locked as it is now.

It is difficult for me to see why such a setup would be inherently
more secure than one where a regular user account can do absolutely
anything (i.e. run rsync as root without password prompt), especially
given that a regular user account is likely to run a lot of other
software some of which may have bugs. But we all choose our security
stance.

> I've set it up so chris can run rsync with root permissions.
> However I'm not quite sure how to get it to work as one needs to say
> "sudo rsync" to get the root privilege. How do you do that?

The first link I sent you had an example of that:

    --rsync-path="sudo rsync"

Cheers,
Andy
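
Put together with the command quoted earlier in this thread, the option might be used roughly as follows. This is a sketch, not a tested recipe: the function name push_as_root is made up, and it assumes "chris" can run rsync through sudo on the remote end without being asked for a password.

```shell
# Sketch: push a tree to a remote host, with the remote rsync run via sudo.
# Assumes the remote account may run "sudo rsync" with no password prompt,
# since there is no terminal for sudo to prompt on.
push_as_root() {
    src=$1     # e.g. /etc/
    dest=$2    # e.g. chris@remote:backups/etc/
    rsync -a --rsync-path="sudo rsync" "$src" "$dest"
}

# Example:
#   push_as_root /etc/ chris@remote:backups/etc/
```

The local rsync still runs as whoever invokes it, so it needs to be started as root (or under sudo) if it has to read files that only root can read.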
Re: How to manage root<-->root rsync keeping permissions?
Hi Chris,

On Tue, Aug 03, 2021 at 11:48:31AM +0100, Chris Green via rsync wrote:
> If I used the --super option (in a command like the one above) and
> chris can run rsync as root on the remote end (via options in the
> sudoers file) will this do what I want? I guess I can go away and try
> it! :-)

You don't need --super if the remote side actually is running as root
(either because you logged in as "root" or you logged in as "chris"
but told it to execute "sudo rsync").

If you're going to use sudo then you'll want to set it NOPASSWD so it
doesn't ask for a sudo password. Possibly restricting that only to
uses of rsync or a specific script, otherwise it is giving "chris"
blanket sudo access without a password.

Cheers,
Andy
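
A NOPASSWD rule restricted to rsync could look something like the line this helper prints. It is a sketch under assumptions: the helper name, the /usr/bin/rsync path and the drop-in file name in the comments are all illustrative and need adjusting to the actual system.

```shell
# Hypothetical helper: print a sudoers rule letting one account run
# rsync (and nothing else) as root without a password.
# Assumption: rsync is installed at /usr/bin/rsync.
rsync_sudoers_rule() {
    # $1 = account name, e.g. chris
    printf '%s ALL=(root) NOPASSWD: /usr/bin/rsync\n' "$1"
}

# Example, run as root (the file name is an assumption):
#   rsync_sudoers_rule chris > /etc/sudoers.d/rsync-chris
#   visudo -cf /etc/sudoers.d/rsync-chris    # syntax check before relying on it
```

Note that root rsync can still read or overwrite any file, so a rule like this narrows what "chris" can run without a password but is not a strong security boundary.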
Re: How to manage root<-->root rsync keeping permissions?
Hi Chris,

On Tue, Aug 03, 2021 at 09:48:37AM +0100, Chris Green via rsync wrote:
> But how do you handle the other end to restore the root ownership etc.?
> The script has to do something like:-
>
> rsync -a /etc/ chris@remote:backups/etc/
>
> So at the remote end it only has chris' privileges.

A couple of options:

https://strugglers.net/~andy/blog/2021/04/10/rsync-and-sudo-without-x-forwarding/

Since you want to automate it I'd go with letting root log in by ssh
key only, and force the key to work only with a specific script.

Here is an example forced command that only allows rsync:

https://www.guyrutenberg.com/2014/01/14/restricting-ssh-access-to-rsync/

This is still vulnerable to doing anything that rsync can do. You can
secure it further by making a script that only does the specific
things you need rsync to do, e.g. the exact parameters and paths, and
force that script instead.

Cheers,
Andy
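
To give a rough idea of what such a forced command involves, here is a minimal sketch of the check a wrapper script might perform. It is not the script from either link above; the function name, the wrapper path and the key line in the comments are assumptions for illustration. It relies on sshd setting SSH_ORIGINAL_COMMAND to whatever the client asked to run when the key has a command="..." option.

```shell
# Hypothetical check for a forced-command wrapper: accept only rsync's
# server mode and refuse anything containing shell metacharacters.
allowed_rsync_command() {
    case "$1" in
        # Refuse command separators, substitutions and redirections.
        *";"*|*"&"*|*"|"*|*'$'*|*'`'*|*"<"*|*">"*|*"("*|*")"*) return 1 ;;
        # A remote rsync started over ssh is invoked as "rsync --server ...".
        "rsync --server "*) return 0 ;;
        *) return 1 ;;
    esac
}

# In the wrapper itself (assumed to live at /usr/local/bin/rsync-only):
#   allowed_rsync_command "$SSH_ORIGINAL_COMMAND" || exit 1
#   exec $SSH_ORIGINAL_COMMAND
#
# Matching line in root's authorized_keys (key material elided):
#   command="/usr/local/bin/rsync-only",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA...
```

As noted above, a filter like this still allows anything rsync itself can do; pinning the exact options and paths in the script is the stricter variant.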
Re: feature request: exclude from path
Hi Matt,

On Sat, Aug 01, 2020 at 10:10:49PM -0400, Matt Stevens via rsync wrote:
> I lack development skills. Would there be a way for rsync to be passed an
> option to exclude a specific path during a sync operation? All of my attempts
> to use exclude have failed, as it does not respect paths, only filetypes.

The existing --exclude and filter file options work for me for this.
Maybe show us what you're doing (command line), what you expect to
happen and what actually happens?

You absolutely can exclude paths. The exclude and filter options are
very expressive. I've been doing it for years.

Cheers,
Andy
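
As a concrete illustration of excluding by path rather than by file type, something along these lines should work. It is a sketch: the function name and the example paths are invented, not taken from Matt's setup.

```shell
# Sketch: copy SRC to DEST while skipping one directory by its path.
# The leading "/" anchors the pattern at the top of the transfer (SRC
# itself, not the filesystem root); the trailing "/" makes it match
# directories only.
sync_excluding() {
    src=$1     # e.g. /home/matt
    dest=$2    # e.g. /mnt/backup
    skip=$3    # path relative to src, e.g. var/cache
    rsync -a --exclude="/${skip%/}/" "${src%/}/" "${dest%/}/"
}

# Example:
#   sync_excluding /home/matt /mnt/backup var/cache
```

A common cause of "exclude ignores my path" is writing the pattern as an absolute filesystem path; patterns are matched relative to the directory being transferred.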
High memory usage - any way around it other than splitting jobs?
Hi,

I have a virtual machine with 2G of memory. On this VM there is a
directory tree with 33.3 million files in it. When attempting to rsync
(rsync -PSHav --delete /source /dest) this tree from one directory to
another on the same host, rsync uses all the memory and is killed by
oom-killer.

This host is Debian oldstable so has

    $ rsync --version
    rsync version 3.1.2 protocol version 31

The normal operation of this VM does not require more than 2G of
memory, but I doubled it to 4G anyway. Unfortunately rsync still uses
all the memory and is killed.

Most advice I can find on decreasing rsync memory usage advises to
split the job up into batches. By issuing one rsync for each directory
within /source I was able to make this work.

The interesting thing is though, the split of file numbers between
sub-directories is very uneven with the majority of them (31.5 million
of the 33.3 million) being in just one of the sub-directory trees. I
am kind of surprised that rsync has such a problem going just that
little bit further with the last 2 million.

Is there any scope for improvement with the incremental recursion
code?

If I upgraded the version of rsync could I expect this to work any
better?

I could also give the host a massive swap file. It currently has just
1G of swap, which all gets used in the failure case. I could add more
but I fear that the job will go so slow it will not complete in a
reasonable time.

I don't know if the -H option is causing extra memory usage here;
unfortunately it is necessary as there are hardlinks in there.

Some years old advice says to disable incremental recursion with
--no-i-r. As incremental recursion was added to reduce memory usage
this seems counter-intuitive to me, but this advice is all over the
Internet…

These are all things I will investigate before settling for the "split
into multiple jobs" approach; just wondered if anyone has any
shortcuts for me.

Thanks,
Andy
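
For reference, the "one rsync for each directory within /source" workaround mentioned above can be sketched as a loop like the one below. The function name is made up, and the exact commands used in the original run are not shown in the post, so treat this as an approximation using the same options.

```shell
# Sketch of the per-directory workaround: one rsync per top-level
# subdirectory of SRC, using the options from the original command.
# Caveats: hard links spanning two top-level directories are not
# preserved, files directly inside SRC and hidden directories are
# skipped, and directories removed from SRC are not deleted from DEST.
sync_per_subdir() {
    src=${1%/}
    dest=${2%/}
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue      # glob did not match anything
        name=$(basename "$dir")
        rsync -PSHav --delete "$dir" "$dest/$name/" || return 1
    done
}

# Example:
#   sync_per_subdir /source /dest
```

Each run only has to hold the file list (and hard-link table) for one subtree, which is why the split helps with memory even though the total amount of work is the same.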