RE: Time rsYnc Machine (tym)
M. Carrasco wrote (Friday, August 10, 2012 12:23 AM):
> Reading the rsync man page, it seems that the -H option with
> --link-dest is tricky. The price to pay for not using -H is that
> hard-linked files are treated as separate files: I could not find any
> mechanism in rsync to improve the time machine effect; I would
> appreciate hints on how to improve it. I highlight this in the tym man
> page and inside the program.

I'm still using the approach from before -H and --link-dest were
available:

    cp --archive --link "$PREVIOUS_SNAPSHOT" "$NEW_SNAPSHOT"
    rsync \
        --archive --verbose --delete \
        --delete-excluded --one-file-system \
        $SOME_EXCLUDE_INCLUDE_RULES \
        "$DIR_TO_BACKUP/" "$NEW_SNAPSHOT"

With a bit of shuffling around of the snapshot directories, I keep
24 hourly, 7 daily, 4 weekly and 12 monthly snapshots (using 4 cron
jobs).

Furthermore, I've mounted the backup partition as /root/backup, with a
read-only bind-mount on /media/backup so that the user can always have
a safe look into all the snapshots. ;-)

Have a nice day,
Berny

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
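For readers wondering why the cp --link step followed by rsync keeps older snapshots intact, here is a minimal Python sketch of the idea. It is not Berny's script: the function names (link_tree, replace_file, same_inode, demo) are made up for illustration, and it assumes the updating tool writes a changed file to a temporary name and renames it into place, which is what rsync does unless --inplace is given.

```python
import os
import tempfile


def link_tree(previous: str, new: str) -> None:
    """Mirror `previous` into `new`, hard-linking each regular file.

    Sketch only: handles directories and regular files, not symlinks,
    ownership, or timestamps.
    """
    for dirpath, _dirnames, filenames in os.walk(previous):
        rel = os.path.relpath(dirpath, previous)
        target_dir = os.path.normpath(os.path.join(new, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def replace_file(path: str, data: bytes) -> None:
    """Write a new file and rename it over `path`; the old inode is untouched."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as handle:
        handle.write(data)
    os.replace(tmp, path)


def same_inode(a: str, b: str) -> bool:
    """True when both paths are names for the same file on disk."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "snapshot.old")
        new = os.path.join(root, "snapshot.new")
        os.makedirs(os.path.join(old, "etc"))
        for rel, data in (("etc/hosts", b"v1"), ("notes.txt", b"same")):
            with open(os.path.join(old, rel), "wb") as handle:
                handle.write(data)

        link_tree(old, new)  # plays the role of cp --archive --link
        replace_file(os.path.join(new, "etc/hosts"), b"v2")  # a changed file

        with open(os.path.join(old, "etc/hosts"), "rb") as handle:
            old_hosts = handle.read()
        return {
            "unchanged_shared": same_inode(
                os.path.join(old, "notes.txt"), os.path.join(new, "notes.txt")
            ),
            "changed_shared": same_inode(
                os.path.join(old, "etc/hosts"), os.path.join(new, "etc/hosts")
            ),
            "old_hosts": old_hosts,
        }
```

Unchanged files cost no extra space because both snapshots name the same inode, while a changed file gets a fresh inode in the new snapshot only. If the update were done in place instead (as with rsync --inplace), the old snapshot would see the new contents too.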
Re: Time rsYnc Machine (tym)
On Thu 09 Aug 2012, Linda Walsh wrote:
> Anyway, thanks for the history update. I have a feeling rsync is
> afraid to use memory -- and really, it should try to use a lot of
> memory to optimize transfers,

I have had rsync fail after using up 8GB memory + 4GB swap, so I'm very
happy it does its best to minimize memory usage.

> Turning off partial sends (--whole-file) and using 8-bit-io seems to
> really help speed things up on a same-system copy...
> In doing a full sync with a backup,

--whole-file is the default when source and destination are both on the
same system -- at least, if you don't specify hostnames...

> of a 6G HD, (I used --drop-cache, --inplace and --del as well)
> Doing an archive diff with --acls --xattrs + --hardlinks
> rsync averaged 125MB/s for the actual IO... (about 30% of the disk)...

OK, and now on a 6*TB* HD :-)
This is on one of my backup servers:

    /dev/sdc   39T   33T  5.9T  85% /backup

Paul
Re: Time rsYnc Machine (tym)
Paul Slootman wrote:
> On Thu 09 Aug 2012, Linda Walsh wrote:
>> Anyway, thanks for the history update. I have a feeling rsync is
>> afraid to use memory -- and really, it should try to use a lot of
>> memory to optimize transfers,
>
> I have had rsync fail after using up 8GB memory + 4GB swap, so I'm
> very happy it does its best to minimize memory usage.

Well, it should size itself for the resources available... in any case,
up or down...

> OK, and now on a 6*TB* HD :-)
> This is on one of my backup servers:
>
>     /dev/sdc   39T   33T  5.9T  85% /backup

You fit a 39T partition on a 6TB HD? Something slipped there, as on
mine -- my media partition was 6T, not 6G. I'm always losing units...

> Paul
Re: Time rsYnc Machine (tym)
I may be mistaken, but I heard at one time that rsync was noticeably
slower when asked to preserve hard links. I'm guessing this is a matter
of CPU requirements rather than I/O requirements, but that's just a
guess.

You can go a long way toward detecting hardlinks by just watching for
st_nlink > 1 from fstat().

Another way, one that only bothers with hardlinks that are part of a
transfer, uses bloom filters. You stash an inode#,device# pair in the
filter.

A bloom filter is a probabilistic set data structure. You can't iterate
over its elements (it doesn't store them all), but you can add elements
and test candidate elements for membership. The chief advantage is that
you can have storage requirements as low as one bit per element stored.

When you do a set membership test, a bloom filter can give one of two
answers: "This element is definitely not in the set" or "This element
is almost definitely in the set".

Detecting hardlinks is a good use of a bloom filter, because all you
really want to do is cut down on the number of things you need to store
in an array/list. It's OK to have a few things in the list that aren't
really hardlinks - the goal is chiefly to dramatically cut down on the
storage requirements, and avoid a large sort() or O(file_count) hash
table. That's where the "filter" part of the name comes from - it's
filtering the count of the objects of interest down to a more
manageable size.

On Tue, Aug 7, 2012 at 9:07 PM, Linda Walsh rs...@tlinx.org wrote:
> Dan Stromberg wrote:
>> FWIW, it might be nice to add a hardlink detecting bloom filter to
>> rsync at some point. This makes the process of detecting hardlinks
>> less expensive. Another way to narrow down the field is to just look
>> at st_nlink.
>
> What's a bloom filter? And how / why would it make things less
> expensive? I don't understand why it is expensive now.
>
> You have to visit all files -- likely a previsit to get a size
> estimate -- reading all the inodes at that point. Then have a hash
> 'ino2names' for each inode to point to an array of names of files
> found in the tree with the same inode:
>
>     %ino2names
>     $ino2names=[array of paths relative to root of tree being examined]
>
> Since the size of the transfer is known after the initial scan, all
> the inode-to-path mappings would be knowable as well at that point.
> No extra expense involved.
>
> Of course, given the error I reported, it seems rsync has gotten
> broken recently with hard links -- they aren't that difficult.
>
> It is presumed that links 'out of tree' are ignored @ source and
> target -- meaning target files end up with the same internal linkages
> as on source, and any external links would be broken.
>
> I still have no clue what a bloom filter is?? ;-) Cluesticks anyone?
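As a cluestick of sorts, here is a small Python sketch of the bloom filter idea applied to (device, inode) pairs. It is only an illustration of the technique Dan describes, not anything from rsync's source; the class and function names, the bit-array size, the number of hash functions, and the use of SHA-256 to derive bit positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, dev: int, ino: int):
        # Derive num_hashes bit positions from the (device, inode) pair.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{dev}:{ino}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, dev: int, ino: int) -> None:
        for pos in self._positions(dev, ino):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, dev: int, ino: int) -> bool:
        # False means "definitely not added"; True means "almost certainly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(dev, ino)
        )


def candidate_repeats(entries):
    """Return paths whose (dev, ino) was probably seen earlier in the stream.

    `entries` is an iterable of (path, dev, ino) tuples. Only the returned
    candidates would need to go into an exact (and much smaller) table.
    """
    seen = BloomFilter()
    candidates = []
    for path, dev, ino in entries:
        if seen.might_contain(dev, ino):
            candidates.append(path)
        else:
            seen.add(dev, ino)
    return candidates
```

A path that really is a second link to an inode already seen is always reported, because a bloom filter never forgets an element it was given. Occasionally an unrelated path is reported as well, which is the "few things in the list that aren't really hardlinks" mentioned above; an exact check on the short candidate list sorts those out.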
Re: Time rsYnc Machine (tym)
Dan Stromberg wrote:
> I may be mistaken, but I heard at one time that rsync was noticeably
> slower when asked to preserve hard links. I'm guessing this is a
> matter of CPU requirements rather than I/O requirements, but that's
> just a guess.

---
I didn't realize at the time I wrote it that rsync had switched to an
incremental_recursive algorithm, where it doesn't know the full dataset
on each machine before starting synchronization.

That makes it impossible to *easily* manage hardlinks -- and, in fact,
as indicated elsewhere, causes hardlink copying to fail when making a
differential rsync from A to C not including anything on B (where B is
the argument for the --compare-dest option, and you copy from A to C).
(Note, A is an older version of B (a snapshot), so I'm copying the
differences between a live snapshot taken by lvm (at some point in the
past) and 'now' to a static volume 'B'.)

FWIW, I save that off into its own resized volume mounted under a
date-stamped dir like /A/snapdir/@GMT-2012.08.05-00.21.43. That can
give me the "previous versions of files" option on my samba shares on
windows...

I didn't understand why a bloom filter would be needed if you knew all
the inodes, but given their approach with the incremental recursion I
can see why they are reaching for arcane methods of handling hard
links.

Fortunately my problem with it crashing goes away and hardlinks work
fine if incremental recursion is turned off.

I'd be surprised if there was any hard-link overhead -- as in doing a
dup-file detector, a version that pre-read the filenames to sort by
size and inode ran significantly faster than one of the standard
chksum/md5sum based detectors, which tried to build a list as it
went...

Anyway, thanks for the history update. I have a feeling rsync is afraid
to use memory -- and really, it should try to use a lot of memory to
optimize transfers.

Turning off partial sends (--whole-file) and using 8-bit-io seems to
really help speed things up on a same-system copy...

In doing a full sync with a backup of a 6G HD (I used --drop-cache,
--inplace and --del as well), doing an archive diff with --acls
--xattrs + --hardlinks, rsync averaged 125MB/s for the actual IO...
(about 30% of the disk)...
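The "know all the inodes first" approach from the earlier message (the ino2names hash) can be sketched in a few lines of Python. This is a hedged illustration under the assumption of a full scan before any transfer starts, which is exactly what incremental recursion avoids; hardlink_groups and demo are hypothetical names, not rsync functions.

```python
import os
import stat
import tempfile
from collections import defaultdict


def hardlink_groups(root: str) -> dict:
    """Map (st_dev, st_ino) to the paths under `root` that share that inode.

    Paths are relative to `root`. Links that point outside the tree are
    invisible here, so an inode with only one name inside the tree is
    dropped from the result.
    """
    names = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode):
                names[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    return {key: sorted(paths) for key, paths in names.items() if len(paths) > 1}


def demo() -> list:
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for name in ("a", "c"):
            with open(os.path.join(root, name), "w") as handle:
                handle.write(name)
        # "sub/b" is a second name for the same inode as "a".
        os.link(os.path.join(root, "a"), os.path.join(root, "sub", "b"))
        return list(hardlink_groups(root).values())
```

The table holds one entry per regular file during the scan, which is the memory cost that the st_nlink check and the bloom filter suggestion both try to reduce.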
Re: Time rsYnc Machine (tym)
Push is when you run your backup program (rsync and whatever script) on the machine being backed up and you push/upload your data to the backup system. Pull is when you run your backup program on the backup system and pull/download the data from the machine being backed up. You may also find the following link of interest : http://www.lbackup.org/network_backup_strategies
Re: Time rsYnc Machine (tym)
Could you explain push vs. pull? I haven't seen that before.

TIA
Joe

> Note that this stuff is a lot easier if you pull your backups rather
> than pushing them. That way your making of directories and symlinks
> and deleting of old backups are all done locally.
Re: Time rsYnc Machine (tym)
Push is when you run your backup program (rsync and whatever script) on
the machine being backed up and you push/upload your data to the backup
system. Pull is when you run your backup program on the backup system
and pull/download the data from the machine being backed up.

The scripting is much easier in pull mode because the script is running
where the backups are and can manipulate them easily. In push mode you
have to manage everything through either ssh or rsync.

On 07/28/12 13:14, jose...@main.nc.us wrote:
> Could you explain push vs. pull? I haven't seen that before.
>
> TIA
> Joe
>
>> Note that this stuff is a lot easier if you pull your backups rather
>> than pushing them. That way your making of directories and symlinks
>> and deleting of old backups are all done locally.

--
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           ke...@futurequest.net  (work)
Orlando, Florida            k...@sanitarium.net (personal)
Web page:                   http://www.sanitarium.net/
PGP public key available on web site.
Re: Time rsYnc Machine (tym)
On Sat, Jul 28, 2012 at 3:48 AM, M. Carrasco c...@dragoman.org wrote:
> 3. Cron
> It can run properly from cron as it is daemonized.

What's this about? I've never had problems running run-of-the-mill
scripts from cron, once the environment is adequately replicated.

> --hard-links is not used, and there are reasons for and against. From
> man rsync: ... finding multiply-linked files is expensive. Without
> this option, hard-linked files in the transfer are treated as though
> they were separate files.

FWIW, it might be nice to add a hardlink detecting bloom filter to
rsync at some point. This makes the process of detecting hardlinks less
expensive. Another way to narrow down the field is to just look at
st_nlink.
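The st_nlink suggestion amounts to a one-line test per file. A minimal Python sketch follows, with is_multiply_linked and demo as hypothetical names; it only shows the check itself and makes no claim about how rsync implements --hard-links.

```python
import os
import stat
import tempfile


def is_multiply_linked(path: str) -> bool:
    """True for a regular file that has more than one name (st_nlink > 1).

    Directories are excluded because their link count is usually above 1
    for unrelated reasons.
    """
    st = os.lstat(path)
    return stat.S_ISREG(st.st_mode) and st.st_nlink > 1


def demo() -> list:
    with tempfile.TemporaryDirectory() as root:
        for name in ("single", "first"):
            with open(os.path.join(root, name), "w") as handle:
                handle.write(name)
        os.link(os.path.join(root, "first"), os.path.join(root, "second"))
        return [
            name
            for name in ("single", "first", "second")
            if is_multiply_linked(os.path.join(root, name))
        ]
```

Only files that pass this test need any hard-link bookkeeping at all; on a typical tree that is a small fraction of the files, although a hard-link snapshot tree like the ones discussed in this thread is the obvious exception.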
Re: Time rsYnc Machine (tym)
He redirects stdout and stderr to files and doesn't require user
interaction. Living on a notebook, almost all of my scripts don't do
that, so they won't work from cron or any background situation unless I
modify them with that in mind.

Joe

> On Sat, Jul 28, 2012 at 3:48 AM, M. Carrasco c...@dragoman.org wrote:
>> 3. Cron
>> It can run properly from cron as it is daemonized.
>
> What's this about? I've never had problems running run-of-the-mill
> scripts from cron, once the environment is adequately replicated.
>
>> --hard-links is not used, and there are reasons for and against.
>> From man rsync: ... finding multiply-linked files is expensive.
>> Without this option, hard-linked files in the transfer are treated
>> as though they were separate files.
>
> FWIW, it might be nice to add a hardlink detecting bloom filter to
> rsync at some point. This makes the process of detecting hardlinks
> less expensive. Another way to narrow down the field is to just look
> at st_nlink.
Re: Time rsYnc Machine (tym)
Ah. Not being interactive is important for running in cron; I believe
stdin will probably return EOF immediately. But redirecting stdout and
stderr is unnecessary - the output just goes to a cron e-mail with most
crons. Sometimes it's better to redirect to a file, but that's more of
a user preference thing.

On Sat, Jul 28, 2012 at 12:26 PM, jose...@main.nc.us wrote:
> He redirects stdout and stderr to files and doesn't require user
> interaction. Living on a notebook, almost all of my scripts don't do
> that, so they won't work from cron or any background situation unless
> I modify them with that in mind.
>
> Joe
>
>> On Sat, Jul 28, 2012 at 3:48 AM, M. Carrasco c...@dragoman.org wrote:
>>> 3. Cron
>>> It can run properly from cron as it is daemonized.
>>
>> What's this about? I've never had problems running run-of-the-mill
>> scripts from cron, once the environment is adequately replicated.
Re: Time rsYnc Machine (tym)
> On Sat, Jul 28, 2012 at 3:48 AM, M. Carrasco c...@dragoman.org wrote:
>> 3. Cron
>> It can run properly from cron as it is daemonized.
>
> What's this about? I've never had problems running run-of-the-mill
> scripts from cron, once the environment is adequately replicated.

Joe already answered, but one can also define it as you do: adequate
environment: input, output, signals ...

>> --hard-links is not used, and there are reasons for and against.
>> From man rsync: ... finding multiply-linked files is expensive.
>> Without this option, hard-linked files in the transfer are treated
>> as though they were separate files.
>
> FWIW, it might be nice to add a hardlink detecting bloom filter to
> rsync at some point. This makes the process of detecting hardlinks
> less expensive. Another way to narrow down the field is to just look
> at st_nlink.

Any way to put this into a wish list? In the meantime, users should be
aware of when to use --hard-links.

Regards
Tomas
Re: Time rsYnc Machine (tym)
Joe,

Your desires are orders :-) ... a proper (small) man page

http://dragoman.org/tym

Regards
Tomas

On 26 Jul 2012, at 21:42, jose...@main.nc.us wrote:
> No good deed goes unpunished ;)
>
> Very nicely coded script, but it's a bit dense. I'm good at bash and
> can survive in rsync, but could you provide a description of what it
> actually does so I don't have to spend a long time analysing the code?
>
> Does it keep multiple versions like the name implies?
>
> Will it survive directory names with embedded blanks (in the
> parameters)? At first glance, this looks like a problem, but I may
> have missed something.
>
> Why trap so many signals? If something goes wrong, do I have to
> kill -9 to stop it?
>
> Does one of those keep it running when you logoff? I don't see a
> nohup in the script.
>
> I've seen some methods that use hard links to make subsequent backups
> smaller. I haven't quite figured out how that works, but it doesn't
> look like you use it.
>
> What does your script do if the destination runs out of space or
> isn't mounted?
>
> TIA
> Joe
>
>> http://dragoman.org/tym
>>
>> Regards
>> Tomas
Re: Time rsYnc Machine (tym)
Thanks so much. I'm going to try it out. It looks like all I need to do
is add something (manual or script) to delete the oldest versions
periodically.

Joe

> Joe,
>
> Your desires are orders :-) ... a proper (small) man page
>
> http://dragoman.org/tym
>
> Regards
> Tomas
Re: Time rsYnc Machine (tym)
Note that this stuff is a lot easier if you pull your backups rather
than pushing them. That way your making of directories and symlinks and
deleting of old backups are all done locally.

On 07/27/12 16:04, jose...@main.nc.us wrote:
> Thanks so much. I'm going to try it out. It looks like all I need to
> do is add something (manual or script) to delete the oldest versions
> periodically.
>
> Joe

--
Kevin Korb
Re: Time rsYnc Machine (tym)
No good deed goes unpunished ;)

Very nicely coded script, but it's a bit dense. I'm good at bash and
can survive in rsync, but could you provide a description of what it
actually does so I don't have to spend a long time analysing the code?

Does it keep multiple versions like the name implies?

Will it survive directory names with embedded blanks (in the
parameters)? At first glance, this looks like a problem, but I may have
missed something.

Why trap so many signals? If something goes wrong, do I have to
kill -9 to stop it?

Does one of those keep it running when you logoff? I don't see a nohup
in the script.

I've seen some methods that use hard links to make subsequent backups
smaller. I haven't quite figured out how that works, but it doesn't
look like you use it.

What does your script do if the destination runs out of space or isn't
mounted?

TIA
Joe

> http://dragoman.org/tym
>
> Regards
> Tomas
Re: Time rsYnc Machine (tym)
Joe,

You know programmers are not very good at documenting :-)

On 26 Jul 2012, at 21:42, jose...@main.nc.us wrote:
> No good deed goes unpunished ;)
>
> Very nicely coded script, but it's a bit dense. I'm good at bash and
> can survive in rsync,

My intention was to make something readable :-)

    Main () {
        Init
        MakeDest
        Sync
        Bye 0
    }

I will *try* to illustrate the code, but please point to the most
obscure bits and I will concentrate on them.

> but could you provide a description of what it actually does so I
> don't have to spend a long time analysing the code?

Similar functionality to the Time Machine, though nothing for
restoring: one has to use the available copy commands.

> Does it keep multiple versions like the name implies?

Yes. As indicated in http://dragoman.org/tym each run creates two
yymmdd-hhmmss directories, one in the backup destination directory and
one in the log directory.

Example

- /foo/bar-dest (the data, using the hard link mechanism of rsync)
      120603-011102
      120616-091035
      120726-160744

- ~./tym (the logs)
      120603-011102
      120616-091035
      120726-160744

Each log directory contains four files:

    log.txt   : the main log of tym
    rsync.txt : the output of rsync
    out.txt   : the standard output; should be empty
    err.txt   : the standard error; should be empty

The file log.txt

    1|0|start-time|08:46:54|
    2|0|start-date|20-02-2012|
    3|0|Prog|tym|
    ...
    19|119|day:hour:minute:second|00:00:01:59|
    20|119|end-date|20-02-2012|
    21|119|end-time|08:48:53|

Where the fields are:

    sequential number
    seconds into the program
    key
    value

> Will it survive directory names with embedded blanks (in the
> parameters)?

Probably not. The variable SourceList should contain strings that
survive Linux and rsync:

    rsync ... $SourceList

> Why trap so many signals? If something goes wrong, do I have to
> kill -9 to stop it?

I trap all the signals because I want to know all the signals received.
If tym receives a signal, it logs it in log.txt and continues; it might
be changed to do something else. I was considering changing it so one
could kill it with SIGTERM; the PID could be logged into log.txt.

> Does one of those keep it running when you logoff? I don't see a
> nohup in the script.

It will keep running after logoff, as SIGHUP is also trapped.

> I've seen some methods that use hard links to make subsequent backups
> smaller. I haven't quite figured out how that works, but it doesn't
> look like you use it.

The hard work is done by rsync; tym is just a wrapper. Indeed, it
should not be too hard to implement it in rsync.

One might consider implementing time machine facilities in rsync:

    --tm-source
    --tm-dest
    --tm-logdir
    --tm-machine

> What does your script do if the destination runs out of space or
> isn't mounted?

It will fail and the errors will be registered in the appropriate logs.

Regards
Tomas

> TIA
> Joe
>
>> http://dragoman.org/tym
>>
>> Regards
>> Tomas
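The log.txt layout described above (sequential number, seconds into the program, key, value, each terminated by "|") is easy to read back programmatically. The following Python sketch is not part of tym; LogEntry and parse_log_line are hypothetical names, and it assumes that only the value field may itself contain "|" characters.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    seq: int              # sequential number
    elapsed_seconds: int  # seconds into the program
    key: str
    value: str


def parse_log_line(line: str) -> LogEntry:
    """Parse one 'seq|seconds|key|value|' line as shown in the example."""
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]  # drop the single trailing field terminator
    seq, elapsed, key, value = line.split("|", 3)
    return LogEntry(int(seq), int(elapsed), key, value)


def parse_log(text: str) -> list:
    """Parse every non-blank line of a log.txt-style string."""
    return [parse_log_line(line) for line in text.splitlines() if line.strip()]
```

Applied to the sample lines above, the last entry reports 119 elapsed seconds, which matches the 00:00:01:59 duration and the gap between the start time 08:46:54 and the end time 08:48:53.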