Re: Is there any way to restore/create hardlinks lost in incremental backups?
On Sun, Dec 13, 2020 at 11:59 AM Wayne Davison via rsync <rsync@lists.samba.org> wrote:

> I should also mention that there are totally valid reasons why the dir
> might be huge on day4. For instance, if someone changed the mode on the
> files from 664 to 644 then the files cannot be hard-linked together even if
> the file's data is unchanged. The same goes for differences in preserved
> xattrs, acls, and ownership. In such a case you could decide that you
> don't care about the change in meta info and tweak it on the earlier files
> to match day4's files and then the suggested re-link command would decide
> it could join them together. You'd probably then need to keep going and
> re-link day5's pictures (since it was probably linking to the old day4's
> pictures).
>
> ..wayne..

I totally get why some folks would prefer to use rsync --link-dest for backups: it's very fast, and the backup itself is usable as a replacement filesystem.

If you are open to trying something else though, there are probably several tools at https://stromberg.dnsalias.org/~strombrg/backshift/documentation/comparison/index.html that can back up permission changes without needing to create a copy of the file data. Sadly, I don't know about most of the tools there, but I know that backshift wouldn't need to. Backshift is much slower than rsync, but also takes up quite a bit less storage space, even if you mv a large hierarchy or change all the file permissions in a hierarchy.

HTH

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Is there any way to restore/create hardlinks lost in incremental backups?
I should also mention that there are totally valid reasons why the dir might be huge on day4. For instance, if someone changed the mode on the files from 664 to 644 then the files cannot be hard-linked together even if the file's data is unchanged. The same goes for differences in preserved xattrs, acls, and ownership.

In such a case you could decide that you don't care about the change in meta info and tweak it on the earlier files to match day4's files and then the suggested re-link command would decide it could join them together. You'd probably then need to keep going and re-link day5's pictures (since it was probably linking to the old day4's pictures).

..wayne..
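As a rough illustration of "tweaking the meta info" on the earlier files, the sketch below copies the permission bits from each file in one snapshot onto the file at the same relative path in an earlier snapshot. The function name match_tree_mode is made up for this example, `chmod --reference` is a GNU coreutils option (so GNU tools are an assumption), and ownership, xattrs and ACLs are not handled at all. Bear in mind that permissions are stored in the inode, so changing day3's file also changes every earlier snapshot that is hard-linked to it.

```sh
# match_tree_mode REF TARGET   (hypothetical helper, not part of rsync)
# Copy permission bits from each regular file under REF onto the file at
# the same relative path under TARGET. Assumes GNU chmod (--reference)
# and file names without embedded newlines.
match_tree_mode() (
    ref=$1
    target=$2
    ( cd "$ref" && find . -type f ) |
    while IFS= read -r rel; do
        # Only touch regular files that exist in TARGET; skip symlinks.
        [ -f "$target/$rel" ] || continue
        [ -h "$target/$rel" ] && continue
        chmod --reference="$ref/$rel" "$target/$rel"
    done
)
```

For the case in this thread it might be called as `match_tree_mode day4/pictures day3/pictures` before re-running the re-link step.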
Re: Is there any way to restore/create hardlinks lost in incremental backups?
You could rsync the current day4 dir to a day4.new dir, and list all the prior days as --link-dest options. Make sure that you're using the same xattr/acl options as your official backup command (the options may or may not be present) so that you are preserving the same level of info as the backup. You also have the choice of copying the whole day4 dir or just the day4/pictures dir, as you see fit. For example:

    rsync -aiv --link-dest=../day1 --link-dest=../day2 --link-dest=../day3 day4/ day4.new/
    mv day4 day4.bad
    mv day4.new day4

If you only want to reprocess the pictures subdir, just tweak the "day4/" arg to be "day4/pictures" (no trailing slash) and change the mv commands to deal with just that subdir.

..wayne..

On Thu, Dec 10, 2020 at 9:29 AM Chris Green via rsync wrote:

> I run a simple self written incremental backup system using rsync's
> --link-dest option.
>
> Occasionally, because I've moved things around or because I've done
> something else that breaks things, the hard links aren't created as
> they should be and I get a very space consuming backup increment.
>
> Is there any easy way that one can restore hard links in the *middle*
> of a series? For example say I have:-
>
> day1/pictures
> day2/pictures
> day3/pictures
> day4/pictures
> day5/pictures
>
> and I notice that day4/pictures is using as much space as
> day1/pictures but all the others are relatively small, i.e.
> day2 day3 and day5 have correctly hard linked to the previous day but
> day4 hasn't.
>
> It needs a tool that can scan day4, check a file is identical with the
> one in day3 then hardlink it without losing the link from day5.
>
> There's jdupes but that does lose the link from day5 so you'd have to
> apply it to all the directories after the one that's lost the links.
>
> --
> Chris Green
Re: Is there any way to restore/create hardlinks lost in incremental backups?
Guillaume Outters via rsync wrote:
> On 2020-12-11 12:53, Chris Green wrote :
>
> > […] wrote a trivial[ish] script that copied
> > all the backups to a new destination sequentially (using --link-dest)
> > and then removed the original tree, having checked the new backups
> > were OK of course.
>
> With the same cause as yours, I once worked out exactly the same
> solution.
>
> But then, having to automate it, I worked a bit more on it, and ended
> up having a shell script that:
> - recursively listed files as "file size - inode - path"
> - with sort and awk, output the list of "every size that has different
>   inodes"
> - for each output size, cksumed one file for each inode
> - if two different inodes (with the same file size) had their cksum
>   match, then it replaced every file for the last inode, with a link to
>   the first inode
>
> If you have to run it frequently, you may want to implement something
> similar.
> Although it ignores mtime info (and thus strips it when lning),
> it has the great benefit of finding every duplicate, be it renamed and
> move to another dir
> (as in
> ./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG
> versus ./his.2020-11-26/perso/photos/100_.JPG).
>
> (and by the way I reimplemented it in C, "just for fun" and for speed
> too: https://github.com/outtersg/dude/ . Hmm, in C but in French)

The program jdupes will do it for you as well.

The disadvantage (for me) of jdupes is that, given 40 or so incremental backups (which is what I had when I saw the problem), each with many tens of thousands of files in them, it will take a *very* long time to do its job. Like your solution it's general: files can have different names and be in totally different places in the directory hierarchy and it will find the duplicates. In my case the files which should be duplicates (and thus be hard linked) are always ones with the same name in the same place in the hierarchy.

It feels as if there should be a better/faster way of addressing this particular case but I don't know what it is.

--
Chris Green
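For the narrower case described above, where a duplicate always has the same name in the same place in the previous snapshot, a same-path comparison avoids the general duplicate search. The following is only a sketch under stated assumptions: relink_same_paths is a made-up name, it compares file contents with cmp and ignores mode, ownership, mtime, xattrs and ACLs (all of which rsync would take into account), and it assumes file names without embedded newlines. The re-linked file takes on the metadata of the copy in the previous snapshot.

```sh
# relink_same_paths PREV CUR   (hypothetical helper)
# For each regular file under CUR, replace it with a hard link to the
# file at the same relative path under PREV when the contents match.
relink_same_paths() (
    prev=$1
    cur=$2
    ( cd "$cur" && find . -type f ) |
    while IFS= read -r rel; do
        old=$prev/$rel
        new=$cur/$rel
        [ -f "$old" ] || continue            # no counterpart in PREV
        [ -h "$old" ] && continue            # counterpart is a symlink
        [ "$old" -ef "$new" ] && continue    # already share an inode
        cmp -s "$old" "$new" || continue     # contents differ
        # Link under a temporary name, then rename over the original so
        # the path never disappears part-way through.
        tmp=$new.relink.$$
        ln "$old" "$tmp" && mv -f "$tmp" "$new"
    done
)
```

To keep later snapshots linked, it would be run for each following day in turn, for example `relink_same_paths day3 day4` and then `relink_same_paths day4 day5`.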
Re: Is there any way to restore/create hardlinks lost in incremental backups?
On 2020-12-11 12:53, Chris Green wrote:

> […] wrote a trivial[ish] script that copied all the backups to a new
> destination sequentially (using --link-dest) and then removed the
> original tree, having checked the new backups were OK of course.

With the same cause as yours, I once worked out exactly the same solution.

But then, having to automate it, I worked a bit more on it, and ended up with a shell script that:
- recursively listed files as "file size - inode - path"
- with sort and awk, output the list of "every size that has different inodes"
- for each output size, cksumed one file for each inode
- if two different inodes (with the same file size) had matching cksums, replaced every file for the last inode with a link to the first inode

If you have to run it frequently, you may want to implement something similar.
Although it ignores mtime info (and thus strips it when lning), it has the great benefit of finding every duplicate, even one that was renamed and moved to another dir (as in ./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG versus ./his.2020-11-26/perso/photos/100_.JPG).

(And by the way I reimplemented it in C, "just for fun" and for speed too: https://github.com/outtersg/dude/ . Hmm, in C but in French.)

--
Guillaume
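The steps listed above could look roughly like the following. This is a guess at the general shape, not Guillaume's actual script: list_duplicate_inodes is a made-up name, `find -printf` is a GNU findutils feature, paths are assumed to contain no tabs or newlines, and everything is assumed to live on one filesystem. The sketch only reports candidate pairs and leaves the linking step out.

```sh
# list_duplicate_inodes DIR...   (hypothetical helper; GNU find assumed)
# Print "keep<TAB>duplicate" for files that sit on different inodes but
# have the same size and the same cksum. Nothing is modified.
list_duplicate_inodes() (
    tab=$(printf '\t')
    # 1. "size<TAB>inode<TAB>path" for every regular file.
    find "$@" -type f -printf '%s\t%i\t%p\n' |
    sort -t "$tab" -k1,1n -k2,2n -k3,3 |
    # 2. Keep one path per inode, only for sizes seen on 2+ inodes.
    awk -F '\t' '
        !seen[$1, $2]++ { n[$1]++; path[$1, n[$1]] = $3 }
        END {
            for (size in n)
                if (n[size] > 1)
                    for (i = 1; i <= n[size]; i++) print path[size, i]
        }' |
    # 3. Checksum each candidate; "crc size" becomes the grouping key.
    while IFS= read -r p; do
        printf '%s\t%s\n' "$(cksum < "$p")" "$p"
    done |
    sort -t "$tab" -k1,1 -k2,2 |
    # 4. The first path in each key group is kept; the rest are duplicates.
    awk -F '\t' '
        $1 == key { print keep "\t" $2; next }
        { key = $1; keep = $2 }'
)
```

Each reported pair names one path per inode; turning the report into links would still mean replacing every directory entry that points at the duplicate inode, as the last step in the list describes.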
Re: Is there any way to restore/create hardlinks lost in incremental backups?
Paul Slootman via rsync wrote:
> On Thu 10 Dec 2020, Chris Green via rsync wrote:
> >
> > Occasionally, because I've moved things around or because I've done
> > something else that breaks things, the hard links aren't created as
> > they should be and I get a very space consuming backup increment.
> >
> > Is there any easy way that one can restore hard links in the *middle*
> > of a series? For example say I have:-
> >
> > day1/pictures
> > day2/pictures
> > day3/pictures
> > day4/pictures
> > day5/pictures
> >
> > and I notice that day4/pictures is using as much space as
> > day1/pictures but all the others are relatively small, i.e.
> > day2 day3 and day5 have correctly hard linked to the previous day but
> > day4 hasn't.
> >
> > It needs a tool that can scan day4, check a file is identical with the
> > one in day3 then hardlink it without losing the link from day5.
>
> If you have these files that are hardlinked:
>
> day1/pictures/1.jpg
> day2/pictures/1.jpg
> day3/pictures/1.jpg
>
> And these are hardlinked, but to a different inode:
>
> day4/pictures/1.jpg
> day5/pictures/1.jpg
>
> then there is no way of linking the second group to the first in one
> step; you will have to individually link day3/pictures/1.jpg to
> day4/pictures/1.jpg and then day3/pictures/1.jpg (or
> day4/pictures/1.jpg) to day5/pictures/1.jpg.
>
> It's not like a group of directory entries that are hardlinked to one
> inode are some sort of actual group; they just happen to be directory
> entries that point to the same inode number. There is no other relation
> between those directory entries.
>
> So you will have to incrementally process each next day against the
> previous day.

Yes, that's what I have done: I wrote a trivial[ish] script that copied all the backups to a new destination sequentially (using --link-dest) and then removed the original tree, having checked the new backups were OK of course.

Fortunately I have lots of spare space on the backup system at the moment, having just upgraded it with a new 8TB drive, so duplicating the whole backup wasn't an issue (though rather slow because it was from and to the same drive).

> If I make a significant change in such a directory structure (e.g.
> renaming a directory) I try to remember to do the same thing on the
> backup which some say is wrong, but it saves a lot of space, like you
> discovered :)

Yes, I've sometimes done that.

--
Chris Green
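The sequential copy described in this message might look something like the sketch below. It is not Chris's script: recopy_series and the RSYNC variable are invented for the example (RSYNC lets the commands be previewed with RSYNC=echo), and the plain -a option set is an assumption that should be replaced with whatever options the real backup command uses, including any xattr/ACL options.

```sh
# recopy_series SRC_ROOT DST_ROOT DAY...   (hypothetical helper)
# Copy each snapshot into DST_ROOT in order, hard-linking unchanged
# files against the previously copied snapshot.
recopy_series() (
    src=$1
    dst=$2
    shift 2
    prev=
    mkdir -p "$dst" || exit 1
    for day in "$@"; do
        if [ -n "$prev" ]; then
            # A relative --link-dest is resolved against the destination dir.
            ${RSYNC:-rsync} -a --link-dest="../$prev" "$src/$day/" "$dst/$day/" || exit 1
        else
            ${RSYNC:-rsync} -a "$src/$day/" "$dst/$day/" || exit 1
        fi
        prev=$day
    done
)
```

A dry look at what would run could be had with `RSYNC=echo recopy_series backups backups.new day1 day2 day3 day4 day5`; the original tree would only be removed after the new one has been checked.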
Re: Is there any way to restore/create hardlinks lost in incremental backups?
On Thu 10 Dec 2020, Chris Green via rsync wrote:
>
> Occasionally, because I've moved things around or because I've done
> something else that breaks things, the hard links aren't created as
> they should be and I get a very space consuming backup increment.
>
> Is there any easy way that one can restore hard links in the *middle*
> of a series? For example say I have:-
>
> day1/pictures
> day2/pictures
> day3/pictures
> day4/pictures
> day5/pictures
>
> and I notice that day4/pictures is using as much space as
> day1/pictures but all the others are relatively small, i.e.
> day2 day3 and day5 have correctly hard linked to the previous day but
> day4 hasn't.
>
> It needs a tool that can scan day4, check a file is identical with the
> one in day3 then hardlink it without losing the link from day5.

If you have these files that are hardlinked:

    day1/pictures/1.jpg
    day2/pictures/1.jpg
    day3/pictures/1.jpg

And these are hardlinked, but to a different inode:

    day4/pictures/1.jpg
    day5/pictures/1.jpg

then there is no way of linking the second group to the first in one step; you will have to individually link day3/pictures/1.jpg to day4/pictures/1.jpg and then day3/pictures/1.jpg (or day4/pictures/1.jpg) to day5/pictures/1.jpg.

It's not like a group of directory entries that are hardlinked to one inode are some sort of actual group; they just happen to be directory entries that point to the same inode number. There is no other relation between those directory entries.

So you will have to incrementally process each next day against the previous day.

If I make a significant change in such a directory structure (e.g. renaming a directory) I try to remember to do the same thing on the backup which some say is wrong, but it saves a lot of space, like you discovered :)

Paul
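The point that directory entries sharing an inode are not a group can be seen in a throwaway directory. This is a small demonstration with made-up file contents, using only ln, cp and ls; the inode numbers printed will differ from system to system.

```sh
demo=$(mktemp -d)
mkdir "$demo/day3" "$demo/day4" "$demo/day5"
echo "picture data" > "$demo/day3/1.jpg"
cp "$demo/day3/1.jpg" "$demo/day4/1.jpg"      # same data, new inode
ln "$demo/day4/1.jpg" "$demo/day5/1.jpg"      # day5 shares day4's inode

# Re-point day4's entry at day3's inode. day5 is untouched: it still
# names the old inode, so the duplicate data stays on disk.
ln -f "$demo/day3/1.jpg" "$demo/day4/1.jpg"
ls -i "$demo"/day*/1.jpg

# Only linking day5 as well releases the old inode.
ln -f "$demo/day4/1.jpg" "$demo/day5/1.jpg"
ls -i "$demo"/day*/1.jpg
```

The first listing shows day3 and day4 on one inode number and day5 on another; the second shows all three on the same number.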
Re: Is there any way to restore/create hardlinks lost in incremental backups?
Hi.

Is it possible that, if day4 is consuming too much space, day3 was an incomplete backup?

The rsync wrapper I wrote goes to a little trouble to make sure that incomplete backups aren't allowed. It's called Backup.rsync, and can be found at:
https://stromberg.dnsalias.org/~strombrg/Backup.remote.html

It does this by mv'ing backups to a magic name scheme only after they fully finish, to distinguish them from partial backups. If a backup is found that doesn't have that magic name scheme, it is assumed to be partial, and is reused as the starting point for the next snapshot.

Feel free to use it, or raid it for ideas.

HTH
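The rename-on-completion idea can be sketched in a few lines. This is not how Backup.rsync is written (its code and its naming scheme are not shown in this thread): run_snapshot and the ".partial" suffix used in the usage note are invented for illustration.

```sh
# run_snapshot WORK FINAL COMMAND...   (hypothetical helper)
# Run COMMAND, which is expected to fill WORK, and rename WORK to FINAL
# only if COMMAND succeeds.
run_snapshot() (
    work=$1
    final=$2
    shift 2
    [ ! -e "$final" ] || exit 1     # never overwrite a finished snapshot
    mkdir -p "$work" || exit 1      # reuses WORK if a previous run left it
    "$@" || exit 1                  # on failure WORK keeps its partial name
    mv "$work" "$final"
)
```

A call might look like `run_snapshot day5.partial day5 rsync -a --link-dest=../day4 /home/ day5.partial/`. Any directory still carrying the partial name is then known to be incomplete and is never used as a --link-dest source by mistake.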
Is there any way to restore/create hardlinks lost in incremental backups?
I run a simple self written incremental backup system using rsync's --link-dest option.

Occasionally, because I've moved things around or because I've done something else that breaks things, the hard links aren't created as they should be and I get a very space consuming backup increment.

Is there any easy way that one can restore hard links in the *middle* of a series? For example say I have:-

    day1/pictures
    day2/pictures
    day3/pictures
    day4/pictures
    day5/pictures

and I notice that day4/pictures is using as much space as day1/pictures but all the others are relatively small, i.e. day2 day3 and day5 have correctly hard linked to the previous day but day4 hasn't.

It needs a tool that can scan day4, check a file is identical with the one in day3 then hardlink it without losing the link from day5.

There's jdupes but that does lose the link from day5 so you'd have to apply it to all the directories after the one that's lost the links.

--
Chris Green