Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-13 Thread Dan Stromberg via rsync
On Sun, Dec 13, 2020 at 11:59 AM Wayne Davison via rsync <
rsync@lists.samba.org> wrote:

> I should also mention that there are totally valid reasons why the dir
> might be huge on day4. For instance, if someone changed the mode on the
> files from 664 to 644 then the files cannot be hard-linked together even if
> the file's data is unchanged. The same goes for differences in preserved
> xattrs, acls, and ownership.  In such a case you could decide that you
> don't care about the change in meta info and tweak it on the earlier files
> to match day4's files and then the suggested re-link command would decide
> it could join them together.  You'd probably then need to keep going and
> re-link day5's pictures (since it was probably linking to the old day4's
> pictures).
>
> ..wayne..
>

I totally get why some folks would prefer to use rsync --link-dest for
backups: It's very fast, and the backup itself is usable as a replacement
filesystem.

If you are open to trying something else though, there are probably several
tools at
https://stromberg.dnsalias.org/~strombrg/backshift/documentation/comparison/index.html
that can back up permissions changes without needing to create a copy of the
file data.  Sadly, I don't know about most of the tools there, but I know
that backshift doesn't need to copy the data in that case.  Backshift is much
slower than rsync, but also takes up quite a bit less storage space, even if
you mv a large hierarchy or change all the file permissions in a hierarchy.

HTH
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-13 Thread Wayne Davison via rsync
I should also mention that there are totally valid reasons why the dir
might be huge on day4. For instance, if someone changed the mode on the
files from 664 to 644 then the files cannot be hard-linked together even if
the file's data is unchanged. The same goes for differences in preserved
xattrs, acls, and ownership.  In such a case you could decide that you
don't care about the change in meta info and tweak it on the earlier files
to match day4's files and then the suggested re-link command would decide
it could join them together.  You'd probably then need to keep going and
re-link day5's pictures (since it was probably linking to the old day4's
pictures).
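
A minimal sketch of the metadata check described above, assuming a POSIX
filesystem. The function names link_blockers and match_mode are hypothetical,
and only mode, uid and gid are compared; xattrs and ACLs are left out, so
treat it as an illustration rather than a reproduction of rsync's own test.

```python
import os
import stat


def link_blockers(earlier, later):
    """Return the names of the compared attributes that differ.

    Hypothetical helper: it looks only at permission bits, uid and gid.
    xattrs and ACLs, which can also prevent linking, are not checked here.
    """
    a, b = os.stat(earlier), os.stat(later)
    blockers = []
    if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
        blockers.append("mode")
    if a.st_uid != b.st_uid:
        blockers.append("uid")
    if a.st_gid != b.st_gid:
        blockers.append("gid")
    return blockers


def match_mode(earlier, later):
    """Copy the later file's permission bits onto the earlier file."""
    os.chmod(earlier, stat.S_IMODE(os.stat(later).st_mode))
```

Running link_blockers on day3's and day4's copy of a file shows which of
these attributes would have to be tweaked before a re-link can join them;
match_mode handles the 664 versus 644 case from the example.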

..wayne..


Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-13 Thread Wayne Davison via rsync
You could rsync the current day4 dir to a day4.new dir, and list all the
prior days as --link-dest options. Make sure that you're using the same
xattr/acl options as your official backup command (the options may or may
not be present) so that you are preserving the same level of info as the
backup.  You also have the choice of copying the whole day4 dir or just the
day4/pictures dir, as you see fit.

For example:

rsync -aiv --link-dest=../day1 --link-dest=../day2 \
    --link-dest=../day3 day4/ day4.new/
mv day4 day4.bad
mv day4.new day4

If you only want to reprocess the pictures subdir, just tweak the "day4/"
arg to be "day4/pictures" (no trailing slash) and change the mv commands to
deal with just that subdir.
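
For a longer series, the same command line can be generated rather than
typed. This is only a sketch: relink_command is a hypothetical helper, it
assumes all snapshot directories are siblings in the current directory, and
it only builds the argument list without running rsync. Any xattr/acl options
used by the real backup would go in extra_opts.

```python
import shlex


def relink_command(prior_days, bad_day, extra_opts=()):
    """Build the rsync argv that copies bad_day into bad_day.new.

    Each prior snapshot becomes a --link-dest option. The ../ prefix is
    there because rsync resolves a relative --link-dest against the
    destination directory.
    """
    source = bad_day.rstrip("/")
    argv = ["rsync", "-aiv", *extra_opts]
    argv += [f"--link-dest=../{day}" for day in prior_days]
    argv += [source + "/", source + ".new/"]
    return argv


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(relink_command(["day1", "day2", "day3"], "day4")))
```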

..wayne..


On Thu, Dec 10, 2020 at 9:29 AM Chris Green via rsync wrote:

> I run a simple self written incremental backup system using rsync's
> --link-dest option.
>
> Occasionally, because I've moved things around or because I've done
> something else that breaks things, the hard links aren't created as
> they should be and I get a very space consuming backup increment.
>
> Is there any easy way that one can restore hard links in the *middle*
> of a series?  For example say I have:-
>
> day1/pictures
> day2/pictures
> day3/pictures
> day4/pictures
> day5/pictures
>
> and I notice that day4/pictures is using as much space as
> day1/pictures but all the others are relatively small, i.e.
> day2 day3 and day5 have correctly hard linked to the previous day but
> day4 hasn't.
>
> It needs a tool that can scan day4, check a file is identical with the
> one in day3 then hardlink it without losing the link from day5.
>
> There's jdupes but that does lose the link from day5 so you'd have to
> apply it to all the directories after the one that's lost the links.
>
>
>
> --
> Chris Green


Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-11 Thread Chris Green via rsync
Guillaume Outters via rsync  wrote:
> On 2020-12-11 12:53, Chris Green wrote :
> 
> > […] wrote a trivial[ish] script that copied
> > all the backups to a new destination sequentially (using --link-dest)
> > and then removed the original tree, having checked the new backups
> > were OK of course.
> 
> With the same cause as yours, I once worked out exactly the same 
> solution.
> 
> But then, having to automate it, I worked a bit more on it, and ended 
> up having a shell script that:
> - recursively listed files as "file size - inode - path"
> - with sort and awk, output the list of "every size that has different 
> inodes"
> - for each output size, cksumed one file for each inode
> - if two different inodes (with the same file size) had their cksum 
> match, then it replaced every file for the last inode, with a link to 
> the first inode
> 
> If you have to run it frequently, you may want to implement something 
> similar.
> Although it ignores mtime info (and thus loses it when linking),
> it has the great benefit of finding every duplicate, even one that was
> renamed and moved to another dir
> (as in 
> ./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG 
> versus ./his.2020-11-26/perso/photos/100_.JPG).
> 
> (and by the way I reimplemented it in C, "just for fun" and for speed 
> too: https://github.com/outtersg/dude/ . Hmm, in C but in French)
> 
The program jdupes will do it for you as well.  

The disadvantage (for me) of jdupes is that, given 40 or so incremental
backups (which is what I had when I saw the problem) each with many
tens of thousands of files in them it will take a *very* long time to
do its job.

Like your solution it's general, files can have different names and be
in totally different places in the directory hierarchy and it will
find the duplicates.

In my case the files which should be duplicates (and thus be hard
linked) are always ones with the same name in the same place in the
hierarchy.  It feels as if there should be a better/faster way of
addressing this particular case but I don't know what it is.
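
One possible shape for that special case, as a sketch only: compare each file
with the file at the same relative path in the previous snapshot and link the
two when the contents match. The names relink_same_paths, relink_series and
the ".relink-tmp" suffix are hypothetical. It assumes all snapshots live on
one filesystem, and it compares contents only, so a relinked file takes on
the previous snapshot's mode, owner and mtime.

```python
import filecmp
import os


def relink_same_paths(prev_dir, cur_dir):
    """Link files in cur_dir to identical same-path files in prev_dir.

    Returns the number of files that were relinked.
    """
    relinked = 0
    for root, _dirs, files in os.walk(cur_dir):
        rel_root = os.path.relpath(root, cur_dir)
        for name in files:
            cur = os.path.join(root, name)
            prev = os.path.normpath(os.path.join(prev_dir, rel_root, name))
            # Only regular files that exist in both snapshots are candidates.
            if os.path.islink(cur) or os.path.islink(prev):
                continue
            if not os.path.isfile(prev):
                continue
            cs, ps = os.stat(cur), os.stat(prev)
            if (cs.st_dev, cs.st_ino) == (ps.st_dev, ps.st_ino):
                continue  # already the same inode
            if cs.st_size != ps.st_size:
                continue
            if not filecmp.cmp(prev, cur, shallow=False):
                continue  # same size, different bytes
            # Link under a temporary name, then rename over the original,
            # so cur is never missing.
            tmp = cur + ".relink-tmp"
            os.link(prev, tmp)
            os.replace(tmp, cur)
            relinked += 1
    return relinked


def relink_series(snapshots):
    """Process each snapshot against the one before it, oldest first."""
    pairs = zip(snapshots, snapshots[1:])
    return sum(relink_same_paths(prev, cur) for prev, cur in pairs)
```

Because it only looks at one candidate per file, the cost is one directory
walk plus one byte comparison per unlinked pair, rather than a search across
every file in every snapshot.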

-- 
Chris Green


Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-11 Thread Guillaume Outters via rsync

On 2020-12-11 12:53, Chris Green wrote :


[…] wrote a trivial[ish] script that copied
all the backups to a new destination sequentially (using --link-dest)
and then removed the original tree, having checked the new backups
were OK of course.


With the same cause as yours, I once worked out exactly the same 
solution.


But then, having to automate it, I worked a bit more on it, and ended 
up having a shell script that:

- recursively listed files as "file size - inode - path"
- with sort and awk, output the list of "every size that has different 
inodes"

- for each output size, cksumed one file for each inode
- if two different inodes (with the same file size) had their cksum 
match, then it replaced every file for the last inode, with a link to 
the first inode
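
The steps above can be sketched roughly as follows. This is not the original
shell script or the C tool: the function names and the ".relink-tmp" suffix
are hypothetical, SHA-256 stands in for cksum, and everything is assumed to
sit on a single filesystem.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk=1 << 20):
    """Hash a file in chunks (SHA-256 here, swapped in for cksum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def dedupe_by_content(top):
    """Merge same-content files under top into shared inodes.

    Returns the number of directory entries that were relinked.
    """
    # Step 1: list files as size -> inode -> paths.
    by_size = defaultdict(lambda: defaultdict(list))
    for root, dirs, files in os.walk(top):
        dirs.sort()  # deterministic order, so "first inode" is stable
        for name in sorted(files):
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            st = os.lstat(path)
            by_size[st.st_size][(st.st_dev, st.st_ino)].append(path)

    relinked = 0
    for inodes in by_size.values():
        # Step 2: only sizes shared by several inodes are interesting.
        if len(inodes) < 2:
            continue
        keeper_by_digest = {}
        for paths in inodes.values():
            # Step 3: checksum one file per inode.
            digest = file_digest(paths[0])
            keeper = keeper_by_digest.setdefault(digest, paths[0])
            if keeper == paths[0]:
                continue  # first inode seen with this content
            # Step 4: point every entry of the later inode at the first.
            for path in paths:
                tmp = path + ".relink-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, path)
                relinked += 1
    return relinked
```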


If you have to run it frequently, you may want to implement something 
similar.

Although it ignores mtime info (and thus loses it when linking),
it has the great benefit of finding every duplicate, even one that was
renamed and moved to another dir
(as in 
./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG 
versus ./his.2020-11-26/perso/photos/100_.JPG).


(and by the way I reimplemented it in C, "just for fun" and for speed 
too: https://github.com/outtersg/dude/ . Hmm, in C but in French)


--
Guillaume



Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-11 Thread Chris Green via rsync
Paul Slootman via rsync  wrote:
> On Thu 10 Dec 2020, Chris Green via rsync wrote:
> > 
> > Occasionally, because I've moved things around or because I've done
> > something else that breaks things, the hard links aren't created as
> > they should be and I get a very space consuming backup increment.
> > 
> > Is there any easy way that one can restore hard links in the *middle*
> > of a series?  For example say I have:-
> > 
> > day1/pictures
> > day2/pictures
> > day3/pictures
> > day4/pictures
> > day5/pictures
> > 
> > and I notice that day4/pictures is using as much space as
> > day1/pictures but all the others are relatively small, i.e.
> > day2 day3 and day5 have correctly hard linked to the previous day but
> > day4 hasn't.
> > 
> > It needs a tool that can scan day4, check a file is identical with the
> > one in day3 then hardlink it without losing the link from day5.
> 
> If you have these files that are hardlinked:
> 
> day1/pictures/1.jpg
> day2/pictures/1.jpg
> day3/pictures/1.jpg
> 
> And these are hardlinked, but to a different inode:
> 
> day4/pictures/1.jpg
> day5/pictures/1.jpg
> 
> then there is no way of linking the second group to the first in one
> step; you will have to individually link day3/pictures/1.jpg to
> day4/pictures/1.jpg and then day3/pictures/1.jpg (or
> day4/pictures/1.jpg) to day5/pictures/1.jpg.
> 
> It's not like a group of directory entries that are hardlinked to one
> inode are some sort of actual group; they just happen to be directory
> entries that point to the same inode number. There is no other relation
> between those directory entries.
> 
> So you will have to incrementally process each next day against the
> previous day.
> 
Yes, that's what I have done: I wrote a trivial[ish] script that copied
all the backups to a new destination sequentially (using --link-dest)
and then removed the original tree, having checked the new backups
were OK of course.

Fortunately I have lots of spare space on the backup system at the
moment having just upgraded it with a new 8TB drive, so duplicating
the whole backup wasn't an issue (though rather slow because it was
from and to the same drive).

> 
> If I make a significant change in such a directory structure (e.g.
> renaming a directory) I try to remember to do the same thing on the
> backup which some say is wrong, but it saves a lot of space, like you
> discovered :)
> 
Yes, I've sometimes done that.

-- 
Chris Green


Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-11 Thread Paul Slootman via rsync
On Thu 10 Dec 2020, Chris Green via rsync wrote:
> 
> Occasionally, because I've moved things around or because I've done
> something else that breaks things, the hard links aren't created as
> they should be and I get a very space consuming backup increment.
> 
> Is there any easy way that one can restore hard links in the *middle*
> of a series?  For example say I have:-
> 
> day1/pictures
> day2/pictures
> day3/pictures
> day4/pictures
> day5/pictures
> 
> and I notice that day4/pictures is using as much space as
> day1/pictures but all the others are relatively small, i.e.
> day2 day3 and day5 have correctly hard linked to the previous day but
> day4 hasn't.
> 
> It needs a tool that can scan day4, check a file is identical with the
> one in day3 then hardlink it without losing the link from day5.

If you have these files that are hardlinked:

day1/pictures/1.jpg
day2/pictures/1.jpg
day3/pictures/1.jpg

And these are hardlinked, but to a different inode:

day4/pictures/1.jpg
day5/pictures/1.jpg

then there is no way of linking the second group to the first in one
step; you will have to individually link day3/pictures/1.jpg to
day4/pictures/1.jpg and then day3/pictures/1.jpg (or
day4/pictures/1.jpg) to day5/pictures/1.jpg.

It's not like a group of directory entries that are hardlinked to one
inode are some sort of actual group; they just happen to be directory
entries that point to the same inode number. There is no other relation
between those directory entries.

So you will have to incrementally process each next day against the
previous day.
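
A small demonstration of that point, as a sketch on a throwaway temporary
tree. The helper names inode_groups, relink and demo are hypothetical; the
paths mirror the day1 to day5 example above.

```python
import os
import tempfile
from collections import defaultdict


def inode_groups(paths):
    """Group paths by the inode they point to."""
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    return list(groups.values())


def relink(target, source):
    """Replace target with a hard link to source."""
    tmp = target + ".relink-tmp"
    os.link(source, tmp)
    os.replace(tmp, target)


def demo():
    """Return the group sizes before and after each relink step."""
    with tempfile.TemporaryDirectory() as top:
        paths = []
        for day in range(1, 6):
            d = os.path.join(top, f"day{day}", "pictures")
            os.makedirs(d)
            paths.append(os.path.join(d, "1.jpg"))
        # day1..day3 share one inode, day4..day5 share another.
        for first, linked in ((0, (1, 2)), (3, (4,))):
            with open(paths[first], "wb") as f:
                f.write(b"same picture data")
            for i in linked:
                os.link(paths[first], paths[i])
        sizes = [sorted(len(g) for g in inode_groups(paths))]
        relink(paths[3], paths[2])  # day4 -> day3; day5 is left behind
        sizes.append(sorted(len(g) for g in inode_groups(paths)))
        relink(paths[4], paths[3])  # day5 has to be handled separately
        sizes.append(sorted(len(g) for g in inode_groups(paths)))
        return sizes
```

demo() returns [[2, 3], [1, 4], [5]]: relinking day4 moves only that one
directory entry, and day5 keeps the old inode until it is relinked too.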


If I make a significant change in such a directory structure (e.g.
renaming a directory) I try to remember to do the same thing on the
backup which some say is wrong, but it saves a lot of space, like you
discovered :)


Paul



Re: Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-10 Thread Dan Stromberg via rsync
Hi.

Is it possible that, if day4 is consuming too much space, day3 was an
incomplete backup?

The rsync wrapper I wrote goes to a little trouble to make sure that
incomplete backups aren't allowed.  It's called Backup.rsync, and can be
found at:
https://stromberg.dnsalias.org/~strombrg/Backup.remote.html
It does this by mv'ing backups to a magic name scheme only after they fully
finish, to distinguish them from partial backups.  If a backup is found
that doesn't have that magic name scheme, it is assumed to be partial, and
is reused as the starting point for the next snapshot.
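
The rename-on-completion idea can be sketched like this. It is not the code
from Backup.rsync: the ".done" suffix and the function names are hypothetical
stand-ins for the real naming scheme, and the rsync run itself is left out.

```python
import os

DONE_SUFFIX = ".done"  # hypothetical marker for a finished snapshot


def start_snapshot(root, name):
    """Return the working directory for a new snapshot.

    A leftover directory without the marker is treated as a partial
    backup and reused as the starting point.
    """
    work = os.path.join(root, name)
    leftovers = sorted(
        e for e in os.listdir(root) if not e.endswith(DONE_SUFFIX)
    )
    if not leftovers:
        os.mkdir(work)
    elif leftovers[-1] != name:
        os.rename(os.path.join(root, leftovers[-1]), work)
    return work


def finish_snapshot(work):
    """Rename the working directory once the backup fully finished."""
    final = work + DONE_SUFFIX
    os.rename(work, final)
    return final


def complete_snapshots(root):
    """List only the snapshots that carry the completion marker."""
    return sorted(e for e in os.listdir(root) if e.endswith(DONE_SUFFIX))
```

Only the names returned by complete_snapshots would then be offered to
--link-dest, so an interrupted run is never used as a link source.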

Feel free to use it, or raid it for ideas.

HTH


On Thu, Dec 10, 2020 at 9:29 AM Chris Green via rsync wrote:

> I run a simple self written incremental backup system using rsync's
> --link-dest option.
>
> Occasionally, because I've moved things around or because I've done
> something else that breaks things, the hard links aren't created as
> they should be and I get a very space consuming backup increment.
>
> Is there any easy way that one can restore hard links in the *middle*
> of a series?  For example say I have:-
>
> day1/pictures
> day2/pictures
> day3/pictures
> day4/pictures
> day5/pictures
>
> and I notice that day4/pictures is using as much space as
> day1/pictures but all the others are relatively small, i.e.
> day2 day3 and day5 have correctly hard linked to the previous day but
> day4 hasn't.
>
> It needs a tool that can scan day4, check a file is identical with the
> one in day3 then hardlink it without losing the link from day5.
>
> There's jdupes but that does lose the link from day5 so you'd have to
> apply it to all the directories after the one that's lost the links.
>
>
>
> --
> Chris Green


Is there any way to restore/create hardlinks lost in incremental backups?

2020-12-10 Thread Chris Green via rsync
I run a simple self written incremental backup system using rsync's
--link-dest option.

Occasionally, because I've moved things around or because I've done
something else that breaks things, the hard links aren't created as
they should be and I get a very space consuming backup increment.

Is there any easy way that one can restore hard links in the *middle*
of a series?  For example say I have:-

day1/pictures
day2/pictures
day3/pictures
day4/pictures
day5/pictures

and I notice that day4/pictures is using as much space as
day1/pictures but all the others are relatively small, i.e.
day2 day3 and day5 have correctly hard linked to the previous day but
day4 hasn't.
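
A rough way to spot such an increment, as a sketch: walk the snapshots oldest
first and count only the bytes of inodes not already seen in an earlier one.
The function name new_bytes_per_snapshot is hypothetical, and it sums
apparent file sizes rather than allocated disk blocks.

```python
import os


def new_bytes_per_snapshot(snapshots):
    """Return (snapshot, bytes) pairs counting each inode only once.

    A snapshot that hard-linked correctly to its predecessor reports
    only the bytes of files that are new or changed in it.
    """
    seen = set()
    report = []
    for snap in snapshots:
        new_bytes = 0
        for root, _dirs, files in os.walk(snap):
            for name in files:
                st = os.lstat(os.path.join(root, name))
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    new_bytes += st.st_size
        report.append((snap, new_bytes))
    return report
```

In the example series, day4 would report roughly as many bytes as day1 while
day2, day3 and day5 stay small.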

It needs a tool that can scan day4, check a file is identical with the
one in day3 then hardlink it without losing the link from day5.

There's jdupes but that does lose the link from day5 so you'd have to
apply it to all the directories after the one that's lost the links.



-- 
Chris Green