Verifying a time machine backup

2014-12-04 Thread Michael
I'd like a way to verify a time machine backup.

What I envision:
1. A tool to list "which files on the backup do not need to be backed up" -- in 
other words, the list of files that time machine thinks are worth backing up but 
can be skipped. These can then be sent to a diff-tool to verify that what is on 
the backup matches.

2. A way to let time machine know that "Hey, this file does not actually match, 
and needs to be backed up". "Delete all backups of file X" is one such, but it 
is overkill. On the other hand, if the file on the backup is in error, maybe it 
should be removed. It is also not quite sufficient, if files should be backed 
up but are missing.

#1 -- list all files that should be backed up and do not need to be re-backed 
up -- is needed to avoid worrying about files that do not get backed up. The 
idea of "only scan files that are on the backup" will miss files that should 
have been backed up but were not because of a bug in time machine itself. In 
theory, such a tool can be written today, but I have no idea how. As far as I 
can tell, backupd is the only program that has the knowledge to make such a 
list, but it does not.

#2 -- force a backup of specific files -- seems to be impossible at the moment.

In the past (10.9.5), I had directories where the list of "what was backed up" 
did not match the list of "what should have been backed up". So the concern of 
"files missing" is not crazy. And the concern of "undetected IO error in the 
write" is very real.

Has anyone looked into this issue? Are there any tools, or any sort of "work in 
progress", or anything, to deal with any of this so far?

---
Entertaining minecraft videos
http://YouTube.com/keybounce

___
MacOSX-talk mailing list
MacOSX-talk@omnigroup.com
http://www.omnigroup.com/mailman/listinfo/macosx-talk


Re: Verifying a time machine backup

2014-12-04 Thread Arno Hautala
The tl;dr for this is that tmutil might provide some of what you're
looking for, but if you don't trust TimeMachine, you should use a
different backup tool.

On Thu, Dec 4, 2014 at 12:25 PM, Michael  wrote:
>
> What I envision:
> 1. A tool to list "which files on the backup do not need to be backed up" -- 
> in other words, the list of files that time machine thinks are worth backing 
> up but can be skipped. These can then be sent to a diff-tool to verify that 
> what is on the backup matches.

TimeMachine determines what needs to be backed up by watching FSEvents
for directories with changed files. During the backup TM then inspects
every file inside the flagged directories.
To build your list of files to check, you'd need to do the same thing.

> 2. A way to let time machine know that "Hey, this file does not actually 
> match, and needs to be backed up". "Delete all backups of file X" is one 
> such, but it is overkill. On the other hand, if the file on the backup is in 
> error, maybe it should be removed. It is also not quite sufficient, if files 
> should be backed up but are missing.

I don't think there's any method to do this other than modifying the
target files so TimeMachine explicitly notices the file.

> #1 -- list all files that should be backed up and do not need to be re-backed 
> up -- is needed to avoid worrying about files that do not get backed up. The 
> idea of "only scan files that are on the backup" will miss files that should 
> have been backed up but were not because of a bug in time machine itself. In 
> theory, such a tool can be written today, but I have no idea how. As far as I 
> can tell, backupd is the only program that has the knowledge to make such a 
> list, but it does not.

You could do this with FSEvents.

> #2 -- force a backup of specific files -- seems to be impossible at the 
> moment.

Yeah, there's no way to tell TimeMachine to back up a file, aside from
modifying the file and starting a backup.

> In the past (10.9.5), I had directories where the list of "what was backed 
> up" did not match the list of "what should have been backed up". So the 
> concern of "files missing" is not crazy. And the concern of "undetected IO 
> error in the write" is very real.
>
> Has anyone looked into this issue? Are there any tools, or any sort of "work 
> in progress", or anything, to deal with any of this so far?

You can use the tmutil command to see what differs in any two
snapshots or from a snapshot to the current computer state.

There's also fseventer for watching changed files, though I'm not sure
if it's still compatible with recent OS versions.

All in all though, if you don't feel like you can trust TimeMachine (I
wouldn't trust it either if you find it skipping files), you're
probably just better off using another tool (rsync, rsnapshot,
Carbon Copy Cloner, SuperDuper, CrashPlan, Backblaze, etc.). Pick a tool
that allows you to more easily verify the backup and manually back up
anything that fails verification.

-- 
arno  s  hautala/-|   a...@alum.wpi.edu

pgp b2c9d448


Re: Verifying a time machine backup

2014-12-04 Thread Michael

On 2014-12-04, at 11:47 AM, Arno Hautala  wrote:

> The tl;dr for this is that tmutil might provide some of what you're
> looking for, but if you don't trust TimeMachine, you should use a
> different backup tool.

It isn't about whether or not I trust Time Machine, or a different tool.

1. No backup tool will protect against a drive error that stores the wrong 
thing onto the disk. At best, you can flush the system cache, and re-read the 
file. That only guarantees that it matches today.

2. No tool can be considered perfect against bugs. They can and will happen. I 
found, and reported, an issue with Time Machine. Does not mean that other tools 
don't have other problems.


> On Thu, Dec 4, 2014 at 12:25 PM, Michael  wrote:
>> 
>> What I envision:
>> 1. A tool to list "which files on the backup do not need to be backed up" -- 
>> in other words, the list of files that time machine thinks are worth backing 
>> up but can be skipped. These can then be sent to a diff-tool to verify that 
>> what is on the backup matches.
> 
> TimeMachine determines what needs to be backed up by watching FSEvents
> for directories with changed files. During the backup TM then inspects
> every file inside the flagged directories.
> To build your list of files to check, you'd need to do the same thing.

No, that doesn't work, and tells me that I wasn't clear. So let me try again.

FSEvents tells backupd which directories have modified files; backupd checks 
each of those directories to see which files have been changed. For each 
changed file it then consults an internal "do not back up" list, the system 
list of user-specified "do not back up" files, and the per-file "do not back 
up" metadata flag. If the file passes all of those checks, backupd decides to 
back it up.

I want a list of all the files on the machine that would be backed up if it 
were doing a "from-scratch" backup, EXCEPT for those where FSEvents says "This 
needs to be backed up".

That is the list of everything on the backup that should match the file system.
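
A rough Python sketch of the list described above, under loud assumptions: nothing here talks to backupd or FSEvents. The `is_excluded` callback is a hypothetical stand-in for Time Machine's exclusion rules, and comparing a file's modification time against `last_backup_time` stands in for "FSEvents says this needs to be backed up". Both are approximations, so the result is only a starting point for a diff.

```python
import os
from typing import Callable, Iterator


def unchanged_backup_candidates(
    root: str,
    last_backup_time: float,
    is_excluded: Callable[[str], bool],
) -> Iterator[str]:
    """Yield regular files under root that are not excluded and were not
    modified after last_backup_time (seconds since the epoch)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        dirnames[:] = [
            d for d in dirnames if not is_excluded(os.path.join(dirpath, d))
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or is_excluded(path):
                continue
            # Assumption: mtime approximates "changed since the last backup".
            if os.lstat(path).st_mtime <= last_backup_time:
                yield path
```

Every path this yields is one that should already be on the backup with identical contents, which is the set worth handing to a diff tool.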


>> 2. A way to let time machine know that "Hey, this file does not actually 
>> match, and needs to be backed up". "Delete all backups of file X" is one 
>> such, but it is overkill. On the other hand, if the file on the backup is in 
>> error, maybe it should be removed. It is also not quite sufficient, if files 
>> should be backed up but are missing.
> 
> I don't think there's any method to do this other than modifying the
> target files so TimeMachine explicitly notices the file.

Yea, I want to force TM to re-backup, without having to change the file. Not 
even change the date. Basically, a way to tell time machine "the existing file 
on the backup properly belongs to an older backup set, but should be replaced 
anew on the next backup set."


> You can use the tmutil command to see what differs in any two
> snapshots or from a snapshot to the current computer state.

True. Now, do you have a way to say "Only tell me if backupd would not want to 
back this up"? If it's different, but backupd would back it up, then I don't 
care. Or if backupd would say "This is on the do not backup list", then I don't 
care.



Re: Verifying a time machine backup

2014-12-04 Thread LuKreme

> On Dec 4, 2014, at 10:25 AM, Michael  wrote:
> 
> I'd like a way to verify a time machine backup.

If Time Machine completed without an error, it’s verified.

> What I envision:
> 1. A tool to list "which files on the backup do not need to be backed up" -- 
> in other words, the list of files that time machine thinks are worth backing 
> up but can be skipped. These can then be sent to a diff-tool to verify that 
> what is on the backup matches.
> 
> 2. A way to let time machine know that "Hey, this file does not actually 
> match, and needs to be backed up". "Delete all backups of file X" is one 
> such, but it is overkill. On the other hand, if the file on the backup is in 
> error, maybe it should be removed. It is also not quite sufficient, if files 
> should be backed up but are missing.

Time Machine manages which files need to be backed up on its own. If you want 
that level of control, I suggest using a tool like rsnapshot.

-- 
"The good news: Hadron Collider went live and did not destroy ALL
reality. Bad: I'm the only one who remembers President Gore's 2 terms."



Re: Verifying a time machine backup

2014-12-05 Thread Michael

On 2014-12-04, at 8:06 PM, LuKreme  wrote:

> 
>> On Dec 4, 2014, at 10:25 AM, Michael  wrote:
>> 
>> I'd like a way to verify a time machine backup.
> 
> If Time Machine completed without an error, it’s verified.

That is just silly.

1. Any program can have bugs. Time machine included. (I have found time machine 
making errors, and reported them. The only solution I have so far is to wipe 
the backup and start over).
2. Unless you flush the system cache and the drive cache, and force a re-read 
from the media, there is no way to test for silent data corruption. I don't 
think time machine does this. I don't know of any system API to flush the 
kernel cache, nor of any device independent way to flush the drive cache.



Re: Verifying a time machine backup

2014-12-05 Thread LuKreme

> On Dec 5, 2014, at 3:23 AM, Michael  wrote:
> 
> 
> On 2014-12-04, at 8:06 PM, LuKreme  wrote:
> 
>> 
>>> On Dec 4, 2014, at 10:25 AM, Michael  wrote:
>>> 
>>> I'd like a way to verify a time machine backup.
>> 
>> If Time Machine completed without an error, it’s verified.
> 
> That is just silly.

Not really.

> 1. Any program can have bugs. Time machine included. (I have found time 
> machine making errors, and reported them. The only solution I have so far is 
> to wipe the backup and start over).

And if it had a bug, it would STILL verify.

> 2. Unless you flush the system cache and the drive cache, and force a re-read 
> from the media, there is no way to test for silent data corruption. I don't 
> think time machine does this. I don't know of any system API to flush the 
> kernel cache, nor of any device independent way to flush the drive cache.

No, Time Machine writes and checks what it wrote. If it can read a corrupted 
file, then it will write and verify a corrupted file. But then, so will 
anything else.


-- 
NO ONE WANTS TO HEAR FROM MY ARMPITS Bart chalkboard Ep. 3F01



Re: Verifying a time machine backup

2014-12-05 Thread Arno Hautala
On Thu, Dec 4, 2014 at 6:45 PM, Michael  wrote:
>
> On 2014-12-04, at 11:47 AM, Arno Hautala  wrote:
>
>> The tl;dr for this is that tmutil might provide some of what you're
>> looking for, but if you don't trust TimeMachine, you should use a
>> different backup tool.
>
> It isn't about whether or not I trust Time Machine, or a different tool.

Ultimately, it is. Either the backup app is trustworthy (including
acceptable rates of errors from bugs, faulty hardware, cosmic rays) or
you find a tool or multiple backup strategies that do provide that
level of trust.

> 1. No backup tool will protect against a drive error that stores the wrong 
> thing onto the disk. At best, you can flush the system cache, and re-read the 
> file. That only guarantees that it matches today.

You want to start using ZFS and ECC RAM.

> 2. No tool can be considered perfect against bugs. They can and will happen. 
> I found, and reported, an issue with Time Machine. Does not mean that other 
> tools don't have other problems.

Visible on OpenRadar?

> FSEvents tells backupd which directories have modified files; backupd checks 
> each of those directories to see which files have been changed. It then 
> consults an internal list of "do not back up", the system list of 
> user-specified "do not back up" files, and the per-file "do not back up" meta 
> data flag. If all of those pass, then it decides to back it up.
>
> I want a list of all the files on the machine that would be backed up if it 
> were doing a "from-scratch" backup, EXCEPT for those where FSEvents says 
> "This needs to be backed up".
>
> That is the list of everything on the backup that should match the file 
> system.

So, you want a list of files that have already been backed up, that
haven't changed on the filesystem, so you can verify that the data has
been correctly backed up. In the ideal case, if performed immediately
after the backup completes, this would be every file in the backup.

I think the easiest way to do this would be to just compare the backup
to the current state (tmutil compare). If the list of differing files
is the same as the list of files that need to be backed up (collected
by fsevents), your backup can be considered verified.
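
The comparison of the two lists can be sketched in Python as plain set arithmetic. This assumes both lists have already been collected as path strings by some other means (for instance from `tmutil compare` output and from an FSEvents watcher); the function names and the shape of the result are illustrative only.

```python
def classify_differences(differing, pending):
    """Split paths into buckets given the files that differ from the backup
    and the files already known to be waiting for the next backup."""
    differing, pending = set(differing), set(pending)
    return {
        # Differs from the backup, yet nothing says it changed: investigate.
        "unexplained": sorted(differing - pending),
        # Flagged as changed, yet contents still match the backup.
        "pending_but_identical": sorted(pending - differing),
        # Differs and is expected to differ.
        "expected": sorted(differing & pending),
    }


def backup_looks_verified(differing, pending):
    """True when the two lists contain exactly the same paths."""
    return set(differing) == set(pending)
```

The "unexplained" bucket is the interesting one: those are candidates for a missed file or for corruption on one side.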

An even easier way would be to use ZFS, snapshots, and compare a snapshot
to your current state (zfs diff).

> Yea, I want to force TM to re-backup, without having to change the file. Not 
> even change the date. Basically, a way to tell time machine "the existing 
> file on the backup properly belongs to an older backup set, but should be 
> replaced anew on the next backup set."

The only system that provides this that I can think of is rsync. I
suppose with rsnapshot you could manually rsync a specific file into
an existing snapshot.

Overall though, the point of a good backup system is that it's
supposed to be automatic. You should periodically verify your backup,
but I'd think this should just be a matter of diffing the backup to
the current state. If anything differs, you can manually determine whether:
- it was properly not backed up
- it differs because it has changed since the last backup, but the last
backup is valid
- the backup file is corrupt
- the computer copy is corrupt

>> You can use the tmutil command to see what differs in any two
>> snapshots or from a snapshot to the current computer state.
>
> True. Now, do you have a way to say "Only tell me if backupd would not want 
> to back this up"? If it's different, but backupd would back it up, then I 
> don't care. Or if backupd would say "This is on the do not backup list", then 
> I don't care.

So again, you want the list of files that are included in the backup,
but haven't changed. I can't think of any easy way to get that list
without scanning the entire disk. And at that point, you might as well
just be comparing the whole backup.



Re: Verifying a time machine backup

2014-12-05 Thread Macs R We

On Dec 5, 2014, at 9:07 PM, Arno Hautala  wrote:

> So, you want a list of files that have already been backed up, that
> haven't changed on the filesystem, so you can verify that the data has
> been correctly backed up. In the ideal case, if performed immediately
> after the backup completes, this would be every file in the backup.
> 
> I think the easiest way to do this would be to just compare the backup
> to the current state (tmutil compare). If the list of differing files
> is the same as the list of files that need to be backed up (collected
> by fsevents), your backup can be considered verified.

Well, that just verifies the table of contents.  I think he wants to verify the 
contents (data), and for that he needs the table of contents as a first step.



Re: Verifying a time machine backup

2014-12-06 Thread Michael

On 2014-12-05, at 10:50 PM, Macs R We  wrote:

> 
> On Dec 5, 2014, at 9:07 PM, Arno Hautala  wrote:
> 
>> So, you want a list of files that have already been backed up, that
>> haven't changed on the filesystem, so you can verify that the data has
>> been correctly backed up. In the ideal case, if performed immediately
>> after the backup completes, this would be every file in the backup.
>> 
>> I think the easiest way to do this would be to just compare the backup
>> to the current state (tmutil compare). If the list of differing files
>> is the same as the list of files that need to be backed up (collected
>> by fsevents), your backup can be considered verified.
> 
> Well, that just verifies the table of contents.  I think he wants to verify 
> the contents (data), and for that he needs the table of contents as a first 
> step.

Correct.

ZFS's checksums can tell me "Hey, the data you tried to read is no good". I 
want to know that before I need to restore from backup.

I can't use ZFS for time machine. I can't control whether the drive's internal 
buffer is error correcting memory or not. I can't control if the disk sector 
was written correctly but has become unreadable.

All I can do is compare what's on the disk with what's on the backup, file by 
file. "tmutil compare" can do that, but it will report too many false positives 
-- everything modified since the last backup will show as different, and 
everything that should not be backed up will show as missing.

To have an automated verification, I need to be able to filter to only those 
files that should be on the backup and have not changed / do not need to be 
backed up again.
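
Once such a filtered list exists, the file-by-file check itself is short. A minimal Python sketch, assuming the list is given as paths relative to both the source root and the backup snapshot root (all names here are hypothetical). Note that it reads through the normal OS cache, so it does not by itself force a re-read from the media.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_backup(source_root, backup_root, relative_paths):
    """Return (missing, mismatched) lists of relative paths.

    missing: not present as a regular file under backup_root.
    mismatched: present in the backup but with different contents.
    """
    missing, mismatched = [], []
    for rel in relative_paths:
        backup_path = os.path.join(backup_root, rel)
        if not os.path.isfile(backup_path):
            missing.append(rel)
        elif file_digest(os.path.join(source_root, rel)) != file_digest(backup_path):
            mismatched.append(rel)
    return missing, mismatched
```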

===

Open Radar: I'm not sure. About half of what I submit is closed as a duplicate, 
and I can never see the originals. I have no clue how to see someone else's bug 
report, nor how to share mine.



Re: Verifying a time machine backup

2014-12-06 Thread Arno Hautala
On Sat, Dec 6, 2014 at 4:31 AM, Michael  wrote:
>
> ZFS's checksums can tell me "Hey, the data you tried to read is no good". I 
> want to know that before I need to restore from backup.

That's what a scrub is for. And verifying the backup against the
current state. Things get easier if the main data is on ZFS as well.

> I can't use ZFS for time machine.

You can. My TimeMachine store is hosted on a FreeNAS box. It's not as
ideal as a directly attached drive; I definitely see more TM errors over
wireless, but I don't think a TimeCapsule would be any better.

> I can't control whether the drive's internal buffer is error correcting 
> memory or not. I can't control if the disk sector was written correctly but 
> has become unreadable.

Don't forget cosmic rays, theft, fire, or flood. Though ZFS is
designed to work around the other issues you mentioned.

> All I can do is compare what's on the disk with what's on the backup, file by 
> file. "tmutil compare" can do that, but it will report too many false 
> positives -- everything modified since the last backup will show as 
> different, and everything that should not be backed up will show as missing.

'compare' doesn't list excluded files.

You're left with files that have changed and I'd think you'd want to
know about those. I can't think of a way to determine that a file has
changed on disk vs. becoming corrupt on disk vs. becoming corrupt in
the backup. fsevents can help narrow it down to the change being in
the backup or on disk, but maybe fsevents missed a file, or a bug in
fsevents has triggered a false positive / negative.

> To have an automated verification, I need to be able to filter to only those 
> files that should be on the backup and have not changed / do not need to be 
> backed up again.

I don't think you can really have an automated verification. You can
get a list of differing files, but you need to inspect the files to
determine if there is corruption or not. And if your verification
occurs after corruption on disk has been backed up, what are you
verifying exactly?

You could do things like writing checksum files and comparing against
those as well as against the backup, but now you're getting into the
territory of implementing your own custom version of a subset of ZFS
or some other modern filesystem.
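
For what it's worth, the checksum-file idea could look roughly like the Python sketch below. It is a hand-rolled illustration, not something Time Machine or ZFS provides: the JSON manifest format and all function names are assumptions, and a mismatch still cannot tell a deliberate edit from corruption.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def save_manifest(manifest, manifest_path):
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def load_manifest(manifest_path):
    with open(manifest_path, encoding="utf-8") as f:
        return json.load(f)


def check_against_manifest(root, manifest):
    """Return {relative_path: problem} for files that are missing under root
    or whose current checksum differs from the recorded one."""
    problems = {}
    for rel, expected in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            problems[rel] = "missing"
        elif sha256_of(path) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

The same manifest can be checked against both the live disk and the backup copy; a file that fails on only one side at least hints at which copy went bad.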

It occurs to me that with OpenZFS [1] it'd be possible to format some
hunk of your internal hard drive as ZFS and put all your personal
files there. Then you can replicate instead of using TimeMachine.

[1]: https://openzfsonosx.org

> Open Radar: I'm not sure. About half of what I submit is closed as a 
> duplicate, and I can never see the originals. I have no clue how to see 
> someone else's bug report, nor how to share mine.

You share yours by posting the same content to openradar.appspot.com

