Data corruption check

2007-09-17 Thread Fabian Cenedese
Hi

I was wondering what happens if a file that is regularly synched but
seldom changes gets corrupted in the copy. As it seldom (or never)
changes the mod time will always be the same. But if the content
changes (bit flip, bad HD...) will rsync get this and synch it again?

Would I need the -c (--checksum) flag for this to work? That of course slows
things down quite a bit. Is this the only way to ensure that the
contents are the same on both sides?

There's also the -I, --ignore-times switch. If I use this but without -c
what method is used for checking then? Or does -I imply -c?

Thanks

bye  Fabi


-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Data corruption check

2007-09-18 Thread Matt McCutchen
On 9/18/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
> I was wondering what happens if a file that is regularly synched but
> seldom changes gets corrupted in the copy.

Are you referring to rsync writing corrupted data to the destination
file or a problem with the destination filesystem or disk causing the
file to read as data different from what was written?

> As it seldom (or never)
> changes the mod time will always be the same. But if the content
> changes (bit flip, bad HD...) will rsync get this and synch it again?

No...

> Would I need the -c (--checksum) flag for this to work? That of course slows
> things down quite a bit. Is this the only way to ensure that the
> contents are the same on both sides?

Yes, yes.  The only way to check whether a bit of the file has flipped
due to HD or filesystem flakiness is to read the entire file, which is
what -c does.
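
To illustrate the point, here is a minimal Python sketch (not rsync's own code) of why a comparison based only on size and modification time cannot see a flipped bit, while a full read can. The function names are made up for this example, and SHA-256 is used as a stand-in for the MD4/MD5 checksums rsync uses.

```python
import hashlib
import os
import tempfile


def quick_check_same(src, dst):
    """Compare only size and mtime (whole seconds); file data is never read."""
    a, b = os.stat(src), os.stat(dst)
    same_mtime = a.st_mtime_ns // 10**9 == b.st_mtime_ns // 10**9
    return a.st_size == b.st_size and same_mtime


def checksum_same(src, dst):
    """Read both files completely and compare their digests."""
    def digest(path):
        h = hashlib.sha256()  # stand-in for rsync's MD4/MD5
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.digest()
    return digest(src) == digest(dst)


def demo():
    """Return (quick_check_result, checksum_result) for a silently corrupted copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        data = bytes(range(256)) * 16
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(data)
        # Flip one bit in the copy, then restore the source's timestamps,
        # imitating corruption that leaves size and mtime untouched.
        corrupted = bytearray(data)
        corrupted[100] ^= 0x01
        with open(dst, "wb") as f:
            f.write(bytes(corrupted))
        st = os.stat(src)
        os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
        return quick_check_same(src, dst), checksum_same(src, dst)
```

Calling demo() exercises both comparisons on the same pair of files: the quick check reports the files as identical, the full read does not.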

> There's also the -I, --ignore-times switch. If I use this but without -c
> what method is used for checking then? Or does -I imply -c?

-I rewrites the destination file no matter what, while -c computes its
MD4 or MD5 checksum first and then rewrites it only if its checksum
differs from that of the source file.  Either option gives the same
end result for the destination file.  However, they may have different
performance (-c uses more disk reading but potentially less disk
writing and slightly less network traffic), and -I logs more transfers
than -c and interferes with --link-dest.
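
As a rough model of the difference, the following Python sketch reduces the three behaviours (default quick check, -I, -c) to a single decision function. It is a simplification for illustration only, not rsync's implementation; the function name and the mode strings are invented here, and it only models whether file data is sent, not attribute updates.

```python
def should_transfer(mode, same_size, same_mtime, same_checksum):
    """Simplified model of when a file's data is transferred.

    mode is "default" (size + mtime quick check), "ignore-times" (-I)
    or "checksum" (-c). Only the data transfer decision is modelled.
    """
    if mode == "ignore-times":
        # -I: never skip a file, regardless of what matches.
        return True
    if mode == "checksum":
        # -c: skip only when size and whole-file checksum both match.
        return not (same_size and same_checksum)
    if mode == "default":
        # Quick check: skip when size and mtime match; data is not read.
        return not (same_size and same_mtime)
    raise ValueError(f"unknown mode: {mode!r}")
```

For a corrupted copy whose size and mtime still match, should_transfer("default", True, True, False) is False, while the "checksum" and "ignore-times" modes both return True, which is the "same end result" described above.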

Matt


Re: Data corruption check

2007-09-19 Thread Fabian Cenedese
At 15:15 18.09.2007 -0400, Matt McCutchen wrote:
>On 9/18/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
>> I was wondering what happens if a file that is regularly synched but
>> seldom changes gets corrupted in the copy.
>
>Are you referring to rsync writing corrupted data to the destination
>file or a problem with the destination filesystem or disk causing the
>file to read as data different from what was written?

I was thinking of any problem, even a transport error.

>> There's also the -I, --ignore-times switch. If I use this but without -c
>> what method is used for checking then? Or does -I imply -c?
>
>-I rewrites the destination file no matter what, while -c computes its
>MD4 or MD5 checksum first and then rewrites it only if its checksum
>differs from that of the source file.  Either option gives the same
>end result for the destination file.  However, they may have different
>performance (-c uses more disk reading but potentially less disk
>writing and slightly less network traffic), and -I logs more transfers
>than -c and interferes with --link-dest.

Thanks for the explanations. That means that -I and -c are not
usable together as they contradict each other, right?

I was asking because I'm responsible for our backups. The
current solution with rsync works nicely. While the RAID storage
also monitors the HD's SMART state I was still wondering
about a way to detect otherwise unknown data corruption.

I guess if I first made a normal rsync and then a rsync --dry-run -c
I could find file differences that shouldn't be (provided there
wasn't any real change otherwise, like in the middle of the night).
Of course that wouldn't tell me what side had changed, but still
something worth considering doing once a month or so...

Thanks for your help

bye  Fabi




RE: Data corruption check

2007-09-19 Thread Tony Abernethy
Fabian Cenedese wrote:
> At 15:15 18.09.2007 -0400, Matt McCutchen wrote:
> >On 9/18/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
> >> I was wondering what happens if a file that is regularly synched but
> >> seldom changes gets corrupted in the copy.
> >
> >Are you referring to rsync writing corrupted data to the destination
> >file or a problem with the destination filesystem or disk causing the
> >file to read as data different from what was written?
> 
> I was thinking of any problem, even a transport error.
> 
> >> There's also the -I, --ignore-times switch. If I use this but without -c
> >> what method is used for checking then? Or does -I imply -c?
> >
> >-I rewrites the destination file no matter what, while -c computes its
> >MD4 or MD5 checksum first and then rewrites it only if its checksum
> >differs from that of the source file.  Either option gives the same
> >end result for the destination file.  However, they may have different
> >performance (-c uses more disk reading but potentially less disk
> >writing and slightly less network traffic), and -I logs more transfers
> >than -c and interferes with --link-dest.
> 
> Thanks for the explanations. That means that -I and -c are not
> usable together as they contradict each other, right?
> 
> I was asking because I'm responsible for our backups. The
> current solution with rsync works nicely. While the RAID storage
> also monitors the HD's SMART state I was still wondering
> about a way to detect otherwise unknown data corruption.
> 
> I guess if I first made a normal rsync and then a rsync --dry-run -c
> I could find file differences that shouldn't be (provided there
> wasn't any real change otherwise, like in the middle of the night).
> Of course that wouldn't tell me what side had changed, but still
> something worth considering doing once a month or so...
> 
> Thanks for your help
> 
> bye  Fabi
> 
Seems like an old sailor's rule:
Have one chronograph or three, never two.
With two, you know something is wrong, but you have no idea what to do.

Disk is cheap.
Thanks for the idea of the rsync --dry-run -c
Methinks it will help a lot of Windows users. 
(even with only two comparands)
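
The "one or three, never two" rule can be made concrete with a small Python sketch. It assumes each copy of a file is available as bytes under some label; the function name and labels are hypothetical. With a strict majority the odd copy can be named, and without one (for example two copies that disagree) the function can only report that no decision is possible.

```python
import hashlib
from collections import Counter


def disagreeing_copies(copies):
    """Return the labels of copies that differ from the majority.

    copies maps a label to the file content (bytes) of that copy.
    Returns None when no strict majority exists, i.e. we know
    something is wrong but cannot tell which copy is bad.
    """
    if not copies:
        return []
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in copies.items()}
    top_digest, top_count = Counter(digests.values()).most_common(1)[0]
    if top_count * 2 <= len(digests):
        return None
    return sorted(name for name, d in digests.items() if d != top_digest)
```

With two differing copies the result is None; adding a third copy that agrees with one of them identifies the other as the suspect.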



Re: Data corruption check

2007-09-19 Thread Matt McCutchen
On 9/19/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
> Thanks for the explanations. That means that -I and -c are not
> usable together as they contradict each other, right?

Correct.  I tested with rsync 2.6.9 and it appears that if you use
both, -c overrides -I.

> I guess if I first made a normal rsync and then a rsync --dry-run -c
> I could find file differences that shouldn't be (provided there
> wasn't any real change otherwise, like in the middle of the night).
> Of course that wouldn't tell me what side had changed, but still
> something worth considering doing once a month or so...

I like this idea.  In fact, I often use "rsync -i --dry-run" (with or
without -c) as a sort of filesystem diff command.
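
A minimal Python wrapper around that kind of invocation might look like the sketch below. The long option names are the documented equivalents of -i, -n and -c. Adding --archive (recursion plus attribute preservation) is an assumption of this sketch, not something stated above, and the function names are hypothetical.

```python
import shutil
import subprocess


def build_compare_command(src, dst, checksum=True):
    """Build an rsync argv that reports differences without changing anything."""
    cmd = ["rsync", "--archive", "--itemize-changes", "--dry-run"]
    if checksum:
        # Compare file contents by checksum instead of size + mtime.
        cmd.append("--checksum")
    cmd += [src, dst]
    return cmd


def changed_entries(src, dst, checksum=True):
    """Run the comparison and return rsync's non-empty output lines."""
    if shutil.which("rsync") is None:
        raise RuntimeError("rsync not found on PATH")
    result = subprocess.run(build_compare_command(src, dst, checksum),
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]
```

For example, changed_entries("/srv/data/", "/backup/data/") (both paths are placeholders) would list the entries rsync considers different. An empty list after a regular sync on a quiet filesystem is the expected result, and anything else deserves a closer look.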

Matt


Re: Data corruption check

2007-09-19 Thread Keith Lofstrom
On 9/18/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
> I was wondering what happens if a file that is regularly synched but
> seldom changes gets corrupted in the copy.

On Wed, Sep 19, 2007 at 09:23:28AM +0200, Fabian Cenedese wrote:
> I was asking because I'm responsible for our backups. The
> current solution with rsync works nicely. While the RAID storage
> also monitors the HD's SMART state I was still wondering
> about a way to detect otherwise unknown data corruption.

I run rsync inside of dirvish (www.dirvish.org) for automated
backups.  I also run osiris (osiris.shmoo.com) which scans for
modified files, both checking the metadata and a hash of the 
actual data, finding all changes relative to a database on a
central osiris server.  It should be possible to combine these
mechanisms, scanning for changes with osiris in parallel on all
the backup clients, then using rsync to move the files that
osiris detected as changed, without rsync having to scan the
whole filesystem again.  Or some combination of both, letting
osiris look in detail at high-vulnerability files daily, then
the rest of the filesystem in sections over the course of a week.
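
The general idea of comparing a filesystem against a stored database of metadata and hashes can be sketched in a few lines of Python. This is not osiris code and does not use its formats or API; the function names and the manifest layout are invented for illustration.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file's full contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Record size, mtime and content hash for every file under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            manifest[os.path.relpath(path, root)] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": file_digest(path),
            }
    return manifest


def changed_paths(baseline, current):
    """Relative paths that were added, removed or modified since baseline."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

A list produced by changed_paths() could in principle be written to a file and handed to rsync's --files-from option, so that rsync only looks at the files already known to have changed.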

The osiris scheduler is weird;  it is designed to be robust against
system vandals, but difficult to configure and especially difficult
to run as part of a larger app.  But a good programmer would be
able to break out the scanning components and tie them into a
different tool.

In any case, if you are worried about file corruption, consider 
running osiris, which will tell you more than you ever wanted to
know about what is changing in your filesystems.

Keith

-- 
Keith Lofstrom  [EMAIL PROTECTED] Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs


Re: Data corruption check

2007-09-20 Thread Fabian Cenedese
At 08:12 19.09.2007 -0700, Keith Lofstrom wrote:
>On 9/18/07, Fabian Cenedese <[EMAIL PROTECTED]> wrote:
>> I was wondering what happens if a file that is regularly synched but
>> seldom changes gets corrupted in the copy.
>
>On Wed, Sep 19, 2007 at 09:23:28AM +0200, Fabian Cenedese wrote:
>> I was asking because I'm responsible for our backups. The
>> current solution with rsync works nicely. While the RAID storage
>> also monitors the HD's SMART state I was still wondering
>> about a way to detect otherwise unknown data corruption.
>
>I run rsync inside of dirvish (www.dirvish.org) for automated
>backups.  I also run osiris (osiris.shmoo.com) which scans for
>modified files, both checking the metadata and a hash of the 
>actual data, finding all changes relative to a database on a
>central osiris server.

We already have a backup system that creates incremental backups
and mirrors both the file server and the backups to a second RAID.
I think osiris is a bit overkill for our application.
But thanks for the info.

bye  Fabi

