On Tue, Mar 11, 2014 at 11:52:51AM -0500, Karl O. Pinc wrote:
> On 03/11/2014 11:02:28 AM, Sig Pam wrote:
> > Hi everybody!
> >
> > I'm currently working on a project which has to copy huge amounts of
> > data from one storage to another. For a reason I cannot validate any
> > longer, there is a rumor that "rsync may silently corrupt data".
> > Personally, I don't believe that.
> >
> > "They" explain it this way: "rsync does an in-stream data
> > deduplication. It creates a checksum for each data block to transfer,
> > and if a block with the same checksum has already been transferred
> > sooner, this old block will be re-used to save bandwidth. But, for
> > any reason, two different blocks can produce the same checksum even
> > if the source data is not the same, effectively corrupting the data
> > stream".
>
> Well, yeah. It works that way if you're transferring data over
> the network.
>
> The question is: "how often will this problem exhibit itself?"
> The answer is: "Usually, never within the lifetime of the Universe."
>

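To put a rough number on "never within the lifetime of the Universe",
here is a back-of-the-envelope sketch using the standard birthday-bound
approximation. It is not how rsync itself decides anything: the function
name collision_probability and the block counts are made up for
illustration, and I'm assuming a full 128-bit strong checksum with random,
non-adversarial data. Real rsync differs in the details (how blocks are
matched, how long the per-block checksum is, plus a whole-file check after
the transfer), so treat this as an order-of-magnitude estimate only.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound estimate that at least two of num_blocks random
    blocks share the same hash_bits-wide checksum.

    Approximation: p ~= 1 - exp(-n * (n - 1) / 2**(bits + 1)).
    Hypothetical helper for illustration; not part of rsync.
    """
    exponent = -num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small,
    # where 1 - exp(x) would round to 0.0.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Assumed workload: about 2**40 blocks against a 128-bit checksum.
    print(f"128-bit, 2**40 blocks: {collision_probability(2**40, 128):.3g}")
    # For contrast: a 32-bit checksum with only 2**16 blocks.
    print(f" 32-bit, 2**16 blocks: {collision_probability(2**16, 32):.3g}")
```

Under these assumptions the 128-bit case comes out far below one in a
trillion, while a 32-bit checksum on its own would collide often even with
a modest number of blocks.
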
If anyone wants a much longer description of how the rsync algorithm
works, there was a talk at the Ottawa Linux Symposium by Andrew Tridgell:
http://www.linuxsymposium.org/2000/rsync.php

I found a recording here:
http://ftp.gnumonks.org/pub/congress-talks/ols2000/high/cd2/2000-07-21_15-02-49_C_64.mp3

If you prefer reading, there is a transcript on SourceForge in LyX format:
http://olstrans.cvs.sourceforge.net/viewvc/olstrans/ols2000/transcripts/completed/OLS2000-rsync.lyx?view=markup

> You're a lot more likely to have data corruption due to a
> cosmic ray hitting your box.
>
> There are some cases where the answer is: "Maybe more often." The only
> time I can think of that you'd want to worry about
> is if you're researching MD5
> checksum collisions and have a lot of data on disk that has
> collisions in the checksumming. In other words,
> if you're actively trying to cause problems it might be an issue.
>
> (The older rsyncs used MD4.)
>
> If you're actually _copying_ data rather than backing it up then
> avoid the issue by not using rsync. Otherwise the tradeoff
> is worth the risk.
>
> Karl <k...@meme.com>
> Free Software: "You don't pay back, you pay forward."
> -- Robert A. Heinlein
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html