I'm not entirely focused on the plausible risk of a hash collision. Broken firmware or a logic error during de-dup/replication is far more likely to destroy your data.
Putting all your eggs in one basket is what I'm against. Putting all your data in a duplicated disk system solution is putting all your eggs in one basket. When TSM duplicates your data (i.e. backs up storage pools), there is no logical connection between your primary storage pool and your copypool. In a replicated/mirrored solution, you have a logical connection (not only a physical one), which creates a risk of wiping out not only your primary storage but also your copypool storage in the same process. In contrast to a replicated/mirrored solution, TSM actually needs to be able to read the logical part of the data, i.e. the files, while a replicated solution with no application awareness doesn't read the logical part, only the bits and bytes.

Regards

Daniel

Daniel Sparrman
Exist i Stockholm AB
Switchboard: 08-754 98 00
Fax: 08-754 97 30
daniel.sparr...@exist.se
http://www.existgruppen.se
Posthusgatan 1 761 30 NORRTÄLJE

-----"Allen S. Rout" <a...@ufl.edu> wrote: -----
To: ADSM: Dist Stor Manager <ADSM-L@vm.marist.edu>
From: "Allen S. Rout" <a...@ufl.edu>
Date: 10/05/2011 14:43
Cc: Daniel Sparrman <daniel.sparr...@exist.se>
Subject: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] Ang: Re: [ADSM-L] vtl versus file systems for pirmary pool

Extensive top-post trail deleted.

On 10/05/2011 02:39 AM, Daniel Sparrman wrote:

> As with the hash conflict, the DD uses SHA-1 with a variable block
> length for deduplication. Theoretically, there is a 2^160 chance it
> will happen. Doesnt seem to be that bad, but your first hash
> collision is randomly more likely to happen than that number
> suggests.

I agree with your technical analysis, and I feel your disquiet. Way back in the '80s, I brought an (8mm :) tape to a meeting with a dept official to say "One chance in a billion means to me that there are five broken files on this tape". The topic then was "should we make copies of these?"

But I feel that you express these numbers in a vacuum, which misleads. The appropriate judgement has to be, not "Is an error possible?", but "How risky is this?"; and that risk has to be compared to the other risks you're taking.

I feel that you are focused on the unpredictably large impact of a collision. "All my backups are gone!" is emotionally accessible to any of us, and makes me shudder. But that scenario is not a plausible result of a hash collision. Not that the reality is peachy: "Some difficult-to-identify set of my files are now corrupt" is quite bad enough, thank you.

A 1/10^30 risk just doesn't have the same emotional availability. But the homeopathic chances of it happening ought to temper the resistance.

I would invoke the analogy of driving your car across the country vs. taking an airplane: many are paralyzed by the risks of air travel, when the actuaries will tell you with great precision that you've a better chance of dying in the drive _to the airport_ than once you've taken off.

Similarly, I'd guess that more DD failures have happened due to physical violence than due to hash collisions.

- Allen S. Rout
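
To put rough numbers on the "more likely than 2^160 suggests" point in the quoted text, here is a minimal sketch of the standard birthday-bound approximation for a 160-bit hash such as SHA-1. The pool size and average chunk size are purely illustrative assumptions, not figures from this thread or from any vendor, and the function name is hypothetical.

```python
import math

HASH_BITS = 160  # SHA-1 digest size in bits


def collision_probability(num_chunks: int, hash_bits: int = HASH_BITS) -> float:
    """Approximate chance that at least two of num_chunks distinct chunks
    share a digest, using the birthday bound 1 - exp(-n(n-1) / (2 * 2^bits)).
    Assumes digests are uniformly distributed (an idealisation)."""
    pairs = num_chunks * (num_chunks - 1) // 2
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-pairs / 2**hash_bits)


# Hypothetical pool: 1 PiB of unique data in 8 KiB average chunks.
POOL_BYTES = 2**50        # assumption, for illustration only
AVG_CHUNK_BYTES = 2**13   # assumption, for illustration only

chunks = POOL_BYTES // AVG_CHUNK_BYTES
print(f"chunks stored: {chunks}")
print(f"any-collision probability: {collision_probability(chunks):.3e}")
```

Under these assumptions the pool holds 2^37 chunks and the chance of any collision comes out at about 2^-87, roughly 6.5e-27. That is far larger than the 1-in-2^160 figure for a single pair, which is the birthday effect, yet still vanishingly small next to firmware, logic, or handling failures.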