Do you really need to resolve the conflicts? It might be easier, and sufficient, to just flag those hashes where a conflict has been detected as: "don't dedup this hash anymore; collisions have been seen."
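
To make the idea concrete, here is a rough sketch of what I have in mind. It is a toy in Python, not the patchset's code: DedupStore, write, no_dedup and the pluggable hash_fn are all names I made up for illustration. It also assumes a collision is detected by comparing the incoming cluster byte-for-byte with the stored one whenever the hash matches, which costs an extra read.

```python
import hashlib


class DedupStore:
    """Toy cluster store; every name here is hypothetical, not QEMU's API."""

    def __init__(self, hash_fn=lambda data: hashlib.sha256(data).digest()):
        self.hash_fn = hash_fn    # pluggable so a test can force collisions
        self.clusters = []        # physical clusters, indexed by position
        self.by_hash = {}         # hash -> index of the cluster that owns it
        self.no_dedup = set()     # hashes on which a collision has been seen

    def _append(self, data):
        self.clusters.append(data)
        return len(self.clusters) - 1

    def write(self, data):
        """Store one cluster and return the physical index it maps to."""
        h = self.hash_fn(data)
        if h in self.no_dedup:
            # A collision was seen on this hash: never dedup on it again.
            return self._append(data)
        idx = self.by_hash.get(h)
        if idx is None:
            # First time this hash is seen: store and remember it.
            idx = self._append(data)
            self.by_hash[h] = idx
            return idx
        if self.clusters[idx] == data:
            # Genuine duplicate: share the existing cluster.
            return idx
        # Same hash, different bytes: flag the hash and store separately.
        self.no_dedup.add(h)
        return self._append(data)
```

With a deliberately weak hash_fn (for example, the first byte of the cluster) a regression test can force a collision cheaply and check that the second cluster is stored separately. The trade-off is that once a hash is flagged, even true duplicates of the original cluster are no longer shared.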

On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.ca...@irqsave.net> wrote:
> On Wednesday 02 Jan 2013 at 12:26:37 (-0600), Troy Benjegerdes wrote:
>> The probability may be 'low' but it is not zero. Just because it's
>> hard to calculate the hash doesn't mean you can't do it. If your
>> input data is not random the probability of a hash collision is
>> going to get skewed.
>>
>> Read about how Bitcoin uses hashes.
>>
>> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
>> and I can make a regression test that will create deduplication hash
>> collisions on purpose.
>
> It's not a problem: as Eric pointed out while reviewing the previous
> patchset, there is a small place left with zeroes on the deduplication
> block. A bit could be set on it when a collision is detected and an offset
> could point to a cluster used to resolve collisions.
>
>>
>>
>> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
>> > > How does this code handle hash collisions, and do you have some
>> > > regression tests that purposefully create a dedup hash collision,
>> > > and verify that the 'right thing' happens?
>> >
>> > The two hash functions that can be used are cryptographic and not
>> > broken yet, so nobody knows how to generate a collision.
>> >
>> > You can do the math to calculate the probability of collision using a
>> > 256-bit hash while processing 1 EiB of data: the result is so low you
>> > can consider it won't happen.
>> > The sha256 ZFS deduplication works the same way regarding collisions.
>> >
>> > I currently use qemu-io-test for testing purposes and iozone with the
>> > -w flag in the guest.
>> > I would like to find a good deduplication stress test to run in a guest.
>> >
>> > Regards
>> >
>> > Benoît
>> >
>> > > It's great that this almost works, but it seems rather dangerous to put
>> > > something like this into the mainline code without some regression tests.
>> > >
>> > > (I'm also suspecting the regression test will be a great way to find
>> > > flakey hardware)
>> > >
>> > > --------------------------------------------------------------------------
>> > > Troy Benjegerdes 'da hozer'
>> > > ho...@hozed.org
>> > >
>> > > Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
>> > > software & hardware (http://q3u.be) stuff and not get a real job.
>> > > Charles Shultz had the best answer:
>> > >
>> > > "Why do musicians compose symphonies and poets write poems? They do it
>> > > because life wouldn't have any meaning for them if they didn't. That's
>> > > why I draw cartoons. It's my life." -- Charles Shultz