> How does this code handle hash collisions, and do you have some regression
> tests that purposefully create a dedup hash collision, and verify that the
> 'right thing' happens?

The two hash functions that can be used are both cryptographic and neither
has been broken yet, so nobody knows how to generate a collision on purpose.
You can do the math to calculate the probability of a collision with a
256-bit hash while processing 1 EiB of data: the result is so low that you
can consider it won't happen.
The sha256 ZFS deduplication works the same way regarding collisions.

I currently use qemu-io-test for testing purposes, and iozone with the -w flag
in the guest.
I would like to find a good deduplication stress test to run in a guest.

Regards

Benoît

> It's great that this almost works, but it seems rather dangerous to put
> something like this into the mainline code without some regression tests.
>
> (I'm also suspecting the regression test will be a great way to find
> flakey hardware)
>
> --------------------------------------------------------------------------
> Troy Benjegerdes 'da hozer' ho...@hozed.org
>
> Somone asked my why I work on this free (http://www.fsf.org/philosophy/)
> software & hardware (http://q3u.be) stuff and not get a real job.
> Charles Shultz had the best answer:
>
> "Why do musicians compose symphonies and poets write poems? They do it
> because life wouldn't have any meaning for them if they didn't. That's why
> I draw cartoons. It's my life." -- Charles Shultz
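
For reference, the collision estimate mentioned in the reply above can be sketched with the birthday (union) bound: with n hashed blocks and a b-bit hash, the chance of any collision is at most n*(n-1)/2 / 2^b. This is only a rough sketch. It assumes the hash behaves like a uniform random function, and the block sizes (4 KiB and 64 KiB) are assumptions chosen for illustration, not values taken from the patch series. The function name is hypothetical.

```python
import math

def collision_log2_bound(total_bytes: int, block_size: int, hash_bits: int = 256) -> float:
    """Return log2 of the birthday upper bound n*(n-1)/2 / 2**hash_bits.

    Assumes an ideal hash (uniform, independent outputs); block_size is
    an illustrative assumption, not a value from the dedup patches.
    """
    n = total_bytes // block_size      # number of blocks hashed
    pairs = n * (n - 1) // 2           # number of distinct block pairs
    return math.log2(pairs) - hash_bits

EIB = 2 ** 60  # 1 EiB in bytes

for block_size in (4096, 65536):
    exponent = collision_log2_bound(EIB, block_size)
    print(f"block size {block_size}: P(collision) <= 2^{exponent:.0f}")
```

Under these assumptions the bound is about 2^-161 for 4 KiB blocks and 2^-169 for 64 KiB blocks.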