On Wednesday 05 December 2007, [EMAIL PROTECTED] wrote:
> You'd think that using this technology on a live filesystem could
> incur a significant performance penalty due to all those calculations
> (a FUSE module, anyone?). Imagine a hardware-optimized data
> de-duplication disk controller, similar to XOR-optimized RAID CPUs.
> Now that would be cool. All it would need to store is metadata when
> it had already seen the exact same block. I think fundamentally it is
> similar in result to on-the-fly disk compression.
Actually, the impact - if the filesystem is designed correctly -
shouldn't be that horrible. After all, Sun has managed to integrate
checksums into ZFS and still get great performance. In addition, ZFS
never overwrites data in place; it is copy-on-write and allocates a new
data block for each write.

What you would have to do then is keep a lookup table keyed by checksum
so you can find possible matches quickly. When you find one, do a full
byte-for-byte compare to be 100% sure the match isn't a checksum
collision. If the blocks really are identical, you can simply reference
the existing data block instead of writing a new one. It is still a lot
of work, but as Sun showed, on-the-fly checksums and compares are
doable without too much of a performance hit.

Peter.
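
P.S. Here is a minimal sketch of that lookup-and-verify idea in
Python, just to make it concrete. The checksum choice (SHA-256) and
all the names (DedupStore, write, blocks, refs) are mine for
illustration only - this is not how ZFS or any real deduplicating
filesystem actually implements it.

    import hashlib

    class DedupStore:
        """Toy block store: keeps one copy per unique block."""

        def __init__(self):
            self.blocks = {}   # checksum -> block contents
            self.refs = {}     # checksum -> reference count

        def write(self, data: bytes) -> str:
            key = hashlib.sha256(data).hexdigest()
            existing = self.blocks.get(key)   # fast lookup by checksum
            if existing is not None:
                # Checksums match; byte-compare to rule out a collision.
                if existing == data:
                    self.refs[key] += 1       # just reference the old block
                    return key
                raise RuntimeError("checksum collision")  # vanishingly rare
            # No match: store the block as new.
            self.blocks[key] = data
            self.refs[key] = 1
            return key

    store = DedupStore()
    k1 = store.write(b"A" * 4096)
    k2 = store.write(b"A" * 4096)   # duplicate: only referenced, not stored
    assert k1 == k2 and len(store.blocks) == 1

The point of the final byte-for-byte compare is exactly the collision
concern above: the checksum only narrows the search down to candidate
blocks, and the full compare is what makes referencing safe.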