Maybe a happy path exists, between efficiency and correctness ;) I think the Rsync is probably a good comparison to the use case at hand (it identifies identical blocks between the source and target, and only sends deltas of the wire). It combines a quick has to identify candidates that might be duplicates, but relies on comparison to ensure that the match is real, and not just a hash collision.
See the source of all knowledge: http://en.wikipedia.org/wiki/Rsync#Algorithm On Sat, Mar 10, 2012 at 1:15 PM, Maru Newby <mne...@internap.com> wrote: > Hi Joe, > > There's one huge difference between page deduplication and object > deduplication: Page size is small and predictable, whereas object size is > not. Given this, full compares would not be a good way to implement > performant object deduplication in swift. > > Thanks, > > > Maru > > > On 2012-03-10, at 9:57 AM, Joe Gordon wrote: > > Paulo, Caitlin, > > > Can SHA-1 collisions be generated? If so can you point me to the article? > > Also why compare hashes in the first place? Linux 'Kenel Samepage Merging', > which does page deduplication for KVM, does a full compare to be safe [1]. > Even if collisions can't be generated, what are the odds of a collision > (for SHA-1 and SHA-256) happening by chance when using Swift at scale? > > > best, > Joe Gordon > > > > > [1] http://www.linux-kvm.com/sites/default/files/KvmForum2008_KSM.pdf > > > On Fri, Mar 9, 2012 at 4:44 PM, Caitlin Bestler > <caitlin.best...@nexenta.com> wrote: >> >> Paulo, >> >> >> >> I believe you’ll find that we’re thinking along the same lines. Please >> review my proposal at http://etherpad.openstack.org/P9MMYSWE6U >> >> >> >> One quick observation is that SHA-1 is totally inadequate for >> fingerprinting objects in a public object store. An attacker could easily >> >> predict the fingerprint of content likely to be posted, generate alternate >> content that had the same SHA-1 fingerprint and pre-empt >> >> the signature. For example: an ISO of an open source OS distribution. If I >> get my false content with the same fingerprint into the >> >> repository first then everyone who downloads that ISO will get my altered >> copy. >> >> >> >> SHA-256 is really needed to make this type of attack infeasible. >> >> >> >> I also think that distributed deduplication works very well with object >> versioning. Your comments on the proposal cited above >> >> would be great to hear. >> >> >> >> From: openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net >> [mailto:openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net] >> On Behalf Of Paulo Ricardo Motta Gomes >> Sent: Thursday, March 08, 2012 1:19 PM >> To: openstack@lists.launchpad.net >> >> >> Subject: [Openstack] Enabling data deduplication on Swift >> >> >> >> Hello everyone, >> >> >> >> I'm a student of the European Master in Distributed Computing (EMDC) >> currently working on my master thesis on distributed content-addressable >> storage/deduplication. >> >> >> >> I'm happy to announce I will be contributing the outcome of my thesis work >> to OpenStack by enabling both object-level and block-level deduplication >> functionality on Swift >> (https://answers.launchpad.net/swift/+question/156862). >> >> >> >> I have written a detailed blog post where I describe the initial >> architecture of my >> solution: http://paulormg.com/2012/03/05/enabling-deduplication-in-a-distributed-object-storage/ >> >> >> >> Feedback from the OpenStack/Swift community would be very appreciated. >> >> >> >> Cheers, >> >> >> >> Paulo >> >> >> >> -- >> European Master in Distributed Computing - www.kth.se/emdc >> Royal Institute of Technology - KTH >> >> Instituto Superior Técnico - IST >> >> http://paulormg.com >> >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~openstack >> Post to : openstack@lists.launchpad.net >> Unsubscribe : https://launchpad.net/~openstack >> More help : https://help.launchpad.net/ListHelp >> > > _______________________________________________ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > > > > _______________________________________________ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp