Hi Joe,

There's one huge difference between page deduplication and object deduplication: page size is small and predictable, whereas object size is not. Given this, full compares would not be a good way to implement performant object deduplication in Swift, since every candidate match would mean reading both objects in full.
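To make the trade-off concrete, here is a rough sketch in Python. It is not Swift code: `fingerprint`, `DedupIndex` and `accidental_collision_probability` are hypothetical names invented for illustration, and the in-memory dict stands in for whatever index a real implementation would use. It shows an object being read once to compute a SHA-256 fingerprint, a lookup that never touches the stored copy, and a birthday-bound estimate of the chance of an accidental collision. The estimate says nothing about deliberately constructed collisions, which is the separate concern Caitlin raised.

```python
import hashlib
import io
import math


def fingerprint(stream, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file-like object, read in chunks."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so memory use does not grow with object size.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


class DedupIndex:
    """Hypothetical in-memory index: fingerprint -> name of the stored copy."""

    def __init__(self):
        self._by_fingerprint = {}

    def put(self, name, stream):
        """Register `name`; return the name of the copy that holds the data."""
        fp = fingerprint(stream)
        # setdefault keeps the first name seen for a given fingerprint, so a
        # duplicate upload resolves to the existing copy without re-reading it.
        return self._by_fingerprint.setdefault(fp, name)


def accidental_collision_probability(num_objects, hash_bits):
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_objects * (num_objects - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    index = DedupIndex()
    print(index.put("first.iso", io.BytesIO(b"x" * 200000)))
    print(index.put("second.iso", io.BytesIO(b"x" * 200000)))
    for bits in (160, 256):  # SHA-1 and SHA-256 digest sizes
        print(bits, accidental_collision_probability(10**12, bits))
```

The object count of 10**12 is an arbitrary illustration, not a measured Swift deployment size.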
Thanks,

Maru

On 2012-03-10, at 9:57 AM, Joe Gordon wrote:

> Paulo, Caitlin,
>
> Can SHA-1 collisions be generated? If so, can you point me to the article?
>
> Also, why compare hashes in the first place? Linux 'Kernel Samepage Merging', which does page deduplication for KVM, does a full compare to be safe [1]. Even if collisions can't be generated, what are the odds of a collision (for SHA-1 and SHA-256) happening by chance when using Swift at scale?
>
> best,
> Joe Gordon
>
> [1] http://www.linux-kvm.com/sites/default/files/KvmForum2008_KSM.pdf
>
> On Fri, Mar 9, 2012 at 4:44 PM, Caitlin Bestler <caitlin.best...@nexenta.com> wrote:
>
> Paulo,
>
> I believe you’ll find that we’re thinking along the same lines. Please review my proposal at http://etherpad.openstack.org/P9MMYSWE6U
>
> One quick observation is that SHA-1 is totally inadequate for fingerprinting objects in a public object store. An attacker could easily predict the fingerprint of content likely to be posted, generate alternate content that had the same SHA-1 fingerprint and pre-empt the signature. For example: an ISO of an open source OS distribution. If I get my false content with the same fingerprint into the repository first then everyone who downloads that ISO will get my altered copy.
>
> SHA-256 is really needed to make this type of attack infeasible.
>
> I also think that distributed deduplication works very well with object versioning. Your comments on the proposal cited above would be great to hear.
>
> From: openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net [mailto:openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net] On Behalf Of Paulo Ricardo Motta Gomes
> Sent: Thursday, March 08, 2012 1:19 PM
> To: openstack@lists.launchpad.net
> Subject: [Openstack] Enabling data deduplication on Swift
>
> Hello everyone,
>
> I'm a student of the European Master in Distributed Computing (EMDC) currently working on my master thesis on distributed content-addressable storage/deduplication.
>
> I'm happy to announce I will be contributing the outcome of my thesis work to OpenStack by enabling both object-level and block-level deduplication functionality on Swift (https://answers.launchpad.net/swift/+question/156862).
>
> I have written a detailed blog post where I describe the initial architecture of my solution:
> http://paulormg.com/2012/03/05/enabling-deduplication-in-a-distributed-object-storage/
>
> Feedback from the OpenStack/Swift community would be very appreciated.
>
> Cheers,
>
> Paulo
>
> --
> European Master in Distributed Computing - www.kth.se/emdc
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp