Hi Joe,

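There's one big difference between page deduplication and object
deduplication: page size is small and predictable, whereas object size is not.
A full compare of two pages costs a small, fixed amount of work; a full compare
of two objects means reading both in their entirety, however large they are.
Given this, full compares would not be a good way to implement performant
object deduplication in Swift.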
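
To make the contrast concrete, here is a rough Python sketch. It is not Swift
code: the function names and the chunk size are made up for illustration, and
it assumes file-like streams that return full chunks until end of data (as
io.BytesIO does). The fingerprint is computed once per object and compared as
a short fixed-size string, while the full compare has to read both objects
every time.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # hypothetical read size, chosen only for illustration


def fingerprint(stream, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 hex digest of a stream, reading it chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def full_compare(stream_a, stream_b, chunk_size=CHUNK_SIZE):
    """Byte-for-byte comparison; the work grows with the size of the objects."""
    while True:
        a = stream_a.read(chunk_size)
        b = stream_b.read(chunk_size)
        if a != b:
            return False   # contents (or lengths) differ
        if not a:
            return True    # both streams ended at the same point


if __name__ == "__main__":
    first = io.BytesIO(b"x" * 200_000)
    second = io.BytesIO(b"x" * 200_000)

    # Fingerprints are 64 hex characters regardless of object size.
    same_fingerprint = fingerprint(first) == fingerprint(second)

    # The full compare must re-read both objects from the start.
    first.seek(0)
    second.seek(0)
    same_bytes = full_compare(first, second)

    print(same_fingerprint, same_bytes)
```

With fingerprints, a duplicate check can be a lookup of a 64-character key;
with full compares, every candidate match means another complete read of both
objects.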

Thanks,


Maru

On 2012-03-10, at 9:57 AM, Joe Gordon wrote:

> Paulo, Caitlin, 
> 
> 
> Can SHA-1 collisions be generated?  If so, can you point me to the article? 
> 
> Also, why compare hashes in the first place?  Linux 'Kernel Samepage Merging', 
> which does page deduplication for KVM, does a full compare to be safe [1].  
> Even if collisions can't be generated, what are the odds of a collision (for 
> SHA-1 and SHA-256) happening by chance when using Swift at scale?  
> 
> 
> best,
> Joe Gordon
> 
> 
> 
> [1] http://www.linux-kvm.com/sites/default/files/KvmForum2008_KSM.pdf
> 
> 
> On Fri, Mar 9, 2012 at 4:44 PM, Caitlin Bestler <caitlin.best...@nexenta.com> 
> wrote:
> Paulo,
> 
>  
> 
> I believe you’ll find that we’re thinking along the same lines. Please review 
> my proposal at http://etherpad.openstack.org/P9MMYSWE6U
> 
>  
> 
> One quick observation is that SHA-1 is totally inadequate for fingerprinting 
> objects in a public object store. An attacker could easily predict the 
> fingerprint of content likely to be posted, generate alternate content that 
> had the same SHA-1 fingerprint and pre-empt the signature. For example: an ISO 
> of an open source OS distribution. If I get my false content with the same 
> fingerprint into the repository first then everyone who downloads that ISO 
> will get my altered copy.
> 
> 
>  
> 
> SHA-256 is really needed to make this type of attack infeasible.
> 
>  
> 
> I also think that distributed deduplication works very well with object 
> versioning. Your comments on the proposal cited above would be great to hear.
> 
>  
> 
> From: openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net 
> [mailto:openstack-bounces+caitlin.bestler=nexenta....@lists.launchpad.net] On 
> Behalf Of Paulo Ricardo Motta Gomes
> Sent: Thursday, March 08, 2012 1:19 PM
> To: openstack@lists.launchpad.net
> 
> 
> Subject: [Openstack] Enabling data deduplication on Swift
> 
>  
> 
> Hello everyone,
> 
>  
> 
> I'm a student in the European Master in Distributed Computing (EMDC) program, 
> currently working on my master's thesis on distributed content-addressable 
> storage/deduplication.
> 
>  
> 
> I'm happy to announce I will be contributing the outcome of my thesis work to 
> OpenStack by enabling both object-level and block-level deduplication 
> functionality on Swift (https://answers.launchpad.net/swift/+question/156862).
> 
>  
> 
> I have written a detailed blog post where I describe the initial architecture 
> of my solution: 
> http://paulormg.com/2012/03/05/enabling-deduplication-in-a-distributed-object-storage/
> 
>  
> 
> Feedback from the OpenStack/Swift community would be much appreciated.
> 
>  
> 
> Cheers,
> 
>  
> 
> Paulo
> 
>  
> 
> -- 
> European Master in Distributed Computing - www.kth.se/emdc
> Royal Institute of Technology - KTH
> 
> Instituto Superior Técnico - IST
> 
> http://paulormg.com
> 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> 

