On Wed, Oct 3, 2012 at 9:19 AM, Dr Adam Back <a...@cypherspace.org> wrote:
> Incidentally, a somewhat related problem with dedup (probably more in cloud
> storage than local dedup of storage) is that the dedup function itself can
> lead to the "confirmation" or even "decryption" of documents with
> sufficiently low entropy, as the attacker can induce you to "store" or
> directly query the dedup service looking for all possible documents.  E.g.,
> say a form letter where the only blanks to fill in are the name (known or
> suspected) and a figure (<1,000,000 possible values).
>
> Also, if there is encryption, there are privacy and security leaks arising
> from doing dedup based on plaintext.
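The confirmation attack Adam describes is easy to sketch. A minimal illustration, assuming the dedup token is an unkeyed SHA-256 over the plaintext block; the form-letter template, the name, and the figure are all made up for the example:

```python
import hashlib

# Hypothetical form letter: the attacker knows everything but the figure.
TEMPLATE = "Dear {name}, your settlement amount is ${amount}."

def dedup_token(block: bytes) -> bytes:
    # Unkeyed dedup token: a plain hash of the plaintext block.
    return hashlib.sha256(block).digest()

# The victim stores one filled-in letter; the attacker observes its token
# (e.g., by watching which uploads the dedup service declines as duplicates).
victim_secret = TEMPLATE.format(name="Alice", amount=250000).encode()
observed_tokens = {dedup_token(victim_secret)}

def recover_amount(name, observed):
    # Brute force: enumerate every candidate figure and check for a dedup hit.
    for amount in range(1_000_000):
        candidate = TEMPLATE.format(name=name, amount=amount).encode()
        if dedup_token(candidate) in observed:
            return amount
    return None

print(recover_amount("Alice", observed_tokens))  # recovers 250000
```

The loop is under a million short hashes, so "decrypting" the blank takes seconds on commodity hardware; this is why low-entropy documents are the worry.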
Compression at lower layers tends to leak.  We've seen this in VoIP, and
now in CRIME.  Dedup is a compression function running at a lower layer
(i.e., lower than the application writing the file contents).  Of course,
dedup is not a compression function that can easily be applied at the
application layer, so if you really need dedup, then you need it at a
lower layer.  The question is: do you need dedup and confidentiality
protection for the same data?  I think most would answer "no".

> And if you are doing dedup on ciphertext (or the data is not encrypted), you
> could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
> fact I would suggest for encrypted data, you really NEED to base dedup on
> MACs and NOT hashes, or you leak and risk brute-force "decryption" of
> plaintext by hash brute-forcing the non-encrypted dedup tokens.

Encrypted ZFS hashes and authenticates ciphertext.  The attacker is
presumed to observe all on-disk data, including ciphertext and block
pointers (which contain authentication tags and hashes).  The attacker
can observe dups just as ZFS can, and can attempt passive and active
attacks.  Dedup certainly adds to the attacker's traffic-analysis
capabilities, but also to the attacker's active-attack capabilities
(e.g., if the attacker can mount a chosen-plaintext attack).  Note that
encrypted ZFS can only dedup within sets of datasets that share the same
keys.

What difference does it make whether dedup uses an authentication tag or
a hash of ciphertext?  None, assuming no collisions; and if dups are
verified, then collisions make little difference as far as dedup is
concerned.

I think the harm is done first by compressing at a layer lower than the
application: encryption can be done at lower layers, but compression is
best left to the application layer.

Nico
--
_______________________________________________
cryptography mailing list
cryptography@randombit.net
http://lists.randombit.net/mailman/listinfo/cryptography
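The MAC-based dedup tokens the quoted paragraph argues for can be sketched in a few lines. This uses HMAC-SHA256 rather than HMAC-SHA1 or an AES-MAC, and the per-dataset key handling is purely illustrative, but it shows the property that matters: writers sharing the key still dedup identical blocks, while an attacker without the key cannot confirm a guessed plaintext from the stored tokens:

```python
import hashlib
import hmac
import os

# Hypothetical per-dataset secret key; only holders of this key can
# compute (and therefore brute-force against) dedup tokens.
DEDUP_KEY = os.urandom(32)

def keyed_dedup_token(block: bytes) -> bytes:
    # Keyed dedup token: HMAC-SHA256 over the block under the dataset key.
    return hmac.new(DEDUP_KEY, block, hashlib.sha256).digest()

# Two writers sharing the key still produce equal tokens for equal blocks,
# so dedup works within the key-sharing set of datasets...
assert keyed_dedup_token(b"same block") == keyed_dedup_token(b"same block")

# ...but an attacker without the key cannot test a guess: hashing the
# candidate plaintext does not reproduce the stored token.
guess_token = hashlib.sha256(b"same block").digest()
assert guess_token != keyed_dedup_token(b"same block")
```

Compare this with an unkeyed hash, where the confirmation attack above needs nothing beyond the observed tokens themselves.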