hedging our bets -- in case SHA-256 turns out to be insecure

Zooko Wilcox-O'Hearn Sun, 08 Nov 2009 10:08:57 -0800

Folks:

We're going to be deploying a new crypto scheme in Tahoe-LAFS next year -- the year 2010. Tahoe-LAFS is used for long-term storage, and I won't be surprised if people store files on Tahoe-LAFS in 2010 and then rely on the confidentiality and integrity of those files for many years or even decades to come. (People started storing files on Tahoe-LAFS in 2008 and so far they show no signs of losing interest in the integrity and confidentiality of those files.)

This long-term focus makes Tahoe-LAFS's job harder than the job of protecting transient network packets. If someone figures out in 2020 or 2030 how to spoof a network transaction that you sent in 2010 (see [1]), it'll be far too late to do you any harm, but if they figure out in 2030 how to alter a file that you uploaded to a Tahoe-LAFS grid in 2010, that might harm you.

Therefore I've been thinking about how to make Tahoe-LAFS robust against the possibility that SHA-256 will turn out to be insecure.

A very good way to do this is to design Tahoe-LAFS so that it relies as little as possible on SHA-256's security properties. The property that seems to be the hardest for a secure hash function to provide is collision-resistance. We are analyzing new crypto schemes to see how many security properties of Tahoe-LAFS we can continue to guarantee even if the collision-resistance of the underlying secure hash function fails, and similarly for the other properties of the secure hash function which might fail [2].

This note is not about that design process, though, but about how to maximize the chance that the underlying hash function does provide the desired security properties.

We could use a different hash function than SHA-256 -- there are many alternatives. SHA-512 would probably be safer, but it is extremely expensive on the cheap, low-power 32-bit ARM CPUs that are one of our design targets [3], and the output size of 512 bits is too large to fit into Tahoe-LAFS capabilities. There are fourteen candidates left in the SHA-3 contest at the moment. Several of them have conservative designs and good performance, but there is always the risk that they will be found to have catastrophic design flaws or that a great advance in hash function cryptanalysis will suddenly show how to crack them. Of course, a similar risk applies to SHA-256!

So I turn to the question of how to combine multiple hash functions to build a hash function which is secure even if one or more of the underlying hash functions turns out to be weak.

I've read several interesting papers on the subject -- such as [4, 5] and especially "Robust Multi-Property Combiners for Hash Functions Revisited" by Marc Fischlin, Anja Lehmann, and Krzysztof Pietrzak [6]. The good news is that it turns out to be doable! The latter two papers show nice strong theoretical results -- ways to combine hash functions so that the resulting combination is as strong as or stronger than the two underlying hash functions. The bad news is that the proposal in [6] builds a combined function whose output is twice the size of the output of a single hash function. There is a good theoretical reason for this [4], but it won't work for our practical engineering requirements -- we need hash function outputs as small as possible (partially due to usability issues).

The other bad news is that the construction proposed in [6] is complicated, underspecified, and for the strongest version of it, it imposes a limit on the length of the inputs that you can feed to your hash function. It grows to such complexity and incurs such limitations because it is, if I may call it this, "too theoretical". It is designed to guarantee certain output properties predicated on minimal theoretical assumptions about the properties of the underlying hash functions. This is a fine goal, but in practice we don't want to pay such a high cost in complexity and performance in order to gain such abstract improvement. We should be able to "hedge our bets" and achieve a comfortable margin of safety with a very simple and efficient scheme by making stronger, less formal, but very plausible assumptions about the underlying hash functions. Read on.

I propose the following combined hash function C, built out of two hash functions H1 and H2:


C(x) = H1(H1(x) || H2(x))

The first observation is that if H1 is collision-resistant then so is C. In practice I would expect to use SHA-256 for H1, so the resulting combiner C[SHA-256, H2] will be at least as strong as SHA-256. (One could even think of this combiner C as just being a tricky way to strengthen SHA-256 by using the output of H2(x) as a randomized salt -- see [7].)
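A minimal Python sketch of the combiner, for illustration only. The post does not pick H2, so BLAKE2s from the standard hashlib module is used here purely as a stand-in; the helper names (combine, sha256, blake2s) are hypothetical.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]


def sha256(data: bytes) -> bytes:
    """H1 in this sketch: SHA-256, as the post suggests."""
    return hashlib.sha256(data).digest()


def blake2s(data: bytes) -> bytes:
    """H2 in this sketch: an assumed stand-in, not a choice made in the post."""
    return hashlib.blake2s(data).digest()


def combine(h1: HashFn, h2: HashFn) -> HashFn:
    """Build C[h1, h2], where C(x) = h1(h1(x) || h2(x))."""
    def c(x: bytes) -> bytes:
        # Inner invocations hash the full message; the outer h1 hashes
        # only the concatenation of the two inner digests.
        return h1(h1(x) + h2(x))
    return c


C = combine(sha256, blake2s)
digest = C(b"hello")
print(len(digest))  # output has the same size as a single H1 digest
```

Note that the output of C is no larger than the output of H1, which matters for the requirement that hash outputs stay as small as possible.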

The next observation is that finding a pair of inputs x1, x2 which collide in *both* H1 and in H2 is likely to be much harder than finding one pair of inputs that collide in H1 and, separately, another pair of inputs that collide in H2 (see [5]).

Now the reason that a combiner like this one is not published in theoretical crypto literature is that it obviously could fail if the outer hash function H1 fails. For example, even if H2 is collision-resistant, if H1 turns out to be susceptible to collisions, then theoretically speaking C[H1, H2] might be susceptible to collisions. However, in real life C[H1, H2] would most likely still be collision resistant!
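The two ways a collision in C can arise can be made concrete with a toy experiment. The sketch below deliberately truncates both hashes to one byte so that collisions are trivial to find by brute force; the truncation, the use of BLAKE2s as H2, and all function names are assumptions made only for illustration. Any collision C(x1) = C(x2) with x1 != x2 falls into exactly one of two cases: the inner digests agree (a simultaneous collision in H1 and H2), or they differ and the outer H1 collides on them.

```python
import hashlib
from itertools import count


def h1(data: bytes) -> bytes:
    # Toy H1: SHA-256 truncated to 1 byte, so it is trivially breakable.
    return hashlib.sha256(data).digest()[:1]


def h2(data: bytes) -> bytes:
    # Toy H2: BLAKE2s truncated to 1 byte (illustrative stand-in).
    return hashlib.blake2s(data).digest()[:1]


def inner(x: bytes) -> bytes:
    """The value fed to the outer invocation: H1(x) || H2(x)."""
    return h1(x) + h2(x)


def c(x: bytes) -> bytes:
    """Toy combiner C(x) = H1(H1(x) || H2(x))."""
    return h1(inner(x))


def find_collision() -> tuple[bytes, bytes]:
    """Brute-force two distinct inputs with the same toy C digest."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        x = str(i).encode()
        d = c(x)
        if d in seen:
            return seen[d], x
        seen[d] = x


def classify(x1: bytes, x2: bytes) -> str:
    """Say which of the two possible failure cases a C-collision is."""
    assert x1 != x2 and c(x1) == c(x2)
    if inner(x1) == inner(x2):
        return "simultaneous collision in H1 and H2"
    return "collision in the outer H1"


x1, x2 = find_collision()
print(classify(x1, x2))
```

With full-size hashes the same case split applies; the argument in the post is about how hard each case is in practice.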

All practical attacks on real hash functions so far (if I understand correctly) are multi-block attacks in which the attacker is able to feed a sufficiently long and unconstrained input to the hash functions that the effects of the later parts of his inputs are able to manipulate the state generated by the earlier parts of his inputs. My combiner C uses H1 in its outer invocation on a single-block-sized input, which means no such multi-block attacks are possible on the outer invocation. In addition, the inputs that the attacker gets to feed to the outer invocation of H1 are highly constrained. Basically, he would already have to be very good at manipulating the inner invocations H1 and H2 in ways that he isn't supposed to before he can even begin to manipulate the outer invocation of H1.
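As a small check of the "single-block-sized input" claim, the sketch below measures what the outer invocation actually receives, assuming SHA-256 for H1 and (as an illustrative stand-in not named in the post) BLAKE2s for H2. The function name outer_input is hypothetical.

```python
import hashlib


def outer_input(x: bytes) -> bytes:
    """The bytes fed to the outer H1: H1(x) || H2(x)."""
    return hashlib.sha256(x).digest() + hashlib.blake2s(x).digest()


# The outer input has a fixed length no matter how long the message is.
lengths = {n: len(outer_input(b"a" * n)) for n in (0, 1, 1000, 1_000_000)}
print(lengths)

# SHA-256 processes its input in blocks of this many bytes.
block_size = hashlib.sha256().block_size
print(block_size)
```

With these choices the outer input is always 64 bytes, which equals SHA-256's block size. SHA-256's length padding then occupies a further block, but that block contains no attacker-chosen bytes.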

A measure of the practical security of a combiner like this one would be "how safe would it be if it were instantiated using broken practical hash functions such as MD5 and SHA1?". It appears to me (from an admittedly cursory analysis) that there is no realistic way to find collisions in C[MD5, SHA1] even though there are realistic ways to find collisions in MD5 and in SHA1. Of course, I'm not proposing to use C[MD5, SHA1]! I'm proposing to use C[SHA-256, _] where _ is some other hash function which is believed to be strong. The example of instantiating C with MD5 and SHA1 just goes to show that C is a hash function which is stronger than either of its two underlying hash functions.
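For concreteness, the thought experiment C[MD5, SHA1] could be written as below. This is only an illustration of the instantiation discussed above, not something to deploy; the function names are hypothetical.

```python
import hashlib


def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()    # 16-byte digest


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()   # 20-byte digest


def c_md5_sha1(x: bytes) -> bytes:
    """C[MD5, SHA1](x) = MD5(MD5(x) || SHA1(x)). Illustration only."""
    return md5(md5(x) + sha1(x))


msg = b"example message"
outer_len = len(md5(msg) + sha1(msg))   # bytes seen by the outer MD5
digest = c_md5_sha1(msg)
print(outer_len, len(digest))
```

Here the outer MD5 only ever sees 36 bytes, all of which are digests of the message rather than bytes the attacker can choose freely.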

The other desirable security properties such as second-preimage resistance and pre-image resistance seem to follow the same pattern as collision-resistance -- C[H1, H2] seems to be much stronger than H1 or H2 alone.


Regards,

Zooko

[1] http://extendedsubset.com/Renegotiating_TLS.pdf
[2] http://allmydata.org/trac/tahoe/wiki/NewCaps/WhatCouldGoWrong
[3] http://bench.cr.yp.to/results-hash.html#arm-apollo

[4] Krzysztof Pietrzak: "Non-Trivial Black-Box Combiners for Collision-Resistant Hash-Functions don't Exist"
[5] Jonathan J. Hoch, Adi Shamir: "On the Strength of the Concatenated Hash Combiner when All the Hash Functions are Weak"
[6] Marc Fischlin, Anja Lehmann, Krzysztof Pietrzak: "Robust Multi-Property Combiners for Hash Functions Revisited"

[7] http://webee.technion.ac.il/~hugo/rhash/rhash.pdf
