On 1 Mar 2017, at 7:18, Daniel Shahaf wrote:

> Stefan Sperling wrote on Wed, Mar 01, 2017 at 11:01:40 +0100:
>> On Tue, Feb 28, 2017 at 10:17:34PM -0600, Greg Stein wrote:
>>> I really like this idea.
>>>
>>> And we could take a copy of APR's sha1 code, and rejigger it to
>>> perform *both* hashes during the same scan of the raw bytes. I
>>> would expect the time taken to extend by (say) 1.1X rather than a
>>> full 2X. The inner loop might cost a bit more, but we'd only scan
>>> the bytes once. Very handy, when you're talking about megabytes in
>>> a stream-y environment.
>>>
>>> (and medium-term, push this dual-sha1 computation back into APR)
>>
>> The idea is nice and I would support its application.
>
> I would not.
>
> Using two variants of sha1 is fine for supporting webkit's use-case,
> or to protect admins from disgruntled employees who commit the known
> collision to DoS their employer.
>
> However, when the attacker model is a competent/resourceful attacker,
> using two variants of sha1 only increases the attack's complexity by
> a factor of about 2⁷ ([1, §4.1]), and might in fact cause collisions
> to be _easier_ to find (since additional state is output).
>
> So let's not add a variant of sha1.
Note that this is not a variant of sha1, which would imply a *different*
algorithm. It's the same algorithm, run with two different inputs.

The paper you reference:

    https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf

is interesting, and some of it might be going over my head. However,
I'll note that it seems to be about:

    "... unwanted dependencies between two slightly different
    instances of the same hash function"

That is not what I am suggesting. I am suggesting using the exact same
hash function, but giving it two vastly different collections of data.
And not only that: there is an ordering dependency between the two sets
of data. It's mighty hard to add data to the first set (the entire
file) without causing significant disruption in the *actual data* that
will be fed to the second set.

I do not see how this extended-sha1 would be easier to break than the
current sha1, because it includes the current sha1, unchanged. It then
adds a second sha1 digest, computed by effectively hashing a *different
file* (the file created by using only every other byte of the original
file).

Obviously this is not perfect either; there will never be a hashing
algorithm which can map all 1-megabyte files of arbitrary data to
unique fixed-size digests. But it sure seems like it would make it much
harder to *calculate* a second file which sums up to the same digest
values (two values) as any given datafile.

Now, as someone else noted, this idea would require a format change to
svn data structures, so it would still have to wait for the next new
version of svn. But it might be faster to use this tactic instead of
making the much more substantial change of dropping sha1 entirely in
favor of sha256.
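To make the scheme concrete, here is a rough sketch of the proposed
composite digest. The name `extended_sha1` is mine, not anything in
svn, and Python's hashlib stands in for APR's C sha1 code; but it shows
the one-scan, two-contexts idea: the first context sees every byte, the
second effectively hashes the "different file" made of every other byte.

```python
import hashlib

def extended_sha1(stream, chunk_size=65536):
    """Sketch only: one pass over the bytes feeds two SHA-1 contexts.

    h_full hashes the entire input; h_half hashes only the bytes at
    even absolute offsets (i.e. a derived every-other-byte "file").
    The result is the concatenation of both hex digests.
    """
    h_full = hashlib.sha1()
    h_half = hashlib.sha1()
    offset = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h_full.update(chunk)
        # Bytes at even absolute offsets: where they start inside this
        # chunk depends on the parity of the chunk's starting offset.
        start = 0 if offset % 2 == 0 else 1
        h_half.update(chunk[start::2])
        offset += len(chunk)
    return h_full.hexdigest() + h_half.hexdigest()
```

Note the first 40 hex characters are exactly today's sha1 of the whole
file, so existing values could in principle be checked unchanged; only
the trailing 40 characters are new.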
(I don't mean it will necessarily be faster to implement, just that it
might run faster in day-to-day operations.)

-- 
Garance Alistair Drosehn               =      dro...@rpi.edu
Senior Systems Programmer              or    g...@freebsd.org
Rensselaer Polytechnic Institute;          Troy, NY;  USA