On 1 Mar 2017, at 7:18, Daniel Shahaf wrote:

> Stefan Sperling wrote on Wed, Mar 01, 2017 at 11:01:40 +0100:
>> On Tue, Feb 28, 2017 at 10:17:34PM -0600, Greg Stein wrote:
>>> I really like this idea.
>>>
>>> And we could take a copy of APR's sha1 code, and rejigger it to
>>> perform *both* hashes during the same scan of the raw bytes. I
>>> would expect the time taken to extend by (say) 1.1X rather than a
>>> full 2X. The inner loop might cost a bit more, but we'd only scan
>>> the bytes once. Very handy, when you're talking about megabytes in
>>> a stream-y environment.
>>>
>>> (and medium-term, push this dual-sha1 computation back into APR)
>>
>> The idea is nice and I would support its application.
>
> I would not.
>
> Using two variants of sha1 is fine for supporting webkit's use-case,
> or to protect admins from disgruntled employees who commit the known
> collision to DoS their employer.
>
> However, when the attacker model is a competent/resourceful attacker,
> using two variants of sha1 only increases the attack's complexity by
> a factor of about 2⁷ ([1, §4.1]), and might in fact cause collisions
> to be _easier_ to find (since additional state is output).
>
> So let's not add a variant of sha1.
Note that this is not a variant of sha1, which would imply a *different*
algorithm. It's the same algorithm, run with two different inputs.

The paper you reference:

    https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf

is interesting, and some of it might be going over my head. However,
I'll note that it seems to be about:

    "... unwanted dependencies between two slightly different
    instances of the same hash function"

That is not what I am suggesting. I am suggesting using the exact same
hash function, but giving it two vastly different collections of data.
And not only that: there is an ordering dependency between the two sets
of data. It's mighty hard to add data to the first set (the entire
file) without causing significant disruption in the *actual data* that
will be fed to the second set.

I do not see how this extended-sha1 would be easier to break than the
current sha1, because it includes the current sha1, unchanged. It then
adds a second sha1 digest, computed by effectively hashing a *different
file* (the file created by using only every other byte of the original
file).

Obviously this is not perfect either; there will never be a hashing
algorithm which can map all 1-megabyte files of arbitrary data to
unique fixed-size digests. But it sure seems like it would make it much
harder to *calculate* a second file which sums up to the same digest
values (two values) as any given datafile.

Now, as someone else noted, this idea would require a format change to
svn data structures, so it would still have to wait for the next new
version of svn. But it might be faster to use this tactic instead of
making the much more substantial change of dropping sha1 entirely in
favor of sha256.
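To make the scheme concrete, here is a rough sketch of the proposed
composite digest. The name `extended_sha1` is mine, not anything in
svn, and Python's hashlib stands in for APR's C sha1 code; but it shows
the one-scan, two-contexts idea: the first context sees every byte, the
second effectively hashes the "different file" made of every other byte.

```python
import hashlib

def extended_sha1(stream, chunk_size=65536):
    """Sketch only: one pass over the bytes feeds two SHA-1 contexts.

    h_full hashes the entire input; h_half hashes only the bytes at
    even absolute offsets (i.e. a derived every-other-byte "file").
    The result is the concatenation of both hex digests.
    """
    h_full = hashlib.sha1()
    h_half = hashlib.sha1()
    offset = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h_full.update(chunk)
        # Bytes at even absolute offsets: where they start inside this
        # chunk depends on the parity of the chunk's starting offset.
        start = 0 if offset % 2 == 0 else 1
        h_half.update(chunk[start::2])
        offset += len(chunk)
    return h_full.hexdigest() + h_half.hexdigest()
```

Note the first 40 hex characters are exactly today's sha1 of the whole
file, so existing values could in principle be checked unchanged; only
the trailing 40 characters are new.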
(I don't mean it will necessarily be faster to implement, just that it
might run faster in day-to-day operations.)

-- 
Garance Alistair Drosehn               =      dro...@rpi.edu
Senior Systems Programmer              or    g...@freebsd.org
Rensselaer Polytechnic Institute;          Troy, NY;  USA