I remember some experiments during the early development of WC-NG where we measured
which checksums worked and which ones were too expensive. Going to the SHA1
family was at least 5 times more expensive or so…
We determined back then that SHA1 was good enough for our use and that of our users,
‘except for those doing collision research’.
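To repeat that kind of measurement on current hardware, a rough throughput check could look like the minimal Python sketch below. It uses Python's hashlib rather than Subversion's own C checksum code, so it only hints at relative costs. The algorithm list, buffer size and round count are arbitrary choices for the example, and no particular results are implied.

```python
import hashlib
import os
import time

def hash_throughput(algorithms=("md5", "sha1", "sha256", "sha3_256"),
                    size=8 * 1024 * 1024, rounds=5):
    """Return approximate MB/s per algorithm, hashing one random buffer."""
    data = os.urandom(size)
    results = {}
    for name in algorithms:
        start = time.perf_counter()
        for _ in range(rounds):
            hashlib.new(name, data).digest()
        elapsed = time.perf_counter() - start
        results[name] = size * rounds / (1024 * 1024) / elapsed
    return results

if __name__ == "__main__":
    speeds = hash_throughput()
    baseline = speeds["md5"]
    for name, mbps in speeds.items():
        # Relative cost: how many times slower than MD5 on this machine.
        print(f"{name:9s} {mbps:8.1f} MB/s  ({baseline / mbps:.1f}x MD5 cost)")
```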
Just adding more checksums internally because we can won’t help our users… The
only real solution is doing full comparisons when checksums match, which
virtually never happens. It has now happened for the first time, and most likely
never before across all Subversion users combined.
That is how we used MD5 before… But we determined SHA1 would be good enough to
avoid this, even if such a collision were found, as one has been today.
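A minimal sketch of the ‘full comparison when checksums match’ idea, written as a toy in-memory content store. The class and method names are hypothetical, not libsvn_wc APIs. The digest function is a parameter (SHA1 by default) so the collision branch can be exercised with a deliberately weak hash.

```python
import hashlib

def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()

class ContentStore:
    """Toy content-addressed store that never trusts a checksum match alone."""

    def __init__(self, digest=sha1_hex):
        self._digest = digest
        self._buckets = {}  # digest -> list of distinct texts with that digest

    def add(self, data):
        """Store data; return (digest, index) identifying the stored text."""
        key = self._digest(data)
        bucket = self._buckets.setdefault(key, [])
        for index, existing in enumerate(bucket):
            # Checksums match: only a full comparison proves equal content.
            if existing == data:
                return key, index
        # New digest, or same digest with different bytes (a collision).
        bucket.append(data)
        return key, len(bucket) - 1

    def get(self, key, index=0):
        return self._buckets[key][index]

if __name__ == "__main__":
    weak = ContentStore(digest=lambda d: "k")  # every text "collides"
    print(weak.add(b"first"), weak.add(b"second"), weak.add(b"first"))
    # prints ('k', 0) ('k', 1) ('k', 0)
```

The byte comparison runs on every checksum hit, including ordinary duplicates, so its cost grows with how often identical content is stored, not only with how often collisions occur.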
I don’t think this incident changes those original ideas about which hash is
good enough… Perhaps some careful re-evaluation is necessary, but I don’t think
we should just ‘fix this’ by bumping everything to the next hash type.
This ‘just use a more expensive hash’ approach may be fine for other users of
hashes, but I don’t think we want to make every common Subversion operation
much slower because a single collision has been found using an insane amount of
CPU/GPU power.
Of course we should fix things to not break, but that is a different story.
Bert
From: Stefan Sperling
Sent: Friday, 24 February 2017 17:10
To: Andreas Stieger
Cc: Subversion Development
Subject: Re: Files with identical SHA1 breaks the repo
On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:
> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article in a German magazine [1] clarifies that
> > the effort to create that hash is still quite large (6500 CPU years + 100
> > GPU years to calculate the collision). So this puts the impact into perspective a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published
> PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas
Well, what did they expect? Did they expect that all software which is
part of their toolchain had been tested with files that produce
a SHA1 collision? Nobody had such files until yesterday...
They should have tried this on a test repository first.
Anyway, SVN has multiple problems with SHA1 collisions.
One problem is that the libsvn_wc code does the wrong thing when SHA1
hashes match but MD5 hashes do not. The error on checkout happens
because pristines are keyed on SHA1, so only one pristine is saved:
$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c
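A hypothetical model of a SHA1-keyed pristine store that at least notices the situation above instead of reusing the wrong text: when the SHA1 key already exists, it compares the MD5 recorded for the stored pristine with the MD5 of the incoming text. The function and exception names, and the dict-based store, are illustrative assumptions, not libsvn_wc code.

```python
import hashlib

class PristineCollision(Exception):
    """Two different texts map to the same pristine key."""

def install_pristine(store, data, key_fn=lambda d: hashlib.sha1(d).hexdigest()):
    """Add data to a store mapping key -> (md5_hex, data); return the key.

    As in the listing above, only one text can live under a given key.
    """
    key = key_fn(data)
    md5 = hashlib.md5(data).hexdigest()
    existing = store.get(key)
    if existing is not None:
        stored_md5, _ = existing
        if stored_md5 != md5:
            # Same SHA1, different MD5: fail loudly rather than silently
            # handing out the other file's text.
            raise PristineCollision(
                f"pristine {key}: stored MD5 {stored_md5}, new MD5 {md5}")
        return key  # genuine duplicate: share the existing pristine
    store[key] = (md5, data)
    return key
```

This only turns silent corruption into an error; storing both texts still needs a different key scheme.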
By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.
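For illustration, a sketch of deriving a pristine path from a configurable hash while keeping the <first two hex digits>/<digest>.svn-base layout seen in the listing. The function name and the use of hashlib algorithm names ('sha256', 'sha3_256') are assumptions for the example. The real working copy also records pristine checksums in wc.db, which this ignores.

```python
import hashlib
from pathlib import PurePosixPath

def pristine_path(data, algorithm="sha256", root=".svn/pristine"):
    """Hypothetical pristine path: <root>/<first 2 hex digits>/<digest>.svn-base."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return PurePosixPath(root, digest[:2], digest + ".svn-base")

if __name__ == "__main__":
    text = b"example file contents\n"
    for algo in ("sha1", "sha256", "sha3_256"):
        print(algo, pristine_path(text, algo))
```

A SHA256 or SHA3-256 digest is 64 hex digits instead of SHA1's 40, and existing working copies are keyed on SHA1, so a switch like this implies a working copy format change rather than a drop-in replacement.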
FSFS looks alright. The node records for these two PDFs look like this:
[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /
id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]
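To make the two node records easier to compare, here is a small parser for the 'text:' and 'props:' lines. The field names follow my reading of the FSFS format notes (revision, item index or offset, on-disk size, expanded size, MD5, then optional SHA1 and uniquifier) and should be treated as an assumption; 'RepRef' and 'parse_rep' are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RepRef:
    revision: int
    item: int            # item index (logical addressing) or offset (older formats)
    size: int            # bytes stored in the revision file, e.g. a delta
    expanded_size: int   # size of the full text
    md5: str
    sha1: Optional[str] = None
    uniquifier: Optional[str] = None

def parse_rep(line):
    """Parse a node record's 'text:' or 'props:' line into a RepRef."""
    _, _, value = line.partition(":")
    fields = value.split()
    if len(fields) not in (5, 7):
        raise ValueError(f"unexpected representation line: {line!r}")
    rep = RepRef(*(int(f) for f in fields[:4]), md5=fields[4])
    if len(fields) == 7:
        rep.sha1, rep.uniquifier = fields[5], fields[6]
    return rep

if __name__ == "__main__":
    one = parse_rep("text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c "
                    "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3")
    two = parse_rep("text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 "
                    "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4")
    print(one.sha1 == two.sha1, one.md5 == two.md5)  # prints True False
```

Parsing both text lines shows the same SHA1 and expanded size (422435 bytes) but different MD5 values, which is the information FSFS would need to keep the two representations apart.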
We should look into making the FSFS code make use of both checksums to
handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.
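One way to make use of both checksums for rep-sharing, sketched in Python: key the cache on the (SHA1, MD5) pair so that two texts sharing only a SHA1 get separate entries. This is a hypothetical illustration, not how the FSFS rep-cache is implemented, and the rep ids are made-up strings. A matching pair is still not proof of equal content; only a full comparison gives that.

```python
import hashlib

class RepCache:
    """Hypothetical rep-sharing cache keyed on (SHA1, MD5) rather than SHA1 alone."""

    def __init__(self):
        self._reps = {}

    def find_or_add(self, data, new_rep_id):
        """Return the id of an existing rep with both checksums equal,
        otherwise record new_rep_id for this text and return it."""
        key = (hashlib.sha1(data).hexdigest(), hashlib.md5(data).hexdigest())
        return self._reps.setdefault(key, new_rep_id)

if __name__ == "__main__":
    cache = RepCache()
    print(cache.find_or_add(b"same text", "r1/3"))  # prints r1/3
    print(cache.find_or_add(b"same text", "r2/7"))  # prints r1/3 (shared)
```

With the two shattered PDFs, the SHA1 halves of the keys are equal but the MD5 halves differ (as in the node records), so they would get separate entries.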
'svnadmin load' fails, too:
$ svnadmin create repo2
$ vi repo2/db/fsfs.conf # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
* editing path : shattered-1.pdf ... done.
* editing path : shattered-2.pdf ...
subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
expected: 5bd9d8cabc46041579a311230539b8d1
actual: ee4aa52b139d925f8d8884402b0a750c
Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.
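For context, a minimal sketch of the kind of check that produces the error above: the text is copied through a stream while its MD5 is computed, then compared with the checksum recorded for the node. This only illustrates where such a mismatch surfaces, not where the wrong text comes from in the commit path (which is still undebugged). The function and exception names are hypothetical and the message merely imitates the svnadmin output.

```python
import hashlib
import io

class ChecksumMismatch(Exception):
    pass

def copy_verified(src, dst, expected_md5, path, chunk_size=64 * 1024):
    """Copy src to dst in chunks, then compare the MD5 of what was copied."""
    md5 = hashlib.md5()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        dst.write(chunk)  # note: data is already written when a mismatch is found
    actual = md5.hexdigest()
    if actual != expected_md5:
        raise ChecksumMismatch(
            f"Checksum mismatch for '{path}':\n"
            f"   expected:  {expected_md5}\n"
            f"     actual:  {actual}")
    return actual

if __name__ == "__main__":
    out = io.BytesIO()
    try:
        copy_verified(io.BytesIO(b"wrong text"), out,
                      hashlib.md5(b"right text").hexdigest(), "/example.txt")
    except ChecksumMismatch as err:
        print(err)
```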