On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:
> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article on a German magazine [1] clarifies that
> > the effort to create that hash still quite large (6500 CPU years + 100
> > GPU years to calculate the collision). So this relativates the impact a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published
> PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas

Well, what did they expect? Did they expect that all software which is
part of their toolchain has ever been tested with files that produce
a SHA1 collision? Nobody had such files until yesterday...
They should have tried this on a test repository first.

Anyway, so SVN has multiple problems with SHA1 collisions.

One problem is that the libsvn_wc code does the wrong thing when SHA1
hashes match but MD5 hashes do not. The error on checkout is happening
because pristines are keyed on SHA1, and only one pristine is saved:

$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c

By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.

FSFS looks alright. The node records for these two PDFs look like this:

[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /

id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]

We should look into making the FSFS code make use of both checksums to
handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.
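
To illustrate the pristine store problem described above, here is a toy model in Python. It is only a sketch and not the libsvn_wc code: the class names, the first-writer-wins behaviour and the `read()` verification step are all assumptions made for illustration. Since the real colliding PDFs are not embedded here, `weak_sha1()` is a deliberately weakened stand-in that collides by construction. The point is only that a store keyed on a single checksum keeps one blob for two colliding files, while a store keyed on the pair of checksums keeps both.

```python
import hashlib


def weak_sha1(data: bytes) -> str:
    # ASSUMPTION: stand-in for a real SHA-1 collision. Only the first 4 bytes
    # are hashed, so any two blobs sharing that prefix "collide".
    return hashlib.sha1(data[:4]).hexdigest()


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class SingleKeyStore:
    """Hypothetical content store keyed on one checksum only."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.blobs = {}

    def install(self, data: bytes):
        key = self.key_fn(data)
        # First writer wins: an existing key is taken to mean "same content".
        self.blobs.setdefault(key, data)
        return key

    def read(self, key, expected_md5: str) -> bytes:
        data = self.blobs[key]
        actual = md5(data)
        if actual != expected_md5:
            raise ValueError(
                f"checksum mismatch: expected {expected_md5}, actual {actual}")
        return data


class DualKeyStore(SingleKeyStore):
    """Same hypothetical store, keyed on the (weak checksum, MD5) pair."""

    def __init__(self, key_fn):
        super().__init__(lambda data: (key_fn(data), md5(data)))


if __name__ == "__main__":
    pdf1 = b"%PDF-one"   # made-up stand-ins for the two colliding files
    pdf2 = b"%PDF-two"

    single = SingleKeyStore(weak_sha1)
    k1, k2 = single.install(pdf1), single.install(pdf2)
    print("single-key blobs stored:", len(single.blobs))
    try:
        single.read(k2, md5(pdf2))
    except ValueError as err:
        print("single-key read of second file fails:", err)

    dual = DualKeyStore(weak_sha1)
    d1, d2 = dual.install(pdf1), dual.install(pdf2)
    print("dual-key blobs stored:", len(dual.blobs))
```

In this toy, reading the second file back from the single-key store yields the first file's content, which a separate MD5 check then rejects. Whether keying on both checksums is practical for the working copy or for FSFS is a separate question; a stronger hash would avoid needing the pair at all.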

'svnadmin load' fails, too:

$ svnadmin create repo2
$ vi repo
repo/  repo2/
$ vi repo2/db/fs
fs-type    fsfs.conf
$ vi repo2/db/fsfs.conf   # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
     * editing path : shattered-1.pdf ... done.
     * editing path : shattered-2.pdf ...subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
   expected:  5bd9d8cabc46041579a311230539b8d1
     actual:  ee4aa52b139d925f8d8884402b0a750c

Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.