> -----Original Message-----
> From: Julian Foad [mailto:julian.f...@wandisco.com]
> Sent: Tuesday, 24 August 2010 13:04
> To: Bert Huijben
> Cc: 'Philip Martin'; dev@subversion.apache.org
> Subject: RE: svn commit: r988074 - in /subversion/trunk/subversion/tests/cmdline: svntest/wc.py upgrade_tests.py
>
> Bert Huijben wrote:
> > Philip Martin wrote:
> > > "Bert Huijben" <b...@vmoo.com> writes:
> > >
> > > >> * subversion/tests/cmdline/upgrade_tests.py
> > > >>   (text_base_path): Restore MD5 support removed in r960036.
> > > >
> > > > I think the real fix would be to upgrade to SHA1 (and add the
> > > > mapping in the pristines table) in the upgrade step. I expected that
> > > > this was already handled?
> > >
> > > Yes, that needs to happen, and no, it doesn't happen yet. The new
> > > code stores SHA1 on checkout/update but the upgrade code simply copies
> > > the MD5 and doesn't do MD5 to SHA1 conversion. I discussed this with
> > > Julian on IRC yesterday; the plan is to remove the MD5 support
> > > eventually.
> > >
> > > There are two cases to consider: upgrade from 1.6 to latest and
> > > upgrade from older 1.7 to latest. For the older 1.7 upgrade we can
> > > simply use the PRISTINE table to replace the MD5 with the
> > > corresponding SHA1 in the bump_to_19 code.
> > >
> > > The 1.6 upgrade is a bit harder. We can do the text-base to pristine
> > > before doing the entries file, so that the PRISTINE table is
> > > available,
>
> If, instead, we construct each PRISTINE table entry at the point
> where we're converting an entry from the entries file, then we can
> calculate both checksums on the fly, and we can store both of them in
> the new DB row(s). That's true even for those few pristines that don't
> have any checksum in the 'entries' file.
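
A minimal sketch of calculating both checksums on the fly while converting an entry: each text-base is read once and the same chunks feed both an MD5 and a SHA1 digest. This is not Subversion's actual upgrade code; the function names are hypothetical and only Python's standard hashlib module is assumed.

```python
import hashlib


def dual_checksums(chunks):
    """Return (md5_hex, sha1_hex) for the concatenation of byte `chunks`.

    Hypothetical helper: both digests are updated from the same chunks,
    so the data is only consumed once.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def dual_checksums_of_file(path, chunk_size=64 * 1024):
    """Read the text-base at `path` once, in fixed-size chunks (hypothetical)."""
    with open(path, "rb") as f:
        return dual_checksums(iter(lambda: f.read(chunk_size), b""))
```

Because both digests come from the file contents rather than from the entries file, this also covers pristines that have no recorded checksum at all.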

1.0.0 working copies have no checksums at all, if I remember correctly, and we certainly have to upgrade those WCs. The same recipe applies to all files with a revert base.

> Maybe that makes the code flow harder, but it sounds easier than
> maintaining an intermediate store of checksums.
>
> > > but the table is not currently indexed on MD5. As there is
> > > now only one table per wc it might be too slow if there are lots of
> > > files. We may need an MD5 index, as part of PRISTINE or separate,
> > > just for the duration of the upgrade.
>
> *If* we were to use that method (but see below), and *if* it does turn
> out to be too slow, then adding an index would be an easy change. I
> don't think we need to hesitate from using MD5 look-ups on that account.
>
> > > The bump_to_19 code can do the MD5 to SHA1 conversion before
> > > switching to single-db; the table is smaller and may not need an MD5
> > > index (and the bump_to_19 code simply isn't as important as the 1.6
> > > upgrade code).
> >
> > In the old entries format we only kept one checksum, while we can have two
> > pristine files, so just keeping it as MD5 can't solve all the issues.
> > But we can't just assume that we never see a collision with MD5 over an
> > entire tree... or we wouldn't have switched to SHA1 in the first place.
>
> MD5 collisions while upgrading an existing WC? A remote possibility of
> course, but yes, let's try to avoid that possibility. If MD5 look-up
> was the only practical way forward, especially if it were per-directory,
> then I wouldn't be too concerned about handling collisions gracefully
> and think we would only need to detect them and bail out with an
> apologetic message.
>
> For upgrades from 1.7-dev versions, I think we should be happy to accept
> the possibility of MD5 collisions.

For dev versions, no problem. But for upgrades from format 12 (the last entries-file version) or older, we should/must do the right thing.
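
A minimal sketch of the "detect and bail out" approach for an MD5 look-up: build an MD5-to-SHA1 map from (MD5, SHA1) pairs, one per pristine, and refuse to continue if one MD5 maps to two different SHA1s. The names are hypothetical; a real upgrade would query the PRISTINE table (possibly through a temporary MD5 index) rather than a Python dict.

```python
class MD5CollisionError(Exception):
    """Raised when two different pristines share an MD5 checksum (hypothetical)."""


def build_md5_to_sha1(pairs):
    """Map each MD5 hex digest to its SHA1 hex digest.

    `pairs` is an iterable of (md5_hex, sha1_hex) tuples. The same pristine
    may appear more than once, which is fine; an MD5 that maps to two
    different SHA1s is a real collision, so we bail out instead of guessing.
    """
    mapping = {}
    for md5_hex, sha1_hex in pairs:
        existing = mapping.setdefault(md5_hex, sha1_hex)
        if existing != sha1_hex:
            raise MD5CollisionError(
                "MD5 %s matches two pristines (%s, %s); cannot upgrade "
                "this working copy safely" % (md5_hex, existing, sha1_hex))
    return mapping
```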

(See the other mail: just make the intermediate versions use the Python script. These users knew this was an option when they started using trunk.)

Bert