Hi Stefan, > New Revision: 1561427 > > URL: http://svn.apache.org/r1561427 > Log: > Fix hotcopy code for pre-1.4 FSFS repositories. > > * subversion/libsvn_fs_fs/hotcopy.c > (hotcopy_update_current): Bump the revision number *before* trying > to read the new rev(s) from the target repo. > (hotcopy_body): Prevent multiple, expensive max ID scans. > > * subversion/tests/cmdline/svnadmin_tests.py > (fsfs_hotcopy_old_non_empty): Should pass now.
Please correct me if I am wrong, but as far as I can tell, this changeset does not actually fix the first problem I described in http://svn.haxx.se/dev/archive-2014-01/0089.shtml The fundamental problem lies in the way recover_find_max_ids is wrapped by the svn_fs_fs__find_max_ids() function. This wrapper was added in r1503742 [1] and is now used within the hotcopy and recovery code for FSFS format 1/2 repositories. The synopsis of the recover_find_max_ids() routine is quite simple: it recovers the MAX_NODE_ID / MAX_COPY_ID based solely on the given revision file and the offset within that file. In order to make everything work, callers in the 1.8.x code had to open the corresponding rev file (open_pack_or_rev_file()), determine the root offset within it and pass them to recover_find_max_ids. The new wrapper (svn_fs_fs__find_max_ids()), however, does something a bit more sophisticated. In order to retrieve the root offset, it calls the svn_fs_fs__rev_get_root() function, which *checks the revision for existence* by peeking at the db/current contents. The last difference is critical, because it renders the recovery procedure useless for old repositories. The core of the recovery lies in the ability to recreate the db/current file (which can be missing, out-of-date or even broken) from the latest revision in the repository. Thus, requiring the contents db/current to be valid in order to be able to run the recovery does not really make sense to me. The real-world issue with recovery can be reproduced in trunk@r1563090 by running 'svnadmin recover' for non-empty FSFS format 1/2 repository with absent db/current file or by executing the svnadmin_tests.py#13 test with --server-minor-version=3: [[[ svnadmin: E160006: No such revision 1 SVNUnexpectedStderr: ['svnadmin: E160006: No such revision 1\n'] FAIL: svnadmin_tests.py 13: recover a repository (FSFS only) ]]] Observed problems with hotcopy/recover for old repositories are a direct consequence of the broken assumption in the svn_fs_fs__find_max_ids() routine I am talking about. So, fixing this assumption should automatically fix the erroneous user-visible behavior. Although the "hotcopy" part of the problem seems to be gone with this changeset (see my comments below), the "recovery" part is still not. > + { > + /* Make sure NEW_YOUNGEST is a valid revision in DST_FS. > + Because a crash in hotcopy requires at least a full recovery run, > + it is safe to temporarily reset the max IDs to 0. */ > + SVN_ERR(svn_fs_fs__write_current(dst_fs, new_youngest, next_node_id, > + next_copy_id, scratch_pool)); > + > + SVN_ERR(svn_fs_fs__find_max_ids(dst_fs, new_youngest, > + &next_node_id, &next_copy_id, > + scratch_pool)); I have other concerns about this change. Currently, hotcopy_update_current behaves the following way for old repositories: [[[ (calls svn_fs_fs__write_current() with the correct youngest revision, but with invalid node ids, which are both set to zero. at this point the content of the db/current looks like "29 0 0", where 29 is the revision number) ... (recovers the maximum ids using svn_fs_fs__find_max_ids(), which now passes the svn_fs_fs__ensure_revision_exists() check due to the previous step) ... (bumps the recovered maximum ids to get the next of each) ... (finally, calls the svn_fs_fs__write_current() with the same youngest revision number and with valid ids recovered in the previous step. this results in rewriting the previous "29 0 0" contents of the db/current file with something more appropriate like "29 k 2") ]]] Here is what I think: - Using this approach means we would have to do something similiar for every calling site of the svn_fs_fs__find_max_ids() just in order to satisfy the broken contract. Forget one place and bang, you're dead (the same way it is now with the recovery). - Overall, this looks like fitting a square in a circle. We are writing the db/current file twice in a place it should clearly be written once. If the recovery procedure fails (access denied when trying to open the latest revision file?), the hotcopy destination will be left in an inconsistent state with db/current having something like "29 0 0". The repository would seem to be fine, but committing to it could easily provoke clashing node ids. If the recovery procedure fails during --incremental hotcopy, we would again end up with broken db/current file. Without this change, everything would have been fine in the latter case, because the hotcopy did not change the db/current contents until the very end: [[[ # ensure_revision_exists(), ^/branches/1.8.x/subversion/libsvn_fs_fs/fs_fs.c FSFS is based around the concept that commits only take effect when the number in "current" is bumped. Thus if there happens to be a rev or revprops file installed for a revision higher than the one recorded in "current" (because a commit failed between installing the rev file and bumping "current", or because an administrator rolled back the repository by resetting "current" without deleting rev files, etc), it ought to be completely ignored. ]]] - The last, but not the least, it does not look like backporting this hunk to 1.8.x is really required. The problem is in trunk, not in 1.8.x. Personally, I can think of three possible ways of solving the "real" problem. Implementing any of them would allow us to drop the "write wrong db/current - recover ids - correctly rewrite db/current" from hotcopy_update_current() and simultaneously fix the "No such revision" error for both hotcopy and recovery: 1) Rework the libsvn_fs_fs/fs_fs.c splitting (into recovery.c and cached_data.c) done in r1503742 [1] and r1503692 [2] in order to restore the 1.8.x contract of the maximum ids recovery routine. 2) Introduce an intermediate helper effectively doing the same as the get_root_changes_offset() function. Use it within svn_fs_fs__find_max_ids(). 3) Rework the contract of svn_fs_fs__rev_get_root() by moving the ensure_revision_exists() check to the callers (there are three of them). Then remove this check for svn_fs_fs__find_max_ids(). This change would effectively fix the broken assumption about the db/current file, however, it would also morph the svn_fs_fs__rev_get_root() function into something like "give me the root id for this rev and ignore the fact that this rev might not be a part of the repository, according to db/current". I tend to think that this approach smells bad, but at least there is something to choose from. I am currently working on a patch for this problem and (hopefully) will be able to provide it in the nearby future. [1] http://svn.apache.org/viewvc?view=revision&revision=r1503742 [2] http://svn.apache.org/viewvc?view=revision&revision=r1503692 Thanks and regards, Evgeny Kotkov

