svnadmin_tests.py

Evgeny Kotkov Fri, 31 Jan 2014 04:29:45 -0800

Hi Stefan,

> New Revision: 1561427
>
> URL: http://svn.apache.org/r1561427
> Log:
> Fix hotcopy code for pre-1.4 FSFS repositories.
>
> * subversion/libsvn_fs_fs/hotcopy.c
>   (hotcopy_update_current): Bump the revision number *before* trying
>                             to read the new rev(s) from the target repo.
>   (hotcopy_body): Prevent multiple, expensive max ID scans.
>
> * subversion/tests/cmdline/svnadmin_tests.py
>   (fsfs_hotcopy_old_non_empty): Should pass now.


Please correct me if I am wrong, but as far as I can tell, this changeset
does not actually fix the first problem I described in
http://svn.haxx.se/dev/archive-2014-01/0089.shtml

The fundamental problem lies in the way recover_find_max_ids is wrapped by
the svn_fs_fs__find_max_ids() function.  This wrapper was added in r1503742 [1]
and is now used within the hotcopy and recovery code for FSFS format 1/2
repositories.

The synopsis of the recover_find_max_ids() routine is quite simple: it recovers
the MAX_NODE_ID / MAX_COPY_ID based solely on the given revision file
and the offset within that file.  In order to make everything work, callers in
the 1.8.x code had to open the corresponding rev file (open_pack_or_rev_file()),
determine the root offset within it and pass them to recover_find_max_ids.
The new wrapper (svn_fs_fs__find_max_ids()), however, does something a bit
more sophisticated.  In order to retrieve the root offset, it calls the
svn_fs_fs__rev_get_root() function, which *checks the revision for existence*
by peeking at the db/current contents.

The last difference is critical, because it renders the recovery procedure
useless for old repositories.  The core of the recovery lies in the ability to
recreate the db/current file (which can be missing, out-of-date or even broken)
from the latest revision in the repository.  Thus, requiring the contents
db/current to be valid in order to be able to run the recovery does not really
make sense to me.  The real-world issue with recovery can be reproduced in
trunk@r1563090 by running 'svnadmin recover' for non-empty FSFS format 1/2
repository with absent db/current file or by executing the svnadmin_tests.py#13
test with --server-minor-version=3:
[[[
  svnadmin: E160006: No such revision 1

  SVNUnexpectedStderr: ['svnadmin: E160006: No such revision 1\n']
  FAIL:  svnadmin_tests.py 13: recover a repository (FSFS only)
]]]

Observed problems with hotcopy/recover for old repositories are a direct
consequence of the broken assumption in the svn_fs_fs__find_max_ids() routine
I am talking about.  So, fixing this assumption should automatically fix the
erroneous user-visible behavior.  Although the "hotcopy" part of the problem
seems to be gone with this changeset (see my comments below), the
"recovery" part is still not.

> +    {
> +      /* Make sure NEW_YOUNGEST is a valid revision in DST_FS.
> +         Because a crash in hotcopy requires at least a full recovery run,
> +         it is safe to temporarily reset the max IDs to 0. */
> +      SVN_ERR(svn_fs_fs__write_current(dst_fs, new_youngest, next_node_id,
> +                                      next_copy_id, scratch_pool));
> +
> +      SVN_ERR(svn_fs_fs__find_max_ids(dst_fs, new_youngest,
> +                                      &next_node_id, &next_copy_id,
> +                                      scratch_pool));

I have other concerns about this change.  Currently, hotcopy_update_current
behaves the following way for old repositories:
[[[
  (calls svn_fs_fs__write_current() with the correct youngest revision, but
   with invalid node ids, which are both set to zero.  at this point the
   content of the db/current looks like "29 0 0", where 29 is the revision
   number)
  ...
  (recovers the maximum ids using svn_fs_fs__find_max_ids(), which now passes
   the svn_fs_fs__ensure_revision_exists() check due to the previous step)
  ...
  (bumps the recovered maximum ids to get the next of each)
  ...
  (finally, calls the svn_fs_fs__write_current() with the same youngest revision
   number and with valid ids recovered in the previous step.  this results in
   rewriting the previous "29 0 0" contents of the db/current file with
   something more appropriate like "29 k 2")
]]]

Here is what I think:

- Using this approach means we would have to do something similiar for
  every calling site of the svn_fs_fs__find_max_ids() just in order to satisfy
  the broken contract.  Forget one place and bang, you're dead (the same way it
  is now with the recovery).

- Overall, this looks like fitting a square in a circle.  We are writing the
  db/current file twice in a place it should clearly be written once.  If the
  recovery procedure fails (access denied when trying to open the latest
  revision file?), the hotcopy destination will be left in an inconsistent state
  with db/current having something like "29 0 0".  The repository would seem to
  be fine, but committing to it could easily provoke clashing node ids.   If the
  recovery procedure fails during --incremental hotcopy, we would again end up
  with broken db/current file.  Without this change, everything would have been
  fine in the latter case, because the hotcopy did not change the db/current
  contents until the very end:
  [[[
    # ensure_revision_exists(), ^/branches/1.8.x/subversion/libsvn_fs_fs/fs_fs.c

    FSFS is based around the concept that commits only take effect when
    the number in "current" is bumped.  Thus if there happens to be a rev
    or revprops file installed for a revision higher than the one recorded
    in "current" (because a commit failed between installing the rev file
    and bumping "current", or because an administrator rolled back the
    repository by resetting "current" without deleting rev files, etc), it
    ought to be completely ignored.
  ]]]

- The last, but not the least, it does not look like backporting this hunk to
  1.8.x is really required.  The problem is in trunk, not in 1.8.x.

Personally, I can think of three possible ways of solving the "real" problem.
Implementing any of them would allow us to drop the "write wrong db/current -
recover ids - correctly rewrite db/current" from hotcopy_update_current() and
simultaneously fix the "No such revision" error for both hotcopy and recovery:

1) Rework the libsvn_fs_fs/fs_fs.c splitting (into recovery.c and cached_data.c)
   done in r1503742 [1] and r1503692 [2] in order to restore the 1.8.x contract
   of the maximum ids recovery routine.

2) Introduce an intermediate helper effectively doing the same as the
   get_root_changes_offset() function.  Use it within svn_fs_fs__find_max_ids().

3) Rework the contract of svn_fs_fs__rev_get_root() by moving the
   ensure_revision_exists() check to the callers (there are three of them).
   Then remove this check for svn_fs_fs__find_max_ids().  This change would
   effectively fix the broken assumption about the db/current file, however, it
   would also morph the svn_fs_fs__rev_get_root() function into something like
   "give me the root id for this rev and ignore the fact that this rev might not
   be a part of the repository, according to db/current".  I tend to think that
   this approach smells bad, but at least there is something to choose from.

I am currently working on a patch for this problem and (hopefully) will be able
to provide it in the nearby future.

[1] http://svn.apache.org/viewvc?view=revision&revision=r1503742
[2] http://svn.apache.org/viewvc?view=revision&revision=r1503692


Thanks and regards,
Evgeny Kotkov

Re: svn commit: r1561427 - in /subversion/trunk/subversion: libsvn_fs_fs/hotcopy.c tests/cmdline/svnadmin_tests.py

Reply via email to