A customer found that 'svnsync' errored out on trying to sync a revision in 
which the source repository introduced some mergeinfo starting with r0, similar 
to this example:

  $ svn propget svn:mergeinfo ^/foo@1234567
  /bar:0-1000000,1111111,1222222

The svnsync error message was:

  $ svnsync sync ...
  svnsync: E175008: At least one property change failed; repository is unchanged
  svnsync: E175002: Error setting property 'svn:mergeinfo':
  Could not execute PROPPATCH.

We believe this mergeinfo entered the repository through a 1.6 server. It was 
committed in mid-2012. 1.7.x servers reject such commits, as do the later 1.6.x 
servers [1]. Probably 1.6 clients were in use too, although it may have been 
committed from a non-Subversion client such as git-svn.

Anyhow, the situation is that we have at least one Subversion repository 
containing data that the 1.7 server tools reject as invalid. 'svnsync' errors 
out. Even 'svnadmin dump' errors out at this revision if you specify any 
non-zero start revision, because it parses the mergeinfo to see if it points to 
a revision earlier than the start rev. Like this:

  $ svnadmin dump --incremental -r5 repo
  svnadmin: E200020: Invalid revision number '0' found in range list

What is our migration path for this data?

We can figure out a short term work-around, perhaps using the unsupported 
"SVNSYNC_UNSUPPORTED_STRIP_MERGEINFO" environment variable to bypass the 
mergeinfo change for each revision that adds or changes such mergeinfo, if 
there aren't too many of them and if they aren't present on active branches. 
(We can write a pre-commit hook to make svnsync stop after just one revision, 
since it doesn't have a --revision option.)

But for a proper fix?

In the past we decided that some known ways in which mergeinfo can be malformed 
should be silently corrected or tolerated.

  leading slash is required
    => in some cases we add it if missing

  path-revs pointing to non-existent node-revs
    => in some cases, such as a dump-load cycle, we remove these ranges

  revision zero
    => in some cases, such as dump, we error out on reading it

  other parse errors
    => in some cases we error out
    => in the case of a merge, we report "invalid mergeinfo: merge tracking not 
possible"
    => in some cases we ignore the error in parsing whatever amount of 
mergeinfo we were trying to parse and skip the processing we were going to do 
(issue #3896)

This all makes me a bit uneasy. We seem to have a number of data 
transformations going on at quite a low level, and I'm not sure what the 
canonical position is. I would like us to have a definition of what constitutes 
"the same mergeinfo" in a repository before and after dump/load, and a way of 
testing that.

Philip pointed out that it's a good idea for 'dump' to dump whatever is 
present, and not error out and not try to correct or normalize it. If any 
correction or normalization is to be done, 'load' is a better place to do it. 
That minimizes the risk of a damaged archive due to bugs, if you archive the 
dump file.

Clearly there are some things we should do:

  * Make 'dump' not error out, but rather ignore the broken mergeinfo for the 
purposes of the warning that it refers to older revisions.

Other changes?

  * Make 'svnsync sync' strip out the revision 0 from that mergeinfo? Or make 
it strip out the whole mergeinfo property if it fails to parse? Or just that 
line of the property value? (If we do something like that, I'd like us to do it 
everywhere we ignore bad mergeinfo, not just here.)

Thoughts?

- Julian

[1] Servers >= 1.6.18 for HTTP, >= 1.6.17 for the other protocols, reject 
unparseable mergeinfo -- see issue 
<http://subversion.tigris.org/issues/show_bug.cgi?id=3895>.

--
Join WANdisco's free daily demo sessions on Scaling Subversion for the 
Enterprise
<http://www.wandisco.com/training/webinars>

Reply via email to