Sweeping through my TODO list turned up this thread.
I'll commit this last patch tomorrow if there are no objections.
I pause a day only because I recall Mike saying he was planning to
take a look at this.
Paul
On Tue, May 11, 2010 at 4:00 PM, Paul Burba <[email protected]> wrote:
> On Tue, Apr 20, 2010 at 11:08 PM, C. Michael Pilato <[email protected]>
> wrote:
>> Paul Burba wrote:
>>> Hi Mike and Julian,
>>>
>>> Sorry for the delayed response; been working on/thinking about other
>>> mergey stuff lately...
>>>
>>> A few inline comments and then a proposed course of action at the end.
>>>
>>> On Thu, Apr 15, 2010 at 9:28 AM, C. Michael Pilato <[email protected]>
>>> wrote:
>>>> The question then becomes, "How do users deal with legitimate partial dumps
>>>> that will be loaded atop something other than loads from previous
>>>> incremental windows?" I think they do this the same way they handle the
>>>> copy case: either hand-hacking the dumpstream or using 'svndumpfilter'.
>>>> So
>>>> maybe 'svndumpfilter' grows the logic and options required to pare away
>>>> mergeinfo that doesn't meet some criteria (such as "ranges outside the dump
>>>> stream range")?
>>>
>>> I vote for svndumpfilter growing the logic to filter ranges predating
>>> the start of the dump stream in the same way it strips mergeinfo with
>>> merge sources filtered from the dump stream. Basically this means
>>> moving the r927243 functionality to svndumpfilter. No really sure we
>>> need a new option, seems we could overload
>>> --skip-missing-merge-sources to include this behavior. Thoughts?
>>
>> +1
>>
>>>> Julian Foad wrote:
>>>>> On Wed, 2010-04-14, C. Michael Pilato wrote:
>>>>>> Assuming similar behavior for mergeinfo handling, we have, at a minimum:
>>>>>>
>>>>>> 1. 'svnadmin dump' warns when it is in incremental mode and must
>>>>>> generate
>>>>>> mergeinfo from a merge source that predates the beginning of the dump
>>>>>> window, but it's only a warning and the dump continues.
>>>
>>> Why only --incremental mode? We should warn for both --incremental
>>> dumps *and* non-incremental dumps because in the latter case we might
>>> be specifying a dump range -rX:Y where rX>0* and there might be
>>> mergeinfo that refers to revs < X. And while we might want to be
>>> clever about skipping the warning when no range is specified (i.e.
>>> defaulting to -r0:HEAD), even then we should warn because issue #3020
>>> might already have resulted in a r1 with mergeinfo in the starting
>>> repository.
>>
>> You're right, of course. I was thinking "partial dump" and saying
>> "incremental". My bad.
>>
>>>>>> 2. 'svnadmin load' does nothing smart, trusting that the dump it's being
>>>>>> fed is a sensible one.
>>>>> Better: 'svnadmin load' tries to validate the mergeinfo, and issues a
>>>>> warning if it refers to a non-existent source. The admin then gets to
>>>>> figure out whether it was bad in the first place or because of his/her
>>>>> partial-dump/partial-load scenario.
>>>>>
>>>>> However, it depends how efficient the checking is. If that would make
>>>>> the 'load' really slow, I can see that not being wanted.
>>
>> [...]
>>
>>> No unsurprisingly, this took significantly (62%) longer, 00:05:39.609.
>>
>> (My vote was for 'svnadmin load' does nothing smart, and I stand by it.)
>>
>>> Taking what you've had to say into consideration, I propose the following:
>>>
>>> 1) Revert http://svn.apache.org/viewvc?view=revision&revision=927243.
>>> This stops svnadmin load from automatically filtering mergeinfo
>>> referring to revisions outside of the load stream.
>>>
>>> 2) Reapply or reimplement the part of r327243 that accounts for the
>>> case where the first loaded revision in a load stream has mergeinfo,
>>> see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc10
>>> item #1.
>>
>> (r927243, you mean.)
>>
>>> 3) Keep the inline copy-source warnings as they exist today in
>>> svnadmin dump, but add a new end-of-process summary warning e.g.:
>>>
>>> * Dumped revision 504.
>>> * Dumped revision 505.
>>> * Dumped revision 506.
>>> * Dumped revision 507.
>>> WARNING: Referencing data in revision 250, which is older than the oldest
>>> WARNING: dumped revision (504). Loading this dump into an empty
>>> repository
>>> WARNING: will fail.
>>> * Dumped revision 58.
>>> * Dumped revision 59.
>>> * Dumped revision 510.
>>> * Dumped revision 511.
>>> .
>>> .
>>> .
>>> WARNING: The range of revisions dumped contained references to copy
>>> sources outside that range.
>>>
>>> 4) Implement inline and summary functionality analogous to 3) but for
>>> mergeinfo ranges outside of the dump stream, e.g.:
>>>
>>> * Dumped revision 504.
>>> * Dumped revision 505.
>>> * Dumped revision 506.
>>> * Dumped revision 507.
>>> WARNING: Mergeinfo referencing revision(s) prior to revision 250,
>>> which is older
>>> WARNING: than the oldest dumped revision (504). Loading this dump may
>>> result
>>> WARNING: in invalid mergeinfo.
>>> * Dumped revision 58.
>>> * Dumped revision 59.
>>> * Dumped revision 510.
>>> * Dumped revision 511.
>>> .
>>> .
>>> .
>>> WARNING: The range of revisions dumped contained references to
>>> mergeinfo outside that range.
>>>
>>> Also, as mentioned earlier, I think this warning should kick in
>>> regardless of whether --incremental is used.
>>>
>>> 5) Overload svndumpfilter's --skip-missing-merge-sources option to
>>> also remove mergeinfo that predates the starting revision of the dump
>>> stream.
>>
>> +1 all over this stuff.
>>
>>> 6) Open: How does svnadmin load deal with mergeinfo that doesn't exist
>>> in the load steam or the target repos:
>>>
>>> A) It doesn't.
>>>
>>> B) Only checks that the revs are valid (fast, but
>>> can allow mergeinfo that describes invalid
>>> mergesource:rev pairs)
>>>
>>> C) Find an efficient way to test for valid
>>> mergesource:rev pairs
>>
>> For now, A (or maybe B). Then we have our Paul back. :-)
>>
>
> Been down the rabbit hole on-and-off the last couple weeks with this.
> Did everything we discussed so far in this thread, plus a slew of new
> bugs, see http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc15
> and onward.
>
> At any rate, after my recent work we are left with one common(?) use
> case still broken:
>
> Loading a sequence of incremental dumps when the fist load in the
> sequence is into a *non-empty* repostitory.
>
> This is tested in svnadmin_tests.py 20 'don't filter mergeinfo revs
> from incremental dump', specifically see '# PART 4: Load a a series of
> incremental dumps to an non-empty repository."
>
> Basically the problem is this:
>
> 1) Given repository ReposA with X revisions in it.
>
> Note: Previously this use case was also broken if ReposB
> was empty to start, but this now works (assuming none of
> the incremental dumps in step 3 were svndumpfiltered in
> such a way as to remove/renumber revisions).
>
> 2) Given repository ReposB with Y revisions in it.
>
> 3) Incrementally (but fully) dump ReposA
>
> svnadmin dump ReposA -r0:100 > A.0.100.dump
> svnadmin dump ReposA -r101:200 --incremental > A.101.200.dump
> svnadmin dump ReposA -r201:300 --incremental > A.201.300.dump
> .
> .
> .
> svnadmin dump ReposA -rW:X --incremental > A.W.X.dump
>
> 4) Load the ReposA incremental dumps to ReposB
>
> svnadmin load ReposB --parent-dir /some/subtree < A.0.100.dump
> svnadmin load ReposB --parent-dir /some/subtree < A.101.200.dump
> svnadmin load ReposB --parent-dir /some/subtree < A.201.300.dump
> .
> .
> .
> svnadmin load ReposB --parent-dir /some/subtree < A.W.X.dump
>
> The problem arises when one of the incremental loads contains
> references to mergeinfo source revisions that predate the start of the
> dump. For example, say A.201.300.dump contains mergeinfo referencing
> r82. When this is loaded into ReposB the reference should be changed
> to r(82+Y) to reflect the fact that ReposB has Y revisions in it
> before we ever started loading parts of ReposA. Currently no change
> is made and the source rev stays at r82, which is obviously incorrect.
>
> The attached patch fixes this problem. It takes all mergeinfo which
> predates the first revision in the dump stream and adjusts it by the
> difference between the head rev in ReposB and the current dump stream
> revision. Yes, this is a fairly fragile solution (as is our copyfrom
> sources mapping, which this is based on). If unrelated commits are
> made to ReposB between loads of the incremental dumps, then the
> mergeinfo source revs will be wrong; but it is *always* wrong right
> now, at least this would fix this use case. Opinions appreciated.
>
> And yes, there are still other potential problems: Think incremental
> dumps that are dumpfiltered such that some revs are dropped or
> renumbered. But I'm not sure we really want to solve *every* problem
> in this space, I'm not even sure we can (without bringing the
> performance of svnadmin load to its knees). I think this patch brings
> us to "good enough" with the caveat that ill-advised use of svnadmin
> dump/load and/or svndumpfilter can result in incorrect mergeinfo.
>
> Paul
>
One last issue #3020 fix: Correctly map mergeinfo revisions when loading
incremental dumps into a repository that was not empty prior to the first
load.
* subversion/include/private/svn_mergeinfo_private.h
(svn_mergeinfo__adjust_mergeinfo_rangelists): New.
* subversion/libsvn_subr/mergeinfo.c
(svn_mergeinfo__adjust_mergeinfo_rangelists): New.
* subversion/libsvn_repos/load.c
(renumber_mergeinfo_revs): Adjust mergeinfo older than the oldest revision
in the dump stream by the difference between the head rev of the target
repository and the current dump stream rev.
(new_revision_record): Update parse_baton.oldest_old_rev here, a bit sooner
than we did before when we set it...
(close_revision): ...here.
* subversion/tests/cmdline/svnadmin_tests.py
Index: subversion/include/private/svn_mergeinfo_private.h
===================================================================
--- subversion/include/private/svn_mergeinfo_private.h (revision 943236)
+++ subversion/include/private/svn_mergeinfo_private.h (working copy)
@@ -212,6 +212,21 @@
svn_boolean_t inheritable,
apr_pool_t *result_pool);
+/* Adjust in-place MERGEINFO's rangelists by OFFSET. If OFFSET is negative
+ and would adjust any part of MERGEINFO's source revisions to 0 or less,
+ then those revisions are dropped. If all the source revisions for a merge
+ source path are dropped, then the path itself is dropped. If all merge
+ source paths are dropped, then *ADJUSTED_MERGEINFO is set to an empty
+ hash. *ADJUSTED_MERGEINFO is allocated in RESULT_POOL. SCRATCH_POOL is
+ used for any temporary allocations. */
+svn_error_t *
+svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo,
+ svn_mergeinfo_t mergeinfo,
+ svn_revnum_t offset,
+ apr_pool_t *result_pool,
+ apr_pool_t *scratch_pool);
+
+
#ifdef __cplusplus
}
#endif /* __cplusplus */
Index: subversion/libsvn_repos/load.c
===================================================================
--- subversion/libsvn_repos/load.c (revision 943236)
+++ subversion/libsvn_repos/load.c (working copy)
@@ -278,11 +278,31 @@
apr_pool_t *pool)
{
apr_pool_t *subpool = svn_pool_create(pool);
- apr_hash_t *mergeinfo;
- apr_hash_t *final_mergeinfo = apr_hash_make(subpool);
+ svn_mergeinfo_t mergeinfo, predates_stream_mergeinfo;
+ svn_mergeinfo_t final_mergeinfo = apr_hash_make(subpool);
apr_hash_index_t *hi;
SVN_ERR(svn_mergeinfo_parse(&mergeinfo, initial_val->data, subpool));
+
+ if (rb->pb->oldest_old_rev > 1)
+ {
+ SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
+ &predates_stream_mergeinfo, mergeinfo,
+ rb->pb->oldest_old_rev - 1, 0,
+ TRUE, subpool, subpool));
+ SVN_ERR(svn_mergeinfo__filter_mergeinfo_by_ranges(
+ &mergeinfo, mergeinfo,
+ rb->pb->oldest_old_rev - 1, 0,
+ FALSE, subpool, subpool));
+ SVN_ERR(svn_mergeinfo__adjust_mergeinfo_rangelists(
+ &predates_stream_mergeinfo, predates_stream_mergeinfo,
+ -rb->rev_offset, subpool, subpool));
+ }
+ else
+ {
+ predates_stream_mergeinfo = NULL;
+ }
+
for (hi = apr_hash_first(subpool, mergeinfo); hi; hi = apr_hash_next(hi))
{
const char *merge_source;
@@ -348,6 +368,11 @@
apr_hash_set(final_mergeinfo, merge_source,
APR_HASH_KEY_STRING, rangelist);
}
+
+ if (predates_stream_mergeinfo)
+ SVN_ERR(svn_mergeinfo_merge(final_mergeinfo, predates_stream_mergeinfo,
+ subpool));
+
SVN_ERR(svn_mergeinfo_sort(final_mergeinfo, subpool));
/* Mergeinfo revision sources for r0 and r1 are invalid; you can't merge r0
@@ -1053,6 +1078,10 @@
SVN_ERR(svn_stream_printf(pb->outstream, pool,
_("<<< Started new transaction, based on "
"original revision %ld\n"), rb->rev));
+
+ /* Stash the oldest "old" revision committed from the load stream. */
+ if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev))
+ pb->oldest_old_rev = rb->rev;
}
/* If we're parsing revision 0, only the revision are (possibly)
@@ -1411,10 +1440,6 @@
return svn_error_return(err);
}
- /* Stash the oldest "old" revision committed from the load stream. */
- if (!SVN_IS_VALID_REVNUM(pb->oldest_old_rev))
- pb->oldest_old_rev = *old_rev;
-
/* Run post-commit hook, if so commanded. */
if (pb->use_post_commit_hook)
{
Index: subversion/libsvn_subr/mergeinfo.c
===================================================================
--- subversion/libsvn_subr/mergeinfo.c (revision 943236)
+++ subversion/libsvn_subr/mergeinfo.c (working copy)
@@ -1998,6 +1998,58 @@
return SVN_NO_ERROR;
}
+svn_error_t *
+svn_mergeinfo__adjust_mergeinfo_rangelists(svn_mergeinfo_t *adjusted_mergeinfo,
+ svn_mergeinfo_t mergeinfo,
+ svn_revnum_t offset,
+ apr_pool_t *result_pool,
+ apr_pool_t *scratch_pool)
+{
+ apr_hash_index_t *hi;
+ *adjusted_mergeinfo = apr_hash_make(result_pool);
+
+ if (mergeinfo)
+ {
+ for (hi = apr_hash_first(scratch_pool, mergeinfo);
+ hi;
+ hi = apr_hash_next(hi))
+ {
+ int i;
+ const char *path = svn__apr_hash_index_key(hi);
+ apr_array_header_t *rangelist = svn__apr_hash_index_val(hi);
+ apr_array_header_t *adjusted_rangelist =
+ apr_array_make(result_pool, rangelist->nelts,
+ sizeof(svn_merge_range_t *));
+
+ for (i = 0; i < rangelist->nelts; i++)
+ {
+ svn_merge_range_t *range =
+ APR_ARRAY_IDX(rangelist, i, svn_merge_range_t *);
+
+ if (range->start + offset > 0 && range->end + offset > 0)
+ {
+ if (range->start + offset < 0)
+ range->start = 0;
+ else
+ range->start = range->start + offset;
+
+ if (range->end + offset < 0)
+ range->end = 0;
+ else
+ range->end = range->end + offset;
+ APR_ARRAY_PUSH(adjusted_rangelist, svn_merge_range_t *) =
+ range;
+ }
+ }
+
+ if (adjusted_rangelist->nelts)
+ apr_hash_set(*adjusted_mergeinfo, apr_pstrdup(result_pool, path),
+ APR_HASH_KEY_STRING, adjusted_rangelist);
+ }
+ }
+ return SVN_NO_ERROR;
+}
+
svn_boolean_t
svn_mergeinfo__is_noninheritable(svn_mergeinfo_t mergeinfo,
apr_pool_t *scratch_pool)
Index: subversion/tests/cmdline/svnadmin_tests.py
===================================================================
--- subversion/tests/cmdline/svnadmin_tests.py (revision 943236)
+++ subversion/tests/cmdline/svnadmin_tests.py (working copy)
@@ -1070,6 +1070,7 @@
load_and_verify_dumpstream(sbox, [], [], None,
open(dump_file_r11_13).read(),
'--ignore-uuid')
+ #raise svntest.Failure("PTB")
load_and_verify_dumpstream(sbox, [], [], None,
open(dump_file_r14_15).read(),
'--ignore-uuid')
@@ -1162,28 +1163,6 @@
# Check the resulting mergeinfo. We expect the exact same results
# as Part 3.
- #
- # Currently this fails because our current logic mapping mergeinfo revs
- # in the load stream to their new values based on the offset of the
- # target repository is quite flawed. Right now this is the resulting
- # mergeinfo:
- #
- # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1\B
- # svn:mergeinfo
- # /Projects/Project-X/branches/B2/B/E:11-12
- # /Projects/Project-X/trunk/B/E:5-6,8-9
- # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B1':
- # svn:mergeinfo
- # /Projects/Project-X/branches/B2:11-18
- # ^^
- # The *only* correct rev here!
- # /Projects/Project-X/trunk:6,9
- # Properties on 'svnadmin_tests-21\Projects\Project-X\branches\B2':
- # svn:mergeinfo
- # /Projects/Project-X/trunk:9
- #
- # See http://subversion.tigris.org/issues/show_bug.cgi?id=3020#desc16 for
- # more info.
svntest.actions.run_and_verify_svn(None, expected_output, [],
'propget', 'svn:mergeinfo', '-R',
sbox.repo_url)
@@ -1215,7 +1194,7 @@
create_in_repo_subdir,
SkipUnless(verify_with_invalid_revprops,
svntest.main.is_fs_type_fsfs),
- XFail(dont_drop_valid_mergeinfo_during_incremental_loads),
+ dont_drop_valid_mergeinfo_during_incremental_loads,
]
if __name__ == '__main__':