martinvonz added a comment.
In https://phab.mercurial-scm.org/D6255#91015, @marmoute wrote: > I did a first path through it, the new code seems reasonable and easier > to read than the previous one. Some comments and questions below. The rest somehow didn't make it here, so I'll copy from the email (i.e. the below is from Pierre-Yves, not from me): I did a first path through it, the new code seems reasonable and easier to read than the previous one. Some comments and questions below. On 4/16/19 7:19 PM, martinvonz (Martin von Zweigbergk) wrote: > martinvonz created this revision. > Herald added subscribers: mercurial-devel, mjpieters. > Herald added a reviewer: hg-reviewers. > > REVISION SUMMARY > When copies are stored in changesets, we need a changeset-centric > version of mergecopies() just like we have a changeset-centric version > of pathcopies(). I think the natural way of thinking about > mergecopies() is in terms of pathcopies() from the base to each of the > commits. So if we can rewrite mergecopies() based two such > pathcopies() calls, we'll get the changeset-centric version for > free. That's what this patch does. I like the approach, if I understand it correctly, we'll have less duplicated code in the end. > > A nice bonus is that it ends up being a lot simpler. mergecopies() has > accumulated a lot of technical debt over time. One good example is the > code for dealing with grafts (the "partial/incomplete/dirty" > stuff). Since pathcopies() already deals with backwards renames and > ping-pong renames, we get that for free. > > I've run tests with hard-coded debug logging for "fullcopy" and while > I haven't looked at every difference it produces, all the ones I have > looked at seemed reasonable to me. How many difference did you had? can you share some example of them with their explanation? on related topic, you seems to be fixing a case in test-fastannotate-hg.t and test-annotate-hg.t that is probably worth mentioning. > One drawback of the rewritten code is that we may now make > remotefilelog prefetch more files. We used to prefetch files that were > unique to either side of the merge compared to the other. We now > prefetch files that are unique to either sise of the merge compared to sise → side > the base. This means that if you added the same file to each side, we > would not prefetch it before, but we would now. Such cases are > probably quite rare, but one likely scenario where they happen is when > moving from a commit to its successor (or the other way around). The > user will probably already have the files in the cache in such cases, > so it's probably not a big deal. > > Some timings for calculating mergecopies between two revisions (all > using the common ancestor as base): Which revisions did you pick? (for the record, the benchmark suite use 1daa622bbe42 and 76caed42cf7c) > In the hg repo: > 4.8 4.8: 0.21s -> 0.21s > 4.0 4.8: 0.35s -> 0.63s > > In and old copy of the mozilla-unified repo: > FIREFOX_BETA_60_BASE^ FIREFOX_BETA_60_BASE: 0.51s -> 0.60s > FIREFOX_NIGHTLY_59_END FIREFOX_BETA_60_BASE: 2.1s -> 2.3s > FIREFOX_BETA_59_END FIREFOX_BETA_60_BASE: 3.1s -> 3.3s > FIREFOX_AURORA_50_BASE FIREFOX_BETA_60_BASE: 30s -> 35s > > So it's measurably slower in most cases. Note that merge copies are > not calculated when updating with a clean working copy, which is > probably the most common case. I therefore think the much simpler code > is worth the slowdown. Do you know where the slowdown comes from ? > REPOSITORY > https://phab.mercurial-scm.org/diffusion/HG/ Mercurial > > REVISION DETAIL > https://phab.mercurial-scm.org/D6255 > > AFFECTED FILES > mercurial/copies.py > tests/test-annotate.t > tests/test-fastannotate-hg.t > tests/test-graft.t > tests/test-rename-merge2.t > > CHANGE DETAILS > > diff --git a/tests/test-rename-merge2.t b/tests/test-rename-merge2.t > > - a/tests/test-rename-merge2.t +++ b/tests/test-rename-merge2.t @@ -433,6 +433,9 @@ > > -------------- > test L:nc a b R:up b W: - 12 merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 86a2aa42fc76+, remote: af30c7647fc7 > @@ -469,6 +472,9 @@ > -------------- > test L:up b R:nm a b W: - 13 merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a > @@ -506,6 +512,9 @@ > -------------- > test L:nc a b R:up a b W: - 14 merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a > @@ -543,6 +552,9 @@ > -------------- > test L:up b R:nm a b W: - 15 merge b no ancestor, remove a > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 59318016310c+, remote: bdb19105162a > @@ -580,6 +592,9 @@ > -------------- > test L:nc a b R:up a b W: - 16 get a, merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 86a2aa42fc76+, remote: 8dbce441892a > @@ -617,6 +632,9 @@ > -------------- > test L:up a b R:nc a b W: - 17 keep a, merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 0b76e65c8289+, remote: 4ce40f5aca24 > @@ -653,6 +671,9 @@ > -------------- > test L:nm a b R:up a b W: - 18 merge b no ancestor > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 02963e448370+, remote: 8dbce441892a > @@ -695,6 +716,9 @@ > -------------- > test L:up a b R:nm a b W: - 19 merge b no ancestor, prompt remove a > -------------- > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'a' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: False, partial: False > ancestor: 924404dff337, local: 0b76e65c8289+, remote: bdb19105162a > diff --git a/tests/test-graft.t b/tests/test-graft.t > > - a/tests/test-graft.t +++ b/tests/test-graft.t @@ -75,6 +75,8 @@ > > > $ hg graft -r 2 --base 3 > grafting 2:5c095ad7e90f "2" > + note: possible conflict - c was deleted and renamed to: > + a > note: graft of 2:5c095ad7e90f created no changes to commit > > Can't continue without starting: > @@ -220,6 +222,9 @@ > committing changelog > updating the branch cache > grafting 5:97f8bfe72746 "5" > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'c' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: True, partial: False > ancestor: 4c60f11aa304, local: 6b9e5368ca4e+, remote: 97f8bfe72746 > @@ -233,6 +238,9 @@ > $ HGEDITOR=cat hg graft 4 3 --log --debug > scanning for duplicate grafts > grafting 4:9c233e8e184d "4" > + all copies found (* = to merge, ! = divergent, % = renamed and deleted): > + src: 'c' -> dst: 'b' > + checking for directory renames > resolving manifests > branchmerge: True, force: True, partial: False > ancestor: 4c60f11aa304, local: 1905859650ec+, remote: 9c233e8e184d > @@ -1129,7 +1137,6 @@ > grafting 2:f58c7e2b28fa "C0" > merging f1e and f1b to f1e > merging f2a and f2c to f2c > - merging f5b and f5a to f5a > > Test the cases A.1 (f4x) and A.7 (f3x). > > diff --git a/tests/test-fastannotate-hg.t b/tests/test-fastannotate-hg.t > > - a/tests/test-fastannotate-hg.t +++ b/tests/test-fastannotate-hg.t @@ -273,37 +273,10 @@ > > > EOF > $ hg ci -mc -d '3 0' > created new head > -BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here > $ hg merge > merging b > - warning: conflicts while merging b! (edit, then use 'hg resolve --mark') > - 0 files updated, 0 files merged, 0 files removed, 1 files unresolved > - use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon > - [1] > - $ cat b > - <<<<<<< working copy: b80e3e32f75a - test: c > - a > - z > - a > - ||||||| base > - ======= > - a > - a > - a > - b4 > - c > - b5 > - >>>>>>> merge rev: 64afcdf8e29e - test: mergeb > - $ cat <<EOF > b > - > a > - > z > - > a > - > b4 > - > c > - > b5 > - > EOF > - $ hg resolve --mark -q > - $ rm b.orig > + 0 files updated, 1 files merged, 0 files removed, 0 files unresolved > + (branch merge, don't forget to commit) > $ echo d >> b > $ hg ci -mmerge2 -d '4 0' > > diff --git a/tests/test-annotate.t b/tests/test-annotate.t > > - a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -273,37 +273,10 @@ > > > EOF > $ hg ci -mc -d '3 0' > created new head > -BROKEN: 'a' was copied to 'b' on both sides. We should not get a merge conflict here > $ hg merge > merging b > - warning: conflicts while merging b! (edit, then use 'hg resolve --mark') > - 0 files updated, 0 files merged, 0 files removed, 1 files unresolved > - use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon > - [1] > - $ cat b > - <<<<<<< working copy: b80e3e32f75a - test: c > - a > - z > - a > - ||||||| base > - ======= > - a > - a > - a > - b4 > - c > - b5 > - >>>>>>> merge rev: 64afcdf8e29e - test: mergeb > - $ cat <<EOF > b > - > a > - > z > - > a > - > b4 > - > c > - > b5 > - > EOF > - $ hg resolve --mark -q > - $ rm b.orig > + 0 files updated, 1 files merged, 0 files removed, 0 files unresolved > + (branch merge, don't forget to commit) > $ echo d >> b > $ hg ci -mmerge2 -d '4 0' > > diff --git a/mercurial/copies.py b/mercurial/copies.py > > - a/mercurial/copies.py +++ b/mercurial/copies.py @@ -373,57 +373,6 @@ > > > return u1, u2 > > -def _makegetfctx(ctx): > - """return a 'getfctx' function suitable for _checkcopies usage > > - - We have to re-setup the function building 'filectx' for each - '_checkcopies' to ensure the linkrev adjustment is properly setup for - each. Linkrev adjustment is important to avoid bug in rename - detection. Moreover, having a proper '_ancestrycontext' setup ensures - the performance impact of this adjustment is kept limited. Without it, - each file could do a full dag traversal making the time complexity of - the operation explode (see issue4537). - - This function exists here mostly to limit the impact on stable. Feel - free to refactor on default. - """ - rev = ctx.rev() - repo = ctx._repo - ac = getattr(ctx, '_ancestrycontext', None) - if ac is None: - revs = [rev] - if rev is None: - revs = [p.rev() for p in ctx.parents()] - ac = repo.changelog.ancestors(revs, inclusive=True) - ctx._ancestrycontext = ac - def makectx(f, n): - if n in node.wdirfilenodeids: # in a working context? - if ctx.rev() is None: - return ctx.filectx(f) - return repo[None][f] - fctx = repo.filectx(f, fileid=n) - # setup only needed for filectx not create from a changectx - fctx._ancestrycontext = ac - fctx._descendantrev = rev - return fctx - return util.lrucachefunc(makectx) - -def _combinecopies(copyfrom, copyto, finalcopy, diverge, incompletediverge): - """combine partial copy paths""" - remainder = {} - for f in copyfrom: - if f in copyto: - finalcopy[copyto[f]] = copyfrom[f] - del copyto[f] - for f in incompletediverge: - assert f not in diverge - ic = incompletediverge[f] - if ic[0] in copyto: - diverge[f] = [copyto[ic[0]], ic[1]] - else: - remainder[f] = ic - return remainder - > > def mergecopies(repo, c1, c2, base): > """ > Finds moves and copies between context c1 and c2 that are relevant for > @@ -526,168 +475,93 @@ > This is pretty slow when a lot of changesets are involved but will track all > the copies. > """ > - # In certain scenarios (e.g. graft, update or rebase), base can be > - # overridden We still need to know a real common ancestor in this case We > - # can't just compute _c1.ancestor(_c2) and compare it to ca, because there > - # can be multiple common ancestors, e.g. in case of bidmerge. Because our > - # caller may not know if the revision passed in lieu of the CA is a genuine > - # common ancestor or not without explicitly checking it, it's better to > - # determine that here. > - # > - # base.isancestorof(wc) is False, work around that > - _c1 = c1.p1() if c1.rev() is None else c1 > - _c2 = c2.p1() if c2.rev() is None else c2 > - # an endpoint is "dirty" if it isn't a descendant of the merge base > - # if we have a dirty endpoint, we need to trigger graft logic, and also > - # keep track of which endpoint is dirty > - dirtyc1 = not base.isancestorof(_c1) > - dirtyc2 = not base.isancestorof(_c2) > - graft = dirtyc1 or dirtyc2 > - tca = base > - if graft: > - tca = _c1.ancestor(_c2) > > - - limit = _findlimit(repo, c1, c2) - > > m1 = c1.manifest() > m2 = c2.manifest() > mb = base.manifest() > > - # gather data from _checkcopies: > - # - diverge = record all diverges in this dict > - # - copy = record all non-divergent copies in this dict > - # - fullcopy = record all copies in this dict > - # - incomplete = record non-divergent partial copies here > - # - incompletediverge = record divergent partial copies here > - diverge = {} # divergence data is shared > - incompletediverge = {} > - data1 = {'copy': {}, > - 'fullcopy': {}, > - 'incomplete': {}, > - 'diverge': diverge, > - 'incompletediverge': incompletediverge, > - } > - data2 = {'copy': {}, > - 'fullcopy': {}, > - 'incomplete': {}, > - 'diverge': diverge, > - 'incompletediverge': incompletediverge, > - } > + copies1 = pathcopies(base, c1) > + copies2 = pathcopies(base, c2) > + > + inversecopies1 = {} > + inversecopies2 = {} > + for dst, src in copies1.items(): > + inversecopies1.setdefault(src, []).append(dst) > + for dst, src in copies2.items(): > + inversecopies2.setdefault(src, []).append(dst) > + > + copy = {} > + diverge = {} > + renamedelete = {} > + allsources = set(inversecopies1) | set(inversecopies2) > + for src in allsources: > + dsts1 = inversecopies1.get(src) > + dsts2 = inversecopies2.get(src) > + if dsts1 and dsts2: > + # copied/renamed on both sides > + if src not in m1 and src not in m2: > + # renamed on both sides > + dsts1 = set(dsts1) > + dsts2 = set(dsts2) > + # If there's some overlap in the rename destinations, we > + # consider it not divergent. For example, if side 1 copies 'a' > + # to 'b' and 'c' and deletes 'a', and side 2 copies 'a' to 'c' > + # and 'd' and deletes 'a'. Formatting could help this comment a bit. something like side 1: a -> b -> c side 2: a -> c -> d > + if dsts1 & dsts2: > + for dst in (dsts1 & dsts2): > + copy[dst] = src > + else: > + diverge[src] = sorted(dsts1 | dsts2) > + elif src in m1 and src in m2: > + # copied on both sides > + dsts1 = set(dsts1) > + dsts2 = set(dsts2) > + for dst in (dsts1 & dsts2): > + copy[dst] = src > + # TODO: Handle cases where it was renamed on one side and copied > + # on the other side Is this TODO a regression or some ported limitation. > + elif dsts1: > + # copied/renamed only on side 1 > + if src not in m2: > + # deleted on side 2 > + if src not in m1: > + # renamed on side 1, deleted on side 2 > + renamedelete[src] = dsts1 > + elif m2[src] != mb[src]: > + # modified on side 2 > + for dst in dsts1: > + if dst not in m2: > + # dst not added on side 2 (handle as regular > + # "both created" case in manifestmerge in that case) Can you elaborate a bit on what this case means (and how we deal with it) especially, what happens if dst is indeed in m2 ? > + copy[dst] = src > + elif dsts2: > + # copied/renamed only on side 2 > + if src not in m1: > + # deleted on side 1 > + if src not in m2: > + # renamed on side 2, deleted on side 1 > + renamedelete[src] = dsts2 > + elif m1[src] != mb[src]: > + # modified on side 1 > + for dst in dsts2: > + if dst not in m1: > + # dst not added on side 1 (handle as regular > + # "both created" case in manifestmerge in that case) > + copy[dst] = src Since this seems the very same code as the previous clause, I wonder if we could factor it out. This would help to prevent subtle bug when we update it. (the answer might be "no because python is slow). > + renamedeleteset = set() > + divergeset = set() > + for src, dsts in diverge.items(): > + divergeset.update(dsts) > + for src, dsts in renamedelete.items(): > + renamedeleteset.update(dsts) small nits: Is there any reason not to use diverge.values and renamedelete.values here ? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D6255 To: martinvonz, #hg-reviewers Cc: marmoute, mjpieters, mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel