Re: [PATCH] spelling: fixes of non-dictionary words
On 11/16/2016 04:16 PM, Augie Fackler wrote: On Nov 16, 2016, at 08:47, Mads Kiilerich wrote: On 10/18/2016 02:20 AM, Augie Fackler wrote: On Mon, Oct 17, 2016 at 11:38:02PM +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476739015 -7200 # Mon Oct 17 23:16:55 2016 +0200 # Node ID efd5397e9da5b1d7e2c3353b0b06fc904651b150 # Parent 8a864844d5a0c34bdb24d2e098a0cd339e32e020 spelling: fixes of non-dictionary words Queued, with some hunks discarded. I've tried to note the interesting ones. Thanks for pushing. You omitted prioritises -> prioritizes. I thought we used US english? We do, but I'm bad at remembering which is which. I'd gladly do that cleanup, but only if you'll publish your tool for this (it'd be nice to have this happen on an ongoing basis somehow rather than have you drop in with a monster patch once a year). Sure. I will clean it up and post it now when the fixes have landed. It hasn't changed much since https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-August/043846.html and I still don't think it is feasible to upstream it ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] spelling: fixes of non-dictionary words
On 10/18/2016 02:20 AM, Augie Fackler wrote: On Mon, Oct 17, 2016 at 11:38:02PM +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476739015 -7200 # Mon Oct 17 23:16:55 2016 +0200 # Node ID efd5397e9da5b1d7e2c3353b0b06fc904651b150 # Parent 8a864844d5a0c34bdb24d2e098a0cd339e32e020 spelling: fixes of non-dictionary words Queued, with some hunks discarded. I've tried to note the interesting ones. Thanks for pushing. You omitted prioritises -> prioritizes. I thought we used US english? /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 5 of 7 v2] bdiff: give slight preference to longest matches in the middle of the B side
On 11/15/2016 09:57 PM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1478626653 -3600 # Tue Nov 08 18:37:33 2016 +0100 # Node ID 5ae7e8061a9671e8941a7a17e316254d228acf59 # Parent 5bb26f29b1509520ca3af4c540775cab50b4d6c0 bdiff: give slight preference to longest matches in the middle of the B side This and the following patches can probably be rearranged and folded to give less churn. I would however appreciate to get another round of thorough review with this structure - that seems to me to give a more natural progression. The benefit from these changes is mainly "better diffs". Better diffs do not necessarily compress better and there might be some small increases in actual diff size. I have not noticed any significant performance changes. Greg, can you verify it doesn't impact your bdiff benchmarks in your environment in a bad way? /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range
On 11/12/2016 03:58 PM, Jun Wu wrote: Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100: # HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1 # Parent 3e0216b2a0995cb21946bc13fb21391013332c57 bdiff: make sure we append to repeated lines instead of inserting into range This will mitigate the symptoms that tests exposed in the previous changeset. Arguably, we need similar handling for longer sequences of repeated lines... I'm concerned about this. This patch looks like a workaround for the simplest case. While you are aware it could have issues with "longer sequences of repeated lines". And that's not that uncommon. There are real-world JSON files like: [ ... { "a": "b", }, { "a": "b", }, { "a": "b", }, ... ] Actually, those kinds of JSON files are part of the motivation of mpm's f1ca249696ed. This series would surely affect them, while I'm not sure the change is in a good way or not yet. Note that after f1ca249696ed, there are still unsolved issues with some large JSON files - we have seen (from user report) diff output being "wrong", where human could generate much smaller diff easily. If you are not in a hurry, I'd like to learn this area deeper before commenting confidently. That may take me 1 to 2 weeks. The test case I added was made to cover the same problem as your JSON example. I think v2 of my bdiff patch series addresses the concerns you mention in a reasonable way. Sure, I would appreciate to get more thorough review and feedback and test cases. I page in and out and have no hurry ... except that Greg is making rapid progress in the same code ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 7 v2] bdiff: adjust criteria for getting optimal longest match in the A side middle
# HG changeset patch # User Mads Kiilerich # Date 1478626653 -3600 # Tue Nov 08 18:37:33 2016 +0100 # Node ID fbce612c9be26860338f6ba4ab5df819b6d79521 # Parent 0f97aaa1b3f5c25e02296c73e6e93d77a197910c bdiff: adjust criteria for getting optimal longest match in the A side middle We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests shows a slightly different output - and sometimes slightly smaller bundles. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22804885 to 22803824 bytes - an 0.005% reduction. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -151,7 +151,7 @@ static int longest_match(struct bdiff_li if (a2 - a1 > 3) a1 = a2 - 3; - half = (a1 + a2) / 2; + half = (a1 + a2 - 1) / 2; for (i = a1; i < a2; i++) { /* skip all lines in b after the current block */ @@ -177,7 +177,7 @@ static int longest_match(struct bdiff_li /* best match so far? we prefer matches closer to the middle to balance recursion */ - if (k > mk || (k == mk && (i <= mi || i < half))) { + if (k > mk || (k == mk && (i <= mi || i <= half))) { mi = i; mj = j; mk = k; diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -88,7 +88,7 @@ print("Diff 1 to 3 lines - preference fo showdiff('a\n', 'a\n' * 3) print("Diff 1 to 5 lines - preference for adding / removing at the end of sequences:") showdiff('a\n', 'a\n' * 5) -print("Diff 3 to 1 lines - preference for adding / removing at the end of sequences:") +print("Diff 3 to 1 lines - preference for balanced recursion:") showdiff('a\n' * 3, 'a\n') -print("Diff 5 to 1 lines - this diff seems weird:") +print("Diff 5 to 1 lines - preference for balanced recursion:") showdiff('a\n' * 5, 'a\n') diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out --- a/tests/test-bdiff.py.out +++ b/tests/test-bdiff.py.out @@ -67,16 +67,17 @@ showdiff( 'a\na\na\na\na\n'): 'a\n' 2 2 '' -> 'a\na\na\na\n' -Diff 3 to 1 lines - preference for adding / removing at the end of sequences: +Diff 3 to 1 lines - preference for balanced recursion: showdiff( 'a\na\na\n', 'a\n'): + 0 2 'a\n' -> '' 'a\n' - 2 6 'a\na\n' -> '' -Diff 5 to 1 lines - this diff seems weird: + 4 6 'a\n' -> '' +Diff 5 to 1 lines - preference for balanced recursion: showdiff( 'a\na\na\na\na\n', 'a\n'): - 0 2 'a\n' -> '' + 0 4 'a\na\n' -> '' 'a\n' - 4 10 'a\na\na\n' -> '' + 6 10 'a\na\n' -> '' ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 7 v2] tests: explore some bdiff cases
# HG changeset patch # User Mads Kiilerich # Date 1478626653 -3600 # Tue Nov 08 18:37:33 2016 +0100 # Node ID 0f97aaa1b3f5c25e02296c73e6e93d77a197910c # Parent b7354208678098580f6e5f975d734c1e8dd9d846 tests: explore some bdiff cases diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -78,3 +78,17 @@ testfixws("", "", 1) testfixws("", "", 0) print("done") + +print("Odd diff for a trivial change:") +showdiff( +''.join('<%s\n-\n' % i for i in range(5)), +''.join('>%s\n-\n' % i for i in range(5))) + +print("Diff 1 to 3 lines - preference for adding / removing at the end of sequences:") +showdiff('a\n', 'a\n' * 3) +print("Diff 1 to 5 lines - preference for adding / removing at the end of sequences:") +showdiff('a\n', 'a\n' * 5) +print("Diff 3 to 1 lines - preference for adding / removing at the end of sequences:") +showdiff('a\n' * 3, 'a\n') +print("Diff 5 to 1 lines - this diff seems weird:") +showdiff('a\n' * 5, 'a\n') diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out --- a/tests/test-bdiff.py.out +++ b/tests/test-bdiff.py.out @@ -42,3 +42,41 @@ showdiff( 'f\n' done done +Odd diff for a trivial change: +showdiff( + '<0\n-\n<1\n-\n<2\n-\n<3\n-\n<4\n-\n', + '>0\n-\n>1\n-\n>2\n-\n>3\n-\n>4\n-\n'): + 0 8 '<0\n-\n<1\n' -> '>0\n' + '-\n' + 10 13 '<2\n' -> '>1\n' + '-\n' + 15 18 '<3\n' -> '>2\n' + '-\n' + 20 23 '<4\n' -> '>3\n' + '-\n' + 25 25 '' -> '>4\n-\n' +Diff 1 to 3 lines - preference for adding / removing at the end of sequences: +showdiff( + 'a\n', + 'a\na\na\n'): + 'a\n' + 2 2 '' -> 'a\na\n' +Diff 1 to 5 lines - preference for adding / removing at the end of sequences: +showdiff( + 'a\n', + 'a\na\na\na\na\n'): + 'a\n' + 2 2 '' -> 'a\na\na\na\n' +Diff 3 to 1 lines - preference for adding / removing at the end of sequences: +showdiff( + 'a\na\na\n', + 'a\n'): + 'a\n' + 2 6 'a\na\n' -> '' +Diff 5 to 1 lines - this diff seems weird: +showdiff( + 'a\na\na\na\na\n', + 'a\n'): + 0 2 'a\n' -> '' + 'a\n' + 4 10 'a\na\na\n' -> '' ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 6 of 7 v2] bdiff: give slight preference to appending lines
# HG changeset patch # User Mads Kiilerich # Date 1479243409 -3600 # Tue Nov 15 21:56:49 2016 +0100 # Node ID e022e995415bd48ffe6d4d03dc97b99dd547cde1 # Parent 5ae7e8061a9671e8941a7a17e316254d228acf59 bdiff: give slight preference to appending lines [This change could be folded into the previous changeset to minimize the repo churn ...] The general preference to matches in the middle of bdiff ranges helps getting balanced recursion and efficient computation. But, as previous changes have shown, it might also give diffs that seems "obviously wrong". To mitigate that: If the best match on the A side starts at the beginning of the bdiff range, don't aim for the middle-most B side match but for the earliest. This will make the matches balanced (by both sides being "early") even though the bisection will be less balanced. Still, this case only apply if the *best* and middle-most match was fully unbalanced on the A side. Each recursion will thus even in this worst case reduce the problem significantly and we are not re-introducing the problem that was fixed in f1ca249696ed. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22806817 to 22807275 bytes - a 0.002% increase. This make the recent test-bdiff.py changes give a more pretty output ... but they no longer show that the recursion is around middle matches (because it in these cases isn't). diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -188,7 +188,7 @@ static int longest_match(struct bdiff_li /* same match but closer to half */ mi = i; mj = j; - } else if (i == mi && mj > bhalf) { + } else if (i == mi && (mj > bhalf || i == a1)) { /* same i but best earlier j */ mj = j; } diff --git a/tests/test-annotate.t b/tests/test-annotate.t --- a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -91,8 +91,8 @@ annotate (JSON) annotate -n b $ hg annotate -n b + 0: a 1: a - 0: a 1: a 3: b4 3: b5 @@ -111,8 +111,8 @@ annotate --no-follow b annotate -nl b $ hg annotate -nl b - 1:1: a 0:1: a + 1:2: a 1:3: a 3:4: b4 3:5: b5 @@ -121,8 +121,8 @@ annotate -nl b annotate -nf b $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 3 b: b5 @@ -131,8 +131,8 @@ annotate -nf b annotate -nlf b $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 3 b:5: b5 @@ -156,8 +156,8 @@ annotate -nlf b annotate after merge $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 4 b: c @@ -166,8 +166,8 @@ annotate after merge annotate after merge with -l $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 4 b:5: c @@ -198,7 +198,7 @@ annotate after merge with -l annotate after rename merge $ hg annotate -nf b - 1 a: a + 0 a: a 6 b: z 1 a: a 3 b: b4 @@ -209,7 +209,7 @@ annotate after rename merge annotate after rename merge with -l $ hg annotate -nlf b - 1 a:1: a + 0 a:1: a 6 b:2: z 1 a:3: a 3 b:4: b4 @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit $ echo more >> b $ hg ci -mmore -d '7 0' $ hg annotate -nlf b - 1 a: 1: a + 0 a: 1: a 6 b: 2: z 1 a: 3: a 3 b: 4: b4 @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit linkrev vs rev $ hg annotate -r tip -n a + 0: a 1: a - 0: a 1: a linkrev vs rev with -l $ hg annotate -r tip -nl a - 1:1: a 0:1: a + 1:2: a 1:3: a Issue589: "undelete" sequence leads to crash diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -84,9 +84,9 @@ showdiff( ''.join('<%s\n-\n' % i for i in range(5)), ''.join('>%s\n-\n' % i for i in range(5))) -print("Diff 1 to 3 lines - preference for balanced recursion:") +print("Diff 1 to 3 lines - preference for appending:") showdiff('a\n', 'a\n' * 3) -print("Diff 1 to 5 lines - preference for balanced recursion:") +print("Diff 1 to 5 lines - preference for appending:") showdiff('a\n', 'a\n' * 5) print("Diff 3 to 1 lines - preference for balanced recursion:") showdiff('a\n' * 3, 'a\n') diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out --- a/tests/test-bdiff.py.out +++ b/tests/test-bdiff.py.out @@ -56,20 +56,18 @@ showdiff( '-\n' 20 23 '<4\n' -> '>4\n' '-\n' -Diff 1 to 3 lines - preference for balanced recursion:
[PATCH 7 of 7 v2] bdiff: give slight preference to removing trailing lines
# HG changeset patch # User Mads Kiilerich # Date 1479243409 -3600 # Tue Nov 15 21:56:49 2016 +0100 # Node ID efa30442953a6e1f096f8833c4ee8047375600d6 # Parent e022e995415bd48ffe6d4d03dc97b99dd547cde1 bdiff: give slight preference to removing trailing lines [This change could be folded into the previous changeset to minimize the repo churn ...] Similar to the previous change, introduce an exception to the general preference for matches in the middle of bdiff ranges: If the best match on the B side starts at the beginning of the bdiff range, don't aim for the middle-most A side match but for the earliest. New (later) matches on the A side will only be considered better if the corresponding match on the B side *not* is at the beginning of the range. Thus, if the best (middle-most) match on the B side turns out to be at the beginning of the range, the earliest match on the A side will be used. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22807275 to 22808120 bytes - a 0.004% increase. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -184,7 +184,7 @@ static int longest_match(struct bdiff_li mj = j; mk = k; } else if (k == mk) { - if (i > mi && i <= half) { + if (i > mi && i <= half && j > b1) { /* same match but closer to half */ mi = i; mj = j; diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -88,7 +88,7 @@ print("Diff 1 to 3 lines - preference fo showdiff('a\n', 'a\n' * 3) print("Diff 1 to 5 lines - preference for appending:") showdiff('a\n', 'a\n' * 5) -print("Diff 3 to 1 lines - preference for balanced recursion:") +print("Diff 3 to 1 lines - preference for removing trailing lines:") showdiff('a\n' * 3, 'a\n') -print("Diff 5 to 1 lines - preference for balanced recursion:") +print("Diff 5 to 1 lines - preference for removing trailing lines:") showdiff('a\n' * 5, 'a\n') diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out --- a/tests/test-bdiff.py.out +++ b/tests/test-bdiff.py.out @@ -68,17 +68,15 @@ showdiff( 'a\na\na\na\na\n'): 'a\n' 2 2 '' -> 'a\na\na\na\n' -Diff 3 to 1 lines - preference for balanced recursion: +Diff 3 to 1 lines - preference for removing trailing lines: showdiff( 'a\na\na\n', 'a\n'): - 0 2 'a\n' -> '' 'a\n' - 4 6 'a\n' -> '' -Diff 5 to 1 lines - preference for balanced recursion: + 2 6 'a\na\n' -> '' +Diff 5 to 1 lines - preference for removing trailing lines: showdiff( 'a\na\na\na\na\n', 'a\n'): - 0 4 'a\na\n' -> '' 'a\n' - 6 10 'a\na\n' -> '' + 2 10 'a\na\na\na\n' -> '' ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 7 v2] tests: make test-bdiff.py easier to maintain
# HG changeset patch # User Mads Kiilerich # Date 1479243409 -3600 # Tue Nov 15 21:56:49 2016 +0100 # Node ID b7354208678098580f6e5f975d734c1e8dd9d846 # Parent e1d6aa0e4c3aed73e0dc523b8a8fd5f9fe23510a tests: make test-bdiff.py easier to maintain Add more stdout logging to help navigate the .out file. diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -11,13 +11,12 @@ def test1(a, b): if d: c = mpatch.patches(a, [d]) if c != b: -print("***", repr(a), repr(b)) -print("bad:") -print(repr(c)[:200]) -print(repr(d)) +print("bad diff+patch result from\n %r to\n %r:" % (a, b)) +print("bdiff: %r" % d) +print("patched: %r" % c[:200]) def test(a, b): -print("***", repr(a), repr(b)) +print("test", repr(a), repr(b)) test1(a, b) test1(b, a) @@ -43,13 +42,21 @@ test("a\nb", "a\nb") #issue1295 def showdiff(a, b): +print('showdiff(\n %r,\n %r):' % (a, b)) bin = bdiff.bdiff(a, b) pos = 0 +q = 0 while pos < len(bin): p1, p2, l = struct.unpack(">lll", bin[pos:pos + 12]) pos += 12 -print(p1, p2, repr(bin[pos:pos + l])) +if p1: +print('', repr(a[q:p1])) +print('', p1, p2, repr(a[p1:p2]), '->', repr(bin[pos:pos + l])) pos += l +q = p2 +if q < len(a): +print('', repr(a[q:])) + showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\nx\n\nz\n") showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n") # we should pick up abbbc. rather than bc.de as the longest match diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out --- a/tests/test-bdiff.py.out +++ b/tests/test-bdiff.py.out @@ -1,27 +1,44 @@ -*** 'a\nc\n\n\n\n' 'a\nb\n\n\n' -*** 'a\nb\nc\n' 'a\nc\n' -*** '' '' -*** 'a\nb\nc' 'a\nb\nc' -*** 'a\nb\nc\nd\n' 'a\nd\n' -*** 'a\nb\nc\nd\n' 'a\nc\ne\n' -*** 'a\nb\nc\n' 'a\nc\n' -*** 'a\n' 'c\na\nb\n' -*** 'a\n' '' -*** 'a\n' 'b\nc\n' -*** 'a\n' 'c\na\n' -*** '' 'adjfkjdjksdhfksj' -*** '' 'ab' -*** '' 'abc' -*** 'a' 'a' -*** 'ab' 'ab' -*** 'abc' 'abc' -*** 'a\n' 'a\n' -*** 'a\nb' 'a\nb' -6 6 'y\n\n' -6 6 'y\n\n' -9 9 'y\n\n' -0 0 'a\nb\nb\n' -12 12 'b\nc\n.\n' -16 18 '' +test 'a\nc\n\n\n\n' 'a\nb\n\n\n' +test 'a\nb\nc\n' 'a\nc\n' +test '' '' +test 'a\nb\nc' 'a\nb\nc' +test 'a\nb\nc\nd\n' 'a\nd\n' +test 'a\nb\nc\nd\n' 'a\nc\ne\n' +test 'a\nb\nc\n' 'a\nc\n' +test 'a\n' 'c\na\nb\n' +test 'a\n' '' +test 'a\n' 'b\nc\n' +test 'a\n' 'c\na\n' +test '' 'adjfkjdjksdhfksj' +test '' 'ab' +test '' 'abc' +test 'a' 'a' +test 'ab' 'ab' +test 'abc' 'abc' +test 'a\n' 'a\n' +test 'a\nb' 'a\nb' +showdiff( + 'x\n\nx\n\nx\n\nx\n\nz\n', + 'x\n\nx\n\ny\n\nx\n\nx\n\nz\n'): + 'x\n\nx\n\n' + 6 6 '' -> 'y\n\n' + 'x\n\nx\n\nz\n' +showdiff( + 'x\n\nx\n\nx\n\nx\n\nz\n', + 'x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n'): + 'x\n\nx\n\n' + 6 6 '' -> 'y\n\n' + 'x\n\n' + 9 9 '' -> 'y\n\n' + 'x\n\nz\n' +showdiff( + 'a\nb\nb\nb\nc\n.\nd\ne\n.\nf\n', + 'a\nb\nb\na\nb\nb\nb\nc\n.\nb\nc\n.\nd\ne\nf\n'): + 0 0 '' -> 'a\nb\nb\n' + 'a\nb\nb\nb\nc\n.\n' + 12 12 '' -> 'b\nc\n.\n' + 'd\ne\n' + 16 18 '.\n' -> '' + 'f\n' done done ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 5 of 7 v2] bdiff: give slight preference to longest matches in the middle of the B side
# HG changeset patch # User Mads Kiilerich # Date 1478626653 -3600 # Tue Nov 08 18:37:33 2016 +0100 # Node ID 5ae7e8061a9671e8941a7a17e316254d228acf59 # Parent 5bb26f29b1509520ca3af4c540775cab50b4d6c0 bdiff: give slight preference to longest matches in the middle of the B side We already have a slight preference for matches close to the middle on the A side. Now, do the same on the B side. j is iterating the b range backwards and we thus accept a new j if the previous match was in the upper half. This makes the test-bhalf diff "correct". It obviously also gives more preference to balanced recursion than to appending to sequences. That is kind of correct, but will also unfortunately make some bundles bigger. No doubt, we can also create examples where it will make them smaller ... The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22803824 to 22806817 bytes - an 0.01% increase. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -143,7 +143,7 @@ static int longest_match(struct bdiff_li struct pos *pos, int a1, int a2, int b1, int b2, int *omi, int *omj) { - int mi = a1, mj = b1, mk = 0, i, j, k, half; + int mi = a1, mj = b1, mk = 0, i, j, k, half, bhalf; /* window our search on large regions to better bound worst-case performance. by choosing a window at the end, we @@ -152,6 +152,7 @@ static int longest_match(struct bdiff_li a1 = a2 - 3; half = (a1 + a2 - 1) / 2; + bhalf = (b1 + b2 - 1) / 2; for (i = a1; i < a2; i++) { /* skip all lines in b after the current block */ @@ -187,8 +188,8 @@ static int longest_match(struct bdiff_li /* same match but closer to half */ mi = i; mj = j; - } else if (i == mi) { - /* same i but earlier j */ + } else if (i == mi && mj > bhalf) { + /* same i but best earlier j */ mj = j; } } diff --git a/tests/test-annotate.t b/tests/test-annotate.t --- a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -91,9 +91,9 @@ annotate (JSON) annotate -n b $ hg annotate -n b + 1: a 0: a 1: a - 1: a 3: b4 3: b5 3: b6 @@ -111,8 +111,8 @@ annotate --no-follow b annotate -nl b $ hg annotate -nl b + 1:1: a 0:1: a - 1:2: a 1:3: a 3:4: b4 3:5: b5 @@ -121,9 +121,9 @@ annotate -nl b annotate -nf b $ hg annotate -nf b + 1 a: a 0 a: a 1 a: a - 1 a: a 3 b: b4 3 b: b5 3 b: b6 @@ -131,8 +131,8 @@ annotate -nf b annotate -nlf b $ hg annotate -nlf b + 1 a:1: a 0 a:1: a - 1 a:2: a 1 a:3: a 3 b:4: b4 3 b:5: b5 @@ -156,9 +156,9 @@ annotate -nlf b annotate after merge $ hg annotate -nf b + 1 a: a 0 a: a 1 a: a - 1 a: a 3 b: b4 4 b: c 3 b: b5 @@ -166,8 +166,8 @@ annotate after merge annotate after merge with -l $ hg annotate -nlf b + 1 a:1: a 0 a:1: a - 1 a:2: a 1 a:3: a 3 b:4: b4 4 b:5: c @@ -198,7 +198,7 @@ annotate after merge with -l annotate after rename merge $ hg annotate -nf b - 0 a: a + 1 a: a 6 b: z 1 a: a 3 b: b4 @@ -209,7 +209,7 @@ annotate after rename merge annotate after rename merge with -l $ hg annotate -nlf b - 0 a:1: a + 1 a:1: a 6 b:2: z 1 a:3: a 3 b:4: b4 @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit $ echo more >> b $ hg ci -mmore -d '7 0' $ hg annotate -nlf b - 0 a: 1: a + 1 a: 1: a 6 b: 2: z 1 a: 3: a 3 b: 4: b4 @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit linkrev vs rev $ hg annotate -r tip -n a + 1: a 0: a 1: a - 1: a linkrev vs rev with -l $ hg annotate -r tip -nl a + 1:1: a 0:1: a - 1:2: a 1:3: a Issue589: "undelete" sequence leads to crash diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py --- a/tests/test-bdiff.py +++ b/tests/test-bdiff.py @@ -79,14 +79,14 @@ testfixws("", "", 0) print("done") -print("Odd diff for a trivial change:") +print("Nice diff for a trivial change:") showdiff( ''.join('<%s\n-\n' % i for i in range(5)), ''.join('>%s\n-\n' % i for i in range(5))) -print("Diff 1 to 3 lines - preference for adding / removing at the end of sequences:") +print("Diff 1 to 3 lines - preference for balanced recursion:") showdiff('a\n', 'a\n' * 3) -print("Diff 1 to 5 lines - preference for adding / removing at the end of sequences:") +p
[PATCH 4 of 7 v2] bdiff: rearrange the "better longest match" code
# HG changeset patch # User Mads Kiilerich # Date 1478626653 -3600 # Tue Nov 08 18:37:33 2016 +0100 # Node ID 5bb26f29b1509520ca3af4c540775cab50b4d6c0 # Parent fbce612c9be26860338f6ba4ab5df819b6d79521 bdiff: rearrange the "better longest match" code This is primarily to make the code more managable and prepare for later changes. More specific assignments might also be slightly faster, even thought it also might generate a bit more code. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -177,10 +177,20 @@ static int longest_match(struct bdiff_li /* best match so far? we prefer matches closer to the middle to balance recursion */ - if (k > mk || (k == mk && (i <= mi || i <= half))) { + if (k > mk) { + /* a longer match */ mi = i; mj = j; mk = k; + } else if (k == mk) { + if (i > mi && i <= half) { + /* same match but closer to half */ + mi = i; + mj = j; + } else if (i == mi) { + /* same i but earlier j */ + mj = j; + } } } } ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range
On 11/07/2016 03:40 PM, Pierre-Yves David wrote: On 11/03/2016 10:34 PM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1 # Parent 3e0216b2a0995cb21946bc13fb21391013332c57 bdiff: make sure we append to repeated lines instead of inserting into range This will mitigate the symptoms that tests exposed in the previous changeset. small nits Would it be possible to have something patch-5-ish before patch 4 then to reduce the line change in patch 5? No, not the way it is done here. They could be collapsed - I just separated them to make it clear why the extra complexity was necessary. The last step could however perhaps be replaced by the mentioned "shift matching sequences" post processing step. That one could come first ... but it would arguably also go to far in the direction of making aesthetically pleasing diffs in code that "mostly" is used for our serialization format of changes. There could perhaps be another post processing step that optimize for size. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 2] perf: add perfbdiff
On 11/06/2016 08:42 AM, Gregory Szorc wrote: # HG changeset patch # User Gregory Szorc # Date 1478414512 25200 # Sat Nov 05 23:41:52 2016 -0700 # Node ID e65ef545a3dcb46fe0f1798e76ea17f9f9323452 # Parent f01367faa792635ad2f7a6b175ae3252292b5121 perf: add perfbdiff bdiff shows up a lot in profiling. I think it would be useful to have a perf command that runs bdiff over and over so we can find hot spots. diff --git a/contrib/perf.py b/contrib/perf.py --- a/contrib/perf.py +++ b/contrib/perf.py @@ -25,6 +25,7 @@ import random import sys import time from mercurial import ( +bdiff, changegroup, cmdutil, commands, @@ -746,6 +747,30 @@ def perffncacheencode(ui, repo, **opts): timer(d) fm.end() +@command('perfbdiff', revlogopts + formatteropts, '-c|-m|FILE REV') I would prefer to keep it simple and consistently use -r for specifying revisions. +def perfbdiff(ui, repo, file_, rev=None, **opts): +"""benchmark a bdiff between a revision and its delta parent""" +if opts.get('changelog') or opts.get('manifest'): +file_, rev = None, file_ +elif rev is None: +raise error.CommandError('perfbdiff', 'invalid arguments') + +r = cmdutil.openrevlog(repo, 'perfbdiff', file_, opts) + +node = r.lookup(rev) +rev = r.rev(node) This might be a stupid lazy question that essential is a request for more clarity in code and docstring: Why this back and forth between rev and node? Must rev always be a filelog revision ... or is it a changelog revision which then is changed to revlog revision while reusing the variable name? Perhaps reduce confusion by using different names. +dp = r.deltaparent(rev) Should it also consider aggressivemergedeltas? Or perhaps more generally: should it be possible to compare any two revisions - especially for filelogs, to also cover the use case of diff? More important: patch 2 LGTM. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 5 rfc] tests: explore some bdiff cases
On 11/06/2016 10:07 AM, Yuya Nishihara wrote: On Thu, 03 Nov 2016 22:34:11 +0100, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 # Parent d06c049695e6ad3219e7479c65ce98a2f123e878 tests: explore some bdiff cases diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t new file mode 100644 --- /dev/null +++ b/tests/test-bhalf.t '#require no-pure' is necessary since we use difflib in pure. The other changes in this series look good to me, but it's bdiff.c so I don't queue them. Thanks for reviewing and the positive feedback. I will try to polish it for "real" submission. For the last patch, I wonder if it would be better to add a post processing step that - given all the chunks - try to shift/rotate all match sequences to be as early as possible (and thus deltas to be as late and "appending" as possible). That could give more readable diffs, especially when combined with heuristics for preferring chunks starting with the lowest amount of indentation. One lesson from these changes seems to be that it is a problem that we use the same low level diff algorithm for revlog delta storage and bundles and for readable patch diffs. One idea that got mentioned at the latest sprint was to use zstandard for storage and "just" seed it with the "a" version of the file as dictionary and let it compress the "b" side. That might be a better long term solution. More short term, I wonder how much we could gain from somehow teaching bdiff to consider both parents for each chunk instead of just using deltas from one side and store chunks from the other verbatim. I think that could make a significant difference for repositories with a lot of big merges in files or the manifest. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 5 rfc] bdiff: rearrange the better longest match code
# HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID a3e3c7075c3c4b92e6b8c27e28bef7b2c061008d # Parent c593308da04e9144da01a08401d886a64985c74b bdiff: rearrange the better longest match code Primarily to make the code more managable and prepare for later changes. More specific assignments might also be slightly faster, even thought it also might generate a bit more code. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -172,10 +172,20 @@ static int longest_match(struct bdiff_li /* best match so far? we prefer matches closer to the middle to balance recursion */ - if (k > mk || (k == mk && (i <= mi || i <= half))) { + if (k > mk) { + /* a longer match is always better */ mi = i; mj = j; mk = k; + } else if (k == mk) { + if (i > mi && i <= half) { + /* better i in first lower half */ + mi = i; + mj = j; + } else if (i == mi) { + /* an earlier j is "better" */ + mj = j; + } } } } ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range
# HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1 # Parent 3e0216b2a0995cb21946bc13fb21391013332c57 bdiff: make sure we append to repeated lines instead of inserting into range This will mitigate the symptoms that tests exposed in the previous changeset. Arguably, we need similar handling for longer sequences of repeated lines... But also, we already have examples of how the heuristics handle other cases in a similar way. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -187,7 +187,7 @@ static int longest_match(struct bdiff_li } else if (i == mi && findbetterb) { /* better j in first upper half */ mj = j; - if (j <= bhalf) + if (j <= bhalf && !(j > 0 && k == 1 && b[j - 1].e == b[j].e)) findbetterb = 0; } } diff --git a/tests/test-annotate.t b/tests/test-annotate.t --- a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -91,8 +91,8 @@ annotate (JSON) annotate -n b $ hg annotate -n b + 0: a 1: a - 0: a 1: a 3: b4 3: b5 @@ -111,8 +111,8 @@ annotate --no-follow b annotate -nl b $ hg annotate -nl b - 1:1: a 0:1: a + 1:2: a 1:3: a 3:4: b4 3:5: b5 @@ -121,8 +121,8 @@ annotate -nl b annotate -nf b $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 3 b: b5 @@ -131,8 +131,8 @@ annotate -nf b annotate -nlf b $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 3 b:5: b5 @@ -156,8 +156,8 @@ annotate -nlf b annotate after merge $ hg annotate -nf b + 0 a: a 1 a: a - 0 a: a 1 a: a 3 b: b4 4 b: c @@ -166,8 +166,8 @@ annotate after merge annotate after merge with -l $ hg annotate -nlf b - 1 a:1: a 0 a:1: a + 1 a:2: a 1 a:3: a 3 b:4: b4 4 b:5: c @@ -198,7 +198,7 @@ annotate after merge with -l annotate after rename merge $ hg annotate -nf b - 1 a: a + 0 a: a 6 b: z 1 a: a 3 b: b4 @@ -209,7 +209,7 @@ annotate after rename merge annotate after rename merge with -l $ hg annotate -nlf b - 1 a:1: a + 0 a:1: a 6 b:2: z 1 a:3: a 3 b:4: b4 @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit $ echo more >> b $ hg ci -mmore -d '7 0' $ hg annotate -nlf b - 1 a: 1: a + 0 a: 1: a 6 b: 2: z 1 a: 3: a 3 b: 4: b4 @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit linkrev vs rev $ hg annotate -r tip -n a + 0: a 1: a - 0: a 1: a linkrev vs rev with -l $ hg annotate -r tip -nl a - 1:1: a 0:1: a + 1:2: a 1:3: a Issue589: "undelete" sequence leads to crash diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t --- a/tests/test-bhalf.t +++ b/tests/test-bhalf.t @@ -105,8 +105,8 @@ Explore some bdiff implementation edge c --- a/x +++ b/x @@ -1,1 +1,3 @@ + a +a - a +a diff --git a/y b/y --- a/y diff --git a/tests/test-commit-amend.t b/tests/test-commit-amend.t --- a/tests/test-commit-amend.t +++ b/tests/test-commit-amend.t @@ -47,8 +47,8 @@ Amending changeset with changes in worki --- a/a Thu Jan 01 00:00:00 1970 + +++ b/a Thu Jan 01 00:00:00 1970 + @@ -1,1 +1,3 @@ + a +a - a +a $ hg log changeset: 1:43f1ba15f28a @@ -122,13 +122,13 @@ No changes, just a different message: uncompressed size of bundle content: 254 (changelog) 163 (manifests) - 141 a + 129 a saved backup bundle to $TESTTMP/.hg/strip-backup/74609c7f506e-1bfde511-amend-backup.hg (glob) 1 changesets found uncompressed size of bundle content: 250 (changelog) 163 (manifests) - 141 a + 129 a adding branch adding changesets adding manifests @@ -140,8 +140,8 @@ No changes, just a different message: --- a/a Thu Jan 01 00:00:00 1970 + +++ b/a Thu Jan 01 00:00:00 1970 + @@ -1,1 +1,3 @@ + a +a - a +a $ hg log changeset: 1:1cd866679df8 @@ -266,13 +266,13 @@ then, test editing custom commit message uncompressed size of bundle content: 249 (changelog) 163 (manifests) - 143 a + 131 a saved backup bundle to $TESTTMP/.hg/strip-backup/5f357c7560ab-e7c84ade-amend-backup.hg (glob) 1 changesets found uncompressed size of bundle content: 257 (changelog) 163 (manifests) - 143 a + 131 a adding branch adding changesets adding manifests @@ -309,13 +309,13 @@ Same, but with changes in working dir (d uncompressed size of bundle content: 464 (changelog)
[PATCH 4 of 5 rfc] bdiff: give preference to longest matches in the middle of b side
# HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID 3e0216b2a0995cb21946bc13fb21391013332c57 # Parent a3e3c7075c3c4b92e6b8c27e28bef7b2c061008d bdiff: give preference to longest matches in the middle of b side j is iterating the b range backwards and we thus use a flag to stop searching for more-in-the-middle matches when the first hit at or before the middle has been seen. This makes the test-bhalf diff "correct". It obviously also gives more preference to balanced recursion than to appending to sequences. That is kind of correct, but will also unfortunately make some bundles bigger. No doubt, we can also create examples where it will make them smaller ... diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -138,7 +138,7 @@ static int longest_match(struct bdiff_li struct pos *pos, int a1, int a2, int b1, int b2, int *omi, int *omj) { - int mi = a1, mj = b1, mk = 0, i, j, k, half; + int mi = a1, mj = b1, mk = 0, i, j, k, half, bhalf, findbetterb; /* window our search on large regions to better bound worst-case performance. by choosing a window at the end, we @@ -147,6 +147,7 @@ static int longest_match(struct bdiff_li a1 = a2 - 3; half = (a1 + a2 - 1) / 2; + bhalf = (b1 + b2 - 1) / 2; for (i = a1; i < a2; i++) { /* skip all lines in b after the current block */ @@ -154,6 +155,7 @@ static int longest_match(struct bdiff_li ; /* loop through all lines match a[i] in b */ + findbetterb = 1; for (; j >= b1; j = b[j].n) { /* does this extend an earlier match? */ for (k = 1; j - k >= b1 && i - k >= a1; k++) { @@ -182,9 +184,11 @@ static int longest_match(struct bdiff_li /* better i in first lower half */ mi = i; mj = j; - } else if (i == mi) { - /* an earlier j is "better" */ + } else if (i == mi && findbetterb) { + /* better j in first upper half */ mj = j; + if (j <= bhalf) + findbetterb = 0; } } } diff --git a/tests/test-annotate.t b/tests/test-annotate.t --- a/tests/test-annotate.t +++ b/tests/test-annotate.t @@ -91,9 +91,9 @@ annotate (JSON) annotate -n b $ hg annotate -n b + 1: a 0: a 1: a - 1: a 3: b4 3: b5 3: b6 @@ -111,8 +111,8 @@ annotate --no-follow b annotate -nl b $ hg annotate -nl b + 1:1: a 0:1: a - 1:2: a 1:3: a 3:4: b4 3:5: b5 @@ -121,9 +121,9 @@ annotate -nl b annotate -nf b $ hg annotate -nf b + 1 a: a 0 a: a 1 a: a - 1 a: a 3 b: b4 3 b: b5 3 b: b6 @@ -131,8 +131,8 @@ annotate -nf b annotate -nlf b $ hg annotate -nlf b + 1 a:1: a 0 a:1: a - 1 a:2: a 1 a:3: a 3 b:4: b4 3 b:5: b5 @@ -156,9 +156,9 @@ annotate -nlf b annotate after merge $ hg annotate -nf b + 1 a: a 0 a: a 1 a: a - 1 a: a 3 b: b4 4 b: c 3 b: b5 @@ -166,8 +166,8 @@ annotate after merge annotate after merge with -l $ hg annotate -nlf b + 1 a:1: a 0 a:1: a - 1 a:2: a 1 a:3: a 3 b:4: b4 4 b:5: c @@ -198,7 +198,7 @@ annotate after merge with -l annotate after rename merge $ hg annotate -nf b - 0 a: a + 1 a: a 6 b: z 1 a: a 3 b: b4 @@ -209,7 +209,7 @@ annotate after rename merge annotate after rename merge with -l $ hg annotate -nlf b - 0 a:1: a + 1 a:1: a 6 b:2: z 1 a:3: a 3 b:4: b4 @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit $ echo more >> b $ hg ci -mmore -d '7 0' $ hg annotate -nlf b - 0 a: 1: a + 1 a: 1: a 6 b: 2: z 1 a: 3: a 3 b: 4: b4 @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit linkrev vs rev $ hg annotate -r tip -n a + 1: a 0: a 1: a - 1: a linkrev vs rev with -l $ hg annotate -r tip -nl a + 1:1: a 0:1: a - 1:2: a 1:3: a Issue589: "undelete" sequence leads to crash diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t --- a/tests/test-bhalf.t +++ b/tests/test-bhalf.t @@ -14,58 +14,48 @@ Diff of boring files: +++ b/this-is-the-filename @@ -1,31 +1,31 @@ -once upon a time 1 - -The quick brown fox jumps over the lazy dog - -once upon a time 2 - -The quick brown fox jumps over the lazy dog - -once upon a time 3 - -The quick brown fox jumps over the lazy dog - -once upon a
[PATCH 1 of 5 rfc] tests: explore some bdiff cases
# HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 # Parent d06c049695e6ad3219e7479c65ce98a2f123e878 tests: explore some bdiff cases diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t new file mode 100644 --- /dev/null +++ b/tests/test-bhalf.t @@ -0,0 +1,140 @@ +A couple of test cases exploring the bdiff implementation. + +Diff of boring files: + + $ hg init repo1 + $ cd repo1 + $ (for i in `seq 15`; do echo "once upon a time $i"; echo "The quick brown fox jumps over the lazy dog"; done; echo) > this-is-the-filename + $ hg add this-is-the-filename + $ hg ci -m "commit message commit message commit message commit message commit message commit message commit message commit message" + $ (for i in `seq 15`; do echo "twice upon a time $i"; echo "The quick brown fox jumps over the lazy dog"; done; echo) > this-is-the-filename + $ hg diff --git + diff --git a/this-is-the-filename b/this-is-the-filename + --- a/this-is-the-filename + +++ b/this-is-the-filename + @@ -1,31 +1,31 @@ + -once upon a time 1 + -The quick brown fox jumps over the lazy dog + -once upon a time 2 + -The quick brown fox jumps over the lazy dog + -once upon a time 3 + -The quick brown fox jumps over the lazy dog + -once upon a time 4 + -The quick brown fox jumps over the lazy dog + -once upon a time 5 + -The quick brown fox jumps over the lazy dog + -once upon a time 6 + -The quick brown fox jumps over the lazy dog + -once upon a time 7 + +twice upon a time 1 + The quick brown fox jumps over the lazy dog + -once upon a time 8 + -The quick brown fox jumps over the lazy dog + -once upon a time 9 + -The quick brown fox jumps over the lazy dog + -once upon a time 10 + +twice upon a time 2 + The quick brown fox jumps over the lazy dog + -once upon a time 11 + -The quick brown fox jumps over the lazy dog + -once upon a time 12 + +twice upon a time 3 + The quick brown fox jumps over the lazy dog + -once upon a time 13 + +twice upon a time 4 + The quick brown fox jumps over the lazy dog + -once upon a time 14 + +twice upon a time 5 + The quick brown fox jumps over the lazy dog + -once upon a time 15 + +twice upon a time 6 + +The quick brown fox jumps over the lazy dog + +twice upon a time 7 + +The quick brown fox jumps over the lazy dog + +twice upon a time 8 + +The quick brown fox jumps over the lazy dog + +twice upon a time 9 + +The quick brown fox jumps over the lazy dog + +twice upon a time 10 + +The quick brown fox jumps over the lazy dog + +twice upon a time 11 + +The quick brown fox jumps over the lazy dog + +twice upon a time 12 + +The quick brown fox jumps over the lazy dog + +twice upon a time 13 + +The quick brown fox jumps over the lazy dog + +twice upon a time 14 + +The quick brown fox jumps over the lazy dog + +twice upon a time 15 + The quick brown fox jumps over the lazy dog + +That's an odd diff for a trivial change! + + $ hg ci -m "commit message commit message commit message commit message commit message commit message commit message commit message" + $ hg bundle --base null ../bundle.hg + 2 changesets found + $ cd .. + + $ f --size bundle.hg + bundle.hg: size=878 + +Explore some bdiff implementation edge cases: + + $ hg init repo2 + $ cd repo2 + $ cat << EOF >> x + > a + > EOF + $ cat << EOF >> y + > a + > a + > a + > EOF + $ cat << EOF >> z + > a + > a + > a + > a + > a + > EOF + $ hg ci -qAm0 + $ cat << EOF > x + > a + > a + > a + > EOF + $ cat << EOF > y + > a + > EOF + $ cat << EOF > z + > a + > EOF + $ hg diff --git + diff --git a/x b/x + --- a/x + +++ b/x + @@ -1,1 +1,3 @@ + a + +a + +a + diff --git a/y b/y + --- a/y + +++ b/y + @@ -1,3 +1,1 @@ + a + -a + -a + diff --git a/z b/z + --- a/z + +++ b/z + @@ -1,5 +1,1 @@ + -a + a + -a + -a + -a + +x and y shows the preference for adding / removing at the end of sequences ... +z just seems weird. + + $ cd .. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 5 rfc] bdiff: adjust criteria for getting optimal longest match in the middle
# HG changeset patch # User Mads Kiilerich # Date 1478208837 -3600 # Thu Nov 03 22:33:57 2016 +0100 # Node ID c593308da04e9144da01a08401d886a64985c74b # Parent f6408efe0d0f4179fe6cc2b967164c1b4567f3d6 bdiff: adjust criteria for getting optimal longest match in the middle We prefer matches closer to the middle to balance recursion, as introduced in f1ca249696ed. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests shows a slightly different output - and sometimes slightly smaller bundles. diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c --- a/mercurial/bdiff.c +++ b/mercurial/bdiff.c @@ -146,7 +146,7 @@ static int longest_match(struct bdiff_li if (a2 - a1 > 3) a1 = a2 - 3; - half = (a1 + a2) / 2; + half = (a1 + a2 - 1) / 2; for (i = a1; i < a2; i++) { /* skip all lines in b after the current block */ @@ -172,7 +172,7 @@ static int longest_match(struct bdiff_li /* best match so far? we prefer matches closer to the middle to balance recursion */ - if (k > mk || (k == mk && (i <= mi || i < half))) { + if (k > mk || (k == mk && (i <= mi || i <= half))) { mi = i; mj = j; mk = k; diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t --- a/tests/test-bhalf.t +++ b/tests/test-bhalf.t @@ -33,20 +33,21 @@ Diff of boring files: -once upon a time 9 -The quick brown fox jumps over the lazy dog -once upon a time 10 + -The quick brown fox jumps over the lazy dog + -once upon a time 11 +twice upon a time 2 The quick brown fox jumps over the lazy dog - -once upon a time 11 + -once upon a time 12 -The quick brown fox jumps over the lazy dog - -once upon a time 12 + -once upon a time 13 +twice upon a time 3 The quick brown fox jumps over the lazy dog - -once upon a time 13 + -once upon a time 14 +twice upon a time 4 The quick brown fox jumps over the lazy dog - -once upon a time 14 + -once upon a time 15 +twice upon a time 5 - The quick brown fox jumps over the lazy dog - -once upon a time 15 + +The quick brown fox jumps over the lazy dog +twice upon a time 6 +The quick brown fox jumps over the lazy dog +twice upon a time 7 @@ -76,7 +77,7 @@ That's an odd diff for a trivial change! $ cd .. $ f --size bundle.hg - bundle.hg: size=878 + bundle.hg: size=870 Explore some bdiff implementation edge cases: @@ -121,20 +122,21 @@ Explore some bdiff implementation edge c --- a/y +++ b/y @@ -1,3 +1,1 @@ + -a a -a - -a diff --git a/z b/z --- a/z +++ b/z @@ -1,5 +1,1 @@ -a + -a a -a -a - -a -x and y shows the preference for adding / removing at the end of sequences ... -z just seems weird. +x shows the preference for adding at the end of sequences ... +while y and z shows the preference for balanced recursion +which is more efficient in stack and performance. $ cd .. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 2] context: make sure __str__ works, also when there is no _changectx
# HG changeset patch # User Mads Kiilerich # Date 1426800170 -3600 # Thu Mar 19 22:22:50 2015 +0100 # Node ID 4b06a40809d781e9aba54718ec2fe1a232d81956 # Parent 90300200bc1fcaedcc6ab109574d08b01ece2853 context: make sure __str__ works, also when there is no _changectx Before, it could crash when trying to print the wrong kind of object at the wrong time. diff --git a/mercurial/context.py b/mercurial/context.py --- a/mercurial/context.py +++ b/mercurial/context.py @@ -709,7 +709,10 @@ class basefilectx(object): return False def __str__(self): -return "%s@%s" % (self.path(), self._changectx) +try: +return "%s@%s" % (self.path(), self._changectx) +except error.LookupError: +return "%s@???" % self.path() def __repr__(self): return "<%s %s>" % (type(self).__name__, str(self)) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 2] largefiles: clarify variable name holding file mode
# HG changeset patch # User Mads Kiilerich # Date 1476801939 -7200 # Tue Oct 18 16:45:39 2016 +0200 # Node ID 90300200bc1fcaedcc6ab109574d08b01ece2853 # Parent bb586966818986131068280bfd95fc96fbdaaa0d largefiles: clarify variable name holding file mode A follow-up to c01acee367ec. 'st' sounds like the whole stat result while 'mode' is a better name for the actual file mode. diff --git a/hgext/largefiles/lfcommands.py b/hgext/largefiles/lfcommands.py --- a/hgext/largefiles/lfcommands.py +++ b/hgext/largefiles/lfcommands.py @@ -510,18 +510,21 @@ def updatelfiles(ui, repo, filelist=None lfdirstate.normal(lfile) update1 = 1 -# copy the state of largefile standin from the repository's +# copy the exec mode of largefile standin from the repository's # dirstate to its state in the lfdirstate. rellfile = lfile relstandin = lfutil.standin(lfile) if wvfs.exists(relstandin): +# exec is decided by the users permissions using mask 0o100 standinexec = wvfs.stat(relstandin).st_mode & 0o100 -st = wvfs.stat(rellfile).st_mode -if standinexec != st & 0o100: -st &= ~0o111 +st = wvfs.stat(rellfile) +mode = st.st_mode +if standinexec != mode & 0o100: +# first remove all X bits, then shift all R bits to X +mode &= ~0o111 if standinexec: -st |= (st >> 2) & 0o111 & ~util.umask -wvfs.chmod(rellfile, st) +mode |= (mode >> 2) & 0o111 & ~util.umask +wvfs.chmod(rellfile, mode) update1 = 1 updated += update1 ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] spelling: fixes of non-dictionary words
On 11/02/2016 03:41 PM, Augie Fackler wrote: On Nov 2, 2016, at 10:32, Mads Kiilerich wrote: On 10/18/2016 02:20 AM, Augie Fackler wrote: On Mon, Oct 17, 2016 at 11:38:02PM +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476739015 -7200 # Mon Oct 17 23:16:55 2016 +0200 # Node ID efd5397e9da5b1d7e2c3353b0b06fc904651b150 # Parent 8a864844d5a0c34bdb24d2e098a0cd339e32e020 spelling: fixes of non-dictionary words Queued, with some hunks discarded. I've tried to note the interesting ones. Can you push your queued version now? I thought I already had. If it's not local, then it's gone. Do you want to just send a new one, or should I try and reconstruct what I had? Changes don't "go away" with Mercurial so I would assume that you must have it locally somewhere. Digging it out is probably less work for you than the alternatives. Alternatively, you are in a power position and can decide which alternative you want ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] spelling: fixes of non-dictionary words
On 10/18/2016 02:20 AM, Augie Fackler wrote: On Mon, Oct 17, 2016 at 11:38:02PM +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476739015 -7200 # Mon Oct 17 23:16:55 2016 +0200 # Node ID efd5397e9da5b1d7e2c3353b0b06fc904651b150 # Parent 8a864844d5a0c34bdb24d2e098a0cd339e32e020 spelling: fixes of non-dictionary words Queued, with some hunks discarded. I've tried to note the interesting ones. Can you push your queued version now? /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 4 STABLE] help: replace selenic.com by mercurial-scm.org in man pages
On 11/01/2016 12:47 PM, FUJIWARA Katsunori wrote: # HG changeset patch # User FUJIWARA Katsunori # Date 1478000376 -32400 # Tue Nov 01 20:39:36 2016 +0900 # Branch stable # Node ID 4cc5a1e982f91d009e00068e9a0f876e03afbf45 # Parent d9fd3d3b1233c6d420015f329906b9bed393d978 help: replace selenic.com by mercurial-scm.org in man pages Source code repository and mailing list services have been already migrated to mercurial-scm.org domain. diff --git a/mercurial/help/hg-ssh.8.txt b/mercurial/help/hg-ssh.8.txt --- a/mercurial/help/hg-ssh.8.txt +++ b/mercurial/help/hg-ssh.8.txt @@ -58,9 +58,9 @@ Resources " Main Web Site: https://mercurial-scm.org/ Hmm ... there is now a redirect from that to https://www.mercurial-scm.org/ . This www is so last year and redundant - wouldn't it be better to just consistently get rid of it, have the main web site where we say it is, and enjoy shorter URLs? /Mads -Source code repository: http://selenic.com/hg +Source code repository: https://www.mercurial-scm.org/repo/hg ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH stable] httppeer: make __del__ access to self.urlopener more safe
# HG changeset patch # User Mads Kiilerich # Date 1477917828 -3600 # Mon Oct 31 13:43:48 2016 +0100 # Branch stable # Node ID 13b1ef724b8c5c22f9234f62b9d9a583655ce87e # Parent 69ffbbe73dd03df0d1a00bdb2bc083fdb73ede09 httppeer: make __del__ access to self.urlopener more safe Some errors could in some cases show unfortunate scary and confusing warnings from the httppeer delstructors: abort: nodename nor servname provided, or not known Exception AttributeError: "'httpspeer' object has no attribute 'urlopener'" in > ignored``` To mute that, take 7b15dd9125b3 to the next level and use getattr in __del__. diff --git a/mercurial/httppeer.py b/mercurial/httppeer.py --- a/mercurial/httppeer.py +++ b/mercurial/httppeer.py @@ -63,8 +63,9 @@ class httppeer(wireproto.wirepeer): self.requestbuilder = urlreq.request def __del__(self): -if self.urlopener: -for h in self.urlopener.handlers: +urlopener = getattr(self, 'urlopener', None) +if urlopener: +for h in urlopener.handlers: h.close() getattr(h, "close_all", lambda : None)() ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH stable] tests: work around instability that caused test from 4999c12c526b to fail
On 10/28/2016 10:35 AM, Pierre-Yves David wrote: On 10/18/2016 03:32 PM, Pierre-Yves David wrote: On 10/18/2016 03:18 PM, Mads Kiilerich wrote: On 10/18/2016 02:30 PM, Pierre-Yves David wrote: On 10/18/2016 01:33 AM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476746894 -7200 # Tue Oct 18 01:28:14 2016 +0200 # Branch stable # Node ID 548f82b480d086c7a551b025fb980cd70187c880 # Parent 328545c7d8a1044330b8a5bfbdd9c2ff08625d6a tests: work around instability that caused test from 4999c12c526b to fail I'm not too sure of what is going on here, Can you elaborate? I'm also not sure what is going on. I suddenly saw the new test I added started to fail. Not in the actual test but in the setup code. Apparently unrelated to other recent changes - the new test just happened to expose it. Thus, I suggest this workaround for now. I would be more comfortable if we had a better idea of what is going one here. Can you have a deeper look? Any news on this? No. The patch still makes the setup part of my new test work reliably despite an apparent existing instability in the state left from the previous test. I haven't investigated further. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH stable] largefiles: handle that a found standin file doesn't exist when removing it
# HG changeset patch # User Mads Kiilerich # Date 1477591593 -7200 # Thu Oct 27 20:06:33 2016 +0200 # Branch stable # Node ID 23f53561a8b6dcfbb35020df4d113fe34fec4c0e # Parent 69ffbbe73dd03df0d1a00bdb2bc083fdb73ede09 largefiles: handle that a found standin file doesn't exist when removing it I somehow ended up in a situation where hg crashed on an unlink I introduced in 328545c7d8a1. I don't know how it happened and can't reproduce it. It seems like it only can happen when the file is removed between the time of check in a working directory context walk that finds a standin file, and the time of use when we try to remove it because the corresponding largefile doesn't exist. But better safe than sorry: replace the plain unlink with unlinkpath with ignoremissing=True. That will also remove remaining empty directories, which arguably is more correct. diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py --- a/hgext/largefiles/reposetup.py +++ b/hgext/largefiles/reposetup.py @@ -217,7 +217,7 @@ def reposetup(ui, repo): # standin. Removing a file as a side effect of # running status is gross, but the alternatives (if # any) are worse. -self.wvfs.unlink(standin) +self.wvfs.unlinkpath(standin, ignoremissing=True) # Filter result lists result = list(result) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
speling tool
On 10/26/2016 09:22 PM, timeless wrote: Pierre-Yves David wrote: +# the remainig bit and drop this line remaining ;-) -- I really should try to put together a version of my spelling tool as a test to see what people think. I'd envision it running w/ a stored cache of allowed words, each run of the test would generate a new word list, diff it with the old one, and complain about words that aren't in the old one and aren't in the user's dictionary. It could also be adjusted to only complain about words of length {x,y} instead of {1,Inf}... That sounds kind of like https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-August/043846.html (early version) that I use for patches like https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089492.html . We do however use a lot of weird words so it seems like unless we limit the scope a lot, there will inherently be more false hits than genuine catches. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH stable] revset: don't cache abstractsmartset min/max invocations infinitely
On 10/26/2016 02:31 PM, Yuya Nishihara wrote: On Wed, 26 Oct 2016 14:09:17 +0200, Mads Kiilerich wrote: Instead, I would perhaps prefer to introduce a 'gettercache' that works on methods that only have self as parameter (and thus can set a simple instance method as I do here) ... or a 'methodcache' that would be like cachefunc but store the cache on the first 'self' parameter. That also seems fine. (I'm not a big fan of overwriting methods since we're likely to create self->self cycle, but that wouldn't be the case here.) I guess it could be slightly more gc efficient to use 'lambda v=v: v' to avoid having a reference to v in the outer scope that also has self ... /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH stable] revset: don't cache abstractsmartset min/max invocations infinitely
On 10/26/2016 01:21 PM, Yuya Nishihara wrote: On Tue, 25 Oct 2016 18:57:26 +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1477414587 -7200 # Tue Oct 25 18:56:27 2016 +0200 # Branch stable # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a # Parent 76c57e1fe79b0980b377b4f305635dea393d6315 revset: don't cache abstractsmartset min/max invocations infinitely There was a "leak", apparently introduced in ab66c1dee405. When running: hg = hglib.open('repo') while True: hg.log("max(branch('default'))") all filteredset instances from branch() would be cached indefinitely by the @util.cachefunc annotation on the max() implementation. Indeed. We've cached {self: v} pair every time max() was called. Queued this for stable, thanks. -@util.cachefunc def min(self): """return the minimum element in the set""" -if self.fastasc is not None: -for r in self.fastasc(): -return r -raise ValueError('arg is an empty sequence') -return min(self) - -@util.cachefunc +if self.fastasc is None: +v = min(self) +else: +for v in self.fastasc(): +break +else: +raise ValueError('arg is an empty sequence') +self.min = lambda: v +return v I slightly prefer using propertycache() and wrap it by a function, which is a common pattern seen in context.py, but that's just a nitpicking. Exactly what do you mean with wrapping the property by a function? I don't see a clear pattern of that in context.py? Instead, I would perhaps prefer to introduce a 'gettercache' that works on methods that only have self as parameter (and thus can set a simple instance method as I do here) ... or a 'methodcache' that would be like cachefunc but store the cache on the first 'self' parameter. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH stable] revset: don't cache abstractsmartset min/max invocations infinitely
# HG changeset patch # User Mads Kiilerich # Date 1477414587 -7200 # Tue Oct 25 18:56:27 2016 +0200 # Branch stable # Node ID c2fe58cd4235fc6c8cabea882794303d620bec3a # Parent 76c57e1fe79b0980b377b4f305635dea393d6315 revset: don't cache abstractsmartset min/max invocations infinitely There was a "leak", apparently introduced in ab66c1dee405. When running: hg = hglib.open('repo') while True: hg.log("max(branch('default'))") all filteredset instances from branch() would be cached indefinitely by the @util.cachefunc annotation on the max() implementation. util.cachefunc seems dangerous as method decorator and is barely used elsewhere in the code base. Instead, just open code caching by having the min/max methods replace themselves with a plain lambda returning the result. diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -2924,23 +2924,29 @@ class abstractsmartset(object): """True if the set will iterate in topographical order""" raise NotImplementedError() -@util.cachefunc def min(self): """return the minimum element in the set""" -if self.fastasc is not None: -for r in self.fastasc(): -return r -raise ValueError('arg is an empty sequence') -return min(self) - -@util.cachefunc +if self.fastasc is None: +v = min(self) +else: +for v in self.fastasc(): +break +else: +raise ValueError('arg is an empty sequence') +self.min = lambda: v +return v + def max(self): """return the maximum element in the set""" -if self.fastdesc is not None: -for r in self.fastdesc(): -return r -raise ValueError('arg is an empty sequence') -return max(self) +if self.fastdesc is None: +return max(self) +else: +for v in self.fastdesc(): +break +else: +raise ValueError('arg is an empty sequence') +self.max = lambda: v +return v def first(self): """return the first element in the set (user iteration perspective) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 3] dirstate: add debug message for debug.dirstate.delaywrite
On 10/18/2016 06:17 PM, Kevin Bullock wrote: On Oct 18, 2016, at 10:19, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476803223 -7200 # Tue Oct 18 17:07:03 2016 +0200 # Node ID 9329389e9dc752652f7c820255ca5afa346c4e08 # Parent cde3cae17cba67f80b9f1b41e5cc5fe3b87cd06f dirstate: add debug message for debug.dirstate.delaywrite Is this series related to the intermittent test failure you described in <https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/089501.html>? No, I don't think so. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 3] largefiles: clarify variable name holding file mode
# HG changeset patch # User Mads Kiilerich # Date 1476801939 -7200 # Tue Oct 18 16:45:39 2016 +0200 # Node ID 6d4322a34c2a543665d702fd10f38f1af6001715 # Parent 26089f5e3b51d0416b985ec78e7facdb3113aa48 largefiles: clarify variable name holding file mode A follow-up to c01acee367ec. 'st' sounds like the whole stat result while 'mode' is a better name for the actual file mode. diff --git a/hgext/largefiles/lfcommands.py b/hgext/largefiles/lfcommands.py --- a/hgext/largefiles/lfcommands.py +++ b/hgext/largefiles/lfcommands.py @@ -510,18 +510,21 @@ def updatelfiles(ui, repo, filelist=None lfdirstate.normal(lfile) update1 = 1 -# copy the state of largefile standin from the repository's +# copy the exec mode of largefile standin from the repository's # dirstate to its state in the lfdirstate. rellfile = lfile relstandin = lfutil.standin(lfile) if wvfs.exists(relstandin): +# exec is decided by the users permissions using mask 0o100 standinexec = wvfs.stat(relstandin).st_mode & 0o100 -st = wvfs.stat(rellfile).st_mode -if standinexec != st & 0o100: -st &= ~0o111 +st = wvfs.stat(rellfile) +mode = st.st_mode +if standinexec != mode & 0o100: +# first remove all X bits, then shift all R bits to X +mode &= ~0o111 if standinexec: -st |= (st >> 2) & 0o111 & ~util.umask -wvfs.chmod(rellfile, st) +mode |= (mode >> 2) & 0o111 & ~util.umask +wvfs.chmod(rellfile, mode) update1 = 1 updated += update1 ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 3] dirstate: add debug message for debug.dirstate.delaywrite
# HG changeset patch # User Mads Kiilerich # Date 1476803223 -7200 # Tue Oct 18 17:07:03 2016 +0200 # Node ID 9329389e9dc752652f7c820255ca5afa346c4e08 # Parent cde3cae17cba67f80b9f1b41e5cc5fe3b87cd06f dirstate: add debug message for debug.dirstate.delaywrite Show a message like: delaying dirstate write 0.305s to record that it was clean which is kind of obscure but gives a hint that something is going on and helps debugging. The functionality *is* obscure and I don't know a better way to describe it briefly. diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py --- a/mercurial/dirstate.py +++ b/mercurial/dirstate.py @@ -756,6 +756,8 @@ class dirstate(object): clock = time.time() start = int(clock) - (int(clock) % delaywrite) end = start + delaywrite +self._ui.debug('delaying dirstate write %0.3fs to record' + ' that it was clean\n' % (end - clock)) time.sleep(end - clock) now = end # trust our estimate that the end is near now break ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3] dirstate: fix debug.dirstate.delaywrite to use the new "now" after sleeping
# HG changeset patch # User Mads Kiilerich # Date 1476802355 -7200 # Tue Oct 18 16:52:35 2016 +0200 # Node ID cde3cae17cba67f80b9f1b41e5cc5fe3b87cd06f # Parent 6d4322a34c2a543665d702fd10f38f1af6001715 dirstate: fix debug.dirstate.delaywrite to use the new "now" after sleeping It seems like the a regression has sneaked into debug.dirstate.delaywrite in 6c6b48aca328. It would sleep until no files were modified "now" any more, but when writing the dirstate it would use the old "now" and still mark files as 'unset' instead of recording the timestamp that would make the file show up as clean instead of unknown. Instead of getting a new "now" from the file system, we trust the computed end time as the new "now" and thus cause the actual modification time to be writiten to the dirstate. debug.dirstate.delaywrite is undocumented and only used in test-largefiles-update.t . All tests seems to work fine for me without debug.dirstate.delaywrite . Perhaps because it not really worked as intended without the fix in this patch, and code and tests thus have evolved to do fine without it? It could thus perhaps make sense to drop usage of this setting in the tests. That could speed the test up a bit. This functionality (or something very similar) can however apparently be very convenient in setups where checking dirty-ness is expensive - such as when using large files and have slow file filesystems or are CPU constrained. Now it works and we can try it. (But ideally, for the largefile use case, it should probably only delay lfdirstate writes - not ordinary dirstate.) diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py --- a/mercurial/dirstate.py +++ b/mercurial/dirstate.py @@ -757,6 +757,7 @@ class dirstate(object): start = int(clock) - (int(clock) % delaywrite) end = start + delaywrite time.sleep(end - clock) +now = end # trust our estimate that the end is near now break st.write(parsers.pack_dirstate(self._map, self._copymap, self._pl, now)) ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH stable] tests: work around instability that caused test from 4999c12c526b to fail
On 10/18/2016 02:30 PM, Pierre-Yves David wrote: On 10/18/2016 01:33 AM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476746894 -7200 # Tue Oct 18 01:28:14 2016 +0200 # Branch stable # Node ID 548f82b480d086c7a551b025fb980cd70187c880 # Parent 328545c7d8a1044330b8a5bfbdd9c2ff08625d6a tests: work around instability that caused test from 4999c12c526b to fail I'm not too sure of what is going on here, Can you elaborate? I'm also not sure what is going on. I suddenly saw the new test I added started to fail. Not in the actual test but in the setup code. Apparently unrelated to other recent changes - the new test just happened to expose it. Thus, I suggest this workaround for now. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] spelling: fixes of non-dictionary words
On 10/18/2016 02:20 AM, Augie Fackler wrote: diff --git a/contrib/check-code.py b/contrib/check-code.py --- a/contrib/check-code.py +++ b/contrib/check-code.py @@ -463,7 +463,7 @@ def _preparepats(): failandwarn = c[-1] for pats in failandwarn: for i, pseq in enumerate(pats): -# fix-up regexes for multi-line searches +# fix-up regexps for multi-line searches I've always used singular "regex" plural "regexes" - I don't see "regexp" much in my day-to-day life. Dropped all instances of this correction for now. Fine, as long as we can be consistent ;-) Mercurial calls it "regexp" in .hgignore - we can't change that. We have approximately twice as many uses of 'regexp' as of 'regex'. The alternative to 'regexp' thus seems to be officially inconsistent. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH stable] tests: work around instability that caused test from 4999c12c526b to fail
# HG changeset patch # User Mads Kiilerich # Date 1476746894 -7200 # Tue Oct 18 01:28:14 2016 +0200 # Branch stable # Node ID 548f82b480d086c7a551b025fb980cd70187c880 # Parent 328545c7d8a1044330b8a5bfbdd9c2ff08625d6a tests: work around instability that caused test from 4999c12c526b to fail diff --git a/tests/test-largefiles-update.t b/tests/test-largefiles-update.t --- a/tests/test-largefiles-update.t +++ b/tests/test-largefiles-update.t @@ -732,12 +732,16 @@ bit correctly on the platform being unaw #endif +FIXME: At this point large2 seems to be fishy and cause up -c to fail +"randomly" even though summary shows no changes. For now, just work around it: + $ rm large2 .hglf/large2 + Test a fatal error interrupting an update. Verify that status report dirty files correctly after an interrupted update. Also verify that checking all hashes reveals it isn't clean. Start with clean dirstates: - $ hg up -qcr "8^" + $ hg up -qCr "8^" $ sleep 1 $ hg st Update standins without updating largefiles - large1 is modified and largeX is ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH] spelling: fixes of non-dictionary words
# HG changeset patch # User Mads Kiilerich # Date 1476739015 -7200 # Mon Oct 17 23:16:55 2016 +0200 # Node ID efd5397e9da5b1d7e2c3353b0b06fc904651b150 # Parent 8a864844d5a0c34bdb24d2e098a0cd339e32e020 spelling: fixes of non-dictionary words diff --git a/contrib/all-revsets.txt b/contrib/all-revsets.txt --- a/contrib/all-revsets.txt +++ b/contrib/all-revsets.txt @@ -8,7 +8,7 @@ # # Please update this file with any revsets you use for benchmarking a change so # that future contributors can easily find and retest it when doing further -# modification. Feel free to highlight interresting variants if needed. +# modification. Feel free to highlight interesting variants if needed. ## Revset from this section are all extracted from changelog when this file was diff --git a/contrib/check-code.py b/contrib/check-code.py --- a/contrib/check-code.py +++ b/contrib/check-code.py @@ -463,7 +463,7 @@ def _preparepats(): failandwarn = c[-1] for pats in failandwarn: for i, pseq in enumerate(pats): -# fix-up regexes for multi-line searches +# fix-up regexps for multi-line searches p = pseq[0] # \s doesn't match \n p = re.sub(r'(?_GenericReview(argslist) let s:origtabpagenr = tabpagenr() silent! exe 'tabedit ' . StrippedRelativeFilePath if exists('patchcmd') - " modelines in loaded files mess with diff comparision + " modelines in loaded files mess with diff comparison let s:keep_modeline=&modeline let &modeline=0 silent! exe 'vert diffsplit ' . tmpname . '.file' diff --git a/hgext/chgserver.py b/hgext/chgserver.py --- a/hgext/chgserver.py +++ b/hgext/chgserver.py @@ -629,7 +629,7 @@ class chgunixservicehandler(object): def chgunixservice(ui, repo, opts): if repo: -# one chgserver can serve multiple repos. drop repo infomation +# one chgserver can serve multiple repos. drop repo information ui.setconfig('bundle', 'mainreporoot', '', 'repo') h = chgunixservicehandler(ui) return commandserver.unixforkingservice(ui, repo=None, opts=opts, handler=h) diff --git a/hgext/fsmonitor/__init__.py b/hgext/fsmonitor/__init__.py --- a/hgext/fsmonitor/__init__.py +++ b/hgext/fsmonitor/__init__.py @@ -563,7 +563,7 @@ def wrapsymlink(orig, source, link_name) pass class state_update(object): -''' This context mananger is responsible for dispatching the state-enter +''' This context manager is responsible for dispatching the state-enter and state-leave signals to the watchman service ''' def __init__(self, repo, node, distance, partial): diff --git a/hgext/histedit.py b/hgext/histedit.py --- a/hgext/histedit.py +++ b/hgext/histedit.py @@ -1406,12 +1406,12 @@ def verifyactions(actions, state, ctxs): % node.short(missing[0])) def adjustreplacementsfrommarkers(repo, oldreplacements): -"""Adjust replacements from obsolescense markers +"""Adjust replacements from obsolescence markers Replacements structure is originally generated based on histedit's state and does not account for changes that are not recorded there. This function fixes that by adding -data read from obsolescense markers""" +data read from obsolescence markers""" if not obsolete.isenabled(repo, obsolete.createmarkersopt): return oldreplacements diff --git a/hgext/logtoprocess.py b/hgext/logtoprocess.py --- a/hgext/logtoprocess.py +++ b/hgext/logtoprocess.py @@ -27,7 +27,7 @@ For example:: would log the warning message and traceback of any failed command dispatch. -Scripts are run asychronously as detached daemon processes; mercurial will +Scripts are run asynchronously as detached daemon processes; mercurial will not ensure that they exit cleanly. """ diff --git a/hgext/rebase.py b/hgext/rebase.py --- a/hgext/rebase.py +++ b/hgext/rebase.py @@ -1382,7 +1382,7 @@ def _computeobsoletenotrebased(repo, reb """return a mapping obsolete => successor for all obsolete nodes to be rebased that have a successors in the destination -obsolete => None entries in the mapping indicate nodes with no succesor""" +obsolete => None entries in the mapping indicate nodes with no successor""" obsoletenotrebased = {} # Build a mapping successor => obsolete nodes for the obsolete diff --git a/i18n/ja.po b/i18n/ja.po --- a/i18n/ja.po +++ b/i18n/ja.po @@ -6713,7 +6713,7 @@ msgid "" msgstr "" msgid "" -"Scripts are run asychronously as detached daemon processes; mercurial will\n" +"Scripts are run a
[PATCH] revset: optimize for destination() being "inefficient"
# HG changeset patch # User Mads Kiilerich # Date 1476726516 -7200 # Mon Oct 17 19:48:36 2016 +0200 # Node ID 50a7bbe82bd3b63a934640b1480ee643cf14ec5f # Parent 5cb830801855dbb63e98b948e355bc995d295bf3 revset: optimize for destination() being "inefficient" destination() will scan through the whole subset and read extras for each revision to get its source. diff --git a/mercurial/revset.py b/mercurial/revset.py --- a/mercurial/revset.py +++ b/mercurial/revset.py @@ -2594,7 +2594,7 @@ def _optimize(x, small): f = getsymbol(x[1]) wa, ta = _optimize(x[2], small) if f in ('author', 'branch', 'closed', 'date', 'desc', 'file', 'grep', - 'keyword', 'outgoing', 'user'): + 'keyword', 'outgoing', 'user', 'destination'): w = 10 # slow elif f in ('modifies', 'adds', 'removes'): w = 30 # slower ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH stable] largefiles: fix 'deleted' files sometimes persistently appearing with R status
On 10/17/2016 05:12 PM, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1476717144 -7200 # Mon Oct 17 17:12:24 2016 +0200 # Node ID bba9f6bba98f82f2403aac8dc656569562690472 # Parent 92414d57a05de39864aecc625e775c693d9edb51 largefiles: fix 'deleted' files sometimes persistently appearing with R status A code snippet that has been around since largefiles was introduced was wrong: Standins no longer found in lfdirstate has *not* been removed - they have probably just been deleted ... or not created. This wrong reporting did that 'up -C' didn't undo the change and didn't sync the two dirstates. Instead of reporting such files as removed, propagate the deletion to the standin file and report the file as deleted. diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py --- a/hgext/largefiles/reposetup.py +++ b/hgext/largefiles/reposetup.py @@ -164,8 +164,8 @@ def reposetup(ui, repo): # files from lfdirstate unsure, s = lfdirstate.status(match, [], False, listclean, False) -(modified, added, removed, clean) = (s.modified, s.added, - s.removed, s.clean) +(modified, added, removed, deleted, clean) = ( +s.modified, s.added, s.removed, s.deleted, s.clean) if parentworking: for lfile in unsure: standin = lfutil.standin(lfile) @@ -206,14 +206,18 @@ def reposetup(ui, repo): removed = [lfile for lfile in removed if lfutil.standin(lfile) in ctx1] -# Standins no longer found in lfdirstate has been -# removed +# Standins no longer found in lfdirstate have been deleted for standin in ctx1.walk(lfutil.getstandinmatcher(self)): lfile = lfutil.splitstandin(standin) if not match(lfile): continue if lfile not in lfdirstate: -removed.append(lfile) +deleted.append(lfile) +# Sync "largefile has been removed" back to the Hmm ... for clarity, this should have been "deleted" /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH stable] largefiles: fix 'deleted' files sometimes persistently appearing with R status
# HG changeset patch # User Mads Kiilerich # Date 1476717144 -7200 # Mon Oct 17 17:12:24 2016 +0200 # Node ID bba9f6bba98f82f2403aac8dc656569562690472 # Parent 92414d57a05de39864aecc625e775c693d9edb51 largefiles: fix 'deleted' files sometimes persistently appearing with R status A code snippet that has been around since largefiles was introduced was wrong: Standins no longer found in lfdirstate has *not* been removed - they have probably just been deleted ... or not created. This wrong reporting did that 'up -C' didn't undo the change and didn't sync the two dirstates. Instead of reporting such files as removed, propagate the deletion to the standin file and report the file as deleted. diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py --- a/hgext/largefiles/reposetup.py +++ b/hgext/largefiles/reposetup.py @@ -164,8 +164,8 @@ def reposetup(ui, repo): # files from lfdirstate unsure, s = lfdirstate.status(match, [], False, listclean, False) -(modified, added, removed, clean) = (s.modified, s.added, - s.removed, s.clean) +(modified, added, removed, deleted, clean) = ( +s.modified, s.added, s.removed, s.deleted, s.clean) if parentworking: for lfile in unsure: standin = lfutil.standin(lfile) @@ -206,14 +206,18 @@ def reposetup(ui, repo): removed = [lfile for lfile in removed if lfutil.standin(lfile) in ctx1] -# Standins no longer found in lfdirstate has been -# removed +# Standins no longer found in lfdirstate have been deleted for standin in ctx1.walk(lfutil.getstandinmatcher(self)): lfile = lfutil.splitstandin(standin) if not match(lfile): continue if lfile not in lfdirstate: -removed.append(lfile) +deleted.append(lfile) +# Sync "largefile has been removed" back to the +# standin. Removing a file as a side effect of +# running status is gross, but the alternatives (if +# any) are worse. +self.wvfs.unlink(standin) # Filter result lists result = list(result) @@ -237,7 +241,7 @@ def reposetup(ui, repo): normals = [[fn for fn in filelist if not lfutil.isstandin(fn)] for filelist in result] -lfstatus = (modified, added, removed, s.deleted, [], [], +lfstatus = (modified, added, removed, deleted, [], [], clean) result = [sorted(list1 + list2) for (list1, list2) in zip(normals, lfstatus)] diff --git a/tests/test-largefiles-update.t b/tests/test-largefiles-update.t --- a/tests/test-largefiles-update.t +++ b/tests/test-largefiles-update.t @@ -751,30 +751,22 @@ added: $ hg up -Cr "8" --config extensions.crashupdatelfiles=../crashupdatelfiles.py [7] Check large1 content and status ... and that update will undo modifications: -BUG: large is R $ cat large1 large1 in #3 $ hg st M large1 - R largeX + ! largeX $ hg up -Cr . getting changed largefiles - 1 largefiles updated, 0 removed - 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + 2 largefiles updated, 0 removed + 2 files updated, 0 files merged, 0 files removed, 0 files unresolved $ cat large1 manually modified before 'hg transplant --continue' $ hg st - R largeX -Force largefiles rehashing and check again - which makes it realize that largeX -not has been removed but just doesn't exist: +Force largefiles rehashing and check that all changes have been caught by +status and update: $ rm .hg/largefiles/dirstate $ hg st - ! largeX - $ hg up -Cr . - getting changed largefiles - 1 largefiles updated, 0 removed - 0 files updated, 0 files merged, 0 files removed, 0 files unresolved - $ hg st $ cd .. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: disrstate._check(link|exec) modifies repo dir mtime
On 10/17/2016 03:55 PM, Yuya Nishihara wrote: Please correct me if I misread the patch. 1. checkexec() on normal unix filesystem: creates 'checkisexec' 0755 2. copy it to VFAT: 'checkisexec' 0644 3. copy back from VFAT: 'checkisexec' 0644 4. checkexec() returns False because no exec flag set You are right. I had local changes that not had been posted. I will resend after 4.0 . /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: disrstate._check(link|exec) modifies repo dir mtime
On 10/16/2016 04:19 PM, Yuya Nishihara wrote on the users list: On Sun, 16 Oct 2016 14:08:52 +0200, Mads Kiilerich wrote: On 10/16/2016 12:29 PM, Yuya Nishihara wrote: On Sat, 1 Oct 2016 11:35:40 -0700, Dorian Taylor wrote: is there a reason why all this testing can’t go on in $REPO/.hg? No idea why, but there was an attempt to move check* files to .hg/cache. https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-October/074885.html https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-November/074971.html About that: I don't think this is sufficient. This case can occur, for instance, if someone naively copies a repo from one machine to another with a USB stick. Or perhaps they've done a naive recursive chmod. Then we're permanently stuck thinking our filesystem doesn't have exec bits. If our fast-path files don't have the right permissions (for who knows what reasons), then we have to do the entire slow path. I don't see how copying or chmod can trigger wrong measurement of file system capabilities. "Wrong" files will be removed again - just like in the old slow path. The slow path will pretty much be the existing slow path - it will not be slower than before but it will usually not be used. Can we distinguish "no exec bit support" with "cache/checkisexec was chmod by user" ? If a repository is copied through VFAT USB stick, exec bit could be dropped from both checkisexec and checknoexec. The basic idea (IIRC) in https://www.mercurial-scm.org/pipermail/mercurial-devel/2015-October/074887.html is that if we have one cached file that is exec and another that isn't, then we can be pretty sure that that file system supports X bit. (That assumption might be wrong for filesystems that cache the x bit but doesn't persist it - but in that case the repo will be screwed anyway.) If we can't conclude anything from stat'ing existing files, we remove them and check the old way ... and if the result is positive, we leave the files behind so it can take the fast path next time. I thus don't see what you are asking for that not already is taken care of. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 2 stable] largefiles: test coverage of fatal interruption of update
# HG changeset patch # User Mads Kiilerich # Date 1476577598 -7200 # Sun Oct 16 02:26:38 2016 +0200 # Node ID 2105975810c843d016ba930c44e63699af2703a6 # Parent 5cb830801855dbb63e98b948e355bc995d295bf3 largefiles: test coverage of fatal interruption of update Test using existing changesets in a clean working directory, revealing problems with files that don't show up as modified or do show up as removed when they just not have been written yet. diff --git a/tests/test-largefiles-update.t b/tests/test-largefiles-update.t --- a/tests/test-largefiles-update.t +++ b/tests/test-largefiles-update.t @@ -144,6 +144,7 @@ Test that "hg merge" updates largefiles large1 in #1 $ cat .hglf/large1 58e24f733a964da346e2407a2bee99d9001184f5 + $ rm normal1.orig (merge non-existing largefiles from "other" via conflict prompt - make sure the following commit doesn't abort in a confusing way when trying to @@ -243,6 +244,7 @@ Test that "hg rollback" restores status ? largeY $ test -f .hglf/largeY [1] + $ rm largeY Test that "hg rollback" restores standins correctly @@ -285,6 +287,7 @@ is not branch-tip) 58e24f733a964da346e2407a2bee99d9001184f5 $ cat .hglf/large2 1deebade43c8c498a3c8daddac0244dc55d1331d + $ rm normalX Test that "hg status" shows status of largefiles correctly just after automated commit like rebase/transplant @@ -598,6 +601,7 @@ it is aborted by conflict. 58e24f733a964da346e2407a2bee99d9001184f5 $ cat large1 large1 in #1 + $ rm normal1.orig Test that rebase updates standins for manually modified largefiles at the 1st commit of resuming. @@ -728,6 +732,48 @@ bit correctly on the platform being unaw #endif +Test a fatal error interrupting an update. lfdirstate doesn't realize that +.hglf has been updated while the largefile hasn't. Status thus shows a clean +state ... but rebuilding lfdirstate and checking all hashes reveals it isn't +clean. + +Start with clean dirstates: + $ hg up -qcr "8^" + $ sleep 1 + $ hg st +Update standins without updating largefiles: + $ cat << EOF > ../crashupdatelfiles.py + > import hgext.largefiles.lfutil + > def getlfilestoupdate(oldstandins, newstandins): + > raise SystemExit(7) + > hgext.largefiles.lfutil.getlfilestoupdate = getlfilestoupdate + > EOF + $ hg up -Cr "8" --config extensions.crashupdatelfiles=../crashupdatelfiles.py + [7] +Check large1 content and status: +BUG: largeX is R and large1 is not M and update does nothing + $ cat large1 + large1 in #3 + $ hg st + R largeX + $ hg up -Cr . + 0 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ hg st + R largeX +Force largefiles rehashing and check again - revealing modifications that +update now can remove: + $ rm .hg/largefiles/dirstate + $ hg st + M large1 + ! largeX + $ hg up -Cr . + getting changed largefiles + 2 largefiles updated, 0 removed + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ hg st + $ cat large1 + manually modified before 'hg transplant --continue' + $ cd .. Test that "hg convert" avoids copying largefiles from the working ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 2 stable] largefiles: more safe handling of interruptions while updating modifications
# HG changeset patch # User Mads Kiilerich # Date 1476577785 -7200 # Sun Oct 16 02:29:45 2016 +0200 # Node ID dcd190cb73a0740be570bdfbc1090066c38e1173 # Parent 2105975810c843d016ba930c44e63699af2703a6 largefiles: more safe handling of interruptions while updating modifications Largefiles are fragile with the design where dirstate and lfdirstate must be kept in sync. To be less fragile, mark all clean largefiles as unsure ("normallookup") before updating standins. After standins have been updated and we know exactly which largefile standins actually was changed, mark the unchanged largefiles back to clean ("normal"). This will make the failure mode more safe. If interrupted, the next command will continue to perform extra hashing of all largefiles. That will do that all largefiles that are out of sync with their standin will be marked dirty and they will show up in status and can be cleaned with update --clean. diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py --- a/hgext/largefiles/overrides.py +++ b/hgext/largefiles/overrides.py @@ -1385,7 +1385,8 @@ def mergeupdate(orig, repo, node, branch lfdirstate = lfutil.openlfdirstate(repo.ui, repo) unsure, s = lfdirstate.status(matchmod.always(repo.root, repo.getcwd()), - [], False, False, False) + [], False, True, False) +oldclean = set(s.clean) pctx = repo['.'] for lfile in unsure + s.modified: lfileabs = repo.wvfs.join(lfile) @@ -1397,9 +1398,13 @@ def mergeupdate(orig, repo, node, branch lfutil.getexecutable(lfileabs)) if (standin in pctx and lfhash == lfutil.readstandin(repo, lfile, '.')): -lfdirstate.normal(lfile) +oldclean.add(lfile) for lfile in s.added: lfutil.updatestandin(repo, lfutil.standin(lfile)) +# mark all clean largefiles as dirty, just in case the update gets +# interrupted before largefiles and lfdirstate are synchronized +for lfile in oldclean: +lfdirstate.normallookup(lfile) lfdirstate.write() oldstandins = lfutil.getstandinsstate(repo) @@ -1408,6 +1413,13 @@ def mergeupdate(orig, repo, node, branch newstandins = lfutil.getstandinsstate(repo) filelist = lfutil.getlfilestoupdate(oldstandins, newstandins) + +# to avoid leaving all largefiles as dirty and thus rehash them, mark +# all the ones that didn't change as clean +for lfile in oldclean.difference(filelist): +lfdirstate.normal(lfile) +lfdirstate.write() + if branchmerge or force or partial: filelist.extend(s.deleted + s.removed) diff --git a/tests/test-largefiles-update.t b/tests/test-largefiles-update.t --- a/tests/test-largefiles-update.t +++ b/tests/test-largefiles-update.t @@ -732,16 +732,16 @@ bit correctly on the platform being unaw #endif -Test a fatal error interrupting an update. lfdirstate doesn't realize that -.hglf has been updated while the largefile hasn't. Status thus shows a clean -state ... but rebuilding lfdirstate and checking all hashes reveals it isn't -clean. +Test a fatal error interrupting an update. Verify that status report dirty +files correctly after an interrupted update. Also verify that checking all +hashes reveals it isn't clean. Start with clean dirstates: $ hg up -qcr "8^" $ sleep 1 $ hg st -Update standins without updating largefiles: +Update standins without updating largefiles - large1 is modified and largeX is +added: $ cat << EOF > ../crashupdatelfiles.py > import hgext.largefiles.lfutil > def getlfilestoupdate(oldstandins, newstandins): @@ -750,29 +750,31 @@ Update standins without updating largefi > EOF $ hg up -Cr "8" --config extensions.crashupdatelfiles=../crashupdatelfiles.py [7] -Check large1 content and status: -BUG: largeX is R and large1 is not M and update does nothing +Check large1 content and status ... and that update will undo modifications: +BUG: large is R $ cat large1 large1 in #3 $ hg st + M large1 R largeX $ hg up -Cr . - 0 files updated, 0 files merged, 0 files removed, 0 files unresolved + getting changed largefiles + 1 largefiles updated, 0 removed + 1 files updated, 0 files merged, 0 files removed, 0 files unresolved + $ cat large1 + manually modified before 'hg transplant --continue' $ hg st R largeX -Force largefiles rehashing and check again - revealing modifications that -update now can remove: +Force largefiles rehashing and check again - which makes it realize that largeX +not has been removed but just doesn't exist: $ rm .hg/largefiles/dirstate $ hg st - M larg
[PATCH 1 of 3 v2] cmdutil: satisfy expections in dirstateguard.__del__, even if __init__ fails
# HG changeset patch # User Mads Kiilerich # Date 1476402795 -7200 # Fri Oct 14 01:53:15 2016 +0200 # Node ID 426994acbf9c9b78aa6922a279dca4091b6508dc # Parent c0410814002f467c24ef07ce73850ba15b306f8e cmdutil: satisfy expections in dirstateguard.__del__, even if __init__ fails Python "delstructors" are terrible - this one because it assumed that __init__ had completed before it was called. That would not necessarily be the case if the repository was read only or broken and saving the dirstate thus failed in unexpected ways. That could give confusing warnings about missing '_active' after failures. To fix that, make sure all member variables are "declared" before doing anything that possibly could fail. [Famous last words.] diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py --- a/mercurial/cmdutil.py +++ b/mercurial/cmdutil.py @@ -3508,10 +3508,11 @@ class dirstateguard(object): def __init__(self, repo, name): self._repo = repo +self._active = False +self._closed = False self._suffix = '.backup.%s.%d' % (name, id(self)) repo.dirstate.savebackup(repo.currenttransaction(), self._suffix) self._active = True -self._closed = False def __del__(self): if self._active: # still active ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3 v2] tests: add more test coverage of phase changes when pushing
# HG changeset patch # User Mads Kiilerich # Date 1471734720 -7200 # Sun Aug 21 01:12:00 2016 +0200 # Node ID 24a7ccfb932b134da24e58817991943c8bbd63fa # Parent 426994acbf9c9b78aa6922a279dca4091b6508dc tests: add more test coverage of phase changes when pushing Prepare for test coverage of phase updates with future push --readonly option, both with and without actually pushing changesets. diff --git a/tests/test-phases-exchange.t b/tests/test-phases-exchange.t --- a/tests/test-phases-exchange.t +++ b/tests/test-phases-exchange.t @@ -1182,10 +1182,40 @@ 2. Test that failed phases movement are cannot lock source repo, skipping local public phase update [1] $ chmod -R +w .hg - $ hgph Upsilon $ cd .. - $ killdaemons.py +#endif -#endif +Test that clone behaves like pull and doesn't +publish changesets as plain push does + + $ hg -R Upsilon phase -q --force --draft 2 + $ hg clone -q Upsilon Pi -r 7 + $ hgph Upsilon -r 'min(draft())' + o 2 draft a-C - 54acac6f23ab + | + ~ + + $ hg -R Upsilon push Pi -r 7 + pushing to Pi + searching for changes + no changes found + [1] + $ hgph Upsilon -r 'min(draft())' + o 8 draft a-F - b740e3e5c05d + | + ~ + + $ hg -R Upsilon push Pi -r 8 + pushing to Pi + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 1 changes to 1 files + + $ hgph Upsilon -r 'min(draft())' + o 9 draft a-G - 3e27b6f1eee1 + | + ~ ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 3 v2] repos: introduce '-R readonly:PATH' for doing local operations "read only"
# HG changeset patch # User Mads Kiilerich # Date 1476402795 -7200 # Fri Oct 14 01:53:15 2016 +0200 # Node ID 491da143c2f310da732c2e1005e7f8394134a51d # Parent 24a7ccfb932b134da24e58817991943c8bbd63fa repos: introduce '-R readonly:PATH' for doing local operations "read only" When used as 'readonly:.', this is pretty much like if the repository was owned by another user and the current user didn't have write access to anything in .hg . Using this feature will for example allow multiple simultaneous pushes, pushes without phase changes, and will provide a "safe" way to run commands ... assuming this and our use of VFS is complete and correct. The existing "API" for repository types could use some cleanup - it requires modules with special undefined duck typing. This patch seems to do what is needed. The existing VFS class hierarchy has a "readonlyvfs" class, but whatever it is, it doesn't seem suitable for this use; it doesn't seem to be a reusable class or mixin. diff --git a/mercurial/hg.py b/mercurial/hg.py --- a/mercurial/hg.py +++ b/mercurial/hg.py @@ -31,6 +31,7 @@ from . import ( merge as mergemod, node, phases, +readonlyrepo, repoview, scmutil, sshpeer, @@ -112,6 +113,7 @@ schemes = { 'https': httppeer, 'ssh': sshpeer, 'static-http': statichttprepo, +'readonly': lambda path: readonlyrepo, } def _peerlookup(path): diff --git a/mercurial/readonlyrepo.py b/mercurial/readonlyrepo.py new file mode 100644 --- /dev/null +++ b/mercurial/readonlyrepo.py @@ -0,0 +1,57 @@ +# readonlyrepo.py - a local repository class for mercurial that can't write +# or lock the repository +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. + +from __future__ import absolute_import + +import errno + +from .i18n import _ +from . import ( +localrepo, +scmutil, +util, +) + +# created, not thrown yet +readonlyexception = IOError(errno.EACCES, _('this is a "readonly" repository')) + +# this seems to be a necessary part of the repository type API +islocal = localrepo.islocal + +class readonlyvfs(scmutil.vfs): +"""A VFS that only can be called with read modes - writing will fail with +an IO error as if the user didn't have write access""" + +def __call__(self, path, mode='r', *args, **kw): +if mode not in ('r', 'rb'): +raise readonlyexception +return super(readonlyvfs, self).__call__(path, mode, *args, **kw) + +class readonlyrepo(localrepo.localrepository): +"""A repository that is local but read only, as if the user didn't have +file system write access.""" + +def __init__(self, baseui, path=None, create=False): +# we know the "scheme" for path is "readonly" but do not want to extend +# the file/bundle hack in the "url" parser - just strip it here +assert path.startswith('readonly:'), path +path = path[len('readonly:'):] + +super(readonlyrepo, self).__init__(baseui, path=path, create=False) + +assert self.vfs.__class__ is scmutil.vfs +self.vfs.__class__ = readonlyvfs +assert self.wvfs.__class__ is scmutil.vfs +self.wvfs.__class__ = readonlyvfs + +def lock(self, wait=True): +raise readonlyexception + +def wlock(self, wait=True): +raise readonlyexception + +def instance(ui, path, create): +return readonlyrepo(ui, util.urllocalpath(path), create=False) diff --git a/tests/test-phases-exchange.t b/tests/test-phases-exchange.t --- a/tests/test-phases-exchange.t +++ b/tests/test-phases-exchange.t @@ -1197,25 +1197,27 @@ publish changesets as plain push does | ~ - $ hg -R Upsilon push Pi -r 7 + $ hg -R readonly:Upsilon push Pi -r 7 pushing to Pi searching for changes no changes found + cannot lock source repo, skipping local public phase update [1] $ hgph Upsilon -r 'min(draft())' - o 8 draft a-F - b740e3e5c05d + o 2 draft a-C - 54acac6f23ab | ~ - $ hg -R Upsilon push Pi -r 8 + $ hg -R readonly:Upsilon push Pi -r 8 pushing to Pi searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files + cannot lock source repo, skipping local public phase update $ hgph Upsilon -r 'min(draft())' - o 9 draft a-G - 3e27b6f1eee1 + o 2 draft a-C - 54acac6f23ab | ~ ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 2 v2] util: increase filechunkiter size to 128k
# HG changeset patch # User Mads Kiilerich # Date 1476402795 -7200 # Fri Oct 14 01:53:15 2016 +0200 # Node ID 8909ce1593659bce0563cc66ca3fa34525daed65 # Parent b7889580507c3d1d40e61904c7a2c2aba335c6cd util: increase filechunkiter size to 128k util.filechunkiter has been using a chunk size of 64k for more than 10 years, also in years where Moore's law still was a law. It is probably ok to bump it now and perhaps get a slight win in some cases. Also, largefiles have been using 128k for a long time. Specifying that size multiple times (or forgetting to do it) seems a bit stupid. Decreasing it to 64k also seems unfortunate. Thus, we will set the default chunksize to 128k and use the default everywhere. diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -372,7 +372,7 @@ def hashfile(file): return '' hasher = hashlib.sha1('') with open(file, 'rb') as fd: -for data in util.filechunkiter(fd, 128 * 1024): +for data in util.filechunkiter(fd): hasher.update(data) return hasher.hexdigest() diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py --- a/hgext/largefiles/overrides.py +++ b/hgext/largefiles/overrides.py @@ -1356,7 +1356,7 @@ def overridecat(orig, ui, repo, file1, * 'downloaded') % lf) path = lfutil.usercachepath(repo.ui, hash) with open(path, "rb") as fpin: -for chunk in util.filechunkiter(fpin, 128 * 1024): +for chunk in util.filechunkiter(fpin): fp.write(chunk) err = 0 return err diff --git a/hgext/largefiles/proto.py b/hgext/largefiles/proto.py --- a/hgext/largefiles/proto.py +++ b/hgext/largefiles/proto.py @@ -134,7 +134,7 @@ def wirereposetup(ui, repo): length)) # SSH streams will block if reading more than length -for chunk in util.filechunkiter(stream, 128 * 1024, length): +for chunk in util.filechunkiter(stream, limit=length): yield chunk # HTTP streams must hit the end to process the last empty # chunk of Chunked-Encoding so the connection can be reused. diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -1684,9 +1684,9 @@ class chunkbuffer(object): return ''.join(buf) -def filechunkiter(f, size=65536, limit=None): +def filechunkiter(f, size=131072, limit=None): """Create a generator that produces the data in the file size -(default 65536) bytes at a time, up to optional limit (default is +(default 131072) bytes at a time, up to optional limit (default is to read all data). Chunks may be less than size bytes if the chunk is the last chunk in the file, or the file is a socket or some other type of file that sometimes reads less data than is diff --git a/tests/test-largefiles-small-disk.t b/tests/test-largefiles-small-disk.t --- a/tests/test-largefiles-small-disk.t +++ b/tests/test-largefiles-small-disk.t @@ -11,7 +11,7 @@ Test how largefiles abort in case the di > shutil.copyfileobj = copyfileobj > # > # this makes the rewritten code abort: - > def filechunkiter(f, size=65536, limit=None): + > def filechunkiter(f, size=131072, limit=None): > yield f.read(4) > raise IOError(errno.ENOSPC, os.strerror(errno.ENOSPC)) > util.filechunkiter = filechunkiter ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 2 v2] largefiles: always use filechunkiter when iterating files
# HG changeset patch # User Mads Kiilerich # Date 1476267738 -7200 # Wed Oct 12 12:22:18 2016 +0200 # Node ID b7889580507c3d1d40e61904c7a2c2aba335c6cd # Parent c0410814002f467c24ef07ce73850ba15b306f8e largefiles: always use filechunkiter when iterating files Before, we would sometimes use the default iterator over large files. That iterator is line based and would add extra buffering and use odd chunk sizes which could give some overhead. copyandhash can't just apply a filechunkiter as it sometimes is passed a genuine generator when downloading remotely. diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -231,7 +231,8 @@ def copyfromcache(repo, hash, filename): # don't use atomic writes in the working copy. with open(path, 'rb') as srcfd: with wvfs(filename, 'wb') as destfd: -gothash = copyandhash(srcfd, destfd) +gothash = copyandhash( +util.filechunkiter(srcfd), destfd) if gothash != hash: repo.ui.warn(_('%s: data corruption in %s with hash %s\n') % (filename, path, gothash)) diff --git a/hgext/largefiles/localstore.py b/hgext/largefiles/localstore.py --- a/hgext/largefiles/localstore.py +++ b/hgext/largefiles/localstore.py @@ -10,6 +10,7 @@ from __future__ import absolute_import from mercurial.i18n import _ +from mercurial import util from . import ( basestore, @@ -42,7 +43,8 @@ class localstore(basestore.basestore): raise basestore.StoreError(filename, hash, self.url, _("can't get file locally")) with open(path, 'rb') as fd: -return lfutil.copyandhash(fd, tmpfile) +return lfutil.copyandhash( +util.filechunkiter(fd), tmpfile) def _verifyfiles(self, contents, filestocheck): failed = False diff --git a/hgext/largefiles/remotestore.py b/hgext/largefiles/remotestore.py --- a/hgext/largefiles/remotestore.py +++ b/hgext/largefiles/remotestore.py @@ -118,7 +118,7 @@ class remotestore(basestore.basestore): raise NotImplementedError('abstract method') def _get(self, hash): -'''Get file with the given hash from the remote store.''' +'''Get a iterator for content with the given hash.''' raise NotImplementedError('abstract method') def _stat(self, hashes): ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH] demandimport: disable lazy import of __builtin__
# HG changeset patch # User Mads Kiilerich # Date 1476407019 -7200 # Fri Oct 14 03:03:39 2016 +0200 # Node ID c0fd76f612735f0354102603ee630437208d8336 # Parent c0410814002f467c24ef07ce73850ba15b306f8e demandimport: disable lazy import of __builtin__ Demandimport uses the "try to import __builtin__, else use builtins" trick to handle Python 3. External libraries and extensions might do something similar. On Fedora 25 subversion-python-1.9.4-4.fc25.x86_64 will do just that (except the opposite) ... and it failed all subversion convert tests because demandimport was hiding that it didn't have builtins but should use __builtin__. The builtin module has already been imported when demandimport is loaded so there is no point in trying to import it on demand. Just always ignore both variants in demandimport. diff --git a/mercurial/demandimport.py b/mercurial/demandimport.py --- a/mercurial/demandimport.py +++ b/mercurial/demandimport.py @@ -291,6 +291,8 @@ ignore = [ 'sqlalchemy.events', # has import-time side effects (issue5085) # setuptools 8 expects this module to explode early when not on windows 'distutils.msvc9compiler', +'__builtin__', +'builtins', ] if _pypy: ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 6] largefiles: when setting/clearing x bit on largefiles, don't change other bits
On 10/13/2016 05:57 PM, Pierre-Yves David wrote: On 10/13/2016 04:13 PM, Yuya Nishihara wrote: Okay, pruned the following changes from hg-committed: Urg, direct pruning on hg-commmitted is problematic as it makes changesets disappear on the contributor see (see the related thread on the reviewers mailing list a while back and the drophack¹ extension (for reviewers only)). Don't worry; I don't use evolve - this is one of the reasons ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 2] eol: do not wait on lack when writing cache
On 10/13/2016 05:07 PM, Pierre-Yves David wrote: On 10/13/2016 03:21 PM, Mads Kiilerich wrote: On 10/13/2016 01:53 PM, Pierre-Yves David wrote: # HG changeset patch # User Pierre-Yves David # Date 1476359131 -7200 # Thu Oct 13 13:45:31 2016 +0200 # Node ID 88cc944830d0c1895e527d6ca13687f1d5e1c785 # Parent 747e546c561fbf34d07cd30013eaf42b0190bb3b eol: do not wait on lack when writing cache The cache writing process is properly catching and handling the case where the lock is unavailable. However, it fails to specify the lock can failed to be acquired when requesting it. This is now fixed. Hmm. *If* the user has write access to the repo and *could* lock the repo, then it seems reasonable that it waits for the lock and does the right thing. It would be unfortunate to bail out early and happily continue to expose the less optimal state that read only users might have to deal with. The change introduced by this changeset make the cache in line with how most of other cache works in Mercurial. A part of the problem here might be that it is unclear to me what happens with wait=False. I don't remember the details: will it continue without locking or will it raise? It would be nice to get that clarified in the localrepo docstrings. Also, this "cache" is not so much a "store values so we don't have to compute them again" cache. It is more about recording how .hgeol looked like when file content from the repo was "cached" in the working directory. Without this, we have to invalidate the dirstate more often. The lock don't really have time out so simple read only command could hold themself forever if make this call blocking. Why not "really"? There is the 10 minutes timeout? Given than eol is probably not going to be user on pulling Sorry, I don't understand that sentence. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 6] largefiles: when setting/clearing x bit on largefiles, don't change other bits
On 10/13/2016 03:58 PM, Yuya Nishihara wrote: On Thu, 13 Oct 2016 15:46:25 +0200, Mads Kiilerich wrote: On 10/13/2016 03:39 PM, Yuya Nishihara wrote: The series looks good to me. Queued up to the patch 5, thanks. I have also looked into reworking it to increasing the default chunksize to 128k everywhere - that seems a bit cleaner in hindsight. I will send that version if you un-queue it again ;-) Unqueue everything? I have no changes to the first two so it would be great to keep them ... but I could also easily resend them - whatever is least trouble ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 6] largefiles: when setting/clearing x bit on largefiles, don't change other bits
On 10/13/2016 03:39 PM, Yuya Nishihara wrote: On Sat, 08 Oct 2016 01:26:05 +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1475881180 -7200 # Sat Oct 08 00:59:40 2016 +0200 # Node ID 96315a5833ed015acb7bd8f6d7f1e38db6fa9c50 # Parent 1779dde4c9ef97cb242f8d501655f236f66e5439 largefiles: when setting/clearing x bit on largefiles, don't change other bits The series looks good to me. Queued up to the patch 5, thanks. I have also looked into reworking it to increasing the default chunksize to 128k everywhere - that seems a bit cleaner in hindsight. I will send that version if you un-queue it again ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 1 of 2] eol: do not wait on lack when writing cache
On 10/13/2016 01:53 PM, Pierre-Yves David wrote: # HG changeset patch # User Pierre-Yves David # Date 1476359131 -7200 # Thu Oct 13 13:45:31 2016 +0200 # Node ID 88cc944830d0c1895e527d6ca13687f1d5e1c785 # Parent 747e546c561fbf34d07cd30013eaf42b0190bb3b eol: do not wait on lack when writing cache The cache writing process is properly catching and handling the case where the lock is unavailable. However, it fails to specify the lock can failed to be acquired when requesting it. This is now fixed. Hmm. *If* the user has write access to the repo and *could* lock the repo, then it seems reasonable that it waits for the lock and does the right thing. It would be unfortunate to bail out early and happily continue to expose the less optimal state that read only users might have to deal with. I thus don't think this change would make it less reliable ... and I don't see it solving a real problem. (The next change for proper release is however +1.) /Mads diff --git a/hgext/eol.py b/hgext/eol.py --- a/hgext/eol.py +++ b/hgext/eol.py @@ -335,7 +335,7 @@ def reposetup(ui, repo): wlock = None try: -wlock = self.wlock() +wlock = self.wlock(wait=False) for f in self.dirstate: if self.dirstate[f] != 'n': continue ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3 v2] tests: add more test coverage of phase changes when pushing
# HG changeset patch # User Mads Kiilerich # Date 1471734720 -7200 # Sun Aug 21 01:12:00 2016 +0200 # Node ID 6affe6ed5cc5b945961fb8f1fe2dc18e711089c2 # Parent 8db96226f748ff80003c0964bc4ece3b6c3e6d12 tests: add more test coverage of phase changes when pushing Prepare for test coverage of phase updates with future push --readonly option, both with and without actually pushing changesets. diff --git a/tests/test-phases-exchange.t b/tests/test-phases-exchange.t --- a/tests/test-phases-exchange.t +++ b/tests/test-phases-exchange.t @@ -1182,10 +1182,40 @@ 2. Test that failed phases movement are cannot lock source repo, skipping local public phase update [1] $ chmod -R +w .hg - $ hgph Upsilon $ cd .. - $ killdaemons.py +#endif -#endif +Test that clone behaves like pull and doesn't +publish changesets as plain push does + + $ hg -R Upsilon phase -q --force --draft 2 + $ hg clone -q Upsilon Pi -r 7 + $ hgph Upsilon -r 'min(draft())' + o 2 draft a-C - 54acac6f23ab + | + ~ + + $ hg -R Upsilon push Pi -r 7 + pushing to Pi + searching for changes + no changes found + [1] + $ hgph Upsilon -r 'min(draft())' + o 8 draft a-F - b740e3e5c05d + | + ~ + + $ hg -R Upsilon push Pi -r 8 + pushing to Pi + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 1 changes to 1 files + + $ hgph Upsilon -r 'min(draft())' + o 9 draft a-G - 3e27b6f1eee1 + | + ~ ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 3 v2] cmdutil: satisfy expections in dirstateguard.__del__, even if __init__ fails
# HG changeset patch # User Mads Kiilerich # Date 1476317981 -7200 # Thu Oct 13 02:19:41 2016 +0200 # Node ID 8db96226f748ff80003c0964bc4ece3b6c3e6d12 # Parent b85fa6bf298be07804a74d8fdec0d19fdbc6d740 cmdutil: satisfy expections in dirstateguard.__del__, even if __init__ fails Python "delstructors" are terrible - this one because it assumed that __init__ had completed before it was called. That would not necessarily be the case if the repository was read only or broken and saving the dirstate thus failed in unexpected ways. That could give confusing warnings about missing '_active' after failures. To fix that, make sure all member variables are "declared" before doing anything that possibly could fail. [Famous last words.] diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py --- a/mercurial/cmdutil.py +++ b/mercurial/cmdutil.py @@ -3502,10 +3502,11 @@ class dirstateguard(object): def __init__(self, repo, name): self._repo = repo +self._active = False +self._closed = False self._suffix = '.backup.%s.%d' % (name, id(self)) repo.dirstate.savebackup(repo.currenttransaction(), self._suffix) self._active = True -self._closed = False def __del__(self): if self._active: # still active ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 3 v2] repos: introduce '-R readonly:PATH' for doing local operations "read only"
# HG changeset patch # User Mads Kiilerich # Date 1476319116 -7200 # Thu Oct 13 02:38:36 2016 +0200 # Node ID bc711933bb8cfacfe87d83fe945cf566d79439d8 # Parent 6affe6ed5cc5b945961fb8f1fe2dc18e711089c2 repos: introduce '-R readonly:PATH' for doing local operations "read only" When used as 'readonly:.', this is pretty much like if the repository was owned by another user and the current user didn't have write access to anything in .hg . Using this feature will for example allow multiple simultaneous pushes, pushes without phase changes, and will provide a "safe" way to run commands ... assuming this and our use of VFS is complete and correct. The "API" for repository types could use some cleanup, but this seems to be one way to do what is needed. The existing VFS class hierarchy has a "readonlyvfs" class, but whatever it is, it doesn't seem suitable for this use. It doesn't seem to be a reusable class or mixin. diff --git a/mercurial/hg.py b/mercurial/hg.py --- a/mercurial/hg.py +++ b/mercurial/hg.py @@ -31,6 +31,7 @@ from . import ( merge as mergemod, node, phases, +readonlyrepo, repoview, scmutil, sshpeer, @@ -112,6 +113,7 @@ schemes = { 'https': httppeer, 'ssh': sshpeer, 'static-http': statichttprepo, +'readonly': lambda path: readonlyrepo, } def _peerlookup(path): diff --git a/mercurial/readonlyrepo.py b/mercurial/readonlyrepo.py new file mode 100644 --- /dev/null +++ b/mercurial/readonlyrepo.py @@ -0,0 +1,57 @@ +# readonlyrepo.py - a local repository class for mercurial that can't write +# or lock the repository +# +# This software may be used and distributed according to the terms of the +# GNU General Public License version 2 or any later version. + +from __future__ import absolute_import + +import errno + +from .i18n import _ +from . import ( +localrepo, +scmutil, +util, +) + +# created, not thrown yet +readonlyexception = IOError(errno.EACCES, _('this is a "readonly" repository')) + +# this seems to be a necessary part of the repository type API +islocal = localrepo.islocal + +class readonlyvfs(scmutil.vfs): +"""A VFS that only can be called with read modes - writing will fail with +an IO error as if the user didn't have write access""" + +def __call__(self, path, mode='r', *args, **kw): +if mode not in ('r', 'rb'): +raise readonlyexception +return super(readonlyvfs, self).__call__(path, mode, *args, **kw) + +class readonlyrepo(localrepo.localrepository): +"""A repository that is local but read only, as if the user didn't have +file system write access.""" + +def __init__(self, baseui, path=None, create=False): +# we know the "scheme" for path is "readonly" but do not want to extend +# the file/bundle hack in the "url" parser - just strip it here +assert path.startswith('readonly:'), path +path = path[len('readonly:'):] + +super(readonlyrepo, self).__init__(baseui, path=path, create=False) + +assert self.vfs.__class__ is scmutil.vfs +self.vfs.__class__ = readonlyvfs +assert self.wvfs.__class__ is scmutil.vfs +self.wvfs.__class__ = readonlyvfs + +def lock(self, wait=True): +raise readonlyexception + +def wlock(self, wait=True): +raise readonlyexception + +def instance(ui, path, create): +return readonlyrepo(ui, util.urllocalpath(path), create=False) diff --git a/tests/test-phases-exchange.t b/tests/test-phases-exchange.t --- a/tests/test-phases-exchange.t +++ b/tests/test-phases-exchange.t @@ -1197,25 +1197,27 @@ publish changesets as plain push does | ~ - $ hg -R Upsilon push Pi -r 7 + $ hg -R readonly:Upsilon push Pi -r 7 pushing to Pi searching for changes no changes found + cannot lock source repo, skipping local public phase update [1] $ hgph Upsilon -r 'min(draft())' - o 8 draft a-F - b740e3e5c05d + o 2 draft a-C - 54acac6f23ab | ~ - $ hg -R Upsilon push Pi -r 8 + $ hg -R readonly:Upsilon push Pi -r 8 pushing to Pi searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files + cannot lock source repo, skipping local public phase update $ hgph Upsilon -r 'min(draft())' - o 9 draft a-G - 3e27b6f1eee1 + o 2 draft a-C - 54acac6f23ab | ~ ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3] merge: only show "cannot merge flags for %s" warning if flags are different
# HG changeset patch # User Mads Kiilerich # Date 1476267738 -7200 # Wed Oct 12 12:22:18 2016 +0200 # Node ID 29bd20c4999865fc19ca3f0344c2c1231a318b1c # Parent bc7af83150d0137d35d2cd9bf715fc0e3658cf8a merge: only show "cannot merge flags for %s" warning if flags are different diff --git a/mercurial/merge.py b/mercurial/merge.py --- a/mercurial/merge.py +++ b/mercurial/merge.py @@ -475,7 +475,7 @@ class mergestate(object): flo = fco.flags() fla = fca.flags() if 'x' in flags + flo + fla and 'l' not in flags + flo + fla: -if fca.node() == nullid: +if fca.node() == nullid and flags != flo: if preresolve: self._repo.ui.warn( _('warning: cannot merge flags for %s\n') % afile) diff --git a/tests/test-merge-types.t b/tests/test-merge-types.t --- a/tests/test-merge-types.t +++ b/tests/test-merge-types.t @@ -337,7 +337,6 @@ h: l vs l, different merging a warning: cannot merge flags for b merging b - warning: cannot merge flags for bx merging bx warning: cannot merge flags for c merging d @@ -400,7 +399,6 @@ h: l vs l, different merging a warning: cannot merge flags for b merging b - warning: cannot merge flags for bx merging bx warning: cannot merge flags for c merging d ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 3] tests: add test coverage of merging x flag without ancestor
# HG changeset patch # User Mads Kiilerich # Date 1476267738 -7200 # Wed Oct 12 12:22:18 2016 +0200 # Node ID bc7af83150d0137d35d2cd9bf715fc0e3658cf8a # Parent b85fa6bf298be07804a74d8fdec0d19fdbc6d740 tests: add test coverage of merging x flag without ancestor It is more noisy than necessary - we will fix that later. diff --git a/tests/test-merge-types.t b/tests/test-merge-types.t --- a/tests/test-merge-types.t +++ b/tests/test-merge-types.t @@ -307,6 +307,8 @@ h: l vs l, different $ echo 1 > a $ echo 1 > b $ chmod +x b + $ echo 1 > bx + $ chmod +x bx $ echo x > c $ chmod +x c $ echo 1 > d @@ -321,6 +323,8 @@ h: l vs l, different $ hg up -qr0 $ echo 2 > a $ echo 2 > b + $ echo 2 > bx + $ chmod +x bx $ echo x > c $ ln -s 2 d $ ln -s x e @@ -333,6 +337,8 @@ h: l vs l, different merging a warning: cannot merge flags for b merging b + warning: cannot merge flags for bx + merging bx warning: cannot merge flags for c merging d warning: internal :merge cannot merge symlinks for d @@ -345,29 +351,31 @@ h: l vs l, different warning: conflicts while merging h! (edit, then use 'hg resolve --mark') warning: conflicts while merging a! (edit, then use 'hg resolve --mark') warning: conflicts while merging b! (edit, then use 'hg resolve --mark') - 3 files updated, 0 files merged, 0 files removed, 5 files unresolved + warning: conflicts while merging bx! (edit, then use 'hg resolve --mark') + 3 files updated, 0 files merged, 0 files removed, 6 files unresolved use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon [1] $ hg resolve -l U a U b + U bx U d U f U h $ tellmeabout a a is a plain file with content: - <<<<<<< working copy: 0139c5610547 - test: 2 + <<<<<<< working copy: 0c617753b41b - test: 2 2 === 1 - >>>>>>> merge rev:97e29675e796 - test: 1 + >>>>>>> merge rev:2e60aa20b912 - test: 1 $ tellmeabout b b is a plain file with content: - <<<<<<< working copy: 0139c5610547 - test: 2 + <<<<<<< working copy: 0c617753b41b - test: 2 2 === 1 - >>>>>>> merge rev:97e29675e796 - test: 1 + >>>>>>> merge rev:2e60aa20b912 - test: 1 $ tellmeabout c c is a plain file with content: x @@ -392,6 +400,8 @@ h: l vs l, different merging a warning: cannot merge flags for b merging b + warning: cannot merge flags for bx + merging bx warning: cannot merge flags for c merging d warning: internal :merge cannot merge symlinks for d @@ -404,23 +414,24 @@ h: l vs l, different warning: conflicts while merging h! (edit, then use 'hg resolve --mark') warning: conflicts while merging a! (edit, then use 'hg resolve --mark') warning: conflicts while merging b! (edit, then use 'hg resolve --mark') - 3 files updated, 0 files merged, 0 files removed, 5 files unresolved + warning: conflicts while merging bx! (edit, then use 'hg resolve --mark') + 3 files updated, 0 files merged, 0 files removed, 6 files unresolved use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon [1] $ tellmeabout a a is a plain file with content: - <<<<<<< working copy: 97e29675e796 - test: 1 + <<<<<<< working copy: 2e60aa20b912 - test: 1 1 === 2 - >>>>>>> merge rev:0139c5610547 - test: 2 + >>>>>>> merge rev:0c617753b41b - test: 2 $ tellmeabout b b is an executable file with content: - <<<<<<< working copy: 97e29675e796 - test: 1 + <<<<<<< working copy: 2e60aa20b912 - test: 1 1 === 2 - >>>>>>> merge rev:0139c5610547 - test: 2 + >>>>>>> merge rev:0c617753b41b - test: 2 $ tellmeabout c c is an executable file with content: x ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 3] merge: clarify warning for (not) merging flags without ancestor
# HG changeset patch # User Mads Kiilerich # Date 1476267738 -7200 # Wed Oct 12 12:22:18 2016 +0200 # Node ID 9588752fc3a6d2b5b1f40ddfc48965997270892d # Parent 29bd20c4999865fc19ca3f0344c2c1231a318b1c merge: clarify warning for (not) merging flags without ancestor Give hints why it can't merge and what it will do instead. diff --git a/mercurial/merge.py b/mercurial/merge.py --- a/mercurial/merge.py +++ b/mercurial/merge.py @@ -478,7 +478,9 @@ class mergestate(object): if fca.node() == nullid and flags != flo: if preresolve: self._repo.ui.warn( -_('warning: cannot merge flags for %s\n') % afile) +_('warning: cannot merge flags for %s ' + 'without common ancestor - keeping local flags\n') +% afile) elif flags == fla: flags = flo if preresolve: diff --git a/tests/test-merge-types.t b/tests/test-merge-types.t --- a/tests/test-merge-types.t +++ b/tests/test-merge-types.t @@ -335,10 +335,10 @@ h: l vs l, different $ hg merge merging a - warning: cannot merge flags for b + warning: cannot merge flags for b without common ancestor - keeping local flags merging b merging bx - warning: cannot merge flags for c + warning: cannot merge flags for c without common ancestor - keeping local flags merging d warning: internal :merge cannot merge symlinks for d warning: conflicts while merging d! (edit, then use 'hg resolve --mark') @@ -397,10 +397,10 @@ h: l vs l, different $ hg up -Cqr1 $ hg merge merging a - warning: cannot merge flags for b + warning: cannot merge flags for b without common ancestor - keeping local flags merging b merging bx - warning: cannot merge flags for c + warning: cannot merge flags for c without common ancestor - keeping local flags merging d warning: internal :merge cannot merge symlinks for d warning: conflicts while merging d! (edit, then use 'hg resolve --mark') ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 3 of 3] eol: on update, only re-check files if filtering changed
On 10/09/2016 04:45 PM, Pierre-Yves David wrote: On 10/09/2016 04:19 PM, Mads Kiilerich wrote: diff --git a/hgext/eol.py b/hgext/eol.py --- a/hgext/eol.py +++ b/hgext/eol.py @@ -312,10 +312,15 @@ def reposetup(ui, repo): self._eolmatch = util.never return +oldeol = None try: cachemtime = os.path.getmtime(self.join("eol.cache")) This seems like it should live in the "cache/" directory. Any reason not to ? I don't know. I guess it predates the cache directory. I would probably agree it now would be better if it lived there. This patch is backwards compatible and doesn't change that. Changing the location would be a separate independent change - and apparently trivial and backwards compatible. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 3] eol: store and reuse pattern matchers instead of creating in tight loop
# HG changeset patch # User Mads Kiilerich # Date 1476021282 -7200 # Sun Oct 09 15:54:42 2016 +0200 # Node ID 5240a8c9b6289838592f24f4e41a1158a48b951a # Parent 408ddba4689de432fe77e0ba8d27765b63719180 eol: store and reuse pattern matchers instead of creating in tight loop More "right" and more efficient. diff --git a/hgext/eol.py b/hgext/eol.py --- a/hgext/eol.py +++ b/hgext/eol.py @@ -175,25 +175,27 @@ class eolfile(object): include = [] exclude = [] +self.patterns = [] for pattern, style in self.cfg.items('patterns'): key = style.upper() if key == 'BIN': exclude.append(pattern) else: include.append(pattern) +m = match.match(root, '', [pattern]) +self.patterns.append((pattern, key, m)) # This will match the files for which we need to care # about inconsistent newlines. self.match = match.match(root, '', [], include, exclude) def copytoui(self, ui): -for pattern, style in self.cfg.items('patterns'): -key = style.upper() +for pattern, key, m in self.patterns: try: ui.setconfig('decode', pattern, self._decode[key], 'eol') ui.setconfig('encode', pattern, self._encode[key], 'eol') except KeyError: ui.warn(_("ignoring unknown EOL style '%s' from %s\n") -% (style, self.cfg.source('patterns', pattern))) +% (key, self.cfg.source('patterns', pattern))) # eol.only-consistent can be specified in ~/.hgrc or .hgeol for k, v in self.cfg.items('eol'): ui.setconfig('eol', k, v, 'eol') @@ -203,10 +205,10 @@ class eolfile(object): for f in (files or ctx.files()): if f not in ctx: continue -for pattern, style in self.cfg.items('patterns'): -if not match.match(repo.root, '', [pattern])(f): +for pattern, key, m in self.patterns: +if not m(f): continue -target = self._encode[style.upper()] +target = self._encode[key] data = ctx[f].data() if (target == "to-lf" and "\r\n" in data or target == "to-crlf" and singlelf.search(data)): ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 3] eol: fix variable naming - call it _eolmatch instead of _eolfile
# HG changeset patch # User Mads Kiilerich # Date 1476020562 -7200 # Sun Oct 09 15:42:42 2016 +0200 # Node ID 408ddba4689de432fe77e0ba8d27765b63719180 # Parent dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39 eol: fix variable naming - call it _eolmatch instead of _eolfile It is not the file but a match object based on it. diff --git a/hgext/eol.py b/hgext/eol.py --- a/hgext/eol.py +++ b/hgext/eol.py @@ -305,9 +305,9 @@ def reposetup(ui, repo): return eol.match def _hgcleardirstate(self): -self._eolfile = self.loadeol([None, 'tip']) -if not self._eolfile: -self._eolfile = util.never +self._eolmatch = self.loadeol([None, 'tip']) +if not self._eolmatch: +self._eolmatch = util.never return try: @@ -344,7 +344,7 @@ def reposetup(ui, repo): def commitctx(self, ctx, haserror=False): for f in sorted(ctx.added() + ctx.modified()): -if not self._eolfile(f): +if not self._eolmatch(f): continue fctx = ctx[f] if fctx is None: ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 3 of 3] eol: on update, only re-check files if filtering changed
# HG changeset patch # User Mads Kiilerich # Date 1476021289 -7200 # Sun Oct 09 15:54:49 2016 +0200 # Node ID 956e0c953cca045fc2bb57867b8c95f123d66841 # Parent 5240a8c9b6289838592f24f4e41a1158a48b951a eol: on update, only re-check files if filtering changed Before, update would mark all files as 'normallookup' in dirstate if .hgeol changed so all files would get the new filtering applied. That takes some time ... and is pointless if the filtering for that file didn't change. Instead, keep track of the old filtering and only check files where the filtering is changed. To keep the old filtering, change to write the applied .hgeol content to .hg/eol.cache instead of just touching it. That change is backwards/forwards compatible. In a real world test, this takes an update that is changing .hgeol and 3 files from 12s to 4s - where the remaining eol overhead is 1-2s. diff --git a/hgext/eol.py b/hgext/eol.py --- a/hgext/eol.py +++ b/hgext/eol.py @@ -312,10 +312,15 @@ def reposetup(ui, repo): self._eolmatch = util.never return +oldeol = None try: cachemtime = os.path.getmtime(self.join("eol.cache")) except OSError: cachemtime = 0 +else: +olddata = self.vfs.read("eol.cache") +if olddata: +oldeol = eolfile(self.ui, self.root, olddata) try: eolmtime = os.path.getmtime(self.wjoin(".hgeol")) @@ -324,17 +329,37 @@ def reposetup(ui, repo): if eolmtime > cachemtime: self.ui.debug("eol: detected change in .hgeol\n") + +hgeoldata = self.wvfs.read('.hgeol') +neweol = eolfile(self.ui, self.root, hgeoldata) + wlock = None try: wlock = self.wlock() for f in self.dirstate: -if self.dirstate[f] == 'n': -# all normal files need to be looked at -# again since the new .hgeol file might no -# longer match a file it matched before -self.dirstate.normallookup(f) -# Create or touch the cache to update mtime -self.vfs("eol.cache", "w").close() +if self.dirstate[f] != 'n': +continue +if oldeol is not None: +if not oldeol.match(f) and not neweol.match(f): +continue +oldkey = None +for pattern, key, m in oldeol.patterns: +if m(f): +oldkey = key +break +newkey = None +for pattern, key, m in neweol.patterns: +if m(f): +newkey = key +break +if oldkey == newkey: +continue +# all normal files need to be looked at again since +# the new .hgeol file specify a different filter +self.dirstate.normallookup(f) +# Write the cache to update mtime and cache .hgeol +with self.vfs("eol.cache", "w") as f: +f.write(hgeoldata) wlock.release() except error.LockUnavailable: # If we cannot lock the repository and clear the ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH] import: abort instead of crashing when copy source does not exist (issue5375)
On 10/08/2016 06:11 PM, Ryan McElroy wrote: # HG changeset patch # User Ryan McElroy # Date 1475929618 25200 # Sat Oct 08 05:26:58 2016 -0700 # Node ID 961569a2cbeebfd54c9369c6e36d03d4938aef38 # Parent dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39 import: abort instead of crashing when copy source does not exist (issue5375) Previously, when a patch contained a move or copy from a source that did not exist, `hg import` would crash. This patch changes import to raise a PatchError with an explanantion of what is wrong with the patch to avoid the stack trace and bad user experience. diff --git a/mercurial/patch.py b/mercurial/patch.py --- a/mercurial/patch.py +++ b/mercurial/patch.py @@ -1952,8 +1952,10 @@ def _applydiff(ui, fp, patcher, backend, data, mode = None, None if gp.op in ('RENAME', 'COPY'): data, mode = store.getfile(gp.oldpath)[:2] -# FIXME: failing getfile has never been handled here -assert data is not None +if data is None: +# This means that the old path does not exist +raise PatchError(_("source file '%s' does not exist") + % gp.oldpath) if gp.mode: mode = gp.mode if gp.op == 'ADD': diff --git a/tests/test-import.t b/tests/test-import.t --- a/tests/test-import.t +++ b/tests/test-import.t @@ -1793,3 +1793,13 @@ repository when file not found for patch 1 out of 1 hunks FAILED -- saving rejects to file file1.rej abort: patch failed to apply [255] + +test import crash (issue5375) + $ cd .. + $ hg init repo + $ cd repo + $ printf "diff --git a/a b/b\nrename from a\nrename to b" | hg import - + applying patch from stdin + a not tracked! + abort: source file 'a' does not exist + [255] Hmm. For a patch modifying a non-existing file we already get: unable to find 'f' for patching 1 out of 1 hunks FAILED -- saving rejects to file f.rej patch failed, unable to continue (try -v) patch failed, rejects left in working directory $ cat f.rej --- f +++ f @@ -1,1 +1,2 @@ +f In the tested case, I would thus also expect it to leave a .rej file with the failing rename "hunk" while applying the rest of the patch (even though a pure rename arguably *doesn't* have any hunks). BUT the logic around this check seems wrong. A rename or copy of a missing file should be handled exactly the same, no matter if it is a bare rename/copy or if it also modifies the file (= has a first hunk). I don't know if it is better to give a not-entirely-correct abort than to fail with an assertion error, but I think it still deserves a FIXME/TODO. (The iterhunks docstring seems wrong, for example regarding 'file' entries and firsthunk. Let's go find pmezard!) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 6 of 6] largefiles: PoC: replace largefile copies in working directory with symlinks
# HG changeset patch # User Mads Kiilerich # Date 1475882684 -7200 # Sat Oct 08 01:24:44 2016 +0200 # Node ID 36e7e42f266e5cc70535479e50445395402c7400 # Parent 2c25dc4a5a556abbebe09fe3c4eb0ff4c8fa0cd4 largefiles: PoC: replace largefile copies in working directory with symlinks Updating between distant revisions can be expensive when working on multiple divergent branches. The basic working directory update might be efficient but copying the largefiles and hashing them will take a lot of bandwidth and thus take time. Usually, largefiles are copied from the store to the working directory to make sure that any in-place modification of largefiles in the working directory doesn't corrupt the store. In a build farm, there will often be a lot of switching between branches, and the risk of someone doing in-place modification of largefiles is quite small. It might thus make sense to make a different trade-off. The idea here is to make a symlink pointing at the storage file instead of copying it. In some cases, this can make working copy updates NaN times faster. To reduce the risk of corruption, storage files are made read-only. To make sure that time-stamp based build systems manage a symlink change correctly, storage files are touched when they are linked to. This should also work on Windows IF running as admin or ordinary users have been authorized to make symlinks (which probably in general not is a good idea). This is a proof of concept and doesn't contain or pass any tests. It should perhaps turn into an extension. diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -10,6 +10,7 @@ from __future__ import absolute_import import copy +import errno import hashlib import os import platform @@ -32,6 +33,20 @@ shortnameslash = shortname + '/' longname = 'largefiles' filechunkitersize = 128 * 1024 +if os.name == 'nt': +import ctypes +FILESYMLINK = 0 +_kernel32 = ctypes.windll.kernel32 +def symlink(src, dst): +"""Create a symbolic link pointing to src named dst. +Requires running as Admin ... or enabling in: +Computer configuration > Windows Settings > Security Settings > +Local Policies > User Rights Assignment > Create symbolic links +""" +_kernel32.CreateSymbolicLinkA(dst, os.path.normpath(src), FILESYMLINK) +else: +symlink = os.symlink + # -- Private worker functions -- def getminsize(ui, assumelfiles, opt, default=10): @@ -227,18 +242,26 @@ def copyfromcache(repo, hash, filename): path = findfile(repo, hash) if path is None: return False +# make storage file read-only before pointing at it - reduce risk of +# corruption +try: +os.chmod(path, os.stat(path).st_mode & + ~(stat.S_IWUSR|stat.S_IWGRP|stat.S_IWOTH)) +except OSError as e: +if e.errno != errno.EPERM: +raise +# no chmod access probably means that it is safe (but touch will fail) +repo.ui.debug("can't chmod %s\n" % path) wvfs.makedirs(wvfs.dirname(wvfs.join(filename))) -# The write may fail before the file is fully written, but we -# don't use atomic writes in the working copy. -with open(path, 'rb') as srcfd: -with wvfs(filename, 'wb') as destfd: -gothash = copyandhash( -util.filechunkiter(srcfd, filechunkitersize), destfd) -if gothash != hash: -repo.ui.warn(_('%s: data corruption in %s with hash %s\n') - % (filename, path, gothash)) -wvfs.unlink(filename) -return False +symlink(path, wvfs.join(filename)) +# touch the storage file so build systems see the file as modified +try: +os.utime(path, None) # touch +except OSError as e: +if e.errno != errno.EACCES: +raise +repo.ui.warn(_("can't touch %s - that might confuse build systems\n") + % path) return True def copytostore(repo, rev, file, uploaded=False): diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py --- a/hgext/largefiles/overrides.py +++ b/hgext/largefiles/overrides.py @@ -1392,6 +1392,8 @@ def mergeupdate(orig, repo, node, branch lfileabs = repo.wvfs.join(lfile) if not repo.wvfs.exists(lfileabs): continue +if repo.wvfs.islink(lfile): +continue # never update standins from symlinks lfhash = lfutil.hashrepofile(repo, lfile) standin = lfutil.standin(lfile) lfutil.writestandin(repo, standin, lfhash, diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py --- a/hgext/la
[PATCH 3 of 6] largefiles: use constant file chunk size instead of repeating 128 * 1024
# HG changeset patch # User Mads Kiilerich # Date 1475881181 -7200 # Sat Oct 08 00:59:41 2016 +0200 # Node ID 5d63b84517ab6ecff8695aff0748bef9695c241c # Parent 87cea1040403001e660dd1c6b2e2d069d8a51d2e largefiles: use constant file chunk size instead of repeating 128 * 1024 diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -30,6 +30,7 @@ from mercurial import ( shortname = '.hglf' shortnameslash = shortname + '/' longname = 'largefiles' +filechunkitersize = 128 * 1024 # -- Private worker functions -- @@ -371,7 +372,7 @@ def hashfile(file): return '' hasher = hashlib.sha1('') with open(file, 'rb') as fd: -for data in util.filechunkiter(fd, 128 * 1024): +for data in util.filechunkiter(fd, filechunkitersize): hasher.update(data) return hasher.hexdigest() diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py --- a/hgext/largefiles/overrides.py +++ b/hgext/largefiles/overrides.py @@ -1356,7 +1356,8 @@ def overridecat(orig, ui, repo, file1, * 'downloaded') % lf) path = lfutil.usercachepath(repo.ui, hash) with open(path, "rb") as fpin: -for chunk in util.filechunkiter(fpin, 128 * 1024): +for chunk in util.filechunkiter(fpin, +lfutil.filechunkitersize): fp.write(chunk) err = 0 return err diff --git a/hgext/largefiles/proto.py b/hgext/largefiles/proto.py --- a/hgext/largefiles/proto.py +++ b/hgext/largefiles/proto.py @@ -134,7 +134,8 @@ def wirereposetup(ui, repo): length)) # SSH streams will block if reading more than length -for chunk in util.filechunkiter(stream, 128 * 1024, length): +for chunk in util.filechunkiter(stream, lfutil.filechunkitersize, +length): yield chunk # HTTP streams must hit the end to process the last empty # chunk of Chunked-Encoding so the connection can be reused. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 4 of 6] largefiles: always use filechunkitersize when using filechunkiter
# HG changeset patch # User Mads Kiilerich # Date 1475881182 -7200 # Sat Oct 08 00:59:42 2016 +0200 # Node ID 8838011d6452cbe31608482cc9f1b81e00fb1ca5 # Parent 5d63b84517ab6ecff8695aff0748bef9695c241c largefiles: always use filechunkitersize when using filechunkiter (Alternatively, we could change the filechunkiter default, use the existing default, or have a custom largefiles filechunkiter wrapper ...) diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -57,7 +57,7 @@ def link(src, dest): # if hardlinks fail, fallback on atomic copy with open(src, 'rb') as srcf: with util.atomictempfile(dest) as dstf: -for chunk in util.filechunkiter(srcf): +for chunk in util.filechunkiter(srcf, filechunkitersize): dstf.write(chunk) os.chmod(dest, os.stat(src).st_mode) @@ -268,7 +268,7 @@ def copytostoreabsolute(repo, file, hash with open(file, 'rb') as srcf: with util.atomictempfile(storepath(repo, hash), createmode=repo.store.createmode) as dstf: -for chunk in util.filechunkiter(srcf): +for chunk in util.filechunkiter(srcf, filechunkitersize): dstf.write(chunk) linktousercache(repo, hash) @@ -399,7 +399,7 @@ def hexsha1(data): """hexsha1 returns the hex-encoded sha1 sum of the data in the file-like object data""" h = hashlib.sha1() -for chunk in util.filechunkiter(data): +for chunk in util.filechunkiter(data, filechunkitersize): h.update(chunk) return h.hexdigest() ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 6] largefiles: use context for file closing
# HG changeset patch # User Mads Kiilerich # Date 1475881181 -7200 # Sat Oct 08 00:59:41 2016 +0200 # Node ID 87cea1040403001e660dd1c6b2e2d069d8a51d2e # Parent 96315a5833ed015acb7bd8f6d7f1e38db6fa9c50 largefiles: use context for file closing Make the code slightly smaller and safer (and more deeply indented). diff --git a/hgext/largefiles/basestore.py b/hgext/largefiles/basestore.py --- a/hgext/largefiles/basestore.py +++ b/hgext/largefiles/basestore.py @@ -91,15 +91,13 @@ class basestore(object): storefilename = lfutil.storepath(self.repo, hash) tmpname = storefilename + '.tmp' -tmpfile = util.atomictempfile(tmpname, - createmode=self.repo.store.createmode) - -try: -gothash = self._getfile(tmpfile, filename, hash) -except StoreError as err: -self.ui.warn(err.longmessage()) -gothash = "" -tmpfile.close() +with util.atomictempfile(tmpname, +createmode=self.repo.store.createmode) as tmpfile: +try: +gothash = self._getfile(tmpfile, filename, hash) +except StoreError as err: +self.ui.warn(err.longmessage()) +gothash = "" if gothash != hash: if gothash != "": diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -54,10 +54,10 @@ def link(src, dest): util.oslink(src, dest) except OSError: # if hardlinks fail, fallback on atomic copy -dst = util.atomictempfile(dest) -for chunk in util.filechunkiter(open(src, 'rb')): -dst.write(chunk) -dst.close() +with open(src, 'rb') as srcf: +with util.atomictempfile(dest) as dstf: +for chunk in util.filechunkiter(srcf): +dstf.write(chunk) os.chmod(dest, os.stat(src).st_mode) def usercachepath(ui, hash): @@ -264,11 +264,11 @@ def copytostoreabsolute(repo, file, hash link(usercachepath(repo.ui, hash), storepath(repo, hash)) else: util.makedirs(os.path.dirname(storepath(repo, hash))) -dst = util.atomictempfile(storepath(repo, hash), - createmode=repo.store.createmode) -for chunk in util.filechunkiter(open(file, 'rb')): -dst.write(chunk) -dst.close() +with open(file, 'rb') as srcf: +with util.atomictempfile(storepath(repo, hash), + createmode=repo.store.createmode) as dstf: +for chunk in util.filechunkiter(srcf): +dstf.write(chunk) linktousercache(repo, hash) def linktousercache(repo, hash): @@ -370,10 +370,9 @@ def hashfile(file): if not os.path.exists(file): return '' hasher = hashlib.sha1('') -fd = open(file, 'rb') -for data in util.filechunkiter(fd, 128 * 1024): -hasher.update(data) -fd.close() +with open(file, 'rb') as fd: +for data in util.filechunkiter(fd, 128 * 1024): +hasher.update(data) return hasher.hexdigest() def getexecutable(filename): diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py --- a/hgext/largefiles/overrides.py +++ b/hgext/largefiles/overrides.py @@ -883,11 +883,8 @@ def hgclone(orig, ui, opts, *args, **kwa # If largefiles is required for this repo, permanently enable it locally if 'largefiles' in repo.requirements: -fp = repo.vfs('hgrc', 'a', text=True) -try: +with repo.vfs('hgrc', 'a', text=True) as fp: fp.write('\n[extensions]\nlargefiles=\n') -finally: -fp.close() # Caching is implicitly limited to 'rev' option, since the dest repo was # truncated at that point. The user may expect a download count with @@ -1339,30 +1336,28 @@ def overridecat(orig, ui, repo, file1, * m.visitdir = lfvisitdirfn for f in ctx.walk(m): -fp = cmdutil.makefileobj(repo, opts.get('output'), ctx.node(), - pathname=f) -lf = lfutil.splitstandin(f) -if lf is None or origmatchfn(f): -# duplicating unreachable code from commands.cat -data = ctx[f].data() -if opts.get('decode'): -data = repo.wwritedata(f, data) -fp.write(data) -else: -hash = lfutil.readstandin(repo, lf, ctx.rev()) -if not lfutil.inusercache(repo.ui, hash): -store = storefactory.openstore(repo) -success, missing = store.get([(lf, hash)]) -
[PATCH 1 of 6] largefiles: when setting/clearing x bit on largefiles, don't change other bits
# HG changeset patch # User Mads Kiilerich # Date 1475881180 -7200 # Sat Oct 08 00:59:40 2016 +0200 # Node ID 96315a5833ed015acb7bd8f6d7f1e38db6fa9c50 # Parent 1779dde4c9ef97cb242f8d501655f236f66e5439 largefiles: when setting/clearing x bit on largefiles, don't change other bits It is only the X bit that it matters to copy from the standin to the largefile in the working directory. While it generally doesn't do any harm to copy the whole mode, it is also "wrong" to copy more than the X bit we care about. It can make a difference if someone should try to handle largefiles differently, such as marking them read-only. Thus, do similar to what utils.setflags does and set the X bit where there are R bits and obey umask. diff --git a/hgext/largefiles/lfcommands.py b/hgext/largefiles/lfcommands.py --- a/hgext/largefiles/lfcommands.py +++ b/hgext/largefiles/lfcommands.py @@ -515,9 +515,13 @@ def updatelfiles(ui, repo, filelist=None rellfile = lfile relstandin = lfutil.standin(lfile) if wvfs.exists(relstandin): -mode = wvfs.stat(relstandin).st_mode -if mode != wvfs.stat(rellfile).st_mode: -wvfs.chmod(rellfile, mode) +standinexec = wvfs.stat(relstandin).st_mode & 0o100 +st = wvfs.stat(rellfile).st_mode +if standinexec != st & 0o100: +st &= ~0o111 +if standinexec: +st |= (st >> 2) & 0o111 & ~util.umask +wvfs.chmod(rellfile, st) update1 = 1 updated += update1 ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 5 of 6] largefiles: always use filechunkiter when iterating files
# HG changeset patch # User Mads Kiilerich # Date 1475881183 -7200 # Sat Oct 08 00:59:43 2016 +0200 # Node ID 2c25dc4a5a556abbebe09fe3c4eb0ff4c8fa0cd4 # Parent 8838011d6452cbe31608482cc9f1b81e00fb1ca5 largefiles: always use filechunkiter when iterating files Before, we would sometimes use the default iterator over large files. That iterator is line based and would add extra buffering and use odd chunk sizes which could give some overhead. copyandhash can't just apply a filechunkiter as it sometimes is passed a genuine generator when downloading remotely. diff --git a/hgext/largefiles/lfutil.py b/hgext/largefiles/lfutil.py --- a/hgext/largefiles/lfutil.py +++ b/hgext/largefiles/lfutil.py @@ -232,7 +232,8 @@ def copyfromcache(repo, hash, filename): # don't use atomic writes in the working copy. with open(path, 'rb') as srcfd: with wvfs(filename, 'wb') as destfd: -gothash = copyandhash(srcfd, destfd) +gothash = copyandhash( +util.filechunkiter(srcfd, filechunkitersize), destfd) if gothash != hash: repo.ui.warn(_('%s: data corruption in %s with hash %s\n') % (filename, path, gothash)) diff --git a/hgext/largefiles/localstore.py b/hgext/largefiles/localstore.py --- a/hgext/largefiles/localstore.py +++ b/hgext/largefiles/localstore.py @@ -10,6 +10,7 @@ from __future__ import absolute_import from mercurial.i18n import _ +from mercurial import util from . import ( basestore, @@ -42,7 +43,8 @@ class localstore(basestore.basestore): raise basestore.StoreError(filename, hash, self.url, _("can't get file locally")) with open(path, 'rb') as fd: -return lfutil.copyandhash(fd, tmpfile) +return lfutil.copyandhash( +util.filechunkiter(fd, lfutil.filechunkitersize), tmpfile) def _verifyfiles(self, contents, filestocheck): failed = False diff --git a/hgext/largefiles/remotestore.py b/hgext/largefiles/remotestore.py --- a/hgext/largefiles/remotestore.py +++ b/hgext/largefiles/remotestore.py @@ -118,7 +118,7 @@ class remotestore(basestore.basestore): raise NotImplementedError('abstract method') def _get(self, hash): -'''Get file with the given hash from the remote store.''' +'''Get a iterator for content with the given hash.''' raise NotImplementedError('abstract method') def _stat(self, hashes): ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 2] push: introduce --readonly to make push behave more like pull
On 08/23/2016 07:02 PM, Gregory Szorc wrote: On Mon, Aug 22, 2016 at 6:14 PM, Mads Kiilerich <mailto:m...@kiilerich.com>> wrote: Push has some advantages: * I can use revsets to control what is pushed. * Pushing is "the right model" for propagating changes - polling/pulling is bad. * For a master server with "slaves" it also has some advantages that the master will call out to the slaves instead of allowing them to connect. * It's so weird to creating a ssh connection to a remote server and pass the result of some local revset query, just to let it connect back and make a pull. And I think a pull based model is more robust for replication. And I think a pull based model where clients consume a log of "pull requests" that is asynchronous to the push operation is even more robust. Right now, I would agree. Pull is more stable than push because of the hardcoded policy that pull is read only so there can be multiple parallel pulls from a repo, but only one locking push at a time from a repo. I want to fix that. (Btw: AFAIK, that "scalability regression" came when phases were introduced. I want to give an option for working around that.) If you are running async out of a queue anyway and have this read-only push, then I doubt there will be much significant difference in robustness between initiating the "exchange" from the master and from the remove server. So sure, if you want more logic than just the plain hgweb on the remote servers and already have a reliable distributed secure message queue, then pulling is just as fine as pushing (except for the missing revsets). No single size fits all, and even though our setups are similar, they also have differences. I would love to look more into your solution. But now, I just need this missing piece to make the next step / improvement in our setup. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 2] push: introduce --readonly to make push behave more like pull
On 08/23/2016 03:49 PM, Augie Fackler wrote: I don't know if this feature I'm asking for most correctly is described as non-locking and thus readonly and not modifying state or publishing, or if it non-publishing and thus readonly and with no need for locking. It could be described either way - I just want one of them ;-) I think the latter is more sensible from a user perspective - the user doesn't (ideally) know about or care about our locking requirements, so the visible side effect to them (no phase changes) seems like the thing to focus on from a UI level. It is however (in the situation) very visible and annoying for the user that push is locking the source repo. For many use cases, phase change (publish on push) is really not relevant. For example when you are propagating a mainline / upstream repo to a mirror and everything already has been published. /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: [PATCH 2 of 2] push: introduce --readonly to make push behave more like pull
On 08/22/2016 04:25 PM, Augie Fackler wrote: On Sun, Aug 21, 2016 at 01:33:14AM +0200, Mads Kiilerich wrote: # HG changeset patch # User Mads Kiilerich # Date 1471735620 -7200 # Sun Aug 21 01:27:00 2016 +0200 # Branch stable # Node ID 48f4ae36a49929c2389bfb75064f96d16159b2d0 # Parent 7880f56ca7495a2d0365a8280935eccd1e63f0c8 push: introduce --readonly to make push behave more like pull Hm, interesting. To clarify the use case, you've got two local repositories, and you want to move some draft changes between them, but because (for example) push allows a revset, it's easier to push than to pull? So the goal here is to "push without publishing", and if the destination supports drafts, you want to push the draft changes as drafts, otherwise fail? Is that right? My primary use case is to have a push hook that will push to other repositories - some of them local, some of them remote. Push has some advantages: * I can use revsets to control what is pushed. * Pushing is "the right model" for propagating changes - polling/pulling is bad. * For a master server with "slaves" it also has some advantages that the master will call out to the slaves instead of allowing them to connect. * It's so weird to creating a ssh connection to a remote server and pass the result of some local revset query, just to let it connect back and make a pull. But the problems are: * The primary problem is the write lock that prevents multiple simultaneous pushes. Sudoing to another user is a odd workaround. * When I'm pushing I'm in charge. Then it is "wrong" that it is hardcoded that the remote server decides how the push modifies state on my server. These problems can be addressed with this flag. I wonder if we should have --[no-]publish flags on push, wherein the --no-publish form would make it easy to "push these draft changes but only if it doesn't publish them" and the --publish form would make it easier to "push and publish these, but don't make them public unless the push succeeds." Thoughts? Perhaps ... but that is also the opposite of what I'm asking for and would not solve my use case, right? I want to push as if the receiving repo was non-publishing. Also, I want it to not lock. I don't know if this feature I'm asking for most correctly is described as non-locking and thus readonly and not modifying state or publishing, or if it non-publishing and thus readonly and with no need for locking. It could be described either way - I just want one of them ;-) /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 1 of 2] tests: add more test coverage of phase changes when pushing
# HG changeset patch # User Mads Kiilerich # Date 1471734720 -7200 # Sun Aug 21 01:12:00 2016 +0200 # Branch stable # Node ID 7880f56ca7495a2d0365a8280935eccd1e63f0c8 # Parent 5004ef47f437b3bafdd3800551c638a0bf4bb99b tests: add more test coverage of phase changes when pushing Prepare for test coverage of phase updates with future push --readonly option, both with and without actually pushing changesets. diff --git a/tests/test-phases-exchange.t b/tests/test-phases-exchange.t --- a/tests/test-phases-exchange.t +++ b/tests/test-phases-exchange.t @@ -1189,10 +1189,40 @@ 2. Test that failed phases movement are cannot lock source repo, skipping local public phase update [1] $ chmod -R +w .hg - $ hgph Upsilon $ cd .. - $ killdaemons.py +#endif -#endif +Test that clone behaves like pull and doesn't +publish changesets as plain push does + + $ hg -R Upsilon phase -q --force --draft 2 + $ hg clone -q Upsilon Pi -r 7 + $ hgph Upsilon -r 'min(draft())' + o 2 draft a-C - 54acac6f23ab + | + ~ + + $ hg -R Upsilon push Pi -r 7 + pushing to Pi + searching for changes + no changes found + [1] + $ hgph Upsilon -r 'min(draft())' + o 8 draft a-F - b740e3e5c05d + | + ~ + + $ hg -R Upsilon push Pi -r 8 + pushing to Pi + searching for changes + adding changesets + adding manifests + adding file changes + added 1 changesets with 1 changes to 1 files + + $ hgph Upsilon -r 'min(draft())' + o 9 draft a-G - 3e27b6f1eee1 + | + ~ ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
[PATCH 2 of 2] push: introduce --readonly to make push behave more like pull
# HG changeset patch # User Mads Kiilerich # Date 1471735620 -7200 # Sun Aug 21 01:27:00 2016 +0200 # Branch stable # Node ID 48f4ae36a49929c2389bfb75064f96d16159b2d0 # Parent 7880f56ca7495a2d0365a8280935eccd1e63f0c8 push: introduce --readonly to make push behave more like pull Push and pull are almost symmetric ... except when they are not. For example, push makes it possible to use a revset to specify the revisions to push. Push can also be more suitable for distributing changes to multiple mirrors than pull is. Finally, push will by default publish draft changesets. Often, it is correct and desirable that push tries to update phases and thus locks the repository while pushing ... put that also means that there can't be multiple concurrent pushes and that pushing from non-publishing repositories depends on correct configuration of the target repository. We thus introduce --readonly to prevent push from locking the source repository and thus prevent it from publishing. Push with --readonly is thus more symmetric to pull. diff --git a/mercurial/commands.py b/mercurial/commands.py --- a/mercurial/commands.py +++ b/mercurial/commands.py @@ -5878,6 +5878,7 @@ def pull(ui, repo, source="default", **o ('b', 'branch', [], _('a specific branch you would like to push'), _('BRANCH')), ('', 'new-branch', False, _('allow pushing a new branch')), +('', 'readonly', False, _("no source repo locking or phase update")), ] + remoteopts, _('[-f] [-r REV]... [-e CMD] [--remotecmd CMD] [DEST]')) def push(ui, repo, dest=None, **opts): @@ -5886,8 +5887,9 @@ def push(ui, repo, dest=None, **opts): Push changesets from the local repository to the specified destination. -This operation is symmetrical to pull: it is identical to a pull -in the destination repository from the current one. +This operation is almost symmetrical to pull: it is identical to a pull +in the destination repository from the current one, except that pushed +changesets in draft phase are made public unless --readonly is used. By default, push will not allow creation of new heads at the destination, since multiple heads would make it unclear which head @@ -5969,7 +5971,8 @@ def push(ui, repo, dest=None, **opts): pushop = exchange.push(repo, other, opts.get('force'), revs=revs, newbranch=opts.get('new_branch'), bookmarks=opts.get('bookmark', ()), - opargs=opts.get('opargs')) + opargs=opts.get('opargs'), + readonly=opts.get('readonly')) result = not pushop.cgresult diff --git a/mercurial/exchange.py b/mercurial/exchange.py --- a/mercurial/exchange.py +++ b/mercurial/exchange.py @@ -275,7 +275,7 @@ class pushoperation(object): """ def __init__(self, repo, remote, force=False, revs=None, newbranch=False, - bookmarks=()): + bookmarks=(), readonly=False): # repo we push from self.repo = repo self.ui = repo.ui @@ -289,6 +289,8 @@ class pushoperation(object): self.bookmarks = bookmarks # allow push of new branch self.newbranch = newbranch +# no locking of source repo and no publishing phase update +self.readonly = readonly # did a local lock get acquired? self.locallocked = None # step already performed @@ -379,7 +381,7 @@ bookmsgmap = {'update': (_("updating boo def push(repo, remote, force=False, revs=None, newbranch=False, bookmarks=(), - opargs=None): + opargs=None, readonly=False): '''Push outgoing changesets (limited by revs) from a local repository to remote. Return an integer: - None means nothing to push @@ -391,7 +393,7 @@ def push(repo, remote, force=False, revs if opargs is None: opargs = {} pushop = pushoperation(repo, remote, force, revs, newbranch, bookmarks, - **opargs) + readonly, **opargs) if pushop.remote.local(): missing = (set(pushop.repo.requirements) - pushop.remote.local().supported) @@ -413,23 +415,26 @@ def push(repo, remote, force=False, revs raise error.Abort(_("destination does not support push")) # get local lock as we might write phase data localwlock = locallock = None -try: -# bundle2 push may receive a reply bundle touching bookmarks or other -# things requiring the wlock. Take it now to ensure proper ordering. -maypushback = pushop.ui.configbool('experimental', 'bundle2.pushback') -if _canusebundle2(pushop)
Re: [PATCH 3 of 5] branchmap: acquires lock before writting the rev branch cache
On Sun, Aug 7, 2016 at 1:21 PM, Yuya Nishihara wrote: > On Sat, 06 Aug 2016 01:02:55 +0200, Pierre-Yves David wrote: > > # HG changeset patch > > # User Pierre-Yves David > > # Date 1470401836 -7200 > > # Fri Aug 05 14:57:16 2016 +0200 > > # Node ID 84885aeec2e442d18dcb70fcce2d65dd9fafbf91 > > # Parent fa3f05a4219c4ea6b9bc9682f258e4418b36c265 > > # EXP-Topic vfsward > > branchmap: acquires lock before writting the rev branch cache > > > > We now attempt to acquire a lock and write the branch cache within that > lock. > > This would prevent cache corruption when multiple processes try to write > the cache > > at the same time. > > (+CC Durham, Mads) > > I wonder if revbranchcache is designed to not take a lock. 9347c15d8136 > says > corruption can be recovered. > Yes, I think it was given as an initial design constraint for acceptance upstream that the cache had to be updated on the fly by read only operations without locking. Hindsight perspective might have changed mine and others memory and opinion ;-) In my opinion, the revision branch cache should be treated exactly like the branch head cache and updated the same way. It might be expensive to update initially, but not doing it will end up being more expensive. Proper transaction support would however also require that the design constraint of doing in-place (non-atomic non-append) updates got lifted. The existing algorithm is not perfect lockfree, but I doubt it was possible to come up with a test case where simultaneous writes would give a corruption that not would be detected and recovered automatically. This change looks like a fine small improvement that will reduce the risk. But still, like most other write locks in Mercurial, it has the problem that data is read and checked before taking the lock and then "leaks" into the locked zone. I wonder but haven't investigated: How will this handle the case where the repo can't be locked because it is on a read only filesystem? Will it get LockHeld (which would be a misleading exception name) and skip the writing? /Mads ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel