Re: [PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

2017-05-15 Thread Martin von Zweigbergk via Mercurial-devel
On Mon, May 15, 2017 at 1:39 PM, Durham Goode  wrote:
>
>
> On 5/15/17 1:32 PM, Martin von Zweigbergk wrote:
>>
>> On Mon, May 8, 2017 at 11:40 AM, Durham Goode  wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode 
>>> # Date 1494268523 25200
>>> #  Mon May 08 11:35:23 2017 -0700
>>> # Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
>>> # Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
>>> strip: make tree stripping O(changes) instead of O(repo)
>>>
>>> The old tree stripping logic iterated over every tree revlog in the repo
>>> looking
>>> for commits that had revs to be stripped. That's very inefficient in
>>> large
>>> repos. Instead, let's look at what files are touched by the strip and
>>> only
>>> inspect those revlogs.
>>
>>
>> Sorry, I didn't notice this patch until today (because bisection of a
>> test failure in narrowhg pointed me to it). This patch makes me a
>> little worried that it'll have the same bugs as the initial version of
>> changegroup generation for treemanifests had, i.e. that it forgets
>> that merge commits can affect no files, but still affect directories
>> (fixed in commit 1ac8ce13). I haven't tried to see if I can make it
>> fail with this patch applied (the narrowhg failures was unrelated).
>
>
> Ug, good point. I think this patch will fail in that case. Iterating over
> all the revlogs really isn't a scalable option though, so it sounds like
> maybe we need to actually do a walk of the trees for manifests that are
> being stripped.  Like, for each tree being stripped, diff them with their
> parents and return all directories that are new.

Yep, that's exactly what I did for changegroup generation in 1ac8ce13.

For now, could you send a patch to back this change out and I'll queue it?

>
> We could probably add an argument to walksubtrees that only yields subtrees
> that are different from the argument tree. This type of logic is useful for
> determining what trees need to be bundled too. We do something like this in
> our native implementation.
>
>
>>>
>>> diff --git a/mercurial/repair.py b/mercurial/repair.py
>>> --- a/mercurial/repair.py
>>> +++ b/mercurial/repair.py
>>> @@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
>>>  def striptrees(repo, tr, striprev, files):
>>>  if 'treemanifest' in repo.requirements: # safe but unnecessary
>>>  # otherwise
>>> -for unencoded, encoded, size in repo.store.datafiles():
>>> -if (unencoded.startswith('meta/') and
>>> -unencoded.endswith('00manifest.i')):
>>> -dir = unencoded[5:-12]
>>> -repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
>>> +treerevlog = repo.manifestlog._revlog
>>> +for dir in util.dirs(files):
>>
>>
>> Note that util.dirs() does not return the root directory (I've often
>> wanted to change that, and I may send a patch soon), so do you need to
>> manually include it here?
>
>
> The root manifest is already handled by the normal strip mechanism, so no
> need to handle it here.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

2017-05-15 Thread Durham Goode



On 5/15/17 1:32 PM, Martin von Zweigbergk wrote:

On Mon, May 8, 2017 at 11:40 AM, Durham Goode  wrote:

# HG changeset patch
# User Durham Goode 
# Date 1494268523 25200
#  Mon May 08 11:35:23 2017 -0700
# Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
# Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
strip: make tree stripping O(changes) instead of O(repo)

The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.


Sorry, I didn't notice this patch until today (because bisection of a
test failure in narrowhg pointed me to it). This patch makes me a
little worried that it'll have the same bugs as the initial version of
changegroup generation for treemanifests had, i.e. that it forgets
that merge commits can affect no files, but still affect directories
(fixed in commit 1ac8ce13). I haven't tried to see if I can make it
fail with this patch applied (the narrowhg failures was unrelated).


Ug, good point. I think this patch will fail in that case. Iterating 
over all the revlogs really isn't a scalable option though, so it sounds 
like maybe we need to actually do a walk of the trees for manifests that 
are being stripped.  Like, for each tree being stripped, diff them with 
their parents and return all directories that are new.


We could probably add an argument to walksubtrees that only yields 
subtrees that are different from the argument tree. This type of logic 
is useful for determining what trees need to be bundled too. We do 
something like this in our native implementation.





diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
 def striptrees(repo, tr, striprev, files):
 if 'treemanifest' in repo.requirements: # safe but unnecessary
 # otherwise
-for unencoded, encoded, size in repo.store.datafiles():
-if (unencoded.startswith('meta/') and
-unencoded.endswith('00manifest.i')):
-dir = unencoded[5:-12]
-repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
+treerevlog = repo.manifestlog._revlog
+for dir in util.dirs(files):


Note that util.dirs() does not return the root directory (I've often
wanted to change that, and I may send a patch soon), so do you need to
manually include it here?


The root manifest is already handled by the normal strip mechanism, so 
no need to handle it here.

___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

2017-05-15 Thread Martin von Zweigbergk via Mercurial-devel
On Mon, May 8, 2017 at 11:40 AM, Durham Goode  wrote:
> # HG changeset patch
> # User Durham Goode 
> # Date 1494268523 25200
> #  Mon May 08 11:35:23 2017 -0700
> # Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
> # Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
> strip: make tree stripping O(changes) instead of O(repo)
>
> The old tree stripping logic iterated over every tree revlog in the repo 
> looking
> for commits that had revs to be stripped. That's very inefficient in large
> repos. Instead, let's look at what files are touched by the strip and only
> inspect those revlogs.

Sorry, I didn't notice this patch until today (because bisection of a
test failure in narrowhg pointed me to it). This patch makes me a
little worried that it'll have the same bugs as the initial version of
changegroup generation for treemanifests had, i.e. that it forgets
that merge commits can affect no files, but still affect directories
(fixed in commit 1ac8ce13). I haven't tried to see if I can make it
fail with this patch applied (the narrowhg failures was unrelated).

>
> I don't have actual perf numbers, since internally we don't use a true
> treemanifest, but simply iterating over hundreds of thousands of revlogs takes
> many, many seconds, so this should help tremendously when stripping only a few
> commits.
>
> diff --git a/mercurial/repair.py b/mercurial/repair.py
> --- a/mercurial/repair.py
> +++ b/mercurial/repair.py
> @@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
>  def striptrees(repo, tr, striprev, files):
>  if 'treemanifest' in repo.requirements: # safe but unnecessary
>  # otherwise
> -for unencoded, encoded, size in repo.store.datafiles():
> -if (unencoded.startswith('meta/') and
> -unencoded.endswith('00manifest.i')):
> -dir = unencoded[5:-12]
> -repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
> +treerevlog = repo.manifestlog._revlog
> +for dir in util.dirs(files):

Note that util.dirs() does not return the root directory (I've often
wanted to change that, and I may send a patch soon), so do you need to
manually include it here?

> +# If the revlog doesn't exist, this returns an empty revlog and 
> is a
> +# no-op.
> +rl = treerevlog.dirlog(dir)
> +rl.strip(striprev, tr)
>
>  def rebuildfncache(ui, repo):
>  """Rebuilds the fncache file from repo history.
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

2017-05-08 Thread Augie Fackler
On Mon, May 08, 2017 at 11:40:03AM -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode 
> # Date 1494268523 25200
> #  Mon May 08 11:35:23 2017 -0700
> # Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
> # Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
> strip: make tree stripping O(changes) instead of O(repo)

queued, thanks
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 2] strip: make tree stripping O(changes) instead of O(repo)

2017-05-08 Thread Durham Goode
# HG changeset patch
# User Durham Goode 
# Date 1494268523 25200
#  Mon May 08 11:35:23 2017 -0700
# Node ID 74881b9a39b2bab273d09009385e3c9ca717a13a
# Parent  5dec5907fe49a488d3ade272d4a5cf090914e59c
strip: make tree stripping O(changes) instead of O(repo)

The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.

I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.

diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -238,11 +238,12 @@ def strip(ui, repo, nodelist, backup=Tru
 def striptrees(repo, tr, striprev, files):
 if 'treemanifest' in repo.requirements: # safe but unnecessary
 # otherwise
-for unencoded, encoded, size in repo.store.datafiles():
-if (unencoded.startswith('meta/') and
-unencoded.endswith('00manifest.i')):
-dir = unencoded[5:-12]
-repo.manifestlog._revlog.dirlog(dir).strip(striprev, tr)
+treerevlog = repo.manifestlog._revlog
+for dir in util.dirs(files):
+# If the revlog doesn't exist, this returns an empty revlog and is 
a
+# no-op.
+rl = treerevlog.dirlog(dir)
+rl.strip(striprev, tr)
 
 def rebuildfncache(ui, repo):
 """Rebuilds the fncache file from repo history.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel