On 5/2/19 11:55 PM, Martin von Zweigbergk wrote:
On Thu, May 2, 2019 at 2:48 PM Pierre-Yves David
<pierre-yves.da...@ens-lyon.org> wrote:
On 5/2/19 8:24 PM, Martin von Zweigbergk wrote:
>
>
> On Thu, May 2, 2019 at 9:37 AM Pierre-Yves David
> <pierre-yves.da...@ens-lyon.org> wrote:
>
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.da...@octobus.net>
> # Date 1552263020 -3600
> # Mon Mar 11 01:10:20 2019 +0100
> # Node ID eac353183daaef0a503da8cd72b8df43f54d7fb8
> # Parent a753bc019c1ad7c5661a050adce49e4c3cd5a786
> # EXP-Topic fnodecache
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> # hg pull https://bitbucket.org/octobus/mercurial-devel/ -r eac353183daa
> hgtagsfnodescache: inherit fnode from parent when possible
>
> If a changeset does not update the content of `.hgtags`, it uses the
> same file-node (for `.hgtags`) as its parents. In such a case we can
> directly reuse the parent's file-node.
>
> We use this property when updating the `hgtagsfnodescache`, taking a
> faster path if we already have a cached value for the parents of the
> node we are looking at.
>
> Doing so provides a large performance boost when looking up a lot of
> fnodes, especially on repositories with very large manifests:
>
> timing for `tagsmod.fnoderevs(ui, repo, repo.changelog.revs())`
>
>
> What end-user command does this correspond to? `hg tags` with no
> .hg/cache/tags?
hg debugupdatecache
>
>
> mercurial: (41907 revisions, 1923 files)
>
> before: 6.9 seconds
> after: 2.7 seconds (-54%)
>
> pypy: (96266 revisions, 5198 files)
>
> before: 80 seconds
> after: 20 seconds (-75%)
>
> mozilla-central: (463411 revisions, 272080 files)
>
> before: 7166.4 seconds
> after: 47.8 seconds (-99%, x150 speedup)
>
>
> Nice improvements :) How did people work with these repos before?
This is the timing for computing the information for all nodes. To
retrieve the current tag names we only need this data for the heads.
Getting it for all heads is still very slow to compute initially (that
is why we now exchange them during clone).
To illustrate the slowness, I started a tags computation from a cold
cache… That was 3 hours ago…
So currently we only use (and exchange) entries for the repository heads.
However, the speedup relies on reusing data from the parents. So warming
all entries during `hg debugupdatecache` turns out to be more
efficient (with the new code).
I guess the next step from here is to warm all entries in all cases (not
just `hg debugupdatecache`) and efficiently exchange them over the wire.
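To make the point about warming order concrete, here is a minimal sketch on a toy model. It is not Mercurial's API: `ToyRepo`, `getfnode`, `warm`, `HISTORY` and the string file-nodes are hypothetical names invented for illustration, and the toy assumes each revision stores a cheap delta against p1. It only shows why filling the cache in revision order lets every entry take the cheap path, while asking for the heads alone from a cold cache pays one expensive lookup per head.

```python
NULLREV = -1      # same convention as Mercurial: -1 means "no parent"
NULLID = "null"   # toy placeholder meaning "no .hgtags file"


class ToyRepo:
    """Hypothetical stand-in for a repository (not Mercurial's API).

    revs[i] is (p1, p2, delta), where delta maps the file names changed
    in revision i (relative to p1) to their new file-node.
    """

    def __init__(self, revs):
        self.revs = revs
        self.full_lookups = 0  # counts uses of the expensive path

    def full_lookup(self, rev, name=".hgtags"):
        """Expensive path: walk back along p1 until the file is found."""
        self.full_lookups += 1
        while rev != NULLREV:
            p1, _p2, delta = self.revs[rev]
            if name in delta:
                return delta[name]
            rev = p1
        return NULLID


def getfnode(repo, cache, rev):
    """Return the .hgtags file-node of rev, inheriting from parents if cached."""
    if rev in cache:
        return cache[rev]
    p1, p2, delta = repo.revs[rev]
    # The file-node of the null revision is known without any lookup.
    parentfnode = NULLID if p1 == NULLREV else cache.get(p1)
    # With a second parent, only trust the shortcut if both parents agree.
    if p2 != NULLREV and cache.get(p2) != parentfnode:
        parentfnode = None
    if parentfnode is not None:
        # Cheap path: reuse the parent's value unless this revision changed it.
        fnode = delta.get(".hgtags", parentfnode)
    else:
        fnode = repo.full_lookup(rev)
    cache[rev] = fnode
    return fnode


def warm(repo, revs):
    """Fill a fresh cache for the given revisions, in the given order."""
    cache = {}
    for rev in revs:
        getfnode(repo, cache, rev)
    return cache


# Two branches forking at revision 1; revisions 4 and 5 are the heads.
HISTORY = [
    (NULLREV, NULLREV, {".hgtags": "t0", "a": "a0"}),  # 0
    (0, NULLREV, {"a": "a1"}),                         # 1
    (1, NULLREV, {".hgtags": "t1"}),                   # 2
    (1, NULLREV, {"a": "a2"}),                         # 3
    (2, NULLREV, {"a": "a3"}),                         # 4 (head)
    (3, NULLREV, {"a": "a4"}),                         # 5 (head)
]

heads_repo = ToyRepo(HISTORY)
heads_cache = warm(heads_repo, [4, 5])         # cold cache, heads only

full_repo = ToyRepo(HISTORY)
full_cache = warm(full_repo, range(len(HISTORY)))  # every revision, in order
```

In this toy, the heads-only run falls back to the expensive lookup once per head, while the full warm-up never does, because each revision finds its parent already cached. How closely this mirrors real repositories depends on details the toy ignores, such as how manifest deltas are actually stored.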
>
>
> On a copy of mozilla-try with about 35K heads and 1.7M changesets,
> this moves the computation from many hours to a couple of minutes,
> making it more interesting to do a full warm-up of this cache before
> computing tags (from a cold cache).
>
> There seem to be other low-hanging performance fruits, like avoiding
> the use of changectx or using a more revision-centric logic. However,
> the new code is fast enough for my needs right now.
>
> diff --git a/mercurial/tags.py b/mercurial/tags.py
> --- a/mercurial/tags.py
> +++ b/mercurial/tags.py
> @@ -18,6 +18,7 @@ from .node import (
> bin,
> hex,
> nullid,
> + nullrev,
> short,
> )
> from .i18n import _
> @@ -718,12 +719,33 @@ class hgtagsfnodescache(object):
> if not computemissing:
> return None
>
> - # Populate missing entry.
> - try:
> - fnode = ctx.filenode('.hgtags')
> - except error.LookupError:
> - # No .hgtags file on this revision.
> - fnode = nullid
> + fnode = None
> + cl = self._repo.changelog
> + p1rev, p2rev = cl._uncheckedparentrevs(rev)
> + p1node = cl.node(p1rev)
> + p1fnode = self.getfnode(p1node, computemissing=False)
> + if p2rev != nullrev:
> + # There is some no-merge changeset where p1 is null and p2 is set
> + # Processing them are merge is just slower, but still give a good
> + # result.
>
>
> I think you're thinking of file copies, see
> https://www.mercurial-scm.org/repo/hg/file/fdbeacb9d456/mercurial/localrepo.py#l2348
I am lost here. We are iterating over the changelog and the manifest
here. This code deals with "malformed" changelog entries. Why are file
copies relevant here?
I don't think you're lost. I think I just misunderstood what this was
about. I was not aware that some repos have commits broken in that way.
Any idea how that happened?
People using `hg debugsetparents`, in most cases (it could be faulty
extension code too). The data structure allows representing it, so it happens.
You can also have changesets with p1 == p2 ≠ nullrev.
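For reference, the parent combinations mentioned in this thread can be enumerated with a small sketch. The function name `classify_parents` and the labels it returns are hypothetical and only meant for illustration; the one fact taken from Mercurial is that the null revision is numbered -1.

```python
NULLREV = -1  # Mercurial's revision number for "no parent"


def classify_parents(p1rev, p2rev):
    """Label a (p1, p2) pair of parent revision numbers (illustrative labels)."""
    if p1rev == NULLREV and p2rev == NULLREV:
        return "root"              # no parent at all
    if p2rev == NULLREV:
        return "regular"           # the usual single-parent changeset
    if p1rev == NULLREV:
        return "null-p1"           # malformed: p1 is null but p2 is set
    if p1rev == p2rev:
        return "duplicate-parent"  # malformed: p1 == p2 != nullrev
    return "merge"                 # two distinct parents
```

Code that only tests `p2rev != NULLREV` would treat both the "null-p1" and "duplicate-parent" cases as merges, which matches the comment in the patch: slower, but still expected to give a good result.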
--
Pierre-Yves David