Excerpts from Pierre-Yves David's message of 2016-11-10 15:02:26 +0000:
> A plan page does not need to be very complex.
> 
> So your current plan is to have this "(path, fnode) → [linkrev]"
> mapping stored in a sqlite database and updated on the fly when
> needed. What was your rationale for picking this over a more complete
> cache of (changerev, path, filenode) exchanged over the wire?
>
> Out of curiosity, do you have speedup numbers for your current experiment?

The experiment is still ongoing, so I don't have data or clear plans yet.
Things can change - for example, I may want to replace sqlite with gdbm.
(So I'm still not motivated to write a plan page that is subject to change.)


On "on demand":

The motivation for the linkrev cache is to speed up building the
fastannotate cache, which essentially goes through every file in the repo
and does a massive number of adjustlinkrev calls. So it's better to have an
offline command like "debugbuildlinkrevcache" to build the cache in a more
efficient way. I'm less concerned about the on-demand use case - ideally I
want to make sure the linkrev database is *always* up to date, i.e. it
contains all possible candidate linkrevs for every file node, so the
changelog.d walk becomes unnecessary and we can add another optimization:
if there is only one candidate linkrev, just use it and skip the
changelog.i ancestor test.
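
A minimal sketch of that fast path (illustrative names only, not
Mercurial's actual API - the cache and ancestry test are stand-ins):

```python
# Toy model of adjustlinkrev on top of a complete (path, fnode) ->
# [linkrevs] cache: a single candidate can be returned immediately,
# skipping the changelog.i ancestor test entirely.

def adjusted_linkrev(cache, path, fnode, srcrev, is_ancestor):
    """Pick the linkrev candidate that is an ancestor of srcrev.

    cache       : dict mapping (path, fnode) -> list of candidate linkrevs
    is_ancestor : callable(rev, srcrev) -> bool, standing in for the
                  changelog.i ancestry test
    """
    candidates = cache[(path, fnode)]
    if len(candidates) == 1:
        # Only one possible introduction point: no ancestor test needed.
        return candidates[0]
    for rev in candidates:
        if is_ancestor(rev, srcrev):
            return rev
    return None  # would fall back to the expensive changelog.d walk
```

The single-candidate branch is why a *complete* cache matters: only then
is "exactly one candidate" a safe proof that no ancestry check is needed.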

That said, "on demand" could be an optional feature; it's not hard to
implement after all.


On "(changerev, path, filenode)":

Note that "filenode" is redundant, because "(changerev, path)" determines
the filenode.
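
In other words, the manifest of a change revision already maps each path
to exactly one filenode. A toy model (illustrative names, not hg's API):

```python
# Each changerev's manifest maps path -> filenode, so (changerev, path)
# is a complete key; adding "filenode" to it carries no extra information.

manifests = {
    0: {"a.txt": "fnode-a0"},
    1: {"a.txt": "fnode-a1", "b.txt": "fnode-b0"},
}

def filenode(changerev, path):
    # (changerev, path) -> filenode is a function: one answer per key.
    return manifests[changerev][path]
```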

I believe a traditional database for "(changerev, path) -> linkrev" is not
the way to go, as it could be extremely inefficient in space usage, even
when built by an offline command (think about storing manifests
un-delta-ed).

I'm not saying the current approach I'm taking is the only / best way to
address the adjustlinkrev perf problem. It's possible to get O(1)-like
adjustlinkrev at a space cost comparable to the sum of all .i files.

That approach is to make linkrev unique. Since we cannot change the hash
function (a BC break), we can store new file nodes that take the commit
hash as a salt (and new manifests) in parallel. If we use tree manifests,
the space usage would be somewhat feasible. I didn't go this way because
it's more complex (it may require inventing some new formats) and depends
on tree manifests, which are still "experimental" internally.
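
A sketch of the salting idea, assuming the salted nodes live alongside the
existing ones (this is not hg's real hashing scheme - the actual filenode
hashes the two sorted parent nodes plus the text, without any salt):

```python
import hashlib

# Hashing the commit hash into the file node ties each node to exactly
# one changeset, so every (path, fnode) pair has a unique linkrev by
# construction - no ancestry adjustment needed at all.

def salted_filenode(text, p1, p2, commithash):
    s = hashlib.sha1()
    for parent in sorted([p1, p2]):
        s.update(parent)
    s.update(commithash)  # the salt: makes the node changeset-specific
    s.update(text)
    return s.hexdigest()
```

The cost is that identical file contents committed in different changesets
no longer share a node, which is where the extra space usage comes from.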


On "exchange":

The current data ((path, fnode) -> [linkrev]) could also be exchanged,
but there is no immediate plan for that.

> > But that does not speed up the changelog.d walk. And I suspect the use
> > of _ancestrycontext could lead to incorrect results in some cases if it
> > is the same object for the chain.
> 
> The changelog.d walk is about looking for modified files, right?

Yes. The changelog.d walk happens if none of the linkrev candidates is an
ancestor of "srcrev". It's basically what git does all the time (one of the
key reasons git does not scale currently, imo).
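
A toy model of that fallback walk (illustrative structures, not hg's
actual revlog code - the real walk also matches the filenode, not just
the path):

```python
# When no cached candidate is an ancestor of srcrev, the fallback walks
# srcrev's ancestors, reading each changeset's file list. This is the
# expensive part: it touches changelog.d for every revision visited.

def walk_for_linkrev(srcrev, parents, files_of, path):
    """Return an ancestor of srcrev that touched path, or None.

    parents  : dict rev -> list of parent revs
    files_of : callable(rev) -> set of files touched by that changeset
    """
    seen = set()
    stack = [srcrev]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        if path in files_of(rev):
            return rev
        stack.extend(parents[rev])
    return None
```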
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel