On 11/08/2016 05:15 PM, Jun Wu wrote:
Excerpts from Pierre-Yves David's message of 2016-11-08 17:28:21 +0100:

On 11/07/2016 02:01 AM, Jun Wu wrote:
Excerpts from Pierre-Yves David's message of 2016-11-07 01:08:18 +0100:
[…]
What about an intermediate version that uses C code to fill a Python heap
object? I think having a small Cython proof of concept would be a good way
to get a grasp on the expected speedup.

That might be helpful. However, I don't have immediate plans for it, because
we will use C code to do a fast "isancestor" test for the "changelog.i" walk,
and if we pre-build the linkrev database, there is no need to do a
"changelog.d" walk. So "lazyancestors" won't actually be used in
"adjustlinkrev", which is my main focus right now.

Can you set up a plan page for this linkrev database project?

This is simple and straightforward enough that it probably does not need a
plan page. The database maintains a map from (path, fnode) to [linkrev] (a
list of possible linkrevs). The first version is being reviewed internally.
A preview is at https://bpaste.net/show/8634205271dc . I'll send it upstream
as Augie suggested once it's more polished, and will try to make it work for
the non-filelog case that Martin mentioned on IRC.
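For illustration, the (path, fnode) → [linkrev] mapping described above could
be sketched as a small sqlite-backed cache. This is a hypothetical sketch,
not the implementation under review: the table layout and the function names
(opencache, addlinkrev, getlinkrevs) are made up for this example.

```python
# Hypothetical sketch of a (path, fnode) -> [linkrev] cache backed by
# sqlite. Names and schema are illustrative, not the real implementation.
import sqlite3

def opencache(path=":memory:"):
    """Open (or create) the linkrev cache database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS linkrevs "
        "(path TEXT, fnode BLOB, linkrev INTEGER, "
        "PRIMARY KEY (path, fnode, linkrev))"
    )
    return db

def addlinkrev(db, path, fnode, linkrev):
    """Record one candidate linkrev for (path, fnode)."""
    db.execute(
        "INSERT OR IGNORE INTO linkrevs VALUES (?, ?, ?)",
        (path, fnode, linkrev),
    )

def getlinkrevs(db, path, fnode):
    """Return the sorted list of candidate linkrevs for (path, fnode)."""
    rows = db.execute(
        "SELECT linkrev FROM linkrevs WHERE path = ? AND fnode = ? "
        "ORDER BY linkrev",
        (path, fnode),
    )
    return [r[0] for r in rows]
```

A list of candidates (rather than a single linkrev) is needed because the
same (path, fnode) pair can be introduced by several changesets.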

A plan page does not need to be very complex.

So your current plan is to have this "(path, fnode) → [linkrev]" mapping stored in a sqlite database and updated on the fly when needed. What was your rationale for picking this over a more complete cache of (changerev, path, filenode) exchanged over the wire?

Out of curiosity, do you have speedup numbers for your current experiment?

I'd note that I think the C code may have room for improvement. And if we
are going to add some "state" to speed up testing ancestors, I think it
should be done globally, i.e. on the C index object. That way the state is
private to the C code and the Python code just calls index.isancestor
without knowing about the details. Since the changelog is shared globally,
it may be a waste of memory to have separate states for each fctx.
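The interface idea above (state private to the index, callers only see
index.isancestor) could be sketched in Python as follows. This is an assumed
shape, not Mercurial's actual C index API; the naive per-rev ancestor sets
used here are only for illustration and would be replaced by the more compact
global state discussed in this thread.

```python
# Sketch of an index object that hides its ancestor-test state.
# Class and attribute names are hypothetical; the per-rev frozenset
# cache is deliberately naive and not the proposed compact state.
class revlogindex(object):
    def __init__(self, parents):
        # parents[rev] = (p1, p2), with -1 meaning the null revision
        self._parents = parents
        self._ancestorsets = {}  # rev -> frozenset of ancestors (private)

    def _ancestors(self, rev):
        cached = self._ancestorsets.get(rev)
        if cached is not None:
            return cached
        anc = set()
        stack = [rev]
        while stack:
            r = stack.pop()
            for p in self._parents[r]:
                if p != -1 and p not in anc:
                    anc.add(p)
                    stack.append(p)
        anc = frozenset(anc)
        self._ancestorsets[rev] = anc
        return anc

    def isancestor(self, a, b):
        """Is rev a an ancestor of rev b? Callers never see the state."""
        return a in self._ancestors(b)
```

Because the state lives on the (globally shared) index, every caller reuses
the same cache instead of each fctx carrying its own.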

I'm a bit afraid of the space complexity of such a cache. However, one of
your colleagues at the sprint had some ideas around this.

The memory usage of the global cache would be something like O(log(N) * N)
or O(N), depending on how useful we want it to be.

I'm not sure how you foresee this cache, can you elaborate? A naive
persistent cache approach would give O(N²) space.

If I'm going to write the O(N^2) version, you can safely assume that I
couldn't write fastannotate :)

;-) (that's why I asked)

This is better than the "_ancestrycontext" approach - suppose every ctx has
an "_ancestrycontext", it's N^2 memory usage, where N is len(repo).

Except we don't do that right now. In particular, if we use many changectx
with an _ancestrycontext, it is likely the same object is used for all of
them, if I remember correctly.
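The sharing described above can be sketched in a few lines. This is a toy
illustration with hypothetical names, not Mercurial's actual context code:
each context down the chain holds a reference to the same ancestry set
instead of materializing its own, which is what avoids the N² memory cost.

```python
# Toy sketch: one _ancestrycontext object shared along a chain of
# contexts (hypothetical class, not Mercurial's real filectx).
class filectx(object):
    def __init__(self, rev, ancestrycontext=None):
        self.rev = rev
        # shared, lazily filled set of ancestor revisions
        if ancestrycontext is None:
            ancestrycontext = set()
        self._ancestrycontext = ancestrycontext

    def parentctx(self, rev):
        # hand the *same* context object down the chain
        return filectx(rev, ancestrycontext=self._ancestrycontext)
```

The correctness concern raised below is visible here too: a context computed
for one revision is not automatically valid for its ancestors.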

But that does not speed up the changelog.d walk. And I suspect that the use
of _ancestrycontext could lead to incorrect results in some cases if the
same object is shared across the chain.

The changelog.d walk is about looking for modified files, right?

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel