Excerpts from Pierre-Yves David's message of 2016-11-10 15:02:26 +0000:
> Plan page does not need to be very complex,
>
> So your current plan is to have this "(path, fnode) → [linkrev]"
> mapping stored in a sqlite database and updated on the fly when
> needed. What was your rationale for picking this over a more complete
> cache of (changerev, path, filenode) exchanged over the wire?
>
> Out of curiosity, do you have speedup numbers for your current
> experiment?
The experiment is still ongoing, so I don't have data or clear plans yet.
Things can change - for example, I may want to replace sqlite with gdbm
(a rough sketch of the cache layout is at the end of this mail). So I'm
still not motivated to write such a subject-to-change plan page.

On "on demand":

The motivation for the linkrev cache is to speed up building the
fastannotate cache, which essentially goes through every file in the
repo and does a massive number of adjustlinkrev calls. So it's better to
have an offline command like "debugbuildlinkrevcache" to build the cache
in a more efficient way. I'm less concerned about the on-demand use-case.
Ideally I want to make sure the linkrev database is *always* up-to-date,
i.e. it contains all possible candidate linkrevs for every file node, so
the changelog.d walk is never needed and we can have another
optimization: if there is only one candidate linkrev, just use it and
skip the changelog.i ancestor test (also sketched below). That said,
"on demand" could be an optional feature - it's not hard to implement,
after all.

On "(changerev, path, filenode)":

Note that "filenode" is redundant, because "(changerev, path)"
determines the filenode (see the lookup sketch at the end of this mail).
I believe a traditional database for "(changerev, path) -> linkrev" is
not the way to go, as it could be extremely inefficient in space usage
even with an offline command to build it (think about storing manifests
un-delta-ed).

I didn't say the current approach I'm taking is the only / best way to
address the adjustlinkrev perf problem. There is a possibility that we
could get O(1)-like adjustlinkrev at a space cost similar to summing up
all .i files. That approach is to make linkrevs unique: since we cannot
change the hash function (that would be a BC break), we could store, in
parallel, new file nodes (and new manifests) that take the commit hash
as a salt (a hypothetical sketch is at the end of this mail). If we use
tree manifests, the space usage would be somewhat feasible. I didn't go
this way because it's more complex (it may require inventing some new
formats) and depends on tree manifests, which are still "experimental"
internally.

On "exchange":

The current data (the (path, fnode) -> [linkrev] map) could also be
exchanged - but there is no immediate plan for that right now.

> > But that does not speed up changelog.d walk. And I doubt that the
> > use of _ancestrycontext could lead to incorrect results in some
> > cases if they are the same object for the chain.
>
> The changelog.d walk is about looking for modified files, right?

Yes. The changelog.d walk happens if none of the linkrev candidates is
an ancestor of "srcrev" (a simplified version of that walk is also
sketched below). It's basically what git does all the time (one of the
key reasons git does not scale well currently, imo).
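First, a minimal sketch of how I picture the (path, fnode) -> [linkrev]
cache on top of a gdbm-style store. The key/value encoding and helper
names here are made up for illustration (and I'm using Python 3's
dbm.gnu just for the sketch) - this is not what the actual patches do:

  # Hypothetical on-disk layout for the (path, fnode) -> [linkrev]
  # cache, on top of a gdbm-style key-value store.
  import dbm.gnu
  import struct

  def _key(path, fnode):
      # fnode is the 20-byte file node; path is bytes. Fixed-size
      # prefix first so all entries for one file node share a prefix.
      return fnode + path

  def addlinkrev(db, path, fnode, linkrev):
      # Append one candidate linkrev (big-endian uint32) to the entry.
      old = db.get(_key(path, fnode), b'')
      db[_key(path, fnode)] = old + struct.pack('>I', linkrev)

  def candidatelinkrevs(db, path, fnode):
      # All changesets that could have introduced this file revision.
      raw = db.get(_key(path, fnode), b'')
      return [struct.unpack('>I', raw[i:i + 4])[0]
              for i in range(0, len(raw), 4)]

  db = dbm.gnu.open('linkrevs.db', 'c')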
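The single-candidate fast path would then look roughly like this
(candidatelinkrevs is the hypothetical helper above, and the fallback
is the existing slow path):

  def fastadjustlinkrev(repo, db, path, fnode, srcrev):
      # If the database is complete, the answer must be one of the
      # stored candidates.
      candidates = candidatelinkrevs(db, path, fnode)
      if len(candidates) == 1:
          # Only one changeset could have introduced this file node:
          # no changelog.i ancestor test needed at all.
          return candidates[0]
      # Ambiguous case: keep the usual ancestry check, but only over
      # the candidates instead of walking the changelog.
      ancestors = repo.changelog.ancestors([srcrev], inclusive=True)
      for rev in sorted(candidates, reverse=True):
          if rev in ancestors:
              return rev
      # With a complete database this is unreachable; otherwise fall
      # back to the changelog.d walk (sketched at the bottom).
      return None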
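On the filenode redundancy, what I mean is just this - the manifest
already gives you the file node for any (changerev, path) pair:

  def filenodefor(repo, changerev, path):
      # (changerev, path) determines the filenode via the manifest,
      # so a (changerev, path, filenode) key stores redundant data.
      return repo[changerev][path].filenode()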
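And to make the "unique linkrev" alternative concrete: if the file node
hash also covered the introducing changeset, no two changesets could
share a file node, and linkrev would become a plain lookup. This is a
hypothetical salted hash, not Mercurial's real one (which only covers
the parents and the text):

  import hashlib

  def saltedfilenode(commitnode, p1, p2, text):
      # Hypothetical: mix the introducing changeset node into the file
      # node so identical file revisions in different commits get
      # different nodes, making linkrev unique and O(1) to answer.
      s = hashlib.sha1()
      s.update(commitnode)   # the salt: 20-byte changeset node
      s.update(min(p1, p2))  # parents, ordered as usual
      s.update(max(p1, p2))
      s.update(text)         # the file revision content
      return s.digest()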
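Finally, the changelog.d walk we're trying to avoid is, much
simplified, something like this - a linear scan over ancestors reading
changeset data, which is what git effectively does:

  from mercurial import error

  def slowintrorev(repo, path, fnode, srcrev):
      # Simplified changelog.d walk: scan ancestors of srcrev until we
      # find a changeset that touched the file with the right node.
      for rev in repo.changelog.ancestors([srcrev], inclusive=True):
          ctx = repo[rev]
          if path in ctx.files():
              try:
                  if ctx[path].filenode() == fnode:
                      return rev
              except error.ManifestLookupError:
                  pass  # the file was removed in this changeset
      return None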