On 9/2/16 9:14 AM, Gregory Szorc wrote:

On Aug 31, 2016, at 13:53, Durham Goode <dur...@fb.com> wrote:

One of the performance costs that affects every command is the computehidden 
function (which computes what commits are hidden based on a combination of 
obsmarkers, bookmarks, workingcopy, phases, and tags).  At Facebook this 
function alone can add 300ms to every command a user runs (depending on a 
variety of factors).

I've begun work on a storage structure for this information, so we don't need 
to recompute it during every read command.  Before I go through the work to 
polish it enough to send it upstream, I want to run it by people for comments.

The idea is basically:

- Have a file that acts as a bitmap, where a changelog rev number maps to a 
particular bit, and if that bit is set then the node is considered hidden.  
computehidden will return a class that uses this to answer 
filteredrevs.__contains__ calls.

- When something is changed that would affect visibility (bookmark edit, phase 
change, obsmarker creation, dirstate movement, tag edit), that code is 
responsible for calling hidden.updatevisibilty(repo, affectednodes) which 
reports which nodes are touched by the edit (like the old bookmark node and the 
new bookmark node), and updatevisibility() does the minimal calculation of 
visibility changes and propagates those changes to the node ancestors as 
required.

- The file would be stored in .hg/cache/hidden, and would only be 122kb per 
million commits, so we'd just use the normal copy + atomic rename strategy for 
transactionality (i.e. transaction.addfilegenerator)

- In theory the file would always be up-to-date, but we can put some metadata 
at the front of the file to allow verifying the repo hasn't changed 
significantly out from under it (recording tip, working copy parents, obsmarker 
file timestamp, etc). If the repo has changed, or if the bitmap doesn't yet 
exist, it can be trivially calculated using 
hidden.updatevisibility(repo.revs('not public()')) in a manner similar to how 
it works today.


Notable issues:

- A number of places treat filteredrevs as a set, and do things like 'myset - 
repo.filteredrevs'.  With a bitmap this doesn't work, so we need to translate 
it to '(r for r in myset if r not in repo.filteredrevs)'.  Which is probably a 
better O() anyway since it won't be affected by the size of filteredrevs.

- filteredrevs is currently a frozen set.  Editing the bitmap will have to be 
done in a way where the edits only become visible when the 'frozen' set is 
invalidated.  So we maintain the existing behavior.


Follow up ideas:

- Once this ships as a cache, it might be possible to progress it to 
not-a-cache, such that we could define visibility in other terms.  For 
instance, we could allow hiding/reviving commits without obsolescence markers 
in non-evolve repos.  This is a bit more of a long term thought, but might be 
worth brainstorming about.
Why not ship it as part of the store from the beginning (without the extra 
features)? Cache validation is hard. Tying the updates to store 
locks/transactions seems easier and guarantees no updates are needed at read 
time (which has historically caused issues on multi process HTTP servers).
The cache will be tied to locks and transactions already. So it effectively won't be a cache. The only reason for treating it like a cache and having some sort of validation is so that we can detect issues where the bitmap becomes out of sync and we can fix them before we make this really not-a-cache.
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to