On Aug 31, 2016, at 13:53, Durham Goode <dur...@fb.com> wrote:
One of the performance costs that affects every command is the computehidden
function (which computes what commits are hidden based on a combination of
obsmarkers, bookmarks, workingcopy, phases, and tags). At Facebook this
function alone can add 300ms to every command a user runs (depending on a
variety of factors).
I've begun work on a storage structure for this information, so we don't need
to recompute it during every read command. Before I go through the work to
polish it enough to send it upstream, I want to run it by people for comments.
The idea is basically:
- Have a file that acts as a bitmap, where a changelog rev number maps to a
particular bit, and if that bit is set then the node is considered hidden.
computehidden will return a class that uses this to answer
filteredrevs.__contains__ calls.
- When something is changed that would affect visibility (bookmark edit, phase
change, obsmarker creation, dirstate movement, tag edit), that code is
responsible for calling hidden.updatevisibilty(repo, affectednodes) which
reports which nodes are touched by the edit (like the old bookmark node and the
new bookmark node), and updatevisibility() does the minimal calculation of
visibility changes and propagates those changes to the node ancestors as
required.
- The file would be stored in .hg/cache/hidden, and would only be 122kb per
million commits, so we'd just use the normal copy + atomic rename strategy for
transactionality (i.e. transaction.addfilegenerator)
- In theory the file would always be up-to-date, but we can put some metadata
at the front of the file to allow verifying the repo hasn't changed
significantly out from under it (recording tip, working copy parents, obsmarker
file timestamp, etc). If the repo has changed, or if the bitmap doesn't yet
exist, it can be trivially calculated using
hidden.updatevisibility(repo.revs('not public()')) in a manner similar to how
it works today.
Notable issues:
- A number of places treat filteredrevs as a set, and do things like 'myset -
repo.filteredrevs'. With a bitmap this doesn't work, so we need to translate
it to '(r for r in myset if r not in repo.filteredrevs)'. Which is probably a
better O() anyway since it won't be affected by the size of filteredrevs.
- filteredrevs is currently a frozen set. Editing the bitmap will have to be
done in a way where the edits only become visible when the 'frozen' set is
invalidated. So we maintain the existing behavior.
Follow up ideas:
- Once this ships as a cache, it might be possible to progress it to
not-a-cache, such that we could define visibility in other terms. For
instance, we could allow hiding/reviving commits without obsolescence markers
in non-evolve repos. This is a bit more of a long term thought, but might be
worth brainstorming about.