Jun Wu <qu...@fb.com> writes: > Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800: >> On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <qu...@fb.com> wrote: >> >> > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000: >> > > As I said before we will load all non-public revs in one set and all >> > >> > The problem is, loading a Python set from disk is O(size-of-the-set). >> > >> > Bitmap's loading cost should be basically 0 (with mmap). I think that's why >> > we want bitmap at the first place. There are other choices like packfile >> > index, hash tables, but bitmap is the simplest and most efficient. >> > >> >> Hey folks, >> >> I haven't yet seen mention of some considerations that seem very important >> in driving the decision-making, so I'd appreciate it if someone could fill >> me in. >> >> Firstly, what's our current understanding of the sizes and compositions of >> these sets of numbers? In theory, we have a lot of data from practical >> application at Facebook, but nobody's brought it up. >> >> To clarify why this matters, if an obsstore contains say 100 entries, then >> a very naive implementation should be fine. If it's related to something >> else, such as the size of, amount of activity in, or local age of a repo, >> then maybe we have a different set of decisions to make. > > First, I agree that if N is small, O(N) or O(log(N)), or O(1), do not matter > much in practice. I tend to always reason about future-proof solutions, > which may or may not be a good habit - I'll adjust in the future. > > I think there are multiple topics being discussed: > > 1. How to solve the overhead loading hiddenrevs with minimal changes? > 2. Why is the bitmap format interesting (future use-cases)? > > For 1, > > I once tried to make one bit of revlog flag as the source of truth of > "hidden", which looks a nice solution. But it got rejected because revlog > has to be append-only. Moving the flag bits out formed the bitmap idea > naturally. > > Bitmap fits here but it does not mean we have to use it to solve this > particular problem. For example, stateful chg will get us some easy wins, > and see the problem analysis and proposed solution below. > > Note that we actually have a simple caching format for the "hiddenrevs". > See repoview.py, computehidden. > > But what's wrong with "computehidden" is the choice of the cache key - it > calls "hideablerevs", which takes 0.5 seconds in obsolete.getrevs(repo, > 'obsolete'), which reads the giant obsstore (in hg-committed) and do some > complex computation. The result is then hashed and used as a cache key of > "computehidden". > > So, if we do not want to know the "obsolete()" revisions, but just want to > know what's hidden. Changing the cache key would be a very straightforward > and effective solution which does not require a new file format. > > That'll help commands like "hg st", "hg log -T foo", etc. But it won't > help complex "log"s which do require the "obsolete()" revisions. To speed > up the latter, we can probably just copy the caching infrastructure from > repoview.py. > > For 2, > > len(filteredrevs) in my hg-committed is 2155. It's small. I tend to think
Another data point for my hg-committed: len(filteredrevs) > 8k len(obs markers) > 120k _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel