Re: [PATCH 3 of 3 V3] chgcache: implement repocache
On Sat, 11 Mar 2017 17:21:41 -0800, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-03-11 15:50:43 -0800:
> > On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> > > Unlike the current "start new server for new extensions" behavior, the
> > > new design allows us to serve the request using the old server
> > > sub-optimally, and start a more efficient server in the background to
> > > replace the current one soon.
> > >
> > > A fb-only example is that remotefilelog uses repo-independent packfiles.
> > >
> > > I think the cache state is conceptually attached to the current process,
> > > instead of repo. Thus a global may actually be a better fit.
> >
> > Yeah, it will live longer than a repo, and will reside in dispatcher level,
> > which I don't think means the data should be globally accessible by anyone.
>
> It sounds like it could be done by splitting the "global" in chgcache states
> into different modules: chgserver, dispatch, ...
>
> In that case, adding a new parameter to "smartcache" to decouple it from
> chgcache._cache seems to address the issue.
>
> It sounds possible, but I need more time to make sure it works.

Maybe something like that. Since most data to be cached would be
repo-dependent, we don't need a global namespace for cache content. We can
just attach repocache per repo and its content can be managed by the server.

  cache = {repo.root: repostorage}      # managed by chg server
  repo.repocache = cache[repo.root]     # maybe by chg ui/req hook?
  changelog.repocache = repo.repocache  # passed by repo

For remotefilelog which prefers wider cache, maybe it can manage its own
cache storage? I haven't considered well how it should be invalidated, though.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
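The per-repo layout proposed in this message can be sketched in plain Python.
This is only a minimal sketch under assumptions: FakeServer, FakeRepo,
RepoStorage and attachcache are hypothetical stand-ins and not Mercurial
classes, and the hook that would attach the storage in chg is left open, as it
is in the message itself.

```python
class RepoStorage(dict):
    """Hypothetical per-repo cache storage: a plain name -> value mapping."""


class FakeServer(object):
    """Stand-in for the chg server, which would own the cache content."""

    def __init__(self):
        # repo root -> RepoStorage; there is no module-level global here
        self._cache = {}

    def storagefor(self, root):
        # Create the storage for a root on first use, then reuse it.
        return self._cache.setdefault(root, RepoStorage())


class FakeRepo(object):
    """Stand-in for a repo object; only the attributes the sketch needs."""

    def __init__(self, root):
        self.root = root
        self.repocache = None


def attachcache(server, repo):
    # What a (hypothetical) per-request hook might do: hand the repo the
    # storage that the server keeps for its root.
    repo.repocache = server.storagefor(repo.root)
    return repo
```

Two repo objects opened on the same root would then share one storage, while
repos at different roots stay isolated. Invalidation is not addressed here.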
Re: [PATCH 3 of 3 V3] chgcache: implement repocache
Excerpts from Yuya Nishihara's message of 2017-03-11 15:50:43 -0800:
> On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> > > On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > > > # HG changeset patch
> > > > # User Jun Wu
> > > > # Date 1488953311 28800
> > > > #      Tue Mar 07 22:08:31 2017 -0800
> > > > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > > > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > > > chgcache: implement repocache
> > >
> > > > +repoloadfunctable = {'changelog': loadchangelog}
> > > > +
> > > > +ui = uimod.ui()
> > > > +ui.setconfig('ui', 'allowemptycommit', '1')
> > > > +
> > > > +repo = localrepo.localrepository(
> > > > +    ui,
> > > > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > > > +    create=True)
> > > > +repocache = chgcache.repocache(repo, repoloadfunctable)
> > >
> > > Suppose a cache object is attached to a repo by chgserver, do we really
> > > need a global _cache storage?
> >
> > One design goal is to decouple the low-level API from the repo, i.e.
> > chgcache could be used to cache non-repo objects.
> >
> > Extensions are an example of non-repo objects that could benefit from the
> > design. Workers could send new extensions to the master, and the master
> > could have better "extension cache hit rate" after reloading extensions.
>
> I don't get how that would work. Anyway, the extension would register
> something in the master process to load into cache. Then, it would be able
> to access the same cache object without using a global _cache dict.

chgserver (or dispatch, or extensions, or someone else) will have a cache
like:

  _loadedextension = {hash: pymod}

I think chg server can just call extensions._importext to load an extension,
assuming it is side-effect free. Maybe Python has some behavior to make it
impossible. I'll investigate.

> > Unlike the current "start new server for new extensions" behavior, the new
> > design allows us to serve the request using the old server sub-optimally,
> > and start a more efficient server in the background to replace the current
> > one soon.
> >
> > A fb-only example is that remotefilelog uses repo-independent packfiles.
> >
> > I think the cache state is conceptually attached to the current process,
> > instead of repo. Thus a global may actually be a better fit.
>
> Yeah, it will live longer than a repo, and will reside in dispatcher level,
> which I don't think means the data should be globally accessible by anyone.

It sounds like it could be done by splitting the "global" in chgcache states
into different modules: chgserver, dispatch, ...

In that case, adding a new parameter to "smartcache" to decouple it from
chgcache._cache seems to address the issue.

It sounds possible, but I need more time to make sure it works.

> > We want the
> > behavior that "fork()" copies the state, and don't care if fork() should
> > copy the repo object or not.
>
> IMHO, fork() is just for parallelism (given we can eventually omit uisetup()
> in the master process.)
Re: [PATCH 3 of 3 V3] chgcache: implement repocache
On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> > On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > > # HG changeset patch
> > > # User Jun Wu
> > > # Date 1488953311 28800
> > > #      Tue Mar 07 22:08:31 2017 -0800
> > > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > > chgcache: implement repocache
> >
> > > +repoloadfunctable = {'changelog': loadchangelog}
> > > +
> > > +ui = uimod.ui()
> > > +ui.setconfig('ui', 'allowemptycommit', '1')
> > > +
> > > +repo = localrepo.localrepository(
> > > +    ui,
> > > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > > +    create=True)
> > > +repocache = chgcache.repocache(repo, repoloadfunctable)
> >
> > Suppose a cache object is attached to a repo by chgserver, do we really
> > need a global _cache storage?
>
> One design goal is to decouple the low-level API from the repo, i.e.
> chgcache could be used to cache non-repo objects.
>
> Extensions are an example of non-repo objects that could benefit from the
> design. Workers could send new extensions to the master, and the master
> could have better "extension cache hit rate" after reloading extensions.

I don't get how that would work. Anyway, the extension would register
something in the master process to load into cache. Then, it would be able
to access the same cache object without using a global _cache dict.

> Unlike the current "start new server for new extensions" behavior, the new
> design allows us to serve the request using the old server sub-optimally,
> and start a more efficient server in the background to replace the current
> one soon.
>
> A fb-only example is that remotefilelog uses repo-independent packfiles.
>
> I think the cache state is conceptually attached to the current process,
> instead of repo. Thus a global may actually be a better fit.

Yeah, it will live longer than a repo, and will reside in dispatcher level,
which I don't think means the data should be globally accessible by anyone.

> We want the
> behavior that "fork()" copies the state, and don't care if fork() should
> copy the repo object or not.

IMHO, fork() is just for parallelism (given we can eventually omit uisetup()
in the master process.)
Re: [PATCH 3 of 3 V3] chgcache: implement repocache
Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu
> > # Date 1488953311 28800
> > #      Tue Mar 07 22:08:31 2017 -0800
> > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > # Available At https://bitbucket.org/quark-zju/hg-draft
> > #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > chgcache: implement repocache
>
> > +repoloadfunctable = {'changelog': loadchangelog}
> > +
> > +ui = uimod.ui()
> > +ui.setconfig('ui', 'allowemptycommit', '1')
> > +
> > +repo = localrepo.localrepository(
> > +    ui,
> > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > +    create=True)
> > +repocache = chgcache.repocache(repo, repoloadfunctable)
>
> Suppose a cache object is attached to a repo by chgserver, do we really
> need a global _cache storage?

One design goal is to decouple the low-level API from the repo, i.e.
chgcache could be used to cache non-repo objects.

Extensions are an example of non-repo objects that could benefit from the
design. Workers could send new extensions to the master, and the master
could have better "extension cache hit rate" after reloading extensions.

Unlike the current "start new server for new extensions" behavior, the new
design allows us to serve the request using the old server sub-optimally,
and start a more efficient server in the background to replace the current
one soon.

A fb-only example is that remotefilelog uses repo-independent packfiles.

I think the cache state is conceptually attached to the current process,
instead of repo. Thus a global may actually be a better fit. We want the
behavior that "fork()" copies the state, and don't care if fork() should
copy the repo object or not.
Re: [PATCH 3 of 3 V3] chgcache: implement repocache
On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu
> # Date 1488953311 28800
> #      Tue Mar 07 22:08:31 2017 -0800
> # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> chgcache: implement repocache

> +repoloadfunctable = {'changelog': loadchangelog}
> +
> +ui = uimod.ui()
> +ui.setconfig('ui', 'allowemptycommit', '1')
> +
> +repo = localrepo.localrepository(
> +    ui,
> +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> +    create=True)
> +repocache = chgcache.repocache(repo, repoloadfunctable)

Suppose a cache object is attached to a repo by chgserver, do we really
need a global _cache storage?
[PATCH 3 of 3 V3] chgcache: implement repocache
# HG changeset patch
# User Jun Wu
# Date 1488953311 28800
#      Tue Mar 07 22:08:31 2017 -0800
# Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
# Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
chgcache: implement repocache

The repocache is based on smartcache. It will be used widely because most
objects of interest to stateful chg are repo-related.

In the future, we may want to move part of localrepository to a thin class,
but for now we just use the repo object directly.

diff --git a/mercurial/chgcache.py b/mercurial/chgcache.py
--- a/mercurial/chgcache.py
+++ b/mercurial/chgcache.py
@@ -80,2 +80,7 @@ class smartcache(object):
         set(fullkey, (newhash, newvalue))
         return newvalue
+
+class repocache(smartcache):
+    def __init__(self, repo, loadfunctable):
+        keyprefix = 'repo\0%s\0' % repo.root
+        super(repocache, self).__init__(keyprefix, repo, loadfunctable)
diff --git a/tests/test-chgcache.py b/tests/test-chgcache.py
--- a/tests/test-chgcache.py
+++ b/tests/test-chgcache.py
@@ -4,6 +4,9 @@ import os
 
 from mercurial import (
+    changelog,
     chgcache,
+    localrepo,
     scmutil,
+    ui as uimod,
 )
 
@@ -55,2 +58,35 @@ printcache() # None, will invalidate the
 vfs.write(filename, 'ef')
 printcache() # cache miss, 'ef'
+
+def loadchangelog(repo, oldhash, oldvalue):
+    # NOTE: This does not take care of corner cases. See "readfoo".
+    newhash = repo.svfs.stat('00changelog.i').st_size
+    if newhash == oldhash:
+        print('changelog cache hit')
+        return oldhash, oldvalue
+    else:
+        print('changelog cache miss')
+        newvalue = changelog.changelog(repo.svfs)
+        return newhash, newvalue
+
+repoloadfunctable = {'changelog': loadchangelog}
+
+ui = uimod.ui()
+ui.setconfig('ui', 'allowemptycommit', '1')
+
+repo = localrepo.localrepository(
+    ui,
+    os.path.join(os.environ['TESTTMP'], 'repo1'),
+    create=True)
+repocache = chgcache.repocache(repo, repoloadfunctable)
+
+def printrepocache():
+    print('changelog has %d revisions' % len(repocache.get('changelog')))
+
+repo.commit('foo')
+printrepocache()
+printrepocache()
+
+repo.commit('bar')
+printrepocache()
+printrepocache()
diff --git a/tests/test-chgcache.py.out b/tests/test-chgcache.py.out
--- a/tests/test-chgcache.py.out
+++ b/tests/test-chgcache.py.out
@@ -11,2 +11,10 @@ cache["foo"] = None
 cache miss
 cache["foo"] = 'ef'
+changelog cache miss
+changelog has 1 revisions
+changelog cache hit
+changelog has 1 revisions
+changelog cache miss
+changelog has 2 revisions
+changelog cache hit
+changelog has 2 revisions
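The load-function protocol used by loadchangelog in the test (take the old
hash and value, return a new hash and value, and reuse the old value when the
hash is unchanged) can be shown without Mercurial. The following is a minimal
sketch and not the smartcache from this series, whose body is not part of this
patch: TinySmartCache, loadfile and the explicit storage argument are
assumptions made for illustration, with the storage passed in by the caller
instead of living in a module-level dict.

```python
import os
import tempfile  # not used by the classes; handy for trying the sketch out


class TinySmartCache(object):
    """Hypothetical cache that revalidates each entry through a load function."""

    def __init__(self, keyprefix, target, loadfunctable, storage):
        self._keyprefix = keyprefix
        self._target = target                # e.g. a repo; here a file path
        self._loadfunctable = loadfunctable  # key -> loadfunc
        self._storage = storage              # fullkey -> (hash, value)

    def get(self, key):
        fullkey = self._keyprefix + key
        oldhash, oldvalue = self._storage.get(fullkey, (None, None))
        # The load function decides whether the old value is still valid.
        newhash, newvalue = self._loadfunctable[key](
            self._target, oldhash, oldvalue)
        self._storage[fullkey] = (newhash, newvalue)
        return newvalue


loads = []  # sizes seen on real reloads, to make cache misses observable


def loadfile(path, oldhash, oldvalue):
    # Same shape as loadchangelog: the file size is the cheap "hash".
    # Like the test helper, this ignores corner cases such as a same-size
    # rewrite of the file.
    newhash = os.stat(path).st_size
    if newhash == oldhash:
        return oldhash, oldvalue
    loads.append(newhash)
    with open(path, 'rb') as f:
        return newhash, f.read()
```

A caller would build the cache with its own dict, for example
TinySmartCache('file\0', path, {'content': loadfile}, {}), and repeated
get('content') calls would only re-read the file after its size changes.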