Re: [PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-12 Thread Yuya Nishihara
On Sat, 11 Mar 2017 17:21:41 -0800, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-03-11 15:50:43 -0800:
> > On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> > > Unlike the current "start new server for new extensions" behavior, the new
> > > design allows us to serve the request using the old server sub-optimally,
> > > and start a more efficient server in the background to replace the current
> > > one soon.
> > > 
> > > An fb-only example is that remotefilelog uses repo-independent packfiles.
> > > 
> > > I think the cache state is conceptually attached to the current process,
> > > instead of repo. Thus a global may actually be a better fit.
> > 
> > Yeah, it will live longer than a repo, and will reside at the dispatcher level,
> > which I don't think means the data should be globally accessible by anyone.
> 
> It sounds like it could be done by splitting the "global" chgcache state
> across different modules: chgserver, dispatch, ...
> 
> In that case, adding a new parameter to "smartcache" to decouple it from
> chgcache._cache seems to address the issue.
> 
> It sounds possible, but I need more time to make sure it works.

Maybe something like that. Since most data to be cached would be repo-dependent,
we don't need a global namespace for cache content. We can just attach repocache
per repo and its content can be managed by the server.

  cache = {repo.root: repostorage}  # managed by chg server
  repo.repocache = cache[repo.root]  # maybe by chg ui/req hook?
  changelog.repocache = repo.repocache  # passed by repo

For remotefilelog, which prefers a wider cache, maybe it can manage its own cache
storage? I haven't considered how it should be invalidated, though.
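
To make the three pseudocode lines above concrete, here is a minimal sketch of
server-managed, per-repo cache storage. It is only an illustration under
assumptions: FakeRepo, CacheServer, attach and invalidate are hypothetical
names, not existing Mercurial APIs, and the only repo attribute used is root.

```python
class FakeRepo:
    """Hypothetical stand-in for a repo object; only `root` is used."""

    def __init__(self, root):
        self.root = root
        self.repocache = None


class CacheServer:
    """Hypothetical server-side owner of all per-repo cache storage."""

    def __init__(self):
        self._storages = {}  # repo.root -> dict of cached objects

    def attach(self, repo):
        # Reuse the storage for this root if the server has seen it before.
        repo.repocache = self._storages.setdefault(repo.root, {})
        return repo.repocache

    def invalidate(self, root):
        # Drop everything cached for one repo without touching the others.
        self._storages.pop(root, None)


server = CacheServer()
first = FakeRepo('/tmp/repo1')
server.attach(first)
first.repocache['changelog'] = 'parsed changelog placeholder'

# A later request for the same root sees the same storage object.
second = FakeRepo('/tmp/repo1')
server.attach(second)
```

With this layout nothing is reachable through a module-level global: code that
holds a repo can reach its own storage, and only the server can reach all of
them.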
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-11 Thread Jun Wu
Excerpts from Yuya Nishihara's message of 2017-03-11 15:50:43 -0800:
> On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> > > On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > > > # HG changeset patch
> > > > # User Jun Wu 
> > > > # Date 1488953311 28800
> > > > #  Tue Mar 07 22:08:31 2017 -0800
> > > > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > > > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > > #  hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > > > chgcache: implement repocache
> > > 
> > > > +repoloadfunctable = {'changelog': loadchangelog}
> > > > +
> > > > +ui = uimod.ui()
> > > > +ui.setconfig('ui', 'allowemptycommit', '1')
> > > > +
> > > > +repo = localrepo.localrepository(
> > > > +    ui,
> > > > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > > > +    create=True)
> > > > +repocache = chgcache.repocache(repo, repoloadfunctable)
> > > 
> > > Suppose a cache object is attached to a repo by chgserver, do we really
> > > need a global _cache storage?
> > 
> > One design goal is to decouple low-level API from repo. i.e. chgcache could
> > be used to cache non-repo objects.
> > 
> > Extensions are an example of non-repo objects that could benefit from the
> > design. Workers could send new extensions to the master, and the master
> > could have better "extension cache hit rate" after reloading extensions.
> 
> I don't get how that would work. Anyway, the extension would register something
> in the master process to load into cache. Then, it would be able to access
> the same cache object without using a global _cache dict.

chgserver (or dispatch, or extensions, or someone else) will have a cache
like:

  _loadedextension = {hash: pymod}

I think chg server can just call extensions._importext to load an extension,
assuming it is side-effect free. Maybe some Python behavior makes that
impossible. I'll investigate.

> 
> > Unlike the current "start new server for new extensions" behavior, the new
> > design allows us to serve the request using the old server sub-optimally,
> > and start a more efficient server in the background to replace the current
> > one soon.
> > 
> > An fb-only example is that remotefilelog uses repo-independent packfiles.
> > 
> > I think the cache state is conceptually attached to the current process,
> > instead of repo. Thus a global may actually be a better fit.
> 
> Yeah, it will live longer than a repo, and will reside at the dispatcher level,
> which I don't think means the data should be globally accessible by anyone.

It sounds like it could be done by splitting the "global" chgcache state
across different modules: chgserver, dispatch, ...

In that case, adding a new parameter to "smartcache" to decouple it from
chgcache._cache seems to address the issue.

It sounds possible, but I need more time to make sure it works.
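
A minimal sketch of what such a parameter might look like, assuming the loader
protocol from the patch (loadfunc(arg, oldhash, oldvalue) returns
(newhash, newvalue)). The storage argument and the attribute names are
hypothetical, and this is not the actual smartcache implementation.

```python
class smartcache:
    """Sketch of a hash-validated cache whose backing dict is injected."""

    def __init__(self, keyprefix, loadfuncarg, loadfunctable, storage):
        self._keyprefix = keyprefix
        self._arg = loadfuncarg
        self._loadfunctable = loadfunctable
        self._storage = storage  # hypothetical parameter, owned by the caller

    def get(self, key):
        fullkey = self._keyprefix + key
        oldhash, oldvalue = self._storage.get(fullkey, (None, None))
        loadfunc = self._loadfunctable[key]
        # The loader decides whether the old value is still valid.
        newhash, newvalue = loadfunc(self._arg, oldhash, oldvalue)
        self._storage[fullkey] = (newhash, newvalue)
        return newvalue


def loadcounter(arg, oldhash, oldvalue):
    # Toy loader: the "hash" is a version number kept in a dict.
    newhash = arg['version']
    if newhash == oldhash:
        return oldhash, oldvalue
    return newhash, 'value for version %d' % newhash


serverstorage = {}  # e.g. owned by chgserver or dispatch, not by chgcache
source = {'version': 1}
cache = smartcache('demo\0', source, {'counter': loadcounter}, serverstorage)
```

Whoever creates the dict decides how long it lives and who can see it, so the
cache class itself no longer needs a module-level _cache.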

>
> > We want the
> > behavior that "fork()" copies the state, and don't care if fork() should
> > copy the repo object or not.
> 
> IMHO, fork() is just for parallelism (given we can eventually omit uisetup()
> in the master process.)


Re: [PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-11 Thread Yuya Nishihara
On Sat, 11 Mar 2017 14:54:54 -0800, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> > On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > > # HG changeset patch
> > > # User Jun Wu 
> > > # Date 1488953311 28800
> > > #  Tue Mar 07 22:08:31 2017 -0800
> > > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > > # Available At https://bitbucket.org/quark-zju/hg-draft
> > > #  hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > > chgcache: implement repocache
> > 
> > > +repoloadfunctable = {'changelog': loadchangelog}
> > > +
> > > +ui = uimod.ui()
> > > +ui.setconfig('ui', 'allowemptycommit', '1')
> > > +
> > > +repo = localrepo.localrepository(
> > > +    ui,
> > > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > > +    create=True)
> > > +repocache = chgcache.repocache(repo, repoloadfunctable)
> > 
> > Suppose a cache object is attached to a repo by chgserver, do we really
> > need a global _cache storage?
> 
> One design goal is to decouple low-level API from repo. i.e. chgcache could
> be used to cache non-repo objects.
> 
> Extensions are an example of non-repo objects that could benefit from the
> design. Workers could send new extensions to the master, and the master
> could have better "extension cache hit rate" after reloading extensions.

I don't get how that would work. Anyway, the extension would register something
in the master process to load into cache. Then, it would be able to access
the same cache object without using a global _cache dict.

> Unlike the current "start new server for new extensions" behavior, the new
> design allows us to serve the request using the old server sub-optimally,
> and start a more efficient server in the background to replace the current
> one soon.
> 
> An fb-only example is that remotefilelog uses repo-independent packfiles.
> 
> I think the cache state is conceptually attached to the current process,
> instead of repo. Thus a global may actually be a better fit.

Yeah, it will live longer than a repo, and will reside at the dispatcher level,
which I don't think means the data should be globally accessible by anyone.

> We want the
> behavior that "fork()" copies the state, and don't care if fork() should
> copy the repo object or not.

IMHO, fork() is just for parallelism (given we can eventually omit uisetup()
in the master process.)


Re: [PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-11 Thread Jun Wu
Excerpts from Yuya Nishihara's message of 2017-03-11 12:54:17 -0800:
> On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> > # HG changeset patch
> > # User Jun Wu 
> > # Date 1488953311 28800
> > #  Tue Mar 07 22:08:31 2017 -0800
> > # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> > # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> > # Available At https://bitbucket.org/quark-zju/hg-draft
> > #  hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> > chgcache: implement repocache
> 
> > +repoloadfunctable = {'changelog': loadchangelog}
> > +
> > +ui = uimod.ui()
> > +ui.setconfig('ui', 'allowemptycommit', '1')
> > +
> > +repo = localrepo.localrepository(
> > +    ui,
> > +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> > +    create=True)
> > +repocache = chgcache.repocache(repo, repoloadfunctable)
> 
> Suppose a cache object is attached to a repo by chgserver, do we really
> need a global _cache storage?

One design goal is to decouple low-level API from repo. i.e. chgcache could
be used to cache non-repo objects.

Extensions are an example of non-repo objects that could benefit from the
design. Workers could send new extensions to the master, and the master
could have better "extension cache hit rate" after reloading extensions.
Unlike the current "start new server for new extensions" behavior, the new
design allows us to serve the request using the old server sub-optimally,
and start a more efficient server in the background to replace the current
one soon.

An fb-only example is that remotefilelog uses repo-independent packfiles.

I think the cache state is conceptually attached to the current process,
instead of repo. Thus a global may actually be a better fit. We want the
behavior that "fork()" copies the state, and don't care if fork() should
copy the repo object or not.


Re: [PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-11 Thread Yuya Nishihara
On Tue, 7 Mar 2017 22:35:59 -0800, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu 
> # Date 1488953311 28800
> #  Tue Mar 07 22:08:31 2017 -0800
> # Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
> # Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #  hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
> chgcache: implement repocache

> +repoloadfunctable = {'changelog': loadchangelog}
> +
> +ui = uimod.ui()
> +ui.setconfig('ui', 'allowemptycommit', '1')
> +
> +repo = localrepo.localrepository(
> +    ui,
> +    os.path.join(os.environ['TESTTMP'], 'repo1'),
> +    create=True)
> +repocache = chgcache.repocache(repo, repoloadfunctable)

Suppose a cache object is attached to a repo by chgserver, do we really need
a global _cache storage?


[PATCH 3 of 3 V3] chgcache: implement repocache

2017-03-07 Thread Jun Wu
# HG changeset patch
# User Jun Wu 
# Date 1488953311 28800
#  Tue Mar 07 22:08:31 2017 -0800
# Node ID d136f214b3a5bd4698dfd96c641ad73f96a743cb
# Parent  f0bded8d53c5c9a5cfb25d29dd99cf4eb3fb79b2
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft -r d136f214b3a5
chgcache: implement repocache

The repocache is based on smartcache. It will be used widely because most
objects of interest to stateful chg are repo-related.

In the future, we may want to move part of localrepository to a thin class,
but for now we just use the repo object directly.

diff --git a/mercurial/chgcache.py b/mercurial/chgcache.py
--- a/mercurial/chgcache.py
+++ b/mercurial/chgcache.py
@@ -80,2 +80,7 @@ class smartcache(object):
 set(fullkey, (newhash, newvalue))
 return newvalue
+
+class repocache(smartcache):
+    def __init__(self, repo, loadfunctable):
+        keyprefix = 'repo\0%s\0' % repo.root
+        super(repocache, self).__init__(keyprefix, repo, loadfunctable)
diff --git a/tests/test-chgcache.py b/tests/test-chgcache.py
--- a/tests/test-chgcache.py
+++ b/tests/test-chgcache.py
@@ -4,6 +4,9 @@ import os
 
 from mercurial import (
+    changelog,
     chgcache,
+    localrepo,
     scmutil,
+    ui as uimod,
 )
 
@@ -55,2 +58,35 @@ printcache() # None, will invalidate the
 vfs.write(filename, 'ef')
 printcache() # cache miss, 'ef'
+
+def loadchangelog(repo, oldhash, oldvalue):
+    # NOTE: This does not take care of corner cases. See "readfoo".
+    newhash = repo.svfs.stat('00changelog.i').st_size
+    if newhash == oldhash:
+        print('changelog cache hit')
+        return oldhash, oldvalue
+    else:
+        print('changelog cache miss')
+        newvalue = changelog.changelog(repo.svfs)
+        return newhash, newvalue
+
+repoloadfunctable = {'changelog': loadchangelog}
+
+ui = uimod.ui()
+ui.setconfig('ui', 'allowemptycommit', '1')
+
+repo = localrepo.localrepository(
+    ui,
+    os.path.join(os.environ['TESTTMP'], 'repo1'),
+    create=True)
+repocache = chgcache.repocache(repo, repoloadfunctable)
+
+def printrepocache():
+    print('changelog has %d revisions' % len(repocache.get('changelog')))
+
+repo.commit('foo')
+printrepocache()
+printrepocache()
+
+repo.commit('bar')
+printrepocache()
+printrepocache()
diff --git a/tests/test-chgcache.py.out b/tests/test-chgcache.py.out
--- a/tests/test-chgcache.py.out
+++ b/tests/test-chgcache.py.out
@@ -11,2 +11,10 @@ cache["foo"] = None
 cache miss
 cache["foo"] = 'ef'
+changelog cache miss
+changelog has 1 revisions
+changelog cache hit
+changelog has 1 revisions
+changelog cache miss
+changelog has 2 revisions
+changelog cache hit
+changelog has 2 revisions
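
The NOTE in loadchangelog says that using the file size as the hash does not
cover corner cases. A small standalone sketch of one such case follows, using
an ordinary temporary file rather than a real changelog; loadfile and writefile
are hypothetical helpers that only mirror the loader signature used in the
test.

```python
import os
import tempfile


def writefile(path, data):
    with open(path, 'w') as f:
        f.write(data)


def loadfile(path, oldhash, oldvalue):
    # Same shape as loadchangelog: the file size is the validation hash.
    newhash = os.stat(path).st_size
    if newhash == oldhash:
        return oldhash, oldvalue  # treated as a cache hit
    with open(path) as f:
        return newhash, f.read()


path = os.path.join(tempfile.mkdtemp(), 'data')

writefile(path, 'ab')
state = loadfile(path, None, None)  # miss: content is read from disk

writefile(path, 'cd')               # same size, different content
stale = loadfile(path, *state)      # hit: the old content is returned

writefile(path, 'efg')              # the size changes
fresh = loadfile(path, *stale)      # miss: content is read again
```

A same-size rewrite goes unnoticed here. An append-only file is less exposed
to this than an arbitrary one, but the size check alone cannot detect content
that is replaced without changing the length.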