[RFC/PATCH v2] patch-ids: cache patch IDs in a notes tree

2013-05-12 Thread John Keeping
This adds a new configuration variable patchid.cacheRef which controls
whether (and where) patch IDs will be cached in a notes tree.  The cache
includes a hash of the diff options in place when the cache was
generated as well as the version of Git that generated it.

When the diff options hash changes we simply ignore that cache entry and
replace it with the new patch ID with the new diff options.  But if the
Git version changes we throw away the entire cache tree on the
assumption that the user is unlikely to be simply flipping between
different versions of Git.

Caching patch IDs in this way results in a performance improvement in
every case I have tried, for example if I run a script which checks
whether any of six local branches have been picked up upstream I get the
following results:

Without patchid.cacheRef enabled:

$ time git integration --status jk/pu /dev/null

real0m4.295s
user0m3.927s
sys 0m0.270s

With patchid.cacheRef set but not yet initialised:

$ time git integration --status jk/pu /dev/null

real0m2.317s
user0m2.036s
sys 0m0.187s

With patchid.cacheRef and cache primed:

$ time git integration --status jk/pu /dev/null

real0m1.565s
user0m1.310s
sys 0m0.153s

The script basically does:

git log --cherry pu...branch

for each of six branches in turn (with some additional commands around
that which are now the bottleneck).

Signed-off-by: John Keeping j...@keeping.me.uk
---
On Sun, May 12, 2013 at 10:08:51AM +0100, John Keeping wrote:
 On Sat, May 11, 2013 at 06:57:02PM -0500, Johannes Schindelin wrote:
  On Sat, 11 May 2013, Linus Torvalds wrote:
  
   [...] I really think caching patch ID's should be something people
   should be aware of is fundamentally wrong, even if it might work.
  
  Given the incredible performance win, I would say it is still worth
  looking into.
  
  If you store also a hash of Git version and diff options (may even be the
  hash of the raw bytes of the diff options if you do not plan to share the
  ref between machines) with the patch ID, you can make it safe.
  
  That hash would be generated at patch_id init time and
  load_cached_patch_id() would check this hash in addition to the return
  value of get_sha1() (and ignore the note if the version/diff options
  differ).
 
 I was thinking about this overnight, glad to see someone else had the
 same idea :-)
 
 It's slightly annoying because the diff options can be customized after
 we return from init_patch_ids() so we either need a new
 setup_patch_ids() function to be run after init once diff options have
 been set or to set it lazily.  I'll try introducing a setup function.

This is what that looks like.  I think the way I'm hashing the diff
options is sensible but another pair of eyes would be useful there.

A cache entry looks like this:

9e99e9dbf5c6a717ac60f7ee425c53e87ffd821a
diffopts:f8ca35c3d9d57076338dff8abf91b07157bb6329
1.8.3.rc1.45.gcb72da6.dirty

 Documentation/config.txt |   7 ++
 builtin/log.c|   1 +
 patch-ids.c  | 215 ++-
 patch-ids.h  |   4 +
 revision.c   |   1 +
 t/t6007-rev-list-cherry-pick-file.sh |  16 +++
 6 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6e53fc5..e36585c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1798,6 +1798,13 @@ pager.cmd::
precedence over this option.  To disable pagination for all
commands, set `core.pager` or `GIT_PAGER` to `cat`.
 
+patchid.cacheRef::
+   The name of a notes ref in which to store patch IDs for commits.
+   The ref is taken to be in `refs/notes/` if it is not qualified.
++
+Note that the patch ID notes are never pruned automatically, so you may
+wish to periodically run `git notes --ref ref prune` against this ref.
+
 pretty.name::
Alias for a --pretty= format string, as specified in
linkgit:git-log[1]. Any aliases defined here can be used just
diff --git a/builtin/log.c b/builtin/log.c
index 6e56a50..dd6c24d 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -758,6 +758,7 @@ static void get_patch_ids(struct rev_info *rev, struct 
patch_ids *ids)
die(_(Not a range.));
 
init_patch_ids(ids);
+   setup_patch_ids(ids);
 
/* given a range a..b get all patch ids for b..a */
init_revisions(check_rev, rev-prefix);
diff --git a/patch-ids.c b/patch-ids.c
index bc8a28f..73b2aaf 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -1,8 +1,12 @@
 #include cache.h
+#include blob.h
 #include diff.h
 #include commit.h
+#include notes.h
+#include refs.h
 #include sha1-lookup.h
 #include patch-ids.h
+#include version.h
 
 static int commit_patch_id(struct commit *commit, struct 

Re: [RFC/PATCH v2] patch-ids: cache patch IDs in a notes tree

2013-05-12 Thread John Keeping
On Sun, May 12, 2013 at 12:41:31PM +0100, John Keeping wrote:
 diff --git a/t/t6007-rev-list-cherry-pick-file.sh 
 b/t/t6007-rev-list-cherry-pick-file.sh
 index 28d4f6b..378cf3e 100755
 --- a/t/t6007-rev-list-cherry-pick-file.sh
 +++ b/t/t6007-rev-list-cherry-pick-file.sh
 @@ -207,4 +207,20 @@ test_expect_success '--count --left-right' '
   test_cmp expect actual
  '
  
 +cat expect EOF
 +32   0
 +EOF
 +
 +test_expect_success 'rev-list --cherry-mark caches patch ids' '
 + test_config patchid.cacheref patchids 
 + git rev-list --cherry-mark --left-right --count F...E actual 
 + test_cmp expect actual 
 + git notes --ref patchids show master cached_master 

I forgot to update this test, it needs a | sed -e 1q in the middle of
this line to make sure that we're only checking the patch ID and not the
diffopts hash and Git version.

 + git log -p -1 master | git patch-id | sed -e s/ .*// patch-id_master 
 
 + test_cmp patch-id_master cached_master 
 + # Get the patch IDs again, now they should be cached
 + git rev-list --cherry-mark --left-right --count F...E actual 
 + test_cmp expect actual
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html