On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:
> >>> What's the current state of the index before this checkout?
> >>
> >> This was after running "git checkout" multiple times so there was really
> >> nothing for git to do.
> > 
> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> > 
> 
> I'm not at all familiar with the cache-tree and couldn't find any 
> documentation on it other than index-format.txt which says "it helps 
> speed up tree object generation for a new commit."

I guess you have the starting points you need after Jeff's and Junio's
explanation (and it would be great if cache-tree could actually be for
for this two-way merge). But to make it easier for new people in
future, maybe we should add this?

This is basically a ripoff of Junio's explanation with starting points
(write-tree and index-format.txt). I wanted to incorporate some pieces
from Jeff's too but I think Junio's already covered it well.

-- 8< --
Subject: [PATCH] cache-tree.h: more description of what it is and what's it 
used for

Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
---
 cache-tree.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/cache-tree.h b/cache-tree.h
index cfd5328cc9..d25a800a72 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -5,6 +5,35 @@
 #include "tree.h"
 #include "tree-walk.h"
 
+/*
+ * cache-tree is an index extension that records tree object names for
+ * subdirectories you see in the index. It is mainly used for
+ * generating trees from the index before you create a new commit (see
+ * builtin/write-tree.c as starting point) but it's also used in "git
+ * diff-index --cached $TREE" as an optimization. See index-format.txt
+ * for on-disk format.
+ *
+ * Every time you write the contents of the index as a tree object, we
+ * need to collect the object name for each top-level paths and write
+ * a new top-level tree object out, after doing the same recursively
+ * for any modified subdirectory. Whenever you add, remove or modify a
+ * path in the index, the cache-tree entry for enclosing directories
+ * are invalidated, so a cache-tree entry that is still valid means
+ * that all the paths in the index under that directory match the
+ * contents of the tree object that the cache-tree entry holds.
+ *
+ * And that property is used by "diff-index --cached $TREE" that is
+ * run internally.  When we find that the subdirectory "D"'s
+ * cache-tree entry is valid in the index, and the tree object
+ * recorded in the cache-tree for that subdirectory matches the
+ * subtree D in the tree object $TREE, then "diff-index --cached"
+ * ignores the entire subdirectory D (which saves relatively little in
+ * the index as it only needs to scan what is already in the memory
+ * forward, but on the $TREE traversal side, it does not have to even
+ * open a subtree, that can save a lot), and with a well-populated
+ * cache-tree, it can save a significant processing.
+ */
+
 struct cache_tree;
 struct cache_tree_sub {
        struct cache_tree *cache_tree;
-- 
2.18.0.656.gda699b98b3

-- 8< --

Reply via email to