Re: [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree

2018-08-15 Thread Duy Nguyen
On Mon, Aug 13, 2018 at 8:58 PM Ben Peart  wrote:
> > +  *
> > +  * D/F conflicts and higher stage entries are not a concern
> > +  * because cache-tree would be invalidated and we would never
> > +  * get here in the first place.
> > +  */
> > + for (i = 0; i < nr_entries; i++) {
> > + struct cache_entry *tree_ce;
> > + int len, rc;
> > +
> > + src[0] = o->src_index->cache[pos + i];
> > +
> > + len = ce_namelen(src[0]);
> > + tree_ce = xcalloc(1, cache_entry_size(len));
> > +
> > + tree_ce->ce_mode = src[0]->ce_mode;
> > + tree_ce->ce_flags = create_ce_flags(0);
> > + tree_ce->ce_namelen = len;
> > + oidcpy(_ce->oid, [0]->oid);
> > + memcpy(tree_ce->name, src[0]->name, len + 1);
> > +
> > + for (d = 1; d <= nr_names; d++)
> > + src[d] = tree_ce;
> > +
> > + rc = call_unpack_fn((const struct cache_entry * const *)src, 
> > o);
>
> I don't fully understand why this is still necessary since "we detect
> that all trees are the same as cache-tree at this path."  I do know
> (because I tried it :)) that if we don't actually call the unpack
> function the patch fails a bunch of tests so clearly something important
> is being missed.

Yeah because removing this line assumes n-way logic, which most likely
means "use the index version if all trees are the same as the index"
but it's not necessarily true. There could be flags that make n-way
behave differently. And even if we make that assumption, we need to
copy src[0] to o->result (heh I tried that "skip call_unpack_fn" thing
too when I thought this would be the same as the diff-index --cached
optimization path, and only realized copying to o->result was needed
afterwards).
-- 
Duy


Re: [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree

2018-08-13 Thread Ben Peart




On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:

In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

 baseline  new
   
 0.056651714   0.080394752 s:  read cache .git/index
 0.183101080   0.216010838 s:  preload index
 0.008584433   0.008534301 s:  refresh index
 0.633767589   0.251992198 s:   traverse_trees
 0.340265448   0.377031383 s:   check_updates
 0.381884638   0.372768105 s:   cache_tree_update
 1.401562947   1.045887251 s:  unpack_trees
 0.338687914   0.314983512 s:  write index, changed mask = 2e
 0.411927922   0.062572653 s:traverse_trees
 0.23335   0.22544 s:check_updates
 0.423697246   0.073795585 s:   unpack_trees
 0.423708360   0.073807557 s:  diff-index
 2.559524127   1.938191592 s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

 baselinenew
   --
 0.535510167 0.556558733 s: read cache .git/index
 0.3057373   0.3147105   s: initialize name hash
 0.0184082   0.023558433 s: preload index
 0.086910967 0.089085967 s: refresh index
 7.889590767 2.191554433 s: unpack trees
 0.120760833 0.131941267 s: update worktree after a merge
 2.2583504   2.572663167 s: repair cache-tree
 0.8916137   0.959495233 s: write index, changed mask = 28
 3.405199233 0.2710663   s: unpack trees
 0.000999667 0.0021554   s: update worktree after a merge
 3.4063306   0.273318333 s: diff-index
 16.9524923  9.462943133 s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  unpack-trees.c | 127 +
  1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index b237eaa0f2..07456d0fb2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, 
struct name_entry *nam
return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
  }
  
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,

+   struct name_entry *names,
+   struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int i;
+
+   if (!o->merge || dirmask != ((1 << n) - 1))
+   return 0;
+
+   for (i = 1; i < n; i++)
+   if (!are_same_oid(names, names + i))
+   return 0;
+
+   return cache_tree_matches_traversal(o->src_index->cache_tree, names, 
info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+ struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int len = traverse_path_len(info, names);
+   char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+   int pos;
+
+   

[PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree

2018-08-12 Thread Nguyễn Thái Ngọc Duy
In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

baseline  new
  
0.056651714   0.080394752 s:  read cache .git/index
0.183101080   0.216010838 s:  preload index
0.008584433   0.008534301 s:  refresh index
0.633767589   0.251992198 s:   traverse_trees
0.340265448   0.377031383 s:   check_updates
0.381884638   0.372768105 s:   cache_tree_update
1.401562947   1.045887251 s:  unpack_trees
0.338687914   0.314983512 s:  write index, changed mask = 2e
0.411927922   0.062572653 s:traverse_trees
0.23335   0.22544 s:check_updates
0.423697246   0.073795585 s:   unpack_trees
0.423708360   0.073807557 s:  diff-index
2.559524127   1.938191592 s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

baselinenew
  --
0.535510167 0.556558733 s: read cache .git/index
0.3057373   0.3147105   s: initialize name hash
0.0184082   0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504   2.572663167 s: repair cache-tree
0.8916137   0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663   s: unpack trees
0.000999667 0.0021554   s: update worktree after a merge
3.4063306   0.273318333 s: diff-index
16.9524923  9.462943133 s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 unpack-trees.c | 127 +
 1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index b237eaa0f2..07456d0fb2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, 
struct name_entry *nam
return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+   struct name_entry *names,
+   struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int i;
+
+   if (!o->merge || dirmask != ((1 << n) - 1))
+   return 0;
+
+   for (i = 1; i < n; i++)
+   if (!are_same_oid(names, names + i))
+   return 0;
+
+   return cache_tree_matches_traversal(o->src_index->cache_tree, names, 
info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+ struct traverse_info *info)
+{
+   struct unpack_trees_options *o = info->data;
+   int len = traverse_path_len(info, names);
+   char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+   int pos;
+
+   make_traverse_path(name, info, names);
+   name[len++] = '/';
+   name[len] =