Hi Duy,

On Sun, 29 Apr 2018, Duy Nguyen wrote:

> On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren <new...@gmail.com> wrote:
> > Currently, all callers of unpack_trees() set o->src_index == o->dst_index.
> > The code in unpack_trees() does not correctly handle them being different.
> > There are two separate issues:
> >
> > First, there is the possibility of memory corruption.  Since
> > unpack_trees() creates a temporary index in o->result and then discards
> > o->dst_index and overwrites it with o->result, in the special case that
> > o->src_index == o->dst_index, it is safe to just reuse o->src_index's
> > split_index for o->result.  However, when src and dst are different,
> > reusing o->src_index's split_index for o->result will cause the
> > split_index to be shared.  If either index then has entries replaced or
> > removed, it will result in the other index referring to free()'d memory.
> >
> > Second, we can drop the index extensions.  Previously, we were moving
> > index extensions from o->dst_index to o->result.  Since o->src_index is
> > the one that will have the necessary extensions (o->dst_index is likely to
> > be a new index temporary index created to store the results), we should be
> > moving the index extensions from there.
> >
> > Signed-off-by: Elijah Newren <new...@gmail.com>
> > ---
> >
> > Differences from v2:
> >   - Don't NULLify src_index until we're done using it
> >   - Actually built and tested[1]
> >
> > But it now passes the testsuite on both linux and mac[2], and I even 
> > re-merged
> > all 53288 merge commits in linux.git (with a merge of this patch together 
> > with
> > the directory rename detection series) for good measure.  [Only 7 commits
> > showed a difference, all due to directory rename detection kicking in.]
> >
> > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of
> > parallelization are great until you realize that your new setup omitted a
> > critical step, leaving you running a slightly stale version of git 
> > instead...
> > :-(
> >
> > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both
> > with unicode normalization tests, but those two tests fail before my changes
> > too.  All the other tests pass.
> >
> >  unpack-trees.c | 19 +++++++++++++++----
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index e73745051e..49526d70aa 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> >         o->result.timestamp.sec = o->src_index->timestamp.sec;
> >         o->result.timestamp.nsec = o->src_index->timestamp.nsec;
> >         o->result.version = o->src_index->version;
> > -       o->result.split_index = o->src_index->split_index;
> > -       if (o->result.split_index)
> > +       if (!o->src_index->split_index) {
> > +               o->result.split_index = NULL;
> > +       } else if (o->src_index == o->dst_index) {
> > +               /*
> > +                * o->dst_index (and thus o->src_index) will be discarded
> > +                * and overwritten with o->result at the end of this 
> > function,
> > +                * so just use src_index's split_index to avoid having to
> > +                * create a new one.
> > +                */
> > +               o->result.split_index = o->src_index->split_index;
> >                 o->result.split_index->refcount++;
> > +       } else {
> > +               o->result.split_index = init_split_index(&o->result);
> > +       }
> >         hashcpy(o->result.sha1, o->src_index->sha1);
> >         o->merge_size = len;
> >         mark_all_ce_unused(o->src_index);
> > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> >                 }
> >         }
> >
> > -       o->src_index = NULL;
> >         ret = check_updates(o) ? (-2) : 0;
> >         if (o->dst_index) {
> >                 if (!ret) {
> > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> > struct unpack_trees_options
> >                                                   WRITE_TREE_SILENT |
> >                                                   WRITE_TREE_REPAIR);
> >                 }
> > -               move_index_extensions(&o->result, o->dst_index);
> > +               move_index_extensions(&o->result, o->src_index);
> 
> While this looks like the right thing to do on paper, I believe it's
> actually broken for a specific case of untracked cache. In short,
> please do not touch this line. I will send a patch to revert
> edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08),
> which essentially deletes this line, with proper explanation and
> perhaps a test if I could come up with one.
> 
> When we update the index, we depend on the fact that all updates must
> invalidate the right untracked cache correctly. In this unpack
> operations, we start copying entries over from src to result. Since
> 'result' (at least from the beginning) does not have an untracked
> cache, it has nothing to invalidate when we copy entries over. By the
> time we have done preparing 'result', what's recorded in src's (or
> dst's for that matter) untracked cache may or may not apply to
> 'result'  index anymore. This copying only leads to more problems when
> untracked cache is used.

Is there really no way to invalidate just individual entries?

I have a couple of worktrees which are *huge*. And edf3b90553 really
helped relieve the pain a bit when running `git status`. Now you say that
even a `git checkout -b new-branch` would blow the untracked cache away
again?

It would be *really* nice if we could prevent that performance regression
somehow.

Ciao,
Dscho

Reply via email to