On Wed, 27 Sep 2017 15:09:43 -0400
Jeff Hostetler <[email protected]> wrote:
> By adding it to the set of provisionally omitted objects, we
> have the option to capture a little extra information with it
> and refer to that the next time we see the object in the traversal.
> For example, in the sparse-checkout case, the first time we see the
> object we know the pathname and know that it does not need to be
> included. The second time we see that object, we can see if the
> new pathname is the same as the previous one with a simple strcmp
> and avoid the expensive is_excluded_from_list() computation. Keep
> in mind that rev-list or pack-objects could be called on something
> like HEAD~100000..HEAD or that there may be 50,000 tips. So a file
> that doesn't change across that range will be visited many times
> with the same {pathname, sha}.
Ah, capturing the extra information makes sense. I missed that detail.
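To check my understanding, here is a rough sketch of that caching. All
names below (omitted_entry, should_omit, expensive_is_excluded) are
hypothetical and are not taken from the patch; the toy exclusion rule
only stands in for is_excluded_from_list().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in the provisionally omitted set. */
struct omitted_entry {
	const char *oid_hex;   /* object name, kept as a hex string for simplicity */
	const char *last_path; /* pathname recorded when the object was last omitted */
};

/* Counts calls to the simulated expensive exclusion check. */
static int expensive_calls;

/*
 * Toy replacement for is_excluded_from_list() (an assumption, not the
 * real logic): everything outside "src/" is excluded.
 */
static int expensive_is_excluded(const char *path)
{
	expensive_calls++;
	return strncmp(path, "src/", 4) != 0;
}

/*
 * Return 1 if the object should be omitted at `path`, 0 otherwise.
 * If the entry already records the same pathname, the earlier decision
 * is reused and the expensive check is skipped. The caller must keep
 * `path` alive for as long as the entry refers to it.
 */
static int should_omit(struct omitted_entry *entry, const char *path)
{
	assert(entry && path);
	if (entry->last_path && !strcmp(entry->last_path, path))
		return 1; /* same {pathname, sha} as before: still omitted */
	if (expensive_is_excluded(path)) {
		entry->last_path = path; /* remember for the next visit */
		return 1;
	}
	entry->last_path = NULL; /* needed at this path: no longer omitted */
	return 0;
}
```

With 50,000 tips and an unchanged file, only the first visit would pay
for the exclusion check; every later visit costs one strcmp.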
> Right now I want to force the tree to be shown the first time it is
> visited (because I don't want to do tree filtering yet). I don't mark
> it SEEN yet because we may want to revisit blobs within (say, after a
> folder rename like I described previously).
>
> I do, however, mark the tree object as SEEN (in the _END event) when I
> can verify that I've included ALL of the children.
This optimization makes sense too.
> So it might be possible that I could change the flags and not use
> FILTER_REVISIT on tree objects, but I hesitate to do that right now.
You're probably right that we need some sort of flag on tree objects,
and FILTER_REVISIT can do the job. (The SHOWN flag I suggested plays a
similar role anyway.)
> Having the FILTER_REVISIT flag on blob objects means I can avoid
> doing a hash/oidset lookup on subsequent visits.
By the hash/oidset lookup, I presume you mean the lookup on the set of
provisionally omitted objects? If yes, this makes sense.
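If I am reading this right, the flag acts as a cheap membership hint.
The sketch below is only my interpretation: toy_object, the linear
"set", and the bit value chosen for FILTER_REVISIT are all assumptions
made for illustration, not the patch's definitions.

```c
#include <assert.h>
#include <string.h>

#define FILTER_REVISIT (1u << 0) /* bit position is an assumption */

/* Hypothetical, simplified object with a flags word. */
struct toy_object {
	const char *oid_hex;
	unsigned flags;
};

/* Toy set of provisionally omitted objects (linear scan, fixed size). */
#define MAX_OMITTED 16
static const char *omitted[MAX_OMITTED];
static int nr_omitted;
static int set_lookups; /* counts how often the set is consulted */

static int omitted_contains(const char *oid_hex)
{
	set_lookups++;
	for (int i = 0; i < nr_omitted; i++)
		if (!strcmp(omitted[i], oid_hex))
			return 1;
	return 0;
}

/* Record the object in the set and flag it for a later revisit. */
static void omit_provisionally(struct toy_object *obj)
{
	assert(nr_omitted < MAX_OMITTED);
	omitted[nr_omitted++] = obj->oid_hex;
	obj->flags |= FILTER_REVISIT;
}

/* Return 1 if the object is in the provisionally omitted set. */
static int is_provisionally_omitted(const struct toy_object *obj)
{
	if (!(obj->flags & FILTER_REVISIT))
		return 0; /* flag clear: answer without touching the set */
	return omitted_contains(obj->oid_hex);
}
```

An object that was never omitted is answered from the flag bit alone;
only flagged objects reach the set.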
Thanks for your clarifications - I'll take another look at the code
here.