On Wed, 27 Sep 2017 13:04:42 -0400
Jeff Hostetler <[email protected]> wrote:
> The sparse filter is looking at pathnames and using the same rules
> as sparse-checkout to decide which to *include* in the result. This
> is essentially backwards from the other filters which are looking for
> reasons to *exclude* a blob. If I see a {pathname, sha} pair and the
> pathname is not wanted (by the sparse-checkout rules), I still don't
> know anything about the object -- since the same SHA may appear later
> in the treewalk but with a different pathname that *does* match the
> patterns and *is* wanted.
>
> The net-net is that I have to mark the blob as "provisionally omitted"
> until I've seen all the pathnames associated with it. Only then can I
> say it should be omitted.
How is this different from refraining from marking the blob as
LOFR_MARK_SEEN? When you would provisionally omit the blob, return
LOFR_ZERO so that a future iteration will revisit the blob again, and
when you would include it in the output, return
LOFR_MARK_SEEN|LOFR_SHOW.
> Likewise, there are things about the tree object that we cannot
> decide until we've seen all possible directory paths that reference it.
> For example, if you rename a tree/directory between 2 commits, but make no
> other changes within the directory, it will/should have the same SHA in the
> second commit. If one of those paths is included in the sparse-checkout
> and one is not, then you need include those blobs (and the tree object)
> in the result. If the treewalk visits the excluded case first, you don't
> want to discard the tree object (and shortcut future treewalks) because
> the filter won't get a chance to see the included directory path case.
For trees, I guess it's slightly different in that you do need an extra
flag to keep track of whether the tree has been shown. So mark SHOWN and
return LOFR_SHOW on the first time the tree is shown, and LOFR_ZERO
otherwise. And trees must never be marked as LOFR_MARK_SEEN.
(This SHOWN flag might play a similar role to your FILTER_REVISIT.)
Until now, it seems to me that the _END event is not required.
> Also, the current code does not attempt to omit tree objects, but a
> future version may. And having the _BEGIN_ and _END_ events means the
> filter can keep track of the nesting without the expense of having to
> discover it by parsing the pathname looking for slashes as we do elsewhere.
A feature that omits tree objects would need _END, true. But until
then, I personally don't think we should add such infrastructure until
we have a feature that needs it.