Re: A two-part vision for Subversion and large binary objects.

Evgeny Kotkov Thu, 27 Jan 2022 14:59:07 -0800

Julian Foad <julianf...@apache.org> writes:

> Scanning with 'stat'
>
> I'm concerned about the implementation scanning the whole subtree,
> calling 'stat' on every file to determine whether the file is "changed"
> (locally modified). This is done in svn_wc__textbase_sync() with its
> textbase_walk_cb().
>
> It does this scan on every sync, which is twice on every syncing
> operation such as diff.
>
> Don't we already have an optimised scan for local modifications
> implemented in the "status" code? Could we re-use this?


In a few of my experiments, the performance of textbase_sync() was roughly
comparable to that of a status walk.  So it may not be worth spending time
on optimizing this part, at least for now.

Also, I tend to think that DRY doesn't really apply here, because a status
walk and a textbase sync are essentially different operations that just
happen to have something in common internally.  For example, a textbase
sync doesn't have to follow the tree structure and can be implemented
with an arbitrarily ordered walk over NODES.
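
As a rough illustration of that last point, here is a toy Python model.  It
is not Subversion code: the Node fields, the "modified" and "hydrated" flags
and the rule in needs_fetch() are all assumptions made up for this sketch.
It only shows that a flat walk over the rows, in any order, selects the same
files as a walk that descends the tree.

```python
from typing import NamedTuple


class Node(NamedTuple):
    # Hypothetical stand-in for a row in NODES; not the real schema.
    relpath: str     # path relative to the working copy root
    kind: str        # "dir" or "file"
    modified: bool   # assumed flag: working file differs from its pristine
    hydrated: bool   # assumed flag: pristine contents are present locally


def needs_fetch(node: Node) -> bool:
    # Assumed rule for this sketch: fetch a pristine only for a
    # modified file whose pristine is missing.
    return node.kind == "file" and node.modified and not node.hydrated


def flat_walk(nodes):
    """Visit the rows in whatever order they arrive."""
    return {n.relpath for n in nodes if needs_fetch(n)}


def tree_walk(nodes):
    """Visit the same rows by descending directory by directory."""
    children = {}
    for n in nodes:
        if n.relpath:  # the root ("") has no parent
            parent = n.relpath.rpartition("/")[0]
            children.setdefault(parent, []).append(n)

    found = set()

    def descend(dirpath):
        for child in sorted(children.get(dirpath, [])):
            if child.kind == "dir":
                descend(child.relpath)
            elif needs_fetch(child):
                found.add(child.relpath)

    descend("")
    return found


NODES = [
    Node("", "dir", False, True),
    Node("a", "dir", False, True),
    Node("a/big.bin", "file", True, False),
    Node("a/small.txt", "file", False, False),
    Node("b.txt", "file", True, True),
    Node("c", "dir", False, True),
    Node("c/d.bin", "file", True, False),
]

by_flat = flat_walk(NODES)
by_reversed = flat_walk(reversed(NODES))
by_tree = tree_walk(NODES)
```

All three results contain the same two paths, so the order of the walk does
not matter for deciding what to fetch.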

> Premature Hydrating
>
> The present implementation "hydrates" (fetches missing pristines) every
> file within the whole subtree the operation targets. This is done by
> every major client operation calling svn_client__textbase_sync() before
> and afterwards.
>
> That is pessimistic: the operation may not actually touch all these
> files if limited in any way such as by
>
>   - depth filtering
>   - other filtering (changelist, properties-only, ...)
>   - terminating early (e.g. output piped to 'head')
>
> That introduces all the fetching overhead for the given subtree as a
> latency before the operation shows its results, which for something
> small at the root of the tree such as "svn diff --depth=empty
> --properties-only ./" may make a significant usability impact.
>
> Presumably we could add the depth and some other kinds of filtering to
> the tree walk. But that will always leave terminating early, and
> possibly other cases, sub-optimal.
>
> I would prefer a solution that defers the hydrating until closer to the
> moment of demand.

I think that fetching the pristine contents at the moment of demand is a
particularly problematic concept to pursue, because it implies that a
network request can now happen at an unpredictable point in time.  So any
operation that may access the pristine contents has to be ready for a
network fetch.  By comparison, fetching the required pristines before the
operation imposes no such requirement on the existing code.
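
A minimal sketch of that separation, again as a toy Python model rather than
Subversion code.  PristineStore, hydrate() and changed_paths() are
hypothetical names, and a dict stands in for the repository.  The point is
only that once the pristines are fetched up front, the operation itself
reads local data and never has to handle a network fetch.

```python
class PristineStore:
    """Toy pristine store; 'remote' stands in for the repository."""

    def __init__(self, remote):
        self.remote = remote
        self.local = {}
        self.fetches = 0  # counts simulated network requests

    def fetch(self, checksum):
        # The only place where a (simulated) network request happens.
        self.fetches += 1
        self.local[checksum] = self.remote[checksum]

    def read(self, checksum):
        # Purely local; raises KeyError if the pristine was not hydrated.
        return self.local[checksum]


def hydrate(store, checksums):
    """Fetch every missing pristine before the operation starts."""
    for checksum in checksums:
        if checksum not in store.local:
            store.fetch(checksum)


def changed_paths(store, working):
    """The 'operation': compare working files against local pristines."""
    return sorted(
        path
        for path, (checksum, text) in working.items()
        if store.read(checksum) != text
    )


remote = {"c1": "hello\n", "c2": "world\n"}
working = {
    "a.txt": ("c1", "hello\n"),   # unmodified
    "b.txt": ("c2", "WORLD\n"),   # locally modified
}

store = PristineStore(remote)
hydrate(store, [checksum for checksum, _ in working.values()])
fetches_after_sync = store.fetches

result = changed_paths(store, working)
fetches_during_operation = store.fetches - fetches_after_sync
```

Here changed_paths() needs no error handling for network failures, because
all of the fetching is confined to hydrate().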


Thanks,
Evgeny Kotkov
