Julian Foad <julianf...@apache.org> writes:

> Scanning with 'stat'
>
> I'm concerned about the implementation scanning the whole subtree,
> calling 'stat' on every file to determine whether the file is "changed"
> (locally modified). This is done in svn_wc__textbase_sync() with its
> textbase_walk_cb().
>
> It does this scan on every sync, which is twice on every syncing
> operation such as diff.
>
> Don't we already have an optimised scan for local modifications
> implemented in the "status" code? Could we re-use this?

In a few of my experiments, the performance of textbase_sync() was roughly
comparable to that of a status walk. So maybe it's not worth spending time
on improving this part, at least for now.

Also, I tend to think that DRY doesn't really apply here, because a status
walk and a textbase sync are essentially different operations that just
happen to have something in common internally. For example, a textbase
sync doesn't have to follow the tree structure and can be implemented
as a walk over NODES in arbitrary order.

> Premature Hydrating
>
> The present implementation "hydrates" (fetches missing pristines) every
> file within the whole subtree the operation targets. This is done by
> every major client operation calling svn_client__textbase_sync() before
> and afterwards.
>
> That is pessimistic: the operation may not actually touch all these
> files if limited in any way such as by
>
>   - depth filtering
>   - other filtering (changelist, properties-only, ...)
>   - terminating early (e.g. output piped to 'head')
>
> That introduces all the fetching overhead for the given subtree as a
> latency before the operation shows its results, which for something
> small at the root of the tree such as "svn diff --depth=empty
> --properties-only ./" may make a significant usability impact.
>
> Presumably we could add the depth and some other kinds of filtering to
> the tree walk. But that will always leave terminating early, and
> possibly other cases, sub-optimal.
>
> I would prefer a solution that defers the hydrating until closer to the
> moment of demand.

I think that fetching the pristine contents at the moment of demand is a
particularly problematic concept to pursue, because it implies that a
network request can now happen at an unpredictable moment. So any
operation that may access the pristine contents has to be ready for a
network fetch.

Compared to that, fetching the required pristines before the operation
does not impose that kind of requirement on the existing code.


Thanks,
Evgeny Kotkov