Julian Foad wrote on Fri, Jan 21, 2022 at 11:15:04 +0000: > Scanning with 'stat' > > I'm concerned about the implementation scanning the whole subtree, > calling 'stat' on every file to determine whether the file is "changed" > (locally modified). This is done in svn_wc__textbase_sync() with its > textbase_walk_cb(). > > It does this scan on every sync, which is twice on every syncing > operation such as diff. > > Don't we already have an optimised scan for local modifications > implemented in the "status" code?
This? — [subversion/libsvn_wc/status.c] 457 /* If the on-disk dirent exactly matches the expected state 458 skip all operations in svn_wc__internal_text_modified_p() 459 to avoid an extra filestat for every file, which can be 460 expensive on network drives as a filestat usually can't 461 be cached there */ 462 if (!info->has_checksum) 463 text_modified_p = TRUE; /* Local addition -> Modified */ 464 else if (ignore_text_mods 465 ||(dirent 466 && info->recorded_size != SVN_INVALID_FILESIZE 467 && info->recorded_time != 0 468 && info->recorded_size == dirent->filesize 469 && info->recorded_time == dirent->mtime)) 470 text_modified_p = FALSE; This still does a stat() on every file; how else would it obtain dirent->mtime? It doesn't do open()/read()/memcmp(). > Could we re-use this? textbase_walk_cb() calls check_file_modified() which has a very similar size-and-mtime check right at the top. So, we already repeat the logic; we just implement it twice. Reuse would be nice, of course. If nothing else, we could at least add comments to the two locations cross-referencing them. > Premature Hydrating > > The present implementation "hydrates" (fetches missing pristines) every > file within the whole subtree the operation targets. This is done by > every major client operation calling svn_client__textbase_sync() before > and afterwards. > Does it? Looking at textbase_walk_cb(), it only sets REFERENCED to TRUE for modified files. If I understand correctly textbase_walk_cb() and the docstring of svn_wc__db_textbase_walk(), something along the lines of . svn revert -R ./ echo foo > subversion/tests/README svn diff . would fetch the pristine only for that one file, wouldn't it? Sorry, I haven't got time to test this right now. > That is pessimistic: the operation may not actually touch all these > files if limited in any way such as by > > - depth filtering > - other filtering (changelist, properties-only, ...) > - terminating early (e.g. output piped to 'head') > > That introduces all the fetching overhead for the given subtree as a > latency before the operation shows its results, which for something > small at the root of the tree such as "svn diff --depth=empty > --properties-only ./" may make a significant usability impact. > > Presumably we could add the depth and some other kinds of filtering to > the tree walk. But that will always leave terminating early, and > possibly other cases, sub-optimal. > > I would prefer a solution that defers the hydrating until closer to the > moment of demand. Agree that from a UX perspective, it would be nice to avoid a long delay at the start of an operation. However, in cases such as «svn diff --diff-cmd=<GUI tool>», fetching the pristines (too) close to the time they are needed could result in having to reopen RA sessions. In this case, perhaps it would make sense to download the pristines in the background in a separate thread (at least in case APR_HAS_THREADS)? > Evgeny, have you looked into these possibilities at all? What are your > thoughts about these? Cheers, Daniel