On 10/18/2018 2:26 PM, Duy Nguyen wrote:
On Thu, Oct 18, 2018 at 8:18 PM Ben Peart <peart...@gmail.com> wrote:
I actually started my effort to speed up reset by attempting to
multi-thread refresh_index(). You can see a work in progress at:
https://github.com/benpeart/git/pull/new/refresh-index-multithread-gvfs
The patch doesn't always work as it is still not thread safe. When it
works, it's great, but I ran into too many difficulties trying to debug
the remaining threading issues (even adding print statements would
change the timing and the repro would disappear). It will take a lot of
code review to discover and fix the remaining non-thread safe code paths.
In addition, the optimized code path that takes advantage of fsmonitor,
uses multiple threads, fscache, etc _already exists_ in preload_index().
Trying to recreate all those optimizations in refresh_index() is (as I
discovered) a daunting task.
Why not make refresh_index() run preload_index() first (or the
parallel lstat part to be precise), and only do the heavy
content-based refresh in single thread mode?
Head smack! Why didn't I think of that?
That is a terrific suggestion. Calling preload_index() right before the
big for loop in refresh_index() is a trivial and effective way to do the
bulk of the updating with the optimized code. After doing that, most of
the cache entries can bail out quickly down in refresh_cache_ent() when
it tests ce_uptodate(ce).
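To make the idea concrete, here is a minimal sketch of the "cheap pass first, then the slow loop" pattern. It is a toy model, not git's code: every name prefixed with toy_ (and TOY_CE_UPTODATE) is a hypothetical stand-in, the "preload" runs serially rather than on threads, and the slow path simply marks the entry as refreshed instead of comparing content.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT git's real types or functions. */
#define TOY_CE_UPTODATE 0x1u

struct toy_entry {
	const char *path;
	unsigned int flags;
	int stat_matches; /* 1 if a cheap lstat-style check says "unchanged" */
};

struct toy_index {
	struct toy_entry *entries;
	size_t nr;
};

/* Stand-in for the cheap stat pass (serial here, parallel in the real thing). */
static void toy_preload(struct toy_index *idx)
{
	for (size_t i = 0; i < idx->nr; i++)
		if (idx->entries[i].stat_matches)
			idx->entries[i].flags |= TOY_CE_UPTODATE;
}

/* Stand-in for the per-entry refresh; returns 1 if the slow path ran. */
static int toy_refresh_entry(struct toy_entry *e)
{
	if (e->flags & TOY_CE_UPTODATE)
		return 0; /* quick bail-out: the preload already vouched for it */
	/* ... an expensive content-based comparison would go here ... */
	e->flags |= TOY_CE_UPTODATE; /* toy assumption: the entry is now refreshed */
	return 1;
}

/* Runs the cheap pass, then the loop; returns how many entries hit the slow path. */
static size_t toy_refresh(struct toy_index *idx)
{
	size_t slow = 0;

	toy_preload(idx);
	for (size_t i = 0; i < idx->nr; i++)
		slow += (size_t)toy_refresh_entry(&idx->entries[i]);
	return slow;
}
```

With three entries of which two pass the cheap check, toy_refresh() sends only the remaining one down the slow path, which is the effect I was after in the real loop.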
Here are the numbers using that optimization (hot cache, averaged across
3 runs):
0.32 git add asdf
1.67 git reset asdf
1.68 git status
3.67 Total
vs without it:
0.32 git add asdf
2.48 git reset asdf
1.50 git status
4.30 Total
That is a savings of 32% for the reset command and 15% overall.
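For anyone who wants to check the arithmetic, the percentages are just (before - after) / before. A small sketch, where percent_saved is a hypothetical helper name used only for illustration:

```c
/* Percentage of time saved when going from `before` to `after` seconds. */
static double percent_saved(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

percent_saved(2.48, 1.67) comes out at roughly 32.7 for reset and percent_saved(4.30, 3.67) at roughly 14.7 for the total, consistent with the rounded figures quoted above.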
Clearly, making refresh_index() faster saves less than not calling it
at all. Given how simple this patch is, I think it makes sense
to do both so that we have optimized each path to its fullest.