On 2/6/2018 7:27 AM, Duy Nguyen wrote:

This is another thing that bugs me. I know you're talking about huge
index files, but at what size should we start this sort of
optimization? Writing down a few MBs on linux is cheap enough that I
won't bother optimizing (and I get my UNTR extension repaired all the
time, so reduced lstat calls and stuff). This "typically" only comes
at certain size, what size?


It's important to identify what we're trading off here. With the proposed optimization, we'll end up doing less writes of the index but potentially more lstat calls. Even with a small index, writing the index is much more expensive than calling lstat on a file. Exactly how much more expensive depends on a lot of variables but even with a SSD its still orders of magnitude difference.

That means we could potentially lstat hundreds or thousands of files and still come out ahead. Given the untracked cache works at the directory level, the lstat cost actually scales with the number of directories that have had changes (ie changing a 2nd file in the same directory doesn't add any additional cost). In "typical" workflows, developers don't often change hundreds of files across different directories so we'd "typically" still come out ahead.

We have internal performance tests based on common developer sequences of commands (status, edit a file, status, add, status, commit, log, push, etc) that show that having the untracked cache turned on actually makes these sequences slower. At the time, we didn't have time to investigate why so we just turned off the untracked cache.

When writing the fsmonitor patch series which can utilize the untracked cache, I did some investigation into why the untracked cache was slowing things down in these tests and discovered the cause was the additional index writes. That is what led to this patch.

I've been sitting on it for months now because I didn't have the time to write some performance tests for the git perf framework to demonstrate the problem and how this helps solve it. With the discussion about the fsmonitor extension, I thought this might be a good time to send it out there.

If you have the cycles, time a typical series of commands with and without the untracked cache (ie don't just call status over and over in a loop) and you should see the same results. Given my limited time right now, I'm OK putting this on the back burner again until a git perf test can be written to ensure it actually speeds things up as advertised.

Reply via email to