Duy Nguyen <pclo...@gmail.com> writes:

> I can think of two improvements we could make, either increase cache
> size dynamically (within limits) or make it configurable. If we have N
> entries in worktree (both trees and blobs) and depth M, then we might
> need to cache N*M objects for it to be effective. Christian, if you
> want to experiment this, update MAX_DELTA_CACHE in sha1_file.c and
> rebuild.
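For reference, that experiment amounts to bumping a single constant in
sha1_file.c and rebuilding, roughly like the following (a sketch only;
the exact spelling and context of the line may differ between versions,
but 256 is the default mentioned above):

    /* sha1_file.c: size of the delta base cache */
    #define MAX_DELTA_CACHE (512)   /* default is 256 */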
Well, my optimized "git-blame" code takes a considerable hit on an
aggressively packed Emacs repository, so I took a look at it with the
MAX_DELTA_CACHE value set to the default 256, and then 512, 1024, 2048.
Here are the results:

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m17.496s
user	0m30.552s
sys	0m46.496s

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m13.888s
user	0m30.060s
sys	0m43.420s

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m16.415s
user	0m31.436s
sys	0m44.564s

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	1m24.732s
user	0m34.416s
sys	0m49.808s

So using a value of 512 helps a bit (7% or so), but further increases
already cause a hit.  My machine has 4G of memory (32bit x86), so it is
unlikely that memory is running out.

I have no idea why this would be so: either memory locality plays a
role here, or the cache for some reason gets reinitialized or
scanned/copied/accessed as a whole repeatedly, defeating the idea of a
cache.  Or the access patterns are such that it's entirely useless as a
cache even at this size.

Trying with 16384:

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real	2m8.000s
user	0m54.968s
sys	1m12.624s

And memory consumption did not exceed about 200m all the while, so it
is far lower than what would have been available.  Something's _really_
fishy about that cache behavior.

Note that the _system_ time goes up considerably, not just the user
time.  Since the packs are zlib-compressed, it is plausible that more
I/O also means more user time, and it may well be that the user time
increase is entirely explained by the larger amount of compressed data
being accessed.  But this stinks.

I doubt that the additional time is spent in memory allocation: most of
that would register only as user time.  And the total allocated memory
is not large enough to explain this away with fewer disk buffers being
available to the kernel: the aggressively packed repo takes about 300m,
so it would fit into memory together with the git process.

-- 
David Kastrup