Duy Nguyen <pclo...@gmail.com> writes:

> I can think of two improvements we could make: either increase the
> cache size dynamically (within limits) or make it configurable.  If
> we have N entries in the worktree (both trees and blobs) and delta
> depth M, then we might need to cache N*M objects for the cache to be
> effective.  Christian, if you want to experiment with this, update
> MAX_DELTA_CACHE in sha1_file.c and rebuild.
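
For context, the cache in question looks roughly like this (paraphrased
from sha1_file.c, partly from memory, so names and details may be off;
what matters is the shape of the thing):

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MAX_DELTA_CACHE 256	/* the constant Duy suggests raising */

struct packed_git;		/* opaque here; defined elsewhere in git */

/* A fixed-size, direct-mapped table of already delta-resolved
 * objects, keyed by the (pack, offset) the object was found at. */
static struct delta_base_cache_entry {
	struct packed_git *p;	/* pack the object lives in */
	off_t base_offset;	/* offset of the entry in that pack */
	void *data;		/* the inflated, delta-applied object */
	unsigned long size;
} delta_base_cache[MAX_DELTA_CACHE];

/* Slot selection: fold pointer and offset, reduce mod table size. */
static unsigned long pack_entry_hash(struct packed_git *p, off_t base_offset)
{
	unsigned long hash;

	hash = (unsigned long)(uintptr_t)p + (unsigned long)base_offset;
	hash += (hash >> 8) + (hash >> 16);
	return hash % MAX_DELTA_CACHE;
}

If I read it right, a hash collision simply evicts whatever occupied
the slot before, with no probing, so the effective capacity depends a
lot on how well the hash spreads the actual (pack, offset) pairs across
the table.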

Well, my optimized "git-blame" code takes a considerable hit on an
aggressively packed Emacs repository, so I took a look at it,
rebuilding with MAX_DELTA_CACHE set to the default of 256, then 512,
1024, and 2048.

Here are the results, one run per setting in that order (256, 512,
1024, 2048):

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m17.496s
user    0m30.552s
sys     0m46.496s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m13.888s
user    0m30.060s
sys     0m43.420s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m16.415s
user    0m31.436s
sys     0m44.564s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m24.732s
user    0m34.416s
sys     0m49.808s

So using a value of 512 helps a bit (7% or so), but further increases
already hurt.  My machine has 4G of memory (32-bit x86), so it is
unlikely that memory is running out.  I have no idea why this would be
so: either memory locality plays a role here, or the cache for some
reason gets reinitialized or scanned/copied/accessed as a whole
repeatedly, defeating the idea of a cache.  Or the access patterns are
such that it is entirely useless as a cache even at this size.
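
One way to tell these hypotheses apart would be to count cache hits,
misses, and evictions directly.  A hypothetical instrumentation sketch
(these counters are mine; nothing like this is in git):

#include <stdio.h>
#include <stdlib.h>

static unsigned long cache_hits, cache_misses, cache_evictions;

/* Print the totals on exit. */
static void report_delta_cache_stats(void)
{
	fprintf(stderr, "delta cache: %lu hits, %lu misses, %lu evictions\n",
		cache_hits, cache_misses, cache_evictions);
}

/* Call once, e.g. from the first cache lookup; the counters would be
 * bumped in the lookup and insertion paths of sha1_file.c. */
static void install_delta_cache_report(void)
{
	static int installed;
	if (!installed) {
		installed = 1;
		atexit(report_delta_cache_stats);
	}
}

A miss/eviction rate that stays high even at 2048 slots would point at
the access pattern; a high hit rate would point elsewhere.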

Trying with 16384:
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    2m8.000s
user    0m54.968s
sys     1m12.624s

And memory consumption did not exceed about 200m all the while, so it
stayed far below what would have been available.

Something's _really_ fishy about that cache behavior.  Note that the
_system_ time goes up considerably, not just user time.  Since the
packs are zlib-compressed, it is reasonable that more I/O time comes
with more user time, and it is quite possible that the user time
increase is entirely explained by the larger amount of compressed data
being accessed.
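
To make that concrete: an object sitting at depth M in a delta chain
costs M+1 pack reads plus M+1 inflates when nothing along the chain is
cached.  A toy model (mine, not git's code; the depth is made up):

#include <stdio.h>

#define CHAIN_DEPTH 50	/* hypothetical delta chain depth */

static unsigned long pack_reads;	/* stands in for system time (I/O) */
static unsigned long inflate_calls;	/* stands in for user time (zlib) */

/* Resolve one object at the given depth with a cold cache: every
 * delta down to the base must be read and inflated. */
static void resolve_uncached(int depth)
{
	pack_reads++;
	inflate_calls++;
	if (depth > 0)
		resolve_uncached(depth - 1);
}

int main(void)
{
	resolve_uncached(CHAIN_DEPTH);
	printf("one cold lookup at depth %d: %lu reads, %lu inflates\n",
	       CHAIN_DEPTH, pack_reads, inflate_calls);
	return 0;
}

Every miss that has to go all the way down pays in both columns at
once, which would match user and system time rising together.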

But this stinks.  I doubt that the additional time is spent in memory
allocation: most of that would register only as user time.  And the
total allocated memory is not large enough to explain this away with
fewer disk buffers available to the kernel: the aggressively packed
repo takes about 300m, so it would fit into memory together with the
git process.

-- 
David Kastrup