On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote:
> On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
> > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
> >
> > > If I have a git repository with a clean working tree, and I delete the
> > > index, then I can use git reset (with no arguments) to recreate it.
> > > However, when I do recreate it, it doesn't come back the same. I have
> > > not analyzed this in detail, but the effect is that commands like git
> > > status take much longer because they must read objects out of a pack
> > > file. In other words, the index seems to not realize that the index (or
> > > at least most of it) represents the same state as HEAD. If I do git
> > > reset --hard, the index is restored to the original state (it's
> > > byte-for-byte identical), and the pack file is no longer read.
> >
> > Are you sure it's reading a packfile?
>
> Well, it's calling inflate(), and strace says it is reading
> e.g. .git/objects/pack/pack-....{idx,pack}.
>
> So, I would say so.

It seems odd that we would be spending extra time there. We do
inflate() the trees in order to diff the index against HEAD, but we
shouldn't need to inflate any blobs.
Here it is for me (on linux.git):
[before, warm cache]
$ time perf record -q git status >/dev/null
real 0m0.192s
user 0m0.080s
sys 0m0.108s
$ perf report | grep -v '#' | head -5
7.46% git [kernel.kallsyms] [k] __d_lookup_rcu
4.55% git libz.so.1.2.8 [.] inflate
3.53% git libc-2.18.so [.] __memcmp_sse4_1
3.46% git [kernel.kallsyms] [k] security_inode_getattr
3.29% git git [.] memihash
$ time git reset
real 0m0.080s
user 0m0.036s
sys 0m0.040s
So status is pretty quick, and the time is going to lstat in the kernel,
and some tree inflation. Reset is fast, because it has nothing much to
do. Now let's kill off the index's stat cache:
$ rm .git/index
$ time perf record -q git reset
real 0m0.967s
user 0m0.780s
sys 0m0.180s
That took a while. What was it doing?
$ perf report | grep -v '#' | head -5
3.23% git [kernel.kallsyms] [k] copy_user_enhanced_fast_string
1.74% git libcrypto.so.1.0.0 [.] 0x000000000007e010
1.60% git [kernel.kallsyms] [k] __d_lookup_rcu
1.51% git [kernel.kallsyms] [k] page_fault
1.44% git libc-2.18.so [.] __memcmp_sse4_1
Reading files and computing sha1s. We hash the working-tree files here (reset
doesn't technically need to refresh the index from the working tree to
copy entries from HEAD into the index, but it does it so it can do fancy
things like tell you about which files are now out-of-date).
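To make the hashing step concrete, here is a minimal sketch in a throwaway repository (the file name and contents are made up for illustration, and it is not taken from the timings above). It compares the object ID recorded in the index with the ID you get by hashing the working-tree file directly. With no stat data to trust, that second computation is roughly what a refresh has to redo for every file.

```shell
# Throwaway repo; file name and contents are illustrative only.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
echo hello >file.txt
git add file.txt

# Object ID recorded in the index for file.txt (second field of "ls-files -s").
index_id=$(git ls-files -s file.txt | awk '{print $2}')

# Object ID obtained by hashing the working-tree file directly.
worktree_id=$(git hash-object file.txt)

echo "index:    $index_id"
echo "worktree: $worktree_id"
```

The two IDs should match for an unmodified file. On linux.git the same comparison has to happen for tens of thousands of files, which would account for the extra user time.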
Now how does status fare after that reset?
$ time perf record -q git status >/dev/null
real 0m0.189s
user 0m0.088s
sys 0m0.096s
Looks about the same as before to me.
Note that if you use "read-tree" instead of "reset", it _just_ loads the
index, and doesn't touch the working tree. If you then run "git status",
then _that_ command has to refresh the index, and it will pay the
hashing cost. Like:
$ rm .git/index
$ time git read-tree HEAD
real 0m0.084s
user 0m0.064s
sys 0m0.016s
$ time git status >/dev/null
real 0m0.833s
user 0m0.712s
sys 0m0.112s
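If you want to see the difference between the two without timing anything, a small sketch along these lines should work. It assumes a throwaway repository with one made-up file, and it relies on "git ls-files --debug", whose output format is documented as subject to change, so treat the grep as illustrative.

```shell
# Throwaway repo with a single commit; names are illustrative only.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
echo hello >file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# read-tree: entries are loaded from HEAD, the working tree is not consulted.
rm .git/index
git read-tree HEAD
after_read_tree=$(git ls-files --debug file.txt | grep 'size:')

# reset: same entries, but the index is also refreshed from the working tree.
rm .git/index
git reset >/dev/null
after_reset=$(git ls-files --debug file.txt | grep 'size:')

echo "read-tree: $after_read_tree"
echo "reset:     $after_reset"
```

After read-tree the cached size should still be zero, meaning there is no stat data for the next command to rely on. After reset it should match the size of the file on disk.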
All of this is behaving as I would expect. Can you show us a set of
commands that deviate from this?
-Peff