On 10.03.2013 21:17, Ramkumar Ramachandra wrote:
> git operations are slow on repositories with lots of files, and lots
> of tiny filesystem calls like lstat(), getdents(), open() are
> responsible for this.  On the linux-2.6 repository, for instance, the
> numbers for "git status" look like this:
> 
>   top syscalls sorted     top syscalls sorted
>   by acc. time            by number
>   ----------------------------------------------
>   0.401906 40950 lstat    0.401906 40950 lstat
>   0.190484 5343 getdents  0.150055 5374 open
>   0.150055 5374 open      0.190484 5343 getdents
>   0.074843 2806 close     0.074843 2806 close
>   0.003216 157 read       0.003216 157 read
> 
> To solve this problem, we propose to build a daemon which will watch
> the filesystem using inotify and report batched up events over a UNIX
> socket.

[...]

> +
> +The credential C API is meant to be called by Git code which needs
> +information about filesystem changes.  It is centered around an
> +object representing the changes to the filesystem since the last
> +invocation.
> +

Hmmm...I don't see how filesystem changes since last invocation can solve the 
problem, or am I missing something? I think what you mean to say is that the 
daemon should keep track of the filesystem *state* of the working copy, or 
alternatively the deltas/changes to some known state (such as .git/index)?
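
To make the distinction concrete, here is a minimal sketch of what I mean by 
tracking deltas against a known state. It is not taken from the patch: the 
class and method names are hypothetical, and event delivery (inotify or 
otherwise) is left out entirely. The point is only that a query must not 
reset the set; paths leave it only when the baseline is refreshed.

```python
class DirtyTracker:
    """Hypothetical model of the daemon's state: the set of paths that
    changed relative to a known baseline such as .git/index."""

    def __init__(self):
        self.dirty = set()

    def on_event(self, path):
        # Called for every filesystem event; duplicates collapse in the set.
        self.dirty.add(path)

    def query(self):
        # Answering a query does not clear anything, so two consecutive
        # queries see the same paths.
        return sorted(self.dirty)

    def rebaseline(self, paths):
        # Called once the baseline has been refreshed for these paths
        # (for example after the index entries were updated).
        self.dirty.difference_update(paths)
```

With "changes since the last invocation" semantics, the second of two 
consecutive queries would come back empty even though the files still differ 
from the index.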

I'm also still skeptical whether a daemon will improve overall performance. In 
my understanding it's essentially a filesystem cache in user mode. The 
difference from using the OS filesystem cache directly (via lstat/readdir) is 
that we replace ~50k syscalls with a single IPC call (i.e. the git <--> 
fswatch daemon communication is less 'chatty'). However, the 'chattiness' is 
still there between the fswatch daemon and the OS / inotify. Consider 'git 
status; make; make clean; git status'...that's a *lot* of events for the 
daemon to process for nothing (potentially slowing down make).
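
For reference, the work the daemon would take over looks roughly like the 
sketch below: one lstat() per file, compared against previously recorded stat 
data. This is a simplified model, not git's actual code; the function names 
are made up, and comparing only mtime and size is an assumption for brevity 
(git's index checks more fields).

```python
import os


def snapshot(root):
    """Record (mtime_ns, size) for every non-directory entry under root,
    at the cost of one lstat() call per entry."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(old, new):
    """Paths that were added, removed, or whose recorded stat data differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

On a warm cache this scan is served from the kernel's dentry/inode cache, so 
what the daemon saves is the per-call overhead, not disk I/O.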

Then there's the issue of stale data in the cache. Porcelain commands that 
modify the repository and use 'git status --porcelain' to compile their 
changesets will want 100% exact data. I'm not saying it's not doable, but 
adding another platform-specific caching daemon to the tool chain doesn't 
exactly simplify things...

But perhaps I'm too pessimistic (or just stigmatized by inherently slow and 
out-of-date TGitCache/TSvnCache on Windows :-)