Re: [msysGit] Re: Windows performance / threading file access

2013-10-22 Thread Karsten Blees
On 22.10.2013 00:58, pro-logic wrote:
 The trace_performance functions require manual instrumentation of
 the code sections you want to measure
 Ahh a case of RTFM :)
 
 Could you post details about your test setup? Are you still using
 WebKit for your tests?
 I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
 scanner, truecrypt, no defragger.
 

OK, so truecrypt and luafv may screw things up for you (according to my 
measurements, luafv roughly doubles lstat times on C:).

 I've tried to be a bit smarter with the intent of my code, and this
 is what I came up with.
 
 diff --git a/cache.h b/cache.h
 index 4bf19e3..2e9fb1f 100644
 --- a/cache.h
 +++ b/cache.h
 @@ -294,7 +294,7 @@ extern void free_name_hash(struct index_state *istate);
  #define active_cache_changed (the_index.cache_changed)
  #define active_cache_tree (the_index.cache_tree)
  
 -#define read_cache() read_index(&the_index)
 +#define read_cache() read_index_preload(&the_index, NULL)
  #define read_cache_from(path) read_index_from(&the_index, (path))
  #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
  #define is_cache_unborn() is_index_unborn(&the_index)
 diff --git a/read-cache.c b/read-cache.c
 index c3d5e35..5fb2788 100644
 --- a/read-cache.c
 +++ b/read-cache.c
 @@ -1866,7 +1866,7 @@ int read_index_unmerged(struct index_state *istate)
  	int i;
  	int unmerged = 0;
  
 -	read_index(istate);
 +	read_index_preload(istate, NULL);
  	for (i = 0; i < istate->cache_nr; i++) {
  		struct cache_entry *ce = istate->cache[i];
  		struct cache_entry *new_ce;
 -- 
 

Ahh, I thought that you had enabled fscache during the entire checkout.

 Interestingly when I run on a cleanly checked out blink repo my
 changes seem to make matters worse in terms of performance, but when
 working on a repo with ignored files in it it seems to work better.
 So as a point of comparison I decided to run it on a repo with
 ignored files in its working tree, in this case msysgit/git after
 a 'make install'. When I get a few hours I'll try to build blink and
 re-run the numbers on a much, much larger repo.
 
 This comparison is an average of 3 cold cache runs of
 kb/fscache-v4 [a] vs kb/fscache-v4 with my above changes applied [b],
 with preloadindex and fscache set to true.
 
 For comparison
 git status -s
 [a] 3.02s
 [b] 2.92s
 
 git reset --hard head
 [a] 3.67s
 [b] 3.09s
 

These numbers look far too good, so you don't actually do a fresh checkout, do 
you? I mean, delete all files except .git; killcache; git reset --hard / git 
checkout -f? That would also explain your 95% lstat times, if there's nothing 
to do...

 git add -u
 [a] 2.89s
 [b] 2.08s
 
 
 I noticed something interesting. Preload index uses 20 threads to do
 the work. When I was keeping an eye on them in task manager some
 threads will finish quite quickly, while others will run a lot
 longer. As I understand the code at the moment, the threads get
 equal chunks of work to perform. It's quite likely that even more
 performance could be obtained out of preload if the work splitting
 was 'smarter'. My best idea so far would be to use something like
 a lock-free queue to queue up the work and let the threads take the
 work off the queue. That way all threads stay busy with work for
 longer. A candidate for the implementation would be the libfds [1] queue.
 However, my issue with this library, and the reason I haven't tried to
 integrate it, is simply that the code expressly has no license.
 

As cache/cache_nr are not modified by the threads, you actually don't need a 
lock-free queue. An atomic counter shared by all threads should suffice (i.e. 
pthread's equivalent to InterlockedIncrement/InterlockedAdd).
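
A minimal sketch of the shared-counter idea, for illustration only. It assumes C11 <stdatomic.h> for the atomic increment (pthreads itself provides threads but no atomic-add primitive) and uses hypothetical names (struct work, worker, run_parallel) that do not exist in git. Each thread claims the next unprocessed index until the range is exhausted, so a thread that hits slow entries simply claims fewer of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Hypothetical work description; not a structure from git. */
struct work {
	size_t nr;                /* number of items; read-only for workers */
	atomic_size_t next;       /* index of the next unclaimed item */
	atomic_size_t done;       /* items processed, kept for verification */
	void (*fn)(size_t index); /* per-item callback, may be NULL */
};

static void *worker(void *arg)
{
	struct work *w = arg;

	for (;;) {
		/* Atomically claim one index; no two threads get the same one. */
		size_t i = atomic_fetch_add(&w->next, 1);

		if (i >= w->nr)
			break;
		if (w->fn)
			w->fn(i);
		atomic_fetch_add(&w->done, 1);
	}
	return NULL;
}

/* Runs fn over [0, nr) on up to nr_threads threads; returns items processed. */
static size_t run_parallel(size_t nr, int nr_threads, void (*fn)(size_t))
{
	pthread_t tid[MAX_WORKERS];
	struct work w;
	int i, started = 0;

	w.nr = nr;
	w.fn = fn;
	atomic_init(&w.next, 0);
	atomic_init(&w.done, 0);

	if (nr_threads > MAX_WORKERS)
		nr_threads = MAX_WORKERS;
	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&tid[started], NULL, worker, &w))
			break;
		started++;
	}
	if (!started)
		worker(&w); /* no thread available: do the work inline */
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	/* Every worker exits only after claiming an index past the end. */
	assert(atomic_load(&w.next) >= nr);
	return atomic_load(&w.done);
}
```

Build with -pthread. In a real preload the callback would lstat the cache entry at the claimed index; claiming small batches instead of single indices would reduce contention on the counter.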

Karsten




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [msysGit] Re: Windows performance / threading file access

2013-10-22 Thread Sebastian Schuberth
On Tue, Oct 22, 2013 at 4:30 PM, Karsten Blees karsten.bl...@gmail.com wrote:

 Could you post details about your test setup? Are you still using
 WebKit for your tests?
 I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
 scanner, truecrypt, no defragger.


 OK, so truecrypt and luafv may screw things up for you (according to my 
 measurements, luafv roughly doubles lstat times on C:).

Aren't we disabling UAC / LUAFV on a per-executable basis using
manifests? At least the blog article at [1] suggests that we are in
fact doing it the right way using our script to generate the manifests
[2].

Oh but wait, we're not generating a manifest for git.exe itself, only
for executables that contain "setup", "install", "update", "patch"
etc. So maybe having a manifest for git.exe, too, would improve
performance?

[1] 
http://blogs.msdn.com/b/alexcarp/archive/2009/06/25/the-deal-with-luafv-sys.aspx
[2] 
https://github.com/msysgit/msysgit/blob/master/share/msysGit/make-manifests.sh

-- 
Sebastian Schuberth


Re: [msysGit] Re: Windows performance / threading file access

2013-10-22 Thread Karsten Blees
On 22.10.2013 16:49, Sebastian Schuberth wrote:
 On Tue, Oct 22, 2013 at 4:30 PM, Karsten Blees karsten.bl...@gmail.com wrote:
 
 Could you post details about your test setup? Are you still using
 WebKit for your tests?
 I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
 scanner, truecrypt, no defragger.


 OK, so truecrypt and luafv may screw things up for you (according to my 
 measurements, luafv roughly doubles lstat times on C:).
 
 Aren't we disabling UAC / LUAFV on a per-executable basis using
 manifests? At least the blog article at [1] suggests that we are in
 fact doing it the right way using our script to generate the manifests
 [2].
 
 Oh but wait, we're not generating a manifest for git.exe itself, only
 for executables that contain "setup", "install", "update", "patch"
 etc. So maybe having a manifest for git.exe, too, would improve
 performance?
 

Even with UAC disabled in control panel, the luafv.sys driver slows things down 
on C: (no impact on non-system drives). Procmon shows that with disabled luafv, 
GetFileAttributesEx is a single FASTIO call. With luafv running, FASTIO fails 
and is followed by three IRP calls (open, query, close).

I haven't tested with UAC enabled, nor whether disabling virtualization for 
git.exe has an impact.

Karsten



Re: Windows performance / threading file access

2013-10-17 Thread Karsten Blees
On 16.10.2013 00:22, pro-logic wrote:
 I also get fairly slow performance out of the checkout / reset 
 operations on windows.
 
 This discussion got me trying to work out what's taking so long on 
 windows. To help I used killcache [1] to flush the HDD cache and
 Very Sleepy [2] to profile the code. I couldn't use the 
 GIT_TRACE_PERFORMANCE [3] patch as that seems to only work on script 
 commands, and in my case I just get a result of 335 seconds git 
 reset --hard head from the log.

The trace_performance functions require manual instrumentation of the code 
sections you want to measure, e.g. like this [1]. Output looks like this for a 
full WebKit checkout (Win7 x64, Core i7 860, WD VelociRaptor 300, NTFS, no 
virus scanner, no luafv, no defragger):

trace: at entry.c:128, time: 135.786 s: write_entry::create
trace: at entry.c:129, time: 101.6 s: write_entry::stream
trace: at entry.c:130, time: 0 s: write_entry::read
trace: at entry.c:131, time: 0 s: write_entry::convert
trace: at entry.c:132, time: 0 s: write_entry::write
trace: at entry.c:133, time: 4.71825 s: write_entry::close
trace: at compat/mingw.c:2150, time: 5.68786 s: mingw_lstat (called 661660 times)
trace: at compat/mingw.c:2151, time: 259.219 s: command: c:\git\msysgit\git\git-checkout.exe -f HEAD
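
To illustrate what "manual instrumentation" means here, the following is a rough sketch of an accumulating timer. It is not the trace_performance API from the linked commit; the names (struct perf_counter, now_ns, perf_add, perf_report) are hypothetical, and it assumes POSIX clock_gettime, so a Windows build would need a different time source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator; git's real trace_performance API may differ. */
struct perf_counter {
	const char *label;
	uint64_t total_ns;   /* accumulated time across all calls */
	unsigned long calls; /* number of measured sections */
};

/* Monotonic timestamp in nanoseconds (POSIX clock_gettime assumed). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Call after the measured section, passing the timestamp taken before it. */
static void perf_add(struct perf_counter *c, uint64_t start_ns)
{
	uint64_t end_ns = now_ns();

	assert(end_ns >= start_ns); /* the monotonic clock never runs backwards */
	c->total_ns += end_ns - start_ns;
	c->calls++;
}

/* Prints one summary line, loosely modelled on the trace output above. */
static void perf_report(const struct perf_counter *c)
{
	fprintf(stderr, "trace: time: %g s: %s (called %lu times)\n",
		c->total_ns / 1e9, c->label, c->calls);
}
```

A section is measured by taking a timestamp with now_ns() before it and calling perf_add() with that timestamp afterwards; perf_report() is then called once at exit. This is why only explicitly wrapped sections show up in the output.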

 After running killcache I ran very sleepy connected to git, and 
 according to the profile: 95.5% of the time is spent in do_lstat 
 (mingw.c) / NtQueryFullAttributeFile (ntdll)

Very Sleepy confirmed my numbers from above: lstat was always much smaller than 
create/stream/read/write. Could you post details about your test setup? Are you 
still using WebKit for your tests?

 For fun, not knowing if I would break anything or not (it probably 
 does), I wrapped the entire unpack_trees method in the fscache [4] 
 and the total git reset --hard head time fell from 335 seconds to 28 
 seconds, an 11x improvement.

Hmmm...this doesn't work for me at all. Fscache isn't updated during checkout, 
so the lstat checks that verify whether a file or directory was created will fail.

$ git config core.fscache true
$ time git checkout -f HEAD
Unlink of file 'Examples' failed. Should I try again? (y/n) n
warning: unable to unlink Examples: Permission denied
fatal: cannot create directory at 'Examples': Permission denied


Karsten

[1] https://github.com/kblees/git/commit/b8eca278


Windows performance / threading file access

2013-10-10 Thread Stefan Zager
Hi folks,

I don't follow the mailing list carefully, so forgive me if this has
been discussed before, but:

I've noticed that when working with a very large repository using msys
git, the initial checkout of a cloned repository is excruciatingly
slow (80%+ of total clone time).  The root cause, I think, is that git
does all the file access serially, and that's really slow on Windows.

Has anyone considered threading file access to speed this up?  In
particular, I've got my eye on this loop in unpack-trees.c:

static struct checkout state;
static int check_updates(struct unpack_trees_options *o)
{
	unsigned cnt = 0, total = 0;
	struct progress *progress = NULL;
	struct index_state *index = &o->result;
	int i;
	int errs = 0;

	...

	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];

		if (ce->ce_flags & CE_UPDATE) {
			display_progress(progress, ++cnt);
			ce->ce_flags &= ~CE_UPDATE;
			if (o->update && !o->dry_run) {
				errs |= checkout_entry(ce, &state, NULL);
			}
		}
	}
	stop_progress(&progress);
	if (o->update)
		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
	return errs != 0;
}


Any thoughts on adding threading around the call to checkout_entry?


Thanks in advance,

Stefan


Re: Windows performance / threading file access

2013-10-10 Thread Sebastian Schuberth
Please keep in mind to CC the msysgit mailing list for Windows-specific 
stuff. I'm also CC'ing Karsten who has worked on performance 
improvements for Git for Windows in the past.


Thanks for bringing this up!

--
Sebastian Schuberth



Hi folks,

I don't follow the mailing list carefully, so forgive me if this has
been discussed before, but:

I've noticed that when working with a very large repository using msys
git, the initial checkout of a cloned repository is excruciatingly
slow (80%+ of total clone time).  The root cause, I think, is that git
does all the file access serially, and that's really slow on Windows.

Has anyone considered threading file access to speed this up?  In
particular, I've got my eye on this loop in unpack-trees.c:

static struct checkout state;
static int check_updates(struct unpack_trees_options *o)
{
	unsigned cnt = 0, total = 0;
	struct progress *progress = NULL;
	struct index_state *index = &o->result;
	int i;
	int errs = 0;

	...

	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];

		if (ce->ce_flags & CE_UPDATE) {
			display_progress(progress, ++cnt);
			ce->ce_flags &= ~CE_UPDATE;
			if (o->update && !o->dry_run) {
				errs |= checkout_entry(ce, &state, NULL);
			}
		}
	}
	stop_progress(&progress);
	if (o->update)
		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
	return errs != 0;
}


Any thoughts on adding threading around the call to checkout_entry?


Thanks in advance,

Stefan





Re: Windows performance / threading file access

2013-10-10 Thread Karsten Blees
On 10.10.2013 22:19, Sebastian Schuberth wrote:
 Please keep in mind to CC the msysgit mailing list for Windows-specific 
 stuff. I'm also CC'ing Karsten who has worked on performance improvements for 
 Git for Windows in the past.
 

Thanks

 Thanks for bringing this up!
 
 -- 
 Sebastian Schuberth
 
 
 Hi folks,

 I don't follow the mailing list carefully, so forgive me if this has
 been discussed before, but:

 I've noticed that when working with a very large repository using msys
 git, the initial checkout of a cloned repository is excruciatingly
 slow (80%+ of total clone time).  The root cause, I think, is that git
 does all the file access serially, and that's really slow on Windows.


What exactly do you mean by "excruciatingly slow"?

I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full 
checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy 
/s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD (WD 
Caviar Green) the Windows checkout took ~9 minutes.

That's not so bad I think, considering that we read from pack files and write 
both files and directory structures, so there's a lot of disk seeking involved.

If your numbers are much slower, check for overeager virus scanners and 
probably the infamous User Account Control. (On Vista/7 (8?), the luafv.sys 
driver slows things down on the system drive even with UAC turned off in the 
control panel. The driver can be disabled with "sc config luafv start= 
disabled" + reboot. Re-enable with "sc config luafv start= auto".)

 Has anyone considered threading file access to speed this up?  In
 particular, I've got my eye on this loop in unpack-trees.c:


It's probably worth a try; however, in my experience, doing disk IO in parallel 
tends to slow things down due to more disk seeks.

I'd rather try to minimize seeks, e.g.:

* read the blob data for a block of cache_entries, then write out the files, 
repeat (this would require lots of memory, though)

* index->cache is typically sorted by name and pack files by size, right? 
Perhaps it's faster to iterate cache_entries by size so that we read the pack 
file sequentially (but then we'd write files/directories in random order...)
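
A rough sketch of the second idea, under stated assumptions: instead of sorting by size, it sorts by an assumed byte offset of each object within the pack file, which is the property that would actually make the reads sequential. The struct and function names are hypothetical and not taken from git; how the offset is obtained is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: one file to check out and where its data lives. */
struct checkout_item {
	const char *path;          /* working-tree path */
	unsigned long pack_offset; /* assumed byte offset of the object in the pack */
};

/* Orders items by ascending pack offset without risking integer overflow. */
static int cmp_by_offset(const void *a, const void *b)
{
	const struct checkout_item *x = a, *y = b;

	return (x->pack_offset > y->pack_offset) -
	       (x->pack_offset < y->pack_offset);
}

/* Reorders items so that the pack would be read front to back. */
static void sort_for_sequential_read(struct checkout_item *items, size_t nr)
{
	assert(items != NULL || nr == 0);
	if (nr > 1)
		qsort(items, nr, sizeof(*items), cmp_by_offset);
}
```

The trade-off mentioned above still applies: after this sort the paths are no longer in directory order, so the writes to the working tree become scattered.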


If you want to measure exactly which part of checkout eats the performance, 
check out this: https://github.com/kblees/git/commits/kb/performance-tracing-v3

Bye,
Karsten


Re: Windows performance / threading file access

2013-10-10 Thread Stefan Zager
On Thu, Oct 10, 2013 at 5:51 PM, Karsten Blees karsten.bl...@gmail.com wrote:

  I've noticed that when working with a very large repository using msys
  git, the initial checkout of a cloned repository is excruciatingly
  slow (80%+ of total clone time).  The root cause, I think, is that git
  does all the file access serially, and that's really slow on Windows.
 

 What exactly do you mean by "excruciatingly slow"?

 I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full
 checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows.
 Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD
 (WD Caviar Green) the Windows checkout took ~9 minutes.

I'm using blink for my test, which should be more or less indistinguishable
from WebKit.  I'm using a standard spinning disk, no SSD.  For my purposes,
I need to optimize this for standard-ish hardware, not best-in-class.

For my test, I first run 'git clone -n repo', and then measure the
running time of 'git checkout --force HEAD'.  On linux, the checkout
command runs in 0:12; on Windows, it's about 3:30.

 If your numbers are much slower, check for overeager virus scanners and
 probably the infamous User Account Control. (On Vista/7 (8?), the
 luafv.sys driver slows things down on the system drive even with UAC turned
 off in the control panel. The driver can be disabled with "sc config luafv
 start= disabled" + reboot. Re-enable with "sc config luafv start= auto".)

I confess that I am pretty ignorant about Windows, so I'll have to research
these.

 Has anyone considered threading file access to speed this up?  In
  particular, I've got my eye on this loop in unpack-trees.c:
 

 It's probably worth a try; however, in my experience, doing disk IO in
 parallel tends to slow things down due to more disk seeks.

 I'd rather try to minimize seeks, ...


In my experience, modern disk controllers are very very good at this; it
rarely, if ever, makes sense to try and outsmart them.

But, from talking to Windows-savvy people, I believe the issue is not disk
seek time, but rather the fact that Windows doesn't cache file stat
information.  Instead, it goes all the way to the source of truth (i.e.,
the physical disk) every time it stats a file or directory.  That's what
causes the checkout to be so slow: all those file stats run serially.

Does that sound right?  I'm prepared to be wrong about this; but if no one
has tried it, then it's probably at least worth an experiment.
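
One possible shape for such an experiment, as a sketch only: split a list of paths into equal chunks and lstat each chunk on its own thread, then compare the wall-clock time against a serial loop. This assumes POSIX lstat and pthreads; the names (struct stat_slice, stat_worker, parallel_lstat) are hypothetical and the code is not taken from git. Whether it helps on Windows depends on whether the hypothesis about uncached stat information holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

#define MAX_STAT_THREADS 20 /* arbitrary upper bound for this sketch */

/* Hypothetical per-thread slice of the path list. */
struct stat_slice {
	const char **paths;
	size_t nr;
	size_t ok;    /* successful lstat() calls in this slice */
	int threaded; /* nonzero if a thread was started for it */
};

static void *stat_worker(void *arg)
{
	struct stat_slice *s = arg;
	struct stat st;
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (!lstat(s->paths[i], &st))
			s->ok++;
	return NULL;
}

/* Splits paths into equal chunks, stats them in parallel, returns hit count. */
static size_t parallel_lstat(const char **paths, size_t nr, int nr_threads)
{
	pthread_t tid[MAX_STAT_THREADS];
	struct stat_slice slice[MAX_STAT_THREADS];
	size_t chunk, offset = 0, ok = 0;
	int i, used = 0;

	if (nr_threads < 1)
		nr_threads = 1;
	if (nr_threads > MAX_STAT_THREADS)
		nr_threads = MAX_STAT_THREADS;
	chunk = (nr + nr_threads - 1) / nr_threads; /* ceiling division */

	for (i = 0; i < nr_threads && offset < nr; i++, used++) {
		struct stat_slice *s = &slice[i];

		s->paths = paths + offset;
		s->nr = nr - offset < chunk ? nr - offset : chunk;
		s->ok = 0;
		offset += s->nr;
		s->threaded = !pthread_create(&tid[i], NULL, stat_worker, s);
		if (!s->threaded)
			stat_worker(s); /* fall back to running this slice inline */
	}
	assert(offset == nr); /* every path was assigned to exactly one slice */

	for (i = 0; i < used; i++) {
		if (slice[i].threaded)
			pthread_join(tid[i], NULL);
		ok += slice[i].ok;
	}
	return ok;
}
```

Build with -pthread. For a meaningful comparison the file system cache would have to be cold before each run, and nr_threads = 1 gives the serial baseline.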

Thanks,

Stefan


Re: Windows performance / threading file access

2013-10-10 Thread Duy Nguyen
On Fri, Oct 11, 2013 at 12:35 PM, Stefan Zager sza...@google.com wrote:
 For my test, I first run 'git clone -n repo', and then measure the
 running time of 'git checkout --force HEAD'.  On linux, the checkout
 command runs in 0:12; on Windows, it's about 3:30.

try

git read-tree HEAD
git ls-files | xargs -P=XXX -n= git checkout-index

That should give you a rough idea how much gain (or loss) by parallelization
-- 
Duy