Re: [msysGit] Re: Windows performance / threading file access
On 22.10.2013 00:58, pro-logic wrote:

>> The trace_performance functions require manual instrumentation of the
>> code sections you want to measure
>
> Ahh a case of RTFM :)

Could you post details about your test setup? Are you still using WebKit for your tests?

> I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
> scanner, truecrypt, no defragger.

OK, so truecrypt and luafv may screw things up for you (according to my measurements, luafv roughly doubles lstat times on C:).

> I've tried to be a bit smarter with the intent of my code, and this is
> what I came up with.
>
> diff --git a/cache.h b/cache.h
> index 4bf19e3..2e9fb1f 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -294,7 +294,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define active_cache_changed (the_index.cache_changed)
>  #define active_cache_tree (the_index.cache_tree)
> -#define read_cache() read_index(&the_index)
> +#define read_cache() read_index_preload(&the_index, NULL)
>  #define read_cache_from(path) read_index_from(&the_index, (path))
>  #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
>  #define is_cache_unborn() is_index_unborn(&the_index)
>
> diff --git a/read-cache.c b/read-cache.c
> index c3d5e35..5fb2788 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1866,7 +1866,7 @@ int read_index_unmerged(struct index_state *istate)
>  	int i;
>  	int unmerged = 0;
>
> -	read_index(istate);
> +	read_index_preload(istate, NULL);
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct cache_entry *new_ce;
>
> Ahh, I thought that you had enabled fscache during the entire checkout.
> Interestingly, when I run on a cleanly checked out blink repo my changes
> seem to make matters worse in terms of performance, but when working on
> a repo with ignored files in it, they seem to work better. So for a
> point of comparison I decided to run it on a repo with ignored files in
> the working tree, in this case msysgit/git after a 'make install'.
> When I get a few hours I'll try to build blink and re-run the numbers
> on a much, much larger repo.
>
> This comparison is an average of 3 cold-cache runs of kb/fscache-v4 [a]
> vs kb/fscache-v4 with my above changes applied [b], with preloadindex
> and fscache set to true. For comparison:
>
> git status -s
> [a] 3.02s
> [b] 2.92s
>
> git reset --hard head
> [a] 3.67s
> [b] 3.09s

These numbers look far too good, so you don't actually do a fresh checkout, do you? I mean, delete all files except .git; killcache; git reset --hard / git checkout -f? That would also explain your 95% lstat times, if there's nothing to do...

> git add -u
> [a] 2.89s
> [b] 2.08s
>
> I noticed something interesting. Preload index uses 20 threads to do
> the work. When I was keeping an eye on them in task manager, some
> threads would finish quite quickly, while others would run a lot
> longer. The way I understand the code at the moment, the threads get
> equal chunks of work to perform. It's quite likely that even more
> performance could be obtained out of preload if the work splitting was
> 'smarter'. My current best idea would be to use something like a
> lock-free queue to queue up the work and let the threads take work off
> the queue. That way all threads stay busy with work for longer. A
> candidate for the implementation would be the liblfds [1] queue.
> However, my issue with this library, and the reason I haven't tried to
> integrate it, is simply that the code expressly has no license.

As cache/cache_nr are not modified by the threads, you actually don't need a lock-free queue. An atomic counter shared by all threads should suffice (i.e. pthread's equivalent to InterlockedIncrement/InterlockedAdd).

Karsten

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [msysGit] Re: Windows performance / threading file access
On Tue, Oct 22, 2013 at 4:30 PM, Karsten Blees karsten.bl...@gmail.com wrote:

>> Could you post details about your test setup? Are you still using
>> WebKit for your tests?
>>
>>> I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
>>> scanner, truecrypt, no defragger.
>>
>> OK, so truecrypt and luafv may screw things up for you (according to
>> my measurements, luafv roughly doubles lstat times on C:).

Aren't we disabling UAC / LUAFV on a per-executable basis using manifests? At least the blog article at [1] suggests that we are in fact doing it the right way using our script to generate the manifests [2].

Oh, but wait, we're not generating a manifest for git.exe itself, only for executables that contain "setup", "install", "update", "patch" etc. So maybe having a manifest for git.exe, too, would improve performance?

[1] http://blogs.msdn.com/b/alexcarp/archive/2009/06/25/the-deal-with-luafv-sys.aspx
[2] https://github.com/msysgit/msysgit/blob/master/share/msysGit/make-manifests.sh

--
Sebastian Schuberth
Re: [msysGit] Re: Windows performance / threading file access
On 22.10.2013 16:49, Sebastian Schuberth wrote:

>>> Could you post details about your test setup? Are you still using
>>> WebKit for your tests?
>>>
>>>> I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
>>>> scanner, truecrypt, no defragger.
>>>
>>> OK, so truecrypt and luafv may screw things up for you (according to
>>> my measurements, luafv roughly doubles lstat times on C:).
>
> Aren't we disabling UAC / LUAFV on a per-executable basis using
> manifests? At least the blog article at [1] suggests that we are in
> fact doing it the right way using our script to generate the manifests
> [2]. Oh, but wait, we're not generating a manifest for git.exe itself,
> only for executables that contain "setup", "install", "update", "patch"
> etc. So maybe having a manifest for git.exe, too, would improve
> performance?

Even with UAC disabled in control panel, the luafv.sys driver slows things down on C: (no impact on non-system drives). Procmon shows that with luafv disabled, GetFileAttributesEx is a single FASTIO call. With luafv running, FASTIO fails and is followed by three IRP calls (open, query, close).

I haven't tried with UAC enabled, or whether disabling virtualization for git.exe has an impact.

Karsten
Re: Windows performance / threading file access
On 16.10.2013 00:22, pro-logic wrote:

> I also get fairly slow performance out of the checkout / reset
> operations on windows. This discussion got me trying to work out what's
> taking so long on windows. To help, I used killcache [1] to flush the
> HDD cache and Very Sleepy [2] to profile the code. I couldn't use the
> GIT_TRACE_PERFORMANCE [3] patch as that seems to only work on script
> commands, and in my case I just get a result of "335 seconds git reset
> --hard head" from the log.

The trace_performance functions require manual instrumentation of the code sections you want to measure, e.g. like this [1]. Output looks like this for a full WebKit checkout (Win7 x64, Core i7 860, WD VelociRaptor 300, NTFS, no virus scanner, no luafv, no defragger):

trace: at entry.c:128, time: 135.786 s: write_entry::create
trace: at entry.c:129, time: 101.6 s: write_entry::stream
trace: at entry.c:130, time: 0 s: write_entry::read
trace: at entry.c:131, time: 0 s: write_entry::convert
trace: at entry.c:132, time: 0 s: write_entry::write
trace: at entry.c:133, time: 4.71825 s: write_entry::close
trace: at compat/mingw.c:2150, time: 5.68786 s: mingw_lstat (called 661660 times)
trace: at compat/mingw.c:2151, time: 259.219 s: command: c:\git\msysgit\git\git-checkout.exe -f HEAD

> After running killcache I ran very sleepy connected to git, and
> according to the profile: 95.5% of the time is spent in do_lstat
> (mingw.c) / NtQueryFullAttributeFile (ntdll)

Very Sleepy confirmed my numbers from above: lstat was always much smaller than create/stream/read/write. Could you post details about your test setup? Are you still using WebKit for your tests?

> For fun, not knowing if I would break anything or not (it probably
> does), I wrapped the entire unpack_trees method in the fscache [4] and
> the total git reset --hard head time fell from 335 seconds to 28
> seconds, an 11x improvement.

Hmmm... this doesn't work for me at all.
Fscache isn't updated during checkout, so lstat checks of whether creating a file or directory succeeded will fail.

$ git config core.fscache true
$ time git checkout -f HEAD
Unlink of file 'Examples' failed. Should I try again? (y/n) n
warning: unable to unlink Examples: Permission denied
fatal: cannot create directory at 'Examples': Permission denied

Karsten

[1] https://github.com/kblees/git/commit/b8eca278
Windows performance / threading file access
Hi folks,

I don't follow the mailing list carefully, so forgive me if this has been discussed before, but: I've noticed that when working with a very large repository using msys git, the initial checkout of a cloned repository is excruciatingly slow (80%+ of total clone time). The root cause, I think, is that git does all the file access serially, and that's really slow on Windows.

Has anyone considered threading file access to speed this up? In particular, I've got my eye on this loop in unpack-trees.c:

static struct checkout state;

static int check_updates(struct unpack_trees_options *o)
{
	unsigned cnt = 0, total = 0;
	struct progress *progress = NULL;
	struct index_state *index = &o->result;
	int i;
	int errs = 0;

	...

	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];

		if (ce->ce_flags & CE_UPDATE) {
			display_progress(progress, ++cnt);
			ce->ce_flags &= ~CE_UPDATE;
			if (o->update && !o->dry_run) {
				errs |= checkout_entry(ce, &state, NULL);
			}
		}
	}
	stop_progress(&progress);
	if (o->update)
		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
	return errs != 0;
}

Any thoughts on adding threading around the call to checkout_entry?

Thanks in advance,

Stefan
Re: Windows performance / threading file access
Please keep in mind to CC the msysgit mailing list for Windows-specific stuff. I'm also CC'ing Karsten, who has worked on performance improvements for Git for Windows in the past.

Thanks for bringing this up!

--
Sebastian Schuberth

> Hi folks,
>
> I don't follow the mailing list carefully, so forgive me if this has
> been discussed before, but: I've noticed that when working with a very
> large repository using msys git, the initial checkout of a cloned
> repository is excruciatingly slow (80%+ of total clone time). The root
> cause, I think, is that git does all the file access serially, and
> that's really slow on Windows.
>
> Has anyone considered threading file access to speed this up? In
> particular, I've got my eye on this loop in unpack-trees.c:
>
> static struct checkout state;
>
> static int check_updates(struct unpack_trees_options *o)
> {
> 	unsigned cnt = 0, total = 0;
> 	struct progress *progress = NULL;
> 	struct index_state *index = &o->result;
> 	int i;
> 	int errs = 0;
>
> 	...
>
> 	for (i = 0; i < index->cache_nr; i++) {
> 		struct cache_entry *ce = index->cache[i];
>
> 		if (ce->ce_flags & CE_UPDATE) {
> 			display_progress(progress, ++cnt);
> 			ce->ce_flags &= ~CE_UPDATE;
> 			if (o->update && !o->dry_run) {
> 				errs |= checkout_entry(ce, &state, NULL);
> 			}
> 		}
> 	}
> 	stop_progress(&progress);
> 	if (o->update)
> 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> 	return errs != 0;
> }
>
> Any thoughts on adding threading around the call to checkout_entry?
>
> Thanks in advance,
>
> Stefan
Re: Windows performance / threading file access
On 10.10.2013 22:19, Sebastian Schuberth wrote:

> Please keep in mind to CC the msysgit mailing list for Windows-specific
> stuff. I'm also CC'ing Karsten, who has worked on performance
> improvements for Git for Windows in the past.

Thanks

> Thanks for bringing this up!
>
> --
> Sebastian Schuberth
>
>> Hi folks,
>>
>> I don't follow the mailing list carefully, so forgive me if this has
>> been discussed before, but: I've noticed that when working with a very
>> large repository using msys git, the initial checkout of a cloned
>> repository is excruciatingly slow (80%+ of total clone time). The root
>> cause, I think, is that git does all the file access serially, and
>> that's really slow on Windows.

What exactly do you mean by excruciatingly slow? I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy /s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD (WD Caviar Green) the Windows checkout took ~9 minutes. That's not so bad, I think, considering that we read from pack files and write both files and directory structures, so there's a lot of disk seeking involved.

If your numbers are much slower, check for overeager virus scanners and probably the infamous User Account Control. (On Vista/7 (8?), the luafv.sys driver slows down things on the system drive even with UAC turned off in control panel. The driver can be disabled with "sc config luafv start= disabled" + reboot. Reenable with "sc config luafv start= auto".)

>> Has anyone considered threading file access to speed this up? In
>> particular, I've got my eye on this loop in unpack-trees.c:

It's probably worth a try; however, in my experience, doing disk IO in parallel tends to slow things down due to more disk seeks.
I'd rather try to minimize seeks, e.g.:

* read the blob data for a block of cache_entries, then write out the files, repeat (this would require lots of memory, though)

* index->cache is typically sorted by name and pack files by size, right? Perhaps it's faster to iterate cache_entries by size so that we read the pack file sequentially (but then we'd write files/directories in random order...)

If you want to measure exactly which part of checkout eats the performance, check out this: https://github.com/kblees/git/commits/kb/performance-tracing-v3

Bye,
Karsten
Re: Windows performance / threading file access
On Thu, Oct 10, 2013 at 5:51 PM, Karsten Blees karsten.bl...@gmail.com wrote:

>> I've noticed that when working with a very large repository using msys
>> git, the initial checkout of a cloned repository is excruciatingly
>> slow (80%+ of total clone time). The root cause, I think, is that git
>> does all the file access serially, and that's really slow on Windows.
>
> What exactly do you mean by excruciatingly slow? I just ran a few tests
> with a big repo (WebKit, ~2GB, ~200k files). A full checkout with git
> 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy /s took ~4
> minutes (so xcopy is much slower than git). On a 'real' HD (WD Caviar
> Green) the Windows checkout took ~9 minutes.

I'm using blink for my test, which should be more or less indistinguishable from WebKit. I'm using a standard spinning disk, no SSD. For my purposes, I need to optimize this for standard-ish hardware, not best-in-class.

For my test, I first run 'git clone -n repo', and then measure the running time of 'git checkout --force HEAD'. On linux, the checkout command runs in 0:12; on Windows, it's about 3:30.

> If your numbers are much slower, check for overeager virus scanners and
> probably the infamous User Account Control. (On Vista/7 (8?), the
> luafv.sys driver slows down things on the system drive even with UAC
> turned off in control panel. The driver can be disabled with "sc config
> luafv start= disabled" + reboot. Reenable with "sc config luafv start=
> auto".)

I confess that I am pretty ignorant about Windows, so I'll have to research these.

>> Has anyone considered threading file access to speed this up? In
>> particular, I've got my eye on this loop in unpack-trees.c:
>
> It's probably worth a try; however, in my experience, doing disk IO in
> parallel tends to slow things down due to more disk seeks. I'd rather
> try to minimize seeks, ...

In my experience, modern disk controllers are very, very good at this; it rarely, if ever, makes sense to try and outsmart them.
But, from talking to Windows-savvy people, I believe the issue is not disk seek time, but rather the fact that Windows doesn't cache file stat information. Instead, it goes all the way to the source of truth (i.e., the physical disk) every time it stats a file or directory. That's what causes the checkout to be so slow: all those file stats run serially.

Does that sound right? I'm prepared to be wrong about this; but if no one has tried it, then it's probably at least worth an experiment.

Thanks,

Stefan
Re: Windows performance / threading file access
On Fri, Oct 11, 2013 at 12:35 PM, Stefan Zager sza...@google.com wrote:

> For my test, I first run 'git clone -n repo', and then measure the
> running time of 'git checkout --force HEAD'. On linux, the checkout
> command runs in 0:12; on Windows, it's about 3:30.

try

git read-tree HEAD
git ls-files | xargs -P <num_procs> -n <num_files> git checkout-index

That should give you a rough idea of how much you gain (or lose) from parallelization.
--
Duy