Am 10.10.2013 22:19, schrieb Sebastian Schuberth:
> Please keep in mind to CC the msysgit mailing list for Windows-specific 
> stuff. I'm also CC'ing Karsten who has worked on performance improvements for 
> Git for Windows in the past.
> 

Thanks

> Thanks for bringing this up!
> 
> -- 
> Sebastian Schuberth
> 
> 
>> Hi folks,
>>
>> I don't follow the mailing list carefully, so forgive me if this has
>> been discussed before, but:
>>
>> I've noticed that when working with a very large repository using msys
>> git, the initial checkout of a cloned repository is excruciatingly
>> slow (80%+ of total clone time).  The root cause, I think, is that git
>> does all the file access serially, and that's really slow on Windows.
>>

What exactly do you mean by "excruciatingly slow"?

I just ran a few tests with a big repo (WebKit, ~2GB, ~200k files). A full 
checkout with git 1.8.4 on my SSD took 52s on Linux and 81s on Windows. Xcopy 
/s took ~4 minutes (so xcopy is much slower than git). On a 'real' HD (WD 
Caviar Green) the Windows checkout took ~9 minutes.

That's not so bad, I think, considering that we read from pack files and write both files and directory structures, so there's a lot of disk seeking involved.

If your numbers are much slower, check for overeager virus scanners and possibly the infamous "User Account Control": on Vista/7 (8?), the luafv.sys driver slows things down on the system drive even with UAC turned off in the control panel. The driver can be disabled with "sc config luafv start= disabled" followed by a reboot, and re-enabled with "sc config luafv start= auto".

>> Has anyone considered threading file access to speed this up?  In
>> particular, I've got my eye on this loop in unpack-trees.c:
>>

It's probably worth a try. However, in my experience, doing disk I/O in parallel tends to slow things down due to the additional disk seeks.
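
For illustration, here is a minimal, self-contained sketch of the pattern under discussion: a fixed pool of pthreads writing a batch of files, with a mutex-protected counter handing out the work. This is not Git code; the file names, sizes and counts are made up, and hooking something like this into check_updates()/checkout_entry() would additionally require making the object-reading paths thread-safe, which they currently aren't.

/* parallel-write.c - illustration only: write NUM_FILES small files
 * from a pool of worker threads.
 * Compile with: gcc -pthread parallel-write.c
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_FILES   1000
#define NUM_THREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_file;	/* index of the next file to write */

static void *worker(void *data)
{
	char name[64], buf[4096];

	(void)data;	/* unused */
	memset(buf, 'x', sizeof(buf));

	for (;;) {
		FILE *f;
		int i;

		/* grab the index of the next file to write */
		pthread_mutex_lock(&lock);
		i = next_file++;
		pthread_mutex_unlock(&lock);
		if (i >= NUM_FILES)
			break;

		snprintf(name, sizeof(name), "out-%04d.tmp", i);
		f = fopen(name, "wb");
		if (!f) {
			perror(name);
			continue;
		}
		fwrite(buf, 1, sizeof(buf), f);
		fclose(f);
	}
	return NULL;
}

int main(void)
{
	pthread_t th[NUM_THREADS];
	int i;

	for (i = 0; i < NUM_THREADS; i++)
		pthread_create(&th[i], NULL, worker, NULL);
	for (i = 0; i < NUM_THREADS; i++)
		pthread_join(th[i], NULL);
	return 0;
}

Whether this helps or hurts depends entirely on how the filesystem and disk schedule the writes; comparing NUM_THREADS 1 vs. 4 on the drive in question is an easy way to find out, and on a spinning disk the parallel version may well lose for exactly the seek reasons above.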

I'd rather try to minimize seeks, e.g.:

* read the blob data for a block of cache_entries, then write out the files, 
repeat (this would require lots of memory, though)

* index->cache is typically sorted by name and pack files by size, right? 
Perhaps its faster to iterate cache_entries by size so that we read the pack 
file sequentially (but then we'd write files/directories in random order...)
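
To make the second idea concrete, here is a tiny, self-contained example of sorting work items by their pack offset before processing them. The work_item struct, paths and offsets are invented for the example; in Git one would first have to look up each cache entry's offset via the pack index before sorting.

/* sort-by-offset.c - illustration of ordering work items by their
 * position in a pack file so the pack can be read sequentially.
 */
#include <stdio.h>
#include <stdlib.h>

struct work_item {
	const char *path;	/* where the file will be written */
	unsigned long offset;	/* offset of the blob in the pack file */
};

static int by_offset(const void *a, const void *b)
{
	const struct work_item *x = a, *y = b;
	if (x->offset < y->offset)
		return -1;
	return x->offset > y->offset;
}

int main(void)
{
	struct work_item items[] = {
		{ "Makefile",      91234 },
		{ "cache.h",        1024 },
		{ "unpack-trees.c", 4096 },
	};
	int i, n = sizeof(items) / sizeof(items[0]);

	/* read blobs in pack order, even though files were listed by name */
	qsort(items, n, sizeof(items[0]), by_offset);

	for (i = 0; i < n; i++)
		printf("%lu\t%s\n", items[i].offset, items[i].path);
	return 0;
}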


If you want to measure exactly where the time goes during checkout, check out 
this: https://github.com/kblees/git/commits/kb/performance-tracing-v3

Bye,
Karsten