Re: git grep parallelism question
torva...@linux-foundation.org wrote on Fri, 26 Apr 2013 13:31 -0700:
> Anyway, I think your patch is good if for no other reason than that it
> allows this kind of testing, but at least for my machine, clearly the
> current default of eight threads is actually good enough.
>
> Maybe somebody with a very different machine might want to run the
> above script and see how sensitive other machines are to this
> parameter..

NFS numbers behave as expected: IO concurrency is key.

WARM
 1 real 0m23.147s
 2 real 0m13.913s
 4 real 0m6.958s
 8 real 0m4.104s
16 real 0m3.588s
32 real 0m3.212s
64 real 0m3.173s

COLD
 1 real 1m36.969s
 2 real 0m51.627s
 4 real 0m32.994s
 8 real 0m25.657s
16 real 0m21.260s
32 real 0m18.138s
64 real 0m17.265s

I am tempted to change the default locally from 8 to 32.

-- 
Pete
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git grep parallelism question
On Mon, Apr 29, 2013 at 03:22:24PM -0700, Junio C Hamano wrote:
> John Keeping <j...@keeping.me.uk> writes:
> > No, I was the one missing something (--root to be precise).  But with
> > TEST_OUTPUT_DIRECTORY you also get the result files in your temporary
> > location, not just the trash directory.
>
> With your patch, doesn't "t-*.sh --root $there" automatically use the
> fast $there temporary location as the result depot, too?

No, the current code uses:

    $TEST_OUTPUT_DIRECTORY/$root/trash\ directory.t

where we don't prepend $TEST_OUTPUT_DIRECTORY/ if $root is absolute.

> If it doesn't with the current code, shouldn't it?

I think the current behaviour is fine and the two options complement
each other.  TEST_OUTPUT_DIRECTORY is something you set once and forget
about, which says "all of the test output should go over here", whereas
--root is passed to a specific test and says "put your output here" but
does not affect the result aggregation, which is not specific to that
test.

Note that setting TEST_OUTPUT_DIRECTORY in config.mak affects all tests
no matter how you run them (via make or as ./t-.sh) whereas setting
--root=... in GIT_TEST_OPTS only affects tests run via make.
Re: git grep parallelism question
On Tue, Apr 30, 2013 at 09:08:49AM +0100, John Keeping wrote:
> > With your patch, doesn't "t-*.sh --root $there" automatically use the
> > fast $there temporary location as the result depot, too?
>
> No, the current code uses:
>
>     $TEST_OUTPUT_DIRECTORY/$root/trash\ directory.t
>
> where we don't prepend $TEST_OUTPUT_DIRECTORY/ if $root is absolute.
>
> > If it doesn't with the current code, shouldn't it?
>
> I think the current behaviour is fine and the two options complement
> each other.  TEST_OUTPUT_DIRECTORY is something you set once and
> forget about, which says "all of the test output should go over here",
> whereas --root is passed to a specific test and says "put your output
> here" but does not affect the result aggregation, which is not
> specific to that test.

The original intent of --root (and how I use it) is to set and forget
it, too, via GIT_TEST_OPTS.  I intentionally didn't move test results
with it, because to me the point was a pure optimization: put the trash
directories on a faster disk, and leave everything else identical.
With --root, any scripts which later want to look at test-results will
find them in the usual place.  Your patch updates all of the in-tree
spots which look at the results, but any third-party scripts would need
to take it into account, too (though I have no idea if any such scripts
even exist).

I'm curious if there is a good reason to want to move the results.
Some possibilities I can think of are:

  1. More optimization, as results are written to the faster
     filesystem.  I doubt this is noticeable, though, as the amount of
     data written is relatively small compared to the tests themselves
     (which are constantly creating and deleting repos).

  2. You can run tests in a read-only git checkout.  I'm not sure how
     useful that is, though, since you would already need to compile
     git.

  3. You could have multiple sets of test results to keep or compare.
     I'd think you'd want to keep the built versions of git around,
     too, though.  Which would mean that a full checkout like
     git-new-workdir would be a much simpler way to accomplish the same
     thing.

So I'm not against TEST_OUTPUT_DIRECTORY as a concept, but I'm having
trouble seeing how it is more useful than --root.

> Note that setting TEST_OUTPUT_DIRECTORY in config.mak affects all
> tests no matter how you run them (via make or as ./t-.sh) whereas
> setting --root=... in GIT_TEST_OPTS only affects tests run via make.

I actually consider that a feature of --root.  When I run "make test"
everything happens fast.  When I run the script manually (which is
usually because I'm debugging), the trash directory appears in the
current directory, so I can easily investigate it.  And if you are
running a single test, the performance impact is usually negligible
(where you really notice it is when running "make -j32 test").

-Peff
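Peff's two modes above can be sketched concretely; the tmpfs path and the scratch file name here are illustrative, not from the thread:

```shell
# Set-and-forget: every `make test` run picks this up from config.mak
# and puts trash directories on a fast (assumed tmpfs) filesystem,
# while test-results/ stays in t/ where scripts expect it.  We write a
# scratch file rather than the real config.mak.
echo 'GIT_TEST_OPTS = --root=/dev/shm/git-tests' >config.mak.root-example
cat config.mak.root-example

# Debugging a single script by hand would then still leave its trash
# directory next to it (not executed here):
#   cd t && ./t0001-init.sh --verbose --immediate
```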
Re: git grep parallelism question
On Tue, Apr 30, 2013 at 11:59:39AM -0400, Jeff King wrote:
> On Tue, Apr 30, 2013 at 09:08:49AM +0100, John Keeping wrote:
> > > With your patch, doesn't "t-*.sh --root $there" automatically use
> > > the fast $there temporary location as the result depot, too?
> >
> > No, the current code uses:
> >
> >     $TEST_OUTPUT_DIRECTORY/$root/trash\ directory.t
> >
> > where we don't prepend $TEST_OUTPUT_DIRECTORY/ if $root is absolute.
> >
> > > If it doesn't with the current code, shouldn't it?
> >
> > I think the current behaviour is fine and the two options complement
> > each other.  TEST_OUTPUT_DIRECTORY is something you set once and
> > forget about, which says "all of the test output should go over
> > here", whereas --root is passed to a specific test and says "put
> > your output here" but does not affect the result aggregation, which
> > is not specific to that test.
>
> The original intent of --root (and how I use it) is to set and forget
> it, too, via GIT_TEST_OPTS.  I intentionally didn't move test results
> with it, because to me the point was a pure optimization: put the
> trash directories on a faster disk, and leave everything else
> identical.  With --root, any scripts which later want to look at
> test-results will find them in the usual place.  Your patch updates
> all of the in-tree spots which look at the results, but any
> third-party scripts would need to take it into account, too (though I
> have no idea if any such scripts even exist).
>
> I'm curious if there is a good reason to want to move the results.
> Some possibilities I can think of are:
>
>   1. More optimization, as results are written to the faster
>      filesystem.  I doubt this is noticeable, though, as the amount
>      of data written is relatively small compared to the tests
>      themselves (which are constantly creating and deleting repos).
>
>   2. You can run tests in a read-only git checkout.  I'm not sure how
>      useful that is, though, since you would already need to compile
>      git.
>
>   3. You could have multiple sets of test results to keep or compare.
>      I'd think you'd want to keep the built versions of git around,
>      too, though.  Which would mean that a full checkout like
>      git-new-workdir would be a much simpler way to accomplish the
>      same thing.
>
> So I'm not against TEST_OUTPUT_DIRECTORY as a concept, but I'm having
> trouble seeing how it is more useful than --root.

I think the original intent of TEST_OUTPUT_DIRECTORY was to allow other
users of the test framework (in contrib/ or the performance tests) to
put their output in a sensible place for those tests, like you describe
below.

The patch being discussed here [1] just makes sure that it applies to
everything - previously it was applied to test-results/ inconsistently;
test-lib.sh used TEST_OUTPUT_DIRECTORY but the makefile didn't.  So we
haven't actually changed where test-results/ lives as a result of this
change, just where the makefile looks in order to display the aggregate
results and clean them up.

> > Note that setting TEST_OUTPUT_DIRECTORY in config.mak affects all
> > tests no matter how you run them (via make or as ./t-.sh) whereas
> > setting --root=... in GIT_TEST_OPTS only affects tests run via make.
>
> I actually consider that a feature of --root.  When I run "make test"
> everything happens fast.  When I run the script manually (which is
> usually because I'm debugging), the trash directory appears in the
> current directory, so I can easily investigate it.  And if you are
> running a single test, the performance impact is usually negligible
> (where you really notice it is when running "make -j32 test").

This confirms to me that the patch as it currently stands is correct:
we have made TEST_OUTPUT_DIRECTORY consistent and --root still works as
before.

[1] http://article.gmane.org/gmane.comp.version-control.git/222555
Re: git grep parallelism question
On Tue, Apr 30, 2013 at 05:12:08PM +0100, John Keeping wrote:
> > So I'm not against TEST_OUTPUT_DIRECTORY as a concept, but I'm
> > having trouble seeing how it is more useful than --root.
>
> I think the original intent of TEST_OUTPUT_DIRECTORY was to allow
> other users of the test framework (in contrib/ or the performance
> tests) to put their output in a sensible place for those tests, like
> you describe below.
>
> The patch being discussed here [1] just makes sure that it applies to
> everything - previously it was applied to test-results/
> inconsistently; test-lib.sh used TEST_OUTPUT_DIRECTORY but the
> makefile didn't.  So we haven't actually changed where test-results/
> lives as a result of this change, just where the makefile looks in
> order to display the aggregate results and clean them up.

Ah, I see. Thanks, that was the piece I was missing.

> This confirms to me that the patch as it currently stands is correct:
> we have made TEST_OUTPUT_DIRECTORY consistent and --root still works
> as before.
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/222555

Yeah, if it is about harmonizing the Makefile and the test scripts,
that is definitely a bug fix, and the right thing to do.

-Peff
Re: git grep parallelism question
Linus Torvalds wrote:
> Anyway, I think your patch is good if for no other reason than that it
> allows this kind of testing, but at least for my machine, clearly the
> current default of eight threads is actually good enough.
>
> Maybe somebody with a very different machine might want to run the
> above script and see how sensitive other machines are to this
> parameter..

I have four cores with HT, so I expect similar gains from CPU
parallelism (in the hot-cache case).  And the test results meet this
expectation:

 1 real 0m1.287s
 2 real 0m0.635s
 4 real 0m0.460s
 8 real 0m0.413s
16 real 0m0.443s
32 real 0m0.715s

I have a rotating hard disk, and expect the IO parallelism (cold-cache
case) benefits to peak at a larger number of threads (than in your SSD
case).  These are the test results:

  1 real 0m28.566s
  4 real 0m20.361s
  8 real 0m16.990s
 16 real 0m15.278s
 32 real 0m13.710s
 64 real 0m12.405s
128 real 0m11.913s
256 real 0m11.759s
512 real 0m12.022s

It looks like 128~256 threads is the sweet spot.

It's impossible for git to determine if the cache is hot or cold,
correct?  Cold cache is a very rare case, and I'm not sure how we can
optimize specifically for that case.  So, I'm not sure how we can
improve grep.

On a related note, one place that IO parallelism can provide massive
benefits is in executing shell scripts.  Accordingly, I always use the
following commands to compile and test git respectively:

    make -j 8 CFLAGS="-g -O0 -Wall"
    make -j 8 DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="-j 16" test

i.e. always use 8 threads when the task is known to be CPU intensive,
and always use 16 threads when the task is known to be IO intensive.
Re: git grep parallelism question
On Mon, Apr 29, 2013 at 07:35:01PM +0530, Ramkumar Ramachandra wrote:
> On a related note, one place that IO parallelism can provide massive
> benefits is in executing shell scripts.  Accordingly, I always use the
> following commands to compile and test git respectively:
>
>     make -j 8 CFLAGS="-g -O0 -Wall"
>     make -j 8 DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="-j 16" test
>
> i.e. always use 8 threads when the task is known to be CPU intensive,
> and always use 16 threads when the task is known to be IO intensive.

On this tangent, I recently added a TEST_OUTPUT_DIRECTORY line to my
config.mak which points into a tmpfs mount.  Keeping all of the test
repositories in RAM makes the tests significantly faster for me and
works nicely when you have the patches in jk/test-output (without those
patches the individual tests work but the reporting of aggregate
results doesn't).
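A sketch of the setup John describes, with assumed paths (the mount command needs root and is shown commented out; the snippet is written to a scratch file rather than the real config.mak):

```shell
# Mount a tmpfs to hold the test repositories in RAM:
#   sudo mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
# Then point all test output, including test-results/, at it:
cat >config.mak.tmpfs-example <<'EOF'
TEST_OUTPUT_DIRECTORY = /mnt/ramdisk/git-tests
EOF
cat config.mak.tmpfs-example
```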
Re: git grep parallelism question
John Keeping <j...@keeping.me.uk> writes:
> On Mon, Apr 29, 2013 at 07:35:01PM +0530, Ramkumar Ramachandra wrote:
> > On a related note, one place that IO parallelism can provide massive
> > benefits is in executing shell scripts.  Accordingly, I always use
> > the following commands to compile and test git respectively:
> >
> >     make -j 8 CFLAGS="-g -O0 -Wall"
> >     make -j 8 DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="-j 16" test
> >
> > i.e. always use 8 threads when the task is known to be CPU
> > intensive, and always use 16 threads when the task is known to be
> > IO intensive.
>
> On this tangent, I recently added a TEST_OUTPUT_DIRECTORY line to my
> config.mak which points into a tmpfs mount.  Keeping all of the test
> repositories in RAM makes the tests significantly faster for me and
> works nicely when you have the patches in jk/test-output (without
> those patches the individual tests work but the reporting of
> aggregate results doesn't).

But that's been possible for quite some time now, using --root, or am I
missing something?  (Not that the fix as such is a bad idea, but other
readers might not want to wait for it to hit master.)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
Re: git grep parallelism question
On Mon, Apr 29, 2013 at 08:04:10PM +0200, Thomas Rast wrote:
> John Keeping <j...@keeping.me.uk> writes:
> > On Mon, Apr 29, 2013 at 07:35:01PM +0530, Ramkumar Ramachandra wrote:
> > > On a related note, one place that IO parallelism can provide
> > > massive benefits is in executing shell scripts.  Accordingly, I
> > > always use the following commands to compile and test git
> > > respectively:
> > >
> > >     make -j 8 CFLAGS="-g -O0 -Wall"
> > >     make -j 8 DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="-j 16" test
> > >
> > > i.e. always use 8 threads when the task is known to be CPU
> > > intensive, and always use 16 threads when the task is known to be
> > > IO intensive.
> >
> > On this tangent, I recently added a TEST_OUTPUT_DIRECTORY line to my
> > config.mak which points into a tmpfs mount.  Keeping all of the test
> > repositories in RAM makes the tests significantly faster for me and
> > works nicely when you have the patches in jk/test-output (without
> > those patches the individual tests work but the reporting of
> > aggregate results doesn't).
>
> But that's been possible for quite some time now, using --root, or am
> I missing something?

No, I was the one missing something (--root to be precise).  But with
TEST_OUTPUT_DIRECTORY you also get the result files in your temporary
location, not just the trash directory.
Re: git grep parallelism question
John Keeping <j...@keeping.me.uk> writes:
> On Mon, Apr 29, 2013 at 08:04:10PM +0200, Thomas Rast wrote:
> > John Keeping <j...@keeping.me.uk> writes:
> > > On Mon, Apr 29, 2013 at 07:35:01PM +0530, Ramkumar Ramachandra wrote:
> > > > On a related note, one place that IO parallelism can provide
> > > > massive benefits is in executing shell scripts. [...]
> > > On this tangent, I recently added a TEST_OUTPUT_DIRECTORY line to
> > > my config.mak which points into a tmpfs mount.  Keeping all of the
> > > test repositories in RAM makes the tests significantly faster for
> > > me and works nicely when you have the patches in jk/test-output
> > > (without those patches the individual tests work but the reporting
> > > of aggregate results doesn't).
> >
> > But that's been possible for quite some time now, using --root, or
> > am I missing something?
>
> No, I was the one missing something (--root to be precise).  But with
> TEST_OUTPUT_DIRECTORY you also get the result files in your temporary
> location, not just the trash directory.

With your patch, doesn't "t-*.sh --root $there" automatically use the
fast $there temporary location as the result depot, too?

If it doesn't with the current code, shouldn't it?
Re: git grep parallelism question
Linus Torvalds <torva...@linux-foundation.org> writes:
> Anyway, I think your patch is good if for no other reason than that it
> allows this kind of testing, but at least for my machine, clearly the
> current default of eight threads is actually good enough.
>
> Maybe somebody with a very different machine might want to run the
> above script and see how sensitive other machines are to this
> parameter..

I think the last time Duy, Peff, I and others looked into grep
threading (which was a while ago) we basically reached the conclusion
that it's very unstable across machines.  Back then I tested things on
a 2x6-core Xeon OS X machine, and there the performance (for the
hot-cache, worktree case, which should parallelize nicely) flatlines at
5 threads for no apparent reason.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
git grep parallelism question
Since I reboot fairly regularly to test new kernels, I don't *always*
have the kernel source tree in my caches, so I still care about some
cold-cache performance.  And "git grep" tends to be the most noticeable
one.

Now, I have an SSD, and even the cold-cache case takes just five
seconds or so, but that's still something I react to, since the normal
kernel-tree-in-cache case ends up being close enough to instantaneous
(half a second) that when it takes longer I notice.

And I started thinking about it, and our git grep parallelism seems to
be limited to 8.  Which is fine most of the time for CPU parallelism
(although maybe some people with big machines would prefer higher
numbers), but for IO parallelism I thought that maybe we'd like a
higher number...

So I tried it out, and with THREADS set to 32, I get a roughly 15%
performance boost for the cold-cache case (the error bar is high enough
to not give a very precise number, but I see it going from ~4.8-4.9s on
my machine down to 4.2..4.6s).  That's on an SSD, though - the
performance implications might be very different for other use cases
(NFS would likely prefer higher IO parallelism and might show a bigger
improvement, while a rotational disk might end up just thrashing more).

Is this a big deal?  Probably not.  But I did react to how annoying it
was to set the parallelism factor (recompile git with a new number).
Wouldn't it be lovely if it was slightly smarter (something more akin
to the index preloading that takes the number of files into account) or
at least allowed people to set the parallelism explicitly with a
command-line switch?

Right now it disables the parallel grep entirely for UP, for example.
Which makes perfect sense if grep is all about CPU use.  But even UP
might improve from parallel IO..

Linus
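The "even UP might improve from parallel IO" intuition can be demonstrated without touching git at all. Below, xargs -P stands in for git grep's consumer threads; the corpus, paths, and pattern are made up, and none of this is git's actual implementation:

```shell
# Generate a small throwaway corpus, then fan a grep out over 8 worker
# processes, one file per invocation, so several reads can be in flight
# at once -- analogous to the compiled-in THREADS=8.
demo=$(mktemp -d)
for i in $(seq 1 20); do
    printf 'hello needle %d\n' "$i" >"$demo/file$i.txt"
done
matches=$(find "$demo" -name 'file*.txt' -print0 |
    xargs -0 -n 1 -P 8 grep -l needle | wc -l)
echo "found needle in $matches of 20 files"
rm -r "$demo"
```

On a warm cache the fan-out mostly buys CPU parallelism; cold, each worker's read can overlap the others' IO waits, which is the effect Linus is measuring.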
Re: git grep parallelism question
Linus Torvalds <torva...@linux-foundation.org> writes:
> Wouldn't it be lovely if it was slightly smarter (something more akin
> to the index preloading that takes the number of files into account)
> or at least allowed people to set the parallelism explicitly with a
> command-line switch?

Yeah, a reasonable starting point for auto-tuning may be to use the
same kind of parameters and heuristics (i.e. a maximum parallelism of
20 threads, and assume the cost of using an extra thread is the same as
running 500 greps), and then tweak them (for example, a thread cost of
500 may be reasonable for lstat() but it would be way too big for
grep()).

The real issue may be that we do not have a good estimate of how many
paths are involved in the request before starting these threads,
though.
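As a toy illustration of the heuristic Junio sketches, with the parameters borrowed from his message (cap of 20 threads, thread cost of 500); the function name and shape are hypothetical, not git code:

```shell
# threads = paths / cost, clamped to [1, 20] -- the preload-index style
# auto-tuning described above, as shell arithmetic.
grep_threads() {
    nr_paths=$1
    max_threads=20
    thread_cost=500
    n=$((nr_paths / thread_cost))
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt "$max_threads" ]; then n=$max_threads; fi
    echo "$n"
}
grep_threads 45000   # kernel-sized tree: 90 wanted, clamped to 20
grep_threads 1200    # small tree: 2
```

As Junio notes, the catch is that nr_paths isn't cheaply known before the threads start, and the right thread_cost for grep would be much smaller than for lstat().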
Re: git grep parallelism question
On Fri, Apr 26, 2013 at 11:47 AM, Junio C Hamano <gits...@pobox.com> wrote:
> The real issue may be that we do not have a good estimate of how many
> paths are involved in the request before starting these threads,
> though.

Yes.  Also, I'm not sure if the 15% possible improvement in my SSD case
is even worth it for something that in the end isn't necessarily the
common case.

I *suspect* that it might be a much bigger deal on NFS (IO parallelism
really does end up being a big deal sometimes, and caching tends to be
less aggressive too), but on rotational media it might be much less
clear, or even a loss..

Are there people out there who use git grep over NFS and have been
unhappy with performance?  If you are willing to recompile git with a
different THREADS value in builtin/grep.c, then on a Linux client you
can try

    echo 3 > /proc/sys/vm/drop_caches

to largely force cold-cache behavior for testing (I say "largely",
because it won't drop busy/dirty pages, but for git grep kinds of loads
it should be good).  Of course, you need root for it, so..

Linus
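Wrapped up as a guarded helper (the function name is mine; the echo into /proc/sys/vm/drop_caches is exactly what Linus gives above — it needs root, so non-root runs just warn and leave the cache warm):

```shell
# Drop clean page-cache pages so the next git grep runs cold.  sync
# first so fewer dirty pages survive the drop.
drop_caches() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 >/proc/sys/vm/drop_caches
        echo "page cache dropped"
    else
        echo "need root; page cache left warm"
    fi
}
drop_caches
```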
Re: git grep parallelism question
Linus Torvalds <torva...@linux-foundation.org> writes:
> Yes.  Also, I'm not sure if the 15% possible improvement in my SSD
> case is even worth it for something that in the end isn't necessarily
> the common case.

Cold cache being the uncommon case will be forever true, but more and
more people are on SSD, and 15% is not a trivial improvement.

> Are there people out there who use git grep over NFS and have been
> unhappy with performance?  If you are willing to recompile git with a
> different THREADS value in builtin/grep.c,...

OK, you have to recompile at least once for the experiment, so perhaps
building the test binary with this patch may help.

 builtin/grep.c | 30 ++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 159e65d..f635cd5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -26,12 +26,14 @@ static char const * const grep_usage[] = {
 static int use_threads = 1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+#define MAX_THREADS 100
+static int num_threads = 8;
+static pthread_t threads[MAX_THREADS];
 
-/* We use one producer thread and THREADS consumer
- * threads. The producer adds struct work_items to 'todo' and the
- * consumers pick work items from the same array.
+/*
+ * We use one producer thread and THREADS consumer threads. The
+ * producer adds struct work_items to 'todo' and the consumers pick
+ * work items from the same array.
  */
 struct work_item {
 	struct grep_source source;
@@ -205,7 +207,7 @@ static void start_threads(struct grep_opt *opt)
 		strbuf_init(&todo[i].out, 0);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(threads); i++) {
+	for (i = 0; i < num_threads; i++) {
 		int err;
 		struct grep_opt *o = grep_opt_dup(opt);
 		o->output = strbuf_out;
@@ -237,7 +239,7 @@ static int wait_all(void)
 	pthread_cond_broadcast(&cond_add);
 	grep_unlock();
 
-	for (i = 0; i < ARRAY_SIZE(threads); i++) {
+	for (i = 0; i < num_threads; i++) {
 		void *h;
 		pthread_join(threads[i], &h);
 		hit |= (int) (intptr_t) h;
@@ -636,6 +638,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	int i;
 	int dummy;
 	int use_index = 1;
+	int num_threads_explicit = -1;
 	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
 
 	struct option options[] = {
@@ -743,6 +746,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("allow calling of grep(1) (ignored by this build)")),
 		{ OPTION_CALLBACK, 0, "help-all", &options, NULL, N_("show usage"),
 		  PARSE_OPT_HIDDEN | PARSE_OPT_NOARG, help_callback },
+		OPT_INTEGER(0, "threads", &num_threads_explicit,
+			N_("use <n> threads when searching")),
 		OPT_END()
 	};
 
@@ -773,6 +778,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_NO_INTERNAL_HELP);
 	grep_commit_pattern_type(pattern_type_arg, &opt);
 
+	if (MAX_THREADS <= num_threads_explicit) {
+		warning("limiting --threads to %d", MAX_THREADS);
+		num_threads = MAX_THREADS;
+	} else if (num_threads_explicit < 0) {
+		; /* keep num_threads at the compiled-in default */
+	} else {
+		num_threads = num_threads_explicit;
+	}
+
 	if (use_index && !startup_info->have_repository)
 		/* die the same way as if we did it at the beginning */
 		setup_git_directory();
@@ -834,7 +848,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	}
 
 #ifndef NO_PTHREADS
-	if (list.nr || cached || online_cpus() == 1)
+	if ((list.nr || cached || online_cpus() == 1) && num_threads_explicit < 0)
 		use_threads = 0;
 #else
 	use_threads = 0;
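With the patch applied, the kind of thread-count sweep used later in the thread can be driven through the new option. Since --threads only exists in the patched binary, this sketch just prints the commands it would run (the pattern hjahsja is the throwaway one from Linus's test):

```shell
# Emit one timing command per thread count; pipe the output to sh with
# a patched git in PATH to actually run the sweep.
sweep() {
    for n in 1 4 8 16 32 64; do
        echo "time git grep --threads=$n hjahsja >/dev/null"
    done
}
sweep
```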
Re: git grep parallelism question
On Fri, Apr 26, 2013 at 12:19 PM, Junio C Hamano <gits...@pobox.com> wrote:
> OK, you have to recompile at least once for the experiment, so perhaps
> building the test binary with this patch may help.

So here's my experiment on my machine with this patch and the kernel
tree.  First I did the warm-cache case:

    for i in 1 4 8 16 32 64
    do
        echo $i:
        for j in 1 2 3 4
        do
            t=$(sh -c "time git grep --threads=$i hjahsja" 2>&1 | grep real)
            echo $i $t
        done
    done

and the numbers are pretty stable; here's just the summary (best of
four tries for each case):

     1 real 0m0.598s
     4 real 0m0.253s
     8 real 0m0.235s
    16 real 0m0.269s
    32 real 0m0.412s
    64 real 0m0.420s

so for this machine, 8 threads (our old number) is actually optimal
even if it has just four cores (actually, two cores with HT).  I
suspect it's just because the load is slightly unbalanced, so a few
extra threads help.  Looks like anything in the 4-16 thread range is
ok, though.  But 32 threads is clearly too much.

Then I did the exact same thing, but with the

    echo 3 > /proc/sys/vm/drop_caches

just before the timing of the git grep.  Again, summarizing
(best-of-four numbers; the variation wasn't that big):

     1 real 0m17.866s
     4 real 0m6.367s
     8 real 0m4.855s
    16 real 0m4.307s
    32 real 0m4.153s
    64 real 0m4.128s

here the numbers continue to improve up to 64 threads, although the
difference between 32 and 64 is starting to be in the noise.  I suspect
it's a combination of better IO overlap (although not all SSDs will
even improve all that much from overlapping IO past a certain point)
and probably a more noticeable imbalance between threads.

Anyway, for *my* machine and for *this* particular load, I'd say that
we're already pretty close to optimal.  I did some trials just to see,
but the best hot-cache numbers were fairly reliably for 7 or 8 threads.
Looking at the numbers, I can't really convince myself that it would be
worth it to do (say) 12 threads on this machine.

Yes, the cold-cache case improves from the 8-thread case (best-of-four
for 12 threads: 0m4.467s), but the hot-cache case has gotten
sufficiently worse (0m0.251s) that I'm not sure..

Of course, in *absolute* numbers the cold-cache case is so much slower
that a half-second improvement from going to 16 threads might be
considered worth it, because while the hot-cache case gets worse, we
may just not care because it's fast enough that it's not noticeable.

Anyway, I think your patch is good if for no other reason than that it
allows this kind of testing, but at least for my machine, clearly the
current default of eight threads is actually good enough.  Maybe
somebody with a very different machine might want to run the above
script and see how sensitive other machines are to this parameter..

Linus