Pádraig Brady wrote:
> I've not fully analyzed this yet, and I'm not saying it's wrong,
> but the above change seems to have a large effect on thread
> creation when smaller buffers are used (you hinted previously
> that being less aggressive with the amount of mem used by default
> might be appropriate, and I agree).
>
> Anyway with the above I seem to need a buffer size more
> than 10M to have any threads created at all.
>
> Testing the original 4 lines heuristic with the following, shows:
> (note I only get > 4 threads after 4M of input, not 7 for 16 lines
> as indicated in NEWS).
>
> $ for i in $(seq 30); do
>> j=$((2<<$i))
>> yes | head -n$j > t.sort
>> strace -c -e clone sort --parallel=16 t.sort -o /dev/null 2>&1 |
>> join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>> sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>> done
> 4        1
> 8        2
> 16       3
> 32       4
> 64       4
> 128      4
> ...
> 1048576  4
> 2097152  4
> 4194304  8
> 8388608  16
>
> When I restrict the buffer size with '-S 1M', many more threads
> are created (a max of 16 in parallel with the above command)
> 4        1
> 8        2
> 16       3
> 32       4
> 64       4
> 128      4
> 256      4
> 512      4
> 1024     4
> 2048     4
> 4096     4
> 8192     4
> 16384    8
> 32768    12
> 65536    24
> 131072   44
> 262144   84
> 524288   167
> 1048576  332
> 2097152  660
> 4194304  1316
> 8388608  2628
>
> After increasing the heuristic to 128K, I get _no_ threads until -S > 10M
> and this seems to be independent of line length.

Thanks for investigating that.
Could strace -c -e clone be doing something unexpected?
When I run this (without my patch), it would use 8 threads:

  seq 16 > in; strace -ff -o k ./sort --parallel=16 in -o /dev/null

since it created eight k.PID files:

  $ ls -1 k.*|wc -l
  8

Now, for such a small file, it does not call clone at all.
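
For reference, the join/sed step in the quoted loop only pulls the fourth column ("calls") out of the clone row of the strace -c summary. A minimal sketch of the same extraction with awk follows. The helper name clone_calls is hypothetical, and it assumes the usual summary layout (% time, seconds, usecs/call, calls, optional errors, syscall); it changes nothing about what strace itself counts.

```shell
# Hypothetical helper (not part of the quoted loop): read `strace -c`
# summary text on stdin and print the "calls" value of the clone row.
# Assumed column layout: % time, seconds, usecs/call, calls, [errors], syscall.
clone_calls() {
  # The syscall name is the last field; "calls" is the fourth.
  # Print 0 when the summary has no clone row at all.
  awk '$NF == "clone" { print $4; found = 1 } END { if (!found) print 0 }'
}

# Usage, mirroring the quoted loop (strace writes its summary to stderr):
#   strace -c -e clone sort --parallel=16 t.sort -o /dev/null 2>&1 | clone_calls
```

Printing 0 when no clone row is present makes the "no threads at all" case show up as an explicit row in the table instead of a missing one.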