On 16/03/11 12:07, Jim Meyering wrote:
> Pádraig Brady wrote:
>> I've not fully analyzed this yet, and I'm not saying it's wrong,
>> but the above change seems to have a large effect on thread
>> creation when smaller buffers are used (you hinted previously
>> that being less aggressive with the amount of mem used by default
>> might be appropriate, and I agree).
>>
>> Anyway with the above I seem to need a buffer size more
>> than 10M to have any threads created at all.
>>
>> Testing the original 4 lines heuristic with the following, shows:
>> (note I only get > 4 threads after 4M of input, not 7 for 16 lines
>> as indicated in NEWS).
>>
>> $ for i in $(seq 30); do
>>> j=$((2<<$i))
>>> yes | head -n$j > t.sort
>>> strace -c -e clone sort --parallel=16 t.sort -o /dev/null 2>&1 |
>>> join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
>>> sed -n "s/\([0-9]*\) clone/$j\t\1/p"
>>> done
>> 4 1
>> 8 2
>> 16 3
>> 32 4
>> 64 4
>> 128 4
> ...
>> 1048576 4
>> 2097152 4
>> 4194304 8
>> 8388608 16
>>
>> When I restrict the buffer size with '-S 1M', many more threads
>> are created (a max of 16 in parallel with the above command)
>> 4 1
>> 8 2
>> 16 3
>> 32 4
>> 64 4
>> 128 4
>> 256 4
>> 512 4
>> 1024 4
>> 2048 4
>> 4096 4
>> 8192 4
>> 16384 8
>> 32768 12
>> 65536 24
>> 131072 44
>> 262144 84
>> 524288 167
>> 1048576 332
>> 2097152 660
>> 4194304 1316
>> 8388608 2628
>>
>> After increasing the heuristic to 128K, I get _no_ threads until -S > 10M
>> and this seems to be independent of line length.
>
> Thanks for investigating that.
> Could strace -c -e clone be doing something unexpected?
> When I run this (without my patch), it would use 8 threads:
>
>   seq 16 > in; strace -ff -o k ./sort --parallel=16 in -o /dev/null
>
> since it created eight k.PID files:
>
>   $ ls -1 k.*|wc -l
>   8
>
> Now, for such a small file, it does not call clone at all.
Oops, yep, I forgot to add -f to strace. So NEWS is correct.

# SUBTHREAD_LINES_HEURISTIC = 4
$ for i in $(seq 22); do
    j=$((2<<$i))
    yes | head -n$j > t.sort
    strace -f -c -e clone ./sort --parallel=16 t.sort -o /dev/null 2>&1 |
    join --nocheck-order -a1 -o1.4,1.5 - /dev/null |
    sed -n "s/\([0-9]*\) clone/$j\t\1/p"
  done
4 1
8 3
16 7
32 15
64 15
128 15
256 15
512 15
1024 15
2048 15
4096 15
8192 15
16384 15
32768 15
65536 15
131072 15
262144 15
524288 15
1048576 15
2097152 15
4194304 30
8388608 45

# As above, but with the -S1M option added to sort
4 1
8 3
16 7
32 15
64 15
128 15
256 15
512 15
1024 15
2048 15
4096 15
8192 15
16384 30
32768 45
65536 90
131072 165
262144 315
524288 622
1048576 1245
2097152 2475
4194304 4935
8388608 9855

With SUBTHREAD_LINES_HEURISTIC=128K and -S1M, no threads are created at
all, because nlines never gets above 12787 (there seems to be around
80 bytes of overhead per line). Only with -S >= 12M does nlines get
high enough for threads to be created.

cheers,
Pádraig.
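For a rough cross-check of the -S1M numbers, here is a small bash sketch that models the clone() count. The model is an assumption, not code taken from sort.c: a buffer's lines and threads are each split in two, and one extra thread is created per split while more than one thread remains and the slice has at least SUBTHREAD_LINES_HEURISTIC lines. It also assumes that every -S1M buffer except the last holds exactly the 12787 lines mentioned above. The names count_clones and total_clones are hypothetical helpers for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical model (not sort.c's code) of how many clone() calls are
# made while sorting ONE in-memory buffer of NLINES lines with NTHREADS
# threads and a given SUBTHREAD_LINES_HEURISTIC value.
count_clones() {
  local nlines=$1 nthreads=$2 heuristic=$3
  # Sort sequentially: too few threads left, or too few lines to split.
  if (( nthreads <= 1 || nlines < heuristic )); then
    echo 0
    return
  fi
  # Split both lines and threads in two; one new thread takes one half,
  # the current thread handles the other half.
  local nlo=$(( nlines / 2 )) nhi=$(( nlines - nlines / 2 ))
  local tlo=$(( nthreads / 2 )) thi=$(( nthreads - nthreads / 2 ))
  local lo hi
  lo=$(count_clones "$nlo" "$tlo" "$heuristic")
  hi=$(count_clones "$nhi" "$thi" "$heuristic")
  echo $(( 1 + lo + hi ))
}

# Total clones for INPUT lines, assuming each buffer holds CAP lines
# except for a final partial buffer holding the remainder.
total_clones() {
  local input=$1 cap=$2 nthreads=$3 heuristic=$4
  local full=$(( input / cap )) rem=$(( input % cap ))
  local per_full tail=0
  per_full=$(count_clones "$cap" "$nthreads" "$heuristic")
  if (( rem > 0 )); then
    tail=$(count_clones "$rem" "$nthreads" "$heuristic")
  fi
  echo $(( full * per_full + tail ))
}

# Heuristic 4, 16 threads, ~12787 lines per -S1M buffer:
total_clones 524288 12787 16 4       # 41 full buffers x 15, plus 7 for 21 lines
# With the 128K heuristic, a 12787-line buffer is never split:
total_clones 524288 12787 16 131072

# Implied buffer cost per line with -S1M (1M = 1048576 bytes), and the
# buffer needed to hold 128K lines at that cost.
per_line=$(( 1048576 / 12787 ))      # 82 bytes; minus 2 for "y\n" leaves ~80
need=$(( 131072 * per_line ))        # 10747904 bytes, i.e. 10.25 MiB
echo "$per_line $need"
```

Under these assumptions the model reproduces every row of the -S1M table above, including the odd 622 at 524288 lines (41 full buffers at 15 clones each, plus 7 for the 21-line remainder), which is suggestive but does not prove sort works exactly this way. The 82-byte estimate puts the buffer needed for 128K lines at about 10.25 MiB, a little under the 12M observed, so the per-line figure is only approximate.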