Hello,

A question regarding the memory requirements of the parallel sort: it seems that memory usage (resident size) increases with the number of threads used.
It also seems to me (though I have not verified it) that the increased memory usage happens not in the sorting phase, but in the output phase (when writing the sorted results to STDOUT).

I'm wondering if this is the intended behavior, because I'm sorting big files in memory, and with a single-threaded sort the rule of thumb is to use a --buffer-size of 150% of the file size to do the sorting completely in memory, without temporary files.

Here's an example:

=====
## directory without write permission, used as temporary-directory -
## sort will fail if it tries to use temporary files
$ ls -lod /data/gordon/forbidden/
dr-xr-xr-x. 2 gordon 4096 Dec 17 12:03 /data/gordon/forbidden/

## Big file to sort, created with "gensort -a 2000000"
$ ls -lhos /data/gordon/ramdisk/gensort-2m
1.9G -rw-r--r--. 1 gordon 1.9G Dec 17 11:46 /data/gordon/ramdisk/gensort-2m

## Sort with single thread, in-memory - works OK
$ src/sort --parallel=1 -T /data/gordon/forbidden/ -S 4G \
      /data/gordon/ramdisk/gensort-2m > /dev/null

## Sort with two threads, in-memory, still works OK
$ src/sort --parallel=2 -T /data/gordon/forbidden/ -S 4G \
      /data/gordon/ramdisk/gensort-2m > /dev/null

## sort with 16 threads, sort tries to use temporary files,
## meaning 4GB is not enough to sort a 2GB file.
$ src/sort --parallel=16 -T /data/gordon/forbidden/ -S 4G \
      /data/gordon/ramdisk/gensort-2m > /dev/null
src/sort: cannot create temporary file in `/data/gordon/forbidden/': Permission denied
=====

The reason I think it happens in the output phase is that memory usage seems to stay the same while the output file has zero size, and it goes up once the output file starts growing (not a very scientific observation, but still...).

Checking the resident size with "top" shows:

--parallel   RES (GB)
     1         2.8
     2         3.1
     4         3.7
     6         3.9
     8         4.2
    10         4.4
    12         4.5
    14         4.7
    16         4.8
    18         4.9
    20         5
    22         5
    24         5.1
    26         5.2
    28         5.3

If this happens by design, then no problem (perhaps just document it, to warn about the increased memory requirements).

-gordon
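
P.S. In case it helps with reproducing this, below is a rough sketch of how the 150% rule of thumb could be turned into a value for -S. The helper name buffer_size_for is made up for this example, and it assumes GNU sort, which accepts a "b" (bytes) suffix on the -S size. The optional second argument is there only so that percentages larger than 150 can be tried at higher --parallel values.

```shell
#!/bin/sh
# buffer_size_for FILE [PERCENT]
# Hypothetical helper (not part of coreutils): prints PERCENT (default 150)
# of FILE's size in bytes, rounded up to a whole byte.
buffer_size_for() {
    file=$1
    percent=${2:-150}
    # File size in bytes; fail if the file cannot be read.
    size=$(wc -c < "$file") || return 1
    # Integer arithmetic: adding 99 before dividing by 100 rounds up.
    echo $(( (size * percent + 99) / 100 ))
}
```

It could then be used along these lines, with the same file and temporary directory as in the example above:

  src/sort --parallel=1 -T /data/gordon/forbidden/ \
      -S "$(buffer_size_for /data/gordon/ramdisk/gensort-2m)b" \
      /data/gordon/ramdisk/gensort-2m > /dev/null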