On Mon, Jan 29, 2018 at 6:19 PM, hubert depesz lubaczewski <[email protected]> wrote:
> On Sun, Jan 28, 2018 at 02:45:42AM +0100, Ole Tange wrote:
>> On Thu, Jan 25, 2018 at 4:33 PM, hubert depesz lubaczewski
>> You can also use --cat:
>>
>> tar cf - /some/directory | parallel -j 5 --pipe --block 5G --cat
>> --recend '' 'cat {} | ./handle-single-part.sh {#}'
>>
>> This way each block is saved to the tempdir before the job starts. By
>> my limited testing this should make GNU Parallel only keep 1-2 blocks
>> in memory.
>
> So, I did try it.
> To make it as simple as possible, I made source of data:
> dd if=/dev/zero bs=8k count=13107200

Are you sure your tar command can deliver data at that speed
sustained? If not, then you are not doing a real test.

The above command is based on tar _not_ delivering data faster than
the temp file can be saved to the tempdisk. Typically the
tmp-filesystem will be at least as fast as any other filesystem, and
on many systems /tmp is faster than the other filesystems on the
server.

If you really will not use tar to generate the input, then at least
make sure 'dd' only delivers data at the speed that 'tar' would have
(e.g. by using 'pv').

/Ole
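
A rough sketch of the 'pv' throttling mentioned above. The rate of 100M
(bytes per second) and the helper name throttled_source are assumptions
for illustration only; measure what your real tar run sustains and use
that number instead.

```shell
# Assumed sustained rate of the real 'tar' run; measure it and adjust.
# pv -L takes bytes per second and accepts suffixes such as K, M, G.
RATE=100M

# Hypothetical helper: emit COUNT zero-filled 8k blocks, throttled to $RATE.
# -q suppresses pv's progress display so only the data goes down the pipe.
throttled_source() {
    dd if=/dev/zero bs=8k count="$1" 2>/dev/null | pv -q -L "$RATE"
}

# Usage (same parallel invocation as in the quoted mail):
#   throttled_source 13107200 |
#     parallel -j 5 --pipe --block 5G --cat --recend '' \
#       'cat {} | ./handle-single-part.sh {#}'
```

One way to get a realistic number for RATE is probably to run
'tar cf - /some/directory | pv > /dev/null' once and read the average
throughput that pv reports.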
