Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Hello Andrew,

Andrew McGill list2008 at lunch.za.net writes:

>   find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums
>   sort -k2 md5sums > md5sums.sorted

To avoid losing output, use append mode for writing:

  : > ~/md5sums
  find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
  sort -k2 md5sums > md5sums.sorted

This just recently came up in Autoconf:
http://thread.gmane.org/gmane.comp.shells.bash.bugs/11958

Cheers,
Ralf

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
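[Editor's note: Ralf's append-mode fix can be exercised end-to-end against a throwaway tree. This is a minimal sketch, assuming GNU coreutils and findutils; the directory layout and file count are made up for the demonstration. Truncating once up front and then appending with `>>` gives every md5sum process an O_APPEND descriptor, so each whole-line write lands at end-of-file.]

```shell
#!/bin/sh
# Sketch of the append-mode fix: 50 small files hashed by up to 16 parallel
# md5sum processes, all appending to one output file.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/data"
for i in $(seq 1 50); do echo "payload $i" > "$tmp/data/f$i"; done

: > "$tmp/md5sums"          # truncate exactly once, before the workers start
( cd "$tmp/data" &&
  find . -type f -print0 |
    xargs -0 -n 8 --max-procs=16 md5sum >> "$tmp/md5sums" )

wc -l < "$tmp/md5sums"      # one line per file, none lost
```

With the original `>` redirection the workers could clobber each other; with `>>` the count matches the number of input files.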
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
On Thursday 13 November 2008 14:52:44 Ralf Wildenhues wrote:
> Hello Andrew,
>
> Andrew McGill list2008 at lunch.za.net writes:
> >   find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums
> >   sort -k2 md5sums > md5sums.sorted
>
> To avoid losing output, use append mode for writing:
>
>   : > ~/md5sums
>   find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
>   sort -k2 md5sums > md5sums.sorted
>
> This just recently came up in Autoconf:
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/11958

Ah!  I see!  So without O_APPEND, things don't work quite right.

At the risk of drifting off topic - is there ever a benefit in the shell
implementing a '>' redirection with just O_TRUNC, rather than
O_TRUNC | O_APPEND?  Does the output process ever need to seek() back in
stdout?

(If this is off topic, please feel free to flame me, and/or direct me to the
correct forum -- but I did freely send a bug report to the bash folks, even
though I'll bet they're not alone in omitting O_APPEND with O_TRUNC.)  :-)
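[Editor's note: the practical difference between `>` (O_TRUNC only) and `>>` (O_APPEND) is easy to see with a few racing writers. A minimal sketch; the temp file and writer count are illustrative. With `>` each writer's own open truncates the file and writes at offset 0, so only one line can survive; with `>>` every write is appended atomically.]

```shell
#!/bin/sh
out=$(mktemp)

# Each backgrounded subshell opens the file itself with '>': every open
# truncates to zero and every write lands at offset 0, so exactly one
# 3-byte line remains no matter how the writers interleave.
for i in 1 2 3 4; do ( printf 'w%s\n' "$i" > "$out" ) & done
wait
clobbered=$(wc -l < "$out")     # 1

# Truncate once, then let the writers open with '>>' (O_APPEND): every
# short write is appended atomically and all four lines survive.
: > "$out"
for i in 1 2 3 4; do ( printf 'w%s\n' "$i" >> "$out" ) & done
wait
appended=$(wc -l < "$out")      # 4
```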
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
James Youngman wrote:
> This version should be race-free:
>
>   find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
>
> I think that writing into a pipe should be OK, since pipes are
> non-seekable.  However, with pipes in this situation you still have a
> problem if processes try to write more than PIPE_BUF bytes.

You aren't using a pipe there.  What you are doing is having the shell open
the file, then the md5sum processes all inherit that fd so they all share the
same offset.  As long as they write() the entire line at once, the file
pointer will be updated atomically for all processes and the lines from each
process won't clobber each other.
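[Editor's note: the shared-offset behaviour described above can be sketched directly; the two printf writers are illustrative. The parent shell performs one redirection, and both background jobs inherit the same open file description, so their whole-line write()s advance a single shared offset and land one after the other even without O_APPEND.]

```shell
#!/bin/sh
# One '>' redirection, done by the parent subshell; the two backgrounded
# writers inherit that fd and share its offset.
out=$(mktemp)
( printf 'alpha\n' & printf 'beta\n' & wait ) > "$out"

# Both lines are present (in either order); neither clobbered the other,
# because each printf issued its line as a single write().
sort "$out"
```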
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
[ CC ++ [EMAIL PROTECTED] ]

On Tue, Nov 11, 2008 at 2:58 PM, Andrew McGill [EMAIL PROTECTED] wrote:
> What would you expect this to do --:
>   find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums

Produce a race condition :)  It generates 16 parallel processes, each writing
to the md5sums file.  Unfortunately sometimes the writes occur at the same
offset in the output file.  To illustrate:

  ~$ strace -f -e open,fork,execve sh -c 'echo hello > foo'
  execve("/bin/sh", ["sh", "-c", "echo hello > foo"], [/* 39 vars */]) = 0
  [...]
  open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

  ~$ strace -f -e open,fork,execve sh -c 'echo hello >> foo'
  execve("/bin/sh", ["sh", "-c", "echo hello >> foo"], [/* 39 vars */]) = 0
  [...]
  open("foo", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

This version should be race-free:

  find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1

I think that writing into a pipe should be OK, since pipes are non-seekable.
However, with pipes in this situation you still have a problem if processes
try to write more than PIPE_BUF bytes.

> Is there a correct way to do md5sums in parallel without having a shared
> output buffer which eats output (I presume) -- or is losing output when
> haphazardly combining output streams actually strange and unusual?

I hope the solution above solved your problem - and please follow up if so.
This example is probably worthy of being mentioned in the xargs
documentation, too.

Thanks for your comment!

James.
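[Editor's note: James's PIPE_BUF point can be sketched as follows; the worker names are made up. POSIX guarantees that a write() to a pipe of at most PIPE_BUF bytes (at least 512, 4096 on Linux) is atomic, so short whole-line writes from concurrent producers never interleave mid-line.]

```shell
#!/bin/sh
# Four background writers share one pipe into sort.  Each printf issues a
# single short write(), well under PIPE_BUF, so every line arrives intact.
merged=$( ( for i in 1 2 3 4; do printf 'worker-%s\n' "$i" & done; wait ) | sort )
echo "$merged"
```

Writers that emit more than PIPE_BUF bytes per write() lose this guarantee, which is why merging large outputs this way still needs care.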
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
On Saturday 08 November 2008 20:05:25 Jim Meyering wrote:
> Andrew McGill [EMAIL PROTECTED] wrote:
> > Greetings coreutils folks,
> >
> > There are a number of interesting filesystems (glusterfs, lustre? ...
> > NFS) which could benefit from userspace utilities doing certain
> > operations in parallel.  (I have a very slow glusterfs installation
> > that makes me think that some things can be done better.)
> >
> > For example, copying a number of files is currently done in series ...
> >   cp a b c d e f g h dest/
> > but, on certain filesystems, it would be roughly twice as efficient if
> > implemented in two parallel threads, something like:
> >   cp a c e g dest/
> >   cp b d f h dest/
> > since the source and destination files can be stored on multiple
> > physical volumes.
>
> How about parallelizing it via xargs, e.g.,
>
>   $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
>       --max-procs=2 -- cp --target-directory=dest
>   cp --target-directory=dest a b c d
>   cp --target-directory=dest e f g h
>
> Obviously the above is tailored (-L4) to your 8-input example.
> In practice, you'd use a larger number, unless latency is so high as to
> dwarf the cost of the extra fork/exec syscalls, in which case even -L1
> might make sense.

I did the command above with md5sum as the command, and got missing lines in
the output.  I optimistically hoped that would not happen!

> mv and ln also accept the --target-directory=dest option.
>
> > Similarly, ls -l . will readdir(), and then stat() each file in the
> > directory.  On a filesystem with high latency, it would be faster to
> > issue the stat() calls asynchronously, and in parallel, and then
> > collect the results for display.  (This could improve performance for
> > NFS, in proportion to the latency and the number of threads.)
>
> If you can demonstrate a large performance gain on systems that many
> people use, then maybe...  There is more than a little value in keeping
> programs like those in the coreutils package relatively simple, but if
> the cost(maintenance+portability burden)/benefit ratio is low enough,
> then anything is possible.  For example, a well-encapsulated,
> optionally-threaded stat_all_dir_entries API might be useful in some
> situations.

So a relatively small change for parallel stat() in ls could fly.

> If getting any eventual patch into upstream coreutils is important to
> you, be sure there is some consensus on this list before doing a lot of
> work on it.

Any ideas on how to do a parallel cp / mv in a way that is not Considered
Harmful?  Maybe prefetch_files(max_bytes,file1,...,NULL) ... aargh.

> > Question:  Is there already a set of improved utilities that implement
> > this kind of technique?
>
> Not that I know of.
>
> > If not, would this kind of performance enhancement be considered
> > useful?
>
> It's impossible to say without knowing more.

On the (de?)merits of xargs for parallel processing:  What would you expect
this to do --:

  find -type f -print0 | xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums
  sort -k2 md5sums > md5sums.sorted

Compared to this?

  find -type f -print0 | xargs -0 md5sum > ~/md5sums
  sort -k2 md5sums > md5sums.sorted

I was a little surprised that on my system running in parallel (the first
version) loses around 1 line of output per thousand (md5sum of 22Gb in
mostly small files).

Is there a correct way to do md5sums in parallel without having a shared
output buffer which eats output (I presume) -- or is losing output when
haphazardly combining output streams actually strange and unusual?
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
On Sun, Nov 9, 2008 at 11:06 PM, Dr. David Alan Gilbert [EMAIL PROTECTED] wrote:
> I keep wondering if the OS level needs a better interface; an 'openv' or
> 'statv' or I'm currently wondering if a combined call would work -
> something which would stat a path, if it's a normal file, open it, read
> up to a buffer's worth and if finished close it - it might work nicely
> for small files.

I suspect that a combined call would not be widely useful, though it would
likely provide a useful speedup for your use case.  I suspect that the
statv/openv combination would fit more use-cases.  A statv function could be
useful for anything that uses fts for example (rm, find, ...) and for
file-open dialogue boxes.  I have to say though that I've never used writev.

However, people continue to design more advanced filesystems; the filesystem
knows a lot about how the data is arranged and (therefore) the optimal order
in which to perform operations.  The application knows a lot about the set
of operations it plans to execute, too.  However, these two pieces of
software communicate through a small keyhole: the POSIX file API.

I'm not clear though on what nature of API might be more generally useful
for a wide class of programs; existing programs after all are designed in
ways that work well with the existing operating system interfaces.  Perhaps
this overcomplicates the issue though, since not many programs interact with
more than a few dozen files and therefore probably wouldn't need to adopt a
more complex API.

Thanks,
James.
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
* Andrew McGill ([EMAIL PROTECTED]) wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel.  (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
>   cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
>   cp a c e g dest/
>   cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.

Of course you can't do that by hand since each might be a directory with an
unbalanced number of files etc - so you are right, something smarter is
needed (my pet hate is 'tar' or 'cp' working its way through a source tree
of thousands of small files).

> Similarly, ls -l . will readdir(), and then stat() each file in the
> directory.  On a filesystem with high latency, it would be faster to issue
> the stat() calls asynchronously, and in parallel, and then collect the
> results for display.  (This could improve performance for NFS, in
> proportion to the latency and the number of threads.)

I think, as you are suggesting, you have to end up doing threading in the
userland code, which to me seems to be mad since the code doesn't really
know how wide to go and it's a fair overhead.  In addition this behaviour
can be really bad if you get it wrong - for example if 'dest' is a single
disc then having multiple writers writing two large files leads to
fragmentation on many filesystems.  I once tried to write a backup system
that streamed data from 10's of machines, trying to write a few MB at a time
on Linux, each machine being a separate process; unfortunately the kernel
was too smart and ended up writing a few KB from each process before moving
on to the next, leading to *awful* throughput.

> Question:  Is there already a set of improved utilities that implement
> this kind of technique?  If not, would this kind of performance
> enhancement be considered useful?  (It would mean introducing threading
> into programs which are currently single-threaded.)
>
> One could also optimise the text utilities like cat by doing the open()
> and stat() operations in parallel and in the background -- userspace
> read-ahead caching.  All of the utilities which process multiple files
> could get small speed boosts from this -- rm, cat, chown, chmod ... even
> tail, head, wc -- but probably only on network filesystems.

I keep wondering if the OS level needs a better interface; an 'openv' or
'statv' or I'm currently wondering if a combined call would work - something
which would stat a path, if it's a normal file, open it, read up to a
buffer's worth and if finished close it - it might work nicely for small
files.

Dave
-- 
 -Open up your eyes, open up your mind, open up your code ---
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K  | Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _|_ http://www.treblig.org |___/
Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Greetings coreutils folks,

There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
which could benefit from userspace utilities doing certain operations in
parallel.  (I have a very slow glusterfs installation that makes me think
that some things can be done better.)

For example, copying a number of files is currently done in series ...
  cp a b c d e f g h dest/
but, on certain filesystems, it would be roughly twice as efficient if
implemented in two parallel threads, something like:
  cp a c e g dest/
  cp b d f h dest/
since the source and destination files can be stored on multiple physical
volumes.

Similarly, ls -l . will readdir(), and then stat() each file in the
directory.  On a filesystem with high latency, it would be faster to issue
the stat() calls asynchronously, and in parallel, and then collect the
results for display.  (This could improve performance for NFS, in proportion
to the latency and the number of threads.)

Question:  Is there already a set of improved utilities that implement this
kind of technique?  If not, would this kind of performance enhancement be
considered useful?  (It would mean introducing threading into programs which
are currently single-threaded.)

To the user, it could look very much the same ...
  export GNU_COREUTILS_THREADS=8
  cp ...   # manipulate multiple files simultaneously
  mv ...   # manipulate multiple files simultaneously
  ls ...   # stat() multiple files simultaneously

One could also optimise the text utilities like cat by doing the open() and
stat() operations in parallel and in the background -- userspace read-ahead
caching.  All of the utilities which process multiple files could get small
speed boosts from this -- rm, cat, chown, chmod ... even tail, head, wc --
but probably only on network filesystems.

:-)
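[Editor's note: short of patching ls itself, the parallel-stat() idea can be approximated today from userspace. A minimal sketch, assuming GNU findutils and GNU stat (`stat -c` is the coreutils spelling); the directory, file names, and worker counts are made up. Multiple xargs workers issue the stat() calls concurrently instead of one sequential loop.]

```shell
#!/bin/sh
# Userspace stand-in for a "threaded ls -l": fan the stat() calls out over
# up to four concurrent workers, 16 paths per batch.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

listing=$(find "$dir" -maxdepth 1 -type f -print0 |
            xargs -0 -n 16 --max-procs=4 stat -c '%n %s %y')
echo "$listing"
```

On a local filesystem this buys nothing, but on a high-latency network mount the in-flight stat() calls can overlap, which is exactly the effect proposed above.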
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
Andrew McGill [EMAIL PROTECTED] wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel.  (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
>   cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
>   cp a c e g dest/
>   cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.

How about parallelizing it via xargs, e.g.,

  $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
      --max-procs=2 -- cp --target-directory=dest
  cp --target-directory=dest a b c d
  cp --target-directory=dest e f g h

Obviously the above is tailored (-L4) to your 8-input example.  In practice,
you'd use a larger number, unless latency is so high as to dwarf the cost of
the extra fork/exec syscalls, in which case even -L1 might make sense.

mv and ln also accept the --target-directory=dest option.

> Similarly, ls -l . will readdir(), and then stat() each file in the
> directory.  On a filesystem with high latency, it would be faster to issue
> the stat() calls asynchronously, and in parallel, and then collect the
> results for display.

If you can demonstrate a large performance gain on systems that many people
use, then maybe...  There is more than a little value in keeping programs
like those in the coreutils package relatively simple, but if the
cost(maintenance+portability burden)/benefit ratio is low enough, then
anything is possible.  For example, a well-encapsulated, optionally-threaded
stat_all_dir_entries API might be useful in some situations.

If getting any eventual patch into upstream coreutils is important to you,
be sure there is some consensus on this list before doing a lot of work
on it.

> (This could improve performance for NFS, in proportion to the latency and
> the number of threads.)
>
> Question:  Is there already a set of improved utilities that implement
> this kind of technique?

Not that I know of.

> If not, would this kind of performance enhancement be considered useful?

It's impossible to say without knowing more.
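[Editor's note: Jim's xargs invocation can be run end-to-end against a throwaway tree. A minimal sketch, assuming GNU cp and xargs; the file names and directory are made up. Two cp processes run concurrently, each given a batch of four files via --target-directory.]

```shell
#!/bin/sh
# Copy eight files with two concurrent cp processes, four files per batch.
set -e
work=$(mktemp -d)
cd "$work"
mkdir dest
touch a b c d e f g h

# -t echoes each constructed command to stderr, as in Jim's transcript.
echo a b c d e f g h |
  xargs -t -n4 --no-run-if-empty --max-procs=2 -- cp --target-directory=dest

ls dest | wc -l     # 8: every file arrived exactly once
```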
Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?
On Sat, Nov 8, 2008 at 6:05 PM, Jim Meyering [EMAIL PROTECTED] wrote:
> How about parallelizing it via xargs, e.g.,
>
>   $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
>       --max-procs=2 -- cp --target-directory=dest
>   cp --target-directory=dest a b c d
>   cp --target-directory=dest e f g h

For tools lacking a --target-directory option there is this shell trick:

  $ echo a b c d e f g h | xargs -n4 --no-run-if-empty --max-procs=2 -- \
      sh -c 'prog "$@" destination' prog

James.
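[Editor's note: a runnable sketch of James's trick, using cp as a stand-in for the generic 'prog'; the file names, 'worker' label, and directory are made up. The word after the command string becomes $0 of the inner shell, so "$@" expands to exactly the batch of arguments xargs supplied.]

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
mkdir destination
touch a b c d e f

# 'worker' fills the $0 slot of the inner sh; "$@" is the xargs batch,
# appended before the fixed trailing 'destination' argument.
echo a b c d e f |
  xargs -n2 --no-run-if-empty --max-procs=2 -- \
    sh -c 'cp -- "$@" destination' worker

ls destination | wc -l    # 6
```

The same shape works for any tool whose destination must come last, at the cost of one extra sh per batch.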