Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-13 Thread Ralf Wildenhues
Hello Andrew,

Andrew McGill <list2008 at lunch.za.net> writes:
> 
>     find -type f -print0 | 
>         xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums
> 
>     sort -k2 < md5sums > md5sums.sorted

To avoid losing output, use append mode for writing:
 : > ~/md5sums
 find -type f -print0 | 
 xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1

 sort -k2 < md5sums > md5sums.sorted
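
The difference is easy to reproduce from the shell (a sketch, assuming
bash and GNU xargs; /tmp/out is an arbitrary scratch file):

 # '>' opens with O_TRUNC only: the 16 writers share one file offset,
 # and where the shared offset update races, lines go missing.
 seq 1000 | xargs -n 1 --max-procs=16 echo > /tmp/out
 wc -l /tmp/out    # occasionally fewer than 1000 lines

 # '>>' opens with O_APPEND: every write() lands atomically at EOF.
 : > /tmp/out
 seq 1000 | xargs -n 1 --max-procs=16 echo >> /tmp/out
 wc -l /tmp/out    # reliably 1000 lines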

This just recently came up in Autoconf:
http://thread.gmane.org/gmane.comp.shells.bash.bugs/11958

Cheers,
Ralf





Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-13 Thread Andrew McGill
On Thursday 13 November 2008 14:52:44 Ralf Wildenhues wrote:
> Hello Andrew,
>
> Andrew McGill <list2008 at lunch.za.net> writes:
> >     find -type f -print0 |
> >         xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums
> >
> >     sort -k2 < md5sums > md5sums.sorted
>
> To avoid losing output, use append mode for writing:
>  : > ~/md5sums
>
>  find -type f -print0 |
>  xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
>
>  sort -k2 < md5sums > md5sums.sorted
>
> This just recently came up in Autoconf:
> http://thread.gmane.org/gmane.comp.shells.bash.bugs/11958
Ah!  I see!  So without O_APPEND, things don't work quite right.

At the risk of drifting off topic - is there ever a benefit in the shell 
implementing a > redirection with just O_TRUNC, rather than O_TRUNC | 
O_APPEND?  Does the output process ever need to seek() back in stdout?  (If 
this is off topic, please feel free to flame me, and/or direct me to the 
correct forum -- but I did freely send a bug report to the bash folks, even 
though I'll bet they're not alone in omitting O_APPEND with O_TRUNC.)

:-)




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-12 Thread Phillip Susi

James Youngman wrote:
> This version should be race-free:
>
> find -type f -print0 |
>  xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
>
> I think that writing into a pipe should be OK, since pipes are
> non-seekable.  However, with pipes in this situation you still have a
> problem if processes try to write more than PIPE_BUF bytes.


You aren't using a pipe there.  What you are doing is having the shell 
open the file, then the md5sum processes all inherit that fd so they all 
share the same offset.  As long as they write() the entire line at once, 
the file pointer will be updated atomically for all processes and the 
lines from each process won't clobber each other.
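
That shared-offset behaviour is easy to observe (a sketch, assuming bash;
the /tmp paths are arbitrary):

# One open(), one file description: both background writers inherit the
# same fd and offset, so their output interleaves rather than clobbers.
{ seq 1 500 & seq 501 1000 & wait; } > /tmp/shared.txt
wc -l /tmp/shared.txt     # 1000 lines when each write() lands whole

# Two independent open()s, each with O_TRUNC: both writers start at
# offset 0 and overwrite one another.
( seq 1 500 > /tmp/separate.txt & seq 501 1000 > /tmp/separate.txt & wait )
wc -l /tmp/separate.txt   # far fewer lines; the streams collided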






Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-12 Thread James Youngman
[ CC ++ [EMAIL PROTECTED] ]


On Tue, Nov 11, 2008 at 2:58 PM, Andrew McGill [EMAIL PROTECTED] wrote:
> What would you expect this to do --:
>
> find -type f -print0 |
> xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums

Produce a race condition :)  It generates 16 parallel processes,
each writing to the md5sums file.  Unfortunately sometimes the writes
occur at the same offset in the output file. To illustrate:

~$ strace -f -e open,fork,execve sh -c 'echo hello > foo'
execve("/bin/sh", ["sh", "-c", "echo hello > foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
~$ strace -f -e open,fork,execve sh -c 'echo hello >> foo'
execve("/bin/sh", ["sh", "-c", "echo hello >> foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

This version should be race-free:

find -type f -print0 |
 xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1

I think that writing into a pipe should be OK, since pipes are
non-seekable.  However, with pipes in this situation you still have a
problem if processes try to write more than PIPE_BUF bytes.
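
Keeping the checksums on the pipe and letting a single downstream process
do all the file writing sidesteps the shared output file entirely (a
sketch; safe while each md5sum line stays under PIPE_BUF, as it does for
ordinary path lengths):

find -type f -print0 |
 xargs -0 -n 8 --max-procs=16 md5sum |
 sort -k2 > md5sums.sorted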


> Is there a correct way to do md5sums in parallel without having a shared
> output buffer which eats output (I presume) -- or is losing output when
> haphazardly combining output streams actually strange and unusual?

I hope the solution above solved your problem - and please follow up
if so.  This example is probably worthy of being mentioned in the
xargs documentation, too.

Thanks for your comment!

James.




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-11 Thread Andrew McGill
On Saturday 08 November 2008 20:05:25 Jim Meyering wrote:
> Andrew McGill [EMAIL PROTECTED] wrote:
> > Greetings coreutils folks,
> >
> > There are a number of interesting filesystems (glusterfs, lustre? ...
> > NFS) which could benefit from userspace utilities doing certain
> > operations in parallel.  (I have a very slow glusterfs installation that
> > makes me think that some things can be done better.)
> >
> > For example, copying a number of files is currently done in series ...
> >   cp a b c d e f g h dest/
> > but, on certain filesystems, it would be roughly twice as efficient if
> > implemented in two parallel threads, something like:
> >   cp a c e g dest/ &
> >   cp b d f h dest/
> > since the source and destination files can be stored on multiple physical
> > volumes.
>
> How about parallelizing it via xargs, e.g.,
>
> $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
>   --max-procs=2 -- cp --target-directory=dest
> cp --target-directory=dest a b c d
> cp --target-directory=dest e f g h
>
> Obviously the above is tailored (-L4) to your 8-input example.
> In practice, you'd use a larger number, unless latency is
> so high as to dwarf the cost of extra fork/exec syscalls,
> in which case even -L1 might make sense.
I did the command above with md5sum as the command, and got missing lines in 
the output.  I optimistically hoped that would not happen!

> mv and ln also accept the --target-directory=dest option.

> > Similarly, ls -l . will readdir(), and then stat() each file in the
> > directory. On a filesystem with high latency, it would be faster to issue
> > the stat() calls asynchronously, and in parallel, and then collect the
> > results for
>
> If you can demonstrate a large performance gain on
> systems that many people use, then maybe...
>
> There is more than a little value in keeping programs
> like those in the coreutils package relatively simple,
> but if the cost(maintenance+portability burden)/benefit
> ratio is low enough, then anything is possible.
>
> For example, a well-encapsulated, optionally-threaded
> stat_all_dir_entries API might be useful in some situations.
So a relatively small change for parallel stat() in ls could fly.

> If getting any eventual patch into upstream coreutils is
> important to you, be sure there is some consensus on this
> list before doing a lot of work on it.
Any ideas on how to do a parallel cp / mv in a way that is not Considered 
Harmful?  Maybe prefetch_files(max_bytes,file1,...,NULL) ... aargh.

> > display.  (This could improve performance for NFS, in proportion to the
> > latency and the number of threads.)
>
> > Question:  Is there already a set of improved utilities that implement
> > this kind of technique?
>
> Not that I know of.
>
> > If not, would this kind of performance enhancement be
> > considered useful?
>
> It's impossible to say without knowing more.

On the (de?)merits of xargs for parallel processing:

What would you expect this to do --:

    find -type f -print0 | 
        xargs -0 -n 8 --max-procs=16 md5sum > ~/md5sums

    sort -k2 < md5sums > md5sums.sorted

Compared to this?

    find -type f -print0 | 
        xargs -0                     md5sum > ~/md5sums

    sort -k2 < md5sums > md5sums.sorted

I was a little surprised that on my system, running in parallel (the first 
version) loses around 1 line of output per thousand (md5sum of 22GB in 
mostly small files).

Is there a correct way to do md5sums in parallel without having a shared 
output buffer which eats output (I presume) -- or is losing output when 
haphazardly combining output streams actually strange and unusual?




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-09 Thread James Youngman
On Sun, Nov 9, 2008 at 11:06 PM, Dr. David Alan Gilbert
[EMAIL PROTECTED] wrote:
> I keep wondering if the OS level needs a better interface; an 'openv' or
> 'statv' or I'm currently wondering if a combined call would work - something
> which would stat a path, if it's a normal file, open it, read up to a
> buffer's worth and if finished close it - it might work nicely for small
> files.

I suspect that a combined call would not be widely useful, though it
would likely provide a useful speedup for your use case.

I suspect that the statv/openv combination would fit more use-cases.
A statv function could be useful for anything that uses fts for
example (rm, find, ...) and for file-open dialogue boxes.

I have to say though that I've never used writev.   However, people
continue to design more advanced filesystems; the filesystem knows a
lot about how the data is arranged and (therefore) the optimal order
in which to perform operations.   The application knows a lot about
the set of operations it plans to execute, too.   However, these two
pieces of software communicate through a small keyhole: the POSIX file
API.  I'm not clear though on what nature of API might be more
generally useful for a wide class of programs; existing programs after
all are designed in ways that work well with the existing operating
system interfaces.  Perhaps this overcomplicates the issue though,
since not many programs interact with more than a few dozen files and
therefore probably wouldn't need to adopt a more complex API.

Thanks,
James.




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-09 Thread Dr. David Alan Gilbert
* Andrew McGill ([EMAIL PROTECTED]) wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel.  (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
>   cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
>   cp a c e g dest/ &
>   cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.

Of course you can't do that by hand since each might be a directory with an
unbalanced number of files etc - so you are right, something smarter
is needed (my pet hate is 'tar' or 'cp' working its way through a
source tree of thousands of small files).

> Similarly, ls -l . will readdir(), and then stat() each file in the directory.
> On a filesystem with high latency, it would be faster to issue the stat()
> calls asynchronously, and in parallel, and then collect the results for
> display.  (This could improve performance for NFS, in proportion to the
> latency and the number of threads.)

I think, as you are suggesting, you have to end up doing threading
in the userland code, which to me seems mad since the code doesn't
really know how wide to go and it's a fair overhead.  In addition this
behaviour can be really bad if you get it wrong - for example
if 'dest' is a single disc then having multiple writers writing two
large files leads to fragmentation on many filesystems.

I once tried to write a backup system that streamed data from tens of machines
trying to write a few MB at a time on Linux, each machine being a separate
process; unfortunately the kernel was too smart and ended up writing a few
KB from each process before moving on to the next, leading to *awful* throughput.

> Question:  Is there already a set of improved utilities that implement this
> kind of technique?  If not, would this kind of performance enhancement be
> considered useful?  (It would mean introducing threading into programs which
> are currently single-threaded.)
>
>
> One could also optimise the text utilities like cat by doing the open() and
> stat() operations in parallel and in the background -- userspace read-ahead
> caching.  All of the utilities which process multiple files could get
> small speed boosts from this -- rm, cat, chown, chmod ... even tail, head,
> wc -- but probably only on network filesystems.

I keep wondering if the OS level needs a better interface; an 'openv' or 'statv'
or I'm currently wondering if a combined call would work - something which
would stat a path, if it's a normal file, open it, read up to a buffer's worth
and if finished close it - it might work nicely for small files.


Dave
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert| Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC  HPPA | In Hex /
 \ _|_ http://www.treblig.org   |___/




Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-08 Thread Andrew McGill
Greetings coreutils folks,

There are a number of interesting filesystems (glusterfs, lustre? ... NFS) 
which could benefit from userspace utilities doing certain operations in 
parallel.  (I have a very slow glusterfs installation that makes me think 
that some things can be done better.)

For example, copying a number of files is currently done in series ...
cp a b c d e f g h dest/
but, on certain filesystems, it would be roughly twice as efficient if 
implemented in two parallel threads, something like:
cp a c e g dest/ &
cp b d f h dest/
since the source and destination files can be stored on multiple physical 
volumes.  

Similarly, ls -l . will readdir(), and then stat() each file in the directory.  
On a filesystem with high latency, it would be faster to issue the stat() 
calls asynchronously, and in parallel, and then collect the results for 
display.  (This could improve performance for NFS, in proportion to the 
latency and the number of threads.)
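
Much of the stat() half can already be approximated from userspace (a
sketch, assuming GNU findutils and coreutils stat(1); the format string
is only illustrative):

# Issue the stat() calls from 8 parallel workers instead of one ls
# process, then sort by name to restore a stable listing order.
find . -maxdepth 1 -print0 |
 xargs -0 -n 32 --max-procs=8 stat --format='%A %s %n' |
 sort -k3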


Question:  Is there already a set of improved utilities that implement this 
kind of technique?  If not, would this kind of performance enhancement be 
considered useful?  (It would mean introducing threading into programs which 
are currently single-threaded.)


To the user, it could look very much the same ...
export GNU_COREUTILS_THREADS=8
cp   # manipulate multiple files simultaneously
mv   # manipulate multiple files simultaneously
ls   # stat() multiple files simultaneously

One could also optimise the text utilities like cat by doing the open() and 
stat() operations in parallel and in the background -- userspace read-ahead 
caching.  All of the utilities which process multiple files could get 
small speed boosts from this -- rm, cat, chown, chmod ... even tail, head, 
wc -- but probably only on network filesystems.
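
A crude userspace approximation of that read-ahead idea is already
possible (a sketch, assuming GNU findutils; the warm-up pass just pulls
file data into the client's cache in parallel before a serial pass
re-reads it):

find . -type f -print0 | xargs -0 -n 16 --max-procs=8 cat > /dev/null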

:-)




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-08 Thread Jim Meyering
Andrew McGill [EMAIL PROTECTED] wrote:
> Greetings coreutils folks,
>
> There are a number of interesting filesystems (glusterfs, lustre? ... NFS)
> which could benefit from userspace utilities doing certain operations in
> parallel.  (I have a very slow glusterfs installation that makes me think
> that some things can be done better.)
>
> For example, copying a number of files is currently done in series ...
>   cp a b c d e f g h dest/
> but, on certain filesystems, it would be roughly twice as efficient if
> implemented in two parallel threads, something like:
>   cp a c e g dest/ &
>   cp b d f h dest/
> since the source and destination files can be stored on multiple physical
> volumes.

How about parallelizing it via xargs, e.g.,

$ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
  --max-procs=2 -- cp --target-directory=dest
cp --target-directory=dest a b c d
cp --target-directory=dest e f g h

Obviously the above is tailored (-L4) to your 8-input example.
In practice, you'd use a larger number, unless latency is
so high as to dwarf the cost of extra fork/exec syscalls,
in which case even -L1 might make sense.
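
The same pattern scales past a hand-typed file list when fed from find
(a sketch; src/ and dest are hypothetical directories):

find src/ -maxdepth 1 -type f -print0 |
 xargs -0 -n 64 --max-procs=2 -- cp --target-directory=dest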

mv and ln also accept the --target-directory=dest option.

> Similarly, ls -l . will readdir(), and then stat() each file in the directory.
> On a filesystem with high latency, it would be faster to issue the stat()
> calls asynchronously, and in parallel, and then collect the results for

If you can demonstrate a large performance gain on
systems that many people use, then maybe...

There is more than a little value in keeping programs
like those in the coreutils package relatively simple,
but if the cost(maintenance+portability burden)/benefit
ratio is low enough, then anything is possible.

For example, a well-encapsulated, optionally-threaded
stat_all_dir_entries API might be useful in some situations.

If getting any eventual patch into upstream coreutils is
important to you, be sure there is some consensus on this
list before doing a lot of work on it.

> display.  (This could improve performance for NFS, in proportion to the
> latency and the number of threads.)


> Question:  Is there already a set of improved utilities that implement this
> kind of technique?

Not that I know of.

> If not, would this kind of performance enhancement be
> considered useful?

It's impossible to say without knowing more.




Re: Threaded versions of cp, mv, ls for high latency / parallel filesystems?

2008-11-08 Thread James Youngman
On Sat, Nov 8, 2008 at 6:05 PM, Jim Meyering [EMAIL PROTECTED] wrote:
> How about parallelizing it via xargs, e.g.,
>
> $ echo a b c d e f g h | xargs -t -n4 --no-run-if-empty \
>   --max-procs=2 -- cp --target-directory=dest
> cp --target-directory=dest a b c d
> cp --target-directory=dest e f g h

For tools lacking a --target-directory option there is this shell trick:

$ echo a b c d e f g h |
 xargs -n4 --no-run-if-empty --max-procs=2 -- sh -c 'prog "$@" destination' prog
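
For instance, with cp itself and a hypothetical dest/ directory (note
that the trailing 'cp' only supplies $0 for the inline script):

$ echo a b c d e f g h |
 xargs -n4 --no-run-if-empty --max-procs=2 -- sh -c 'cp "$@" dest/' cp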

James.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils