Re: better buffer size for copy

2005-11-22 Thread Robert Latham
On Mon, Nov 21, 2005 at 12:45:40AM -0500, Phillip Susi wrote:
> If we are talking about the conventional blocking cached read,
> followed by a blocking cached write, then I think you will find that
> using a buffer size of several pages ( say 32 or 64 KB ) will be
> MUCH more efficient than 1024 bytes ( the typical local filesystem
> block size ), so using st_blksize for the size of the read/write
> buffer is not good.  I think you may be ascribing meaning to
> st_blksize that is not there. 

I mean no offense by cutting out most of your points.  You describe great
ways to achieve high I/O rates for anyone writing a custom file mover.
I shouldn't have mentioned network file systems.  It's a distraction
from the real point of my patch: cp(1) should consider both the source
and the destination st_blksize.

All I expect from st_blksize is what the stat(2) manpage suggests:

   The value st_blocks gives the size of  the  file  in  512-byte
   blocks.  (This  may  be  smaller than st_size/512 e.g. when the
   file has holes.) The value st_blksize gives the "preferred"
   blocksize for efficient file system  I/O.  (Writing to a file
   in smaller chunks may cause an inefficient
   read-modify-rewrite.)
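
For readers who want to see what their own file systems report, here is
a minimal sketch that reads the fields quoted above with stat(2).  The
function names preferred_blksize and print_block_info are hypothetical
helpers for illustration; they are not part of coreutils.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the st_blksize that stat(2) reports for PATH, or -1 on error.  */
long
preferred_blksize (char const *path)
{
  struct stat sb;

  if (stat (path, &sb) != 0)
    return -1;
  return (long) sb.st_blksize;
}

/* Print the size-related stat fields for PATH.  Return 0 on success,
   -1 if stat fails.  st_blocks is counted in 512-byte units.  */
int
print_block_info (char const *path)
{
  struct stat sb;

  if (stat (path, &sb) != 0)
    {
      perror (path);
      return -1;
    }
  printf ("%s: st_size=%lld st_blocks=%lld st_blksize=%ld\n",
          path, (long long) sb.st_size, (long long) sb.st_blocks,
          (long) sb.st_blksize);
  return 0;
}
```

Calling print_block_info on a file from each file system involved in a
copy shows the two values that the patch proposes to combine.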

All I really want is for cp(1) to do the right thing no matter what
the source or destination st_blksize value might be.

In copying from a 4k blocksize filesystem to a 64k blocksize
filesystem, cp(1) will perform well, as it is using a 64k buffer.

In copying *from* that 64k blocksize filesystem *to* a 4k blocksize
filesystem, cp(1) will not perform as well: it's using a 4k buffer and
so reading from the source filesystem in less-than-ideal chunks.
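
The asymmetry described above can be sketched in a few lines.  This is
an illustration only: buf_size_dest_only and buf_size_lcm are
hypothetical names, and the sketch assumes both block sizes are nonzero
and that the product does not overflow size_t.

```c
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm.  */
static size_t
gcd (size_t a, size_t b)
{
  while (b != 0)
    {
      size_t r = a % b;
      a = b;
      b = r;
    }
  return a;
}

/* Behavior described in this thread for unpatched cp: only the
   destination block size is used.  */
size_t
buf_size_dest_only (size_t src_blksize, size_t dst_blksize)
{
  (void) src_blksize;
  return dst_blksize;
}

/* Proposed behavior: least common multiple of both block sizes.
   Assumption: both inputs are nonzero and the result fits in size_t.  */
size_t
buf_size_lcm (size_t src_blksize, size_t dst_blksize)
{
  return src_blksize / gcd (src_blksize, dst_blksize) * dst_blksize;
}
```

With 4k and 64k block sizes, buf_size_dest_only yields 64k in one
direction and 4k in the other, while buf_size_lcm yields 64k in both.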

Thanks again for taking the time to respond.  I hope I have made the
intent of my patch more clear. 

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: better buffer size for copy

2005-11-20 Thread Robert Latham
On Sat, Nov 19, 2005 at 10:49:07AM -0500, Phillip Susi wrote:
> I don't see why the filesystem's cluster size should have a thing to do 
> with the buffer size used to copy files.  For optimal performance, the 
> larger the buffer, the better.  Diminishing returns applies of course, 
> so at some point the increase in buffer size results in little to no 
> further increase in performance, so that's the size you should use.  I 
> believe that the optimal size is about 64 KB. 

In local file systems, I'm sure you are correct.  If you are working
with a remote file system, however, the optimal size is on the order
of megabytes, not kilobytes.  For a specific example, consider the
PVFS2 file system, where the plateau in "blocksize vs. bandwidth" is
two orders of magnitude larger than 64 KB.  PVFS2 is a parallel file
system for Linux clusters.  I am not nearly as familiar with Lustre,
GPFS, or GFS, but I suspect those filesystems too would benefit from
block sizes larger than 64 KB.
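
Anyone wanting to find that plateau on a particular file system could
time a plain read/write loop at several buffer sizes.  The sketch below
is one way to write such a loop; copy_fd is a hypothetical helper, not
the loop used in src/copy.c, and it assumes buf_size is greater than
zero.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from IN_FD to OUT_FD using a buffer of BUF_SIZE
   bytes.  Return the number of bytes copied, or -1 on error.  */
long long
copy_fd (int in_fd, int out_fd, size_t buf_size)
{
  char *buf = malloc (buf_size);
  long long total = 0;

  if (!buf)
    return -1;

  for (;;)
    {
      ssize_t n_read = read (in_fd, buf, buf_size);
      if (n_read < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted: retry the read */
          total = -1;
          break;
        }
      if (n_read == 0)
        break;                  /* end of file */

      /* write may accept fewer bytes than requested, so loop until
         the whole chunk has been written.  */
      ssize_t off = 0;
      while (off < n_read)
        {
          ssize_t n_written = write (out_fd, buf + off, n_read - off);
          if (n_written < 0)
            {
              if (errno == EINTR)
                continue;       /* interrupted: retry the write */
              free (buf);
              return -1;
            }
          off += n_written;
        }
      total += n_read;
    }

  free (buf);
  return total;
}
```

Timing copy_fd on the same file with buf_size set to, say, 4 KB, 64 KB
and 4 MB would show where the curve flattens for a given file system.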

Are you taking umbrage at the idea of using st_blksize to direct how
large the transfer size should be for I/O?  I don't know what other
purpose st_blksize should have, nor are there any other fields which
are remotely valid for that purpose.  

Thanks for your feedback. 
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B




Re: better buffer size for copy

2005-11-18 Thread Robert Latham

(README says to ping if there's not been an ack of a patch after two
weeks.  Here I go.)

This patch to today's (18 Nov 2005) coreutils CVS makes copy.c
consider both the source and destination blocksize when computing
buf_size.  With this patch, src/copy.c will use the LCM of the source
and destination block sizes.  As Paul suggested, I used the buffer_lcm
routine from diffutils.

For what it's worth, this patch does not introduce any regressions
into the coreutils testsuite.

When copying from a remote filesystem with a block size of 4MB to a
filesystem with a 4k blocksize, the copy is *very* slow.  Going from a
filesystem with 4k blocks to a filesystem with 4MB blocks is much
faster.  With this patch, both operations are equally performant.

I went ahead and added a ChangeLog entry as well.  

Thanks.  I'll be more than happy to incorporate any suggestions or
comments.

==rob


-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/ChangeLog coreutils.lcm/ChangeLog
--- coreutils/ChangeLog 2005-11-18 16:24:52.0 -0600
+++ coreutils.lcm/ChangeLog 2005-11-18 16:24:34.0 -0600
@@ -1,3 +1,10 @@
+2005-11-18  Rob Latham <[EMAIL PROTECTED]>
+   * lib/Makefile.am, lib/buffer-lcm.c, lib/buffer-lcm.h: add code to find
+ least common multiple of two values, with logic to handle unusual
+ input (taken from diffutils)
+   * src/copy.c: use the LCM of the source and dest blocksize when
+ figuring out the ideal blocksize.
+
 2005-11-17  Jim Meyering  <[EMAIL PROTECTED]>
 
* Version 6.0-cvs.
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/lib/buffer-lcm.c coreutils.lcm/lib/buffer-lcm.c
--- coreutils/lib/buffer-lcm.c  1969-12-31 18:00:00.0 -0600
+++ coreutils.lcm/lib/buffer-lcm.c  2005-11-18 10:04:54.0 -0600
@@ -0,0 +1,47 @@
+/* buffer-lcm.c - an lcm routine used for computing optimal buffer size
+ 
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+
+/* Least common multiple of two buffer sizes A and B.  However, if
+   either A or B is zero, or if the multiple is greater than LCM_MAX,
+   return a reasonable buffer size.  
+ 
+   This method was taken from diffutils/lib/cmpbuf.c */
+
+#include <stddef.h>
+
+size_t
+buffer_lcm (size_t a, size_t b, size_t lcm_max)
+{
+  size_t lcm, m, n, q, r;
+
+  /* Yield reasonable values if buffer sizes are zero.  */
+  if (!a)
+    return b ? b : 8 * 1024;
+  if (!b)
+    return a;
+
+  /* n = gcd (a, b) */
+  for (m = a, n = b;  (r = m % n) != 0;  m = n, n = r)
+    continue;
+
+  /* Yield a if there is an overflow.  */
+  q = a / n;
+  lcm = q * b;
+  return lcm <= lcm_max && lcm / b == q ? lcm : a;
+}
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/lib/buffer-lcm.h coreutils.lcm/lib/buffer-lcm.h
--- coreutils/lib/buffer-lcm.h  1969-12-31 18:00:00.0 -0600
+++ coreutils.lcm/lib/buffer-lcm.h  2005-11-18 10:04:54.0 -0600
@@ -0,0 +1,23 @@
+/* buffer-lcm.h - an lcm routine used for computing optimal buffer size
+
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+/* Taken from diffutils/lib/cmpbuf.c */
+
+#include <stddef.h>
+
+size_t buffer_lcm(size_t a, size_t b, size_t lcm_max);
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutil

Re: better buffer size for copy

2005-11-07 Thread Robert Latham
On Mon, Nov 07, 2005 at 12:20:47PM -0800, Paul Eggert wrote:
> It's too much for an inlined function, I think.

That's what I thought you'd say.  Ok, this patch vs. today's CVS adds
buffer-lcm.h and buffer-lcm.c, adds those files to Makefile.am, and
makes copy.c call buffer_lcm.

I left alone the other places that call lcm.  

Thanks for the feedback

==rob


diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/lib/buffer-lcm.c coreutils.lcm/lib/buffer-lcm.c
--- coreutils/lib/buffer-lcm.c  1969-12-31 18:00:00.0 -0600
+++ coreutils.lcm/lib/buffer-lcm.c  2005-11-07 14:46:29.0 -0600
@@ -0,0 +1,47 @@
+/* buffer-lcm.c - an lcm routine used for computing optimal buffer size
+ 
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+
+/* Least common multiple of two buffer sizes A and B.  However, if
+   either A or B is zero, or if the multiple is greater than LCM_MAX,
+   return a reasonable buffer size.  
+ 
+   This method was taken from diffutils/lib/cmpbuf.c */
+
+#include <stddef.h>
+
+size_t
+buffer_lcm (size_t a, size_t b, size_t lcm_max)
+{
+  size_t lcm, m, n, q, r;
+
+  /* Yield reasonable values if buffer sizes are zero.  */
+  if (!a)
+    return b ? b : 8 * 1024;
+  if (!b)
+    return a;
+
+  /* n = gcd (a, b) */
+  for (m = a, n = b;  (r = m % n) != 0;  m = n, n = r)
+    continue;
+
+  /* Yield a if there is an overflow.  */
+  q = a / n;
+  lcm = q * b;
+  return lcm <= lcm_max && lcm / b == q ? lcm : a;
+}
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/lib/buffer-lcm.h coreutils.lcm/lib/buffer-lcm.h
--- coreutils/lib/buffer-lcm.h  1969-12-31 18:00:00.0 -0600
+++ coreutils.lcm/lib/buffer-lcm.h  2005-11-07 14:46:26.0 -0600
@@ -0,0 +1,23 @@
+/* buffer-lcm.h - an lcm routine used for computing optimal buffer size
+
+   Copyright (C) 2005 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+/* Taken from diffutils/lib/cmpbuf.c */
+
+#include <stddef.h>
+
+size_t buffer_lcm(size_t a, size_t b, size_t lcm_max);
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/lib/Makefile.am coreutils.lcm/lib/Makefile.am
--- coreutils/lib/Makefile.am   2005-10-05 09:54:17.0 -0500
+++ coreutils.lcm/lib/Makefile.am   2005-11-07 14:49:01.0 -0600
@@ -27,6 +27,7 @@ DEFS += -DLIBDIR=\"$(libdir)\"
 
 libcoreutils_a_SOURCES = \
   allocsa.c allocsa.h \
+  buffer-lcm.c buffer-lcm.h \
   euidaccess.h \
   exit.h \
   fprintftime.c fprintftime.h \
diff -burpN -x CVS -x 'cscope*' -x Makefile.in -x configure -x autom4te.cache 
-x aclocal.m4 coreutils/src/copy.c coreutils.lcm/src/copy.c
--- coreutils/src/copy.c  2005-09-24 22:07:33.0 -0500
+++ coreutils.lcm/src/copy.c  2005-11-07 14:42:27.0 -0600
@@ -31,6 +31,7 @@
 
 #include "system.h"
 #include "backupfile.h"
+#include "buffer-lcm.h"
 #include "copy.h"
 #include "cp-hash.h"
 #include "dirname.h"
@@ -291,7 +292,7 @@ copy_reg (char const *src_name, char con
   goto close_src_and_dst_desc;
 }
 
-  buf_size = ST_BLKSIZE (sb);
+  buf_size = buffer_lcm(ST_BLKSIZE (sb), ST_BLKSIZE(src_open_sb), SIZE_MAX);
 
   /* Even with --sparse=always, try to create holes only
  if the destination is a regular file.  */

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B



Re: better buffer size for copy

2005-11-07 Thread Robert Latham
On Fri, Nov 04, 2005 at 10:07:51PM -0800, Paul Eggert wrote:
> [EMAIL PROTECTED] (Robert Latham) writes:
> 
> > In the time since the above thread was started, there is now an
> > implementation of lcm in src/system.h.
> 
> I'd rather use something more like buffer_lcm in diffutils, since it
> handles weird cases without dumping core.
> 

Ok, no problem.  In the old thread you wanted a new file under lib to
contain the implementation of buffer_lcm.  Coreutils now has a lot of
inlined routines in src/system.h, so would it be better to add
buffer_lcm to src/system.h, or stick with creating new files under
lib/ ?
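
One "weird case" is presumably a block size of zero: a plain
Euclid-based lcm can then end up dividing by zero.  The sketch below
shows only the zero-input guard, modeled on the checks in the
buffer_lcm patch posted in this thread.  guarded_lcm and
DEFAULT_BUF_SIZE are hypothetical names, and overflow handling is
deliberately left out here.

```c
#include <stddef.h>

/* Fallback when both inputs are zero.  8 KiB mirrors the value used
   in the buffer_lcm patch; the name is an assumption.  */
enum { DEFAULT_BUF_SIZE = 8 * 1024 };

/* Least common multiple of A and B, except that zero inputs yield a
   usable buffer size instead of a division by zero.  Overflow of the
   final multiplication is NOT checked in this sketch.  */
size_t
guarded_lcm (size_t a, size_t b)
{
  size_t m, n, r;

  if (a == 0)
    return b != 0 ? b : DEFAULT_BUF_SIZE;
  if (b == 0)
    return a;

  /* Euclid's algorithm: on exit n == gcd (a, b).  n is never zero
     here because both inputs were checked above.  */
  for (m = a, n = b; (r = m % n) != 0; m = n, n = r)
    continue;

  return a / n * b;
}
```

With the guard in place, a file system that reports a block size of 0
just gets the other side's block size, or the default when both are 0.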

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B




better buffer size for copy

2005-11-04 Thread Robert Latham
Hi

The thread here 
http://lists.gnu.org/archive/html/bug-coreutils/2003-11/msg00030.html
suggested that copy, instead of using the destination block size,
should use the LCM of the source block size and the destination block
size.

In the time since the above thread was started, there is now an
implementation of lcm in src/system.h.  Would the patch to
src/copy.c below make sense?

Thanks
==rob

Index: src/copy.c
===
RCS file: /cvsroot/coreutils/coreutils/src/copy.c,v
retrieving revision 1.190
diff -u -w -p -r1.190 copy.c
--- src/copy.c  25 Sep 2005 03:07:33 -  1.190
+++ src/copy.c  4 Nov 2005 19:12:23 -
@@ -291,7 +291,7 @@ copy_reg (char const *src_name, char con
   goto close_src_and_dst_desc;
 }
 
-  buf_size = ST_BLKSIZE (sb);
+  buf_size = lcm(ST_BLKSIZE (sb), ST_BLKSIZE(src_open_sb));
 
   /* Even with --sparse=always, try to create holes only
  if the destination is a regular file.  */



-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B

