Re: proposed 'string' module for gnulib

2007-01-26 Thread Paul Eggert
While we're on the subject, here are the changes to coreutils
that will be needed once this stuff goes into gnulib.  (This
is the fun part -- removing stuff)

2007-01-26  Paul Eggert  <[EMAIL PROTECTED]>

* src/dircolors.c: Don't include strcase.h, as its contents
are now in  due to recent gnulib changes.
* src/system.h: Likewise, for mempcpy.h, memrchr.h, stpcpy.h,
strpbrk.h.

diff --git a/src/dircolors.c b/src/dircolors.c
index cf6e2a9..a6c9889 100644
--- a/src/dircolors.c
+++ b/src/dircolors.c
@@ -1,5 +1,5 @@
 /* dircolors - output commands to set the LS_COLOR environment variable
-   Copyright (C) 1996-2006 Free Software Foundation, Inc.
+   Copyright (C) 1996-2007 Free Software Foundation, Inc.
Copyright (C) 1994, 1995, 1997, 1998, 1999, 2000 H. Peter Anvin
 
This program is free software; you can redistribute it and/or modify
@@ -28,7 +28,6 @@
 #include "getline.h"
 #include "obstack.h"
 #include "quote.h"
-#include "strcase.h"
 #include "xstrndup.h"
 
 /* The official name of this program (e.g., no `g' prefix).  */
diff --git a/src/system.h b/src/system.h
index 2a90913..383388b 100644
--- a/src/system.h
+++ b/src/system.h
@@ -97,10 +97,6 @@ you must include  before including this file
memcpy otherwise.  */
 
 #include 
-#include "mempcpy.h"
-#include "memrchr.h"
-#include "stpcpy.h"
-#include "strpbrk.h"
 
 #include 
 


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: stat() order performance issues

2007-01-26 Thread Jim Meyering
Phillip Susi <[EMAIL PROTECTED]> wrote:
> I have noticed that performing commands such as ls ( even with -U ) and

Which ls option(s) are you using?
Which file system?  As you probably know, it really matters.
If it's just "ls -U", then ls may not have to perform a single "stat" call.
If it's "ls -l", then the stat per file is inevitable.
But if it's "ls --inode" or "ls --file-type", with the right file system,
ls gets all it needs via readdir, and can skip all stat calls.  But with
some other file system types, it still has to stat every file.

For example, when I run "ls --file-type" on three maildirs containing
over 160K entries, it's nearly instantaneous.  There are only 3 stat calls:

$ strace -c ls -1 a b c|wc -l
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 88.550.025785 60043   getdents64
 11.400.003320 23714   munmap
  0.040.13   0   233   write
  0.000.00   014   read
  0.000.00   020   open
  0.000.00   026   close
  0.000.00   0 3   stat
  0.000.00   021   fstat
  0.000.00   0 5   lseek
  0.000.00   040   mmap
  0.000.00   010   mprotect
  0.000.00   019   brk
  0.000.00   0 2   rt_sigaction
  0.000.00   0 1   rt_sigprocmask
  0.000.00   0 2 2 ioctl
  0.000.00   01111 access
  0.000.00   0 7   mremap
  0.000.00   0 4   socket
  0.000.00   0 4 4 connect
  0.000.00   0 1   execve
  0.000.00   0 1   uname
  0.000.00   015   fcntl
  0.000.00   0 1   getrlimit
  0.000.00   0 1   arch_prctl
  0.000.00   0 1   set_tid_address
-- --- --- - - 
100.000.029118   49917 total
163843

> du in a Maildir with many thousands of small files takes ages to
> complete.  I have investigated and believe this is due to the order in

Yep.  du has to perform the stat calls.

"ages"?  Give us numbers.  Is NFS involved?  A slow disk?
I've just run "du -s" on a directory containing almost 70,000 entries,
and it didn't take *too* long with a cold cache: 21 seconds.
Running the same command again, (hot cache), it took just 2s.
The disk is local (about 2yrs old), but nothing fancy.  The file system
type is reiserfs.

> which the files are stat()ed.  I believe that these utilities are simply
> stat()ing the files in the order that they are returned by readdir(),
> and this causes a lot of random disk reads to fetch the inodes from disk
> out of order.
>
> My initial testing indicates that sorting the files into inode order and
> calling stat() on them in order is around an order of magnitude faster,
> so I would suggest that utilities be modified to behave this way.

Post your patch, so others can try easily.
If sorting entries (when possible, i.e., for du, and some invocations
of ls) before stating them really does result in a 10x speed-up on
"important" systems, then there's a good chance we'll do it.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: stat() order performance issues

2007-01-26 Thread Jim Meyering
Phillip Susi <[EMAIL PROTECTED]> wrote:
> Jim Meyering wrote:
>> Which ls option(s) are you using?
>
> I used ls -Ui to list the inode number and do not sort.  I expected this
> to simply return the contents from getdents, but I see stat64 calls on
> each file, I believe in the order they are returned by getdents in,
> which causes a massive seek storm.
>
>> Which file system?  As you probably know, it really matters.
>
> In my case, reiserfs, but this should apply equally as well to ext2/3.

That's good, but libc version matters too.
And the kernel version.  Here, I have linux-2.6.18 and
Debian/unstable's libc-2.3.6.

>> If it's just "ls -U", then ls may not have to perform a single "stat" call.
>> If it's "ls -l", then the stat per file is inevitable.
>> But if it's "ls --inode" or "ls --file-type", with the right file system,
>> ls gets all it needs via readdir, and can skip all stat calls.  But with
>> some other file system types, it still has to stat every file.
>
> It seems that ls -U does not stat, but ls -Ui does.  It seems it
> shouldn't because the name and inode number are returned by readdir
> aren't they?

Yes.

Make sure you're using the latest version of coreutils.
If necessary, use a debugger to see whether readdir provides
valid inode information on your system.  It should

>> For example, when I run "ls --file-type" on three maildirs containing
>> over 160K entries, it's nearly instantaneous.  There are only 3 stat calls:
>> $ strace -c ls -1 a b c|wc -l
>
> Are a, b and c files or directories?  If they are files, then of course

They're directories (of course), containing a total of 160K+ entries.

> it would only stat 3 times, because you have only asked ls to look up 3
> files.  Try just ls -Ui without the a b c parameters.
>
>>> du in a Maildir with many thousands of small files takes ages to
>>> complete.  I have investigated and believe this is due to the order in
>> Yep.  du has to perform the stat calls.
>> "ages"?  Give us numbers.  Is NFS involved?  A slow disk?
>> I've just run "du -s" on a directory containing almost 70,000 entries,
>> and it didn't take *too* long with a cold cache: 21 seconds.
>
> Modest disk, no NFS, 114k entries, and it takes 10-15 minutes with cold
> cache.  When I sorted the directory listing by inode number and ran stat
> on each in that order with cold caches, it only took something like 1
> minute.

10-15 minutes is very bad.
Something needs an upgrade.

I presume you used xargs -- you wouldn't run stat 114K times...


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: stat() order performance issues

2007-01-26 Thread Phillip Susi

Jim Meyering wrote:

Which ls option(s) are you using?


I used ls -Ui to list the inode number and do not sort.  I expected this 
to simply return the contents from getdents, but I see stat64 calls on 
each file, I believe in the order they are returned by getdents in, 
which causes a massive seek storm.



Which file system?  As you probably know, it really matters.


In my case, reiserfs, but this should apply equally as well to ext2/3.


If it's just "ls -U", then ls may not have to perform a single "stat" call.
If it's "ls -l", then the stat per file is inevitable.
But if it's "ls --inode" or "ls --file-type", with the right file system,
ls gets all it needs via readdir, and can skip all stat calls.  But with
some other file system types, it still has to stat every file.



It seems that ls -U does not stat, but ls -Ui does.  It seems it 
shouldn't because the name and inode number are returned by readdir 
aren't they?



For example, when I run "ls --file-type" on three maildirs containing
over 160K entries, it's nearly instantaneous.  There are only 3 stat calls:

$ strace -c ls -1 a b c|wc -l


Are a, b and c files or directories?  If they are files, then of course 
it would only stat 3 times, because you have only asked ls to look up 3 
files.  Try just ls -Ui without the a b c parameters.



du in a Maildir with many thousands of small files takes ages to
complete.  I have investigated and believe this is due to the order in


Yep.  du has to perform the stat calls.

"ages"?  Give us numbers.  Is NFS involved?  A slow disk?
I've just run "du -s" on a directory containing almost 70,000 entries,
and it didn't take *too* long with a cold cache: 21 seconds.


Modest disk, no NFS, 114k entries, and it takes 10-15 minutes with cold 
cache.  When I sorted the directory listing by inode number and ran stat 
on each in that order with cold caches, it only took something like 1 
minute.



Post your patch, so others can try easily.
If sorting entries (when possible, i.e., for du, and some invocations
of ls) before stating them really does result in a 10x speed-up on
"important" systems, then there's a good chance we'll do it.


I have no patch, I merely did some instrumentation with shell scripts, 
ls, and stat.





___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: stat() order performance issues

2007-01-26 Thread Phillip Susi

Jim Meyering wrote:

That's good, but libc version matters too.
And the kernel version.  Here, I have linux-2.6.18 and
Debian/unstable's libc-2.3.6.


How does the kernel or libc version matter at all?  What matters is the 
on disk filesystem layout and how it is not optimized for fetching stat 
information on files in what is essentially a random order, instead of 
inode order.  In the case of ext2/3, the inodes are stored on disk in 
numerical order, and for reiserfs, they tend to be stored in order, but 
don't have to be.  On ext2/3 I believe file names are stored in the 
order they were created in, and on reiserfs, they are stored in order of 
their hash.  In both cases the ordering of inodes and the ordering of 
names returned from readdir are essentially randomly related.


Anyhow, I am running kernel 2.6.15 and libc 2.3.6.


10-15 minutes is very bad.
Something needs an upgrade.


Or a bugfix/enhancement, unless there already is a newer version of 
coreutils that stats in inode order.  My version of coreutils is 5.93.



I presume you used xargs -- you wouldn't run stat 114K times...


Yes

ls -Ui > files
cat files | sort -g | cut -c 9- > files-sorted
cat files | cut -c 9- > files-unsorted
time cat files-unsorted | xargs stat > /dev/null
< clear cache >
time cat files-sorted | xargs stat > /dev/null


Sorting by inode number made the stats at least 10 times faster.




___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: stat() order performance issues

2007-01-26 Thread Jim Meyering
Phillip Susi <[EMAIL PROTECTED]> wrote:
> Jim Meyering wrote:
>> That's good, but libc version matters too.
>> And the kernel version.  Here, I have linux-2.6.18 and
>> Debian/unstable's libc-2.3.6.
>
> How does the kernel or libc version matter at all?  What matters is the
> on disk filesystem layout and how it is not optimized for fetching stat
> information on files in what is essentially a random order, instead of

For some cases, it matters a lot.  I.e. for ls -i.
If your libc's readdir support doesn't put file system information
into struct dirent, then you're out of luck.
Likewise, if your OS is too old.

> inode order.  In the case of ext2/3, the inodes are stored on disk in
> numerical order, and for reiserfs, they tend to be stored in order, but
> don't have to be.  On ext2/3 I believe file names are stored in the
> order they were created in, and on reiserfs, they are stored in order of
> their hash.  In both cases the ordering of inodes and the ordering of
> names returned from readdir are essentially randomly related.
>
> Anyhow, I am running kernel 2.6.15 and libc 2.3.6.
>
>> 10-15 minutes is very bad.
>> Something needs an upgrade.
>
> Or a bugfix/enhancement, unless there already is a newer version of
> coreutils that stats in inode order.  My version of coreutils is 5.93.

Um... that's over a year old.  The latest is coreutils-6.7.
And yes, you'll notice improvements in this regard.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: dd (coreutils) 5.97 used power of 10 not 2 for calculating MB

2007-01-26 Thread Jim Meyering
Olivier Delhomme <[EMAIL PROTECTED]> wrote:

> Le Mon, 22 Jan 2007 20:09:27 -0700, Eric Blake disait :
>
>> Thanks for the report.  According to 'dd --help', "MB 1000*1000, M
>> 1024*1024", so the result matches documentation.  However, some of the
>> coreutils (such as ls) support --human-readable vs. --si to choose between
>> power of 10^3 and power of 2^10, maybe it would be worth adding such an
>> option to dd.
>
> There was an old patch that I proposed a long time ago about that
> against coreutils 5.0.91. Jim corrected it according to gnu standards
> and I think it was then Ok (it is still pending - see the TODO file at
> chapter "Changes expected to go in, someday.").

Oh!  Thanks for the reminder.
I see that the FSF now has a copyright assignment on file for you.

Did you ever write a ChangeLog entry or the paragraph of coreutils.texi
describing the new feature?  If not, can you do that, and adjust the
patch (below) so that it can be applied to the latest?

Here's the latest patch I see in the archives:
  http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/1183/focus=1202


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: proposed 'string' module for gnulib

2007-01-26 Thread Jim Meyering
Paul Eggert <[EMAIL PROTECTED]> wrote:
> While we're on the subject, here are the changes to coreutils
> that will be needed once this stuff goes into gnulib.  (This
> is the fun part -- removing stuff)
>
> 2007-01-26  Paul Eggert  <[EMAIL PROTECTED]>
>
>   * src/dircolors.c: Don't include strcase.h, as its contents
>   are now in  due to recent gnulib changes.
>   * src/system.h: Likewise, for mempcpy.h, memrchr.h, stpcpy.h,
>   strpbrk.h.

Looks fine.
I'll apply it as soon as the gnulib changes go in.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


coreutils patch to accommodate today's change to gnulib string module

2007-01-26 Thread Paul Eggert
Here's the patch (with ChangeLog entry) to adjust coreutils to match
today's change to the gnulib string module.

2007-01-26  Paul Eggert  <[EMAIL PROTECTED]>

Adjust to today's change to gnulib, which added a module for
string.h to replace the little include files like strcase.h.
* src/dircolors.c: Don't include strcase.h.
* src/system.h: Don't include mempcpy.h, memrchr.h, stpcpy.h, strpbrk.h.

diff --git a/src/dircolors.c b/src/dircolors.c
index cf6e2a9..a6c9889 100644
--- a/src/dircolors.c
+++ b/src/dircolors.c
@@ -1,5 +1,5 @@
 /* dircolors - output commands to set the LS_COLOR environment variable
-   Copyright (C) 1996-2006 Free Software Foundation, Inc.
+   Copyright (C) 1996-2007 Free Software Foundation, Inc.
Copyright (C) 1994, 1995, 1997, 1998, 1999, 2000 H. Peter Anvin

This program is free software; you can redistribute it and/or modify
@@ -28,7 +28,6 @@
 #include "getline.h"
 #include "obstack.h"
 #include "quote.h"
-#include "strcase.h"
 #include "xstrndup.h"

 /* The official name of this program (e.g., no `g' prefix).  */
diff --git a/src/system.h b/src/system.h
index 2a90913..383388b 100644
--- a/src/system.h
+++ b/src/system.h
@@ -97,10 +97,6 @@ you must include  before including this file
memcpy otherwise.  */

 #include 
-#include "mempcpy.h"
-#include "memrchr.h"
-#include "stpcpy.h"
-#include "strpbrk.h"

 #include 

M ChangeLog
M src/dircolors.c
M src/system.h
Committed as de18fcbc08a3c93060cf4f05052b0da26466019b


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: coreutils patch to accommodate today's change to gnulib string module

2007-01-26 Thread Jim Meyering
Paul Eggert <[EMAIL PROTECTED]> wrote:
> Here's the patch (with ChangeLog entry) to adjust coreutils to match
> today's change to the gnulib string module.
>
> 2007-01-26  Paul Eggert  <[EMAIL PROTECTED]>
>
>   Adjust to today's change to gnulib, which added a module for
>   string.h to replace the little include files like strcase.h.
>   * src/dircolors.c: Don't include strcase.h.
>   * src/system.h: Don't include mempcpy.h, memrchr.h, stpcpy.h, strpbrk.h.

Thanks again.
Applied and pushed.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


coreutils snapshot available

2007-01-26 Thread Jim Meyering
In case you can try the build/test routine using the
very latest snapshot tarball, here it is:

  http://meyering.net/cu/coreutils-6.7-dirty.tar.gz
  http://meyering.net/cu/coreutils-6.7-dirty.tar.gz.sig

This includes the very latest changes from gnulib (as of an hour
ago), so if you find a portability problem, check out both the
bug-coreutils@gnu.org and bug-gnulib@gnu.org mailing lists, in case
someone else has already reported/fixed it there.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: controlling ACL linkage in coreutils

2007-01-26 Thread Mike Frysinger
On Sunday 26 November 2006 15:48, Mike Frysinger wrote:
> On Sunday 26 November 2006 10:53, Jim Meyering wrote:
> > Sounds ok.
>
> patch attached ... it's pretty much all just whitespace changes to indent
> the ACL logic into the new if statement

ping ?
-mike


pgp9P6cxb5QIR.pgp
Description: PGP signature
___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils