stat() order performance issues
I have noticed that performing commands such as ls ( even with -U ) and du in a Maildir with many thousands of small files takes ages to complete. I have investigated and believe this is due to the order in which the files are stat()ed. I believe that these utilities are simply stat()ing the files in the order that they are returned by readdir(), and this causes a lot of random disk reads to fetch the inodes from disk out of order. My initial testing indicates that sorting the files into inode order and calling stat() on them in order is around an order of magnitude faster, so I would suggest that utilities be modified to behave this way. Questions/comments? ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
On Thu, 25 Jan 2007, Phillip Susi wrote: I have noticed that performing commands such as ls ( even with -U ) and du in a Maildir with many thousands of small files takes ages to complete. I have investigated and believe this is due to the order in which the files are stat()ed. I believe that these utilities are simply stat()ing the files in the order that they are returned by readdir(), and this causes a lot of random disk reads to fetch the inodes from disk out of order. My initial testing indicates that sorting the files into inode order and calling stat() on them in order is around an order of magnitude faster, so I would suggest that utilities be modified to behave this way. If I shared the same filesystem type, kernel, and I/O subsystem then I would likely be able to reproduce these results, but... It's not really possible for the kernel to help by re-ordering the requests to disk, as the system calls are issued serially. Although there may be cases where "stat in inode order" doesn't help, I'm struggling to see how it could be worse than "stat in directory order", especially for high-churn directories. Are there filesystems which re-order directories to optimize access patterns? At a quick glance, I see nothing in POSIX which disallows this. Cheers, Phil ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
Phillip Susi <[EMAIL PROTECTED]> wrote: > I have noticed that performing commands such as ls ( even with -U ) and Which ls option(s) are you using? Which file system? As you probably know, it really matters. If it's just "ls -U", then ls may not have to perform a single "stat" call. If it's "ls -l", then the stat per file is inevitable. But if it's "ls --inode" or "ls --file-type", with the right file system, ls gets all it needs via readdir, and can skip all stat calls. But with some other file system types, it still has to stat every file. For example, when I run "ls --file-type" on three maildirs containing over 160K entries, it's nearly instantaneous. There are only 3 stat calls: $ strace -c ls -1 a b c|wc -l % time seconds usecs/call callserrors syscall -- --- --- - - 88.550.025785 60043 getdents64 11.400.003320 23714 munmap 0.040.13 0 233 write 0.000.00 014 read 0.000.00 020 open 0.000.00 026 close 0.000.00 0 3 stat 0.000.00 021 fstat 0.000.00 0 5 lseek 0.000.00 040 mmap 0.000.00 010 mprotect 0.000.00 019 brk 0.000.00 0 2 rt_sigaction 0.000.00 0 1 rt_sigprocmask 0.000.00 0 2 2 ioctl 0.000.00 01111 access 0.000.00 0 7 mremap 0.000.00 0 4 socket 0.000.00 0 4 4 connect 0.000.00 0 1 execve 0.000.00 0 1 uname 0.000.00 015 fcntl 0.000.00 0 1 getrlimit 0.000.00 0 1 arch_prctl 0.000.00 0 1 set_tid_address -- --- --- - - 100.000.029118 49917 total 163843 > du in a Maildir with many thousands of small files takes ages to > complete. I have investigated and believe this is due to the order in Yep. du has to perform the stat calls. "ages"? Give us numbers. Is NFS involved? A slow disk? I've just run "du -s" on a directory containing almost 70,000 entries, and it didn't take *too* long with a cold cache: 21 seconds. Running the same command again, (hot cache), it took just 2s. The disk is local (about 2yrs old), but nothing fancy. The file system type is reiserfs. > which the files are stat()ed. I believe that these utilities are simply > stat()ing the files in the order that they are returned by readdir(), > and this causes a lot of random disk reads to fetch the inodes from disk > out of order. > > My initial testing indicates that sorting the files into inode order and > calling stat() on them in order is around an order of magnitude faster, > so I would suggest that utilities be modified to behave this way. Post your patch, so others can try easily. If sorting entries (when possible, i.e., for du, and some invocations of ls) before stating them really does result in a 10x speed-up on "important" systems, then there's a good chance we'll do it. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
Phillip Susi <[EMAIL PROTECTED]> wrote: > Jim Meyering wrote: >> Which ls option(s) are you using? > > I used ls -Ui to list the inode number and do not sort. I expected this > to simply return the contents from getdents, but I see stat64 calls on > each file, I believe in the order they are returned by getdents in, > which causes a massive seek storm. > >> Which file system? As you probably know, it really matters. > > In my case, reiserfs, but this should apply equally as well to ext2/3. That's good, but libc version matters too. And the kernel version. Here, I have linux-2.6.18 and Debian/unstable's libc-2.3.6. >> If it's just "ls -U", then ls may not have to perform a single "stat" call. >> If it's "ls -l", then the stat per file is inevitable. >> But if it's "ls --inode" or "ls --file-type", with the right file system, >> ls gets all it needs via readdir, and can skip all stat calls. But with >> some other file system types, it still has to stat every file. > > It seems that ls -U does not stat, but ls -Ui does. It seems it > shouldn't because the name and inode number are returned by readdir > aren't they? Yes. Make sure you're using the latest version of coreutils. If necessary, use a debugger to see whether readdir provides valid inode information on your system. It should >> For example, when I run "ls --file-type" on three maildirs containing >> over 160K entries, it's nearly instantaneous. There are only 3 stat calls: >> $ strace -c ls -1 a b c|wc -l > > Are a, b and c files or directories? If they are files, then of course They're directories (of course), containing a total of 160K+ entries. > it would only stat 3 times, because you have only asked ls to look up 3 > files. Try just ls -Ui without the a b c parameters. > >>> du in a Maildir with many thousands of small files takes ages to >>> complete. I have investigated and believe this is due to the order in >> Yep. du has to perform the stat calls. >> "ages"? Give us numbers. Is NFS involved? A slow disk? >> I've just run "du -s" on a directory containing almost 70,000 entries, >> and it didn't take *too* long with a cold cache: 21 seconds. > > Modest disk, no NFS, 114k entries, and it takes 10-15 minutes with cold > cache. When I sorted the directory listing by inode number and ran stat > on each in that order with cold caches, it only took something like 1 > minute. 10-15 minutes is very bad. Something needs an upgrade. I presume you used xargs -- you wouldn't run stat 114K times... ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
Jim Meyering wrote: Which ls option(s) are you using? I used ls -Ui to list the inode number and do not sort. I expected this to simply return the contents from getdents, but I see stat64 calls on each file, I believe in the order they are returned by getdents in, which causes a massive seek storm. Which file system? As you probably know, it really matters. In my case, reiserfs, but this should apply equally as well to ext2/3. If it's just "ls -U", then ls may not have to perform a single "stat" call. If it's "ls -l", then the stat per file is inevitable. But if it's "ls --inode" or "ls --file-type", with the right file system, ls gets all it needs via readdir, and can skip all stat calls. But with some other file system types, it still has to stat every file. It seems that ls -U does not stat, but ls -Ui does. It seems it shouldn't because the name and inode number are returned by readdir aren't they? For example, when I run "ls --file-type" on three maildirs containing over 160K entries, it's nearly instantaneous. There are only 3 stat calls: $ strace -c ls -1 a b c|wc -l Are a, b and c files or directories? If they are files, then of course it would only stat 3 times, because you have only asked ls to look up 3 files. Try just ls -Ui without the a b c parameters. du in a Maildir with many thousands of small files takes ages to complete. I have investigated and believe this is due to the order in Yep. du has to perform the stat calls. "ages"? Give us numbers. Is NFS involved? A slow disk? I've just run "du -s" on a directory containing almost 70,000 entries, and it didn't take *too* long with a cold cache: 21 seconds. Modest disk, no NFS, 114k entries, and it takes 10-15 minutes with cold cache. When I sorted the directory listing by inode number and ran stat on each in that order with cold caches, it only took something like 1 minute. Post your patch, so others can try easily. If sorting entries (when possible, i.e., for du, and some invocations of ls) before stating them really does result in a 10x speed-up on "important" systems, then there's a good chance we'll do it. I have no patch, I merely did some instrumentation with shell scripts, ls, and stat. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
Jim Meyering wrote: That's good, but libc version matters too. And the kernel version. Here, I have linux-2.6.18 and Debian/unstable's libc-2.3.6. How does the kernel or libc version matter at all? What matters is the on disk filesystem layout and how it is not optimized for fetching stat information on files in what is essentially a random order, instead of inode order. In the case of ext2/3, the inodes are stored on disk in numerical order, and for reiserfs, they tend to be stored in order, but don't have to be. On ext2/3 I believe file names are stored in the order they were created in, and on reiserfs, they are stored in order of their hash. In both cases the ordering of inodes and the ordering of names returned from readdir are essentially randomly related. Anyhow, I am running kernel 2.6.15 and libc 2.3.6. 10-15 minutes is very bad. Something needs an upgrade. Or a bugfix/enhancement, unless there already is a newer version of coreutils that stats in inode order. My version of coreutils is 5.93. I presume you used xargs -- you wouldn't run stat 114K times... Yes ls -Ui > files cat files | sort -g | cut -c 9- > files-sorted cat files | cut -c 9- > files-unsorted time cat files-unsorted | xargs stat > /dev/null < clear cache > time cat files-sorted | xargs stat > /dev/null Sorting by inode number made the stats at least 10 times faster. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: stat() order performance issues
Phillip Susi <[EMAIL PROTECTED]> wrote: > Jim Meyering wrote: >> That's good, but libc version matters too. >> And the kernel version. Here, I have linux-2.6.18 and >> Debian/unstable's libc-2.3.6. > > How does the kernel or libc version matter at all? What matters is the > on disk filesystem layout and how it is not optimized for fetching stat > information on files in what is essentially a random order, instead of For some cases, it matters a lot. I.e. for ls -i. If your libc's readdir support doesn't put file system information into struct dirent, then you're out of luck. Likewise, if your OS is too old. > inode order. In the case of ext2/3, the inodes are stored on disk in > numerical order, and for reiserfs, they tend to be stored in order, but > don't have to be. On ext2/3 I believe file names are stored in the > order they were created in, and on reiserfs, they are stored in order of > their hash. In both cases the ordering of inodes and the ordering of > names returned from readdir are essentially randomly related. > > Anyhow, I am running kernel 2.6.15 and libc 2.3.6. > >> 10-15 minutes is very bad. >> Something needs an upgrade. > > Or a bugfix/enhancement, unless there already is a newer version of > coreutils that stats in inode order. My version of coreutils is 5.93. Um... that's over a year old. The latest is coreutils-6.7. And yes, you'll notice improvements in this regard. ___ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils