Re: How git affects kernel.org performance

2007-01-12 Thread Nigel Cunningham
Hi.

On Wed, 2007-01-10 at 22:07 +0800, Fengguang Wu wrote:
> Thanks, Nigel.
> But I'm very sorry that the calculation in the patch was wrong.
> 
> Would you give this new patch a run?

Sorry for my slowness. I just did

time find /usr/src | wc -l

again:

Without patch: 35.137, 35.104, 35.351 seconds
With patch: 34.518, 34.376, 34.489 seconds

So there's about .8 seconds saved.

Regards,

Nigel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-10 Thread Fengguang Wu
On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
> > On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > > > >
> > > > > The fastest and probably most important thing to add is some readahead
> > > > > smarts to directories --- both to the htree and non-htree cases.  If
> > > >
> > > > Here's a quick hack to practice the directory readahead idea.
> > > > Comments are welcome, it's a freshman's work :)
> > >
> > > Well, I'd probably have done it differently, but more important is whether
> > > this actually makes a difference performance-wise. Have you benchmarked it
> > > at all?
> > 
> > Yes, a trivial test shows a marginal improvement, on a minimal debian 
> > system:
> > 
> > # find / | wc -l
> > 13641
> > 
> > # time find / > /dev/null
> > 
> > real    0m10.000s
> > user    0m0.210s
> > sys     0m4.370s
> > 
> > # time find / > /dev/null
> > 
> > real    0m9.890s
> > user    0m0.160s
> > sys     0m3.270s
> > 
> > > Doing an
> > >
> > >   echo 3 > /proc/sys/vm/drop_caches
> > >
> > > is your friend for testing things like this, to force cold-cache
> > > behaviour..
> > 
> > Thanks, I'll work out numbers on large/concurrent dir accesses soon.
> 
> I gave it a try, and I'm afraid the results weren't pretty.
> 
> I did:
> 
> time find /usr/src | wc -l
> 
> on current git with (3 times) and without (5 times) the patch, and got
> 
> with:
> real   54.306, 54.327, 53.742s
> usr    0.324, 0.284, 0.234s
> sys    2.432, 2.484, 2.592s
> 
> without:
> real   24.413, 24.616, 24.080s
> usr    0.208, 0.316, 0.312s
> sys    2.496, 2.440, 2.540s
> 
> Subsequent runs without dropping caches did give a significant
> improvement in both cases (1.821/.188/1.632 is one result I wrote with
> the patch applied).

Thanks, Nigel.
But I'm very sorry that the calculation in the patch was wrong.

Would you give this new patch a run?

It produced pretty numbers here:

#!/bin/zsh

ROOT=/mnt/mnt
TIMEFMT="%E clock  %S kernel  %U user  %w+%c cs  %J"

echo 3 > /proc/sys/vm/drop_caches

# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio

# time find $ROOT/a > /dev/null

time find /etch > /dev/null

# time find $ROOT/a > /dev/null&
# time grep -r asdf $ROOT/b > /dev/null&
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null&

exit 0

# collected results on a SATA disk:
# ./test-parallel-dir-reada.sh 49
4.18s clock  0.08s kernel  0.04s user  418+0 cs  find $ROOT/a > /dev/null
4.09s clock  0.10s kernel  0.02s user  410+1 cs  find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 50
12.18s clock  0.15s kernel  0.07s user  1520+4 cs  find $ROOT/a > /dev/null
11.99s clock  0.13s kernel  0.04s user  1558+6 cs  find $ROOT/a > /dev/null


# ./test-parallel-dir-reada.sh 49
4.01s clock  0.06s kernel  0.01s user  1567+2 cs  find /etch > /dev/null
4.08s clock  0.07s kernel  0.00s user  1568+0 cs  find /etch > /dev/null

# ./test-parallel-dir-reada.sh 50
4.10s clock  0.09s kernel  0.01s user  1578+1 cs  find /etch > /dev/null
4.19s clock  0.08s kernel  0.03s user  1578+0 cs  find /etch > /dev/null


# ./test-parallel-dir-reada.sh 49
7.73s clock  0.11s kernel  0.06s user  438+2 cs  find $ROOT/a > /dev/null
18.92s clock  0.43s kernel  0.02s user  1246+13 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock  4.20s kernel  1.55s user  103564+51 cs  grep -r asdf $ROOT/b > /dev/null

8.47s clock  0.10s kernel  0.02s user  442+4 cs  find $ROOT/a > /dev/null
19.24s clock  0.53s kernel  0.03s user  1250+23 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock  4.18s kernel  1.61s user  100425+47 cs  grep -r asdf $ROOT/b > /dev/null

# ./test-parallel-dir-reada.sh 50
17.87s clock  0.57s kernel  0.02s user  1244+21 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock  0.08s kernel  0.05s user  1517+5 cs  find $ROOT/a > /dev/null
49.68s clock  3.94s kernel  1.67s user  101520+57 cs  grep -r asdf $ROOT/b > /dev/null

15.66s clock  0.51s kernel  0.00s user  1248+25 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock  0.15s kernel  0.04s user  1520+5 cs  find $ROOT/a > /dev/null
46.14s clock  4.08s kernel  1.68s user  101517+63 cs  grep -r asdf $ROOT/b > /dev/null

Thanks,
Wu
---

Subject: ext3 readdir readahead

Do readahead for ext3_readdir().

Reasons to be aggressive:
- readdir() users are likely to traverse the whole directory,
  so readahead miss is not a concern.
- most dirs are small, so slow start is not good
- the htree indexing introduces some randomness,
  which can be helped by the aggressiveness.

So we do 128K sized readaheads, at twice the speed of reads.

The following actual readahead pages are collected for a dir with
11 entries:
32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of
454/541 = 84%

The performance is marginally better for a minimal debian system:

command:    find /
baseline:   4.10s   4.19s
patched:    4.01s   4.08s

Re: How git affects kernel.org performance

2007-01-09 Thread Nigel Cunningham
Hi.

On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
> On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
> >
> >
> > On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > > >
> > > > The fastest and probably most important thing to add is some readahead
> > > > smarts to directories --- both to the htree and non-htree cases.  If
> > >
> > > Here's a quick hack to practice the directory readahead idea.
> > > Comments are welcome, it's a freshman's work :)
> >
> > Well, I'd probably have done it differently, but more important is whether
> > this actually makes a difference performance-wise. Have you benchmarked it
> > at all?
> 
> Yes, a trivial test shows a marginal improvement, on a minimal debian system:
> 
> # find / | wc -l
> 13641
> 
> # time find / > /dev/null
> 
> real    0m10.000s
> user    0m0.210s
> sys     0m4.370s
> 
> # time find / > /dev/null
> 
> real    0m9.890s
> user    0m0.160s
> sys     0m3.270s
> 
> > Doing an
> >
> > echo 3 > /proc/sys/vm/drop_caches
> >
> > is your friend for testing things like this, to force cold-cache
> > behaviour..
> 
> Thanks, I'll work out numbers on large/concurrent dir accesses soon.

I gave it a try, and I'm afraid the results weren't pretty.

I did:

time find /usr/src | wc -l

on current git with (3 times) and without (5 times) the patch, and got

with:
real   54.306, 54.327, 53.742s
usr    0.324, 0.284, 0.234s
sys    2.432, 2.484, 2.592s

without:
real   24.413, 24.616, 24.080s
usr    0.208, 0.316, 0.312s
sys    2.496, 2.440, 2.540s

Subsequent runs without dropping caches did give a significant
improvement in both cases (1.821/.188/1.632 is one result I wrote with
the patch applied).

Regards,

Nigel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-09 Thread Fengguang Wu
On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > >
> > > The fastest and probably most important thing to add is some readahead
> > > smarts to directories --- both to the htree and non-htree cases.  If
> >
> > Here's a quick hack to practice the directory readahead idea.
> > Comments are welcome, it's a freshman's work :)
>
> Well, I'd probably have done it differently, but more important is whether
> this actually makes a difference performance-wise. Have you benchmarked it
> at all?

Yes, a trivial test shows a marginal improvement, on a minimal debian system:

# find / | wc -l
13641

# time find / > /dev/null

real    0m10.000s
user    0m0.210s
sys     0m4.370s

# time find / > /dev/null

real    0m9.890s
user    0m0.160s
sys     0m3.270s

> Doing an
>
>   echo 3 > /proc/sys/vm/drop_caches
>
> is your friend for testing things like this, to force cold-cache
> behaviour..

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-09 Thread Linus Torvalds


On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > 
> > The fastest and probably most important thing to add is some readahead
> > smarts to directories --- both to the htree and non-htree cases.  If
> 
> Here's a quick hack to practice the directory readahead idea.
> Comments are welcome, it's a freshman's work :)

Well, I'd probably have done it differently, but more important is whether 
this actually makes a difference performance-wise. Have you benchmarked it 
at all?

Doing an

echo 3 > /proc/sys/vm/drop_caches

is your friend for testing things like this, to force cold-cache 
behaviour..
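
For example, a repeatable cold-cache run can be as simple as this little
sketch (root is needed for the cache drop, and the path is just an example):

    sync
    for i in 1 2 3; do
        echo 3 > /proc/sys/vm/drop_caches
        time find /usr/src > /dev/null
    done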

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-09 Thread Fengguang Wu
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > > Yeah, slowly-growing directories will get splattered all over the disk.
> > > 
> > > Possible short-term fixes would be to just allocate up to (say) eight
> > > blocks when we grow a directory by one block.  Or teach the
> > > directory-growth code to use ext3 reservations.
> > > 
> > > Longer-term people are talking about things like on-disk reservations.
> > > But I expect directories are being forgotten about in all of that.
> > 
> > By on-disk reservations, do you mean persistent file preallocation ? (that
> > is explicit preallocation of blocks to a given file) If so, you are
> > right, we haven't really given any thought to the possibility of directories
> > needing that feature.
> 
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases.  If

Here's a quick hack to practice the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu
---
 fs/ext3/dir.c   |   22 ++
 fs/ext3/inode.c |2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
        return error_msg == NULL ? 1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+                  struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+       struct inode *inode = filp->f_path.dentry->d_inode;
+       struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+       unsigned long sector;
+       unsigned long blk;
+       pgoff_t offset;
+
+       for (blk = 0; blk < inode->i_blocks; blk++) {
+               sector = blk << (inode->i_blkbits - 9);
+               sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+               offset = sector >> (PAGE_CACHE_SHIFT - 9);
+               do_page_cache_readahead(mapping, filp, offset, 1);
+       }
+}
+
 static int ext3_readdir(struct file * filp,
                        void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 
        sb = inode->i_sb;
 
+       if (!filp->f_pos)
+               ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
        if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
                                    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
                        struct buffer_head *bh_result, int create)
 {
        handle_t *handle = journal_current_handle();
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Jeremy Higdon
On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote:
> Jeff wrote:
> > Something I just thought of:  ATA and SCSI hard disks do their own
> > read-ahead.
> 
> Probably this is wishful thinking on my part, but I would have hoped
> that most of the read-ahead they did was for stuff that happened to be
> on the cylinder they were reading anyway.  So long as their read-ahead
> doesn't cause much extra or delayed disk head motion, what does it
> matter?


And they usually won't readahead if there is another command to
process, though they can be set up to read unrequested data in
spite of outstanding commands.

When they are reading ahead, they'll only fetch LBAs beyond the last
request until a buffer fills or the readahead gets interrupted.

jeremy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Paul Jackson
Jeff wrote:
> Something I just thought of:  ATA and SCSI hard disks do their own
> read-ahead.

Probably this is wishful thinking on my part, but I would have hoped
that most of the read-ahead they did was for stuff that happened to be
on the cylinder they were reading anyway.  So long as their read-ahead
doesn't cause much extra or delayed disk head motion, what does it
matter?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Nicolas Pitre
On Sun, 7 Jan 2007, Shawn O. Pearce wrote:

> Krzysztof Halasa <[EMAIL PROTECTED]> wrote:
> > Hmm... Perhaps it should be possible to push git updates as a pack
> > file only? I mean, the pack file would stay packed = never individual
> > files and never 256 directories?
> 
> Latest Git does this.  If the server is later than 1.4.3.3 then
> the receive-pack process can actually store the pack file rather
> than unpacking it into loose objects.  The downside is that it will
> copy any missing base objects onto the end of a thin pack to make
> it not-thin.

No.  There are no thin packs for pushes.  And IMHO it should stay that 
way exactly to avoid this little inconvenience on servers.

The fetch case is a different story of course.


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Theodore Tso
On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > Would e2fsck -D help? What kind of optimization
> > > does it perform?
> > 
> > It will help a little; e2fsck -D compresses the logical view of the
> > directory, but it doesn't optimize the physical layout on disk at all,
> > and of course, it won't help with the lack of readahead logic.  It's
> > possible to improve how e2fsck -D works, at the moment, it's not
> > trying to make the directory be contiguous on disk.  What it should
> > probably do is to pull a list of all of the blocks used by the
> > directory, sort them, and then try to see if it can improve on the
> > list by allocating some new blocks that would make the directory more
> > contiguous on disk.  I suspect any improvements that would be seen by
> > doing this would be second order effects at most, though.
> 
> ...sounds like a job for e2defrag, not e2fsck...

I wasn't proposing to move other data blocks around in order to make the
directory be contiguous, but just a "quick and dirty" try to make
things better.  But yes, in order to really fix layout issues you
would have to do a full defrag, and it's probably more important that
we try to fix things so that defragmentation runs aren't necessary in
the first place.

- Ted

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Johannes Stezenbach
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> 
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases.  If
> you're using some kind of b-tree structure, such as XFS does for
> directories, preallocation doesn't help you much.  Delayed allocation
> can save you if your delayed allocator knows how to structure disk
> blocks so that a btree-traversal is efficient, but I'm guessing the
> biggest reason why we are losing is because we don't have sufficient
> readahead.  This also has the advantage that it will help without
> needing to do a backup/restore to improve layout.

Would e2fsck -D help? What kind of optimization
does it perform?


Thanks,
Johannes
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Pavel Machek
Hi!

> > Would e2fsck -D help? What kind of optimization
> > does it perform?
> 
> It will help a little; e2fsck -D compresses the logical view of the
> directory, but it doesn't optimize the physical layout on disk at all,
> and of course, it won't help with the lack of readahead logic.  It's
> possible to improve how e2fsck -D works, at the moment, it's not
> trying to make the directory be contiguous on disk.  What it should
> probably do is to pull a list of all of the blocks used by the
> directory, sort them, and then try to see if it can improve on the
> list by allocating some new blocks that would make the directory more
> contiguous on disk.  I suspect any improvements that would be seen by
> doing this would be second order effects at most, though.

...sounds like a job for e2defrag, not e2fsck...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Theodore Tso
On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:
> 
> Would e2fsck -D help? What kind of optimization
> does it perform?

It will help a little; e2fsck -D compresses the logical view of the
directory, but it doesn't optimize the physical layout on disk at all,
and of course, it won't help with the lack of readahead logic.  It's
possible to improve how e2fsck -D works, at the moment, it's not
trying to make the directory be contiguous on disk.  What it should
probably do is to pull a list of all of the blocks used by the
directory, sort them, and then try to see if it can improve on the
list by allocating some new blocks that would make the directory more
contiguous on disk.  I suspect any improvements that would be seen by
doing this would be second order effects at most, though.
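
For what it's worth, that kind of pass has to be run offline, something like
the following on an unmounted filesystem (the device name is only an example;
-f forces a full check even on a clean filesystem, -D does the directory
optimization):

    e2fsck -f -D /dev/sdXN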

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Jeff Garzik

Theodore Tso wrote:

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases.  If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much.  Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead.  This also has the advantage that it will help without
needing to do a backup/restore to improve layout.



Something I just thought of:  ATA and SCSI hard disks do their own 
read-ahead.  Seeking all over the place to pick up bits of directory 
will hurt even more with the disk reading and throwing away data (albeit 
in its internal elevator and cache).
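
If you want to check what a given ATA disk is doing, hdparm can at least
show whether the drive-level look-ahead is enabled, and what the block
layer's own read-ahead is set to (the device name is just an example):

    hdparm -A /dev/sda    # drive's internal read look-ahead on/off
    hdparm -a /dev/sda    # kernel read-ahead for the device, in sectors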


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-08 Thread Theodore Tso
On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > Yeah, slowly-growing directories will get splattered all over the disk.
> > 
> > Possible short-term fixes would be to just allocate up to (say) eight
> > blocks when we grow a directory by one block.  Or teach the
> > directory-growth code to use ext3 reservations.
> > 
> > Longer-term people are talking about things like on-disk reservations.
> > But I expect directories are being forgotten about in all of that.
> 
> By on-disk reservations, do you mean persistent file preallocation ? (that
> is explicit preallocation of blocks to a given file) If so, you are
> right, we haven't really given any thought to the possibility of directories
> needing that feature.

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases.  If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much.  Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead.  This also has the advantage that it will help without
needing to do a backup/restore to improve layout.

Allocating some number of empty blocks when we grow the directory
would be a quick hack that I'd probably do as a 2nd priority.  It
won't help pre-existing directories, but combined with readahead
logic, should help us out greatly in the non-btree case.  
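
A quick way to see how scattered a particular directory already is would be
to dump its block list with debugfs, e.g. (device and path are only
examples, and the path is relative to that filesystem's root):

    debugfs -R "stat /usr/src" /dev/sda1

Jumps in the block list of the output are exactly the seeks that readahead
and smarter allocation would be hiding or avoiding.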

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Suparna Bhattacharya
On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:
> On Sun, 7 Jan 2007 09:55:26 +0100
> Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > > >
> > > > During extremely high load, it appears that what slows kernel.org down more
> > > > than anything else is the time that each individual getdents() call takes.
> > > > When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > > packed tree, you can do the math yourself.
> > >
> > > "getdents()" is totally serialized by the inode semaphore. It's one of the
> > > most expensive system calls in Linux, partly because of that, and partly
> > > because it has to call all the way down into the filesystem in a way that
> > > almost no other common system call has to (99% of all filesystem calls can
> > > be handled basically at the VFS layer with generic caches - but not
> > > getdents()).
> > >
> > > So if there are concurrent readdirs on the same directory, they get
> > > serialized. If there is any file creation/deletion activity in the
> > > directory, it serializes getdents().
> > >
> > > To make matters worse, I don't think it has any read-ahead at all when you
> > > use hashed directory entries. So if you have cold-cache case, you'll read
> > > every single block totally individually, and serialized. One block at a
> > > time (I think the non-hashed case is likely also suspect, but that's a
> > > separate issue)
> > >
> > > In other words, I'm not at all surprised it hits on filldir time.
> > > Especially on ext3.
> >
> > At work, we had the same problem on a file server with ext3. We use rsync
> > to make backups to a local IDE disk, and we noticed that getdents() took
> > about the same time as Peter reports (0.2 to 2 seconds), especially in
> > maildir directories. We tried many things to fix it with no result,
> > including enabling dirindexes. Finally, we made a full backup, and switched
> > over to XFS and the problem totally disappeared. So it seems that the
> > filesystem matters a lot here when there are lots of entries in a
> > directory, and that ext3 is not suitable for usages with thousands
> > of entries in directories with millions of files on disk. I'm not
> > certain it would be that easy to try other filesystems on kernel.org
> > though :-/
> >
> 
> Yeah, slowly-growing directories will get splattered all over the disk.
> 
> Possible short-term fixes would be to just allocate up to (say) eight
> blocks when we grow a directory by one block.  Or teach the
> directory-growth code to use ext3 reservations.
> 
> Longer-term people are talking about things like on-disk reservations.
> But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation ? (that
is explicit preallocation of blocks to a given file) If so, you are
right, we haven't really given any thought to the possibility of directories
needing that feature.
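
As an aside, the getdents() latencies quoted above are easy to reproduce on
a cold cache with strace (the directory below is only a placeholder, and
the cache drop needs root):

    echo 3 > /proc/sys/vm/drop_caches
    strace -T -e trace=getdents,getdents64 ls /some/large/directory > /dev/null

The -T option prints the time spent in each individual getdents call.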

Regards
Suparna

> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Jakub Narebski
Robert Fitzsimons wrote:

>> Some more data on how git affects kernel.org...
> 
> I have a quick question about the gitweb configuration, does the
> $projects_list config entry point to a directory or a file?

It can be either. Usually it is either unset, in which case we
run find over $projectroot, or it is a file (a URI-escaped path
relative to $projectroot, a SPACE, and the URI-escaped owner of the project;
you can get such a file by clicking on TXT on the projects list page).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Shawn O. Pearce
Krzysztof Halasa <[EMAIL PROTECTED]> wrote:
> Hmm... Perhaps it should be possible to push git updates as a pack
> file only? I mean, the pack file would stay packed = never individual
> files and never 256 directories?

Latest Git does this.  If the server is later than 1.4.3.3 then
the receive-pack process can actually store the pack file rather
than unpacking it into loose objects.  The downside is that it will
copy any missing base objects onto the end of a thin pack to make
it not-thin.

There's actually a limit that controls when to keep the pack and when
not to (receive.unpackLimit).  In 1.4.3.3 this defaulted to 5000
objects, which meant all but the largest pushes would be exploded
into loose objects.  In 1.5.0-rc0 that limit changed from 5000 to
100, though Nico did a lot of study and discovered that the optimum
is likely 3.  But that tends to create too many pack files so 100
was arbitrarily chosen.

So if the user pushes <100 objects to a 1.5.0-rc0 server we unpack
to loose; >= 100 we keep the pack file.  Perhaps this would help
kernel.org.
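
The limit is just per-repository configuration, so it can also be tuned by
hand on a server without waiting for a new default, for example (the
repository path is only a placeholder):

    cd /pub/scm/some/repo.git && git config receive.unpackLimit 100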
 
-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Randy Dunlap wrote:
> 
> ISTM that Linus is trying to make 2.6.20-final before LCA.  We'll see.

No. Hopefully "final -rc" before LCA, but I'll do the actual 2.6.20 
release afterwards. I don't want to have a merge window during LCA, as I 
and many others will all be out anyway. So it's much better to have LCA 
happen during the end of the stabilization phase when there's hopefully 
not a lot going on.

(Of course, often at the end of the stabilization phase there is all the 
"ok, what about regression XyZ?" panic)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Jon Smirl wrote:
> > 
> >  - proper read-ahead. Right now, even if the directory is totally
> >contiguous on disk (just remove the thing that writes data to the
> >files, so that you'll have empty files instead of 8kB files), I think
> >we do those reads totally synchronously if the filesystem was mounted
> >with directory hashing enabled.
> 
> What's the status on the Adaptive Read-ahead patch from Wu Fengguang
> <[EMAIL PROTECTED]> ? That patch really helped with read ahead
> problems I was having with mmap. It was in mm forever and I've lost
> track of it.

Won't help. ext3 does NO readahead at all. It doesn't use the general VFS 
helper routines to read data (because it doesn't use the page cache), it 
just does the raw buffer-head IO directly.

(In the non-indexed case, it does do some read-ahead, and it uses the 
generic routines for it, but because it does everything by physical 
address, even the generic routines will decide that it's just doing random 
reading if the directory isn't physically contiguous - and stop reading 
ahead).

(I may have missed some case where it does do read-ahead in the index 
routines, so don't take my word as being unquestionably true. I'm _fairly_ 
sure, but..)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Randy Dunlap
On Sun, 7 Jan 2007 20:07:43 +0100 (MET) Jan Engelhardt wrote:

> 
> On Jan 7 2007 10:49, Randy Dunlap wrote:
> >On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:
> >> On Jan 7 2007 10:03, Willy Tarreau wrote:
> >> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
> >> >> >[..]
> >> >> >entries in directories with millions of files on disk. I'm not
> >> >> >certain it would be that easy to try other filesystems on
> >> >> >kernel.org though :-/
> >> >> 
> >> >> Changing filesystems would mean about a week of downtime for a server. 
> >> >> It's painful, but it's doable; however, if we get a traffic spike 
> >> >> during 
> >> >> that time it'll hurt like hell.
> >> 
> >> Then make sure noone releases a kernel ;-)
> >
> >maybe the week of LCA ?

Sorry, it means Linux.conf.au (Australia):
  http://lca2007.linux.org.au/
Jan. 15-20, 2007

> I don't know that acronym, but if you ask me when it should happen:
> _Before_ the next big thing is released, e.g. before 2.6.20-final.
> Reason: You never know how long they're chewing [downloading] on 2.6.20.
> Excluding other projects on kernel.org from my hypothesis, I'd suppose
> bandwidth usage is lowest the longer it has been since new files were
> released. (Because by then everyone has them, more or less.)

ISTM that Linus is trying to make 2.6.20-final before LCA.  We'll see.

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Jan Engelhardt

On Jan 7 2007 10:49, Randy Dunlap wrote:
>On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:
>> On Jan 7 2007 10:03, Willy Tarreau wrote:
>> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
>> >> >[..]
>> >> >entries in directories with millions of files on disk. I'm not
>> >> >certain it would be that easy to try other filesystems on
>> >> >kernel.org though :-/
>> >> 
>> >> Changing filesystems would mean about a week of downtime for a server. 
>> >> It's painful, but it's doable; however, if we get a traffic spike during 
>> >> that time it'll hurt like hell.
>> 
>> Then make sure noone releases a kernel ;-)
>
>maybe the week of LCA ?

I don't know that acronym, but if you ask me when it should happen:
_Before_ the next big thing is released, e.g. before 2.6.20-final.
Reason: You never know how long they're chewing [downloading] on 2.6.20.
Excluding other projects on kernel.org from my hypothesis, I'd suppose
bandwidth usage is lowest the longer it has been since new files were
released. (Because by then everyone has them, more or less.)


-`J'
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Linus Torvalds wrote:
> 
> A year or two ago I did a totally half-assed code for the non-hashed 
> readdir that improved performance by an order of magnitude for ext3 for a 
> test-case of mine, but it was subtly buggy and didn't do the hashed case 
> AT ALL.

Btw, this isn't the test-case, but it's a half-way re-creation of 
something like it. It's _really_ stupid, but here's what you can do:

 - compile and run this idiotic program. It creates a directory called 
   "throwaway" that is ~44kB in size, and if I did things right, it should 
   not be totally contiguous on disk with the current ext3 allocation 
   logic.

 - as root, do "echo 3 > /proc/sys/vm/drop_caches" to get a cache-cold 
   scenario.

 - do "time ls throwaway > /dev/null".

I don't know what people consider to be reasonable performance, but for 
me, it takes about half a second to do a simple "ls". NOTE! This is _not_ 
reading inode stat information or anything like that. It literally takes 
0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s 
throughput on a reasonably fast modern disk.

That's what we in the industry call  "sad".

And that's on a totally unloaded machine. There was _nothing_ else going 
on. No IO congestion, no nothing. Just the cost of synchronously doing 
ten or eleven disk reads.

The fix?

 - proper read-ahead. Right now, even if the directory is totally 
   contiguous on disk (just remove the thing that writes data to the 
   files, so that you'll have empty files instead of 8kB files), I think 
   we do those reads totally synchronously if the filesystem was mounted 
   with directory hashing enabled.

   Without hashing, the directory will be much smaller too, so readdir() 
   will have less data to read. And it _should_ do some readahead, 
   although in my testing, the best I could do was still 0.185s for a (now 
   shrunken) 28kB directory. 

 - better directory block allocation patterns would likely help a lot, 
   rather than single blocks. That's true even without any read-ahead (at 
   least the disk wouldn't need to seek, and any on-disk track buffers etc 
   would work better), but with read-ahead and contiguous blocks it should 
   be just a couple of IO's (the indirect stuff means that it's more than 
   one), and so you should see much better IO patterns because the 
   elevator can try to help too.

Maybe I just have unrealistic expectations, but I really don't like how a 
fairly small 50kB directory takes an appreciable fraction of a second to 
read.

Once it's cached, it still takes too long, but at least at that point the 
individual getdents calls take just tens of microseconds.

Here's cold-cache numbers (notice: 34 msec for the first one, and 17 msec 
in the middle.. The 5-6ms range indicates a single IO for the intermediate 
ones, which basically says that each call does roughly one IO, except the 
first one that does ~5 (probably the indirect index blocks), and two in 
the middle who are able to fill up the buffer from the IO done by the 
previous one (4kB buffers, so if the previous getdents() happened to just 
read the beginning of a block, the next one might be able to fill 
everything from that block without having to do IO).
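
(The timings are what "strace -T" reports; presumably an invocation along
the lines of "strace -T -e getdents ls throwaway > /dev/null" -- the exact
command is a guess -- produces this kind of trace.)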

getdents(3, /* 103 entries */, 4096)= 4088 <0.034830>
getdents(3, /* 102 entries */, 4096)= 4080 <0.006703>
getdents(3, /* 102 entries */, 4096)= 4080 <0.006719>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000354>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000017>
getdents(3, /* 102 entries */, 4096)= 4080 <0.005302>
getdents(3, /* 102 entries */, 4096)= 4080 <0.016957>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000017>
getdents(3, /* 102 entries */, 4096)= 4080 <0.003530>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000296>
getdents(3, /* 0 entries */, 4096)  = 0 <0.000006>

Here's the pure CPU overhead: still pretty high (200 usec! For a single 
system call! That's disgusting! In contrast, a 4kB read() call takes 7 
usec on this machine, so the overhead of doing things one dentry at a 
time, and calling down to several layers of filesystem is quite high):

getdents(3, /* 103 entries */, 4096)= 4088 <0.000204>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000122>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000112>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000153>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000018>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000103>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000217>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000018>
getdents(3, /* 102 entries */, 4096)= 4080 <0.000095>
getdents(3, /* 83 entries */, 4096) = 3320 <0.000089>
getdents(3, /* 0 entries */, 4096)  = 0 <0.000006>

but you can see the difference.. The real cost is obviously the IO.

Re: How git affects kernel.org performance

2007-01-07 Thread J.H.
With my gitweb caching changes this isn't as big of a deal as the front
page is only generated once every 10 minutes or so (and with the changes
I'm working on today that timeout will be variable)

- John

On Sun, 2007-01-07 at 14:57 +, Robert Fitzsimons wrote:
> > Some more data on how git affects kernel.org...
> 
> I have a quick question about the gitweb configuration, does the
> $projects_list config entry point to a directory or a file?
> 
> When it is a directory gitweb ends up doing the equivalent of a 'find
> $project_list' to find all the available projects, so it really should
> be changed to a projects list file.
> 
> Robert

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Randy Dunlap
On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:

> 
> On Jan 7 2007 10:03, Willy Tarreau wrote:
> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
> >> >[..]
> >> >entries in directories with millions of files on disk. I'm not
> >> >certain it would be that easy to try other filesystems on
> >> >kernel.org though :-/
> >> 
> >> Changing filesystems would mean about a week of downtime for a server. 
> >> It's painful, but it's doable; however, if we get a traffic spike during 
> >> that time it'll hurt like hell.
> 
> Then make sure noone releases a kernel ;-)

maybe the week of LCA ?

> >> However, if there is credible reasons to believe XFS will help, I'd be 
> >> inclined to try it out.
> >
> >Hmmm I'm thinking about something very dirty : would it be possible
> >to reduce the current FS size to get more space to create another
> >FS ? Supposing you create a XX GB/TB XFS after the current ext3,
> >you would be able to mount it in some directories with --bind and
> >slowly switch some parts to it. The problem with this approach is
> >that it will never be 100% converted, but as an experiment it might
> >be worth it, no ?
> 
> Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync
> again to catch any new files that have been added until the ftp
> upload was closed, then do _one_ (technically two) mountpoint moves
> (as opposed to Willy's idea of "some directories") in a mere second
> along the lines of
> 
>   mount --move /oldfs /older; mount --move /newfs /oldfs.
> 
> let old transfers that still use files in /older complete (lsof or
> fuser -m), then disconnect the old volume. In case /newfs (now
> /oldfs) is a volume you borrowed from someone and need to return it,
> well, I guess you need to rsync back somehow.

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Christoph Hellwig wrote:
>
> On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
> > The problem is that I have no sufficient FS knowledge to argument why
> > it helps here. It was a desperate attempt to fix the problem for us
> > and it definitely worked well.
> 
> XFS does rather efficient btree directories, and it does sophisticated
> readahead for directories.  I suspect that's what is helping you there.

The sad part is that this is a long-standing issue, and the directory 
reading code in ext3 really _should_ be able to do ok. 

A year or two ago I did a totally half-assed code for the non-hashed 
readdir that improved performance by an order of magnitude for ext3 for a 
test-case of mine, but it was subtly buggy and didn't do the hashed case 
AT ALL. Andrew fixed it up so that it at least wasn't subtly buggy any 
more, but in the process it also lost all capability of doing fragmented 
directories (so it doesn't help very much any more under exactly the 
situation that is the worst case), and it still doesn't do the hashed 
directory case.

It's my personal pet peeve with ext3 (as Andrew can attest). And it's 
really sad, because I don't think it is fundamental per se, but the way 
the directory handling and jdb are done, it's apparently very hard to fix.

(It's clearly not _impossible_ to do: I think that it should be possible 
to treat ext3 directories the same way we treat files, except they would 
always be in "data=journal" mode. But I understand ext2, not ext3 (and 
absolutely not jbd), so I'm not going to be able to do anything about it 
personally).

Anyway, I think that disabling hashing can actually help. And I suspect 
that even with hashing enabled, there should be some quick hack for making 
the directory reading at least be able to do multiple outstanding reads in 
parallel, instead of reading the blocks totally synchronously ("read five 
blocks, then wait for the one we care" rather than the current "read one 
block at a time, wait for it, read the next one, wait for it.." 
situation).
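
A rough outline of that kind of quick hack, in C -- note that
ext3_dir_block_to_phys() here is an invented stand-in for the real
logical-to-physical mapping (which would have to go through the usual
ext3 get_block path), so this is only a sketch of the idea, not working
ext3 code:

#include <linux/fs.h>
#include <linux/buffer_head.h>

#define DIR_RA_BLOCKS	8

static struct buffer_head *dir_bread_ra(struct inode *dir, unsigned long blk)
{
	unsigned long last = (dir->i_size + dir->i_sb->s_blocksize - 1)
				>> dir->i_sb->s_blocksize_bits;
	unsigned long i;

	/* Kick off asynchronous reads for the next few directory blocks
	 * so several requests are outstanding at once and the elevator
	 * can do something useful with them. */
	for (i = 1; i < DIR_RA_BLOCKS && blk + i < last; i++)
		sb_breadahead(dir->i_sb, ext3_dir_block_to_phys(dir, blk + i));

	/* Synchronous read of the one block the caller actually needs. */
	return sb_bread(dir->i_sb, ext3_dir_block_to_phys(dir, blk));
}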

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Krzysztof Halasa
"H. Peter Anvin" <[EMAIL PROTECTED]> writes:

> During extremely high load, it appears that what slows kernel.org down
> more than anything else is the time that each individual getdents()
> call takes.  When I've looked this I've observed times from 200 ms to
> almost 2 seconds!  Since an unpacked *OR* unpruned git tree adds 256
> directories to a cleanly packed tree, you can do the math yourself.

Hmm... Perhaps it should be possible to push git updates as a pack
file only? I mean, the pack file would stay packed = never individual
files and never 256 directories?

People aren't doing commit/etc. activity there, right?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Robert Fitzsimons
> Some more data on how git affects kernel.org...

I have a quick question about the gitweb configuration, does the
$projects_list config entry point to a directory or a file?

When it is a directory gitweb ends up doing the equivalent of a 'find
$project_list' to find all the available projects, so it really should
be changed to a projects list file.
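
(In gitweb config terms that's presumably just a matter of pointing
$projects_list at a plain file instead of at a directory, e.g. something
like $projects_list = "/srv/git/projects.list"; in the gitweb config --
the path here is only an example.)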

Robert

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Jan Engelhardt

On Jan 7 2007 10:03, Willy Tarreau wrote:
>On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
>> >[..]
>> >entries in directories with millions of files on disk. I'm not
>> >certain it would be that easy to try other filesystems on
>> >kernel.org though :-/
>> 
>> Changing filesystems would mean about a week of downtime for a server. 
>> It's painful, but it's doable; however, if we get a traffic spike during 
>> that time it'll hurt like hell.

Then make sure noone releases a kernel ;-)

>> However, if there is credible reasons to believe XFS will help, I'd be 
>> inclined to try it out.
>
>Hmmm I'm thinking about something very dirty : would it be possible
>to reduce the current FS size to get more space to create another
>FS ? Supposing you create a XX GB/TB XFS after the current ext3,
>you would be able to mount it in some directories with --bind and
>slowly switch some parts to it. The problem with this approach is
>that it will never be 100% converted, but as an experiment it might
>be worth it, no ?

Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync
again to catch any new files that have been added until the ftp
upload was closed, then do _one_ (technically two) mountpoint moves
(as opposed to Willy's idea of "some directories") in a mere second
along the lines of

  mount --move /oldfs /older; mount --move /newfs /oldfs.

let old transfers that still use files in /older complete (lsof or
fuser -m), then disconnect the old volume. In case /newfs (now
/oldfs) is a volume you borrowed from someone and need to return it,
well, I guess you need to rsync back somehow.
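
(Concretely, both rsync passes could be something along the lines of
"rsync -aH --delete /oldfs/ /newfs/" -- the flags are only a suggestion;
-H matters if any of the trees are hardlinked.)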


-`J'
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2007 at 10:28:53AM +, Christoph Hellwig wrote:
> On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
> > The problem is that I have no sufficient FS knowledge to argument why
> > it helps here. It was a desperate attempt to fix the problem for us
> > and it definitely worked well.
> 
> XFS does rather efficient btree directories, and it does sophisticated
> readahead for directories.  I suspect that's what is helping you there.

Ok. Do you too think it might help (or even solve) the problem on
kernel.org ?

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Christoph Hellwig
On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
> The problem is that I have no sufficient FS knowledge to argument why
> it helps here. It was a desperate attempt to fix the problem for us
> and it definitely worked well.

XFS does rather efficient btree directories, and it does sophisticated
readahead for directories.  I suspect that's what is helping you there.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Rene Herman

On 01/07/2007 10:15 AM, Andrew Morton wrote:

> Yeah, slowly-growing directories will get splattered all over the
> disk.
> 
> Possible short-term fixes would be to just allocate up to (say) eight
> blocks when we grow a directory by one block.  Or teach the
> directory-growth code to use ext3 reservations.
> 
> Longer-term people are talking about things like on-disk
> reservations. But I expect directories are being forgotten about in
> all of that.


I wish people would just talk about de2fsrag... ;-\

Rene

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Andrew Morton
On Sun, 7 Jan 2007 09:55:26 +0100
Willy Tarreau <[EMAIL PROTECTED]> wrote:

> On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> > 
> > 
> > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > > 
> > > During extremely high load, it appears that what slows kernel.org down 
> > > more
> > > than anything else is the time that each individual getdents() call takes.
> > > When I've looked this I've observed times from 200 ms to almost 2 seconds!
> > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > packed tree, you can do the math yourself.
> > 
> > "getdents()" is totally serialized by the inode semaphore. It's one of the 
> > most expensive system calls in Linux, partly because of that, and partly 
> > because it has to call all the way down into the filesystem in a way that 
> > almost no other common system call has to (99% of all filesystem calls can 
> > be handled basically at the VFS layer with generic caches - but not 
> > getdents()).
> > 
> > So if there are concurrent readdirs on the same directory, they get 
> > serialized. If there is any file creation/deletion activity in the 
> > directory, it serializes getdents(). 
> > 
> > To make matters worse, I don't think it has any read-ahead at all when you 
> > use hashed directory entries. So if you have cold-cache case, you'll read 
> > every single block totally individually, and serialized. One block at a 
> > time (I think the non-hashed case is likely also suspect, but that's a 
> > separate issue)
> > 
> > In other words, I'm not at all surprised it hits on filldir time. 
> > Especially on ext3.
> 
> At work, we had the same problem on a file server with ext3. We use rsync
> to make backups to a local IDE disk, and we noticed that getdents() took
> about the same time as Peter reports (0.2 to 2 seconds), especially in
> maildir directories. We tried many things to fix it with no result,
> including enabling dirindexes. Finally, we made a full backup, and switched
> over to XFS and the problem totally disappeared. So it seems that the
> filesystem matters a lot here when there are lots of entries in a
> directory, and that ext3 is not suitable for usages with thousands
> of entries in directories with millions of files on disk. I'm not
> certain it would be that easy to try other filesystems on kernel.org
> though :-/
> 

Yeah, slowly-growing directories will get splattered all over the disk.

Possible short-term fixes would be to just allocate up to (say) eight
blocks when we grow a directory by one block.  Or teach the
directory-growth code to use ext3 reservations.

Longer-term people are talking about things like on-disk reservations.
But I expect directories are being forgotten about in all of that.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
> Willy Tarreau wrote:
> >
> >At work, we had the same problem on a file server with ext3. We use rsync
> >to make backups to a local IDE disk, and we noticed that getdents() took
> >about the same time as Peter reports (0.2 to 2 seconds), especially in
> >maildir directories. We tried many things to fix it with no result,
> >including enabling dirindexes. Finally, we made a full backup, and switched
> >over to XFS and the problem totally disappeared. So it seems that the
> >filesystem matters a lot here when there are lots of entries in a
> >directory, and that ext3 is not suitable for usages with thousands
> >of entries in directories with millions of files on disk. I'm not
> >certain it would be that easy to try other filesystems on kernel.org
> >though :-/
> >
> 
> Changing filesystems would mean about a week of downtime for a server. 
> It's painful, but it's doable; however, if we get a traffic spike during 
> that time it'll hurt like hell.
> 
> However, if there is credible reasons to believe XFS will help, I'd be 
> inclined to try it out.

The problem is that I have no sufficient FS knowledge to argument why
it helps here. It was a desperate attempt to fix the problem for us
and it definitely worked well.

Hmmm I'm thinking about something very dirty : would it be possible
to reduce the current FS size to get more space to create another
FS ? Supposing you create a XX GB/TB XFS after the current ext3,
you would be able to mount it in some directories with --bind and
slowly switch some parts to it. The problem with this approach is
that it will never be 100% converted, but as an experiment it might
be worth it, no ?

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread H. Peter Anvin

Willy Tarreau wrote:
> At work, we had the same problem on a file server with ext3. We use rsync
> to make backups to a local IDE disk, and we noticed that getdents() took
> about the same time as Peter reports (0.2 to 2 seconds), especially in
> maildir directories. We tried many things to fix it with no result,
> including enabling dirindexes. Finally, we made a full backup, and switched
> over to XFS and the problem totally disappeared. So it seems that the
> filesystem matters a lot here when there are lots of entries in a
> directory, and that ext3 is not suitable for usages with thousands
> of entries in directories with millions of files on disk. I'm not
> certain it would be that easy to try other filesystems on kernel.org
> though :-/



Changing filesystems would mean about a week of downtime for a server. 
It's painful, but it's doable; however, if we get a traffic spike during 
that time it'll hurt like hell.


However, if there is credible reasons to believe XFS will help, I'd be 
inclined to try it out.


-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > 
> > During extremely high load, it appears that what slows kernel.org down more
> > than anything else is the time that each individual getdents() call takes.
> > When I've looked this I've observed times from 200 ms to almost 2 seconds!
> > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > packed tree, you can do the math yourself.
> 
> "getdents()" is totally serialized by the inode semaphore. It's one of the 
> most expensive system calls in Linux, partly because of that, and partly 
> because it has to call all the way down into the filesystem in a way that 
> almost no other common system call has to (99% of all filesystem calls can 
> be handled basically at the VFS layer with generic caches - but not 
> getdents()).
> 
> So if there are concurrent readdirs on the same directory, they get 
> serialized. If there is any file creation/deletion activity in the 
> directory, it serializes getdents(). 
> 
> To make matters worse, I don't think it has any read-ahead at all when you 
> use hashed directory entries. So if you have cold-cache case, you'll read 
> every single block totally individually, and serialized. One block at a 
> time (I think the non-hashed case is likely also suspect, but that's a 
> separate issue)
> 
> In other words, I'm not at all surprised it hits on filldir time. 
> Especially on ext3.

At work, we had the same problem on a file server with ext3. We use rsync
to make backups to a local IDE disk, and we noticed that getdents() took
about the same time as Peter reports (0.2 to 2 seconds), especially in
maildir directories. We tried many things to fix it with no result,
including enabling dirindexes. Finally, we made a full backup, and switched
over to XFS and the problem totally disappeared. So it seems that the
filesystem matters a lot here when there are lots of entries in a
directory, and that ext3 is not suitable for usages with thousands
of entries in directories with millions of files on disk. I'm not
certain it would be that easy to try other filesystems on kernel.org
though :-/

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
 
 
 On Sat, 6 Jan 2007, H. Peter Anvin wrote:
  
  During extremely high load, it appears that what slows kernel.org down more
  than anything else is the time that each individual getdents() call takes.
  When I've looked this I've observed times from 200 ms to almost 2 seconds!
  Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
  packed tree, you can do the math yourself.
 
 getdents() is totally serialized by the inode semaphore. It's one of the 
 most expensive system calls in Linux, partly because of that, and partly 
 because it has to call all the way down into the filesystem in a way that 
 almost no other common system call has to (99% of all filesystem calls can 
 be handled basically at the VFS layer with generic caches - but not 
 getdents()).
 
 So if there are concurrent readdirs on the same directory, they get 
 serialized. If there is any file creation/deletion activity in the 
 directory, it serializes getdents(). 
 
 To make matters worse, I don't think it has any read-ahead at all when you 
 use hashed directory entries. So if you have cold-cache case, you'll read 
 every single block totally individually, and serialized. One block at a 
 time (I think the non-hashed case is likely also suspect, but that's a 
 separate issue)
 
 In other words, I'm not at all surprised it hits on filldir time. 
 Especially on ext3.

At work, we had the same problem on a file server with ext3. We use rsync
to make backups to a local IDE disk, and we noticed that getdents() took
about the same time as Peter reports (0.2 to 2 seconds), especially in
maildir directories. We tried many things to fix it with no result,
including enabling dirindexes. Finally, we made a full backup, and switched
over to XFS and the problem totally disappeared. So it seems that the
filesystem matters a lot here when there are lots of entries in a
directory, and that ext3 is not suitable for usages with thousands
of entries in directories with millions of files on disk. I'm not
certain it would be that easy to try other filesystems on kernel.org
though :-/

Willy

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread H. Peter Anvin

Willy Tarreau wrote:


At work, we had the same problem on a file server with ext3. We use rsync
to make backups to a local IDE disk, and we noticed that getdents() took
about the same time as Peter reports (0.2 to 2 seconds), especially in
maildir directories. We tried many things to fix it with no result,
including enabling dirindexes. Finally, we made a full backup, and switched
over to XFS and the problem totally disappeared. So it seems that the
filesystem matters a lot here when there are lots of entries in a
directory, and that ext3 is not suitable for usages with thousands
of entries in directories with millions of files on disk. I'm not
certain it would be that easy to try other filesystems on kernel.org
though :-/



Changing filesystems would mean about a week of downtime for a server. 
It's painful, but it's doable; however, if we get a traffic spike during 
that time it'll hurt like hell.


However, if there is credible reasons to believe XFS will help, I'd be 
inclined to try it out.


-hpa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
 Willy Tarreau wrote:
 
 At work, we had the same problem on a file server with ext3. We use rsync
 to make backups to a local IDE disk, and we noticed that getdents() took
 about the same time as Peter reports (0.2 to 2 seconds), especially in
 maildir directories. We tried many things to fix it with no result,
 including enabling dirindexes. Finally, we made a full backup, and switched
 over to XFS and the problem totally disappeared. So it seems that the
 filesystem matters a lot here when there are lots of entries in a
 directory, and that ext3 is not suitable for usages with thousands
 of entries in directories with millions of files on disk. I'm not
 certain it would be that easy to try other filesystems on kernel.org
 though :-/
 
 
 Changing filesystems would mean about a week of downtime for a server. 
 It's painful, but it's doable; however, if we get a traffic spike during 
 that time it'll hurt like hell.
 
 However, if there is credible reasons to believe XFS will help, I'd be 
 inclined to try it out.

The problem is that I have no sufficient FS knowledge to argument why
it helps here. It was a desperate attempt to fix the problem for us
and it definitely worked well.

Hmmm I'm thinking about something very dirty : would it be possible
to reduce the current FS size to get more space to create another
FS ? Supposing you create a XX GB/TB XFS after the current ext3,
you would be able to mount it in some directories with --bind and
slowly switch some parts to it. The problem with this approach is
that it will never be 100% converted, but as an experiment it might
be worth it, no ?

Willy

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Andrew Morton
On Sun, 7 Jan 2007 09:55:26 +0100
Willy Tarreau [EMAIL PROTECTED] wrote:

 On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
  
  
  On Sat, 6 Jan 2007, H. Peter Anvin wrote:
   
   During extremely high load, it appears that what slows kernel.org down 
   more
   than anything else is the time that each individual getdents() call takes.
   When I've looked this I've observed times from 200 ms to almost 2 seconds!
   Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
   packed tree, you can do the math yourself.
  
  getdents() is totally serialized by the inode semaphore. It's one of the 
  most expensive system calls in Linux, partly because of that, and partly 
  because it has to call all the way down into the filesystem in a way that 
  almost no other common system call has to (99% of all filesystem calls can 
  be handled basically at the VFS layer with generic caches - but not 
  getdents()).
  
  So if there are concurrent readdirs on the same directory, they get 
  serialized. If there is any file creation/deletion activity in the 
  directory, it serializes getdents(). 
  
  To make matters worse, I don't think it has any read-ahead at all when you 
  use hashed directory entries. So if you have cold-cache case, you'll read 
  every single block totally individually, and serialized. One block at a 
  time (I think the non-hashed case is likely also suspect, but that's a 
  separate issue)
  
  In other words, I'm not at all surprised it hits on filldir time. 
  Especially on ext3.
 
 At work, we had the same problem on a file server with ext3. We use rsync
 to make backups to a local IDE disk, and we noticed that getdents() took
 about the same time as Peter reports (0.2 to 2 seconds), especially in
 maildir directories. We tried many things to fix it with no result,
 including enabling dirindexes. Finally, we made a full backup, and switched
 over to XFS and the problem totally disappeared. So it seems that the
 filesystem matters a lot here when there are lots of entries in a
 directory, and that ext3 is not suitable for usages with thousands
 of entries in directories with millions of files on disk. I'm not
 certain it would be that easy to try other filesystems on kernel.org
 though :-/
 

Yeah, slowly-growing directories will get splattered all over the disk.

Possible short-term fixes would be to just allocate up to (say) eight
blocks when we grow a directory by one block.  Or teach the
directory-growth code to use ext3 reservations.

Longer-term people are talking about things like on-disk rerservations. 
But I expect directories are being forgotten about in all of that.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Rene Herman

On 01/07/2007 10:15 AM, Andrew Morton wrote:


Yeah, slowly-growing directories will get splattered all over the
disk.

Possible short-term fixes would be to just allocate up to (say) eight
 blocks when we grow a directory by one block.  Or teach the 
directory-growth code to use ext3 reservations.


Longer-term people are talking about things like on-disk
rerservations. But I expect directories are being forgotten about in
all of that.


I wish people would just talk about de2fsrag... ;-\

Rene

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Christoph Hellwig
On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
 The problem is that I have no sufficient FS knowledge to argument why
 it helps here. It was a desperate attempt to fix the problem for us
 and it definitely worked well.

XFS does rather efficient btree directories, and it does sophisticated
readahead for directories.  I suspect that's what is helping you there.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Willy Tarreau
On Sun, Jan 07, 2007 at 10:28:53AM +, Christoph Hellwig wrote:
 On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
  The problem is that I have no sufficient FS knowledge to argument why
  it helps here. It was a desperate attempt to fix the problem for us
  and it definitely worked well.
 
 XFS does rather efficient btree directories, and it does sophisticated
 readahead for directories.  I suspect that's what is helping you there.

Ok. Do you too think it might help (or even solve) the problem on
kernel.org ?

Willy

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Jan Engelhardt

On Jan 7 2007 10:03, Willy Tarreau wrote:
On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
 [..]
 entries in directories with millions of files on disk. I'm not
 certain it would be that easy to try other filesystems on
 kernel.org though :-/
 
 Changing filesystems would mean about a week of downtime for a server. 
 It's painful, but it's doable; however, if we get a traffic spike during 
 that time it'll hurt like hell.

Then make sure noone releases a kernel ;-)

 However, if there is credible reasons to believe XFS will help, I'd be 
 inclined to try it out.

Hmmm I'm thinking about something very dirty : would it be possible
to reduce the current FS size to get more space to create another
FS ? Supposing you create a XX GB/TB XFS after the current ext3,
you would be able to mount it in some directories with --bind and
slowly switch some parts to it. The problem with this approach is
that it will never be 100% converted, but as an experiment it might
be worth it, no ?

Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync
again to catch any new files that have been added until the ftp
upload was closed, then do _one_ (technically two) mountpoint moves
(as opposed to Willy's idea of some directories) in a mere second
along the lines of

  mount --move /oldfs /older; mount --move /newfs /oldfs.

let old transfers that still use files in /older complete (lsof or
fuser -m), then disconnect the old volume. In case /newfs (now
/oldfs) is a volume you borrowed from someone and need to return it,
well, I guess you need to rsync back somehow.


-`J'
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Robert Fitzsimons
 Some more data on how git affects kernel.org...

I have a quick question about the gitweb configuration, does the
$projects_list config entry point to a directory or a file?

When it is a directory gitweb ends up doing the equivalent of a 'find
$project_list' to find all the available projects, so it really should
be changed to a projects list file.

Robert

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Krzysztof Halasa
H. Peter Anvin [EMAIL PROTECTED] writes:

 During extremely high load, it appears that what slows kernel.org down
 more than anything else is the time that each individual getdents()
 call takes.  When I've looked this I've observed times from 200 ms to
 almost 2 seconds!  Since an unpacked *OR* unpruned git tree adds 256
 directories to a cleanly packed tree, you can do the math yourself.

Hmm... Perhaps it should be possible to push git updates as a pack
file only? I mean, the pack file would stay packed = never individual
files and never 256 directories?

People aren't doing commit/etc. activity there, right?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Christoph Hellwig wrote:

 On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote:
  The problem is that I have no sufficient FS knowledge to argument why
  it helps here. It was a desperate attempt to fix the problem for us
  and it definitely worked well.
 
 XFS does rather efficient btree directories, and it does sophisticated
 readahead for directories.  I suspect that's what is helping you there.

The sad part is that this is a long-standing issue, and the directory 
reading code in ext3 really _should_ be able to do ok. 

A year or two ago I did a totally half-assed code for the non-hashed 
readdir that improved performance by an order of magnitude for ext3 for a 
test-case of mine, but it was subtly buggy and didn't do the hashed case 
AT ALL. Andrew fixed it up so that it at least wasn't subtly buggy any 
more, but in the process it also lost all capability of doing fragmented 
directories (so it doesn't help very much any more under exactly the 
situation that is the worst case), and it still doesn't do the hashed 
directory case.

It's my personal pet peeve with ext3 (as Andrew can attest). And it's 
really sad, because I don't think it is fundamental per se, but the way 
the directory handling and jdb are done, it's apparently very hard to fix.

(It's clearly not _impossible_ to do: I think that it should be possible 
to treat ext3 directories the same way we treat files, except they would 
always be in data=journal mode. But I understand ext2, not ext3 (and 
absolutely not jbd), so I'm not going to be able to do anything about it 
personally).

Anyway, I think that disabling hashing can actually help. And I suspect 
that even with hashing enabled, there should be some quick hack for making 
the directory reading at least be able to do multiple outstanding reads in 
parallel, instead of reading the blocks totally synchronously (read five 
blocks, then wait for the one we care rather than the current read one 
block at a time, wait for it, read the next one, wait for it.. 
situation).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Randy Dunlap
On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:

 
 On Jan 7 2007 10:03, Willy Tarreau wrote:
 On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
  [..]
  entries in directories with millions of files on disk. I'm not
  certain it would be that easy to try other filesystems on
  kernel.org though :-/
  
  Changing filesystems would mean about a week of downtime for a server. 
  It's painful, but it's doable; however, if we get a traffic spike during 
  that time it'll hurt like hell.
 
 Then make sure noone releases a kernel ;-)

maybe the week of LCA ?

  However, if there is credible reasons to believe XFS will help, I'd be 
  inclined to try it out.
 
 Hmmm I'm thinking about something very dirty : would it be possible
 to reduce the current FS size to get more space to create another
 FS ? Supposing you create a XX GB/TB XFS after the current ext3,
 you would be able to mount it in some directories with --bind and
 slowly switch some parts to it. The problem with this approach is
 that it will never be 100% converted, but as an experiment it might
 be worth it, no ?
 
 Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync
 again to catch any new files that have been added until the ftp
 upload was closed, then do _one_ (technically two) mountpoint moves
 (as opposed to Willy's idea of some directories) in a mere second
 along the lines of
 
   mount --move /oldfs /older; mount --move /newfs /oldfs.
 
 let old transfers that still use files in /older complete (lsof or
 fuser -m), then disconnect the old volume. In case /newfs (now
 /oldfs) is a volume you borrowed from someone and need to return it,
 well, I guess you need to rsync back somehow.

---
~Randy
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread J.H.
With my gitweb caching changes this isn't as big of a deal as the front
page is only generated once every 10 minutes or so (and with the changes
I'm working on today that timeout will be variable)

- John

On Sun, 2007-01-07 at 14:57 +, Robert Fitzsimons wrote:
  Some more data on how git affects kernel.org...
 
 I have a quick question about the gitweb configuration, does the
 $projects_list config entry point to a directory or a file?
 
 When it is a directory gitweb ends up doing the equivalent of a 'find
 $project_list' to find all the available projects, so it really should
 be changed to a projects list file.
 
 Robert

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Linus Torvalds wrote:
 
 A year or two ago I did a totally half-assed code for the non-hashed 
 readdir that improved performance by an order of magnitude for ext3 for a 
 test-case of mine, but it was subtly buggy and didn't do the hashed case 
 AT ALL.

Btw, this isn't the test-case, but it's a half-way re-creation of 
something like it. It's _really_ stupid, but here's what you can do:

 - compile and run this idiotic program. It creates a directory called 
   throwaway that is ~44kB in size, and if I did things right, it should 
   not be totally contiguous on disk with the current ext3 allocation 
   logic.

 - as root, do echo 3  /proc/sys/vm/drop_caches to get a cache-cold 
   schenario.

 - do time ls throwaway  /dev/null.

I don't know what people consider to be reasonable performance, but for 
me, it takes about half a second to do a simple ls. NOTE! This is _not_ 
reading inode stat information or anything like that. It literally takes 
0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s 
throughput on a reasonably fast modern disk.

That's what we in the industry call  sad.

And that's on a totally unloaded machine. There was _nothing_ else going 
on. No IO congestion, no nothing. Just the cost of synchronously doing 
ten or eleven disk reads.

The fix?

 - proper read-ahead. Right now, even if the directory is totally 
   contiguous on disk (just remove the thing that writes data to the 
   files, so that you'll have empty files instead of 8kB files), I think 
   we do those reads totally synchronously if the filesystem was mounted 
   with directory hashing enabled.

   Without hashing, the directory will be much smaller too, so readdir() 
   will have less data to read. And it _should_ do some readahead, 
   although in my testing, the best I could do was still 0.185s for a (now 
   shrunken) 28kB directory. 

 - better directory block allocation patterns would likely help a lot, 
   rather than single blocks. That's true even without any read-ahead (at 
   least the disk wouldn't need to seek, and any on-disk track buffers etc 
   would work better), but with read-ahead and contiguous blocks it should 
   be just a couple of IO's (the indirect stuff means that it's more than 
   one), and so you should see much better IO patterns because the 
   elevator can try to help too.

Maybe I just have unrealistic expectations, but I really don't like how a 
fairly small 50kB directory takes an appreciable fraction of a second to 
read.

Once it's cached, it still takes too long, but at least at that point the 
individual getdents calls take just tens of microseconds.

Here's cold-cache numbers (notice: 34 msec for the first one, and 17 msec 
in the middle.. The 5-6ms range indicates a single IO for the intermediate 
ones, which basically says that each call does roughly one IO, except the 
first one that does ~5 (probably the indirect index blocks), and two in 
the middle who are able to fill up the buffer from the IO done by the 
previous one (4kB buffers, so if the previous getdents() happened to just 
read the beginning of a block, the next one might be able to fill 
everything from that block without having to do IO).

getdents(3, /* 103 entries */, 4096)= 4088 0.034830
getdents(3, /* 102 entries */, 4096)= 4080 0.006703
getdents(3, /* 102 entries */, 4096)= 4080 0.006719
getdents(3, /* 102 entries */, 4096)= 4080 0.000354
getdents(3, /* 102 entries */, 4096)= 4080 0.17
getdents(3, /* 102 entries */, 4096)= 4080 0.005302
getdents(3, /* 102 entries */, 4096)= 4080 0.016957
getdents(3, /* 102 entries */, 4096)= 4080 0.17
getdents(3, /* 102 entries */, 4096)= 4080 0.003530
getdents(3, /* 83 entries */, 4096) = 3320 0.000296
getdents(3, /* 0 entries */, 4096)  = 0 0.06

Here's the pure CPU overhead: still pretty high (200 usec! For a single 
system call! That's disgusting! In contrast, a 4kB read() call takes 7 
usec on this machine, so the overhead of doing things one dentry at a 
time, and calling down to several layers of filesystem is quite high):

getdents(3, /* 103 entries */, 4096)= 4088 0.000204
getdents(3, /* 102 entries */, 4096)= 4080 0.000122
getdents(3, /* 102 entries */, 4096)= 4080 0.000112
getdents(3, /* 102 entries */, 4096)= 4080 0.000153
getdents(3, /* 102 entries */, 4096)= 4080 0.18
getdents(3, /* 102 entries */, 4096)= 4080 0.000103
getdents(3, /* 102 entries */, 4096)= 4080 0.000217
getdents(3, /* 102 entries */, 4096)= 4080 0.18
getdents(3, /* 102 entries */, 4096)= 4080 0.95
getdents(3, /* 83 entries */, 4096) = 3320 0.89
getdents(3, /* 0 entries */, 4096)  = 0 0.06

but you can see the difference.. The real cost is obviously the IO.

 

Re: How git affects kernel.org performance

2007-01-07 Thread Jan Engelhardt

On Jan 7 2007 10:49, Randy Dunlap wrote:
On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:
 On Jan 7 2007 10:03, Willy Tarreau wrote:
 On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
  [..]
  entries in directories with millions of files on disk. I'm not
  certain it would be that easy to try other filesystems on
  kernel.org though :-/
  
  Changing filesystems would mean about a week of downtime for a server. 
  It's painful, but it's doable; however, if we get a traffic spike during 
  that time it'll hurt like hell.
 
 Then make sure noone releases a kernel ;-)

maybe the week of LCA ?

I don't know that acronym, but if you ask me when it should happen:
_Before_ the next big thing is released, e.g. before 2.6.20-final.
Reason: You never know how long they're chewing [downloading] on 2.6.20.
Excluding other projects on kernel.org from my hypothesis, I'd suppose the
lowest bandwidth usage the longer no new files have been released. (Because
everyone has them then more or less.)


-`J'
-- 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Randy Dunlap
On Sun, 7 Jan 2007 20:07:43 +0100 (MET) Jan Engelhardt wrote:

 
 On Jan 7 2007 10:49, Randy Dunlap wrote:
 On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote:
  On Jan 7 2007 10:03, Willy Tarreau wrote:
  On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote:
   [..]
   entries in directories with millions of files on disk. I'm not
   certain it would be that easy to try other filesystems on
   kernel.org though :-/
   
   Changing filesystems would mean about a week of downtime for a server. 
   It's painful, but it's doable; however, if we get a traffic spike 
   during 
   that time it'll hurt like hell.
  
  Then make sure noone releases a kernel ;-)
 
 maybe the week of LCA ?

Sorry, it means Linux.conf.au (Australia):
  http://lca2007.linux.org.au/
Jan. 15-20, 2007

 I don't know that acronym, but if you ask me when it should happen:
 _Before_ the next big thing is released, e.g. before 2.6.20-final.
 Reason: You never know how long they're chewing [downloading] on 2.6.20.
 Excluding other projects on kernel.org from my hypothesis, I'd suppose the
 lowest bandwidth usage the longer no new files have been released. (Because
 everyone has them then more or less.)

ISTM that Linus is trying to make 2.6.20-final before LCA.  We'll see.

---
~Randy
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Jon Smirl wrote:
  
   - proper read-ahead. Right now, even if the directory is totally
 contiguous on disk (just remove the thing that writes data to the
 files, so that you'll have empty files instead of 8kB files), I think
 we do those reads totally synchronously if the filesystem was mounted
 with directory hashing enabled.
 
 What's the status on the Adaptive Read-ahead patch from Wu Fengguang
 [EMAIL PROTECTED] ? That patch really helped with read ahead
 problems I was having with mmap. It was in mm forever and I've lost
 track of it.

Won't help. ext3 does NO readahead at all. It doesn't use the general VFS 
helper routines to read data (because it doesn't use the page cache), it 
just does the raw buffer-head IO directly.

(In the non-indexed case, it does do some read-ahead, and it uses the 
generic routines for it, but because it does everything by physical 
address, even the generic routines will decide that it's just doing random 
reading if the directory isn't physically contiguous - and stop reading 
ahead).

(I may have missed some case where it does do read-ahead in the index 
routines, so don't take my word as being unquestionably true. I'm _fairly_ 
sure, but..)

Linus


Re: How git affects kernel.org performance

2007-01-07 Thread Linus Torvalds


On Sun, 7 Jan 2007, Randy Dunlap wrote:
 
 ISTM that Linus is trying to make 2.6.20-final before LCA.  We'll see.

No. Hopefully final -rc before LCA, but I'll do the actual 2.6.20 
release afterwards. I don't want to have a merge window during LCA, as I 
and many others will all be out anyway. So it's much better to have LCA 
happen during the end of the stabilization phase when there's hopefully 
not a lot going on.

(Of course, often at the end of the stabilization phase there is all the 
"ok, what about regression XyZ?" panic)

Linus


Re: How git affects kernel.org performance

2007-01-07 Thread Shawn O. Pearce
Krzysztof Halasa [EMAIL PROTECTED] wrote:
 Hmm... Perhaps it should be possible to push git updates as a pack
 file only? I mean, the pack file would stay packed = never individual
 files and never 256 directories?

Latest Git does this.  If the server is later than 1.4.3.3 then
the receive-pack process can actually store the pack file rather
than unpacking it into loose objects.  The downside is that it will
copy any missing base objects onto the end of a thin pack to make
it not-thin.

There's actually a limit that controls when to keep the pack and when
not to (receive.unpackLimit).  In 1.4.3.3 this defaulted to 5000
objects, which meant all but the largest pushes would be exploded
into loose objects.  In 1.5.0-rc0 that limit changed from 5000 to
100, though Nico did a lot of study and discovered that the optimum
is likely 3.  But that tends to create too many pack files so 100
was arbitrarily chosen.

So if the user pushes < 100 objects to a 1.5.0-rc0 server we unpack
to loose; >= 100 we keep the pack file.  Perhaps this would help
kernel.org.
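
(For illustration, a server operator could set this limit per repository;
the repository path below is just an example, and 100 is simply the 1.5.0
default mentioned above, not a recommendation:

  cd /pub/scm/git/git.git
  git config receive.unpackLimit 100

Older installations spell the command "git repo-config", and newer git also
understands transfer.unpackLimit as a fallback when receive.unpackLimit is
unset.)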
 
-- 
Shawn.


Re: How git affects kernel.org performance

2007-01-07 Thread Jakub Narebski
Robert Fitzsimons wrote:

 Some more data on how git affects kernel.org...
 
 I have a quick question about the gitweb configuration, does the
 $projects_list config entry point to a directory or a file?

It can point to either.  Usually it is either unset, in which case gitweb
does a find over $projectroot, or it points to a plain file listing one
project per line (the URI-escaped path relative to $projectroot, a SPACE,
and the URI-escaped owner of the project; you can get such a file by
clicking on TXT on the projects list page).
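
(A minimal sketch of the file variant, with made-up paths and owners.
gitweb.conf would point at the list:

  $projects_list = "/etc/gitweb/project-index.txt";

and each line of that file then describes one project, the two URI-escaped
fields separated by a single space:

  linux%2F2.6%2Flinux-2.6.git Linus%20Torvalds
  git%2Fgit.git Junio%20C%20Hamano

When $projects_list is unset, gitweb falls back to scanning $projectroot
instead, as described above.)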

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git




Re: How git affects kernel.org performance

2007-01-07 Thread Suparna Bhattacharya
On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:
 On Sun, 7 Jan 2007 09:55:26 +0100
 Willy Tarreau [EMAIL PROTECTED] wrote:
 
  On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
  
  
   On Sat, 6 Jan 2007, H. Peter Anvin wrote:
   
    During extremely high load, it appears that what slows kernel.org down
    more than anything else is the time that each individual getdents() call
    takes.  When I've looked at this I've observed times from 200 ms to
    almost 2 seconds!  Since an unpacked *OR* unpruned git tree adds 256
    directories to a cleanly packed tree, you can do the math yourself.
  
   getdents() is totally serialized by the inode semaphore. It's one of the
   most expensive system calls in Linux, partly because of that, and partly
   because it has to call all the way down into the filesystem in a way that
   almost no other common system call has to (99% of all filesystem calls can
   be handled basically at the VFS layer with generic caches - but not
   getdents()).
  
   So if there are concurrent readdirs on the same directory, they get
   serialized. If there is any file creation/deletion activity in the
   directory, it serializes getdents().
  
   To make matters worse, I don't think it has any read-ahead at all when you
   use hashed directory entries. So if you have cold-cache case, you'll read
   every single block totally individually, and serialized. One block at a
   time (I think the non-hashed case is likely also suspect, but that's a
   separate issue)
  
   In other words, I'm not at all surprised it hits on filldir time.
   Especially on ext3.
 
  At work, we had the same problem on a file server with ext3. We use rsync
  to make backups to a local IDE disk, and we noticed that getdents() took
  about the same time as Peter reports (0.2 to 2 seconds), especially in
  maildir directories. We tried many things to fix it with no result,
  including enabling dirindexes. Finally, we made a full backup, and switched
  over to XFS and the problem totally disappeared. So it seems that the
  filesystem matters a lot here when there are lots of entries in a
  directory, and that ext3 is not suitable for usages with thousands
  of entries in directories with millions of files on disk. I'm not
  certain it would be that easy to try other filesystems on kernel.org
  though :-/
 
 
 Yeah, slowly-growing directories will get splattered all over the disk.
 
 Possible short-term fixes would be to just allocate up to (say) eight
 blocks when we grow a directory by one block.  Or teach the
 directory-growth code to use ext3 reservations.
 
 Longer-term, people are talking about things like on-disk reservations.
 But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation ? (that
is explicit preallocation of blocks to a given file) If so, you are
right, we haven't really given any thought to the possibility of directories
needing that feature.
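
(For regular files, persistent preallocation is what posix_fallocate(), or
the fallocate(1) utility on current systems, provides; the file name and
size below are purely an illustration:

  fallocate -l 8M /srv/prealloc-test

Directories have no equivalent facility, which is why a directory that
grows one block at a time can end up scattered across the disk, as noted
above.)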

Regards
Suparna

 

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India



Re: How git affects kernel.org performance

2007-01-06 Thread Linus Torvalds


On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> 
> During extremely high load, it appears that what slows kernel.org down more
> than anything else is the time that each individual getdents() call takes.
> When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> packed tree, you can do the math yourself.

"getdents()" is totally serialized by the inode semaphore. It's one of the 
most expensive system calls in Linux, partly because of that, and partly 
because it has to call all the way down into the filesystem in a way that 
almost no other common system call has to (99% of all filesystem calls can 
be handled basically at the VFS layer with generic caches - but not 
getdents()).

So if there are concurrent readdirs on the same directory, they get 
serialized. If there is any file creation/deletion activity in the 
directory, it serializes getdents(). 

To make matters worse, I don't think it has any read-ahead at all when you 
use hashed directory entries. So if you have cold-cache case, you'll read 
every single block totally individually, and serialized. One block at a 
time (I think the non-hashed case is likely also suspect, but that's a 
separate issue)

In other words, I'm not at all surprised it hits on filldir time. 
Especially on ext3.
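
(For anyone who wants to reproduce the numbers: strace's -T option shows
the time spent in each system call, so per-call getdents() latency can be
observed directly; the path below is only an example, and dropping caches
first ("echo 3 > /proc/sys/vm/drop_caches") gives the cold-cache case:

  strace -T -e trace=getdents,getdents64 find /usr/src > /dev/null

Each getdents()/getdents64() line in the trace then carries the wall-clock
time for that single call.)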

Linus


How git affects kernel.org performance

2007-01-06 Thread H. Peter Anvin

Some more data on how git affects kernel.org...

During extremely high load, it appears that what slows kernel.org down 
more than anything else is the time that each individual getdents() call 
takes.  When I've looked at this I've observed times from 200 ms to almost 
2 seconds!  Since an unpacked *OR* unpruned git tree adds 256 
directories to a cleanly packed tree, you can do the math yourself.


I have tried reducing vm.vfs_cache_pressure down to 1 on the kernel.org 
machines in order to improve the situation, but even at that point it 
appears the kernel doesn't readily hold the entire directory hierarchy 
in memory, even though there is space to do so.  I have suggested that 
we might want to add a sysctl to change the denominator from the default 
100.
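
(For reference, the tunable is set like any other sysctl; the value 1 below
is the experiment described above, not a recommendation:

  sysctl -w vm.vfs_cache_pressure=1
  echo 1 > /proc/sys/vm/vfs_cache_pressure   # equivalent

Lower values make the kernel retain dentry and inode caches more
aggressively relative to the page cache; internally the value is scaled
against a hard-coded 100, which is the denominator referred to above.)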


The one thing that we need done locally is to have a smart uploader, 
instead of relying on rsync.  That, unfortunately, is a fairly sizable 
project.


-hpa

