Re: Enabling h-trees too early?

2007-09-21 Thread Andi Kleen
Theodore Tso [EMAIL PROTECTED] writes:
 
 Certainly one of the things that we could consider is for small
 directories to do an in-memory sort of all of the directory entries at
 opendir() time, and keeping that list until it is closed.  We can't do
 this for really big directories, but we could easily do it for
 directories under 32k or 64k.

I assume you mean sort by inode, because sort by htree key would
be as bad as htrees.

But wouldn't that break parallel readdir for a directory that just grows
from 32/64K to over it?  e.g. if the sort moves already read 
entries to after the  cursor readdir would return some entries twice.

I suspect you would need to keep it always sorted after that no matter
how big it gets. So the 32/64k boundary seems useless and you would
need a sorted potentially partial in memory rbtree anyways.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-21 Thread Theodore Tso
On Fri, Sep 21, 2007 at 11:02:58AM +0200, Andi Kleen wrote:
 I assume you mean sort by inode, because sort by htree key would
 be as bad as htrees.
 
 But wouldn't that break parallel readdir for a directory that just grows
 from 32/64K to over it?  e.g. if the sort moves already read 
 entries to after the  cursor readdir would return some entries twice.

No, it wouldn't break it because after snapshotting the tree, we would
only use the snapshot and not refer to the on-disk format until the
directory file handle is closed or there is an explicit lseek to 0.
To quote from the spec:

  If a file is removed from or added to the directory after the most recent
  call to the opendir() or rewinddir() function, whether a subsequent call to
  the readdir() function returns an entry for that file is unspecified.

So, if some program does the stupid thing and opens a directory,
doesn't use it for five hours, and in the meantime 2,000 files are
created, it's OK if we only return the set of files that were in
existence when the opendir() was originally called.  This makes us OK
in terms of spec conformance, and given that real life programs don't
do stuff this stupid, we should be fine.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-20 Thread Theodore Tso
On Thu, Sep 20, 2007 at 03:33:50PM +0200, Jan Kara wrote:
   So for example deleting kernel tree on my computer takes ~14 seconds with
 h-trees and less than 9 without them. Also doing 'cp -lr' of the kernel
 tree takes 8 seconds with h-trees and 6.3s without them... So I think the
 performance difference is quite measurable.

This is in a completely cold cache state?  (i.e. mounting and
unmounting the filesystem before doing the rm -rf?)

On my kernel tree, using the command: lsattr -R | grep -- -I- shows
that only 8 directories are htree indexed, and they're not that big:

12 drwxr-xr-x 12 tytso tytso 12288 2007-09-14 16:25 ./drivers/char
24 drwxr-xr-x 30 tytso tytso 24576 2007-09-14 16:25 ./drivers/net
20 drwxr-xr-x  2 tytso tytso 20480 2007-09-14 16:25 ./drivers/usb/serial
32 drwxr-xr-x 24 tytso tytso 32768 2007-09-14 16:10 ./include/linux
12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/bridge/netfilter
24 drwxr-xr-x  2 tytso tytso 24576 2007-09-14 16:25 ./net/ipv4/netfilter
12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/ipv6/netfilter
32 drwxr-xr-x  2 tytso tytso 32768 2007-09-14 16:25 ./net/netfilter

... which means if the benchmark only focused on deleting these files,
then presumably the percentage increase would be even worse.

  Certainly one of the things that we could consider is for small
  directories to do an in-memory sort of all of the directory entries at
  opendir() time, and keeping that list until it is closed.  We can't do
  this for really big directories, but we could easily do it for
  directories under 32k or 64k.

   Umm, yes. That would be probably feasible. But converting to htrees only
 when directories grow larger would avoid the problem also. It also does not
 seem *that* hard but maybe I miss some nasty details...

The reason why I mentioned the caching idea is we already have code to
manage and return directories stored in an rbtree in the kernel,
albeit for a slightly different purpose.  So hacking it up to cache
all of the directory entries for directories  64k and to index them
by inode number instead of hash key would be pretty easy.

What's nasty about converting to htrees after the directories become
larger is that we need to reserve extra space in the journal for each
block that we need to modify, and then just the fact that we have to
keep track of the multiple buffers.  Basically, not impossible but
just a pain in the *ss.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-20 Thread Jan Kara
On Thu 20-09-07 11:14:40, Theodore Tso wrote:
 On Thu, Sep 20, 2007 at 04:58:39PM +0200, Jan Kara wrote:
Hmm, strange - I've just looked at my computer and dir_index is set
  just for 5 directories in my tree.
 
 I looked at a tree that had object files, which is probably why I had
 8 directories; I'm guessing you probably just had kernel sources and
 no build files.
 
  If I try deleting just them, I also
  see some performance decrease but it's less than if I try deleting the
  whole tree (and that result seems to be quite consistent)... There's 
  something
  fishy there. Maybe I could try seekwatcher or something similar to see
  what's really happening.
 
 That is very strange.
  Just a guess: Can't the culprit be the following test in ext3/4_readdir()?
if (EXT4_HAS_COMPAT_FEATURE(inode-i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) 
((EXT4_I(inode)-i_flags  EXT4_INDEX_FL) ||
 ((inode-i_size  sb-s_blocksize_bits) == 1))) {
error = ext4_dx_readdir(filp, dirent, filldir);
if (error != ERR_BAD_DX_DIR) {
ret = error;
goto out;
}
/*
 * We don't set the inode dirty flag since it's not
 * critical that it get flushed back to the disk.
 */
EXT4_I(filp-f_path.dentry-d_inode)-i_flags = ~EXT4_INDEX_FL;
}
  It calls ext4_dx_readdir() for *every* directory with 1 block (we have
1326 of them in the kernel tree). Now ext4_dx_readdir() calls
ext4_htree_fill_tree() which finds out the directory is not h-tree and
and calls htree_dirblock_to_tree(). So even for 4KB directories we end up
deleting inodes in hash order! And as a bonus we burn some cycles building
trees etc. What is the point of this?

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-20 Thread Theodore Tso
On Thu, Sep 20, 2007 at 06:19:04PM +0200, Jan Kara wrote:
 if (EXT4_HAS_COMPAT_FEATURE(inode-i_sb, EXT4_FEATURE_COMPAT_DIR_INDEX) 
 ((EXT4_I(inode)-i_flags  EXT4_INDEX_FL) ||
  ((inode-i_size  sb-s_blocksize_bits) == 1))) {
 error = ext4_dx_readdir(filp, dirent, filldir);
 if (error != ERR_BAD_DX_DIR) {
 ret = error;
 goto out;
 }
 /*
  * We don't set the inode dirty flag since it's not
  * critical that it get flushed back to the disk.
  */
 EXT4_I(filp-f_path.dentry-d_inode)-i_flags = ~EXT4_INDEX_FL;
 }
   It calls ext4_dx_readdir() for *every* directory with 1 block (we have
 1326 of them in the kernel tree). Now ext4_dx_readdir() calls
 ext4_htree_fill_tree() which finds out the directory is not h-tree and
 and calls htree_dirblock_to_tree(). So even for 4KB directories we end up
 deleting inodes in hash order! And as a bonus we burn some cycles building
 trees etc. What is the point of this?

That was added so we wouldn't get screwed when a directory that was
previously non htree became an htree directory while the directory fd
is open.  So the failure case is one where you do opendir(), readdir()
on 25% of the directory, sleep for 2 hours, and in the meantime, 200
files are added to the directory and it gets converted into a htree
index, causing all of the previously returned readdir() results in
directory order to be completely screwed up now that the directory has
been converted into an htree.  (All of the readdir/telldir/seekdir 
POSIX requirements cause filesystem designers to tear their hair out.)

What we would need to do to avoid needing this is to read in the
entire directory leaf page into the rbtree, sorted by inode number,
and then to keep that rbtree for the entire life of the open directory
file descriptor.  We would also have to change telldir/seekdir to use
something else as a telldir cookie, and readdir would have to be set
up to *only* use the rbtree, and never look at the on-disk directory.
This would also mean that all of the files created or deleted after
the initial opendir() would never be reflected in results returned by
readdir(), but that's allowed by POSIX.  And if we do this for a
single block 4k directory, we might as well do it for a 32k or 64k
HTREE directory as well.

Does that make sense?

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Enabling h-trees too early?

2007-09-19 Thread Jan Kara
  Hi,

  I was just wondering: Currently we start to build h-tree in a directory
already when the size of directory exceeds one block. But honestly, it does
not seem to make much sence to use this feature until the directory is much
larger (I'd say at least 16 or 32 KB). It actually slows down some
operations like deleting the whole directory, etc. So what is the reason
for starting building the tree so early? Just the simplicity of building it
when the directory is just one block large?

Honza

-- 
Jan Kara [EMAIL PROTECTED]
SUSE Labs, CR
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-19 Thread Andreas Dilger
On Sep 19, 2007  17:07 +0200, Jan Kara wrote:
   I was just wondering: Currently we start to build h-tree in a directory
 already when the size of directory exceeds one block. But honestly, it does
 not seem to make much sence to use this feature until the directory is much
 larger (I'd say at least 16 or 32 KB). It actually slows down some
 operations like deleting the whole directory, etc. So what is the reason
 for starting building the tree so early? Just the simplicity of building it
 when the directory is just one block large?

Yes, doing it at one block is easy, doing it with more blocks is complex.
At the time we added this feature, tests showed no performance difference
between 2 unhashed blocks and 2 hashed blocks so the easier code was used.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Enabling h-trees too early?

2007-09-19 Thread Theodore Tso
On Wed, Sep 19, 2007 at 05:07:15PM +0200, Jan Kara wrote:
 
   I was just wondering: Currently we start to build h-tree in a directory
 already when the size of directory exceeds one block. But honestly, it does
 not seem to make much sence to use this feature until the directory is much
 larger (I'd say at least 16 or 32 KB). It actually slows down some
 operations like deleting the whole directory, etc. So what is the reason
 for starting building the tree so early? Just the simplicity of building it
 when the directory is just one block large?

How much is it slowing down operations such as rm -rf?  For a small
directory ( 32k), I would assume that the difference would be
relatively small.  What sort of differences have you measured, and is
this a common case problem?

Certainly one of the things that we could consider is for small
directories to do an in-memory sort of all of the directory entries at
opendir() time, and keeping that list until it is closed.  We can't do
this for really big directories, but we could easily do it for
directories under 32k or 64k.

  - Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html