Re: Ext2/3 block remapping tool

2007-05-01 Thread Andreas Dilger
On Apr 30, 2007  08:09 -0400, Theodore Tso wrote:
 On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
  I'd prefer that such functionality be integrated with Takashi's online
  defrag tool, since it needs virtually the same functionality.  For that
  matter, this is also very similar to the block-mapped -> extents tool
  from Aneesh.  It doesn't make sense to have so many separate tools for
  users, especially if they start interfering with each other (i.e. defrag
  undoes the remapping done by your tool).
 
 While we're at it, someone want to start thinking about on-line
 shrinking of ext4 filesystems?  Again, the same block remapping
 interfaces for defrag and file access optimizations should also be
 useful for shrinking filesystems (even if some of the files that need
 to be relocated are being actively used).  If not, that probably means
 we got the interface wrong.

Except one other issue with online shrinking is that we need to move
inodes on occasion and this poses a bunch of other problems over just
remapping the data blocks.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ext2/3 block remapping tool

2007-05-01 Thread Theodore Tso
On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
 Except one other issue with online shrinking is that we need to move
 inodes on occasion and this poses a bunch of other problems over just
 remapping the data blocks.

Well, I did say necessary, and not sufficient.  But yes, moving
inodes, especially if the inode is currently open, gets interesting.  I
don't think there are that many user space applications that would
notice or care if the st_ino of an open file changed out from under
them, but there are obviously userspace applications, such as tar,
that would most definitely care.

- Ted


Re: Ext2/3 block remapping tool

2007-05-01 Thread Andreas Dilger
On May 01, 2007  11:28 -0400, Theodore Tso wrote:
 On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
  Except one other issue with online shrinking is that we need to move
  inodes on occasion and this poses a bunch of other problems over just
  remapping the data blocks.
 
 Well, I did say necessary, and not sufficient.  But yes, moving
 inodes, especially if the inode is currently open, gets interesting.  I
 don't think there are that many user space applications that would
 notice or care if the st_ino of an open file changed out from under
 them, but there are obviously userspace applications, such as tar,
 that would most definitely care.

I think rm -r does a LOT of this kind of operation, like:

stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)

I think find does the same to avoid security problems with malicious
path manipulation.
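
As a rough illustration of why this pattern depends on stable inode
numbers, here is a minimal Python sketch of such a guarded descent. It is
not the actual rm or find code; the helper names same_file and
checked_descend are made up for this example. If st_ino changed between
the first stat and the check after chdir, the check would fail even though
nothing malicious happened.

```python
import os


def same_file(a, b):
    """Two stat results name the same object iff device and inode match."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def checked_descend(name):
    """chdir into `name`, verify we landed where lstat said, then come back."""
    parent = os.stat(".")
    expected = os.lstat(name)  # lstat, so a swapped-in symlink is caught too
    os.chdir(name)
    if not same_file(expected, os.stat(".")):
        raise RuntimeError(f"{name!r} changed identity during descent")
    # ... a real tool would unlink the directory entries here ...
    os.chdir("..")
    if not same_file(parent, os.stat(".")):
        raise RuntimeError("parent directory changed identity")
```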

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Ext2/3 block remapping tool

2007-05-01 Thread Theodore Tso
On Tue, May 01, 2007 at 12:52:49PM -0600, Andreas Dilger wrote:
 I think rm -r does a LOT of this kind of operation, like:
 
 stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)
 
 I think find does the same to avoid security problems with malicious
 path manipulation.

Yep, so if you're doing an rm -rf (or any other recursive descent)
while we're doing an on-line shrink, it's going to fail.  I suppose we
could have an in-core inode mapping table that would continue to remap
inode numbers until the next reboot.  I'm not sure we would want to
keep the inode remapping indefinitely, although if we don't it could
also end up screwing up NFS as well.  Not sure I care, though.  :-)
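
To make the idea concrete, a minimal Python sketch of such a table follows.
It assumes the table maps an inode's current on-disk number back to the
number userspace was originally shown, and that it lives only in memory, so
it is gone after a reboot. The class and method names are hypothetical and
this is not how a kernel implementation would be written.

```python
class InodeRemapTable:
    """Hypothetical in-memory map: current on-disk inode -> reported st_ino."""

    def __init__(self):
        self._reported = {}

    def record_move(self, old_ino, new_ino):
        """Note that the inode stored at old_ino now lives at new_ino."""
        # If old_ino was itself the result of an earlier move, keep reporting
        # the number that userspace saw first.
        reported = self._reported.pop(old_ino, old_ino)
        self._reported[new_ino] = reported

    def reported_ino(self, on_disk_ino):
        """Number to hand back in st_ino for the inode at on_disk_ino."""
        return self._reported.get(on_disk_ino, on_disk_ino)
```

A real design would also have to keep a vacated inode number from being
handed to a new file while an old mapping still reports it.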

- Ted


Re: Ext2/3 block remapping tool

2007-04-30 Thread Jan Kara
On Fri 27-04-07 12:09:42, Andreas Dilger wrote:
 On Apr 26, 2007  21:29 +0200, Jan Kara wrote:
    I've been lately playing with remapping ext2/ext3 blocks (especially how
  much it can give us in terms of speed of things like KDE start). For that
  I've written two simple tools (you can get them from
  ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
    e2block2file to transform (preparsed) output from blktrace into a list
  of accessed files and offsets accessed
    e2remapblocks to use output from e2block2file and remap blocks into big
  chunks in the order in which they were accessed.
 
 Does it map the whole file contiguously, or does it interleave blocks of the
 file in the order they are accessed?  I would hope that it maps the whole
 file contiguously, and let readahead work properly to fetch the whole file.
 Also, keeping the file contiguous avoids fragmentation later if that file is
 updated, deleted, etc, and conflicts with allocator/defrag/etc.

  No, it interleaves blocks of different files. Reading the whole file is
exactly what you often don't want. During startup, KDE (which was my
benchmark) accesses basically two things: shared libraries and config
files / icons. Config files and icons usually fit into a single block, so
just mapping them close together in the right order is fine. On the other
hand, shared libraries are large and you usually need just a few blocks
scattered all over them. So here we just remap those few blocks we need...

  I see the downsides of this approach. If the file is rewritten, you lose
the tight packing, but this is not going to happen often. I'm more
seriously concerned about the possibility that this optimization of
startup time may hurt running performance or, more probably, the
performance of other apps...
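
The interleaving described above can be sketched roughly as follows. This
is only an illustration of "pack blocks in the order they were first
accessed", not the actual e2remapblocks algorithm; the function name
plan_layout, the (path, block index) keys, and the starting block number
are all assumptions made for the example.

```python
def plan_layout(accesses, first_free_block):
    """Assign consecutive target blocks in order of first access.

    accesses: iterable of (path, logical_block_index) in trace order.
    Returns a dict mapping each distinct access to its new block number.
    """
    layout = {}
    for key in accesses:
        if key not in layout:  # a repeated access keeps its first slot
            layout[key] = first_free_block + len(layout)
    return layout


# Hypothetical trace: two scattered library blocks, a config file, an icon.
trace = [("libfoo.so", 40), ("kderc", 0), ("libfoo.so", 913),
         ("libfoo.so", 40), ("icon.png", 0)]
layout = plan_layout(trace, 1000)
```

Only the blocks that appear in the trace are moved, so blocks of different
files end up next to each other and the rest of each library stays put.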

    (see README in the tools archive for more details)
  
    So far the tools (especially e2remapblocks ;) work on unmounted
  filesystem. The ultimate goal is to be able to do similar things for
  mounted filesystems but I wanted to see whether block remapping is worth it
  and what kernel interfaces would be useful for achieving the goal.
 
 I'd prefer that such functionality be integrated with Takashi's online
 defrag tool, since it needs virtually the same functionality.  For that

  Yes, definitely these two have quite similar needs and I'd like to have
just one tool in the end.

 matter, this is also very similar to the block-mapped -> extents tool
 from Aneesh.  It doesn't make sense to have so many separate tools for
 users, especially if they start interfering with each other (i.e. defrag
 undoes the remapping done by your tool).

  Agreed.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs


Re: Ext2/3 block remapping tool

2007-04-30 Thread Theodore Tso
On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
 I'd prefer that such functionality be integrated with Takashi's online
 defrag tool, since it needs virtually the same functionality.  For that
 matter, this is also very similar to the block-mapped -> extents tool
 from Aneesh.  It doesn't make sense to have so many separate tools for
 users, especially if they start interfering with each other (i.e. defrag
 undoes the remapping done by your tool).

Yep, in fact, I'm really glad that Jan is working on the remapping
tool because if the on-line defrag kernel interfaces don't have the
right support for it, then that means we need to fix the on-line
defrag patches.  :-)

While we're at it, someone want to start thinking about on-line
shrinking of ext4 filesystems?  Again, the same block remapping
interfaces for defrag and file access optimizations should also be
useful for shrinking filesystems (even if some of the files that need
to be relocated are being actively used).  If not, that probably means
we got the interface wrong.

- Ted


Re: Ext2/3 block remapping tool

2007-04-30 Thread Jan Kara
On Mon 30-04-07 08:09:30, Theodore Tso wrote:
 On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
  I'd prefer that such functionality be integrated with Takashi's online
  defrag tool, since it needs virtually the same functionality.  For that
  matter, this is also very similar to the block-mapped -> extents tool
  from Aneesh.  It doesn't make sense to have so many separate tools for
  users, especially if they start interfering with each other (i.e. defrag
  undoes the remapping done by your tool).
 
 Yep, in fact, I'm really glad that Jan is working on the remapping
 tool because if the on-line defrag kernel interfaces don't have the
 right support for it, then that means we need to fix the on-line
 defrag patches.  :-)

  ;-) That was exactly the reason why I wrote the userspace program: so
that I have something in hand when we start discussing what the kernel
interface will look like.

 While we're at it, someone want to start thinking about on-line
 shrinking of ext4 filesystems?  Again, the same block remapping
 interfaces for defrag and file access optimizations should also be
 useful for shrinking filesystems (even if some of the files that need
 to be relocated are being actively used).  If not, that probably means
 we got the interface wrong.

  Yes, that's a good idea. Currently it seems to me that block+inode
relocation (which we also need for defrag) would be enough to support
filesystem shrinking. Actually, in some ancient times (like 6-7 years ago)
I wrote online shrinking for ext2. The patch is probably unusably obsolete
by now, but I can still dig it out and see which functions I needed at
that time.

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs


Re: Ext2/3 block remapping tool

2007-04-27 Thread Andreas Dilger
On Apr 26, 2007  21:29 +0200, Jan Kara wrote:
   I've been lately playing with remapping ext2/ext3 blocks (especially how
 much it can give us in terms of speed of things like KDE start). For that
 I've written two simple tools (you can get them from
 ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
   e2block2file to transform (preparsed) output from blktrace into a list
 of accessed files and offsets accessed
   e2remapblocks to use output from e2block2file and remap blocks into big
 chunks in the order in which they were accessed.

Does it map the whole file contiguously, or does it interleave blocks of the
file in the order they are accessed?  I would hope that it maps the whole
file contiguously, and let readahead work properly to fetch the whole file.
Also, keeping the file contiguous avoids fragmentation later if that file is
updated, deleted, etc, and conflicts with allocator/defrag/etc.

   (see README in the tools archive for more details)
 
   So far the tools (especially e2remapblocks ;) work on unmounted
 filesystem. The ultimate goal is to be able to do similar things for
 mounted filesystems but I wanted to see whether block remapping is worth it
 and what kernel interfaces would be useful for achieving the goal.

I'd prefer that such functionality be integrated with Takashi's online
defrag tool, since it needs virtually the same functionality.  For that
matter, this is also very similar to the block-mapped -> extents tool
from Aneesh.  It doesn't make sense to have so many separate tools for
users, especially if they start interfering with each other (i.e. defrag
undoes the remapping done by your tool).

   BTW, the results for KDE startup are as follows:
 The root partition was about 4.8 GB with around 1 GB free. System has
 1GB mem. All measurements (except for warmcache) were performed after
   sync; echo 3 > /proc/sys/vm/drop_caches
 
 Ordinary start: 19.2 20.3 19.5 19.8 19.3; avg. 19.62
 Start with all data cached: 7 7.6 7.3 7.1 7.1; avg. 7.22
 Start with fcache (see thread http://lkml.org/lkml/2006/5/15/46 for details
 on fcache):
   11.3 11 10.3 10.8 10.6; avg. 10.8
 Start with blocks remapped with e2remapblocks:
   13.5 15 13 14.5 14.5; avg. 14.1
 (after remapping, data was stored in 20 contiguous extents on disk)
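
For reference, the averages quoted above can be re-derived from the
individual timings with a few lines of Python. The "gap closed" figure is
an extra metric introduced here for comparison, not something from the
original measurements: it is the share of the difference between an
ordinary start and a fully cached start that each approach recovers.

```python
from statistics import mean

# Startup times in seconds, copied from the results above.
runs = {
    "ordinary": [19.2, 20.3, 19.5, 19.8, 19.3],
    "cached": [7, 7.6, 7.3, 7.1, 7.1],
    "fcache": [11.3, 11, 10.3, 10.8, 10.6],
    "remapped": [13.5, 15, 13, 14.5, 14.5],
}

averages = {name: mean(times) for name, times in runs.items()}

# Best case is the fully cached start; how much of the way does each get us?
gap = averages["ordinary"] - averages["cached"]
gap_closed = {
    name: (averages["ordinary"] - averages[name]) / gap
    for name in ("fcache", "remapped")
}

for name, avg in averages.items():
    print(f"{name:9s} avg {avg:.2f} s")
for name, share in gap_closed.items():
    print(f"{name:9s} closes {share:.0%} of the cold/warm gap")
```

By this measure remapping recovers a bit under half of the gap and fcache
roughly 70%.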



Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
