Re: Ext2/3 block remapping tool
On Apr 30, 2007 08:09 -0400, Theodore Tso wrote:
> On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> > I'd prefer that such functionality be integrated with Takashi's online defrag tool, since it needs virtually the same functionality. For that matter, this is also very similar to the block-mapped -> extents tool from Aneesh. It doesn't make sense to have so many separate tools for users, especially if they start interfering with each other (i.e. defrag undoes the remapping done by your tool).
>
> While we're at it, someone want to start thinking about on-line shrinking of ext4 filesystems? Again, the same block remapping interfaces for defrag and file access optimizations should also be useful for shrinking filesystems (even if some of the files that need to be relocated are being actively used). If not, that probably means we got the interface wrong.

Except one other issue with online shrinking is that we need to move inodes on occasion and this poses a bunch of other problems over just remapping the data blocks.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ext2/3 block remapping tool
On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
> Except one other issue with online shrinking is that we need to move inodes on occasion and this poses a bunch of other problems over just remapping the data blocks.

Well, I did say necessary, and not sufficient. But yes, moving inodes, especially if the inode is currently open, gets interesting. I don't think there are that many user space applications that would notice or care if the st_ino of an open file changed out from under them, but there are obviously userspace applications, such as tar, that would most definitely care.

- Ted
Re: Ext2/3 block remapping tool
On May 01, 2007 11:28 -0400, Theodore Tso wrote:
> On Tue, May 01, 2007 at 12:01:42AM -0600, Andreas Dilger wrote:
> > Except one other issue with online shrinking is that we need to move inodes on occasion and this poses a bunch of other problems over just remapping the data blocks.
>
> Well, I did say necessary, and not sufficient. But yes, moving inodes, especially if the inode is currently open, gets interesting. I don't think there are that many user space applications that would notice or care if the st_ino of an open file changed out from under them, but there are obviously userspace applications, such as tar, that would most definitely care.

I think rm -r does a LOT of this kind of operation, like:

    stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)

I think find does the same to avoid security problems with malicious path manipulation.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
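[Editor's sketch] The stat/chdir sequence described above can be illustrated roughly as follows. This is a minimal sketch, not the code of rm or find, and real tools may implement the check differently. The helper names (same_file, empty_directory, demo) are made up for illustration. It shows why a changing st_ino breaks such a traversal: the directory's (st_dev, st_ino) pair is recorded before chdir() and compared again afterwards.

```python
import os
import tempfile


def same_file(a, b):
    """Two stat results refer to the same object iff device and inode match."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def empty_directory(name):
    """Remove the plain files in `name`, verifying identity at every step."""
    parent = os.stat(".")                      # stat(.)
    expected = os.lstat(name)                  # stat(foo), symlinks not followed
    os.chdir(name)                             # chdir(foo)
    if not same_file(expected, os.stat(".")):  # stat(.)
        raise RuntimeError(f"{name!r} is not the directory we looked at")
    for entry in os.listdir("."):
        os.unlink(entry)                       # unlink(*), flat directory only
    os.chdir("..")                             # chdir(..)
    if not same_file(parent, os.stat(".")):    # stat(.)
        raise RuntimeError("did not return to the starting directory")


def demo():
    """Build a scratch tree, empty its 'foo' subdirectory, return what is left."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "foo"))
        for leaf in ("a", "b"):
            open(os.path.join(root, "foo", leaf), "w").close()
        os.chdir(root)
        try:
            empty_directory("foo")
            return os.listdir("foo")
        finally:
            os.chdir(start)                    # leave the scratch tree before cleanup
```

If the inode number of "foo" or of the starting directory were to change between the two stat calls (for example because a shrink relocated the inode), either comparison would fail and the traversal would abort, even though nothing malicious happened.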
Re: Ext2/3 block remapping tool
On Tue, May 01, 2007 at 12:52:49PM -0600, Andreas Dilger wrote:
> I think rm -r does a LOT of this kind of operation, like:
>
>     stat(.); stat(foo); chdir(foo); stat(.); unlink(*); chdir(..); stat(.)
>
> I think find does the same to avoid security problems with malicious path manipulation.

Yep, so if you're doing an rm -rf (or any other recursive descent) while we're doing an on-line shrink, it's going to fail. I suppose we could have an in-core inode mapping table that would continue to remap inode numbers until the next reboot. I'm not sure we would want to keep the inode remapping indefinitely, although if we don't it could also end up screwing up NFS as well. Not sure I care, though. :-)

- Ted
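[Editor's sketch] The in-core inode mapping table is only floated as an idea above; nothing in the thread specifies it. As one possible reading, the sketch below keeps a volatile map from an inode's current on-disk number to the number userspace first saw, so that stat() could keep reporting the old value until reboot. The class and method names are hypothetical.

```python
class InodeRemapTable:
    """Hypothetical in-memory table; it is lost on reboot by design."""

    def __init__(self):
        # current on-disk inode number -> number originally shown to userspace
        self._reported = {}

    def record_move(self, old_ino, new_ino):
        """Note that the inode stored at old_ino now lives at new_ino."""
        # If old_ino was itself the target of an earlier move, keep
        # reporting the very first number rather than the intermediate one.
        original = self._reported.pop(old_ino, old_ino)
        self._reported[new_ino] = original

    def reported_ino(self, disk_ino):
        """Number to return in st_ino for the inode stored at disk_ino."""
        return self._reported.get(disk_ino, disk_ino)


table = InodeRemapTable()
table.record_move(12, 40)   # shrink relocates inode 12 to slot 40
table.record_move(40, 7)    # a later pass moves it again
```

A sketch like this leaves open what happens when a freed number is reused by a newly created file while the old number is still being reported, and it does not cover the reverse lookup that NFS file handles would need. Those are among the open problems the message alludes to.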
Re: Ext2/3 block remapping tool
On Fri 27-04-07 12:09:42, Andreas Dilger wrote:
> On Apr 26, 2007 21:29 +0200, Jan Kara wrote:
> > I've been lately playing with remapping ext2/ext3 blocks (especially how much it can give us in terms of speed of things like KDE start). For that I've written two simple tools (you can get them from ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
> >   e2block2file - to transform (preparsed) output from blktrace into a list of accessed files and offsets accessed
> >   e2remapblocks - to use output from e2block2file and remap blocks into big chunks in the order in which they were accessed.
>
> Does it map the whole file contiguously, or does it interleave blocks of the file in the order they are accessed? I would hope that it maps the whole file contiguously, and let readahead work properly to fetch the whole file. Also, keeping the file contiguous avoids fragmentation later if that file is updated, deleted, etc, and conflicts with allocator/defrag/etc.

No, it does interleave blocks of different files. Reading the whole file is exactly what you often don't want. During startup KDE (which was my benchmark) accesses basically two things: shared libraries and config files / icons. Config files and icons usually fit into a single block so just mapping them in the right order close together is fine. On the other hand, shared libraries are large and you usually need just a few blocks scattered all over them. So here we just remap those few blocks we need...

I see the downsides of this approach. If the file is rewritten, you lose the tight packing, but this is not going to happen often. I'm more seriously concerned about the possibility that this optimization of startup time may hurt running performance or, more probably, performance of other apps...

> > (see README in the tools archive for more details) So far the tools (especially e2remapblocks ;) work on unmounted filesystem. The ultimate goal is to be able to do similar things for mounted filesystems but I wanted to see whether block remapping is worth it and what kernel interfaces would be useful for achieving the goal.
>
> I'd prefer that such functionality be integrated with Takashi's online defrag tool, since it needs virtually the same functionality. For that

Yes, definitely these two have quite similar needs and I'd like to have just one tool in the end.

> matter, this is also very similar to the block-mapped -> extents tool from Aneesh. It doesn't make sense to have so many separate tools for users, especially if they start interfering with each other (i.e. defrag undoes the remapping done by your tool).

Agreed.

Honza
--
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
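[Editor's sketch] To make the "interleave blocks of different files in access order" idea concrete, here is a minimal sketch. It is not the algorithm used by e2remapblocks, whose details are in the README mentioned above. The function name plan_remap, the file names, and the target block number are made up. The sketch takes a trace of (file, logical block) accesses and assigns consecutive target blocks in order of first access, so only the blocks actually touched are packed together.

```python
def plan_remap(trace, first_target_block):
    """Return [(path, logical_block, target_block), ...] in first-access order.

    trace: iterable of (path, logical_block) pairs as they were accessed.
    first_target_block: start of the (assumed free) contiguous target area.
    """
    seen = set()
    plan = []
    target = first_target_block
    for path, block in trace:
        key = (path, block)
        if key in seen:
            continue            # a block is placed once, at its first access
        seen.add(key)
        plan.append((path, block, target))
        target += 1
    return plan


# Hypothetical startup trace: a few scattered library blocks plus
# single-block config and icon files.
trace = [
    ("libfoo.so", 17),
    ("kdeglobals", 0),
    ("libfoo.so", 3),
    ("libfoo.so", 17),
    ("icon.png", 0),
]
plan = plan_remap(trace, first_target_block=1000)
```

Here blocks 17 and 3 of the library end up next to the config file and the icon, while the rest of the library stays where it was. This is the trade-off discussed above: reads during startup become sequential, but the file itself is no longer contiguous.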
Re: Ext2/3 block remapping tool
On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> I'd prefer that such functionality be integrated with Takashi's online defrag tool, since it needs virtually the same functionality. For that matter, this is also very similar to the block-mapped -> extents tool from Aneesh. It doesn't make sense to have so many separate tools for users, especially if they start interfering with each other (i.e. defrag undoes the remapping done by your tool).

Yep, in fact, I'm really glad that Jan is working on the remapping tool because if the on-line defrag kernel interfaces don't have the right support for it, then that means we need to fix the on-line defrag patches. :-)

While we're at it, someone want to start thinking about on-line shrinking of ext4 filesystems? Again, the same block remapping interfaces for defrag and file access optimizations should also be useful for shrinking filesystems (even if some of the files that need to be relocated are being actively used). If not, that probably means we got the interface wrong.

- Ted
Re: Ext2/3 block remapping tool
On Mon 30-04-07 08:09:30, Theodore Tso wrote:
> On Fri, Apr 27, 2007 at 12:09:42PM -0600, Andreas Dilger wrote:
> > I'd prefer that such functionality be integrated with Takashi's online defrag tool, since it needs virtually the same functionality. For that matter, this is also very similar to the block-mapped -> extents tool from Aneesh. It doesn't make sense to have so many separate tools for users, especially if they start interfering with each other (i.e. defrag undoes the remapping done by your tool).
>
> Yep, in fact, I'm really glad that Jan is working on the remapping tool because if the on-line defrag kernel interfaces don't have the right support for it, then that means we need to fix the on-line defrag patches. :-)

;-) Exactly that was the reason why I wrote the userspace program - so that I have something in hand when we start discussing what the kernel interface will look like.

> While we're at it, someone want to start thinking about on-line shrinking of ext4 filesystems? Again, the same block remapping interfaces for defrag and file access optimizations should also be useful for shrinking filesystems (even if some of the files that need to be relocated are being actively used). If not, that probably means we got the interface wrong.

Yes, that's a good idea. Currently it seems to me that block+inode relocation (which we also need for defrag) would be enough to support filesystem shrinking. Actually, in some ancient times (like 6-7 years ago) I wrote ext2 online filesystem shrinking. Currently, the patch is probably unusably obsolete but I can still dig it out and see what functions I needed at that time.

Honza
--
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
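[Editor's sketch] The claim that block relocation is the core primitive for shrinking can be illustrated with a toy model. This is not taken from any of the patches mentioned in the thread. The function name plan_shrink is hypothetical, the filesystem is reduced to a flat set of used block numbers, and block groups, metadata and inode relocation are ignored entirely.

```python
def plan_shrink(used_blocks, new_size):
    """Map every used block at or beyond new_size to a free block below it.

    used_blocks: iterable of block numbers currently in use.
    new_size: number of blocks the filesystem should have after shrinking.
    Returns {old_block: new_block}; raises ValueError if space runs out.
    """
    used = set(used_blocks)
    free_low = (b for b in range(new_size) if b not in used)
    moves = {}
    for block in sorted(b for b in used if b >= new_size):
        try:
            moves[block] = next(free_low)
        except StopIteration:
            raise ValueError("not enough free blocks below the new end") from None
    return moves


moves = plan_shrink({0, 1, 5, 9}, new_size=6)   # only block 9 has to move
```

Each entry in the resulting map is one relocation request of the kind a defrag or remapping interface would also have to issue, which is the overlap the messages above point out.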
Re: Ext2/3 block remapping tool
On Apr 26, 2007 21:29 +0200, Jan Kara wrote:
> I've been lately playing with remapping ext2/ext3 blocks (especially how much it can give us in terms of speed of things like KDE start). For that I've written two simple tools (you can get them from ftp.suse.com/pub/people/jack/ext3remapper.tar.gz):
>   e2block2file - to transform (preparsed) output from blktrace into a list of accessed files and offsets accessed
>   e2remapblocks - to use output from e2block2file and remap blocks into big chunks in the order in which they were accessed.

Does it map the whole file contiguously, or does it interleave blocks of the file in the order they are accessed? I would hope that it maps the whole file contiguously, and let readahead work properly to fetch the whole file. Also, keeping the file contiguous avoids fragmentation later if that file is updated, deleted, etc, and conflicts with allocator/defrag/etc.

> (see README in the tools archive for more details) So far the tools (especially e2remapblocks ;) work on unmounted filesystem. The ultimate goal is to be able to do similar things for mounted filesystems but I wanted to see whether block remapping is worth it and what kernel interfaces would be useful for achieving the goal.

I'd prefer that such functionality be integrated with Takashi's online defrag tool, since it needs virtually the same functionality. For that matter, this is also very similar to the block-mapped -> extents tool from Aneesh. It doesn't make sense to have so many separate tools for users, especially if they start interfering with each other (i.e. defrag undoes the remapping done by your tool).

> BTW, the results for KDE startup are as follows:
> The root partition was about 4.8 GB with around 1 GB free. System has 1GB mem. All measurements (except for warmcache) were performed after
>   sync; echo 3 > /proc/sys/vm/drop_caches
>
> Ordinary start: 19.2 20.3 19.5 19.8 19.3; avg. 19.62
> Start with all data cached: 7 7.6 7.3 7.1 7.1; avg. 7.22
> Start with fcache (see thread http://lkml.org/lkml/2006/5/15/46 for details on fcache): 11.3 11 10.3 10.8 10.6; avg. 10.8
> Start with blocks remapped with e2remapblocks: 13.5 15 13 14.5 14.5; avg. 14.1 (after remapping, data was stored in 20 contiguous extents on disk)

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
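[Editor's sketch] The averages quoted above can be re-derived from the individual runs, and expressing each configuration as a saving relative to the ordinary start makes the comparison easier to read. The numbers are copied from the message. The dictionary keys and variable names are made up, and the message does not state the unit of the timings.

```python
from statistics import mean

# Startup times per configuration, as listed in the message.
runs = {
    "ordinary": [19.2, 20.3, 19.5, 19.8, 19.3],
    "all data cached": [7, 7.6, 7.3, 7.1, 7.1],
    "fcache": [11.3, 11, 10.3, 10.8, 10.6],
    "e2remapblocks": [13.5, 15, 13, 14.5, 14.5],
}

averages = {name: round(mean(times), 2) for name, times in runs.items()}

# Percentage of the ordinary startup time saved by each configuration.
baseline = averages["ordinary"]
saved_pct = {
    name: round(100 * (baseline - avg) / baseline, 1)
    for name, avg in averages.items()
}

for name in runs:
    print(f"{name:16s} avg {averages[name]:6.2f}  saved {saved_pct[name]:5.1f}%")
```

The recomputed averages match the ones in the message. Remapping saves roughly 28% of the ordinary startup time, fcache about 45%, and a fully warm cache about 63%.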