Re: [RFC] basic delayed allocation in VFS
David Chinner wrote: Using a new API for new functionality is a bad thing? if existing API can be used ... No, it doesn't provide the same functionality. Firstly, XFS attaches a different I/O completion to delalloc writes to allow us to update the file size when the write is beyond the current on disk EOF. This code cannot do that as all it does is allocation and present normal looking buffers to the generic code path. good point, I was going to take care of it in a separate patch to support data=ordered. Secondly, apart from delalloc, XFS cannot use the generic code paths for writeback because unwritten extent conversion also requires custom I/O completion handlers. Given that __mpage_writepage() only calls -writepage when it is confused, XFS simply cannot use this API. this doesn't mean fs/mpage.c should go, right? Also, looking at the way mpage_da_map_blocks() is done - if we have an 128MB delalloc extent - ext4 will allocate that will allocate it in one go, right? What happens if we then crash after only writing a few megabytes of that extent? stale data exposure? XFS can allocate multiple gigabytes in a single get_blocks call so even if ext4 can't do this, it's a problem for XFS. what happens if IO to 2nd MB is completed, while IO to 1st MB is not (probably sitting in queue) ? do you update on-disk size in this case? how do you track this? So without the ability to attach specific I/O completions to bios or support for unwritten extents directly in __mpage_writepage, there is no way XFS can use this generic delayed allocation code. I didn't say generic, see Subject: :) thanks, Alex - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] basic delayed allocation in VFS
Jeff Garzik wrote: Alex Tomas wrote: So without the ability to attach specific I/O completions to bios or support for unwritten extents directly in __mpage_writepage, there is no way XFS can use this generic delayed allocation code. I didn't say generic, see Subject: :) Well, it shouldn't even be in the VFS layer if it's only usable by one filesystem. sorry, but it seems I can say the same about iomap/ioend. I think mpage_da_writepages() is simple enough to be adopted by other filesystem, ext2 for example. thanks, Alex - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] basic delayed allocation in VFS
David Chinner wrote: Firstly, XFS attaches a different I/O completion to delalloc writes to allow us to update the file size when the write is beyond the current on disk EOF. This code cannot do that as all it does is allocation and present normal looking buffers to the generic code path. how do you implement fsync(2) ? you'd have to wait such IO to complete, then update the inode and write it through the log? Also, looking at the way mpage_da_map_blocks() is done - if we have an 128MB delalloc extent - ext4 will allocate that will allocate it in one go, right? What happens if we then crash after only writing a few megabytes of that extent? stale data exposure? XFS can allocate multiple gigabytes in a single get_blocks call so even if ext4 can't do this, it's a problem for XFS. I just realized that you're talking about data=ordered mode in ext4, where care is taken to prevent on-disk references to no-yet-written blocks. The solution is to wait such IO to complete before metadata commit. And the key thing here is to allocate and attach to inode blocks we're writing immediately. IOW, there is no unwritten blocks attached to inode (except fallocate(2) case), but there may be blocks preallocated for this inode in-core. same gigabytes, but different way ;) I have no single objection to custom IO completion callback per mpage_writepages(). thanks, Alex - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] e2fsprogs: Support for large inode migration.
On Fri, Jul 27, 2007 at 08:29:31AM +0530, Aneesh Kumar K.V wrote: What are the issues you see with PATCH 1 and PATCH 2 which implement Undo I/O Manager and undoe2fs other than it is not hooked into any of the existing tools. I will try to add it to mke2fs as you suggested. But should that prevent it from going in ? The main issue is that it's not used by an in-tree caller. If you can add it to mke2fs, that would be great; otherwise, it's something that's on my todo list so we can merge the undo manager. The other possibility is that I might use this as justification for creating a pu (proposed update) branch which has the property that patches on the pu branch can get removed or modified at any time, and the pu branch can get rewound or rebased at any time. See the section of text starting at line #68 (and going on to line #160) in the following git's maintainer notes for a much more comprehensive description of maint, master, next, and pu, which is the direction in which I plan to take the e2fsprogs git tree: http://git.kernel.org/?p=git/git.git;a=blob;f=MaintNotes;h=8bf0352adf2b4ac775d6100ad937600ecb5be5f2;hb=962753f75390f5c5ea23a2d1e6996f6027003478 But yes, the main reason why I haven't merged the undo manager is because we don't have an in-tree user of that interface. We don't even have test cases in the tree, either (also on my todo list, but feel free to beat me to it). - Ted - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] ext4_ext_put_in_cache uses __u32 to receive physical block number.
On Fri, 2007-07-27 at 13:16 +0800, Yan Zheng wrote: Hi, all I think I found a bug in ext4/extents.c, ext4_ext_put_in_cache uses __u32 to receive physical block number. ext4_ext_put_in_cache is used in ext4_ext_get_blocks, it sets ext4 inode's extent cache according most recently tree lookup (higher 16 bits of saved physical block number are always zero). when serving a mapping request, ext4_ext_get_blocks first check whether the logical block is in inode's extent cache. if the logical block is in the cache and the cached region isn't a gap, ext4_ext_get_blocks gets physical block number by using cached region's physical block number and offset in the cached region. as described above, ext4_ext_get_blocks may return wrong result when there are physical block numbers bigger than 0x. Regards YZ You are right. Thanks for reporting this! Signed-off-by: Mingming Cao [EMAIL PROTECTED] Index: linux-2.6.22/fs/ext4/extents.c === --- linux-2.6.22.orig/fs/ext4/extents.c 2007-07-27 08:31:02.0 -0700 +++ linux-2.6.22/fs/ext4/extents.c 2007-07-27 08:31:48.0 -0700 @@ -1544,7 +1544,7 @@ int ext4_ext_walk_space(struct inode *in static void ext4_ext_put_in_cache(struct inode *inode, __u32 block, - __u32 len, __u32 start, int type) + __u32 len, ext4_fsblk_t start, int type) { struct ext4_ext_cache *cex; BUG_ON(len == 0); - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] basic delayed allocation in VFS
Alex Tomas wrote: So without the ability to attach specific I/O completions to bios or support for unwritten extents directly in __mpage_writepage, there is no way XFS can use this generic delayed allocation code. I didn't say generic, see Subject: :) Well, it shouldn't even be in the VFS layer if it's only usable by one filesystem. Jeff - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] e2fsprogs: Support for large inode migration.
Theodore Tso wrote: On Fri, Jul 27, 2007 at 08:29:31AM +0530, Aneesh Kumar K.V wrote: What are the issues you see with PATCH 1 and PATCH 2 which implement Undo I/O Manager and undoe2fs other than it is not hooked into any of the existing tools. I will try to add it to mke2fs as you suggested. But should that prevent it from going in ? The main issue is that it's not used by an in-tree caller. If you can add it to mke2fs, that would be great; diff --git a/misc/mke2fs.c b/misc/mke2fs.c index 0c6d4f3..fab8830 100644 --- a/misc/mke2fs.c +++ b/misc/mke2fs.c @@ -1521,6 +1521,92 @@ static void PRS(int argc, char *argv[]) fs_param.s_blocks_count); } +static int fileystem_exist(const char *name) +{ + int retval; + io_channel channel; + __u16 s_magic; + struct ext2_super_block super; + io_manager manager = unix_io_manager; + + retval = manager-open(name, IO_FLAG_EXCLUSIVE, channel); + if (retval) { + retval = 0; + goto open_err_out; + } + + io_channel_set_blksize(channel, SUPERBLOCK_OFFSET); + retval = io_channel_read_blk(channel, 1, -SUPERBLOCK_SIZE, super); + if (retval) { + retval = 0; + goto err_out; + } + +#if defined(WORDS_BIGENDIAN) + s_magic = ext2fs_swab16(super.s_magic); +#else + s_magic = super.s_magic; +#endif + + if (s_magic == EXT2_SUPER_MAGIC) + retval = 1; + +err_out: + io_channel_close(channel); + +open_err_out: + + /* +* We don't handle error cases instead we +* declare that the file system doesn't exist +* and let the rest of mke2fs take care of +* error +*/ + return retval; +} + +static int mke2fs_setup_tdb(const char *name) +{ + char *tdb_dir, tdb_file[PATH_MAX]; +#if 0 /* FIXME!! */ + /* +* Configuration via a conf file would be +* nice +*/ + profile_get_string(profile, scratch_files, + directory, 0, 0, + tdb_dir); +#endif + tdb_dir = getenv(MKE2FS_SCRATCH_DIR); + if (!tdb_dir) { + printf(_(MKE2FS_SCRATCH_DIR not configured\n)); + printf(_(Using /var/lib/e2fsprogs\n)); + tdb_dir=/var/lib/e2fsprogs; + } + if (access(tdb_dir, W_OK)) { + fprintf(stderr, + _(Cannot create file under %s\n), + tdb_dir); + return EXT2_ET_INVALID_ARGUMENT; + + } + + /* FIXME!! Should we generate Unique file name ?? */ + sprintf(tdb_file, %s/mke2fs-XX, tdb_dir); + + if (!access(tdb_file, F_OK)) { + fprintf(stderr, + _(File exist %s\n), tdb_file); + return EXT2_ET_INVALID_ARGUMENT; + } + + set_undo_io_backup_file(tdb_file); + printf(_(previous filesystem detected; to undo +the mke2fs operation, please run the +command \n'undoe2fs %s %s' in order to recover\n\n), + tdb_file, name); + return 0; +} int main (int argc, char *argv[]) { errcode_t retval = 0; @@ -1543,7 +1629,17 @@ int main (int argc, char *argv[]) io_ptr = test_io_manager; test_io_backing_manager = unix_io_manager; #else - io_ptr = unix_io_manager; + if (fileystem_exist(device_name)) { + + io_ptr = undo_io_manager; + set_undo_io_backing_manager(unix_io_manager); + retval = mke2fs_setup_tdb(device_name); + if (retval) + exit(1); + + } else { + io_ptr = unix_io_manager; + } #endif /* - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html