Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

David Chinner wrote:

Using a new API for new functionality is a bad thing?


if existing API can be used ...


No, it doesn't provide the same functionality.

Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present normal looking buffers to the generic code
path.


good point, I was going to take care of it in a separate patch
to support data=ordered.


Secondly, apart from delalloc, XFS cannot use the generic code paths
for writeback because unwritten extent conversion also requires
custom I/O completion handlers. Given that __mpage_writepage() only
calls -writepage when it is confused, XFS simply cannot use this
API.


this doesn't mean fs/mpage.c should go, right?


Also, looking at the way mpage_da_map_blocks() is done - if we have
an 128MB delalloc extent - ext4 will allocate that will allocate it
in one go, right? What happens if we then crash after only writing a
few megabytes of that extent? stale data exposure? XFS can allocate
multiple gigabytes in a single get_blocks call so even if ext4 can't
do this, it's a problem for XFS.


what happens if IO to 2nd MB is completed, while IO to 1st MB is not
(probably sitting in queue) ? do you update on-disk size in this case?
how do you track this?


So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this generic delayed allocation code.


I didn't say generic, see Subject: :)

thanks, Alex

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

Jeff Garzik wrote:

Alex Tomas wrote:

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this generic delayed allocation code.


I didn't say generic, see Subject: :)


Well, it shouldn't even be in the VFS layer if it's only usable by one 
filesystem.


sorry, but it seems I can say the same about iomap/ioend. I think
mpage_da_writepages() is simple enough to be adopted by other
filesystem, ext2 for example.

thanks, Alex

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Alex Tomas

David Chinner wrote:

Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present normal looking buffers to the generic code
path.


how do you implement fsync(2) ? you'd have to wait such IO to complete,
then update the inode and write it through the log?


Also, looking at the way mpage_da_map_blocks() is done - if we have
an 128MB delalloc extent - ext4 will allocate that will allocate it
in one go, right? What happens if we then crash after only writing a
few megabytes of that extent? stale data exposure? XFS can allocate
multiple gigabytes in a single get_blocks call so even if ext4 can't
do this, it's a problem for XFS.


I just realized that you're talking about data=ordered mode in ext4,
where care is taken to prevent on-disk references to no-yet-written
blocks. The solution is to wait such IO to complete before metadata
commit. And the key thing here is to allocate and attach to inode
blocks we're writing immediately. IOW, there is no unwritten blocks
attached to inode (except fallocate(2) case), but there may be blocks
preallocated for this inode in-core. same gigabytes, but different
way ;)

I have no single objection to custom IO completion callback per
mpage_writepages().


thanks, Alex


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] e2fsprogs: Support for large inode migration.

2007-07-27 Thread Theodore Tso
On Fri, Jul 27, 2007 at 08:29:31AM +0530, Aneesh Kumar K.V wrote:
 What are the issues you see with PATCH 1 and PATCH 2 which implement 
 Undo I/O Manager and undoe2fs other than it is not hooked into any of
 the existing tools. I will try to add it to mke2fs as you suggested. But
 should that prevent it from going in ?

The main issue is that it's not used by an in-tree caller.  If you can
add it to mke2fs, that would be great; otherwise, it's something
that's on my todo list so we can merge the undo manager.  The other
possibility is that I might use this as justification for creating a
pu (proposed update) branch which has the property that patches on
the pu branch can get removed or modified at any time, and the pu
branch can get rewound or rebased at any time.

See the section of text starting at line #68 (and going on to line
#160) in the following git's maintainer notes for a much more
comprehensive description of maint, master, next, and pu,
which is the direction in which I plan to take the e2fsprogs git tree:

http://git.kernel.org/?p=git/git.git;a=blob;f=MaintNotes;h=8bf0352adf2b4ac775d6100ad937600ecb5be5f2;hb=962753f75390f5c5ea23a2d1e6996f6027003478

But yes, the main reason why I haven't merged the undo manager is
because we don't have an in-tree user of that interface.  We don't
even have test cases in the tree, either (also on my todo list, but
feel free to beat me to it).

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?] ext4_ext_put_in_cache uses __u32 to receive physical block number.

2007-07-27 Thread Mingming Cao
On Fri, 2007-07-27 at 13:16 +0800, Yan Zheng wrote:
 Hi, all
 
 I think I found a bug in ext4/extents.c, ext4_ext_put_in_cache uses
 __u32 to receive physical block number.  ext4_ext_put_in_cache is
 used in ext4_ext_get_blocks, it sets ext4 inode's extent cache
 according most recently tree lookup (higher 16 bits of saved physical
 block number are always zero). when serving a mapping request,
 ext4_ext_get_blocks first check whether the logical block is in
 inode's extent cache. if the logical block is in the cache and the
 cached region isn't a gap, ext4_ext_get_blocks gets physical block
 number by using cached region's physical block number and offset in
 the cached region.  as described above, ext4_ext_get_blocks may
 return wrong result when there are physical block numbers bigger than
 0x.
 
 Regards
 
 YZ

You are right.  Thanks for reporting this!

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22/fs/ext4/extents.c
===
--- linux-2.6.22.orig/fs/ext4/extents.c 2007-07-27 08:31:02.0 -0700
+++ linux-2.6.22/fs/ext4/extents.c  2007-07-27 08:31:48.0 -0700
@@ -1544,7 +1544,7 @@ int ext4_ext_walk_space(struct inode *in
 
 static void
 ext4_ext_put_in_cache(struct inode *inode, __u32 block,
-   __u32 len, __u32 start, int type)
+   __u32 len, ext4_fsblk_t start, int type)
 {
struct ext4_ext_cache *cex;
BUG_ON(len == 0);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] basic delayed allocation in VFS

2007-07-27 Thread Jeff Garzik

Alex Tomas wrote:

So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this generic delayed allocation code.


I didn't say generic, see Subject: :)


Well, it shouldn't even be in the VFS layer if it's only usable by one 
filesystem.


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] e2fsprogs: Support for large inode migration.

2007-07-27 Thread Aneesh Kumar K.V



Theodore Tso wrote:

On Fri, Jul 27, 2007 at 08:29:31AM +0530, Aneesh Kumar K.V wrote:
What are the issues you see with PATCH 1 and PATCH 2 which implement 
Undo I/O Manager and undoe2fs other than it is not hooked into any of

the existing tools. I will try to add it to mke2fs as you suggested. But
should that prevent it from going in ?


The main issue is that it's not used by an in-tree caller.  If you can
add it to mke2fs, that would be great; 



diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 0c6d4f3..fab8830 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1521,6 +1521,92 @@ static void PRS(int argc, char *argv[])
fs_param.s_blocks_count);
}

+static int fileystem_exist(const char *name)
+{
+   int retval;
+   io_channel channel;
+   __u16   s_magic;
+   struct ext2_super_block super;
+   io_manager manager = unix_io_manager;
+
+   retval = manager-open(name, IO_FLAG_EXCLUSIVE,  channel);
+   if (retval) {
+   retval = 0;
+   goto open_err_out;
+   }
+
+   io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
+   retval = io_channel_read_blk(channel, 1, -SUPERBLOCK_SIZE, super);
+   if (retval) {
+   retval = 0;
+   goto err_out;
+   }
+
+#if defined(WORDS_BIGENDIAN)
+   s_magic = ext2fs_swab16(super.s_magic);
+#else
+   s_magic = super.s_magic;
+#endif
+
+   if (s_magic == EXT2_SUPER_MAGIC)
+   retval = 1;
+
+err_out:
+   io_channel_close(channel);
+
+open_err_out:
+
+   /*
+* We don't handle error cases instead we
+* declare that the file system doesn't exist
+* and let the rest of mke2fs take care of
+* error
+*/
+   return retval;
+}
+
+static int mke2fs_setup_tdb(const char *name)
+{
+   char *tdb_dir, tdb_file[PATH_MAX];
+#if 0 /* FIXME!! */
+   /*
+* Configuration via a conf file would be
+* nice
+*/
+   profile_get_string(profile, scratch_files,
+   directory, 0, 0,
+   tdb_dir);
+#endif
+   tdb_dir = getenv(MKE2FS_SCRATCH_DIR);
+   if (!tdb_dir) {
+   printf(_(MKE2FS_SCRATCH_DIR not configured\n));
+   printf(_(Using /var/lib/e2fsprogs\n));
+   tdb_dir=/var/lib/e2fsprogs;
+   }
+   if (access(tdb_dir, W_OK)) {
+   fprintf(stderr,
+   _(Cannot create file under %s\n),
+   tdb_dir);
+   return EXT2_ET_INVALID_ARGUMENT;
+
+   }
+
+   /* FIXME!! Should we generate Unique file name ?? */
+   sprintf(tdb_file, %s/mke2fs-XX, tdb_dir);
+
+   if (!access(tdb_file, F_OK)) {
+   fprintf(stderr,
+   _(File exist %s\n), tdb_file);
+   return EXT2_ET_INVALID_ARGUMENT;
+   }
+
+   set_undo_io_backup_file(tdb_file);
+   printf(_(previous filesystem detected; to undo 
+the mke2fs operation, please run the 
+command \n'undoe2fs %s %s' in order to recover\n\n),
+ tdb_file, name);
+   return 0;
+}
int main (int argc, char *argv[])
{
errcode_t   retval = 0;
@@ -1543,7 +1629,17 @@ int main (int argc, char *argv[])
io_ptr = test_io_manager;
test_io_backing_manager = unix_io_manager;
#else
-   io_ptr = unix_io_manager;
+   if (fileystem_exist(device_name)) {
+
+   io_ptr = undo_io_manager;
+   set_undo_io_backing_manager(unix_io_manager);
+   retval = mke2fs_setup_tdb(device_name);
+   if (retval)
+   exit(1);
+
+   } else {
+   io_ptr = unix_io_manager;
+   }
#endif

/*
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html