Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Wed, Jul 11, 2007 at 10:03:12AM +0100, Christoph Hellwig wrote: On Tue, Jul 03, 2007 at 05:16:50PM +0530, Amit K. Arora wrote: Well, if you see the modes proposed using above flags : #define FA_ALLOCATE 0 #define FA_DEALLOCATE FA_FL_DEALLOC #define FA_RESV_SPACE FA_FL_KEEP_SIZE #define FA_UNRESV_SPACE (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | FA_FL_DEL_DATA) FA_FL_DEL_DATA is _not_ being used for preallocation. We have two modes for preallocation FA_ALLOCATE and FA_RESV_SPACE, which do not use this flag. Hence prealloction will never delete data. This mode is required only for FA_UNRESV_SPACE, which is a deallocation mode, to support any existing XFS aware applications/usage-scenarios. Sorry, but this doesn't make any sense. There is no need to put every feature in the XFS ioctls in the syscalls. The XFS ioctls will need to be supported forever anyway - as I suggested before they really should be moved to generic code. What needs to be supported is what makes sense as an interface. A punch a hole interface does make sense, but trying to hack this into a preallocation system call is just madness. We're not IRIX or windows that fit things into random subcall just because there was some space left to squeeze them in. FA_FL_NO_MTIME0x10 /* keep same mtime (default change on size, data change) */ FA_FL_NO_CTIME0x20 /* keep same ctime (default change on size, data change) */ NACK to these aswell. If i_size changes c/mtime need updates, if the size doesn't chamge they don't. No need to add more flags for this. This requirement was from the point of view of HSM applications. Hope you saw Andreas previous post and are keeping that in mind. HSMs needs this basically for every system call, which screams for an open flag like O_INVISIBLE anyway. Adding this in a generic way is a good idea, but hacking bits and pieces that won't fit into the global design is completely wrong. Why don't we just merge the interface for preallocation (essentially enough to satisfy posix_fallocate() and the simple XFS requirement for space reservation without changing file size), which there is clear agreement on (I hope :)). After all, this was all that we set out to do when we started. And leave all the dealloc/punch/hsm type features for separate future patches/ debates, those really shouldn't hold up the basic fallocate interface. I agree with Christoph that we are just diverging too much in trying to club those decisions here. Dave, Andreas, Ted ? Regards Suparna - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Thu, Jul 12, 2007 at 12:58:13PM +0530, Suparna Bhattacharya wrote: On Wed, Jul 11, 2007 at 10:03:12AM +0100, Christoph Hellwig wrote: On Tue, Jul 03, 2007 at 05:16:50PM +0530, Amit K. Arora wrote: Well, if you see the modes proposed using above flags : #define FA_ALLOCATE 0 #define FA_DEALLOCATE FA_FL_DEALLOC #define FA_RESV_SPACE FA_FL_KEEP_SIZE #define FA_UNRESV_SPACE (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | FA_FL_DEL_DATA) FA_FL_DEL_DATA is _not_ being used for preallocation. We have two modes for preallocation FA_ALLOCATE and FA_RESV_SPACE, which do not use this flag. Hence prealloction will never delete data. This mode is required only for FA_UNRESV_SPACE, which is a deallocation mode, to support any existing XFS aware applications/usage-scenarios. Sorry, but this doesn't make any sense. There is no need to put every feature in the XFS ioctls in the syscalls. The XFS ioctls will need to be supported forever anyway - as I suggested before they really should be moved to generic code. What needs to be supported is what makes sense as an interface. A punch a hole interface does make sense, but trying to hack this into a preallocation system call is just madness. We're not IRIX or windows that fit things into random subcall just because there was some space left to squeeze them in. FA_FL_NO_MTIME 0x10 /* keep same mtime (default change on size, data change) */ FA_FL_NO_CTIME 0x20 /* keep same ctime (default change on size, data change) */ NACK to these aswell. If i_size changes c/mtime need updates, if the size doesn't chamge they don't. No need to add more flags for this. This requirement was from the point of view of HSM applications. Hope you saw Andreas previous post and are keeping that in mind. HSMs needs this basically for every system call, which screams for an open flag like O_INVISIBLE anyway. Adding this in a generic way is a good idea, but hacking bits and pieces that won't fit into the global design is completely wrong. Why don't we just merge the interface for preallocation (essentially enough to satisfy posix_fallocate() and the simple XFS requirement for space reservation without changing file size), which there is clear agreement on (I hope :)). After all, this was all that we set out to do when we started. As you suggest, let us just have two modes for the time being: #define FALLOC_ALLOCATE 0x1 #define FALLOC_ALLOCATE_KEEP_SIZE 0x2 As the name suggests, when FALLOC_ALLOCATE_KEEP_SIZE mode is passed it will result in file size not being changed even if the preallocation is beyond EOF. And leave all the dealloc/punch/hsm type features for separate future patches/ debates, those really shouldn't hold up the basic fallocate interface. I agree. I agree with Christoph that we are just diverging too much in trying to club those decisions here. Dave, Andreas, Ted ? Regards Suparna -- Regards, Amit Arora - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Random corruption test for e2fsck
On Wed, Jul 11, 2007 at 11:19:38PM -0600, Andreas Dilger wrote: On Jul 11, 2007 17:20 +0200, Andi Kleen wrote: If you use a normal pseudo random number generator and print the seed (e.g. create from the time) initially the image can be easily recreated later without shipping it around. /dev/urandom is not really needed for this since you don't need cryptographic strength randomness. Besides urandom data is precious and it's a pity to use it up needlessly. bash has $RANDOM built in for this purpose. Except it is a lot more efficient and easy to do Ah you chose to only address one sentence in my reply. I thought only Linus liked to to do that. If you're worried about efficiency it's trivial to write a C program that generates bulk pseudo random numbers using random(3) dd if=/dev/urandom bs=1k ... than to spin in a loop getting 16-bit random numbers from bash. We would also be at the mercy of the shell being identical on the user and debugger's systems. With /dev/urandom you have the guarantee you'll never ever reproduce it again. Andrea A. used to rant about people who use srand(time(NULL)) in benchmarks and it's sad these mistakes get repeated again and again. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode
The next version of checkpatch.pl (0.08) should have support for a number of the missed sylistics you mention. Will let them soak for a bit to ensure we're not majorly regressing anything else. -apw ERROR: braces {} are not necessary for single statements #4: FILE: Z11.c:1: +if (EXT4_I(inode)-i_extra_isize = new_extra_isize) { WARNING: declaring multiple variables together should be avoided #9: FILE: Z11.c:6: +unsigned int total_size, shift_bytes, temp = ~0U; WARNING: multiple assignments should be avoided #12: FILE: Z11.c:9: +is-s.not_found = bs-s.not_found = -ENODATA; WARNING: kfree(NULL) is safe this check is probabally not required #16: FILE: Z11.c:13: +if (bs) + kfree(bs); - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks
Andrew Morton wrote: On Sun, 01 Jul 2007 03:36:22 -0400 Mingming Cao [EMAIL PROTECTED] wrote: with the patch all headers are checked. the code should become more resistant to on-disk corruptions. needless BUG_ON() have been removed. please, review for inclusion. ... @@ -269,6 +239,70 @@ return size; } +static inline int +ext4_ext_max_entries(struct inode *inode, int depth) Please remove the `inline'. `inline' is usually wrong. It is wrong here. If this function has a single callsite (it has) then the compiler will inline it. If the function later gains more callsites then the compiler will know (correctly) not to inline it. We hope. If we find the compiler still inlines a function as large as this one then we'd need to use noinline and complain at the gcc developers. +{ +int max; + +if (depth == ext_depth(inode)) { +if (depth == 0) +max = ext4_ext_space_root(inode); +else +max = ext4_ext_space_root_idx(inode); +} else { +if (depth == 0) +max = ext4_ext_space_block(inode); +else +max = ext4_ext_space_block_idx(inode); +} + +return max; +} + +static int __ext4_ext_check_header(const char *function, struct inode *inode, +struct ext4_extent_header *eh, +int depth) +{ +const char *error_msg = NULL; Unneeded initialisation. +int max = 0; + +if (unlikely(eh-eh_magic != EXT4_EXT_MAGIC)) { +error_msg = invalid magic; +goto corrupted; +} +if (unlikely(le16_to_cpu(eh-eh_depth) != depth)) { +error_msg = unexpected eh_depth; +goto corrupted; +} +if (unlikely(eh-eh_max == 0)) { +error_msg = invalid eh_max; +goto corrupted; +} +max = ext4_ext_max_entries(inode, depth); +if (unlikely(le16_to_cpu(eh-eh_max) max)) { +error_msg = too large eh_max; +goto corrupted; +} +if (unlikely(le16_to_cpu(eh-eh_entries) le16_to_cpu(eh-eh_max))) { +error_msg = invalid eh_entries; +goto corrupted; +} +return 0; + +corrupted: +ext4_error(inode-i_sb, function, +bad header in inode #%lu: %s - magic %x, +entries %u, max %u(%u), depth %u(%u), +inode-i_ino, error_msg, le16_to_cpu(eh-eh_magic), +le16_to_cpu(eh-eh_entries), le16_to_cpu(eh-eh_max), +max, le16_to_cpu(eh-eh_depth), depth); + +return -EIO; +} + ... +i = depth = ext_depth(inode); Mulitple assignments like this are somewhat unpopular from the coding-style POV. We have a local variable called `i' which is not used as a simple counter and which has no explanatory comment. This is plain bad. Please find a better name for this variable. Review the code for other such lack of clarity - I'm sure there's more. -BUG_ON(le16_to_cpu(eh-eh_entries) le16_to_cpu(eh-eh_max)); -BUG_ON(eh-eh_magic != EXT4_EXT_MAGIC); Yeah, this patch improves things a lot. How well-tested is it? Have any of these errors actually been triggered? path[i].p_hdr = ext_block_hdr(path[i].p_bh); -if (ext4_ext_check_header(__FUNCTION__, inode, -path[i].p_hdr)) { -err = -EIO; -goto out; -} } -BUG_ON(le16_to_cpu(path[i].p_hdr-eh_entries) -le16_to_cpu(path[i].p_hdr-eh_max)); -BUG_ON(path[i].p_hdr-eh_magic != EXT4_EXT_MAGIC); - if (!path[i].p_idx) { /* this level hasn't been touched yet */ path[i].p_idx = EXT_LAST_INDEX(path[i].p_hdr); @@ -1873,17 +1890,24 @@ i, EXT_FIRST_INDEX(path[i].p_hdr), path[i].p_idx); if (ext4_ext_more_to_rm(path + i)) { +struct buffer_head *bh; /* go to the next level */ ext_debug(move to level %d (block %llu)\n, i + 1, idx_pblock(path[i].p_idx)); memset(path + i + 1, 0, sizeof(*path)); -path[i+1].p_bh = -sb_bread(sb, idx_pblock(path[i].p_idx)); -if (!path[i+1].p_bh) { +bh = sb_bread(sb, idx_pblock(path[i].p_idx)); +if (!bh) { /* should we reset i_size? */ err = -EIO; break; } +BUG_ON(i + 1 depth); ouch. Couldn't we do
Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode
On Tue, 2007-07-10 at 16:32 -0700, Andrew Morton wrote: On Sun, 01 Jul 2007 03:38:01 -0400 Mingming Cao [EMAIL PROTECTED] wrote: This patch is on top of the nanosecond timestamp and i_version_hi patches. This sort of information isn't needed (or desired) when this patch hits the git tree. Please ensure that things like this are cleaned up before the patches go to Linus. The incremental patch I have attached contains an updated Changelog entry which cleans this up. + !(EXT4_I(inode)-i_state EXT4_STATE_NO_EXPAND)) { + /* We need extra buffer credits since we may write into EA block +* with this same handle */ + if ((jbd2_journal_extend(handle, +EXT4_DATA_TRANS_BLOCKS(inode-i_sb))) == 0) { + ret = ext4_expand_extra_isize(inode, + EXT4_SB(inode-i_sb)-s_want_extra_isize, + iloc, handle); + if (ret) { + EXT4_I(inode)-i_state |= EXT4_STATE_NO_EXPAND; + if (!expand_message) { + ext4_warning(inode-i_sb, __FUNCTION__, + Unable to expand inode %lu. Delete +some EAs or run e2fsck., + inode-i_ino); + expand_message = 1; + } + } + } + } Maybe that message should come out once per mount rather than once per boot (or once per modprobe)? Done. Now the message gets printed the first time s_mnt_count changes, which means once per mount. + if (free new_extra_isize) { + if (!tried_min_extra_isize s_min_extra_isize) { + tried_min_extra_isize++; + new_extra_isize = s_min_extra_isize; Aren't we missing a brelse(bh) here? I have corrected this. +#define IHDR(inode, raw_inode) \ + ((struct ext4_xattr_ibody_header *) \ + ((void *)raw_inode + \ + EXT4_GOOD_OLD_INODE_SIZE + \ + EXT4_I(inode)-i_extra_isize)) +#define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1)) Neither of these _have_ to be implemented as macros and hence they should not be. These macros existed before and have just been moved. There are similar such macros in the ext3/4 xattr code and they should probably be changed to inlines. But that should be done in a different patch. Thanks, Kalpak. We need to make sure that existing ext3 filesystems can also avail the new fields that have been added to the ext4 inode. We use s_want_extra_isize and s_min_extra_isize to decide by how much we should expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question about whether users should be able to set s_*_extra_isize smaller than the known fields or not. This patch also adds the functionality to expand inodes to include the newly added fields. We start by trying to expand by s_want_extra_isize bytes and if its fails we try to expand by s_min_extra_isize bytes. This is done by changing the i_extra_isize if enough space is available in the inode and no EAs are present. If EAs are present and there is enough space in the inode then the EAs in the inode are shifted to make space. If enough space is not available in the inode due to the EAs then 1 or more EAs are shifted to the external EA block. In the worst case when even the external EA block does not have enough space we inform the user that some EA would need to be deleted or s_min_extra_isize would have to be reduced. This would be online expansion of inodes. expand_extra_isize option will also be added to e2fsck which will expand all the inodes and if for any reason expansion fails for any inode then the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature will be reset. Signed-off-by: Andreas Dilger [EMAIL PROTECTED] Signed-off-by: Kalpak Shah [EMAIL PROTECTED] Signed-off-by: Mingming Cao [EMAIL PROTECTED] Index: linux-2.6.22/fs/ext4/inode.c === --- linux-2.6.22.orig/fs/ext4/inode.c +++ linux-2.6.22/fs/ext4/inode.c @@ -3130,9 +3130,8 @@ int ext4_expand_extra_isize(struct inode struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry; - if (EXT4_I(inode)-i_extra_isize = new_extra_isize) { + if (EXT4_I(inode)-i_extra_isize = new_extra_isize) return 0; - } raw_inode = ext4_raw_inode(iloc); @@ -3148,9 +3147,9 @@ int ext4_expand_extra_isize(struct inode return 0; } - /* try to expand with EA present */ + /* try to expand with EAs present */ return ext4_expand_extra_isize_ea(inode, new_extra_isize, - raw_inode, handle); +
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Thu, Jul 12, 2007 at 12:58:13PM +0530, Suparna Bhattacharya wrote: Why don't we just merge the interface for preallocation (essentially enough to satisfy posix_fallocate() and the simple XFS requirement for space reservation without changing file size), which there is clear agreement on (I hope :)). After all, this was all that we set out to do when we started. And leave all the dealloc/punch/hsm type features for separate future patches/ debates, those really shouldn't hold up the basic fallocate interface. I agree with Christoph that we are just diverging too much in trying to club those decisions here. Dave, Andreas, Ted ? Sure. I'll just make XFS work with whatever it is that gets merged. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Initial results of FLEX_BG feature.
On Wed, 11 Jul 2007 18:14:25 -0400 Theodore Tso [EMAIL PROTECTED] wrote: On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote: Right now what I've done is allocate the bitmaps and inode tables at the beginning of each group of 64 BG. Still need to work on fsck since just removing the restriction on were the bitmaps and inode table are located still gives me errors of uninitialized inodes with dtime set. Seems like fsck still expect inode information to be located at specific locations within the disk. Can you send me the patch which you were playing with? I might be able to help you with this. It should be pretty straightforward to remove the constraint on the inode table location. Here is the kernel piece. -JRS --- fs/ext4/super.c |3 3 + 0 - 0 ! include/linux/ext4_fs.h |4 3 + 1 - 0 ! 2 files changed, 6 insertions(+), 1 deletion(-) Index: linux-2.6/fs/ext4/super.c === --- linux-2.6.orig/fs/ext4/super.c 2007-07-11 15:34:58.0 -0500 +++ linux-2.6/fs/ext4/super.c 2007-07-11 16:19:08.0 -0500 @@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc ext4_debug (Checking group descriptors); + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) + return 1; + for (i = 0; i sbi-s_groups_count; i++) { if (i == sbi-s_groups_count - 1) Index: linux-2.6/include/linux/ext4_fs.h === --- linux-2.6.orig/include/linux/ext4_fs.h 2007-07-11 15:34:58.0 -0500 +++ linux-2.6/include/linux/ext4_fs.h 2007-07-12 09:58:51.0 -0500 @@ -698,13 +698,15 @@ static inline int ext4_valid_inum(struct #define EXT4_FEATURE_INCOMPAT_META_BG 0x0010 #define EXT4_FEATURE_INCOMPAT_EXTENTS 0x0040 /* extents support */ #define EXT4_FEATURE_INCOMPAT_64BIT0x0080 +#define EXT4_FEATURE_INCOMPAT_FLEX_BG 0x0200 #define EXT4_FEATURE_COMPAT_SUPP EXT2_FEATURE_COMPAT_EXT_ATTR #define EXT4_FEATURE_INCOMPAT_SUPP (EXT4_FEATURE_INCOMPAT_FILETYPE| \ EXT4_FEATURE_INCOMPAT_RECOVER| \ EXT4_FEATURE_INCOMPAT_META_BG| \ EXT4_FEATURE_INCOMPAT_EXTENTS| \ -EXT4_FEATURE_INCOMPAT_64BIT) +EXT4_FEATURE_INCOMPAT_64BIT| \ +EXT4_FEATURE_INCOMPAT_FLEX_BG) #define EXT4_FEATURE_RO_COMPAT_SUPP(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \ EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \ EXT4_FEATURE_RO_COMPAT_DIR_NLINK | \ - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks
On Thu, 12 Jul 2007 08:57:51 -0500 Dave Kleikamp [EMAIL PROTECTED] wrote: On Thu, 2007-07-12 at 12:38 +0100, Andy Whitcroft wrote: Andrew Morton wrote: +if (ext4_ext_check_header(inode, ext_block_hdr(bh), +depth - i - 1)) { +err = -EIO; +break; +} +path[i+1].p_bh = bh; Really that should have been i + 1. checkpatch misses this. It seems to be missing more stuff that it used to lately. This one is difficult. The rules up to now have been consistent spacing is required on both sides of mathematics operators. I personally like spaces always, but we do tend to use them without spaces too where the binding is effectivly part of the value -- the classic case is something like: pfn MAX_ORDER-1 In allowing that sort of thing, we implictly allow the one you note above. We have tried to be overly annoying on these things, and so the check is consistancy, spaces both or neither. We could be stricter. I personally think stricter is better. An occasionally false-positive isn't going to hurt anyone. (Well, maybe the checkpatch.pl maintainers will get nagged.) It at least will cause the developer to look at the line of code in question and make a conscious decision to leave it as it is. I'm assuming that upstream maintainers use checkpatch.pl with some constraint, and don't throw every patch that produces a warning back at the submitter. I'm in two minds. Missing-the-spaces is pretty damn common and is sometimes a reasonable way of saving quite a lot of horizontal space. I spose we could take it out again if it's causing problems. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html