Ext3 not marking filesystems as with errors
Hi, a friend has just pointed me to the following misbehaviour of ext3. If we stumble on some error in JBD (e.g. in commit code), we call __journal_abort_hard(). It just marks the journal as aborted but does nothing else. Later ext3 comes, finds the journal aborted, and calls ext3_abort(), which remounts the fs read-only and stops (it does not mark the filesystem as having errors). It calls journal_abort(.., -EIO) but that does nothing because the journal is already aborted. If you then unmount the filesystem and mount it again, everything goes on happily as if there was no error - no suggestion for running fsck, nothing. I guess this is a bug but please correct me if you don't think so. There are two possible ways to fix it - either we mark the filesystem as having errors in ext3_abort(), or we call some less low-level function from JBD to abort the journal (as soon as j_errno is set, we are safe). Any feeling which is less hacky? Honza -- Jan Kara [EMAIL PROTECTED] SUSE Labs, CR - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
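The sequence described in this message can be modelled in user space. What follows is only a toy sketch: every struct, constant and function name is a hypothetical stand-in (nothing here is the kernel's API), and it assumes that a mount-time check would look at an error bit in the superblock state and at a recorded journal errno. It shows why the error is lost today, and how either proposed fix would preserve it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the superblock state bits; values are illustrative. */
enum { TOY_VALID_FS = 0x1, TOY_ERROR_FS = 0x2 };

struct toy_journal {
    bool aborted;   /* set by either abort path */
    int  err;       /* models j_errno; 0 means no error was recorded */
};

struct toy_fs {
    struct toy_journal journal;
    unsigned state; /* models the superblock state field */
    bool read_only;
};

/* Models __journal_abort_hard(): flags the journal and records nothing else. */
void toy_journal_abort_hard(struct toy_journal *j)
{
    j->aborted = true;
}

/* Models journal_abort(j, err): does nothing once the journal is aborted. */
void toy_journal_abort(struct toy_journal *j, int err)
{
    if (j->aborted)
        return;
    j->aborted = true;
    j->err = err;
}

/* Models ext3_abort(); mark_errors selects the first proposed fix. */
void toy_fs_abort(struct toy_fs *fs, bool mark_errors)
{
    assert(fs != NULL);
    if (mark_errors)
        fs->state |= TOY_ERROR_FS;
    fs->read_only = true;
    toy_journal_abort(&fs->journal, -5 /* stands in for -EIO */);
}

/* ASSUMPTION: the next mount suggests fsck if either record of the error exists. */
bool toy_needs_fsck(const struct toy_fs *fs)
{
    return (fs->state & TOY_ERROR_FS) != 0 || fs->journal.err != 0;
}
```

With `toy_journal_abort_hard()` followed by `toy_fs_abort(fs, false)` no trace of the error survives; passing `true`, or having the first abort go through `toy_journal_abort()` so that the errno is recorded, leaves something for the next mount to find.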
Re: Enabling h-trees too early?
On Thu, Sep 20, 2007 at 03:33:50PM +0200, Jan Kara wrote:
So for example deleting kernel tree on my computer takes ~14 seconds with h-trees and less than 9 without them. Also doing 'cp -lr' of the kernel tree takes 8 seconds with h-trees and 6.3s without them... So I think the performance difference is quite measurable.
This is in a completely cold cache state? (i.e. mounting and unmounting the filesystem before doing the rm -rf?) On my kernel tree, using the command:
    lsattr -R | grep -- -I-
shows that only 8 directories are htree indexed, and they're not that big:
    12 drwxr-xr-x 12 tytso tytso 12288 2007-09-14 16:25 ./drivers/char
    24 drwxr-xr-x 30 tytso tytso 24576 2007-09-14 16:25 ./drivers/net
    20 drwxr-xr-x  2 tytso tytso 20480 2007-09-14 16:25 ./drivers/usb/serial
    32 drwxr-xr-x 24 tytso tytso 32768 2007-09-14 16:10 ./include/linux
    12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/bridge/netfilter
    24 drwxr-xr-x  2 tytso tytso 24576 2007-09-14 16:25 ./net/ipv4/netfilter
    12 drwxr-xr-x  2 tytso tytso 12288 2007-09-14 16:25 ./net/ipv6/netfilter
    32 drwxr-xr-x  2 tytso tytso 32768 2007-09-14 16:25 ./net/netfilter
... which means if the benchmark only focused on deleting these files, then presumably the percentage increase would be even worse.
Certainly one of the things that we could consider is for small directories to do an in-memory sort of all of the directory entries at opendir() time, and keeping that list until it is closed. We can't do this for really big directories, but we could easily do it for directories under 32k or 64k.
Umm, yes. That would be probably feasible. But converting to htrees only when directories grow larger would avoid the problem also. It also does not seem *that* hard but maybe I miss some nasty details...
The reason why I mentioned the caching idea is we already have code to manage and return directories stored in an rbtree in the kernel, albeit for a slightly different purpose. So hacking it up to cache all of the directory entries for directories smaller than 64k and to index them by inode number instead of hash key would be pretty easy. What's nasty about converting to htrees after the directories become larger is that we need to reserve extra space in the journal for each block that we need to modify, and then just the fact that we have to keep track of the multiple buffers. Basically, not impossible but just a pain in the *ss. - Ted
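The caching idea can be illustrated with a small user-space sketch. It assumes a snapshot of the directory entries has already been copied into an array at opendir() time; the kernel code Ted refers to uses an rbtree, whereas this sketch swaps in a plain array and qsort() to keep it short. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory copy of one directory entry. */
struct cached_dirent {
    uint32_t inode;
    char     name[256];
};

/* qsort comparator: ascending inode number, avoiding overflow-prone subtraction. */
int cmp_by_inode(const void *a, const void *b)
{
    const struct cached_dirent *x = a, *y = b;
    return (x->inode > y->inode) - (x->inode < y->inode);
}

/* Sort a snapshot of a small directory so readdir() can hand out entries
 * in inode order rather than hash order. */
void sort_dir_snapshot(struct cached_dirent *ents, size_t n)
{
    assert(ents != NULL || n == 0);
    if (n > 1)
        qsort(ents, n, sizeof(*ents), cmp_by_inode);
}
```

Returning entries in inode order means a following rm -rf or cp -lr touches the inode table roughly sequentially, which is the effect the thread is trying to recover for small directories.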
Re: [PATCH e2fsprogs] - ignore bind mounts in fsck
Theodore Tso wrote:
On Wed, Sep 19, 2007 at 03:20:14PM -0500, Eric Sandeen wrote:
An entry like this in /etc/fstab:
    /foo /bar ext3 bind,defaults 1 3
will stop boot, as fsck.ext3 tries to check it and fails:
    e2fsck 1.40.2 (12-Jul-2007)
    fsck.ext3: Is a directory while trying to open /foo
    The superblock could not be read or does not describe a correct ext2 filesystem. ...
Granted, asking for fsck of a bind mount in the fstab is a bit odd, but it doesn't seem like it should stop the boot process if you make this mistake.
That's fair, but note that the dump number and fsck pass number really should be zero in the fstab entry. i.e., it really should be 0 0, or just plain omitted.
Agreed. If you think fsck shouldn't silently cope with this mistake, and instead punish the user for it (it is what they asked for, after all), I'm ok with that too. I'm willing to close my end as NOTABUG if you don't want to take this patch. :) (FWIW, this is RH bug #151533) -Eric
Avoid rec_len overflow with 64KB block size
Hello, when converting ext4 directories to pagecache I just came across Takashi's patch preventing overflow of rec_len. Looking over the patch - can't we do it more elegantly by using say 0x instead of 64K and perform conversion (using some helper) at the moment we read / store rec_len? That would be IMHO more transparent than the current approach (at least it took me some time to understand what's going on with the current patch when I was looking at the code)... Honza -- Jan Kara [EMAIL PROTECTED] SuSE CR Labs
Re: [PATCH e2fsprogs] - ignore bind mounts in fsck
On Thu, Sep 20, 2007 at 09:31:56AM -0500, Eric Sandeen wrote:
Agreed. If you think fsck shouldn't silently cope with this mistake, and instead punish the user for it (it is what they asked for, after all), I'm ok with that too. I'm willing to close my end as NOTABUG if you don't want to take this patch. :)
I'm willing to take the patch, although I am thinking that it might be appropriate for fsck to print a warning message --- "Bind mount with non-zero fsck pass, skipping", or some such. What do you think? - Ted
Re: Enabling h-trees too early?
On Thu 20-09-07 11:14:40, Theodore Tso wrote:
On Thu, Sep 20, 2007 at 04:58:39PM +0200, Jan Kara wrote:
Hmm, strange - I've just looked at my computer and dir_index is set just for 5 directories in my tree.
I looked at a tree that had object files, which is probably why I had 8 directories; I'm guessing you probably just had kernel sources and no build files.
If I try deleting just them, I also see some performance decrease but it's less than if I try deleting the whole tree (and that result seems to be quite consistent)... There's something fishy there. Maybe I could try seekwatcher or something similar to see what's really happening.
That is very strange.
Just a guess: Can't the culprit be the following test in ext3/4_readdir()?
    if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
                                EXT4_FEATURE_COMPAT_DIR_INDEX) &&
        ((EXT4_I(inode)->i_flags & EXT4_INDEX_FL) ||
         ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
            error = ext4_dx_readdir(filp, dirent, filldir);
            if (error != ERR_BAD_DX_DIR) {
                    ret = error;
                    goto out;
            }
            /*
             * We don't set the inode dirty flag since it's not
             * critical that it get flushed back to the disk.
             */
            EXT4_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT4_INDEX_FL;
    }
It calls ext4_dx_readdir() for *every* directory with 1 block (we have 1326 of them in the kernel tree). Now ext4_dx_readdir() calls ext4_htree_fill_tree() which finds out the directory is not h-tree and calls htree_dirblock_to_tree(). So even for 4KB directories we end up deleting inodes in hash order! And as a bonus we burn some cycles building trees etc. What is the point of this? Honza -- Jan Kara [EMAIL PROTECTED] SUSE Labs, CR
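To make the effect of that test concrete, here is a small stand-alone sketch of the condition. It assumes the operators lost from the archived text are `&&`, `&` and `>>`, and it flattens the kernel structures into one hypothetical struct; none of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of the fields the readdir test looks at. */
struct dir_info {
    bool     fs_has_dir_index; /* dir_index feature enabled on the filesystem */
    bool     index_flag;       /* per-inode index flag set */
    uint64_t size;             /* directory size in bytes */
};

/* Mirrors the quoted condition: indexed, or exactly one block long. */
bool takes_dx_readdir(const struct dir_info *d, unsigned blocksize_bits)
{
    assert(d != NULL && blocksize_bits < 64);
    return d->fs_has_dir_index &&
           (d->index_flag || (d->size >> blocksize_bits) == 1);
}
```

With 4KB blocks (`blocksize_bits == 12`), an unindexed 4096-byte directory takes the dx path while an unindexed 8192-byte one does not, which matches Jan's observation that every single-block directory ends up being read in hash order.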
Re: Avoid rec_len overflow with 64KB block size
when converting ext4 directories to pagecache I just came over Takashi's patch preventing overflowing of rec_len. Looking over the patch - can't we do it more elegantly by using say 0x instead of 64K and perform conversion (using some helper) at the moment we read / store rec_len? That would be IMHO more transparent than current approach (at least it took me some time to understand what's going on with the current patch when I was looking at the code)... Attached is a patch that does this for ext4. If you like this approach, I can cook up a similar patch for ext2 / ext3. Honza -- Jan Kara [EMAIL PROTECTED] SuSE CR Labs With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0x instead and convert value when read from / written to disk. The patch also converts some places to use ext4_next_entry() when we are changing them anyway. Signed-off-by: Jan Kara [EMAIL PROTECTED] diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-rc6/fs/ext4/dir.c linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/dir.c --- linux-2.6.23-rc6/fs/ext4/dir.c 2007-09-18 19:22:28.0 +0200 +++ linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/dir.c 2007-09-20 18:08:02.0 +0200 @@ -69,7 +69,7 @@ int ext4_check_dir_entry (const char * f unsigned long offset) { const char * error_msg = NULL; - const int rlen = le16_to_cpu(de-rec_len); + const int rlen = ext4_get_rec_len(le16_to_cpu(de-rec_len)); if (rlen EXT4_DIR_REC_LEN(1)) error_msg = rec_len is smaller than minimal; @@ -176,10 +176,10 @@ revalidate: * least that it is non-zero. A * failure will be detected in the * dirent test below. 
*/ -if (le16_to_cpu(de-rec_len) - EXT4_DIR_REC_LEN(1)) +if (ext4_get_rec_len(le16_to_cpu(de-rec_len)) + EXT4_DIR_REC_LEN(1)) break; -i += le16_to_cpu(de-rec_len); +i += ext4_get_rec_len(le16_to_cpu(de-rec_len)); } offset = i; filp-f_pos = (filp-f_pos ~(sb-s_blocksize - 1)) @@ -201,7 +201,7 @@ revalidate: ret = stored; goto out; } - offset += le16_to_cpu(de-rec_len); + offset += ext4_get_rec_len(le16_to_cpu(de-rec_len)); if (le32_to_cpu(de-inode)) { /* We might block in the next section * if the data destination is @@ -223,7 +223,7 @@ revalidate: goto revalidate; stored ++; } - filp-f_pos += le16_to_cpu(de-rec_len); + filp-f_pos += ext4_get_rec_len(le16_to_cpu(de-rec_len)); } offset = 0; brelse (bh); diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-rc6/fs/ext4/namei.c linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/namei.c --- linux-2.6.23-rc6/fs/ext4/namei.c 2007-09-18 19:22:28.0 +0200 +++ linux-2.6.23-rc6-1-ext4_64k_blocksize/fs/ext4/namei.c 2007-09-20 18:29:29.0 +0200 @@ -280,7 +280,7 @@ static struct stats dx_show_leaf(struct space += EXT4_DIR_REC_LEN(de-name_len); names++; } - de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de-rec_len)); + de = ext4_next_entry(de); } printk((%i)\n, names); return (struct stats) { names, space, 1 }; @@ -525,7 +525,8 @@ static int ext4_htree_next_block(struct */ static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p) { - return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p-rec_len)); + return (struct ext4_dir_entry_2 *)((char*)p + + ext4_get_rec_len(le16_to_cpu(p-rec_len))); } /* @@ -689,7 +690,7 @@ static int dx_make_map (struct ext4_dir_ cond_resched(); } /* XXX: do we need to check rec_len == 0 case? 
-Chris */ - de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de-rec_len)); + de = ext4_next_entry(de); } return count; } @@ -790,7 +791,7 @@ static inline int search_dirblock(struct return 1; } /* prevent looping on a bad block */ - de_len = le16_to_cpu(de-rec_len); + de_len = ext4_get_rec_len(le16_to_cpu(de-rec_len)); if (de_len = 0) return -1; offset += de_len; @@ -1099,7 +1100,7 @@ dx_move_dirents(char *from, char *to, st rec_len = EXT4_DIR_REC_LEN(de-name_len); memcpy (to, de, rec_len); ((struct ext4_dir_entry_2 *) to)-rec_len = -cpu_to_le16(rec_len); +cpu_to_le16(ext4_store_rec_len(rec_len)); de-inode = 0; map++; to += rec_len; @@ -1114,13 +1115,12 @@ static struct ext4_dir_entry_2* dx_pack_ prev = to = de; while ((char*)de base + size) { - next = (struct ext4_dir_entry_2 *) ((char *) de + - le16_to_cpu(de-rec_len)); + next = ext4_next_entry(de); if (de-inode de-name_len) { rec_len = EXT4_DIR_REC_LEN(de-name_len); if (de to) memmove(to, de, rec_len); - to-rec_len = cpu_to_le16(rec_len); + to-rec_len = cpu_to_le16(ext4_store_rec_len(rec_len)); prev = to; to = (struct ext4_dir_entry_2 *) (((char *) to) + rec_len); } @@ -1178,8 +1178,8 @@ static struct ext4_dir_entry_2 *do_split /* Fancy dance
Re: Enabling h-trees too early?
On Thu, Sep 20, 2007 at 06:19:04PM +0200, Jan Kara wrote:
    if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
                                EXT4_FEATURE_COMPAT_DIR_INDEX) &&
        ((EXT4_I(inode)->i_flags & EXT4_INDEX_FL) ||
         ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
            error = ext4_dx_readdir(filp, dirent, filldir);
            if (error != ERR_BAD_DX_DIR) {
                    ret = error;
                    goto out;
            }
            /*
             * We don't set the inode dirty flag since it's not
             * critical that it get flushed back to the disk.
             */
            EXT4_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT4_INDEX_FL;
    }
It calls ext4_dx_readdir() for *every* directory with 1 block (we have 1326 of them in the kernel tree). Now ext4_dx_readdir() calls ext4_htree_fill_tree() which finds out the directory is not h-tree and calls htree_dirblock_to_tree(). So even for 4KB directories we end up deleting inodes in hash order! And as a bonus we burn some cycles building trees etc. What is the point of this?
That was added so we wouldn't get screwed when a directory that was previously non htree became an htree directory while the directory fd is open. So the failure case is one where you do opendir(), readdir() on 25% of the directory, sleep for 2 hours, and in the meantime, 200 files are added to the directory and it gets converted into a htree index, causing all of the previously returned readdir() results in directory order to be completely screwed up now that the directory has been converted into an htree. (All of the readdir/telldir/seekdir POSIX requirements cause filesystem designers to tear their hair out.)
What we would need to do to avoid needing this is to read in the entire directory leaf page into the rbtree, sorted by inode number, and then to keep that rbtree for the entire life of the open directory file descriptor. We would also have to change telldir/seekdir to use something else as a telldir cookie, and readdir would have to be set up to *only* use the rbtree, and never look at the on-disk directory. This would also mean that all of the files created or deleted after the initial opendir() would never be reflected in results returned by readdir(), but that's allowed by POSIX. And if we do this for a single block 4k directory, we might as well do it for a 32k or 64k HTREE directory as well. Does that make sense? - Ted
[PATCH e2fsprogs] return status from chattr
This is for RH bug #180596, Chattr command doesn't provide expected exit code in case of failure. (trying to clear out an e2fsprogs bug backlog, can you tell?) :) This is a little funky as a result of the man page saying that links encountered on recursive traversal are (silently?) ignored. I changed this a bit so that if it's explicitly listed on the commandline, the link itself gets chattr'd. I'm not quite sure what is intended here; that the links are not *followed* or that they are not chattr'd? Seems a little odd to me. I tried to follow the way other recursive commands work, for example chmod -R, and carry on in the face of any errors. If any error was encountered, exit with an error. If no errors, exit 0. Also, if both flags and -v (version) are specified, and the flag set encounters an error, the version set is not attempted. Is this ok or should both commands be tried? Finally, I'm curious, the utility ignores anything that's not a link, regular file, or dir, but the kernel code doesn't have these checks. Should it? Comments? Thanks, -Eric Signed-off-by: Eric Sandeen [EMAIL PROTECTED] Index: e2fsprogs-1.40.2/misc/chattr.c === --- e2fsprogs-1.40.2.orig/misc/chattr.c +++ e2fsprogs-1.40.2/misc/chattr.c @@ -182,7 +182,7 @@ static int decode_arg (int * i, int argc static int chattr_dir_proc (const char *, struct dirent *, void *); -static void change_attributes (const char * name) +static int change_attributes (const char * name, int cmdline) { unsigned long flags; STRUCT_STAT st; @@ -190,19 +190,20 @@ static void change_attributes (const cha if (LSTAT (name, st) == -1) { com_err (program_name, errno, _(while trying to stat %s), name); - return; + return -1; } - if (S_ISLNK(st.st_mode) recursive) - return; - /* Don't try to open device files, fifos etc. We probably - ought to display an error if the file was explicitly given - on the command line (whether or not recursive was - requested). 
*/ - if (!S_ISREG(st.st_mode) !S_ISLNK(st.st_mode) - !S_ISDIR(st.st_mode)) - return; + /* Just silently ignore links found by recursion; + not an error according to the manpage */ + if (S_ISLNK(st.st_mode) !cmdline) + return 0; + /* Don't try to open device files, fifos etc. */ + if (!S_ISREG(st.st_mode) !S_ISLNK(st.st_mode) + !S_ISDIR(st.st_mode)) { + com_err (program_name, EINVAL, _(for file %s), name); + return -1; + } if (set) { if (verbose) { printf (_(Flags of %s set as ), name); @@ -212,10 +213,11 @@ static void change_attributes (const cha if (fsetflags (name, sf) == -1) perror (name); } else { - if (fgetflags (name, flags) == -1) + if (fgetflags (name, flags) == -1) { com_err (program_name, errno, _(while reading flags on %s), name); - else { + return -1; + } else { if (rem) flags = ~rf; if (add) @@ -227,25 +229,32 @@ static void change_attributes (const cha } if (!S_ISDIR(st.st_mode)) flags = ~EXT2_DIRSYNC_FL; - if (fsetflags (name, flags) == -1) + if (fsetflags (name, flags) == -1) { com_err (program_name, errno, _(while setting flags on %s), name); + return -1; + } + } } if (set_version) { if (verbose) printf (_(Version of %s set as %lu\n), name, version); - if (fsetversion (name, version) == -1) + if (fsetversion (name, version) == -1) { com_err (program_name, errno, _(while setting version on %s), name); + return -1; + } } if (S_ISDIR(st.st_mode) recursive) - iterate_on_dir (name, chattr_dir_proc, NULL); + return iterate_on_dir (name, chattr_dir_proc, NULL); } static int chattr_dir_proc (const char * dir_name, struct dirent * de, void * private EXT2FS_ATTR((unused))) { + int err; + if (strcmp (de-d_name, .) strcmp (de-d_name, ..)) { char *path; @@ -253,11 +262,13 @@ static int chattr_dir_proc (const char * if (!path) { fprintf(stderr,
Re: Avoid rec_len overflow with 64KB block size
On Sep 20, 2007 18:17 +0200, Jan Kara wrote:
when converting ext4 directories to pagecache I just came over Takashi's patch preventing overflowing of rec_len. Looking over the patch - can't we do it more elegantly by using say 0x instead of 64K and perform conversion (using some helper) at the moment we read / store rec_len? That would be IMHO more transparent than current approach (at least it took me some time to understand what's going on with the current patch when I was looking at the code)...
Attached is a patch that does this for ext4. If you like this approach, I can cook up a similar patch for ext2 / ext3.
Yes, I think this is much cleaner to avoid all the conditionals in the code.
With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry length. So we store 0x instead and convert value when read from / written to disk. The patch also converts some places to use ext4_next_entry() when we are changing them anyway.
     const char * error_msg = NULL;
    - const int rlen = le16_to_cpu(de->rec_len);
    + const int rlen = ext4_get_rec_len(le16_to_cpu(de->rec_len));
Maybe we should wrap the le16_to_cpu() into ext4_get_rec_len() itself, making the parameter just be __le16 rec_len? We appear to have le16_to_cpu() at every callsite. Likewise for ext4_store_rec_len() it should do the cpu_to_le16() internally and return an __le16. It should maybe be called ext4_set_rec_len() to be a more natural pairing? This also needs a patch for e2fsprogs, while I'm not sure the old patch did (has anyone ever checked this?) We could still consider making EXT4_DIR_MAX_REC_LEN as in Takashi's patch, but keep the cleanups here. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
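A user-space sketch of the get/set pairing Andreas suggests, with the byte-order conversion folded into the helpers. This is not the patch's code: the `toy_` names and the `toy_le16` type are hypothetical, and because the sentinel value is garbled in the archived text, 0xFFFF is used here purely as an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: on-disk stand-in for a 64KB record. The real value is lost in
 * the archived thread; 0xFFFF is used only to make the sketch concrete. */
#define TOY_REC_LEN_SENTINEL 0xFFFFu
#define TOY_REC_LEN_64K      (1u << 16)

/* Hypothetical on-disk little-endian 16-bit field, kept as raw bytes so the
 * sketch behaves the same on any host byte order. */
typedef struct { uint8_t b[2]; } toy_le16;

/* Decode: byte-order handling and sentinel expansion in one place. */
unsigned toy_get_rec_len(toy_le16 disk)
{
    unsigned len = (unsigned)disk.b[0] | ((unsigned)disk.b[1] << 8);
    return len == TOY_REC_LEN_SENTINEL ? TOY_REC_LEN_64K : len;
}

/* Encode: the inverse, returning the on-disk representation directly. */
toy_le16 toy_set_rec_len(unsigned len)
{
    toy_le16 disk;

    /* The sentinel itself must never be a legitimate length. */
    assert(len <= TOY_REC_LEN_64K && len != TOY_REC_LEN_SENTINEL);
    if (len == TOY_REC_LEN_64K)
        len = TOY_REC_LEN_SENTINEL;
    disk.b[0] = (uint8_t)(len & 0xFF);
    disk.b[1] = (uint8_t)(len >> 8);
    return disk;
}
```

The point of the design is that callers only ever see the decoded length, so neither the sentinel nor the endianness conversion leaks into the directory-walking code.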
Re: [PATCH e2fsprogs] - ignore bind mounts in fsck
It might also be worthwhile to file a documentation bug against the mount and fstab man pages, since it doesn't currently seem to specify (at least on my Ubuntu system; maybe it's been fixed in newer upstream packages) that you can specify the bind option in the fstab file.
    /src /dest ext3 bind,default
It's not clear to me that this should be the preferred form. Why not
    /src /dest bind defaults
or
    /src /dest none bind,defaults
instead? In any case, how bind mounts are supposed to be specified in fstab is not documented, and it really should be. - Ted
Re: [PATCH e2fsprogs] - ignore bind mounts in fsck
This is what I actually committed into e2fsprogs git, in the maint branch. Note the one-line summary at the beginning of the patch description, and the Addresses-Red-Hat-Bugzilla line before the Signed-off-by lines. - Ted

commit ed773a263829493e4e4bf612dbec2380cf09349f
Author: Theodore Ts'o [EMAIL PROTECTED]
Date:   Thu Sep 20 15:06:35 2007 -0400

    fsck: Ignore /etc/fstab entries for bind mounts

    If a user specifies a bind mount with a non-zero fsck pass number, for
    example:

        /foo /bar ext3 bind,defaults 1 3

    print a warning and ignore the fstab entry.

    Addresses-Red-Hat-Bugzilla: #151533

    Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
    Signed-off-by: Theodore Ts'o [EMAIL PROTECTED]

diff --git a/misc/fsck.c b/misc/fsck.c
index 1dcac25..108adf6 100644
--- a/misc/fsck.c
+++ b/misc/fsck.c
@@ -867,6 +867,16 @@ static int ignore(struct fs_info *fs)
 	if (fs->passno == 0)
 		return 1;
 
+	/*
+	 * If this is a bind mount, ignore it.
+	 */
+	if (opt_in_list("bind", fs->opts)) {
+		fprintf(stderr,
+			_("%s: skipping bad line in /etc/fstab: bind mount with nonzero fsck pass number\n"),
+			fs->mountpt);
+		return 1;
+	}
+
 	interpret_type(fs);
 
 	/*
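The check in this commit comes down to looking for a whole-word option in a comma-separated list. A self-contained sketch of that logic follows; `has_option()`, `struct fstab_entry` and `should_skip_fsck()` are hypothetical names written for illustration and are not the helpers fsck itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if opt appears as a complete item in a comma-separated list. */
bool has_option(const char *opt, const char *optlist)
{
    size_t optlen;
    const char *p = optlist;

    assert(opt != NULL && *opt != '\0');
    optlen = strlen(opt);
    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Compare whole items only, so "rbind" does not match "bind". */
        if (len == optlen && strncmp(p, opt, len) == 0)
            return true;
        p = end ? end + 1 : NULL;
    }
    return false;
}

/* Hypothetical subset of a parsed fstab entry. */
struct fstab_entry {
    const char *opts;
    int passno;
};

/* Same order of checks as the committed hunk: pass number 0 first, then bind. */
bool should_skip_fsck(const struct fstab_entry *fs)
{
    if (fs->passno == 0)
        return true;
    return has_option("bind", fs->opts);
}
```

Matching on complete items matters here: a substring search would also skip entries whose options merely contain the letters "bind".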
Re: [PATCH] obsolete libcom-err for SuSE e2fsprogs
Andreas Dilger wrote: On Sep 19, 2007 20:41 -0500, Eric Sandeen wrote: Andreas Dilger wrote: It isn't possible to build an e2fsprogs via make rpm on SuSE and have it install cleanly, because they split out some of the libraries into separate packages. We've got the current patch to the .spec file, but I'm open to discussion if it is more desirable to change the .spec to continue to build separate RPMs (though that is more of a distribution hassle and might need major changes in the .spec file). FWIW, I also have an RFE assigned to me for RHEL/Fedora to split up our e2fsprogs packages for libcom_err and libuuid... since many non-filesystem things now require them. So, this is sort of going in the opposite direction. :) Any idea how many distros already split it out? I know Debian-based distros have done this for ages... I'd also welcome someone with rpm-fu split it into separate packages. I'd do this, my rpm-fu is still reasonably strong, though - I'm curious, is there a compelling reason to split out just libcom-err? what about libuuid? libblkid? e2fsprogs is a bit of a grab bag of things. What's the rationale for the split? -Eric
Re: [PATCH] Ext4: Uninitialized Block Groups
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote:
In pass1 of e2fsck, every inode table in the filesystem is scanned and checked, regardless of whether it is in use. This is the most time consuming part of the filesystem check. The uninitialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation.
This needed a few fixups due to conflicts with ext2-ext3-ext4-add-block-bitmap-validation.patch but they were pretty straightforward. Please check that the result is OK.
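The saving described here can be sketched as a small decision function: given a group descriptor, how many inodes would a pass-1 style scan still have to read? This is a toy model under stated assumptions. The struct layout, the flag value and the idea of storing the high-water mark directly are all hypothetical, and the checksum result is passed in as a boolean rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: flag value chosen for this sketch only. */
#define TOY_BG_INODE_UNINIT 0x1u

/* Hypothetical, simplified group descriptor. */
struct toy_group_desc {
    uint16_t flags;
    uint32_t used_hwm; /* high-water mark: only inodes below it may be in use */
};

/* Number of inodes in this group that a full check would still need to scan. */
uint32_t inodes_to_scan(const struct toy_group_desc *gd,
                        uint32_t inodes_per_group, int csum_ok)
{
    assert(gd != NULL);
    if (!csum_ok)
        return inodes_per_group; /* descriptor untrusted: scan everything */
    if (gd->flags & TOY_BG_INODE_UNINIT)
        return 0;                /* inode table was never initialized */
    if (gd->used_hwm > inodes_per_group)
        return inodes_per_group; /* nonsensical mark: fall back to a full scan */
    return gd->used_hwm;
}
```

The fallback branches reflect the role given to the checksum in the description: when the descriptor cannot be trusted, the safe choice is to behave as if the feature were absent.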
+ ext4-uninitialized-block-groups.patch added to -mm tree
The patch titled Ext4: Uninitialized Block Groups has been added to the -mm tree. Its filename is ext4-uninitialized-block-groups.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: Ext4: Uninitialized Block Groups From: Andreas Dilger [EMAIL PROTECTED] In pass1 of e2fsck, every inode table in the filesystem is scanned and checked, regardless of whether it is in use. This is the most time consuming part of the filesystem check. The uninitialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesystem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users, with a performance improvement of 2-20 times depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. 
In each group descriptor, if we have
EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group.
We also add two new fields to group descriptor as a part of uninitialized group patch.
    __le16 bg_itable_unused; /* Unused inodes count */
    __le16 bg_checksum;      /* crc16(sb_uuid+group+desc) */
bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused.
bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used.
Signed-off-by: Andreas Dilger [EMAIL PROTECTED] Signed-off-by: Avantika Mathur [EMAIL PROTECTED] Signed-off-by: Mingming Cao [EMAIL PROTECTED] Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/ext4/balloc.c| 92 +- fs/ext4/group.h | 29 fs/ext4/ialloc.c| 132 +++--- fs/ext4/resize.c|2 fs/ext4/super.c | 96 +++ include/linux/ext4_fs.h | 16 +++- 6 files changed, 351 insertions(+), 16 deletions(-) diff -puN fs/ext4/balloc.c~ext4-uninitialized-block-groups fs/ext4/balloc.c --- a/fs/ext4/balloc.c~ext4-uninitialized-block-groups +++ a/fs/ext4/balloc.c @@ -20,6 +20,7 @@ #include linux/quotaops.h #include linux/buffer_head.h +#include group.h /* * balloc.c contains the blocks allocation and deallocation routines */ @@ -42,6 +43,74 @@ void ext4_get_group_no_and_offset(struct } +/* Initializes an uninitialized block bitmap if given, and returns the + * number of blocks free in the group. */ +unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, + int block_group, struct ext4_group_desc *gdp) +{ + unsigned long start; + int bit, bit_max; + unsigned free_blocks; + struct ext4_sb_info *sbi = EXT4_SB(sb); + + if (bh) { + J_ASSERT_BH(bh, buffer_locked(bh)); + + /* If checksum is bad mark all blocks used to prevent allocation +* essentially implementing a per-group read-only flag. */ + if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) { + ext4_error(sb, __FUNCTION__, +
+ ext3-new-export-ops.patch added to -mm tree
The patch titled ext3: new export ops has been added to the -mm tree. Its filename is ext3-new-export-ops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: ext3: new export ops From: Christoph Hellwig [EMAIL PROTECTED] Trivial switch over to the new generic helpers. Signed-off-by: Christoph Hellwig [EMAIL PROTECTED] Cc: Neil Brown [EMAIL PROTECTED] Cc: J. Bruce Fields [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/ext3/super.c | 35 --- 1 files changed, 20 insertions(+), 15 deletions(-) diff -puN fs/ext3/super.c~ext3-new-export-ops fs/ext3/super.c --- a/fs/ext3/super.c~ext3-new-export-ops +++ a/fs/ext3/super.c @@ -631,13 +631,10 @@ static int ext3_show_options(struct seq_ } -static struct dentry *ext3_get_dentry(struct super_block *sb, void *vobjp) +static struct inode *ext3_nfs_get_inode(struct super_block *sb, + u64 ino, u32 generation) { - __u32 *objp = vobjp; - unsigned long ino = objp[0]; - __u32 generation = objp[1]; struct inode *inode; - struct dentry *result; if (ino EXT3_FIRST_INO(sb) ino != EXT3_ROOT_INO) return ERR_PTR(-ESTALE); @@ -660,15 +657,22 @@ static struct dentry *ext3_get_dentry(st iput(inode); return ERR_PTR(-ESTALE); } - /* now to find a dentry. 
-* If possible, get a well-connected one -*/ - result = d_alloc_anon(inode); - if (!result) { - iput(inode); - return ERR_PTR(-ENOMEM); - } - return result; + + return inode; +} + +static struct dentry *ext3_fh_to_dentry(struct super_block *sb, struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_dentry(sb, fid, fh_len, fh_type, + ext3_nfs_get_inode); +} + +static struct dentry *ext3_fh_to_parent(struct super_block *sb, struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_parent(sb, fid, fh_len, fh_type, + ext3_nfs_get_inode); } #ifdef CONFIG_QUOTA @@ -738,8 +742,9 @@ static const struct super_operations ext }; static struct export_operations ext3_export_ops = { + .fh_to_dentry = ext3_fh_to_dentry, + .fh_to_parent = ext3_fh_to_parent, .get_parent = ext3_get_parent, - .get_dentry = ext3_get_dentry, }; enum { _ Patches currently in -mm which might be from [EMAIL PROTECTED] are git-nfs.patch git-nfsd.patch partially-fix-up-the-lookup_one_noperm-mess.patch optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch git-xfs.patch sysv-convert-to-new-aops.patch alpha-convert-to-generic-sys_ptrace.patch kill-declare_mutex_locked.patch remove-unneded-lock_kernel-in-driver-block-loopc.patch ufs-move-non-layout-parts-of-ufs_fsh-to-fs-ufs.patch fix-execute-checking-in-permission.patch exec-remove-unnecessary-check-for-mnt_noexec.patch fix-f_version-type-should-be-u64-instead-of-unsigned-long.patch unprivileged-mounts-add-user-mounts-to-the-kernel.patch unprivileged-mounts-allow-unprivileged-umount.patch unprivileged-mounts-account-user-mounts.patch unprivileged-mounts-propagate-error-values-from-clone_mnt.patch unprivileged-mounts-allow-unprivileged-bind-mounts.patch unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch unprivileged-mounts-allow-unprivileged-mounts.patch 
unprivileged-mounts-allow-unprivileged-fuse-mounts.patch unprivileged-mounts-propagation-inherit-owner-from-parent.patch unprivileged-mounts-add-no-submounts-flag.patch revoke-special-mmap-handling.patch revoke-core-code.patch revoke-support-for-ext2-and-ext3.patch revoke-add-documentation.patch revoke-wire-up-i386-system-calls.patch exportfs-add-fid-type.patch exportfs-add-new-methods.patch ext2-new-export-ops.patch ext3-new-export-ops.patch ext4-new-export-ops.patch efs-new-export-ops.patch jfs-new-export-ops.patch ntfs-new-export-ops.patch xfs-new-export-ops.patch fat-new-export-ops.patch isofs-new-export-ops.patch shmem-new-export-ops.patch reiserfs-new-export-ops.patch gfs2-new-export-ops.patch ocfs2-new-export-ops.patch exportfs-remove-old-methods.patch exportfs-make-struct-export_operations-const.patch exportfs-update-documentation.patch
+ ext4-new-export-ops.patch added to -mm tree
The patch titled ext4: new export ops has been added to the -mm tree. Its filename is ext4-new-export-ops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: ext4: new export ops From: Christoph Hellwig [EMAIL PROTECTED] Trivial switch over to the new generic helpers. Signed-off-by: Christoph Hellwig [EMAIL PROTECTED] Cc: Neil Brown [EMAIL PROTECTED] Cc: J. Bruce Fields [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/ext4/super.c | 35 --- 1 files changed, 20 insertions(+), 15 deletions(-) diff -puN fs/ext4/super.c~ext4-new-export-ops fs/ext4/super.c --- a/fs/ext4/super.c~ext4-new-export-ops +++ a/fs/ext4/super.c @@ -694,13 +694,10 @@ static int ext4_show_options(struct seq_ } -static struct dentry *ext4_get_dentry(struct super_block *sb, void *vobjp) +static struct inode *ext4_nfs_get_inode(struct super_block *sb, + u64 ino, u32 generation) { - __u32 *objp = vobjp; - unsigned long ino = objp[0]; - __u32 generation = objp[1]; struct inode *inode; - struct dentry *result; if (ino EXT4_FIRST_INO(sb) ino != EXT4_ROOT_INO) return ERR_PTR(-ESTALE); @@ -723,15 +720,22 @@ static struct dentry *ext4_get_dentry(st iput(inode); return ERR_PTR(-ESTALE); } - /* now to find a dentry. 
-* If possible, get a well-connected one -*/ - result = d_alloc_anon(inode); - if (!result) { - iput(inode); - return ERR_PTR(-ENOMEM); - } - return result; + + return inode; +} + +static struct dentry *ext4_fh_to_dentry(struct super_block *sb, struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_dentry(sb, fid, fh_len, fh_type, + ext4_nfs_get_inode); +} + +static struct dentry *ext4_fh_to_parent(struct super_block *sb, struct fid *fid, + int fh_len, int fh_type) +{ + return generic_fh_to_parent(sb, fid, fh_len, fh_type, + ext4_nfs_get_inode); } #ifdef CONFIG_QUOTA @@ -801,8 +805,9 @@ static const struct super_operations ext }; static struct export_operations ext4_export_ops = { + .fh_to_dentry = ext4_fh_to_dentry, + .fh_to_parent = ext4_fh_to_parent, .get_parent = ext4_get_parent, - .get_dentry = ext4_get_dentry, }; enum { _ Patches currently in -mm which might be from [EMAIL PROTECTED] are git-nfs.patch git-nfsd.patch partially-fix-up-the-lookup_one_noperm-mess.patch optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch git-xfs.patch sysv-convert-to-new-aops.patch alpha-convert-to-generic-sys_ptrace.patch kill-declare_mutex_locked.patch remove-unneded-lock_kernel-in-driver-block-loopc.patch ufs-move-non-layout-parts-of-ufs_fsh-to-fs-ufs.patch fix-execute-checking-in-permission.patch exec-remove-unnecessary-check-for-mnt_noexec.patch fix-f_version-type-should-be-u64-instead-of-unsigned-long.patch unprivileged-mounts-add-user-mounts-to-the-kernel.patch unprivileged-mounts-allow-unprivileged-umount.patch unprivileged-mounts-account-user-mounts.patch unprivileged-mounts-propagate-error-values-from-clone_mnt.patch unprivileged-mounts-allow-unprivileged-bind-mounts.patch unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch unprivileged-mounts-allow-unprivileged-mounts.patch 
unprivileged-mounts-allow-unprivileged-fuse-mounts.patch unprivileged-mounts-propagation-inherit-owner-from-parent.patch unprivileged-mounts-add-no-submounts-flag.patch revoke-special-mmap-handling.patch revoke-core-code.patch revoke-support-for-ext2-and-ext3.patch revoke-add-documentation.patch revoke-wire-up-i386-system-calls.patch exportfs-add-fid-type.patch exportfs-add-new-methods.patch ext2-new-export-ops.patch ext3-new-export-ops.patch ext4-new-export-ops.patch efs-new-export-ops.patch jfs-new-export-ops.patch ntfs-new-export-ops.patch xfs-new-export-ops.patch fat-new-export-ops.patch isofs-new-export-ops.patch shmem-new-export-ops.patch reiserfs-new-export-ops.patch gfs2-new-export-ops.patch ocfs2-new-export-ops.patch exportfs-remove-old-methods.patch exportfs-make-struct-export_operations-const.patch exportfs-update-documentation.patch
+ exportfs-add-new-methods.patch added to -mm tree
The patch titled exportfs: add new methods has been added to the -mm tree. Its filename is exportfs-add-new-methods.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: exportfs: add new methods From: Christoph Hellwig [EMAIL PROTECTED] Add the guts for the new filesystem API to exportfs. There's now a fh_to_dentry method that returns a dentry for the object looked for given a filehandle fragment, and a fh_to_parent operation that returns the dentry for the encoded parent directory in case the file handle contains it. There are default implementations for these methods that only take a callback for an nfs-enhanced iget variant and implement the rest of the semantics. Signed-off-by: Christoph Hellwig [EMAIL PROTECTED] Cc: Neil Brown [EMAIL PROTECTED] Cc: J. Bruce Fields [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Cc: Dave Kleikamp [EMAIL PROTECTED] Cc: Anton Altaparmakov [EMAIL PROTECTED] Cc: David Chinner [EMAIL PROTECTED] Cc: Timothy Shimmin [EMAIL PROTECTED] Cc: OGAWA Hirofumi [EMAIL PROTECTED] Cc: Hugh Dickins [EMAIL PROTECTED] Cc: Chris Mason [EMAIL PROTECTED] Cc: Jeff Mahoney [EMAIL PROTECTED] Cc: Vladimir V. 
Saveliev [EMAIL PROTECTED] Cc: Steven Whitehouse [EMAIL PROTECTED] Cc: Mark Fasheh [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/exportfs/expfs.c | 136 +++-- fs/libfs.c | 88 +++ include/linux/exportfs.h | 30 3 files changed, 248 insertions(+), 6 deletions(-) diff -puN fs/exportfs/expfs.c~exportfs-add-new-methods fs/exportfs/expfs.c --- a/fs/exportfs/expfs.c~exportfs-add-new-methods +++ a/fs/exportfs/expfs.c @@ -514,17 +514,141 @@ struct dentry *exportfs_decode_fh(struct int (*acceptable)(void *, struct dentry *), void *context) { struct export_operations *nop = mnt-mnt_sb-s_export_op; - struct dentry *result; + struct dentry *result, *alias; + int err; - if (nop-decode_fh) { - result = nop-decode_fh(mnt-mnt_sb, fid-raw, fh_len, + /* +* Old way of doing things. Will go away soon. +*/ + if (!nop-fh_to_dentry) { + if (nop-decode_fh) { + return nop-decode_fh(mnt-mnt_sb, fid-raw, fh_len, fileid_type, acceptable, context); + } else { + return export_decode_fh(mnt-mnt_sb, fid-raw, fh_len, + fileid_type, acceptable, context); + } + } + + /* +* Try to get any dentry for the given file handle from the filesystem. +*/ + result = nop-fh_to_dentry(mnt-mnt_sb, fid, fh_len, fileid_type); + if (!result) + result = ERR_PTR(-ESTALE); + if (IS_ERR(result)) + return result; + + if (S_ISDIR(result-d_inode-i_mode)) { + /* +* This request is for a directory. +* +* On the positive side there is only one dentry for each +* directory inode. On the negative side this implies that we +* to ensure our dentry is connected all the way up to the +* filesystem root. +*/ + if (result-d_flags DCACHE_DISCONNECTED) { + err = reconnect_path(mnt-mnt_sb, result); + if (err) + goto err_result; + } + + if (!acceptable(context, result)) { + err = -EACCES; + goto err_result; + } + + return result; } else { - result = export_decode_fh(mnt-mnt_sb, fid-raw, fh_len, - fileid_type, acceptable, context); + /* +* It's not a directory. Life is a little more complicated. 
+*/ + struct dentry *target_dir, *nresult; + char nbuf[NAME_MAX+1]; + + /* +* See if either the dentry we just got from the filesystem +* or any alias for it is acceptable. This is always true +* if this filesystem is exported without the subtreecheck +* option. If the filesystem is exported with the subtree +* check option there's a fair chance we need to look at +* the parent directory in the file handle and make sure +* it's connected to the filesystem root. +*/ + alias = find_acceptable_alias(result, acceptable, context); + if (alias) + return alias; + + /* +
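The split described in the changelog above (a generic helper that decodes the file handle, plus a per-filesystem callback that acts as an NFS-aware iget) can be sketched in userspace. This is only an illustration under stated assumptions: every toy_* name, the handle layout, and the length checks are hypothetical and are not the kernel's struct fid, generic_fh_to_dentry() or generic_fh_to_parent().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel objects; not the real VFS types. */
struct toy_inode {
	uint64_t ino;
	uint32_t generation;
};

struct toy_sb {
	struct toy_inode *inodes;
	size_t count;
};

/* Assumed handle layout: object (ino, gen) followed by parent (ino, gen). */
struct toy_fid {
	uint32_t ino;
	uint32_t gen;
	uint32_t parent_ino;
	uint32_t parent_gen;
};

/* The only piece a filesystem has to supply in this sketch. */
typedef struct toy_inode *(*toy_get_inode_t)(struct toy_sb *sb,
					     uint64_t ino, uint32_t generation);

/* Filesystem-specific lookup: find the inode and reject stale generations. */
static struct toy_inode *toy_nfs_get_inode(struct toy_sb *sb,
					   uint64_t ino, uint32_t generation)
{
	for (size_t i = 0; i < sb->count; i++) {
		struct toy_inode *inode = &sb->inodes[i];

		if (inode->ino != ino)
			continue;
		if (generation && inode->generation != generation)
			return NULL;	/* stale handle */
		return inode;
	}
	return NULL;
}

/* Generic helper: decode the object part of the handle, then call back. */
static struct toy_inode *toy_fh_to_object(struct toy_sb *sb,
					  const struct toy_fid *fid, int fh_len,
					  toy_get_inode_t get_inode)
{
	if (fh_len < 2)
		return NULL;
	return get_inode(sb, fid->ino, fid->gen);
}

/* Generic helper: decode the parent part, if the handle carries one. */
static struct toy_inode *toy_fh_to_parent(struct toy_sb *sb,
					  const struct toy_fid *fid, int fh_len,
					  toy_get_inode_t get_inode)
{
	if (fh_len < 4)
		return NULL;
	return get_inode(sb, fid->parent_ino, fid->parent_gen);
}
```

With this shape a filesystem's two export methods reduce to one-line wrappers that pass their own lookup callback, which is what the ext3 and ext4 conversions earlier in this digest do.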
[PATCH] Introduce ext4_find_next_bit
Also add generic_find_next_le_bit This gets used by the ext4 multi block allocator patches. Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] --- include/asm-generic/bitops/ext2-non-atomic.h |2 + include/asm-generic/bitops/le.h |4 ++ include/asm-powerpc/bitops.h |4 ++ include/linux/ext4_fs.h |1 + lib/find_next_bit.c | 44 ++ 5 files changed, 55 insertions(+), 0 deletions(-) diff --git a/include/asm-generic/bitops/ext2-non-atomic.h b/include/asm-generic/bitops/ext2-non-atomic.h index 1697404..63cf822 100644 --- a/include/asm-generic/bitops/ext2-non-atomic.h +++ b/include/asm-generic/bitops/ext2-non-atomic.h @@ -14,5 +14,7 @@ generic_find_first_zero_le_bit((unsigned long *)(addr), (size)) #define ext2_find_next_zero_bit(addr, size, off) \ generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off)) +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)(addr), (size), (off)) #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */ diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h index b9c7e5d..80e3bf1 100644 --- a/include/asm-generic/bitops/le.h +++ b/include/asm-generic/bitops/le.h @@ -20,6 +20,8 @@ #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr) #define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset) +#define generic_find_next_le_bit(addr, size, offset) \ + find_next_bit(addr, size, offset) #elif defined(__BIG_ENDIAN) @@ -42,6 +44,8 @@ extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); +extern unsigned long generic_find_next_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); #else #error Please fix asm/byteorder.h diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h index 8144a27..60652a3 100644 --- a/include/asm-powerpc/bitops.h +++ b/include/asm-powerpc/bitops.h @@ -310,6 +310,8 @@ static __inline__ int 
test_le_bit(unsigned long nr, unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); +unsigned long generic_find_next_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); /* Bitmap functions for the ext2 filesystem */ #define ext2_set_bit(nr,addr) \ @@ -329,6 +331,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, #define ext2_find_next_zero_bit(addr, size, off) \ generic_find_next_zero_le_bit((unsigned long*)addr, size, off) +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)addr, size, off) /* Bitmap functions for the minix filesystem. */ #define minix_test_and_set_bit(nr,addr) \ diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index cdee7aa..c7b9bb2 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -502,6 +502,7 @@ do { \ #define ext4_test_bit ext2_test_bit #define ext4_find_first_zero_bit ext2_find_first_zero_bit #define ext4_find_next_zero_bitext2_find_next_zero_bit +#define ext4_find_next_bit ext2_find_next_bit /* * Maximal mount counts between two filesystem checks diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c index bda0d71..0306c04 100644 --- a/lib/find_next_bit.c +++ b/lib/find_next_bit.c @@ -178,4 +178,48 @@ found_middle_swap: EXPORT_SYMBOL(generic_find_next_zero_le_bit); +unsigned long generic_find_next_le_bit(const unsigned long *addr, unsigned + long size, unsigned long offset) +{ + const unsigned long *p = addr + BITOP_WORD(offset); + unsigned long result = offset ~(BITS_PER_LONG - 1); + unsigned long tmp; + + if (offset = size) + return size; + size -= result; + offset = (BITS_PER_LONG - 1UL); + if (offset) { + tmp = ext2_swabp(p++); + tmp = (~0UL offset); + if (size BITS_PER_LONG) + goto found_first; + if (tmp) + goto found_middle; + size -= BITS_PER_LONG; + result += BITS_PER_LONG; + } + + while (size ~(BITS_PER_LONG - 1)) { + tmp = *(p++); + if (tmp) + goto 
found_middle_swap; + result += BITS_PER_LONG; + size -= BITS_PER_LONG; + } + if (!size) + return result; + tmp = ext2_swabp(p); +found_first: + tmp = (~0UL (BITS_PER_LONG - size)); + if (tmp == 0UL)
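For readers who want the semantics of generic_find_next_le_bit() without the word-at-a-time byte swapping, here is a deliberately slow reference sketch. It is not the patch's implementation: toy_find_next_le_bit is a hypothetical name, and the byte-wise walk is swapped in because it behaves the same on little-endian and big-endian hosts.

```c
#include <stddef.h>

/*
 * Return the index of the first set bit at or after `offset` in a bitmap
 * of `size` bits, or `size` if there is none. Bit n is taken to live in
 * byte n / 8 at position n % 8, i.e. little-endian bit order.
 */
static unsigned long toy_find_next_le_bit(const unsigned char *map,
					  unsigned long size,
					  unsigned long offset)
{
	for (unsigned long bit = offset; bit < size; bit++) {
		if (map[bit / 8] & (1u << (bit % 8)))
			return bit;
	}
	return size;
}
```

A reference like this is mainly useful as an oracle when testing an optimized version: both should return the same index for every (size, offset) pair on the same bitmap.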
[PATCH] ext4: Fix sparse warnings
Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
---
 fs/ext4/inode.c         |    6 ++++--
 include/linux/ext4_fs.h |   16 ++++++++--------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a4848e0..307e240 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3177,12 +3177,14 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
 						iloc, handle);
 			if (ret) {
 				EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
-				if (mnt_count != sbi->s_es->s_mnt_count) {
+				if (mnt_count !=
+					le16_to_cpu(sbi->s_es->s_mnt_count)) {
 					ext4_warning(inode->i_sb, __FUNCTION__,
 					"Unable to expand inode %lu. Delete"
 					" some EAs or run e2fsck.",
 					inode->i_ino);
-					mnt_count = sbi->s_es->s_mnt_count;
+					mnt_count =
+					  le16_to_cpu(sbi->s_es->s_mnt_count);
 				}
 			}
 		}
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index c7b9bb2..ab7edaa 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -129,7 +129,7 @@ struct ext4_group_desc
 	__le16	bg_free_blocks_count;	/* Free blocks count */
 	__le16	bg_free_inodes_count;	/* Free inodes count */
 	__le16	bg_used_dirs_count;	/* Directories count */
-	__u16	bg_flags;
+	__le16	bg_flags;
 	__u32	bg_reserved[3];
 	__le32	bg_block_bitmap_hi;	/* Blocks bitmap block MSB */
 	__le32	bg_inode_bitmap_hi;	/* Inodes bitmap block MSB */
@@ -596,13 +596,13 @@ struct ext4_super_block {
 /*150*/	__le32	s_blocks_count_hi;	/* Blocks count */
 	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
 	__le32	s_free_blocks_count_hi;	/* Free blocks count */
-	__u16	s_min_extra_isize;	/* All inodes have at least # bytes */
-	__u16	s_want_extra_isize;	/* New inodes should reserve # bytes */
-	__u32	s_flags;		/* Miscellaneous flags */
-	__u16	s_raid_stride;		/* RAID stride */
-	__u16	s_mmp_interval;		/* # seconds to wait in MMP checking */
-	__u64	s_mmp_block;		/* Block for multi-mount protection */
-	__u32	s_raid_stripe_width;	/* blocks on all data disks (N*stride)*/
+	__le16	s_min_extra_isize;	/* All inodes have at least # bytes */
+	__le16	s_want_extra_isize;	/* New inodes should reserve # bytes */
+	__le32	s_flags;		/* Miscellaneous flags */
+	__le16	s_raid_stride;		/* RAID stride */
+	__le16	s_mmp_interval;		/* # seconds to wait in MMP checking */
+	__le64	s_mmp_block;		/* Block for multi-mount protection */
+	__le32	s_raid_stripe_width;	/* blocks on all data disks (N*stride)*/
 	__u32	s_reserved[163];	/* Padding to the end of the block */
 };

--
1.5.3.1.91.gd3392-dirty
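Why the le16_to_cpu() calls matter and are not just annotation noise: an on-disk little-endian field read without conversion gives the wrong number on a big-endian host. The sketch below shows that in userspace; both toy_* helpers are hypothetical and operate on raw bytes rather than the kernel's __le16 type.

```c
#include <stdint.h>

/*
 * Decode a 16-bit little-endian on-disk field into a host-order value.
 * Reading the bytes explicitly gives the same answer on any host.
 */
static uint16_t toy_le16_to_cpu(const unsigned char raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/*
 * What a plain, unconverted 16-bit load of the same two bytes would
 * yield on a big-endian host.
 */
static uint16_t toy_raw_as_big_endian(const unsigned char raw[2])
{
	return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

For a mount count of 20 stored as the bytes 0x14 0x00, the converted value is 20 while the unconverted big-endian read is 0x1400, so a comparison such as the mnt_count check in ext4_mark_inode_dirty() would compare against the wrong number there.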
+ ext4-uninitialized-block-groups-fix.patch added to -mm tree
The patch titled Ext4: Uninitialized Block Groups (fix) has been added to the -mm tree. Its filename is ext4-uninitialized-block-groups-fix.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: Ext4: Uninitialized Block Groups (fix) From: Avantika Mathur [EMAIL PROTECTED] Andreas Dilger wrote: On Sep 18, 2007 20:03 -0700, Andrew Morton wrote: On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote: +#if !defined(CONFIG_CRC16) +/** CRC table for the CRC-16. The poly is 0x8005 (x16 + x15 + x2 + 1) */ +__u16 const crc16_table[256] = { me + 0x, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241, That's rather sad. A plain old depends on would be better. My bad. We wrote this patch and it had to run on older kernels that might not even have lib/crc16.c (it was added around 2.6.15 or so, so e.g. RHEL4 doesn't have it). I forgot to remove it from the upstream submission, and since it didn't cause problems nobody else complained... The incremental patch below removes the local crc16 code, and also resolves an issue with properly updating bg_itable_unused when an inode is allocated in an unused block groups. 
Cc: Andreas Dilger [EMAIL PROTECTED] Cc: Avantika Mathur [EMAIL PROTECTED] Cc: Mingming Cao [EMAIL PROTECTED] Cc: Aneesh Kumar K.V [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- fs/Kconfig |1 fs/ext4/ialloc.c |8 ++- fs/ext4/super.c | 51 - 3 files changed, 9 insertions(+), 51 deletions(-) diff -puN fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix fs/ext4/ialloc.c --- a/fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix +++ a/fs/ext4/ialloc.c @@ -632,12 +632,18 @@ got: if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { if (gdp-bg_flags cpu_to_le16(EXT4_BG_INODE_UNINIT)) { gdp-bg_flags = cpu_to_le16(~EXT4_BG_INODE_UNINIT); - free = EXT4_INODES_PER_GROUP(sb); + free = 0; } else { free = EXT4_INODES_PER_GROUP(sb) - le16_to_cpu(gdp-bg_itable_unused); } + /* +* Check the relative inode number against the last used +* relative inode number in this group. if it is greater +* we need to update the bg_itable_unused count +* +*/ if (ino free) gdp-bg_itable_unused = cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino); diff -puN fs/ext4/super.c~ext4-uninitialized-block-groups-fix fs/ext4/super.c --- a/fs/ext4/super.c~ext4-uninitialized-block-groups-fix +++ a/fs/ext4/super.c @@ -37,6 +37,7 @@ #include linux/quotaops.h #include linux/seq_file.h #include linux/log2.h +#include linux/crc16.h #include asm/uaccess.h @@ -1337,56 +1338,6 @@ static int ext4_setup_super(struct super return res; } -#if !defined(CONFIG_CRC16) -/** CRC table for the CRC-16. 
The poly is 0x8005 (x16 + x15 + x2 + 1) */ -__u16 const crc16_table[256] = { - 0x, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241, - 0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440, - 0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40, - 0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841, - 0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40, - 0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41, - 0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641, - 0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040, - 0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240, - 0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441, - 0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41, - 0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840, - 0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41, - 0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40, - 0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640, - 0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041, - 0xA001, 0x60C0, 0x6180, 0xA141, 0x6300, 0xA3C1, 0xA281, 0x6240, - 0x6600, 0xA6C1, 0xA781, 0x6740, 0xA501, 0x65C0, 0x6480, 0xA441, - 0x6C00, 0xACC1, 0xAD81, 0x6D40, 0xAF01, 0x6FC0, 0x6E80, 0xAE41, - 0xAA01, 0x6AC0, 0x6B80, 0xAB41, 0x6900, 0xA9C1, 0xA881, 0x6840, - 0x7800, 0xB8C1, 0xB981, 0x7940, 0xBB01, 0x7BC0, 0x7A80, 0xBA41, - 0xBE01, 0x7EC0, 0x7F80, 0xBF41, 0x7D00, 0xBDC1,
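The table being removed is the usual byte-at-a-time lookup table for the 0x8005 polynomial in its bit-reflected form (0xA001). A minimal bitwise sketch reproduces its entries and can be used to sanity-check them; the toy_* names are hypothetical, and this is not the code in lib/crc16.c, which uses the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into the CRC, one bit at a time (reflected poly 0xA001). */
static uint16_t toy_crc16_byte(uint16_t crc, uint8_t data)
{
	crc ^= data;
	for (int i = 0; i < 8; i++)
		crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
				: (uint16_t)(crc >> 1);
	return crc;
}

/* Run the CRC over a buffer, starting from the caller's initial value. */
static uint16_t toy_crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--)
		crc = toy_crc16_byte(crc, *buf++);
	return crc;
}
```

Entry i of the table equals toy_crc16_byte(0, i), so the second, third and fourth entries quoted above (0xC0C1, 0xC181, 0x0140) fall out for i = 1, 2, 3.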
Re: + ext4-uninitialized-block-groups-fix.patch added to -mm tree
[EMAIL PROTECTED] wrote: The patch titled Ext4: Uninitialized Block Groups (fix) has been added to the -mm tree. Its filename is ext4-uninitialized-block-groups-fix.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: Ext4: Uninitialized Block Groups (fix) From: Avantika Mathur [EMAIL PROTECTED] Andreas Dilger wrote: On Sep 18, 2007 20:03 -0700, Andrew Morton wrote: On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur [EMAIL PROTECTED] wrote: +#if !defined(CONFIG_CRC16) +/** CRC table for the CRC-16. The poly is 0x8005 (x16 + x15 + x2 + 1) */ +__u16 const crc16_table[256] = { me + 0x, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241, That's rather sad. A plain old depends on would be better. My bad. We wrote this patch and it had to run on older kernels that might not even have lib/crc16.c (it was added around 2.6.15 or so, so e.g. RHEL4 doesn't have it). I forgot to remove it from the upstream submission, and since it didn't cause problems nobody else complained... The incremental patch below removes the local crc16 code, and also resolves an issue with properly updating bg_itable_unused when an inode is allocated in an unused block groups. 
Cc: Andreas Dilger [EMAIL PROTECTED] Cc: Avantika Mathur [EMAIL PROTECTED] Cc: Mingming Cao [EMAIL PROTECTED] Cc: Aneesh Kumar K.V [EMAIL PROTECTED] Cc: linux-ext4@vger.kernel.org Signed-off-by: Andrew Morton [EMAIL PROTECTED] Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED] --- fs/Kconfig |1 fs/ext4/ialloc.c |8 ++- fs/ext4/super.c | 51 - 3 files changed, 9 insertions(+), 51 deletions(-) diff -puN fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix fs/ext4/ialloc.c --- a/fs/ext4/ialloc.c~ext4-uninitialized-block-groups-fix +++ a/fs/ext4/ialloc.c @@ -632,12 +632,18 @@ got: if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) { if (gdp-bg_flags cpu_to_le16(EXT4_BG_INODE_UNINIT)) { gdp-bg_flags = cpu_to_le16(~EXT4_BG_INODE_UNINIT); - free = EXT4_INODES_PER_GROUP(sb); I guess a comment as below would help to explain why are we not decrementing bg_itable_unused by 1. /* When marking the block group with ~EXT4_BG_INODE_UNINIT we don't want to depend * on the value of bg_itable_unsed even though mke2fs could have initialized the * same for us. Instead we calculated the value below */ + free = 0; } else { free = EXT4_INODES_PER_GROUP(sb) - le16_to_cpu(gdp-bg_itable_unused); } + /* +* Check the relative inode number against the last used +* relative inode number in this group. if it is greater +* we need to update the bg_itable_unused count +* +*/ if (ino free) gdp-bg_itable_unused = cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino); diff -puN fs/ext4/super.c~ext4-uninitialized-block-groups-fix fs/ext4/super.c --- a/fs/ext4/super.c~ext4-uninitialized-block-groups-fix +++ a/fs/ext4/super.c @@ -37,6 +37,7 @@ #include linux/quotaops.h #include linux/seq_file.h #include linux/log2.h +#include linux/crc16.h #include asm/uaccess.h @@ -1337,56 +1338,6 @@ static int ext4_setup_super(struct super return res; } -#if !defined(CONFIG_CRC16) -/** CRC table for the CRC-16. 
The poly is 0x8005 (x16 + x15 + x2 + 1) */ -__u16 const crc16_table[256] = { - 0x, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241, - 0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440, - 0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40, - 0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841, - 0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40, - 0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41, - 0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641, - 0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040, - 0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240, - 0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441, - 0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41, - 0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840, - 0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41, - 0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40, - 0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640, - 0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041, -
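The bg_itable_unused update discussed in this thread can be modelled outside the kernel. The sketch below is a guess at the intended logic, not the ialloc.c code: the toy_* names are hypothetical, the relative inode number is assumed to be 1-based, and the comparison is assumed to be "greater than", as the added comment in the hunk suggests (the operator itself is lost in this archive).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two group descriptor fields involved. */
struct toy_group {
	bool inode_uninit;	/* stands in for EXT4_BG_INODE_UNINIT */
	uint16_t itable_unused;	/* stands in for bg_itable_unused */
};

/*
 * Record that relative inode number `ino` (assumed 1-based within the
 * group) has just been allocated.
 */
static void toy_note_inode_used(struct toy_group *g,
				uint32_t inodes_per_group, uint32_t ino)
{
	uint32_t used_upto;

	if (g->inode_uninit) {
		/* First use of the group: ignore any stored count. */
		g->inode_uninit = false;
		used_upto = 0;
	} else {
		used_upto = inodes_per_group - g->itable_unused;
	}

	/* Only shrink the unused tail when allocating beyond it. */
	if (ino > used_upto)
		g->itable_unused = (uint16_t)(inodes_per_group - ino);
}
```

Starting from zero in the uninitialized case means the first allocation always rewrites the count, which matches the reply's point that the code should not depend on whatever value mke2fs left in the field.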