Ext4 patches for 2.6.22-rc6
On Fri, 2007-06-29 at 13:57 -0700, Andrew Morton wrote:
> On Fri, 29 Jun 2007 11:50:04 -0400 Mingming Cao [EMAIL PROTECTED] wrote:
>
> > I think the ext4 patch queue is in good shape now.
>
> Which ext4 patches are you intending to merge into 2.6.23?
>
> Please send all those out to lkml for review?

Hi Andrew,

Here are the patches in ext4-patch-queue that I think can be considered for merging upstream. Please review. All of the patches have been posted on the ext4 mailing list before. Some are bug fixes, some are features. To summarize:

- make extents on by default in ext4dev
- nanosecond timestamp
- 64 bit inode versioning support
- remove 32k subdir limits
- journal checksumming
- journal stats via procfs
- delayed allocation for ext4 writeback mode
- fallocate()

All the patches can be found at
http://repo.or.cz/w/ext4-patch-queue.git
and have been tested (with fsx, dbench, FFSB, iozone) on x86, x86_64 and ppc64, with extents and delayed allocation enabled.

The full series can be found at
http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=series;h=2f43431db28778ce8d2149bce7a51566a2d2517c;hb=56e27e20cf228b32f5162a76b3bad154d1d3b730

I will post the patches that are in good shape (in 9 sets of patches) to lkml in the following emails, except for the bottom two features:

* The fallocate() patches, which Amit posted a few days ago and which are under review (hopefully we can reach an agreement on the interface and the modes before the 2.6.23-rc1 window closes).
* The delayed allocation patches in the ext4 patch queue. Alex mentioned in another email that he is working on another version of delalloc that can handle block size < page size and moves some work to the VFS, so it is probably not very useful to post this version for people to review.

So, here is the series file.
# Rebased the patches to 2.6.22-rc6

# Add mount option to turn off extents
ext4_noextent_mount_opt.patch

# Mount ext4dev fs with extents by default for testing purposes;
# for the ext4 product release, the extents mount option
# will be turned on only if the fs has the EXTENTS feature on
ext4_extents_on_by_default.patch

# Propagate inode flags
ext4-propagate_flags.patch

# Add extent sanity checks
ext4-extent-sanity-checks.patch

# Bug fix: set 64bit JBD2 feature on 32bit ext4 fs
ext4_set_jbd2_64bit_feature.patch

# Fix: Rename CONFIG_JBD_DEBUG to CONFIG_JBD2_DEBUG
jbd2_config_jbd2_debug_fix.patch

# Export jbd2-debug via debugfs
ext4_CONFIG_JBD2_DEBUG.patch
jbd2_move_jbd2_debug_to_debugfs.patch

# Nanosecond timestamp support
ext4-nanosecond-patch

# inode version patch series
# inode versioning is needed for NFSv4
# vfs changes, 64 bit inode->i_version
64-bit-i_version.patch

# reserve hi 32 bit inode version on ext4 on-disk inode
i_version_hi.patch

# ext4 inode version read/store
ext4_i_version_hi_2.patch

# ext4 inode version update
i_version_update_ext4.patch

# add a noversion mount option to disable inode version updates
ext4_no_version.patch

# New patch to expand inode i_extra_isize to support features
# in high part of inode (> 128 bytes)
ext4_expand_inode_extra_isize.patch

# Export jbd stats through procfs
# Shall this move to debugfs?
jbd-stats-through-procfs

# Remove 32000 subdirs limit.
ext4_remove_subdirs_limit.patch

# Add journal checksums
ext4-journal_chksum-2.6.20.patch

# Various Cleanups
ext4-zero_user_page.patch
is_power_of_2-ext4-superc.patch
ext4-remove-extra-is_rdonly-check.patch
ext4_extent_compilation_fixes.patch
ext4_extent_macros_cleanup.patch

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[EXT4 set 1][PATCH 1/2] Add noextents mount option
Add a mount option to turn off extents.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
Index: linux-2.6.22-rc4/fs/ext4/super.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/super.c	2007-06-11 17:02:18.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c	2007-06-11 17:02:22.0 -0700
@@ -725,7 +725,7 @@
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-	Opt_grpquota, Opt_extents,
+	Opt_grpquota, Opt_extents, Opt_noextents,
 };

 static match_table_t tokens = {
@@ -776,6 +776,7 @@
 	{Opt_usrquota, "usrquota"},
 	{Opt_barrier, "barrier=%u"},
 	{Opt_extents, "extents"},
+	{Opt_noextents, "noextents"},
 	{Opt_err, NULL},
 	{Opt_resize, "resize"},
 };
@@ -,6 +1112,9 @@
 		case Opt_extents:
 			set_opt (sbi->s_mount_opt, EXTENTS);
 			break;
+		case Opt_noextents:
+			clear_opt (sbi->s_mount_opt, EXTENTS);
+			break;
 		default:
 			printk (KERN_ERR
 				"EXT4-fs: Unrecognized mount option \"%s\" "
[EXT4 set 1][PATCH 2/2] Enable extents by default for ext4dev
Turn on the extents feature by default in the ext4 filesystem. Users can pass -o noextents to turn it off.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/super.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/super.c	2007-06-11 17:02:22.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c	2007-06-11 17:03:09.0 -0700
@@ -1546,6 +1546,12 @@

 	set_opt(sbi->s_mount_opt, RESERVATION);

+	/*
+	 * turn on extents feature by default in ext4 filesystem
+	 * User -o noextents to turn it off
+	 */
+	set_opt (sbi->s_mount_opt, EXTENTS);
+
 	if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
 			    NULL, 0))
 		goto failed_mount;
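The two patches in this set rely on ordering: the default is set before the option string is parsed, so an explicit noextents still wins. A minimal user-space sketch of that ordering follows. The names demo_parse_options and DEMO_MOUNT_EXTENTS, and the flag value, are hypothetical stand-ins and not the kernel's parse_options() or EXT4_MOUNT_EXTENTS.

```c
#include <string.h>

/* Hypothetical flag bit; the real EXT4_MOUNT_EXTENTS value lives in ext4_fs.h. */
#define DEMO_MOUNT_EXTENTS 0x1u

/*
 * Parse a comma-separated option string into a flag word.
 * Extents are enabled first, so "noextents" must be given to clear them.
 * Returns 0 on success, -1 on an unrecognized option.
 */
static int demo_parse_options(const char *options, unsigned int *mount_opt)
{
	char buf[128];
	char *tok;

	*mount_opt |= DEMO_MOUNT_EXTENTS;	/* default on, as in patch 2/2 */

	if (options == NULL)
		return 0;
	strncpy(buf, options, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
		if (strcmp(tok, "extents") == 0)
			*mount_opt |= DEMO_MOUNT_EXTENTS;
		else if (strcmp(tok, "noextents") == 0)
			*mount_opt &= ~DEMO_MOUNT_EXTENTS;
		else
			return -1;
	}
	return 0;
}
```

Because options are applied left to right, the last of extents/noextents on the command line decides the final state in this sketch.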
[EXT4 set 2][PATCH 1/5] cleanups: Propagate some i_flags to disk
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext4-specific i_flags. Hence, when someone sets these flags via an interface other than ioctl, they are stored correctly.

Signed-off-by: Jan Kara [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/inode.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/inode.c	2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/inode.c	2007-06-11 17:24:28.0 -0700
@@ -2583,6 +2583,25 @@
 		inode->i_flags |= S_DIRSYNC;
 }

+/* Propagate flags from i_flags to EXT4_I(inode)->i_flags */
+void ext4_get_inode_flags(struct ext4_inode_info *ei)
+{
+	unsigned int flags = ei->vfs_inode.i_flags;
+
+	ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|
+			EXT4_IMMUTABLE_FL|EXT4_NOATIME_FL|EXT4_DIRSYNC_FL);
+	if (flags & S_SYNC)
+		ei->i_flags |= EXT4_SYNC_FL;
+	if (flags & S_APPEND)
+		ei->i_flags |= EXT4_APPEND_FL;
+	if (flags & S_IMMUTABLE)
+		ei->i_flags |= EXT4_IMMUTABLE_FL;
+	if (flags & S_NOATIME)
+		ei->i_flags |= EXT4_NOATIME_FL;
+	if (flags & S_DIRSYNC)
+		ei->i_flags |= EXT4_DIRSYNC_FL;
+}
+
 void ext4_read_inode(struct inode * inode)
 {
 	struct ext4_iloc iloc;
@@ -2742,6 +2761,7 @@
 	if (ei->i_state & EXT4_STATE_NEW)
 		memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);

+	ext4_get_inode_flags(ei);
 	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
 	if(!(test_opt(inode->i_sb, NO_UID32))) {
 		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(inode->i_uid));
Index: linux-2.6.22-rc4/fs/ext4/ioctl.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/ioctl.c	2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/ioctl.c	2007-06-11 17:25:11.0 -0700
@@ -28,6 +28,7 @@

 	switch (cmd) {
 	case EXT4_IOC_GETFLAGS:
+		ext4_get_inode_flags(ei);
 		flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
 		return put_user(flags, (int __user *) arg);
 	case EXT4_IOC_SETFLAGS: {
Index: linux-2.6.22-rc4/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs.h	2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs.h	2007-06-11 17:24:28.0 -0700
@@ -862,6 +862,7 @@
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
 extern void ext4_truncate (struct inode *);
 extern void ext4_set_inode_flags(struct inode *);
+extern void ext4_get_inode_flags(struct ext4_inode_info *);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
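The propagation step is a clear-then-copy over a fixed set of mirrored bits. The sketch below shows the same pattern in user space with three flags. All DEMO_* names and bit values are hypothetical; the kernel's S_* and EXT4_*_FL constants are different.

```c
/* Hypothetical bit values; the kernel's S_* and EXT4_*_FL constants differ. */
#define DEMO_S_SYNC		0x01u
#define DEMO_S_APPEND		0x02u
#define DEMO_S_IMMUTABLE	0x04u

#define DEMO_FS_SYNC_FL		0x10u
#define DEMO_FS_APPEND_FL	0x20u
#define DEMO_FS_IMMUTABLE_FL	0x40u
#define DEMO_FS_OTHER_FL	0x80u	/* a flag the VFS does not mirror */

/*
 * Rebuild the mirrored bits of fs_flags from vfs_flags.
 * Bits with no VFS counterpart (DEMO_FS_OTHER_FL here) are preserved.
 */
static unsigned int demo_get_inode_flags(unsigned int vfs_flags,
					 unsigned int fs_flags)
{
	fs_flags &= ~(DEMO_FS_SYNC_FL | DEMO_FS_APPEND_FL | DEMO_FS_IMMUTABLE_FL);
	if (vfs_flags & DEMO_S_SYNC)
		fs_flags |= DEMO_FS_SYNC_FL;
	if (vfs_flags & DEMO_S_APPEND)
		fs_flags |= DEMO_FS_APPEND_FL;
	if (vfs_flags & DEMO_S_IMMUTABLE)
		fs_flags |= DEMO_FS_IMMUTABLE_FL;
	return fs_flags;
}
```

Clearing the mirrored bits first matters: a flag that was dropped on the VFS side would otherwise linger in the filesystem-specific word.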
[EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks
With this patch all headers are checked. The code should become more resistant to on-disk corruption. Needless BUG_ON() calls have been removed. Please review for inclusion.

Signed-off-by: Alex Tomas [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/extents.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/extents.c	2007-06-11 17:22:15.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/extents.c	2007-06-11 17:27:57.0 -0700
@@ -91,36 +91,6 @@
 	ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0x);
 }

-static int ext4_ext_check_header(const char *function, struct inode *inode,
-				struct ext4_extent_header *eh)
-{
-	const char *error_msg = NULL;
-
-	if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
-		error_msg = "invalid magic";
-		goto corrupted;
-	}
-	if (unlikely(eh->eh_max == 0)) {
-		error_msg = "invalid eh_max";
-		goto corrupted;
-	}
-	if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
-		error_msg = "invalid eh_entries";
-		goto corrupted;
-	}
-	return 0;
-
-corrupted:
-	ext4_error(inode->i_sb, function,
-			"bad header in inode #%lu: %s - magic %x, "
-			"entries %u, max %u, depth %u",
-			inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
-			le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max),
-			le16_to_cpu(eh->eh_depth));
-
-	return -EIO;
-}
-
 static handle_t *ext4_ext_journal_restart(handle_t *handle, int needed)
 {
 	int err;
@@ -269,6 +239,70 @@
 	return size;
 }

+static inline int
+ext4_ext_max_entries(struct inode *inode, int depth)
+{
+	int max;
+
+	if (depth == ext_depth(inode)) {
+		if (depth == 0)
+			max = ext4_ext_space_root(inode);
+		else
+			max = ext4_ext_space_root_idx(inode);
+	} else {
+		if (depth == 0)
+			max = ext4_ext_space_block(inode);
+		else
+			max = ext4_ext_space_block_idx(inode);
+	}
+
+	return max;
+}
+
+static int __ext4_ext_check_header(const char *function, struct inode *inode,
+					struct ext4_extent_header *eh,
+					int depth)
+{
+	const char *error_msg = NULL;
+	int max = 0;
+
+	if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
+		error_msg = "invalid magic";
+		goto corrupted;
+	}
+	if (unlikely(le16_to_cpu(eh->eh_depth) != depth)) {
+		error_msg = "unexpected eh_depth";
+		goto corrupted;
+	}
+	if (unlikely(eh->eh_max == 0)) {
+		error_msg = "invalid eh_max";
+		goto corrupted;
+	}
+	max = ext4_ext_max_entries(inode, depth);
+	if (unlikely(le16_to_cpu(eh->eh_max) > max)) {
+		error_msg = "too large eh_max";
+		goto corrupted;
+	}
+	if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
+		error_msg = "invalid eh_entries";
+		goto corrupted;
+	}
+	return 0;
+
+corrupted:
+	ext4_error(inode->i_sb, function,
+			"bad header in inode #%lu: %s - magic %x, "
+			"entries %u, max %u(%u), depth %u(%u)",
+			inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
+			le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max),
+			max, le16_to_cpu(eh->eh_depth), depth);
+
+	return -EIO;
+}
+
+#define ext4_ext_check_header(inode, eh, depth)	\
+	__ext4_ext_check_header(__FUNCTION__, inode, eh, depth)
+
 #ifdef EXT_DEBUG
 static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
 {
@@ -329,6 +363,7 @@
 /*
  * ext4_ext_binsearch_idx:
  * binary search for the closest index of the given block
+ * the header must be checked before calling this
  */
 static void
 ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int block)
@@ -336,9 +371,6 @@
 	struct ext4_extent_header *eh = path->p_hdr;
 	struct ext4_extent_idx *r, *l, *m;

-	BUG_ON(eh->eh_magic != EXT4_EXT_MAGIC);
-	BUG_ON(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max));
-	BUG_ON(le16_to_cpu(eh->eh_entries) <= 0);

 	ext_debug("binsearch for %d(idx): ", block);

@@ -388,6 +420,7 @@
 /*
  * ext4_ext_binsearch:
  * binary search for closest extent of the given block
+ * the header must be checked before calling this
  */
 static void
 ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
@@ -395,9 +428,6 @@
 	struct ext4_extent_header *eh = path->p_hdr;
 	struct
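The order of the header checks can be illustrated outside the kernel. The following is a minimal sketch assuming a host-endian header, so the le16_to_cpu() conversions are omitted. The struct demo_extent_header, demo_check_header and the DEMO_EXT_MAGIC value are illustrative names for this example only, and the caller is assumed to supply the expected depth and the capacity that ext4_ext_max_entries() would compute.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, host-endian stand-in for the on-disk extent header. */
struct demo_extent_header {
	uint16_t magic;
	uint16_t entries;	/* number of valid entries */
	uint16_t max;		/* capacity recorded in the header */
	uint16_t depth;		/* 0 for a leaf block */
};

#define DEMO_EXT_MAGIC 0xf30a	/* illustrative value for this demo */

/*
 * Return NULL if the header is plausible, otherwise a short reason.
 * expected_depth and max_entries come from the caller, which knows
 * where in the tree the block sits and how many entries can fit.
 */
static const char *demo_check_header(const struct demo_extent_header *eh,
				     unsigned int expected_depth,
				     unsigned int max_entries)
{
	if (eh->magic != DEMO_EXT_MAGIC)
		return "invalid magic";
	if (eh->depth != expected_depth)
		return "unexpected eh_depth";
	if (eh->max == 0)
		return "invalid eh_max";
	if (eh->max > max_entries)
		return "too large eh_max";
	if (eh->entries > eh->max)
		return "invalid eh_entries";
	return NULL;
}
```

Checking eh_max against the computed capacity before comparing eh_entries with eh_max is what closes the gap in the old check: a corrupted eh_max can no longer make an oversized eh_entries look valid.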
[EXT4 set 2][PATCH 3/5] cleanups: set_jbd2_64bit_feature for 16TB ext4 fs
Set the journal's JBD2_FEATURE_INCOMPAT_64BIT feature at mount time on devices whose block count does not fit in 32 bits. This ensures the proper record length when writing to the journal.

Signed-off-by: Jose R. Santos [EMAIL PROTECTED]
Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
Signed-off-by: Laurent Vivier [EMAIL PROTECTED]
---
 fs/ext4/super.c |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.22-rc4/fs/ext4/super.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/super.c	2007-06-11 16:15:54.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c	2007-06-11 16:16:10.0 -0700
@@ -1804,6 +1804,13 @@
 		goto failed_mount3;
 	}

+	if (ext4_blocks_count(es) > 0xULL &&
+		!jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0,
+				       JBD2_FEATURE_INCOMPAT_64BIT)) {
+		printk(KERN_ERR "ext4: Failed to set 64-bit journal feature\n");
+		goto failed_mount4;
+	}
+
 	/* We have now updated the journal if required, so we can
 	 * validate the data journaling mode. */
 	switch (test_opt(sb, DATA_FLAGS)) {
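The "16TB" in the subject line follows from simple arithmetic if one assumes 4 KiB blocks (an assumption; the patch does not state the block size): 2^32 blocks of 4096 bytes each is 16 TiB. The sketch below spells that out. The helper names are hypothetical, and the threshold is assumed to be the 32-bit maximum.

```c
#include <stdint.h>

/* Largest block count that still fits in 32 bits (assumed threshold). */
#define DEMO_MAX_32BIT_BLOCKS UINT32_MAX

/* Nonzero if the block count no longer fits in 32 bits. */
static int demo_needs_64bit_journal(uint64_t blocks_count)
{
	return blocks_count > DEMO_MAX_32BIT_BLOCKS;
}

/* File system size in bytes for a given block count and block size. */
static uint64_t demo_fs_bytes(uint64_t blocks_count, uint32_t block_size)
{
	return blocks_count * block_size;
}
```

With 4 KiB blocks, demo_fs_bytes((uint64_t)1 << 32, 4096) is exactly 2^44 bytes, the point at which the 64-bit journal feature becomes necessary in this sketch.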
[EXT4 set 2][PATCH 4/5] cleanups: Rename CONFIG_JBD_DEBUG to CONFIG_JBD2_DEBUG
When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG were never changed to CONFIG_JBD2_DEBUG. This patch fixes that.

Signed-off-by: Jose R. Santos [EMAIL PROTECTED]
---
Index: linux-2.6.22-rc4/fs/jbd2/journal.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/jbd2/journal.c	2007-06-11 16:15:49.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/journal.c	2007-06-11 16:16:18.0 -0700
@@ -528,7 +528,7 @@
 {
 	int err = 0;

-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 	spin_lock(&journal->j_state_lock);
 	if (!tid_geq(journal->j_commit_request, tid)) {
 		printk(KERN_EMERG
@@ -1709,7 +1709,7 @@
  * Journal_head storage management
  */
 static struct kmem_cache *jbd2_journal_head_cache;
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 static atomic_t nr_journal_heads = ATOMIC_INIT(0);
 #endif

@@ -1747,7 +1747,7 @@
 	struct journal_head *ret;
 	static unsigned long last_warning;

-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 	atomic_inc(&nr_journal_heads);
 #endif
 	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
@@ -1768,7 +1768,7 @@

 static void journal_free_journal_head(struct journal_head *jh)
 {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 	atomic_dec(&nr_journal_heads);
 	memset(jh, JBD_POISON_FREE, sizeof(*jh));
 #endif
@@ -1953,12 +1953,12 @@
 /*
  * /proc tunables
  */
-#if defined(CONFIG_JBD_DEBUG)
+#if defined(CONFIG_JBD2_DEBUG)
 int jbd2_journal_enable_debug;
 EXPORT_SYMBOL(jbd2_journal_enable_debug);
 #endif

-#if defined(CONFIG_JBD_DEBUG) && defined(CONFIG_PROC_FS)
+#if defined(CONFIG_JBD2_DEBUG) && defined(CONFIG_PROC_FS)

 static struct proc_dir_entry *proc_jbd_debug;

@@ -2073,7 +2073,7 @@

 static void __exit journal_exit(void)
 {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 	int n = atomic_read(&nr_journal_heads);
 	if (n)
 		printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
Index: linux-2.6.22-rc4/fs/jbd2/recovery.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/jbd2/recovery.c	2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/recovery.c	2007-06-11 16:16:18.0 -0700
@@ -295,7 +295,7 @@
 		printk(KERN_ERR "JBD: error %d scanning journal\n", err);
 		++journal->j_transaction_sequence;
 	} else {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 		int dropped = info.end_transaction - be32_to_cpu(sb->s_sequence);
 #endif
 		jbd_debug(0,
Index: linux-2.6.22-rc4/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs.h	2007-06-11 16:15:59.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs.h	2007-06-11 16:16:18.0 -0700
@@ -237,7 +237,7 @@
 #define EXT4_IOC_GROUP_ADD		_IOW('f', 8,struct ext4_new_group_input)
 #define	EXT4_IOC_GETVERSION_OLD		FS_IOC_GETVERSION
 #define	EXT4_IOC_SETVERSION_OLD		FS_IOC_SETVERSION
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 #define EXT4_IOC_WAIT_FOR_READONLY	_IOR('f', 99, long)
 #endif
 #define EXT4_IOC_GETRSVSZ		_IOR('f', 5, long)
@@ -253,7 +253,7 @@
 #define EXT4_IOC32_GETRSVSZ		_IOR('f', 5, int)
 #define EXT4_IOC32_SETRSVSZ		_IOW('f', 6, int)
 #define EXT4_IOC32_GROUP_EXTEND		_IOW('f', 7, unsigned int)
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 #define EXT4_IOC32_WAIT_FOR_READONLY	_IOR('f', 99, int)
 #endif
 #define EXT4_IOC32_GETVERSION_OLD	FS_IOC32_GETVERSION
Index: linux-2.6.22-rc4/include/linux/ext4_fs_sb.h
===================================================================
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs_sb.h	2007-06-11 16:15:55.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs_sb.h	2007-06-11 16:16:18.0 -0700
@@ -71,7 +71,7 @@
 	struct list_head s_orphan;
 	unsigned long s_commit_interval;
 	struct block_device *journal_bdev;
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 	struct timer_list turn_ro_timer;	/* For turning read-only (crash simulation) */
 	wait_queue_head_t ro_wait_queue;	/* For people waiting for the fs to go read-only */
 #endif
Index: linux-2.6.22-rc4/include/linux/jbd2.h
===================================================================
--- linux-2.6.22-rc4.orig/include/linux/jbd2.h	2007-06-11 16:15:49.0 -0700
+++ linux-2.6.22-rc4/include/linux/jbd2.h	2007-06-11 16:16:18.0 -0700
@@ -50,11 +50,11 @@
  */
 #define JBD_DEFAULT_MAX_COMMIT_AGE 5

-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 /*
  * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
- * CONFIG_JBD_DEBUG
[EXT4 set 4][PATCH 2/5] i_version: Add hi 32 bit inode version on ext4 on-disk inode
This patch adds a 32-bit i_version_hi field to ext4_inode, which can be used for 64-bit inode versions. This field stores the upper 32 bits of the version, while Jean Noel's patch added support for storing the lower 32 bits in osd1.linux1.l_i_version.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]
---
Index: linux-2.6.21/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -342,6 +342,7 @@ struct ext4_inode {
 	__le32	i_atime_extra;	/* extra Access time (nsec << 2 | epoch) */
 	__le32	i_crtime;	/* File Creation time */
 	__le32	i_crtime_extra;	/* extra FileCreationtime (nsec << 2 | epoch) */
+	__le32	i_version_hi;	/* high 32 bits for 64-bit version */
 };

 #define i_size_high	i_dir_acl
[EXT4 set 4][PATCH 3/5] i_version:ext4 inode version read/store
This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode.

Signed-off-by: Kalpak Shah [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.21/fs/ext4/inode.c
===================================================================
--- linux-2.6.21.orig/fs/ext4/inode.c
+++ linux-2.6.21/fs/ext4/inode.c
@@ -2709,6 +2709,13 @@ void ext4_read_inode(struct inode * inod
 	EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
 	EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);

+	inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+	if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+		if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+			inode->i_version |=
+			(__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
+	}
+
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
@@ -2852,8 +2859,14 @@ static int ext4_do_update_inode(handle_t
 	} else for (block = 0; block < EXT4_N_BLOCKS; block++)
 		raw_inode->i_block[block] = ei->i_data[block];

-	if (ei->i_extra_isize)
+	raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
+	if (ei->i_extra_isize) {
+		if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) {
+			raw_inode->i_version_hi =
+			cpu_to_le32(inode->i_version >> 32);
+		}
 		raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
+	}

 	BUFFER_TRACE(bh, "call ext4_journal_dirty_metadata");
 	rc = ext4_journal_dirty_metadata(handle, bh);
Index: linux-2.6.21/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -297,7 +297,7 @@ struct ext4_inode {
 	__le32	i_flags;	/* File flags */
 	union {
 		struct {
-			__u32  l_i_reserved1;
+			__u32  l_i_version;
 		} linux1;
 		struct {
 			__u32  h_i_translator;
@@ -406,6 +406,8 @@ do {									       \
 		raw_inode->xtime ## _extra);				       \
 } while (0)

+#define i_disk_version osd1.linux1.l_i_version
+
 #if defined(__KERNEL__) || defined(__linux__)
 #define i_reserved1	osd1.linux1.l_i_reserved1
 #define i_frag		osd2.linux2.l_i_frag
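Splitting a 64-bit version across two 32-bit fields comes down to a shift and a mask. The sketch below shows the round trip in user space. The struct demo_raw_inode and the has_hi parameter (standing in for the EXT4_FITS_IN_INODE() check) are hypothetical, and the values are kept host-endian, so the cpu_to_le32()/le32_to_cpu() conversions of the real code are omitted.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the two on-disk halves (host-endian). */
struct demo_raw_inode {
	uint32_t version_lo;
	uint32_t version_hi;
};

/* Combine the halves; ignore the high half if the inode is too small. */
static uint64_t demo_version_load(const struct demo_raw_inode *raw, int has_hi)
{
	uint64_t v = raw->version_lo;

	if (has_hi)
		v |= (uint64_t)raw->version_hi << 32;
	return v;
}

/* Split the version; the high half is written only if there is room. */
static void demo_version_store(struct demo_raw_inode *raw, uint64_t v, int has_hi)
{
	raw->version_lo = (uint32_t)v;
	if (has_hi)
		raw->version_hi = (uint32_t)(v >> 32);
}
```

On an inode without room for the high field, the version silently degrades to 32 bits in this sketch, which mirrors the conditional store in the patch.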
[EXT4 set 4][PATCH 4/5] i_version:ext4 inode version update
This patch is on top of i_version_update_vfs. The i_version field of the inode is set on inode creation and incremented when the inode is being modified.

Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/ialloc.c	2007-06-13 17:16:28.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/ialloc.c	2007-06-13 17:24:45.0 -0700
@@ -565,6 +565,7 @@ got:
 	inode->i_blocks = 0;
 	inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
 						       ext4_current_time(inode);
+	inode->i_version = 1;

 	memset(ei->i_data, 0, sizeof(ei->i_data));
 	ei->i_dir_start_lookup = 0;
Index: linux-2.6.22-rc4/fs/ext4/inode.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/inode.c	2007-06-13 17:21:29.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/inode.c	2007-06-13 17:24:45.0 -0700
@@ -3082,6 +3082,7 @@ int ext4_mark_iloc_dirty(handle_t *handl
 {
 	int err = 0;

+	inode->i_version++;
 	/* the do_update_inode consumes one bh->b_count */
 	get_bh(iloc->bh);

Index: linux-2.6.22-rc4/fs/ext4/super.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/super.c	2007-06-13 17:19:11.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c	2007-06-13 17:24:45.0 -0700
@@ -2846,8 +2846,8 @@ out:
 		i_size_write(inode, off+len-towrite);
 		EXT4_I(inode)->i_disksize = inode->i_size;
 	}
-	inode->i_version++;
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_version = 1;
 	ext4_mark_inode_dirty(handle, inode);
 	mutex_unlock(&inode->i_mutex);
 	return len - towrite;
[EXT4 set 4][PATCH 5/5] i_version: noversion mount option to disable inode version updates
Add a noversion mount option to disable inode version updates.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]

Index: linux-2.6.21/fs/ext4/super.c
===================================================================
--- linux-2.6.21.orig/fs/ext4/super.c
+++ linux-2.6.21/fs/ext4/super.c
@@ -725,7 +725,7 @@ enum {
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-	Opt_grpquota, Opt_extents, Opt_noextents,
+	Opt_grpquota, Opt_extents, Opt_noextents, Opt_noversion,
 };

 static match_table_t tokens = {
@@ -777,6 +777,7 @@ static match_table_t tokens = {
 	{Opt_barrier, "barrier=%u"},
 	{Opt_extents, "extents"},
 	{Opt_noextents, "noextents"},
+	{Opt_noversion, "noversion"},
 	{Opt_err, NULL},
 	{Opt_resize, "resize"},
 };
@@ -1115,6 +1116,9 @@ clear_qf_name:
 		case Opt_noextents:
 			clear_opt (sbi->s_mount_opt, EXTENTS);
 			break;
+		case Opt_noversion:
+			set_opt(sbi->s_mount_opt, NOVERSION);
+			break;
 		default:
 			printk (KERN_ERR
 				"EXT4-fs: Unrecognized mount option \"%s\" "
Index: linux-2.6.21/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -473,6 +473,7 @@ do {									       \
 #define EXT4_MOUNT_USRQUOTA	0x10	/* old user quota */
 #define EXT4_MOUNT_GRPQUOTA	0x20	/* old group quota */
 #define EXT4_MOUNT_EXTENTS	0x40	/* Extents support */
+#define EXT4_MOUNT_NOVERSION	0x80	/* No inode version updates */

 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
Index: linux-2.6.21/fs/ext4/inode.c
===================================================================
--- linux-2.6.21.orig/fs/ext4/inode.c
+++ linux-2.6.21/fs/ext4/inode.c
@@ -3082,7 +3082,9 @@ int ext4_mark_iloc_dirty(handle_t *handl
 {
 	int err = 0;

-	inode->i_version++;
+	if (!test_opt(inode->i_sb, NOVERSION))
+		inode->i_version++;
+
 	/* the do_update_inode consumes one bh->b_count */
 	get_bh(iloc->bh);

[EXT4 set 6][PATCH 1/1]Export jbd stats through procfs
[PATCH] jbd2 stats through procfs

The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size.

[ This probably should be rewritten to use debugfs? -- Ted]

Signed-off-by: Johann Lombardi [EMAIL PROTECTED]
--
Index: linux-2.6.22-rc4/include/linux/jbd2.h
===================================================================
--- linux-2.6.22-rc4.orig/include/linux/jbd2.h	2007-06-11 17:28:17.0 -0700
+++ linux-2.6.22-rc4/include/linux/jbd2.h	2007-06-13 10:45:21.0 -0700
@@ -408,6 +408,16 @@
 };


+/*
+ * Some stats for checkpoint phase
+ */
+struct transaction_chp_stats_s {
+	unsigned long		cs_chp_time;
+	unsigned long		cs_forced_to_close;
+	unsigned long		cs_written;
+	unsigned long		cs_dropped;
+};
+
 /* The transaction_t type is the guts of the journaling mechanism.  It
  * tracks a compound transaction through its various states:
  *
@@ -543,6 +553,21 @@
 	spinlock_t		t_handle_lock;

 	/*
+	 * Longest time some handle had to wait for running transaction
+	 */
+	unsigned long		t_max_wait;
+
+	/*
+	 * When transaction started
+	 */
+	unsigned long		t_start;
+
+	/*
+	 * Checkpointing stats [j_checkpoint_sem]
+	 */
+	struct transaction_chp_stats_s t_chp_stats;
+
+	/*
 	 * Number of outstanding updates running on this transaction
 	 * [t_handle_lock]
 	 */
@@ -573,6 +598,57 @@

 };

+struct transaction_run_stats_s {
+	unsigned long		rs_wait;
+	unsigned long		rs_running;
+	unsigned long		rs_locked;
+	unsigned long		rs_flushing;
+	unsigned long		rs_logging;
+
+	unsigned long		rs_handle_count;
+	unsigned long		rs_blocks;
+	unsigned long		rs_blocks_logged;
+};
+
+struct transaction_stats_s
+{
+	int			ts_type;
+	unsigned long		ts_tid;
+	union {
+		struct transaction_run_stats_s run;
+		struct transaction_chp_stats_s chp;
+	} u;
+};
+
+#define JBD2_STATS_RUN		1
+#define JBD2_STATS_CHECKPOINT	2
+
+#define ts_wait			u.run.rs_wait
+#define ts_running		u.run.rs_running
+#define ts_locked		u.run.rs_locked
+#define ts_flushing		u.run.rs_flushing
+#define ts_logging		u.run.rs_logging
+#define ts_handle_count		u.run.rs_handle_count
+#define ts_blocks		u.run.rs_blocks
+#define ts_blocks_logged	u.run.rs_blocks_logged
+
+#define ts_chp_time		u.chp.cs_chp_time
+#define ts_forced_to_close	u.chp.cs_forced_to_close
+#define ts_written		u.chp.cs_written
+#define ts_dropped		u.chp.cs_dropped
+
+#define CURRENT_MSECS		(jiffies_to_msecs(jiffies))
+
+static inline unsigned int
+jbd2_time_diff(unsigned int start, unsigned int end)
+{
+	if (unlikely(start > end))
+		end = end + (~0UL - start);
+	else
+		end -= start;
+	return end;
+}
+
 /**
  * struct journal_s - The journal_s type is the concrete type associated with
  *     journal_t.
@@ -634,6 +710,12 @@
  * @j_wbufsize: maximum number of buffer_heads allowed in j_wbuf, the
  *     number that will fit in j_blocksize
  * @j_last_sync_writer: most recent pid which did a synchronous write
+ * @j_history: Buffer storing the transactions statistics history
+ * @j_history_max: Maximum number of transactions in the statistics history
+ * @j_history_cur: Current number of transactions in the statistics history
+ * @j_history_lock: Protect the transactions statistics history
+ * @j_proc_entry: procfs entry for the jbd statistics directory
+ * @j_stats: Overall statistics
  * @j_private: An opaque pointer to fs-private information.
  */

@@ -826,6 +908,16 @@
 	pid_t			j_last_sync_writer;

 	/*
+	 * Journal statistics
+	 */
+	struct transaction_stats_s *j_history;
+	int			j_history_max;
+	int			j_history_cur;
+	spinlock_t		j_history_lock;
+	struct proc_dir_entry	*j_proc_entry;
+	struct transaction_stats_s j_stats;
+
+	/*
 	 * An opaque pointer to fs-private information.  ext3 puts its
 	 * superblock pointer here
 	 */
Index: linux-2.6.22-rc4/fs/jbd2/transaction.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/jbd2/transaction.c	2007-06-11 17:22:14.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/transaction.c	2007-06-13 10:47:56.0 -0700
@@ -59,6 +59,8 @@
 	J_ASSERT(journal->j_running_transaction
[EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.
From [EMAIL PROTECTED] Thu May 17 17:21:08 2007

Hi,

I have rebased this patch to 2.6.22-rc1 so that it can be added to the ext4 patch queue. It has been tested by creating more than 65000 subdirs and then deleting them and checking the nlinks. The e2fsprogs part of this patch was sent earlier by me to linux-ext4 and doesn't need any changes, so not submitting it again.

--

This patch adds support to ext4 for allowing more than 65000 subdirectories. Currently the maximum number of subdirectories is capped at 32000.

If we exceed 65000 subdirectories in an htree directory it sets the inode link count to 1 and no longer counts subdirectories. The directory link count is not actually used when determining if a directory is empty, as that only counts subdirectories and not regular files that might be in there.

A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if the subdir count for any directory crosses 65000.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/namei.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/namei.c	2007-06-14 17:30:47.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/namei.c	2007-06-14 17:32:55.0 -0700
@@ -1619,6 +1619,27 @@ static int ext4_delete_entry (handle_t *
 	return -ENOENT;
 }

+static inline void ext4_inc_count(handle_t *handle, struct inode *inode)
+{
+	inc_nlink(inode);
+	if (is_dx(inode) && inode->i_nlink > 1) {
+		/* limit is 16-bit i_links_count */
+		if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
+			inode->i_nlink = 1;
+			EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
+					      EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
+		}
+	}
+}
+
+static inline void ext4_dec_count(handle_t *handle, struct inode *inode)
+{
+	drop_nlink(inode);
+	if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
+		inc_nlink(inode);
+}
+
+
 static int ext4_add_nondir(handle_t *handle,
 		struct dentry *dentry, struct inode *inode)
 {
@@ -1715,7 +1736,7 @@ static int ext4_mkdir(struct inode * dir
 	struct ext4_dir_entry_2 * de;
 	int err, retries = 0;

-	if (dir->i_nlink >= EXT4_LINK_MAX)
+	if (EXT4_DIR_LINK_MAX(dir))
 		return -EMLINK;

 retry:
@@ -1738,7 +1759,7 @@ retry:
 	inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
 	dir_block = ext4_bread (handle, inode, 0, 1, &err);
 	if (!dir_block) {
-		drop_nlink(inode); /* is this nlink == 0? */
+		ext4_dec_count(handle, inode); /* is this nlink == 0? */
 		ext4_mark_inode_dirty(handle, inode);
 		iput (inode);
 		goto out_stop;
@@ -1770,7 +1791,7 @@ retry:
 		iput (inode);
 		goto out_stop;
 	}
-	inc_nlink(dir);
+	ext4_inc_count(handle, dir);
 	ext4_update_dx_flag(dir);
 	ext4_mark_inode_dirty(handle, dir);
 	d_instantiate(dentry, inode);
@@ -2035,10 +2056,10 @@ static int ext4_rmdir (struct inode * di
 	retval = ext4_delete_entry(handle, dir, de, bh);
 	if (retval)
 		goto end_rmdir;
-	if (inode->i_nlink != 2)
-		ext4_warning (inode->i_sb, "ext4_rmdir",
-			      "empty directory has nlink!=2 (%d)",
-			      inode->i_nlink);
+	if (!EXT4_DIR_LINK_EMPTY(inode))
+		ext4_warning(inode->i_sb, "ext4_rmdir",
+			     "empty directory has too many links (%d)",
+			     inode->i_nlink);
 	inode->i_version++;
 	clear_nlink(inode);
 	/* There's no need to set i_disksize: the fact that i_nlink is
@@ -2048,7 +2069,7 @@ static int ext4_rmdir (struct inode * di
 	ext4_orphan_add(handle, inode);
 	inode->i_ctime = dir->i_ctime = dir->i_mtime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
-	drop_nlink(dir);
+	ext4_dec_count(handle, dir);
 	ext4_update_dx_flag(dir);
 	ext4_mark_inode_dirty(handle, dir);

@@ -2099,7 +2120,7 @@ static int ext4_unlink(struct inode * di
 	dir->i_ctime = dir->i_mtime = ext4_current_time(dir);
 	ext4_update_dx_flag(dir);
 	ext4_mark_inode_dirty(handle, dir);
-	drop_nlink(inode);
+	ext4_dec_count(handle, inode);
 	if (!inode->i_nlink)
 		ext4_orphan_add(handle, inode);
 	inode->i_ctime = ext4_current_time(inode);
@@ -2149,7 +2170,7 @@ retry:
 		err = __page_symlink(inode, symname, l,
 				mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);
 		if (err) {
-			drop_nlink(inode);
+			ext4_dec_count(handle, inode);
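The counting rule can be exercised in isolation. Below is a minimal user-space sketch of the increment and decrement helpers, assuming the 65000 limit from the patch description. The struct demo_dir, the DEMO_LINK_MAX name and the is_htree field (standing in for is_dx(inode)) are hypothetical, and the feature-flag update is left out.

```c
#define DEMO_LINK_MAX 65000u	/* limit taken from the patch description */

struct demo_dir {
	unsigned int nlink;
	int is_htree;	/* stands in for is_dx(inode) */
};

/*
 * Add one subdirectory link. Once the count reaches the limit it is
 * pinned at 1, and a pinned count (1 -> 2) is reset to 1 again.
 */
static void demo_inc_count(struct demo_dir *dir)
{
	dir->nlink++;
	if (dir->is_htree && dir->nlink > 1) {
		if (dir->nlink >= DEMO_LINK_MAX || dir->nlink == 2)
			dir->nlink = 1;
	}
}

/*
 * Remove one subdirectory link. A directory never drops to 0 here;
 * the caller is assumed to pass nlink >= 1.
 */
static void demo_dec_count(struct demo_dir *dir)
{
	dir->nlink--;
	if (dir->nlink == 0)
		dir->nlink = 1;
}
```

A link count of 1 therefore acts as a sentinel meaning "subdirectories are no longer counted", which is why the rmdir warning in the patch can no longer test for nlink != 2.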
[EXT4 set 8][PATCH 1/1] Add journal checksums
A journal checksum feature has been added to detect corruption of the journal.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Girish Shilamkar [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]

diff -Nurp linux024/fs/ext4/super.c linux/fs/ext4/super.c
--- linux024/fs/ext4/super.c	2007-06-25 16:19:24.0 -0500
+++ linux/fs/ext4/super.c	2007-06-26 08:35:16.0 -0500
@@ -721,6 +721,7 @@ enum {
 	Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
 	Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
 	Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
+	Opt_journal_checksum, Opt_journal_async_commit,
 	Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
@@ -760,6 +761,8 @@ static match_table_t tokens = {
 	{Opt_journal_update, "journal=update"},
 	{Opt_journal_inum, "journal=%u"},
 	{Opt_journal_dev, "journal_dev=%u"},
+	{Opt_journal_checksum, "journal_checksum"},
+	{Opt_journal_async_commit, "journal_async_commit"},
 	{Opt_abort, "abort"},
 	{Opt_data_journal, "data=journal"},
 	{Opt_data_ordered, "data=ordered"},
@@ -948,6 +951,13 @@ static int parse_options (char *options,
 				return 0;
 			*journal_devnum = option;
 			break;
+		case Opt_journal_checksum:
+			set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
+			break;
+		case Opt_journal_async_commit:
+			set_opt (sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT);
+			set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
+			break;
 		case Opt_noload:
 			set_opt (sbi->s_mount_opt, NOLOAD);
 			break;
@@ -1817,6 +1827,21 @@ static int ext4_fill_super (struct super
 		goto failed_mount4;
 	}
 
+	if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
+		jbd2_journal_set_features(sbi->s_journal,
+				JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+	} else if (test_opt(sb, JOURNAL_CHECKSUM)) {
+		jbd2_journal_set_features(sbi->s_journal,
+				JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
+		jbd2_journal_clear_features(sbi->s_journal, 0, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+	} else {
+		jbd2_journal_clear_features(sbi->s_journal,
+				JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+				JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+	}
+
 	/* We have now updated the journal if required, so we can
 	 * validate the data journaling mode. */
 	switch (test_opt(sb, DATA_FLAGS)) {
diff -Nurp linux024/fs/jbd2/commit.c linux/fs/jbd2/commit.c
--- linux024/fs/jbd2/commit.c	2007-06-25 16:19:25.0 -0500
+++ linux/fs/jbd2/commit.c	2007-06-26 08:40:03.0 -0500
@@ -21,6 +21,7 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
+#include <linux/crc32.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -93,15 +94,18 @@ static int inverted_lock(journal_t *jour
 	return 1;
 }
 
-/* Done it all: now write the commit record.  We should have
+/*
+ * Done it all: now submit the commit record.  We should have
  * cleaned up our previous buffers by now, so if we are in abort
  * mode we can now just skip the rest of the journal write
  * entirely.
  *
  * Returns 1 if the journal needs to be aborted or 0 on success
  */
-static int journal_write_commit_record(journal_t *journal,
-					transaction_t *commit_transaction)
+static int journal_submit_commit_record(journal_t *journal,
+					transaction_t *commit_transaction,
+					struct buffer_head **cbh,
+					__u32 crc32_sum)
 {
 	struct journal_head *descriptor;
 	struct buffer_head *bh;
@@ -117,21 +121,36 @@ static int journal_write_commit_record(j
 
 	bh = jh2bh(descriptor);
 
-	/* AKPM: buglet - add `i' to tmp! */
 	for (i = 0; i < bh->b_size; i += 512) {
-		journal_header_t *tmp = (journal_header_t*)bh->b_data;
+		struct commit_header *tmp =
+			(struct commit_header *)(bh->b_data + i);
 		tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
 		tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
 		tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
+
+		if (JBD2_HAS_COMPAT_FEATURE(journal,
+
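The way the two new mount options map onto journal feature bits in the super.c hunks can be sketched as a small user-space model. Everything prefixed `sk_`/`SK_` below is hypothetical: the bit values are picked for illustration only and are not the real JBD2 constants, and the structs merely stand in for the mount-option word and the journal superblock features.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SK_COMPAT_CHECKSUM       0x1u
#define SK_INCOMPAT_ASYNC_COMMIT 0x4u

struct sk_mount_opts {
    bool journal_checksum;
    bool journal_async_commit;
};

struct sk_journal {
    unsigned int compat;    /* models the journal's COMPAT feature word */
    unsigned int incompat;  /* models the journal's INCOMPAT feature word */
};

/* Models the two new parse_options() cases: async commit implies checksums. */
static void sk_parse_option(struct sk_mount_opts *o, bool async_commit)
{
    if (async_commit)
        o->journal_async_commit = true;
    o->journal_checksum = true;
}

/* Models the block added to ext4_fill_super(). */
static void sk_apply_features(struct sk_journal *j, const struct sk_mount_opts *o)
{
    if (o->journal_async_commit) {
        j->compat |= SK_COMPAT_CHECKSUM;
        j->incompat |= SK_INCOMPAT_ASYNC_COMMIT;
    } else if (o->journal_checksum) {
        j->compat |= SK_COMPAT_CHECKSUM;
        j->incompat &= ~SK_INCOMPAT_ASYNC_COMMIT;
    } else {
        /* Neither option given: both features are cleared on mount. */
        j->compat &= ~SK_COMPAT_CHECKSUM;
        j->incompat &= ~SK_INCOMPAT_ASYNC_COMMIT;
    }
}
```

One consequence visible in the model is that mounting without either option clears features that an earlier mount may have set, so the options have to be passed on every mount to stay in effect.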
[EXT4 set 9][PATCH 1/5] More cleanups: ext4-zero_user_page
Use zero_user_page() in ext4 where possible.

Signed-off-by: Eric Sandeen [EMAIL PROTECTED]

Index: linux-2.6.22-rc4-mm2/fs/ext4/inode.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/fs/ext4/inode.c
+++ linux-2.6.22-rc4-mm2/fs/ext4/inode.c
@@ -1830,7 +1830,6 @@ int ext4_block_truncate_page(handle_t *h
 	struct inode *inode = mapping->host;
 	struct buffer_head *bh;
 	int err = 0;
-	void *kaddr;
 
 	if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
 			test_opt(inode->i_sb, EXTENTS) &&
@@ -1847,10 +1846,7 @@ int ext4_block_truncate_page(handle_t *h
 	 */
 	if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 	     ext4_should_writeback_data(inode) && PageUptodate(page)) {
-		kaddr = kmap_atomic(page, KM_USER0);
-		memset(kaddr + offset, 0, length);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
+		zero_user_page(page, offset, length, KM_USER0);
 		set_page_dirty(page);
 		goto unlock;
 	}
@@ -1903,10 +1899,7 @@ int ext4_block_truncate_page(handle_t *h
 			goto unlock;
 	}
 
-	kaddr = kmap_atomic(page, KM_USER0);
-	memset(kaddr + offset, 0, length);
-	flush_dcache_page(page);
-	kunmap_atomic(kaddr, KM_USER0);
+	zero_user_page(page, offset, length, KM_USER0);
 
 	BUFFER_TRACE(bh, "zeroed end of block");
 
[EXT4 set 9][PATCH 2/5] More cleanups: use is_power_of_2() in fill_super
Subject: is_power_of_2: ext4/super.c
From: vignesh babu [EMAIL PROTECTED]

Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2()

Signed-off-by: vignesh babu [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---

 fs/ext4/super.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/ext4/super.c~is_power_of_2-ext4-superc fs/ext4/super.c
--- a/fs/ext4/super.c~is_power_of_2-ext4-superc
+++ a/fs/ext4/super.c
@@ -36,6 +36,7 @@
 #include <linux/namei.h>
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
+#include <linux/log2.h>
 
 #include <asm/uaccess.h>
 
@@ -1662,7 +1663,7 @@ static int ext4_fill_super (struct super
 		sbi->s_inode_size = le16_to_cpu(es->s_inode_size);
 		sbi->s_first_ino = le32_to_cpu(es->s_first_ino);
 		if ((sbi->s_inode_size < EXT4_GOOD_OLD_INODE_SIZE) ||
-		    (sbi->s_inode_size & (sbi->s_inode_size - 1)) ||
+		    (!is_power_of_2(sbi->s_inode_size)) ||
 		    (sbi->s_inode_size > blocksize)) {
 			printk (KERN_ERR
 				"EXT4-fs: unsupported inode size: %d\n",
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

git-ubi.patch
use-is_power_of_2-in-cxgb3-cxgb3_mainc.patch
use-is_power_of_2-in-myri10ge-myri10gec.patch
is_power_of_2-ext3-superc.patch
is_power_of_2-ext4-superc.patch
is_pwoer_of_2-jbd.patch
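For readers comparing the two forms of the check, here is a small user-space sketch. The `sk_` names are hypothetical; `sk_is_power_of_2()` follows the usual definition (non-zero and exactly one bit set), which is assumed here to match the helper from linux/log2.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel helper: non-zero with exactly one bit set. */
static bool sk_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* The open-coded test being replaced: true when the size is rejected. */
static bool sk_old_check_rejects(unsigned long n)
{
    return (n & (n - 1)) != 0;
}

/* The replacement test: true when the size is rejected. */
static bool sk_new_check_rejects(unsigned long n)
{
    return !sk_is_power_of_2(n);
}
```

The two forms only disagree for n == 0, which the open-coded test lets through. In the patched condition that value is already caught by the preceding comparison against EXT4_GOOD_OLD_INODE_SIZE, so the change reads as a cleanup rather than a behaviour change.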
[EXT4 set 9][PATCH 3/5] More cleanups: ext4-remove-extra-is_rdonly-check
Subject: ext4: remove extra IS_RDONLY() check
From: Dave Hansen [EMAIL PROTECTED]

ext4_change_inode_journal_flag() is only called from one location: ext4_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has an IS_RDONLY() call in it, so this one is superfluous.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---

 fs/ext4/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext4/inode.c~ext4-remove-extra-is_rdonly-check fs/ext4/inode.c
--- a/fs/ext4/inode.c~ext4-remove-extra-is_rdonly-check
+++ a/fs/ext4/inode.c
@@ -3352,7 +3352,7 @@ int ext4_change_inode_journal_flag(struc
 	 */
 
 	journal = EXT4_JOURNAL(inode);
-	if (is_journal_aborted(journal) || IS_RDONLY(inode))
+	if (is_journal_aborted(journal))
 		return -EROFS;
 
 	jbd2_journal_lock_updates(journal);
[EXT4 set 9][PATCH 5/5] Extent macros cleanup
From: Dmitry Monakhov [EMAIL PROTECTED]
Subject: ext4: extent macros cleanup

- Replace open-coded math with its macro equivalent
- Make ext4_ext_grow_indepth() handle both index and leaf blocks correctly

Signed-off-by: Dmitry Monakhov [EMAIL PROTECTED]
Acked-by: Alex Tomas [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---
 fs/ext4/extents.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 12fe3d7..1fd00ac 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -375,7 +375,7 @@ ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int bloc
 	ext_debug("binsearch for %d(idx): ", block);
 
 	l = EXT_FIRST_INDEX(eh) + 1;
-	r = EXT_FIRST_INDEX(eh) + le16_to_cpu(eh->eh_entries) - 1;
+	r = EXT_LAST_INDEX(eh);
 	while (l <= r) {
 		m = l + (r - l) / 2;
 		if (block < le32_to_cpu(m->ei_block))
@@ -440,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
 	ext_debug("binsearch for %d: ", block);
 
 	l = EXT_FIRST_EXTENT(eh) + 1;
-	r = EXT_FIRST_EXTENT(eh) + le16_to_cpu(eh->eh_entries) - 1;
+	r = EXT_LAST_EXTENT(eh);
 
 	while (l <= r) {
 		m = l + (r - l) / 2;
@@ -922,8 +922,11 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
 	curp->p_hdr->eh_max = cpu_to_le16(ext4_ext_space_root_idx(inode));
 	curp->p_hdr->eh_entries = cpu_to_le16(1);
 	curp->p_idx = EXT_FIRST_INDEX(curp->p_hdr);
-	/* FIXME: it works, but actually path[0] can be index */
-	curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;
+
+	if (path[0].p_hdr->eh_depth)
+		curp->p_idx->ei_block = EXT_FIRST_INDEX(path[0].p_hdr)->ei_block;
+	else
+		curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;
 	ext4_idx_store_pblock(curp->p_idx, newblock);
 
 	neh = ext_inode_hdr(inode);
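The binary-search bounds touched by the first two hunks can be illustrated with a user-space sketch. The names `struct sk_extent`, `SK_LAST()` and `sk_binsearch()` are hypothetical, and the loop body past the comparison is an assumption, since the hunks above only show the bounds and the first comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent or index entry: only its start block. */
struct sk_extent {
    unsigned int first_block;
};

/* Models EXT_LAST_EXTENT()/EXT_LAST_INDEX(): first entry plus count minus one. */
#define SK_LAST(first, n) ((first) + (n) - 1)

/*
 * Return the last entry whose first_block is <= block.
 * Entries must be sorted by first_block and n must be at least 1.
 */
static const struct sk_extent *sk_binsearch(const struct sk_extent *first,
                                            size_t n, unsigned int block)
{
    const struct sk_extent *l = first + 1;       /* entry 0 is the fallback */
    const struct sk_extent *r = SK_LAST(first, n);

    while (l <= r) {
        const struct sk_extent *m = l + (r - l) / 2;
        if (block < m->first_block)
            r = m - 1;
        else
            l = m + 1;
    }
    return l - 1;
}
```

Starting the left bound at the second entry means the first entry is returned whenever no later entry starts at or before the searched block.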
Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs
On Sun, 01 Jul 2007 03:38:10 -0400 Mingming Cao [EMAIL PROTECTED] wrote:

> [PATCH] jbd2 stats through procfs
>
> The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size.
>
> [ This probably should be rewritten to use debugfs? -- Ted]

Was a decision ever made on whether this should remain on procfs or be moved to debugfs? I recall this being discussed but don't recall a firm decision on it.

-JRS
Re: [RFC] BIG_BG vs extended META_BG in ext4
On Sat, Jun 30, 2007 at 11:39:08PM -0500, Jose R. Santos wrote:
> On Sat, 30 Jun 2007 01:51:25 -0400 Andreas Dilger [EMAIL PROTECTED] wrote:
> > I don't think there is actually any fundamental difference between these proposals. The reality is that we cannot change the semantics of the META_BG flag at this point, since both e2fsprogs and ext3/ext4 in the kernel understand META_BG to mean only "group descriptor backups are in groups {0, 1, last} of the metagroup" and nothing else.
>
> Agree. I call it extended META_BG for lack of a better name, but a new feature flag will be required.

It was the intention that META_BG include allowing the bitmap and inode tables to range anywhere outside of the block group, but that never got coded. It would be confusing though if we relaxed it without adding a feature bit, and I agree that we might as well overload the BIG_BG flag to indicate this feature.

The fact that BIG_BG requires contiguous blocks for the bitmaps when they exceed blocksize*8 blocks still concerns me a minor amount, even given the hopeful inclusion of kernel patches that allow blocksize > pagesize. Furthermore, I still wonder whether we will want to make blockgroups that much bigger (since reducing the number of allocation groups is not necessarily a smart thing; we will need to do some benchmarks with filesystem aging to see how this affects antifragmentation efforts), but the complexity engendered by adding BIG_BG isn't that bad (again, my only concern is with the contiguous bitmap block requirement).

- Ted
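The "blocksize*8" figure above follows from one bitmap bit per block: a single bitmap block can describe at most blocksize*8 blocks, so a larger group needs several bitmap blocks, which BIG_BG would want contiguous. A rough arithmetic sketch, where `sk_bitmap_blocks()` is a hypothetical helper and not a kernel function:

```c
#include <assert.h>

/*
 * Number of bitmap blocks needed to track blocks_per_group blocks,
 * at one bit per block. blocksize is in bytes.
 */
static unsigned long sk_bitmap_blocks(unsigned long blocks_per_group,
                                      unsigned long blocksize)
{
    unsigned long bits_per_block = blocksize * 8;

    /* Round up: a partially used bitmap block still occupies a whole block. */
    return (blocks_per_group + bits_per_block - 1) / bits_per_block;
}
```

With 4096-byte blocks, a 32768-block group fits in one bitmap block, while a group four times that size would need four bitmap blocks.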
Re: [RFC] BIG_BG vs extended META_BG in ext4
On Jun 30, 2007 23:40 -0500, Jose R. Santos wrote:
> Yes, I think bigger block groups will benefit extents a great deal since not only can we have larger extents, but I believe that as the filesystem ages the chances of getting a large number of contiguous blocks can be reduced with small block groups.

This turns out not to be true, and in fact we need to change the unwritten extents patch a tiny bit. The reason is that we have limited the maximum extent size to 2^15-1 = 32767 blocks. The current maximum for the number of blocks in a group is 65528, so that we can always fit the free blocks count into a __u16 if the bitmaps and inode table are moved out of the group. Moving the bitmaps and itable will hit the max extent length. There are still other benefits to moving the metadata together.

Now, the one minor problem with the unwritten extent patches is that by using the high bit of the ee_len this limits the extent length to 2^15-1 blocks, but it would be MUCH better if this limit was 2^15 blocks and it fit evenly into an empty group, consecutive extents were aligned, etc. It also doesn't make sense to have an uninitialized 0-length extent, so I think the unwritten extent (fallocate) patch needs to special case the ee_len = 32768 to be a regular extent instead of unwritten.

> > With less groups, we load less group descriptors in memory, we have less I/O to read bitmap and inode array (because we manage less group descriptors again, because we load bigger bitmap and array in one time)
>
> Presumably, we would still need to access the same amount of data but latencies should be reduced since we could do larger IOs and fewer seeks to read the bitmaps. I also wonder if there are benefits in terms of locality to having the bitmaps closer to their blocks vs having them far away like in xMETA_BG.

Having the bitmaps together will fix this independent of BIG_BG.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
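The special case proposed above can be sketched as an encode/decode pair for a 16-bit length field whose high bit marks an unwritten extent. With the high bit as the flag, the raw value 0x8000 (32768) would otherwise mean "unwritten, length 0", so the sketch treats it as a regular extent of 32768 blocks. The `sk_`/`SK_` names are hypothetical and this only models the proposal as described in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest length of a regular (initialized) extent in this model: 2^15. */
#define SK_INIT_MAX_LEN 32768u

/* Raw values above 0x8000 carry the unwritten flag; 0x8000 itself does not. */
static bool sk_is_unwritten(uint16_t raw_len)
{
    return raw_len > SK_INIT_MAX_LEN;
}

/* Number of blocks actually covered by the extent. */
static unsigned int sk_actual_len(uint16_t raw_len)
{
    return raw_len <= SK_INIT_MAX_LEN ? raw_len
                                      : raw_len - SK_INIT_MAX_LEN;
}

/* Encode an unwritten extent; valid lengths are 1..32767 in this model. */
static uint16_t sk_mark_unwritten(unsigned int len)
{
    return (uint16_t)(len | SK_INIT_MAX_LEN);
}
```

Under this encoding a regular extent can cover 32768 blocks and therefore a whole empty 32768-block group, while an unwritten extent tops out at 32767 blocks.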
[PATCH] fix error handling in ext3_create_journal
---
From: Borislav Petkov [EMAIL PROTECTED]

Fix error handling in ext3_create_journal according to kernel conventions.

Signed-off-by: Borislav Petkov [EMAIL PROTECTED]
--
Index: linux-2.6.22-rc6/fs/ext3/super.c
===================================================================
--- linux-2.6.22-rc6/fs/ext3/super.c.orig	2007-07-01 21:12:51.0 +0200
+++ linux-2.6.22-rc6/fs/ext3/super.c	2007-07-01 21:14:32.0 +0200
@@ -2075,6 +2075,7 @@
 			       unsigned int journal_inum)
 {
 	journal_t *journal;
+	int err;
 
 	if (sb->s_flags & MS_RDONLY) {
 		printk(KERN_ERR "EXT3-fs: readonly filesystem when trying to "
@@ -2082,13 +2083,15 @@
 		return -EROFS;
 	}
 
-	if (!(journal = ext3_get_journal(sb, journal_inum)))
+	journal = ext3_get_journal(sb, journal_inum);
+	if (!journal)
 		return -EINVAL;
 
 	printk(KERN_INFO "EXT3-fs: creating new journal on inode %u\n",
 	       journal_inum);
 
-	if (journal_create(journal)) {
+	err = journal_create(journal);
+	if (err) {
 		printk(KERN_ERR "EXT3-fs: error creating journal.\n");
 		journal_destroy(journal);
 		return -EIO;
--
Regards/Gruß,
Boris.
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Sat, Jun 30, 2007 at 11:21:11AM +0100, Christoph Hellwig wrote:
> On Tue, Jun 26, 2007 at 04:02:47PM +0530, Amit K. Arora wrote:
> > > Can you clarify - what is the current behaviour when ENOSPC (or some other error) is hit? Does it keep the current fallocate() or does it free it?
> >
> > Currently it is left on the file system implementation. In ext4, we do not undo preallocation if some error (say, ENOSPC) is hit. Hence it may end up with partial (pre)allocation. This is in line with dd and posix_fallocate, which also do not free the partially allocated space.
>
> I can't find anything in the specification of posix_fallocate (http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html) that tells what should happen to allocated blocks on error.

Yeah, and AFAICT glibc leaves them behind ATM.

> But common sense would be to not leak disk space on failure of this syscall, and this definitely should not be left up to the filesystem, either we always leak it or always free it, and I'd strongly favour the latter variant.

We can't simply walk the range and remove unwritten extents, as some of them may have been present before the fallocate() call. That makes it extremely difficult to undo a failed call and not remove more pre-existing pre-allocations.

Given the current behaviour for posix_fallocate() in glibc, I think that retaining the same error semantic and punting the cleanup to userspace (where the app will fail with ENOSPC anyway) is the only sane thing we can do here. Trying to undo this in the kernel leads to lots of extra rarely used code in error handling paths...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
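As an illustration of what "punting the cleanup to userspace" could look like, here is a minimal sketch for the simplest case only: growing a file at its end and truncating back to the old size if preallocation fails. `sk_extend_or_rollback()` is a hypothetical helper, not an existing API, and it deliberately does not address the harder case raised above, where the failed range overlaps space that was already preallocated before the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Grow fd to new_size with posix_fallocate(). On failure, truncate back to
 * the size the file had before the call so a partial allocation past the
 * old end of file is not left behind. Returns 0 or an errno-style code.
 */
static int sk_extend_or_rollback(int fd, off_t new_size)
{
    struct stat st;
    int err;

    if (fstat(fd, &st) != 0)
        return errno;
    if (new_size <= st.st_size)
        return 0;  /* nothing to extend */

    /* posix_fallocate() returns the error number directly, not via errno. */
    err = posix_fallocate(fd, st.st_size, new_size - st.st_size);
    if (err != 0) {
        /* Best-effort rollback; the original error is what gets reported. */
        if (ftruncate(fd, st.st_size) != 0) {
            /* nothing more can be done here */
        }
        return err;
    }
    return 0;
}
```

An application would typically call this once before writing, and treat a non-zero return (for example ENOSPC) as a signal to abort with the file back at its previous size.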