Ext4 patches for 2.6.22-rc6

2007-07-01 Thread Mingming Cao
On Fri, 2007-06-29 at 13:57 -0700, Andrew Morton wrote:
> On Fri, 29 Jun 2007 11:50:04 -0400
> Mingming Cao [EMAIL PROTECTED] wrote:
>
> > I think the ext4 patch queue is in good shape now.
>
> Which ext4 patches are you intending to merge into 2.6.23?
>
> Please send all those out to lkml for review?

Hi Andrew, 

Here are the patches in ext4-patch-queue that I think can be considered
for merging upstream. Please review.

All of the patches have been posted on the ext4 mailing list before. Some are
bug fixes, some are features. To summarize:
- make extents on by default in ext4dev
- nanosecond timestamp
- 64 bit inode versioning support
- remove 32k subdir limits
- journal checksumming
- journal stats via procfs
- delayed allocation for ext4 writeback mode
- fallocate()

All the patches can be found at http://repo.or.cz/w/ext4-patch-queue.git
and have been tested (with fsx, dbench, FFSB, iozone) on
x86, x86_64, and ppc64, with extents and delayed allocation enabled.

And the full series can be found at
http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=series;h=2f43431db28778ce8d2149bce7a51566a2d2517c;hb=56e27e20cf228b32f5162a76b3bad154d1d3b730

I will post the patches that are in good shape (in 9 sets of patches) to lkml in
the following emails, except for the last two features:

* The fallocate() patches, which Amit posted just a few days ago and which are
still under review (hopefully we can reach an agreement on the interface and
the modes before the 2.6.23-rc1 window closes).

* The delayed allocation patches in the ext4 patch queue. Alex
mentioned in another email that he is working on another version of
delalloc that can handle block size < page size and moves some work to
the VFS. So it's probably not very useful to post this version for people to
review.


So, here is the series file.

# Rebased the patches to 2.6.22-rc6

# Add mount option to turn off extents
ext4_noextent_mount_opt.patch

# Mount ext4dev fs with extents by default for testing purposes;
# for the ext4 product release, the extents mount option
# will be turned on only if the fs has the EXTENTS feature on
ext4_extents_on_by_default.patch

# Propagate inode flags
ext4-propagate_flags.patch

# Add extent sanity checks
ext4-extent-sanity-checks.patch

# Bug fix:set 64bit JBD2 feature on 32bit ext4 fs
ext4_set_jbd2_64bit_feature.patch

# Fix: Rename CONFIG_JBD_DEBUG to CONFIG_JBD2_DEBUG
jbd2_config_jbd2_debug_fix.patch

# Export jbd2-debug via debugfs
ext4_CONFIG_JBD2_DEBUG.patch
jbd2_move_jbd2_debug_to_debugfs.patch

# Nanosecond timestamp support
ext4-nanosecond-patch

# inode version patch series
# inode versioning is needed for NFSv4

# vfs changes, 64 bit inode->i_version
64-bit-i_version.patch
# reserve hi 32 bit inode version on ext4 on-disk inode
i_version_hi.patch
# ext4 inode version read/store
ext4_i_version_hi_2.patch
# ext4 inode version update
i_version_update_ext4.patch
# add a noversion mount option to disable inode version updates
ext4_no_version.patch

# New patch to expand inode i_extra_isize to support features
# in high part of inode (128 bytes)
ext4_expand_inode_extra_isize.patch

# Export jbd stats through procfs
# Shall this move to debugfs?
jbd-stats-through-procfs

# Remove 32000 subdirs limit. 
ext4_remove_subdirs_limit.patch

# Add journal checksums
ext4-journal_chksum-2.6.20.patch

# Various Cleanups
ext4-zero_user_page.patch
is_power_of_2-ext4-superc.patch
ext4-remove-extra-is_rdonly-check.patch
ext4_extent_compilation_fixes.patch
ext4_extent_macros_cleanup.patch


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[EXT4 set 1][PATCH 1/2] Add noextents mount option

2007-07-01 Thread Mingming Cao
Add a mount option to turn off extents.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
Index: linux-2.6.22-rc4/fs/ext4/super.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/super.c   2007-06-11 17:02:18.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c   2007-06-11 17:02:22.0 -0700
@@ -725,7 +725,7 @@
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-   Opt_grpquota, Opt_extents,
+   Opt_grpquota, Opt_extents, Opt_noextents,
 };
 
 static match_table_t tokens = {
@@ -776,6 +776,7 @@
{Opt_usrquota, "usrquota"},
{Opt_barrier, "barrier=%u"},
{Opt_extents, "extents"},
+   {Opt_noextents, "noextents"},
{Opt_err, NULL},
{Opt_resize, "resize"},
 };
@@ -1111,6 +1112,9 @@
case Opt_extents:
set_opt (sbi->s_mount_opt, EXTENTS);
break;
+   case Opt_noextents:
+   clear_opt (sbi->s_mount_opt, EXTENTS);
+   break;
default:
printk (KERN_ERR
"EXT4-fs: Unrecognized mount option \"%s\" "




[EXT4 set 1][PATCH 2/2] Enable extents by default for ext4dev

2007-07-01 Thread Mingming Cao
Turn on the extents feature by default in the ext4 filesystem. Users can pass
-o noextents to turn it off.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/super.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/super.c   2007-06-11 17:02:22.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c   2007-06-11 17:03:09.0 -0700
@@ -1546,6 +1546,12 @@
 
set_opt(sbi->s_mount_opt, RESERVATION);
 
+   /*
+    * turn on extents feature by default in ext4 filesystem
+    * Use -o noextents to turn it off
+    */
+   set_opt (sbi->s_mount_opt, EXTENTS);
+
if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
NULL, 0))
goto failed_mount;
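
The ordering in this set matters: the extents bit is set before the option string is parsed, so a later noextents token can clear it. A minimal user-space sketch of that ordering follows. The names MOUNT_EXTENTS and parse_mount_opts are hypothetical and the bit value is arbitrary; the kernel code uses match_table_t and set_opt/clear_opt instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit for this sketch only; not the kernel's value. */
#define MOUNT_EXTENTS 0x1u

/*
 * Parse a comma-separated option string into a flag word.
 * Extents are enabled first, so "noextents" can switch them off again.
 * Returns 0 on success and -1 on an unrecognized option.
 */
static int parse_mount_opts(const char *opts, unsigned int *flags)
{
	assert(flags != NULL);
	*flags = MOUNT_EXTENTS;		/* default: extents on */

	while (opts != NULL && *opts != '\0') {
		const char *end = strchr(opts, ',');
		size_t len = end ? (size_t)(end - opts) : strlen(opts);

		if (len == 7 && strncmp(opts, "extents", len) == 0)
			*flags |= MOUNT_EXTENTS;
		else if (len == 9 && strncmp(opts, "noextents", len) == 0)
			*flags &= ~MOUNT_EXTENTS;
		else if (len != 0)	/* empty tokens are skipped */
			return -1;

		opts = end ? end + 1 : NULL;
	}
	return 0;
}
```

Calling parse_mount_opts(NULL, &flags) leaves the extents bit set, while parse_mount_opts("noextents", &flags) clears it.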




[EXT4 set 2][PATCH 1/5] cleanups: Propagate some i_flags to disk

2007-07-01 Thread Mingming Cao
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext4-specific i_flags. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/inode.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/inode.c   2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/inode.c   2007-06-11 17:24:28.0 -0700
@@ -2583,6 +2583,25 @@
inode->i_flags |= S_DIRSYNC;
 }
 
+/* Propagate flags from i_flags to EXT4_I(inode)->i_flags */
+void ext4_get_inode_flags(struct ext4_inode_info *ei)
+{
+   unsigned int flags = ei->vfs_inode.i_flags;
+
+   ei->i_flags &= ~(EXT4_SYNC_FL|EXT4_APPEND_FL|
+   EXT4_IMMUTABLE_FL|EXT4_NOATIME_FL|EXT4_DIRSYNC_FL);
+   if (flags & S_SYNC)
+   ei->i_flags |= EXT4_SYNC_FL;
+   if (flags & S_APPEND)
+   ei->i_flags |= EXT4_APPEND_FL;
+   if (flags & S_IMMUTABLE)
+   ei->i_flags |= EXT4_IMMUTABLE_FL;
+   if (flags & S_NOATIME)
+   ei->i_flags |= EXT4_NOATIME_FL;
+   if (flags & S_DIRSYNC)
+   ei->i_flags |= EXT4_DIRSYNC_FL;
+}
+
 void ext4_read_inode(struct inode * inode)
 {
struct ext4_iloc iloc;
@@ -2742,6 +2761,7 @@
if (ei->i_state & EXT4_STATE_NEW)
memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
 
+   ext4_get_inode_flags(ei);
raw_inode->i_mode = cpu_to_le16(inode->i_mode);
if(!(test_opt(inode->i_sb, NO_UID32))) {
raw_inode->i_uid_low = cpu_to_le16(low_16_bits(inode->i_uid));
Index: linux-2.6.22-rc4/fs/ext4/ioctl.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/ioctl.c   2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/ioctl.c   2007-06-11 17:25:11.0 -0700
@@ -28,6 +28,7 @@
 
switch (cmd) {
case EXT4_IOC_GETFLAGS:
+   ext4_get_inode_flags(ei);
flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
return put_user(flags, (int __user *) arg);
case EXT4_IOC_SETFLAGS: {
Index: linux-2.6.22-rc4/include/linux/ext4_fs.h
===
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs.h   2007-06-11 17:24:01.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs.h   2007-06-11 17:24:28.0 -0700
@@ -862,6 +862,7 @@
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
 extern void ext4_truncate (struct inode *);
 extern void ext4_set_inode_flags(struct inode *);
+extern void ext4_get_inode_flags(struct ext4_inode_info *);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
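
The propagation step in this patch boils down to "clear the mirrored bits, then copy each one across". A table-driven sketch of that idea follows. The struct, the function name, and every bit value are hypothetical and were chosen only for illustration; they are not the kernel's S_* or EXT4_*_FL constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory (VFS-style) flag bits. */
enum { V_SYNC = 0x01, V_APPEND = 0x02, V_IMMUTABLE = 0x04,
       V_NOATIME = 0x08, V_DIRSYNC = 0x10 };

/* Hypothetical on-disk flag bits; D_OTHER stands for an unrelated flag. */
enum { D_OTHER = 0x004, D_SYNC = 0x008, D_IMMUTABLE = 0x010,
       D_APPEND = 0x020, D_NOATIME = 0x080, D_DIRSYNC = 0x100 };

struct sketch_inode {
	unsigned int vfs_flags;		/* what generic code sees and sets */
	unsigned int disk_flags;	/* what gets written to disk */
};

static const struct { unsigned int vfs, disk; } flag_map[] = {
	{ V_SYNC, D_SYNC }, { V_APPEND, D_APPEND },
	{ V_IMMUTABLE, D_IMMUTABLE }, { V_NOATIME, D_NOATIME },
	{ V_DIRSYNC, D_DIRSYNC },
};

/* Make the mirrored on-disk bits match the in-memory bits. */
static void sketch_get_inode_flags(struct sketch_inode *ei)
{
	size_t i;

	assert(ei != NULL);
	for (i = 0; i < sizeof(flag_map) / sizeof(flag_map[0]); i++) {
		ei->disk_flags &= ~flag_map[i].disk;
		if (ei->vfs_flags & flag_map[i].vfs)
			ei->disk_flags |= flag_map[i].disk;
	}
}
```

Bits outside the table (D_OTHER here) are left untouched, which is the same effect the mask in ext4_get_inode_flags() has.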




[EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks

2007-07-01 Thread Mingming Cao
With this patch all headers are checked, so the code should become
more resistant to on-disk corruption. Needless BUG_ON() calls have
been removed. Please review for inclusion.

Signed-off-by: Alex Tomas [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/extents.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/extents.c 2007-06-11 17:22:15.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/extents.c  2007-06-11 17:27:57.0 -0700
@@ -91,36 +91,6 @@
ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff);
 }
 
-static int ext4_ext_check_header(const char *function, struct inode *inode,
-   struct ext4_extent_header *eh)
-{
-   const char *error_msg = NULL;
-
-   if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
-   error_msg = "invalid magic";
-   goto corrupted;
-   }
-   if (unlikely(eh->eh_max == 0)) {
-   error_msg = "invalid eh_max";
-   goto corrupted;
-   }
-   if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
-   error_msg = "invalid eh_entries";
-   goto corrupted;
-   }
-   return 0;
-
-corrupted:
-   ext4_error(inode->i_sb, function,
-   "bad header in inode #%lu: %s - magic %x, "
-   "entries %u, max %u, depth %u",
-   inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
-   le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max),
-   le16_to_cpu(eh->eh_depth));
-
-   return -EIO;
-}
-
 static handle_t *ext4_ext_journal_restart(handle_t *handle, int needed)
 {
int err;
@@ -269,6 +239,70 @@
return size;
 }
 
+static inline int
+ext4_ext_max_entries(struct inode *inode, int depth)
+{
+   int max;
+
+   if (depth == ext_depth(inode)) {
+   if (depth == 0)
+   max = ext4_ext_space_root(inode);
+   else
+   max = ext4_ext_space_root_idx(inode);
+   } else {
+   if (depth == 0)
+   max = ext4_ext_space_block(inode);
+   else
+   max = ext4_ext_space_block_idx(inode);
+   }
+
+   return max;
+}
+
+static int __ext4_ext_check_header(const char *function, struct inode *inode,
+   struct ext4_extent_header *eh,
+   int depth)
+{
+   const char *error_msg = NULL;
+   int max = 0;
+
+   if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
+   error_msg = "invalid magic";
+   goto corrupted;
+   }
+   if (unlikely(le16_to_cpu(eh->eh_depth) != depth)) {
+   error_msg = "unexpected eh_depth";
+   goto corrupted;
+   }
+   if (unlikely(eh->eh_max == 0)) {
+   error_msg = "invalid eh_max";
+   goto corrupted;
+   }
+   max = ext4_ext_max_entries(inode, depth);
+   if (unlikely(le16_to_cpu(eh->eh_max) > max)) {
+   error_msg = "too large eh_max";
+   goto corrupted;
+   }
+   if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
+   error_msg = "invalid eh_entries";
+   goto corrupted;
+   }
+   return 0;
+
+corrupted:
+   ext4_error(inode->i_sb, function,
+   "bad header in inode #%lu: %s - magic %x, "
+   "entries %u, max %u(%u), depth %u(%u)",
+   inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
+   le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max),
+   max, le16_to_cpu(eh->eh_depth), depth);
+
+   return -EIO;
+}
+
+#define ext4_ext_check_header(inode, eh, depth)\
+   __ext4_ext_check_header(__FUNCTION__, inode, eh, depth)
+
 #ifdef EXT_DEBUG
 static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
 {
@@ -329,6 +363,7 @@
 /*
  * ext4_ext_binsearch_idx:
  * binary search for the closest index of the given block
+ * the header must be checked before calling this
  */
 static void
ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int block)
@@ -336,9 +371,6 @@
struct ext4_extent_header *eh = path->p_hdr;
struct ext4_extent_idx *r, *l, *m;
 
-   BUG_ON(eh->eh_magic != EXT4_EXT_MAGIC);
-   BUG_ON(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max));
-   BUG_ON(le16_to_cpu(eh->eh_entries) <= 0);
 
ext_debug("binsearch for %d(idx):  ", block);
 
@@ -388,6 +420,7 @@
 /*
  * ext4_ext_binsearch:
  * binary search for closest extent of the given block
+ * the header must be checked before calling this
  */
 static void
 ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
@@ -395,9 +428,6 @@
struct ext4_extent_header *eh = path->p_hdr;
struct 
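
The order of checks in __ext4_ext_check_header() can be tried outside the kernel with a small sketch. The assumptions are loud here: fields are kept in host byte order (so no le16_to_cpu), the caller passes in the maximum that ext4_ext_max_entries() would compute, and the struct and function names are hypothetical. The magic value is only a stand-in for EXT4_EXT_MAGIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXT_MAGIC 0xf30a	/* stand-in for EXT4_EXT_MAGIC */

/* Simplified extent header; host byte order is assumed in this sketch. */
struct sketch_ext_header {
	uint16_t magic;
	uint16_t entries;
	uint16_t max;
	uint16_t depth;
};

/*
 * Returns NULL when the header looks sane, otherwise a short reason.
 * The checks run in the same order as in the patch above.
 */
static const char *sketch_check_header(const struct sketch_ext_header *eh,
				       unsigned int expected_depth,
				       unsigned int max_allowed)
{
	assert(eh != NULL);
	if (eh->magic != SKETCH_EXT_MAGIC)
		return "invalid magic";
	if (eh->depth != expected_depth)
		return "unexpected eh_depth";
	if (eh->max == 0)
		return "invalid eh_max";
	if (eh->max > max_allowed)
		return "too large eh_max";
	if (eh->entries > eh->max)
		return "invalid eh_entries";
	return NULL;
}
```

A caller would report the returned string and fail with an I/O error instead of hitting a BUG_ON(), which is the point of the patch.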

[EXT4 set 2][PATCH 3/5] cleanups: set_jbd2_64bit_feature for 16TB ext4 fs

2007-07-01 Thread Mingming Cao
Set the journal's JBD2_FEATURE_INCOMPAT_64BIT feature at mount time on
devices whose block count does not fit in 32 bits. This ensures proper record
length when writing to the journal.

Signed-off-by: Jose R. Santos [EMAIL PROTECTED]
Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
Signed-off-by: Laurent Vivier [EMAIL PROTECTED]
---
 fs/ext4/super.c |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.22-rc4/fs/ext4/super.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/super.c   2007-06-11 16:15:54.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c   2007-06-11 16:16:10.0 -0700
@@ -1804,6 +1804,13 @@
goto failed_mount3;
}
 
+   if (ext4_blocks_count(es) > 0xffffffffULL &&
+   !jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0,
+  JBD2_FEATURE_INCOMPAT_64BIT)) {
+   printk(KERN_ERR "ext4: Failed to set 64-bit journal feature\n");
+   goto failed_mount4;
+   }
+
+
/* We have now updated the journal if required, so we can
 * validate the data journaling mode. */
switch (test_opt(sb, DATA_FLAGS)) {
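
The "16TB" in the subject line follows from the block-count test: with 4 KiB blocks, 16 TiB is exactly 2^32 blocks, the first count that no longer fits in 32 bits. A small sketch of that arithmetic follows; both helper names are hypothetical, and the 4 KiB block size is an assumption used only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* True when a block count no longer fits in a 32-bit field. */
static int needs_64bit_journal(uint64_t blocks_count)
{
	return blocks_count > 0xffffffffULL;
}

/* Number of whole blocks in a filesystem of fs_bytes bytes. */
static uint64_t blocks_for_size(uint64_t fs_bytes, uint32_t block_size)
{
	assert(block_size != 0);
	return fs_bytes / block_size;
}
```

For example, blocks_for_size(16ULL << 40, 4096) is 2^32, so needs_64bit_journal() is true, while one block less still fits in 32 bits.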




[EXT4 set 2][PATCH 4/5] cleanups: Rename CONFIG_JBD_DEBUG to CONFIG_JBD2_DEBUG

2007-07-01 Thread Mingming Cao
When the JBD code was forked to create the new JBD2 code base, the
references to CONFIG_JBD_DEBUG were never changed to
CONFIG_JBD2_DEBUG.  This patch fixes that.

Signed-off-by: Jose R. Santos [EMAIL PROTECTED]
---
Index: linux-2.6.22-rc4/fs/jbd2/journal.c
===
--- linux-2.6.22-rc4.orig/fs/jbd2/journal.c 2007-06-11 16:15:49.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/journal.c  2007-06-11 16:16:18.0 -0700
@@ -528,7 +528,7 @@
 {
int err = 0;
 
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
spin_lock(&journal->j_state_lock);
if (!tid_geq(journal->j_commit_request, tid)) {
printk(KERN_EMERG
@@ -1709,7 +1709,7 @@
  * Journal_head storage management
  */
 static struct kmem_cache *jbd2_journal_head_cache;
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 static atomic_t nr_journal_heads = ATOMIC_INIT(0);
 #endif
 
@@ -1747,7 +1747,7 @@
struct journal_head *ret;
static unsigned long last_warning;
 
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
atomic_inc(&nr_journal_heads);
 #endif
ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
@@ -1768,7 +1768,7 @@
 
 static void journal_free_journal_head(struct journal_head *jh)
 {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
atomic_dec(&nr_journal_heads);
memset(jh, JBD_POISON_FREE, sizeof(*jh));
 #endif
@@ -1953,12 +1953,12 @@
 /*
  * /proc tunables
  */
-#if defined(CONFIG_JBD_DEBUG)
+#if defined(CONFIG_JBD2_DEBUG)
 int jbd2_journal_enable_debug;
 EXPORT_SYMBOL(jbd2_journal_enable_debug);
 #endif
 
-#if defined(CONFIG_JBD_DEBUG) && defined(CONFIG_PROC_FS)
+#if defined(CONFIG_JBD2_DEBUG) && defined(CONFIG_PROC_FS)
 
 static struct proc_dir_entry *proc_jbd_debug;
 
@@ -2073,7 +2073,7 @@
 
 static void __exit journal_exit(void)
 {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
int n = atomic_read(&nr_journal_heads);
if (n)
printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
Index: linux-2.6.22-rc4/fs/jbd2/recovery.c
===
--- linux-2.6.22-rc4.orig/fs/jbd2/recovery.c   2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/recovery.c 2007-06-11 16:16:18.0 -0700
@@ -295,7 +295,7 @@
printk(KERN_ERR "JBD: error %d scanning journal\n", err);
++journal->j_transaction_sequence;
} else {
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
int dropped = info.end_transaction - be32_to_cpu(sb->s_sequence);
 #endif
jbd_debug(0,
Index: linux-2.6.22-rc4/include/linux/ext4_fs.h
===
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs.h   2007-06-11 16:15:59.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs.h   2007-06-11 16:16:18.0 -0700
@@ -237,7 +237,7 @@
 #define EXT4_IOC_GROUP_ADD _IOW('f', 8,struct ext4_new_group_input)
 #define EXT4_IOC_GETVERSION_OLD FS_IOC_GETVERSION
 #define EXT4_IOC_SETVERSION_OLD FS_IOC_SETVERSION
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 #define EXT4_IOC_WAIT_FOR_READONLY _IOR('f', 99, long)
 #endif
 #define EXT4_IOC_GETRSVSZ  _IOR('f', 5, long)
@@ -253,7 +253,7 @@
 #define EXT4_IOC32_GETRSVSZ _IOR('f', 5, int)
 #define EXT4_IOC32_SETRSVSZ _IOW('f', 6, int)
 #define EXT4_IOC32_GROUP_EXTEND _IOW('f', 7, unsigned int)
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 #define EXT4_IOC32_WAIT_FOR_READONLY   _IOR('f', 99, int)
 #endif
 #define EXT4_IOC32_GETVERSION_OLD  FS_IOC32_GETVERSION
Index: linux-2.6.22-rc4/include/linux/ext4_fs_sb.h
===
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs_sb.h   2007-06-11 16:15:55.0 -0700
+++ linux-2.6.22-rc4/include/linux/ext4_fs_sb.h 2007-06-11 16:16:18.0 -0700
@@ -71,7 +71,7 @@
struct list_head s_orphan;
unsigned long s_commit_interval;
struct block_device *journal_bdev;
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
struct timer_list turn_ro_timer; /* For turning read-only (crash simulation) */
wait_queue_head_t ro_wait_queue; /* For people waiting for the fs to go read-only */
 #endif
Index: linux-2.6.22-rc4/include/linux/jbd2.h
===
--- linux-2.6.22-rc4.orig/include/linux/jbd2.h  2007-06-11 16:15:49.0 -0700
+++ linux-2.6.22-rc4/include/linux/jbd2.h   2007-06-11 16:16:18.0 -0700
@@ -50,11 +50,11 @@
  */
 #define JBD_DEFAULT_MAX_COMMIT_AGE 5
 
-#ifdef CONFIG_JBD_DEBUG
+#ifdef CONFIG_JBD2_DEBUG
 /*
  * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
- * CONFIG_JBD_DEBUG 

[EXT4 set 4][PATCH 2/5] i_version: Add hi 32 bit inode version on ext4 on-disk inode

2007-07-01 Thread Mingming Cao
This patch adds a 32-bit i_version_hi field to ext4_inode, which can be used 
for 64-bit inode versions. This field will store the higher 32 bits of the 
version, while Jean Noel's patch has added support to store the lower 32-bits 
in osd1.linux1.l_i_version.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]
---
Index: linux-2.6.21/include/linux/ext4_fs.h
===
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -342,6 +342,7 @@ struct ext4_inode {
__le32  i_atime_extra;  /* extra Access time  (nsec << 2 | epoch) */
__le32  i_crtime;   /* File Creation time */
__le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
+   __le32  i_version_hi;   /* high 32 bits for 64-bit version */
 };

 #define i_size_high i_dir_acl




[EXT4 set 4][PATCH 3/5] i_version:ext4 inode version read/store

2007-07-01 Thread Mingming Cao
This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.

Signed-off-by: Kalpak Shah [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.21/fs/ext4/inode.c
===
--- linux-2.6.21.orig/fs/ext4/inode.c
+++ linux-2.6.21/fs/ext4/inode.c
@@ -2709,6 +2709,13 @@ void ext4_read_inode(struct inode * inod
EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);

+   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+   if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+   inode->i_version |=
+   (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
+   }
+
if (S_ISREG(inode->i_mode)) {
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
@@ -2852,8 +2859,14 @@ static int ext4_do_update_inode(handle_t
} else for (block = 0; block < EXT4_N_BLOCKS; block++)
raw_inode->i_block[block] = ei->i_data[block];

-   if (ei->i_extra_isize)
+   raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
+   if (ei->i_extra_isize) {
+   if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) {
+   raw_inode->i_version_hi =
+   cpu_to_le32(inode->i_version >> 32);
+   }
raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
+   }

BUFFER_TRACE(bh, "call ext4_journal_dirty_metadata");
rc = ext4_journal_dirty_metadata(handle, bh);
Index: linux-2.6.21/include/linux/ext4_fs.h
===
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -297,7 +297,7 @@ struct ext4_inode {
__le32  i_flags;/* File flags */
union {
struct {
-   __u32  l_i_reserved1;
+   __u32  l_i_version;
} linux1;
struct {
__u32  h_i_translator;
@@ -406,6 +406,8 @@ do {   \
   raw_inode->xtime ## _extra);\
 } while (0)

+#define i_disk_version osd1.linux1.l_i_version
+
 #if defined(__KERNEL__) || defined(__linux__)
 #define i_reserved1 osd1.linux1.l_i_reserved1
 #define i_frag osd2.linux2.l_i_frag




[EXT4 set 4][PATCH 4/5] i_version:ext4 inode version update

2007-07-01 Thread Mingming Cao
This patch is on top of i_version_update_vfs.
The i_version field of the inode is set on inode creation and incremented when
the inode is being modified.

Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: linux-2.6.22-rc4/fs/ext4/ialloc.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/ialloc.c  2007-06-13 17:16:28.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/ialloc.c   2007-06-13 17:24:45.0 -0700
@@ -565,6 +565,7 @@ got:
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
   ext4_current_time(inode);
+   inode->i_version = 1;
 
memset(ei->i_data, 0, sizeof(ei->i_data));
ei->i_dir_start_lookup = 0;
Index: linux-2.6.22-rc4/fs/ext4/inode.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/inode.c   2007-06-13 17:21:29.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/inode.c   2007-06-13 17:24:45.0 -0700
@@ -3082,6 +3082,7 @@ int ext4_mark_iloc_dirty(handle_t *handl
 {
int err = 0;
 
+   inode->i_version++;
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
 
Index: linux-2.6.22-rc4/fs/ext4/super.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/super.c   2007-06-13 17:19:11.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/super.c   2007-06-13 17:24:45.0 -0700
@@ -2846,8 +2846,8 @@ out:
i_size_write(inode, off+len-towrite);
EXT4_I(inode)->i_disksize = inode->i_size;
}
-   inode->i_version++;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   inode->i_version = 1;
ext4_mark_inode_dirty(handle, inode);
mutex_unlock(&inode->i_mutex);
return len - towrite;




[EXT4 set 4][PATCH 5/5] i_version: noversion mount option to disable inode version updates

2007-07-01 Thread Mingming Cao
Add a noversion mount option to disable inode version updates.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]

Index: linux-2.6.21/fs/ext4/super.c
===
--- linux-2.6.21.orig/fs/ext4/super.c
+++ linux-2.6.21/fs/ext4/super.c
@@ -725,7 +725,7 @@ enum {
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-   Opt_grpquota, Opt_extents, Opt_noextents,
+   Opt_grpquota, Opt_extents, Opt_noextents, Opt_noversion,
 };

 static match_table_t tokens = {
@@ -777,6 +777,7 @@ static match_table_t tokens = {
{Opt_barrier, "barrier=%u"},
{Opt_extents, "extents"},
{Opt_noextents, "noextents"},
+   {Opt_noversion, "noversion"},
{Opt_err, NULL},
{Opt_resize, "resize"},
 };
@@ -1115,6 +1116,9 @@ clear_qf_name:
case Opt_noextents:
clear_opt (sbi->s_mount_opt, EXTENTS);
break;
+   case Opt_noversion:
+   set_opt(sbi->s_mount_opt, NOVERSION);
+   break;
default:
printk (KERN_ERR
"EXT4-fs: Unrecognized mount option \"%s\" "
Index: linux-2.6.21/include/linux/ext4_fs.h
===
--- linux-2.6.21.orig/include/linux/ext4_fs.h
+++ linux-2.6.21/include/linux/ext4_fs.h
@@ -473,6 +473,7 @@ do {   \
 #define EXT4_MOUNT_USRQUOTA 0x100000 /* "old" user quota */
 #define EXT4_MOUNT_GRPQUOTA 0x200000 /* "old" group quota */
 #define EXT4_MOUNT_EXTENTS 0x400000 /* Extents support */
+#define EXT4_MOUNT_NOVERSION   0x800000 /* No inode version updates */

 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
Index: linux-2.6.21/fs/ext4/inode.c
===
--- linux-2.6.21.orig/fs/ext4/inode.c
+++ linux-2.6.21/fs/ext4/inode.c
@@ -3082,7 +3082,9 @@ int ext4_mark_iloc_dirty(handle_t *handl
 {
int err = 0;

-   inode->i_version++;
+   if (!test_opt(inode->i_sb, NOVERSION))
+   inode->i_version++;
+
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);





[EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-07-01 Thread Mingming Cao
[PATCH] jbd2 stats through procfs

The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.

[ This probably should be rewritten to use debugfs?   -- Ted]

Signed-off-by: Johann Lombardi [EMAIL PROTECTED]
--

Index: linux-2.6.22-rc4/include/linux/jbd2.h
===
--- linux-2.6.22-rc4.orig/include/linux/jbd2.h  2007-06-11 17:28:17.0 -0700
+++ linux-2.6.22-rc4/include/linux/jbd2.h   2007-06-13 10:45:21.0 -0700
@@ -408,6 +408,16 @@
 };
 
 
+/*
+ * Some stats for checkpoint phase
+ */
+struct transaction_chp_stats_s {
+   unsigned long   cs_chp_time;
+   unsigned long   cs_forced_to_close;
+   unsigned long   cs_written;
+   unsigned long   cs_dropped;
+};
+
 /* The transaction_t type is the guts of the journaling mechanism.  It
  * tracks a compound transaction through its various states:
  *
@@ -543,6 +553,21 @@
spinlock_t  t_handle_lock;
 
/*
+* Longest time some handle had to wait for running transaction
+*/
+   unsigned long   t_max_wait;
+
+   /*
+* When transaction started
+*/
+   unsigned long   t_start;
+
+   /*
+* Checkpointing stats [j_checkpoint_sem]
+*/
+   struct transaction_chp_stats_s t_chp_stats;
+
+   /*
 * Number of outstanding updates running on this transaction
 * [t_handle_lock]
 */
@@ -573,6 +598,57 @@
 
 };
 
+struct transaction_run_stats_s {
+   unsigned long   rs_wait;
+   unsigned long   rs_running;
+   unsigned long   rs_locked;
+   unsigned long   rs_flushing;
+   unsigned long   rs_logging;
+
+   unsigned long   rs_handle_count;
+   unsigned long   rs_blocks;
+   unsigned long   rs_blocks_logged;
+};
+
+struct transaction_stats_s
+{
+   int ts_type;
+   unsigned long   ts_tid;
+   union {
+   struct transaction_run_stats_s run;
+   struct transaction_chp_stats_s chp;
+   } u;
+};
+
+#define JBD2_STATS_RUN 1
+#define JBD2_STATS_CHECKPOINT  2
+
+#define ts_wait    u.run.rs_wait
+#define ts_running u.run.rs_running
+#define ts_locked  u.run.rs_locked
+#define ts_flushing    u.run.rs_flushing
+#define ts_logging u.run.rs_logging
+#define ts_handle_count    u.run.rs_handle_count
+#define ts_blocks  u.run.rs_blocks
+#define ts_blocks_logged   u.run.rs_blocks_logged
+
+#define ts_chp_time    u.chp.cs_chp_time
+#define ts_forced_to_close u.chp.cs_forced_to_close
+#define ts_written u.chp.cs_written
+#define ts_dropped u.chp.cs_dropped
+
+#define CURRENT_MSECS  (jiffies_to_msecs(jiffies))
+
+static inline unsigned int
+jbd2_time_diff(unsigned int start, unsigned int end)
+{
+   if (unlikely(start > end))
+   end = end + (~0UL - start);
+   else
+   end -= start;
+   return end;
+}
+
 /**
  * struct journal_s - The journal_s type is the concrete type associated with
  * journal_t.
@@ -634,6 +710,12 @@
  * @j_wbufsize: maximum number of buffer_heads allowed in j_wbuf, the
  * number that will fit in j_blocksize
  * @j_last_sync_writer: most recent pid which did a synchronous write
+ * @j_history: Buffer storing the transactions statistics history
+ * @j_history_max: Maximum number of transactions in the statistics history
+ * @j_history_cur: Current number of transactions in the statistics history
+ * @j_history_lock: Protect the transactions statistics history
+ * @j_proc_entry: procfs entry for the jbd statistics directory
+ * @j_stats: Overall statistics
  * @j_private: An opaque pointer to fs-private information.
  */
 
@@ -826,6 +908,16 @@
pid_t   j_last_sync_writer;
 
/*
+* Journal statistics
+*/
+   struct transaction_stats_s *j_history;
+   int j_history_max;
+   int j_history_cur;
+   spinlock_t  j_history_lock;
+   struct proc_dir_entry   *j_proc_entry;
+   struct transaction_stats_s j_stats;
+
+   /*
 * An opaque pointer to fs-private information.  ext3 puts its
 * superblock pointer here
 */
Index: linux-2.6.22-rc4/fs/jbd2/transaction.c
===
--- linux-2.6.22-rc4.orig/fs/jbd2/transaction.c 2007-06-11 17:22:14.0 -0700
+++ linux-2.6.22-rc4/fs/jbd2/transaction.c  2007-06-13 10:47:56.0 -0700
@@ -59,6 +59,8 @@
 
J_ASSERT(journal->j_running_transaction 
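
The jbd2_time_diff() helper in this patch handles the case where the millisecond counter wraps between two samples. As a point of comparison, plain unsigned subtraction already wraps modulo 2^32. The sketch below assumes a 32-bit counter and uses hypothetical names; patch_style_diff() is a user-space transliteration of the helper, not the kernel function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two samples of a wrapping 32-bit counter. */
static uint32_t sketch_time_diff(uint32_t start, uint32_t end)
{
	return end - start;	/* unsigned arithmetic wraps modulo 2^32 */
}

/* Transliteration of the branchy form used in the patch. */
static uint32_t patch_style_diff(uint32_t start, uint32_t end)
{
	if (start > end)
		end = end + (uint32_t)(~0UL - start);
	else
		end -= start;
	return end;
}
```

In the wrapped case the two forms differ by one: from 0xfffffff0 to 0x10 the counter advanced 0x20 ticks, which the subtraction returns, while the branchy form yields 0x1f. For statistics in milliseconds this is unlikely to matter, but the shorter form avoids the branch.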

[EXT4 set 7][PATCH 1/1]Remove 32000 subdirs limit.

2007-07-01 Thread Mingming Cao
From [EMAIL PROTECTED] Thu May 17 17:21:08 2007
Hi,

I have rebased this patch to 2.6.22-rc1 so that it can be added to the
ext4 patch queue. It has been tested by creating more than 65000 subdirs
and then deleting them and checking the nlinks. The e2fsprogs part of
this patch was sent earlier by me to linux-ext4 and doesn't need any
changes, so not submitting it again.

--
This patch adds support to ext4 for allowing more than 65000
subdirectories. Currently the maximum number of subdirectories is capped
at 32000.

If we exceed 65000 subdirectories in an htree directory it sets the
inode link count to 1 and no longer counts subdirectories.  The
directory link count is not actually used when determining if a
directory is empty, as that only counts subdirectories and not regular
files that might be in there. 

An EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
the subdir count for any directory crosses 65000.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Kalpak Shah [EMAIL PROTECTED]


Index: linux-2.6.22-rc4/fs/ext4/namei.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/namei.c   2007-06-14 17:30:47.0 -0700
+++ linux-2.6.22-rc4/fs/ext4/namei.c   2007-06-14 17:32:55.0 -0700
@@ -1619,6 +1619,27 @@ static int ext4_delete_entry (handle_t *
return -ENOENT;
 }
 
+static inline void ext4_inc_count(handle_t *handle, struct inode *inode)
+{
+   inc_nlink(inode);
+   if (is_dx(inode) && inode->i_nlink > 1) {
+   /* limit is 16-bit i_links_count */
+   if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
+   inode->i_nlink = 1;
+   EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
+   }
+   }
+}
+
+static inline void ext4_dec_count(handle_t *handle, struct inode *inode)
+{
+   drop_nlink(inode);
+   if (S_ISDIR(inode->i_mode) && inode->i_nlink == 0)
+   inc_nlink(inode);
+}
+
+
 static int ext4_add_nondir(handle_t *handle,
struct dentry *dentry, struct inode *inode)
 {
@@ -1715,7 +1736,7 @@ static int ext4_mkdir(struct inode * dir
struct ext4_dir_entry_2 * de;
int err, retries = 0;
 
-   if (dir->i_nlink >= EXT4_LINK_MAX)
+   if (EXT4_DIR_LINK_MAX(dir))
return -EMLINK;
 
 retry:
@@ -1738,7 +1759,7 @@ retry:
inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
dir_block = ext4_bread (handle, inode, 0, 1, &err);
if (!dir_block) {
-   drop_nlink(inode); /* is this nlink == 0? */
+   ext4_dec_count(handle, inode); /* is this nlink == 0? */
ext4_mark_inode_dirty(handle, inode);
iput (inode);
goto out_stop;
@@ -1770,7 +1791,7 @@ retry:
iput (inode);
goto out_stop;
}
-   inc_nlink(dir);
+   ext4_inc_count(handle, dir);
ext4_update_dx_flag(dir);
ext4_mark_inode_dirty(handle, dir);
d_instantiate(dentry, inode);
@@ -2035,10 +2056,10 @@ static int ext4_rmdir (struct inode * di
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
goto end_rmdir;
-   if (inode->i_nlink != 2)
-   ext4_warning (inode->i_sb, "ext4_rmdir",
- "empty directory has nlink!=2 (%d)",
- inode->i_nlink);
+   if (!EXT4_DIR_LINK_EMPTY(inode))
+   ext4_warning(inode->i_sb, "ext4_rmdir",
+"empty directory has too many links (%d)",
+inode->i_nlink);
inode->i_version++;
clear_nlink(inode);
/* There's no need to set i_disksize: the fact that i_nlink is
@@ -2048,7 +2069,7 @@ static int ext4_rmdir (struct inode * di
ext4_orphan_add(handle, inode);
inode->i_ctime = dir->i_ctime = dir->i_mtime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);
-   drop_nlink(dir);
+   ext4_dec_count(handle, dir);
ext4_update_dx_flag(dir);
ext4_mark_inode_dirty(handle, dir);
 
@@ -2099,7 +2120,7 @@ static int ext4_unlink(struct inode * di
dir->i_ctime = dir->i_mtime = ext4_current_time(dir);
ext4_update_dx_flag(dir);
ext4_mark_inode_dirty(handle, dir);
-   drop_nlink(inode);
+   ext4_dec_count(handle, inode);
if (!inode->i_nlink)
ext4_orphan_add(handle, inode);
inode->i_ctime = ext4_current_time(inode);
@@ -2149,7 +2170,7 @@ retry:
err = __page_symlink(inode, symname, l,
mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);
if (err) {
-   drop_nlink(inode);
+   ext4_dec_count(handle, inode);
  

[EXT4 set 8][PATCH 1/1]Add journal checksums

2007-07-01 Thread Mingming Cao
A journal checksum feature has been added to detect corruption of the journal.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Girish Shilamkar [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
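
The way the two new mount options map onto journal feature bits in the
ext4_fill_super() hunk below can be sketched as a small userspace function.
The bit values, the struct and the sketch_* names are assumptions made for
illustration; only the three-way branch mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_COMPAT_CHECKSUM       0x1u
#define SKETCH_INCOMPAT_ASYNC_COMMIT 0x4u

struct sketch_features {
    unsigned int compat;
    unsigned int incompat;
};

/* Mirror the branch in the patch: async commit implies checksums,
 * checksum alone clears async commit, and neither option clears both. */
static struct sketch_features
sketch_select_features(struct sketch_features cur,
                       bool opt_checksum, bool opt_async_commit)
{
    if (opt_async_commit) {
        cur.compat |= SKETCH_COMPAT_CHECKSUM;
        cur.incompat |= SKETCH_INCOMPAT_ASYNC_COMMIT;
    } else if (opt_checksum) {
        cur.compat |= SKETCH_COMPAT_CHECKSUM;
        cur.incompat &= ~SKETCH_INCOMPAT_ASYNC_COMMIT;
    } else {
        cur.compat &= ~SKETCH_COMPAT_CHECKSUM;
        cur.incompat &= ~SKETCH_INCOMPAT_ASYNC_COMMIT;
    }
    /* Invariant of this scheme: async commit is never set without checksums. */
    assert(!(cur.incompat & SKETCH_INCOMPAT_ASYNC_COMMIT) ||
           (cur.compat & SKETCH_COMPAT_CHECKSUM));
    return cur;
}
```

Because the feature bits are recomputed on every mount, remounting without
either option drops both bits again in this sketch.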

diff -Nurp linux024/fs/ext4/super.c linux/fs/ext4/super.c
--- linux024/fs/ext4/super.c    2007-06-25 16:19:24.0 -0500
+++ linux/fs/ext4/super.c   2007-06-26 08:35:16.0 -0500
@@ -721,6 +721,7 @@ enum {
Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
+   Opt_journal_checksum, Opt_journal_async_commit,
Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
@@ -760,6 +761,8 @@ static match_table_t tokens = {
{Opt_journal_update, "journal=update"},
{Opt_journal_inum, "journal=%u"},
{Opt_journal_dev, "journal_dev=%u"},
+   {Opt_journal_checksum, "journal_checksum"},
+   {Opt_journal_async_commit, "journal_async_commit"},
{Opt_abort, "abort"},
{Opt_data_journal, "data=journal"},
{Opt_data_ordered, "data=ordered"},
@@ -948,6 +951,13 @@ static int parse_options (char *options,
return 0;
*journal_devnum = option;
break;
+   case Opt_journal_checksum:
+   set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
+   break;
+   case Opt_journal_async_commit:
+   set_opt (sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT);
+   set_opt (sbi->s_mount_opt, JOURNAL_CHECKSUM);
+   break;
case Opt_noload:
set_opt (sbi->s_mount_opt, NOLOAD);
break;
@@ -1817,6 +1827,21 @@ static int ext4_fill_super (struct super
goto failed_mount4;
}
 
+   if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
+   jbd2_journal_set_features(sbi->s_journal,
+   JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+   JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+   } else if (test_opt(sb, JOURNAL_CHECKSUM)) {
+   jbd2_journal_set_features(sbi->s_journal,
+   JBD2_FEATURE_COMPAT_CHECKSUM, 0, 0);
+   jbd2_journal_clear_features(sbi->s_journal, 0, 0,
+   JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+   } else {
+   jbd2_journal_clear_features(sbi->s_journal,
+   JBD2_FEATURE_COMPAT_CHECKSUM, 0,
+   JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT);
+   }
+
/* We have now updated the journal if required, so we can
 * validate the data journaling mode. */
switch (test_opt(sb, DATA_FLAGS)) {
diff -Nurp linux024/fs/jbd2/commit.c linux/fs/jbd2/commit.c
--- linux024/fs/jbd2/commit.c   2007-06-25 16:19:25.0 -0500
+++ linux/fs/jbd2/commit.c  2007-06-26 08:40:03.0 -0500
@@ -21,6 +21,7 @@
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
+#include <linux/crc32.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -93,15 +94,18 @@ static int inverted_lock(journal_t *jour
return 1;
 }
 
-/* Done it all: now write the commit record.  We should have
+/*
+ * Done it all: now submit the commit record.  We should have
  * cleaned up our previous buffers by now, so if we are in abort
  * mode we can now just skip the rest of the journal write
  * entirely.
  *
  * Returns 1 if the journal needs to be aborted or 0 on success
  */
-static int journal_write_commit_record(journal_t *journal,
-   transaction_t *commit_transaction)
+static int journal_submit_commit_record(journal_t *journal,
+   transaction_t *commit_transaction,
+   struct buffer_head **cbh,
+   __u32 crc32_sum)
 {
struct journal_head *descriptor;
struct buffer_head *bh;
@@ -117,21 +121,36 @@ static int journal_write_commit_record(j
 
bh = jh2bh(descriptor);
 
-   /* AKPM: buglet - add `i' to tmp! */
for (i = 0; i < bh->b_size; i += 512) {
-   journal_header_t *tmp = (journal_header_t*)bh->b_data;
+   struct commit_header *tmp =
+   (struct commit_header *)(bh->b_data + i);
tmp->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER);
tmp->h_blocktype = cpu_to_be32(JBD2_COMMIT_BLOCK);
tmp->h_sequence = cpu_to_be32(commit_transaction->t_tid);
+
+   if (JBD2_HAS_COMPAT_FEATURE(journal,
+   

[EXT4 set 9][PATCH 1/5]Morecleanups:ext4-zero_user_page

2007-07-01 Thread Mingming Cao
Use zero_user_page() in ext4 where possible.


Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
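
As a rough userspace analogue of the cleanup, the open-coded map, memset,
flush and unmap sequence removed in the hunks below collapses into one helper
call. The sketch_zero_range() name and the 4096-byte page size are assumptions
for illustration; in the kernel the helper also takes care of the atomic
mapping and the dcache flush, which have no counterpart here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Zero `length` bytes of a page-sized buffer starting at `offset`. */
static void sketch_zero_range(unsigned char *page, size_t offset, size_t length)
{
    /* The range must stay inside the page. */
    assert(offset <= SKETCH_PAGE_SIZE && length <= SKETCH_PAGE_SIZE - offset);
    memset(page + offset, 0, length);
}
```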

Index: linux-2.6.22-rc4-mm2/fs/ext4/inode.c
===
--- linux-2.6.22-rc4-mm2.orig/fs/ext4/inode.c
+++ linux-2.6.22-rc4-mm2/fs/ext4/inode.c
@@ -1830,7 +1830,6 @@ int ext4_block_truncate_page(handle_t *h
struct inode *inode = mapping-host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;

if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
test_opt(inode->i_sb, EXTENTS) &&
@@ -1847,10 +1846,7 @@ int ext4_block_truncate_page(handle_t *h
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext4_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length, KM_USER0);
set_page_dirty(page);
goto unlock;
}
@@ -1903,10 +1899,7 @@ int ext4_block_truncate_page(handle_t *h
goto unlock;
}

-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length, KM_USER0);

BUFFER_TRACE(bh, "zeroed end of block");



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[EXT4 set 9][PATCH 2/5]Morecleanups: use is_power_of_2 () in fill_super

2007-07-01 Thread Mingming Cao
Subject: is_power_of_2: ext4/super.c
From: vignesh babu [EMAIL PROTECTED]

Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2()

Signed-off-by: vignesh babu [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---
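
For comparison, here is a userspace sketch of the two forms of the check. It
assumes is_power_of_2() behaves like the helper in linux/log2.h (nonzero and a
single bit set); the sketch_* names and the 128-byte minimum inode size are
illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* The open-coded form being replaced: true means "not a power of two".
 * n == 0 yields false here, so zero is not rejected by this form alone. */
static bool sketch_open_coded_rejects(unsigned long n)
{
    return (n & (n - 1)) != 0;
}

/* Helper form: zero is explicitly not a power of two. */
static bool sketch_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Assumed minimum inode size for the sketch. */
#define SKETCH_GOOD_OLD_INODE_SIZE 128ul

/* The inode size check from the hunk, restated positively. */
static bool sketch_inode_size_ok(unsigned long inode_size,
                                 unsigned long blocksize)
{
    assert(sketch_is_power_of_2(blocksize));
    return inode_size >= SKETCH_GOOD_OLD_INODE_SIZE &&
           sketch_is_power_of_2(inode_size) &&
           inode_size <= blocksize;
}
```

The two forms differ only for zero, which the size check in the hunk already
rejects through its minimum-size comparison.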

 fs/ext4/super.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/ext4/super.c~is_power_of_2-ext4-superc fs/ext4/super.c
--- a/fs/ext4/super.c~is_power_of_2-ext4-superc
+++ a/fs/ext4/super.c
@@ -36,6 +36,7 @@
 #include <linux/namei.h>
 #include <linux/quotaops.h>
 #include <linux/seq_file.h>
+#include <linux/log2.h>

 #include <asm/uaccess.h>

@@ -1662,7 +1663,7 @@ static int ext4_fill_super (struct super
sbi->s_inode_size = le16_to_cpu(es->s_inode_size);
sbi->s_first_ino = le32_to_cpu(es->s_first_ino);
if ((sbi->s_inode_size < EXT4_GOOD_OLD_INODE_SIZE) ||
-   (sbi->s_inode_size & (sbi->s_inode_size - 1)) ||
+   (!is_power_of_2(sbi->s_inode_size)) ||
(sbi->s_inode_size > blocksize)) {
printk (KERN_ERR
"EXT4-fs: unsupported inode size: %d\n",
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

git-ubi.patch
use-is_power_of_2-in-cxgb3-cxgb3_mainc.patch
use-is_power_of_2-in-myri10ge-myri10gec.patch
is_power_of_2-ext3-superc.patch
is_power_of_2-ext4-superc.patch
is_pwoer_of_2-jbd.patch





[EXT4 set 9][PATCH 3/5]Morecleanups:ext4-remove-extra-is_rdonly-check

2007-07-01 Thread Mingming Cao
Subject: ext4: remove extra IS_RDONLY() check
From: Dave Hansen [EMAIL PROTECTED]

ext4_change_inode_journal_flag() is only called from one location:
ext4_ioctl(EXT3_IOC_SETFLAGS).  That ioctl case already has an IS_RDONLY()
check in it, so this one is superfluous.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---

 fs/ext4/inode.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext4/inode.c~ext4-remove-extra-is_rdonly-check fs/ext4/inode.c
--- a/fs/ext4/inode.c~ext4-remove-extra-is_rdonly-check
+++ a/fs/ext4/inode.c
@@ -3352,7 +3352,7 @@ int ext4_change_inode_journal_flag(struc
 */

journal = EXT4_JOURNAL(inode);
-   if (is_journal_aborted(journal) || IS_RDONLY(inode))
+   if (is_journal_aborted(journal))
return -EROFS;

jbd2_journal_lock_updates(journal);




[EXT4 set 9][PATCH 5/5]Extent micro cleanups

2007-07-01 Thread Mingming Cao
From: Dmitry Monakhov [EMAIL PROTECTED]
Subject: ext4: extent macros cleanup

- Replace open-coded arithmetic with its macro equivalent
- Make ext4_ext_grow_indepth() correct for both index and leaf roots

Signed-off-by: Dmitry Monakhov [EMAIL PROTECTED]
Acked-by: Alex Tomas [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
---
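
The index search touched by the first hunk can be sketched in userspace as
follows. The simplified structs, the fixed array of eight entries and the
SKETCH_* macros are assumptions for illustration (the on-disk fields are
little-endian and the real header layout differs); the loop itself follows the
shape of the hunk, with the last-entry macro standing in for
"first + entries - 1".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the index entry and header. */
struct sketch_idx { uint32_t ei_block; };
struct sketch_hdr {
    uint16_t eh_entries;
    struct sketch_idx idx[8];
};

#define SKETCH_FIRST_INDEX(h) (&(h)->idx[0])
/* Equivalent of the open-coded "first + entries - 1". */
#define SKETCH_LAST_INDEX(h)  (SKETCH_FIRST_INDEX(h) + (h)->eh_entries - 1)

/* Return the last index entry whose starting block is <= block. */
static const struct sketch_idx *
sketch_binsearch_idx(const struct sketch_hdr *h, uint32_t block)
{
    const struct sketch_idx *l, *r, *m;

    assert(h->eh_entries > 0 && h->eh_entries <= 8);
    l = SKETCH_FIRST_INDEX(h) + 1;
    r = SKETCH_LAST_INDEX(h);
    while (l <= r) {
        m = l + (r - l) / 2;
        if (block < m->ei_block)
            r = m - 1;
        else
            l = m + 1;
    }
    return l - 1;
}
```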
 fs/ext4/extents.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 12fe3d7..1fd00ac 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -375,7 +375,7 @@ ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int bloc
ext_debug("binsearch for %d(idx):  ", block);

l = EXT_FIRST_INDEX(eh) + 1;
-   r = EXT_FIRST_INDEX(eh) + le16_to_cpu(eh->eh_entries) - 1;
+   r = EXT_LAST_INDEX(eh);
while (l <= r) {
m = l + (r - l) / 2;
if (block < le32_to_cpu(m->ei_block))
@@ -440,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
ext_debug("binsearch for %d:  ", block);

l = EXT_FIRST_EXTENT(eh) + 1;
-   r = EXT_FIRST_EXTENT(eh) + le16_to_cpu(eh->eh_entries) - 1;
+   r = EXT_LAST_EXTENT(eh);

while (l <= r) {
m = l + (r - l) / 2;
@@ -922,8 +922,11 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
curp->p_hdr->eh_max = cpu_to_le16(ext4_ext_space_root_idx(inode));
curp->p_hdr->eh_entries = cpu_to_le16(1);
curp->p_idx = EXT_FIRST_INDEX(curp->p_hdr);
-   /* FIXME: it works, but actually path[0] can be index */
-   curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;
+
+   if (path[0].p_hdr->eh_depth)
+ curp->p_idx->ei_block = EXT_FIRST_INDEX(path[0].p_hdr)->ei_block;
+   else
+ curp->p_idx->ei_block = EXT_FIRST_EXTENT(path[0].p_hdr)->ee_block;
ext4_idx_store_pblock(curp-p_idx, newblock);

neh = ext_inode_hdr(inode);




Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-07-01 Thread Jose R. Santos
On Sun, 01 Jul 2007 03:38:10 -0400
Mingming Cao [EMAIL PROTECTED] wrote:

> [PATCH] jbd2 stats through procfs
>
> The patch below updates the jbd stats patch to 2.6.20/jbd2.
> The initial patch was posted by Alex Tomas in December 2005
> (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
> It provides statistics via procfs such as transaction lifetime and size.
>
> [ This probably should be rewritten to use debugfs?   -- Ted]

Was a decision ever made on whether this should remain on procfs or be
moved to debugfs?  I recall this being discussed but don't recall
a firm decision on it.

-JRS


Re: [RFC] BIG_BG vs extended META_BG in ext4

2007-07-01 Thread Theodore Tso
On Sat, Jun 30, 2007 at 11:39:08PM -0500, Jose R. Santos wrote:
> On Sat, 30 Jun 2007 01:51:25 -0400
> Andreas Dilger [EMAIL PROTECTED] wrote:
> > I don't think there is actually any fundamental difference between these
> > proposals.  The reality is that we cannot change the semantics of the
> > META_BG flag at this point, since both e2fsprogs and ext3/ext4 in the
> > kernel understand META_BG to mean only group descriptor backups are
> > in groups {0, 1, last} of the metagroup and nothing else.
>
> Agree.  I call it extended META_BG for lack of a better name, but a new
> feature flag will be required.

It was the intention that META_BG include allowing the bitmap and
inode tables to range anywhere outside of the block group, but that
never got coded.  It would be confusing though if we relaxed it
without adding a feature bit, and I agree that we might as well
overload the BIG_BG flag to indicate this feature.

The fact that BIG_BG requires contiguous blocks for the bitmaps when
they exceed blocksize*8 blocks still concerns me a minor amount,
especially given the hoped-for inclusion of kernel patches that allow
blocksize > pagesize.  Furthermore, I still wonder whether we will want
to make blockgroups that much bigger (since reducing the allocation
groups is not necessarily a smart thing; we will need to do some
benchmarks with filesystem aging to see how this affects
antifragmentation efforts), but the complexity engendered by adding
BIG_BG isn't that bad (again, my only concern is with the contiguous
bitmap blocks requirements).
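
For a sense of scale, the blocksize*8 limit mentioned above can be worked
through in a few lines of C. This is only arithmetic under the assumption that
a block bitmap uses one bit per block; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block holds blocksize * 8 bits, i.e. covers that many blocks. */
static uint64_t sketch_blocks_per_bitmap_block(uint32_t blocksize)
{
    return (uint64_t)blocksize * 8;
}

/* Contiguous bitmap blocks needed for a group of `group_blocks` blocks. */
static uint64_t sketch_bitmap_blocks(uint64_t group_blocks, uint32_t blocksize)
{
    uint64_t per;

    assert(blocksize != 0);
    per = sketch_blocks_per_bitmap_block(blocksize);
    return (group_blocks + per - 1) / per;   /* round up */
}
```

With 4096-byte blocks one bitmap block covers 32768 blocks, so a group of
2^20 blocks would need 32 contiguous bitmap blocks in this model.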

- Ted


Re: [RFC] BIG_BG vs extended META_BG in ext4

2007-07-01 Thread Andreas Dilger
On Jun 30, 2007  23:40 -0500, Jose R. Santos wrote:
> Yes, I think bigger block groups will benefit extents a great deal
> since not only can we have larger extents, but I believe that as the
> filesystem ages the chances of getting a large number of contiguous
> blocks can be reduced with small block groups.

This turns out not to be true, and in fact we need to change the unwritten
extents patch a tiny bit.  The reason is that we have limited the maximum
extent size to 2^15-1 = 32767 blocks.  The current maximum for the number
of blocks in a group is 65528, so that we can always fit the free blocks
count into a __u16 if the bitmaps and inode table are moved out of the
group.  Moving the bitmaps and itable will hit the max extent length.

There are still other benefits to moving the metadata together.

Now, the one minor problem with the unwritten extent patches is that by
using the high bit of the ee_len this limits the extent length to 2^15-1
blocks, but it would be MUCH better if this limit was 2^15 blocks and
it fit evenly into an empty group, consecutive extents were aligned, etc.
It also doesn't make sense to have an uninitialized 0-length extent, so
I think the unwritten extent (fallocate) patch needs to special case
the ee_len = 32768 to be a regular extent instead of unwritten.
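
The special case can be sketched as follows. Since ee_len is a 16-bit field,
the value with only the high bit set is 0x8000 = 32768; read literally it
would be an unwritten extent of length 0, so it can instead be treated as a
regular extent of 32768 blocks. The names and helpers below are assumptions
for illustration, not the code from the unwritten extents patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest regular extent: the value with only the high bit set. */
#define SKETCH_INIT_MAX_LEN   (1u << 15)
/* Largest unwritten extent: one less, since the high bit is the flag. */
#define SKETCH_UNINIT_MAX_LEN (SKETCH_INIT_MAX_LEN - 1)

/* Unwritten only if the high bit is set AND the low bits are nonzero. */
static bool sketch_is_unwritten(uint16_t ee_len)
{
    return ee_len > SKETCH_INIT_MAX_LEN;
}

/* Length in blocks, with the flag bit stripped for unwritten extents. */
static unsigned int sketch_actual_len(uint16_t ee_len)
{
    return ee_len <= SKETCH_INIT_MAX_LEN ? ee_len
                                         : ee_len - SKETCH_INIT_MAX_LEN;
}

/* Encode an unwritten extent of `len` blocks. */
static uint16_t sketch_mark_unwritten(uint16_t len)
{
    assert(len >= 1 && len <= SKETCH_UNINIT_MAX_LEN);
    return (uint16_t)(len | SKETCH_INIT_MAX_LEN);
}
```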

> > With less groups, we load less group descriptors in memory, we have
> > less I/O to read bitmap and inode array (because we manage less group
> > descriptors again, because we load bigger bitmap and array in one time)
>
> Presumably, we would still need to access the same amount of data but
> latencies should be reduced since we could do larger IO's and less seeks
> to read the bitmaps.  I also wonder if there are benefits in terms of
> locality to having the bitmaps closer to its blocks vs having them far
> away like in xMETA_BG.

Having the bitmaps together will fix this independent of BIG_BG.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



[PATCH] fix error handling in ext3_create_journal

2007-07-01 Thread Borislav Petkov

---
From: Borislav Petkov [EMAIL PROTECTED]

Fix error handling in ext3_create_journal according to kernel conventions.

Signed-off-by: Borislav Petkov [EMAIL PROTECTED]
--

Index: linux-2.6.22-rc6/fs/ext3/super.c
===
--- linux-2.6.22-rc6/fs/ext3/super.c.orig   2007-07-01 21:12:51.0 +0200
+++ linux-2.6.22-rc6/fs/ext3/super.c    2007-07-01 21:14:32.0 +0200
@@ -2075,6 +2075,7 @@
   unsigned int journal_inum)
 {
journal_t *journal;
+   int err;
 
if (sb->s_flags & MS_RDONLY) {
printk(KERN_ERR "EXT3-fs: readonly filesystem when trying to "
@@ -2082,13 +2083,15 @@
return -EROFS;
}
 
-   if (!(journal = ext3_get_journal(sb, journal_inum)))
+   journal = ext3_get_journal(sb, journal_inum);
+   if (!journal)
return -EINVAL;
 
printk(KERN_INFO "EXT3-fs: creating new journal on inode %u\n",
   journal_inum);
 
-   if (journal_create(journal)) {
+   err = journal_create(journal);
+   if (err) {
printk(KERN_ERR "EXT3-fs: error creating journal.\n");
journal_destroy(journal);
return -EIO;
-- 
Regards/Gruß,
Boris.


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-01 Thread David Chinner
On Sat, Jun 30, 2007 at 11:21:11AM +0100, Christoph Hellwig wrote:
> On Tue, Jun 26, 2007 at 04:02:47PM +0530, Amit K. Arora wrote:
> > > Can you clarify - what is the current behaviour when ENOSPC (or some other
> > > error) is hit?  Does it keep the current fallocate() or does it free it?
> >
> > Currently it is left to the file system implementation. In ext4, we do
> > not undo preallocation if some error (say, ENOSPC) is hit. Hence it may
> > end up with partial (pre)allocation. This is in line with dd and
> > posix_fallocate, which also do not free the partially allocated space.
>
> I can't find anything in the specification of posix_fallocate
> (http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html)
> that tells what should happen to allocated blocks on error.

Yeah, and AFAICT glibc leaves them behind ATM.

> But common sense would be to not leak disk space on failure of this
> syscall, and this definitively should not be left up to the filesystem,
> either we always leak it or always free it, and I'd strongly favour
> the latter variant.

We can't simply walk the range and remove unwritten extents, as some
of them may have been present before the fallocate() call. That
makes it extremely difficult to undo a failed call without also
removing pre-existing pre-allocations.

Given the current behaviour for posix_fallocate() in glibc, I think
that retaining the same error semantic and punting the cleanup to
userspace (where the app will fail with ENOSPC anyway) is the only
sane thing we can do here. Trying to undo this in the kernel leads
to lots of extra rarely used code in error handling paths...
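
What "punting the cleanup to userspace" could look like for an application is
sketched below. It uses the standard posix_fallocate(), fstat() and
ftruncate() calls; the helper name and its policy are assumptions for
illustration. It can only give back space past the old end of file, and it
cannot tell new preallocation from older preallocation inside the existing
file size, which is the same limitation described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to reserve [offset, offset + len) and, on
 * failure, shrink the file back to its previous size if it grew.
 * Returns 0 on success or an errno-style error number. */
static int sketch_reserve_or_rollback(int fd, off_t offset, off_t len)
{
    struct stat before, after;
    int err;

    assert(offset >= 0 && len > 0);
    if (fstat(fd, &before) != 0)
        return errno;

    /* posix_fallocate() returns the error number directly, not via errno. */
    err = posix_fallocate(fd, offset, len);
    if (err != 0) {
        /* Only space past the old end of file is known to be new. */
        if (fstat(fd, &after) == 0 && after.st_size > before.st_size) {
            if (ftruncate(fd, before.st_size) != 0)
                return errno;   /* the rollback itself failed */
        }
    }
    return err;
}
```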

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group