Re: delayed allocation result in Oops

2007-06-20 Thread Dmitry Monakhov
On 16:46 Wed 20 Jun, Mingming Cao wrote:
> On Wed, 2007-06-20 at 12:15 +0400, Alex Tomas wrote:
> > Mingming Cao wrote:
> > > Hmm, PageMappedToDisk is probably not sufficient for pagesize !=
> > > blocksize. Is that the reason we need page->private to pass the
> > > request?
> > 
> > PageMappedToDisk isn't enough in that case, definitely. bh is the way
> > to track state of each block (this is what i'm implementing now), but
> > I think current nobh version (per-page flags are used) is valuable.
> > 
> > > That's good to know, thanks for the update. So probably above error case
> > > handling will be addressed in the new version? 
> > 
> > well, you actually can move that SetPagePrivate() few lines above
> > (Dmitriy already tested this).
> 
> Like this?
Of course not: we may set the page private bit only if
space was successfully reserved by ext4_wb_reserve_space_page().
So the patch looks like this:
diff --git a/fs/ext4/writeback.c b/fs/ext4/writeback.c
index 3e669d6..84e5029 100644
--- a/fs/ext4/writeback.c
+++ b/fs/ext4/writeback.c
@@ -904,9 +904,6 @@ int ext4_wb_commit_write(struct file *file, struct page 
*page,
wb_debug("commit page %lu (%u-%u) for inode %lu\n",
page->index, from, to, inode->i_ino);
 
-   /* mark page private so that we get
-* called to invalidate/release page */
-   SetPagePrivate(page);
 
if (!PageBooked(page) && !PageMappedToDisk(page)) {
/* ->prepare_write() observed that block for this
@@ -918,6 +915,9 @@ int ext4_wb_commit_write(struct file *file, struct page 
*page,
if (err)
return err;
}
+   /* mark page private so that we get
+* called to invalidate/release page */
+   SetPagePrivate(page);
 
/* ok. block for this page is allocated already or it has
 * been reserved succesfully. so, user may use it */
-- 
1.5.2
> 
> Index: linux-2.6.22-rc5/fs/ext4/writeback.c
> ===
> --- linux-2.6.22-rc5.orig/fs/ext4/writeback.c 2007-06-20 16:41:26.0 
> -0700
> +++ linux-2.6.22-rc5/fs/ext4/writeback.c  2007-06-20 16:44:10.0 
> -0700
> @@ -918,16 +918,16 @@ int ext4_wb_commit_write(struct file *fi
>   wb_debug("commit page %lu (%u-%u) for inode %lu\n",
>   page->index, from, to, inode->i_ino);
>  
> - /* mark page private so that we get
> -  * called to invalidate/release page */
> - SetPagePrivate(page);
> -
>   if (!PageBooked(page) && !PageMappedToDisk(page)) {
>   /* ->prepare_write() observed that block for this
>* page hasn't been allocated yet. there fore it
>* asked to reserve block for later allocation */
>   BUG_ON(page->private == 0);
>   page->private = 0;
> + /* mark page private so that we get
> +  * called to invalidate/release page */
> + SetPagePrivate(page);
> +
>   err = ext4_wb_reserve_space_page(page, 1);
>   if (err)
>   return err;
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
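
The ordering argued for in this reply (mark the page private only after the
reservation has succeeded) can be sketched in plain userspace C. This is only
an illustration under stated assumptions: struct fake_page, reserve_space()
and commit_write() are hypothetical stand-ins, not the ext4 writeback code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel structure. */
struct fake_page {
    bool booked;          /* space already reserved for this page */
    bool mapped_to_disk;  /* block already allocated on disk */
    bool private_flag;    /* stands in for the PagePrivate bit */
};

/* Hypothetical reservation helper: fails with -ENOSPC when no blocks are left. */
static int reserve_space(struct fake_page *page, long *free_blocks)
{
    if (*free_blocks <= 0)
        return -ENOSPC;
    (*free_blocks)--;
    page->booked = true;
    return 0;
}

/*
 * Set the private flag only once the page is known to be backed by a
 * reservation or by an already allocated block.
 */
static int commit_write(struct fake_page *page, long *free_blocks)
{
    assert(page != NULL && free_blocks != NULL);

    if (!page->booked && !page->mapped_to_disk) {
        int err = reserve_space(page, free_blocks);
        if (err)
            return err;   /* flag stays clear on failure */
    }
    page->private_flag = true;
    return 0;
}
```

With this ordering a failed reservation leaves the flag clear, so no later
invalidate or release callback is triggered for a page that never held a
reservation.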



[RFC][PATCH 10/10] Online defrag command

2007-06-20 Thread Takashi Sato
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for free space fragmentation.
    # e4defrag -f file-name
  o Defrag for a single file.
    # e4defrag file-name
  o Defrag for all files on ext4.
    # e4defrag device-name

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
/*
 * e4defrag.c - ext4 filesystem defragmenter
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE   500
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)
#define EXT4_IOC_GROUP_INFO _IOW('f', 11, struct ext4_group_data_info)
#define EXT4_IOC_FREE_BLOCKS_INFO _IOW('f', 12, struct ext4_extents_info)
#define EXT4_IOC_EXTENTS_INFO   _IOW('f', 13, struct ext4_extents_info)
#define EXT4_IOC_RESERVE_BLOCK  _IOW('f', 14, struct ext4_extents_info)
#define EXT4_IOC_MOVE_VICTIM    _IOW('f', 15, struct ext4_extents_info)
#define EXT4_IOC_BLOCK_RELEASE  _IO('f', 16)


#define _FILE_OFFSET_BITS 64
#define ext4_fsblk_t    unsigned long long
#define DEFRAG_MAX_ENT  32

/* Extent status which are used in ext_in_group */
#define EXT4_EXT_USE    0
#define EXT4_EXT_FREE   1
#define EXT4_EXT_RESERVE    2

/* Insert list2 after list1 */
#define insert(list1,list2) { list2 ->next = list1->next;\
list1->next->prev = list2;\
list2->prev = list1;\
list1->next = list2;\
}

#define DEFRAG_RESERVE_BLOCK_SECOND 2

/* Magic number for ext4 */
#define EXT4_SUPER_MAGIC    0xEF53

/* The number of pages to defrag at one time */
#define DEFRAG_PAGES    128

/* Maximum length of contiguous blocks */
#define MAX_BLOCKS_LEN  16384

/* Force defrag mode: Max file size in bytes (128MB) */
#define MAX_FILE_SIZE   ((unsigned long)1 << 27)

/* Force defrag mode: Max filesystem relative offset (48bit) */
#define MAX_FS_OFFSET_BIT   48

/* Data type for filesystem-wide blocks number */
#define  ext4_fsblk_t unsigned long long

/* Ioctl command */
#define EXT4_IOC_FIBMAP _IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME 0
#define DIRNAME 1
#define FILENAME    2

#define RETURN_OK   0
#define RETURN_NG   -1
#define FTW_CONT    0
#define FTW_STOP    -1
#define FTW_OPEN_FD 2000
#define FILE_CHK_OK 0
#define FILE_CHK_NG -1
#define FS_EXT4 "ext4dev"
#define ROOT_UID    0

/* Defrag block size, in bytes */
#define DEFRAG_SIZE 67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)  fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)   fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE   \
"Usage : e4defrag [-v] file...| directory...| device...\n\
  : e4defrag -f file [blocknr] \n\
  : e4defrag -r directory... | device... \n"

#define MSG_R_OPTION    " with regional block allocation mode.\n"
#define NGMSG_MTAB  "\te4defrag  : Can not access /etc/mtab."
#define NGMSG_UNMOUNT   "\te4defrag  : FS is not mounted."
#define NGMSG_EXT4  "\te4defrag  : FS is not ext4 File System."
#define NGMSG_FS_INFO   "\te4defrag  : get FSInfo fail."
#define NGMSG_FILE_INFO "\te4defrag  : get FileInfo fail."
#define NGMSG_FILE_OPEN "\te4defrag  : open fail."
#define NGMSG_FILE_SYNC "\te4defrag  : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG   "\te4defrag  : defrag fail."
#define NGMSG_FILE_BLOCKSIZE    "\te4defrag  : can't get blocksize."
#define NGMSG_FILE_FIBMAP   "\te4defrag  : can't get block number."
#define NGMSG_FILE_UNREG    "\te4defrag  : File is not regular file."

#define NGMSG_FILE_LARGE    \
"\te4defrag  : Defrag size is larger than FileSystem's free space."

#define NGMSG_FILE_PRIORITY \
"\te4defrag  : File is not current user's file or current user is not root."

#define NGMSG_FILE_LOCK "\te4defrag  : File is locked."
#define NGMSG_FILE_BLANK    "\te4defrag  : File size is 0."
#define NGMSG_GET_LCKINFO   "\te4defrag  : get LockInfo fail."
#define NGMSG_TYPE  \
"e4defrag  : Can not process %s in regional mode.\n"
#define NGMSG_REALPATH  "\te4defrag  : Can not get full path."

struct ext4_extent_data {
unsigned long long block;   /* start logical block number */
ext4_fsblk_t start; /* start physical block

[RFC][PATCH 9/10] Fix bugs in multi-block allocation and locality-group

2007-06-20 Thread Takashi Sato
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all dirty inodes.
- Fix ext4_mb_new_blocks() to return the error value when defrag fails.
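
The second fix is about reporting a failure through an out-parameter. A
minimal userspace sketch of that pattern follows. The names fsblk_t,
reserve_blocks() and new_blocks() are hypothetical, and returning 0 as the
"no block" value is an assumption of this sketch, not something the patch
states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef unsigned long long fsblk_t;   /* hypothetical, mirrors ext4_fsblk_t */

/* Hypothetical reservation step: succeeds only if enough blocks are free. */
static int reserve_blocks(unsigned long *free_blocks, unsigned long count)
{
    if (count > *free_blocks)
        return -ENOSPC;
    *free_blocks -= count;
    return 0;
}

/*
 * Returns the first allocated block, or 0 on failure. On failure the
 * reason is stored in *errp, so the caller never reads a stale value.
 */
static fsblk_t new_blocks(unsigned long *free_blocks, fsblk_t goal,
                          unsigned long count, int *errp)
{
    int err;

    assert(errp != NULL);
    *errp = 0;

    err = reserve_blocks(free_blocks, count);
    if (err) {
        *errp = err;   /* propagate the reason before bailing out */
        return 0;
    }
    return goal;       /* pretend the goal block itself was free */
}
```

A caller that checks only *errp would miss the failure if the early return
skipped the assignment, which is the situation the fix addresses.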

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test3/fs/ext4/lg.c 
Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c
--- linux-2.6.19-rc6-test3/fs/ext4/lg.c 2007-06-20 16:56:16.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c 2007-06-18 
14:21:54.0 +0900
@@ -389,6 +389,10 @@ int ext4_lg_sync_single_group(struct sup
cond_resched();
spin_lock(&inode_lock);
if (wbc->nr_to_write <= 0) {
+   if (!list_empty(&lg->lg_io)) {
+   set_bit(EXT4_LG_DIRTY, &lg->lg_flags);
+   list_move(&lg->lg_list, &sbi->s_locality_dirty);
+   }
rc = EXT4_STOP_WRITEBACK;
code = 6;
break;
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test3/fs/ext4/mballoc.c 
Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c
--- linux-2.6.19-rc6-test3/fs/ext4/mballoc.c2007-06-20 16:58:22.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c2007-06-18 
14:21:54.0 +0900
@@ -3732,8 +3732,10 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
!(EXT4_I(ar->inode)->i_state & EXT4_STATE_BLOCKS_RESERVED)) {
reserved = ar->len;
err = ext4_reserve_blocks(sb, reserved);
-   if (err)
+   if (err) {
+   *errp = err;
return err;
+   }
}
 
if (!ext4_mb_use_preallocated(&ac)) {


[RFC][PATCH 8/10] Release reserved block

2007-06-20 Thread Takashi Sato
- Release the reserved blocks if defrag fails.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c  
2007-06-19 20:19:14.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c  
2007-06-18 14:23:07.0 +0900
@@ -3066,6 +3066,10 @@ int ext4_ext_ioctl(struct inode *inode, 
 
err = ext4_ext_defrag_victim(filp, &ext_info);
 
+   } else if (cmd == EXT4_IOC_BLOCK_RELEASE) {
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   ext4_discard_reservation(inode);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
} else if (cmd == EXT4_IOC_DEFRAG) {
struct ext4_ext_defrag_data defrag;
 


[RFC][PATCH 7/10] Reserve freed blocks

2007-06-20 Thread Takashi Sato
- Reserve the free blocks in the target area so that they are not
  used by other processes.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c  
2007-06-19 21:40:55.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c  
2007-06-19 20:19:14.0 +0900
@@ -2619,6 +2619,182 @@ out:
 }
 
 /**
+ * ext4_ext_defrag_reserve - reserve blocks for defrag
+ * @inode  target inode
+ * @goal   block reservation goal
+ * @len    blocks count to reserve
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+
+int ext4_ext_defrag_reserve(struct inode * inode, ext4_fsblk_t goal, int len)
+{
+   struct super_block *sb = NULL;
+   handle_t *handle = NULL;
+   struct buffer_head *bitmap_bh = NULL;
+   struct ext4_block_alloc_info *block_i;
+   struct ext4_reserve_window_node * my_rsv = NULL;
+   unsigned short windowsz = 0;
+   unsigned long group_no;
+   ext4_grpblk_t grp_target_blk;
+   int err = 0;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+
+   handle = ext4_journal_start(inode, EXT4_RESERVE_TRANS_BLOCKS);
+   if (IS_ERR(handle)) {
+   err = PTR_ERR(handle);
+   handle = NULL;
+   goto out;
+   }
+
+   if (S_ISREG(inode->i_mode) && (!EXT4_I(inode)->i_block_alloc_info)) {
+   ext4_init_block_alloc_info(inode);
+   } else if (!S_ISREG(inode->i_mode)) {
+   printk(KERN_ERR "ext4_ext_defrag_reserve:"
+" incorrect file type\n");
+   err = -1;
+   goto out;
+   }
+
+   sb = inode->i_sb;
+   if (!sb) {
+   printk("ext4_ext_defrag_reserve: nonexistent device\n");
+   err = -ENXIO;
+   goto out;
+   }
+   ext4_get_group_no_and_offset(sb, goal, &group_no,
+   &grp_target_blk);
+
+   block_i = EXT4_I(inode)->i_block_alloc_info;
+
+   if (!block_i || ((windowsz =
+   block_i->rsv_window_node.rsv_goal_size) == 0)) {
+   printk("ext4_ext_defrag_reserve: unable to reserve\n");
+   err = -1;
+   goto out;
+   }
+
+   my_rsv = &block_i->rsv_window_node;
+
+   bitmap_bh = read_block_bitmap(sb, group_no);
+   if (!bitmap_bh) {
+   err = -ENOSPC;
+   goto out;
+   }
+
+   BUFFER_TRACE(bitmap_bh, "get undo access for new block");
+   err = ext4_journal_get_undo_access(handle, bitmap_bh);
+   if (err)
+   goto out;
+
+   err = alloc_new_reservation(my_rsv, grp_target_blk, sb,
+   group_no, bitmap_bh);
+   if (err < 0) {
+   printk(KERN_ERR "defrag: reservation failed\n");
+   ext4_discard_reservation(inode);
+   goto out;
+   } else {
+   if (len > EXT4_DEFAULT_RESERVE_BLOCKS) {
+   try_to_extend_reservation(my_rsv, sb,
+   len - EXT4_DEFAULT_RESERVE_BLOCKS);
+   }
+   }
+
+out:
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   ext4_journal_release_buffer(handle, bitmap_bh);
+   brelse(bitmap_bh);
+
+   if (handle)
+   ext4_journal_stop(handle);
+
+   return err;
+}
+
+int goal_in_my_reservation(struct ext4_reserve_window *, ext4_grpblk_t,
+   unsigned int, struct super_block *);
+int rsv_is_empty(struct ext4_reserve_window *);
+
+/**
+ * ext4_ext_block_within_rsv - Is target extent reserved ?
+ * @ inode   inode of target file
+ * @ ex_start   start physical block number of the extent
+ *   which already moved
+ * @ ex_len  block length of the extent which already moved
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_block_within_rsv(struct inode *inode,
+   ext4_fsblk_t ex_start, int ex_len)
+{
+   struct super_block *sb = inode->i_sb;
+   struct ext4_block_alloc_info *block_i;
+   unsigned long group_no;
+   ext4_grpblk_t grp_blk;
+   struct ext4_reserve_window_node *rsv;
+
+   block_i = EXT4_I(inode)->i_block_alloc_info;
+   if (block_i && block_i->rsv_window_node.rsv_goal_size > 0) {
+   rsv = &block_i->rsv_window_node;
+   if (rsv_is_empty(&rsv->rsv_window)) {
+   printk("defrag: Can't defrag due to"
+   " the empty reservation\n");
+   return -1;
+   }
+   } else {
+   printk("

[RFC][PATCH 6/10] Move files from target block group to other block group

2007-06-20 Thread Takashi Sato
- To create contiguous free blocks, move files from the target block group
  to another block group.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
2007-06-20 08:27:44.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c  
2007-06-19 21:40:55.0 +0900
@@ -1279,20 +1279,20 @@ ext4_can_extents_be_merged(struct inode 
 }
 
 /*
- * ext4_ext_insert_extent:
- * tries to merge requsted extent into the existing extent or
- * inserts requested extent as new one into the tree,
- * creating new leaf in the no-space case.
+ * ext4_ext_insert_extent_defrag:
+ * The difference from ext4_ext_insert_extent is to use the first block
+ * in newext as the goal of the new index block.
  */
-int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
+int ext4_ext_insert_extent_defrag(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *newext)
+   struct ext4_extent *newext, int defrag)
 {
struct ext4_extent_header * eh;
struct ext4_extent *ex, *fex;
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
int depth, len, err, next;
+   ext4_fsblk_t defrag_goal;
 
BUG_ON(newext->ee_len == 0);
depth = ext_depth(inode);
@@ -1342,11 +1342,17 @@ repeat:
  le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));
}
 
+   if (defrag) {
+   defrag_goal = ext_pblock(newext);
+   } else {
+   defrag_goal = 0;
+   }
/*
 * There is no free space in the found leaf.
 * We're gonna add a new leaf in the tree.
 */
-   err = ext4_ext_create_new_leaf(handle, inode, path, newext);
+   err = ext4_ext_create_new_leaf(handle, inode, path,
+   newext, defrag_goal);
if (err)
goto cleanup;
depth = ext_depth(inode);
@@ -1438,6 +1444,19 @@ cleanup:
return err;
 }
 
+/*
+ * ext4_ext_insert_extent:
+ * tries to merge requested extent into the existing extent or
+ * inserts requested extent as new one into the tree,
+ * creating new leaf in the no-space case.
+ */
+int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_extent *newext)
+{
+   return ext4_ext_insert_extent_defrag(handle, inode, path, newext, 0);
+}
+
 int ext4_ext_walk_space(struct inode *inode, unsigned long block,
unsigned long num, ext_prepare_callback func,
void *cbdata)
@@ -2600,6 +2619,70 @@ out:
 }
 
 /**
+ * ext4_ext_defrag_victim - Create free space for defrag
+ * @filp  target file
+ * @ex_info   target extents array to move
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_defrag_victim(struct file *target_filp,
+   struct ext4_extents_info *ex_info)
+{
+   struct inode *target_inode = target_filp->f_dentry->d_inode;
+   struct super_block *sb = target_inode->i_sb;
+   struct file victim_file;
+   struct dentry victim_dent;
+   struct inode *victim_inode;
+   ext4_fsblk_t goal = ex_info->goal;
+   int ret = 0;
+   int i = 0;
+   int flag = DEFRAG_RESERVE_BLOCKS_SECOND;
+   struct ext4_extent_data ext;
+   unsigned long group;
+   ext4_grpblk_t grp_off;
+
+   /* Setup dummy extent data */
+   ext.len = 0;
+
+   /* Get the inode of the victim file */
+   victim_inode = iget(sb, ex_info->ino);
+   if (!victim_inode)
+   return -EACCES;
+
+   /* Setup file for the victim file */
+   victim_dent.d_inode = victim_inode;
+   victim_file.f_dentry = &victim_dent;
+
+   /* Set the goal appropriate offset */
+   if (goal == -1) {
+   ext4_get_group_no_and_offset(victim_inode->i_sb,
+   ex_info->ext[0].start, &group, &grp_off);
+   goal = ext4_group_first_block_no(sb, group + 1);
+   }
+
+   for (i = 0; i < ex_info->entries; i++ ) {
+   /* Move original blocks to another block group */
+   if ((ret = ext4_ext_defrag(&victim_file, ex_info->ext[i].block,
+   ex_info->ext[i].len, goal, flag, &ext)) < 0)
+   goto ERR;
+
+   /* Sync journal blocks before reservation */
+   if (do_fsync(target_filp, 0))
+   goto ERR;
+   }

[RFC][PATCH 5/10] Get all extents information of specified inode number

2007-06-20 Thread Takashi Sato
- Get all extent information for the specified inode number in order to
  calculate the combination of extents that should be moved to another
  block group.
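
The patch that follows hands back at most a fixed number of extents per call
and expects the caller to call again from an updated offset until the last
extent has been reported. The loop on the calling side might look roughly
like this. It is a userspace sketch only: MAX_ENT, struct extents_info and
get_extents_batch() are hypothetical stand-ins for DEFRAG_MAX_ENT, the
patch's ext4_extents_info and the ioctl, and the return convention (0 = more
extents remain, 1 = last one reported) is taken from the function comment in
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENT 4   /* hypothetical batch size; the patch uses DEFRAG_MAX_ENT */

struct extent {
    unsigned long block;   /* first logical block of the extent */
    unsigned long len;     /* length in blocks */
};

struct extents_info {
    unsigned long offset;          /* in: first logical block to report */
    int entries;                   /* out: extents returned by this call */
    struct extent ext[MAX_ENT];    /* out: the extents themselves */
};

/*
 * Stand-in for the ioctl: reports extents of `file` (sorted by block)
 * starting at info->offset. Returns 0 if more extents remain and 1 once
 * the last extent has been reported.
 */
static int get_extents_batch(struct extents_info *info,
                             const struct extent *file, size_t total)
{
    size_t i = 0;

    info->entries = 0;
    /* Skip extents that end at or before the requested offset. */
    while (i < total && file[i].block + file[i].len <= info->offset)
        i++;
    while (i < total && info->entries < MAX_ENT) {
        info->ext[info->entries++] = file[i];
        info->offset = file[i].block + file[i].len;   /* resume point */
        i++;
    }
    return i == total ? 1 : 0;
}

/* Caller loop: keep asking until the stand-in reports the last extent. */
static unsigned long count_blocks(const struct extent *file, size_t total)
{
    struct extents_info info = { 0 };
    unsigned long blocks = 0;
    int last, k;

    do {
        last = get_extents_batch(&info, file, total);
        for (k = 0; k < info.entries; k++)
            blocks += info.ext[k].len;
    } while (!last);
    return blocks;
}
```

Keeping the resume point inside the structure means the caller does not need
to track how many extents it has already seen.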

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c   
2007-06-20 08:50:57.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
2007-06-20 08:27:44.0 +0900
@@ -43,6 +43,12 @@
 #include 
 #include 
 
+#define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32)
+#define EXT_SET_EXTENT_DATA(src, dest)  do {\
+   dest.block = le32_to_cpu(src->ee_block);\
+   dest.start = ext_pblock(src);\
+   dest.len = le16_to_cpu(src->ee_len);\
+   } while (0)
 /*
  * ext_pblock:
  * combine low and high parts of physical block number into ext4_fsblk_t
@@ -2479,6 +2485,121 @@ ext4_ext_next_extent(struct inode *inode
 }
 
 /**
+ * ext4_ext_extents_info() - get extents information
+ *
+ * @ext_info:   pointer to ext4_extents_info
+ *  @ext_info->ino  describe an inode which is used to get extent
+ *  information
+ *  @ext_info->max_entries: defined by DEFRAG_MAX_ENT
+ *  @ext_info->entries:amount of extents (output)
+ *  @ext_info->ext[]:   array of extent (output)
+ *  @ext_info->offset:  starting block offset of targeted extent
+ *  (file relative)
+ *
+ * @sb: for iget()
+ *
+ * This function returns 0 if next extent(s) exists,
+ * or returns 1 if next extent doesn't exist, otherwise returns error value.
+ * Called under truncate_mutex lock.
+ */
+static int ext4_ext_extents_info(struct ext4_extents_info *ext_info,
+   struct super_block *sb)
+{
+   struct ext4_ext_path *path = NULL;
+   struct ext4_extent *ext = NULL;
+   struct inode *inode = NULL;
+   unsigned long offset = ext_info->offset;
+   int max = ext_info->max_entries;
+   int is_last_extent = 0;
+   int depth = 0;
+   int entries = 0;
+   int err = 0;
+
+   inode = iget(sb, ext_info->ino);
+   if (!inode)
+   return -EACCES;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+
+   /* if a file doesn't exist*/
+   if ((!inode->i_nlink) || (inode->i_ino < 12) ||
+   !S_ISREG(inode->i_mode)) {
+   ext_info->entries = 0;
+   err = -ENOENT;
+   goto out;
+   }
+
+   path = ext4_ext_find_extent(inode, offset, NULL);
+   if (IS_ERR(path)) {
+   err = PTR_ERR(path);
+   path = NULL;
+   goto out;
+   }
+   depth = ext_depth(inode);
+   ext = path[depth].p_ext;
+   EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]);
+   entries = 1;
+
+   /*
+* The ioctl can return 'max' ext4_extent_data per a call,
+* so if @inode has > 'max' extents, we must get away here.
+*/
+   while (entries < max) {
+   is_last_extent = ext4_ext_next_extent(inode, path, &ext);
+   /* found next extent (not the last one)*/
+   if (is_last_extent == 0) {
+   EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]);
+   entries++;
+
+   /*
+* If @inode has > 'max' extents,
+* this function should be called again,
+* (per a call, it can resolve only 'max' extents)
+* next time we have to start from 'max*n+1'th extent.
+*/
+   if (entries == max) {
+   ext_info->offset =
+   le32_to_cpu(ext->ee_block) +
+   le32_to_cpu(ext->ee_len);
+   /* check the extent is the last one or not*/
+   is_last_extent =
+   ext4_ext_next_extent(inode, path, &ext);
+   if (is_last_extent) {
+   is_last_extent = 1;
+   err = is_last_extent;
+   } else if (is_last_extent < 0) {
+   /*ERR*/
+   err = is_last_extent;
+   goto out;
+   }
+   break;
+   }
+
+   /* the extent is the last one */
+   } else if (is_last_exte

[RFC][PATCH 4/10] Get free blocks distribution of the target block group

2007-06-20 Thread Takashi Sato
- Get the free-block distribution of the target block group to find out
  how many free blocks it has.
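
At its core, the scan in this patch collects runs of free blocks from a block
bitmap. A standalone sketch of that idea is given here, assuming a plain byte
array in which a set bit means "in use". struct run, block_in_use() and
free_runs() are hypothetical names, and the journal-aware bitmap lookup used
by the kernel code is left out.

```c
#include <assert.h>
#include <stddef.h>

struct run {
    unsigned long start;   /* first free block of the run */
    unsigned long len;     /* number of consecutive free blocks */
};

/* Bit i set means block i is in use (assumed convention for this sketch). */
static int block_in_use(const unsigned char *bitmap, unsigned long i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/*
 * Collect up to max_runs runs of free blocks in [0, nblocks).
 * Returns the number of runs stored in out[].
 */
static size_t free_runs(const unsigned char *bitmap, unsigned long nblocks,
                        struct run *out, size_t max_runs)
{
    size_t n = 0;
    unsigned long len = 0, start = 0, i;

    for (i = 0; i < nblocks && n < max_runs; i++) {
        if (!block_in_use(bitmap, i)) {
            if (len == 0)
                start = i;      /* first free block of a new run */
            len++;
        } else if (len) {
            out[n].start = start;
            out[n].len = len;
            n++;
            len = 0;
        }
    }
    /* A run that reaches the end of the group is still pending here. */
    if (len && n < max_runs) {
        out[n].start = start;
        out[n].len = len;
        n++;
    }
    return n;
}
```

Stopping once max_runs entries are filled mirrors the way the patch stops at
max_entries and leaves the rest of the group for a later call.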

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c   
2007-06-20 09:05:37.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c   
2007-06-20 08:50:57.0 +0900
@@ -2478,6 +2478,99 @@ ext4_ext_next_extent(struct inode *inode
return 1;
 }
 
+/**
+ * ext4_ext_fblocks_distribution - Search free block distribution
+ * @filp  target file
+ * @ex_info   ext4_extents_info
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_fblocks_distribution(struct inode *inode,
+   struct ext4_extents_info *ext_info)
+{
+   handle_t *handle;
+   struct buffer_head *bitmap_bh = NULL;
+   struct super_block *sb = inode->i_sb;
+   struct ext4_super_block *es;
+   unsigned long group_no;
+   int max_entries = ext_info->max_entries;
+   ext4_grpblk_t blocks_per_group;
+   ext4_grpblk_t start;
+   ext4_grpblk_t end;
+   int num = 0;
+   int len = 0;
+   int i = 0;
+   int err = 0;
+   int block_set = 0;
+   int start_block = 0;
+
+   if (!sb) {
+   printk("ext4_ext_fblocks_distribution: nonexistent device\n");
+   return -ENOSPC;
+   }
+   es = EXT4_SB(sb)->s_es;
+
+   group_no = (inode->i_ino -1) / EXT4_INODES_PER_GROUP(sb);
+   start = ext_info->offset;
+   blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+   end = blocks_per_group -1;
+
+   handle = ext4_journal_start(inode, 1);
+   if (IS_ERR(handle)) {
+   err = PTR_ERR(handle);
+   return err;
+   }
+
+   bitmap_bh = read_block_bitmap(sb, group_no);
+   if (!bitmap_bh) {
+   err = -EIO;
+   goto out;
+   }
+
+   BUFFER_TRACE(bitmap_bh, "get undo access for new block");
+   err = ext4_journal_get_undo_access(handle, bitmap_bh);
+   if (err)
+   goto out;
+
+   for (i = start; i <= end ; i++) {
+   if (bitmap_search_next_usable_block(i, bitmap_bh, i+1) >= 0) {
+   len++;
+   /* if the free block is the first one in a region */
+   if (!block_set) {
+   start_block =
+   i + group_no * blocks_per_group;
+   block_set = 1;
+   }
+   } else if (len) {
+   ext_info->ext[num].start = start_block;
+   ext_info->ext[num].len = len;
+   num++;
+   len = 0;
+   block_set = 0;
+   if (num == max_entries) {
+   ext_info->offset = i + 1;
+   break;
+   }
+   }
+   if ((i == end) && len) {
+   ext_info->ext[num].start = start_block;
+   ext_info->ext[num].len = len;
+   num++;
+   }
+   }
+
+   ext_info->entries = num;
+out:
+   ext4_journal_release_buffer(handle, bitmap_bh);
+   brelse(bitmap_bh);
+
+   if (handle)
+   ext4_journal_stop(handle);
+
+   return err;
+}
+
 int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
 {
@@ -2545,6 +2638,21 @@ int ext4_ext_ioctl(struct inode *inode, 
if (copy_to_user((struct ext4_group_data_info *)arg,
&grp_data, sizeof(grp_data)))
return -EFAULT;
+   } else if (cmd == EXT4_IOC_FREE_BLOCKS_INFO) {
+   struct ext4_extents_info ext_info;
+
+   if (copy_from_user(&ext_info,
+   (struct ext4_extents_info __user *)arg,
+   sizeof(ext_info)))
+   return -EFAULT;
+
+   BUG_ON(ext_info.ino != inode->i_ino);
+
+   err = ext4_ext_fblocks_distribution(inode, &ext_info);
+
+   if (!err)
+   err = copy_to_user((struct ext4_extents_info*)arg,
+   &ext_info, sizeof(ext_info));
} else if (cmd == EXT4_IOC_DEFRAG) {
struct ext4_ext_defrag_data defrag;
 


[RFC][PATCH 3/10] Get block group information

2007-06-20 Thread Takashi Sato
- Get s_blocks_per_group and s_inodes_per_group of the target filesystem.
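
Once user space knows these two per-group counts, mapping a block or inode
number to its block group is simple arithmetic. A minimal sketch, assuming
the usual ext layout in which group 0 starts at the first data block and
inode numbers start at 1. struct fs_geometry and both helpers are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long fsblk_t;   /* mirrors the patch's ext4_fsblk_t */

/* Hypothetical subset of the superblock geometry. */
struct fs_geometry {
    fsblk_t first_data_block;
    unsigned long blocks_per_group;
    unsigned long inodes_per_group;
};

/* Split a filesystem-wide block number into (group, offset within group). */
static void block_to_group(const struct fs_geometry *g, fsblk_t blocknr,
                           unsigned long *group, unsigned long *offset)
{
    assert(g->blocks_per_group != 0 && blocknr >= g->first_data_block);

    blocknr -= g->first_data_block;
    *group = (unsigned long)(blocknr / g->blocks_per_group);
    *offset = (unsigned long)(blocknr % g->blocks_per_group);
}

/* Inode numbers start at 1, hence the "- 1" before dividing. */
static unsigned long inode_to_group(const struct fs_geometry *g,
                                    unsigned long ino)
{
    assert(ino >= 1 && g->inodes_per_group != 0);
    return (ino - 1) / g->inodes_per_group;
}
```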

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test1/fs/ext4/balloc.c 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c
--- linux-2.6.19-rc6-test1/fs/ext4/balloc.c 2007-06-20 15:15:46.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c
2007-06-20 14:57:04.0 +0900
@@ -216,7 +216,7 @@ restart:
  * If the goal block is within the reservation window, return 1;
  * otherwise, return 0;
  */
-static int
+int
 goal_in_my_reservation(struct ext4_reserve_window *rsv, ext4_grpblk_t grp_goal,
unsigned int group, struct super_block * sb)
 {
@@ -336,7 +336,7 @@ static void rsv_window_remove(struct sup
  *
  * returns 1 if the end block is EXT4_RESERVE_WINDOW_NOT_ALLOCATED.
  */
-static inline int rsv_is_empty(struct ext4_reserve_window *rsv)
+inline int rsv_is_empty(struct ext4_reserve_window *rsv)
 {
/* a valid reservation end block could not be 0 */
return rsv->_rsv_end == EXT4_RESERVE_WINDOW_NOT_ALLOCATED;
@@ -660,7 +660,7 @@ static int ext4_test_allocatable(ext4_gr
  * bitmap on disk and the last-committed copy in journal, until we find a
  * bit free in both bitmaps.
  */
-static ext4_grpblk_t
+ext4_grpblk_t
 bitmap_search_next_usable_block(ext4_grpblk_t start, struct buffer_head *bh,
ext4_grpblk_t maxblocks)
 {
@@ -1029,7 +1029,7 @@ static int find_next_reservable_window(
  * @bitmap_bh: the block group block bitmap
  *
  */
-static int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv,
+int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv,
ext4_grpblk_t grp_goal, struct super_block *sb,
unsigned int group, struct buffer_head *bitmap_bh)
 {
@@ -1173,7 +1173,7 @@ retry:
  * expand the reservation window size if necessary on a best-effort
  * basis before ext4_new_blocks() tries to allocate blocks,
  */
-static void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv,
+void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv,
struct super_block *sb, int size)
 {
struct ext4_reserve_window_node *next_rsv;
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test1/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c
--- linux-2.6.19-rc6-test1/fs/ext4/extents.c2007-06-20 15:42:15.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c   
2007-06-20 15:50:14.0 +0900
@@ -43,7 +43,6 @@
 #include 
 #include 
 
-
 /*
  * ext_pblock:
  * combine low and high parts of physical block number into ext4_fsblk_t
@@ -206,11 +205,17 @@ static ext4_fsblk_t ext4_ext_find_goal(s
 static ext4_fsblk_t
 ext4_ext_new_block(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *ex, int *err)
+   struct ext4_extent *ex, int *err,
+   ext4_fsblk_t defrag_goal)
 {
ext4_fsblk_t goal, newblock;
 
-   goal = ext4_ext_find_goal(inode, path, le32_to_cpu(ex->ee_block));
+   if (defrag_goal) {
+   goal = defrag_goal;
+   } else {
+   goal= ext4_ext_find_goal(inode, path, 
+   le32_to_cpu(ex->ee_block));
+   }
newblock = ext4_new_block(handle, inode, goal, err);
return newblock;
 }
@@ -598,7 +603,8 @@ static int ext4_ext_insert_index(handle_
  */
 static int ext4_ext_split(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *newext, int at)
+   struct ext4_extent *newext, int at,
+   ext4_fsblk_t defrag_goal)
 {
struct buffer_head *bh = NULL;
int depth = ext_depth(inode);
@@ -649,7 +655,8 @@ static int ext4_ext_split(handle_t *hand
/* allocate all needed blocks */
ext_debug("allocate %d blocks for indexes/leaf\n", depth - at);
for (a = 0; a < depth - at; a++) {
-   newblock = ext4_ext_new_block(handle, inode, path, newext, 
&err);
+   newblock = ext4_ext_new_block(handle, inode, path, newext, &err,
+   defrag_goal);
if (newblock == 0)
goto cleanup;
ablocks[a] = newblock;
@@ -836,7 +843,8 @@ cleanup:
  */
 static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *newext)
+   

[RFC][PATCH 1/10] Allocate new contiguous blocks

2007-06-20 Thread Takashi Sato
Search for contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff linux-2.6.19-rc6-Alex/fs/ext4/extents.c linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c
--- linux-2.6.19-rc6-Alex/fs/ext4/extents.c 2007-06-19 20:50:56.0 +0900
+++ linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c  2007-06-20 10:54:11.0 +0900
@@ -2335,6 +2335,713 @@ int ext4_ext_calc_metadata_amount(struct
return num;
 }
 
+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+   ext4_fsblk_t start;
+   int buflen;
+   void *buffer;
+   void *cur;
+   int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+   int depth;
+   int extents_num;
+   int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *newex,
+   struct ext4_extent_buf *buf)
+{
+
+   if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   if (buf->err < 0)
+   return EXT_BREAK;
+   if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+   return EXT_BREAK;
+
+   if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+   buf->err++;
+   buf->cur += sizeof(*newex);
+   } else {
+   buf->err = -EFAULT;
+   return EXT_BREAK;
+   }
+   return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *ex,
+   struct ext4_extent_tree_stats *buf)
+{
+   int depth;
+
+   if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   depth = ext_depth(inode);
+   buf->extents_num++;
+   if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+   buf->leaf_num++;
+   return EXT_CONTINUE;
+}
+
+/**
+ * ext4_ext_next_extent - search for next extent and set it to "extent"
+ * @inode:  inode of the original file
+ * @path:   this will obtain data for next extent
+ * @extent: pointer to next extent we have just gotten
+ *
+ * This function returns 0 or 1(last_entry) if succeeded, otherwise
+ * returns -EIO
+ */
+static int
+ext4_ext_next_extent(struct inode *inode,
+struct ext4_ext_path *path,
+struct ext4_extent **extent)
+{
+   int ppos;
+   int leaf_ppos = path->p_depth;
+
+   ppos = leaf_ppos;
+   if (EXT_LAST_EXTENT(path[ppos].p_hdr) > path[ppos].p_ext) {
+   /* leaf block */
+   *extent = ++path[ppos].p_ext;
+   return 0;
+   }
+
+   while (--ppos >= 0) {
+   if (EXT_LAST_INDEX(path[ppos].p_hdr) >
+   path[ppos].p_idx) {
+   int cur_ppos = ppos;
+
+   /* index block */
+   path[ppos].p_idx++;
+   path[ppos].p_block =
+   idx_pblock(path[ppos].p_idx);
+   if (path[ppos+1].p_bh)
+   brelse(path[ppos+1].p_bh);
+   path[ppos+1].p_bh =
+   sb_bread(inode->i_sb, path[ppos].p_block);
+   if (!path[ppos+1].p_bh)
+   return  -EIO;
+   path[ppos+1].p_hdr =
+   ext_block_hdr(path[ppos+1].p_bh);
+
+   /* halfway index block */
+   while (++cur_ppos < leaf_ppos) {
+   path[cur_ppos].p_idx =
+   EXT_FIRST_INDEX(path[cur_ppos].p_hdr);
+   path[cur_ppos].p_block =
+   idx_pblock(path[cur_ppos].p_idx);
+   if (path[cur_ppos+1].p_bh)
+   brelse(path[cur_ppos+1].p_bh);
+   path[cur_ppos+1].p_bh = sb_bread(inode->i_sb,
+   path[cur_ppos].p_block);
+   if (!path[cur_ppos+1].p_bh)
+   return  -EIO;
+   path[cur_ppos+1].p_hdr =
+   ext_block_hdr(path[cur_ppos+1].p_bh);
+   }
+
+   /* leaf block */
+   path[leaf_ppos].p_

[RFC][PATCH 2/10] Move the file data to the new blocks

2007-06-20 Thread Takashi Sato
Move the blocks on the temporary inode to the original inode,
one page at a time.
1. Read the file data from the old blocks to the page
2. Move the block on the temporary inode to the original inode
3. Write the file data on the page into the new blocks

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c linux-2.6.19-rc6-2-move/fs/ext4/extents.c
--- linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c  2007-06-20 10:54:11.0 +0900
+++ linux-2.6.19-rc6-2-move/fs/ext4/extents.c   2007-06-20 11:00:45.0 +0900
@@ -2533,6 +2533,565 @@ int ext4_ext_ioctl(struct inode *inode, 
 }
 
 /**
+ * ext4_ext_merge_across - merge extents across leaf block
+ *
+ * @handle     journal handle
+ * @inode      target file's inode
+ * @o_start    first original extent to be defraged
+ * @o_end      last original extent to be defraged
+ * @start_ext  first new extent to be merged
+ * @new_ext    middle of new extent to be merged
+ * @end_ext    last new extent to be merged
+ *
+ * This function returns 0 if succeed, otherwise returns error value.
+ */
+static int
+ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode,
+   struct ext4_extent *o_start,
+   struct ext4_extent *o_end, struct ext4_extent *start_ext,
+   struct ext4_extent *new_ext, struct ext4_extent *end_ext,
+   int flag)
+{
+   struct ext4_ext_path *org_path = NULL;
+   unsigned long eblock = 0;
+   int err = 0;
+   int new_flag = 0;
+   int end_flag = 0;
+   int defrag_flag;
+
+   if (flag == DEFRAG_RESERVE_BLOCKS_SECOND)
+   defrag_flag = 1;
+   else
+   defrag_flag = 0;
+
+   if (le16_to_cpu(start_ext->ee_len) &&
+   le16_to_cpu(new_ext->ee_len) &&
+   le16_to_cpu(end_ext->ee_len)) {
+
+   if ((o_start) == (o_end)) {
+
+   /*   start_ext   new_ext   end_ext
+* dest |-|---||
+* org  |--|
+*/
+
+   end_flag = 1;
+   } else {
+
+   /*   start_ext   new_ext   end_ext
+* dest |-|--|-|
+* org  |---|--|
+*/
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+   }
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (!le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*   start_ext  new_ext
+* dest |--|---|
+* org  |--|
+*/
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((!le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*new_ext   end_ext
+* dest |--|---|
+* org  |--|
+*/
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+
+   /* If new_ext was first block */
+   if (!new_ext->ee_block)
+   eblock = 0;
+   else
+   eblock = le32_to_cpu(new_ext->ee_block);
+
+   new_flag = 1;
+   } else {
+   printk("Unexpected case \n");
+   return -EIO;
+   }
+
+   if (new_flag) {
+   org_path = ext4_ext_find_extent(inode, eblock, NULL);
+   if (IS_ERR(org_path)) {
+   err = PTR_ERR(org_path);
+   org_path = NULL;
+   goto ERR;
+   }
+   err = ext4_ext_insert_extent_defrag(handle, inode,
+   org_path, new_ext, defrag_flag);
+   if (err)
+   goto ERR;
+   }
+
+   if (end_flag) {
+   org_path = ext4_ext_find_extent(inode,
+   end_ext->ee_block -1, org_path);
+   if (IS_ERR(org_path)) {
+   err = PTR_ERR(org_path);
+   org_path = NULL;
+  

[RFC][PATCH 0/10] ext4 online defrag (ver 0.5)

2007-06-20 Thread Takashi Sato
Hi all,

I have updated my online defrag patchset to add a new feature:
defragmentation of free space.
If the filesystem has insufficient contiguous free blocks, defrag tries to
move other files to make sufficient space and then reallocates contiguous
blocks for the target file.

This feature can be used as follows:
# e4defrag -f filename [blockno]

To create contiguous free blocks, the target file is reallocated to
the block group to which its inode belongs.
If "blockno" is given, defrag tries to move other files (except the target
file) to the indicated physical block offset; otherwise defrag tries to move
them to the next block group to which its inode belongs.

The maximum size of the target file is the maximum size
that one block group can hold.

This time I have added 6 ioctls for the new feature;
they are used in the following order.

Additional ioctls:
- EXT4_IOC_GROUP_INFO
- EXT4_IOC_FREE_BLOCKS_INFO
- EXT4_IOC_EXTENTS_INFO
- EXT4_IOC_MOVE_VICTIM
- EXT4_IOC_RESERVE_BLOCK
- EXT4_IOC_BLOCK_RELEASE

1). Get s_blocks_per_group and s_inodes_per_group of target file.
(EXT4_IOC_GROUP_INFO)

In userspace, calculate block group number to which target file belongs with
the result of "1".

2). Get free blocks information of the target block group.
(EXT4_IOC_FREE_BLOCKS_INFO)
Read the block bitmap of the target block group, then store the
free block distribution in the ext4_extents_info structure
as an array of extents. Finally, return it to userspace.

3). Get all extents information of indicated inode number.
(EXT4_IOC_EXTENTS_INFO)
Store the extents information of the indicated inode number
in the ext4_extent_info structure, then return it to userspace.

In userspace, call ioctl(EXT4_IOC_EXTENTS_INFO) for all of the inodes
in the target group and, from the results of 2) and 3), calculate the
combination of extents which should be moved to another block group.
Its total size will be the same as the target file's.

4). Move combination of extents from the target block group
to other block group to make free contiguous area in the target block group.
(EXT4_IOC_MOVE_VICTIM)

5). Reserve freed blocks of the target block group.
(EXT4_IOC_RESERVE_BLOCK)

6). Reallocate target file to reserved contiguous blocks with ext4_ext_defrag().
(EXT4_IOC_DEFRAG)
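
The userspace calculation mentioned after step 1 (finding the block group
of the target file from the values returned by EXT4_IOC_GROUP_INFO) could
look roughly like the sketch below. It assumes the usual ext2/3/4 convention
that inode numbers start at 1 and that each group holds s_inodes_per_group
inodes; the helper names and the first_data_block parameter are illustrative
and are not taken from the e4defrag source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map an inode number to the block group that
 * holds it.  Inode numbers are 1-based, so subtract 1 before dividing.
 */
static uint32_t inode_to_group(uint32_t ino, uint32_t inodes_per_group)
{
	assert(ino >= 1 && inodes_per_group > 0);
	return (ino - 1) / inodes_per_group;
}

/*
 * Hypothetical helper: first block of a group, assuming the caller
 * also knows the filesystem's first data block (0 or 1).
 */
static uint64_t group_first_block(uint32_t group, uint32_t blocks_per_group,
				  uint32_t first_data_block)
{
	return (uint64_t)group * blocks_per_group + first_data_block;
}
```

With 8192 inodes per group, inode 8192 is still in group 0 and inode 8193
is the first inode of group 1.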

Current status:
These patches are at an experimental stage, so they still have issues and
items to improve, but I think they are worth examining.

Dependencies:
My patches depend on Alex's multi-block allocation patches
for Linux 2.6.19-rc6:
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of extent tree and the number of leaf nodes
  after defragmentation.
- The blocks on the temporary inode are moved to the original inode
  one page at a time in the current implementation.  I have to tune
  the number of pages moved at once for performance.
- Update the base kernel version when Alex's multi-block allocation patch
  is updated. 

Next steps:
- Carry out the movement of data as an atomic transaction.
- Reduce the influence of defrag on other processes with fadvise().


Summary of patches:
*These patches apply on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/10] Allocate new contiguous blocks with Alex's mballoc
- Search contiguous free blocks and allocate them for the temporary
  inode with Alex's multi-block allocation.

[PATCH 2/10] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode,
  one page at a time.

[PATCH 3/10] Get block group information
- Get s_blocks_per_group and s_inodes_per_group of target filesystem.

[PATCH 4/10] Get free blocks distribution of the target block group
- Get free blocks distribution of the target block group to know
  how many free blocks it has.

[PATCH 5/10] Get all extents information of indicated inode number
- Get all extents information of indicated inode number to calculate
  the combination of extents which should be moved to other block group.

[PATCH 6/10] Move files from target block group to other block group
- To make contiguous free blocks, move files from the target block group
  to other block group. 

[PATCH 7/10] Reserve freed blocks
- Reserve the free blocks in the target area so that they are not
  used by other processes.

[PATCH 8/10] Release reserved blocks
- Release reserved blocks if defrag failed.

[PATCH 9/10] Fix bugs in multi-block allocation and locality-group
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all of dirty inodes.
- Fix ext4_mb_new_blocks() to return err value when defrag failed.

[PATCH 10/10] Online defrag command
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
# e4defrag -r directory-name
  o Defrag for free space fragmentation.
# e4defrag -f file-name
  o Defrag for a single file.
# e4defrag file-name
  o

ext4 patch queue rebased to linux2.6.22-rc5

2007-06-20 Thread Mingming Cao
http://repo.or.cz/w/ext4-patch-queue.git

diff --git a/series b/series
index d68345c..766f3eb 100644
--- a/series
+++ b/series
@@ -1,4 +1,4 @@
-# Rebased the patches to 2.6.22-rc4
+# Rebased the patches to 2.6.22-rc5
 
 # Add mount option to turn off extents
 ext4_noextent_mount_opt.patch
@@ -60,6 +60,14 @@ jbd-stats-through-procfs
 # Remove 32000 subdirs limit. 
 ext4_remove_subdirs_limit.patch
 
+# Add unused inode watermark and checksum to blockgroup descriptors
+ext4_uninit_blockgroup.patch
+# need sign off
+update-uninit.patch
+
+# Need sing off and fix coding style
+# Add journal checksums
+ext4-journal_chksum-2.6.20.patch
 ##
 # Unstable patches
 # Note: still lots of outstanding comments from linux-ext4 list, 12/2006
@@ -81,3 +89,8 @@ ext4_nodelalloc_mount_opt.patch
 
 # Fix error returned from ext4_reserve_global
 ext4_reserve_global_return_error_fix.patch
+
+#Fix invariant checking in ext4_rebalance_reservation()
+#Missing signed offs
+ext4_rebalance_reservation_invariant_checking_fix.patch
+ext4_delalloc_setpageprivate_fix.patch

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] zero_user_page conversion

2007-06-20 Thread Mark Fasheh
On Wed, Jun 20, 2007 at 05:08:24PM -0500, Eric Sandeen wrote:
> Use zero_user_page() in cifs, ocfs2, ext4, and gfs2 where possible.

Ok, the ocfs2 bits looked fine so I folded that part of the patch into
ocfs2.git, thanks.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]


Re: delayed allocatiou result in Oops

2007-06-20 Thread Mingming Cao
On Wed, 2007-06-20 at 12:15 +0400, Alex Tomas wrote:
> Mingming Cao wrote:
> > Hmm, PageMappedToDisk is probably not sufficient enough for pagesize !=
> > blocksize. Is that the reason we need page->private to pass the
> > request?
> 
> PageMappedToDisk isn't enough in that case, definitely. bh is the way
> to track state of each block (this is what i'm implementing now), but
> I think current nobh version (per-page flags are used) is valuable.
> 
> > That's good to know, thanks for the update. So probably above error case
> > handling will be addressed in the new version? 
> 
> well, you actually can move that SetPagePrivate() few lines above
> (Dmitriy already tested this).

Like this?

Index: linux-2.6.22-rc5/fs/ext4/writeback.c
===
--- linux-2.6.22-rc5.orig/fs/ext4/writeback.c   2007-06-20 16:41:26.0 -0700
+++ linux-2.6.22-rc5/fs/ext4/writeback.c2007-06-20 16:44:10.0 -0700
@@ -918,16 +918,16 @@ int ext4_wb_commit_write(struct file *fi
wb_debug("commit page %lu (%u-%u) for inode %lu\n",
page->index, from, to, inode->i_ino);
 
-   /* mark page private so that we get
-* called to invalidate/release page */
-   SetPagePrivate(page);
-
if (!PageBooked(page) && !PageMappedToDisk(page)) {
/* ->prepare_write() observed that block for this
 * page hasn't been allocated yet. there fore it
 * asked to reserve block for later allocation */
BUG_ON(page->private == 0);
page->private = 0;
+   /* mark page private so that we get
+* called to invalidate/release page */
+   SetPagePrivate(page);
+
err = ext4_wb_reserve_space_page(page, 1);
if (err)
return err;




Re: [36/37] Large blocksize support for ext2

2007-06-20 Thread Andreas Dilger
On Jun 20, 2007  14:27 -0700, Christoph Lameter wrote:
> > > Hmmm... Actually there is nothing additional to be done after the earlier
> > > cleanup of the macros. So just modify copyright.
> > 
> > It is NOT possible to have 64kB blocksize on ext2/3/4 without some small
> > changes to the directory handling code.  The reason is that an empty 64kB
> > directory block would have a rec_len == (__u16)2^16 == 0, and this would
> > cause an error to be hit in the filesystem.  What is needed is to put
> > 2 empty records in such a directory, or to special-case an impossible
> > value like rec_len = 0x to handle this.
> > 
> > There was a patch to fix the 64kB blocksize directory problem, but it
> > hasn't been merged anywhere yet seeing as there wasn't previously a
> > patch to allow larger blocksize...
> 
> mke2fs allows to specify a 64kb blocksize and IA64 can run with 64kb 
> PAGE_SIZE. So this is a bug in ext2fs that needs to be fixed regardless.

True.  I had increased the e2fsprogs blocksize to 16kB after testing it,
and it seems Ted later increased it to 64kB.  The 64kB
directory problem only came out recently.

> > Having 32kB blocksize has no problems that I'm aware of.  Also, I'm not
> > sure how it happened, but ext2 SHOULD have an explicit check (as
> > ext3/4 does) limiting it to EXT2_MAX_BLOCK_SIZE.  Otherwise it appears
> > that there would be no error reported if the superblock reports e.g. 16MB
> > blocksize, and all kinds of things would break.
> 
> mke2fs fails for blocksizes > 64k so you are safe there. I'd like to see 
> that limit lifted?

I don't think extN can go to past 64kB blocksize in any case.

> > There shouldn't be a problem with increasing EXT{2,3,4}_MAX_BLOCK_SIZE to
> > 32kB (AFAIK), but I haven't looked into this in a while.
> 
> I'd love to see such a patch. That is also useful for arches that have 
> PAGE_SIZE > 4kb without this patchset.

Definitely, which is why we had been working on this originally.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



[PATCH] zero_user_page conversion

2007-06-20 Thread Eric Sandeen
Use zero_user_page() in cifs, ocfs2, ext4, and gfs2 where possible.

Compile tested, reviews welcome.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4-mm2/fs/cifs/inode.c
===
--- linux-2.6.22-rc4-mm2.orig/fs/cifs/inode.c
+++ linux-2.6.22-rc4-mm2/fs/cifs/inode.c
@@ -1334,17 +1334,13 @@ static int cifs_truncate_page(struct add
pgoff_t index = from >> PAGE_CACHE_SHIFT;
unsigned offset = from & (PAGE_CACHE_SIZE - 1);
struct page *page;
-   char *kaddr;
int rc = 0;
 
page = grab_cache_page(mapping, index);
if (!page)
return -ENOMEM;
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, PAGE_CACHE_SIZE - offset, KM_USER0);
unlock_page(page);
page_cache_release(page);
return rc;
Index: linux-2.6.22-rc4-mm2/fs/ext4/inode.c
===
--- linux-2.6.22-rc4-mm2.orig/fs/ext4/inode.c
+++ linux-2.6.22-rc4-mm2/fs/ext4/inode.c
@@ -1830,7 +1830,6 @@ int ext4_block_truncate_page(handle_t *h
struct inode *inode = mapping->host;
struct buffer_head *bh;
int err = 0;
-   void *kaddr;
 
if ((EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) &&
test_opt(inode->i_sb, EXTENTS) &&
@@ -1847,10 +1846,7 @@ int ext4_block_truncate_page(handle_t *h
 */
if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 ext4_should_writeback_data(inode) && PageUptodate(page)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length, KM_USER0);
set_page_dirty(page);
goto unlock;
}
@@ -1903,10 +1899,7 @@ int ext4_block_truncate_page(handle_t *h
goto unlock;
}
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + offset, 0, length);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, offset, length, KM_USER0);
 
BUFFER_TRACE(bh, "zeroed end of block");
 
Index: linux-2.6.22-rc4-mm2/fs/gfs2/ops_address.c
===
--- linux-2.6.22-rc4-mm2.orig/fs/gfs2/ops_address.c
+++ linux-2.6.22-rc4-mm2/fs/gfs2/ops_address.c
@@ -207,10 +207,7 @@ static int stuffed_readpage(struct gfs2_
 * so we need to supply one here. It doesn't happen often.
 */
if (unlikely(page->index)) {
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr, 0, PAGE_CACHE_SIZE);
-   kunmap_atomic(kaddr, KM_USER0);
-   flush_dcache_page(page);
+   zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
SetPageUptodate(page);
return 0;
}
Index: linux-2.6.22-rc4-mm2/fs/ocfs2/aops.c
===
--- linux-2.6.22-rc4-mm2.orig/fs/ocfs2/aops.c
+++ linux-2.6.22-rc4-mm2/fs/ocfs2/aops.c
@@ -739,18 +739,13 @@ int ocfs2_map_page_blocks(struct page *p
bh = head;
block_start = 0;
do {
-   void *kaddr;
-
block_end = block_start + bsize;
if (block_end <= from)
goto next_bh;
if (block_start >= to)
break;
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr+block_start, 0, bh->b_size);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, block_start, bh->b_size, KM_USER0);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
 
@@ -895,15 +890,11 @@ static void ocfs2_zero_new_buffers(struc
if (block_end > from && block_start < to) {
if (!PageUptodate(page)) {
unsigned start, end;
-   void *kaddr;
 
start = max(from, block_start);
end = min(to, block_end);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr+start, 0, end - start);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
+   zero_user_page(page, start, end - start, KM_USER0);
set_buffer_uptodate(bh);
  

Re: [36/37] Large blocksize support for ext2

2007-06-20 Thread Christoph Lameter
On Wed, 20 Jun 2007, Andreas Dilger wrote:

> On Jun 20, 2007  11:29 -0700, [EMAIL PROTECTED] wrote:
> > This adds support for a block size of up to 64k on any platform.
> > It enables the mounting filesystems that have a larger blocksize
> > than the page size.
> 
> Might have been good to CC the ext2/3/4 maintainers here?  I definitely
> have been waiting for a patch like this for ages (so definitely no
> objection from me), but there are a few caveats before this will work
> on ext2/3/4.

The CC list is already big so I thought those would be monitoring 
linux-fsdevel.

> > Hmmm... Actually there is nothing additional to be done after the earlier
> > cleanup of the macros. So just modify copyright.
> 
> It is NOT possible to have 64kB blocksize on ext2/3/4 without some small
> changes to the directory handling code.  The reason is that an empty 64kB
> directory block would have a rec_len == (__u16)2^16 == 0, and this would
> cause an error to be hit in the filesystem.  What is needed is to put
> 2 empty records in such a directory, or to special-case an impossible
> value like rec_len = 0x to handle this.
> 
> There was a patch to fix the 64kB blocksize directory problem, but it
> hasn't been merged anywhere yet seeing as there wasn't previously a
> patch to allow larger blocksize...

mke2fs allows to specify a 64kb blocksize and IA64 can run with 64kb 
PAGE_SIZE. So this is a bug in ext2fs that needs to be fixed regardless.

> Having 32kB blocksize has no problems that I'm aware of.  Also, I'm not
> sure how it happened, but ext2 SHOULD have an explicit check (as
> ext3/4 does) limiting it to EXT2_MAX_BLOCK_SIZE.  Otherwise it appears
> that there would be no error reported if the superblock reports e.g. 16MB
> blocksize, and all kinds of things would break.

mke2fs fails for blocksizes > 64k so you are safe there. I'd like to see 
that limit lifted?

> There shouldn't be a problem with increasing EXT{2,3,4}_MAX_BLOCK_SIZE to
> 32kB (AFAIK), but I haven't looked into this in a while.

I'd love to see such a patch. That is also useful for arches that have 
PAGE_SIZE > 4kb without this patchset.



Re: [36/37] Large blocksize support for ext2

2007-06-20 Thread Andreas Dilger
On Jun 20, 2007  11:29 -0700, [EMAIL PROTECTED] wrote:
> This adds support for a block size of up to 64k on any platform.
> It enables the mounting filesystems that have a larger blocksize
> than the page size.

Might have been good to CC the ext2/3/4 maintainers here?  I definitely
have been waiting for a patch like this for ages (so definitely no
objection from me), but there are a few caveats before this will work
on ext2/3/4.

> Hmmm... Actually there is nothing additional to be done after the earlier
> cleanup of the macros. So just modify copyright.

It is NOT possible to have 64kB blocksize on ext2/3/4 without some small
changes to the directory handling code.  The reason is that an empty 64kB
directory block would have a rec_len == (__u16)2^16 == 0, and this would
cause an error to be hit in the filesystem.  What is needed is to put
2 empty records in such a directory, or to special-case an impossible
value like rec_len = 0x to handle this.
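
The wraparound described above can be reproduced in a few lines of
userspace C. This is only an illustrative sketch: store_rec_len() is a
made-up name standing in for "write a record length into a 16-bit on-disk
field", and it ignores on-disk endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Value that ends up in a 16-bit field when a record length is stored. */
static uint16_t store_rec_len(uint32_t len)
{
	return (uint16_t)len;	/* keeps only the low 16 bits */
}

/* Sanity checks illustrating the wraparound. */
static void rec_len_selfcheck(void)
{
	/* 32kB block: one empty record spanning the block still fits */
	assert(store_rec_len(32768) == 32768);
	/* 64kB block: one empty record spanning the block wraps to 0 */
	assert(store_rec_len(65536) == 0);
	/* two empty records of 32kB each avoid the problem */
	assert(store_rec_len(65536 / 2) == 32768);
}
```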

There was a patch to fix the 64kB blocksize directory problem, but it
hasn't been merged anywhere yet seeing as there wasn't previously a
patch to allow larger blocksize...

Having 32kB blocksize has no problems that I'm aware of.  Also, I'm not
sure how it happened, but ext2 SHOULD have an explicit check (as
ext3/4 does) limiting it to EXT2_MAX_BLOCK_SIZE.  Otherwise it appears
that there would be no error reported if the superblock reports e.g. 16MB
blocksize, and all kinds of things would break.

There shouldn't be a problem with increasing EXT{2,3,4}_MAX_BLOCK_SIZE to
32kB (AFAIK), but I haven't looked into this in a while.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [PATCH] Endianness bugs in e2fsck

2007-06-20 Thread Kalpak Shah
On Wed, 2007-06-20 at 11:09 -0400, Theodore Tso wrote:
> Hi Kalpak, 
> 
>   In the future it would be really helpful if you split up your
> patches so that each different thing is done in separate patches.

Sure. Do you want me to split this patch too and resend?

> 
>   Also, note there is a recent bug fix in this area, and the
> byte-swapping extended attributes.  The issues involved here are
> subtle; please see the discussion here:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=232663
> 
> So before your patches go in, we need to do a careful audit to make
> sure they interact properly with this patch which is already in
> e2fsprogs mainline.

This fix is already in our tree. We periodically keep syncing our tree
with the mainline changes. So I have tested my changes with this patch
present. 

Actually I had created test images for the expand-extra-isize patches
which failed on big-endian machines which brought these bugs to my
notice.

Thanks,
Kalpak.




Re: [PATCH] Endianness bugs in e2fsck

2007-06-20 Thread Theodore Tso
On Wed, Jun 20, 2007 at 03:03:08PM +0530, Kalpak Shah wrote:
> In ext2fs_swap_inode_full() if to and from inodes are not the same
> (which is the case when called from e2fsck_get_next_inode_full),
> then e2fsck cannot recognize any in-inode EAs since the un-swabbed
> i_extra_isize was being used. So corrected that to use swabbed
> values all the time.

> Also in ext2fs_read_inode_full(), ext2fs_swap_inode_full() should be
> called with bufsize instead of with length argument. length was
> coming out to be 128 even with 512 byte inodes thus leaving the rest
> of the inode unswabbed.
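
To illustrate why an un-swabbed i_extra_isize hides in-inode EAs, here is a
small userspace sketch. It is not the libext2fs code: swab16() and
ea_area_fits() are hypothetical helpers, and the only facts assumed are that
the classic inode is 128 bytes and that i_extra_isize is a 16-bit field.

```c
#include <assert.h>
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128u	/* size of the classic ext2 inode */

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Hypothetical plausibility check: the extra fields must end inside
 * the on-disk inode, otherwise no EA area can be located after them.
 */
static int ea_area_fits(uint16_t extra_isize, uint32_t inode_size)
{
	assert(inode_size >= GOOD_OLD_INODE_SIZE);
	return GOOD_OLD_INODE_SIZE + extra_isize <= inode_size;
}
```

An i_extra_isize of 28 (0x001c) read without swabbing on the other
endianness becomes 0x1c00 (7168), which no longer fits in a 512-byte inode,
so the EA area would be rejected.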

Hi Kalpak, 

In the future it would be really helpful if you split up your
patches so that each different thing is done in separate patches.

Also, note there is a recent bug fix in this area, and the
byte-swapping extended attributes.  The issues involved here are
subtle; please see the discussion here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=232663

So before your patches go in, we need to do a careful audit to make
sure they interact properly with this patch which is already in
e2fsprogs mainline.

- Ted

# HG changeset patch
# User [EMAIL PROTECTED]
# Date 1176573631 14400
# Node ID aa8d65921c8922dfed73dd05027a097cc5946653
# Parent  4b2e34b5f7506f9f74b3fadf79280316d57e47d5
Correct byteswapping for fast symlinks with xattrs

Fix a problem byte-swapping fast symlinks inodes that contain extended
attributes.

Addresses Red Hat Bugzilla: #232663
Addresses LTC Bugzilla: #27634

Signed-off-by: "Bryn M. Reeves" <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>

diff -r 4b2e34b5f750 -r aa8d65921c89 e2fsck/ChangeLog
--- a/e2fsck/ChangeLog  Sat Apr 14 12:01:39 2007 -0400
+++ b/e2fsck/ChangeLog  Sat Apr 14 14:00:31 2007 -0400
@@ -1,4 +1,10 @@ 2007-04-14  Theodore Tso  <[EMAIL PROTECTED]
 2007-04-14  Theodore Tso  <[EMAIL PROTECTED]>
+
+   * pass2.c (e2fsck_process_bad_inode): Remove special kludge that
+   dealt with long symlinks on big endian systems.  It turns
+   out this was a workaround to a bug described in Red Hat
+   Bugzilla #232663, with an odd twist.  See comment #12 for
+   more details.
 
* pass1.c, pass2.c, util.c: Add better ehandler_operation()
markers so it is clearer what e2fsck was doing when an I/O
diff -r 4b2e34b5f750 -r aa8d65921c89 e2fsck/pass2.c
--- a/e2fsck/pass2.cSat Apr 14 12:01:39 2007 -0400
+++ b/e2fsck/pass2.cSat Apr 14 14:00:31 2007 -0400
@@ -1202,22 +1202,6 @@ extern int e2fsck_process_bad_inode(e2fs
!(fs->super->s_feature_compat & EXT2_FEATURE_COMPAT_EXT_ATTR)) {
if (fix_problem(ctx, PR_2_FILE_ACL_ZERO, &pctx)) {
inode.i_file_acl = 0;
-#ifdef EXT2FS_ENABLE_SWAPFS
-   /* 
-* This is a special kludge to deal with long
-* symlinks on big endian systems.  i_blocks
-* had already been decremented earlier in
-* pass 1, but since i_file_acl hadn't yet
-* been cleared, ext2fs_read_inode() assumed
-* that the file was short symlink and would
-* not have byte swapped i_block[0].  Hence,
-* we have to byte-swap it here.
-*/
-   if (LINUX_S_ISLNK(inode.i_mode) &&
-   (fs->flags & EXT2_FLAG_SWAP_BYTES) &&
-   (inode.i_blocks == fs->blocksize >> 9))
-   inode.i_block[0] = ext2fs_swab32(inode.i_block[0]);
-#endif
inode_modified++;
} else
not_fixed++;
diff -r 4b2e34b5f750 -r aa8d65921c89 lib/ext2fs/ChangeLog
--- a/lib/ext2fs/ChangeLog  Sat Apr 14 12:01:39 2007 -0400
+++ b/lib/ext2fs/ChangeLog  Sat Apr 14 14:00:31 2007 -0400
@@ -1,3 +1,9 @@ 2007-04-06  Theodore Tso  <[EMAIL PROTECTED]
+2007-04-14  Theodore Tso  <[EMAIL PROTECTED]>
+
+   * swapfs.c (ext2fs_swap_inode_full): Fix a problem byte-swapping 
+   fast symlinks inodes that contain extended attributes.
+   (Addresses Red Hat Bugzilla #232663, LTC bugzilla #27634)
+
 2007-04-06  Theodore Tso  <[EMAIL PROTECTED]>
 
* icount.c (ext2fs_create_icount_tdb): Add support for using TDB
diff -r 4b2e34b5f750 -r aa8d65921c89 lib/ext2fs/swapfs.c
--- a/lib/ext2fs/swapfs.c   Sat Apr 14 12:01:39 2007 -0400
+++ b/lib/ext2fs/swapfs.c   Sat Apr 14 14:00:31 2007 -0400
@@ -133,7 +133,7 @@ void ext2fs_swap_inode_full(ext2_filsys 
struct ext2_inode_large *f, int hostorder,
int bufsize)
 {
-   unsigned i;
+   unsigned i, has_data_blocks;
int islnk = 0;
__u32 *eaf, *eat;
 
@@ -150,11 +150,17 @@ void ext2fs_swap_inode_full(ext2_fil

ext2fs_block_iterate() on fast symlink

2007-06-20 Thread Jan Kara
  Hello,

  when ext2fs_block_iterate() is called on a fast symlink (and I assume
device inodes would be no different), random things happen: the
problem is that ext2fs_block_iterate() just blindly takes portions of the
inode and treats them as block numbers. Now I agree that garbage went in (it
makes no sense to call this function on such an inode) so garbage results, but
maybe it would be nicer to handle it more gracefully. The attached patch should
do it.

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs
--- a/lib/ext2fs/inode.c	2007-06-20 13:55:52.0 +0200
+++ b/lib/ext2fs/inode.c	2007-06-20 14:11:15.0 +0200
@@ -771,6 +771,10 @@ errcode_t ext2fs_get_blocks(ext2_filsys 
 	retval = ext2fs_read_inode(fs, ino, &inode);
 	if (retval)
 		return retval;
+	if (LINUX_S_ISCHR(inode.i_mode) || LINUX_S_ISBLK(inode.i_mode) ||
+	(LINUX_S_ISLNK(inode.i_mode) &&
+	 ext2fs_inode_data_blocks(fs, &inode) == 0))
+		return EXT2_ET_INVAL_INODE_TYPE;
 	for (i=0; i < EXT2_N_BLOCKS; i++)
 		blocks[i] = inode.i_block[i];
 	return 0;
--- a/lib/ext2fs/ext2_err.et.in	2007-06-20 14:09:18.0 +0200
+++ b/lib/ext2fs/ext2_err.et.in	2007-06-20 14:11:25.0 +0200
@@ -296,5 +296,8 @@ ec	EXT2_ET_RESIZE_INODE_CORRUPT,
 ec	EXT2_ET_SET_BMAP_NO_IND,
 	"Missing indirect block not present"
 
+ec	EXT2_ET_INVAL_INODE_TYPE,
+	"Invalid inode type for the operation."
+
 	end
 
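A minimal, self-contained sketch of the check the patch above adds, for readers without the libext2fs headers at hand. The `sketch_*` names and the reduced inode struct are hypothetical stand-ins, not the library's API; the data-sector helper is assumed to behave like ext2fs_inode_data_blocks(), that is, i_blocks minus one filesystem block when an external EA block is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the libext2fs definitions. */
#define SKETCH_S_IFMT  0170000
#define SKETCH_S_IFLNK 0120000
#define SKETCH_S_IFBLK 0060000
#define SKETCH_S_IFCHR 0020000

struct sketch_inode {
	uint16_t i_mode;     /* file type and permission bits */
	uint32_t i_blocks;   /* 512-byte sectors charged to the inode */
	uint32_t i_file_acl; /* external EA block number, 0 if none */
};

/* Sectors used by file data, not counting an external EA block
 * (assumed to mirror ext2fs_inode_data_blocks()). */
uint32_t sketch_data_sectors(const struct sketch_inode *inode,
			     uint32_t blocksize)
{
	uint32_t ea_sectors = inode->i_file_acl ? blocksize >> 9 : 0;

	return inode->i_blocks > ea_sectors ? inode->i_blocks - ea_sectors : 0;
}

/* Return 1 if i_block[] can be read as block numbers, 0 if the inode
 * is a device node or a fast symlink and i_block[] holds other data. */
int sketch_has_block_map(const struct sketch_inode *inode, uint32_t blocksize)
{
	uint16_t type;

	assert(inode != NULL);
	type = inode->i_mode & SKETCH_S_IFMT;
	if (type == SKETCH_S_IFCHR || type == SKETCH_S_IFBLK)
		return 0;
	if (type == SKETCH_S_IFLNK &&
	    sketch_data_sectors(inode, blocksize) == 0)
		return 0;
	return 1;
}
```

A caller would return an error such as the proposed EXT2_ET_INVAL_INODE_TYPE when the check yields 0, instead of copying i_block[] out as block numbers.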


Re: [PATCH] uninitialized groups ported - kernel

2007-06-20 Thread Girish Shilamkar

> 
> I've asked Girish to send an incremental patch.
> 
Here is the incremental patch,  to be applied after the patch sent by
Avantika for 2.6.22-rc4 kernel.

Regards,
Girish.
Index: linux-2.6.22-rc4/fs/ext4/super.c
===
--- linux-2.6.22-rc4.orig/fs/ext4/super.c
+++ linux-2.6.22-rc4/fs/ext4/super.c
@@ -1298,9 +1298,12 @@ __le16 ext4_group_desc_csum(struct ext4_
 		offset += sizeof(gdp->bg_checksum); /* skip checksum */
 		/*BUG_ON(offset != sizeof(*gdp)); /* XXX handle s_desc_size */
 		/* for checksum of struct ext4_group_desc do the rest...*/
-		if (offset < sbi->s_es->s_desc_size) {
+		if ((sbi->s_es->s_feature_incompat &
+		 cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT)) && 
+		offset < le16_to_cpu(sbi->s_es->s_desc_size)) {
 			crc = crc16(crc, (__u8 *)gdp + offset,
-sbi->s_es->s_desc_size - offset);
+le16_to_cpu(sbi->s_es->s_desc_size)
+	- offset);
 		} 
 	}
 
Index: linux-2.6.22-rc4/include/linux/ext4_fs.h
===
--- linux-2.6.22-rc4.orig/include/linux/ext4_fs.h
+++ linux-2.6.22-rc4/include/linux/ext4_fs.h
@@ -687,11 +687,11 @@ static inline int ext4_valid_inum(struct
 #define EXT4_FEATURE_COMPAT_EXT_ATTR		0x0008
 #define EXT4_FEATURE_COMPAT_RESIZE_INODE	0x0010
 #define EXT4_FEATURE_COMPAT_DIR_INDEX		0x0020
-#define EXT4_FEATURE_RO_COMPAT_GDT_CSUM		0x0040
 
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER	0x0001
 #define EXT4_FEATURE_RO_COMPAT_LARGE_FILE	0x0002
 #define EXT4_FEATURE_RO_COMPAT_BTREE_DIR	0x0004
+#define EXT4_FEATURE_RO_COMPAT_GDT_CSUM		0x0010
 #define EXT4_FEATURE_RO_COMPAT_DIR_NLINK	0x0020
 #define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	0x0040
 
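A small user-space sketch of the byte-order point in the super.c hunk above: s_desc_size is stored little-endian on disk, so it has to be decoded before it is compared with a host-order offset. The `sketch_*` functions are hypothetical illustrations, not kernel code, and the 64BIT test is reduced to a plain flag argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 16-bit on-disk field on any host byte order. */
uint16_t sketch_le16_to_cpu(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* Number of descriptor bytes past `offset` that should be fed to the
 * checksum; 0 when the 64BIT feature is off or nothing remains. */
size_t sketch_extra_csum_len(const uint8_t raw_desc_size[2],
			     int has_64bit, size_t offset)
{
	size_t desc_size = sketch_le16_to_cpu(raw_desc_size);

	if (!has_64bit || offset >= desc_size)
		return 0;
	return desc_size - offset;
}
```

On a big-endian host, reading the raw field without conversion would turn a descriptor size of 64 (bytes 0x40 0x00) into 0x4000, so both the comparison and the length passed to crc16() would be wrong.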


[PATCH] Endianness bugs in e2fsck

2007-06-20 Thread Kalpak Shah
In ext2fs_swap_inode_full(), if the to and from inodes are not the same (which
is the case when called from e2fsck_get_next_inode_full), e2fsck cannot
recognize any in-inode EAs because the un-swabbed i_extra_isize was being
used. This patch corrects that to use the swabbed value throughout.

Also, in ext2fs_read_inode_full(), ext2fs_swap_inode_full() should be called
with bufsize instead of the length argument. length was coming out to be 128
even with 512-byte inodes, leaving the rest of the inode unswabbed.

On big-endian machines, ext2fs_get_next_inode_full() calls this for copying the 
inode:
ext2fs_swap_inode_full(scan->fs,
   (struct ext2_inode_large *) inode,
   (struct ext2_inode_large *) scan->ptr,
0, bufsize);
In ext2fs_swap_inode_full(), only the first (GOOD_OLD_INODE_SIZE +
i_extra_isize) bytes are copied into inode. The rest of the inode is not
zeroed, so memset the inode to zero if swapfs is enabled. On little-endian
machines, memcpy(inode, scan->ptr, bufsize); is executed instead, thereby
hiding this error.

Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>

Index: e2fsprogs-1.39/lib/ext2fs/swapfs.c
===
--- e2fsprogs-1.39.orig/lib/ext2fs/swapfs.c 2007-06-19 22:31:20.0 -0700
+++ e2fsprogs-1.39/lib/ext2fs/swapfs.c  2007-06-19 22:41:43.628732192 -0700
@@ -261,13 +261,13 @@ void ext2fs_swap_inode_full(ext2_filsys
return; /* no space for EA magic */

eaf = (__u32 *) (((char *) f) + sizeof(struct ext2_inode) +
-   f->i_extra_isize);
+   t->i_extra_isize);

if (ext2fs_swab32(*eaf) != EXT2_EXT_ATTR_MAGIC)
return; /* it seems no magic here */

eat = (__u32 *) (((char *) t) + sizeof(struct ext2_inode) +
-   f->i_extra_isize);
+   t->i_extra_isize);
*eat = ext2fs_swab32(*eaf);

/* convert EA(s) */
Index: e2fsprogs-1.39/lib/ext2fs/inode.c
===
--- e2fsprogs-1.39.orig/lib/ext2fs/inode.c  2007-06-19 22:31:21.0 -0700
+++ e2fsprogs-1.39/lib/ext2fs/inode.c   2007-06-20 01:06:18.017788976 -0700
@@ -471,6 +471,7 @@ errcode_t ext2fs_get_next_inode_full(ext
scan->bytes_left -= scan->inode_size - extra_bytes;

 #ifdef EXT2FS_ENABLE_SWAPFS
+   memset(inode, 0, bufsize);
if ((scan->fs->flags & EXT2_FLAG_SWAP_BYTES) ||
(scan->fs->flags & EXT2_FLAG_SWAP_BYTES_READ))
ext2fs_swap_inode_full(scan->fs,
@@ -485,6 +486,7 @@ errcode_t ext2fs_get_next_inode_full(ext
scan->scan_flags &= ~EXT2_SF_BAD_EXTRA_BYTES;
} else {
 #ifdef EXT2FS_ENABLE_SWAPFS
+   memset(inode, 0, bufsize);
if ((scan->fs->flags & EXT2_FLAG_SWAP_BYTES) ||
(scan->fs->flags & EXT2_FLAG_SWAP_BYTES_READ))
ext2fs_swap_inode_full(scan->fs,
@@ -603,7 +605,7 @@ errcode_t ext2fs_read_inode_full(ext2_fi
(fs->flags & EXT2_FLAG_SWAP_BYTES_READ))
ext2fs_swap_inode_full(fs, (struct ext2_inode_large *) inode,
   (struct ext2_inode_large *) inode,
-  0, length);
+  0, bufsize);
 #endif

/* Update the inode cache */
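
To illustrate why the un-swabbed i_extra_isize breaks EA detection, here is a minimal sketch under stated assumptions: the `sketch_*` helpers are hypothetical, and 128 stands in for the good-old inode size (sizeof(struct ext2_inode)) used in the offset computation above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GOOD_OLD_INODE_SIZE 128

/* Swap the two bytes of a 16-bit value. */
uint16_t sketch_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Byte offset of the in-inode EA magic, given i_extra_isize in host order. */
size_t sketch_ea_offset(uint16_t extra_isize_host)
{
	return (size_t)SKETCH_GOOD_OLD_INODE_SIZE + extra_isize_host;
}

/* Return 1 if a 4-byte EA magic at `offset` lies inside the inode. */
int sketch_ea_magic_fits(size_t offset, size_t inode_size)
{
	return offset + sizeof(uint32_t) <= inode_size;
}
```

With i_extra_isize equal to 28, the swabbed value gives an offset of 156, which is inside a 256-byte inode. The raw value read in the wrong byte order is 0x1c00, giving an offset of 7296, far outside the inode, so the magic is never found where the code looks for it.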


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: delayed allocation result in Oops

2007-06-20 Thread Alex Tomas

Mingming Cao wrote:
> Hmm, PageMappedToDisk is probably not sufficient enough for pagesize !=
> blocksize. Is that the reason we need page->private to pass the
> request?

PageMappedToDisk isn't enough in that case, definitely. bh is the way
to track state of each block (this is what i'm implementing now), but
I think current nobh version (per-page flags are used) is valuable.

> That's good to know, thanks for the update. So probably above error case
> handling will be addressed in the new version?

well, you actually can move that SetPagePrivate() few lines above
(Dmitriy already tested this).

> BTW, can you point me your latest and greatest mballoc patch? I am
> trying to forward port and merge that patch to ext4 patch queue

I don't have version for mainline yet. will prepare soon. thanks for
your interest.

thanks, Alex
