[RFC] ext3 freeze feature ver 0.2

2008-02-26 Thread Takashi Sato
Hi,

Takashi Sato wrote:
>>> Instead, I'd like the freeze API to take a timeout in seconds in order
>>> to thaw the filesystem automatically.  It can prevent a filesystem from
>>> staying frozen forever.
>>> (Because a freezer may cause a deadlock by accessing the frozen filesystem.)
>>
>>I'm still not very comfortable with the timeout; if you un-freeze on a
>>timer, how do you know that the work for which you needed the filesystem
>>frozen is complete?  How would you know if your snapshot was good if
>>there's a possibility that the fs unfroze while it was being taken?
>
>And how about adding the new ioctl to reset the timeval like below?
>(Dmitri proposed this idea before.)
> int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval);
>fd:file descriptor of mountpoint
>FIFREEZE_RESET_TIMEOUT:request code for reset of timeout period 
>timeval:new timeout period
>This is useful for the application to set the timeval more accurately.
>For example, the freezer resets the timeval to 10 seconds every 5
>seconds.  In this approach, even if the freezer causes a deadlock
>by accessing the frozen filesystem, it will be solved by the timeout
>in 10 seconds and the freezer can recognize that at the next reset
>of timeval.

I have improved the following two points in my ext3 freeze feature.
o Add a new ioctl to reset the timeout period, as described above
  The usage is as below.
    int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval);
    fd: file descriptor of the mountpoint
    FIFREEZE_RESET_TIMEOUT: request code for resetting the timeout period
    timeval: new timeout period
    Return value: 0 if the operation succeeds.  Otherwise, -1.
    Error number: If the filesystem has already been unfrozen,
                  errno is set to EINVAL.
  I have confirmed the following two results with this ioctl.
  - After a deadlock was caused by accessing the frozen filesystem,
    it was resolved when the reset timeout expired.
  - The freezer could then detect this from the error number (EINVAL)
    at the next reset of timeval.
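
The reset behaviour described above can be modeled in userspace.  The
sketch below is only a simulation, not kernel code: struct sim_fs and the
sim_* functions are hypothetical names, and treating a timeout of 0 on
reset as "no timeout" is an assumption carried over from the freeze ioctl.

```c
#include <errno.h>

/* Hypothetical userspace model of the proposed semantics; not kernel code. */
struct sim_fs {
	int frozen;		/* 1 while the filesystem is frozen */
	int has_deadline;	/* 0 means no timeout is armed */
	long deadline;		/* time in seconds at which the auto-thaw fires */
};

/* Fire the auto-thaw if the armed deadline has passed. */
static void sim_expire(struct sim_fs *fs, long now)
{
	if (fs->frozen && fs->has_deadline && now >= fs->deadline) {
		fs->frozen = 0;
		fs->has_deadline = 0;
	}
}

/* Arm the timeout relative to 'now'; 0 disarms it (assumption). */
static void sim_arm(struct sim_fs *fs, long now, long timeout_sec)
{
	fs->has_deadline = timeout_sec > 0;
	fs->deadline = now + timeout_sec;
}

/* Model of FIFREEZE: fails with EINVAL if already frozen. */
int sim_freeze(struct sim_fs *fs, long now, long timeout_sec)
{
	sim_expire(fs, now);
	if (fs->frozen || timeout_sec < 0) {
		errno = EINVAL;
		return -1;
	}
	fs->frozen = 1;
	sim_arm(fs, now, timeout_sec);
	return 0;
}

/* Model of FIFREEZE_RESET_TIMEOUT: EINVAL once the fs has been unfrozen. */
int sim_reset_timeout(struct sim_fs *fs, long now, long timeout_sec)
{
	sim_expire(fs, now);
	if (!fs->frozen || timeout_sec < 0) {
		errno = EINVAL;
		return -1;
	}
	sim_arm(fs, now, timeout_sec);
	return 0;
}
```

With a 10 second timeout reset every 5 seconds, each reset succeeds; if the
freezer stalls past the deadline, the next reset fails with EINVAL, which
is how the freezer learns that the filesystem was thawed underneath it.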

o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS
  As Andreas Dilger and Christoph Hellwig advised me, I have elevated
  them to include/linux/fs.h as below.
    #define FIFREEZE _IOWR('X', 119, int)
    #define FITHAW   _IOWR('X', 120, int)
  The ioctl numbers used by XFS applications don't need to be changed.
  But my freeze ioctl takes its parameter as the timeout period.
  So if XFS applications don't want the timeout feature, as in the
  current implementation, the parameter needs to be changed from
  1 (level?) to 0.

I haven't changed the following ioctls from the previous version.
  int ioctl(int fd, int cmd, long *timeval)
  fd: The file descriptor of the mountpoint
  cmd: FIFREEZE for the freeze or FITHAW for the unfreeze
  timeval: The timeout value expressed in seconds
           If it's 0, the timeout isn't set.
  Return value: 0 if the operation succeeds. Otherwise, -1

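A minimal userspace sketch of calling these ioctls follows.  The request
codes are the ones proposed in this mail; freeze_fs(), thaw_fs() and
fs_freeze_ctl() are hypothetical helper names, and passing a pointer to a
long timeout follows the usage above rather than any released kernel
interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Request codes as proposed in this RFC; skipped if a header defines them. */
#ifndef FIFREEZE
#define FIFREEZE _IOWR('X', 119, int)
#endif
#ifndef FITHAW
#define FITHAW   _IOWR('X', 120, int)
#endif

/* Open the mountpoint, issue one freeze/thaw request, close it again. */
static int fs_freeze_ctl(const char *mountpoint, unsigned long cmd,
			 long timeout_sec)
{
	int fd, ret, saved_errno;

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, cmd, &timeout_sec);

	/* Preserve the ioctl's errno across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret;
}

/* Freeze with a timeout in seconds; 0 means no timeout. */
int freeze_fs(const char *mountpoint, long timeout_sec)
{
	if (timeout_sec < 0) {
		errno = EINVAL;
		return -1;
	}
	return fs_freeze_ctl(mountpoint, FIFREEZE, timeout_sec);
}

/* Thaw; the argument value is not used for thawing in this sketch. */
int thaw_fs(const char *mountpoint)
{
	return fs_freeze_ctl(mountpoint, FITHAW, 0);
}
```

A caller would typically run freeze_fs("/mnt/data", 30), take the snapshot,
and then call thaw_fs("/mnt/data"); the mountpoint path here is only an
example.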
Any comments are very welcome.

Cheers, Takashi

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/drivers/md/dm.c linux-2.6.25-rc3-freeze/drivers/md/dm.c
--- linux-2.6.25-rc3.org/drivers/md/dm.c 2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/drivers/md/dm.c 2008-02-25 10:50:04.0 +0900
@@ -1407,7 +1407,7 @@ static int lock_fs(struct mapped_device 
 
 	WARN_ON(md->frozen_sb);
 
-	md->frozen_sb = freeze_bdev(md->suspended_bdev);
+	md->frozen_sb = freeze_bdev(md->suspended_bdev, 0);
 	if (IS_ERR(md->frozen_sb)) {
 		r = PTR_ERR(md->frozen_sb);
 		md->frozen_sb = NULL;
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/fs/block_dev.c linux-2.6.25-rc3-freeze/fs/block_dev.c
--- linux-2.6.25-rc3.org/fs/block_dev.c 2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/fs/block_dev.c 2008-02-25 10:50:04.0 +0900
@@ -284,6 +284,11 @@ static void init_once(struct kmem_cache 
 	INIT_LIST_HEAD(&bdev->bd_holder_list);
 #endif
 	inode_init_once(&ei->vfs_inode);
+
+	/* Initialize semaphore for freeze. */
+	sema_init(&bdev->bd_freeze_sem, 1);
+	/* Setup freeze timeout function. */
+	INIT_DELAYED_WORK(&bdev->bd_freeze_timeout, freeze_timeout);
 }
 
 static inline void __bd_forget(struct inode *inode)
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/fs/buffer.c linux-2.6.25-rc3-freeze/fs/buffer.c
--- linux-2.6.25-rc3.org/fs/buffer.c 2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/fs/buffer.c 2008-02-25 10:50:04.0 +0900
@@ -190,17 +190,33 @@ int fsync_bdev(stru

Re: [RFC] ext3 freeze feature

2008-02-15 Thread Takashi Sato

Hi,

Christoph Hellwig wrote:

On Fri, Feb 08, 2008 at 08:26:57AM -0500, Andreas Dilger wrote:

You may as well make the common ioctl the same as the XFS version,
both by number and parameters, so that applications which already
understand the XFS ioctl will work on other filesystems.


Yes.  In fact you should be able to lift the implementations of
XFS_IOC_FREEZE and XFS_IOC_THAW to generic code, there's nothing
XFS-specific in there.


According to Documentation/ioctl-number.txt,
XFS_IOC_XXXs (_IOWR('X', aa, bb)) are defined for XFS like below.

From Documentation/ioctl-number.txt:


Code    Seq#    Include File            Comments

  :       :
'X'     all     linux/xfs_fs.h

So XFS_IOC_FREEZE and XFS_IOC_THAW cannot simply be lifted to generic code.
I think we should create new generic numbers for freeze and thaw,
as was done for FIBMAP, as follows.
linux/fs.h:
#define FIFREEZE _IO(0x00,3)
#define FITHAW   _IO(0x00,4)

And xfs_freeze calls XFS_IOC_FREEZE with a magic number 1, but what is 1?
Instead, I'd like the freeze API to take a timeout in seconds in order
to thaw the filesystem automatically.  It can prevent a filesystem from
staying frozen forever.
(Because a freezer may cause a deadlock by accessing the frozen filesystem.)

Any comments are very welcome.

Cheers, Takashi 


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext3 freeze feature

2008-02-13 Thread Takashi Sato

Hi,


P.S.  Oh yeah, it should be noted that freezing at the filesystem
layer does *not* guarantee that changes to the block device aren't
happening via mmap()'ed files.  The LVM needs to freeze writes at the
block device level if it wants to guarantee a completely stable
snapshot image.  So the proposed patch doesn't quite give you those
guarantees, if that was the intended goal.


I don't think a mmap()'ed file is written to a block device while a filesystem
is frozen.  pdflush starts the writing procedure of the mmap()'ed file's
data and calls ext3_ordered_writepage.  ext3_ordered_writepage calls
ext3_journal_start to get the journal handle.  As a result, the process
waits for unfreeze in start_this_handle.
pdflush
  ::
  ext3_ordered_writepage
    ext3_journal_start
      ext3_journal_start_sb
        journal_start
          start_this_handle <--- wait here

I actually tried freezing the filesystem after updating the mmap()'ed
file's data.  But, the writing to the block device didn't happen.
(It happened right after unfreeze.)

I don't think the freeze feature on the block device level is needed
because the writing for the mmap()'ed file is suspended on
the frozen filesystem.

Any comments are very welcome.

Cheers, Takashi 




Re: [RFC] ext3 freeze feature

2008-02-08 Thread Takashi Sato

Hi,

Ted wrote:

And I do agree that we probably should just implement this in
filesystem independent way, in which case all of the filesystems that
support this already have super_operations functions
write_super_lockfs() and unlockfs().

So if this is done using a new system call, there should be no
filesystem-specific changes needed, and all filesystems which support
those super_operations method functions would be able to provide this
functionality to the new system call.


OK I would like to implement the freeze feature on VFS
as the filesystem independent ioctl so that it can be
available on filesystems that have already had write_super_lockfs()
and unlockfs().
The usage for the freeze ioctl is the following.
 int ioctl(int fd, int FIFREEZE, long *timeval);
   fd: file descriptor of the mountpoint
   FIFREEZE: request code for freeze
   timeval: timeout period (seconds)

And the unfreeze ioctl is the following.
 int ioctl(int fd, int FITHAW, NULL);
   fd: file descriptor of the mountpoint
   FITHAW: request code for unfreeze

I think we need a timeout feature which thaws the filesystem
after the specified time has elapsed, as a fail-safe in case the freezer
accesses the frozen filesystem and causes a deadlock.
I intend to implement the timeout feature on VFS.
(This is realized by registering the delayed work which calls
thaw_bdev() to the delayed work queue.)

Any comments are very welcome.

Cheers, Takashi


Re: [RFC] ext3 freeze feature

2008-02-06 Thread Takashi Sato

Hi,


What you *could* do is to start putting processes to sleep if they
attempt to write to the frozen filesystem, and then detect the
deadlock case where the process holding the file descriptor used to
freeze the filesystem gets frozen because it attempted to write to the
filesystem --- at which point it gets some kind of signal (which
defaults to killing the process), and the filesystem is unfrozen and
as part of the unfreeze you wake up all of the processes that were put
to sleep for touching the frozen filesystem.

The other approach would be to say, "oh well, the freeze ioctl is
inherently dangerous, and root is allowed to shoot himself in the foot, so
who cares".  :-)


Currently the XFS freezer doesn't solve a deadlock automatically
and we rely on administrators for ensuring that the freezer will not
access the filesystem.
And even if a faulty freezer causes a deadlock, it can be resolved
by another unfreeze process (the unfreeze command).

So I don't think the freezer itself needs to solve the deadlock.
I think the timeout is effective against an unexpected deadlock,
and the timeout extending feature is very useful,
as Dmitri proposed.

Cheers, Takashi


Re: [RFC] ext3 freeze feature

2008-01-28 Thread Takashi Sato

Hi,


What you *could* do is to start putting processes to sleep if they
attempt to write to the frozen filesystem, and then detect the
deadlock case where the process holding the file descriptor used to
freeze the filesystem gets frozen because it attempted to write to the
filesystem --- at which point it gets some kind of signal (which
defaults to killing the process), and the filesystem is unfrozen and
as part of the unfreeze you wake up all of the processes that were put
to sleep for touching the frozen filesystem.


I don't think close() usually writes to the journal and causes the deadlock.
Is there a special case in which close() writes to the journal when the
process receives a signal?

Cheers, Takashi 




Re: [RFC] ext3 freeze feature

2008-01-28 Thread Takashi Sato

Hi,

Thank you for your comments.


That's inherently unsafe - you can have multiple unfreezes
running in parallel which seriously screws with the bdev semaphore
count that is used to lock the device due to doing multiple up()s
for every down.

Your timeout thingy guarantees that at some point you will get
multiple up()s occurring due to the timer firing racing with
a thaw ioctl. 


If this interface is to be more widely exported, then it needs
a complete revamp of how the bdev is locked while it is frozen so
that there is no chance of a double up() ever occurring on the
bd_mount_sem due to racing thaws.


My patch has the race condition as you said.
I will fix it.
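
One possible way to close this race, sketched in userspace: let the timer
and the thaw ioctl both go through a single state transition so that only
one of them performs the up().  This is an illustration under assumptions,
not the fix that will go into the patch; struct frozen_dev and the dev_*
functions are hypothetical, and the counter merely stands in for the
bdev semaphore.

```c
#include <stdatomic.h>

/* Hypothetical model of a block device that can be frozen. */
struct frozen_dev {
	atomic_int frozen;	/* 1 while frozen, 0 otherwise */
	atomic_int sem_count;	/* stands in for the semaphore count */
};

void dev_init(struct frozen_dev *d)
{
	atomic_init(&d->frozen, 0);
	atomic_init(&d->sem_count, 1);
}

/* Freeze: only the caller that flips 0 -> 1 takes the semaphore. */
int dev_freeze(struct frozen_dev *d)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&d->frozen, &expected, 1))
		return -1;			/* already frozen */
	atomic_fetch_sub(&d->sem_count, 1);	/* models down() */
	return 0;
}

/*
 * Thaw: called by both the thaw request and the timeout handler.
 * Only the caller that flips 1 -> 0 releases the semaphore, so a
 * racing second thaw cannot do an extra up().
 */
int dev_thaw(struct frozen_dev *d)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&d->frozen, &expected, 0))
		return -1;			/* lost the race or not frozen */
	atomic_fetch_add(&d->sem_count, 1);	/* models up(), once per freeze */
	return 0;
}
```

In the kernel the same idea would more likely be expressed with a lock
around the frozen state than with atomics; the point is only that the
state check and the state change must be one indivisible step.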

Cheers, Takashi 




Re: [RFC] ext3 freeze feature

2008-01-25 Thread Takashi Sato

Hi,


I am also wondering whether we should have system call(s) for these:

On Jan 25, 2008 12:59 PM, Takashi Sato <[EMAIL PROTECTED]> wrote:

+   case EXT3_IOC_FREEZE: {



+   case EXT3_IOC_THAW: {


And just convert XFS to use them too?


I think it is reasonable to implement it as a generic system call, as you
said.
Do XFS folks think so?

Cheers, Takashi


[RFC] ext3 freeze feature

2008-01-25 Thread Takashi Sato
Hi,

Currently, ext3 doesn't have a freeze feature which suspends write
requests.  So, while the filesystem is mounted, we cannot use the storage
device's features (snapshot, replication) to get a backup which keeps
the filesystem consistent.
In many cases, commercial filesystems (e.g. VxFS) have the freeze
feature, and it is used to get a consistent backup.

So I am planning on implementing the ioctl of the freeze feature for ext3.
I think we can get the consistent backup with the following steps.
1. Freeze the filesystem with ioctl.
2. Separate the replication volume or get the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with ioctl.
4. Get the backup from the separated replication volume
   or the snapshot.
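
Steps 1 to 3 above can be sketched as a small driver that always thaws
once the freeze has succeeded, even if the snapshot step fails.  The names
consistent_backup(), backup_step_fn, struct step_log and the stub_* steps
are hypothetical; the stubs only record the order of the calls and stand
in for the real ioctl and storage operations.

```c
#include <stddef.h>

/* One step of the procedure; returns 0 on success. */
typedef int (*backup_step_fn)(void *ctx);

/*
 * Freeze, take the snapshot, then thaw.  The thaw is attempted whenever
 * the freeze succeeded, so a failed snapshot does not leave the
 * filesystem frozen.  Step 4 (copying from the snapshot) runs afterwards.
 */
int consistent_backup(backup_step_fn freeze, backup_step_fn snapshot,
		      backup_step_fn thaw, void *ctx)
{
	int ret, thaw_ret;

	ret = freeze(ctx);
	if (ret != 0)
		return ret;

	ret = snapshot(ctx);
	thaw_ret = thaw(ctx);

	return ret != 0 ? ret : thaw_ret;
}

/* Hypothetical stubs that only log which step ran. */
struct step_log {
	char steps[8];
	size_t n;
	int fail_snapshot;
};

static int log_step(struct step_log *log, char c)
{
	if (log->n + 1 < sizeof(log->steps)) {
		log->steps[log->n++] = c;
		log->steps[log->n] = '\0';
	}
	return 0;
}

int stub_freeze(void *ctx)
{
	return log_step(ctx, 'F');
}

int stub_snapshot(void *ctx)
{
	struct step_log *log = ctx;

	log_step(log, 'S');
	return log->fail_snapshot ? -1 : 0;
}

int stub_thaw(void *ctx)
{
	return log_step(ctx, 'T');
}
```

In a real tool the freeze and thaw steps would issue the ioctls described
below, and the snapshot step would call into the storage device's own
interface.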

The usage of the ioctl is as below.
 int ioctl(int fd, int cmd, long *timeval)
 fd: The file descriptor of the mountpoint.
 cmd: EXT3_IOC_FREEZE for the freeze or EXT3_IOC_THAW for the unfreeze.
 timeval: The timeout value expressed in seconds.
  If it's 0, the timeout isn't set.
 Return value: 0 if the operation succeeds. Otherwise, -1.
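
Since the timeout is given in seconds and the attached patch converts it
to milliseconds by multiplying before checking the sign, a very large
value could overflow.  A hedged sketch of a range-checked conversion is
shown below; freeze_timeout_to_msec() is a hypothetical helper, not part
of the patch.

```c
#include <errno.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to milliseconds.
 * Returns 0 on success and stores the result in *timeout_msec,
 * or -EINVAL if the value is negative or would overflow a long.
 */
int freeze_timeout_to_msec(long timeout_sec, long *timeout_msec)
{
	if (timeout_sec < 0 || timeout_sec > LONG_MAX / 1000)
		return -EINVAL;

	*timeout_msec = timeout_sec * 1000;
	return 0;
}
```

A result of 0 milliseconds keeps the meaning described above: no timeout
is set.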

I have confirmed that write requests were suspended with the experimental
patch for this feature, and I have attached it to this mail.

The main points of the implementation are as follows.
- Add calls of the freeze function (freeze_bdev) and
  the unfreeze function (thaw_bdev) in ext3_ioctl().

- ext3_freeze_timeout() which calls the unfreeze function (thaw_bdev)
  is registered to the delayed work queue to unfreeze the filesystem
  automatically after the lapse of the specified time.

Any comments are very welcome.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff linux-2.6.24-rc8/fs/ext3/ioctl.c linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c
--- linux-2.6.24-rc8/fs/ext3/ioctl.c 2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c 2008-01-22 18:20:33.0 +0900
@@ -254,6 +254,42 @@ flags_err:
 		return err;
 	}
 
+	case EXT3_IOC_FREEZE: {
+		long timeout_sec;
+		long timeout_msec;
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		if (inode->i_sb->s_frozen != SB_UNFROZEN)
+			return -EINVAL;
+		/* arg(sec) to tick value */
+		get_user(timeout_sec, (long __user *) arg);
+		timeout_msec = timeout_sec * 1000;
+		if (timeout_msec < 0)
+			return -EINVAL;
+
+		/* Freeze */
+		freeze_bdev(inode->i_sb->s_bdev);
+
+		/* set up unfreeze timer */
+		if (timeout_msec > 0)
+			ext3_add_freeze_timeout(EXT3_SB(inode->i_sb),
+							timeout_msec);
+		return 0;
+	}
+	case EXT3_IOC_THAW: {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		if (inode->i_sb->s_frozen == SB_UNFROZEN)
+			return -EINVAL;
+
+		/* delete unfreeze timer */
+		ext3_del_freeze_timeout(EXT3_SB(inode->i_sb));
+
+		/* Unfreeze */
+		thaw_bdev(inode->i_sb->s_bdev, inode->i_sb);
+
+		return 0;
+	}
 
 	default:
 		return -ENOTTY;
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff linux-2.6.24-rc8/fs/ext3/super.c linux-2.6.24-rc8-freeze/fs/ext3/super.c
--- linux-2.6.24-rc8/fs/ext3/super.c 2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/super.c 2008-01-22 18:20:33.0 +0900
@@ -63,6 +63,7 @@ static int ext3_statfs (struct dentry * 
 static void ext3_unlockfs(struct super_block *sb);
 static void ext3_write_super (struct super_block * sb);
 static void ext3_write_super_lockfs(struct super_block *sb);
+static void ext3_freeze_timeout(struct work_struct *work);
 
 /*
  * Wrappers for journal_start/end.
@@ -323,6 +324,44 @@ void ext3_update_dynamic_rev(struct supe
 }
 
 /*
+ * ext3_add_freeze_timeout - Add timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ * @timeout_msec   : timeout period
+ *
+ * Add the delayed work for ext3 freeze timeout
+ * to the delayed work queue.
+ */
+void ext3_add_freeze_timeout(struct ext3_sb_info *sbi,
+				long timeout_msec)
+{
+	s64 timeout_jiffies = msecs_to_jiffies(timeout_msec);
+
+	/*
+	 * setup freeze timeout function
+	 */
+	INIT_DELAYED_WORK(&sbi->s_freeze_timeout, ext3_freeze_timeout);
+
+	/* set delayed work queue */
+	cancel_delayed_work(&sbi->s_freeze_timeout);
+	schedule_delayed_work(&sbi->s_freeze_timeout, timeout_jiffies);
+}
+
+/*
+ * ext3_del_freeze_timeout - Delete timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ *

[RFC][PATCH 10/10] Online defrag command

2007-06-20 Thread Takashi Sato
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for free space fragmentation.
    # e4defrag -f file-name
  o Defrag for a single file.
    # e4defrag file-name
  o Defrag for all files on ext4.
    # e4defrag device-name

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
/*
 * e4defrag.c - ext4 filesystem defragmenter
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE   500
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)
#define EXT4_IOC_GROUP_INFO _IOW('f', 11, struct ext4_group_data_info)
#define EXT4_IOC_FREE_BLOCKS_INFO _IOW('f', 12, struct ext4_extents_info)
#define EXT4_IOC_EXTENTS_INFO   _IOW('f', 13, struct ext4_extents_info)
#define EXT4_IOC_RESERVE_BLOCK  _IOW('f', 14, struct ext4_extents_info)
#define EXT4_IOC_MOVE_VICTIM    _IOW('f', 15, struct ext4_extents_info)
#define EXT4_IOC_BLOCK_RELEASE  _IO('f', 16)


#define _FILE_OFFSET_BITS 64
#define ext4_fsblk_t    unsigned long long
#define DEFRAG_MAX_ENT  32

/* Extent status which are used in ext_in_group */
#define EXT4_EXT_USE        0
#define EXT4_EXT_FREE       1
#define EXT4_EXT_RESERVE    2

/* Insert list2 after list1 */
#define insert(list1,list2) { list2 ->next = list1->next;\
list1->next->prev = list2;\
list2->prev = list1;\
list1->next = list2;\
}

#define DEFRAG_RESERVE_BLOCK_SECOND 2

/* Magic number for ext4 */
#define EXT4_SUPER_MAGIC    0xEF53

/* The number of pages to defrag at one time */
#define DEFRAG_PAGES    128

/* Maximum length of contiguous blocks */
#define MAX_BLOCKS_LEN  16384

/* Force defrag mode: Max file size in bytes (128MB) */
#define MAX_FILE_SIZE   ((unsigned long)1 << 27)

/* Force defrag mode: Max filesystem relative offset (48bit) */
#define MAX_FS_OFFSET_BIT   48

/* Data type for filesystem-wide blocks number */
#define  ext4_fsblk_t unsigned long long

/* Ioctl command */
#define EXT4_IOC_FIBMAP _IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME     0
#define DIRNAME     1
#define FILENAME    2

#define RETURN_OK   0
#define RETURN_NG   -1
#define FTW_CONT    0
#define FTW_STOP    -1
#define FTW_OPEN_FD 2000
#define FILE_CHK_OK 0
#define FILE_CHK_NG -1
#define FS_EXT4     "ext4dev"
#define ROOT_UID    0

/* Defrag block size, in bytes */
#define DEFRAG_SIZE 67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)  fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)   fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE   \
"Usage : e4defrag [-v] file...| directory...| device...\n\
  : e4defrag -f file [blocknr] \n\
  : e4defrag -r directory... | device... \n"

#define MSG_R_OPTION    " with regional block allocation mode.\n"
#define NGMSG_MTAB  "\te4defrag  : Can not access /etc/mtab."
#define NGMSG_UNMOUNT   "\te4defrag  : FS is not mounted."
#define NGMSG_EXT4  "\te4defrag  : FS is not ext4 File System."
#define NGMSG_FS_INFO   "\te4defrag  : get FSInfo fail."
#define NGMSG_FILE_INFO "\te4defrag  : get FileInfo fail."
#define NGMSG_FILE_OPEN "\te4defrag  : open fail."
#define NGMSG_FILE_SYNC "\te4defrag  : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG   "\te4defrag  : defrag fail."
#define NGMSG_FILE_BLOCKSIZE    "\te4defrag  : can't get blocksize."
#define NGMSG_FILE_FIBMAP   "\te4defrag  : can't get block number."
#define NGMSG_FILE_UNREG    "\te4defrag  : File is not regular file."

#define NGMSG_FILE_LARGE    \
"\te4defrag  : Defrag size is larger than FileSystem's free space."

#define NGMSG_FILE_PRIORITY \
"\te4defrag  : File is not current user's file or current user is not root."

#define NGMSG_FILE_LOCK "\te4defrag  : File is locked."
#define NGMSG_FILE_BLANK    "\te4defrag  : File size is 0."
#define NGMSG_GET_LCKINFO  

[RFC][PATCH 9/10] Fix bugs in multi-block allocation and locality-group

2007-06-20 Thread Takashi Sato
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all of dirty inodes.
- Fix ext4_mb_new_blocks() to return err value when defrag failed.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test3/fs/ext4/lg.c 
Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c
--- linux-2.6.19-rc6-test3/fs/ext4/lg.c 2007-06-20 16:56:16.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c 2007-06-18 
14:21:54.0 +0900
@@ -389,6 +389,10 @@ int ext4_lg_sync_single_group(struct sup
cond_resched();
spin_lock(&inode_lock);
if (wbc->nr_to_write <= 0) {
+   if (!list_empty(&lg->lg_io)) {
+   set_bit(EXT4_LG_DIRTY, &lg->lg_flags);
+   list_move(&lg->lg_list, &sbi->s_locality_dirty);
+   }
rc = EXT4_STOP_WRITEBACK;
code = 6;
break;
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test3/fs/ext4/mballoc.c 
Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c
--- linux-2.6.19-rc6-test3/fs/ext4/mballoc.c2007-06-20 16:58:22.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c2007-06-18 
14:21:54.0 +0900
@@ -3732,8 +3732,10 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
!(EXT4_I(ar->inode)->i_state & EXT4_STATE_BLOCKS_RESERVED)) {
reserved = ar->len;
err = ext4_reserve_blocks(sb, reserved);
-   if (err)
+   if (err) {
+   *errp = err;
return err;
+   }
}
 
if (!ext4_mb_use_preallocated(&ac)) {


[RFC][PATCH 8/10] Release reserved block

2007-06-20 Thread Takashi Sato
- Release reserved blocks if defrag failed.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c  
2007-06-19 20:19:14.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c  
2007-06-18 14:23:07.0 +0900
@@ -3066,6 +3066,10 @@ int ext4_ext_ioctl(struct inode *inode, 
 
err = ext4_ext_defrag_victim(filp, &ext_info);
 
+   } else if (cmd == EXT4_IOC_BLOCK_RELEASE) {
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   ext4_discard_reservation(inode);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
} else if (cmd == EXT4_IOC_DEFRAG) {
struct ext4_ext_defrag_data defrag;
 


[RFC][PATCH 7/10] Reserve freed blocks

2007-06-20 Thread Takashi Sato
- Reserve the free blocks in the target area, not to be
  used by other process.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c  
2007-06-19 21:40:55.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c  
2007-06-19 20:19:14.0 +0900
@@ -2619,6 +2619,182 @@ out:
 }
 
 /**
+ * ext4_ext_defrag_reserve - reserve blocks for defrag
+ * @inode  target inode
+ * @goal   block reservation goal
+ * @lenblocks count to reserve
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+
+int ext4_ext_defrag_reserve(struct inode * inode, ext4_fsblk_t goal, int len)
+{
+   struct super_block *sb = NULL;
+   handle_t *handle = NULL;
+   struct buffer_head *bitmap_bh = NULL;
+   struct ext4_block_alloc_info *block_i;
+   struct ext4_reserve_window_node * my_rsv = NULL;
+   unsigned short windowsz = 0;
+   unsigned long group_no;
+   ext4_grpblk_t grp_target_blk;
+   int err = 0;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+
+   handle = ext4_journal_start(inode, EXT4_RESERVE_TRANS_BLOCKS);
+   if (IS_ERR(handle)) {
+   err = PTR_ERR(handle);
+   handle = NULL;
+   goto out;
+   }
+
+   if (S_ISREG(inode->i_mode) && (!EXT4_I(inode)->i_block_alloc_info)) {
+   ext4_init_block_alloc_info(inode);
+   } else if (!S_ISREG(inode->i_mode)) {
+   printk(KERN_ERR "ext4_ext_defrag_reserve:"
+" incorrect file type\n");
+   err = -1;
+   goto out;
+   }
+
+   sb = inode->i_sb;
+   if (!sb) {
+   printk("ext4_ext_defrag_reserve: nonexistent device\n");
+   err = -ENXIO;
+   goto out;
+   }
+   ext4_get_group_no_and_offset(sb, goal, &group_no,
+   &grp_target_blk);
+
+   block_i = EXT4_I(inode)->i_block_alloc_info;
+
+   if (!block_i || ((windowsz =
+   block_i->rsv_window_node.rsv_goal_size) == 0)) {
+   printk("ext4_ext_defrag_reserve: unable to reserve\n");
+   err = -1;
+   goto out;
+   }
+
+   my_rsv = &block_i->rsv_window_node;
+
+   bitmap_bh = read_block_bitmap(sb, group_no);
+   if (!bitmap_bh) {
+   err = -ENOSPC;
+   goto out;
+   }
+
+   BUFFER_TRACE(bitmap_bh, "get undo access for new block");
+   err = ext4_journal_get_undo_access(handle, bitmap_bh);
+   if (err)
+   goto out;
+
+   err = alloc_new_reservation(my_rsv, grp_target_blk, sb,
+   group_no, bitmap_bh);
+   if (err < 0) {
+   printk(KERN_ERR "defrag: reservation failed\n");
+   ext4_discard_reservation(inode);
+   goto out;
+   } else {
+   if (len > EXT4_DEFAULT_RESERVE_BLOCKS) {
+   try_to_extend_reservation(my_rsv, sb,
+   len - EXT4_DEFAULT_RESERVE_BLOCKS);
+   }
+   }
+
+out:
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   ext4_journal_release_buffer(handle, bitmap_bh);
+   brelse(bitmap_bh);
+
+   if (handle)
+   ext4_journal_stop(handle);
+
+   return err;
+}
+
+int goal_in_my_reservation(struct ext4_reserve_window *, ext4_grpblk_t,
+   unsigned int, struct super_block *);
+int rsv_is_empty(struct ext4_reserve_window *);
+
+/**
+ * ext4_ext_block_within_rsv - Is target extent reserved ?
+ * @ inode   inode of target file
+ * @ ex_startstart physical block number of the extent
+ *   which already moved
+ * @ ex_len  block length of the extent which already moved
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_block_within_rsv(struct inode *inode,
+   ext4_fsblk_t ex_start, int ex_len)
+{
+   struct super_block *sb = inode->i_sb;
+   struct ext4_block_alloc_info *block_i;
+   unsigned long group_no;
+   ext4_grpblk_t grp_blk;
+   struct ext4_reserve_window_node *rsv;
+
+   block_i = EXT4_I(inode)->i_block_alloc_info;
+   if (block_i && block_i->rsv_window_node.rsv_goal_size > 0) {
+   rsv = &block_i->rsv_window_node;
+   if (rsv_is_empty(&rsv->rsv_window)) {
+   printk("defrag: Ca

[RFC][PATCH 6/10] Move files from target block group to other block group

2007-06-20 Thread Takashi Sato
- To make contiguous free blocks, move files from the target block group
  to other block group. 

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
2007-06-20 08:27:44.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c  
2007-06-19 21:40:55.0 +0900
@@ -1279,20 +1279,20 @@ ext4_can_extents_be_merged(struct inode 
 }
 
 /*
- * ext4_ext_insert_extent:
- * tries to merge requsted extent into the existing extent or
- * inserts requested extent as new one into the tree,
- * creating new leaf in the no-space case.
+ * ext4_ext_insert_extent_defrag:
+ * The difference from ext4_ext_insert_extent is to use the first block
+ * in newext as the goal of the new index block.
  */
-int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
+int ext4_ext_insert_extent_defrag(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *newext)
+   struct ext4_extent *newext, int defrag)
 {
struct ext4_extent_header * eh;
struct ext4_extent *ex, *fex;
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
int depth, len, err, next;
+   ext4_fsblk_t defrag_goal;
 
BUG_ON(newext->ee_len == 0);
depth = ext_depth(inode);
@@ -1342,11 +1342,17 @@ repeat:
  le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max));
}
 
+   if (defrag) {
+   defrag_goal = ext_pblock(newext);
+   } else {
+   defrag_goal = 0;
+   }
/*
 * There is no free space in the found leaf.
 * We're gonna add a new leaf in the tree.
 */
-   err = ext4_ext_create_new_leaf(handle, inode, path, newext);
+   err = ext4_ext_create_new_leaf(handle, inode, path,
+   newext, defrag_goal);
if (err)
goto cleanup;
depth = ext_depth(inode);
@@ -1438,6 +1444,19 @@ cleanup:
return err;
 }
 
+/*
+ * ext4_ext_insert_extent:
+ * tries to merge requsted extent into the existing extent or
+ * inserts requested extent as new one into the tree,
+ * creating new leaf in the no-space case.
+ */
+int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_extent *newext)
+{
+   return ext4_ext_insert_extent_defrag(handle, inode, path, newext, 0);
+}
+
 int ext4_ext_walk_space(struct inode *inode, unsigned long block,
unsigned long num, ext_prepare_callback func,
void *cbdata)
@@ -2600,6 +2619,70 @@ out:
 }
 
 /**
+ * ext4_ext_defrag_victim - Create free space for defrag
+ * @filp  target file
+ * @ex_info   target extents array to move
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_defrag_victim(struct file *target_filp,
+   struct ext4_extents_info *ex_info)
+{
+   struct inode *target_inode = target_filp->f_dentry->d_inode;
+   struct super_block *sb = target_inode->i_sb;
+   struct file victim_file;
+   struct dentry victim_dent;
+   struct inode *victim_inode;
+   ext4_fsblk_t goal = ex_info->goal;
+   int ret = 0;
+   int i = 0;
+   int flag = DEFRAG_RESERVE_BLOCKS_SECOND;
+   struct ext4_extent_data ext;
+   unsigned long group;
+   ext4_grpblk_t grp_off;
+
+   /* Setup dummy extent data */
+   ext.len = 0;
+
+   /* Get the inode of the victim file */
+   victim_inode = iget(sb, ex_info->ino);
+   if (!victim_inode)
+   return -EACCES;
+
+   /* Setup file for the victim file */
+   victim_dent.d_inode = victim_inode;
+   victim_file.f_dentry = &victim_dent;
+
+   /* Set the goal appropriate offset */
+   if (goal == -1) {
+   ext4_get_group_no_and_offset(victim_inode->i_sb,
+   ex_info->ext[0].start, &group, &grp_off);
+   goal = ext4_group_first_block_no(sb, group + 1);
+   }
+
+   for (i = 0; i < ex_info->entries; i++ ) {
+   /* Move original blocks to another block group */
+   if ((ret = ext4_ext_defrag(&victim_file, ex_info->ext[i].block,
+   ex_info->ext[i].len, goal, flag, &ext)) < 0)
+   goto ERR;
+
+   /* Sync journal blocks before reservation

[RFC][PATCH 5/10] Get all extents information of specified inode number

2007-06-20 Thread Takashi Sato
- Get all extents information of specified inode number to calculate
  the combination of extents which should be moved to other block group.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c   
2007-06-20 08:50:57.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c
2007-06-20 08:27:44.0 +0900
@@ -43,6 +43,12 @@
 #include 
 #include 
 
+#define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32)
+#define EXT_SET_EXTENT_DATA(src, dest)  do {\
+   dest.block = le32_to_cpu(src->ee_block);\
+   dest.start = ext_pblock(src);\
+   dest.len = le16_to_cpu(src->ee_len);\
+   } while (0)
 /*
  * ext_pblock:
  * combine low and high parts of physical block number into ext4_fsblk_t
@@ -2479,6 +2485,121 @@ ext4_ext_next_extent(struct inode *inode
 }
 
 /**
+ * ext4_ext_extents_info() - get extents information
+ *
+ * @ext_info:   pointer to ext4_extents_info
+ *  @ext_info->ino  describe an inode which is used to get extent
+ *  information
+ *  @ext_info->max_entries: defined by DEFRAG_MAX_ENT
+ *  @ext_info->entries: amount of extents (output)
+ *  @ext_info->ext[]:   array of extent (output)
+ *  @ext_info->offset:  starting block offset of targeted extent
+ *  (file relative)
+ *
+ * @sb: for iget()
+ *
+ * This function returns 0 if next extent(s) exists,
+ * or returns 1 if next extent doesn't exist, otherwise returns error value.
+ * Called under truncate_mutex lock.
+ */
+static int ext4_ext_extents_info(struct ext4_extents_info *ext_info,
+   struct super_block *sb)
+{
+   struct ext4_ext_path *path = NULL;
+   struct ext4_extent *ext = NULL;
+   struct inode *inode = NULL;
+   unsigned long offset = ext_info->offset;
+   int max = ext_info->max_entries;
+   int is_last_extent = 0;
+   int depth = 0;
+   int entries = 0;
+   int err = 0;
+
+   inode = iget(sb, ext_info->ino);
+   if (!inode)
+   return -EACCES;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+
+   /* if a file doesn't exist*/
+   if ((!inode->i_nlink) || (inode->i_ino < 12) ||
+   !S_ISREG(inode->i_mode)) {
+   ext_info->entries = 0;
+   err = -ENOENT;
+   goto out;
+   }
+
+   path = ext4_ext_find_extent(inode, offset, NULL);
+   if (IS_ERR(path)) {
+   err = PTR_ERR(path);
+   path = NULL;
+   goto out;
+   }
+   depth = ext_depth(inode);
+   ext = path[depth].p_ext;
+   EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]);
+   entries = 1;
+
+   /*
+* The ioctl can return 'max' ext4_extent_data per a call,
+* so if @inode has > 'max' extents, we must get away here.
+*/
+   while (entries < max) {
+   is_last_extent = ext4_ext_next_extent(inode, path, &ext);
+   /* found next extent (not the last one)*/
+   if (is_last_extent == 0) {
+   EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]);
+   entries++;
+
+   /*
+* If @inode has > 'max' extents,
+* this function should be called again,
+* (per a call, it can resolve only 'max' extents)
+* next time we have to start from 'max*n+1'th extent.
+*/
+   if (entries == max) {
+   ext_info->offset =
+   le32_to_cpu(ext->ee_block) +
+   le16_to_cpu(ext->ee_len);
+   /* check the extent is the last one or not*/
+   is_last_extent =
+   ext4_ext_next_extent(inode, path, &ext);
+   if (is_last_extent > 0) {
+   is_last_extent = 1;
+   err = is_last_extent;
+   } else if (is_last_extent < 0) {
+   /*ERR*/
+   err = is_last_extent;
+   goto out;
+   

[RFC][PATCH 4/10] Get free blocks distribution of the target block group

2007-06-20 Thread Takashi Sato
- Get free blocks distribution of the target block group to know
  how many free blocks it has.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c   
2007-06-20 09:05:37.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c   
2007-06-20 08:50:57.0 +0900
@@ -2478,6 +2478,99 @@ ext4_ext_next_extent(struct inode *inode
return 1;
 }
 
+/**
+ * ext4_ext_fblocks_distribution - Search free block distribution
+ * @inode     target file's inode
+ * @ext_info  ext4_extents_info
+ *
+ * This function returns 0 if succeeded, otherwise
+ * returns error value
+ */
+static int ext4_ext_fblocks_distribution(struct inode *inode,
+   struct ext4_extents_info *ext_info)
+{
+   handle_t *handle;
+   struct buffer_head *bitmap_bh = NULL;
+   struct super_block *sb = inode->i_sb;
+   struct ext4_super_block *es;
+   unsigned long group_no;
+   int max_entries = ext_info->max_entries;
+   ext4_grpblk_t blocks_per_group;
+   ext4_grpblk_t start;
+   ext4_grpblk_t end;
+   int num = 0;
+   int len = 0;
+   int i = 0;
+   int err = 0;
+   int block_set = 0;
+   int start_block = 0;
+
+   if (!sb) {
+   printk("ext4_ext_fblocks_distribution: nonexistent device\n");
+   return -ENOSPC;
+   }
+   es = EXT4_SB(sb)->s_es;
+
+   group_no = (inode->i_ino -1) / EXT4_INODES_PER_GROUP(sb);
+   start = ext_info->offset;
+   blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+   end = blocks_per_group -1;
+
+   handle = ext4_journal_start(inode, 1);
+   if (IS_ERR(handle)) {
+   err = PTR_ERR(handle);
+   return err;
+   }
+
+   bitmap_bh = read_block_bitmap(sb, group_no);
+   if (!bitmap_bh) {
+   err = -EIO;
+   goto out;
+   }
+
+   BUFFER_TRACE(bitmap_bh, "get undo access for new block");
+   err = ext4_journal_get_undo_access(handle, bitmap_bh);
+   if (err)
+   goto out;
+
+   for (i = start; i <= end ; i++) {
+   if (bitmap_search_next_usable_block(i, bitmap_bh, i+1) >= 0) {
+   len++;
+   /* if the free block is the first one in a region */
+   if (!block_set) {
+   start_block =
+   i + group_no * blocks_per_group;
+   block_set = 1;
+   }
+   } else if (len) {
+   ext_info->ext[num].start = start_block;
+   ext_info->ext[num].len = len;
+   num++;
+   len = 0;
+   block_set = 0;
+   if (num == max_entries) {
+   ext_info->offset = i + 1;
+   break;
+   }
+   }
+   if ((i == end) && len) {
+   ext_info->ext[num].start = start_block;
+   ext_info->ext[num].len = len;
+   num++;
+   }
+   }
+
+   ext_info->entries = num;
+out:
+   ext4_journal_release_buffer(handle, bitmap_bh);
+   brelse(bitmap_bh);
+
+   if (handle)
+   ext4_journal_stop(handle);
+
+   return err;
+}
+
 int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg)
 {
@@ -2545,6 +2638,21 @@ int ext4_ext_ioctl(struct inode *inode, 
if (copy_to_user((struct ext4_group_data_info *)arg,
&grp_data, sizeof(grp_data)))
return -EFAULT;
+   } else if (cmd == EXT4_IOC_FREE_BLOCKS_INFO) {
+   struct ext4_extents_info ext_info;
+
+   if (copy_from_user(&ext_info,
+   (struct ext4_extents_info __user *)arg,
+   sizeof(ext_info)))
+   return -EFAULT;
+
+   /* ext_info comes from userspace, so reject a mismatch instead of BUG() */
+   if (ext_info.ino != inode->i_ino)
+   return -EINVAL;
+
+   err = ext4_ext_fblocks_distribution(inode, &ext_info);
+
+   /* copy_to_user() returns the number of bytes not copied, not an errno */
+   if (!err && copy_to_user((struct ext4_extents_info __user *)arg,
+   &ext_info, sizeof(ext_info)))
+   err = -EFAULT;
} else if (cmd == EXT4_IOC_DEFRAG) {
struct ext4_ext_defrag_data defrag;
 


[RFC][PATCH 2/10] Move the file data to the new blocks

2007-06-20 Thread Takashi Sato
Move the blocks on the temporary inode to the original inode
one page at a time.
1. Read the file data from the old blocks to the page
2. Move the block on the temporary inode to the original inode
3. Write the file data on the page into the new blocks

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff 
linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c 
linux-2.6.19-rc6-2-move/fs/ext4/extents.c
--- linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c  2007-06-20 10:54:11.0 
+0900
+++ linux-2.6.19-rc6-2-move/fs/ext4/extents.c   2007-06-20 11:00:45.0 
+0900
@@ -2533,6 +2533,565 @@ int ext4_ext_ioctl(struct inode *inode, 
 }
 
 /**
+ * ext4_ext_merge_across_blocks - merge extents across leaf block
+ *
+ * @handle     journal handle
+ * @inode      target file's inode
+ * @o_start    first original extent to be defragmented
+ * @o_end      last original extent to be defragmented
+ * @start_ext  first new extent to be merged
+ * @new_ext    middle of new extent to be merged
+ * @end_ext    last new extent to be merged
+ * @flag       defrag mode; DEFRAG_RESERVE_BLOCKS_SECOND enables the defrag
+ *             goal when the new extent is inserted
+ *
+ * This function returns 0 if succeed, otherwise returns error value.
+ */
+static int
+ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode,
+   struct ext4_extent *o_start,
+   struct ext4_extent *o_end, struct ext4_extent *start_ext,
+   struct ext4_extent *new_ext, struct ext4_extent *end_ext,
+   int flag)
+{
+   struct ext4_ext_path *org_path = NULL;
+   unsigned long eblock = 0;
+   int err = 0;
+   int new_flag = 0;
+   int end_flag = 0;
+   int defrag_flag;
+
+   if (flag == DEFRAG_RESERVE_BLOCKS_SECOND)
+   defrag_flag = 1;
+   else
+   defrag_flag = 0;
+
+   if (le16_to_cpu(start_ext->ee_len) &&
+   le16_to_cpu(new_ext->ee_len) &&
+   le16_to_cpu(end_ext->ee_len)) {
+
+   if ((o_start) == (o_end)) {
+
+   /*       start_ext   new_ext   end_ext
+    * dest |---------|---------|---------|
+    * org  |-----------------------------|
+    */
+
+   end_flag = 1;
+   } else {
+
+   /*       start_ext   new_ext   end_ext
+    * dest |---------|---------|---------|
+    * org  |--------------|--------------|
+    */
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+   }
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (!le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*       start_ext   new_ext
+    * dest |---------|---------|
+    * org  |-------------------|
+    */
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((!le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*        new_ext   end_ext
+    * dest |---------|---------|
+    * org  |-------------------|
+    */
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+
+   /* If new_ext was first block */
+   if (!new_ext->ee_block)
+   eblock = 0;
+   else
+   eblock = le32_to_cpu(new_ext->ee_block);
+
+   new_flag = 1;
+   } else {
+   printk("Unexpected case \n");
+   return -EIO;
+   }
+
+   if (new_flag) {
+   org_path = ext4_ext_find_extent(inode, eblock, NULL);
+   if (IS_ERR(org_path)) {
+   err = PTR_ERR(org_path);
+   org_path = NULL;
+   goto ERR;
+   }
+   err = ext4_ext_insert_extent_defrag(handle, inode,
+   org_path, new_ext, defrag_flag);
+   if (err)
+   goto ERR;
+   }
+
+   if (end_flag) {
+   org_path = ext4_ext_find_extent(inode,
+   e

[RFC][PATCH 3/10] Get block group information

2007-06-20 Thread Takashi Sato
- Get s_blocks_per_group and s_inodes_per_group of target filesystem.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test1/fs/ext4/balloc.c 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c
--- linux-2.6.19-rc6-test1/fs/ext4/balloc.c 2007-06-20 15:15:46.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c
2007-06-20 14:57:04.0 +0900
@@ -216,7 +216,7 @@ restart:
  * If the goal block is within the reservation window, return 1;
  * otherwise, return 0;
  */
-static int
+int
 goal_in_my_reservation(struct ext4_reserve_window *rsv, ext4_grpblk_t grp_goal,
unsigned int group, struct super_block * sb)
 {
@@ -336,7 +336,7 @@ static void rsv_window_remove(struct sup
  *
  * returns 1 if the end block is EXT4_RESERVE_WINDOW_NOT_ALLOCATED.
  */
-static inline int rsv_is_empty(struct ext4_reserve_window *rsv)
+inline int rsv_is_empty(struct ext4_reserve_window *rsv)
 {
/* a valid reservation end block could not be 0 */
return rsv->_rsv_end == EXT4_RESERVE_WINDOW_NOT_ALLOCATED;
@@ -660,7 +660,7 @@ static int ext4_test_allocatable(ext4_gr
  * bitmap on disk and the last-committed copy in journal, until we find a
  * bit free in both bitmaps.
  */
-static ext4_grpblk_t
+ext4_grpblk_t
 bitmap_search_next_usable_block(ext4_grpblk_t start, struct buffer_head *bh,
ext4_grpblk_t maxblocks)
 {
@@ -1029,7 +1029,7 @@ static int find_next_reservable_window(
  * @bitmap_bh: the block group block bitmap
  *
  */
-static int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv,
+int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv,
ext4_grpblk_t grp_goal, struct super_block *sb,
unsigned int group, struct buffer_head *bitmap_bh)
 {
@@ -1173,7 +1173,7 @@ retry:
  * expand the reservation window size if necessary on a best-effort
  * basis before ext4_new_blocks() tries to allocate blocks,
  */
-static void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv,
+void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv,
struct super_block *sb, int size)
 {
struct ext4_reserve_window_node *next_rsv;
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr 
linux-2.6.19-rc6-test1/fs/ext4/extents.c 
Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c
--- linux-2.6.19-rc6-test1/fs/ext4/extents.c2007-06-20 15:42:15.0 
+0900
+++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c   
2007-06-20 15:50:14.0 +0900
@@ -43,7 +43,6 @@
 #include 
 #include 
 
-
 /*
  * ext_pblock:
  * combine low and high parts of physical block number into ext4_fsblk_t
@@ -206,11 +205,17 @@ static ext4_fsblk_t ext4_ext_find_goal(s
 static ext4_fsblk_t
 ext4_ext_new_block(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *ex, int *err)
+   struct ext4_extent *ex, int *err,
+   ext4_fsblk_t defrag_goal)
 {
ext4_fsblk_t goal, newblock;
 
-   goal = ext4_ext_find_goal(inode, path, le32_to_cpu(ex->ee_block));
+   if (defrag_goal) {
+   goal = defrag_goal;
+   } else {
+   goal= ext4_ext_find_goal(inode, path, 
+   le32_to_cpu(ex->ee_block));
+   }
newblock = ext4_new_block(handle, inode, goal, err);
return newblock;
 }
@@ -598,7 +603,8 @@ static int ext4_ext_insert_index(handle_
  */
 static int ext4_ext_split(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   struct ext4_extent *newext, int at)
+   struct ext4_extent *newext, int at,
+   ext4_fsblk_t defrag_goal)
 {
struct buffer_head *bh = NULL;
int depth = ext_depth(inode);
@@ -649,7 +655,8 @@ static int ext4_ext_split(handle_t *hand
/* allocate all needed blocks */
ext_debug("allocate %d blocks for indexes/leaf\n", depth - at);
for (a = 0; a < depth - at; a++) {
-   newblock = ext4_ext_new_block(handle, inode, path, newext, 
&err);
+   newblock = ext4_ext_new_block(handle, inode, path, newext, &err,
+   defrag_goal);
if (newblock == 0)
goto cleanup;
ablocks[a] = newblock;
@@ -836,7 +843,8 @@ cleanup:
  */
 static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path,
-   

[RFC][PATCH 1/10] Allocate new contiguous blocks

2007-06-20 Thread Takashi Sato
Search for contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff 
linux-2.6.19-rc6-Alex/fs/ext4/extents.c 
linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c
--- linux-2.6.19-rc6-Alex/fs/ext4/extents.c 2007-06-19 20:50:56.0 
+0900
+++ linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c  2007-06-20 10:54:11.0 
+0900
@@ -2335,6 +2335,713 @@ int ext4_ext_calc_metadata_amount(struct
return num;
 }
 
+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+   ext4_fsblk_t start;
+   int buflen;
+   void *buffer;
+   void *cur;
+   int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+   int depth;
+   int extents_num;
+   int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *newex,
+   struct ext4_extent_buf *buf)
+{
+
+   if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   if (buf->err < 0)
+   return EXT_BREAK;
+   if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+   return EXT_BREAK;
+
+   if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+   buf->err++;
+   buf->cur += sizeof(*newex);
+   } else {
+   buf->err = -EFAULT;
+   return EXT_BREAK;
+   }
+   return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *ex,
+   struct ext4_extent_tree_stats *buf)
+{
+   int depth;
+
+   if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   depth = ext_depth(inode);
+   buf->extents_num++;
+   if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+   buf->leaf_num++;
+   return EXT_CONTINUE;
+}
+
+/**
+ * ext4_ext_next_extent - search for next extent and set it to "extent"
+ * @inode: inode of the original file
+ * @path:  this will obtain data for next extent
+ * @extent:pointer to next extent we have just gotten
+ *
+ * This function returns 0 or 1(last_entry) if succeeded, otherwise
+ * returns -EIO
+ */
+static int
+ext4_ext_next_extent(struct inode *inode,
+struct ext4_ext_path *path,
+struct ext4_extent **extent)
+{
+   int ppos;
+   int leaf_ppos = path->p_depth;
+
+   ppos = leaf_ppos;
+   if (EXT_LAST_EXTENT(path[ppos].p_hdr) > path[ppos].p_ext) {
+   /* leaf block */
+   *extent = ++path[ppos].p_ext;
+   return 0;
+   }
+
+   while (--ppos >= 0) {
+   if (EXT_LAST_INDEX(path[ppos].p_hdr) >
+   path[ppos].p_idx) {
+   int cur_ppos = ppos;
+
+   /* index block */
+   path[ppos].p_idx++;
+   path[ppos].p_block =
+   idx_pblock(path[ppos].p_idx);
+   if (path[ppos+1].p_bh)
+   brelse(path[ppos+1].p_bh);
+   path[ppos+1].p_bh =
+   sb_bread(inode->i_sb, path[ppos].p_block);
+   if (!path[ppos+1].p_bh)
+   return  -EIO;
+   path[ppos+1].p_hdr =
+   ext_block_hdr(path[ppos+1].p_bh);
+
+   /* halfway index block */
+   while (++cur_ppos < leaf_ppos) {
+   path[cur_ppos].p_idx =
+   EXT_FIRST_INDEX(path[cur_ppos].p_hdr);
+   path[cur_ppos].p_block =
+   idx_pblock(path[cur_ppos].p_idx);
+   if (path[cur_ppos+1].p_bh)
+   brelse(path[cur_ppos+1].p_bh);
+   path[cur_ppos+1].p_bh = sb_bread(inode->i_sb,
+   path[cur_ppos].p_block);
+   if (!path[cur_ppos+1].p_bh)
+   return  -EIO;
+   path[cur_ppos+1].p_hdr =
+   ext_block_hdr(path[cur_ppos+1].p

[RFC][PATCH 0/10] ext4 online defrag (ver 0.5)

2007-06-20 Thread Takashi Sato
Hi all,

I have updated my online defrag patchset to add a new function:
defragmentation of free space.
If the filesystem has insufficient contiguous free blocks, defrag tries to move
other files to make sufficient space and reallocates the contiguous blocks
for the target file.

This function can be used in the following fashion:
# e4defrag -f filename [blockno]

To create contiguous free blocks, defrag reallocates the target file to
the block group to which its inode belongs.
If "blockno" is set, defrag tries to move other files (except the target file)
to the indicated physical block offset; otherwise defrag tries to move them to
the block group following the target block group.

The maximum size of the target file is the maximum capacity
of one block group.
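
As a rough illustration of this limit, the sketch below computes the capacity
of one block group. It assumes the usual ext layout in which the block bitmap
of a group occupies exactly one block, so a group holds 8 * block_size blocks.
The helper names are hypothetical and are not part of the patchset.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of blocks in a group when the block bitmap
 * fills exactly one block (one bit per block). This is an assumption about
 * the filesystem layout, not something taken from the patches.
 */
static uint64_t blocks_per_group(uint32_t block_size)
{
	return (uint64_t)block_size * 8;
}

/*
 * Hypothetical helper: upper bound, in bytes, on the data that fits in a
 * single block group. Metadata blocks are ignored here.
 */
static uint64_t group_capacity_bytes(uint32_t block_size)
{
	return blocks_per_group(block_size) * block_size;
}
```

With 4096-byte blocks this gives 32768 blocks per group and 134217728 bytes
(128 MiB). A real group holds somewhat less file data, because bitmaps and
the inode table also occupy blocks.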

This time I add 6 ioctls for the new function.
They are used in the following order.

Additional ioctl: 
- EXT4_IOC_GROUP_INFO
- EXT4_IOC_FREE_BLOCKS_INFO
- EXT4_IOC_EXTENTS_INFO
- EXT4_IOC_MOVE_VICTIM
- EXT4_IOC_RESERVE_BLOCK
- EXT4_IOC_BLOCK_RELEASE

1). Get s_blocks_per_group and s_inodes_per_group of target file.
(EXT4_IOC_GROUP_INFO)

In userspace, calculate the number of the block group to which the target file
belongs, using the result of 1).

2). Get free blocks information of the target block group.
(EXT4_IOC_FREE_BLOCKS_INFO)
Read the block bitmap of the target block group, then store the
free block distribution in the ext4_extents_info structure
as an extents array. Finally, return it to userspace.

3). Get all extents information of indicated inode number.
(EXT4_IOC_EXTENTS_INFO)
Store the extents information of the indicated inode number
in the ext4_extents_info structure, then return it to userspace.

In userspace, call ioctl(EXT4_IOC_EXTENTS_INFO) for all of the inodes
in the target group and, using the results of 2) and 3), calculate the
combination of extents which should be moved to another block group.
Its total size will be the same as the target file's.

4). Move the combination of extents from the target block group to another
block group to make a contiguous free area in the target block group.
(EXT4_IOC_MOVE_VICTIM)

5). Reserve freed blocks of the target block group.
(EXT4_IOC_RESERVE_BLOCK)

6). Reallocate target file to reserved contiguous blocks with ext4_ext_defrag().
(EXT4_IOC_DEFRAG)
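
The group calculation after step 1) and the bitmap scan of step 2) can be
sketched as below. This is a minimal userspace-style illustration, not the
kernel code from the patches: the structure, the function names and the
in-memory bitmap are assumptions made for the example. It follows patch 4/10
in computing the filesystem-wide block number as
group * blocks_per_group + offset, and likewise ignores s_first_data_block.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of contiguous free blocks (field names are illustrative). */
struct free_run {
	uint64_t start;	/* filesystem-wide number of the first free block */
	uint32_t len;	/* number of free blocks in the run */
};

/* Inode numbers start at 1, so inode N lives in group (N - 1) / per_group. */
static unsigned long group_of_inode(unsigned long ino,
				    unsigned long inodes_per_group)
{
	return (ino - 1) / inodes_per_group;
}

/* Test bit 'nr' of a little-endian bitmap; a set bit means "in use". */
static int block_in_use(const unsigned char *bitmap, unsigned long nr)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Scan group-relative blocks [from, blocks_per_group) and record each run
 * of clear bits. Returns the number of runs stored (at most max_runs).
 * *next receives the offset at which a later call should resume, or
 * blocks_per_group when the whole group has been scanned.
 */
static size_t collect_free_runs(const unsigned char *bitmap,
				unsigned long group,
				unsigned long blocks_per_group,
				unsigned long from,
				struct free_run *runs, size_t max_runs,
				unsigned long *next)
{
	size_t n = 0;
	unsigned long i = from;

	while (i < blocks_per_group && n < max_runs) {
		unsigned long run_start;

		/* Skip blocks that are in use. */
		while (i < blocks_per_group && block_in_use(bitmap, i))
			i++;
		if (i == blocks_per_group)
			break;

		/* Measure the run of free blocks starting at i. */
		run_start = i;
		while (i < blocks_per_group && !block_in_use(bitmap, i))
			i++;

		runs[n].start = (uint64_t)group * blocks_per_group + run_start;
		runs[n].len = (uint32_t)(i - run_start);
		n++;
	}

	*next = i;
	return n;
}
```

Because the result array is bounded, a caller would invoke the scan
repeatedly, passing the returned offset back in as 'from', until the offset
reaches blocks_per_group. This mirrors how the ioctls return at most
max_entries extents per call.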

Current status:
These patches are at the experimental stage, so they have issues and
items to improve, but I think they are worth examining.

Dependencies:
My patches depend on Alex's multi-block allocation patches
for Linux 2.6.19-rc6:
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of extent tree and the number of leaf nodes
  after defragmentation.
- The blocks on the temporary inode are moved to the original inode
  one page at a time in the current implementation.  I have to tune
  the number of pages moved per unit for performance.
- Update the base kernel version when Alex's multi-block allocation patch
  is updated. 

Next steps:
- Carry out the movement of data as an atomic transaction.
- Reduce the influence of defrag on other processes with fadvise().


Summary of patches:
*These patches apply on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/10] Allocate new contiguous blocks with Alex's mballoc
- Search contiguous free blocks and allocate them for the temporary
  inode with Alex's multi-block allocation.

[PATCH 2/10] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode
  one page at a time.

[PATCH 3/10] Get block group information
- Get s_blocks_per_group and s_inodes_per_group of target filesystem.

[PATCH 4/10] Get free blocks distribution of the target block group
- Get free blocks distribution of the target block group to know
  how many free blocks it has.

[PATCH 5/10] Get all extents information of indicated inode number
- Get all extents information of indicated inode number to calculate
the combination of extents which should be moved to other block group.

[PATCH 6/10] Move files from target block group to other block group
- To make contiguous free blocks, move files from the target block group
  to other block group. 

[PATCH 7/10] Reserve freed blocks
- Reserve the free blocks in the target area so that they are not
  used by other processes.

[PATCH 8/10] Release reserved blocks
- Release reserved blocks if defrag failed.

[PATCH 9/10] Fix bugs in multi-block allocation and locality-group
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all of dirty inodes.
- Fix ext4_mb_new_blocks() to return err value when defrag failed.

[PATCH 10/10] Online defrag command
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
# e4defrag -r directory-name
  o Defrag for free space fragmentation.
# e4defrag -f file-name
  o Defrag for a single file.
# e4defrag file-name
  o

[RFC][PATCH 2/4] Move the file data to the new blocks

2007-04-26 Thread Takashi Sato
Move the blocks on the temporary inode to the original inode
one page at a time.
1. Read the file data from the old blocks to the page
2. Move the block on the temporary inode to the original inode
3. Write the file data on the page into the new blocks

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6_move_data/Documentation/dontdiff 
linux-2.6.19-rc6_move_data/fs/ext4/extents.c 
linux-2.6.19-rc6-full-except_lg/fs/ext4/extents.c
--- linux-2.6.19-rc6_move_data/fs/ext4/extents.c2007-04-26 
19:36:34.0 +0900
+++ linux-2.6.19-rc6-full-except_lg/fs/ext4/extents.c   2007-04-26 
19:17:59.0 +0900
@@ -2533,6 +2533,610 @@ ext4_ext_next_extent(struct inode *inode
 }
 
 /**
+ * ext4_ext_merge_across_blocks - merge extents across leaf block
+ *
+ * @handle     journal handle
+ * @inode      target file's inode
+ * @o_start    first original extent to be defragmented
+ * @o_end      last original extent to be defragmented
+ * @start_ext  first new extent to be merged
+ * @new_ext    middle of new extent to be merged
+ * @end_ext    last new extent to be merged
+ *
+ * This function returns 0 if succeed, otherwise returns error value.
+ */
+static int
+ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode,
+   struct ext4_extent *o_start,
+   struct ext4_extent *o_end, struct ext4_extent *start_ext,
+   struct ext4_extent *new_ext,struct ext4_extent *end_ext)
+{
+   struct ext4_ext_path *org_path = NULL;
+   unsigned long eblock = 0;
+   int err = 0;
+   int new_flag = 0;
+   int end_flag = 0;
+
+   if (le16_to_cpu(start_ext->ee_len) &&
+   le16_to_cpu(new_ext->ee_len) &&
+   le16_to_cpu(end_ext->ee_len)) {
+
+   if ((o_start) == (o_end)) {
+
+   /*       start_ext   new_ext   end_ext
+    * dest |---------|---------|---------|
+    * org  |-----------------------------|
+    */
+
+   end_flag = 1;
+   } else {
+
+   /*       start_ext   new_ext   end_ext
+    * dest |---------|---------|---------|
+    * org  |--------------|--------------|
+    */
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+   }
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (!le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*       start_ext   new_ext
+    * dest |---------|---------|
+    * org  |-------------------|
+    */
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((!le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*        new_ext   end_ext
+    * dest |---------|---------|
+    * org  |-------------------|
+    */
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+
+   /* If new_ext was first block */
+   if (!new_ext->ee_block)
+   eblock = 0;
+   else
+   eblock = le32_to_cpu(new_ext->ee_block);
+
+   new_flag = 1;
+   } else {
+   printk("Unexpected case \n");
+   return -EIO;
+   }
+
+   if (new_flag) {
+   org_path = ext4_ext_find_extent(inode, eblock, NULL);
+   if (IS_ERR(org_path)) {
+   err = PTR_ERR(org_path);
+   org_path = NULL;
+   goto ERR;
+   }
+   err = ext4_ext_insert_extent(handle, inode,
+   org_path, new_ext);
+   if (err)
+   goto ERR;
+   }
+
+   if (end_flag) {
+   org_path = ext4_ext_find_extent(inode,
+   le32_to_cpu(end_ext->ee_block) - 1, org_path);
+   if (IS_ERR(org_path)) {
+   err = PTR_ERR(org_path);
+   org_path = NULL;
+   goto ERR;
+   

[RFC][PATCH 3/4] Online defrag command

2007-04-26 Thread Takashi Sato
The defrag command.  Usage is as follows:
o Put the multiple files closer together.
  # e4defrag -r directory-name
o Defrag for a single file.
  # e4defrag file-name
o Defrag for all files on ext4.
  # e4defrag device-name
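
The three usages differ by the kind of argument: a directory, a regular file,
or a device. How e4defrag itself makes this distinction is not visible in this
excerpt. The sketch below shows one conventional way to do it with stat(2).
The enum and function names are hypothetical and only mirror the idea behind
the command's DEVNAME/DIRNAME/FILENAME constants.

```c
#include <sys/stat.h>

/* Illustrative argument kinds; not the identifiers used by e4defrag. */
enum target_kind {
	TARGET_UNSUPPORTED = -1,
	TARGET_DEVICE = 0,
	TARGET_DIRECTORY = 1,
	TARGET_FILE = 2
};

/*
 * Classify a command-line argument by its file type.
 * Returns TARGET_UNSUPPORTED if stat() fails or the type is anything
 * other than a block device, a directory or a regular file.
 */
static int classify_target(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return TARGET_UNSUPPORTED;
	if (S_ISBLK(st.st_mode))
		return TARGET_DEVICE;
	if (S_ISDIR(st.st_mode))
		return TARGET_DIRECTORY;
	if (S_ISREG(st.st_mode))
		return TARGET_FILE;
	return TARGET_UNSUPPORTED;
}
```

A caller would dispatch on the result: walk the tree for a directory, defrag
one file for a regular file, and resolve the mount point for a device.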

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
/*
 * e4defrag, ext4 filesystem defragmenter
 *
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE   500
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define EXT4_SUPER_MAGIC    0xEF53 /* magic number for ext4 */
#define DEFRAG_PAGES    128 /* the number of pages to defrag at one time */
#define MAX_BLOCKS_LEN  16384 /* Maximum length of contiguous blocks */

/* data type for filesystem-wide blocks number */
#define  ext4_fsblk_t unsigned long long

/* ioctl command */
#define EXT4_IOC_GET_DATA_BLOCK _IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME     0
#define DIRNAME     1
#define FILENAME    2

#define RETURN_OK   0
#define RETURN_NG   -1
#define FTW_CONT    0
#define FTW_STOP    -1
#define FTW_OPEN_FD 2000
#define FILE_CHK_OK 0
#define FILE_CHK_NG -1
#define FS_EXT4 "ext4dev"
#define ROOT_UID    0
/* defrag block size, in bytes */
#define DEFRAG_SIZE 67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)  fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)   \
fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE   \
"Usage : e4defrag [-v] file...| directory...| device...\n  : e4defrag [-r] \
directory... | device... \n"
#define MSG_R_OPTION    \
" with regional block allocation mode.\n"
#define NGMSG_MTAB  \
"\te4defrag  : Can not access /etc/mtab."
#define NGMSG_UNMOUNT   "\te4defrag  : FS is not mounted."
#define NGMSG_EXT4  \
"\te4defrag  : FS is not ext4 File System."
#define NGMSG_FS_INFO   "\te4defrag  : get FSInfo fail."
#define NGMSG_FILE_INFO "\te4defrag  : get FileInfo fail."
#define NGMSG_FILE_OPEN "\te4defrag  : open fail."
#define NGMSG_FILE_SYNC "\te4defrag  : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG   "\te4defrag  : defrag fail."
#define NGMSG_FILE_BLOCKSIZE    "\te4defrag  : can't get blocksize."
#define NGMSG_FILE_DATA "\te4defrag  : can't get data."
#define NGMSG_FILE_UNREG    \
"\te4defrag  : File is not regular file."
#define NGMSG_FILE_LARGE    \
"\te4defrag  : Defrag size is larger than FileSystem's free space."
#define NGMSG_FILE_PRIORITY \
"\te4defrag  : File is not current user's file or current user is not root."
#define NGMSG_FILE_LOCK "\te4defrag  : File is locked."
#define NGMSG_FILE_BLANK    "\te4defrag  : File size is 0."
#define NGMSG_GET_LCKINFO   "\te4defrag  : get LockInfo fail."
#define NGMSG_TYPE  \
"e4defrag  : Can not process %s."

struct ext4_ext_defrag_data {
ext4_fsblk_t start_offset; /* start offset to defrag in blocks */
ext4_fsblk_t defrag_size;  /* size of defrag in blocks */
ext4_fsblk_t goal;   /* block offset for allocation */
};

int detail_flag = 0;
int regional_flag = 0;
int amount_cnt = 0;
int succeed_cnt = 0;
ext4_fsblk_t goal = 0;

/*
 * Check if there's enough disk space
 */
int
check_free_size(int fd, off64_t fsize)
{
struct statfs   fsbuf;
off64_t file_asize = 0;

if (-1 == fstatfs(fd, &fsbuf)) {
if (detail_flag) {
perror(NGMSG_FS_INFO);
}
return RETURN_NG;
}

/* compute free space for root and normal user separately */
if (ROOT_UID == getuid())
file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bfree;
else
file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bavail;

if (file_asize >= fsize)
return RETURN_

ext4 online defrag (ver 0.4)

2007-04-26 Thread Takashi Sato
Hi all,

I have made the following changes to improve the previous online defrag
patchset. Note that there is no functional change.

1. Change the handling of the temporary inode.
   ext4_ext_defrag() now calls the ext4_new_inode()/iput() pair instead of
   new_inode()/delete_ext_defrag_inode(), because new_inode() does not
   initialize all of the entries that I need, such as i_extra_isize.

2. Change how blocks are swapped.
   In this patchset, the original blocks of the target file are swapped
   carefully with those of the temporary inode so that they are released
   in iput().

3. Add an exclusive lock.
   ext4_inode_info.truncate_mutex is now held while the file is being
   defragmented.

4. Mark the locality group as dirty.
   In ext4_lg_sync_single_group(), the lg is moved to the s_locality_dirty
   list and marked as dirty if nr_to_write (the number of pages that have
   not been written to disk yet) is 0 or less and lg_io is not empty.
   This makes sure that the inode is written to disk.

Current status:
These patches are at an experimental stage, so they still have issues and
items to improve, but I think they are worth examining as a trial.

Dependencies:
My patches depend on Alex's following multi-block allocation patches
for Linux 2.6.19-rc6.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of the extent tree and the number of leaf nodes
  after defragmentation.
- The blocks on the temporary inode are moved to the original inode
  by a page in the current implementation.  I have to tune
  the pages unit for the performance.
- Support indirect block file.

Next steps:
- Defragmentation for free space fragmentation.
  If the filesystem has insufficient contiguous blocks, move other files
  to make sufficient space and allocate the contiguous blocks for
  the target file.

Summary of patches:
*These patches apply on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/4] Allocate new contiguous blocks with Alex's mballoc
- Search contiguous free blocks and allocate them for the temporary
  inode with Alex's multi-block allocation.

[PATCH 2/4] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode
  by a page.

[PATCH 3/4] Online defrag command
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
# e4defrag -r directory-name
  o Defrag for a single file.
# e4defrag file-name
  o Defrag for all files on ext4.
# e4defrag device-name

[PATCH 4/4] ext4_locality_group bug fix 
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all of dirty inodes.

Any comments from reviews or tests are very welcome.

Cheers, Takashi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH 4/4] ext4_locality_group bug fix

2007-04-26 Thread Takashi Sato
Move lg_list to s_locality_dirty and mark the lg as dirty
if nr_to_write (the number of pages that have not been written to disk yet)
is 0 or less and lg_io is not empty in ext4_lg_sync_single_group().
This makes sure that the inode is written to disk.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-lg/Documentation/dontdiff linux-2.6.19-rc6-lg/fs/ext4/lg.c linux-2.6.19-rc6-full/fs/ext4/lg.c
--- linux-2.6.19-rc6-lg/fs/ext4/lg.c 2007-04-26 19:55:37.0 +0900
+++ linux-2.6.19-rc6-full/fs/ext4/lg.c  2007-04-26 19:17:59.0 +0900
@@ -389,6 +389,10 @@ int ext4_lg_sync_single_group(struct sup
cond_resched();
spin_lock(&inode_lock);
if (wbc->nr_to_write <= 0) {
+   if (!list_empty(&lg->lg_io)) {
+   set_bit(EXT4_LG_DIRTY, &lg->lg_flags);
+   list_move(&lg->lg_list, &sbi->s_locality_dirty);
+   }
rc = EXT4_STOP_WRITEBACK;
code = 6;
break;


[RFC][PATCH 1/4] Allocate new contiguous blocks

2007-04-26 Thread Takashi Sato
Search contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-alloc_block/Documentation/dontdiff linux-2.6.19-rc6-full/fs/ext4/extents.c linux-2.6.19-rc6-alloc_block/fs/ext4/extents.c
--- linux-2.6.19-rc6-full/fs/ext4/extents.c 2007-04-26 20:36:50.0 +0900
+++ linux-2.6.19-rc6-alloc_block/fs/ext4/extents.c  2007-04-26 20:36:05.0 +0900
@@ -2335,10 +2335,635 @@ int ext4_ext_calc_metadata_amount(struct
return num;
 }
 
+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+   ext4_fsblk_t start;
+   int buflen;
+   void *buffer;
+   void *cur;
+   int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+   int depth;
+   int extents_num;
+   int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *newex,
+   struct ext4_extent_buf *buf)
+{
+
+   if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   if (buf->err < 0)
+   return EXT_BREAK;
+   if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+   return EXT_BREAK;
+
+   if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+   buf->err++;
+   buf->cur += sizeof(*newex);
+   } else {
+   buf->err = -EFAULT;
+   return EXT_BREAK;
+   }
+   return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *ex,
+   struct ext4_extent_tree_stats *buf)
+{
+   int depth;
+
+   if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   depth = ext_depth(inode);
+   buf->extents_num++;
+   if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+   buf->leaf_num++;
+   return EXT_CONTINUE;
+}
+
+int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+   unsigned long arg)
+{
+   int err = 0;
+   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+   return -EINVAL;
+
+   if (cmd == EXT4_IOC_GET_EXTENTS) {
+   struct ext4_extent_buf buf;
+
+   if (copy_from_user(&buf, (void *) arg, sizeof(buf)))
+   return -EFAULT;
+
+   buf.cur = buf.buffer;
+   buf.err = 0;
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   err = ext4_ext_walk_space(inode, buf.start, EXT_MAX_BLOCK,
+   (void *)ext4_ext_store_extent_cb, &buf);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   if (err == 0)
+   err = buf.err;
+   } else if (cmd == EXT4_IOC_GET_TREE_STATS) {
+   struct ext4_extent_tree_stats buf;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   buf.depth = ext_depth(inode);
+   buf.extents_num = 0;
+   buf.leaf_num = 0;
+   err = ext4_ext_walk_space(inode, 0, EXT_MAX_BLOCK,
+   (void *)ext4_ext_collect_stats_cb, &buf);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   if (!err)
+   err = copy_to_user((void *) arg, &buf, sizeof(buf));
+   } else if (cmd == EXT4_IOC_GET_TREE_DEPTH) {
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   err = ext_depth(inode);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   } else if (cmd == EXT4_IOC_FIBMAP) {
+   ext4_fsblk_t __user *p = (ext4_fsblk_t __user *)arg;
+   ext4_fsblk_t block = 0;
+   struct address_space *mapping = filp->f_mapping;
+
+   if (copy_from_user(&block, (ext4_fsblk_t __user *)arg,
+   sizeof(block)))
+   return -EFAULT;
+
+   lock_kernel();
+   block = ext4_bmap(mapping, block);
+   unlock_kernel();
+
+   return put_user(block, p);
+   } else if (cmd == EXT4_IOC_DEFRAG) {
+   struct ext4_ext_defrag_data defrag;
+
+   if (copy_from_user(&defrag,
+   (struct ext4_ext_defrag_data __user *)arg,
+ 

Re: [RFC][PATCH 3/3] Online defrag command

2007-02-09 Thread Takashi Sato

Hi,


On Thu, Feb 08 2007, Takashi Sato wrote:

The defrag command.  Usage is as follows:
o Put the multiple files closer together.
  # e4defrag -r directory-name
o Defrag for a single file.
  # e4defrag file-name
o Defrag for all files on ext4.
  # e4defrag device-name


Would it be possible to provide support for putting multiple files close
together? Ala

# e4defrag file1 file2 file3 ... fileN


e4defrag cannot do that in my current implementation.
I will consider implementing it in a later version.

Alternatively, you can do it if you link those files with a directory.
# ln file1 file2 file3 ... fileN directory-name
# e4defrag -r directory-name


I'm thinking boot speedup, gather the list of read files and put them
close on disk.


I think so.  It's my final goal.

Cheers, Takashi


[RFC][PATCH 3/3] Online defrag command

2007-02-08 Thread Takashi Sato
The defrag command.  Usage is as follows:
o Put the multiple files closer together.
  # e4defrag -r directory-name
o Defrag for a single file.
  # e4defrag file-name
o Defrag for all files on ext4.
  # e4defrag device-name

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
/*
 * e4defrag, ext4 filesystem defragmenter
 *
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE   500
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define EXT4_SUPER_MAGIC 0xEF53 /* magic number for ext4 */
#define DEFRAG_PAGES    128 /* the number of pages to defrag at one time */
#define MAX_BLOCKS_LEN  16384 /* Maximum length of contiguous blocks */

/* data type for filesystem-wide blocks number */
#define  ext4_fsblk_t unsigned long long

/* ioctl command */
#define EXT4_IOC_GET_DATA_BLOCK _IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME 0
#define DIRNAME 1
#define FILENAME 2

#define RETURN_OK   0
#define RETURN_NG   -1
#define FTW_CONT    0
#define FTW_STOP    -1
#define FTW_OPEN_FD 2000
#define FILE_CHK_OK 0
#define FILE_CHK_NG -1
#define FS_EXT4 "ext4dev"
#define ROOT_UID    0
/* defrag block size, in bytes */
#define DEFRAG_SIZE 67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)  fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)   \
fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE   \
"Usage : e4defrag [-v] file...| directory...| device...\n" \
"      : e4defrag [-r] directory... | device... \n"
#define MSG_R_OPTION \
" with regional block allocation mode.\n"
#define NGMSG_MTAB  \
"\te4defrag  : Can not access /etc/mtab."
#define NGMSG_UNMOUNT   "\te4defrag  : FS is not mounted."
#define NGMSG_EXT4  \
"\te4defrag  : FS is not ext4 File System."
#define NGMSG_FS_INFO   "\te4defrag  : get FSInfo fail."
#define NGMSG_FILE_INFO "\te4defrag  : get FileInfo fail."
#define NGMSG_FILE_OPEN "\te4defrag  : open fail."
#define NGMSG_FILE_SYNC "\te4defrag  : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG   "\te4defrag  : defrag fail."
#define NGMSG_FILE_BLOCKSIZE "\te4defrag  : can't get blocksize."
#define NGMSG_FILE_DATA "\te4defrag  : can't get data."
#define NGMSG_FILE_UNREG \
"\te4defrag  : File is not regular file."
#define NGMSG_FILE_LARGE \
"\te4defrag  : Defrag size is larger than FileSystem's free space."
#define NGMSG_FILE_PRIORITY \
"\te4defrag  : File is not current user's file or current user is not root."
#define NGMSG_FILE_LOCK "\te4defrag  : File is locked."
#define NGMSG_FILE_BLANK "\te4defrag  : File size is 0."
#define NGMSG_GET_LCKINFO   "\te4defrag  : get LockInfo fail."
#define NGMSG_TYPE  \
"e4defrag  : Can not process %s."

struct ext4_ext_defrag_data {
ext4_fsblk_t start_offset; /* start offset to defrag in blocks */
ext4_fsblk_t defrag_size;  /* size of defrag in blocks */
ext4_fsblk_t goal;   /* block offset for allocation */
};

int detail_flag = 0;
int regional_flag = 0;
int amount_cnt = 0;
int succeed_cnt = 0;
ext4_fsblk_t goal = 0;

/*
 * Check if there's enough disk space
 */
int
check_free_size(int fd, off64_t fsize)
{
struct statfs   fsbuf;
off64_t file_asize = 0;

if (-1 == fstatfs(fd, &fsbuf)) {
if (detail_flag) {
perror(NGMSG_FS_INFO);
}
return RETURN_NG;
}

/* compute free space for root and normal user separately */
if (ROOT_UID == getuid())
file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bfree;
else
file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bavail;

if (file_asize >= fsize)
return RETURN_

[RFC][PATCH 2/3] Move the file data to the new blocks

2007-02-08 Thread Takashi Sato
Move the blocks on the temporary inode to the original inode
by a page.
1. Read the file data from the old blocks to the page
2. Move the block on the temporary inode to the original inode
3. Write the file data on the page into the new blocks
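
The three steps can be illustrated with a toy user-space model.  This is
only a sketch under stated assumptions: the "disk" is an in-memory array,
each map is an array of physical block numbers indexed by logical block,
one block stands in for one page, and move_file_blocks() and BLK_SIZE are
hypothetical names.  It is not the kernel code of this patch.

```c
#include <string.h>

#define BLK_SIZE 8   /* toy block size in bytes (assumption) */

/*
 * Move a file's data onto the blocks held by a temporary mapping,
 * one block at a time.  The two maps must not share any block.
 */
void move_file_blocks(char disk[][BLK_SIZE], int orig_map[], int tmp_map[],
                      int n)
{
    char page[BLK_SIZE];
    int i, old;

    for (i = 0; i < n; i++) {
        /* 1. read the file data from the old block into the page */
        memcpy(page, disk[orig_map[i]], BLK_SIZE);

        /* 2. move the new block to the original mapping; the temporary
         *    mapping keeps the old block so it can be released later */
        old = orig_map[i];
        orig_map[i] = tmp_map[i];
        tmp_map[i] = old;

        /* 3. write the page into the new block */
        memcpy(disk[orig_map[i]], page, BLK_SIZE);
    }
}
```

After the loop the original mapping points at the new (contiguous) blocks
and still reads back the same data, while the temporary mapping holds the
old, fragmented blocks.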

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6.org/Documentation/dontdiff linux-2.6.19-rc6-1/fs/ext4/extents.c linux-2.6.19-rc6-full/fs/ext4/extents.c
--- linux-2.6.19-rc6-1/fs/ext4/extents.c 2007-02-08 14:13:49.0 +0900
+++ linux-2.6.19-rc6-full/fs/ext4/extents.c 2007-02-08 14:09:43.0 +0900
@@ -2533,6 +2533,653 @@ ext4_ext_next_extent(struct inode *inode
 }
 
 /**
+ * ext4_ext_merge_across - merge extents across leaf block
+ * 
+ * @handle     journal handle
+ * @inode      target file's inode
+ * @o_start    first original extent to be defraged
+ * @o_end      last original extent to be defraged
+ * @start_ext  first new extent to be merged
+ * @new_ext    middle of new extent to be merged
+ * @end_ext    last new extent to be merged
+ *
+ * This function returns 0 if succeed, otherwise returns error value.
+ */
+static int
+ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode,
+   struct ext4_extent *o_start,
+   struct ext4_extent *o_end, struct ext4_extent *start_ext,
+   struct ext4_extent *new_ext,struct ext4_extent *end_ext)
+{
+   struct ext4_ext_path *org_path = NULL;
+   unsigned long eblock = 0;
+   int err = 0;
+   int new_flag = 0;
+   int end_flag = 0;
+
+   if (le16_to_cpu(start_ext->ee_len) &&
+   le16_to_cpu(new_ext->ee_len) &&
+   le16_to_cpu(end_ext->ee_len)) {
+
+   if ((o_start) == (o_end)) {
+
+   /*   start_ext   new_ext   end_ext
+* dest |-|---||
+* org  |--|
+*/
+
+   ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+le16_to_cpu(start_ext->ee_len),
+le16_to_cpu(new_ext->ee_len), 0);
+
+   end_flag = 1;
+
+   } else {
+
+   /*   start_ext   new_ext   end_ext
+* dest |-|--|-|
+* org  |---|--|
+*/
+
+   ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+   le16_to_cpu(start_ext->ee_len),
+   le16_to_cpu(o_start->ee_len)
+   - le16_to_cpu(start_ext->ee_len), 0);
+
+   ext4_free_blocks(handle, inode, ext_pblock(o_end),
+   le16_to_cpu(o_end->ee_len)
+   - le16_to_cpu(end_ext->ee_len), 0);
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+   }
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (!le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /* start_ext    new_ext
+* dest |--|---|
+* org  |--|
+*/
+
+   ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+   le16_to_cpu(start_ext->ee_len),
+   le16_to_cpu(new_ext->ee_len), 0);
+
+   o_start->ee_len = start_ext->ee_len;
+   new_flag = 1;
+
+   } else if ((!le16_to_cpu(start_ext->ee_len)) &&
+   (le16_to_cpu(new_ext->ee_len)) &&
+   (le16_to_cpu(end_ext->ee_len)) &&
+   ((o_start) == (o_end))) {
+
+   /*  new_ext   end_ext
+* dest |--|---|
+* org  |--|
+*/
+
+   ext4_free_blocks(handle, inode, ext_pblock(o_end),
+   le16_to_cpu(new_ext->ee_len), 0);
+
+   o_end->ee_block = end_ext->ee_block;
+   o_end->ee_len = end_ext->ee_len;
+   ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+
+   /* If new_ext was first block */
+   if (!new_ext->ee_block)
+   eblock = 0;
+   else
+   

[RFC][PATCH 1/3] Allocate new contiguous blocks

2007-02-08 Thread Takashi Sato
Search contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6.org/Documentation/dontdiff linux-2.6.19-rc6.org/fs/ext4/extents.c linux-2.6.19-rc6-1/fs/ext4/extents.c
--- linux-2.6.19-rc6.org/fs/ext4/extents.c  2007-02-08 08:40:48.0 +0900
+++ linux-2.6.19-rc6-1/fs/ext4/extents.c 2007-02-08 14:13:49.0 +0900
@@ -2335,10 +2335,658 @@ int ext4_ext_calc_metadata_amount(struct
return num;
 }
 
+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+   ext4_fsblk_t start;
+   int buflen;
+   void *buffer;
+   void *cur;
+   int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+   int depth;
+   int extents_num;
+   int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *newex,
+   struct ext4_extent_buf *buf)
+{
+
+   if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   if (buf->err < 0)
+   return EXT_BREAK;
+   if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+   return EXT_BREAK;
+
+   if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+   buf->err++;
+   buf->cur += sizeof(*newex);
+   } else {
+   buf->err = -EFAULT;
+   return EXT_BREAK;
+   }
+   return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+   struct ext4_ext_path *path,
+   struct ext4_ext_cache *ex,
+   struct ext4_extent_tree_stats *buf)
+{
+   int depth;
+
+   if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+   return EXT_CONTINUE;
+
+   depth = ext_depth(inode);
+   buf->extents_num++;
+   if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+   buf->leaf_num++;
+   return EXT_CONTINUE;
+}
+
+int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+   unsigned long arg)
+{
+   int err = 0;
+   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+   return -EINVAL;
+
+   if (cmd == EXT4_IOC_GET_EXTENTS) {
+   struct ext4_extent_buf buf;
+
+   if (copy_from_user(&buf, (void *) arg, sizeof(buf)))
+   return -EFAULT;
+
+   buf.cur = buf.buffer;
+   buf.err = 0;
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   err = ext4_ext_walk_space(inode, buf.start, EXT_MAX_BLOCK,
+   (void *)ext4_ext_store_extent_cb, &buf);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   if (err == 0)
+   err = buf.err;
+   } else if (cmd == EXT4_IOC_GET_TREE_STATS) {
+   struct ext4_extent_tree_stats buf;
+
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   buf.depth = ext_depth(inode);
+   buf.extents_num = 0;
+   buf.leaf_num = 0;
+   err = ext4_ext_walk_space(inode, 0, EXT_MAX_BLOCK,
+   (void *)ext4_ext_collect_stats_cb, &buf);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   if (!err)
+   err = copy_to_user((void *) arg, &buf, sizeof(buf));
+   } else if (cmd == EXT4_IOC_GET_TREE_DEPTH) {
+   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   err = ext_depth(inode);
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   } else if (cmd == EXT4_IOC_FIBMAP) {
+   ext4_fsblk_t __user *p = (ext4_fsblk_t __user *)arg;
+   ext4_fsblk_t block = 0;
+   struct address_space *mapping = filp->f_mapping;
+
+   if (copy_from_user(&block, (ext4_fsblk_t __user *)arg,
+   sizeof(block)))
+   return -EFAULT;
+
+   lock_kernel();
+   block = ext4_bmap(mapping, block);
+   unlock_kernel();
+
+   return put_user(block, p);
+   } else if (cmd == EXT4_IOC_DEFRAG) {
+   struct ext4_ext_defrag_data defrag;
+
+   if (copy_from_user(&defrag,
+   (struct ext4_ext_defrag_data __user *)arg,
+ 

[RFC][PATCH 0/3] ext4 online defrag (ver 0.3)

2007-02-08 Thread Takashi Sato
Hi all,

I've updated my online defrag patches to support a leaf node
that is full of extents and has no space left for a new extent.
Any comments from reviews or tests are very welcome.

Implementation:
1. In my previous implementation, when a leaf node was full of extents
   and had no space for an additional extent, the new extent could not be
   inserted and the defrag failed.
   To solve this problem, ext4_ext_insert_extent() is now called to create
   a new leaf node and split the full leaf node into it.
   In this case, the leaf node count or the depth of the extent tree
   increases.
   If you test this fix, you may want to turn "AGRESSIVE_TEST" on.
   "AGRESSIVE_TEST" makes the leaf node capacity small, so you can easily
   create a complex inode tree.

2. Instead of previous version's ioctl(EXT4_IOC_GET_DATA_BLOCK),
   add new ioctl(EXT4_IOC_FIBMAP) which behaves like FIBMAP and
   returns the physical block number of the specified logical block.
   This ioctl is called by "e4defrag -r directory-name"
   to put all the files under the directory closer together.
   Andreas advised me to use FIBMAP to get the physical block number,
   but that was not feasible because of address_space_operations.
   So this ioctl calls ext4_bmap() directly. 
   Thank you for your advice, Andreas!

3. Change the unit of the interface between user-space and kernel-space
   from bytes to blocks, to make it clearer.

4. Andrew Morton suggested the following fixes.
   - Reconsider the type of the variables that hold file-related offsets;
     I changed them from unsigned long to ext4_fsblk_t.
   - Remove the unneeded wait_on_page_locked() in ext4_ext_replace_branches.
   - To avoid overflow, add casts where bit shifts are calculated.
   Thank you for your review and comments, Andrew.
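
For comparison with item 2, the standard FIBMAP ioctl that
EXT4_IOC_FIBMAP is said to behave like can be called from user space
roughly as follows.  This is a minimal sketch: fibmap_block() is a
hypothetical helper, and the EXT4_IOC_FIBMAP of this patchset is not
used.  FIBMAP takes a pointer to an int that holds the logical block
number and is overwritten with the physical block number, and it
normally requires CAP_SYS_RAWIO.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIBMAP */

/*
 * Return the physical block number that backs logical block `logical`
 * of the file open on fd, or -1 on error.  A result of 0 means the
 * block is not mapped (a hole).
 */
long fibmap_block(int fd, int logical)
{
    int block = logical;   /* in: logical block, out: physical block */

    if (ioctl(fd, FIBMAP, &block) == -1)
        return -1;
    return block;
}
```

Calling fibmap_block(fd, 0) gives the first physical block of a file,
which is the kind of value "e4defrag -r" needs as an allocation goal.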

Current status:
These patches are at an experimental stage, so they have many issues and
items to improve, but I think they are worth examining as a trial.

Dependencies:
My patches depend on Alex's following multi-block allocation patches
for Linux 2.6.19-rc6.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of the extent tree and the number of leaf nodes
  after defragmentation.

- The blocks on the temporary inode are moved to the original inode
  by a page in the current implementation.  I have to tune
  the pages unit for the performance.

- Support indirect block file.

Next steps:
I will update my patches to optimize the depth of extent tree
and the number of leaf nodes.

Summary of patches:
*These patches apply on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/3] Allocate new contiguous blocks with Alex's mballoc
- Search contiguous free blocks and allocate them for the temporary
  inode with Alex's multi-block allocation.

[PATCH 2/3] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode
  by a page.

[PATCH 3/3] Online defrag command
- The defrag command.  Usage is as follows:
  o Put the multiple files closer together.
# e4defrag -r directory-name
  o Defrag for a single file.
# e4defrag file-name
  o Defrag for all files on ext4.
# e4defrag device-name

Cheers, Takashi



Re: [RFC][PATCH 2/3] Move the file data to the new blocks

2007-02-07 Thread Takashi Sato

Hi,


+ext4_ext_replace_branches(struct inode *org_inode, struct inode *dest_inode,
+ pgoff_t from_page,  pgoff_t dest_from_page,
+ pgoff_t count_page, unsigned long *delete_start)
+{
+ struct ext4_ext_path *org_path = NULL;
+ struct ext4_ext_path *dest_path = NULL;
+ struct ext4_extent   *oext, *dext;
+ struct ext4_extent   tmp_ext;
+ int err = 0;
+ int depth;
+ unsigned long from, count, dest_off, diff, replaced_count = 0;


These should be sector_t, shouldn't they?


At some point should we start using blkcnt_t properly? (block-in[-large]-file, not 
block-in[-large]-device?)  I think that's what it was introduced for, although it's not 
in wide use at this point.


I guess the type really isn't used anywhere else; just in the inode's i_blocks. 
 Hmm.


On reflection, I think we should use ext4_fsblk_t in this case, because
some ext4 code, such as ext4_ext_get_blocks(), uses it.
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
   ext4_fsblk_t iblock,
So I would like to change "unsigned long" into ext4_fsblk_t in my next patch.
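
One likely reason the width of these variables matters can be shown with
a small sketch.  It assumes that "unsigned long" is 32 bits wide, as on
32-bit architectures, and models it with uint32_t; fsblk_t, offset_narrow()
and offset_wide() are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef unsigned long long fsblk_t;   /* stand-in for ext4_fsblk_t */

/* Shift done in 32-bit arithmetic: the result wraps for large values. */
uint32_t offset_narrow(uint32_t block, unsigned int blkbits)
{
    return block << blkbits;
}

/* Same shift, but the operand is widened to 64 bits first. */
fsblk_t offset_wide(uint32_t block, unsigned int blkbits)
{
    return (fsblk_t)block << blkbits;
}
```

With block 0x200000 and a shift of 12 bits, the narrow version wraps
around to 0 while the widened version yields 0x200000000.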

Cheers, Takashi 




Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)

2007-01-19 Thread Takashi Sato

Hi,

Thank you for your comment.


>>>1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
>>>   block number of the specified file.  With this ioctl, a command
>>>   gets the specified directory's.
>>
>>Maybe I don't understand, but how is this different from the long-time
>>FIBMAP ioctl?
>
>I can use FIBMAP instead of my new ioctl.
>You are right.  I should have used FIBMAP ioctl...

I have to get the physical block number of the specified directory.
But FIBMAP is available only for a regular file, not for a directory.
So I will use my new ioctl.


Though it might make sense to implement FIBMAP for a directory, to keep
it consistent and allow user-space tools like "filefrag" to work on
directories also.


It sounds good.
I think it will be useful for other tools which use FIBMAP.
So I will consider the implementation of FIBMAP for a directory.

Cheers, Takashi


Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)

2007-01-18 Thread Takashi Sato

Hi,


On Jan 16, 2007  21:03 +0900, [EMAIL PROTECTED] wrote:

1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
   block number of the specified file.  With this ioctl, a command
   gets the specified directory's.


Maybe I don't understand, but how is this different from the long-time
FIBMAP ioctl?


I can use FIBMAP instead of my new ioctl.
You are right.  I should have used FIBMAP ioctl...


I have to get the physical block number of the specified directory.
But FIBMAP is available only for a regular file, not for a directory.
So I will use my new ioctl.

Cheers, Takashi


Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)

2007-01-17 Thread Takashi Sato

Hi,


On Jan 16, 2007  21:03 +0900, [EMAIL PROTECTED] wrote:

1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
   block number of the specified file.  With this ioctl, a command
   gets the specified directory's.


Maybe I don't understand, but how is this different from the long-time
FIBMAP ioctl?


I can use FIBMAP instead of my new ioctl.
You are right.  I should have used FIBMAP ioctl...


struct ext4_ext_defrag_data {
loff_t start_offset; /* start offset to defrag in byte */
loff_t defrag_size;  /* size of defrag in bytes */
ext4_fsblk_t goal;   /* block offset for allocation */
};


Two things of note:
- presumably the start_offset and defrag_size should be multiples of the
 filesystem blocksize?  If they are not, is it an error or are they
 adjusted to cover whole blocks?


If the given values are not multiples of the blocksize,
they are adjusted in the kernel to cover whole blocks.

But I think it is not clean that the unit of goal differs from that of
start_offset and defrag_size.  I will change their unit to blocks
in the next update.
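
The adjustment to whole blocks can be sketched as follows.  This is an
illustration only: bytes_to_blocks() is a hypothetical helper, and
rounding the start down and the end up is an assumption about what
"cover whole blocks" means, not the kernel code itself.

```c
#include <assert.h>

/* Convert a byte range into the block range that fully covers it. */
void bytes_to_blocks(unsigned long long start, unsigned long long size,
                     unsigned int blocksize,
                     unsigned long long *first_block,
                     unsigned long long *nr_blocks)
{
    unsigned long long last_block;

    assert(blocksize > 0);

    *first_block = start / blocksize;               /* round start down */
    if (size == 0) {
        *nr_blocks = 0;
        return;
    }
    last_block = (start + size - 1) / blocksize;    /* last block touched */
    *nr_blocks = last_block - *first_block + 1;
}
```

For example, with a 4096-byte blocksize, the byte range starting at
offset 1000 with size 5000 touches blocks 0 and 1, so it is widened to
two whole blocks starting at block 0.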


- in previous defrag discussions (i.e. XFS defrag), it was desirable to
 allow specifying different types of goals (e.g. hard, soft, kernel picks).
 We may as well have a structure that allows these to be specified, instead
 of having to change the interface afterward.


Let me see...  Is it the following discussion?
http://marc.theaimsgroup.com/?l=linux-ext4&m=116161490908645&w=2
http://marc.theaimsgroup.com/?l=linux-ext4&m=116184475306761&w=2

Cheers, Takashi


Re: [RFC][PATCH 0/3] Extent base online defrag

2006-11-17 Thread Takashi Sato

Hi,


> - Specify the target area in a file using the following structure:
>   struct ext3_ext_defrag_data {
>   loff_t start_offset; /* start offset to defrag in bytes */
>   loff_t defrag_size;  /* size of defrag in bytes */
>   }
>   It uses loff_t so that the size of the structure is identical on
>   both 32 bits and 64 bits architecture.
>   Block allocation, including searching for the free contiguous
>   blocks, is implemented in kernel.
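
The size argument in the quoted text can be checked at compile time.
A minimal sketch, assuming int64_t as a stand-in for the kernel's loff_t
and using a hypothetical struct name:

```c
#include <stdint.h>

/* int64_t stands in for loff_t, which is 64 bits wide on Linux */
struct defrag_range {
    int64_t start_offset;   /* start offset to defrag in bytes */
    int64_t defrag_size;    /* size of defrag in bytes */
};

/* Compile-time check: the build fails if the size is not 16 bytes. */
_Static_assert(sizeof(struct defrag_range) == 16,
               "struct defrag_range must be 16 bytes");
```

Because both members have a fixed 64-bit width, the same binary layout
is seen by a 32-bit user-space program and a 64-bit kernel.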

NAK the ioctl approach.


I agree it shouldn't go into mainline this way, but while the details of
the proper interface are debated, this implementation at least allows
the core function to be tested & reviewed.


People who like ioctls are just holdovers from non-Linux OS's.


Thank you for your comments.
My patches are at the experimental phase and the ioctl approach is
a provisional solution.
But I intend to continue this work with the ioctl approach if there are no
actual problems.

Cheers, Takashi