[RFC] ext3 freeze feature ver 0.2
Hi,

Takashi Sato wrote:
>>> Instead, I'd like the sec to timeout on freeze API in order to thaw
>>> the filesystem automatically. It can prevent a filesystem from staying
>>> frozen forever.
>>> (Because a freezer may cause a deadlock by accessing the frozen filesystem.)
>>
>>I'm still not very comfortable with the timeout; if you un-freeze on a
>>timer, how do you know that the work for which you needed the filesystem
>>frozen is complete? How would you know if your snapshot was good if
>>there's a possibility that the fs unfroze while it was being taken?
>
>And how about adding the new ioctl to reset the timeval like below?
>(Dmitri proposed this idea before.)
>  int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval);
>    fd:file descriptor of mountpoint
>    FIFREEZE_RESET_TIMEOUT:request code for reset of timeout period
>    timeval:new timeout period
>This is useful for the application to set the timeval more accurately.
>For example, the freezer resets the timeval to 10 seconds every 5
>seconds. In this approach, even if the freezer causes a deadlock
>by accessing the frozen filesystem, it will be solved by the timeout
>in 10 seconds and the freezer can recognize that at the next reset
>of timeval.

I have improved the following two points in my ext3 freeze feature.

o Add the new ioctl to reset the timeout period as above
  The usage is as below.
    int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeval);
      fd:file descriptor of mountpoint
      FIFREEZE_RESET_TIMEOUT:request code for reset of timeout period
      timeval:new timeout period
      Return value: 0 if the operation succeeds. Otherwise, -1
      Error number: If the filesystem has already been unfrozen,
                    errno is set to EINVAL.
  I have confirmed the following two results with this ioctl.
  - After a deadlock occurred by accessing the frozen filesystem,
    it was resolved when the reset timeout expired.
  - The freezer could recognize that from the error number (EINVAL)
    at the next reset of timeval.

o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS
  As Andreas Dilger and Christoph Hellwig advised me, I have elevated
  them to include/linux/fs.h as below.
    #define FIFREEZE	_IOWR('X', 119, int)
    #define FITHAW	_IOWR('X', 120, int)
  The ioctl numbers used by XFS applications don't need to be changed.
  But my ioctl for the freeze, described below, takes the timeout period
  as its parameter. So XFS applications that do not want the timeout
  (the current behaviour) need to change the parameter from 1 (level?)
  to 0.

I haven't changed the following ioctls from the previous version.
  int ioctl(int fd, int cmd, long *timeval)
    fd: The file descriptor of the mountpoint
    cmd: FIFREEZE for the freeze or FITHAW for the unfreeze
    timeval: The timeout value expressed in seconds
             If it's 0, the timeout isn't set.
    Return value: 0 if the operation succeeds. Otherwise, -1

Any comments are very welcome.

Cheers, Takashi

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/drivers/md/dm.c linux-2.6.25-rc3-freeze/drivers/md/dm.c
--- linux-2.6.25-rc3.org/drivers/md/dm.c	2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/drivers/md/dm.c	2008-02-25 10:50:04.0 +0900
@@ -1407,7 +1407,7 @@ static int lock_fs(struct mapped_device
 
 	WARN_ON(md->frozen_sb);
 
-	md->frozen_sb = freeze_bdev(md->suspended_bdev);
+	md->frozen_sb = freeze_bdev(md->suspended_bdev, 0);
 	if (IS_ERR(md->frozen_sb)) {
 		r = PTR_ERR(md->frozen_sb);
 		md->frozen_sb = NULL;
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/fs/block_dev.c linux-2.6.25-rc3-freeze/fs/block_dev.c
--- linux-2.6.25-rc3.org/fs/block_dev.c	2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/fs/block_dev.c	2008-02-25 10:50:04.0 +0900
@@ -284,6 +284,11 @@ static void init_once(struct kmem_cache
 	INIT_LIST_HEAD(&bdev->bd_holder_list);
 #endif
 	inode_init_once(&ei->vfs_inode);
+
+	/* Initialize semaphore for freeze. */
+	sema_init(&bdev->bd_freeze_sem, 1);
+	/* Setup freeze timeout function. */
+	INIT_DELAYED_WORK(&bdev->bd_freeze_timeout, freeze_timeout);
 }
 
 static inline void __bd_forget(struct inode *inode)
diff -uprN -X /home/sho/pub/MC/freeze-set/dontdiff linux-2.6.25-rc3.org/fs/buffer.c linux-2.6.25-rc3-freeze/fs/buffer.c
--- linux-2.6.25-rc3.org/fs/buffer.c	2008-02-25 06:25:54.0 +0900
+++ linux-2.6.25-rc3-freeze/fs/buffer.c	2008-02-25 10:50:04.0 +0900
@@ -190,17 +190,33 @@ int fsync_bdev(stru
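
The reset protocol described in this message (reset the timeout to 10 seconds every 5 seconds, and treat EINVAL from a reset as "the filesystem has already thawed") can be modelled in userspace. What follows is only a sketch under stated assumptions: `struct sk_fs`, `sk_freeze()`, `sk_reset_timeout()` and `sk_snapshot_ok()` are hypothetical names, time is a plain count of seconds, and no real ioctl is issued.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one filesystem; not kernel code. */
struct sk_fs {
	int  frozen;    /* 1 while frozen, 0 once thawed */
	long deadline;  /* second at which the auto-thaw fires; 0 = no timeout */
};

/* Apply the automatic thaw if the deadline has passed by time `now`. */
static void sk_expire(struct sk_fs *fs, long now)
{
	if (fs->frozen && fs->deadline != 0 && now >= fs->deadline)
		fs->frozen = 0;
}

/* Model of the freeze request with a timeout in seconds (0 = no timeout). */
static int sk_freeze(struct sk_fs *fs, long now, long timeout_sec)
{
	if (fs->frozen || timeout_sec < 0) {
		errno = EINVAL;
		return -1;
	}
	fs->frozen = 1;
	fs->deadline = timeout_sec ? now + timeout_sec : 0;
	return 0;
}

/*
 * Model of the reset request: fails with EINVAL once the filesystem has
 * thawed. Rejecting a non-positive period is a choice of this model only.
 */
static int sk_reset_timeout(struct sk_fs *fs, long now, long timeout_sec)
{
	sk_expire(fs, now);
	if (!fs->frozen || timeout_sec <= 0) {
		errno = EINVAL;
		return -1;
	}
	fs->deadline = now + timeout_sec;
	return 0;
}

/*
 * Freezer side: freeze at t = 0 with `timeout_sec`, then reset at each of
 * the given times (ascending). Returns 1 if every reset succeeded, i.e.
 * the filesystem stayed frozen throughout, and 0 if a reset failed.
 */
static int sk_snapshot_ok(const long *reset_times, size_t n, long timeout_sec)
{
	struct sk_fs fs = { 0, 0 };
	size_t i;

	if (sk_freeze(&fs, 0, timeout_sec) != 0)
		return 0;
	for (i = 0; i < n; i++)
		if (sk_reset_timeout(&fs, reset_times[i], timeout_sec) != 0)
			return 0;
	return 1;
}
```

With a 10 second period, resets at 5, 10 and 15 seconds keep the model frozen, while a freezer that stalls between 5 and 20 seconds sees its next reset fail and knows the snapshot cannot be trusted.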
Re: [RFC] ext3 freeze feature
Hi,

Christoph Hellwig wrote:
> On Fri, Feb 08, 2008 at 08:26:57AM -0500, Andreas Dilger wrote:
>> You may as well make the common ioctl the same as the XFS version,
>> both by number and parameters, so that applications which already
>> understand the XFS ioctl will work on other filesystems.
>
> Yes.  In fact you should be able to lift the implementations of
> XFS_IOC_FREEZE and XFS_IOC_THAW to generic code, there's nothing
> XFS-specific in there.

According to Documentation/ioctl-number.txt, XFS_IOC_XXXs
(_IOWR('X', aa, bb)) are defined for XFS like below.

From Documentation/ioctl-number.txt:
  Code	Seq#	Include File		Comments
	:
	:
  'X'	all	linux/xfs_fs.h

So XFS_IOC_FREEZE and XFS_IOC_THAW cannot simply be lifted to generic
code. I think we should create new generic numbers for freeze and thaw,
like FIBMAP, as follows.

linux/fs.h:
  #define FIFREEZE	_IO(0x00,3)
  #define FITHAW	_IO(0x00,4)

And xfs_freeze calls XFS_IOC_FREEZE with a magic number 1, but what is 1?
Instead, I'd like the sec to timeout on freeze API in order to thaw
the filesystem automatically. It can prevent a filesystem from staying
frozen forever.
(Because a freezer may cause a deadlock by accessing the frozen filesystem.)

Any comments are very welcome.

Cheers, Takashi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
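
To see why the choice of number matters for existing XFS applications, it may help to compute the request codes by hand. The sketch below assumes the field layout of asm-generic/ioctl.h (some architectures split the upper bits differently) and a 4-byte int; the `sk_*` names are hypothetical and local to the example.

```c
#include <stddef.h>

/*
 * Field layout assumed here (asm-generic/ioctl.h):
 *   bits 31..30 direction, 29..16 argument size, 15..8 type, 7..0 number.
 */
#define SK_IOC_NONE  0u
#define SK_IOC_WRITE 1u
#define SK_IOC_READ  2u

/* Pack the four fields into one request code. */
static unsigned int sk_ioc(unsigned int dir, unsigned int type,
			   unsigned int nr, size_t size)
{
	return (dir << 30) | ((unsigned int)size << 16) | (type << 8) | nr;
}

/* Equivalent of _IO(type, nr): no direction, no argument size. */
static unsigned int sk_io(unsigned int type, unsigned int nr)
{
	return sk_ioc(SK_IOC_NONE, type, nr, 0);
}

/* Equivalent of _IOWR(type, nr, T) for an argument of `size` bytes. */
static unsigned int sk_iowr(unsigned int type, unsigned int nr, size_t size)
{
	return sk_ioc(SK_IOC_READ | SK_IOC_WRITE, type, nr, size);
}
```

Under these assumptions `sk_iowr('X', 119, 4)` gives 0xC0045877 while `sk_io(0x00, 3)` gives 3, so a binary built against the 'X' number would not reach a handler registered under the new generic number unless both are accepted.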
Re: [RFC] ext3 freeze feature
Hi,

> P.S.  Oh yeah, it should be noted that freezing at the filesystem
> layer does *not* guarantee that changes to the block device aren't
> happening via mmap()'ed files.  The LVM needs to freeze writes at the
> block device level if it wants to guarantee a completely stable
> snapshot image.  So the proposed patch doesn't quite give you those
> guarantees, if that was the intended goal.

I don't think an mmap()'ed file is written to the block device while the
filesystem is frozen. pdflush starts the writing procedure of the
mmap()'ed file's data and calls ext3_ordered_writepage.
ext3_ordered_writepage calls ext3_journal_start to get the journal handle.
As a result, the process waits for the unfreeze in start_this_handle.

pdflush
  ::
  ext3_ordered_writepage
    ext3_journal_start
      ext3_journal_start_sb
        journal_start
          start_this_handle   <--- wait here

I actually tried freezing the filesystem after updating the mmap()'ed
file's data. The write to the block device did not happen while the
filesystem was frozen. (It happened right after the unfreeze.)

I don't think the freeze feature at the block device level is needed,
because the writing for the mmap()'ed file is suspended on the frozen
filesystem.

Any comments are very welcome.

Cheers, Takashi
Re: [RFC] ext3 freeze feature
Hi,

Ted wrote:
> And I do agree that we probably should just implement this in
> filesystem independent way, in which case all of the filesystems that
> support this already have super_operations functions
> write_super_lockfs() and unlockfs().
>
> So if this is done using a new system call, there should be no
> filesystem-specific changes needed, and all filesystems which support
> those super_operations method functions would be able to provide this
> functionality to the new system call.

OK, I would like to implement the freeze feature in the VFS as a
filesystem-independent ioctl so that it is available on every filesystem
that already has write_super_lockfs() and unlockfs().
The usage for the freeze ioctl is the following.
  int ioctl(int fd, int FIFREEZE, long *timeval);
    fd:file descriptor of mountpoint
    FIFREEZE:request code for freeze
    timeval:timeout period (seconds)

And the unfreeze ioctl is the following.
  int ioctl(int fd, int FITHAW, NULL);
    fd:file descriptor of mountpoint
    FITHAW:request code for unfreeze

I think we need the timeout feature, which thaws the filesystem after
the specified time has elapsed, as a fail-safe in case the freezer
accesses the frozen filesystem and causes a deadlock.
I intend to implement the timeout feature in the VFS.
(This is realized by registering delayed work which calls
thaw_bdev() on the delayed work queue.)

Any comments are very welcome.

Cheers, Takashi
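
A freezer built on the two ioctls proposed above might look roughly like the following. This is a sketch of the interface as proposed in this thread, not a tested tool: `sk_with_frozen_fs()` and `sk_noop()` are hypothetical names, the request codes are the ones posted in ver 0.2 of the patch and are only defined here if the system headers do not already provide them, and the timeout argument is assumed to behave as described in the message.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Request codes as posted in ver 0.2 of the patch; assumed here. */
#ifndef FIFREEZE
#define FIFREEZE _IOWR('X', 119, int)
#endif
#ifndef FITHAW
#define FITHAW   _IOWR('X', 120, int)
#endif

/* Placeholder for the snapshot step; a real freezer would do work here. */
static int sk_noop(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Hypothetical helper: freeze the filesystem mounted at `mountpoint`,
 * run `work` (e.g. trigger a storage snapshot), then thaw.
 * Returns -1 if the mountpoint cannot be opened, the freeze fails or
 * the thaw fails; otherwise returns whatever `work` returned.
 */
static int sk_with_frozen_fs(const char *mountpoint, long timeout_sec,
			     int (*work)(void *), void *arg)
{
	int fd, ret;

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Proposed interface: the argument points to a timeout in seconds. */
	if (ioctl(fd, FIFREEZE, &timeout_sec) != 0) {
		close(fd);
		return -1;
	}

	/* `work` must not write to the frozen filesystem, or it will block. */
	ret = work(arg);

	/* A failing thaw suggests the timeout already unfroze the filesystem. */
	if (ioctl(fd, FITHAW, NULL) != 0)
		ret = -1;

	close(fd);
	return ret;
}
```

A call such as `sk_with_frozen_fs("/mnt/data", 60, take_snapshot, NULL)` would need CAP_SYS_ADMIN; the path and `take_snapshot` are illustrative only.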
Re: [RFC] ext3 freeze feature
Hi,

> What you *could* do is to start putting processes to sleep if they
> attempt to write to the frozen filesystem, and then detect the
> deadlock case where the process holding the file descriptor used to
> freeze the filesystem gets frozen because it attempted to write to the
> filesystem --- at which point it gets some kind of signal (which
> defaults to killing the process), and the filesystem is unfrozen and
> as part of the unfreeze you wake up all of the processes that were put
> to sleep for touching the frozen filesystem.
>
> The other approach would be to say, "oh well, the freeze ioctl is
> inherently dangerous, and root is allowed to shoot himself in the
> foot, so who cares". :-)

Currently the XFS freezer doesn't resolve a deadlock automatically, and
we rely on administrators to ensure that the freezer does not access
the filesystem. And even if a badly behaved freezer causes a deadlock,
it can be resolved by another unfreeze process (the unfreeze command).
So I don't think the freezer itself needs to resolve the deadlock.
I think the timeout is effective against an unexpected deadlock, and
the timeout extending feature is very useful, as Dmitri proposed.

Cheers, Takashi
Re: [RFC] ext3 freeze feature
Hi,

> What you *could* do is to start putting processes to sleep if they
> attempt to write to the frozen filesystem, and then detect the
> deadlock case where the process holding the file descriptor used to
> freeze the filesystem gets frozen because it attempted to write to the
> filesystem --- at which point it gets some kind of signal (which
> defaults to killing the process), and the filesystem is unfrozen and
> as part of the unfreeze you wake up all of the processes that were put
> to sleep for touching the frozen filesystem.

I don't think close() usually writes to the journal, so I don't think
the deadlock occurs there. Is there a special case in which close()
writes to the journal when the process gets a signal?

Cheers, Takashi
Re: [RFC] ext3 freeze feature
Hi,

Thank you for your comments.

> That's inherently unsafe - you can have multiple unfreezes
> running in parallel which seriously screws with the bdev semaphore
> count that is used to lock the device due to doing multiple up()s
> for every down.
>
> Your timeout thingy guarantees that at some point you will get
> multiple up()s occurring due to the timer firing racing with
> a thaw ioctl.
>
> If this interface is to be more widely exported, then it needs
> a complete revamp of how the bdev is locked while it is frozen so
> that there is no chance of a double up() ever occurring on the
> bd_mount_sem due to racing thaws.

My patch has the race condition you describe. I will fix it.

Cheers, Takashi
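
One way to picture the double up() problem, and a guard against it, is a small userspace model in which the thaw ioctl and the timeout both call the same thaw routine and only the caller that actually changes the state from frozen to thawed releases the semaphore. This is not the fix that was eventually posted; `struct sk_bdev` and the `sk_bdev_*` functions are hypothetical, and a real implementation would hold a lock across the state change and the semaphore operation and would also cancel the pending delayed work.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the block device state; not kernel code. */
struct sk_bdev {
	atomic_int frozen;     /* 0 = thawed, 1 = frozen */
	atomic_int sem_count;  /* models the semaphore count (1 = unlocked) */
};

static void sk_bdev_init(struct sk_bdev *b)
{
	atomic_init(&b->frozen, 0);
	atomic_init(&b->sem_count, 1);
}

/* Freeze: only the caller that flips 0 -> 1 takes the semaphore. */
static int sk_bdev_freeze(struct sk_bdev *b)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&b->frozen, &expected, 1))
		return -EINVAL;                 /* already frozen */
	atomic_fetch_sub(&b->sem_count, 1);     /* models down() */
	return 0;
}

/* Thaw: ioctl and timer both call this; only the 1 -> 0 winner does up(). */
static int sk_bdev_thaw(struct sk_bdev *b)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&b->frozen, &expected, 0))
		return -EINVAL;                 /* someone else already thawed */
	atomic_fetch_add(&b->sem_count, 1);     /* models up() */
	return 0;
}
```

If the thaw ioctl wins and the timer fires afterwards, the second call returns -EINVAL and the modelled count stays at 1 instead of climbing to 2.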
Re: [RFC] ext3 freeze feature
Hi,

> I am also wondering whether we should have system call(s) for these:
>
> On Jan 25, 2008 12:59 PM, Takashi Sato <[EMAIL PROTECTED]> wrote:
>> +	case EXT3_IOC_FREEZE: {
>> +	case EXT3_IOC_THAW: {
>
> And just convert XFS to use them too?

I think it is reasonable to implement it as a generic system call,
as you said. Do the XFS folks agree?

Cheers, Takashi
[RFC] ext3 freeze feature
Hi,

Currently, ext3 doesn't have the freeze feature which suspends write
requests. So, we cannot get a backup which keeps the filesystem's
consistency with the storage device's features (snapshot, replication)
while it is mounted.
In many cases, commercial filesystems (e.g. VxFS) have the freeze
feature, and it is used to get a consistent backup.
So I am planning on implementing the ioctl of the freeze feature for ext3.
I think we can get a consistent backup with the following steps.
1. Freeze the filesystem with ioctl.
2. Separate the replication volume or get the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with ioctl.
4. Get the backup from the separated replication volume
   or the snapshot.

The usage of the ioctl is as below.
  int ioctl(int fd, int cmd, long *timeval)
    fd: The file descriptor of the mountpoint.
    cmd: EXT3_IOC_FREEZE for the freeze or EXT3_IOC_THAW for the unfreeze.
    timeval: The timeout value expressed in seconds.
             If it's 0, the timeout isn't set.
    Return value: 0 if the operation succeeds. Otherwise, -1.

I have confirmed that write requests were suspended with the
experimental patch for this feature, which is attached to this mail.

The main points of the implementation are as follows.
- Add calls of the freeze function (freeze_bdev) and
  the unfreeze function (thaw_bdev) in ext3_ioctl().

- ext3_freeze_timeout(), which calls the unfreeze function (thaw_bdev),
  is registered on the delayed work queue to unfreeze the filesystem
  automatically after the specified time has elapsed.

Any comments are very welcome.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff linux-2.6.24-rc8/fs/ext3/ioctl.c linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c
--- linux-2.6.24-rc8/fs/ext3/ioctl.c	2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/ioctl.c	2008-01-22 18:20:33.0 +0900
@@ -254,6 +254,42 @@ flags_err:
 		return err;
 	}
 
+	case EXT3_IOC_FREEZE: {
+		long timeout_sec;
+		long timeout_msec;
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		if (inode->i_sb->s_frozen != SB_UNFROZEN)
+			return -EINVAL;
+		/* arg(sec) to tick value */
+		get_user(timeout_sec, (long __user *) arg);
+		timeout_msec = timeout_sec * 1000;
+		if (timeout_msec < 0)
+			return -EINVAL;
+
+		/* Freeze */
+		freeze_bdev(inode->i_sb->s_bdev);
+
+		/* set up unfreeze timer */
+		if (timeout_msec > 0)
+			ext3_add_freeze_timeout(EXT3_SB(inode->i_sb),
+							timeout_msec);
+		return 0;
+	}
+	case EXT3_IOC_THAW: {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+		if (inode->i_sb->s_frozen == SB_UNFROZEN)
+			return -EINVAL;
+
+		/* delete unfreeze timer */
+		ext3_del_freeze_timeout(EXT3_SB(inode->i_sb));
+
+		/* Unfreeze */
+		thaw_bdev(inode->i_sb->s_bdev, inode->i_sb);
+
+		return 0;
+	}
 	default:
 		return -ENOTTY;
diff -uprN -X linux-2.6.24-rc8/Documentation/dontdiff linux-2.6.24-rc8/fs/ext3/super.c linux-2.6.24-rc8-freeze/fs/ext3/super.c
--- linux-2.6.24-rc8/fs/ext3/super.c	2008-01-16 13:22:48.0 +0900
+++ linux-2.6.24-rc8-freeze/fs/ext3/super.c	2008-01-22 18:20:33.0 +0900
@@ -63,6 +63,7 @@ static int ext3_statfs (struct dentry *
 static void ext3_unlockfs(struct super_block *sb);
 static void ext3_write_super (struct super_block * sb);
 static void ext3_write_super_lockfs(struct super_block *sb);
+static void ext3_freeze_timeout(struct work_struct *work);
 
 /*
  * Wrappers for journal_start/end.
@@ -323,6 +324,44 @@ void ext3_update_dynamic_rev(struct supe
 }
 
 /*
+ * ext3_add_freeze_timeout - Add timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ * @timeout_msec : timeout period
+ *
+ * Add the delayed work for ext3 freeze timeout
+ * to the delayed work queue.
+ */
+void ext3_add_freeze_timeout(struct ext3_sb_info *sbi,
+				long timeout_msec)
+{
+	s64 timeout_jiffies = msecs_to_jiffies(timeout_msec);
+
+	/*
+	 * setup freeze timeout function
+	 */
+	INIT_DELAYED_WORK(&sbi->s_freeze_timeout, ext3_freeze_timeout);
+
+	/* set delayed work queue */
+	cancel_delayed_work(&sbi->s_freeze_timeout);
+	schedule_delayed_work(&sbi->s_freeze_timeout, timeout_jiffies);
+}
+
+/*
+ * ext3_del_freeze_timeout - Delete timeout for ext3 freeze.
+ *
+ * @sbi: ext3 super block
+ *
[RFC][PATCH 10/10] Online defrag command
- The defrag command. Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for free space fragmentation.
    # e4defrag -f file-name
  o Defrag for a single file.
    # e4defrag file-name
  o Defrag for all files on ext4.
    # e4defrag device-name

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
/*
 * e4defrag.c - ext4 filesystem defragmenter
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE	500
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#define EXT4_IOC_DEFRAG		_IOW('f', 10, struct ext4_ext_defrag_data)
#define EXT4_IOC_GROUP_INFO	_IOW('f', 11, struct ext4_group_data_info)
#define EXT4_IOC_FREE_BLOCKS_INFO	_IOW('f', 12, struct ext4_extents_info)
#define EXT4_IOC_EXTENTS_INFO	_IOW('f', 13, struct ext4_extents_info)
#define EXT4_IOC_RESERVE_BLOCK	_IOW('f', 14, struct ext4_extents_info)
#define EXT4_IOC_MOVE_VICTIM	_IOW('f', 15, struct ext4_extents_info)
#define EXT4_IOC_BLOCK_RELEASE	_IO('f', 16)

#define _FILE_OFFSET_BITS 64
#define ext4_fsblk_t	unsigned long long
#define DEFRAG_MAX_ENT	32

/* Extent status which are used in ext_in_group */
#define EXT4_EXT_USE		0
#define EXT4_EXT_FREE		1
#define EXT4_EXT_RESERVE	2

/* Insert list2 after list1 */
#define insert(list1,list2) { list2 ->next = list1->next;\
				list1->next->prev = list2;\
				list2->prev = list1;\
				list1->next = list2;\
}

#define DEFRAG_RESERVE_BLOCK_SECOND	2

/* Magic number for ext4 */
#define EXT4_SUPER_MAGIC	0xEF53

/* The number of pages to defrag at one time */
#define DEFRAG_PAGES	128

/* Maximum length of contiguous blocks */
#define MAX_BLOCKS_LEN	16384

/* Force defrag mode: Max file size in bytes (128MB) */
#define MAX_FILE_SIZE	((unsigned long)1 << 27)

/* Force defrag mode: Max filesystem relative offset (48bit) */
#define MAX_FS_OFFSET_BIT	48

/* Data type for filesystem-wide blocks number */
#define ext4_fsblk_t unsigned long long

/* Ioctl command */
#define EXT4_IOC_FIBMAP	_IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG	_IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME		0
#define DIRNAME		1
#define FILENAME	2

#define RETURN_OK	0
#define RETURN_NG	-1
#define FTW_CONT	0
#define FTW_STOP	-1
#define FTW_OPEN_FD	2000
#define FILE_CHK_OK	0
#define FILE_CHK_NG	-1
#define FS_EXT4		"ext4dev"
#define ROOT_UID	0

/* Defrag block size, in bytes */
#define DEFRAG_SIZE	67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)	fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)	\
	fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE	\
"Usage : e4defrag [-v] file...| directory...| device...\n\
      : e4defrag -f file [blocknr] \n\
      : e4defrag -r directory... | device... \n"

#define MSG_R_OPTION		" with regional block allocation mode.\n"
#define NGMSG_MTAB		"\te4defrag : Can not access /etc/mtab."
#define NGMSG_UNMOUNT		"\te4defrag : FS is not mounted."
#define NGMSG_EXT4		"\te4defrag : FS is not ext4 File System."
#define NGMSG_FS_INFO		"\te4defrag : get FSInfo fail."
#define NGMSG_FILE_INFO		"\te4defrag : get FileInfo fail."
#define NGMSG_FILE_OPEN		"\te4defrag : open fail."
#define NGMSG_FILE_SYNC		"\te4defrag : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG	"\te4defrag : defrag fail."
#define NGMSG_FILE_BLOCKSIZE	"\te4defrag : can't get blocksize."
#define NGMSG_FILE_FIBMAP	"\te4defrag : can't get block number."
#define NGMSG_FILE_UNREG	"\te4defrag : File is not regular file."
#define NGMSG_FILE_LARGE	\
	"\te4defrag : Defrag size is larger than FileSystem's free space."
#define NGMSG_FILE_PRIORITY	\
"\te4defrag : File is not current user's file or current user is not root."
#define NGMSG_FILE_LOCK		"\te4defrag : File is locked."
#define NGMSG_FILE_BLANK	"\te4defrag : File size is 0."
#define NGMSG_GET_LCKINFO
[RFC][PATCH 9/10] Fix bugs in multi-block allocation and locality-group
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group()
  to flush all of dirty inodes.
- Fix ext4_mb_new_blocks() to return err value when defrag failed.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr linux-2.6.19-rc6-test3/fs/ext4/lg.c Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c
--- linux-2.6.19-rc6-test3/fs/ext4/lg.c	2007-06-20 16:56:16.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/lg.c	2007-06-18 14:21:54.0 +0900
@@ -389,6 +389,10 @@ int ext4_lg_sync_single_group(struct sup
 		cond_resched();
 		spin_lock(&inode_lock);
 		if (wbc->nr_to_write <= 0) {
+			if (!list_empty(&lg->lg_io)) {
+				set_bit(EXT4_LG_DIRTY, &lg->lg_flags);
+				list_move(&lg->lg_list, &sbi->s_locality_dirty);
+			}
 			rc = EXT4_STOP_WRITEBACK;
 			code = 6;
 			break;
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr linux-2.6.19-rc6-test3/fs/ext4/mballoc.c Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c
--- linux-2.6.19-rc6-test3/fs/ext4/mballoc.c	2007-06-20 16:58:22.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git/fs/ext4/mballoc.c	2007-06-18 14:21:54.0 +0900
@@ -3732,8 +3732,10 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
 	    !(EXT4_I(ar->inode)->i_state & EXT4_STATE_BLOCKS_RESERVED)) {
 		reserved = ar->len;
 		err = ext4_reserve_blocks(sb, reserved);
-		if (err)
+		if (err) {
+			*errp = err;
 			return err;
+		}
 	}
 
 	if (!ext4_mb_use_preallocated(&ac)) {
[RFC][PATCH 8/10] Release reserved block
- Release reserved blocks if defrag failed.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Akira Fujita <[EMAIL PROTECTED]>
---
diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c
--- Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c	2007-06-19 20:19:14.0 +0900
+++ Online-Defrag_linux-2.6.19-rc6-git-lg-mballoc-bugfix/fs/ext4/extents.c	2007-06-18 14:23:07.0 +0900
@@ -3066,6 +3066,10 @@ int ext4_ext_ioctl(struct inode *inode,
 
 		err = ext4_ext_defrag_victim(filp, &ext_info);
 
+	} else if (cmd == EXT4_IOC_BLOCK_RELEASE) {
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		ext4_discard_reservation(inode);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
 	} else if (cmd == EXT4_IOC_DEFRAG) {
 		struct ext4_ext_defrag_data defrag;
 
[RFC][PATCH 7/10] Reserve freed blocks
- Reserve the free blocks in the target area, not to be used by other process. Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c --- Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c 2007-06-19 21:40:55.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-BLOCK_RELEASE/fs/ext4/extents.c 2007-06-19 20:19:14.0 +0900 @@ -2619,6 +2619,182 @@ out: } /** + * ext4_ext_defrag_reserve - reserve blocks for defrag + * @inode target inode + * @goal block reservation goal + * @lenblocks count to reserve + * + * This function returns 0 if succeeded, otherwise + * returns error value + */ + +int ext4_ext_defrag_reserve(struct inode * inode, ext4_fsblk_t goal, int len) +{ + struct super_block *sb = NULL; + handle_t *handle = NULL; + struct buffer_head *bitmap_bh = NULL; + struct ext4_block_alloc_info *block_i; + struct ext4_reserve_window_node * my_rsv = NULL; + unsigned short windowsz = 0; + unsigned long group_no; + ext4_grpblk_t grp_target_blk; + int err = 0; + + mutex_lock(&EXT4_I(inode)->truncate_mutex); + + handle = ext4_journal_start(inode, EXT4_RESERVE_TRANS_BLOCKS); + if (IS_ERR(handle)) { + err = PTR_ERR(handle); + handle = NULL; + goto out; + } + + if (S_ISREG(inode->i_mode) && (!EXT4_I(inode)->i_block_alloc_info)) { + ext4_init_block_alloc_info(inode); + } else if (!S_ISREG(inode->i_mode)) { + printk(KERN_ERR "ext4_ext_defrag_reserve:" +" incorrect file type\n"); + err = -1; + goto out; + } + + sb = inode->i_sb; + if (!sb) { + printk("ext4_ext_defrag_reserve: nonexistent device\n"); + err = -ENXIO; + goto out; + } + ext4_get_group_no_and_offset(sb, goal, &group_no, + &grp_target_blk); + + block_i = EXT4_I(inode)->i_block_alloc_info; + + if (!block_i || ((windowsz = + block_i->rsv_window_node.rsv_goal_size) 
== 0)) { + printk("ex4_ext_defrag_reserve: unable to reserve\n"); + err = -1; + goto out; + } + + my_rsv = &block_i->rsv_window_node; + + bitmap_bh = read_block_bitmap(sb, group_no); + if (!bitmap_bh) { + err = -ENOSPC; + goto out; + } + + BUFFER_TRACE(bitmap_bh, "get undo access for new block"); + err = ext4_journal_get_undo_access(handle, bitmap_bh); + if (err) + goto out; + + err = alloc_new_reservation(my_rsv, grp_target_blk, sb, + group_no, bitmap_bh); + if (err < 0) { + printk(KERN_ERR "defrag: reservation faild\n"); + ext4_discard_reservation(inode); + goto out; + } else { + if (len > EXT4_DEFAULT_RESERVE_BLOCKS) { + try_to_extend_reservation(my_rsv, sb, + len - EXT4_DEFAULT_RESERVE_BLOCKS); + } + } + +out: + mutex_unlock(&EXT4_I(inode)->truncate_mutex); + ext4_journal_release_buffer(handle, bitmap_bh); + brelse(bitmap_bh); + + if (handle) + ext4_journal_stop(handle); + + return err; +} + +int goal_in_my_reservation(struct ext4_reserve_window *, ext4_grpblk_t, + unsigned int, struct super_block *); +int rsv_is_empty(struct ext4_reserve_window *); + +/** + * ext4_ext_block_within_rsv - Is target extent reserved ? + * @ inode inode of target file + * @ ex_startstart physical block number of the extent + * which already moved + * @ ex_len block length of the extent which already moved + * + * This function returns 0 if succeeded, otherwise + * returns error value + */ +static int ext4_ext_block_within_rsv(struct inode *inode, + ext4_fsblk_t ex_start, int ex_len) +{ + struct super_block *sb = inode->i_sb; + struct ext4_block_alloc_info *block_i; + unsigned long group_no; + ext4_grpblk_t grp_blk; + struct ext4_reserve_window_node *rsv; + + block_i = EXT4_I(inode)->i_block_alloc_info; + if (block_i && block_i->rsv_window_node.rsv_goal_size > 0) { + rsv = &block_i->rsv_window_node; + if (rsv_is_empty(&rsv->rsv_window)) { + printk("defrag: Ca
[RFC][PATCH 6/10] Move files from target block group to other block group
- To make contiguous free blocks, move files from the target block group to other block group. Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c --- Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c 2007-06-20 08:27:44.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-RESERVE_BLOCK/fs/ext4/extents.c 2007-06-19 21:40:55.0 +0900 @@ -1279,20 +1279,20 @@ ext4_can_extents_be_merged(struct inode } /* - * ext4_ext_insert_extent: - * tries to merge requsted extent into the existing extent or - * inserts requested extent as new one into the tree, - * creating new leaf in the no-space case. + * ext4_ext_insert_extent_defrag: + * The difference from ext4_ext_insert_extent is to use the first block + * in newext as the goal of the new index block. */ -int ext4_ext_insert_extent(handle_t *handle, struct inode *inode, +int ext4_ext_insert_extent_defrag(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, - struct ext4_extent *newext) + struct ext4_extent *newext, int defrag) { struct ext4_extent_header * eh; struct ext4_extent *ex, *fex; struct ext4_extent *nearex; /* nearest extent */ struct ext4_ext_path *npath = NULL; int depth, len, err, next; + ext4_fsblk_t defrag_goal; BUG_ON(newext->ee_len == 0); depth = ext_depth(inode); @@ -1342,11 +1342,17 @@ repeat: le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max)); } + if (defrag) { + defrag_goal = ext_pblock(newext); + } else { + defrag_goal = 0; + } /* * There is no free space in the found leaf. * We're gonna add a new leaf in the tree. 
*/ - err = ext4_ext_create_new_leaf(handle, inode, path, newext); + err = ext4_ext_create_new_leaf(handle, inode, path, + newext, defrag_goal); if (err) goto cleanup; depth = ext_depth(inode); @@ -1438,6 +1444,19 @@ cleanup: return err; } +/* + * ext4_ext_insert_extent: + * tries to merge requsted extent into the existing extent or + * inserts requested extent as new one into the tree, + * creating new leaf in the no-space case. + */ +int ext4_ext_insert_extent(handle_t *handle, struct inode *inode, + struct ext4_ext_path *path, + struct ext4_extent *newext) +{ + return ext4_ext_insert_extent_defrag(handle, inode, path, newext, 0); +} + int ext4_ext_walk_space(struct inode *inode, unsigned long block, unsigned long num, ext_prepare_callback func, void *cbdata) @@ -2600,6 +2619,70 @@ out: } /** + * ext4_ext_defrag_victim - Create free space for defrag + * @filp target file + * @ex_info target extents array to move + * + * This function returns 0 if succeeded, otherwise + * returns error value + */ +static int ext4_ext_defrag_victim(struct file *target_filp, + struct ext4_extents_info *ex_info) +{ + struct inode *target_inode = target_filp->f_dentry->d_inode; + struct super_block *sb = target_inode->i_sb; + struct file victim_file; + struct dentry victim_dent; + struct inode *victim_inode; + ext4_fsblk_t goal = ex_info->goal; + int ret = 0; + int i = 0; + int flag = DEFRAG_RESERVE_BLOCKS_SECOND; + struct ext4_extent_data ext; + unsigned long group; + ext4_grpblk_t grp_off; + + /* Setup dummy entent data */ + ext.len = 0; + + /* Get the inode of the victim file */ + victim_inode = iget(sb, ex_info->ino); + if (!victim_inode) + return -EACCES; + + /* Setup file for the victim file */ + victim_dent.d_inode = victim_inode; + victim_file.f_dentry = &victim_dent; + + /* Set the goal appropriate offset */ + if (goal == -1) { + ext4_get_group_no_and_offset(victim_inode->i_sb, + ex_info->ext[0].start, &group, &grp_off); + goal = ext4_group_first_block_no(sb, group + 1); + } + 
+ for (i = 0; i < ex_info->entries; i++ ) { + /* Move original blocks to another block group */ + if ((ret = ext4_ext_defrag(&victim_file, ex_info->ext[i].block, + ex_info->ext[i].len, goal, flag, &ext)) < 0) + goto ERR; + + /* Sync journal blocks before reservation
[RFC][PATCH 5/10] Get all extents information of specified inode number
- Get all extents information of specified inode number to calculate the combination of extents which should be moved to other block group. Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c --- Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c 2007-06-20 08:50:57.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-MOVE_VICTIM/fs/ext4/extents.c 2007-06-20 08:27:44.0 +0900 @@ -43,6 +43,12 @@ #include #include +#define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32) +#define EXT_SET_EXTENT_DATA(src, dest) do {\ + dest.block = le32_to_cpu(src->ee_block);\ + dest.start = ext_pblock(src);\ + dest.len = le16_to_cpu(src->ee_len);\ + } while (0) /* * ext_pblock: * combine low and high parts of physical block number into ext4_fsblk_t @@ -2479,6 +2485,121 @@ ext4_ext_next_extent(struct inode *inode } /** + * ext4_ext_extents_info() - get extents information + * + * @ext_info: pointer to ext4_extents_info + * @ext_info->ino describe an inode which is used to get extent + * information + * @ext_info->max_entries: defined by DEFRAG_MAX_ENT + * @ext_info->entries:amount of extents (output) + * @ext_info->ext[]: array of extent (output) + * @ext_info->offset: starting block offset of targeted extent + * (file relative) + * + * @sb: for iget() + * + * This function returns 0 if next extent(s) exists, + * or returns 1 if next extent doesn't exist, otherwise returns error value. + * Called under truncate_mutex lock. 
+ */ +static int ext4_ext_extents_info(struct ext4_extents_info *ext_info, + struct super_block *sb) +{ + struct ext4_ext_path *path = NULL; + struct ext4_extent *ext = NULL; + struct inode *inode = NULL; + unsigned long offset = ext_info->offset; + int max = ext_info->max_entries; + int is_last_extent = 0; + int depth = 0; + int entries = 0; + int err = 0; + + inode = iget(sb, ext_info->ino); + if (!inode) + return -EACCES; + + mutex_lock(&EXT4_I(inode)->truncate_mutex); + + /* if a file doesn't exist*/ + if ((!inode->i_nlink) || (inode->i_ino < 12) || + !S_ISREG(inode->i_mode)) { + ext_info->entries = 0; + err = -ENOENT; + goto out; + } + + path = ext4_ext_find_extent(inode, offset, NULL); + if (IS_ERR(path)) { + err = PTR_ERR(path); + path = NULL; + goto out; + } + depth = ext_depth(inode); + ext = path[depth].p_ext; + EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]); + entries = 1; + + /* +* The ioctl can return 'max' ext4_extent_data per a call, +* so if @inode has > 'max' extents, we must get away here. +*/ + while (entries < max) { + is_last_extent = ext4_ext_next_extent(inode, path, &ext); + /* found next extent (not the last one)*/ + if (is_last_extent == 0) { + EXT_SET_EXTENT_DATA(ext, ext_info->ext[entries]); + entries++; + + /* +* If @inode has > 'max' extents, +* this function should be called again, +* (per a call, it can resolve only 'max' extents) +* next time we have to start from 'max*n+1'th extent. +*/ + if (entries == max) { + ext_info->offset = + le32_to_cpu(ext->ee_block) + + le32_to_cpu(ext->ee_len); + /* check the extent is the last one or not*/ + is_last_extent = + ext4_ext_next_extent(inode, path, &ext); + if (is_last_extent) { + is_last_extent = 1; + err = is_last_extent; + } else if (is_last_extent < 0) { + /*ERR*/ + err = is_last_extent; + goto out; +
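The comments in ext4_ext_extents_info() describe a paging protocol: one call returns at most 'max' extents, and ext_info->offset is advanced so that the next call resumes after the last extent returned. The caller-side loop might look like the sketch below. Everything in it is an assumption for illustration: the struct layout, SKETCH_MAX_ENT (a stand-in for DEFRAG_MAX_ENT), and sketch_extents_info(), which replaces the real ioctl with a walk over an in-memory, sorted extent array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ENT 3	/* hypothetical stand-in for DEFRAG_MAX_ENT */

struct sketch_extent {
	unsigned long block;	/* first logical block of the extent */
	unsigned long len;	/* length in blocks */
};

struct sketch_extents_info {
	unsigned long offset;	/* in: block to resume from, out: next offset */
	int max_entries;	/* in: capacity of ext[] */
	int entries;		/* out: number of extents stored */
	struct sketch_extent ext[SKETCH_MAX_ENT];
};

/*
 * Hypothetical stand-in for the EXT4_IOC_EXTENTS_INFO call.
 * Returns 0 if more extents remain, 1 if the last one has been returned.
 */
static int sketch_extents_info(const struct sketch_extent *all, size_t n,
			       struct sketch_extents_info *info)
{
	size_t i = 0;

	assert(info && info->max_entries > 0 &&
	       info->max_entries <= SKETCH_MAX_ENT);
	info->entries = 0;

	/* Skip extents that end at or before the resume offset. */
	while (i < n && all[i].block + all[i].len <= info->offset)
		i++;

	/* Copy out at most max_entries extents and advance the offset. */
	while (i < n && info->entries < info->max_entries) {
		info->ext[info->entries++] = all[i];
		info->offset = all[i].block + all[i].len;
		i++;
	}
	return i == n;
}

/* Caller loop: keep asking until the callee reports the last extent. */
static size_t count_all_extents(const struct sketch_extent *all, size_t n)
{
	struct sketch_extents_info info = {
		.offset = 0,
		.max_entries = SKETCH_MAX_ENT,
	};
	size_t total = 0;
	int last;

	do {
		last = sketch_extents_info(all, n, &info);
		total += (size_t)info.entries;
	} while (!last);
	return total;
}
```

The loop terminates only because each call moves the offset forward, which is why the sketch asserts that max_entries is positive.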
[RFC][PATCH 4/10] Get free blocks distribution of the target block group
- Get free blocks distribution of the target block group to know how many free blocks it has. Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c --- Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c 2007-06-20 09:05:37.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-EXTENTS_INFO/fs/ext4/extents.c 2007-06-20 08:50:57.0 +0900 @@ -2478,6 +2478,99 @@ ext4_ext_next_extent(struct inode *inode return 1; } +/** + * ext4_ext_fblocks_distribution - Search free block distribution + * @filp target file + * @ex_info ext4_extents_info + * + * This function returns 0 if succeeded, otherwise + * returns error value + */ +static int ext4_ext_fblocks_distribution(struct inode *inode, + struct ext4_extents_info *ext_info) +{ + handle_t *handle; + struct buffer_head *bitmap_bh = NULL; + struct super_block *sb = inode->i_sb; + struct ext4_super_block *es; + unsigned long group_no; + int max_entries = ext_info->max_entries; + ext4_grpblk_t blocks_per_group; + ext4_grpblk_t start; + ext4_grpblk_t end; + int num = 0; + int len = 0; + int i = 0; + int err = 0; + int block_set = 0; + int start_block = 0; + + if (!sb) { + printk("ext4_ext_fblock_distribution: nonexitent device\n"); + return -ENOSPC; + } + es = EXT4_SB(sb)->s_es; + + group_no = (inode->i_ino -1) / EXT4_INODES_PER_GROUP(sb); + start = ext_info->offset; + blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb); + end = blocks_per_group -1; + + handle = ext4_journal_start(inode, 1); + if (IS_ERR(handle)) { + err = PTR_ERR(handle); + return err; + } + + bitmap_bh = read_block_bitmap(sb, group_no); + if (!bitmap_bh) { + err = -EIO; + goto out; + } + + BUFFER_TRACE(bitmap_bh, "get undo access for new block"); + err = ext4_journal_get_undo_access(handle, bitmap_bh); + if 
(err) + goto out; + + for (i = start; i <= end ; i++) { + if (bitmap_search_next_usable_block(i, bitmap_bh, i+1) >= 0) { + len++; + /* if the free block is the first one in a region */ + if (!block_set) { + start_block = + i + group_no * blocks_per_group; + block_set = 1; + } + } else if (len) { + ext_info->ext[num].start = start_block; + ext_info->ext[num].len = len; + num++; + len = 0; + block_set = 0; + if (num == max_entries) { + ext_info->offset = i + 1; + break; + } + } + if ((i == end) && len) { + ext_info->ext[num].start = start_block; + ext_info->ext[num].len = len; + num++; + } + } + + ext_info->entries = num; +out: + ext4_journal_release_buffer(handle, bitmap_bh); + brelse(bitmap_bh); + + if (handle) + ext4_journal_stop(handle); + + return err; +} + int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg) { @@ -2545,6 +2638,21 @@ int ext4_ext_ioctl(struct inode *inode, if (copy_to_user((struct ext4_group_data_info *)arg, &grp_data, sizeof(grp_data))) return -EFAULT; + } else if (cmd == EXT4_IOC_FREE_BLOCKS_INFO) { + struct ext4_extents_info ext_info; + + if (copy_from_user(&ext_info, + (struct ext4_extents_info __user *)arg, + sizeof(ext_info))) + return -EFAULT; + + BUG_ON(ext_info.ino != inode->i_ino); + + err = ext4_ext_fblocks_distribution(inode, &ext_info); + + if (!err) + err = copy_to_user((struct ext4_extents_info*)arg, + &ext_info, sizeof(ext_info)); } else if (cmd == EXT4_IOC_DEFRAG) { struct ext4_ext_defrag_data defrag; - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
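The loop in ext4_ext_fblocks_distribution() above turns a block bitmap into a list of free regions (start, length), stopping early when the output array is full and recording where to resume. A minimal userspace sketch of the same run-collecting idea follows. The bitmap layout (one bit per block, set means in use), struct free_run and collect_free_runs() are assumptions made for this example; the kernel patch goes through bitmap_search_next_usable_block() and the journal instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of ext_info->ext[]. */
struct free_run {
	unsigned long start;	/* first free block of the run */
	unsigned long len;	/* number of contiguous free blocks */
};

/* Assumption: bit i set in the bitmap means block i is in use. */
static int block_in_use(const unsigned char *bitmap, unsigned long i)
{
	return (bitmap[i / 8] >> (i % 8)) & 1;
}

/*
 * Collect up to max_runs runs of free blocks in [start, nblocks).
 * Returns the number of runs stored. *next receives the block at which
 * a later call should resume (nblocks once the scan is complete).
 */
static size_t collect_free_runs(const unsigned char *bitmap,
				unsigned long start, unsigned long nblocks,
				struct free_run *runs, size_t max_runs,
				unsigned long *next)
{
	size_t num = 0;
	unsigned long len = 0, run_start = 0, i;

	assert(bitmap && runs && next);
	if (max_runs == 0) {
		*next = start;
		return 0;
	}

	for (i = start; i < nblocks; i++) {
		if (!block_in_use(bitmap, i)) {
			if (len == 0)
				run_start = i;	/* first block of a new run */
			len++;
			continue;
		}
		if (len == 0)
			continue;		/* still inside used blocks */

		/* A used block closes the current free run. */
		runs[num].start = run_start;
		runs[num].len = len;
		num++;
		len = 0;
		if (num == max_runs) {
			*next = i + 1;		/* resume after this block */
			return num;
		}
	}

	/* A run that reaches the end of the group is stored as well. */
	if (len) {
		runs[num].start = run_start;
		runs[num].len = len;
		num++;
	}
	*next = nblocks;
	return num;
}
```

Calling it again with start set to the returned *next continues the scan, which mirrors how the patch stores i + 1 in ext_info->offset when max_entries is reached.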
[RFC][PATCH 2/10] Move the file data to the new blocks
Move the blocks on the temporary inode to the original inode by a page. 1. Read the file data from the old blocks to the page 2. Move the block on the temporary inode to the original inode 3. Write the file data on the page into the new blocks Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c linux-2.6.19-rc6-2-move/fs/ext4/extents.c --- linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c 2007-06-20 10:54:11.0 +0900 +++ linux-2.6.19-rc6-2-move/fs/ext4/extents.c 2007-06-20 11:00:45.0 +0900 @@ -2533,6 +2533,565 @@ int ext4_ext_ioctl(struct inode *inode, } /** + * ext4_ext_merge_across - merge extents across leaf block + * + * @handle journal handle + * @inode target file's inode + * @o_startfirst original extent to be defraged + * @o_end last original extent to be defraged + * @start_ext first new extent to be merged + * @new_extmiddle of new extent to be merged + * @end_extlast new extent to be merged + * + * This function returns 0 if succeed, otherwise returns error value. 
+ */ +static int +ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode, + struct ext4_extent *o_start, + struct ext4_extent *o_end, struct ext4_extent *start_ext, + struct ext4_extent *new_ext, struct ext4_extent *end_ext, + int flag) +{ + struct ext4_ext_path *org_path = NULL; + unsigned long eblock = 0; + int err = 0; + int new_flag = 0; + int end_flag = 0; + int defrag_flag; + + if (flag == DEFRAG_RESERVE_BLOCKS_SECOND) + defrag_flag = 1; + else + defrag_flag = 0; + + if (le16_to_cpu(start_ext->ee_len) && + le16_to_cpu(new_ext->ee_len) && + le16_to_cpu(end_ext->ee_len)) { + + if ((o_start) == (o_end)) { + + /* start_ext new_extend_ext +* dest |-|---|| +* org |--| +*/ + + end_flag = 1; + } else { + + /* start_ext new_ext end_ext +* dest |-|--|-| +* org |---|--| +*/ + + o_end->ee_block = end_ext->ee_block; + o_end->ee_len = end_ext->ee_len; + ext4_ext_store_pblock(o_end, ext_pblock(end_ext)); + } + + o_start->ee_len = start_ext->ee_len; + new_flag = 1; + + } else if ((le16_to_cpu(start_ext->ee_len)) && + (le16_to_cpu(new_ext->ee_len)) && + (!le16_to_cpu(end_ext->ee_len)) && + ((o_start) == (o_end))) { + + /* start_ext new_ext +* dest |--|---| +* org |--| +*/ + + o_start->ee_len = start_ext->ee_len; + new_flag = 1; + + } else if ((!le16_to_cpu(start_ext->ee_len)) && + (le16_to_cpu(new_ext->ee_len)) && + (le16_to_cpu(end_ext->ee_len)) && + ((o_start) == (o_end))) { + + /*new_ext end_ext +* dest |--|---| +* org |--| +*/ + + o_end->ee_block = end_ext->ee_block; + o_end->ee_len = end_ext->ee_len; + ext4_ext_store_pblock(o_end, ext_pblock(end_ext)); + + /* If new_ext was first block */ + if (!new_ext->ee_block) + eblock = 0; + else + eblock = le32_to_cpu(new_ext->ee_block); + + new_flag = 1; + } else { + printk("Unexpected case \n"); + return -EIO; + } + + if (new_flag) { + org_path = ext4_ext_find_extent(inode, eblock, NULL); + if (IS_ERR(org_path)) { + err = PTR_ERR(org_path); + org_path = NULL; + goto ERR; + } + err = 
ext4_ext_insert_extent_defrag(handle, inode, + org_path, new_ext, defrag_flag); + if (err) + goto ERR; + } + + if (end_flag) { + org_path = ext4_ext_find_extent(inode, + e
[RFC][PATCH 3/10] Get block group information
- Get s_blocks_per_group and s_inodes_per_group of target filesystem. Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr linux-2.6.19-rc6-test1/fs/ext4/balloc.c Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c --- linux-2.6.19-rc6-test1/fs/ext4/balloc.c 2007-06-20 15:15:46.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/balloc.c 2007-06-20 14:57:04.0 +0900 @@ -216,7 +216,7 @@ restart: * If the goal block is within the reservation window, return 1; * otherwise, return 0; */ -static int +int goal_in_my_reservation(struct ext4_reserve_window *rsv, ext4_grpblk_t grp_goal, unsigned int group, struct super_block * sb) { @@ -336,7 +336,7 @@ static void rsv_window_remove(struct sup * * returns 1 if the end block is EXT4_RESERVE_WINDOW_NOT_ALLOCATED. */ -static inline int rsv_is_empty(struct ext4_reserve_window *rsv) +inline int rsv_is_empty(struct ext4_reserve_window *rsv) { /* a valid reservation end block could not be 0 */ return rsv->_rsv_end == EXT4_RESERVE_WINDOW_NOT_ALLOCATED; @@ -660,7 +660,7 @@ static int ext4_test_allocatable(ext4_gr * bitmap on disk and the last-committed copy in journal, until we find a * bit free in both bitmaps. 
*/ -static ext4_grpblk_t +ext4_grpblk_t bitmap_search_next_usable_block(ext4_grpblk_t start, struct buffer_head *bh, ext4_grpblk_t maxblocks) { @@ -1029,7 +1029,7 @@ static int find_next_reservable_window( * @bitmap_bh: the block group block bitmap * */ -static int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv, +int alloc_new_reservation(struct ext4_reserve_window_node *my_rsv, ext4_grpblk_t grp_goal, struct super_block *sb, unsigned int group, struct buffer_head *bitmap_bh) { @@ -1173,7 +1173,7 @@ retry: * expand the reservation window size if necessary on a best-effort * basis before ext4_new_blocks() tries to allocate blocks, */ -static void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv, +void try_to_extend_reservation(struct ext4_reserve_window_node *my_rsv, struct super_block *sb, int size) { struct ext4_reserve_window_node *next_rsv; diff -X Online-Defrag_linux-2.6.19-rc6-git/Documentation/dontdiff -upNr linux-2.6.19-rc6-test1/fs/ext4/extents.c Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c --- linux-2.6.19-rc6-test1/fs/ext4/extents.c2007-06-20 15:42:15.0 +0900 +++ Online-Defrag_linux-2.6.19-rc6-git-FREE_BLOCKS_INFO/fs/ext4/extents.c 2007-06-20 15:50:14.0 +0900 @@ -43,7 +43,6 @@ #include #include - /* * ext_pblock: * combine low and high parts of physical block number into ext4_fsblk_t @@ -206,11 +205,17 @@ static ext4_fsblk_t ext4_ext_find_goal(s static ext4_fsblk_t ext4_ext_new_block(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, - struct ext4_extent *ex, int *err) + struct ext4_extent *ex, int *err, + ext4_fsblk_t defrag_goal) { ext4_fsblk_t goal, newblock; - goal = ext4_ext_find_goal(inode, path, le32_to_cpu(ex->ee_block)); + if (defrag_goal) { + goal = defrag_goal; + } else { + goal= ext4_ext_find_goal(inode, path, + le32_to_cpu(ex->ee_block)); + } newblock = ext4_new_block(handle, inode, goal, err); return newblock; } @@ -598,7 +603,8 @@ static int 
ext4_ext_insert_index(handle_ */ static int ext4_ext_split(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, - struct ext4_extent *newext, int at) + struct ext4_extent *newext, int at, + ext4_fsblk_t defrag_goal) { struct buffer_head *bh = NULL; int depth = ext_depth(inode); @@ -649,7 +655,8 @@ static int ext4_ext_split(handle_t *hand /* allocate all needed blocks */ ext_debug("allocate %d blocks for indexes/leaf\n", depth - at); for (a = 0; a < depth - at; a++) { - newblock = ext4_ext_new_block(handle, inode, path, newext, &err); + newblock = ext4_ext_new_block(handle, inode, path, newext, &err, + defrag_goal); if (newblock == 0) goto cleanup; ablocks[a] = newblock; @@ -836,7 +843,8 @@ cleanup: */ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, -
[RFC][PATCH 1/10] Allocate new contiguous blocks
Search contiguous free blocks with Alex's mutil-block allocation and allocate them for the temporary inode. This patch applies on top of Alex's patches. "[RFC] delayed allocation, mballoc, etc" http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2 Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> Signed-off-by: Akira Fujita <[EMAIL PROTECTED]> --- diff -Nrup -X linux-2.6.19-rc6-Alex/Documentation/dontdiff linux-2.6.19-rc6-Alex/fs/ext4/extents.c linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c --- linux-2.6.19-rc6-Alex/fs/ext4/extents.c 2007-06-19 20:50:56.0 +0900 +++ linux-2.6.19-rc6-1-alloc/fs/ext4/extents.c 2007-06-20 10:54:11.0 +0900 @@ -2335,6 +2335,713 @@ int ext4_ext_calc_metadata_amount(struct return num; } +/* + * this structure is used to gather extents from the tree via ioctl + */ +struct ext4_extent_buf { + ext4_fsblk_t start; + int buflen; + void *buffer; + void *cur; + int err; +}; + +/* + * this structure is used to collect stats info about the tree + */ +struct ext4_extent_tree_stats { + int depth; + int extents_num; + int leaf_num; +}; + +static int +ext4_ext_store_extent_cb(struct inode *inode, + struct ext4_ext_path *path, + struct ext4_ext_cache *newex, + struct ext4_extent_buf *buf) +{ + + if (newex->ec_type != EXT4_EXT_CACHE_EXTENT) + return EXT_CONTINUE; + + if (buf->err < 0) + return EXT_BREAK; + if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen) + return EXT_BREAK; + + if (!copy_to_user(buf->cur, newex, sizeof(*newex))) { + buf->err++; + buf->cur += sizeof(*newex); + } else { + buf->err = -EFAULT; + return EXT_BREAK; + } + return EXT_CONTINUE; +} + +static int +ext4_ext_collect_stats_cb(struct inode *inode, + struct ext4_ext_path *path, + struct ext4_ext_cache *ex, + struct ext4_extent_tree_stats *buf) +{ + int depth; + + if (ex->ec_type != EXT4_EXT_CACHE_EXTENT) + return EXT_CONTINUE; + + depth = ext_depth(inode); + buf->extents_num++; + if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr)) + buf->leaf_num++; + return 
EXT_CONTINUE; +} + +/** + * ext4_ext_next_extent - search for next extent and set it to "extent" + * @inode: inode of the the original file + * @path: this will obtain data for next extent + * @extent:pointer to next extent we have just gotten + * + * This function returns 0 or 1(last_entry) if succeeded, otherwise + * returns -EIO + */ +static int +ext4_ext_next_extent(struct inode *inode, +struct ext4_ext_path *path, +struct ext4_extent **extent) +{ + int ppos; + int leaf_ppos = path->p_depth; + + ppos = leaf_ppos; + if (EXT_LAST_EXTENT(path[ppos].p_hdr) > path[ppos].p_ext) { + /* leaf block */ + *extent = ++path[ppos].p_ext; + return 0; + } + + while (--ppos >= 0) { + if (EXT_LAST_INDEX(path[ppos].p_hdr) > + path[ppos].p_idx) { + int cur_ppos = ppos; + + /* index block */ + path[ppos].p_idx++; + path[ppos].p_block = + idx_pblock(path[ppos].p_idx); + if (path[ppos+1].p_bh) + brelse(path[ppos+1].p_bh); + path[ppos+1].p_bh = + sb_bread(inode->i_sb, path[ppos].p_block); + if (!path[ppos+1].p_bh) + return -EIO; + path[ppos+1].p_hdr = + ext_block_hdr(path[ppos+1].p_bh); + + /* halfway index block */ + while (++cur_ppos < leaf_ppos) { + path[cur_ppos].p_idx = + EXT_FIRST_INDEX(path[cur_ppos].p_hdr); + path[cur_ppos].p_block = + idx_pblock(path[cur_ppos].p_idx); + if (path[cur_ppos+1].p_bh) + brelse(path[cur_ppos+1].p_bh); + path[cur_ppos+1].p_bh = sb_bread(inode->i_sb, + path[cur_ppos].p_block); + if (!path[cur_ppos+1].p_bh) + return -EIO; + path[cur_ppos+1].p_hdr = + ext_block_hdr(path[cur_ppos+1].p
[RFC][PATCH 0/10] ext4 online defrag (ver 0.5)
Hi all,

I have updated my online defrag patchset to add a new function: defragmentation of free space. If the filesystem does not have enough contiguous free blocks, defrag tries to move other files out of the way to make sufficient space, and then reallocates contiguous blocks for the target file.

This function can be used as follows:

# e4defrag -f filename [blockno]

To create contiguous free blocks, defrag reallocates the target file within the block group to which its inode belongs. If "blockno" is set, defrag tries to move the other files (everything except the target file) to the indicated physical block offset; otherwise it tries to move them to the block group after the one to which the target file's inode belongs. The maximum size of the target file is the maximum size that fits in one block group.

This time I have added six ioctls for the new function. They are used in the order described below.

Additional ioctls:
- EXT4_IOC_GROUP_INFO
- EXT4_IOC_FREE_BLOCKS_INFO
- EXT4_IOC_EXTENTS_INFO
- EXT4_IOC_MOVE_VICTIM
- EXT4_IOC_RESERVE_BLOCK
- EXT4_IOC_BLOCK_RELEASE

1) Get s_blocks_per_group and s_inodes_per_group of the target filesystem. (EXT4_IOC_GROUP_INFO)
   In userspace, calculate the number of the block group to which the target file belongs from the result of 1).

2) Get the free blocks information of the target block group. (EXT4_IOC_FREE_BLOCKS_INFO)
   Read the block bitmap of the target block group, store the free block distribution in the ext4_extents_info structure as an array of extents, and return it to userspace.

3) Get all extents information of the indicated inode number. (EXT4_IOC_EXTENTS_INFO)
   Store the extents information of the indicated inode number in the ext4_extents_info structure and return it to userspace.
   In userspace, call ioctl(EXT4_IOC_EXTENTS_INFO) for every inode in the target group and, from the results of 2) and 3), calculate the combination of extents which should be moved to another block group. The total size of that combination is the same as the target file's size.

4) Move the combination of extents from the target block group to another block group to make a contiguous free area in the target block group. (EXT4_IOC_MOVE_VICTIM)

5) Reserve the freed blocks of the target block group. (EXT4_IOC_RESERVE_BLOCK)

6) Reallocate the target file to the reserved contiguous blocks with ext4_ext_defrag(). (EXT4_IOC_DEFRAG)

Current status:
These patches are still at the experimental stage, so they have issues and items to improve, but I think they are worth trying out.

Dependencies:
My patches depend on Alex's multi-block allocation patches for Linux 2.6.19-rc6.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of the extent tree and the number of leaf nodes after defragmentation.
- In the current implementation, the blocks on the temporary inode are moved to the original inode one page at a time. I have to tune the number of pages per unit for performance.
- Update the base kernel version when Alex's multi-block allocation patch is updated.

Next steps:
- Carry out the movement of data as an atomic transaction.
- Reduce the influence of defrag on other processes with fadvise().

Summary of patches:
* These patches apply on top of Alex's patches.
  "[RFC] delayed allocation, mballoc, etc"
  http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/10] Allocate new contiguous blocks with Alex's mballoc
- Search for contiguous free blocks and allocate them for the temporary inode with Alex's multi-block allocation.

[PATCH 2/10] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode one page at a time.

[PATCH 3/10] Get block group information
- Get s_blocks_per_group and s_inodes_per_group of the target filesystem.

[PATCH 4/10] Get free blocks distribution of the target block group
- Get the free blocks distribution of the target block group to know how many free blocks it has.

[PATCH 5/10] Get all extents information of indicated inode number
- Get all extents information of the indicated inode number to calculate the combination of extents which should be moved to another block group.

[PATCH 6/10] Move files from target block group to other block group
- To make contiguous free blocks, move files from the target block group to another block group.

[PATCH 7/10] Reserve freed blocks
- Reserve the freed blocks in the target area so that they are not used by other processes.

[PATCH 8/10] Release reserved blocks
- Release the reserved blocks if defrag fails.

[PATCH 9/10] Fix bugs in multi-block allocation and locality-group
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group() to flush all dirty inodes.
- Fix ext4_mb_new_blocks() to return an error value when defrag fails.

[PATCH 10/10] Online defrag command
- The defrag command. Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for free space fragmentation.
    # e4defrag -f file-name
  o Defrag for a single file.
    # e4defrag file-name
  o
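Step 1) of the cover letter has userspace derive the target block group from the values returned by EXT4_IOC_GROUP_INFO. The arithmetic can be sketched as below, using the same formula that patch 4/10 applies in the kernel, (i_ino - 1) / inodes per group. The function names are made up for this example, and the first-block calculation ignores any first-data-block offset, as the patch's own start_block computation does.

```c
#include <assert.h>

/* Block group that holds inode number ino (inode numbers start at 1). */
static unsigned long inode_to_group(unsigned long ino,
				    unsigned long inodes_per_group)
{
	assert(ino >= 1 && inodes_per_group > 0);
	return (ino - 1) / inodes_per_group;
}

/* First block of a group, assuming groups are laid out back to back. */
static unsigned long long group_first_block(unsigned long group,
					    unsigned long blocks_per_group)
{
	return (unsigned long long)group * blocks_per_group;
}
```

With 8192 inodes per group, for instance, inodes 1 through 8192 fall in group 0 and inode 8193 is the first inode of group 1.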
[RFC][PATCH 2/4] Move the file data to the new blocks
Move the blocks on the temporary inode to the original inode by a page. 1. Read the file data from the old blocks to the page 2. Move the block on the temporary inode to the original inode 3. Write the file data on the page into the new blocks Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> --- diff -Nrup -X linux-2.6.19-rc6_move_data/Documentation/dontdiff linux-2.6.19-rc6_move_data/fs/ext4/extents.c linux-2.6.19-rc6-full-except_lg/fs/ext4/extents.c --- linux-2.6.19-rc6_move_data/fs/ext4/extents.c2007-04-26 19:36:34.0 +0900 +++ linux-2.6.19-rc6-full-except_lg/fs/ext4/extents.c 2007-04-26 19:17:59.0 +0900 @@ -2533,6 +2533,610 @@ ext4_ext_next_extent(struct inode *inode } /** + * ext4_ext_merge_across - merge extents across leaf block + * + * @handle journal handle + * @inode target file's inode + * @o_startfirst original extent to be defraged + * @o_end last original extent to be defraged + * @start_ext first new extent to be merged + * @new_extmiddle of new extent to be merged + * @end_extlast new extent to be merged + * + * This function returns 0 if succeed, otherwise returns error value. 
+ */ +static int +ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode, + struct ext4_extent *o_start, + struct ext4_extent *o_end, struct ext4_extent *start_ext, + struct ext4_extent *new_ext,struct ext4_extent *end_ext) +{ + struct ext4_ext_path *org_path = NULL; + unsigned long eblock = 0; + int err = 0; + int new_flag = 0; + int end_flag = 0; + + if (le16_to_cpu(start_ext->ee_len) && + le16_to_cpu(new_ext->ee_len) && + le16_to_cpu(end_ext->ee_len)) { + + if ((o_start) == (o_end)) { + + /* start_ext new_extend_ext +* dest |-|---|| +* org |--| +*/ + + end_flag = 1; + } else { + + /* start_ext new_ext end_ext +* dest |-|--|-| +* org |---|--| +*/ + + o_end->ee_block = end_ext->ee_block; + o_end->ee_len = end_ext->ee_len; + ext4_ext_store_pblock(o_end, ext_pblock(end_ext)); + } + + o_start->ee_len = start_ext->ee_len; + new_flag = 1; + + } else if ((le16_to_cpu(start_ext->ee_len)) && + (le16_to_cpu(new_ext->ee_len)) && + (!le16_to_cpu(end_ext->ee_len)) && + ((o_start) == (o_end))) { + + /* start_extnew_ext +* dest |--|---| +* org |--| +*/ + + o_start->ee_len = start_ext->ee_len; + new_flag = 1; + + } else if ((!le16_to_cpu(start_ext->ee_len)) && + (le16_to_cpu(new_ext->ee_len)) && + (le16_to_cpu(end_ext->ee_len)) && + ((o_start) == (o_end))) { + + /* new_extend_ext +* dest |--|---| +* org |--| +*/ + + o_end->ee_block = end_ext->ee_block; + o_end->ee_len = end_ext->ee_len; + ext4_ext_store_pblock(o_end, ext_pblock(end_ext)); + + /* If new_ext was first block */ + if (!new_ext->ee_block) + eblock = 0; + else + eblock = le32_to_cpu(new_ext->ee_block); + + new_flag = 1; + } else { + printk("Unexpected case \n"); + return -EIO; + } + + if (new_flag) { + org_path = ext4_ext_find_extent(inode, eblock, NULL); + if (IS_ERR(org_path)) { + err = PTR_ERR(org_path); + org_path = NULL; + goto ERR; + } + err = ext4_ext_insert_extent(handle, inode, + org_path, new_ext); + if (err) + goto ERR; + } + + if (end_flag) { + org_path = ext4_ext_find_extent(inode, + 
end_ext->ee_block -1, org_path); + if (IS_ERR(org_path)) { + err = PTR_ERR(org_path); + org_path = NULL; + goto ERR; +
[RFC][PATCH 3/4] Online defrag command
The defrag command. Usage is as follows: o Put the multiple files closer together. # e4defrag -r directory-name o Defrag for a single file. # e4defrag file-name o Defrag for all files on ext4. # e4defrag device-name Signed-off-by: Takashi Sato <[EMAIL PROTECTED]> --- /* * e4defrag, ext4 filesystem defragmenter * */ #ifndef _LARGEFILE_SOURCE #define _LARGEFILE_SOURCE #endif #ifndef _LARGEFILE64_SOURCE #define _LARGEFILE64_SOURCE #endif #define _XOPEN_SOURCE 500 #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define EXT4_SUPER_MAGIC0xEF53 /* magic number for ext4 */ #define DEFRAG_PAGES128 /* the number of pages to defrag at one time */ #define MAX_BLOCKS_LEN 16384 /* Maximum length of contiguous blocks */ /* data type for filesystem-wide blocks number */ #define ext4_fsblk_t unsigned long long /* ioctl command */ #define EXT4_IOC_GET_DATA_BLOCK _IOW('f', 9, ext4_fsblk_t) #define EXT4_IOC_DEFRAG _IOW('f', 10, struct ext4_ext_defrag_data) #define DEVNAME 0 #define DIRNAME 1 #define FILENAME2 #define RETURN_OK 0 #define RETURN_NG -1 #define FTW_CONT0 #define FTW_STOP-1 #define FTW_OPEN_FD 2000 #define FILE_CHK_OK 0 #define FILE_CHK_NG -1 #define FS_EXT4 "ext4dev" #define ROOT_UID0 /* defrag block size, in bytes */ #define DEFRAG_SIZE 67108864 #define min(x,y) (((x) > (y)) ? (y) : (x)) #define PRINT_ERR_MSG(msg) fprintf(stderr, "%s\n", (msg)); #define PRINT_FILE_NAME(file) \ fprintf(stderr, "\t\t\"%s\"\n", (file)); #define MSG_USAGE \ "Usage : e4defrag [-v] file...| directory...| device...\n : e4defrag [-r] \ directory... | device... \n" #define MSG_R_OPTION\ " with regional block allocation mode.\n" #define NGMSG_MTAB \ "\te4defrag : Can not access /etc/mtab." #define NGMSG_UNMOUNT "\te4defrag : FS is not mounted." #define NGMSG_EXT4 \ "\te4defrag : FS is not ext4 File System." #define NGMSG_FS_INFO "\te4defrag : get FSInfo fail." 
#define NGMSG_FILE_INFO "\te4defrag : get FileInfo fail." #define NGMSG_FILE_OPEN "\te4defrag : open fail." #define NGMSG_FILE_SYNC "\te4defrag : sync(fsync) fail." #define NGMSG_FILE_DEFRAG "\te4defrag : defrag fail." #define NGMSG_FILE_BLOCKSIZE"\te4defrag : can't get blocksize." #define NGMSG_FILE_DATA "\te4defrag : can't get data." #define NGMSG_FILE_UNREG\ "\te4defrag : File is not regular file." #define NGMSG_FILE_LARGE\ "\te4defrag : Defrag size is larger than FileSystem's free space." #define NGMSG_FILE_PRIORITY \ "\te4defrag : File is not current user's file or current user is not root." #define NGMSG_FILE_LOCK "\te4defrag : File is locked." #define NGMSG_FILE_BLANK"\te4defrag : File size is 0." #define NGMSG_GET_LCKINFO "\te4defrag : get LockInfo fail." #define NGMSG_TYPE \ "e4defrag : Can not process %s." struct ext4_ext_defrag_data { ext4_fsblk_t start_offset; /* start offset to defrag in blocks */ ext4_fsblk_t defrag_size; /* size of defrag in blocks */ ext4_fsblk_t goal; /* block offset for allocation */ }; int detail_flag = 0; int regional_flag = 0; int amount_cnt = 0; int succeed_cnt = 0; ext4_fsblk_tgoal = 0; /* * Check if there's enough disk space */ int check_free_size(int fd, off64_t fsize) { struct statfs fsbuf; off64_t file_asize = 0; if (-1 == fstatfs(fd, &fsbuf)) { if (detail_flag) { perror(NGMSG_FS_INFO); } return RETURN_NG; } /* compute free space for root and normal user separately */ if (ROOT_UID == getuid()) file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bfree; else file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bavail; if (file_asize >= fsize) return RETURN_
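check_free_size() above compares the requested size with the space that fstatfs() reports, using f_bfree for root and f_bavail for other users, because f_bavail leaves out the blocks reserved for the superuser. A self-contained sketch of that check follows. The function names and the return convention (1, 0, -1) are assumptions for this example and differ from the RETURN_OK/RETURN_NG values used by e4defrag.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

/*
 * Returns 1 if the filesystem behind fd has at least need_bytes free
 * for the calling user, 0 if not, and -1 if fstatfs() fails.
 */
static int fd_has_free_space(int fd, unsigned long long need_bytes)
{
	struct statfs fsbuf;
	unsigned long long blocks;

	if (fstatfs(fd, &fsbuf) == -1)
		return -1;

	/* Root may also use the reserved blocks, so count f_bfree for it. */
	if (getuid() == 0)
		blocks = (unsigned long long)fsbuf.f_bfree;
	else
		blocks = (unsigned long long)fsbuf.f_bavail;

	return (unsigned long long)fsbuf.f_bsize * blocks >= need_bytes;
}

/* Convenience wrapper: open the path read-only, check, close again. */
static int path_has_free_space(const char *path, unsigned long long need_bytes)
{
	int fd, ret;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = fd_has_free_space(fd, need_bytes);
	close(fd);
	return ret;
}
```

Casting to a 64-bit type before the multiplication matters for the same reason the original casts to off64_t: block size times block count can overflow a 32-bit integer.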
ext4 online defrag (ver 0.4)
Hi all,

I have made the following changes to the previous online defrag patchset to improve it. Note that there is no functional change.

1. Change the handling of the temporary inode.
   ext4_ext_defrag() now calls the ext4_new_inode()/iput() pair instead of new_inode()/delete_ext_defrag_inode(), because new_inode() does not initialize all of the entries that I need, such as i_extra_isize.

2. Change how blocks are swapped.
   In this patchset, the original blocks of the target file are carefully swapped with those of the temporary inode so that they are released in iput().

3. Add an exclusive lock.
   ext4_inode_info.truncate_mutex is now held while the file is being defragmented.

4. Mark the locality group as dirty.
   In ext4_lg_sync_single_group(), the lg is moved to the s_locality_dirty list and marked as dirty if nr_to_write (the count of pages which have not been written to disk yet) is 0 or less and lg_io is not empty. This makes sure that the inode is written to disk.

Current status:
These patches are still at the experimental stage, so they have issues and items to improve, but I think they are worth trying out.

Dependencies:
My patches depend on Alex's multi-block allocation patches for Linux 2.6.19-rc6.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of the extent tree and the number of leaf nodes after defragmentation.
- In the current implementation, the blocks on the temporary inode are moved to the original inode one page at a time. I have to tune the number of pages per unit for performance.
- Support files that use indirect blocks.

Next steps:
- Defragmentation for free space fragmentation.
  If the filesystem has insufficient contiguous blocks, move other files to make sufficient space and allocate the contiguous blocks for the target file.

Summary of patches:
* These patches apply on top of Alex's patches.
  "[RFC] delayed allocation, mballoc, etc"
  http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/4] Allocate new contiguous blocks with Alex's mballoc
- Search for contiguous free blocks and allocate them for the temporary inode with Alex's multi-block allocation.

[PATCH 2/4] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode one page at a time.

[PATCH 3/4] Online defrag command
- The defrag command. Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for a single file.
    # e4defrag file-name
  o Defrag for all files on ext4.
    # e4defrag device-name

[PATCH 4/4] ext4_locality_group bug fix
- Move lg_list to s_locality_dirty in ext4_lg_sync_single_group() to flush all dirty inodes.

Any comments from reviews or tests are very welcome.

Cheers, Takashi
[RFC][PATCH 4/4] ext4_locality_group bug fix
In ext4_lg_sync_single_group(), move lg_list to s_locality_dirty and mark the lg as dirty if nr_to_write (the count of pages which have not been written to disk yet) is 0 or less and lg_io is not empty. This makes sure that the inode is written to disk.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-lg/Documentation/dontdiff linux-2.6.19-rc6-lg/fs/ext4/lg.c linux-2.6.19-rc6-full/fs/ext4/lg.c
--- linux-2.6.19-rc6-lg/fs/ext4/lg.c	2007-04-26 19:55:37.0 +0900
+++ linux-2.6.19-rc6-full/fs/ext4/lg.c	2007-04-26 19:17:59.0 +0900
@@ -389,6 +389,10 @@ int ext4_lg_sync_single_group(struct sup
 		cond_resched();
 		spin_lock(&inode_lock);
 		if (wbc->nr_to_write <= 0) {
+			if (!list_empty(&lg->lg_io)) {
+				set_bit(EXT4_LG_DIRTY, &lg->lg_flags);
+				list_move(&lg->lg_list, &sbi->s_locality_dirty);
+			}
 			rc = EXT4_STOP_WRITEBACK;
 			code = 6;
 			break;
[RFC][PATCH 1/4] Allocate new contiguous blocks
Search contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6-alloc_block/Documentation/dontdiff linux-2.6.19-rc6-full/fs/ext4/extents.c linux-2.6.19-rc6-alloc_block/fs/ext4/extents.c
--- linux-2.6.19-rc6-full/fs/ext4/extents.c	2007-04-26 20:36:50.0 +0900
+++ linux-2.6.19-rc6-alloc_block/fs/ext4/extents.c	2007-04-26 20:36:05.0 +0900
@@ -2335,10 +2335,635 @@ int ext4_ext_calc_metadata_amount(struct
 	return num;
 }

+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+	ext4_fsblk_t start;
+	int buflen;
+	void *buffer;
+	void *cur;
+	int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+	int depth;
+	int extents_num;
+	int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+			struct ext4_ext_path *path,
+			struct ext4_ext_cache *newex,
+			struct ext4_extent_buf *buf)
+{
+
+	if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+		return EXT_CONTINUE;
+
+	if (buf->err < 0)
+		return EXT_BREAK;
+	if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+		return EXT_BREAK;
+
+	if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+		buf->err++;
+		buf->cur += sizeof(*newex);
+	} else {
+		buf->err = -EFAULT;
+		return EXT_BREAK;
+	}
+	return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+			struct ext4_ext_path *path,
+			struct ext4_ext_cache *ex,
+			struct ext4_extent_tree_stats *buf)
+{
+	int depth;
+
+	if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+		return EXT_CONTINUE;
+
+	depth = ext_depth(inode);
+	buf->extents_num++;
+	if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+		buf->leaf_num++;
+	return EXT_CONTINUE;
+}
+
+int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+		unsigned long arg)
+{
+	int err = 0;
+	if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+		return -EINVAL;
+
+	if (cmd == EXT4_IOC_GET_EXTENTS) {
+		struct ext4_extent_buf buf;
+
+		if (copy_from_user(&buf, (void *) arg, sizeof(buf)))
+			return -EFAULT;
+
+		buf.cur = buf.buffer;
+		buf.err = 0;
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		err = ext4_ext_walk_space(inode, buf.start, EXT_MAX_BLOCK,
+				(void *)ext4_ext_store_extent_cb, &buf);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+		if (err == 0)
+			err = buf.err;
+	} else if (cmd == EXT4_IOC_GET_TREE_STATS) {
+		struct ext4_extent_tree_stats buf;
+
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		buf.depth = ext_depth(inode);
+		buf.extents_num = 0;
+		buf.leaf_num = 0;
+		err = ext4_ext_walk_space(inode, 0, EXT_MAX_BLOCK,
+				(void *)ext4_ext_collect_stats_cb, &buf);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+		if (!err)
+			err = copy_to_user((void *) arg, &buf, sizeof(buf));
+	} else if (cmd == EXT4_IOC_GET_TREE_DEPTH) {
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		err = ext_depth(inode);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+	} else if (cmd == EXT4_IOC_FIBMAP) {
+		ext4_fsblk_t __user *p = (ext4_fsblk_t __user *)arg;
+		ext4_fsblk_t block = 0;
+		struct address_space *mapping = filp->f_mapping;
+
+		if (copy_from_user(&block, (ext4_fsblk_t __user *)arg,
+				sizeof(block)))
+			return -EFAULT;
+
+		lock_kernel();
+		block = ext4_bmap(mapping, block);
+		unlock_kernel();
+
+		return put_user(block, p);
+	} else if (cmd == EXT4_IOC_DEFRAG) {
+		struct ext4_ext_defrag_data defrag;
+
+		if (copy_from_user(&defrag,
+				(struct ext4_ext_defrag_data __user *)arg,
Re: [RFC][PATCH 3/3] Online defrag command
Hi,

> On Thu, Feb 08 2007, Takashi Sato wrote:
>> The defrag command. Usage is as follows:
>> o Put the multiple files closer together.
>>   # e4defrag -r directory-name
>> o Defrag for a single file.
>>   # e4defrag file-name
>> o Defrag for all files on ext4.
>>   # e4defrag device-name
>
> Would it be possible to provide support for putting multiple files close
> together? Ala
>
> # e4defrag file1 file2 file3 ... fileN

e4defrag cannot do it in my current implementation.
I will consider implementing it in a later version.
Alternatively, you can do it if you link those files into a directory.
# ln file1 file2 file3 ... fileN directory-name
# e4defrag -r directory-name

> I'm thinking boot speedup, gather the list of read files and put them
> close on disk.

I think so. It's my final goal.

Cheers, Takashi
[RFC][PATCH 3/3] Online defrag command
The defrag command. Usage is as follows:
o Put the multiple files closer together.
  # e4defrag -r directory-name
o Defrag for a single file.
  # e4defrag file-name
o Defrag for all files on ext4.
  # e4defrag device-name

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
/*
 * e4defrag, ext4 filesystem defragmenter
 *
 */

#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE
#endif

#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE
#endif

#define _XOPEN_SOURCE	500
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#define EXT4_SUPER_MAGIC	0xEF53 /* magic number for ext4 */
#define DEFRAG_PAGES	128 /* the number of pages to defrag at one time */
#define MAX_BLOCKS_LEN	16384 /* Maximum length of contiguous blocks */

/* data type for filesystem-wide blocks number */
#define ext4_fsblk_t unsigned long long

/* ioctl command */
#define EXT4_IOC_GET_DATA_BLOCK	_IOW('f', 9, ext4_fsblk_t)
#define EXT4_IOC_DEFRAG		_IOW('f', 10, struct ext4_ext_defrag_data)

#define DEVNAME		0
#define DIRNAME		1
#define FILENAME	2

#define RETURN_OK	0
#define RETURN_NG	-1
#define FTW_CONT	0
#define FTW_STOP	-1
#define FTW_OPEN_FD	2000
#define FILE_CHK_OK	0
#define FILE_CHK_NG	-1
#define FS_EXT4		"ext4dev"
#define ROOT_UID	0
/* defrag block size, in bytes */
#define DEFRAG_SIZE	67108864

#define min(x,y) (((x) > (y)) ? (y) : (x))

#define PRINT_ERR_MSG(msg)	fprintf(stderr, "%s\n", (msg));
#define PRINT_FILE_NAME(file)	\
	fprintf(stderr, "\t\t\"%s\"\n", (file));

#define MSG_USAGE	\
"Usage : e4defrag [-v] file...| directory...| device...\n : e4defrag [-r] directory... | device... \n"
#define MSG_R_OPTION	\
" with regional block allocation mode.\n"
#define NGMSG_MTAB	\
"\te4defrag : Can not access /etc/mtab."
#define NGMSG_UNMOUNT	"\te4defrag : FS is not mounted."
#define NGMSG_EXT4	\
"\te4defrag : FS is not ext4 File System."
#define NGMSG_FS_INFO	"\te4defrag : get FSInfo fail."
#define NGMSG_FILE_INFO	"\te4defrag : get FileInfo fail."
#define NGMSG_FILE_OPEN	"\te4defrag : open fail."
#define NGMSG_FILE_SYNC	"\te4defrag : sync(fsync) fail."
#define NGMSG_FILE_DEFRAG	"\te4defrag : defrag fail."
#define NGMSG_FILE_BLOCKSIZE	"\te4defrag : can't get blocksize."
#define NGMSG_FILE_DATA	"\te4defrag : can't get data."
#define NGMSG_FILE_UNREG	\
"\te4defrag : File is not regular file."
#define NGMSG_FILE_LARGE	\
"\te4defrag : Defrag size is larger than FileSystem's free space."
#define NGMSG_FILE_PRIORITY	\
"\te4defrag : File is not current user's file or current user is not root."
#define NGMSG_FILE_LOCK	"\te4defrag : File is locked."
#define NGMSG_FILE_BLANK	"\te4defrag : File size is 0."
#define NGMSG_GET_LCKINFO	"\te4defrag : get LockInfo fail."
#define NGMSG_TYPE	\
"e4defrag : Can not process %s."

struct ext4_ext_defrag_data {
	ext4_fsblk_t start_offset; /* start offset to defrag in blocks */
	ext4_fsblk_t defrag_size; /* size of defrag in blocks */
	ext4_fsblk_t goal; /* block offset for allocation */
};

int detail_flag = 0;
int regional_flag = 0;
int amount_cnt = 0;
int succeed_cnt = 0;
ext4_fsblk_t goal = 0;

/*
 * Check if there's enough disk space
 */
int check_free_size(int fd, off64_t fsize)
{
	struct statfs fsbuf;
	off64_t file_asize = 0;

	if (-1 == fstatfs(fd, &fsbuf)) {
		if (detail_flag) {
			perror(NGMSG_FS_INFO);
		}
		return RETURN_NG;
	}
	/* compute free space for root and normal user separately */
	if (ROOT_UID == getuid())
		file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bfree;
	else
		file_asize = (off64_t)fsbuf.f_bsize * fsbuf.f_bavail;

	if (file_asize >= fsize)
		return RETURN_
[RFC][PATCH 2/3] Move the file data to the new blocks
Move the blocks on the temporary inode to the original inode
by a page.
1. Read the file data from the old blocks to the page
2. Move the block on the temporary inode to the original inode
3. Write the file data on the page into the new blocks

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6.org/Documentation/dontdiff linux-2.6.19-rc6-1/fs/ext4/extents.c linux-2.6.19-rc6-full/fs/ext4/extents.c
--- linux-2.6.19-rc6-1/fs/ext4/extents.c	2007-02-08 14:13:49.0 +0900
+++ linux-2.6.19-rc6-full/fs/ext4/extents.c	2007-02-08 14:09:43.0 +0900
@@ -2533,6 +2533,653 @@ ext4_ext_next_extent(struct inode *inode
 }

 /**
+ * ext4_ext_merge_across - merge extents across leaf block
+ *
+ * @handle	journal handle
+ * @inode	target file's inode
+ * @o_start	first original extent to be defraged
+ * @o_end	last original extent to be defraged
+ * @start_ext	first new extent to be merged
+ * @new_ext	middle of new extent to be merged
+ * @end_ext	last new extent to be merged
+ *
+ * This function returns 0 if succeed, otherwise returns error value.
+ */
+static int
+ext4_ext_merge_across_blocks(handle_t *handle, struct inode *inode,
+		struct ext4_extent *o_start,
+		struct ext4_extent *o_end, struct ext4_extent *start_ext,
+		struct ext4_extent *new_ext,struct ext4_extent *end_ext)
+{
+	struct ext4_ext_path *org_path = NULL;
+	unsigned long eblock = 0;
+	int err = 0;
+	int new_flag = 0;
+	int end_flag = 0;
+
+	if (le16_to_cpu(start_ext->ee_len) &&
+		le16_to_cpu(new_ext->ee_len) &&
+		le16_to_cpu(end_ext->ee_len)) {
+
+		if ((o_start) == (o_end)) {
+
+			/* start_ext new_ext end_ext
+			 * dest |-|---||
+			 * org |--|
+			 */
+
+			ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+				le16_to_cpu(start_ext->ee_len),
+				le16_to_cpu(new_ext->ee_len), 0);
+
+			end_flag = 1;
+
+		} else {
+
+			/* start_ext new_ext end_ext
+			 * dest |-|--|-|
+			 * org |---|--|
+			 */
+
+			ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+				le16_to_cpu(start_ext->ee_len),
+				le16_to_cpu(o_start->ee_len)
+				- le16_to_cpu(start_ext->ee_len), 0);
+
+			ext4_free_blocks(handle, inode, ext_pblock(o_end),
+				le16_to_cpu(o_end->ee_len)
+				- le16_to_cpu(end_ext->ee_len), 0);
+
+			o_end->ee_block = end_ext->ee_block;
+			o_end->ee_len = end_ext->ee_len;
+			ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+		}
+
+		o_start->ee_len = start_ext->ee_len;
+		new_flag = 1;
+
+	} else if ((le16_to_cpu(start_ext->ee_len)) &&
+			(le16_to_cpu(new_ext->ee_len)) &&
+			(!le16_to_cpu(end_ext->ee_len)) &&
+			((o_start) == (o_end))) {
+
+		/* start_ext new_ext
+		 * dest |--|---|
+		 * org |--|
+		 */
+
+		ext4_free_blocks(handle, inode, ext_pblock(o_start) +
+			le16_to_cpu(start_ext->ee_len),
+			le16_to_cpu(new_ext->ee_len), 0);
+
+		o_start->ee_len = start_ext->ee_len;
+		new_flag = 1;
+
+	} else if ((!le16_to_cpu(start_ext->ee_len)) &&
+			(le16_to_cpu(new_ext->ee_len)) &&
+			(le16_to_cpu(end_ext->ee_len)) &&
+			((o_start) == (o_end))) {
+
+		/* new_ext end_ext
+		 * dest |--|---|
+		 * org |--|
+		 */
+
+		ext4_free_blocks(handle, inode, ext_pblock(o_end),
+			le16_to_cpu(new_ext->ee_len), 0);
+
+		o_end->ee_block = end_ext->ee_block;
+		o_end->ee_len = end_ext->ee_len;
+		ext4_ext_store_pblock(o_end, ext_pblock(end_ext));
+
+		/* If new_ext was first block */
+		if (!new_ext->ee_block)
+			eblock = 0;
+		else
[RFC][PATCH 1/3] Allocate new contiguous blocks
Search contiguous free blocks with Alex's multi-block allocation
and allocate them for the temporary inode.

This patch applies on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
---
diff -Nrup -X linux-2.6.19-rc6.org/Documentation/dontdiff linux-2.6.19-rc6.org/fs/ext4/extents.c linux-2.6.19-rc6-1/fs/ext4/extents.c
--- linux-2.6.19-rc6.org/fs/ext4/extents.c	2007-02-08 08:40:48.0 +0900
+++ linux-2.6.19-rc6-1/fs/ext4/extents.c	2007-02-08 14:13:49.0 +0900
@@ -2335,10 +2335,658 @@ int ext4_ext_calc_metadata_amount(struct
 	return num;
 }

+/*
+ * this structure is used to gather extents from the tree via ioctl
+ */
+struct ext4_extent_buf {
+	ext4_fsblk_t start;
+	int buflen;
+	void *buffer;
+	void *cur;
+	int err;
+};
+
+/*
+ * this structure is used to collect stats info about the tree
+ */
+struct ext4_extent_tree_stats {
+	int depth;
+	int extents_num;
+	int leaf_num;
+};
+
+static int
+ext4_ext_store_extent_cb(struct inode *inode,
+			struct ext4_ext_path *path,
+			struct ext4_ext_cache *newex,
+			struct ext4_extent_buf *buf)
+{
+
+	if (newex->ec_type != EXT4_EXT_CACHE_EXTENT)
+		return EXT_CONTINUE;
+
+	if (buf->err < 0)
+		return EXT_BREAK;
+	if (buf->cur - buf->buffer + sizeof(*newex) > buf->buflen)
+		return EXT_BREAK;
+
+	if (!copy_to_user(buf->cur, newex, sizeof(*newex))) {
+		buf->err++;
+		buf->cur += sizeof(*newex);
+	} else {
+		buf->err = -EFAULT;
+		return EXT_BREAK;
+	}
+	return EXT_CONTINUE;
+}
+
+static int
+ext4_ext_collect_stats_cb(struct inode *inode,
+			struct ext4_ext_path *path,
+			struct ext4_ext_cache *ex,
+			struct ext4_extent_tree_stats *buf)
+{
+	int depth;
+
+	if (ex->ec_type != EXT4_EXT_CACHE_EXTENT)
+		return EXT_CONTINUE;
+
+	depth = ext_depth(inode);
+	buf->extents_num++;
+	if (path[depth].p_ext == EXT_FIRST_EXTENT(path[depth].p_hdr))
+		buf->leaf_num++;
+	return EXT_CONTINUE;
+}
+
+int ext4_ext_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
+		unsigned long arg)
+{
+	int err = 0;
+	if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+		return -EINVAL;
+
+	if (cmd == EXT4_IOC_GET_EXTENTS) {
+		struct ext4_extent_buf buf;
+
+		if (copy_from_user(&buf, (void *) arg, sizeof(buf)))
+			return -EFAULT;
+
+		buf.cur = buf.buffer;
+		buf.err = 0;
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		err = ext4_ext_walk_space(inode, buf.start, EXT_MAX_BLOCK,
+				(void *)ext4_ext_store_extent_cb, &buf);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+		if (err == 0)
+			err = buf.err;
+	} else if (cmd == EXT4_IOC_GET_TREE_STATS) {
+		struct ext4_extent_tree_stats buf;
+
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		buf.depth = ext_depth(inode);
+		buf.extents_num = 0;
+		buf.leaf_num = 0;
+		err = ext4_ext_walk_space(inode, 0, EXT_MAX_BLOCK,
+				(void *)ext4_ext_collect_stats_cb, &buf);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+		if (!err)
+			err = copy_to_user((void *) arg, &buf, sizeof(buf));
+	} else if (cmd == EXT4_IOC_GET_TREE_DEPTH) {
+		mutex_lock(&EXT4_I(inode)->truncate_mutex);
+		err = ext_depth(inode);
+		mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+	} else if (cmd == EXT4_IOC_FIBMAP) {
+		ext4_fsblk_t __user *p = (ext4_fsblk_t __user *)arg;
+		ext4_fsblk_t block = 0;
+		struct address_space *mapping = filp->f_mapping;
+
+		if (copy_from_user(&block, (ext4_fsblk_t __user *)arg,
+				sizeof(block)))
+			return -EFAULT;
+
+		lock_kernel();
+		block = ext4_bmap(mapping, block);
+		unlock_kernel();
+
+		return put_user(block, p);
+	} else if (cmd == EXT4_IOC_DEFRAG) {
+		struct ext4_ext_defrag_data defrag;
+
+		if (copy_from_user(&defrag,
+				(struct ext4_ext_defrag_data __user *)arg,
[RFC][PATCH 0/3] ext4 online defrag (ver 0.3)
Hi all,

I've updated my online defrag patches to support a leaf node which is
filled with extents and has no space left for a new extent.
Any comments from reviews or tests are very welcome.

Implementation:
1. In my previous implementation, when the leaf node was filled with
   extents and there was no space for an additional extent, the new
   extent could not be inserted and the defrag failed.
   To solve this problem, call ext4_ext_insert_extent() to create a new
   leaf node and split the full leaf node into the new leaf node.
   In this case, the leaf node count or the depth of the extent tree
   increases.
   You should probably turn "AGRESSIVE_TEST" on if you test this fix.
   "AGRESSIVE_TEST" makes the leaf node capacity small, so you can
   create a complex inode tree easily.

2. Instead of the previous version's ioctl(EXT4_IOC_GET_DATA_BLOCK), add
   a new ioctl(EXT4_IOC_FIBMAP) which behaves like FIBMAP and returns the
   physical block number of the specified logical block. This ioctl is
   called by "e4defrag -r directory-name" to put all the files under the
   directory closer together.
   Andreas advised me to use FIBMAP to get the physical block number, but
   that is not feasible because of address_space_operations.
   So this ioctl calls ext4_bmap() directly.
   Thank you for your advice, Andreas!

3. Change the interface unit between user-space and kernel-space from
   bytes to blocks, to make it clear.

4. Andrew Morton suggested the following fixes.
   - Reconsider the type of the variables which hold file-related
     offsets, so I changed them from unsigned long to ext4_fsblk_t.
   - Remove the unneeded wait_on_page_locked() in
     ext4_ext_replace_branches.
   - To avoid overflow, add casts where shift calculations are done.
   Thank you for your review and comments, Andrew.

Current status:
These patches are at the experimental stage, so they have many issues
and items to improve. But they are good enough to examine my trial.

Dependencies:
My patches depend on the following Alex's patches of the multi-block
allocation for Linux 2.6.19-rc6.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

Outstanding issues:
Nothing for the moment.

Items to improve:
- Optimize the depth of the extent tree and the number of leaf nodes
  after defragmentation.
- The blocks on the temporary inode are moved to the original inode
  by a page in the current implementation. I have to tune the number
  of pages moved at a time for performance.
- Support indirect block files.

Next steps:
I will update my patches to optimize the depth of the extent tree and
the number of leaf nodes.

Summary of patches:
*These patches apply on top of Alex's patches.
"[RFC] delayed allocation, mballoc, etc"
http://marc.theaimsgroup.com/?l=linux-ext4&m=116493228301966&w=2

[PATCH 1/3] Allocate new contiguous blocks with Alex's mballoc
- Search contiguous free blocks and allocate them for the temporary
  inode with Alex's multi-block allocation.

[PATCH 2/3] Move the file data to the new blocks
- Move the blocks on the temporary inode to the original inode
  by a page.

[PATCH 3/3] Online defrag command
- The defrag command. Usage is as follows:
  o Put the multiple files closer together.
    # e4defrag -r directory-name
  o Defrag for a single file.
    # e4defrag file-name
  o Defrag for all files on ext4.
    # e4defrag device-name

Cheers, Takashi
Re: [RFC][PATCH 2/3] Move the file data to the new blocks
Hi,

+ext4_ext_replace_branches(struct inode *org_inode, struct inode *dest_inode,
+		pgoff_t from_page, pgoff_t dest_from_page,
+		pgoff_t count_page, unsigned long *delete_start)
+{
+	struct ext4_ext_path *org_path = NULL;
+	struct ext4_ext_path *dest_path = NULL;
+	struct ext4_extent *oext, *dext;
+	struct ext4_extent tmp_ext;
+	int err = 0;
+	int depth;
+	unsigned long from, count, dest_off, diff, replaced_count = 0;

These should be sector_t, shouldn't they?

At some point should we start using blkcnt_t properly?
(block-in[-large]-file, not block-in[-large]-device?) I think that's what it
was introduced for, although it's not in wide use at this point.

I guess the type really isn't used anywhere else; just in the inode's
i_blocks. Hmm.

On reflection, I think we should use ext4_fsblk_t in this case,
because some ext4 codes such as ext4_ext_get_blocks() use it.

int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
		ext4_fsblk_t iblock,

So I would like to change "unsigned long" into ext4_fsblk_t in my next patch.

Cheers, Takashi
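Why the width of these variables matters can be shown with a small sketch. It assumes a 32-bit kernel, where "unsigned long" is 32 bits wide, and models that width with uint32_t; ext4_fsblk_t is modelled with a 64-bit type, in line with the e4defrag source of this series, which defines it as unsigned long long. The function names are hypothetical and are not part of the patches.

```c
#include <stdint.h>

/*
 * Byte offset of a block computed entirely in 32 bits, the way an
 * "unsigned long" expression behaves on a 32-bit kernel: bits shifted
 * past bit 31 are silently dropped.
 */
static uint32_t block_to_bytes_32(uint32_t block, unsigned int blkbits)
{
	return block << blkbits;
}

/*
 * The same computation with the operand widened before the shift,
 * so that no bits are lost.
 */
static uint64_t block_to_bytes_64(uint32_t block, unsigned int blkbits)
{
	return (uint64_t)block << blkbits;
}
```

For block 1048576 (2^20) with 4096-byte blocks (blkbits = 12), the 32-bit version yields 0 while the widened version yields 4294967296, which is the kind of truncation that a wider type and a cast before the shift avoid.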
Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)
Hi,

Thank you for your comment.

>>>1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
>>>   block number of the specified file. With this ioctl, a command
>>>   gets the specified directory's.
>>
>>Maybe I don't understand, but how is this different from the long-time
>>FIBMAP ioctl?
>
>I can use FIBMAP instead of my new ioctl.
>You are right.

I should have used FIBMAP ioctl...
I have to get the physical block number of the specified directory.
But FIBMAP is available only for a regular file, not for a directory.
So I will use my new ioctl.

Though it might make sense to implement FIBMAP for a directory, to keep
it consistent and allow user-space tools like "filefrag" to work on
directories also.

It sounds good. I think it will be useful for other tools which use FIBMAP.
So I will consider the implementation of FIBMAP for a directory.

Cheers, Takashi
Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)
Hi,

> On Jan 16, 2007 21:03 +0900, [EMAIL PROTECTED] wrote:
>> 1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
>>    block number of the specified file. With this ioctl, a command
>>    gets the specified directory's.
>
> Maybe I don't understand, but how is this different from the long-time
> FIBMAP ioctl?

I can use FIBMAP instead of my new ioctl.
You are right.

I should have used FIBMAP ioctl...
I have to get the physical block number of the specified directory.
But FIBMAP is available only for a regular file, not for a directory.
So I will use my new ioctl.

Cheers, Takashi
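For reference, the long-standing FIBMAP ioctl discussed here can be called from user space roughly as follows. This is a minimal sketch: the helper name first_physical_block() is hypothetical, the call normally needs CAP_SYS_RAWIO (in practice, root), and it covers only the regular-file case mentioned in the thread.

```c
#include <fcntl.h>
#include <linux/fs.h>	/* FIBMAP */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Store the physical block number of logical block 0 of `path` in
 * *phys. Returns 0 on success, -1 on failure (open or ioctl error).
 * Hypothetical helper, for illustration only.
 */
static int first_physical_block(const char *path, int *phys)
{
	int block = 0;	/* in: logical block number, out: physical block number */
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;

	ret = ioctl(fd, FIBMAP, &block);
	close(fd);
	if (ret < 0)
		return -1;

	*phys = block;
	return 0;
}
```

A result of 0 in *phys means the logical block is not mapped (a hole or an empty file) rather than physical block 0.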
Re: [RFC][PATCH 0/3] ext4 online defrag (ver 0.2)
Hi,

> On Jan 16, 2007 21:03 +0900, [EMAIL PROTECTED] wrote:
>> 1. Add new ioctl(EXT4_IOC_DEFRAG) which returns the first physical
>>    block number of the specified file. With this ioctl, a command
>>    gets the specified directory's.
>
> Maybe I don't understand, but how is this different from the long-time
> FIBMAP ioctl?

I can use FIBMAP instead of my new ioctl.
You are right. I should have used FIBMAP ioctl...

>> struct ext4_ext_defrag_data {
>> 	loff_t start_offset; /* start offset to defrag in byte */
>> 	loff_t defrag_size; /* size of defrag in bytes */
>> 	ext4_fsblk_t goal; /* block offset for allocation */
>> };
>
> Two things of note:
> - presumably the start_offset and defrag_size should be multiples of the
>   filesystem blocksize? If they are not, is it an error or are they
>   adjusted to cover whole blocks?

If the values are not multiples of the blocksize, the kernel adjusts
them to cover whole blocks.
But I don't think it is clean that the unit of goal differs from that
of start_offset and defrag_size. I will change their unit to blocks in
the next update.

> - in previous defrag discussions (i.e. XFS defrag), it was desirable to
>   allow specifying different types of goals (e.g. hard, soft, kernel picks).
>   We may as well have a structure that allows these to be specified, instead
>   of having to change the interface afterward.

Let me see... Is it the following discussion?
http://marc.theaimsgroup.com/?l=linux-ext4&m=116161490908645&w=2
http://marc.theaimsgroup.com/?l=linux-ext4&m=116184475306761&w=2

Cheers, Takashi
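The adjustment "to cover whole blocks" can be sketched as follows. This is only one plausible way to do the rounding, not the code from the patch; struct block_range and bytes_to_blocks() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result type: a range expressed in filesystem blocks. */
struct block_range {
	uint64_t start;	/* first block touched by the byte range */
	uint64_t count;	/* number of blocks touched */
};

/*
 * Convert a byte range into the smallest block range that covers it:
 * the start is rounded down to a block boundary and the end is
 * rounded up. An empty range or a zero blocksize yields {0, 0}.
 */
static struct block_range bytes_to_blocks(uint64_t start_offset,
					  uint64_t defrag_size,
					  uint32_t blocksize)
{
	struct block_range r = { 0, 0 };
	uint64_t first, last;

	if (blocksize == 0 || defrag_size == 0)
		return r;

	first = start_offset / blocksize;
	last = (start_offset + defrag_size - 1) / blocksize;

	r.start = first;
	r.count = last - first + 1;
	return r;
}
```

With 4096-byte blocks, a request for 200 bytes starting at byte 4000 straddles a block boundary and therefore becomes blocks 0 and 1. Passing block units across the interface, as proposed for the next update, makes this rounding the caller's explicit choice.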
Re: [RFC][PATCH 0/3] Extent base online defrag
Hi,

> - Specify the target area in a file using the following structure:
>   struct ext3_ext_defrag_data {
>       loff_t start_offset; /* start offset to defrag in bytes */
>       loff_t defrag_size; /* size of defrag in bytes */
>   }
>   It uses loff_t so that the size of the structure is identical on
>   both 32 bits and 64 bits architecture.
>   Block allocation, including searching for the free contiguous
>   blocks, is implemented in kernel.

NAK the ioctl approach.

I agree it shouldn't go into mainline this way, but while the details of
the proper interface are debated, this implementation at least allows the
core function to be tested & reviewed.

People who like ioctls are just holdovers from non-Linux OS's.

Thank you for your comments.
My patches are at the experimental phase and the ioctl approach is
the provisional solution. But I intend to continue this work with
ioctl approach, if there are no actual problems.

Cheers, Takashi
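The point about loff_t keeping the structure size identical on 32-bit and 64-bit architectures can be checked with a short sketch. It uses int64_t as a user-space stand-in for the kernel's 64-bit loff_t; the struct names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width layout: two 64-bit fields, 16 bytes on any architecture. */
struct defrag_range_fixed {
	int64_t start_offset;	/* start offset to defrag in bytes */
	int64_t defrag_size;	/* size of defrag in bytes */
};

/*
 * The same fields declared as "long" change size with the ABI:
 * 8 bytes in total where long is 32 bits, 16 bytes where it is 64 bits.
 */
struct defrag_range_long {
	long start_offset;
	long defrag_size;
};

/* Compile-time checks of the layout claims made above. */
_Static_assert(sizeof(struct defrag_range_fixed) == 16,
	       "fixed-width struct must be 16 bytes");
_Static_assert(offsetof(struct defrag_range_fixed, defrag_size) == 8,
	       "second field must start at byte 8");
_Static_assert(sizeof(struct defrag_range_long) == 2 * sizeof(long),
	       "long-based struct follows the width of long");
```

A fixed layout matters for an ioctl argument because a 32-bit program running on a 64-bit kernel then passes exactly the bytes the kernel expects, with no translation layer in between.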