On Mon, Sep 23, 2013 at 10:59 AM, Liu Bo <bo.li....@oracle.com> wrote:
> On Mon, Sep 23, 2013 at 10:53:20AM +0100, Filipe David Manana wrote:
>> On Mon, Sep 23, 2013 at 10:23 AM, Filipe David Borba Manana
>> <fdman...@gmail.com> wrote:
>> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
>> > wait for delayed work to finish before returning success to the
>> > caller. This change fixes this, ensuring that there's no data loss
>> > if a power failure happens right after fs sync returns success to
>> > the caller and before the next commit happens.
>> >
>> > Steps to reproduce the data loss issue:
>> >
>> >   $ mkfs.btrfs -f /dev/sdb3
>> >   $ mount /dev/sdb3 /mnt/btrfs
>> >   $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
>> >
>> > Right after the btrfs fi sync command (a second or 2 for example), power
>> > off the machine and reboot it. The file will be empty, as it can be verified
>> > after mounting the filesystem and through btrfs-debug-tree:
>> >
>> >   $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> >                 location key (257 INODE_ITEM 0) type FILE
>> >                 namelen 6 datalen 0 name: foobar
>> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> >                 inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
>> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> >                 inode ref index 2 namelen 6 name: foobar
>> >   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> >   leaf 29429760 items 0 free space 3995 generation 7 owner 7
>> >   fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
>> >   chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
>> >   uuid tree key (UUID_TREE ROOT_ITEM 0)
>> >
>> > After this patch, the data loss no longer happens after a power failure and
>> > btrfs-debug-tree shows:
>> >
>> >   $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>> >         item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>> >                 location key (257 INODE_ITEM 0) type FILE
>> >                 namelen 6 datalen 0 name: foobar
>> >         item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>> >                 inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
>> >         item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>> >                 inode ref index 2 namelen 6 name: foobar
>> >         item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>> >                 extent data disk byte 12845056 nr 8192
>> >                 extent data offset 0 nr 8192 ram 8192
>> >                 extent compression 0
>> >   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> >
>> > Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
>> > ---
>> >
>> > V2: Use writeback_inodes_sb() instead of btrfs_start_all_delalloc_inodes(),
>> >     as suggested by Miao Xie.
>> >
>> >  fs/btrfs/super.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> > index 6ab0df5..38b4392 100644
>> > --- a/fs/btrfs/super.c
>> > +++ b/fs/btrfs/super.c
>> > @@ -921,6 +921,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>> >                 return 0;
>> >         }
>> >
>> > +       writeback_inodes_sb(sb, WB_REASON_SYNC);
>> >         btrfs_wait_all_ordered_extents(fs_info);
>>
>> Ignore this 2nd patch version please, for 2 reasons:
>>
>> 1) It triggers a WARN_ON because writeback_inodes_sb() requires the
>> sb->s_umount semaphore to be acquired before, which is not always the
>> case (it is when called through btrfs_kill_super, otherwise it isn't)
>>
>> 2) It doesn't guarantee that inodes are actually written (see comment
>> of writeback_inodes_sb()), so we can return 0 (success) when the
>> writes actually didn't happen/succeed. Because of this,
>> btrfs_start_all_delalloc_inodes() is more honest.
>
> What about
>         case BTRFS_IOC_SYNC:
>                 btrfs_start_all_delalloc_inodes();
>                 btrfs_sync_fs(file->f_dentry->d_sb, 1);
>                 return 0;
>
> This way, there is no impact on calling sync(1).

Sounds ok. Will try it, returning an error if
btrfs_start_all_delalloc_inodes() returns an error.

Thanks for the suggestion and pointing me to sync_filesystem() :)

>
> -liubo

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
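
For reference, a minimal user-space sketch of the control flow agreed on above: flush delalloc first, bail out if that fails, and only then run the waiting sync. It is not the kernel patch. struct fake_fs and the fake_* functions are hypothetical stand-ins for the real btrfs code, invented here for illustration, and the real signatures of btrfs_start_all_delalloc_inodes() and btrfs_sync_fs() are not reproduced.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for filesystem state; not a kernel type. */
struct fake_fs {
	int delalloc_error;	/* what the delalloc flush reports (0 = success) */
	int sync_calls;		/* how many times the sync step ran */
};

/* Stand-in for btrfs_start_all_delalloc_inodes(): flush buffered data. */
static int fake_start_all_delalloc_inodes(struct fake_fs *fs)
{
	return fs->delalloc_error;
}

/* Stand-in for btrfs_sync_fs(sb, 1): wait for ordered extents, commit. */
static int fake_sync_fs(struct fake_fs *fs, int wait)
{
	(void)wait;
	fs->sync_calls++;
	return 0;
}

/*
 * Sketch of the BTRFS_IOC_SYNC branch: propagate a delalloc failure
 * instead of returning 0 unconditionally, and skip the sync in that case.
 */
int fake_ioctl_sync(struct fake_fs *fs)
{
	int ret = fake_start_all_delalloc_inodes(fs);

	if (ret)
		return ret;
	return fake_sync_fs(fs, 1);
}

/* Usage check: success runs the sync once; failure surfaces and skips it. */
void sketch_selftest(void)
{
	struct fake_fs ok = { 0, 0 };
	struct fake_fs bad = { -EIO, 0 };

	assert(fake_ioctl_sync(&ok) == 0 && ok.sync_calls == 1);
	assert(fake_ioctl_sync(&bad) == -EIO && bad.sync_calls == 0);
}
```

Keeping the delalloc flush in the ioctl path rather than in btrfs_sync_fs() is what leaves sync(1) unaffected, as Liu Bo points out.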