Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted

2013-09-23 Thread Liu Bo
On Mon, Sep 23, 2013 at 10:11:42AM +0100, Filipe David Manana wrote:
> On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie  wrote:
> >
> > On  sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > > wait for delayed work to finish before returning success to the
> > > caller. This change fixes this, ensuring that there's no data loss
> > > if a power failure happens right after fs sync returns success to
> > > the caller and before the next commit happens.
> > >
> > > Steps to reproduce the data loss issue:
> > >
> > > $ mkfs.btrfs -f /dev/sdb3
> > > $ mount /dev/sdb3 /mnt/btrfs
> > > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print 
> > > $f $d; close($f);' && btrfs fi sync /mnt/btrfs
> > >
> > > Right after the btrfs fi sync command (a second or 2 for example), power
> > > off the machine and reboot it. The file will be empty, as it can be 
> > > verified
> > > after mounting the filesystem and through btrfs-debug-tree:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 
> > > -A 8
> > >
> > > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > > location key (257 INODE_ITEM 0) type FILE
> > > namelen 6 datalen 0 name: foobar
> > > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > > inode generation 7 transid 7 size 0 block group 0 mode 
> > > 100644 links 1
> > > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > > inode ref index 2 namelen 6 name: foobar
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > > uuid tree key (UUID_TREE ROOT_ITEM 0)
> > >
> > > After this patch, the data loss no longer happens after a power failure 
> > > and
> > > btrfs-debug-tree shows:
> > >
> > > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 
> > > -A 8
> > >   item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > >   location key (257 INODE_ITEM 0) type FILE
> > >   namelen 6 datalen 0 name: foobar
> > >   item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > >   inode generation 6 transid 6 size 6001 block group 0 mode 
> > > 100644 links 1
> > >   item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > >   inode ref index 2 namelen 6 name: foobar
> > >   item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> > >   extent data disk byte 12845056 nr 8192
> > >   extent data offset 0 nr 8192 ram 8192
> > >   extent compression 0
> > > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > >
> > > Signed-off-by: Filipe David Borba Manana 
> > > ---
> > >  fs/btrfs/super.c |5 +
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > > index 6ab0df5..557e38f 100644
> > > --- a/fs/btrfs/super.c
> > > +++ b/fs/btrfs/super.c
> > > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > >   struct btrfs_trans_handle *trans;
> > >   struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> > >   struct btrfs_root *root = fs_info->tree_root;
> > > + int ret;
> > >
> > >   trace_btrfs_sync_fs(wait);
> > >
> > > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> > >   return 0;
> > >   }
> > >
> > > + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > > + if (ret)
> > > + return ret;
> > > +
> >
> > I don't think we should call btrfs_start_all_delalloc_inodes(), because 
> > this function is also
> > called by do_sync(), but do_sync() syncs the whole fs before calling it, so 
> > if we add
> > btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the 
> > second one is unnecessary.
> 
> Where is that do_sync() function exactly? I'm not finding any with
> that exact name in fs/btrfs/* nor fs/*

I think it should refer to sync_filesystem() :)

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted

2013-09-23 Thread Filipe David Manana
On Mon, Sep 23, 2013 at 2:30 AM, Miao Xie  wrote:
>
> On  sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> > Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> > wait for delayed work to finish before returning success to the
> > caller. This change fixes this, ensuring that there's no data loss
> > if a power failure happens right after fs sync returns success to
> > the caller and before the next commit happens.
> >
> > Steps to reproduce the data loss issue:
> >
> > $ mkfs.btrfs -f /dev/sdb3
> > $ mount /dev/sdb3 /mnt/btrfs
> > $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f 
> > $d; close($f);' && btrfs fi sync /mnt/btrfs
> >
> > Right after the btrfs fi sync command (a second or 2 for example), power
> > off the machine and reboot it. The file will be empty, as it can be verified
> > after mounting the filesystem and through btrfs-debug-tree:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 
> > 8
> >
> > item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> > location key (257 INODE_ITEM 0) type FILE
> > namelen 6 datalen 0 name: foobar
> > item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> > inode generation 7 transid 7 size 0 block group 0 mode 
> > 100644 links 1
> > item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> > inode ref index 2 namelen 6 name: foobar
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> > leaf 29429760 items 0 free space 3995 generation 7 owner 7
> > fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> > chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> > uuid tree key (UUID_TREE ROOT_ITEM 0)
> >
> > After this patch, the data loss no longer happens after a power failure and
> > btrfs-debug-tree shows:
> >
> > $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 
> > 8
> >   item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> >   location key (257 INODE_ITEM 0) type FILE
> >   namelen 6 datalen 0 name: foobar
> >   item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> >   inode generation 6 transid 6 size 6001 block group 0 mode 
> > 100644 links 1
> >   item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> >   inode ref index 2 namelen 6 name: foobar
> >   item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
> >   extent data disk byte 12845056 nr 8192
> >   extent data offset 0 nr 8192 ram 8192
> >   extent compression 0
> > checksum tree key (CSUM_TREE ROOT_ITEM 0)
> >
> > Signed-off-by: Filipe David Borba Manana 
> > ---
> >  fs/btrfs/super.c |5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 6ab0df5..557e38f 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >   struct btrfs_trans_handle *trans;
> >   struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> >   struct btrfs_root *root = fs_info->tree_root;
> > + int ret;
> >
> >   trace_btrfs_sync_fs(wait);
> >
> > @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
> >   return 0;
> >   }
> >
> > + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> > + if (ret)
> > + return ret;
> > +
>
> I don't think we should call btrfs_start_all_delalloc_inodes(), because this 
> function is also
> called by do_sync(), but do_sync() syncs the whole fs before calling it, so 
> if we add
> btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the 
> second one is unnecessary.

Where is that do_sync() function exactly? I'm not finding any with
that exact name in fs/btrfs/* nor fs/*
I used this approach because (besides working) it's what is done in
btrfs_commit_transaction() (via btrfs_start_delalloc_flush and
btrfs_wait_delalloc_flush). Why can it be like that in the transaction
commit and not in btrfs_sync_fs() ?

>
> Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix 
> this problem.

Just tested it, and it works that way too. Uploading a new patch.

Thanks for the feedback/review Miao.

>
> Thanks
> Miao
>
> >   btrfs_wait_all_ordered_extents(fs_info);
> >
> >   trans = btrfs_attach_transaction_barrier(root);
> >
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted

2013-09-22 Thread Miao Xie
On  sun, 22 Sep 2013 21:55:53 +0100, Filipe David Borba Manana wrote:
> Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
> wait for delayed work to finish before returning success to the
> caller. This change fixes this, ensuring that there's no data loss
> if a power failure happens right after fs sync returns success to
> the caller and before the next commit happens.
> 
> Steps to reproduce the data loss issue:
> 
> $ mkfs.btrfs -f /dev/sdb3
> $ mount /dev/sdb3 /mnt/btrfs
> $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f 
> $d; close($f);' && btrfs fi sync /mnt/btrfs
> 
> Right after the btrfs fi sync command (a second or 2 for example), power
> off the machine and reboot it. The file will be empty, as it can be verified
> after mounting the filesystem and through btrfs-debug-tree:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
> 
> item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
> location key (257 INODE_ITEM 0) type FILE
> namelen 6 datalen 0 name: foobar
> item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
> inode generation 7 transid 7 size 0 block group 0 mode 100644 
> links 1
> item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
> inode ref index 2 namelen 6 name: foobar
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29429760 items 0 free space 3995 generation 7 owner 7
> fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
> chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
> uuid tree key (UUID_TREE ROOT_ITEM 0)
> 
> After this patch, the data loss no longer happens after a power failure and
> btrfs-debug-tree shows:
> 
> $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
>   item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
>   location key (257 INODE_ITEM 0) type FILE
>   namelen 6 datalen 0 name: foobar
>   item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
>   inode generation 6 transid 6 size 6001 block group 0 mode 
> 100644 links 1
>   item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
>   inode ref index 2 namelen 6 name: foobar
>   item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
>   extent data disk byte 12845056 nr 8192
>   extent data offset 0 nr 8192 ram 8192
>   extent compression 0
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> 
> Signed-off-by: Filipe David Borba Manana 
> ---
>  fs/btrfs/super.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 6ab0df5..557e38f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>   struct btrfs_trans_handle *trans;
>   struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>   struct btrfs_root *root = fs_info->tree_root;
> + int ret;
>  
>   trace_btrfs_sync_fs(wait);
>  
> @@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
>   return 0;
>   }
>  
> + ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
> + if (ret)
> + return ret;
> +

I don't think we should call btrfs_start_all_delalloc_inodes(), because this 
function is also
called by do_sync(), but do_sync() syncs the whole fs before calling it, so if 
we add
btrfs_start_all_delalloc_inodes() here, we will sync the fs twice, and the 
second one is unnecessary.

Calling writeback_inodes_sb() before btrfs_sync_fs() is better way to fix this 
problem.

Thanks
Miao

>   btrfs_wait_all_ordered_extents(fs_info);
>  
>   trans = btrfs_attach_transaction_barrier(root);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix sync fs to actually wait for all data to be persisted

2013-09-22 Thread Filipe David Borba Manana
Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.

Steps to reproduce the data loss issue:

$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; 
close($f);' && btrfs fi sync /mnt/btrfs

Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8

item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 
links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)

After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:

$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 
100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)

Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/super.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ab0df5..557e38f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -913,6 +913,7 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
struct btrfs_trans_handle *trans;
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *root = fs_info->tree_root;
+   int ret;
 
trace_btrfs_sync_fs(wait);
 
@@ -921,6 +922,10 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
return 0;
}
 
+   ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
+   if (ret)
+   return ret;
+
btrfs_wait_all_ordered_extents(fs_info);
 
trans = btrfs_attach_transaction_barrier(root);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html