Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-25 Thread Chris Mason

On 25 Jun 2018, at 7:10, David Sterba wrote:


On Fri, Jun 22, 2018 at 05:25:54PM -0400, Chris Mason wrote:

The bug came here:

commit a528a24150870c5c16cbbbec69dba7e992b08456
Author: Souptick Joarder 
Date:   Wed Jun 6 19:54:44 2018 +0530

 btrfs: change return type of btrfs_page_mkwrite to vm_fault_t

When page->mapping != mapping, we improperly return success because 
ret2

is zero on goto out_unlock.

I'm running a fix through a full xfstests and I'll post soon.


The ret/ret2 pattern is IMO used in the wrong way, the ret2 is some
temporary value and it should be set and tested next to each other, 
not

holding the state accross the function.

The fix I'd propose is to avoid relying on it and add a separate exit
block, similar to out_noreserve:

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8981,7 +8981,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault 
*vmf)

ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
-   ret2 = 0;

/* page is wholly or partially inside EOF */
if (page_start + PAGE_SIZE > size)
@@ -9004,14 +9003,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault 
*vmf)
BTRFS_I(inode)->last_log_commit = 
BTRFS_I(inode)->root->last_log_commit;


unlock_extent_cached(io_tree, page_start, page_end, 
&cached_state);

-
-out_unlock:
-   if (!ret2) {
-   btrfs_delalloc_release_extents(BTRFS_I(inode), 
PAGE_SIZE, true);

-   sb_end_pagefault(inode->i_sb);
-   extent_changeset_free(data_reserved);
-   return VM_FAULT_LOCKED;
-   }
unlock_page(page);


VM_FAULT_LOCKED is where we return success.  The out_unlock goto is 
confusing because it's actually only used for failure, but the goto 
lands right above the if (everything actually worked) {} test.


On top of the patch I sent today, moving out_unlock: after the if would 
make it more clear.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-25 Thread David Sterba
On Fri, Jun 22, 2018 at 05:25:54PM -0400, Chris Mason wrote:
> The bug came here:
> 
> commit a528a24150870c5c16cbbbec69dba7e992b08456
> Author: Souptick Joarder 
> Date:   Wed Jun 6 19:54:44 2018 +0530
> 
>  btrfs: change return type of btrfs_page_mkwrite to vm_fault_t
> 
> When page->mapping != mapping, we improperly return success because ret2 
> is zero on goto out_unlock.
> 
> I'm running a fix through a full xfstests and I'll post soon.

The ret/ret2 pattern is IMO used in the wrong way, the ret2 is some
temporary value and it should be set and tested next to each other, not
holding the state accross the function.

The fix I'd propose is to avoid relying on it and add a separate exit
block, similar to out_noreserve:

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8981,7 +8981,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
-   ret2 = 0;
 
/* page is wholly or partially inside EOF */
if (page_start + PAGE_SIZE > size)
@@ -9004,14 +9003,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
 
unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
-
-out_unlock:
-   if (!ret2) {
-   btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
-   sb_end_pagefault(inode->i_sb);
-   extent_changeset_free(data_reserved);
-   return VM_FAULT_LOCKED;
-   }
unlock_page(page);
 out:
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
@@ -9021,6 +9012,12 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
sb_end_pagefault(inode->i_sb);
extent_changeset_free(data_reserved);
return ret;
+
+out_unlock:
+   btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
+   sb_end_pagefault(inode->i_sb);
+   extent_changeset_free(data_reserved);
+   return VM_FAULT_LOCKED;
 }
 
 static int btrfs_truncate(struct inode *inode, bool skip_writeback)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-22 Thread Chris Mason

On 20 Jun 2018, at 16:24, David Sterba wrote:


On Wed, Jun 20, 2018 at 03:48:08PM -0400, Chris Mason wrote:



generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at
2018-06-20 18:07:03


Hmpf, I pass both 095 and 208 here.

[ 3774.849685] BTRFS: device fsid 
3acffad9-28e5-43ce-80e1-f5032e334cba

devid 1 transid 5 /dev/vdb
[ 3774.875409] BTRFS info (device vdb): disk space caching is 
enabled

[ 3774.877723] BTRFS info (device vdb): has skinny extents
[ 3774.879371] BTRFS info (device vdb): flagging fs with big 
metadata

feature
[ 3774.885020] BTRFS info (device vdb): checking UUID tree
[ 3775.593329] Page cache invalidation failure on direct I/O.
Possible data corruption due to collision with buffered I/O!
[ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
[ 3776.642812] Page cache invalidation failure on direct I/O.
Possible data corruption due to collision with buffered I/O!
[ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
[ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319
btrfs_destroy_inode+0x1d5/0x290 [btrfs]



Which warning is this in your tree?  The file_write patch is more 
likely
to have screwed up our bits and the fixup worker is more likely to 
have

screwed up nrpages.


 9311 void btrfs_destroy_inode(struct inode *inode)
 9312 {
 9313 struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 9314 struct btrfs_ordered_extent *ordered;
 9315 struct btrfs_root *root = BTRFS_I(inode)->root;
 9316
 9317 WARN_ON(!hlist_empty(&inode->i_dentry));
 9318 WARN_ON(inode->i_data.nrpages);
 9319 WARN_ON(BTRFS_I(inode)->block_rsv.reserved);

The branch is the last pull, ie. no other 4.18-rc1 stuff plus your two 
patches.


The bug came here:

commit a528a24150870c5c16cbbbec69dba7e992b08456
Author: Souptick Joarder 
Date:   Wed Jun 6 19:54:44 2018 +0530

btrfs: change return type of btrfs_page_mkwrite to vm_fault_t

When page->mapping != mapping, we improperly return success because ret2 
is zero on goto out_unlock.


I'm running a fix through a full xfstests and I'll post soon.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-21 Thread Chris Mason




On 20 Jun 2018, at 15:33, David Sterba wrote:


On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
We've been hunting the root cause of data crc errors here at FB for a 
while.
We'd find one or two corrupted files, usually displaying crc errors 
without any
corresponding IO errors from the storage.  The bug was rare enough 
that we'd
need to watch a large number of machines for a few days just to catch 
it

happening.

We're still running these patches through testing, but the fixup 
worker bug
seems to account for the vast majority of crc errors we're seeing in 
the fleet.
It's cleaning pages that were dirty, and creating a window where they 
can be

reclaimed before we finish processing the page.


I'm having flashbacks when I see 'fixup worker', and the test 
generic/208 does

not make it better:

generic/095		[18:07:03][ 3769.317862] run fstests generic/095 at 
2018-06-20 18:07:03
[ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba 
devid 1 transid 5 /dev/vdb

[ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
[ 3774.877723] BTRFS info (device vdb): has skinny extents
[ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata 
feature

[ 3774.885020] BTRFS info (device vdb): checking UUID tree
[ 3775.593329] Page cache invalidation failure on direct I/O.  
Possible data corruption due to collision with buffered I/O!

[ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
[ 3776.642812] Page cache invalidation failure on direct I/O.  
Possible data corruption due to collision with buffered I/O!

[ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
[ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
btrfs_destroy_inode+0x1d5/0x290 [btrfs]
[ 3776.924182] Modules linked in: btrfs libcrc32c xor zstd_decompress 
zstd_compress xxhash raid6_pq loop [last unloaded: libcrc32c]
[ 3776.927703] CPU: 0 PID: 12036 Comm: umount Not tainted 
4.17.0-rc7-default+ #153
[ 3776.929164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014

[ 3776.931006] RIP: 0010:btrfs_destroy_inode+0x1d5/0x290 [btrfs]


Running generic/095 on current Linus git (without my patches), I'm 
seeing this same warning.  This makes me a little happy because I have 
my patches in prod, but mostly sad because it's easier to find when the 
suspect pool is small.  I'll bisect.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread David Sterba
On Wed, Jun 20, 2018 at 03:48:08PM -0400, Chris Mason wrote:
> 
> 
> On 20 Jun 2018, at 15:33, David Sterba wrote:
> 
> > On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
> >> We've been hunting the root cause of data crc errors here at FB for a 
> >> while.
> >> We'd find one or two corrupted files, usually displaying crc errors 
> >> without any
> >> corresponding IO errors from the storage.  The bug was rare enough 
> >> that we'd
> >> need to watch a large number of machines for a few days just to catch 
> >> it
> >> happening.
> >>
> >> We're still running these patches through testing, but the fixup 
> >> worker bug
> >> seems to account for the vast majority of crc errors we're seeing in 
> >> the fleet.
> >> It's cleaning pages that were dirty, and creating a window where they 
> >> can be
> >> reclaimed before we finish processing the page.
> >
> > I'm having flashbacks when I see 'fixup worker',
> 
> Yeah, I don't understand how so much pain can live in one little 
> function.
> 
> > and the test generic/208 does not make it better:
> >
> > generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 
> > 2018-06-20 18:07:03
> 
> Hmpf, I pass both 095 and 208 here.
> 
> > [ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba 
> > devid 1 transid 5 /dev/vdb
> > [ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
> > [ 3774.877723] BTRFS info (device vdb): has skinny extents
> > [ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata 
> > feature
> > [ 3774.885020] BTRFS info (device vdb): checking UUID tree
> > [ 3775.593329] Page cache invalidation failure on direct I/O.  
> > Possible data corruption due to collision with buffered I/O!
> > [ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
> > [ 3776.642812] Page cache invalidation failure on direct I/O.  
> > Possible data corruption due to collision with buffered I/O!
> > [ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
> > [ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
> > btrfs_destroy_inode+0x1d5/0x290 [btrfs]
> 
> 
> Which warning is this in your tree?  The file_write patch is more likely 
> to have screwed up our bits and the fixup worker is more likely to have 
> screwed up nrpages.

 9311 void btrfs_destroy_inode(struct inode *inode)
 9312 {
 9313 struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 9314 struct btrfs_ordered_extent *ordered;
 9315 struct btrfs_root *root = BTRFS_I(inode)->root;
 9316
 9317 WARN_ON(!hlist_empty(&inode->i_dentry));
 9318 WARN_ON(inode->i_data.nrpages);
 9319 WARN_ON(BTRFS_I(inode)->block_rsv.reserved);

The branch is the last pull, ie. no other 4.18-rc1 stuff plus your two patches.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread Chris Mason




On 20 Jun 2018, at 15:33, David Sterba wrote:


On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
We've been hunting the root cause of data crc errors here at FB for a 
while.
We'd find one or two corrupted files, usually displaying crc errors 
without any
corresponding IO errors from the storage.  The bug was rare enough 
that we'd
need to watch a large number of machines for a few days just to catch 
it

happening.

We're still running these patches through testing, but the fixup 
worker bug
seems to account for the vast majority of crc errors we're seeing in 
the fleet.
It's cleaning pages that were dirty, and creating a window where they 
can be

reclaimed before we finish processing the page.


I'm having flashbacks when I see 'fixup worker',


Yeah, I don't understand how so much pain can live in one little 
function.



and the test generic/208 does not make it better:

generic/095		[18:07:03][ 3769.317862] run fstests generic/095 at 
2018-06-20 18:07:03


Hmpf, I pass both 095 and 208 here.

[ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba 
devid 1 transid 5 /dev/vdb

[ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
[ 3774.877723] BTRFS info (device vdb): has skinny extents
[ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata 
feature

[ 3774.885020] BTRFS info (device vdb): checking UUID tree
[ 3775.593329] Page cache invalidation failure on direct I/O.  
Possible data corruption due to collision with buffered I/O!

[ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
[ 3776.642812] Page cache invalidation failure on direct I/O.  
Possible data corruption due to collision with buffered I/O!

[ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
[ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
btrfs_destroy_inode+0x1d5/0x290 [btrfs]



Which warning is this in your tree?  The file_write patch is more likely 
to have screwed up our bits and the fixup worker is more likely to have 
screwed up nrpages.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-20 Thread David Sterba
On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
> We've been hunting the root cause of data crc errors here at FB for a while.
> We'd find one or two corrupted files, usually displaying crc errors without 
> any
> corresponding IO errors from the storage.  The bug was rare enough that we'd
> need to watch a large number of machines for a few days just to catch it
> happening.
> 
> We're still running these patches through testing, but the fixup worker bug
> seems to account for the vast majority of crc errors we're seeing in the 
> fleet.
> It's cleaning pages that were dirty, and creating a window where they can be
> reclaimed before we finish processing the page.

I'm having flashbacks when I see 'fixup worker', and the test generic/208 does
not make it better:

generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 
2018-06-20 18:07:03
[ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba devid 1 
transid 5 /dev/vdb
[ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
[ 3774.877723] BTRFS info (device vdb): has skinny extents
[ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata feature
[ 3774.885020] BTRFS info (device vdb): checking UUID tree
[ 3775.593329] Page cache invalidation failure on direct I/O.  Possible data 
corruption due to collision with buffered I/O!
[ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
[ 3776.642812] Page cache invalidation failure on direct I/O.  Possible data 
corruption due to collision with buffered I/O!
[ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
[ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 
btrfs_destroy_inode+0x1d5/0x290 [btrfs]
[ 3776.924182] Modules linked in: btrfs libcrc32c xor zstd_decompress 
zstd_compress xxhash raid6_pq loop [last unloaded: libcrc32c]
[ 3776.927703] CPU: 0 PID: 12036 Comm: umount Not tainted 4.17.0-rc7-default+ 
#153
[ 3776.929164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3776.931006] RIP: 0010:btrfs_destroy_inode+0x1d5/0x290 [btrfs]
[ 3776.932052] RSP: 0018:b2dac5943dc8 EFLAGS: 00010206
[ 3776.933066] RAX: 9ab763fe1000 RBX: 9ab7796bf4d8 RCX: 
[ 3776.934366] RDX:  RSI:  RDI: 9ab7796bf4d8
[ 3776.935708] RBP: b2dac5943e38 R08:  R09: 0002
[ 3776.93] R10: b2dac5943d28 R11: f9929087e0f2246e R12: 9ab7796bf4d8
[ 3776.937511] R13: a1dfb4b1 R14: 9ab775c657a0 R15: 9ab7796bd4b8
[ 3776.938346] FS:  7f0c97635fc0() GS:9ab77fc0() 
knlGS:
[ 3776.939502] CS:  0010 DS:  ES:  CR0: 80050033
[ 3776.940701] CR2: 7f0c96f7f793 CR3: 63819000 CR4: 06f0
[ 3776.942396] Call Trace:
[ 3776.942994]  dispose_list+0x51/0x80
[ 3776.943758]  evict_inodes+0x15b/0x1b0
[ 3776.944558]  generic_shutdown_super+0x3a/0x110
[ 3776.945501]  kill_anon_super+0xe/0x20
[ 3776.946272]  btrfs_kill_super+0x12/0xa0 [btrfs]
[ 3776.947313]  deactivate_locked_super+0x34/0x60
[ 3776.948421]  cleanup_mnt+0x3b/0x70
[ 3776.949201]  task_work_run+0x8d/0xc0
[ 3776.949971]  exit_to_usermode_loop+0x99/0xa0
[ 3776.950872]  do_syscall_64+0x17d/0x190
[ 3776.951783]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3776.952724] RIP: 0033:0x7f0c96efea57
[ 3776.953320] RSP: 002b:7ffc3ae13b98 EFLAGS: 0246 ORIG_RAX: 
00a6
[ 3776.954294] RAX:  RBX: 55cf12f21970 RCX: 7f0c96efea57
[ 3776.955196] RDX: 0001 RSI:  RDI: 55cf12f21b50
[ 3776.956648] RBP:  R08: 0005 R09: 
[ 3776.957964] R10: 55cf12f21b70 R11: 0246 R12: 55cf12f21b50
[ 3776.959657] R13: 7f0c974191c4 R14:  R15: 
[ 3776.961345] Code: ef e8 90 a7 fe ff e9 5f ff ff ff 0f 0b 48 83 bb d8 02 00 
00 00 0f 84 76 fe ff ff 0f 0b 48 83 bb f0 fe ff ff 00 0f 84 74 fe ff ff <0f> 0b 
48 83 bb e8 fe ff ff 00 0f 84 72 fe ff ff 0f 0b 8b 93 e4 
[ 3776.965122] irq event stamp: 12936
[ 3776.965598] hardirqs last  enabled at (12935): [] 
_raw_spin_unlock_irq+0x29/0x50
[ 3776.966691] hardirqs last disabled at (12936): [] 
error_entry+0x6c/0xc0
[ 3776.968171] softirqs last  enabled at (5088): [] 
__do_softirq+0x3a8/0x518
[ 3776.969521] softirqs last disabled at (5065): [] 
irq_exit+0xc1/0xd0
[ 3776.971686] ---[ end trace e11771ebe2e788d0 ]---
[ 3776.972746] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9320 
btrfs_destroy_inode+0x1e5/0x290 [btrfs]
[ 3776.974875] Modules linked in: btrfs libcrc32c xor zstd_decompress 
zstd_compress xxhash raid6_pq loop [last unloaded: libcrc32c]
[ 3776.977451] CPU: 0 PID: 12036 Comm: umount Tainted: GW 
4.17.0-rc7-default+ #153
[ 3776.978663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3776.980291]