[PATCH] btrfs: raid56: data corruption on a device removal

2018-12-11 Thread Dmitriy Gorokh
I found that a RAID5 or RAID6 filesystem might get corrupted in the following
scenario:

1. Create a 4-disk RAID6 filesystem
2. Preallocate 16 10GiB files
3. Run fio: 'fio --name=testload --directory=./ --size=10G --numjobs=16
--bs=64k --iodepth=64 --rw=randrw --verify=sha256 --time_based --runtime=3600'
4. After a few minutes, pull out two drives: 'echo 1 >
/sys/block/sdc/device/delete ; echo 1 > /sys/block/sdd/device/delete'

In about 5 out of 10 runs, the test led to silent data corruption of a random
extent, resulting in 'IO Error' and 'csum failed' messages while trying to read
the affected file. It usually affects only a small portion of the files and
only one underlying extent of a file. When I converted the logical address of
the damaged extent to a physical address and dumped the stripe directly from
the drives, I saw a specific pattern, always the same whenever the issue
occurred.

I found that a few bios which were being processed right during the drive
removal contained a non-zero bio->bi_iter.bi_done field despite an EIO
bi_status. The bi_sector field was also advanced from its original value by
that 'bi_done' amount. This looks like a quite rare condition. Subsequently, in
the raid_rmw_end_io handler, such a failed bio can be translated to a wrong
stripe number and fail the wrong rbio.
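
For illustration, here is a simplified paraphrase of the lookup that gets
confused (based on the find_bio_stripe() hunks below; the device comparisons
present in the real function are omitted here):

	/* Simplified sketch, not the verbatim kernel code */
	static int find_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio)
	{
		/* bi_sector has already been advanced by the completed part */
		u64 physical = (u64)bio->bi_iter.bi_sector << 9;
		int i;

		for (i = 0; i < rbio->bbio->num_stripes; i++) {
			struct btrfs_bio_stripe *stripe = &rbio->bbio->stripes[i];

			/*
			 * With the advanced 'physical' this range check can
			 * match a different stripe (or none at all), so the
			 * caller, e.g. fail_bio_stripe(), marks the wrong
			 * stripe as failed.
			 */
			if (physical >= stripe->physical &&
			    physical < stripe->physical + rbio->stripe_len)
				return i;
		}
		return -1;
	}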


---
 fs/btrfs/raid56.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 3c8093757497..94ae70715195 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1451,6 +1451,9 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
struct btrfs_bio_stripe *stripe;

physical <<= 9;
+   // Since the failed bio can return partial data, bi_sector might be incremented
+   // by that value. We need to revert it back to the state before the bio was submitted.
+   physical -= bio->bi_iter.bi_done;

for (i = 0; i < rbio->bbio->num_stripes; i++) {
stripe = &rbio->bbio->stripes[i];
-- 
2.17.0




[PATCH v2] btrfs: raid56: data corruption on a device removal

2018-12-14 Thread Dmitriy Gorokh
A RAID5 or RAID6 filesystem might get corrupted in the following scenario:

1. Create a 4-disk RAID6 filesystem
2. Preallocate 16 10GiB files
3. Run fio: 'fio --name=testload --directory=./ --size=10G
--numjobs=16 --bs=64k --iodepth=64 --rw=randrw --verify=sha256
--time_based --runtime=3600'
4. After a few minutes, pull out two drives: 'echo 1 >
/sys/block/sdc/device/delete ; echo 1 > /sys/block/sdd/device/delete'

In about 5 out of 10 runs, the test led to silent data corruption of a
random stripe, resulting in 'IO Error' and 'csum failed' messages while
trying to read the affected file. It usually affects only a small portion
of the files.

It is possible that a few bios which were being processed during the
drive removal contained a non-zero bio->bi_iter.bi_done field despite an
EIO bi_status. The bi_sector field was also advanced from its original
value by that 'bi_done' amount. This looks like a quite rare condition.
Subsequently, in the raid_rmw_end_io handler, such a failed bio can be
translated to a wrong stripe number and fail the wrong rbio.
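
For context on why subtracting bi_done is sufficient (my paraphrase of the
generic bio iterator behaviour in kernels of this era, an assumption rather
than btrfs code): a partial completion advances bi_sector by the completed
sectors while bi_done accumulates the completed bytes, so the original
physical address can be recovered as below. The helper name is hypothetical,
purely for illustration; the hunk below does the same computation inline.

	/* Hypothetical helper, for illustration only */
	static inline u64 bio_original_physical(const struct bio *bio)
	{
		/*
		 * bi_sector was advanced by (bi_done >> 9) when the bio was
		 * partially completed; undo that advance to get the physical
		 * address the bio was originally submitted with.
		 */
		return ((u64)bio->bi_iter.bi_sector << 9) - bio->bi_iter.bi_done;
	}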

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Dmitriy Gorokh 
---
 fs/btrfs/raid56.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 3c8093757497..cd2038315feb 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1451,6 +1451,12 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
  struct btrfs_bio_stripe *stripe;

  physical <<= 9;
+ /*
+  * Since the failed bio can return partial data, bi_sector might be
+  * incremented by that value. We need to revert it back to the
+  * state before the bio was submitted.
+  */
+ physical -= bio->bi_iter.bi_done;

  for (i = 0; i < rbio->bbio->num_stripes; i++) {
  stripe = &rbio->bbio->stripes[i];
-- 
2.17.0


[PATCH] Fix NULL pointer exception in find_bio_stripe()

2018-02-16 Thread Dmitriy Gorokh
When detaching a disk that is part of a RAID6 filesystem, the following kernel
oops may happen:

[63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 
1, corrupt 0, gen 0 
[63122.719584] BTRFS warning (device sdo): lost page write due to IO error on 
/dev/sdo 
[63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 
1, corrupt 0, gen 0 
[63122.803516] BTRFS warning (device sdo): lost page write due to IO error on 
/dev/sdo 
[63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 
1, corrupt 0, gen 0 
[63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo 
[63122.935338] BUG: unable to handle kernel NULL pointer dereference at 
0080 
[63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs] 
[63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0 
[63122.971202] Oops:  [#1] SMP 
[63122.990786] Modules linked in: libcrc32c dlm configfs cpufreq_userspace 
cpufreq_powersave cpufreq_conservative softdog nfsd auth_rpcgss nfs_acl nfs 
lockd grace fscache sunrpc bonding ipmi_devintf ipmi_msghandler joydev 
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd psmouse evdev parport_pc 
soundcore serio_raw battery pcspkr video ac97_bus ac parport ohci_pci ohci_hcd 
i2c_piix4 button crc32c_generic crc32c_intel btrfs xor zstd_decompress 
zstd_compress xxhash raid6_pq dm_mod dax raid1 md_mod hid_generic usbhid hid 
xhci_pci xhci_hcd ehci_pci ehci_hcd usbcore sg sd_mod sr_mod cdrom ata_generic 
ahci libahci ata_piix libata e1000 scsi_mod [last unloaded: scst] 
[63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 
4.14.2-16-scst34x+ #8 
[63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006 
[63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs] 
[63123.007595] task: 880036ea4040 task.stack: c90006384000 
[63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs] 
[63123.007968] RSP: 0018:c90006387ad8 EFLAGS: 00010287 
[63123.008140] RAX: 0002 RBX: 88004beaa0b8 RCX: 
8800b2bd5690 
[63123.008359] RDX:  RSI: 88007bb43500 RDI: 
88004beaa000 
[63123.008621] RBP: c90006387ae8 R08: 9910 R09: 
8800b2bd5600 
[63123.008840] R10: 0004 R11: 0001 R12: 
88007bb43500 
[63123.009059] R13: fffb R14: 880036fc5180 R15: 
0004 
[63123.009278] FS: () GS:8800b700() 
knlGS: 
[63123.009564] CS: 0010 DS:  ES:  CR0: 80050033 
[63123.009748] CR2: 0080 CR3: b0866000 CR4: 
000406f0 
[63123.009969] Call Trace: 
[63123.010085] raid_write_end_io+0x7e/0x80 [btrfs] 
[63123.010251] bio_endio+0xa1/0x120 
[63123.010378] generic_make_request+0x218/0x270 
[63123.010921] submit_bio+0x66/0x130 
[63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs] 
[63123.011245] full_stripe_write+0x96/0xc0 [btrfs] 
[63123.011428] raid56_parity_write+0x117/0x170 [btrfs] 
[63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs] 
[63123.011759] ? ___cache_free+0x1c5/0x300 
[63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs] 
[63123.012087] run_one_async_done+0x9c/0xc0 [btrfs] 
[63123.012257] normal_work_helper+0x19e/0x300 [btrfs] 
[63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs] 
[63123.012656] process_one_work+0x14d/0x350 
[63123.012888] worker_thread+0x4d/0x3a0 
[63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20 
[63123.013192] kthread+0x109/0x140 
[63123.013315] ? process_scheduled_works+0x40/0x40 
[63123.013472] ? kthread_stop+0x110/0x110 
[63123.013610] ret_from_fork+0x25/0x30 
[63123.013741] Code: 7e 43 31 c0 48 63 d0 48 8d 14 52 49 8d 4c d1 60 48 8b 51 
08 49 39 d0 72 1f 4c 63 1b 4c 01 da 49 39 d0 73 14 48 8b 11 48 8b 52 68 <48> 8b 
8a 80 00 00 00 48 39 4e 08 74 14 83 c0 01 44 39 d0 75 c4 
[63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: c90006387ad8 
[63123.014678] CR2: 0080 
[63123.016590] ---[ end trace a295ea7259c17880 ]---

This is reproducible in a cycle where a series of writes is followed by a SCSI
device delete command. The test may take up to a few minutes.
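
The crash path matches the trace above: raid_write_end_io() calls
fail_bio_stripe(), which calls find_bio_stripe(). Once the device has been
pulled, stripe->dev->bdev is NULL, and the lookup dereferences it
unconditionally to compare the disk; the faulting address 0x80 is consistent
with reading bd_disk through that NULL pointer (my reading of the oops, not
something the log states directly). A simplified annotation of the pre-fix
check, which the hunk below guards with a stripe->dev->bdev test:

	if (physical >= stripe_start &&
	    physical < stripe_start + rbio->stripe_len &&
	    /* stripe->dev->bdev is NULL once the device has been removed */
	    bio->bi_disk == stripe->dev->bdev->bd_disk &&	/* <-- NULL deref */
	    bio->bi_partno == stripe->dev->bdev->bd_partno) {
		return i;
	}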

Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
---
 fs/btrfs/raid56.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index dec0907dfb8a..fcfc20de2df3 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1370,6 +1370,7 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
stripe_start = stripe->physical;
if (physical >= stripe_start &&
physical < stripe_start + rbio->stripe_len &&
+   stripe->dev->bdev &&
bio->bi_disk == stripe->dev->bdev->bd_disk &&
bio->bi_partno == stripe->dev->bdev->bd_partno) {
return i;
-- 
2.14.2