I/O Error Test 1
================

commit "bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags"

Problem: the cacheset is not retired immediately on I/O errors in
cache device if I/O requests keep coming.

Original kernel: bcache device remains on top of caching device,
and 'fio' never finishes.

Modified kernel: bcache device is removed from caching device,
and fio finishes.

Original
--------

# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019

# ./setup.sh >/dev/null 2>&1
[  285.677682] bcache: register_bdev() registered backing device dm-0
[  285.697006] bcache: run_cache_set() invalidating existing data
[  285.710938] bcache: register_cache() registered cache device dm-1
[  287.686924] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
c589879b-b1c3-49b3-9603-9795ddc750f5

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 

# ./dm_fake_dev.sh /dev/loop1 bad
[  766.102586] Buffer I/O error on dev dm-1, logical block 262128, async page 
read
[  766.107602] Buffer I/O error on dev dm-1, logical block 262128, async page 
read
[  766.113889] bcache: register_bcache() error /dev/dm-1: device already 
registered

On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 
--ioengine=libaio --runtime=300s --continue_on_error=all
write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=8
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]

< fio never finishes, keeps showing the last line above >
<  console prints as below when fio starts running .... >

[  777.451177] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
[  777.490882] bcache: error on c589879b-b1c3-49b3-9603-9795ddc750f5:
[  777.490885] journal io error
[  777.494087] , disabling caching
[  807.900700] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
(error msgs looping)

bcache0 still present on top of cache device (fake-loop1)

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
└─fake-loop0 253:0    0 1024M  0 dm
  └─bcache0  251:0    0 1024M  0 disk
loop1          7:1    0    1G  0 loop
fake-loop1   253:1    0    1G  0 dm
└─bcache0    251:0    0 1024M  0 disk


Modified
--------

# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 
UTC


# ./setup.sh >/dev/null 2>&1
[   60.542088] bcache: register_bdev() registered backing device dm-0
[   60.550509] bcache: run_cache_set() invalidating existing data
[   60.560109] bcache: register_cache() registered cache device dm-1
[   62.548849] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
f6833a2c-53e6-468e-bf1d-a9f48b73d783

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 

# ./dm_fake_dev.sh /dev/loop1 bad
[   72.639185] Buffer I/O error on dev dm-1, logical block 262128, async page 
read
[   72.644876] Buffer I/O error on dev dm-1, logical block 262128, async page 
read
[   72.650707] bcache: register_bcache() error /dev/dm-1: device already 
registered

On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k
--iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all

[   97.858468] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
[   97.868519] bcache: error on f6833a2c-53e6-468e-bf1d-a9f48b73d783:
[   97.868520] journal io error
[   97.869998] , disabling caching
[   97.871441] bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it 
alive.
[   97.874423] Buffer I/O error on dev bcache0, logical block 2814, lost async 
page write
[   97.878697] Buffer I/O error on dev bcache0, logical block 2816, lost async 
page write
[   97.881702] Buffer I/O error on dev bcache0, logical block 2817, lost async 
page write
[   97.884790] Buffer I/O error on dev bcache0, logical block 2818, lost async 
page write
[   97.887709] Buffer I/O error on dev bcache0, logical block 2819, lost async 
page write
[   97.890558] Buffer I/O error on dev bcache0, logical block 2820, lost async 
page write
[   97.892419] Buffer I/O error on dev bcache0, logical block 2821, lost async 
page write
[   97.894228] Buffer I/O error on dev bcache0, logical block 2822, lost async 
page write
[   97.896107] Buffer I/O error on dev bcache0, logical block 2823, lost async 
page write
[   97.897900] Buffer I/O error on dev bcache0, logical block 2824, lost async 
page write
[   97.916818] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[   97.918511] bcache: bch_count_io_errors() dm-1: IO error on writing btree, 
recovering
[   97.920581] bcache: cache_set_free() Cache set 
f6833a2c-53e6-468e-bf1d-a9f48b73d783 unregistered

fio finished:

write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=8
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=135MiB/s][r=0,w=34.7k IOPS][eta 
00m:00s]
...
Run status group 0 (all jobs):
  WRITE: bw=219MiB/s (229MB/s), 219MiB/s-219MiB/s (229MB/s-229MB/s), io=1024MiB 
(1074MB), run=4685-4685msec

bcache not on top of caching device:

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
fake-loop1   253:1    0    1G  0 dm

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     specially in writeback mode, can lead to data loss and/or
     the application is not notified about failed I/O requests.

   * The bcache device might remain available for I/O requests
     even if backing device is offline, so writes are undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of almost every
     patch with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache
     mode (writethrough, writeback, writearound, none) with the
     xfstests test suite (on ext4), fio (random read-write) and
     iozone (several read/write tests).

  [Regression Potential]

   * The patchset is relatively large and touches several areas
     in bcache code, however, synthetic testing of the patches
     has been performed, and extensive regression/stress tests
     were run (as mentioned in Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other
     patches, and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  
  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829563/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to