I/O Error Test 6 (for the Disco kernel)
=======================================

commit: 'Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()"'

Problem: if one backing device hits I/O errors, the cache device is
disabled; if that cache device is shared by other bcache devices, they
are stopped too (even though their backing devices are not failing).

Original kernel: all bcache devices that share the cache device with
the failing backing device are stopped.

Modified kernel: only the bcache device with the failing backing
device is stopped.
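To make the difference concrete, here is a toy model of the stop policy. It is not kernel code: stopped_devices is a hypothetical helper, and it assumes every device shares one cache set and has stop_when_cache_set_failed set to "always", as in the runs below.

```shell
# Toy model (hypothetical, not kernel code) of which bcache devices end up
# stopped when one backing device exceeds its I/O error limit.
# Assumes all devices share one cache set and stop_when_cache_set_failed
# is "always" on each of them.
#
# Usage: stopped_devices KERNEL FAILING_DEV DEV...
#   KERNEL is "original" or "modified" (the labels used in this report).
stopped_devices() {
    local kernel=$1 failing=$2
    shift 2
    case $kernel in
        original)
            # bch_cached_dev_error() also sets CACHE_SET_IO_DISABLE, so the
            # shared cache set fails and every attached device is stopped.
            printf '%s\n' "$@"
            ;;
        modified)
            # With the revert, only the device whose backing device failed
            # is stopped; the cache set keeps serving the others.
            printf '%s\n' "$failing"
            ;;
        *)
            echo "unknown kernel: $kernel" >&2
            return 1
            ;;
    esac
}
```

For this setup, `stopped_devices original bcache0 bcache0 bcache1` prints both devices, matching the first run, while `stopped_devices modified bcache0 bcache0 bcache1` prints only bcache0, matching the second.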


Original kernel:
----------------

root@bionic-bcache:~# uname -rv
5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019

root@bionic-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   23.323929] bcache: register_bdev() registered backing device dm-0
[   23.330821] bcache: register_bdev() registered backing device dm-1
[   23.335493] bcache: run_cache_set() invalidating existing data
[   23.347255] bcache: register_cache() registered cache device dm-2
[   24.335738] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set f816e09d-f744-4fc9-b3bd-239f3d5093c6
[   24.342388] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set f816e09d-f744-4fc9-b3bd-239f3d5093c6


root@bionic-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always
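The settings can be read back before injecting errors. A minimal sketch follows; active_option, check_bcache_settings and the SYSFS_ROOT override are hypothetical names, and it assumes both attributes read back as an option list with the active value in brackets (e.g. "writethrough [writeback] writearound none").

```shell
# Print the active (bracketed) value of a sysfs option list read from stdin,
# e.g. "writethrough [writeback] writearound none" -> "writeback".
active_option() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Check that every bcache device is in writeback mode with
# stop_when_cache_set_failed set to "always". SYSFS_ROOT is a hypothetical
# override (default /sys) so the check can run against a scratch tree.
check_bcache_settings() {
    local root=${SYSFS_ROOT:-/sys} dir mode stop rc=0
    for dir in "$root"/block/bcache*/bcache; do
        [ -d "$dir" ] || continue   # glob matched nothing
        mode=$(active_option < "$dir/cache_mode")
        stop=$(active_option < "$dir/stop_when_cache_set_failed")
        if [ "$mode" != writeback ] || [ "$stop" != always ]; then
            echo "$dir: cache_mode=$mode stop_when_cache_set_failed=$stop" >&2
            rc=1
        fi
    done
    return "$rc"
}
```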


root@bionic-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[   58.915344] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   58.921948] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   58.928886] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[   58.931006] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.936386] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.939346] Buffer I/O error on dev bcache0, logical block 262112, async page read
root@bionic-bcache:~# [   58.944685] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.948468] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   58.951078] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.954231] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   58.957216] Buffer I/O error on dev bcache0, logical block 1, async page read


# ./dm_fake_dev.sh /dev/loop0 bad
[  167.341298] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  167.347802] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  167.354959] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[  167.356585] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  167.364784] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  167.369083] Buffer I/O error on dev bcache0, logical block 262112, async page read
root@bionic-bcache:~# [  167.376976] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  167.381644] Buffer I/O error on dev bcache0, logical block 262112, async page read
[  167.384195] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  167.387144] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  167.390040] Buffer I/O error on dev bcache0, logical block 1, async page read


root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1464
[2] 1465
root@bionic-bcache:~# [  178.103060] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.107790] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  178.111814] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.116428] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  178.119286] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.122070] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[  178.122601] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.128535] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[  178.132472] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.136169] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[  178.139426] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.143021] Buffer I/O error on dev bcache0, logical block 5, lost async page write
[  178.146279] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.149876] Buffer I/O error on dev bcache0, logical block 6, lost async page write
[  178.153119] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.156697] Buffer I/O error on dev bcache0, logical block 7, lost async page write
[  178.159941] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.163519] Buffer I/O error on dev bcache0, logical block 8, lost async page write
[  178.166783] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.170706] Buffer I/O error on dev bcache0, logical block 9, lost async page write
[  178.173933] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.177574] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.181235] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  178.362803] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  178.366412] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[  178.366412]
[  178.501362] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
[  178.504932] bcache: bch_cache_set_error() bcache: error on f816e09d-f744-4fc9-b3bd-239f3d5093c6:
[  178.509390] journal io error
[  178.509391] bcache: bch_cache_set_error() , disabling caching
[  178.509391]
[  178.517586] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "always", stop it for failed cache set f816e09d-f744-4fc9-b3bd-239f3d5093c6.
[  178.524925] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache1 is "always", stop it for failed cache set f816e09d-f744-4fc9-b3bd-239f3d5093c6.
[  178.562349] bcache: cached_dev_detach_finish() Caching disabled for dm-1
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.89317 s, 371 MB/s
[  180.186818] bcache: bcache_device_free() bcache0 stopped
[  180.188875] bcache: bch_count_io_errors() dm-2: IO error on writing btree.
[  180.214681] bcache: cache_set_free() Cache set f816e09d-f744-4fc9-b3bd-239f3d5093c6 unregistered
dd: error writing '/dev/bcache1': No space left on device
[  181.732575] bcache: bcache_device_free() bcache1 stopped
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.44023 s, 242 MB/s

root@bionic-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
loop2          7:2    0    1G  0 loop
└─fake-loop2 253:2    0 1024M  0 dm
fake-loop0   253:0    0    1G  0 dm

Both bcache0 and bcache1 are removed.
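One way to script the "which devices survived" check is to extract the bcache names from lsblk output saved before and after the test. A sketch with hypothetical helper names; each bcache device is listed twice (under its backing device and under the shared cache device), hence the sort -u.

```shell
# List the distinct bcache devices named in lsblk output read from stdin.
# Duplicates are removed because each device appears under both its
# backing device and the shared cache device.
bcache_devices() {
    grep -o 'bcache[0-9][0-9]*' | sort -u
}

# Print the bcache devices present in lsblk dump $1 (before the test)
# but missing from lsblk dump $2 (after the test).
removed_devices() {
    local after dev
    after=$(bcache_devices < "$2")
    bcache_devices < "$1" | while read -r dev; do
        printf '%s\n' "$after" | grep -qx "$dev" || echo "$dev"
    done
}
```

Usage: save `lsblk -e 252 > before.txt` after setup and `lsblk -e 252 > after.txt` after the dd run, then `removed_devices before.txt after.txt`; with the outputs above it prints both bcache0 and bcache1.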



Modified kernel:
----------------

root@bionic-bcache:~# uname -rv
5.0.0-21-generic #22+test20190707build1 SMP Mon Jul 8 01:50:31 UTC 2019

root@bionic-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   25.668092] bcache: register_bdev() registered backing device dm-0
[   25.680959] bcache: register_bdev() registered backing device dm-1
[   25.686178] bcache: run_cache_set() invalidating existing data
[   25.695269] bcache: register_cache() registered cache device dm-2
[   26.691859] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set b3823d82-8753-44ef-a7df-e1271b667021
[   26.698108] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set b3823d82-8753-44ef-a7df-e1271b667021


root@bionic-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always


root@bionic-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[   49.073126] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   49.079509] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   49.086012] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[   49.088466] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   49.093359] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   49.096298] Buffer I/O error on dev bcache0, logical block 262112, async page read
root@bionic-bcache:~# [   49.100583] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   49.103578] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   49.107542] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   49.111926] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   49.116200] Buffer I/O error on dev bcache0, logical block 1, async page read


root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1453
[2] 1454
root@bionic-bcache:~# [   55.398092] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.404433] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[   55.409868] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.414151] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[   55.417134] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.420521] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[   55.423094] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.428977] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[   55.433236] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.436400] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[   55.439314] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[   55.720661] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.726927] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.734921] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.743469] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.747248] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.750829] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.754349] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   55.757972] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[   55.757972]
dd: error writing '/dev/bcache1': No space left on device
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 3.62916 s, 296 MB/s
[   58.188089] bcache: bcache_device_free() bcache0 stopped

root@bionic-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
fake-loop0   253:0    0    1G  0 dm  

bcache0 is removed; bcache1 is still available.
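The same conclusion can be read from the kernel log. A sketch that lists the devices named in the "bcache_device_free() bcacheN stopped" message; stopped_from_log is a hypothetical helper, and it assumes each message is on a single line, as dmesg prints it.

```shell
# Print the bcache devices that the kernel log on stdin reports as stopped,
# based on the "bcache_device_free() bcacheN stopped" message.
stopped_from_log() {
    sed -n 's/.*bcache_device_free() \(bcache[0-9][0-9]*\) stopped.*/\1/p' | sort -u
}
```

Usage: `dmesg | stopped_from_log`. For the modified kernel run above this should list only bcache0; for the original kernel run, bcache0 and bcache1.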

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     especially in writeback mode, can lead to data loss and/or
     to the application not being notified about failed I/O requests.

   * The bcache device might remain available for I/O requests
     even if the backing device is offline, so the outcome of writes
     is undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of many patches
     with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache
     mode (writethrough, writeback, writearound, none) with the
     xfstests test suite (on ext4) and fio (sequential + random
     read-write).
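A per-mode loop of this kind could be scripted as below. This is a sketch, not the script used for the testing: run_workload is a hypothetical placeholder for the actual xfstests/fio runs, and SYSFS_ROOT is a hypothetical override (default /sys) so the loop can be tried against a scratch directory.

```shell
# Placeholder workload (hypothetical): replace with the real xfstests/fio run.
run_workload() {
    echo "run workload on $1 (cache_mode=$2)"
}

# Cycle one bcache device through every cache mode and run the workload
# in each; stop at the first failure.
for_each_cache_mode() {
    local dev=$1 root=${SYSFS_ROOT:-/sys} mode
    for mode in writethrough writeback writearound none; do
        echo "$mode" > "$root/block/$dev/bcache/cache_mode" || return 1
        run_workload "$dev" "$mode" || return 1
    done
}
```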

  [Regression Potential]

   * The patchset is relatively large and touches several areas
     of the bcache code; however, synthetic testing of the patches
     has been performed, and extensive regression/stress tests
     were run (as mentioned in the Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other
     patches, and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net