** Description changed:

  SRU Justification:

  [Impact]

  This bug in bcache affects (at least) focal and jammy releases.

  When Random Read I/O is started with a test like -

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0

  or random read-writes with a test like,

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0

  traces are seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

  The bug exists till kernel 5.15.50-051550-generic

  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd             8:48   0 279.4G  0 disk
  └─sdd1          8:49   0    60G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1       259:0    0 372.6G  0 disk
  └─nvme0n1p1   259:2    0    15G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)

  The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

-
  [Fix]
  These 3 fixes are needed for the SRU.
  dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee

  I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.

  [Regression Potential]
- I have not seen any potential drawbacks or harmful effects of this fix
- in my testing. In fact it is required, without which the deadlock is
- easily reproduced both on focal as well as jammy GA.
+ Regression potential should be minimal. I have not seen any potential
+ drawbacks or harmful effects of this fix in my testing.

** Description changed:

  SRU Justification:

  [Impact]

- This bug in bcache affects (at least) focal and jammy releases.

  When Random Read I/O is started with a test like -

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0

  or random read-writes with a test like,

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0

  traces are seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

  The bug exists till kernel 5.15.50-051550-generic

  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd             8:48   0 279.4G  0 disk
  └─sdd1          8:49   0    60G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1       259:0    0 372.6G  0 disk
  └─nvme0n1p1   259:2    0    15G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)

  The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

  [Fix]
  These 3 fixes are needed for the SRU.
  dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee

  I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.

  [Regression Potential]
  Regression potential should be minimal. I have not seen any potential
  drawbacks or harmful effects of this fix in my testing.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1980925

Title:
  [SRU] bcache deadlock during read IO in writeback mode

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Jammy:
  Confirmed

Bug description:
  SRU Justification:

  [Impact]

  When Random Read I/O is started with a test like -

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0

  or random read-writes with a test like,

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0

  traces are seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

  The bug exists till kernel 5.15.50-051550-generic

  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd             8:48   0 279.4G  0 disk
  └─sdd1          8:49   0    60G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1       259:0    0 372.6G  0 disk
  └─nvme0n1p1   259:2    0    15G  0 part
    └─bcache0   252:0    0    60G  0 disk /home/ubuntu/bcache_mount

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)

  The test does not progress beyond a few minutes, and this trace is then seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411] <TASK>
  [ 4474.731413] __schedule+0x23d/0x5a0
  [ 4474.731433] schedule+0x4e/0xb0
  [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441] down_write+0x43/0x50
  [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487] kthread+0x12a/0x150
  [ 4474.731491] ? set_kthread_struct+0x50/0x50
  [ 4474.731494] ret_from_fork+0x22/0x30
  [ 4474.731499] </TASK>

  [Fix]
  These 3 fixes are needed for the SRU.
  dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee

  I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed.

  [Regression Potential]
  Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925/+subscriptions
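
For anyone re-running the fio reproducer, a small helper can make it easier to spot the hung-task message quoted in the report in a saved kernel log. The following is only a sketch: the function name `find_hung_tasks` and its `comm_prefix` parameter are hypothetical, and the pattern assumes the "INFO: task NAME:PID blocked for more than N seconds." wording shown in the trace above.

```python
import re

# Pattern for the hung-task header line as it appears in the report, e.g.
# "INFO: task bcache_writebac:1835 blocked for more than 120 seconds."
# The task name is matched lazily so names that contain ':' still parse.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text, comm_prefix="bcache"):
    """Return (task name, pid, seconds) for hung tasks whose name starts
    with comm_prefix. Hypothetical helper, not part of any kernel tooling."""
    hits = []
    for match in HUNG_TASK_RE.finditer(log_text):
        comm = match.group("comm")
        if comm.startswith(comm_prefix):
            hits.append((comm, int(match.group("pid")), int(match.group("secs"))))
    return hits


if __name__ == "__main__":
    # Two lines copied from the trace in the bug description.
    sample = (
        "[ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.\n"
        "[ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445\n"
    )
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

The text passed in could be the saved output of `dmesg`; an empty result after the fio run completes would be consistent with the deadlock not having occurred, although it does not prove the fix on its own.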