Two of the three patches were already included in upstream v5.15.46.
Updated the shared commits to refer to both reports and committed the
third patch for the next cycle (the stable updates are also for the next
cycle).

** Changed in: linux (Ubuntu Jammy)
       Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1980925

Title:
  [SRU] bcache deadlock during read IO in writeback mode

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Invalid
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]

  When random read I/O is started with a test like:

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
  --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
  --rw=randread --randrepeat=0

  or random read/write I/O is started with a test like:

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
  --name=iops-test-job --randrepeat=0

  traces like the following are seen in the kernel log:

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>

  The bug is present in kernels up to and including 5.15.50-051550-generic.

  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd           8:48   0 279.4G  0 disk
  └─sdd1        8:49   0    60G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1     259:0    0 372.6G  0 disk
  └─nvme0n1p1 259:2    0    15G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)

  The test does not progress beyond a few minutes, and this trace is
  then seen in the kernel log,

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
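
  When checking whether a test run has hit the hang, the hung-task
  reports can be pulled out of the kernel log with a small filter. This
  is only a sketch: hung_tasks is a hypothetical helper name, not part
  of any package, and it assumes the log lines have the
  "INFO: task NAME:PID blocked for more than N seconds." form shown in
  the trace above.

```shell
# Hypothetical helper (not from the bug report): list tasks flagged by the
# kernel hung-task detector. Reads kernel log text on stdin and prints one
# "name:pid" entry per line, with duplicates removed.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9][0-9]* seconds.*/\1/p' | sort -u
}

# Usage (assumes the current user is allowed to read the kernel log):
#   dmesg | hung_tasks
```

  On the log above this would be expected to report the
  bcache_writebac task; an empty result only means no hung-task report
  has been logged so far, not that the fix is confirmed.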

  [Fix]
  These 3 fixes are needed for the SRU.

  dea3560e5f31965165bcf34ecf0b47af28bfd155,
  6445ec3df23f24677064a327dce437ef3e02dc6,
  dc60301fb408e06e0b718c0980cdd31d2b238bee

  I have built these fixes into kernel 5.15.0-39-generic (jammy) and
  tested to verify the problem is fixed.

  [Regression Potential]

  Regression potential should be minimal. I have not seen any potential
  drawbacks or harmful effects of this fix in my testing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
