** Description changed:

  SRU Justification:
  
  [Impact]
  This bug in bcache affects (at least) focal and jammy releases.
  
  When Random Read I/O is started with a test like -
  
  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
  --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
  --rw=randread --randrepeat=0
  
  or
  
  random read-writes with a test like,
  
  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
  --name=iops-test-job --randrepeat=0
  
  traces are seen in the kernel log,
  
  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
  
  The bug exists till kernel 5.15.50-051550-generic
  
  The reproducer is pasted below:
  
  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  
  NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd           8:48   0 279.4G  0 disk
  └─sdd1        8:49   0    60G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1     259:0    0 372.6G  0 disk
  └─nvme0n1p1 259:2    0    15G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  
  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)
  
  The test does not progress beyond a few minutes, and this trace is then
  seen in the kernel log,
  
  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
  
- 
  [Fix]
  These 3 fixes are needed for the SRU.
  
  dea3560e5f31965165bcf34ecf0b47af28bfd155,
  6445ec3df23f24677064a327dce437ef3e02dc6,
  dc60301fb408e06e0b718c0980cdd31d2b238bee
  
  I have built these fixes into kernel 5.15.0-39-generic (jammy) and
  tested to verify the problem is fixed.
  
  [Regression Potential]
  
- I have not seen any potential drawbacks or harmful effects of this fix
- in my testing. In fact it is required, without which the deadlock is
- easily reproduced both on focal as well as jammy GA.
+ Regression potential should be minimal. I have not seen any potential
+ drawbacks or harmful effects of this fix in my testing.

** Description changed:

  SRU Justification:
  
  [Impact]
- This bug in bcache affects (at least) focal and jammy releases.
  
  When Random Read I/O is started with a test like -
  
  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
  --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
  --rw=randread --randrepeat=0
  
  or
  
  random read-writes with a test like,
  
  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
  --name=iops-test-job --randrepeat=0
  
  traces are seen in the kernel log,
  
  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
  
  The bug exists till kernel 5.15.50-051550-generic
  
  The reproducer is pasted below:
  
  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  
  NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd           8:48   0 279.4G  0 disk
  └─sdd1        8:49   0    60G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1     259:0    0 372.6G  0 disk
  └─nvme0n1p1 259:2    0    15G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  
  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)
  
  The test does not progress beyond a few minutes, and this trace is then
  seen in the kernel log,
  
  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
  
  [Fix]
  These 3 fixes are needed for the SRU.
  
  dea3560e5f31965165bcf34ecf0b47af28bfd155,
  6445ec3df23f24677064a327dce437ef3e02dc6,
  dc60301fb408e06e0b718c0980cdd31d2b238bee
  
  I have built these fixes into kernel 5.15.0-39-generic (jammy) and
  tested to verify the problem is fixed.
  
  [Regression Potential]
  
  Regression potential should be minimal. I have not seen any potential
  drawbacks or harmful effects of this fix in my testing.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1980925

Title:
  [SRU] bcache deadlock during read IO in writeback mode

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Jammy:
  Confirmed

Bug description:
  SRU Justification:

  [Impact]

  When random read I/O is started with a test such as:

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G
  --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128
  --rw=randread --randrepeat=0

  or random read/write I/O with a test such as:

  fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB
  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128
  --name=iops-test-job --randrepeat=0

  hung-task traces like the following appear in the kernel log:

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>

  The bug is present in kernels up to and including 5.15.50-051550-generic.

  The reproducer is pasted below:

  # uname -a
  Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

  NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
  sdd           8:48   0 279.4G  0 disk
  └─sdd1        8:49   0    60G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
  nvme0n1     259:0    0 372.6G  0 disk
  └─nvme0n1p1 259:2    0    15G  0 part
    └─bcache0 252:0    0    60G  0 disk /home/ubuntu/bcache_mount
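
  The title of this bug refers to writeback mode. As a minimal sketch for
  confirming that on a test machine: bcache exposes the cache mode through
  sysfs as a space-separated list with the active mode in square brackets.
  The helper name active_cache_mode is hypothetical, and the sysfs path in
  the usage comment assumes the bcache0 device from the listing above.

```shell
#!/bin/sh
# Print the active mode from a bcache cache_mode string such as
# "writethrough [writeback] writearound none" (the active mode is bracketed).
# Prints nothing if no bracketed entry is found.
active_cache_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Assumed usage on a live system (path depends on the device name):
#   active_cache_mode "$(cat /sys/block/bcache0/bcache/cache_mode)"
```

  If the helper prints anything other than writeback, the setup differs
  from the one this report describes.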

  fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1
  read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
  fio-3.28
  Starting 1 process
  read_iops: Laying out IO file (1 file / 12288MiB)

  The test stalls within a few minutes, and this trace then appears in
  the kernel log:

  [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds.
  [ 4474.050921]       Not tainted 5.15.50-051550-generic #202206251445
  [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 4474.731391] task:bcache_writebac state:D stack:    0 pid: 1835 ppid:     2 flags:0x00004000
  [ 4474.731408] Call Trace:
  [ 4474.731411]  <TASK>
  [ 4474.731413]  __schedule+0x23d/0x5a0
  [ 4474.731433]  schedule+0x4e/0xb0
  [ 4474.731436]  rwsem_down_write_slowpath+0x220/0x3d0
  [ 4474.731441]  down_write+0x43/0x50
  [ 4474.731446]  bch_writeback_thread+0x78/0x320 [bcache]
  [ 4474.731471]  ? read_dirty_submit+0x70/0x70 [bcache]
  [ 4474.731487]  kthread+0x12a/0x150
  [ 4474.731491]  ? set_kthread_struct+0x50/0x50
  [ 4474.731494]  ret_from_fork+0x22/0x30
  [ 4474.731499]  </TASK>
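
  To check whether a test run hit this hang, the kernel log can be
  scanned for the hung-task message shown above. The following is only a
  sketch: count_hung_tasks is a hypothetical helper name, and it matches
  on the "INFO: task NAME:PID blocked for more than" text from the trace.
  The kernel truncates task names to 15 characters, which is why the
  writeback thread shows up as bcache_writebac.

```shell
#!/bin/sh
# Count hung-task reports for the given task name in kernel log text
# read from stdin. Prints 0 when there are none.
count_hung_tasks() {
    grep -c "INFO: task $1:[0-9]* blocked for more than" || true
}

# Assumed usage on a live system:
#   dmesg | count_hung_tasks bcache_writebac
```

  A non-zero count after the fio run suggests the writeback thread was
  blocked as in the trace above.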

  [Fix]
  The following three commits are needed for the SRU:

  dea3560e5f31965165bcf34ecf0b47af28bfd155,
  6445ec3df23f24677064a327dce437ef3e02dc6,
  dc60301fb408e06e0b718c0980cdd31d2b238bee

  I have built these fixes into kernel 5.15.0-39-generic (jammy) and
  tested to verify the problem is fixed.
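
  Before cherry-picking, it can help to check that each commit ID is a
  full 40-character SHA-1, because a shorter ID is resolved by prefix and
  may be a copy-and-paste truncation. The sketch below does only that
  check; is_full_sha1 is a hypothetical helper name, and the IDs are
  copied exactly as they are listed above.

```shell
#!/bin/sh
# Succeed only if the argument is a full 40-character lowercase hex SHA-1.
is_full_sha1() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Report the status of each commit ID exactly as listed in this report.
for id in \
    dea3560e5f31965165bcf34ecf0b47af28bfd155 \
    6445ec3df23f24677064a327dce437ef3e02dc6 \
    dc60301fb408e06e0b718c0980cdd31d2b238bee
do
    if is_full_sha1 "$id"; then
        echo "$id: full SHA-1"
    else
        echo "$id: abbreviated (${#id} characters)"
    fi
done
```

  Any ID reported as abbreviated should be confirmed against the kernel
  tree before it is used in an SRU submission.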

  [Regression Potential]

  Regression potential should be minimal. I have not seen any potential
  drawbacks or harmful effects of this fix in my testing.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1980925/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
