Unfortunately, an unacceptable side-effect of "sysctl -w
vm.dirty_ratio=0" is that disk operations that move a lot of data take
much too long. An rsync that normally finishes in under an hour of real
time was still running over 12 hours later (and hasn't finished yet).
I'm reverting vm.dirty_ratio back up to 20 to see if that clears out
all the unfinished disk I/O.
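For reference, the knob being toggled here lives under /proc/sys/vm; a minimal sketch of checking the current writeback thresholds and restoring the default (the write requires root):

```shell
# Show the current foreground/background writeback thresholds
# (percent of dirtyable memory)
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio

# Revert the experiment: restore dirty_ratio to the Ubuntu default of 20
sysctl -w vm.dirty_ratio=20
```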

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running "rm -r dir" on a directory with millions of files that resides
  on a disk in an external USB-3 hard-drive dock locks up an unrelated
  hdparm process running on an internal disk, such that the kernel says:

      May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
      [...]
      May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
      [...]
      May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.

  First, a normal run of "hdparm -t /dev/sda" with the offending "rm -r"
  SIGSTOPped so that it doesn't affect anything:

      # \time hdparm -t /dev/sda
      /dev/sda:
       Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
      0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
      2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected.  /dev/sda is an
  internal SSD drive.

  I now run this loop to show the timings and process states below:

      # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock and you see the rm appear
  in uninterruptible sleep along with a couple of kernel processes:

      Sun May  7 05:01:07 EDT 2023
      Sun May  7 05:01:08 EDT 2023
      Sun May  7 05:01:09 EDT 2023
      Sun May  7 05:01:10 EDT 2023
      Sun May  7 05:01:11 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
      Sun May  7 05:01:12 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
      Sun May  7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:

      Sun May  7 05:01:25 EDT 2023
      1     0     368       2  20   0      0     0 md_sup D    ?          2:57 [md0_raid5]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:26 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:27 EDT 2023
  [...]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:35 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      4     0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:36 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0    4985       2  20   0      0     0 rq_qos D    ?          0:24 [jbd2/sdj1-8]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:37 EDT 2023

  I keep waiting.  The above lines repeat over and over and the hdparm is
  blocked and doesn't finish.

      Sun May  7 05:03:32 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:18 [usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:03 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:34 EDT 2023

  At this point, I SIGSTOP the offending "rm -r" and it disappears from
  uninterruptible sleep, but the other processes are still in it for
  several seconds until the lockup clears and the hdparm finally finishes:

      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:35 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:36 EDT 2023
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:37 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:38 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:39 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:40 EDT 2023
      Sun May  7 05:03:41 EDT 2023
      Sun May  7 05:03:42 EDT 2023
      Sun May  7 05:03:43 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:44 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:45 EDT 2023
      Sun May  7 05:03:46 EDT 2023

  And the hdparm is finally done, several minutes after I started it.  If I
  keep the offending "rm -r" running on the USB3 disk, it blocks the hdparm
  (running on a different disk) long enough to trigger the kernel INFO
  messages shown above.
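  The pause/resume dance used throughout this report is ordinary job
  control; a self-contained sketch using a stand-in process in place of
  the real "rm -rf 15tb3" (substitute its actual PID):

```shell
# Stand-in for the long-running "rm -rf 15tb3"
sleep 300 &
RM_PID=$!

# SIGSTOP parks the process: it stops issuing I/O but is not killed
kill -STOP "$RM_PID"
ps -o pid,state,comm -p "$RM_PID"    # state column shows T (stopped)

# SIGCONT lets it continue exactly where it left off
kill -CONT "$RM_PID"
ps -o pid,state,comm -p "$RM_PID"    # state back to S (sleeping)

kill "$RM_PID"                       # clean up the stand-in
```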

  If I try hdparm on a different disk, e.g. one of the WD Red drives in
  my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":

      1     0     447       2  20   0      0     0 usb_sg D    ?          3:51 [usb-storage]
      1     0    4985       2  20   0      0     0 blk_mq D    ?          0:31 [jbd2/sdj1-8]
      1     0 1466181       2  20   0      0     0 rq_qos D    ?          0:08 [kworker/u16:4+flush-8:144]
      4     0 1469821   11939  20   0  27524 18888 wait_o D+   pts/28     0:44 rm -rf 15tb3
      4     0 1501584   11933  20   0   4680  4584 rq_qos DL+  pts/26     0:00 hdparm -t /dev/sdb

  While hung, ^C (interrupt) has no effect on the hdparm until I SIGSTOP
  the offending "rm -r", after which everything unlocks within a few
  seconds.  Is this how things are supposed to work?  I did not expect
  this behaviour.  Here's the syslog trace:

      [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.
      [163968.537853]       Tainted: G           O      5.15.0-72-generic #79~20.04.1-Ubuntu
      [163968.537858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [163968.537861] task:hdparm          state:D stack:    0 pid:1391162 ppid:1391130 flags:0x00004004
      [163968.537869] Call Trace:
      [163968.537872]  <TASK>
      [163968.537876]  __schedule+0x2cd/0x890
      [163968.537889]  schedule+0x69/0x110
      [163968.537896]  wb_wait_for_completion+0x5a/0x90
      [163968.537907]  ? wait_woken+0x60/0x60
      [163968.537928]  sync_inodes_sb+0xd8/0x2b0
      [163968.537932]  sync_inodes_one_sb+0x19/0x20
      [163968.537933]  iterate_supers+0xa3/0x100
      [163968.537937]  ? __x64_sys_tee+0xe0/0xe0
      [163968.537939]  ksys_sync+0x42/0xa0
      [163968.537941]  __do_sys_sync+0xe/0x20
      [163968.537942]  do_syscall_64+0x5c/0xc0
      [163968.537946]  ? do_syscall_64+0x69/0xc0
      [163968.537948]  ? do_syscall_64+0x69/0xc0
      [163968.537950]  ? do_syscall_64+0x69/0xc0
      [163968.537953]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
      [163968.537956] RIP: 0033:0x7f1d374a822b
      [163968.537958] RSP: 002b:00007ffc973592f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
      [163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 00007f1d374a822b
      [163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 0000000000000003
      [163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 000000000000002a
      [163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 0000000000000000
      [163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 00007f1d37193000
      [163968.537967]  </TASK>
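  The ksys_sync frame in the trace explains the cross-disk coupling:
  hdparm -t issues a global sync(2) before timing reads, and sync(2)
  waits for writeback on every mounted filesystem, so dirty pages queued
  for the slow USB disk stall a benchmark of /dev/sda.  A sketch of the
  distinction, assuming coreutils' sync (its --file-system option wraps
  syncfs(2)):

```shell
# Global sync(): returns only after writeback completes on ALL
# filesystems -- this is the call the hdparm above is stuck in
sync

# syncfs(): flushes only the filesystem containing the given path,
# so it is not held hostage by a congested disk mounted elsewhere
sync --file-system /
```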

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-generic-hwe-20.04 5.15.0.72.79~20.04.33
  ProcVersionSignature: Ubuntu 5.15.0-72.79~20.04.1-generic 5.15.98
  Uname: Linux 5.15.0-72-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.26
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Sun May  7 08:38:10 2023
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2020-10-07 (941 days ago)
  InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  SourcePackage: linux-meta-hwe-5.15
  UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.15/+bug/2018687/+subscriptions

