Unfortunately, an unacceptable side-effect of "sysctl -w vm.dirty_ratio=0" is that disk operations that move a lot of data take far too long. An rsync that normally takes less than an hour of real time was still running more than 12 hours later (and hasn't finished yet). I'm reverting vm.dirty_ratio to 20 to see if that clears out the unfinished disk I/O.
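For anyone experimenting with these knobs, the current writeback tunables can be inspected read-only before changing anything. This is a generic sketch (not part of the original report), assuming a Linux /proc; note that dirty_bytes/dirty_background_bytes are byte-granular alternatives to the percentage knobs and may allow a gentler limit than ratio=0:

```shell
# Read the current VM writeback tunables (read-only; no root needed).
# Setting dirty_bytes to a small value, rather than dirty_ratio=0,
# is one commonly suggested way to cap writeback backlog.
for f in dirty_ratio dirty_background_ratio dirty_bytes dirty_background_bytes; do
    printf '%-28s %s\n' "vm.$f" "$(cat /proc/sys/vm/$f)"
done
```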
--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Description: Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard-drive dock locks up an unrelated hdparm process running on an internal disk, such that the kernel reports:

  May 7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
  [...]
  May 7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
  [...]
  May 7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.

  First, a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything:

  # \time hdparm -t /dev/sda

  /dev/sda:
   Timing buffered disk reads: 1128 MB in 3.00 seconds = 375.50 MB/sec
  0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
  2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive.

  I now run this loop to show the timings and process states below:

  # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.)
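An equivalent way to watch for uninterruptible-sleep tasks, shown here as a generic sketch (not part of the original report), is to ask ps for the state column directly instead of grepping for the bracketed "D":

```shell
# List tasks whose state starts with 'D' (uninterruptible sleep).
# The wchan column shows the kernel wait channel, matching the
# wait_o/wb_wai/blk_mq entries seen in the report's ps output.
ps -eo pid,stat,wchan:20,comm --no-headers | awk '$2 ~ /^D/'
```

On an idle system this usually prints nothing; during the lockup it should show the same rm, hdparm, and kworker entries as the grep loop above.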
  I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock, and the rm appears in uninterruptible sleep along with a couple of kernel processes:

  Sun May 7 05:01:07 EDT 2023
  Sun May 7 05:01:08 EDT 2023
  Sun May 7 05:01:09 EDT 2023
  Sun May 7 05:01:10 EDT 2023
  Sun May 7 05:01:11 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
  4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
  Sun May 7 05:01:12 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
  1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
  4 0 1423283 11939 20 0 10648 580 wait_o D+ pts/28 0:00 rm -rf 15tb3
  Sun May 7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat each second as the "rm -r" continues. I then start "hdparm -t /dev/sda" on the internal SSD disk; it also enters uninterruptible sleep and doesn't finish even after minutes of waiting:

  Sun May 7 05:01:25 EDT 2023
  1 0 368 2 20 0 0 0 md_sup D ? 2:57 [md0_raid5]
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:01:26 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:01:27 EDT 2023
  [...]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:01:35 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
  4 0 1423283 11939 20 0 11260 2544 blk_mq D+ pts/28 0:00 rm -rf 15tb3
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:01:36 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:17 [usb-storage]
  1 0 4985 2 20 0 0 0 rq_qos D ? 0:24 [jbd2/sdj1-8]
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:00 rm -rf 15tb3
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:01:37 EDT 2023

  I keep waiting. The above lines repeat over and over, and the hdparm stays blocked and doesn't finish.

  Sun May 7 05:03:32 EDT 2023
  1 0 447 2 20 0 0 0 usb_sg D ? 3:18 [usb-storage]
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423283 11939 20 0 11260 2544 wait_o D+ pts/28 0:03 rm -rf 15tb3
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:34 EDT 2023

  At this point, I SIGSTOP the offending "rm -r" and it disappears from uninterruptible sleep, but the other processes remain there for several seconds until the lockup clears and the hdparm finally finishes:

  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:35 EDT 2023
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:36 EDT 2023
  1 0 4016 1 20 0 161136 1900 usbhid Dsl ? 1:39 /sbin/apcupsd
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:37 EDT 2023
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:38 EDT 2023
  1 0 1366783 2 20 0 0 0 blk_mq D ? 0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:39 EDT 2023
  1 0 1366783 2 20 0 0 0 blk_mq D ?
  0:02 [kworker/u16:2+flush-8:144]
  4 0 1423501 9975 20 0 4680 4584 wb_wai DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:40 EDT 2023
  Sun May 7 05:03:41 EDT 2023
  Sun May 7 05:03:42 EDT 2023
  Sun May 7 05:03:43 EDT 2023
  4 0 1423501 9975 20 0 4680 4584 wait_o DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:44 EDT 2023
  4 0 1423501 9975 20 0 4680 4584 wait_o DL+ pts/4 0:00 hdparm -t /dev/sda
  Sun May 7 05:03:45 EDT 2023
  Sun May 7 05:03:46 EDT 2023

  And the hdparm finally finishes, several minutes after I started it.

  If I keep the offending "rm -r" running on that USB3 disk, it locks up an hdparm (running on a different disk) long enough to trigger the kernel INFO messages above. If I try hdparm on a different disk, e.g. one of the WD Red drives in my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":

  1 0 447 2 20 0 0 0 usb_sg D ? 3:51 [usb-storage]
  1 0 4985 2 20 0 0 0 blk_mq D ? 0:31 [jbd2/sdj1-8]
  1 0 1466181 2 20 0 0 0 rq_qos D ? 0:08 [kworker/u16:4+flush-8:144]
  4 0 1469821 11939 20 0 27524 18888 wait_o D+ pts/28 0:44 rm -rf 15tb3
  4 0 1501584 11933 20 0 4680 4584 rq_qos DL+ pts/26 0:00 hdparm -t /dev/sdb

  While hung, ^C (interrupt) does not affect the hdparm until I SIGSTOP the offending "rm -r", after which everything unlocks within a few seconds.

  Is this the way things are supposed to work? I did not expect this behaviour.

  Here's the syslog trace:

  [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.
  [163968.537853] Tainted: G O 5.15.0-72-generic #79~20.04.1-Ubuntu
  [163968.537858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [163968.537861] task:hdparm state:D stack: 0 pid:1391162 ppid:1391130 flags:0x00004004
  [163968.537869] Call Trace:
  [163968.537872] <TASK>
  [163968.537876] __schedule+0x2cd/0x890
  [163968.537889] schedule+0x69/0x110
  [163968.537896] wb_wait_for_completion+0x5a/0x90
  [163968.537907] ? wait_woken+0x60/0x60
  [163968.537928] sync_inodes_sb+0xd8/0x2b0
  [163968.537932] sync_inodes_one_sb+0x19/0x20
  [163968.537933] iterate_supers+0xa3/0x100
  [163968.537937] ? __x64_sys_tee+0xe0/0xe0
  [163968.537939] ksys_sync+0x42/0xa0
  [163968.537941] __do_sys_sync+0xe/0x20
  [163968.537942] do_syscall_64+0x5c/0xc0
  [163968.537946] ? do_syscall_64+0x69/0xc0
  [163968.537948] ? do_syscall_64+0x69/0xc0
  [163968.537950] ? do_syscall_64+0x69/0xc0
  [163968.537953] entry_SYSCALL_64_after_hwframe+0x61/0xcb
  [163968.537956] RIP: 0033:0x7f1d374a822b
  [163968.537958] RSP: 002b:00007ffc973592f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
  [163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 00007f1d374a822b
  [163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 0000000000000003
  [163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 000000000000002a
  [163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 0000000000000000
  [163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 00007f1d37193000
  [163968.537967] </TASK>

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-generic-hwe-20.04 5.15.0.72.79~20.04.33
  ProcVersionSignature: Ubuntu 5.15.0-72.79~20.04.1-generic 5.15.98
  Uname: Linux 5.15.0-72-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.26
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Sun May 7 08:38:10 2023
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2020-10-07 (941 days ago)
  InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  SourcePackage: linux-meta-hwe-5.15
  UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.15/+bug/2018687/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
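A side note on the call trace in the report (my sketch, not part of the original): the chain ksys_sync -> iterate_supers -> sync_inodes_sb -> wb_wait_for_completion shows hdparm blocked inside sync(2), which waits for writeback on every mounted filesystem, including the backlogged USB disk. syncfs(2) flushes only a single filesystem; coreutils exposes it as "sync -f FILE", which can help confirm that only the USB filesystem's writeback is slow:

```shell
# sync(2) flushes all filesystems; syncfs(2) flushes just the one
# holding FILE. coreutils 'sync -f' (8.24+) wraps syncfs(2).
tmp=$(mktemp)     # scratch file on the local (fast) filesystem
sync -f "$tmp"    # should return quickly even while the USB disk is busy
rm -f "$tmp"
```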