I now have a Perl script running that removes duplicate files by making thousands of hard links on a different external USB3 disk, and it is locking up or timing out many disk-related operations on all my other disks. Both this USB3 external drive and the one above are plugged directly into the motherboard (HP Z440 Workstation).
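(The Perl script itself is not shown in this report. For illustration only, a minimal Python sketch of the same idea — hash file contents and replace duplicates with hard links to one copy — might look like the following; all names here are hypothetical, not taken from the actual script.)

```python
#!/usr/bin/env python3
# Illustrative stand-in for the (unshown) Perl dedup script: files with
# identical content are replaced by hard links to the first copy seen,
# so duplicates share one inode and one set of data blocks.
import hashlib
import os


def dedup_by_hardlink(root: str) -> int:
    """Replace duplicate regular files under `root` with hard links.

    Returns the number of duplicates that were converted to links.
    """
    first_seen = {}  # content hash -> path of first file with that content
    linked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in first_seen:
                os.remove(path)                     # drop the duplicate...
                os.link(first_seen[digest], path)   # ...and hard-link the original
                linked += 1
            else:
                first_seen[digest] = path
    return linked
```

A run like this generates a long stream of unlink/link metadata operations, which is the kind of write load the report describes.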
F   UID     PID    PPID PRI  NI   VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0 2210112 2210111  20   0 53680 42588 blk_mq D+   pts/24     3:00 /usr/bin/perl [...]

May 14 13:37:49 kernel: [259424.745462] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:08:09 kernel: [297244.761855] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 00:10:10 kernel: [297365.592485] INFO: task smartd:2719 blocked for more than 241 seconds.
May 15 01:08:34 kernel: [300869.682961] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 01:28:43 kernel: [302077.989582] INFO: task hdparm:2052842 blocked for more than 120 seconds.
May 15 01:30:43 kernel: [302198.820278] INFO: task hdparm:2052842 blocked for more than 241 seconds.
May 15 01:32:44 kernel: [302319.654907] INFO: task hdparm:2052842 blocked for more than 362 seconds.
May 15 01:34:45 kernel: [302440.481601] INFO: task hdparm:2052842 blocked for more than 483 seconds.
May 15 01:36:46 kernel: [302561.316237] INFO: task hdparm:2052842 blocked for more than 604 seconds.
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds.

From one of the logged events:

May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more than 120 seconds.
May 15 02:06:58 kernel: [304373.770209] Tainted: G           O      5.15.0-72-generic #79~20.04.1-Ubuntu
May 15 02:06:58 kernel: [304373.770215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 02:06:58 kernel: [304373.770218] task:smartd          state:D stack:    0 pid: 2719 ppid:     1 flags:0x00000000
May 15 02:06:58 kernel: [304373.770226] Call Trace:
May 15 02:06:58 kernel: [304373.770230]  <TASK>
May 15 02:06:58 kernel: [304373.770236]  __schedule+0x2cd/0x890
May 15 02:06:58 kernel: [304373.770251]  schedule+0x69/0x110
May 15 02:06:58 kernel: [304373.770260]  schedule_preempt_disabled+0xe/0x20
May 15 02:06:58 kernel: [304373.770269]  __mutex_lock.isra.0+0x20c/0x470
May 15 02:06:58 kernel: [304373.770276]  ? iput.part.0+0x61/0x1e0
May 15 02:06:58 kernel: [304373.770287]  __mutex_lock_slowpath+0x13/0x20
May 15 02:06:58 kernel: [304373.770294]  mutex_lock+0x36/0x40
May 15 02:06:58 kernel: [304373.770299]  blkdev_get_by_dev+0x11d/0x2d0
May 15 02:06:58 kernel: [304373.770309]  ? blkdev_close+0x30/0x30
May 15 02:06:58 kernel: [304373.770318]  blkdev_open+0x50/0x90
May 15 02:06:58 kernel: [304373.770325]  do_dentry_open+0x169/0x3e0
May 15 02:06:58 kernel: [304373.770336]  vfs_open+0x2d/0x40
May 15 02:06:58 kernel: [304373.770342]  do_open.isra.0+0x20d/0x480
May 15 02:06:58 kernel: [304373.770351]  path_openat+0x18e/0xe50
May 15 02:06:58 kernel: [304373.770361]  ? put_device+0x13/0x20
May 15 02:06:58 kernel: [304373.770371]  ? scsi_device_put+0x31/0x40
May 15 02:06:58 kernel: [304373.770380]  ? sd_release+0x3b/0xb0
May 15 02:06:58 kernel: [304373.770388]  do_filp_open+0xb2/0x120
May 15 02:06:58 kernel: [304373.770398]  ? __check_object_size+0x14f/0x160
May 15 02:06:58 kernel: [304373.770408]  do_sys_openat2+0x249/0x330
May 15 02:06:58 kernel: [304373.770418]  do_sys_open+0x46/0x80
May 15 02:06:58 kernel: [304373.770424]  __x64_sys_openat+0x20/0x30
May 15 02:06:58 kernel: [304373.770430]  do_syscall_64+0x5c/0xc0
May 15 02:06:58 kernel: [304373.770440]  ? do_syscall_64+0x69/0xc0
May 15 02:06:58 kernel: [304373.770448]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 15 02:06:58 kernel: [304373.770458] RIP: 0033:0x7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770465] RSP: 002b:00007ffd72a3caf0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
May 15 02:06:58 kernel: [304373.770473] RAX: ffffffffffffffda RBX: 000055f1346783c0 RCX: 00007f9b0d188d3b
May 15 02:06:58 kernel: [304373.770479] RDX: 0000000000000800 RSI: 000055f1346783f8 RDI: 00000000ffffff9c
May 15 02:06:58 kernel: [304373.770484] RBP: 000055f1346783f8 R08: 0000000000000001 R09: 0000000000000000
May 15 02:06:58 kernel: [304373.770488] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000800
May 15 02:06:58 kernel: [304373.770493] R13: 0000000000000000 R14: 000055f1334c26a4 R15: 00007f9b0cd17250
May 15 02:06:58 kernel: [304373.770500]  </TASK>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:

Description: Ubuntu 20.04.6 LTS
Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

Running a "rm -r dir" on a directory with millions of files that resides on a disk in an external USB-3 hard drive dock locks up an unrelated hdparm process running on an internal disk, such that the kernel says:

May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
[...]
May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
[...]
May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.
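(Editor's side note, not part of the original report: the hdparm stack trace near the end of this report shows it parked in ksys_sync -> sync_inodes_sb -> wb_wait_for_completion, i.e. inside the system-wide sync(2) that hdparm issues before timing reads. sync(2) waits for writeback on every mounted filesystem, so dirty data queued for the busy USB disk can stall a caller that only touches an internal disk. For contrast, Linux's syncfs(2) flushes just the filesystem backing one open file descriptor. A hedged sketch, assuming Linux with glibc; Python's stdlib exposes only the global os.sync(), so syncfs is called through ctypes:)

```python
#!/usr/bin/env python3
# Sketch contrasting global sync(2) with per-filesystem syncfs(2).
# Assumes Linux with glibc ("libc.so.6"); illustrative only.
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)


def sync_one_filesystem(path: str) -> None:
    """Flush only the filesystem containing `path` via syncfs(2)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            errno = ctypes.get_errno()
            raise OSError(errno, os.strerror(errno))
    finally:
        os.close(fd)


def sync_everything() -> None:
    """Equivalent of sync(2): waits on writeback for every mounted fs."""
    os.sync()
```

The second call is the one the blocked hdparm is stuck in below; the first would only have to wait for its own filesystem's writeback.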
First, a normal run of "hdparm -t /dev/sda" with the offending "rm -r" SIGSTOPped so that it doesn't affect anything:

# \time hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

Elapsed time is about six seconds, as expected. /dev/sda is an internal SSD drive.

I now run this loop to show the timings and process states below:

# while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

(I have some processes stopped in a freezer cgroup ("refrig") that I don't want to see in the grep output.)

I SIGCONT the offending "rm -r" running on the drive in the USB3 drive dock, and the rm appears in uninterruptible sleep along with a couple of kernel processes:

Sun May  7 05:01:07 EDT 2023
Sun May  7 05:01:08 EDT 2023
Sun May  7 05:01:09 EDT 2023
Sun May  7 05:01:10 EDT 2023
Sun May  7 05:01:11 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
Sun May  7 05:01:12 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
Sun May  7 05:01:13 EDT 2023

The lines showing those processes in uninterruptible sleep repeat every second as the "rm -r" continues.

I then start "hdparm -t /dev/sda" on the internal SSD disk; it also appears in uninterruptible sleep and doesn't finish even after minutes of waiting:

Sun May  7 05:01:25 EDT 2023
1     0     368       2  20   0      0     0 md_sup D    ?          2:57 [md0_raid5]
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:01:26 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:01:27 EDT 2023
[...]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:01:35 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
4     0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28     0:00 rm -rf 15tb3
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:01:36 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
1     0    4985       2  20   0      0     0 rq_qos D    ?          0:24 [jbd2/sdj1-8]
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:01:37 EDT 2023

I keep waiting. The above lines repeat over and over; the hdparm stays blocked and doesn't finish.

Sun May  7 05:03:32 EDT 2023
1     0     447       2  20   0      0     0 usb_sg D    ?          3:18 [usb-storage]
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:03 rm -rf 15tb3
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:34 EDT 2023

At this point I SIGSTOP the offending "rm -r" and it disappears from uninterruptible sleep, but the other processes remain in it for several seconds until the lockup clears and the hdparm finally finishes:

1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:35 EDT 2023
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:36 EDT 2023
1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:37 EDT 2023
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:38 EDT 2023
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:39 EDT 2023
1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:40 EDT 2023
Sun May  7 05:03:41 EDT 2023
Sun May  7 05:03:42 EDT 2023
Sun May  7 05:03:43 EDT 2023
4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:44 EDT 2023
4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
Sun May  7 05:03:45 EDT 2023
Sun May  7 05:03:46 EDT 2023

And the hdparm is finally done, several minutes after I started it.

If I keep the offending "rm -r" running on that USB3 disk, it locks up an hdparm running on a different disk long enough to trigger the kernel INFO messages above.

If I try hdparm on a different disk, e.g. one of the WD Red drives in my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":

1     0     447       2  20   0      0     0 usb_sg D    ?          3:51 [usb-storage]
1     0    4985       2  20   0      0     0 blk_mq D    ?          0:31 [jbd2/sdj1-8]
1     0 1466181       2  20   0      0     0 rq_qos D    ?          0:08 [kworker/u16:4+flush-8:144]
4     0 1469821   11939  20   0  27524 18888 wait_o D+   pts/28     0:44 rm -rf 15tb3
4     0 1501584   11933  20   0   4680  4584 rq_qos DL+  pts/26     0:00 hdparm -t /dev/sdb

While hung, ^C (interrupt) has no effect on the hdparm until I SIGSTOP the offending "rm -r", after which everything unlocks within a few seconds.

Is this the way things are supposed to work? I did not expect this behaviour.

Here's the syslog trace:

[163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.
[163968.537853] Tainted: G           O      5.15.0-72-generic #79~20.04.1-Ubuntu
[163968.537858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[163968.537861] task:hdparm          state:D stack:    0 pid:1391162 ppid:1391130 flags:0x00004004
[163968.537869] Call Trace:
[163968.537872]  <TASK>
[163968.537876]  __schedule+0x2cd/0x890
[163968.537889]  schedule+0x69/0x110
[163968.537896]  wb_wait_for_completion+0x5a/0x90
[163968.537907]  ? wait_woken+0x60/0x60
[163968.537928]  sync_inodes_sb+0xd8/0x2b0
[163968.537932]  sync_inodes_one_sb+0x19/0x20
[163968.537933]  iterate_supers+0xa3/0x100
[163968.537937]  ? __x64_sys_tee+0xe0/0xe0
[163968.537939]  ksys_sync+0x42/0xa0
[163968.537941]  __do_sys_sync+0xe/0x20
[163968.537942]  do_syscall_64+0x5c/0xc0
[163968.537946]  ? do_syscall_64+0x69/0xc0
[163968.537948]  ? do_syscall_64+0x69/0xc0
[163968.537950]  ? do_syscall_64+0x69/0xc0
[163968.537953]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[163968.537956] RIP: 0033:0x7f1d374a822b
[163968.537958] RSP: 002b:00007ffc973592f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
[163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 00007f1d374a822b
[163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 0000000000000003
[163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 000000000000002a
[163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 0000000000000000
[163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 00007f1d37193000
[163968.537967]  </TASK>

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-generic-hwe-20.04 5.15.0.72.79~20.04.33
ProcVersionSignature: Ubuntu 5.15.0-72.79~20.04.1-generic 5.15.98
Uname: Linux 5.15.0-72-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.26
Architecture: amd64
CasperMD5CheckResult: skip
Date: Sun May 7 08:38:10 2023
EcryptfsInUse: Yes
InstallationDate: Installed on 2020-10-07 (941 days ago)
InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
SourcePackage: linux-meta-hwe-5.15
UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.15/+bug/2018687/+subscriptions