I now have a Perl script running that removes duplicate files by
creating thousands of hard links on a different external USB3 disk, and
it is locking up or timing out many disk-related operations on all my
other disks.  Both this USB3 external drive and the one described above
are plugged directly into the motherboard's USB ports (HP Z440
Workstation).
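
The script itself isn't shown here, but as a rough illustration of the
kind of workload, a minimal shell sketch of the same dedup-by-hardlink
idea (hypothetical directory DIR, with everything assumed to be on that
one filesystem) would look something like this:

    # Illustrative sketch only -- the real job is a Perl script, not shown.
    # Replace each duplicate under DIR with a hard link to the first file
    # seen with the same checksum: thousands of unlink()/link() calls
    # hitting one disk.
    declare -A seen
    while read -r sum path ; do
        if [ -n "${seen[$sum]}" ] ; then
            ln -f "${seen[$sum]}" "$path"
        else
            seen[$sum]="$path"
        fi
    done < <(find DIR -type f -exec md5sum {} +)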

F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0 2210112 2210111  20   0  53680 42588 blk_mq D+   pts/24     3:00 
/usr/bin/perl [...]

May 14 13:37:49 kernel: [259424.745462] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 00:08:09 kernel: [297244.761855] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 00:10:10 kernel: [297365.592485] INFO: task smartd:2719 blocked for more 
than 241 seconds.
May 15 01:08:34 kernel: [300869.682961] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 01:28:43 kernel: [302077.989582] INFO: task hdparm:2052842 blocked for 
more than 120 seconds.
May 15 01:30:43 kernel: [302198.820278] INFO: task hdparm:2052842 blocked for 
more than 241 seconds.
May 15 01:32:44 kernel: [302319.654907] INFO: task hdparm:2052842 blocked for 
more than 362 seconds.
May 15 01:34:45 kernel: [302440.481601] INFO: task hdparm:2052842 blocked for 
more than 483 seconds.
May 15 01:36:46 kernel: [302561.316237] INFO: task hdparm:2052842 blocked for 
more than 604 seconds.
May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more 
than 120 seconds.

From one of the logged events:

May 15 02:06:58 kernel: [304373.770194] INFO: task smartd:2719 blocked for more 
than 120 seconds.
May 15 02:06:58 kernel: [304373.770209]       Tainted: G           O      
5.15.0-72-generic #79~20.04.1-Ubuntu
May 15 02:06:58 kernel: [304373.770215] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 15 02:06:58 kernel: [304373.770218] task:smartd          state:D stack:    
0 pid: 2719 ppid:     1 flags:0x00000000
May 15 02:06:58 kernel: [304373.770226] Call Trace: 
May 15 02:06:58 kernel: [304373.770230]  <TASK>
May 15 02:06:58 kernel: [304373.770236]  __schedule+0x2cd/0x890
May 15 02:06:58 kernel: [304373.770251]  schedule+0x69/0x110
May 15 02:06:58 kernel: [304373.770260]  schedule_preempt_disabled+0xe/0x20
May 15 02:06:58 kernel: [304373.770269]  __mutex_lock.isra.0+0x20c/0x470
May 15 02:06:58 kernel: [304373.770276]  ? iput.part.0+0x61/0x1e0
May 15 02:06:58 kernel: [304373.770287]  __mutex_lock_slowpath+0x13/0x20
May 15 02:06:58 kernel: [304373.770294]  mutex_lock+0x36/0x40
May 15 02:06:58 kernel: [304373.770299]  blkdev_get_by_dev+0x11d/0x2d0
May 15 02:06:58 kernel: [304373.770309]  ? blkdev_close+0x30/0x30
May 15 02:06:58 kernel: [304373.770318]  blkdev_open+0x50/0x90
May 15 02:06:58 kernel: [304373.770325]  do_dentry_open+0x169/0x3e0
May 15 02:06:58 kernel: [304373.770336]  vfs_open+0x2d/0x40
May 15 02:06:58 kernel: [304373.770342]  do_open.isra.0+0x20d/0x480
May 15 02:06:58 kernel: [304373.770351]  path_openat+0x18e/0xe50
May 15 02:06:58 kernel: [304373.770361]  ? put_device+0x13/0x20
May 15 02:06:58 kernel: [304373.770371]  ? scsi_device_put+0x31/0x40
May 15 02:06:58 kernel: [304373.770380]  ? sd_release+0x3b/0xb0
May 15 02:06:58 kernel: [304373.770388]  do_filp_open+0xb2/0x120
May 15 02:06:58 kernel: [304373.770398]  ? __check_object_size+0x14f/0x160
May 15 02:06:58 kernel: [304373.770408]  do_sys_openat2+0x249/0x330
May 15 02:06:58 kernel: [304373.770418]  do_sys_open+0x46/0x80
May 15 02:06:58 kernel: [304373.770424]  __x64_sys_openat+0x20/0x30
May 15 02:06:58 kernel: [304373.770430]  do_syscall_64+0x5c/0xc0
May 15 02:06:58 kernel: [304373.770440]  ? do_syscall_64+0x69/0xc0
May 15 02:06:58 kernel: [304373.770448]  
entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 15 02:06:58 kernel: [304373.770458] RIP: 0033:0x7f9b0d188d3b
May 15 02:06:58 kernel: [304373.770465] RSP: 002b:00007ffd72a3caf0 EFLAGS: 
00000246 ORIG_RAX: 0000000000000101
May 15 02:06:58 kernel: [304373.770473] RAX: ffffffffffffffda RBX: 
000055f1346783c0 RCX: 00007f9b0d188d3b
May 15 02:06:58 kernel: [304373.770479] RDX: 0000000000000800 RSI: 
000055f1346783f8 RDI: 00000000ffffff9c
May 15 02:06:58 kernel: [304373.770484] RBP: 000055f1346783f8 R08: 
0000000000000001 R09: 0000000000000000
May 15 02:06:58 kernel: [304373.770488] R10: 0000000000000000 R11: 
0000000000000246 R12: 0000000000000800
May 15 02:06:58 kernel: [304373.770493] R13: 0000000000000000 R14: 
000055f1334c26a4 R15: 00007f9b0cd17250
May 15 02:06:58 kernel: [304373.770500]  </TASK>

https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 
x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running a "rm -r dir" on a directory with millions of files that resides
  on a disk in an external USB-3 hard drive dock locks up an unrelated
  hdparm process running on an internal disk such that the kernel says:

      May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked 
for more than 120 seconds.
      [...]
      May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked 
for more than 241 seconds.
      [...]
      May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked 
for more than 362 seconds.

  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r"
  SIGSTOPped so that it doesn't affect anything:

      # \time hdparm -t /dev/sda
      /dev/sda:
       Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
      0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 
4584maxresident)k
      2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected.  /dev/sda is an
  internal SSD drive.
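
  (The "SIGSTOPped" above just means pausing the "rm -r" with job-control
  signals so it generates no I/O, and resuming it later with SIGCONT; the
  PID below is a placeholder:)

      kill -STOP <rm-pid>     # pause the offending rm
      kill -CONT <rm-pid>     # resume it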

  I now run this loop to show the timings and process states below:

      # while sleep 1 ; do date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock, and the rm appears in
  uninterruptible sleep along with a couple of kernel processes:

      Sun May  7 05:01:07 EDT 2023
      Sun May  7 05:01:08 EDT 2023
      Sun May  7 05:01:09 EDT 2023
      Sun May  7 05:01:10 EDT 2023
      Sun May  7 05:01:11 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 
[usb-storage]
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 
rm -rf 15tb3
      Sun May  7 05:01:12 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 
[usb-storage]
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 
/sbin/apcupsd
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 
rm -rf 15tb3
      Sun May  7 05:01:13 EDT 2023

  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:

      Sun May  7 05:01:25 EDT 2023
      1     0     368       2  20   0      0     0 md_sup D    ?          2:57 
[md0_raid5]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 
rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:01:26 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 
[usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 
rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:01:27 EDT 2023
  [...]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:01:35 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 
[usb-storage]
      4     0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28     0:00 
rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:01:36 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 
[usb-storage]
      1     0    4985       2  20   0      0     0 rq_qos D    ?          0:24 
[jbd2/sdj1-8]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 
rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:01:37 EDT 2023

  I keep waiting.  The above lines repeat over and over, and the hdparm
  stays blocked and doesn't finish.

      Sun May  7 05:03:32 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:18 
[usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:03 
rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:34 EDT 2023

  At this point, I SIGSTOP the offending "rm -r" and it disappears from
  uninterruptible sleep, but the other processes remain in that state for
  several seconds until the lockup clears and the hdparm finally finishes:

      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:35 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:36 EDT 2023
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 
/sbin/apcupsd
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:37 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:38 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:39 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 
[kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:40 EDT 2023
      Sun May  7 05:03:41 EDT 2023
      Sun May  7 05:03:42 EDT 2023
      Sun May  7 05:03:43 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:44 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 
hdparm -t /dev/sda
      Sun May  7 05:03:45 EDT 2023
      Sun May  7 05:03:46 EDT 2023

  And the hdparm is finally done, several minutes after I started it.  If I
  keep the offending "rm -r" running on that USB3 disk, it locks up an
  hdparm running on a different disk long enough to cause the kernel INFO
  messages shown above.

  If I try hdparm on a different disk, e.g. one of the WD Red drives in
  my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":

      1     0     447       2  20   0      0     0 usb_sg D    ?          3:51 
[usb-storage]
      1     0    4985       2  20   0      0     0 blk_mq D    ?          0:31 
[jbd2/sdj1-8]
      1     0 1466181       2  20   0      0     0 rq_qos D    ?          0:08 
[kworker/u16:4+flush-8:144]
      4     0 1469821   11939  20   0  27524 18888 wait_o D+   pts/28     0:44 
rm -rf 15tb3
      4     0 1501584   11933  20   0   4680  4584 rq_qos DL+  pts/26     0:00 
hdparm -t /dev/sdb

  While hung, ^C (interrupt) has no effect on the hdparm until I SIGSTOP
  the offending "rm -r"; then everything eventually unlocks after a few
  seconds.  Is this the way things are supposed to work?  I did not expect
  this behaviour.  Here's the syslog trace:

      [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 
seconds.
      [163968.537853]       Tainted: G           O      5.15.0-72-generic 
#79~20.04.1-Ubuntu
      [163968.537858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
      [163968.537861] task:hdparm          state:D stack:    0 pid:1391162 
ppid:1391130 flags:0x00004004
      [163968.537869] Call Trace:
      [163968.537872]  <TASK>
      [163968.537876]  __schedule+0x2cd/0x890
      [163968.537889]  schedule+0x69/0x110
      [163968.537896]  wb_wait_for_completion+0x5a/0x90
      [163968.537907]  ? wait_woken+0x60/0x60
      [163968.537928]  sync_inodes_sb+0xd8/0x2b0
      [163968.537932]  sync_inodes_one_sb+0x19/0x20
      [163968.537933]  iterate_supers+0xa3/0x100
      [163968.537937]  ? __x64_sys_tee+0xe0/0xe0
      [163968.537939]  ksys_sync+0x42/0xa0
      [163968.537941]  __do_sys_sync+0xe/0x20
      [163968.537942]  do_syscall_64+0x5c/0xc0
      [163968.537946]  ? do_syscall_64+0x69/0xc0
      [163968.537948]  ? do_syscall_64+0x69/0xc0
      [163968.537950]  ? do_syscall_64+0x69/0xc0
      [163968.537953]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
      [163968.537956] RIP: 0033:0x7f1d374a822b
      [163968.537958] RSP: 002b:00007ffc973592f8 EFLAGS: 00000202 ORIG_RAX: 
00000000000000a2
      [163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 
00007f1d374a822b
      [163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 
0000000000000003
      [163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 
000000000000002a
      [163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 
0000000000000000
      [163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 
00007f1d37193000
      [163968.537967]  </TASK>
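
  Note that in this trace the hdparm is not blocked on /dev/sda itself but
  inside a global sync() (ksys_sync -> iterate_supers -> sync_inodes_sb ->
  wb_wait_for_completion), which waits for writeback on every mounted
  filesystem, including the busy USB3 disk.  If that reading is right, a
  bare sync should stall the same way while the "rm -r" runs; I haven't
  captured that here, but the check would simply be:

      # Hypothetical cross-check (not captured above): time a bare sync
      # while the "rm -r" is still generating writeback on the USB3 disk.
      \time sync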

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-generic-hwe-20.04 5.15.0.72.79~20.04.33
  ProcVersionSignature: Ubuntu 5.15.0-72.79~20.04.1-generic 5.15.98
  Uname: Linux 5.15.0-72-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.26
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Sun May  7 08:38:10 2023
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2020-10-07 (941 days ago)
  InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 
(20200731)
  SourcePackage: linux-meta-hwe-5.15
  UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)


