On 11/21/23, Michael Kjörling <2695bd53d...@ewoof.net> wrote:
> On 21 Nov 2023 14:48 +1100, from zen...@gmail.com (Zenaan Harkness):
>> The desktop displays, but my external HDDs have been put to sleep, and
>> they do not wake up.
>>
>> One of them is zfs. The zfs mounts list shows, but any attempt to
>> view/ls a zfs mount, just hangs permanently until a reboot.
>>
>> The other drive is an ext4 filesystem, and it has been completely
>> un-mounted and the HDD spun down, and it does not spin up again -
>> until a reboot.
>
> This doesn't sound right.
>
> Can you run hdparm -C on the affected devices at the time? What is the
> result of that?
So it seems I can test this quickly with a manual suspend, then do the
various checks... and it does seem that the issue here is the
auto-sleep/suspend.

For starters, prior to suspend, I removed the zfs drive and left just the
ext4 drive in the USB caddy (it holds up to 2 drives).

Prior to suspend, for the 2.5 inch HDD when it has not been accessed for a
while and I can feel it is not spinning, I get:

# hdparm -C /dev/sda

/dev/sda:
 drive state is:  standby

Then I ls'ed a directory on that drive that had not previously been
accessed, could feel the drive spin up before it gave me the output, and
then ran hdparm again. Interestingly, checking a few times on the now
spun-up drive, I get an identical result to the spun-down state:

# hdparm -C /dev/sda

/dev/sda:
 drive state is:  standby

----

Now, after a suspend (waiting for the HDD to spin down, then for the
monitors to blank, then another 10s) and finally waking the computer up
(which is really too slow - 20 or 30 seconds or so, so something odd or
challenging seems to be happening inside the kernel somewhere):

# ll /dev/sd*
ls: cannot access '/dev/sd*': No such file or directory

# hdparm -C /dev/sda
/dev/sda: No such file or directory

> Do the drives spin back up if you use hdparm -z?

Prior to suspend and wake, I get this:

# hdparm -z /dev/sda

/dev/sda:
 re-reading partition table
 BLKRRPART failed: Device or resource busy

And again, after suspend and wake there is no longer a /dev/sda, or any
/dev/sd* at all, so I cannot run hdparm on any such device.

> What is the exact kernel version you are running? Please provide both
> the package name and exact package version, and the full output from
> uname -a.

# uname -a
Linux zen-L7 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux

The kernel package is exactly linux-image-6.1.0-13-amd64.

> Assuming that those drives are connected over USB, do they show up in
> lsusb output while inaccessible?

Prior to suspend and wake, lsusb shows me my hubs, dock, eth adaptors and
trackball, and possibly the following is the HDD dock? dunno:

Bus 006 Device 015: ID 152d:0565 JMicron Technology Corp. / JMicron USA Technology Corp. JMS56x Series

... and sure enough, after suspend and wake, Bus 006 Device 015 is gone -
it no longer exists - so the dock somehow has not woken up. I CAN still
see the blue light on the HDD caddy, but the HDD remains in a spun-down/
sleep state, and there is no /dev/sd* device.

I do get these, though (alias ll='ls -l'):

# find /dev/disk/ | grep usb
/dev/disk/by-id/usb-WDC_WD20_SPZX-22UA7T0_RANDOM__3F4917AD758C-0:0
/dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0

# ll /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
0 lrwxrwxrwx 1 root root 9 20231122 10:33.10 /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0 -> ../../sda

# ll /dev/sd*
0 brw-rw---- 1 root disk 8, 0 20231122 10:33.10 /dev/sda

... interestingly, it seems that when I formatted this drive I put ext4 on
the whole disk (/dev/sda), without creating a partition table, so it's
/dev/sda itself, not a /dev/sda1, that holds the ext4 filesystem.
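For the next round, here is a little sketch bundling up the same checks
I've been running by hand, so I can diff the before/after state in one go
(untested as a script so far; it assumes the drive still enumerates as
/dev/sda before the suspend):

#!/bin/bash
# Capture lsusb, the usb /dev/disk symlinks, and hdparm -C before a
# suspend, then the same again after resume, and show the difference.
before=$(mktemp); after=$(mktemp)
{ lsusb; find /dev/disk/ | grep usb; hdparm -C /dev/sda; } > "$before" 2>&1
systemctl suspend
read -r -p "Press Enter once the machine has resumed: "
{ lsusb; find /dev/disk/ | grep usb; hdparm -C /dev/sda; } > "$after" 2>&1
diff -u "$before" "$after"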
> Is there anything relevant in dmesg output?

This looks quite suspicious (a few error lines, not the whole dmesg
output):

[42635.638996] usb 6-2.3.1: device not accepting address 15, error -62
[42668.986050] usb 6-2.3.1: USB disconnect, device number 15
[42668.986406] device offline error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
[42668.988647] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
[42668.990867] hub 6-2.3.2.3:1.0: hub_ext_port_status failed (err = -71)
[42668.990888] hub 6-2.3.2.1:1.0: hub_ext_port_status failed (err = -71)
[42669.007554] usb 6-2.3.2.3.1: Failed to suspend device, error -71
[42669.008775] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
[42713.495809] xhci_hcd 0000:3a:00.0: Timeout while waiting for setup device command
[42713.703761] usb 6-2.3.1: device not accepting address 19, error -62
[42713.704792] usb 6-2.3-port1: unable to enumerate USB device
[42713.708332] usb 6-2.3.2: USB disconnect, device number 5
[42713.708343] usb 6-2.3.2.1: USB disconnect, device number 7

I say suspicious since "2.3.1" appears in the by-path drive link above,
and the 6 is presumably "Bus 006" from lsusb. I'm not familiar with dmesg
output, though... (One thing I'm tempted to try after the next failed
resume is forcing that xhci controller to re-enumerate - see the first PS
below.)

I also see the following, but that was earlier in the dmesg output and may
relate to my "quitting mpv hangs the desktop/wayland (or, it seems, gnome
shell)" problem (a different email thread of mine):

[41107.248174] gnome-shell[45289]: segfault at 0 ip 00007f8bc835f1e0 sp 00007ffd720f9028 error 4 in libmutter-11.so.0.0.0[7f8bc824f000+15a000] likely on CPU 10 (core 4, socket 0)

> Are you booting the kernel with any command-line parameters? Please
> provide the exact contents of /proc/cmdline.

Fresh Debian stable install, no customization by me at all:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64 root=UUID=9ce1e519-9712-4616-aeeb-0f858e5ac00a ro quiet

> A spun-down drive can take a brief time to spin back up (typically on
> the order of a few seconds), but that SHOULD be handled automatically;
> clearly something odd is going on in your case if it doesn't.

Indeed, something odd is up. This HDD has, through suspend, simply
disappeared: its mount is gone and its /dev device is gone. There is a
crypted mount which still shows, but of course any access gives an I/O
error, and umount plus cryptsetup close cleans that up.

With the zfs drive, in the past (and I suspect still), the problems above
are impossible to clean up: run any zfs command such as `zfs list` or
`zpool list` and the command hangs until reboot, and the drive cannot be
used again until a reboot. So the suspend/wake cycle is very problematic
in this instance; I have to permanently disable auto-suspend, and remember
not to suspend manually, if I am in the middle of work on a zfs drive
(or perhaps export the pool before each suspend - see the second PS
below).

At least, that is the case with an externally attached zfs drive, but I
suspect the same problem exists with internal zfs mounts - zfs is just not
properly integrated with the Linux kernel's suspend and resume, and is not
currently designed to cope with it... though I DO suspect that if a zfs
filesystem-in-a-file were loop-mounted only inside a virtual machine, with
the file itself living on an ext4 filesystem on the host, and the zfs
filesystem exported back to the host via samba or nfs, then zfs might
"cope" with suspend and resume, due to the cleaner nature of the virtual
machine's "suspend" environment. Something to test one day...

> “Remember when, on the Internet, nobody cared that you were a dog?”

Bugger! I've been so deluded all this time... Oh well, better awake than
deluded!
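PS: re the xhci_hcd timeout above - one thing I might try after the next
failed resume (untested; 0000:3a:00.0 is simply the controller address
taken from the by-path link and the dmesg lines above) is unbinding and
rebinding the controller, to force it to re-enumerate its bus:

# echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
# sleep 2
# echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/bind

If the JMicron dock then reappears in lsusb, that would at least narrow
things down to the controller's resume path rather than the dock itself.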
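PPS: instead of disabling suspend entirely, maybe a systemd sleep hook
that exports the external pool before suspending and re-imports it after
resume would be enough to stop the zfs commands hanging. A rough,
untested sketch - the pool name "extpool" and the script name are just
placeholders for whatever the real pool is called:

#!/bin/sh
# /usr/lib/systemd/system-sleep/export-external-zpool (hypothetical name)
# systemd runs executables in this directory with $1 = pre|post and
# $2 = suspend|hibernate|hybrid-sleep|suspend-then-hibernate.
case "$1" in
    pre)  zpool export extpool ;;
    post) zpool import extpool ;;
esac

Of course, given that the dock currently never comes back after resume,
the re-import would probably fail until that is sorted out, but at least
zfs would not be left holding a vanished device.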