On 11/21/23, Michael Kjörling <2695bd53d...@ewoof.net> wrote:
> On 21 Nov 2023 14:48 +1100, from zen...@gmail.com (Zenaan Harkness):
>> The desktop displays, but my external HDDs have been put to sleep, and
>> they do not wake up.
>>
>> One of them is zfs. The zfs mounts list shows, but any attempt to
>> view/ls a zfs mount, just hangs permanently until a reboot.
>>
>> The other drive is an ext4 filesystem, and it has been completely
>> un-mounted and the HDD spun down, and it does not spin up again -
>> until a reboot.
>
> This doesn't sound right.
>
> Can you run hdparm -C on the affected devices at the time? What is the
> result of that?

So it seems I can test this quickly with a manual suspend and then do the
various checks... the issue here appears to be the
auto-sleep/suspend.
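
For these tests I'm assuming a plain systemd suspend exercises the same
path as the auto-suspend:

# systemctl suspend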

For starters, prior to suspend, I've removed the zfs drive, and just
left the ext4 drive in the USB caddy (it holds up to 2 drives).

Prior to suspend, I get the following for the 2.5 inch hdd when it has not
been accessed for a while and I can feel it is not spinning:

# hdparm -C /dev/sda
/dev/sda:
 drive state is:  standby

Then I ls'ed a dir on that drive that had not previously been accessed,
could feel it spin up, and got the output. Interestingly, running hdparm
again a few times on the now spun-up drive, I get results identical to
those for the spun-down state:

# hdparm -C /dev/sda
/dev/sda:
 drive state is:  standby
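
Possibly the JMicron USB-SATA bridge simply doesn't pass the ATA "check
power mode" query through, so hdparm -C reports the same thing either way.
As a cross-check (untested here; the -d sat option is an assumption for
this bridge), smartctl from smartmontools can be told to bail out rather
than spin the drive up:

# smartctl -d sat -n standby -i /dev/sda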

----
Now, after a suspend (having waited for the hdd to spin down, for the
monitors to blank, and another 10s) and finally waking the computer
up (which is really too slow - 20 or 30 seconds or so, so something
odd or challenging seems to be happening inside the kernel somewhere):

# ll /dev/sd*
ls: cannot access '/dev/sd*': No such file or directory

# hdparm -C /dev/sda
/dev/sda: No such file or directory
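
Something I might try next time, before resorting to a reboot (an untested
sketch - the PCI address is the xhci controller's, taken from the by-path
link and dmesg lines further down), is forcing the USB controller to
re-enumerate everything by unbinding and rebinding its driver:

# echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/unbind
# echo 0000:3a:00.0 > /sys/bus/pci/drivers/xhci_hcd/bind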


> Do the drives spin back up if you use hdparm -z?

Prior to suspend and wake, I get this:

# hdparm -z /dev/sda
/dev/sda:
 re-reading partition table
 BLKRRPART failed: Device or resource busy

And again, after suspend and wake there is no more /dev/sda, or any
/dev/sd*, so I cannot run hdparm on any such device.


> What is the exact kernel version you are running? Please provide both
> the package name and exact package version, and the full output from
> uname -a.

# uname -a
Linux zen-L7 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
(2023-09-29) x86_64 GNU/Linux

The kernel package is exactly
linux-image-6.1.0-13-amd64
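
For the exact package version, dpkg should report the same 6.1.55-1 that
uname shows:

# dpkg -s linux-image-6.1.0-13-amd64 | grep '^Version'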


> Assuming that those drives are connected over USB, do they show up in
> lsusb output while inaccessible?

Prior to suspend and wake, lsusb shows me my hubs, dock, eth adaptors,
and trackball, and possibly the following is the HDD dock? I'm not sure:

Bus 006 Device 015: ID 152d:0565 JMicron Technology Corp. / JMicron
USA Technology Corp. JMS56x Series

... and sure enough, after suspend and wake, Bus 006 Device 015 is
gone - it no longer exists, so it somehow has not woken up. I CAN
still see the blue light on the hdd caddy, but the hdd remains in a
spun-down/sleep state, and there is no /dev/sd* device.
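
If it helps identify it, lsusb -t shows the topology along with which
driver is bound to each device (uas or usb-storage for a disk dock), which
should confirm whether that 152d:0565 entry really is the caddy:

# lsusb -t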

I do get these though (alias ll='ls -l'):

# find /dev/disk/|grep usb
/dev/disk/by-id/usb-WDC_WD20_SPZX-22UA7T0_RANDOM__3F4917AD758C-0:0
/dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0

# ll /dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0
0 lrwxrwxrwx 1 root root 9 20231122 10:33.10
/dev/disk/by-path/pci-0000:3a:00.0-usb-0:2.3.1:1.0-scsi-0:0:0:0 ->
../../sda

# ll /dev/sd*
0 brw-rw---- 1 root disk 8, 0 20231122 10:33.10 /dev/sda

... interestingly, it seems that when I formatted this drive, I put ext4
on the whole disk (/dev/sda) without a partition table, so the ext4
filesystem is on /dev/sda itself and there is no /dev/sda1.
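
That should be easy to confirm while the device node is present (i.e.
before a suspend) - lsblk or blkid ought to show the ext4 signature
directly on the whole device, with no partitions:

# lsblk -f /dev/sda
# blkid /dev/sda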


> Is there anything relevant in dmesg output?

This looks quite suspicious (some error lines, not all of dmesg output):

[42635.638996] usb 6-2.3.1: device not accepting address 15, error -62
[42668.986050] usb 6-2.3.1: USB disconnect, device number 15
[42668.986406] device offline error, dev sda, sector 0 op 0x1:(WRITE)
flags 0x800 phys_seg 0 prio class 2
[42668.988647] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
[42668.990867] hub 6-2.3.2.3:1.0: hub_ext_port_status failed (err = -71)
[42668.990888] hub 6-2.3.2.1:1.0: hub_ext_port_status failed (err = -71)
[42669.007554] usb 6-2.3.2.3.1: Failed to suspend device, error -71
[42669.008775] hub 6-2.3.2:1.0: hub_ext_port_status failed (err = -71)
[42713.495809] xhci_hcd 0000:3a:00.0: Timeout while waiting for setup
device command
[42713.703761] usb 6-2.3.1: device not accepting address 19, error -62
[42713.704792] usb 6-2.3-port1: unable to enumerate USB device
[42713.708332] usb 6-2.3.2: USB disconnect, device number 5
[42713.708343] usb 6-2.3.2.1: USB disconnect, device number 7


since "2.3.1" appears in the drive links above, and 6 could be "Bus
6". I'm not familiar with dmesg output though...

I also see the following, but that was earlier in the dmesg output and
may relate to quitting mpv causing my desktop/wayland (or, it seems,
gnome-shell) to hang (a different email thread of mine):

[41107.248174] gnome-shell[45289]: segfault at 0 ip 00007f8bc835f1e0
sp 00007ffd720f9028 error 4 in
libmutter-11.so.0.0.0[7f8bc824f000+15a000] likely on CPU 10 (core 4,
socket 0)


> Are you booting the kernel with any command-line parameters? Please
> provide the exact contents of /proc/cmdline.

Fresh Debian stable install, no customization by me at all:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.1.0-13-amd64
root=UUID=9ce1e519-9712-4616-aeeb-0f858e5ac00a ro quiet
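
If USB autosuspend does turn out to be implicated, one knob to try (a
sketch only - untested here) would be turning it off globally by adding
usbcore.autosuspend=-1 to GRUB_CMDLINE_LINUX_DEFAULT in
/etc/default/grub, then:

# update-grub

and rebooting.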


> A spun-down drive can take a brief time to spin back up (typically on
> the order of a few seconds), but that SHOULD be handled automatically;
> clearly something odd is going on in your case if it doesn't.

Indeed, something odd is up.

So this hdd has, through suspend, disappeared. Its mount is gone,
its /dev device is gone. A dm-crypt mapping still shows, but of course
any access gives an I/O error; umount and cryptsetup close clean
that up.

In the past, and I suspect still, the problems above are impossible to
clean up on the zfs drive: run any zfs command such as `zfs list` or
`zpool list` and the command hangs until reboot, and the drive cannot be
used again until a reboot. So the suspend/wake cycle is very problematic
in this instance; I have to permanently disable auto suspend, and
remember not to manually suspend, if I am in the middle of work on a zfs
drive. At least an externally attached zfs drive, though I suspect the
same problem exists with internal-drive zfs mounts - zfs just does not
seem properly integrated with the linux kernel's suspend and resume, and
is not currently designed to cope with it.

Though I DO suspect that if some sort of loop-mount zfs
filesystem-in-a-file were mounted only inside a virtual machine, and
exported back to the host via samba or nfs, with the zfs 'filesystem in
a file' living on the host in an ext4 filesystem, then zfs might "cope"
with suspend and resume, due to the cleaner nature of the virtual
machine's "suspend" environment. Something to test one day...
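
One workaround I may experiment with first is exporting the pool before
suspend and importing it again after resume, via a systemd sleep hook. An
untested sketch ("tank" is a placeholder for the real pool name, and the
export will fail if anything still has the datasets busy):

# cat /usr/lib/systemd/system-sleep/zfs-external-pool
#!/bin/sh
# Called by systemd with $1 = "pre" before sleep and "post" after resume.
case "$1" in
    pre)  zpool export tank ;;
    post) zpool import tank ;;
esac

(The script needs to be executable, e.g. chmod +x on it.)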


> “Remember when, on the Internet, nobody cared that you were a dog?”

Bugger! I've been so deluded all this time...
Oh well, better awake than deluded!
