On Mon, 05 Jan 2026 05:24:54 +0100,
Jonathan Matthew <[email protected]> wrote:
>
> On Sun, Jan 04, 2026 at 10:34:55AM -0700, Theo de Raadt wrote:
> > Mark Kettenis <[email protected]> wrote:
> >
> > > I'm 100% sure that I am booting the correct kernel. The checksum
> > > calculated by that code above is the same. But for some reason the
> > > checksum that we read back from the hibernation info on disk is
> > > all-zeroes. So something is going wrong. Will dig deeper when I have
> > > time.
> >
> > Is it just the checksum field --- or has the signature sector not
> > actually made it onto disk?
> >
> > There is this messy thing in subr_hibernate.c around 1954
> >
> > /* Allow the disk to settle */
> > delay(500000);
> >
> > Few days ago I asked Mike about this again. Apparently this was a
> > workaround for an old system, and we should not do it anymore. That
> > was probably ahci. But why did we need it back then?
>
> This was added well before ahci hibernate was working at all, so
> it must have been for wdc.
>
> >
> > These new systems are nvme. Do we have a situation where the last
> > hibernate write operation gets skipped in subr_hibernate.c, or do we
> > have low-level side-effect-free io functions which don't do their
> > job. Is nvme_hibernate_io() failing the last write to disk?
>
> Looking at the nvme shutdown code again, I realise we're not deleting
> the hibernate i/o queue, which we're supposed to do as part of the
> normal shutdown procedure. Perhaps without that the controller isn't
> flushing all the data out to non-volatile storage. We don't issue a
> flush command after the last hibernate write, but we shouldn't have
> to.
>
> Maybe this will help? (only compile tested)
>
My Huawei Matebook sometimes freezes when it goes to sleep. When that
happens I can usually still toggle the Caps and Fn LEDs, which is not
possible after a successful ZZZ. I was never motivated enough to
investigate because it doesn't happen often and I rarely use ZZZ, but I
suspect it's related to background backups, which generate a lot of I/O.
Anyway, I've tried your diff. I ran 10 ZZZ cycles with manually started
backups running, plus two ZZZ cycles that happened because I ran out of
power.
No issues.
>
> Index: nvme.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/ic/nvme.c,v
> diff -u -p -r1.125 nvme.c
> --- nvme.c 16 Dec 2025 00:24:55 -0000 1.125
> +++ nvme.c 5 Jan 2026 04:17:41 -0000
> @@ -557,6 +574,12 @@ nvme_shutdown(struct nvme_softc *sc)
> printf("%s: unable to delete q, disabling\n", DEVNAME(sc));
> goto disable;
> }
> +#ifdef HIBERNATE
> + if (nvme_q_delete(sc, sc->sc_hib_q) != 0) {
> + printf("%s: unable to delete hib q, disabling\n", DEVNAME(sc));
> + goto disable;
> + }
> +#endif
>
> cc = nvme_read4(sc, NVME_CC);
> CLR(cc, NVME_CC_SHN_MASK);
>
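
For anyone following along, here is how I read the ordering the diff
is after, as a toy model in plain C. This is only a sketch: every name
in it (toy_ctrl, toy_queue, toy_q_delete, toy_shutdown) is made up for
illustration and none of it is the nvme(4) driver code. It only shows
that without the extra delete, one I/O queue still exists at the
moment shutdown is requested. Whether that is what actually loses the
last hibernate write is still just a guess.

```c
#include <stdbool.h>

/*
 * Toy model only. These types and functions are hypothetical and do
 * not correspond to anything in sys/dev/ic/nvme.c.
 */
struct toy_queue {
	const char *name;
	bool live;		/* true while the queue still exists */
};

struct toy_ctrl {
	struct toy_queue io_q;	/* regular I/O queue */
	struct toy_queue hib_q;	/* hibernate I/O queue */
	bool shutdown_requested;
};

/* Delete one queue. Returns 0 on success, -1 if it was already gone. */
static int
toy_q_delete(struct toy_queue *q)
{
	if (!q->live)
		return -1;
	q->live = false;
	return 0;
}

/* Count the I/O queues that still exist. */
static int
toy_live_queues(const struct toy_ctrl *c)
{
	return (c->io_q.live ? 1 : 0) + (c->hib_q.live ? 1 : 0);
}

/*
 * Delete the regular queue, optionally the hibernate queue as well,
 * then request shutdown. Returns the number of queues that were still
 * live when shutdown was requested: 0 is the intended state.
 */
static int
toy_shutdown(struct toy_ctrl *c, bool delete_hib)
{
	(void)toy_q_delete(&c->io_q);
	if (delete_hib)
		(void)toy_q_delete(&c->hib_q);
	c->shutdown_requested = true;
	return toy_live_queues(c);
}
```

Calling toy_shutdown() with delete_hib set to false models the code
before the diff and leaves the hibernate queue behind; with true it
models the code after the diff and leaves nothing behind.
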
--
wbr, Kirill