On Sun, Jan 04, 2026 at 10:34:55AM -0700, Theo de Raadt wrote:
> Mark Kettenis <[email protected]> wrote:
> 
> > I'm 100% sure that I am booting the correct kernel.  The checksum
> > calculated by that code above is the same.  But for some reason the
> > checksum that we read back from the hibernation info on disk is
> > all-zeroes.  So something is going wrong.  Will dig deeper when I have
> > time.
> 
> Is it just the checksum field --- or has the signature sector not
> actually made it onto disk?
> 
> There is this messy thing in subr_hibernate.c around 1954
> 
>         /* Allow the disk to settle */
>         delay(500000);
> 
> Few days ago I asked Mike about this again.  Apparently this was a workaround
> for an old system, and we should not do it anymore.  That was probably ahci.
> But why did we need it back then?

This was added well before ahci hibernate was working at all, so
it must have been for wdc.

> 
> These new systems are nvme.  Do we have a situation where the last hibernate
> write operation gets skipped in subr_hibernate.c, or do we have low-level
> side-effect-free io functions which don't do their job.  Is
> nvme_hibernate_io() failing the last write to disk?

Looking at the nvme shutdown code again, I realise we're not deleting
the hibernate i/o queue, which we're supposed to do as part of the
normal shutdown procedure. Perhaps without that the controller isn't
flushing all the data out to non-volatile storage. We don't issue a
flush command after the last hibernate write, but we shouldn't have
to.
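
For reference, here is a minimal standalone sketch of the ordering I have
in mind. It is not the driver code: struct sk_ctrl and the sk_* functions
are hypothetical stand-ins that only model which queues still exist. The
one thing taken from the NVMe spec is that CC.SHN sits in bits 15:14 and
01b requests a normal shutdown. It assumes the controller only counts
the shutdown as clean if every I/O queue, including the hibernate one,
was deleted first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CC.SHN is bits 15:14 of Controller Configuration; 01b = normal shutdown. */
#define SK_CC_SHN_MASK    (0x3u << 14)
#define SK_CC_SHN_NORMAL  (0x1u << 14)

/* Hypothetical mock controller: only tracks which I/O queues still exist. */
struct sk_ctrl {
	int		io_q_live;	/* regular I/O queue pair */
	int		hib_q_live;	/* hibernate I/O queue pair */
	uint32_t	cc;		/* shadow of the CC register */
};

/* Delete a queue; returns nonzero if it was not there to delete. */
static int
sk_q_delete(int *live)
{
	if (!*live)
		return (1);
	*live = 0;
	return (0);
}

/*
 * Request a normal shutdown only after every I/O queue is gone.
 * Returns 0 on success, -1 if a delete failed (the point where the
 * real driver would fall back to disabling the controller).
 */
static int
sk_shutdown(struct sk_ctrl *c)
{
	assert(c != NULL);

	if (sk_q_delete(&c->io_q_live) != 0)
		return (-1);
	if (sk_q_delete(&c->hib_q_live) != 0)
		return (-1);

	/* Clear the SHN field, then set it to "normal shutdown". */
	c->cc &= ~SK_CC_SHN_MASK;
	c->cc |= SK_CC_SHN_NORMAL;
	return (0);
}
```

In this model a controller that never had a hibernate queue makes
sk_shutdown() return -1, so a real change would probably want to skip
the delete when the queue was never created.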

Maybe this will help? (only compile tested)


Index: nvme.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/nvme.c,v
diff -u -p -r1.125 nvme.c
--- nvme.c      16 Dec 2025 00:24:55 -0000      1.125
+++ nvme.c      5 Jan 2026 04:17:41 -0000
@@ -557,6 +574,12 @@ nvme_shutdown(struct nvme_softc *sc)
                printf("%s: unable to delete q, disabling\n", DEVNAME(sc));
                goto disable;
        }
+#ifdef HIBERNATE
+       if (nvme_q_delete(sc, sc->sc_hib_q) != 0) {
+               printf("%s: unable to delete hib q, disabling\n", DEVNAME(sc));
+               goto disable;
+       }
+#endif
 
        cc = nvme_read4(sc, NVME_CC);
        CLR(cc, NVME_CC_SHN_MASK);
