On Fri, May 12, 2017 at 03:11:35PM +0000, Natasha Kerensikova wrote:
> >Synopsis:	Suspend-to-disk doesn't work anymore
> >Category:	<PR category (one line)>
> >Environment:
> 	System      : OpenBSD 6.1
> 	Details     : OpenBSD 6.1-current (GENERIC.MP) #6: Fri May 12 15:12:39 CEST 2017
> 		[email protected]:/data/semarie/repos/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
>
> 	Architecture: OpenBSD.amd64
> 	Machine     : amd64
> >Description:
> 	On my Thinkpad X220 (with Core i5) with full disk encryption,
> 	OpenBSD doesn't resume after suspend to disk since my latest snapshot
> 	update (May 7th snapshot). Keeping the same userland and using kernels
> 	helpfully provided by semarie, we bisected the problem to the commits
> 	detailed below.
> >How-To-Repeat:
> 	Suspend-to-disk a live OpenBSD. On next boot, it should resume from
> 	disk, but instead it starts a standard boot with dirty filesystems.
> >Fix:
> 	Reverting the commits identified on github mirror by the hashes
> 	d223d7cb85c1f2f705da547a0134b949655abe6a ("Switch glxsb(4), VIA
> 	padlock and AES-NI drivers over to the new AES") and
> 	cb3087542b323ec5bf5db9dc64f0d54dc40cca40 ("Switch OCF and IPsec over
> 	to the new AES") fixes the problem, at least until commit
> 	50f8ee3e5db5b40ae9a05db4742b05e8d975573d (May 11th).
>

Natacha and I continued debugging the problem a bit. With HIB_DEBUG
enabled, the resume showed that it failed because of a wrong magic number:

    [...]
    sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
    sd1: 953866MB, 512 bytes/sector, 1953519473 sectors
    root on sd1a (63848a4fade4a944.a) swap on sd1b dump on sd1b
    reading hibernate signature block location: 8641783
    wrong magic number in hibernate signature: e82daa08

I am unsure of the reason: it could be the hibernate part that doesn't
write it correctly, or the resume part that doesn't read it correctly.
I don't know. By "correctly" I mean: wrong AES key? Use of an
uninitialised or garbage struct? Something that results in a "bad state"
on writing or reading.

I pushed the last commit, which reverts AES_XTS to rijndael, on top of
the tested tree (7 days old). Hibernate/resume works again. This confirms
that the problem is related to the switch to constant-time AES in the
context of full-disk encryption.

Regarding the problem itself, I don't know the crypto part and the
initialisation code path well enough to figure out the reason. Does aes.c
have some initialisation that happens later than in rijndael.c, resulting
in a first read from disk with a wrong key or an uninitialised structure?
I don't know.

I just hope this problem doesn't hide a more subtle underlying problem.

I expect the problem to be fixed in the next snapshot (one that includes
the revert of AES_XTS to rijndael).

Thanks.
-- 
Sebastien Marie
