On Fri, May 12, 2017 at 03:11:35PM +0000, Natasha Kerensikova wrote:
> >Synopsis:    Suspend-to-disk doesn't work anymore
> >Category:    <PR category (one line)>
> >Environment:
>       System      : OpenBSD 6.1
>       Details     : OpenBSD 6.1-current (GENERIC.MP) #6: Fri May 12 15:12:39 CEST 2017
>                        [email protected]:/data/semarie/repos/openbsd/src/sys/arch/amd64/compile/GENERIC.MP
> 
>       Architecture: OpenBSD.amd64
>       Machine     : amd64
> >Description:
>       On my Thinkpad X220 (with Core i5) with full disk encryption,
>       OpenBSD doesn't resume after suspend to disk since my latest snapshot
>       update (May 7th snapshot). Keeping the same userland and using kernels
>       helpfully provided by semarie, we bisected the problem to the commits
>       detailed below.
> >How-To-Repeat:
>       Suspend-to-disk a live OpenBSD. On next boot, it should resume from
>       disk, but instead it starts a standard boot with dirty filesystems.
> >Fix:
>       Reverting the commits identified on github mirror by the hashes
>       d223d7cb85c1f2f705da547a0134b949655abe6a ("Switch glxsb(4), VIA
>       padlock and AES-NI drivers over to the new AES") and
>       cb3087542b323ec5bf5db9dc64f0d54dc40cca40 ("Switch OCF and IPsec over
>       to the new AES") fixes the problem, at least until commit
>       50f8ee3e5db5b40ae9a05db4742b05e8d975573d (May 11th).
> 

Natacha and I continued trying to debug the problem.

With HIB_DEBUG enabled, the resume attempt shows that it fails because of a
wrong magic number:

[...]
sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
sd1: 953866MB, 512 bytes/sector, 1953519473 sectors
root on sd1a (63848a4fade4a944.a) swap on sd1b dump on sd1b
reading hibernate signature block location: 8641783
wrong magic number in hibernate signature: e82daa08

I am unsure of the reason: it could be the hibernate part that doesn't write
it correctly, or the resume part that doesn't read it correctly. I don't know.

By "correctly" I mean: a wrong AES key? Use of an uninitialised or garbage
struct? Something that results in a "bad state" when writing or reading.
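
To illustrate why the log line alone cannot tell us which side is at fault,
here is a minimal sketch. It is not the kernel code: the cipher is a toy
SHA-256 keystream (a stand-in, not AES-XTS), and MAGIC, BLOCK_SIZE and all
function names are hypothetical. It only shows that a bad key on either the
write or the read side ends up as a "wrong magic number" on resume.

```python
import hashlib
import struct

# Hypothetical values for illustration only; not the kernel's real constants.
MAGIC = 0x12345678
BLOCK_SIZE = 512


def keystream(key: bytes, sector: int, length: int) -> bytes:
    """Toy per-sector keystream built from SHA-256 (a stand-in, NOT AES-XTS)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + struct.pack("<QQ", sector, counter)).digest()
        counter += 1
    return out[:length]


def crypt_sector(key: bytes, sector: int, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    ks = keystream(key, sector, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def write_signature(key: bytes, sector: int) -> bytes:
    """Hibernate side: build a signature block and encrypt it for the disk."""
    block = struct.pack("<I", MAGIC).ljust(BLOCK_SIZE, b"\0")
    return crypt_sector(key, sector, block)


def read_magic(key: bytes, sector: int, on_disk: bytes) -> int:
    """Resume side: decrypt the block and extract the magic number."""
    (magic,) = struct.unpack_from("<I", crypt_sector(key, sector, on_disk))
    return magic


good_key = b"k" * 32
bad_key = bytes(32)      # e.g. a zeroed, never-initialised key
sector = 8641783         # signature block location taken from the log above

# Healthy round trip: same key on both sides.
ok = read_magic(good_key, sector, write_signature(good_key, sector))
# Fault on the write side: block written with the bad key.
bad_write = read_magic(good_key, sector, write_signature(bad_key, sector))
# Fault on the read side: block read back with the bad key.
bad_read = read_magic(bad_key, sector, write_signature(good_key, sector))

for name, magic in (("ok", ok), ("bad write", bad_write), ("bad read", bad_read)):
    status = "valid" if magic == MAGIC else "wrong magic number"
    print(f"{name}: {magic:08x} ({status})")
```

With this toy XOR cipher the two faulty cases even produce the same garbage
value, so the resume side sees the same symptom either way. A real block
cipher would give different garbage in each case, but the check would still
fail in both.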


I applied the latest commit, which reverts AES_XTS to rijndael, on top of the
tested tree (7 days old). With it, hibernate/resume works again.

This confirms that the problem is related to the switch to
constant-time AES in the context of full disk encryption.

Regarding the problem itself, I don't know the crypto code and the
initialisation code path well enough to figure out the reason. Does aes.c have
some initialisation that happens later than in rijndael.c, resulting in a
first read from disk with a wrong key or an uninitialised structure? I don't
know. I just hope this problem doesn't hide a more subtle underlying problem.

I expect the problem to be fixed in the next snapshot (one that includes the
revert of AES_XTS to rijndael).

Thanks.
-- 
Sebastien Marie
