On Fri, 19 Apr 2024, Zack Weinberg wrote:
> On Fri, Apr 19, 2024, at 4:15 PM, Mikulas Patocka wrote:
> > On Fri, 19 Apr 2024, Zack Weinberg wrote:
> >> ... the copy
> >> of round_keys in the vector registers *won't* get erased -- the exact
> >> problem being discussed in this thread.
> >
> > On the SYSV ABI, all the vector registers are volatile, so you can erase
> > them in explicit_bzero.
> >
> > On Windows 64-bit ABI, it is more problematic, because some of the vector
> > registers must be preserved.
>
> Oh, huh. Yes, that would work.

I've just realized that this wouldn't work: if explicit_bzero is lazily
resolved, the dynamic linker would spill the vector registers to the
stack before calling explicit_bzero, so a copy of the secret would end
up on the stack anyway.

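
To illustrate the lazy-binding point, here is a rough sketch (not a fix
for the general problem). It assumes glibc-style lazy PLT binding, where
only the first call to a symbol goes through the resolver. The helper
name warm_up_explicit_bzero is made up for this example. If the binary
is linked with -z now, or statically, there is no lazy resolution to
begin with.

```c
#define _DEFAULT_SOURCE
#include <string.h>

/*
 * Hypothetical helper: call once at startup, before any secret is
 * loaded into registers. Under lazy binding the first call resolves
 * the symbol, so later calls should not enter the resolver (and its
 * register spill) while secrets are live.
 */
void warm_up_explicit_bzero(void)
{
    unsigned char dummy[16];

    memset(dummy, 0xAA, sizeof dummy);   /* non-secret filler */
    explicit_bzero(dummy, sizeof dummy); /* first call: symbol gets bound */
}
```

This only moves the resolver call to a point where the registers hold
nothing sensitive; it does nothing about spill slots or the other leaks
discussed below.
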
> Call-preserved registers are not a
> problem, because any function that puts secret data in a call-preserved
> register in the first place, must erase it again (by restoring the old
> value) before returning. Therefore, if we made explicit_bzero wipe *all*
> the call-clobbered registers before returning, my example function would
> be safe.
>
> There's still a place secrets could leak to and not get erased, though:
> register spill slots on the stack. Only the compiler could plug this
> leak. Long term, I think what we want is something like
> __attribute__((sensitive)), which can only be applied to variables with
> automatic storage duration, and which means "erase all copies of this
> variable's value, wherever they wound up, at the end of its lifetime."
> Note that such variables must not be put in call-preserved registers in
> non-leaf functions, because then they might get spilled to the stack by
> a callee, which has no way of knowing that it's just leaked a secret.
> And I suppose we might also want to worry about signal frames. Nobody
> said this was gonna be easy ;-)
>
> zw

Yes.

Another problem is varargs - if there is at least one floating point
argument (i.e. %AL is nonzero), the compiler will store all 8 XMM
argument registers on the stack regardless of whether they are used or
not.

In the past it didn't do this (it made an indirect jump based on the
value in the %AL register to save only the used registers), but someone
probably found out that indirect jumps are expensive and that storing
all 8 registers is faster.

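
For illustration, a minimal variadic function of the kind I mean
(sum_doubles is a made-up name). On x86-64 SysV the caller sets %al to
an upper bound on the number of vector registers used for arguments.
With a recent GCC the prologue of such a function typically tests %al
and, if it is nonzero, stores %xmm0-%xmm7 into the register save area on
the stack. The exact code depends on the compiler version and flags, so
check the output of "gcc -O2 -S" rather than taking my word for it.

```c
#include <stdarg.h>

/* Hypothetical example: sums `count` arguments of type double. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);          /* prologue has already filled the save area */
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);

    return total;
}
```

A call such as sum_doubles(1, x) passes only one value in %xmm0, yet
whatever happens to be in the other seven registers at that moment may
be written to the callee's stack frame as well.
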
Mikulas