Hi all,
I assume I don't have to set the context, but I would like to start
thinking out loud and discussing what Jailhouse can/should/has to do
in order to address the threats related to speculative execution.
The best news: Jailhouse runs on some processors, namely ARM Cortex-A7
and A53, that aren't affected by these attacks at all, simply due to a
lack of speculation. Sometimes slower is nicer...
Further good news: Meltdown is not our problem. It only affects setups
with shared address spaces, but both on x86 and ARM we switch them
completely when moving from guest to host context (and vice versa). The
issue may affect our guests, but that's not a hypervisor problem.
The bad news: All the Spectre attacks conceptually affect us on
vulnerable CPUs.
Now we could "simply" start porting all the current and future
mitigation patterns and extensions to the hypervisor core. Some actually
make sense, but I would like to start looking at the topic from a higher
perspective.
Both Spectre and Meltdown are confidentiality attacks. So, what could
the Jailhouse hypervisor leak if it is running on a vulnerable CPU?
- Its binary code
=> not an asset, it's de facto public already (GPL)
- Its configuration
=> generally not an asset either because it doesn't contain secrets
like cryptographic keys or passwords, only platform information
and the partitioning setup
- RAM and other sensitive state information of guests
=> ok, now it's getting interesting
The key purpose of Jailhouse is to isolate guests from each other. If
one guest could trick the hypervisor into revealing secrets of another
guest, we have failed.
But: you cannot leak what you do not know.
Jailhouse already has a couple of design properties that avoid direct or
indirect interference and leakage between guests:
- It has no means built in to share a logical CPU between multiple
guests. This differentiates Jailhouse significantly from the
hypervisors that run, e.g., in cloud environments.
- It tries hard to map only as much of a guest into its address space as
it needs to perform its duties.
- It has only a very small interface to its guests, consisting of a
handful of hypercall services and the necessary emulation of certain
hardware accesses, depending on the target architecture's
virtualization capabilities.
The Spectre way of stealing information is to trick more privileged
code, the OS kernel or a hypervisor, into speculatively reading that
data and then to derive its content from timing differences. For that,
the attacker has to be able to influence which code is speculatively
executed. So the known attack vectors go via synchronous service
requests to the privileged code.
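To make the pattern concrete, here is a minimal sketch in C of a
bounds-checked table lookup on a guest-controlled value, plus a
branch-free index clamp in the spirit of the mitigation patterns used by
general purpose kernels. This is not Jailhouse code: lookup_handler(),
clamp_index(), index_mask() and the table are hypothetical names made up
for illustration, and a real implementation would rather rely on
architecture-specific helpers or speculation barriers.

```c
#include <limits.h>
#include <stddef.h>

#define NUM_HANDLERS 8UL	/* hypothetical table size */

/*
 * Returns ~0UL if index < size and 0 otherwise, without a conditional
 * branch in the source. Assumes size <= LONG_MAX.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	unsigned long top_bit = sizeof(unsigned long) * CHAR_BIT - 1;
	/* The top bit of the OR is set iff index is out of range. */
	unsigned long out_of_range = (index | (size - 1UL - index)) >> top_bit;

	return out_of_range - 1UL;	/* 0 -> ~0UL, 1 -> 0 */
}

/* Forces out-of-range indices to 0, even on a mispredicted path. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}

static int handler_table[NUM_HANDLERS] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* Hypothetical dispatcher; "code" is a value chosen by the guest. */
static int lookup_handler(unsigned long code)
{
	if (code >= NUM_HANDLERS)
		return -1;

	/*
	 * Without the clamp, a mispredicted bounds check could let the CPU
	 * speculatively load handler_table[code] far outside the table and
	 * leave a cache footprint that depends on the loaded value.
	 */
	return handler_table[clamp_index(code, NUM_HANDLERS)];
}
```

The clamp does not change the architectural result for valid requests;
it only ensures that a speculatively executed load stays inside the
table.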
Mapping this onto Jailhouse, an attacker could issue hypercalls or trigger
MMIO or PIO accesses on resources that require hypervisor interception.
These paths only work synchronously. So the attacker would see at most
what its local CPU can see when running in hypervisor mode. If we hide
the secrets of other cells from that local mapping, no assets remain to
be leaked, even if Spectre is not mitigated via the measures currently
known for general purpose OSes and hypervisors.
To achieve this, I think we should do three things:
- reduce the visibility of sensitive cell (guest) state by CPUs that do
not belong to that cell, specifically register state and temporary
mappings of cell memory into the hypervisor address space
- ensure we have proper barriers in place (cache flushes, possibly even
speculation barriers) when handing over a CPU from one cell to another
- partition the system in a way that all hyperthread siblings belong to
the same cell (or are unused / disabled)
The last point is a user task that we can support via documentation and
eventually even tooling (config checker). It's already a very reasonable
measure to apply, simply due to the timing effects between hyperthread
siblings.
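As a rough idea of what such a config check could look like, here is a
small sketch in C. It assumes a flattened, hypothetical description of
the logical CPUs (struct cpu_info with a core ID and an owning cell);
this is not the Jailhouse configuration format, and how to obtain the
core topology is left open.

```c
#include <stdbool.h>
#include <stddef.h>

#define CELL_UNUSED	(-1)	/* hardware thread is unused / disabled */

/* Hypothetical description of one logical CPU. */
struct cpu_info {
	unsigned int core_id;	/* physical core the thread belongs to */
	int cell_id;		/* owning cell, or CELL_UNUSED */
};

/*
 * Returns true if no physical core has hardware threads that are
 * assigned to different cells. Unused threads are ignored.
 */
static bool siblings_share_cell(const struct cpu_info *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].cell_id == CELL_UNUSED)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (cpus[j].cell_id == CELL_UNUSED)
				continue;
			if (cpus[i].core_id == cpus[j].core_id &&
			    cpus[i].cell_id != cpus[j].cell_id)
				return false;
		}
	}
	return true;
}
```

A checker built around this would report the offending core instead of
just returning false, but the rule itself is no more than this
comparison.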
The first point is something I started to hack on, see
https://github.com/siemens/jailhouse/commits/wip/percpu-mappings. It
hides everything of the per-cpu state that is not related to cross-CPU
control and communication. The code is still a quick hack, breaks
anything != Intel x86 and isn't even complete for Intel yet (missing
private temporary mappings) but it's a starting point.
The second point in the list requires more thought. Maybe we are safe
already because we flush enough cached state today or because the switch
is not taking place continuously. But maybe it's just wiser and cheap
enough anyway to add explicit barriers as made available / documented
for our target architectures now.
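For x86, one candidate is the indirect branch prediction barrier (IBPB),
which is issued by writing bit 0 to the IA32_PRED_CMD MSR (0x49) on CPUs
that enumerate support for it. The following C sketch only illustrates
where such a barrier could sit. cpu_handover_barrier() and the
msr_write_fn hook are hypothetical, and the recording function is a
user-space stand-in for the real WRMSR instruction so that the logic can
be tried outside a hypervisor.

```c
#include <stdint.h>

#define MSR_IA32_PRED_CMD	0x49u		/* architectural MSR number */
#define PRED_CMD_IBPB		(1ull << 0)	/* branch prediction barrier */

/* Hook for the MSR write; a hypervisor would execute WRMSR here. */
typedef void (*msr_write_fn)(uint32_t msr, uint64_t value);

/* User-space stand-in that only records what would have been written. */
static uint32_t last_msr;
static uint64_t last_value;
static unsigned int write_count;

static void record_msr_write(uint32_t msr, uint64_t value)
{
	last_msr = msr;
	last_value = value;
	write_count++;
}

/*
 * Hypothetical hook, called when a CPU is handed over from one cell to
 * another. Assumes the caller has already checked that the CPU supports
 * IBPB. Nothing is done if the owner does not change.
 */
static void cpu_handover_barrier(int old_cell, int new_cell,
				 msr_write_fn write_msr)
{
	if (old_cell == new_cell)
		return;

	/* Discard branch predictor state trained by the previous owner. */
	write_msr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}
```

Since CPUs change their owner only on cell creation and destruction, the
cost of such a barrier should not matter much.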
That's it from me for now. I would very much appreciate any thoughts and
comments from others on this. The topic is complex, and we are just
starting to wrap our heads around its details. All the more, we need
cross-checks of the concepts and their implementations by as many
experts as we have.
Thanks,
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux