Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-11 Thread Stefan Hajnoczi
On Sun, May 10, 2020 at 11:02:18AM +, Herrenschmidt, Benjamin wrote:
> On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:
> > 
> > On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
> > > 
> > > 
> > > On 07/05/2020 20:44, Pavel Machek wrote:
> > > > 
> > > > Hi!
> > > > 
> > > > > > it uses its own memory and CPUs + its virtio-vsock emulated device 
> > > > > > for
> > > > > > communication with the primary VM.
> > > > > > 
> > > > > > The memory and CPUs are carved out of the primary VM, they are 
> > > > > > dedicated
> > > > > > for the enclave. The Nitro hypervisor running on the host ensures 
> > > > > > memory
> > > > > > and CPU isolation between the primary VM and the enclave VM.
> > > > > > 
> > > > > > These two components need to reflect the same state e.g. when the
> > > > > > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > > > > > terminated as well.
> > > > > > 
> > > > > > With regard to the communication channel, the primary VM has its own
> > > > > > emulated virtio-vsock PCI device. The enclave VM has its own 
> > > > > > emulated
> > > > > > virtio-vsock device as well. This channel is used, for example, to 
> > > > > > fetch
> > > > > > data in the enclave and then process it. An application that sets 
> > > > > > up the
> > > > > > vsock socket and connects or listens, depending on the use case, is 
> > > > > > then
> > > > > > developed to use this channel; this happens on both ends - primary 
> > > > > > VM
> > > > > > and enclave VM.
> > > > > > 
> > > > > > Let me know if further clarifications are needed.
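
As a concrete illustration of the channel described above, a minimal AF_VSOCK
client for one of the two ends might look like the sketch below; the CID and
port values are made up for the example, and the peer on the other end would
run the usual bind/listen/accept counterpart (binding to its own CID or
VMADDR_CID_ANY and the agreed port).

/* Minimal AF_VSOCK connect sketch; the CID and port are illustrative only. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

int main(void)
{
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid = 16,    /* example enclave CID, assumed */
        .svm_port = 9000, /* example port, assumed */
    };
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* Send a request; the application protocol on top is use-case specific. */
    const char msg[] = "hello enclave";
    write(fd, msg, sizeof(msg));
    close(fd);
    return 0;
}
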
> > > > > 
> > > > > Thanks, this is all useful.  However can you please clarify the
> > > > > low-level details here?
> > > > 
> > > Is the virtual machine manager open-source? If so, I guess a pointer to
> > > the sources would be useful.
> > > 
> > > Hi Pavel,
> > > 
> > > Thanks for reaching out.
> > > 
> > > The VMM that is used for the primary / parent VM is not open source.
> > 
> > Do we want to merge code that the open source community cannot test?
> 
> Hehe.. this isn't quite the story Pavel :)
> 
> We merge support for proprietary hypervisors, this is no different. You
> can test it, well at least you'll be able to ... when AWS deploys the
> functionality. You don't need the hypervisor itself to be open source.
> 
> In fact, in this case, it's not even low level invasive arch code like
> some of the above can be. It's a driver for a PCI device :-) Granted a
> virtual one. We merge drivers for PCI devices routinely without the RTL
> or firmware of those devices being open source.
> 
> So yes, we probably want this if it's going to be a useful feature for
> users when running on AWS EC2. (Disclaimer: I work for AWS these days).

I agree that the VMM does not need to be open source.

What is missing, though, are details of the enclave's initial state and
the image format required to boot code. Until this documentation is
available, only Amazon can write a userspace application that does
anything useful with this driver.

Some of the people from Amazon are long-time Linux contributors (such as
yourself!) and the intent to publish this information has been
expressed, so I'm sure that will be done.

Until then, it's cool but no one else can play with it.

Stefan




Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-11 Thread Paraschiv, Andra-Irina



On 10/05/2020 12:57, Li Qiang wrote:



Paraschiv, Andra-Irina wrote on Fri, Apr 24, 2020 at 10:03 PM:




On 24/04/2020 12:59, Tian, Kevin wrote:
>
>> From: Paraschiv, Andra-Irina
>> Sent: Thursday, April 23, 2020 9:20 PM
>>
>> On 22/04/2020 00:46, Paolo Bonzini wrote:
>>> On 21/04/20 20:41, Andra Paraschiv wrote:
>>>> An enclave communicates with the primary VM via a local communication
>>>> channel, using virtio-vsock [2]. An enclave does not have a disk or a
>>>> network device attached.
>>> Is it possible to have a sample of this in the samples/ directory?
>> I can add in v2 a sample file including the basic flow of how to use the
>> ioctl interface to create / terminate an enclave.
>>
>> Then we can update / build on top of it based on the ongoing discussions on
>> the patch series and the received feedback.
>>
>>> I am interested especially in:
>>>
>>> - the initial CPU state: CPL0 vs. CPL3, initial program counter, etc.
>>>
>>> - the communication channel; does the enclave see the usual local APIC
>>> and IOAPIC interfaces in order to get interrupts from virtio-vsock, and
>>> where is the virtio-vsock device (virtio-mmio I suppose) placed in memory?
>>>
>>> - what the enclave is allowed to do: can it change privilege levels,
>>> what happens if the enclave performs an access to nonexistent memory, etc.
>>> - whether there are special hypercall interfaces for the enclave
>> An enclave is a VM running on the same host as the primary VM that
>> launched the enclave. They are siblings.
>>
>> Here we need to think of two components:
>>
>> 1. An enclave abstraction process - a process running in the primary VM
>> guest, that uses the provided ioctl interface of the Nitro Enclaves
>> kernel driver to spawn an enclave VM (that's 2 below).
>>
>> How does all this get to an enclave VM running on the host?
>>
>> There is a Nitro Enclaves emulated PCI device exposed to the primary VM.
>> The driver for this new PCI device is included in the current patch series.
>>
>> The ioctl logic is mapped to PCI device commands e.g. the
>> NE_ENCLAVE_START ioctl maps to an enclave start PCI command or the
>> KVM_SET_USER_MEMORY_REGION maps to an add memory PCI command. The PCI
>> device commands are then translated into actions taken on the hypervisor
>> side; that's the Nitro hypervisor running on the host where the primary
>> VM is running.
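
To make the flow above concrete, a rough user space sketch is below. The
device node name and the two placeholder ioctl request codes are assumptions
made only for illustration - the text above only names NE_ENCLAVE_START and
KVM_SET_USER_MEMORY_REGION - and error handling is trimmed.

/*
 * Illustrative sketch only: the device node and the placeholder request
 * codes below are assumptions, not the driver's actual definitions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/kvm.h>

#define NE_CREATE_VM     _IO(0xAE, 0x20) /* placeholder request code */
#define NE_ENCLAVE_START _IO(0xAE, 0x21) /* placeholder request code */

int main(void)
{
    int ne_fd = open("/dev/nitro_enclaves", O_RDWR | O_CLOEXEC); /* assumed node */
    if (ne_fd < 0) {
        perror("open");
        return 1;
    }

    int vm_fd = ioctl(ne_fd, NE_CREATE_VM, 0); /* hypothetical "create enclave VM" */
    if (vm_fd < 0) {
        perror("NE_CREATE_VM");
        return 1;
    }

    /* Donate memory from the primary VM; per the description above this is
     * turned into an "add memory" PCI command handled by the hypervisor. */
    size_t len = 2UL << 20;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0,
        .memory_size = len,
        .userspace_addr = (unsigned long)mem,
    };
    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        perror("KVM_SET_USER_MEMORY_REGION");

    /* ... load the enclave image into the donated memory ... */

    if (ioctl(vm_fd, NE_ENCLAVE_START, 0) < 0) /* maps to the enclave start PCI command */
        perror("NE_ENCLAVE_START");

    close(vm_fd);
    close(ne_fd);
    return 0;
}
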
>>
>> 2. The enclave itself - a VM running on the same host as the primary VM
>> that spawned it.
>>
>> The enclave VM has no persistent storage or network interface attached,
>> it uses its own memory and CPUs + its virtio-vsock emulated device for
>> communication with the primary VM.
> sounds like a firecracker VM?

It's a VM crafted for enclave needs.

>
>> The memory and CPUs are carved out of the primary VM, they are dedicated
>> for the enclave. The Nitro hypervisor running on the host ensures memory
>> and CPU isolation between the primary VM and the enclave VM.
> In the last paragraph, you said that the enclave VM uses its own memory and
> CPUs. Then here, you said the memory/CPUs are carved out of the primary VM
> and dedicated. Can you elaborate on which one is accurate? Or is it a mixed
> model?

Memory and CPUs are carved out of the primary VM and are dedicated for
the enclave VM. I mentioned them above as "its own" in the sense that the
primary VM doesn't use these carved out resources while the enclave is
running, as they are dedicated to the enclave.

Hope that makes it clearer now.

>
>>
>> These two components need to reflect the same state e.g. when the
>> enclave abstraction process (1) is terminated, the enclave VM (2) is
>> terminated as well.
>>
>> With regard to the communication channel, the primary VM has its own
>> emulated virtio-vsock PCI device. The enclave VM has its own emulated
>> virtio-vsock device as well. This channel is used, for example, to fetch
>> data in the enclave and then process it. An application that sets up the
>> vsock socket and connects or listens, depending on the use case, is then
>> developed to use this channel; this happens on both ends - primary VM
>> and enclave VM.
> How does the application in the primary VM assign tasks to be executed
> in the enclave VM? I didn't see such a command in this series, so I suppose
> it is also communicated through virtio-vsock?

The application that runs in the enclave needs to be packaged in an
enclave image together with the OS (e.g. kernel, ramdisk, init) that
will run in the enclave 

Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-11 Thread Paraschiv, Andra-Irina



On 10/05/2020 14:02, Herrenschmidt, Benjamin wrote:

On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:

On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:


On 07/05/2020 20:44, Pavel Machek wrote:

Hi!


it uses its own memory and CPUs + its virtio-vsock emulated device for
communication with the primary VM.

The memory and CPUs are carved out of the primary VM, they are dedicated
for the enclave. The Nitro hypervisor running on the host ensures memory
and CPU isolation between the primary VM and the enclave VM.

These two components need to reflect the same state e.g. when the
enclave abstraction process (1) is terminated, the enclave VM (2) is
terminated as well.

With regard to the communication channel, the primary VM has its own
emulated virtio-vsock PCI device. The enclave VM has its own emulated
virtio-vsock device as well. This channel is used, for example, to fetch
data in the enclave and then process it. An application that sets up the
vsock socket and connects or listens, depending on the use case, is then
developed to use this channel; this happens on both ends - primary VM
and enclave VM.

Let me know if further clarifications are needed.

Thanks, this is all useful.  However can you please clarify the
low-level details here?

Is the virtual machine manager open-source? If so, I guess a pointer to the
sources would be useful.

Hi Pavel,

Thanks for reaching out.

The VMM that is used for the primary / parent VM is not open source.

Do we want to merge code that the open source community cannot test?

Hehe.. this isn't quite the story Pavel :)

We merge support for proprietary hypervisors, this is no different. You
can test it, well at least you'll be able to ... when AWS deploys the
functionality. You don't need the hypervisor itself to be open source.

In fact, in this case, it's not even low level invasive arch code like
some of the above can be. It's a driver for a PCI device :-) Granted a
virtual one. We merge drivers for PCI devices routinely without the RTL
or firmware of those devices being open source.

So yes, we probably want this if it's going to be a useful feature for
users when running on AWS EC2. (Disclaimer: I work for AWS these days).


Indeed, it will be available for checking out how it works.

The discussions are ongoing here on LKML - understanding the 
context, clarifying items, sharing feedback, and coming up with codebase 
updates and a basic example flow of the ioctl interface usage. This all 
helps with the path towards merging.


Thanks, Ben, for the follow-up.

Andra






Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-10 Thread Herrenschmidt, Benjamin
On Sat, 2020-05-09 at 21:21 +0200, Pavel Machek wrote:
> 
> On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
> > 
> > 
> > On 07/05/2020 20:44, Pavel Machek wrote:
> > > 
> > > Hi!
> > > 
> > > > > it uses its own memory and CPUs + its virtio-vsock emulated device for
> > > > > communication with the primary VM.
> > > > > 
> > > > > The memory and CPUs are carved out of the primary VM, they are 
> > > > > dedicated
> > > > > for the enclave. The Nitro hypervisor running on the host ensures 
> > > > > memory
> > > > > and CPU isolation between the primary VM and the enclave VM.
> > > > > 
> > > > > These two components need to reflect the same state e.g. when the
> > > > > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > > > > terminated as well.
> > > > > 
> > > > > With regard to the communication channel, the primary VM has its own
> > > > > emulated virtio-vsock PCI device. The enclave VM has its own emulated
> > > > > virtio-vsock device as well. This channel is used, for example, to 
> > > > > fetch
> > > > > data in the enclave and then process it. An application that sets up 
> > > > > the
> > > > > vsock socket and connects or listens, depending on the use case, is 
> > > > > then
> > > > > developed to use this channel; this happens on both ends - primary VM
> > > > > and enclave VM.
> > > > > 
> > > > > Let me know if further clarifications are needed.
> > > > 
> > > > Thanks, this is all useful.  However can you please clarify the
> > > > low-level details here?
> > > 
> > > Is the virtual machine manager open-source? If so, I guess a pointer to
> > > the sources would be useful.
> > 
> > Hi Pavel,
> > 
> > Thanks for reaching out.
> > 
> > The VMM that is used for the primary / parent VM is not open source.
> 
> Do we want to merge code that the open source community cannot test?

Hehe.. this isn't quite the story Pavel :)

We merge support for proprietary hypervisors, this is no different. You
can test it, well at least you'll be able to ... when AWS deploys the
functionality. You don't need the hypervisor itself to be open source.

In fact, in this case, it's not even low level invasive arch code like
some of the above can be. It's a driver for a PCI device :-) Granted a
virtual one. We merge drivers for PCI devices routinely without the RTL
or firmware of those devices being open source.

> So yes, we probably want this if it's going to be a useful feature for
> users when running on AWS EC2. (Disclaimer: I work for AWS these days).

Cheers,
Ben.



Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-09 Thread Pavel Machek
On Fri 2020-05-08 10:00:27, Paraschiv, Andra-Irina wrote:
> 
> 
> On 07/05/2020 20:44, Pavel Machek wrote:
> >
> >Hi!
> >
> >>>it uses its own memory and CPUs + its virtio-vsock emulated device for
> >>>communication with the primary VM.
> >>>
> >>>The memory and CPUs are carved out of the primary VM, they are dedicated
> >>>for the enclave. The Nitro hypervisor running on the host ensures memory
> >>>and CPU isolation between the primary VM and the enclave VM.
> >>>
> >>>These two components need to reflect the same state e.g. when the
> >>>enclave abstraction process (1) is terminated, the enclave VM (2) is
> >>>terminated as well.
> >>>
> >>>With regard to the communication channel, the primary VM has its own
> >>>emulated virtio-vsock PCI device. The enclave VM has its own emulated
> >>>virtio-vsock device as well. This channel is used, for example, to fetch
> >>>data in the enclave and then process it. An application that sets up the
> >>>vsock socket and connects or listens, depending on the use case, is then
> >>>developed to use this channel; this happens on both ends - primary VM
> >>>and enclave VM.
> >>>
> >>>Let me know if further clarifications are needed.
> >>Thanks, this is all useful.  However can you please clarify the
> >>low-level details here?
> >Is the virtual machine manager open-source? If so, I guess a pointer to
> >the sources would be useful.
> 
> Hi Pavel,
> 
> Thanks for reaching out.
> 
> The VMM that is used for the primary / parent VM is not open source.

Do we want to merge code that the open source community cannot test?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-08 Thread Paraschiv, Andra-Irina



On 07/05/2020 20:44, Pavel Machek wrote:


Hi!


it uses its own memory and CPUs + its virtio-vsock emulated device for
communication with the primary VM.

The memory and CPUs are carved out of the primary VM, they are dedicated
for the enclave. The Nitro hypervisor running on the host ensures memory
and CPU isolation between the primary VM and the enclave VM.

These two components need to reflect the same state e.g. when the
enclave abstraction process (1) is terminated, the enclave VM (2) is
terminated as well.

With regard to the communication channel, the primary VM has its own
emulated virtio-vsock PCI device. The enclave VM has its own emulated
virtio-vsock device as well. This channel is used, for example, to fetch
data in the enclave and then process it. An application that sets up the
vsock socket and connects or listens, depending on the use case, is then
developed to use this channel; this happens on both ends - primary VM
and enclave VM.

Let me know if further clarifications are needed.

Thanks, this is all useful.  However can you please clarify the
low-level details here?

Is the virtual machine manager open-source? If so, I guess a pointer to the
sources would be useful.


Hi Pavel,

Thanks for reaching out.

The VMM that is used for the primary / parent VM is not open source.

Andra






Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-05-07 Thread Pavel Machek
Hi!

> > it uses its own memory and CPUs + its virtio-vsock emulated device for
> > communication with the primary VM.
> > 
> > The memory and CPUs are carved out of the primary VM, they are dedicated
> > for the enclave. The Nitro hypervisor running on the host ensures memory
> > and CPU isolation between the primary VM and the enclave VM.
> > 
> > These two components need to reflect the same state e.g. when the
> > enclave abstraction process (1) is terminated, the enclave VM (2) is
> > terminated as well.
> > 
> > With regard to the communication channel, the primary VM has its own
> > emulated virtio-vsock PCI device. The enclave VM has its own emulated
> > virtio-vsock device as well. This channel is used, for example, to fetch
> > data in the enclave and then process it. An application that sets up the
> > vsock socket and connects or listens, depending on the use case, is then
> > developed to use this channel; this happens on both ends - primary VM
> > and enclave VM.
> > 
> > Let me know if further clarifications are needed.
> 
> Thanks, this is all useful.  However can you please clarify the
> low-level details here?

Is the virtual machine manager open-source? If so, I guess a pointer to the
sources would be useful.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Paraschiv, Andra-Irina



On 29/04/2020 16:20, Paolo Bonzini wrote:

On 28/04/20 17:07, Alexander Graf wrote:

So why not just start running the enclave at 0xfff0 in real mode?
Yes everybody hates it, but that's what OSes are written against.  In
the simplest example, the parent enclave can load bzImage and initrd at
0x1 and place firmware tables (MPTable and DMI) somewhere at
0xf; the firmware would just be a few movs to segment registers
followed by a long jmp.

There is a bit of initial attestation flow in the enclave, so that
you can be sure that the code that is running is actually what you wanted to
run.

Can you explain this, since it's not documented?


Hash values are computed for the entire enclave image (EIF), the kernel 
and ramdisk(s). That's used, for example, to check that the enclave image 
that is loaded in the enclave VM is the one that was intended to be run.


These crypto measurements are included in a signed attestation document 
generated by the Nitro Hypervisor and further used to prove the identity 
of the enclave. KMS is an example of a service that NE is integrated with 
and that checks the attestation doc.
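
As a rough sketch of how a measurement over one of those blobs can be computed
on the user space side, see below; SHA-384 and the whole-file hashing are
assumptions made for illustration, since the actual measurement scheme is
defined by the Nitro Hypervisor, not by this snippet. Build with -lcrypto.

/* Measure one blob (EIF, kernel or ramdisk) with OpenSSL; sketch only. */
#include <openssl/sha.h>
#include <stdio.h>
#include <stdlib.h>

static int measure_blob(const char *path, unsigned char digest[SHA384_DIGEST_LENGTH])
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);

    unsigned char *buf = malloc(len);
    if (!buf || fread(buf, 1, len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return -1;
    }

    /* One digest per blob; these are the kinds of values that would show up
     * in an attestation document. */
    SHA384(buf, len, digest);

    free(buf);
    fclose(f);
    return 0;
}
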





   vm = ne_create(vcpus = 4)
   ne_set_memory(vm, hva, len)
   ne_load_image(vm, addr, len)
   ne_start(vm)

That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
would only be available in the time window between set_memory and start.
It basically implements a memcpy(), but it would completely hide the
hidden semantics of where an EIF has to go, so future device versions
(or even other enclave implementers) could change the logic.

I think it also makes sense to just allocate those 4 ioctls from
scratch. Paolo, would you still want to "donate" KVM ioctl space in that
case?

Sure, that's not a problem.


Ok, thanks for the confirmation. I've updated the ioctl number documentation 
to reflect the ioctl space update, taking into account the previous 
discussion; and now also the proposal above from Alex, the discussions 
we currently have, and further easy extensibility of the user space 
interface.


Thanks,
Andra


Overall, the above should address most of the concerns you raised in
this mail, right? It still requires copying, but at least we don't have
to keep the copy in kernel space.







Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Alexander Graf



On 30.04.20 13:58, Paolo Bonzini wrote:


On 30/04/20 13:47, Alexander Graf wrote:


So the issue would be that a firmware image provided by the parent could
be tampered with by something malicious running in the parent enclave?


You have to have a root of trust somewhere. That root then checks and
attests everything it runs. What exactly would you attest for with a
flat address space model?

So the issue is that the enclave code can not trust its own integrity if
it doesn't have anything at a higher level attesting it. The way this is
usually solved on bare metal systems is that you trust your CPU which
then checks the firmware integrity (Boot Guard). Where would you put
that check in a VM model?


In the enclave device driver, I would just limit the attestation to the
firmware image

So yeah it wouldn't be a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xff0.  You would still need
"load image" functionality.


How close would it be to a normal VM then? And
if it's not, what's the point of sticking to such terrible legacy boot
paths?


The point is that there's already two plausible loaders for the kernel
(bzImage and ELF), so I'd like to decouple the loader and the image.


The loader is implemented by the enclave device. If it wishes to support 
bzImage and ELF it does that. Today, it only does bzImage though IIRC :).


So yes, they are decoupled? Are you saying you would like to build your 
own code in any way you like? Well, that means we either need to add 
support for another loader in the enclave device or your workload just 
fakes a bzImage header and gets loaded regardless :).



Alex







Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Paolo Bonzini
On 30/04/20 13:47, Alexander Graf wrote:
>>
>> So the issue would be that a firmware image provided by the parent could
>> be tampered with by something malicious running in the parent enclave?
> 
> You have to have a root of trust somewhere. That root then checks and
> attests everything it runs. What exactly would you attest for with a
> flat address space model?
> 
> So the issue is that the enclave code can not trust its own integrity if
> it doesn't have anything at a higher level attesting it. The way this is
> usually solved on bare metal systems is that you trust your CPU which
> then checks the firmware integrity (Boot Guard). Where would you put
> that check in a VM model?

In the enclave device driver, I would just limit the attestation to the
firmware image

So yeah it wouldn't be a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xff0.  You would still need
"load image" functionality.

> How close would it be to a normal VM then? And
> if it's not, what's the point of sticking to such terrible legacy boot
> paths?

The point is that there's already two plausible loaders for the kernel
(bzImage and ELF), so I'd like to decouple the loader and the image.

Paolo



Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Alexander Graf



On 30.04.20 13:38, Paolo Bonzini wrote:


On 30/04/20 13:21, Alexander Graf wrote:

Also, would you consider a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xff0?


Consider, sure. But I don't quite see any big benefit just yet. The
current abstraction level for the booted payloads is much higher. That
allows us to simplify the device model dramatically: There is no need to
create a virtual flash region for example.


It doesn't have to be flash, it can be just ROM.


In addition, by moving firmware into the trusted base, firmware can
execute validation of the target image. If you make it all flat, how do
you verify whether what you're booting is what you think you're booting?


So the issue would be that a firmware image provided by the parent could
be tampered with by something malicious running in the parent enclave?


You have to have a root of trust somewhere. That root then checks and 
attests everything it runs. What exactly would you attest for with a 
flat address space model?


So the issue is that the enclave code can not trust its own integrity if 
it doesn't have anything at a higher level attesting it. The way this is 
usually solved on bare metal systems is that you trust your CPU which 
then checks the firmware integrity (Boot Guard). Where would you put 
that check in a VM model? How close would it be to a normal VM then? And 
if it's not, what's the point of sticking to such terrible legacy boot 
paths?



Alex







Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Paolo Bonzini
On 30/04/20 13:21, Alexander Graf wrote:
>> Also, would you consider a mode where ne_load_image is not invoked and
>> the enclave starts in real mode at 0xff0?
> 
> Consider, sure. But I don't quite see any big benefit just yet. The
> current abstraction level for the booted payloads is much higher. That
> allows us to simplify the device model dramatically: There is no need to
> create a virtual flash region for example.

It doesn't have to be flash, it can be just ROM.

> In addition, by moving firmware into the trusted base, firmware can
> execute validation of the target image. If you make it all flat, how do
> you verify whether what you're booting is what you think you're booting?

So the issue would be that a firmware image provided by the parent could
be tampered with by something malicious running in the parent enclave?

Paolo

> So in a nutshell, for a PV virtual machine spawning interface, I think
> it would make sense to have memory fully owned by the parent. In the
> enclave world, I would rather not like to give the parent too much
> control over what memory actually means, outside of donating a bucket of
> it.



Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Alexander Graf



On 30.04.20 12:34, Paolo Bonzini wrote:


On 28/04/20 17:07, Alexander Graf wrote:


Why don't we build something like the following instead?

   vm = ne_create(vcpus = 4)
   ne_set_memory(vm, hva, len)
   ne_load_image(vm, addr, len)
   ne_start(vm)

That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
would only be available in the time window between set_memory and start.
It basically implements a memcpy(), but it would completely hide the
hidden semantics of where an EIF has to go, so future device versions
(or even other enclave implementers) could change the logic.


Can we add a file format argument and flags to ne_load_image, to avoid
having a v2 ioctl later?


I think flags alone should be enough, no? A new format would just be a flag.

That said, any of the commands above should have flags IMHO.
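
Purely as an illustration of "commands with flags", the argument structs could
be shaped roughly as below; every name, layout and request code here is
invented for the example and is not the actual interface.

#include <linux/ioctl.h>
#include <linux/types.h>

struct ne_image_load {
    __u64 addr;  /* user space address of the image blob */
    __u64 len;
    __u64 flags; /* e.g. bit 0 = EIF; a future format is just another flag */
};

struct ne_memory_region {
    __u64 userspace_addr;
    __u64 len;
    __u64 flags;
};

#define NE_MAGIC      0xAE /* placeholder magic */
#define NE_CREATE_VM  _IOW(NE_MAGIC, 0x20, __u64)                  /* vcpu count */
#define NE_SET_MEMORY _IOW(NE_MAGIC, 0x21, struct ne_memory_region)
#define NE_LOAD_IMAGE _IOW(NE_MAGIC, 0x22, struct ne_image_load)   /* only valid between set_memory and start */
#define NE_START      _IOW(NE_MAGIC, 0x23, __u64)                  /* flags */
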


Also, would you consider a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xff0?


Consider, sure. But I don't quite see any big benefit just yet. The 
current abstraction level for the booted payloads is much higher. That 
allows us to simplify the device model dramatically: There is no need to 
create a virtual flash region for example.


In addition, by moving firmware into the trusted base, firmware can 
execute validation of the target image. If you make it all flat, how do 
you verify whether what you're booting is what you think you're booting?


So in a nutshell, for a PV virtual machine spawning interface, I think 
it would make sense to have memory fully owned by the parent. In the 
enclave world, I would rather not like to give the parent too much 
control over what memory actually means, outside of donating a bucket of it.



Alex







Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-30 Thread Paolo Bonzini
On 28/04/20 17:07, Alexander Graf wrote:
> 
> Why don't we build something like the following instead?
> 
>   vm = ne_create(vcpus = 4)
>   ne_set_memory(vm, hva, len)
>   ne_load_image(vm, addr, len)
>   ne_start(vm)
> 
> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
> would only be available in the time window between set_memory and start.
> It basically implements a memcpy(), but it would completely hide the
> hidden semantics of where an EIF has to go, so future device versions
> (or even other enclave implementers) could change the logic.

Can we add a file format argument and flags to ne_load_image, to avoid
having a v2 ioctl later?

Also, would you consider a mode where ne_load_image is not invoked and
the enclave starts in real mode at 0xff0?

Thanks,

Paolo



Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-29 Thread Paolo Bonzini
On 28/04/20 17:07, Alexander Graf wrote:
>> So why not just start running the enclave at 0xfff0 in real mode?
>> Yes everybody hates it, but that's what OSes are written against.  In
>> the simplest example, the parent enclave can load bzImage and initrd at
>> 0x1 and place firmware tables (MPTable and DMI) somewhere at
>> 0xf; the firmware would just be a few movs to segment registers
>> followed by a long jmp.
> 
> There is a bit of initial attestation flow in the enclave, so that
> you can be sure that the code that is running is actually what you wanted to
> run.

Can you explain this, since it's not documented?

>   vm = ne_create(vcpus = 4)
>   ne_set_memory(vm, hva, len)
>   ne_load_image(vm, addr, len)
>   ne_start(vm)
> 
> That way we would get the EIF loading into kernel space. "LOAD_IMAGE"
> would only be available in the time window between set_memory and start.
> It basically implements a memcpy(), but it would completely hide the
> hidden semantics of where an EIF has to go, so future device versions
> (or even other enclave implementers) could change the logic.
> 
> I think it also makes sense to just allocate those 4 ioctls from
> scratch. Paolo, would you still want to "donate" KVM ioctl space in that
> case?

Sure, that's not a problem.

Paolo

> Overall, the above should address most of the concerns you raised in
> this mail, right? It still requires copying, but at least we don't have
> to keep the copy in kernel space.



Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-28 Thread Liran Alon



On 28/04/2020 18:25, Alexander Graf wrote:



On 27.04.20 13:44, Liran Alon wrote:


On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:


On 25/04/2020 18:25, Liran Alon wrote:


On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:


The memory and CPUs are carved out of the primary VM, they are
dedicated for the enclave. The Nitro hypervisor running on the host
ensures memory and CPU isolation between the primary VM and the
enclave VM.

I hope you properly take into consideration Hyper-Threading
speculative side-channel vulnerabilities here.
i.e. Usually cloud providers designate each CPU core to be assigned
to run only vCPUs of specific guest. To avoid sharing a single CPU
core between multiple guests.
To handle this properly, you need to use some kind of core-scheduling
mechanism (Such that each CPU core either runs only vCPUs of enclave
or only vCPUs of primary VM at any given point in time).

In addition, can you elaborate more on how the enclave memory is
carved out of the primary VM?
Does this involve performing a memory hot-unplug operation from
primary VM or just unmap enclave-assigned guest physical pages from
primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?


Correct, we take into consideration the HT setup. The enclave gets
dedicated physical cores. The primary VM and the enclave VM don't run
on CPU siblings of a physical core.
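
For reference, the logical-CPU-to-core mapping that such a carve-out has to
respect is visible through the standard sysfs topology interface; the sketch
below just prints the HT sibling lists (how the Nitro components actually
select the dedicated cores is not described here).

/* Print which logical CPUs are HT siblings of the same physical core. */
#include <stdio.h>

int main(void)
{
    char path[128], line[256];

    for (int cpu = 0; cpu < 8; cpu++) { /* example: first 8 logical CPUs */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        if (fgets(line, sizeof(line), f))
            /* e.g. "0,4": logical CPUs 0 and 4 share one physical core */
            printf("cpu%d siblings: %s", cpu, line);
        fclose(f);
    }
    return 0;
}
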

The way I would imagine this to work is that Primary-VM just specifies
how many vCPUs the Enclave-VM will have and those vCPUs will be set with
affinity to run on same physical CPU cores as Primary-VM.
But with the exception that scheduler is modified to not run vCPUs of
Primary-VM and Enclave-VM as sibling on the same physical CPU core
(core-scheduling). i.e. This is different than primary-VM losing
physical CPU cores permanently as long as the Enclave-VM is running.
Or maybe this should even be controlled by a knob in virtual PCI device
interface to allow flexibility to customer to decide if Enclave-VM needs
dedicated CPU cores or is it ok to share them with Primary-VM
as long as core-scheduling is used to guarantee proper isolation.


Running both parent and enclave on the same core can *potentially* 
lead to L2 cache leakage, so we decided not to go with it :).

Haven't thought about the L2 cache. Makes sense. Ack.




Regarding the memory carve out, the logic includes page table entries
handling.

As I thought. Thanks for the confirmation.


IIRC, memory hot-unplug can be used for the memory blocks that were
previously hot-plugged.

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html





I don't quite understand why Enclave VM needs to be
provisioned/teardown during primary VM's runtime.

For example, an alternative could have been to just provision both
primary VM and Enclave VM on primary VM startup.
Then, wait for primary VM to setup a communication channel with
Enclave VM (E.g. via virtio-vsock).
Then, primary VM is free to request Enclave VM to perform various
tasks when required on the isolated environment.

Such setup will mimic a common Enclave setup. Such as Microsoft
Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
similar to TEEs running on ARM TrustZone.
i.e. In my alternative proposed solution, the Enclave VM is similar
to VTL1/TrustZone.
It will also avoid requiring introducing a new PCI device and driver.


True, this can be another option, to provision the primary VM and the
enclave VM at launch time.

In the proposed setup, the primary VM starts with the initial
allocated resources (memory, CPUs). The launch path of the enclave VM,
as it's spawned on the same host, is done via the ioctl interface -
PCI device - host hypervisor path. Short-running or long-running
enclave can be bootstrapped during primary VM lifetime. Depending on
the use case, a custom set of resources (memory and CPUs) is set for
an enclave and then given back when the enclave is terminated; these
resources can be used for another enclave spawned later on or the
primary VM tasks.


Yes, I already understood this is how the mechanism works. I'm
questioning whether this is indeed a good approach that should also be
taken by upstream.


I thought the point of Linux was to support devices that exist, rather 
than change the way the world works around it? ;)

I agree. Just poking around to see if upstream wants to implement a 
different approach for Enclaves, regardless of accepting the Nitro 
Enclave virtual PCI driver for the AWS use-case, of course.



The use-case of using Nitro Enclaves is for a Confidential-Computing
service. i.e. The ability to provision a compute instance that can be
trusted to perform a bunch of computation on sensitive
information with high confidence that it cannot be compromised as it's
highly isolated. Some technologies such as Intel SGX and AMD SEV
attempted to achieve this even with guarantees 

Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-28 Thread Alexander Graf



On 27.04.20 13:44, Liran Alon wrote:


On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:


On 25/04/2020 18:25, Liran Alon wrote:


On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:


The memory and CPUs are carved out of the primary VM, they are
dedicated for the enclave. The Nitro hypervisor running on the host
ensures memory and CPU isolation between the primary VM and the
enclave VM.

I hope you properly take into consideration Hyper-Threading
speculative side-channel vulnerabilities here.
i.e. Usually cloud providers designate each CPU core to be assigned
to run only vCPUs of specific guest. To avoid sharing a single CPU
core between multiple guests.
To handle this properly, you need to use some kind of core-scheduling
mechanism (Such that each CPU core either runs only vCPUs of enclave
or only vCPUs of primary VM at any given point in time).

In addition, can you elaborate more on how the enclave memory is
carved out of the primary VM?
Does this involve performing a memory hot-unplug operation from
primary VM or just unmap enclave-assigned guest physical pages from
primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?


Correct, we take into consideration the HT setup. The enclave gets
dedicated physical cores. The primary VM and the enclave VM don't run
on CPU siblings of a physical core.

The way I would imagine this to work is that Primary-VM just specifies
how many vCPUs the Enclave-VM will have and those vCPUs will be set with
affinity to run on same physical CPU cores as Primary-VM.
But with the exception that scheduler is modified to not run vCPUs of
Primary-VM and Enclave-VM as sibling on the same physical CPU core
(core-scheduling). i.e. This is different than primary-VM losing
physical CPU cores permanently as long as the Enclave-VM is running.
Or maybe this should even be controlled by a knob in virtual PCI device
interface to allow flexibility to customer to decide if Enclave-VM needs
dedicated CPU cores or is it ok to share them with Primary-VM
as long as core-scheduling is used to guarantee proper isolation.


Running both parent and enclave on the same core can *potentially* lead 
to L2 cache leakage, so we decided not to go with it :).




Regarding the memory carve out, the logic includes page table entries
handling.

As I thought. Thanks for the confirmation.


IIRC, memory hot-unplug can be used for the memory blocks that were
previously hot-plugged.

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html





I don't quite understand why Enclave VM needs to be
provisioned/teardown during primary VM's runtime.

For example, an alternative could have been to just provision both
primary VM and Enclave VM on primary VM startup.
Then, wait for primary VM to setup a communication channel with
Enclave VM (E.g. via virtio-vsock).
Then, primary VM is free to request Enclave VM to perform various
tasks when required on the isolated environment.

Such setup will mimic a common Enclave setup. Such as Microsoft
Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
similar to TEEs running on ARM TrustZone.
i.e. In my alternative proposed solution, the Enclave VM is similar
to VTL1/TrustZone.
It will also avoid requiring introducing a new PCI device and driver.


True, this can be another option, to provision the primary VM and the
enclave VM at launch time.

In the proposed setup, the primary VM starts with the initial
allocated resources (memory, CPUs). The launch path of the enclave VM,
as it's spawned on the same host, is done via the ioctl interface -
PCI device - host hypervisor path. Short-running or long-running
enclave can be bootstrapped during primary VM lifetime. Depending on
the use case, a custom set of resources (memory and CPUs) is set for
an enclave and then given back when the enclave is terminated; these
resources can be used for another enclave spawned later on or the
primary VM tasks.


Yes, I already understood this is how the mechanism works. I'm
questioning whether this is indeed a good approach that should also be
taken by upstream.


I thought the point of Linux was to support devices that exist, rather 
than change the way the world works around it? ;)



The use-case of using Nitro Enclaves is for a Confidential-Computing
service. i.e. The ability to provision a compute instance that can be
trusted to perform a bunch of computation on sensitive
information with high confidence that it cannot be compromised as it's
highly isolated. Some technologies such as Intel SGX and AMD SEV
attempted to achieve this even with guarantees that
the computation is isolated from the hardware and hypervisor itself.


Yeah, that worked really well, didn't it? ;)


I would have expected that for the vast majority of real customer
use-cases, the customer will provision a compute instance that runs some
confidential-computing task in 

Re: [PATCH v1 00/15] Add support for Nitro Enclaves

2020-04-28 Thread Alexander Graf



On 25.04.20 18:05, Paolo Bonzini wrote:




On 24/04/20 21:11, Alexander Graf wrote:

What I was saying above is that maybe code is easier to transfer that
than a .txt file that gets lost somewhere in the Documentation directory
:).


whynotboth.jpg :D


Uh, sure? :)

Let's first hammer out what we really want for the UABI though. Then we 
can document it.



To answer the question though, the target file is in a newly invented
file format called "EIF" and it needs to be loaded at offset 0x800000
(8 MiB) of the address space donated to the enclave.


What is this EIF?


It's just a very dumb container format that has a trivial header, a
section with the bzImage and one to many sections of initramfs.

As mentioned earlier in this thread, it really is just "-kernel" and
"-initrd", packed into a single binary for transmission to the host.


Okay, got it.  So, correct me if this is wrong, the information that is
needed to boot the enclave is:

* the kernel, in bzImage format

* the initrd


It's a single EIF file for a good reason. There are checksums in there 
and potentially signatures too, so that the enclave can attest 
itself. For the sake of the user space API, the enclave image really 
should just be considered a blob.




* a consecutive amount of memory, to be mapped with
KVM_SET_USER_MEMORY_REGION

Off list, Alex and I discussed having a struct that points to kernel and
initrd off enclave memory, and having the driver build the EIF at the
appropriate point in enclave memory (the 8 MiB offset that you mentioned).

This however has two disadvantages:

1) having the kernel and initrd loaded by the parent VM in enclave
memory has the advantage that you save memory outside the enclave memory
for something that is only needed inside the enclave

2) it is less extensible (what if you want to use PVH in the future for
example) and puts in the driver policy that should be in userspace.


So why not just start running the enclave at 0xfff0 in real mode?
Yes everybody hates it, but that's what OSes are written against.  In
the simplest example, the parent enclave can load bzImage and initrd at
0x1 and place firmware tables (MPTable and DMI) somewhere at
0xf; the firmware would just be a few movs to segment registers
followed by a long jmp.


There is a bit of initial attestation flow in the enclave, so that you 
can be sure that the code that is running is actually what you wanted to 
run.


I would also in general prefer to disconnect the notion of "enclave 
memory" as much as possible from a memory location view. User space 
shouldn't be in the business of knowing at which enclave memory position 
its donated memory ended up. By disconnecting the view of 
the memory world, we can do some more optimizations, such as compacting 
memory ranges more efficiently in kernel space.



If you want to keep EIF, we measured in QEMU that there is no measurable
difference between loading the kernel in the host and doing it in the
guest, so Amazon could provide an EIF loader stub at 0xfff0 for
backwards compatibility.


It's not about performance :).

So the other thing we discussed was whether the KVM API really turned 
out to be a good fit here. After all, today we merely call:


  * CREATE_VM
  * SET_MEMORY_RANGE
  * CREATE_VCPU
  * START_ENCLAVE

where we even butcher up CREATE_VCPU into a meaningless blob of overhead 
for no good reason.


Why don't we build something like the following instead?

  vm = ne_create(vcpus = 4)
  ne_set_memory(vm, hva, len)
  ne_load_image(vm, addr, len)
  ne_start(vm)

That way we would get the EIF loading into kernel space. "LOAD_IMAGE" 
would only be available in the time window between set_memory and start. 
It basically implements a memcpy(), but it would completely hide the 
hidden semantics of where an EIF has to go, so future device versions 
(or even other enclave implementers) could change the logic.
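
A minimal driver-side sketch of that ordering constraint could look like the
following; the state names and the struct are invented for illustration and
are not the actual implementation.

#include <linux/errno.h>

/* LOAD_IMAGE is only accepted in the window between set_memory and start. */
enum ne_vm_state {
    NE_VM_CREATED,
    NE_VM_MEMORY_SET,
    NE_VM_RUNNING,
};

struct ne_vm {
    enum ne_vm_state state;
    /* ... memory regions, vcpu count, ... */
};

static int ne_load_image(struct ne_vm *vm, const void *image, unsigned long len)
{
    if (vm->state != NE_VM_MEMORY_SET)
        return -EINVAL; /* outside the set_memory .. start window */

    /* memcpy() the blob to the (hidden) offset inside the donated memory;
     * where it goes stays a device/driver detail, so future versions can
     * change it without touching user space. */
    return 0;
}

static int ne_start(struct ne_vm *vm)
{
    if (vm->state != NE_VM_MEMORY_SET)
        return -EINVAL;
    vm->state = NE_VM_RUNNING; /* from here on, LOAD_IMAGE is rejected */
    return 0;
}
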


I think it also makes sense to just allocate those 4 ioctls from 
scratch. Paolo, would you still want to "donate" KVM ioctl space in that 
case?


Overall, the above should address most of the concerns you raised in 
this mail, right? It still requires copying, but at least we don't have 
to keep the copy in kernel space.



Alex


