On Jun 10, 2019, at 3:28 PM, Xing, Cedric <cedric.x...@intel.com> wrote:
>> From: Andy Lutomirski [mailto:l...@kernel.org]
>> Sent: Monday, June 10, 2019 12:15 PM
>>
>> On Mon, Jun 10, 2019 at 11:29 AM Xing, Cedric <cedric.x...@intel.com>
>> wrote:
>>>
>>>> From: Christopherson, Sean J
>>>> Sent: Wednesday, June 05, 2019 7:12 PM
>>>>
>>>> +/**
>>>> + * sgx_map_allowed - check vma protections against the associated
>>>> enclave page
>>>> + * @encl: an enclave
>>>> + * @start: start address of the mapping (inclusive)
>>>> + * @end: end address of the mapping (exclusive)
>>>> + * @prot: protection bits of the mapping
>>>> + *
>>>> + * Verify a userspace mapping to an enclave page would not violate
>>>> +the security
>>>> + * requirements of the *kernel*. Note, this is in no way related
>>>> +to the
>>>> + * page protections enforced by hardware via the EPCM. The EPCM
>>>> +protections
>>>> + * can be directly extended by the enclave, i.e. cannot be relied
>>>> +upon by the
>>>> + * kernel for security guarantees of any kind.
>>>> + *
>>>> + * Return:
>>>> + * 0 on success,
>>>> + * -EACCES if the mapping is disallowed
>>>> + */
>>>> +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
>>>> + unsigned long end, unsigned long prot) {
>>>> + struct sgx_encl_page *page;
>>>> + unsigned long addr;
>>>> +
>>>> + prot &= (VM_READ | VM_WRITE | VM_EXEC);
>>>> + if (!prot || !encl)
>>>> + return 0;
>>>> +
>>>> + mutex_lock(&encl->lock);
>>>> +
>>>> + for (addr = start; addr < end; addr += PAGE_SIZE) {
>>>> + page = radix_tree_lookup(&encl->page_tree, addr >>
>>>> PAGE_SHIFT);
>>>> +
>>>> + /*
>>>> + * Do not allow R|W|X to a non-existent page, or
>> protections
>>>> + * beyond those of the existing enclave page.
>>>> + */
>>>> + if (!page || (prot & ~page->prot))
>>>> + return -EACCES;
>>>
>>> In SGX2, pages will be "mapped" before being populated.
>>>
>>> Here's a brief summary for those who don't have enough background on
>> how new EPC pages could be added to a running enclave in SGX2:
>>> - There are 2 new instructions - EACCEPT and EAUG.
>>> - EAUG is used by SGX module to add (augment) a new page to an
>> existing enclave. The newly added page is *inaccessible* until the
>> enclave *accepts* it.
>>> - EACCEPT is the instruction for an enclave to accept a new page.
>>>
>>> And the s/w flow for an enclave to request new EPC pages is expected
>> to be something like the following:
>>> - The enclave issues EACCEPT at the linear address that it would
>> like a new page.
>>> - EACCEPT results in #PF, as there's no page at the linear address
>> above.
>>> - SGX module is notified about the #PF, in form of its vma->vm_ops-
>>> fault() being called by kernel.
>>> - SGX module EAUGs a new EPC page at the fault address, and resumes
>> the enclave.
>>> - EACCEPT is reattempted, and succeeds at this time.
>>
>> This seems like an odd workflow. Shouldn't the #PF return back to
>> untrusted userspace so that the untrusted user code can make its own
>> decision as to whether it wants to EAUG a page there as opposed to, say,
>> killing the enclave or waiting to keep resource usage under control?
>
> This may seem odd to some at the first glance. But if you can think of how
> static heap (pre-allocated by EADD before EINIT) works, the load parses the
> "metadata" coming with the enclave to decide the address/size of the heap,
> EADDs it, and calls it done. In the case of "dynamic" heap (allocated
> dynamically by EAUG after EINIT), the same thing applies - the loader
> determines the range of the heap, tells the SGX module about it, and calls it
> done. Everything else is the between the enclave and the SGX module.
>
> In practice, untrusted code usually doesn't know much about enclaves, just
> like it doesn't know much about the shared objects loaded into its address
> space either. Without the necessary knowledge, untrusted code usually just
> does what it is told (via o-calls, or return value from e-calls), without
> judging that's right or wrong.
>
> When it comes to #PF like what I described, of course a signal could be sent
> to the untrusted code but what would it do then? Usually it'd just come back
> asking for a page at the fault address. So we figured it'd be more efficient
> to just have the kernel EAUG at #PF.
>
> Please don't get me wrong though, as I'm not dictating what the s/w flow
> shall be. It's just going to be a choice offered to user mode. And that
> choice was planned to be offered via mprotect() - i.e. a writable vma causes
> kernel to EAUG while a non-writable vma will result in a signal (then the
> user mode could decide whether to EAUG). The key point is flexibility - as we
> want to allow all reasonable s/w flows instead of dictating one over others.
> We had similar discussions on vDSO API before. And I think you accepted my
> approach because of its flexibility. Am I right?
As long as user code can turn this off, I have no real objection. But it might
make sense to have it be more explicit — have an ioctl set up a range as
“EAUG-on-demand”.
But this is all currently irrelevant. We can argue about it when the patches
show up. :)