Re: Can we credibly make vdso_sgx_enter_enclave() pleasant to use?

2020-09-26 Thread Xing, Cedric

On 9/26/2020 12:05 PM, Andy Lutomirski wrote:

On Fri, Sep 25, 2020 at 3:29 PM Sean Christopherson
 wrote:


On Fri, Sep 25, 2020 at 01:20:03PM -0700, Andy Lutomirski wrote:

On Fri, Sep 25, 2020 at 12:09 PM Sean Christopherson
 wrote:

But where would the vDSO get memory for that little data structure?  It can't
be percpu because the current task can get preempted.  It can't be per instance
of the vDSO because a single mm/process can have multiple tasks entering an
enclave.  Per task might work, but how would the vDSO get that info?  E.g.
via a syscall, which seems like complete overkill?


The stack.


Duh.


The vDSO could, logically, do:

struct sgx_entry_state {
   unsigned long real_rbp;
   unsigned long real_rsp;
   unsigned long orig_fsbase;
};

...

   struct sgx_entry_state state;
   state.rbp = rbp;  [ hey, this is pseudocode.  the real code would be in asm.]
   state.rsp = rsp;
   state.fsbase = __rdfsbase();
   rbp = arg->rbp;

   /* set up all other regs */
   wrfsbase %rsp
   movq enclave_rsp(%rsp), %rsp


I think this is where there's a disconnect with what is being requested by the
folks writing run times.  IIUC, they want to use the untrusted runtime's stack
to pass params because it doesn't require additional memory allocations and
automagically grows as necessary (obviously to a certain limit).  I.e. forcing
the caller to provide an alternative "stack" defeats the purpose of using the
untrusted stack.


I personally find this concept rather distasteful.  Sure, it might
save a couple cycles, but it means that the enclave has hardcoded some
kind of assumption about the outside-the-enclave stack.



It's more than just a couple of cycles. It's convenience. Yes, an 
enclave may overflow the caller's stack with big allocations but those 
are rare. In more common cases less than the red zone size (128 bytes) 
are required. And we should optimize for the more common cases.


And yes again, the enclave has to assume something about the stack. But 
please note that the vDSO has to save its "context" somewhere so that it 
can switch back to it. The "context" currently is anchored at RBP so the 
enclave has to preserve it. If not RBP, the "context" has to anchor 
"something else", and we have to assume the enclave preserve that 
"something else". That said, we can't get rid of assumptions. RBP is a 
reasonable choice because it is simple without obvious side effects, 
i.e. most compilers/ABIs preserve RBP so developers don't have to pay 
extra attention to it generally.


If I were asked to opine on the API, I'd say I like the most the initial 
version with callback support. The stack parameters were easier to 
set/retrieve than struct members (requiring hand-crafted offset macros) 
in asm, and didn't need any padding. The callback was easy to use 
(non-NULL pointer) or skip (NULL pointer). Standard/unified error codes 
were easier to handle than separate error/exit_reason. Additional data 
for callback could be captured in a structure enclosing 
sgx_enclave_exception so no need to be explicitly passed (languages that 
don't support offsetof/container_of can always employ an asm wrapper). 
The current API looks confusing and overly complicated to me, even 
though it still works.



Given that RBP seems reasonably likely to be stable across enclave
executions, I suppose we could add a flag and an RSP value in the
sgx_enclave_run structure.  If set, the vDSO would swap out RSP (but
not RBP) with the provided value on entry and record the new RSP on
exit.  I don't know if this would be useful to people.



I would say, if one wants to use a different untrusted stack for calling 
the enclave, he/she could switch stack before calling vDSO. Given this 
isn't commonly required, I vote NO here.




I do think we need to add at least minimal CFI annotations no matter what we do.



Can't agree more.


Re: [PATCH v21 24/28] selftests/x86: Add a selftest for SGX

2019-07-17 Thread Xing, Cedric

On 7/13/2019 10:08 AM, Jarkko Sakkinen wrote:

Add a selftest for SGX. It is a trivial test where a simple enclave
copies one 64-bit word of memory between two memory locations given to
the enclave as arguments.

Signed-off-by: Jarkko Sakkinen 
---
  tools/testing/selftests/x86/Makefile  |  10 +
  tools/testing/selftests/x86/sgx/Makefile  |  48 ++
  tools/testing/selftests/x86/sgx/defines.h |  39 ++
  tools/testing/selftests/x86/sgx/encl.c|  20 +
  tools/testing/selftests/x86/sgx/encl.lds  |  33 ++
  .../selftests/x86/sgx/encl_bootstrap.S|  94 
  tools/testing/selftests/x86/sgx/encl_piggy.S  |  18 +
  tools/testing/selftests/x86/sgx/encl_piggy.h  |  14 +
  tools/testing/selftests/x86/sgx/main.c| 301 +++
  tools/testing/selftests/x86/sgx/sgx_call.S|  49 ++
  tools/testing/selftests/x86/sgx/sgxsign.c | 508 ++
  .../testing/selftests/x86/sgx/signing_key.pem |  39 ++
  12 files changed, 1173 insertions(+)
  create mode 100644 tools/testing/selftests/x86/sgx/Makefile
  create mode 100644 tools/testing/selftests/x86/sgx/defines.h
  create mode 100644 tools/testing/selftests/x86/sgx/encl.c
  create mode 100644 tools/testing/selftests/x86/sgx/encl.lds
  create mode 100644 tools/testing/selftests/x86/sgx/encl_bootstrap.S
  create mode 100644 tools/testing/selftests/x86/sgx/encl_piggy.S
  create mode 100644 tools/testing/selftests/x86/sgx/encl_piggy.h
  create mode 100644 tools/testing/selftests/x86/sgx/main.c
  create mode 100644 tools/testing/selftests/x86/sgx/sgx_call.S
  create mode 100644 tools/testing/selftests/x86/sgx/sgxsign.c
  create mode 100644 tools/testing/selftests/x86/sgx/signing_key.pem

diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
index fa07d526fe39..a1831406fd01 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -1,4 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
+
+SUBDIRS_64 := sgx
+
  all:
  
  include ../lib.mk

@@ -68,6 +71,13 @@ all_32: $(BINARIES_32)
  
  all_64: $(BINARIES_64)
  
+all_64: $(SUBDIRS_64)


$(SUBDIRS_64) aren't targets. No need for all_64 to depend on them.


+   @for DIR in $(SUBDIRS_64); do   \
+   BUILD_TARGET=$(OUTPUT)/$$DIR;   \
+   mkdir $$BUILD_TARGET  -p;   \
+   make OUTPUT=$$BUILD_TARGET -C $$DIR $@; \


Please use $(MAKE), otherwise command line options cannot be passed onto 
sub-makes.



+   done


The above only builds but will not run SGX tests.

Also, 'clean' target will not descend into sgx folder either.


+
  EXTRA_CLEAN := $(BINARIES_32) $(BINARIES_64)
  
  $(BINARIES_32): $(OUTPUT)/%_32: %.c

diff --git a/tools/testing/selftests/x86/sgx/Makefile 
b/tools/testing/selftests/x86/sgx/Makefile
new file mode 100644
index ..10136b73096b
--- /dev/null
+++ b/tools/testing/selftests/x86/sgx/Makefile
@@ -0,0 +1,48 @@
+top_srcdir = ../../../../..
+
+include ../../lib.mk
+
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack
+ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+  -fno-stack-protector -mrdrnd $(INCLUDES)
+
+TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+all_64: $(TEST_CUSTOM_PROGS)
+
+$(TEST_CUSTOM_PROGS): $(OUTPUT)/main.o $(OUTPUT)/sgx_call.o \
+ $(OUTPUT)/encl_piggy.o
+   $(CC) $(HOST_CFLAGS) -o $@ $^
+
+$(OUTPUT)/main.o: main.c
+   $(CC) $(HOST_CFLAGS) -c $< -o $@


.o files don't have to be generated/kept. And to be consistent with 
other selftests, please don't generate/keep them.



+
+$(OUTPUT)/sgx_call.o: sgx_call.S
+   $(CC) $(HOST_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/encl_piggy.o: $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+   $(CC) $(HOST_CFLAGS) -c encl_piggy.S -o $@


Without -I, the above command breaks when "O=" is specified.


+
+$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf $(OUTPUT)/sgxsign
+   objcopy --remove-section=.got.plt -O binary $< $@


.got.plt section will never be present for statically linked binaries.


+
+$(OUTPUT)/encl.elf: $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o
+   $(CC) $(ENCL_CFLAGS) -T encl.lds -o $@ $^


Please fix the warning of ".note.gnu.build-id section discarded".


+
+$(OUTPUT)/encl.o: encl.c
+   $(CC) $(ENCL_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/encl_bootstrap.o: encl_bootstrap.S
+   $(CC) $(ENCL_CFLAGS) -c $< -o $@
+
+$(OUTPUT)/encl.ss: $(OUTPUT)/encl.bin  $(OUTPUT)/sgxsign
+   $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+
+$(OUTPUT)/sgxsign: sgxsign.c
+   $(CC) -o $@ $< -lcrypto
+
+EXTRA_CLEAN := $(OUTPUT)/sgx-selftest $(OUTPUT)/sgx-selftest.o \
+  $(OUTPUT)/sgx_call.o $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss \
+  $(OUTPUT)/encl.elf $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o \
+  $(OUTPUT)/sgxsign
+


encl_piggy.o, main.o and test_sgx are not cleaned.


+.PHONY: clean


Re: [PATCH v21 23/28] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-07-17 Thread Xing, Cedric

On 7/13/2019 10:07 AM, Jarkko Sakkinen wrote:

From: Sean Christopherson 

Intel Software Guard Extensions (SGX) introduces a new CPL3-only enclave
mode that runs as a sort of black box shared object that is hosted by an
untrusted normal CPL3 process.

Skipping over a great deal of gory architecture details[1], SGX was
designed in such a way that the host process can utilize a library to
build, launch and run an enclave.  This is roughly analogous to how
e.g. libc implementations are used by most applications so that the
application can focus on its business logic.

The big gotcha is that because enclaves can generate *and* handle
exceptions, any SGX library must be prepared to handle nearly any
exception at any time (well, any time a thread is executing in an
enclave).  In Linux, this means the SGX library must register a
signal handler in order to intercept relevant exceptions and forward
them to the enclave (or in some cases, take action on behalf of the
enclave).  Unfortunately, Linux's signal mechanism doesn't mesh well
with libraries, e.g. signal handlers are process wide, are difficult
to chain, etc...  This becomes particularly nasty when using multiple
levels of libraries that register signal handlers, e.g. running an
enclave via cgo inside of the Go runtime.

In comes vDSO to save the day.  Now that vDSO can fixup exceptions,
add a function, __vdso_sgx_enter_enclave(), to wrap enclave transitions
and intercept any exceptions that occur when running the enclave.

__vdso_sgx_enter_enclave() does NOT adhere to the x86-64 ABI and instead
uses a custom calling convention.  The primary motivation is to avoid
issues that arise due to asynchronous enclave exits.  The x86-64 ABI
requires that EFLAGS.DF, MXCSR and FCW be preserved by the callee, and
unfortunately for the vDSO, the aformentioned registers/bits are not
restored after an asynchronous exit, e.g. EFLAGS.DF is in an unknown
state while MXCSR and FCW are reset to their init values.  So the vDSO
cannot simply pass the buck by requiring enclaves to adhere to the
x86-64 ABI.  That leaves three somewhat reasonable options:

   1) Save/restore non-volatile GPRs, MXCSR and FCW, and clear EFLAGS.DF

  + 100% compliant with the x86-64 ABI
  + Callable from any code
  + Minimal documentation required
  - Restoring MXCSR/FCW is likely unnecessary 99% of the time
  - Slow

   2) Save/restore non-volatile GPRs and clear EFLAGS.DF

  + Mostly compliant with the x86-64 ABI
  + Callable from any code that doesn't use SIMD registers
  - Need to document deviations from x86-64 ABI, i.e. MXCSR and FCW

   3) Require the caller to save/restore everything.

  + Fast
  + Userspace can pass all GPRs to the enclave (minus EAX, RBX and RCX)
  - Custom ABI
  - For all intents and purposes must be called from an assembly wrapper

__vdso_sgx_enter_enclave() implements option (3).  The custom ABI is
mostly a documentation issue, and even that is offset by the fact that
being more similar to hardware's ENCLU[EENTER/ERESUME] ABI reduces the
amount of documentation needed for the vDSO, e.g. options (2) and (3)
would need to document which registers are marshalled to/from enclaves.
Requiring an assembly wrapper imparts minimal pain on userspace as SGX
libraries and/or applications need a healthy chunk of assembly, e.g. in
the enclave, regardless of the vDSO's implementation.

Note, the C-like pseudocode describing the assembly routine is wrapped
in a non-existent macro instead of in a comment to trick kernel-doc into
auto-parsing the documentation and function prototype.  This is a double
win as the pseudocode is intended to aid kernel developers, not userland
enclave developers.

[1] Documentation/x86/sgx/1.Architecture.rst

Suggested-by: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Jarkko Sakkinen 
Cc: Dave Hansen 
Cc: Josh Triplett 
Cc: Haitao Huang 
Cc: Jethro Beekman 
Cc: Dr. Greg Wettstein 
Signed-off-by: Sean Christopherson 
Co-developed-by: Cedric Xing 
Signed-off-by: Cedric Xing 
---
  arch/x86/entry/vdso/Makefile |   2 +
  arch/x86/entry/vdso/vdso.lds.S   |   1 +
  arch/x86/entry/vdso/vsgx_enter_enclave.S | 169 +++
  arch/x86/include/uapi/asm/sgx.h  |  18 +++
  4 files changed, 190 insertions(+)
  create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 715106395c71..1ae23e7d54a9 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -24,6 +24,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)   := y
  
  # files to link into the vdso

  vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-$(VDSO64-y)  += vsgx_enter_enclave.o
  
  # files to link into kernel

  obj-y += vma.o extable.o
@@ -92,6 +93,7 @@ CFLAGS_REMOVE_vdso-note.o = -pg
  CFLAGS_REMOVE_vclock_gettime.o = -pg
  CFLAGS_REMOVE_vgetcpu.o = -pg
  CFLAGS_REMOVE_vvar.o = -pg
+CFLAGS

Re: [RFC PATCH v4 1/3] selftests/x86/sgx: Fix Makefile for SGX selftest

2019-07-13 Thread Xing, Cedric

On 7/13/2019 8:15 AM, Jarkko Sakkinen wrote:

On Sat, 2019-07-13 at 18:10 +0300, Jarkko Sakkinen wrote:

On Fri, 2019-07-12 at 23:51 -0700, Cedric Xing wrote:

The original x86/sgx/Makefile didn't work when "x86/sgx" was specified as the
test target, nor did it work with "run_tests" as the make target. Yet another
problem was that it breaks 32-bit only build. This patch fixes those problems,
along with adjustments to compiler/linker options and simplifications to the
build rules.

Signed-off-by: Cedric Xing 


You must split this in quite a few separate patches:

1. One for each fix.
2. At least one patch for each change in compiler and linker options with a
commit message clearly expalaining why the change was made.
3. One for each simplification.

We don't support 32-bit build:

config INTEL_SGX
bool "Intel SGX core functionality"
depends on X86_64 && CPU_SUP_INTEL


This is not to say that changes suck. This just in "unreviewable" state as far
as the kernel processes go...


Please note that your patchset hasn't been upstreamed yet. Your Makefile 
is problematic to begin with. Technically it's your job to make it work 
before sending out any patches. You didn't explain what's done for each 
line of Makefile in your commit message either.


Not saying documentation is unimportant, but the purposes for those 
changes are obvious and easy to understand for anyone having reasonable 
knowledge on how Makefile works.


I'm totally fine not fixing the Makefile. You can just leave them out.


/Jarkko



Re: [RFC PATCH v2 3/3] selftests/x86: Augment SGX selftest to test new __vdso_sgx_enter_enclave() and its callback interface

2019-07-13 Thread Xing, Cedric

On 7/11/2019 8:25 PM, Jarkko Sakkinen wrote:

On Tue, Apr 23, 2019 at 11:26:23PM -0700, Cedric Xing wrote:

This patch augments SGX selftest with two new tests.

The first test exercises the newly added callback interface, by marking the
whole enclave range as PROT_READ, then calling mprotect() upon #PFs to add
necessary PTE permissions per PFEC (#PF Error Code) until the enclave finishes.
This test also serves as an example to demonstrate the callback interface.

The second test single-steps through __vdso_sgx_enter_enclave() to make sure
the call stack can be unwound at every instruction within that vDSO API. Its
purpose is to validate the hand-crafted CFI directives in the assembly.

Signed-off-by: Cedric Xing 
---
  tools/testing/selftests/x86/sgx/Makefile   |   6 +-
  tools/testing/selftests/x86/sgx/main.c | 323 ++---
  tools/testing/selftests/x86/sgx/sgx_call.S |  40 ++-
  3 files changed, 322 insertions(+), 47 deletions(-)

diff --git a/tools/testing/selftests/x86/sgx/Makefile 
b/tools/testing/selftests/x86/sgx/Makefile
index 3af15d7c8644..31f937e220c4 100644
--- a/tools/testing/selftests/x86/sgx/Makefile
+++ b/tools/testing/selftests/x86/sgx/Makefile
@@ -14,16 +14,16 @@ TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
  all_64: $(TEST_CUSTOM_PROGS)
  
  $(TEST_CUSTOM_PROGS): main.c sgx_call.S $(OUTPUT)/encl_piggy.o

-   $(CC) $(HOST_CFLAGS) -o $@ $^
+   $(CC) $(HOST_CFLAGS) -o $@ $^ -lunwind -ldl -Wl,--defsym,__image_base=0 
-pie
  
  $(OUTPUT)/encl_piggy.o: encl_piggy.S $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss

$(CC) $(HOST_CFLAGS) -I$(OUTPUT) -c $< -o $@
  
  $(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf

-   objcopy --remove-section=.got.plt -O binary $< $@
+   objcopy -O binary $< $@
  
  $(OUTPUT)/encl.elf: encl.lds encl.c encl_bootstrap.S

-   $(CC) $(ENCL_CFLAGS) -T $^ -o $@
+   $(CC) $(ENCL_CFLAGS) -T $^ -o $@ -Wl,--build-id=none
  
  $(OUTPUT)/encl.ss: $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin

$^ $@
diff --git a/tools/testing/selftests/x86/sgx/main.c 
b/tools/testing/selftests/x86/sgx/main.c
index e2265f841fb0..d3e53c71306d 100644
--- a/tools/testing/selftests/x86/sgx/main.c
+++ b/tools/testing/selftests/x86/sgx/main.c
@@ -1,6 +1,7 @@
  // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
  // Copyright(c) 2016-18 Intel Corporation.
  
+#define _GNU_SOURCE

  #include 
  #include 
  #include 
@@ -9,16 +10,31 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
-#include 
+#include 
+#include 
+#include 
+
+#define UNW_LOCAL_ONLY
+#include 
+
  #include "encl_piggy.h"
  #include "defines.h"
  #include "../../../../../arch/x86/kernel/cpu/sgx/arch.h"
  #include "../../../../../arch/x86/include/uapi/asm/sgx.h"
  
-static const uint64_t MAGIC = 0x1122334455667788ULL;

+#define _Q(x)  __Q(x)
+#define __Q(x) #x
+#define ERRLN  "Line " _Q(__LINE__)
+
+#define X86_EFLAGS_TF  (1ul << 8)
+
+extern char __image_base[];
+size_t eenter;
+static size_t vdso_base;
  
  struct vdso_symtab {

Elf64_Sym *elf_symtab;
@@ -26,20 +42,11 @@ struct vdso_symtab {
Elf64_Word *elf_hashtab;
  };
  
-static void *vdso_get_base_addr(char *envp[])

+static void vdso_init(void)
  {
-   Elf64_auxv_t *auxv;
-   int i;
-
-   for (i = 0; envp[i]; i++);
-   auxv = (Elf64_auxv_t *)&envp[i + 1];
-
-   for (i = 0; auxv[i].a_type != AT_NULL; i++) {
-   if (auxv[i].a_type == AT_SYSINFO_EHDR)
-   return (void *)auxv[i].a_un.a_val;
-   }
-
-   return NULL;
+   vdso_base = getauxval(AT_SYSINFO_EHDR);
+   if (!vdso_base)
+   exit(1);
  }


The clean up makes sense but should be a separate patch i.e. one
logical change per patch. Right now the patch does other mods
than the ones explcitly stated in the commit message.

I'd suggest open coding vdso_init() to the call site in that
patch.

Please try to always minimize for diff's.

  
  static Elf64_Dyn *vdso_get_dyntab(void *addr)

@@ -66,8 +73,9 @@ static void *vdso_get_dyn(void *addr, Elf64_Dyn *dyntab, 
Elf64_Sxword tag)
return NULL;
  }
  
-static bool vdso_get_symtab(void *addr, struct vdso_symtab *symtab)

+static bool vdso_get_symtab(struct vdso_symtab *symtab)
  {
+   void *addr = (void *)vdso_base;
Elf64_Dyn *dyntab = vdso_get_dyntab(addr);
  
  	symtab->elf_symtab = vdso_get_dyn(addr, dyntab, DT_SYMTAB);

@@ -138,7 +146,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size,
base = mmap(NULL, secs->size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_SHARED, dev_fd, 0);
if (base == MAP_FAILED) {
-   perror("mmap");
+   perror(ERRLN);
return false;
}
  
@@ -224,35 +232,271 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size)

return false;
  }
  
-void sgx_call(void *rdi, void *rsi, void *tcs,

- struct sgx_enclave_exception *exception,
- void *e

Re: [RFC PATCH v2 1/3] selftests/x86: Fixed Makefile for SGX selftest

2019-07-12 Thread Xing, Cedric

On 7/11/2019 8:19 PM, Jarkko Sakkinen wrote:

On Tue, Apr 23, 2019 at 11:26:21PM -0700, Cedric Xing wrote:

The original x86/sgx/Makefile doesn't work when 'x86/sgx' is specified as the
test target. This patch fixes that problem, along with minor changes to the
dependencies between 'x86' and 'x86/sgx' in selftests/x86/Makefile.

Signed-off-by: Cedric Xing 
---
  tools/testing/selftests/x86/Makefile | 12 +++
  tools/testing/selftests/x86/sgx/Makefile | 45 +---
  2 files changed, 22 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
index 4fc9a42f56ea..1294c5f5b6ca 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -70,11 +70,11 @@ all_32: $(BINARIES_32)
  
  all_64: $(BINARIES_64)
  
-all_64: $(SUBDIRS_64)

-   @for DIR in $(SUBDIRS_64); do   \
-   BUILD_TARGET=$(OUTPUT)/$$DIR;   \
-   mkdir $$BUILD_TARGET  -p;   \
-   make OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+all_64: | $(SUBDIRS_64)
+   @for DIR in $|; do  \
+   BUILD_TARGET=$(OUTPUT)/$$DIR;   \
+   mkdir $$BUILD_TARGET  -p;   \
+   $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@;  \


This is not fix for anything. It is change in semantics. This diff
should be isolated to its own commit as you are changing something
outside of SGX scope.


I have removed all changes to x86/Makefile from v4.

The current x86/Makefile only builds but cannot run sgx selftest. I 
don't see it as an urgent issue though.



done
  
  EXTRA_CLEAN := $(BINARIES_32) $(BINARIES_64)

@@ -90,7 +90,7 @@ ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
  all: warn_32bit_failure
  
  warn_32bit_failure:

-   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;  \
+   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;  \


Please clean this up.


echo "environment.  This will reduce test coverage of 64-bit" 2>&1; \
echo "kernels.  If you are using a Debian-like distribution," 2>&1; \
echo "try:"; 2>&1; \
diff --git a/tools/testing/selftests/x86/sgx/Makefile 
b/tools/testing/selftests/x86/sgx/Makefile
index 1fd6f2708e81..3af15d7c8644 100644
--- a/tools/testing/selftests/x86/sgx/Makefile
+++ b/tools/testing/selftests/x86/sgx/Makefile
@@ -2,47 +2,34 @@ top_srcdir = ../../../../..
  
  include ../../lib.mk
  
-HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC

-ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
+ifeq ($(shell $(CC) -dumpmachine | cut --delimiter=- -f1),x86_64)
+all: all_64
+endif
+
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES)
+ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIE \
   -fno-stack-protector -mrdrnd $(INCLUDES)
  
  TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx

  all_64: $(TEST_CUSTOM_PROGS)
  
-$(TEST_CUSTOM_PROGS): $(OUTPUT)/main.o $(OUTPUT)/sgx_call.o \

- $(OUTPUT)/encl_piggy.o
+$(TEST_CUSTOM_PROGS): main.c sgx_call.S $(OUTPUT)/encl_piggy.o
$(CC) $(HOST_CFLAGS) -o $@ $^
  
-$(OUTPUT)/main.o: main.c

-   $(CC) $(HOST_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/sgx_call.o: sgx_call.S
-   $(CC) $(HOST_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl_piggy.o: $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
-   $(CC) $(HOST_CFLAGS) -c encl_piggy.S -o $@
+$(OUTPUT)/encl_piggy.o: encl_piggy.S $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+   $(CC) $(HOST_CFLAGS) -I$(OUTPUT) -c $< -o $@
  
-$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf $(OUTPUT)/sgxsign

+$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf
objcopy --remove-section=.got.plt -O binary $< $@
  
-$(OUTPUT)/encl.elf: $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o

-   $(CC) $(ENCL_CFLAGS) -T encl.lds -o $@ $^
+$(OUTPUT)/encl.elf: encl.lds encl.c encl_bootstrap.S
+   $(CC) $(ENCL_CFLAGS) -T $^ -o $@
  
-$(OUTPUT)/encl.o: encl.c

-   $(CC) $(ENCL_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl_bootstrap.o: encl_bootstrap.S
-   $(CC) $(ENCL_CFLAGS) -c $< -o $@
-
-$(OUTPUT)/encl.ss: $(OUTPUT)/encl.bin  $(OUTPUT)/sgxsign
-   $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss
+$(OUTPUT)/encl.ss: $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin
+   $^ $@
  
  $(OUTPUT)/sgxsign: sgxsign.c

$(CC) -o $@ $< -lcrypto
  
-EXTRA_CLEAN := $(OUTPUT)/sgx-selftest $(OUTPUT)/sgx-selftest.o \

-  $(OUTPUT)/sgx_call.o $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss \
-  $(OUTPUT)/encl.elf $(OUTPUT)/encl.o $(OUTPUT)/encl_bootstrap.o \
-  $(OUTPUT)/sgxsign
-
-.PHONY: clean
+EXTRA_CLEAN := $(TEST_CUSTOM_PROGS) $(addprefix $(OUTPUT)/,\
+   encl.elf encl.bin encl.ss encl_piggy.o sgxsign)
--
2.17.1



What are all these changes to the makefile? I don't see mention of them
in the commit message.


Added description to the commit message i

Re: [PATCH v20 22/28] x86/traps: Attempt to fixup exceptions in vDSO before signaling

2019-07-11 Thread Xing, Cedric

On 7/11/2019 8:54 AM, Sean Christopherson wrote:

On Thu, Jun 27, 2019 at 01:32:58PM -0700, Xing, Cedric wrote:

Just a reminder that #DB/#BP shall be treated differently because they are
used by debuggers. So instead of branching to the fixup address, the kernel
shall just signal the process.


More importantly, doing fixup on #DB and #BP simply doesn't work.


What's really needed is a signal, as if the fixup entry didn't exist.

You don't have to care whether a debugger is attached or not.


On Tue, Apr 23, 2019 at 11:59:37AM -0700, Sean Christopherson wrote:

On Mon, Apr 22, 2019 at 06:29:06PM -0700, Andy Lutomirski wrote:

What's not tested here is running this code with EFLAGS.TF set and
making sure that it unwinds correctly.  Also, Jarkko, unless I missed
something, the vDSO extable code likely has a bug.  If you run the
instruction right before ENCLU with EFLAGS.TF set, then do_debug()
will eat the SIGTRAP and skip to the exception handler.  Similarly, if
you put an instruction breakpoint on ENCLU, it'll get skipped.  Or is
the code actually correct and am I just remembering wrong?


The code is indeed broken, and I don't see a sane way to make it not
broken other than to never do vDSO fixup on #DB or #BP.  But that's
probably the right thing to do anyways since an attached debugger is
likely the intended recipient the 99.999% of the time.

The crux of the matter is that it's impossible to identify whether or
not a #DB/#BP originated from within an enclave, e.g. an INT3 in an
enclave will look identical to an INT3 at the AEP.  Even if hardware
provided a magic flag, #DB still has scenarios where the intended
recipient is ambiguous, e.g. data breakpoint encountered in the enclave
but on an address outside of the enclave, breakpoint encountered in the
enclave and a code breakpoint on the AEP, etc...


Re: [RFC PATCH v2 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-07-11 Thread Xing, Cedric

On 7/11/2019 2:38 AM, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 04:37:41PM -0700, Xing, Cedric wrote:

On 7/10/2019 4:15 PM, Jarkko Sakkinen wrote:

On Thu, Jul 11, 2019 at 01:46:28AM +0300, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 11:08:37AM -0700, Xing, Cedric wrote:

With these conclusions I think the current vDSO API is sufficient for
Linux.


The new vDSO API is to support data exchange on stack. It has nothing to do
with debugging. BTW, the community has closed on this.


And how that is useful?


The CFI directives are for stack unwinding. They don't affect what the code
does so you can just treat them as NOPs if you don't understand what they
do. However, they are useful to not only debuggers but also exception
handling code. libunwind also has a setjmp()/longjmp() implementation based
on CFI directives.


Of course I won't merge code of which usefulness I don't understand.


I re-read the cover letter [1] because it usually is the place
to "pitch" a feature.

It fails to address two things:

1. How and in what circumstances is an untrusted stack is a better
 vessel for handling exceptions than the register based approach
 that we already have?


We are not judging which vessel is better (or the best) among all possible
vessels. We are trying to enable more vessels. Every vessel has its pros and
cons so there's *no* single best vessel.


I think reasonable metric is actually the coverage of the Intel SDK
based enclaves. How widely are they in the wild? If the user base is
large, it should be reasonable to support this just based on that.


I don't know how many existing enclaves out there, but definitely larger 
than 0 (zero), while user base for the old API is definitely 0. What are 
you worrying, really?



/Jarkko



Re: [RFC PATCH v2 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-07-11 Thread Xing, Cedric

On 7/11/2019 2:36 AM, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 03:54:20PM -0700, Xing, Cedric wrote:

On 7/10/2019 3:46 PM, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 11:08:37AM -0700, Xing, Cedric wrote:

With these conclusions I think the current vDSO API is sufficient for
Linux.


The new vDSO API is to support data exchange on stack. It has nothing to do
with debugging. BTW, the community has closed on this.


And how that is useful?


There is a lengthy discussion on its usefulness so I don't want to repeat.
In short, it allows using untrusted stack as a convenient method to exchange
data with the enclave. It is currently being used by Intel's SGX SDK for
e/o-calls parameters.


The CFI directives are for stack unwinding. They don't affect what the code
does so you can just treat them as NOPs if you don't understand what they
do. However, they are useful to not only debuggers but also exception
handling code. libunwind also has a setjmp()/longjmp() implementation based
on CFI directives.


Of course I won't merge code of which usefulness I don't understand.


Sure.

Any other questions I can help with?


I dissected my concerns in other email. We can merge this feature after
v21 if it makes sense.


Sent out v3 of vDSO changes last night. I hope your concerns have been 
properly addressed.


The new vDSO API is a community consensus. I can help on whatever 
technical problems you may have but I don't see a reason you should 
reject it.



/Jarkko



Re: [RFC PATCH v2 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-07-10 Thread Xing, Cedric

On 7/10/2019 4:15 PM, Jarkko Sakkinen wrote:

On Thu, Jul 11, 2019 at 01:46:28AM +0300, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 11:08:37AM -0700, Xing, Cedric wrote:

With these conclusions I think the current vDSO API is sufficient for
Linux.


The new vDSO API is to support data exchange on stack. It has nothing to do
with debugging. BTW, the community has closed on this.


And how that is useful?


The CFI directives are for stack unwinding. They don't affect what the code
does so you can just treat them as NOPs if you don't understand what they
do. However, they are useful to not only debuggers but also exception
handling code. libunwind also has a setjmp()/longjmp() implementation based
on CFI directives.


Of course I won't merge code of which usefulness I don't understand.


I re-read the cover letter [1] because it usually is the place
to "pitch" a feature.

It fails to address two things:

1. How and in what circumstances is an untrusted stack is a better
vessel for handling exceptions than the register based approach
that we already have?


We are not judging which vessel is better (or the best) among all 
possible vessels. We are trying to enable more vessels. Every vessel has 
its pros and cons so there's *no* single best vessel.



2. How is it simpler approach? There is a strong claim of simplicity
and convenience without anything backing it.


The major benefits in terms of simplicity realize in user mode 
applications. It's always a trade-off. This vDSO API takes 10-20 more 
lines than the original one but would save hundreds or even thousands in 
user applications.


Again, I don't want to repeat everything as you can look back at the 
lengthy discussion to dig out the details.



3. Why we need both register and stack based approach co-exist? I'd go
with one approach for a new API without any legacy whatsoever.


Neither is a legacy to the other. Supporting both approaches is by 
design. Again, the goal is to enable more vessels because there's *no* 
single best vessel.



This really needs a better pitch before we can consider doing anything
to it.

Also, in [2] there is talk about the next revision. Maybe the way go
forward is to address the three issues I found in the cover letter
and fix whatever needed to be fixed in the actual patches?

[1] https://lkml.org/lkml/2019/4/24/84
[2] https://lkml.org/lkml/2019/4/25/1170


Let me update the commit message for the vDSO API and send you a patch.


/Jarkko



Re: [RFC PATCH v2 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-07-10 Thread Xing, Cedric

On 7/10/2019 3:46 PM, Jarkko Sakkinen wrote:

On Wed, Jul 10, 2019 at 11:08:37AM -0700, Xing, Cedric wrote:

With these conclusions I think the current vDSO API is sufficient for
Linux.


The new vDSO API is to support data exchange on stack. It has nothing to do
with debugging. BTW, the community has closed on this.


And how that is useful?


There is a lengthy discussion on its usefulness so I don't want to 
repeat. In short, it allows using untrusted stack as a convenient method 
to exchange data with the enclave. It is currently being used by Intel's 
SGX SDK for e/o-calls parameters.



The CFI directives are for stack unwinding. They don't affect what the code
does so you can just treat them as NOPs if you don't understand what they
do. However, they are useful to not only debuggers but also exception
handling code. libunwind also has a setjmp()/longjmp() implementation based
on CFI directives.


Of course I won't merge code of which usefulness I don't understand.


Sure.

Any other questions I can help with?


/Jarkko



Re: [RFC PATCH v2 0/3] An alternative __vdso_sgx_enter_enclave() to allow enclave/host parameter passing using untrusted stack

2019-07-10 Thread Xing, Cedric

On 7/10/2019 4:17 AM, Jarkko Sakkinen wrote:

On Tue, Apr 23, 2019 at 11:26:20PM -0700, Cedric Xing wrote:

The current proposed __vdso_sgx_enter_enclave() requires enclaves to preserve
%rsp, which prohibits enclaves from allocating space on the untrusted stack.
However, there are existing enclaves (e.g. those built with current Intel SGX
SDK libraries) relying on the untrusted stack for passing parameters to
untrusted functions (aka. o-calls), which requires allocating space on the
untrusted stack by enclaves. And given its simplicity and convenience, it could
be desired by future SGX applications as well.

This patchset introduces a new ABI for __vdso_sgx_enter_enclave() to anchor its
stack frame on %rbp (instead of %rsp), so as to allow enclaves to "push" onto
the untrusted stack by decrementing the untrusted %rsp. Additionally, this new
__vdso_sgx_enter_enclave() will take one more parameter - a callback function,
to be invoked upon all enclave exits (both AEX and normal exits). The callback
function will be given the value of %rsp left off by the enclave, so that data
"pushed" by the enclave (if any) could be addressed/accessed.  Please note that
the callback function is optional, and if not supplied (i.e. null),
__vdso_sgx_enter_enclave() will just return (i.e. behave the same as the
current implementation) after the enclave exits (or AEX due to exceptions).

The SGX selftest is augmented by two new tests. One exercises the new callback
interface, and serves as a simple example to showcase how to use it; while the
other validates the hand-crafted CFI directives in __vdso_sgx_enter_enclave()
by single-stepping through it and unwinding call stack at every instruction.


Why does the SDK anyway use real enclaves when step debugging? I don't
think kernel needs to scale to that. For me it looks more like a bad
architectural choice in the SDK implementation than anything else.


Intel's SGX SDK *does* support simulation mode for debugging enclaves 
outside of SGX mode, or even on non-SGX platforms. But nothing can 
replace the real SGX environment, hence SGX ISA supports debugging real 
enclaves.



You should design SDK in a way that it doesn't run the code inside real
enclave at first. It is just the sanest development model to use. The
current SDK has an unacceptable requirement that the workstation used to
develop code destined to run inside enclave must possess an appropriate
hardware. I don't think that is acceptable no matter what kind of API
kernel provides to invoke enclaves.


SDK supports simulation on non-SGX platforms.


I think "real" debug enclaves are only good for production testing when
you have more or less ready to release/deploy something but still want
to be able to get a memory dump of the enclave on occassion.


That's true.


With these conclusions I think the current vDSO API is sufficient for
Linux.


The new vDSO API is to support data exchange on stack. It has nothing to 
do with debugging. BTW, the community has closed on this.


The CFI directives are for stack unwinding. They don't affect what the 
code does so you can just treat them as NOPs if you don't understand 
what they do. However, they are useful to not only debuggers but also 
exception handling code. libunwind also has a setjmp()/longjmp() 
implementation based on CFI directives.



/Jarkko



RE: [PATCH v20 22/28] x86/traps: Attempt to fixup exceptions in vDSO before signaling

2019-06-27 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Jarkko Sakkinen
> Sent: Tuesday, June 25, 2019 8:44 AM
> 
> I went through the vDSO changes just to revisit couple of details that I
> had forgotten. Sean, if you don't mind I'd squash this and prepending
> patch.

Just a reminder that #DB/#BP shall be treated differently because they are used 
by debuggers. So instead of branching to the fixup address, the kernel shall 
just signal the process. 

> 
> Is there any obvious reason why #PF fixup is in its own patch and the
> rest are collected to the same patch? I would not find it confusing if
> there was one patch per exception but really don't get this division.
> 
> /Jarkko


RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-14 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Thursday, June 13, 2019 5:46 PM
> 
> On Thu, Jun 13, 2019 at 01:02:17PM -0400, Stephen Smalley wrote:
> > On 6/11/19 6:02 PM, Sean Christopherson wrote:
> > >On Tue, Jun 11, 2019 at 09:40:25AM -0400, Stephen Smalley wrote:
> > >>I haven't looked at this code closely, but it feels like a lot of
> > >>SGX-specific logic embedded into SELinux that will have to be
> > >>repeated or reused for every security module.  Does SGX not track
> this state itself?
> > >
> > >SGX does track equivalent state.
> > >
> > >There are three proposals on the table (I think):
> > >
> > >   1. Require userspace to explicitly specificy (maximal) enclave
> page
> > >  permissions at build time.  The enclave page permissions are
> provided
> > >  to, and checked by, LSMs at enclave build time.
> > >
> > >  Pros: Low-complexity kernel implementation, straightforward
> auditing
> > >  Cons: Sullies the SGX UAPI to some extent, may increase
> complexity of
> > >SGX2 enclave loaders.
> > >
> > >   2. Pre-check LSM permissions and dynamically track mappings to
> enclave
> > >  pages, e.g. add an SGX mprotect() hook to restrict W->X and WX
> > >  based on the pre-checked permissions.
> > >
> > >  Pros: Does not impact SGX UAPI, medium kernel complexity
> > >  Cons: Auditing is complex/weird, requires taking enclave-
> specific
> > >lock during mprotect() to query/update tracking.
> > >
> > >   3. Implement LSM hooks in SGX to allow LSMs to track enclave
> regions
> > >  from cradle to grave, but otherwise defer everything to LSMs.
> > >
> > >  Pros: Does not impact SGX UAPI, maximum flexibility, precise
> auditing
> > >  Cons: Most complex and "heaviest" kernel implementation of the
> three,
> > >pushes more SGX details into LSMs.
> > >
> > >My RFC series[1] implements #1.  My understanding is that Andy
> > >(Lutomirski) prefers #2.  Cedric's RFC series implements #3.
> > >
> > >Perhaps the easiest way to make forward progress is to rule out the
> > >options we absolutely *don't* want by focusing on the potentially
> > >blocking issue with each option:
> > >
> > >   #1 - SGX UAPI funkiness
> > >
> > >   #2 - Auditing complexity, potential enclave lock contention
> > >
> > >   #3 - Pushing SGX details into LSMs and complexity of kernel
> > > implementation
> > >
> > >
> > >[1]
> > >https://lkml.kernel.org/r/20190606021145.12604-1-sean.j.christopherso
> > >n...@intel.com
> >
> > Given the complexity tradeoff, what is the clear motivating example
> > for why
> > #1 isn't the obvious choice? That the enclave loader has no way of
> > knowing a priori whether the enclave will require W->X or WX?  But
> > aren't we better off requiring enclaves to be explicitly marked as
> > needing such so that we can make a more informed decision about
> > whether to load them in the first place?
> 
> Andy and/or Cedric, can you please weigh in with a concrete (and
> practical) use case that will break if we go with #1?  The auditing
> issues for #2/#3 are complex to say the least...

How does enclave loader provide per-page ALLOW_* flags? And a related question 
is why they are necessary for enclaves but unnecessary for regular executables 
or shared objects.

What's the story for SGX2 if mmap()'ing non-existing pages is not allowed?

What's the story for auditing?

After everything above has been taken care of properly, will #1 still be 
simpler than #2/#3?



RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Thursday, June 13, 2019 4:18 PM
> 
> On Thu, Jun 13, 2019 at 04:03:24PM -0700, Xing, Cedric wrote:
> > > From: Stephen Smalley [mailto:s...@tycho.nsa.gov]
> > > Sent: Thursday, June 13, 2019 10:02 AM
> > >
> > > > My RFC series[1] implements #1.  My understanding is that Andy
> > > > (Lutomirski) prefers #2.  Cedric's RFC series implements #3.
> > > >
> > > > Perhaps the easiest way to make forward progress is to rule out
> the
> > > > options we absolutely *don't* want by focusing on the potentially
> > > > blocking issue with each option:
> > > >
> > > >#1 - SGX UAPI funkiness
> > > >
> > > >#2 - Auditing complexity, potential enclave lock contention
> > > >
> > > >#3 - Pushing SGX details into LSMs and complexity of kernel
> > > > implementation
> > > >
> > > >
> > > > [1]
> > > > https://lkml.kernel.org/r/20190606021145.12604-1-
> sean.j.christopherson
> > > > @intel.com
> > >
> > > Given the complexity tradeoff, what is the clear motivating example
> for
> > > why #1 isn't the obvious choice? That the enclave loader has no way
> of
> > > knowing a priori whether the enclave will require W->X or WX?  But
> > > aren't we better off requiring enclaves to be explicitly marked as
> > > needing such so that we can make a more informed decision about
> whether
> > > to load them in the first place?
> >
> > Are you asking this question at a) page granularity, b) file
> granularity or
> > c) enclave (potentially comprised of multiple executable files)
> granularity?
> >
> > #b is what we have on regular executable files and shared objects (i.e.
> > FILE__EXECMOD). We all know how to do that.
> >
> > #c is kind of new but could be done via some proxy file (e.g.
> sigstruct file)
> > hence reduced to #b.
> >
> > #a is problematic. It'd require compilers/linkers to generate such
> > information, and proper executable image file format to carry that
> > information, to be eventually picked up the loader. SELinux doesn't
> have
> > PAGE__EXECMOD I guess is because it is generally considered
> impractical.
> >
> > Option #1 however requires #a because the driver doesn't track which
> page was
> > loaded from which file, otherwise it can no longer be qualified
> "simple". Or
> > we could just implement #c, which will make all options simpler. But I
> guess
> > #b is still preferred, to be aligned with what SELinux is enforcing
> today on
> > regular memory pages.o
> 
> Option #1 doesn't require (a).  The checks will happen for every page,
> but in the RFCs I sent, the policies are still attached to files and
> processes, i.e. (b).

I was talking at the UAPI level - i.e. your ioctl requires ALLOW_* at page 
granularity, hence #a.


RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: Stephen Smalley [mailto:s...@tycho.nsa.gov]
> Sent: Thursday, June 13, 2019 10:02 AM
> 
> On 6/11/19 6:02 PM, Sean Christopherson wrote:
> > On Tue, Jun 11, 2019 at 09:40:25AM -0400, Stephen Smalley wrote:
> >> I haven't looked at this code closely, but it feels like a lot of
> >> SGX-specific logic embedded into SELinux that will have to be
> >> repeated or reused for every security module.  Does SGX not track
> this state itself?
> >
> > SGX does track equivalent state.
> >
> > There are three proposals on the table (I think):
> >
> >1. Require userspace to explicitly specificy (maximal) enclave page
> >   permissions at build time.  The enclave page permissions are
> provided
> >   to, and checked by, LSMs at enclave build time.
> >
> >   Pros: Low-complexity kernel implementation, straightforward
> auditing
> >   Cons: Sullies the SGX UAPI to some extent, may increase
> complexity of
> > SGX2 enclave loaders.
> >
> >2. Pre-check LSM permissions and dynamically track mappings to
> enclave
> >   pages, e.g. add an SGX mprotect() hook to restrict W->X and WX
> >   based on the pre-checked permissions.
> >
> >   Pros: Does not impact SGX UAPI, medium kernel complexity
> >   Cons: Auditing is complex/weird, requires taking enclave-
> specific
> > lock during mprotect() to query/update tracking.
> >
> >3. Implement LSM hooks in SGX to allow LSMs to track enclave
> regions
> >   from cradle to grave, but otherwise defer everything to LSMs.
> >
> >   Pros: Does not impact SGX UAPI, maximum flexibility, precise
> auditing
> >   Cons: Most complex and "heaviest" kernel implementation of the
> three,
> > pushes more SGX details into LSMs.
> >
> > My RFC series[1] implements #1.  My understanding is that Andy
> > (Lutomirski) prefers #2.  Cedric's RFC series implements #3.
> >
> > Perhaps the easiest way to make forward progress is to rule out the
> > options we absolutely *don't* want by focusing on the potentially
> > blocking issue with each option:
> >
> >#1 - SGX UAPI funkiness
> >
> >#2 - Auditing complexity, potential enclave lock contention
> >
> >#3 - Pushing SGX details into LSMs and complexity of kernel
> > implementation
> >
> >
> > [1]
> > https://lkml.kernel.org/r/20190606021145.12604-1-sean.j.christopherson
> > @intel.com
> 
> Given the complexity tradeoff, what is the clear motivating example for
> why #1 isn't the obvious choice? That the enclave loader has no way of
> knowing a priori whether the enclave will require W->X or WX?  But
> aren't we better off requiring enclaves to be explicitly marked as
> needing such so that we can make a more informed decision about whether
> to load them in the first place?

Are you asking this question at a) page granularity, b) file granularity or c) 
enclave (potentially comprised of multiple executable files) granularity?

#b is what we have on regular executable files and shared objects (i.e. 
FILE__EXECMOD). We all know how to do that.

#c is kind of new but could be done via some proxy file (e.g. sigstruct file) 
hence reduced to #b.

#a is problematic. It'd require compilers/linkers to generate such information, 
and proper executable image file format to carry that information, to be 
eventually picked up the loader. SELinux doesn't have PAGE__EXECMOD I guess is 
because it is generally considered impractical.

Option #1 however requires #a because the driver doesn't track which page was 
loaded from which file, otherwise it can no longer be qualified "simple". Or we 
could just implement #c, which will make all options simpler. But I guess #b is 
still preferred, to be aligned with what SELinux is enforcing today on regular 
memory pages.


RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Thursday, June 13, 2019 12:49 PM
> 
> > >By "SGX", did you mean the SGX subsystem being upstreamed? It doesn’t
> > >track that state. In practice, there's no way for SGX to track it
> > >because there's no vm_ops->may_mprotect() callback. It doesn't follow
> > >the philosophy of Linux either, as mprotect() doesn't track it for
> > >regular memory. And it doesn't have a use without LSM, so I believe
> > >it makes more sense to track it inside LSM.
> >
> > Yes, the SGX driver/subsystem.  I had the impression from Sean that it
> > does track this kind of per-page state already in some manner, but
> > possibly he means it does under a given proposal and not in the
> current driver.
> 
> Yeah, under a given proposal.  SGX has per-page tracking, adding new
> flags is fairly easy.  Philosophical objections aside,
> adding .may_mprotect() is trivial.

As I pointed out in an earlier email, protection flags are associated with 
ranges. They could of course be duplicated to every page but that will hurt 
performance because every page within the range would have to be tested 
individually.

Furthermore, though .may_protect()is able to make the decision, I don't think 
it can do the audit log as well, unless it is coded in an SELinux specific way. 
Then I wonder how it could work with LSM modules other than SELinux.

> 
> > Even the #b remembering might end up being SELinux-specific if we also
> > have to remember the original inputs used to compute the answer so
> > that we can audit that information when access is denied later upon
> > mprotect().  At the least we'd need it to save some opaque data and
> > pass it to a callback into SELinux to perform that auditing.


RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Stephen Smalley
> 
> On 6/11/19 6:55 PM, Xing, Cedric wrote:
> >> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> >> ow...@vger.kernel.org] On Behalf Of Stephen Smalley
> >> Sent: Tuesday, June 11, 2019 6:40 AM
> >>
> >>>
> >>> +#ifdef CONFIG_INTEL_SGX
> >>> + rc = sgxsec_mprotect(vma, prot);
> >>> + if (rc <= 0)
> >>> + return rc;
> >>
> >> Why are you skipping the file_map_prot_check() call when rc == 0?
> >> What would SELinux check if you didn't do so -
> >> FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave?  Is it a
> >> problem to let SELinux proceed with that check?
> >
> > We can continue the check. But in practice, all
> FILE__{READ|WRITE|EXECUTE} are needed for every enclave, then what's the
> point of checking them? FILE__EXECMOD may be the only flag that has a
> meaning, but it's kind of redundant because sigstruct file was checked
> against that already.
> 
> I don't believe FILE__EXECMOD will be checked since it is a shared file
> mapping.  We'll check at least FILE__READ and FILE__WRITE anyway upon
> open(), and possibly FILE__EXECUTE upon mmap() unless that is never
> PROT_EXEC.  We want the policy to accurately reflect the operations of
> the system, even when an operation "must" be allowed, and even here this
> only needs to be allowed to processes authorized as enclave loaders, not
> to all processes.
> 
> I don't think there are other examples where we skip a SELinux check
> like this.  If we were to do so here, we would at least need a comment
> explaining that it was intentional and why.  The risk would be that
> future checking added into file_map_prot_check() would be unwittingly
> bypassed for these mappings.  A warning there would also be advisable if
> we skip it for these mappings.

You are right! The code was written assuming file_map_prot_check() wouldn't 
object if sgxsec_mprotect() approves it, but that may not always be the case if 
new checks are added in future. I'll add the check back.
 
> 
> >
> >>> +static int selinux_enclave_load(struct file *encl, unsigned long
> addr,
> >>> + unsigned long size, unsigned long prot,
> >>> + struct vm_area_struct *source)
> >>> +{
> >>> + if (source) {
> >>> + /**
> >>> +  * Adding page from source => EADD request
> >>> +  */
> >>> + int rc = selinux_file_mprotect(source, prot, prot);
> >>> + if (rc)
> >>> + return rc;
> >>> +
> >>> + if (!(prot & VM_EXEC) &&
> >>> + selinux_file_mprotect(source, VM_EXEC, VM_EXEC))
> >>
> >> I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
> >> defined with the same values currently.  Elsewhere the kernel appears
> >> to explicitly translate them ala calc_vm_prot_bits().
> >
> > Thanks! I'd change them to PROT_EXEC in the next version.
> >
> >>
> >> Also, this will mean that we will always perform an execute check on
> >> all sources, thereby triggering audit denial messages for any EADD
> >> sources that are only intended to be data.  Depending on the source,
> >> this could trigger PROCESS__EXECMEM or FILE__EXECMOD or
> >> FILE__EXECUTE.  In a world where users often just run any denials
> >> they see through audit2allow, they'll end up always allowing them
> >> all.  How can they tell whether it was needed? It would be preferable
> >> if we could only trigger execute checks when there is some
> >> probability that execute will be requested in the future.
> >> Alternatives would be to silence the audit of these permission checks
> >> always via use of _noaudit() interfaces or to silence audit of these
> >> permissions via dontaudit rules in policy, but the latter would hide
> >> all denials of the permission by the process, not just those
> >> triggered from security_enclave_load().  And if we silence them, then
> we won't see them even if they were needed.
> >
> > *_noaudit() is exactly what I wanted. But I couldn't find
> selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm
> reluctant to duplicate code. Any suggestions?
> 
> I would have no objection to adding _noaudit() variants of these, either
> duplicating 

RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Wednesday, June 12, 2019 3:03 PM
> 
> > I think this model works quite well in an SGX1 world.  The main thing
> > that makes me uneasy about this model is that, in SGX2, it requires
> > that an SGX2-compatible enclave loader must pre-declare to the kernel
> > whether it intends for its dynamically allocated memory to be
> > ALLOW_EXEC.  If ALLOW_EXEC is set but not actually needed, it will
> > still fail if DENY_X_IF_ALLOW_WRITE ends up being set.  The other
> > version below does not have this limitation.
> 
> I'm not convinced this will be a meaningful limitation in practice,
> though that's probably obvious from my RFCs :-).  That being said, the
> UAPI quirk is essentially a dealbreaker for multiple people, so let's
> drop #1.
> 
> I discussed the options with Cedric offline, and he is ok with option #2
> *if* the idea actually translates to acceptable code and doesn't present
> problems for userspace and/or future SGX features.
> 
> So, I'll work on an RFC series to implement #2 as described below.  If
> it works out, yay!  If not, i.e. option #2 is fundamentally broken, I'll
> shift my focus to Cedric's code (option #3).
> 
> > >   2. Pre-check LSM permissions and dynamically track mappings to
> enclave
> > >  pages, e.g. add an SGX mprotect() hook to restrict W->X and WX
> > >  based on the pre-checked permissions.
> > >
> > >  Pros: Does not impact SGX UAPI, medium kernel complexity
> > >  Cons: Auditing is complex/weird, requires taking enclave-
> specific
> > >lock during mprotect() to query/update tracking.
> >
> > Here's how this looks in my mind.  It's quite similar, except that
> > ALLOW_READ, ALLOW_WRITE, and ALLOW_EXEC are replaced with a little
> > state machine.
> >
> > EADD does not take any special flags.  It calls this LSM hook:
> >
> >   int security_enclave_load(struct vm_area_struct *source);
> >
> > This hook can return -EPERM.  Otherwise it 0 or
> > ALLOC_EXEC_IF_UNMODIFIED (i.e. 1).  This hook enforces permissions (a)
> and (b).
> >
> > The driver tracks a state for each page, and the possible states are:
> >
> >  - CLEAN_MAYEXEC /* no W or X VMAs have existed, but X is okay */
> >  - CLEAN_NOEXEC /* no W or X VMAs have existed, and X is not okay */
> >  - CLEAN_EXEC /* no W VMA has existed, but an X VMA has existed */
> >  - DIRTY /* a W VMA has existed */
> >
> > The initial state for a page is CLEAN_MAYEXEC if the hook said
> > ALLOW_EXEC_IF_UNMODIFIED and CLEAN_NOEXEC otherwise.
> >
> > The future EAUG does not call a hook at all and puts pages into the
> > state CLEAN_NOEXEC.  If SGX3 or later ever adds EAUG-but-don't-clear,
> > it can call security_enclave_load() and add CLEAN_MAYEXEC pages if
> appropriate.
> >
> > EINIT takes a sigstruct pointer.  SGX calls a new hook:
> >
> >   unsigned int security_enclave_init(struct sigstruct *sigstruct,
> > struct vm_area_struct *source, unsigned int flags);
> >
> > This hook can return -EPERM.  Otherwise it returns 0 or a combination
> > of flags DENY_WX and DENY_X_DIRTY.  The driver saves this value.
> > These represent permissions (c) and (d).
> >
> > If we want to have a permission for "execute code supplied from
> > outside the enclave that was not measured", we could have a flag like
> > HAS_UNMEASURED_CLEAN_EXEC_PAGE that the LSM could consider.
> >
> > mmap() and mprotect() enforce the following rules:
> >
> >  - If VM_EXEC is requested and (either the page is DIRTY or VM_WRITE
> is
> >requested) and DENY_X_DIRTY, then deny.
> >
> >  - If VM_WRITE and VM_EXEC are both requested and DENY_WX, then deny.
> >
> >  - If VM_WRITE is requested, we need to update the state.  If it was
> >CLEAN_EXEC, then we reject if DENY_X_DIRTY.  Otherwise we change
> the
> >state to DIRTY.
> >
> >  - If VM_EXEC is requested and the page is CLEAN_NOEXEC, then deny.
> >
> > mprotect() and mmap() do *not* call SGX-specific LSM hooks to ask for
> > permission, although they can optionally call an LSM hook if they hit
> > one of the -EPERM cases for auditing purposes.
> >
> > Before the SIGSTRUCT is provided to the driver, the driver acts as
> > though DENY_X_DIRTY and DENY_WX are both set.

I think we've been discussing 2 topics simultaneously, one is the state machine 
that accepts/rejects mmap/mprotect requests, while the other is where is the 
best place to put it. I think we have an agreement on the former, and IMO 
option #2 and #3 differ only in the latter.

Option #2 keeps the state machine inside SGX subsystem, so it could reuse 
existing data structures for page tracking/locking to some extent. Sean may 
have smarter ideas, but it looks to me like the existing 'struct sgx_encl_page' 
tracks individual enclave pages while the FSM states apply to ranges. So in 
order *not* to test page by page in mmap/mprotect, I guess some new range 
oriented structures are still necessary. But I don't think it very important 
anyway. 

My major concern is more from the architecture

RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-13 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Wednesday, June 12, 2019 3:03 PM
> 
> > I think this model works quite well in an SGX1 world.  The main thing
> > that makes me uneasy about this model is that, in SGX2, it requires
> > that an SGX2-compatible enclave loader must pre-declare to the kernel
> > whether it intends for its dynamically allocated memory to be
> > ALLOW_EXEC.  If ALLOW_EXEC is set but not actually needed, it will
> > still fail if DENY_X_IF_ALLOW_WRITE ends up being set.  The other
> > version below does not have this limitation.
> 
> I'm not convinced this will be a meaningful limitation in practice,
> though that's probably obvious from my RFCs :-).  That being said, the
> UAPI quirk is essentially a dealbreaker for multiple people, so let's
> drop #1.
> 
> I discussed the options with Cedric offline, and he is ok with option #2
> *if* the idea actually translates to acceptable code and doesn't present
> problems for userspace and/or future SGX features.
> 
> So, I'll work on an RFC series to implement #2 as described below.  If
> it works out, yay!  If not, i.e. option #2 is fundamentally broken, I'll
> shift my focus to Cedric's code (option #3).
> 
> > >   2. Pre-check LSM permissions and dynamically track mappings to
> enclave
> > >  pages, e.g. add an SGX mprotect() hook to restrict W->X and WX
> > >  based on the pre-checked permissions.
> > >
> > >  Pros: Does not impact SGX UAPI, medium kernel complexity
> > >  Cons: Auditing is complex/weird, requires taking enclave-
> specific
> > >lock during mprotect() to query/update tracking.
> >
> > Here's how this looks in my mind.  It's quite similar, except that
> > ALLOW_READ, ALLOW_WRITE, and ALLOW_EXEC are replaced with a little
> > state machine.
> >
> > EADD does not take any special flags.  It calls this LSM hook:
> >
> >   int security_enclave_load(struct vm_area_struct *source);
> >
> > This hook can return -EPERM.  Otherwise it 0 or
> > ALLOC_EXEC_IF_UNMODIFIED (i.e. 1).  This hook enforces permissions (a)
> and (b).
> >
> > The driver tracks a state for each page, and the possible states are:
> >
> >  - CLEAN_MAYEXEC /* no W or X VMAs have existed, but X is okay */
> >  - CLEAN_NOEXEC /* no W or X VMAs have existed, and X is not okay */
> >  - CLEAN_EXEC /* no W VMA has existed, but an X VMA has existed */
> >  - DIRTY /* a W VMA has existed */
> >
> > The initial state for a page is CLEAN_MAYEXEC if the hook said
> > ALLOW_EXEC_IF_UNMODIFIED and CLEAN_NOEXEC otherwise.
> >
> > The future EAUG does not call a hook at all and puts pages into the
> > state CLEAN_NOEXEC.  If SGX3 or later ever adds EAUG-but-don't-clear,
> > it can call security_enclave_load() and add CLEAN_MAYEXEC pages if
> appropriate.
> >
> > EINIT takes a sigstruct pointer.  SGX calls a new hook:
> >
> >   unsigned int security_enclave_init(struct sigstruct *sigstruct,
> > struct vm_area_struct *source, unsigned int flags);
> >
> > This hook can return -EPERM.  Otherwise it returns 0 or a combination
> > of flags DENY_WX and DENY_X_DIRTY.  The driver saves this value.
> > These represent permissions (c) and (d).
> >
> > If we want to have a permission for "execute code supplied from
> > outside the enclave that was not measured", we could have a flag like
> > HAS_UNMEASURED_CLEAN_EXEC_PAGE that the LSM could consider.
> >
> > mmap() and mprotect() enforce the following rules:
> >
> >  - If VM_EXEC is requested and (either the page is DIRTY or VM_WRITE
> is
> >requested) and DENY_X_DIRTY, then deny.
> >
> >  - If VM_WRITE and VM_EXEC are both requested and DENY_WX, then deny.
> >
> >  - If VM_WRITE is requested, we need to update the state.  If it was
> >CLEAN_EXEC, then we reject if DENY_X_DIRTY.  Otherwise we change
> the
> >state to DIRTY.
> >
> >  - If VM_EXEC is requested and the page is CLEAN_NOEXEC, then deny.
> >
> > mprotect() and mmap() do *not* call SGX-specific LSM hooks to ask for
> > permission, although they can optionally call an LSM hook if they hit
> > one of the -EPERM cases for auditing purposes.
> >
> > Before the SIGSTRUCT is provided to the driver, the driver acts as
> > though DENY_X_DIRTY and DENY_WX are both set.

I think we've been discussing 2 topics simultaneously, one is the state machine 
that accepts/rejects mmap/mprotect requests, while the other is where is the 
best place to put it. I think we have an agreement on the former, and IMO 
option #2 and #3 differ only in the latter.

Option #2 keeps the state machine inside SGX subsystem, so it could reuse 
existing data structures for page tracking/locking to some extent. Sean may 
have smarter ideas, but it looks to me like the existing 'struct sgx_encl_page' 
tracks individual enclave pages while the FSM states apply to ranges. So in 
order *not* to test page by page in mmap/mprotect, I guess some new range 
oriented structures are still necessary. But I don't think it very important 
anyway. 

My major concern is more from the architecture

RE: [RFC PATCH 2/9] x86/sgx: Do not naturally align MAP_FIXED address

2019-06-13 Thread Xing, Cedric
> From: Jarkko Sakkinen [mailto:jarkko.sakki...@linux.intel.com]
> Sent: Thursday, June 13, 2019 6:48 AM
> 
> On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen
>  wrote:
> > > >
> > > >> On Tue, Jun 04, 2019 at 10:10:22PM +, Xing, Cedric wrote:
> > > >> A bit off topic here. This mmap()/mprotect() discussion reminds
> > > >> me a question (guess for Jarkko): Now that
> > > >> vma->vm_file->private_data keeps a pointer to the enclave, why do
> we store it again in vma->vm_private?
> > > >> It isn't a big deal but non-NULL vm_private does prevent
> > > >> mprotect() from merging adjacent VMAs.
> > > >
> > > > Same semantics as with a regular mmap i.e. you can close the file
> > > > and still use the mapping.
> > > >
> > > >
> > >
> > > The file should be properly refcounted — vm_file should not go away
> while it’s mapped.
> 
> mm already takes care of that so vm_file does not go away. Still we need
> an internal refcount for enclaves to synchronize with the swapper. In
> summary nothing needs to be done.

I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks 
the vma, which in turn reference counts vma->vm_file. Why is the internal 
refcount still needed? 

> 
> > Right, makes sense. It is easy one to change essentially just removing
> > internal refcount from sgx_encl and using file for the same. I'll
> > update this to my tree along with the changes to remove LKM/ACPI bits
> ASAP.
> 
> /Jarkko


RE: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits

2019-06-12 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Wednesday, June 12, 2019 7:34 AM
> 
> On Tue, Jun 11, 2019 at 05:09:28PM -0700, Andy Lutomirski wrote:
> >
> > On Jun 10, 2019, at 3:28 PM, Xing, Cedric 
> wrote:
> >
> > >> From: Andy Lutomirski [mailto:l...@kernel.org]
> > >> Sent: Monday, June 10, 2019 12:15 PM This seems like an odd
> > >> workflow.  Shouldn't the #PF return back to untrusted userspace so
> > >> that the untrusted user code can make its own decision as to
> > >> whether it wants to EAUG a page there as opposed to, say, killing
> > >> the enclave or waiting to keep resource usage under control?
> > >
> > > This may seem odd to some at the first glance. But if you can think
> > > of how static heap (pre-allocated by EADD before EINIT) works, the
> > > load parses the "metadata" coming with the enclave to decide the
> > > address/size of the heap, EADDs it, and calls it done. In the case
> > > of "dynamic" heap (allocated dynamically by EAUG after EINIT), the
> > > same thing applies - the loader determines the range of the heap,
> > > tells the SGX module about it, and calls it done. Everything else is
> the between the enclave and the SGX module.
> > >
> > > In practice, untrusted code usually doesn't know much about
> > > enclaves, just like it doesn't know much about the shared objects
> > > loaded into its address space either. Without the necessary
> > > knowledge, untrusted code usually just does what it is told (via
> > > o-calls, or return value from e-calls), without judging that's right
> or wrong.
> > >
> > > When it comes to #PF like what I described, of course a signal could
> > > be sent to the untrusted code but what would it do then? Usually
> > > it'd just come back asking for a page at the fault address. So we
> > > figured it'd be more efficient to just have the kernel EAUG at #PF.
> > >
> > > Please don't get me wrong though, as I'm not dictating what the s/w
> > > flow shall be. It's just going to be a choice offered to user mode.
> > > And that choice was planned to be offered via mprotect() - i.e. a
> > > writable vma causes kernel to EAUG while a non-writable vma will
> > > result in a signal (then the user mode could decide whether to
> > > EAUG). The key point is flexibility - as we want to allow all
> > > reasonable s/w flows instead of dictating one over others. We had
> similar discussions on vDSO API before.
> > > And I think you accepted my approach because of its flexibility. Am
> > > I right?
> >
> > As long as user code can turn this off, I have no real objection. But
> > it might make sense to have it be more explicit — have an ioctl set up
> > a range as “EAUG-on-demand”.
> 
> This was part of the motivation behind changing SGX_IOC_ENCLAVE_ADD_PAGE
> to SGX_IOC_ENCLAVE_ADD_REGION and adding a @flags parameter.  E.g.
> adding support for "EAUG-on-demand" regions would just be a new flag.

We'll end up in some sort of interface eventually. But that's too early to 
discuss.

Currently what we need is the plumbing - i.e. the range has to be mmap()'ed and 
it cannot be PROT_NONE, otherwise vm_ops->fault() will not be reached.

> 
> > But this is all currently irrelevant. We can argue about it when the
> > patches show up. :)


RE: [RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

2019-06-11 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 11, 2019 6:40 AM
> 
> >
> > +#ifdef CONFIG_INTEL_SGX
> > +   rc = sgxsec_mprotect(vma, prot);
> > +   if (rc <= 0)
> > +   return rc;
> 
> Why are you skipping the file_map_prot_check() call when rc == 0?
> What would SELinux check if you didn't do so -
> FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave?  Is it a
> problem to let SELinux proceed with that check?

We can continue the check. But in practice, all FILE__{READ|WRITE|EXECUTE} are 
needed for every enclave, then what's the point of checking them? FILE__EXECMOD 
may be the only flag that has a meaning, but it's kind of redundant because 
sigstruct file was checked against that already.

> > +static int selinux_enclave_load(struct file *encl, unsigned long addr,
> > +   unsigned long size, unsigned long prot,
> > +   struct vm_area_struct *source)
> > +{
> > +   if (source) {
> > +   /**
> > +* Adding page from source => EADD request
> > +*/
> > +   int rc = selinux_file_mprotect(source, prot, prot);
> > +   if (rc)
> > +   return rc;
> > +
> > +   if (!(prot & VM_EXEC) &&
> > +   selinux_file_mprotect(source, VM_EXEC, VM_EXEC))
> 
> I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
> defined with the same values currently.  Elsewhere the kernel appears to
> explicitly translate them ala calc_vm_prot_bits().

Thanks! I'd change them to PROT_EXEC in the next version.

> 
> Also, this will mean that we will always perform an execute check on all
> sources, thereby triggering audit denial messages for any EADD sources
> that are only intended to be data.  Depending on the source, this could
> trigger PROCESS__EXECMEM or FILE__EXECMOD or FILE__EXECUTE.  In a world
> where users often just run any denials they see through audit2allow,
> they'll end up always allowing them all.  How can they tell whether it
> was needed? It would be preferable if we could only trigger execute
> checks when there is some probability that execute will be requested in
> the future.  Alternatives would be to silence the audit of these
> permission checks always via use of _noaudit() interfaces or to silence
> audit of these permissions via dontaudit rules in policy, but the latter
> would hide all denials of the permission by the process, not just those
> triggered from security_enclave_load().  And if we silence them, then we
> won't see them even if they were needed.

*_noaudit() is exactly what I wanted. But I couldn't find 
selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm reluctant to 
duplicate code. Any suggestions?
 
> 
> > +   prot = 0;
> > +   else {
> > +   prot = SGX__EXECUTE;
> > +   if (source->vm_file &&
> > +   !file_has_perm(current_cred(), source->vm_file,
> > +  FILE__EXECMOD))
> > +   prot |= SGX__EXECMOD;
> 
> Similarly, this means that we will always perform a FILE__EXECMOD check
> on all executable sources, triggering audit denial messages for any EADD
> source that is executable but to which EXECMOD is not allowed, and again
> the most common pattern will be that users will add EXECMOD to all
> executable sources to avoid this.
> 
> > +   }
> > +   return sgxsec_eadd(encl, addr, size, prot);
> > +   } else {
> > +   /**
> > + * Adding page from NULL => EAUG request
> > + */
> > +   return sgxsec_eaug(encl, addr, size, prot);
> > +   }
> > +}
> > +
> > +static int selinux_enclave_init(struct file *encl,
> > +   const struct sgx_sigstruct *sigstruct,
> > +   struct vm_area_struct *vma)
> > +{
> > +   int rc = 0;
> > +
> > +   if (!vma)
> > +   rc = -EINVAL;
> 
> Is it ever valid to call this hook with a NULL vma?  If not, this should
> be handled/prevented by the caller.  If so, I'd just return -EINVAL
> immediately here.

vma shall never be NULL. I'll update it in the next version.

> 
> > +
> > +   if (!rc && !(vma->vm_flags & VM_EXEC))
> > +   rc = selinux_file_mprotect(vma, VM_EXEC, VM_EXEC);
> 
> I had thought we were trying to avoid overloading FILE__EXECUTE (or
> whatever gets checked here, e.g. could be PROCESS__EXECMEM or
> FILE__EXECMOD) on the sigstruct file, since the caller isn't truly
> executing code from it.

Agreed. Another problem with FILE__EXECMOD on the sigstruct file is that user 
code would then be allowed to modify SIGSTRUCT at will, which effectively wipes 
out the protection provided by FILE__EXECUTE.

> 
> I'd define new ENCLAVE__* permissions, including an up-front
> ENCLAVE__INIT permission that governs whether the sigstruct file can be
> used at all irrespective

RE: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits

2019-06-10 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Monday, June 10, 2019 12:15 PM
> 
> On Mon, Jun 10, 2019 at 11:29 AM Xing, Cedric 
> wrote:
> >
> > > From: Christopherson, Sean J
> > > Sent: Wednesday, June 05, 2019 7:12 PM
> > >
> > > +/**
> > > + * sgx_map_allowed - check vma protections against the associated
> > > enclave page
> > > + * @encl:an enclave
> > > + * @start:   start address of the mapping (inclusive)
> > > + * @end: end address of the mapping (exclusive)
> > > + * @prot:protection bits of the mapping
> > > + *
> > > + * Verify a userspace mapping to an enclave page would not violate
> > > +the security
> > > + * requirements of the *kernel*.  Note, this is in no way related
> > > +to the
> > > + * page protections enforced by hardware via the EPCM.  The EPCM
> > > +protections
> > > + * can be directly extended by the enclave, i.e. cannot be relied
> > > +upon by the
> > > + * kernel for security guarantees of any kind.
> > > + *
> > > + * Return:
> > > + *   0 on success,
> > > + *   -EACCES if the mapping is disallowed
> > > + */
> > > +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
> > > + unsigned long end, unsigned long prot) {
> > > + struct sgx_encl_page *page;
> > > + unsigned long addr;
> > > +
> > > + prot &= (VM_READ | VM_WRITE | VM_EXEC);
> > > + if (!prot || !encl)
> > > + return 0;
> > > +
> > > + mutex_lock(&encl->lock);
> > > +
> > > + for (addr = start; addr < end; addr += PAGE_SIZE) {
> > > + page = radix_tree_lookup(&encl->page_tree, addr >>
> > > PAGE_SHIFT);
> > > +
> > > + /*
> > > +  * Do not allow R|W|X to a non-existent page, or
> protections
> > > +  * beyond those of the existing enclave page.
> > > +  */
> > > + if (!page || (prot & ~page->prot))
> > > + return -EACCES;
> >
> > In SGX2, pages will be "mapped" before being populated.
> >
> > Here's a brief summary for those who don't have enough background on
> how new EPC pages could be added to a running enclave in SGX2:
> >   - There are 2 new instructions - EACCEPT and EAUG.
> >   - EAUG is used by SGX module to add (augment) a new page to an
> existing enclave. The newly added page is *inaccessible* until the
> enclave *accepts* it.
> >   - EACCEPT is the instruction for an enclave to accept a new page.
> >
> > And the s/w flow for an enclave to request new EPC pages is expected
> to be something like the following:
> >   - The enclave issues EACCEPT at the linear address that it would
> like a new page.
> >   - EACCEPT results in #PF, as there's no page at the linear address
> above.
> >   - SGX module is notified about the #PF, in form of its vma->vm_ops-
> >fault() being called by kernel.
> >   - SGX module EAUGs a new EPC page at the fault address, and resumes
> the enclave.
> >   - EACCEPT is reattempted, and succeeds at this time.
> 
> This seems like an odd workflow.  Shouldn't the #PF return back to
> untrusted userspace so that the untrusted user code can make its own
> decision as to whether it wants to EAUG a page there as opposed to, say,
> killing the enclave or waiting to keep resource usage under control?

This may seem odd to some at the first glance. But if you can think of how 
static heap (pre-allocated by EADD before EINIT) works, the load parses the 
"metadata" coming with the enclave to decide the address/size of the heap, 
EADDs it, and calls it done. In the case of "dynamic" heap (allocated 
dynamically by EAUG after EINIT), the same thing applies - the loader 
determines the range of the heap, tells the SGX module about it, and calls it 
done. Everything else is the between the enclave and the SGX module.

In practice, untrusted code usually doesn't know much about enclaves, just like 
it doesn't know much about the shared objects loaded into its address space 
either. Without the necessary knowledge, untrusted code usually just does what 
it is told (via o-calls, or return value from e-calls), without judging that's 
right or wrong. 

When it comes to #PF like what I described, of course a signal could be sent to 
the untrusted code but what would it do then? Usually it'd just come back 
asking for a page at the fault address. So we figured it'd be more efficient to 
just have the kernel EAUG at #PF. 

Please don't get me wrong though, as I'm not dictating what the s/w flow shall 
be. It's just going to be a choice offered to user mode. And that choice was 
planned to be offered via mprotect() - i.e. a writable vma causes kernel to 
EAUG while a non-writable vma will result in a signal (then the user mode could 
decide whether to EAUG). The key point is flexibility - as we want to allow all 
reasonable s/w flows instead of dictating one over others. We had similar 
discussions on vDSO API before. And I think you accepted my approach because of 
its flexibility. Am I right?


RE: [RFC PATCH v2 1/5] mm: Introduce vm_ops->may_mprotect()

2019-06-10 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Monday, June 10, 2019 12:50 PM
> 
> On Mon, Jun 10, 2019 at 10:47:52AM -0700, Xing, Cedric wrote:
> > > From: Christopherson, Sean J
> > > Sent: Monday, June 10, 2019 8:56 AM
> > >
> > > > > As a result, LSM policies cannot be meaningfully applied, e.g.
> > > > > an LSM can deny access to the EPC as a whole, but can't deny
> > > > > PROT_EXEC on page that originated in a non-EXECUTE file (which
> > > > > is long gone by the time
> > > > > mprotect() is called).
> > > >
> > > > I have hard time following what is paragraph is trying to say.
> > > >
> > > > > By hooking mprotect(), SGX can make explicit LSM upcalls while
> > > > > an enclave is being built, i.e. when the kernel has a handle to
> > > > > origin of each enclave page, and enforce the result of the LSM
> > > > > policy whenever userspace maps the enclave page in the future.
> > > >
> > > > "LSM policy whenever calls mprotect()"? I'm no sure why you mean
> > > > by mapping here and if there is any need to talk about future.
> > > > Isn't this needed now?
> > >
> > > Future is referring to the timeline of a running kernel, not the
> > > future of the kernel code.
> > >
> > > Rather than trying to explain all of the above with words, I'll
> > > provide code examples to show how ->may_protect() will be used by
> > > SGX and why it is the preferred solution.
> >
> > The LSM concept is to separate security policy enforcement from the
> > rest of the kernel. For modules, the "official" way is to use VM_MAY*
> > flags to limit allowable permissions, while LSM uses
> security_file_mprotect().
> > I guess that's why we didn't have .may_mprotect() in the first place.
> 
> Heh, so I've typed up about five different responses to this comment.
> In doing so, I think I've convinced myself that ->may_mprotect() is
> unnecessary.  Rther than hook mprotect(), simply update the VM_MAY*
> flags during mmap(), with all bits cleared if there isn't an associated
> enclave page.  IIRC, the need to add ->may_protect() came about when we
> were exploring more dynamic interplay between SGX and LSMs.
> 
> > What you are doing is enforcing some security policy outside of LSM,
> > which is dirty from architecture perspective.
> 
> No, the enclave page protections are enforced regardless of LSM policy,
> and in v2 those protections are immutable.  Yes, the explicit enclave
> page protection bits are being added primarily for LSMs, but they don't
> impact functionality other than at the security_enclave_load()
> touchpoint.

Disagreed.

You can say you want to enforce "something" without LSM. But what's the purpose 
of that "something" without LSM? Why doesn't the original mprotect() enforce 
that "something"?

It *does* affect functionality because user mode code has to figure out an 
"explicit protection" to make sure the enclave would work with *and also* 
without LSM. That said, the "explicit protection" can neither be too 
restrictive (or enclave wouldn't work) nor be too permissive (or LSM policies 
are violated). But what if the user mode code doesn't have appropriate 
"explicit protection" ahead of time as it is just going to mprotect() as the 
enclave requests at runtime?

And your restrictions on mmap()'ing non-existing pages also have great impacts 
to SGX2 support.

I think some reasonable answers are needed to the above questions before we can 
call this proposal viable. 


RE: [RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits

2019-06-10 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Wednesday, June 05, 2019 7:12 PM
> 
> +/**
> + * sgx_map_allowed - check vma protections against the associated
> enclave page
> + * @encl:an enclave
> + * @start:   start address of the mapping (inclusive)
> + * @end: end address of the mapping (exclusive)
> + * @prot:protection bits of the mapping
> + *
> + * Verify a userspace mapping to an enclave page would not violate the
> +security
> + * requirements of the *kernel*.  Note, this is in no way related to
> +the
> + * page protections enforced by hardware via the EPCM.  The EPCM
> +protections
> + * can be directly extended by the enclave, i.e. cannot be relied upon
> +by the
> + * kernel for security guarantees of any kind.
> + *
> + * Return:
> + *   0 on success,
> + *   -EACCES if the mapping is disallowed
> + */
> +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
> + unsigned long end, unsigned long prot) {
> + struct sgx_encl_page *page;
> + unsigned long addr;
> +
> + prot &= (VM_READ | VM_WRITE | VM_EXEC);
> + if (!prot || !encl)
> + return 0;
> +
> + mutex_lock(&encl->lock);
> +
> + for (addr = start; addr < end; addr += PAGE_SIZE) {
> + page = radix_tree_lookup(&encl->page_tree, addr >>
> PAGE_SHIFT);
> +
> + /*
> +  * Do not allow R|W|X to a non-existent page, or protections
> +  * beyond those of the existing enclave page.
> +  */
> + if (!page || (prot & ~page->prot))
> + return -EACCES;

In SGX2, pages will be "mapped" before being populated.

Here's a brief summary for those who don't have enough background on how new 
EPC pages could be added to a running enclave in SGX2:
  - There are 2 new instructions - EACCEPT and EAUG.
  - EAUG is used by SGX module to add (augment) a new page to an existing 
enclave. The newly added page is *inaccessible* until the enclave *accepts* it.
  - EACCEPT is the instruction for an enclave to accept a new page.

And the s/w flow for an enclave to request new EPC pages is expected to be 
something like the following:
  - The enclave issues EACCEPT at the linear address that it would like a new 
page.
  - EACCEPT results in #PF, as there's no page at the linear address above.
  - SGX module is notified about the #PF, in form of its vma->vm_ops->fault() 
being called by kernel.
  - SGX module EAUGs a new EPC page at the fault address, and resumes the 
enclave.
  - EACCEPT is reattempted, and succeeds at this time.

But with the above check in sgx_map_allowed(), I'm not sure how this will work 
out with SGX2.

> + }
> +
> + mutex_unlock(&encl->lock);
> +
> + return 0;
> +}
> +


RE: [RFC PATCH v2 1/5] mm: Introduce vm_ops->may_mprotect()

2019-06-10 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Monday, June 10, 2019 8:56 AM
> 
> > > As a result, LSM policies cannot be meaningfully applied, e.g. an
> > > LSM can deny access to the EPC as a whole, but can't deny PROT_EXEC
> > > on page that originated in a non-EXECUTE file (which is long gone by
> > > the time
> > > mprotect() is called).
> >
> > I have hard time following what is paragraph is trying to say.
> >
> > > By hooking mprotect(), SGX can make explicit LSM upcalls while an
> > > enclave is being built, i.e. when the kernel has a handle to origin
> > > of each enclave page, and enforce the result of the LSM policy
> > > whenever userspace maps the enclave page in the future.
> >
> > "LSM policy whenever calls mprotect()"? I'm no sure why you mean by
> > mapping here and if there is any need to talk about future. Isn't this
> > needed now?
> 
> Future is referring to the timeline of a running kernel, not the future
> of the kernel code.
> 
> Rather than trying to explain all of the above with words, I'll provide
> code examples to show how ->may_protect() will be used by SGX and why it
> is the preferred solution.

The LSM concept is to separate security policy enforcement from the rest of the 
kernel. For modules, the "official" way is to use VM_MAY* flags to limit 
allowable permissions, while LSM uses security_file_mprotect(). I guess that's 
why we didn't have .may_mprotect() in the first place. What you are doing is 
enforcing some security policy outside of LSM, which is dirty from architecture 
perspective.


RE: [RFC PATCH 2/9] x86/sgx: Do not naturally align MAP_FIXED address

2019-06-04 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Andy Lutomirski
> Sent: Tuesday, June 04, 2019 1:16 PM
> 
> On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
>  wrote:
> >
> > On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that
> > > is tracked and enforced by the CPU using a base+mask approach,
> > > similar to how hardware range registers such as the variable MTRRs.
> > > As a result, the ELRANGE must be naturally sized and aligned.
> > >
> > > To reduce boilerplate code that would be needed in every userspace
> > > enclave loader, the SGX driver naturally aligns the mmap() address
> > > and also requires the range to be naturally sized.  Unfortunately,
> > > SGX fails to grant a waiver to the MAP_FIXED case, e.g. incorrectly
> > > rejects mmap() if userspace is attempting to map a small slice of an
> existing enclave.
> > >
> > > Signed-off-by: Sean Christopherson 
> >
> > Why you want to allow mmap() to be called multiple times? mmap() could
> > be allowed only once with PROT_NONE and denied afterwards. Is this for
> > sending fd to another process that would map already existing enclave?
> >
> > I don't see any checks for whether the is enclave underneath. Also, I
> > think that in all cases mmap() callback should allow only PROT_NONE as
> > permissions for clarity even if it could called multiple times.
> >
> 
> What's the advantage to only allowing PROT_NONE?  The idea here is to
> allow a PROT_NONE map followed by some replacemets that overlay it for
> the individual segments.  Admittedly, mprotect() can do the same thing,
> but disallowing mmap() seems at least a bit surprising.

Disallowing mmap() is not only surprising but also unnecessary.

A bit off topic here. This mmap()/mprotect() discussion reminds me a question 
(guess for Jarkko): Now that vma->vm_file->private_data keeps a pointer to the 
enclave, why do we store it again in vma->vm_private? It isn't a big deal but 
non-NULL vm_private does prevent mprotect() from merging adjacent VMAs. 



RE: [RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()

2019-06-04 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Tuesday, June 04, 2019 1:18 PM
> 
> On Mon, Jun 3, 2019 at 1:39 PM Sean Christopherson
>  wrote:
> >
> > On Mon, Jun 03, 2019 at 01:08:04PM -0700, Sean Christopherson wrote:
> > > On Sun, Jun 02, 2019 at 11:26:09PM -0700, Xing, Cedric wrote:
> > > > > From: Christopherson, Sean J
> > > > > Sent: Friday, May 31, 2019 4:32 PM
> > > > >
> > > > > +/**
> > > > > + * sgx_ioc_enclave_add_pages - handler for
> > > > > +%SGX_IOC_ENCLAVE_ADD_PAGES
> > > > > + *
> > > > > + * @filep:   open file to /dev/sgx
> > > > > + * @cmd: the command value
> > > > > + * @arg: pointer to an &sgx_enclave_add_page instance
> > > > > + *
> > > > > + * Add a range of pages to an uninitialized enclave (EADD), and
> > > > > +optionally
> > > > > + * extend the enclave's measurement with the contents of the
> page (EEXTEND).
> > > > > + * The range of pages must be virtually contiguous.  The
> > > > > +SECINFO and
> > > > > + * measurement maskare applied to all pages, i.e. pages with
> > > > > +different
> > > > > + * properties must be added in separate calls.
> > > > > + *
> > > > > + * EADD and EEXTEND are done asynchronously via worker threads.
> > > > > +A successful
> > > > > + * sgx_ioc_enclave_add_page() only indicates the pages have
> > > > > +been added to the
> > > > > + * work queue, it does not guarantee adding the pages to the
> > > > > +enclave will
> > > > > + * succeed.
> > > > > + *
> > > > > + * Return:
> > > > > + *   0 on success,
> > > > > + *   -errno otherwise
> > > > > + */
> > > > > +static long sgx_ioc_enclave_add_pages(struct file *filep,
> unsigned int cmd,
> > > > > +   unsigned long arg) {  struct
> > > > > +sgx_enclave_add_pages *addp = (void *)arg;  struct sgx_encl
> > > > > +*encl = filep->private_data;  struct sgx_secinfo secinfo;
> > > > > +unsigned int i;  int ret;
> > > > > +
> > > > > + if (copy_from_user(&secinfo, (void __user *)addp->secinfo,
> > > > > +sizeof(secinfo)))
> > > > > + return -EFAULT;
> > > > > +
> > > > > + for (i = 0, ret = 0; i < addp->nr_pages && !ret; i++) {
> > > > > + if (signal_pending(current))
> > > > > + return -ERESTARTSYS;
> > > >
> > > > If interrupted, how would user mode code know how many pages have
> been EADD'ed?
> > >
> > > Hmm, updating nr_pages would be fairly simple and shouldn't confuse
> > > userspace, e.g. as opposed to overloading the return value.
> >
> > Or maybe update @addr and @src as well?  That would allow userspace to
> > re-invoke the ioctl() without having to modify the struct.
> 
> If you're going to use -ERESTARTSYS, that's the way to go.  -EINTR would
> be an alternative.  A benefit of -ERESTARTSYS is that, with -EINTR, it
> wouldn't be that surprising for user code to simply fail to handle it.

-EINTR means the call was interrupted before anything could be done. Am I 
correct?

But in this case some pages have been processed already so I guess we cannot 
return any error code. I think it more reasonable to return the number of pages 
(or bytes) processed.


RE: [RFC PATCH 7/9] x86/sgx: Enforce noexec filesystem restriction for enclaves

2019-06-04 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Tuesday, June 04, 2019 1:25 PM
> To: Jarkko Sakkinen 
> 
> On Tue, Jun 4, 2019 at 9:26 AM Jarkko Sakkinen
>  wrote:
> >
> > On Fri, May 31, 2019 at 04:31:57PM -0700, Sean Christopherson wrote:
> > > Do not allow an enclave page to be mapped with PROT_EXEC if the
> > > source page is backed by a file on a noexec file system.
> > >
> > > Signed-off-by: Sean Christopherson 
> >
> > Why don't you just check in sgx_encl_add_page() that whether the path
> > comes from noexec and deny if SECINFO contains X?
> >
> 
> SECINFO seems almost entirely useless for this kind of thing because of
> SGX2.  I'm thinking that SECINFO should be completely ignored for
> anything other than its required architectural purpose.

For the purpose of allowing/denying EADD/EAUG, SECINFO is useless. 

But SECINFO contains also the page type. What's coming as new feature of SGX2 
is CONFIGID, which is a 512-bit value inside SECS, provided by untrusted code 
at ECREATE. Usually CONFIGID is a hash of something that would affect the 
behavior of the enclave. For example, the "main" enclave could be a JVM with 
the actual applet being loaded hashed into SECS.CONFIGID. In that case the 
enclave's measurements (MRENCLAVE) will stay the same for all applets yet 
individual applet will have distinct CONFIGID and receive distinct keys. When 
it comes to LSM, a policy may want to whitelist/blacklist applets for a JVM so 
a hook at ECREATE may be desirable. We could either define a new hook, or 
overload security_enclave_load() by providing SECINFO as one of its parameters.


RE: [RFC PATCH 8/9] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX

2019-06-04 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Tuesday, June 04, 2019 1:37 PM
> 
> On Tue, Jun 04, 2019 at 01:29:10PM -0700, Andy Lutomirski wrote:
> > On Fri, May 31, 2019 at 4:32 PM Sean Christopherson
> >  wrote:
> > >  static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long
> > > addr, diff --git a/include/linux/lsm_hooks.h
> > > b/include/linux/lsm_hooks.h index 47f58cfb6a19..0562775424a0 100644
> > > --- a/include/linux/lsm_hooks.h
> > > +++ b/include/linux/lsm_hooks.h
> > > @@ -1446,6 +1446,14 @@
> > >   * @bpf_prog_free_security:
> > >   * Clean up the security information stored inside bpf prog.
> > >   *
> > > + * Security hooks for Intel SGX enclaves.
> > > + *
> > > + * @enclave_load:
> > > + * On success, returns 0 and optionally adjusts @allowed_prot
> > > + * @vma: the source memory region of the enclave page being
> loaded.
> > > + * @prot: the initial protection of the enclave page.
> >
> > What do you mean "initial"?  The page is always mapped PROT_NONE when
> > this is called, right?  I feel like I must be missing something here.
> 
> Initial protection in the EPCM.  Yet another reason to ignore SECINFO.

I know you guys are talking in the background that all pages are mmap()'ed 
PROT_NONE. But that's an unnecessary limitation. And @prot here should be 
@target_vma->vm_flags&(VM_READ|VM_WRITE|VM_EXEC). 


RE: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

2019-06-04 Thread Xing, Cedric
Hi Stephen,

> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 04, 2019 8:34 AM
> 
> On 6/3/19 2:30 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Monday, June 03, 2019 10:16 AM
> >>
> >> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> >>> Hi Sean,
> >>>
> >>> Generally I agree with your direction but think ALLOW_* flags are
> >>> completely internal to LSM because they can be both produced and
> >>> consumed inside an LSM module. So spilling them into SGX driver and
> >>> also user mode code makes the solution ugly and in some cases
> >>> impractical because not every enclave host process has a priori
> >>> knowledge on whether or not an enclave page would be EMODPE'd at
> >> runtime.
> >>
> >> In this case, the host process should tag *all* pages it *might*
> convert
> >> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written
> in
> >> such a way that denying ALLOW_EXEC is fatal to the enclave if and
> only
> >> if the enclave actually attempts mprotect(PROT_EXEC).
> >
> > What if those pages contain self-modifying code but the host doesn't
> know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD?
> Then would it prevent those pages to start with PROT_EXEC?
> >
> > Anyway, my point is that it is unnecessary even if it works.
> >
> >>
> >> Take the SELinux path for example.  The only scenario in which
> >> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> >> PROT_EXEC.
> >> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> >> then PROT_EXEC will be cleared from @allowed_prot.
> >>
> >> As Stephen pointed out, auditing the denials on @allowed_prot means
> the
> >> log will contain false positives of a sort.  But this is more of a
> noise
> >> issue than true false positives.  E.g. there are three possible
> outcomes
> >> for the enclave.
> >>
> >>- The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> >>  Requesting ALLOW_EXEC is either a straightforward a userspace
> bug or
> >>  a poorly written generic enclave loader.
> >>
> >>- The enclave conditionally performs EMODPE[PROT_EXEC].  In this
> case
> >>  the denial is a true false positive.
> >>
> >>- The enclave does EMODPE[PROT_EXEC] and its host userspace then
> fails
> >>  on mprotect(PROT_EXEC), i.e. the LSM denial is working as
> intended.
> >>  The audit log will be noisy, but viewed as a whole the denials
> >> aren't
> >>  false positives.
> >
> > What I was talking about was EMODPE[PROT_WRITE] on an RX page.
> >
> >>
> >> The potential for noisy audit logs and/or false positives is
> unfortunate,
> >> but it's (by far) the lesser of many evils.
> >>
> >>> Theoretically speaking, what you really need is a per page flag
> (let's
> >>> name it WRITTEN?) indicating whether a page has ever been written to
> >>> (or more precisely, granted PROT_WRITE), which will be used to
> decide
> >>> whether to grant PROT_EXEC when requested in future. Given the fact
> >>> that all mprotect() goes through LSM and mmap() is limited to
> >>> PROT_NONE, it's easy for LSM to capture that flag by itself instead
> of
> >> asking user mode code to provide it.
> >>>
> >>> That said, here is the summary of what I think is a better approach.
> >>> * In hook security_file_alloc(), if @file is an enclave, allocate
> some
> >> data
> >>>structure to store for every page, the WRITTEN flag as described
> >> above.
> >>>WRITTEN is cleared initially for all pages.
> >>
> >> This would effectively require *every* LSM to duplicate the SGX
> driver's
> >> functionality, e.g. track per-page metadata, implement locking to
> >> prevent races between multiple mm structs, etc...
> >
> > Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_*
> are no difference than PROCESS__* or FILE__* flags, which are just
> artifacts to assist particular LSMs in decision making. They are never
> considered part of the LSM interface, even if other LSMs than SELinux
> may adopt the same/similar approach.
> >
> > If code du

RE: [RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()

2019-06-03 Thread Xing, Cedric



> -Original Message-
> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 1:37 PM
> To: Hansen, Dave 
> Cc: Jarkko Sakkinen ; Andy Lutomirski
> ; Xing, Cedric ; Stephen Smalley
> ; James Morris ; Serge E . Hallyn
> ; LSM List ;
> Paul Moore ; Eric Paris ;
> seli...@vger.kernel.org; Jethro Beekman ; Thomas
> Gleixner ; Linus Torvalds  foundation.org>; LKML ; X86 ML
> ; linux-...@vger.kernel.org; Andrew Morton  foundation.org>; nhor...@redhat.com; npmccal...@redhat.com; Ayoun, Serge
> ; Katz-zamir, Shay ;
> Huang, Haitao ; Andy Shevchenko
> ; Svahn, Kai ;
> Borislav Petkov ; Josh Triplett ;
> Huang, Kai ; David Rientjes ;
> Roberts, William C ; Tricca, Philip B
> 
> Subject: Re: [RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple
> pages in single ioctl()
> 
> On Mon, Jun 03, 2019 at 01:14:45PM -0700, Dave Hansen wrote:
> > On 5/31/19 4:31 PM, Sean Christopherson wrote:
> > > -struct sgx_enclave_add_page {
> > > +struct sgx_enclave_add_pages {
> > >   __u64   addr;
> > >   __u64   src;
> > >   __u64   secinfo;
> > > + __u32   nr_pages;
> > >   __u16   mrmask;
> > >  } __attribute__((__packed__));
> >
> > IMNHO this follows a user interface anti-pattern: exposing page sizes
> > where not strictly required.
> >
> > Think of how this would look to an application if page size was
> > variable.  With this interface, they always need to scale their
> > operations by page size instead of just aligning it.
> 
> I briefly considered taking size in bytes, but I took a shortcut because
> EPC pages are architecturally defined to be 4k sized and aligned.  That
> being said, I don't necessarily disagree, especially if nr_pages isn't
> squeezed into a u32.
> 
> > BTW, why is nr_pages a u32?  Do we never envision a case where you can
> > add more than 4TB of memory to an enclave? ;)
> 
> Heh, fair enough.  IIRC, a while back someone posted about having
> problems building a 512gb enclave in a 92mb EPC...
> 
> How about this for the intermediate patch:
> 
>   struct sgx_enclave_add_region {
>   __u64   addr;
>   __u64   src;
>   __u64   size;
>   __u64   secinfo;
>   __u16   mrmask;
>   __u16   reserved16;
>   __u32   reserved;
>   }
> 
> and with the flags field:
> 
>   struct sgx_enclave_add_region {
>   __u64   addr;
>   __u64   src;
>   __u64   size;
>   __u64   secinfo;
>   __u16   mrmask;
>   __u16   flags;

What is "flags" here?

>   __u32   reserved;
>   }


RE: [RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()

2019-06-03 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 1:40 PM
> 
> On Mon, Jun 03, 2019 at 01:08:04PM -0700, Sean Christopherson wrote:
> > On Sun, Jun 02, 2019 at 11:26:09PM -0700, Xing, Cedric wrote:
> > > > From: Christopherson, Sean J
> > > > Sent: Friday, May 31, 2019 4:32 PM
> > > >
> > > > +/**
> > > > + * sgx_ioc_enclave_add_pages - handler for
> > > > +%SGX_IOC_ENCLAVE_ADD_PAGES
> > > > + *
> > > > + * @filep: open file to /dev/sgx
> > > > + * @cmd:   the command value
> > > > + * @arg:   pointer to an &sgx_enclave_add_page instance
> > > > + *
> > > > + * Add a range of pages to an uninitialized enclave (EADD), and
> > > > +optionally
> > > > + * extend the enclave's measurement with the contents of the page
> (EEXTEND).
> > > > + * The range of pages must be virtually contiguous.  The SECINFO
> > > > +and
> > > > + * measurement maskare applied to all pages, i.e. pages with
> > > > +different
> > > > + * properties must be added in separate calls.
> > > > + *
> > > > + * EADD and EEXTEND are done asynchronously via worker threads.
> > > > +A successful
> > > > + * sgx_ioc_enclave_add_page() only indicates the pages have been
> > > > +added to the
> > > > + * work queue, it does not guarantee adding the pages to the
> > > > +enclave will
> > > > + * succeed.
> > > > + *
> > > > + * Return:
> > > > + *   0 on success,
> > > > + *   -errno otherwise
> > > > + */
> > > > +static long sgx_ioc_enclave_add_pages(struct file *filep,
> unsigned int cmd,
> > > > + unsigned long arg)
> > > > +{
> > > > +   struct sgx_enclave_add_pages *addp = (void *)arg;
> > > > +   struct sgx_encl *encl = filep->private_data;
> > > > +   struct sgx_secinfo secinfo;
> > > > +   unsigned int i;
> > > > +   int ret;
> > > > +
> > > > +   if (copy_from_user(&secinfo, (void __user *)addp->secinfo,
> > > > +  sizeof(secinfo)))
> > > > +   return -EFAULT;
> > > > +
> > > > +   for (i = 0, ret = 0; i < addp->nr_pages && !ret; i++) {
> > > > +   if (signal_pending(current))
> > > > +   return -ERESTARTSYS;
> > >
> > > If interrupted, how would user mode code know how many pages have
> been EADD'ed?
> >
> > Hmm, updating nr_pages would be fairly simple and shouldn't confuse
> > userspace, e.g. as opposed to overloading the return value.
> 
> Or maybe update @addr and @src as well?  That would allow userspace to
> re-invoke the ioctl() without having to modify the struct.

How about returning the number of pages (or bytes) EADD'ed, similar to write() 
syscall? 


RE: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

2019-06-03 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 10:16 AM
> 
> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > Hi Sean,
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> > completely internal to LSM because they can be both produced and
> > consumed inside an LSM module. So spilling them into SGX driver and
> > also user mode code makes the solution ugly and in some cases
> > impractical because not every enclave host process has a priori
> > knowledge on whether or not an enclave page would be EMODPE'd at
> runtime.
> 
> In this case, the host process should tag *all* pages it *might* convert
> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> if the enclave actually attempts mprotect(PROT_EXEC).

What if those pages contain self-modifying code but the host doesn't know ahead 
of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent 
those pages to start with PROT_EXEC?

Anyway, my point is that it is unnecessary even if it works.

> 
> Take the SELinux path for example.  The only scenario in which
> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> PROT_EXEC.
> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> then PROT_EXEC will be cleared from @allowed_prot.
> 
> As Stephen pointed out, auditing the denials on @allowed_prot means the
> log will contain false positives of a sort.  But this is more of a noise
> issue than true false positives.  E.g. there are three possible outcomes
> for the enclave.
> 
>   - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> Requesting ALLOW_EXEC is either a straightforward a userspace bug or
> a poorly written generic enclave loader.
> 
>   - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
> the denial is a true false positive.
> 
>   - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
> on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
> The audit log will be noisy, but viewed as a whole the denials
> aren't
> false positives.

What I was talking about was EMODPE[PROT_WRITE] on an RX page.

> 
> The potential for noisy audit logs and/or false positives is unfortunate,
> but it's (by far) the lesser of many evils.
> 
> > Theoretically speaking, what you really need is a per page flag (let's
> > name it WRITTEN?) indicating whether a page has ever been written to
> > (or more precisely, granted PROT_WRITE), which will be used to decide
> > whether to grant PROT_EXEC when requested in future. Given the fact
> > that all mprotect() goes through LSM and mmap() is limited to
> > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> asking user mode code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> data
> >   structure to store for every page, the WRITTEN flag as described
> above.
> >   WRITTEN is cleared initially for all pages.
> 
> This would effectively require *every* LSM to duplicate the SGX driver's
> functionality, e.g. track per-page metadata, implement locking to
> prevent races between multiple mm structs, etc...

Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no 
difference than PROCESS__* or FILE__* flags, which are just artifacts to assist 
particular LSMs in decision making. They are never considered part of the LSM 
interface, even if other LSMs than SELinux may adopt the same/similar approach.

If code duplication is what you are worrying about, you can put them in a 
library, or implement/export them in some new file (maybe security/enclave.c?) 
as utility functions. But spilling them into user mode is what I think is 
unacceptable.

> 
> >   Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can
> >   only be PROT_NONE. This is to force all protection changes to go
> through
> >   security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages
> >   that are requested PROT_WRITE.
> 
> How would an LSM associate a page with a specific enclave?  vma->vm_file
> will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
> because we're allowing multiple processes to map a single enclave, not
> to mention that by mm would require holdin

RE: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

2019-06-03 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Monday, June 03, 2019 10:47 AM
> 
> On 6/2/19 3:29 AM, Xing, Cedric wrote:
> > Hi Sean,
> >
> >> From: Christopherson, Sean J
> >> Sent: Friday, May 31, 2019 4:32 PM
> >>
> >> This series is the result of a rather absurd amount of discussion
> >> over how to get SGX to play nice with LSM policies, without having to
> >> resort to evil shenanigans or put undue burden on userspace.  The
> >> discussion definitely wandered into completely insane territory at
> times, but I think/hope we ended up with something reasonable.
> >>
> >> The basic gist of the approach is to require userspace to declare
> >> what protections are maximally allowed for any given page, e.g. add a
> >> flags field for loading enclave pages that takes
> >> ALLOW_{READ,WRITE,EXEC}.  LSMs can then adjust the allowed
> >> protections, e.g. clear ALLOW_EXEC to prevent ever mapping the page
> with PROT_EXEC.  SGX enforces the allowed perms via a new mprotect()
> vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> >>
> >> ALLOW_EXEC is used to deny hings like loading an enclave from a
> >> noexec file system or from a file without EXECUTE permissions, e.g.
> >> without the ALLOW_EXEC concept, on SGX2 hardware (regardless of
> >> kernel support) userspace could EADD from a noexec file using read-
> only permissions, and later use mprotect() and ENCLU[EMODPE] to gain
> execute permissions.
> >>
> >> ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce
> SELinux's EXECMOD (or EXECMEM).
> >>
> >> This is very much an RFC series.  It's only compile tested, likely
> >> has obvious bugs, the SELinux patch could be completely harebrained,
> etc...
> >> My goal at this point is to get feedback at a macro level, e.g. is
> >> the core concept viable/acceptable, are there objection to hooking
> mprotect(), etc...
> >>
> >> Andy and Cedric, hopefully this aligns with your general expectations
> >> based on our last discussion.
> >
> > I couldn't understand the real intentions of ALLOW_* flags until I saw
> > them in code. I have to say C is more expressive than English in that
> > regard :)
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> completely internal to LSM because they can be both produced and
> consumed inside an LSM module. So spilling them into SGX driver and also
> user mode code makes the solution ugly and in some cases impractical
> because not every enclave host process has a priori knowledge on whether
> or not an enclave page would be EMODPE'd at runtime.
> >
> > Theoretically speaking, what you really need is a per page flag (let's
> name it WRITTEN?) indicating whether a page has ever been written to (or
> more precisely, granted PROT_WRITE), which will be used to decide
> whether to grant PROT_EXEC when requested in future. Given the fact that
> all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's
> easy for LSM to capture that flag by itself instead of asking user mode
> code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> data structure to store for every page, the WRITTEN flag as described
> above. WRITTEN is cleared initially for all pages.
> >Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can only be PROT_NONE. This is to force all protection changes to
> go through security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages that are requested PROT_WRITE.
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> > look up and use WRITTEN flags for all pages within @vma, along with
> > other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of
> > SELinux) to decide on
> allowing/rejecting @prot.
> 
> At this point we have no knowledge of the source vma/file, right?  So
> what do we check FILE__EXECUTE and/or FILE__EXECMOD against?
> vma->vm_file at this point is /dev/sgx/enclave, right?

My apology to the confusions here.

Yes, vma->vm_file is always /dev/sgx/enclave, but each open("/dev/sgx/enclave") 
returns a *new* file struct (let's denote it as @enclave_fd) that uniquely 
identifies one enclave instance, and the expectation is that 
@enclave_fd->f_security would be used by LSM to store enclave specific 
information, including ALLOW_* flags and whatever deemed appropriate by an LSM 
module.

In the case of SELinux, and if the choice is to use FILE__EXECMOD of .sigstruct 
file to authorize RW->RX at runtime, then SELinux could cache that flag in 
@enclave_fd->f_security upon security_enclave_init().


RE: [RFC PATCH 7/9] x86/sgx: Enforce noexec filesystem restriction for enclaves

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> Do not allow an enclave page to be mapped with PROT_EXEC if the source page 
> is backed by a
> file on a noexec file system.
> 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> index c30acd3fbbdd..5f71be7cbb01 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> @@ -576,6 +576,27 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, 
> unsigned long
> addr,
>   return ret;
>  }
> 
> +static int sgx_encl_page_protect(unsigned long src, unsigned long prot,
> +  unsigned long *allowed_prot)
> +{
> + struct vm_area_struct *vma;
> +
> + if (!(*allowed_prot & VM_EXEC))
> + goto do_check;
> +
> + down_read(¤t->mm->mmap_sem);
> + vma = find_vma(current->mm, src);
> + if (!vma || (vma->vm_file && path_noexec(&vma->vm_file->f_path)))
> + *allowed_prot &= ~VM_EXEC;

Testing (vma->vm_flags & VM_MAYEXEC) == 0 should be a better approach.

Moreover, it looks like the check is done per page, so say 100 pages would 
cause this test to run 100 times even if they are within the same VMA. Wouldn't 
that be a bit inefficient? 
 
> + up_read(¤t->mm->mmap_sem);
> +
> +do_check:
> + if (prot & ~*allowed_prot)
> + return -EACCES;
> +
> + return 0;
> +}


RE: [RFC PATCH 6/9] x86/sgx: Require userspace to provide allowed prots to ADD_PAGES

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> ...to support (the equivalent) of existing Linux Security Module 
> functionality.
> 
> Because SGX manually manages EPC memory, all enclave VMAs are backed by the 
> same vm_file,
> i.e. /dev/sgx/enclave, so that SGX can implement the necessary hooks to move 
> pages in/out
> of the EPC.  And because EPC pages for any given enclave are fundamentally 
> shared between
> processes, i.e.
> CoW semantics are not possible with EPC pages, /dev/sgx/enclave must always 
> be MAP_SHARED.
> Lastly, all real world enclaves will need read, write and execute permissions 
> to EPC pages.
> As a result, SGX does not play nice with existing LSM behavior as it is 
> impossible to
> apply policies to enclaves with any reasonable granularity, e.g. an LSM can 
> deny access to
> EPC altogether, but can't deny potentially dangerous behavior such as mapping 
> pages RW->RW
> or RWX.
> 
> To give LSMs enough information to implement their policies without having to 
> resort to
> ugly things, e.g. holding a reference to the vm_file of each enclave page, 
> require
> userspace to explicitly state the allowed protections for each page (region), 
> i.e. take
> ALLOW_{READ,WRITE,EXEC} in the ADD_PAGES ioctl.
> 
> The ALLOW_* flags will be passed to LSMs so that they can make informed 
> decisions when the
> enclave is being built, i.e. when the source vm_file is available.  For 
> example, SELinux's
> EXECMOD permission can be required if an enclave is requesting both 
> ALLOW_WRITE and
> ALLOW_EXEC.
> 
> Update the mmap()/mprotect() hooks to enforce the ALLOW_* protections, a la 
> the standard
> VM_MAY{READ,WRITE,EXEC} flags.
> 
> The ALLOW_EXEC flag also has a second (important) use in that it can be used 
> to prevent
> loading an enclave from a noexec file system, on
> SGX2 hardware (regardless of kernel support for SGX2), userspace could EADD 
> from a noexec
> path using read-only permissions and later mprotect() and ENCLU[EMODPE] the 
> page to gain
> execute permissions.  By requiring ALLOW_EXEC up front, SGX will be able to 
> enforce noexec
> paths when building the enclave.

ALLOW_* flags shall be kept internal to LSM.

This patch is completely unnecessary.


RE: [RFC PATCH 5/9] x86/sgx: Restrict mapping without an enclave page to PROT_NONE

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> To support LSM integration, SGX will require userspace to explicitly specify 
> the allowed
> protections for each page.  The allowed protections will be supplied to and 
> modified by
> LSMs (based on their policies).
> To prevent userspace from circumventing the allowed protections, do not allow
> PROT_{READ,WRITE,EXEC} mappings to an enclave without an associated enclave 
> page (which
> will track the allowed protections).

This is unnecessary. 

For mprotect(), LSM shall validate @prot against existing pages with applicable 
global flags (e.g. FILE__EXECMOD/PROCESS__EXECMEM in the case of SELinux). 

For mmap(), SGX driver could invoke security_file_mprotect() explicitly to have 
LSM validate requested protection.

In the case where there's no page associated with an VMA, 
security_file_mprotect() shall still dictate whether to allow/deny the request. 
LSM internally is able to track existence/nonexistence of enclave pages. If 
there's no page, there's no conflict so the decision shall only depend on 
global flags (if any). Afterwards, #PF may trigger SGX driver to EAUG, in which 
case security_enclave_load() will be invoked and LSM could decide whether to 
approve/decline EAUG request. 


RE: [RFC PATCH 8/9] LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> diff --git a/include/linux/security.h b/include/linux/security.h index
> 659071c2e57c..2f7925eeef7e 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -392,6 +392,8 @@ void security_inode_invalidate_secctx(struct inode 
> *inode);  int
> security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);  int
> security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);  int
> security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
> +int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
> +   unsigned long *allowed_prot);

Per my comments to the cover letter, security_enclave_load() should have a 
signature similar to the following:

int security_enclave_load(struct file *enclave_fd, unsigned long 
linear_address, unsigned long nr_pages, int prot, struct vm_area_struct 
*source_vma);

@enclave_fd identifies the enclave to which new pages are being added.
@linear_address/@nr_pages specifies the linear range of pages being added.
@prot specifies the initial protection of those newly added pages. It is taken 
from the vma covering the target range.
@source_vma covers the source pages in the case of EADD. An LSM is expected to 
make sure security_file_mprotect(source_vma, prot, prot) would succeed before 
checking anything else, unless @source_vma is NULL, indicating pages are being 
EAUG'ed. In all cases, LSM is expected to "remember" @prot for all those pages 
to be checked in future security_file_mprotect() invocations.

>  #else /* CONFIG_SECURITY */


RE: [RFC PATCH 4/9] mm: Introduce vm_ops->mprotect()

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h index 
> 0e8834ac32b7..50a42364a885
> 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -458,6 +458,8 @@ struct vm_operations_struct {
>   void (*close)(struct vm_area_struct * area);
>   int (*split)(struct vm_area_struct * area, unsigned long addr);
>   int (*mremap)(struct vm_area_struct * area);
> + int (*mprotect)(struct vm_area_struct * area, unsigned long start,
> + unsigned long end, unsigned long prot);

As I commented in my reply to the cover letter, SGX driver doesn't need to 
intercept mprotect() if ALLOW_* flags are not spilled into it.

>   vm_fault_t (*fault)(struct vm_fault *vmf);
>   vm_fault_t (*huge_fault)(struct vm_fault *vmf,
>   enum page_entry_size pe_size);



RE: [RFC PATCH 3/9] x86/sgx: Allow userspace to add multiple pages in single ioctl()

2019-06-02 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> +/**
> + * sgx_ioc_enclave_add_pages - handler for %SGX_IOC_ENCLAVE_ADD_PAGES
> + *
> + * @filep:   open file to /dev/sgx
> + * @cmd: the command value
> + * @arg: pointer to an &sgx_enclave_add_page instance
> + *
> + * Add a range of pages to an uninitialized enclave (EADD), and
> +optionally
> + * extend the enclave's measurement with the contents of the page (EEXTEND).
> + * The range of pages must be virtually contiguous.  The SECINFO and
> + * measurement maskare applied to all pages, i.e. pages with different
> + * properties must be added in separate calls.
> + *
> + * EADD and EEXTEND are done asynchronously via worker threads.  A
> +successful
> + * sgx_ioc_enclave_add_page() only indicates the pages have been added
> +to the
> + * work queue, it does not guarantee adding the pages to the enclave
> +will
> + * succeed.
> + *
> + * Return:
> + *   0 on success,
> + *   -errno otherwise
> + */
> +static long sgx_ioc_enclave_add_pages(struct file *filep, unsigned int cmd,
> +   unsigned long arg)
> +{
> + struct sgx_enclave_add_pages *addp = (void *)arg;
> + struct sgx_encl *encl = filep->private_data;
> + struct sgx_secinfo secinfo;
> + unsigned int i;
> + int ret;
> +
> + if (copy_from_user(&secinfo, (void __user *)addp->secinfo,
> +sizeof(secinfo)))
> + return -EFAULT;
> +
> + for (i = 0, ret = 0; i < addp->nr_pages && !ret; i++) {
> + if (signal_pending(current))
> + return -ERESTARTSYS;

If interrupted, how would user mode code know how many pages have been EADD'ed?

> +
> + if (need_resched())
> + cond_resched();
> +
> + ret = sgx_encl_add_page(encl, addp->addr + i*PAGE_SIZE,
> + addp->src + i*PAGE_SIZE,
> + &secinfo, addp->mrmask);
> + }
>   return ret;
>  }
> 
> @@ -823,8 +845,8 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, 
> unsigned long arg)
>   case SGX_IOC_ENCLAVE_CREATE:
>   handler = sgx_ioc_enclave_create;
>   break;
> - case SGX_IOC_ENCLAVE_ADD_PAGE:
> - handler = sgx_ioc_enclave_add_page;
> + case SGX_IOC_ENCLAVE_ADD_PAGES:
> + handler = sgx_ioc_enclave_add_pages;
>   break;
>   case SGX_IOC_ENCLAVE_INIT:
>   handler = sgx_ioc_enclave_init;
> --
> 2.21.0



RE: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

2019-06-02 Thread Xing, Cedric
Hi Sean,

> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> This series is the result of a rather absurd amount of discussion over how to 
> get SGX to play
> nice with LSM policies, without having to resort to evil shenanigans or put 
> undue burden on
> userspace.  The discussion definitely wandered into completely insane 
> territory at times, but
> I think/hope we ended up with something reasonable.
> 
> The basic gist of the approach is to require userspace to declare what 
> protections are
> maximally allowed for any given page, e.g. add a flags field for loading 
> enclave pages that
> takes ALLOW_{READ,WRITE,EXEC}.  LSMs can then adjust the allowed protections, 
> e.g. clear
> ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC.  SGX enforces the 
> allowed perms
> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses 
> MAY_{READ,WRITE,EXEC}.
> 
> ALLOW_EXEC is used to deny hings like loading an enclave from a noexec file 
> system or from a
> file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on 
> SGX2 hardware
> (regardless of kernel support) userspace could EADD from a noexec file using 
> read-only
> permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute 
> permissions.
> 
> ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's 
> EXECMOD (or EXECMEM).
> 
> This is very much an RFC series.  It's only compile tested, likely has 
> obvious bugs, the
> SELinux patch could be completely harebrained, etc...
> My goal at this point is to get feedback at a macro level, e.g. is the core 
> concept
> viable/acceptable, are there objection to hooking mprotect(), etc...
> 
> Andy and Cedric, hopefully this aligns with your general expectations based 
> on our last
> discussion.

I couldn't understand the real intentions of ALLOW_* flags until I saw them in 
code. I have to say C is more expressive than English in that regard :)

Generally I agree with your direction but think ALLOW_* flags are completely 
internal to LSM because they can be both produced and consumed inside an LSM 
module. So spilling them into SGX driver and also user mode code makes the 
solution ugly and in some cases impractical because not every enclave host 
process has a priori knowledge on whether or not an enclave page would be 
EMODPE'd at runtime.

Theoretically speaking, what you really need is a per page flag (let's name it 
WRITTEN?) indicating whether a page has ever been written to (or more 
precisely, granted PROT_WRITE), which will be used to decide whether to grant 
PROT_EXEC when requested in future. Given the fact that all mprotect() goes 
through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture 
that flag by itself instead of asking user mode code to provide it.

That said, here is the summary of what I think is a better approach.
* In hook security_file_alloc(), if @file is an enclave, allocate some data 
structure to store for every page, the WRITTEN flag as described above. WRITTEN 
is cleared initially for all pages.
  Open: Given a file of type struct file *, how to tell if it is an enclave 
(i.e. /dev/sgx/enclave)?
* In hook security_mmap_file(), if @file is an enclave, make sure @prot can 
only be PROT_NONE. This is to force all protection changes to go through 
security_file_mprotect().
* In the newly introduced hook security_enclave_load(), set WRITTEN for pages 
that are requested PROT_WRITE.
* In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up and 
use WRITTEN flags for all pages within @vma, along with other global flags 
(e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide on 
allowing/rejecting @prot.
* In hook security_file_free(), if @file is an enclave, free storage allocated 
for WRITTEN flags. 

I'll try to make more detailed comments in my replies to individual patches 
sometime tomorrow.

> 
> Lastly, I added a patch to allow userspace to add multiple pages in a single 
> ioctl().  It's
> obviously not directly related to the security stuff, but the idea 
> tangentially came up during
> earlier discussions and it's something I think the UAPI should provide (it's 
> a tiny change).
> Since I was modifying the UAPI anyways, I threw it in.
> 
> Sean Christopherson (9):
>   x86/sgx: Remove unused local variable in sgx_encl_release()
>   x86/sgx: Do not naturally align MAP_FIXED address
>   x86/sgx: Allow userspace to add multiple pages in single ioctl()
>   mm: Introduce vm_ops->mprotect()
>   x86/sgx: Restrict mapping without an enclave page to PROT_NONE
>   x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
>   x86/sgx: Enforce noexec filesystem restriction for enclaves
>   LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
>   security/selinux: Add enclave_load() implementation
> 
>  arch/x86/include/uapi/asm/sgx.h|  30 --
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +
> arch/x

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-30 Thread Xing, Cedric
Hi Andy,

I'm just curious how Sean convinced you that "MAXPERM doesn't work". More 
specifically, I'm objecting to the statement of "the enclave loader won't know 
at load time whether a given EAUG-ed page will ever be executed".

I'm still new to LSM so my understanding may not be correct. But I think LSM 
policy defines a boundary that "loosely" restricts what a process can do. By 
"loosely", I mean it is usually more permissive than a process needs. For 
example, FILE__EXECMOD basically says there are self-modifying code in a file, 
but it isn't specific on which pages contain self-modifying code, hence *all* 
pages are allowed mprotect(RW->RX) even though only a (small) subset actually 
need it. LSM policies are static too, as FILE__EXECMOD is given by a system 
admin to be associated with the disk file, instead of being 
requested/programmed by any processes loading/mapping that file.

So I think the same rationale applies to enclaves. Your original idea of 
MAXPERM is the policy set forth by system admin and shall *never* change at 
runtime. If an enclave is dynamically linked and needs to bring in code pages 
at runtime, the admin needs to enable it by setting, say ENCLAVE__EXECMOD, in 
the sigstruct file. Then all EAUG'ed pages will receive RWX as MAXPERM. The 
process would then mprotect() selective pages to be RX but which exact set of 
pages doesn't concern LSM usually.

> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Thursday, May 30, 2019 12:21 PM
> 
> On Thu, May 30, 2019 at 11:01 AM Sean Christopherson 
> 
> wrote:
> >
> > On Thu, May 30, 2019 at 09:14:10AM -0700, Andy Lutomirski wrote:
> > > On Thu, May 30, 2019 at 8:04 AM Stephen Smalley  
> > > wrote:
> > > >
> > > > On 5/30/19 10:31 AM, Andy Lutomirski wrote:
> > > > > Hi all-
> > > > >
> > > > > After an offline discussion with Sean yesterday, here are some
> > > > > updates to the user API parts of my proposal.
> > > > >
> > > > > Unfortunately, Sean convinced me that MAXPERM doesn't work the
> > > > > way I described it because, for SGX2, the enclave loader won't
> > > > > know at load time whether a given EAUG-ed page will ever be
> > > > > executed.  So here's an update.
> > > > >
> > > > > First, here are the requrements as I see them, where EXECUTE,
> > > > > EXECMOD, and EXECMEM could be substituted with other rules at
> > > > > the LSM's
> > > > > discretion:
> > > > >
> > > > >   - You can create a WX or RWX mapping if and only if you have 
> > > > > EXECMEM.
> > > > >
> > > > >   - To create an X mapping of an enclave page that has ever been
> > > > > W, you need EXECMOD.
> > > >
> > > > EXECMOD to what file? The enclave file from which the page's
> > > > content originated, the sigstruct file, or /dev/sgx/enclave?
> > >
> > > I leave that decision to you :)  The user should need permission to
> > > do an execmod thing on an enclave, however that wants to be encoded.
> >
> > But that decision dictates how the SGX API handles sigstruct.  If LSMs
> > want to associate EXECMOD with sigstruct, then SGX needs to take
> > sigstruct early and hold a reference to the file for the lifetime of the 
> > enclave.
> > And if we're going to do that, the whole approach of inheriting
> > permissions from source VMAs becomes unnecessary complexity.
> >
> > > >
> > > > >   - To create an X mapping of an enclave page that came from
> > > > > EADD, you need EXECUTE on the source file.  Optionally, we could
> > > > > also permit this if you have EXECMOD.
> > > >
> > > > What is the "source file" i.e. the target of the check?  Enclave
> > > > file, sigstruct file, or /dev/sgx/enclave?
> > >
> > > Enclave file -- that is, the file backing the vma from which the data is 
> > > loaded.
> >
> > It wasn't explicitly called out in Andy's proposal(s), but the idea is
> > that the SGX driver would effectively inherit permissions from the
> > source VMA (EADD needs a source for the initial value of the encave page).
> 
> I actually meant for it to *not* work like this.  I don't want the source VMA 
> to have to
> be VM_EXEC.  I think the LSM should just check permissions on ->vm_file.

-Cedric


RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-29 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org 
> [mailto:linux-sgx-ow...@vger.kernel.org] On Behalf
> Of Stephen Smalley
> Sent: Wednesday, May 29, 2019 7:08 AM
> 
> On 5/28/19 4:24 PM, Sean Christopherson wrote:
> > On Sat, May 25, 2019 at 11:09:38PM -0700, Xing, Cedric wrote:
> >>> From: Andy Lutomirski [mailto:l...@kernel.org]
> >>> Sent: Saturday, May 25, 2019 5:58 PM
> >>>
> >>> On Sat, May 25, 2019 at 3:40 PM Xing, Cedric  
> >>> wrote:
> >>>>
> >>>> If we think of EADD as a way of mmap()'ing an enclave file into memory,
> >>>> would this
> >>> security_enclave_load() be the same as
> >>> security_mmap_file(source_vma->vm_file, maxperm, MAP_PRIVATE), except that
> >>> the target is now EPC instead of regular pages?
> >>>
> >>> Hmm, that's clever.  Although it seems plausible that an LSM would want to
> >>> allow RX or RWX of a given file page but only in the context of an 
> >>> approved
> >>> enclave, so I think it should still be its own hook.
> >>
> >> What do you mean by "in the context of an approved enclave"? EPC pages are
> >> *inaccessible* to any software until after EINIT. So it would never be a
> >> security concern to EADD a page with wrong permissions as long as the 
> >> enclave
> >> would be denied eventually by LSM at EINIT.
> >>
> >> But I acknowledge the difference between loading a page into regular memory
> >> vs. into EPC. So it's beneficial to have a separate hook, which if not
> >> hooked, would pass through to security_mmap_file() by default?
> >
> > Mapping the enclave will still go through security_mmap_file(), the extra
> > security_enclave_load() hook allows the mmap() to use PROT_NONE.
> >
> >>> If it's going to be in an arbitrary file, then I think the signature 
> >>> needs to be more
> like:
> >>>
> >>> int security_enclave_init(struct vm_area_struct *sigstruct_vma, loff_t
> sigstruct_offset,
> >>> const sgx_sigstruct *sigstruct);
> >>>
> >>> So that the LSM still has the opportunity to base its decision on the 
> >>> contents of the
> >>> SIGSTRUCT.  Actually, we need that change regardless.
> >>
> >> Wouldn't the pair of { sigstruct_vma, sigstruct_offset } be the same as 
> >> just
> >> a pointer, because the VMA could be looked up using the pointer and the
> >> offset would then be (pointer - vma->vm_start)?
> >
> > VMA has vm_file, e.g. the .sigstruct file labeled by LSMs.  That being
> > said, why does the LSM need the VMA?  E.g. why not this?
> >
> >int security_enclave_init(struct file *file, struct sgx_sigstruct 
> > *sigstruct);
> >
> >>>> Loosely speaking, an enclave (including initial contents of all of its 
> >>>> pages and
> their
> >>> permissions) and its MRENCLAVE are a 1-to-1 correspondence (given the 
> >>> collision
> resistant
> >>> property of SHA-2). So only one is needed for a decision, and either one 
> >>> would lead to
> the
> >>> same decision. So I don't see anything making any sense here.
> >>>>
> >>>> Theoretically speaking, if LSM can make a decision at EINIT by means of
> >>> security_enclave_load(), then security_enclave_load() is never needed.
> >>>>
> >>>> In practice, I support keeping both because security_enclave_load() can 
> >>>> only approve
> an
> >>> enumerable set while security_enclave_load() can approve a non-enumerable 
> >>> set of
> enclaves.
> >>> Moreover, in order to determine the validity of a MRENCLAVE (as in 
> >>> development of a
> policy
> >>> or in creation of a white/black list), system admins will need the audit 
> >>> log produced
> by
> >>> security_enclave_load().
> >>>
> >>> I'm confused.  Things like MRSIGNER aren't known until the SIGSTRUCT shows
> >>> up.  Also, security_enclave_load() provides no protection against loading 
> >>> a
> >>> mishmash of two different enclave files.  I see security_enclave_init() as
> >>> "verify this SIGSTRUCT against your policy on who may sign enclaves and/or
> >>> grant EXECMOD depending on SIGSTRUCT" and security_enclave_load() as
> >>> "implement your EXECMOD / EXECUTE / WRITE / whatever policy

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-29 Thread Xing, Cedric
> From: Christopherson, Sean J
> Sent: Tuesday, May 28, 2019 2:41 PM
> 
> On Tue, May 28, 2019 at 01:48:02PM -0700, Andy Lutomirski wrote:
> > On Tue, May 28, 2019 at 1:24 PM Sean Christopherson
> >  wrote:
> > >
> > > Actually, I think we do have everything we need from an LSM perspective.
> > > LSMs just need to understand that sgx_enclave_load() with a NULL vma
> > > implies a transition from RW.  For example, SELinux would interpret
> > > sgx_enclave_load(NULL, RX) as requiring FILE__EXECMOD.
> >
> > You lost me here.  What operation triggers this callback?  And
> > wouldn't sgx_enclave_load(NULL, RX) sometimes be a transition from RO
> > or just some fresh executable zero bytes?
> 
> An explicit ioctl() after EACCEPTCOPY to update the allowed permissions.
> For all intents and purposes, the EAUG'd page must start RW.  Maybe a better 
> way to phrase
> it is that at some point the page must be writable to have any value 
> whatsover.
> EACCEPTCOPY explicitly requires the page to be at least RW.  EACCEPT 
> technically doesn't
> require RW, but a RO or RX zero page is useless.  Userspace could still 
> EACCEPT with RO or
> RX, but SGX would assume a minimum of RW for the purposes of the LSM check.

Why is an explicit ioctl() necessary after EACCEPTCOPY? Or why is mprotect() 
not sufficient?

I tend to agree on Andy's MAXPERM model where MAXPERM never changes once 
established.

> 
> > > As Cedric mentioned earlier, the host process doesn't necessarily
> > > know which pages will end up RW vs RX, i.e. sgx_enclave_load(NULL,
> > > RX) already has to be invoked at runtime, and when that happens, the
> > > kernel can take the opportunity to change the VMAs from MAY_RW to MAY_RX.
> > >
> > > For simplicity in the kernel and clarity in userspace, it makes
> > > sense to require an explicit ioctl() to add the to-be-EAUG'd range.
> > > That just leaves us wanting an ioctl() to set the post-EACCEPT{COPY} 
> > > permissions.
> > >
> > > E.g.:
> > >
> > > ioctl(_ADD_REGION, { NULL }) /* NULL == EAUG, MAY_RW */
> > >
> > > mprotect(addr, size, RW);
> > > ...
> > >
> > > EACCEPTCOPY -> EAUG /* page fault handler */
> > >
> > > ioctl(_ACTIVATE_REGION, { addr, size, RX}) /* MAY_RX */
> > >
> > > mprotect(addr, size, RX);
> >
> > In the maxperm model, this mprotect() will fail unless MAXPERM
> > contains RX, which could only happen if MAXPERM=RWX.  So, regardless
> > of how it's actually mapped to SELinux policy, MAXPERM=RWX is
> > functionally like EXECMOD and actual RWX PTEs are functionally like
> > EXECMEM.
> 
> Yep, same idea, except in the proposed flow ACTIVATE_REGION.


> 
> > > ...
> > >
> > > And making ACTIVATE_REGION a single-shot per page eliminates the
> > > need for the MAXPERMS concept (see below).
> > >
> > > > If we keep only one MAXPERM, wouldn't this be the current behavior
> > > > of mmap()/mprotect()?
> > > >
> > > > To be a bit more clear, system admin sets MAXPERM upper bound in
> > > > the form of FILE__{READ|WRITE|EXECUTE|EXECMOD} of
> > > > /dev/sgx/enclave. Then for a process/enclave, if what it requires
> > > > falls below what's allowed on /dev/sgx/enclave, then everything
> > > > will just work. Otherwise, it fails in the form of -EPERM returned
> > > > from mmap()/mprotect(). Please note that MAXPERM here applies to
> > > > "runtime" permissions, while "initial" permissions are taken care
> > > > of by security_enclave_{load|init}. "initial" permissions could be
> > > > more permissive than "runtime" permissions, e.g., RX is still
> > > > required for initial code pages even though system admins could disable 
> > > > dynamically
> loaded code pages by *not* giving FILE__{EXECUTE|EXECMOD}. Therefore, the 
> "initial"
> > > > mapping would still have to be done by the driver (to bypass LSM),
> > > > either via a new ioctl or as part of IOC_EINIT.
> > >
> > > Aha!
> > >
> > > Starting with Cedric's assertion that initial permissions can be
> > > taken directly from SECINFO:
> > >
> > >   - Initial permissions for *EADD* pages are explicitly handled via
> > > sgx_enclave_load() with the exact SECINFO permissions.
> > >
> > >   - Initial permissions for *EAUG* are unconditionally RW.  EACCEPTCOPY
> > > requires the target EPC page to be RW, and EACCEPT with RO is useless.
> > >
> > >   - Runtime permissions break down as follows:
> > >   R   - N/A, subset of RW (EAUG)
> > >   W   - N/A, subset of RW (EAUG) and x86 paging can't do W
> > >   X   - N/A, subset of RX (x86 paging can't do XO)
> >
> > Sure it can!  You just have a hypervisor that maps a PA bit to EPT
> > no-read.  Then you can use that PA bit to suppress read.  Also, Linux
> > already abuses PKRU to simulate XO, although that won't work for
> > enclaves.
> 
> Heh, I intentionally said "x86 paging" to rule out EPT :-)  I'm pretty sure 
> it's a moot
> point though, I have a hard time believing an LSM will allow RW->X and not 
> RW->RX.
> 
> > >   RW  - Handled by EAUG LSM hook (us

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-25 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Saturday, May 25, 2019 5:58 PM
> 
> On Sat, May 25, 2019 at 3:40 PM Xing, Cedric  wrote:
> >
> > > From: Andy Lutomirski [mailto:l...@amacapital.net]
> > > Sent: Friday, May 24, 2019 4:42 PM
> > >
> > > > On May 24, 2019, at 3:41 PM, Sean Christopherson 
> > > > 
> wrote:
> > > >
> > > >> On Fri, May 24, 2019 at 02:27:34PM -0700, Andy Lutomirski wrote:
> > > >> On Fri, May 24, 2019 at 1:03 PM Sean Christopherson
> > > >>  wrote:
> > > >>>
> > > >>>> On Fri, May 24, 2019 at 12:37:44PM -0700, Andy Lutomirski wrote:
> > > >>>>> On Fri, May 24, 2019 at 11:34 AM Xing, Cedric 
> > > >>>>>  wrote:
> > > >>>>>
> > > >>>>> If "initial permissions" for enclaves are less restrictive
> > > >>>>> than shared objects, then it'd become a backdoor for
> > > >>>>> circumventing LSM when enclave whitelisting is *not* in place.
> > > >>>>> For example, an adversary may load a page, which would
> > > >>>>> otherwise never be executable, as an executable
> > > page in EPC.
> > > >>>>>
> > > >>>>> In the case a RWX page is needed, the calling process has to
> > > >>>>> have a RWX page serving as the source for EADD so
> > > >>>>> PROCESS__EXECMEM will have been checked. For SGX2, changing an
> > > >>>>> EPC page to RWX is subject to FILE__EXECMEM on
> > > >>>>> /dev/sgx/enclave, which I see as a security benefit because it
> > > >>>>> only affects the enclave but not the whole process hosting
> > > it.
> > > >>>>
> > > >>>> So the permission would be like FILE__EXECMOD on the source
> > > >>>> enclave page, because it would be mapped MAP_ANONYMOUS, PROT_WRITE?
> > > >>>> MAP_SHARED, PROT_WRITE isn't going to work because that means
> > > >>>> you can modify the file.
> > > >>>
> > > >>> Was this in response to Cedric's comment, or to my comment?
> > > >>
> > > >> Yours.  I think that requiring source pages to be actually mapped
> > > >> W is not such a great idea.
> > > >
> > > > I wasn't requiring source pages to be mapped W.  At least I didn't
> > > > intend to require W.  What I was trying to say is that SGX could
> > > > trigger an EXECMEM check if userspace attempted to EADD or EAUG an
> > > > enclave page with RWX permissions, e.g.:
> > > >
> > > >  if ((SECINFO.PERMS & RWX) == RWX) {
> > > >  ret = security_mmap_file(NULL, RWX, ???);
> > > >  if (ret)
> > > >  return ret;
> > > >  }
> > > >
> > > > But that's a moot point if we add security_enclave_load() or whatever.
> > > >
> > > >>
> > > >>>
> > > >>>> I'm starting to think that looking at the source VMA permission
> > > >>>> bits or source PTE permission bits is putting a bit too much
> > > >>>> policy into the driver as opposed to the LSM.  How about
> > > >>>> delegating the whole thing to an LSM hook?  The EADD operation
> > > >>>> would invoke a new hook, something like:
> > > >>>>
> > > >>>> int security_enclave_load_bytes(void *source_addr, struct
> > > >>>> vm_area_struct *source_vma, loff_t source_offset, unsigned int
> > > >>>> maxperm);
> > > >>>>
> > > >>>> Then you don't have to muck with mapping anything PROT_EXEC.
> > > >>>> Instead you load from a mapping of a file and the LSM applies
> > > >>>> whatever policy it feels appropriate.  If the first pass gets
> > > >>>> something wrong, the application or library authors can take it
> > > >>>> up with the SELinux folks without breaking the whole ABI :)
> > > >>>>
> > > >>>> (I'm proposing passing in the source_vma because this hook
> > > >>>> would be called with mmap_sem held for read to avoid a TOCTOU
> > > >>>> race.)
> > > >>>>
> > >

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-25 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@amacapital.net]
> Sent: Friday, May 24, 2019 4:42 PM
> 
> > On May 24, 2019, at 3:41 PM, Sean Christopherson 
> >  wrote:
> >
> >> On Fri, May 24, 2019 at 02:27:34PM -0700, Andy Lutomirski wrote:
> >> On Fri, May 24, 2019 at 1:03 PM Sean Christopherson
> >>  wrote:
> >>>
> >>>> On Fri, May 24, 2019 at 12:37:44PM -0700, Andy Lutomirski wrote:
> >>>>> On Fri, May 24, 2019 at 11:34 AM Xing, Cedric  
> >>>>> wrote:
> >>>>>
> >>>>> If "initial permissions" for enclaves are less restrictive than
> >>>>> shared objects, then it'd become a backdoor for circumventing LSM
> >>>>> when enclave whitelisting is *not* in place. For example, an
> >>>>> adversary may load a page, which would otherwise never be executable, 
> >>>>> as an executable
> page in EPC.
> >>>>>
> >>>>> In the case a RWX page is needed, the calling process has to have
> >>>>> a RWX page serving as the source for EADD so PROCESS__EXECMEM will
> >>>>> have been checked. For SGX2, changing an EPC page to RWX is
> >>>>> subject to FILE__EXECMEM on /dev/sgx/enclave, which I see as a
> >>>>> security benefit because it only affects the enclave but not the whole 
> >>>>> process hosting
> it.
> >>>>
> >>>> So the permission would be like FILE__EXECMOD on the source enclave
> >>>> page, because it would be mapped MAP_ANONYMOUS, PROT_WRITE?
> >>>> MAP_SHARED, PROT_WRITE isn't going to work because that means you
> >>>> can modify the file.
> >>>
> >>> Was this in response to Cedric's comment, or to my comment?
> >>
> >> Yours.  I think that requiring source pages to be actually mapped W
> >> is not such a great idea.
> >
> > I wasn't requiring source pages to be mapped W.  At least I didn't
> > intend to require W.  What I was trying to say is that SGX could
> > trigger an EXECMEM check if userspace attempted to EADD or EAUG an
> > enclave page with RWX permissions, e.g.:
> >
> >  if ((SECINFO.PERMS & RWX) == RWX) {
> >  ret = security_mmap_file(NULL, RWX, ???);
> >  if (ret)
> >  return ret;
> >  }
> >
> > But that's a moot point if we add security_enclave_load() or whatever.
> >
> >>
> >>>
> >>>> I'm starting to think that looking at the source VMA permission
> >>>> bits or source PTE permission bits is putting a bit too much policy
> >>>> into the driver as opposed to the LSM.  How about delegating the
> >>>> whole thing to an LSM hook?  The EADD operation would invoke a new
> >>>> hook, something like:
> >>>>
> >>>> int security_enclave_load_bytes(void *source_addr, struct
> >>>> vm_area_struct *source_vma, loff_t source_offset, unsigned int
> >>>> maxperm);
> >>>>
> >>>> Then you don't have to muck with mapping anything PROT_EXEC.
> >>>> Instead you load from a mapping of a file and the LSM applies
> >>>> whatever policy it feels appropriate.  If the first pass gets
> >>>> something wrong, the application or library authors can take it up
> >>>> with the SELinux folks without breaking the whole ABI :)
> >>>>
> >>>> (I'm proposing passing in the source_vma because this hook would be
> >>>> called with mmap_sem held for read to avoid a TOCTOU race.)
> >>>>
> >>>> If we go this route, the only substantial change to the existing
> >>>> driver that's needed for an initial upstream merge is the maxperm
> >>>> mechanism and whatever hopefully minimal API changes are needed to
> >>>> allow users to conveniently set up the mappings.  And we don't need
> >>>> to worry about how to hack around mprotect() calling into the LSM,
> >>>> because the LSM will actually be aware of SGX and can just do the
> >>>> right thing.
> >>>
> >>> This doesn't address restricting which processes can run which
> >>> enclaves, it only allows restricting the build flow.  Or are you
> >>> suggesting this be done in addition to whitelisting sigstructs?
> >>
> >> In addition.
> >>
> >> But I named the function badl

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Sean Christopherson
> Sent: Friday, May 24, 2019 1:04 PM
> 
> On Fri, May 24, 2019 at 12:37:44PM -0700, Andy Lutomirski wrote:
> > On Fri, May 24, 2019 at 11:34 AM Xing, Cedric 
> wrote:
> > >
> > > If "initial permissions" for enclaves are less restrictive than
> > > shared objects, then it'd become a backdoor for circumventing LSM
> > > when enclave whitelisting is *not* in place. For example, an
> > > adversary may load a page, which would otherwise never be executable,
> as an executable page in EPC.
> > >
> > > In the case a RWX page is needed, the calling process has to have a
> > > RWX page serving as the source for EADD so PROCESS__EXECMEM will
> > > have been checked. For SGX2, changing an EPC page to RWX is subject
> > > to FILE__EXECMEM on /dev/sgx/enclave, which I see as a security
> > > benefit because it only affects the enclave but not the whole
> process hosting it.
> >
> > So the permission would be like FILE__EXECMOD on the source enclave
> > page, because it would be mapped MAP_ANONYMOUS, PROT_WRITE?
> > MAP_SHARED, PROT_WRITE isn't going to work because that means you can
> > modify the file.
> 
> Was this in response to Cedric's comment, or to my comment?

Creating RWX source page requires PROCESS_EXECMEM. But as I responded to Sean 
earlier, I think his proposal of "aggregating" all "initial" permission checks 
into a single SIGSTRUCT check is probably a better approach.

> 
> > I'm starting to think that looking at the source VMA permission bits
> > or source PTE permission bits is putting a bit too much policy into
> > the driver as opposed to the LSM.  How about delegating the whole
> > thing to an LSM hook?  The EADD operation would invoke a new hook,
> > something like:
> >
> > int security_enclave_load_bytes(void *source_addr, struct
> > vm_area_struct *source_vma, loff_t source_offset, unsigned int
> > maxperm);

This is exactly what I was thinking. But with Sean's proposal this is probably 
no longer necessary.

> >
> > Then you don't have to muck with mapping anything PROT_EXEC.  Instead
> > you load from a mapping of a file and the LSM applies whatever policy
> > it feels appropriate.  If the first pass gets something wrong, the
> > application or library authors can take it up with the SELinux folks
> > without breaking the whole ABI :)
> >
> > (I'm proposing passing in the source_vma because this hook would be
> > called with mmap_sem held for read to avoid a TOCTOU race.)
> >
> > If we go this route, the only substantial change to the existing
> > driver that's needed for an initial upstream merge is the maxperm
> > mechanism and whatever hopefully minimal API changes are needed to
> > allow users to conveniently set up the mappings.  And we don't need to
> > worry about how to hack around mprotect() calling into the LSM,
> > because the LSM will actually be aware of SGX and can just do the
> > right thing.
> 
> This doesn't address restricting which processes can run which enclaves,
> it only allows restricting the build flow.  Or are you suggesting this
> be done in addition to whitelisting sigstructs?

In the context of SELinux, new types could be defined to be associated with 
SIGSTRUCT (or more precisely, files containing SIGSTRUCTs). Then the LSM hook 
(I'd propose security_sgx_initialize_enclave) could enforce whatever...

> 
> What's the value prop beyond whitelisting sigstructs?  Realistically, I
> doubt LSMs/users will want to take the performance hit of scanning the
> source bytes every time an enclave is loaded.
> 
> We could add seomthing like security_enclave_mprotect() in lieu of
> abusing security_file_mprotect(), but passing the full source bytes
> seems a bit much.

I'd just use /dev/sgx/enclave to govern "runtime" permissions any EPC page can 
mmap()/mprotect() to. Then we won't need any code changes in LSM.

-Cedric


RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Sean Christopherson
> Sent: Friday, May 24, 2019 12:14 PM
> 
> My point is that enclaves have different properties than shared objects.
> 
> Normal LSM behavior with regard to executing files is to label files
> with e.g. FILE__EXECUTE.  Because an enclave must be built to the exact
> specifications of .sigstruct, requring FILE__EXECUTE on the .sigstruct
> is effectively the same as requiring FILE__EXECUTE on the enclave itself.
> 
> Addressing your scenario of loading an executable page in EPC, doing so
> would require one of the following:
> 
>   - Ability to install a .sigstruct with FILE__EXECUTE
> 
>   - PROCESS__EXECMEM
> 
>   - FILE__EXECMOD and SGX2 support

Now I got your point. It sounds a great idea to me!

But instead of using .sigstruct file, I'd still recommend using file mapping 
(i.e. SIGSTRUCT needs to reside in executable memory). But then there'll be a 
hole - a process having FILE__EXECMOD on any file could use that file as a 
SIGSTRUCT. Probably we'll need a new type in SELinux to label enclave/sigstruct 
files.

> 
> > In the case a RWX page is needed, the calling process has to have a
> > RWX page serving as the source for EADD so PROCESS__EXECMEM will have
> been checked.
> > For SGX2, changing an EPC page to RWX is subject to FILE__EXECMEM on
> > /dev/sgx/enclave, which I see as a security benefit because it only
> > affects the enclave but not the whole process hosting it.
> 
> There is no FILE__EXECMEM check, only PROCESS__EXECMEM and FILE__EXECMOD.
> I assume you're referring to the latter?

Yes.

> 
> I don't see a fundamental difference between having RWX in an enclave
> and RWX in normal memory, either way the process can execute arbitrary
> code, i.e. PROCESS__EXECMEM is appropriate.  Yes, an enclave will #UD on
> certain instructions, but that's easily sidestepped by having a
> trampoline in the host (marked RX) and piping arbitrary code into the
> enclave.  Or using EEXIT to do a bit of ROP.

I'm with you.

With your proposal only FILE__EXECMOD is needed on /dev/sgx/enclave to launch 
Graphene enclaves or the like.

> 
> > > > No changes are required to LSMs, SGX1 has a single LSM touchpoint
> > > > in
> > > its
> > > > mmap(), and I *think* the only required userspace change is to
> > > > mmap() PROT_NONE when allocating the enclave's virtual address
> range.
> >
> > I'm not sure I understand the motivation behind this proposal to
> > decouple initial EPC permissions from source pages.
> 
> Pulling permissions from source pages means userspace needs to fully map
> the in normal memory, including marking pages executable.  That exposes
> the loader to having executable pages in its address space that it has
> no intention of executing (outside of the enclave).  And for Graphene,
> it means having to actively avoid PROCESS__EXECMEM, e.g. by using a
> dummy backing file to build the enclave instead of anon memory.

Agreed.

> 
> > I don't think it a big deal to fully mmap() enclave files, which have
> > to be parsed by user mode anyway to determine various things including
> > but not limited to the size of heap(s), size and number of
> > TCSs/stacks/TLS areas, and the overall enclave size. So with PHDRs
> > parsed, it's trivial to mmap() each segment with permissions from its
> PHDR.
> >
> > > > As for Graphene, it doesn't need extra permissions to run its
> > > > enclaves, it just needs a way to install .sigstruct, which is a
> > > > generic permissions problem and not SGX specific.
> > > >
> > > >
> > > > For SGX2 maybe:
> > > >
> > > >   - No additional requirements to map an EAUG'd page as RW page.
> Not
> > > > aligned with standard MAP_SHARED behavior, but we really don't
> want
> > > > to require FILE__WRITE, and thus allow writes to .sigstruct.
> > > >
> > > >   - Require FILE__EXECMOD on the .sigstruct to map previously
> writable
> > > > page as executable (which indirectly includes all EAUG'd
> pages).
> > > > Wiring this up will be a little funky, but we again we don't
> want
> > > > to require FILE__WRITE on .sigstruct.
> > > >
> >
> > I'm lost. Why is EAUG tied to permissions on .sigstruct?
> 
> Because for the purposes of LSM checks, .sigstruct is the enclave's
> backing file, and mapping a previously writable enclave page as
> exectuable is roughly equivalent to mapping a CoW'd page as exectuable.

I think I've got your idea. You are trying to use permissions on .sigstruct to 
determine whether EAUG will be available to that specific enclave. Am I right?

I'd tie EAUG to the permissions of /dev/sgx/enclave instead. But why? There are 
couple of reasons. For one, a SIGSTRUCT identifies the behavior of the enclave, 
hence the SGX features needed by that enclave. So if an enclave requires EAUG, 
the .sigstruct has to allow EAUG or the enclave wouldn't work. That means the 
system admin wouldn't have a choice but to match up what's needed by the 

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Xing, Cedric
> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Sean Christopherson
> Sent: Friday, May 24, 2019 10:55 AM
> 
> On Fri, May 24, 2019 at 10:42:43AM -0700, Sean Christopherson wrote:
> > Hmm, I've been thinking more about pulling permissions from the source
> > page.  Conceptually I'm not sure we need to meet the same requirements
> as
> > non-enclave DSOs while the enclave is being built, i.e. do we really
> need
> > to force userspace to fully map the enclave in normal memory?
> >
> > Consider the Graphene scenario where it's building an enclave on the
> fly.
> > Pulling permissions from the source VMAs means Graphene has to map the
> > code pages of the enclave with X.  This means Graphene will need
> EXEDMOD
> > (or EXECMEM if Graphene isn't careful).  In a non-SGX scenario this
> makes
> > perfect sense since there is no way to verify the end result of RW->RX.
> >
> > But for SGX, assuming enclaves are whitelisted by their sigstruct
> (checked
> > at EINIT) and because page permissions affect sigstruct.MRENCLAVE, it
> *is*
> > possible to verify the resulting RX contents.  E.g. for the purposes
> of
> > LSMs, can't we use the .sigstruct file as a proxy for the enclave and
> > require FILE__EXECUTE on the .sigstruct inode to map/run the enclave?
> >
> > Stephen, is my logic sound?
> >
> >
> > If so...
> >
> >   - Require FILE__READ+FILE__EXECUTE on .sigstruct to mmap() the
> enclave.
> >
> >   - Prevent userspace from mapping the enclave with permissions beyond
> the
> > original permissions of the enclave.  This can be done by
> populating
> > VM_MAY{READ,WRITE,EXEC} from the SECINFO (same basic concept as
> Andy's
> > proposals).  E.g. pre-EINIT, mmap() and mprotect() can only
> succeed
> > with PROT_NONE.
> >
> >   - Require FILE__{READ,WRITE,EXECUTE} on /dev/sgx/enclave for
> simplicity,
> > or provide an alternate SGX_IOC_MPROTECT if we want to sidestep
> the
> > FILE__WRITE requirement.
> 
> One more thought.  EADD (and the equivalent SGX2 flow) could do
> security_mmap_file() with a NULL file on the SECINFO permissions, which
> would trigger PROCESS_EXECMEM if an enclave attempts to map a page RWX.

If "initial permissions" for enclaves are less restrictive than shared objects, 
then it'd become a backdoor for circumventing LSM when enclave whitelisting is 
*not* in place. For example, an adversary may load a page, which would 
otherwise never be executable, as an executable page in EPC.

In the case a RWX page is needed, the calling process has to have a RWX page 
serving as the source for EADD so PROCESS__EXECMEM will have been checked. For 
SGX2, changing an EPC page to RWX is subject to FILE__EXECMEM on 
/dev/sgx/enclave, which I see as a security benefit because it only affects the 
enclave but not the whole process hosting it.

> 
> > No changes are required to LSMs, SGX1 has a single LSM touchpoint in
> its
> > mmap(), and I *think* the only required userspace change is to mmap()
> > PROT_NONE when allocating the enclave's virtual address range.

I'm not sure I understand the motivation behind this proposal to decouple 
initial EPC permissions from source pages.

I don't think it a big deal to fully mmap() enclave files, which have to be 
parsed by user mode anyway to determine various things including but not 
limited to the size of heap(s), size and number of TCSs/stacks/TLS areas, and 
the overall enclave size. So with PHDRs parsed, it's trivial to mmap() each 
segment with permissions from its PHDR.

> >
> > As for Graphene, it doesn't need extra permissions to run its enclaves,
> > it just needs a way to install .sigstruct, which is a generic
> permissions
> > problem and not SGX specific.
> >
> >
> > For SGX2 maybe:
> >
> >   - No additional requirements to map an EAUG'd page as RW page.  Not
> > aligned with standard MAP_SHARED behavior, but we really don't
> want
> > to require FILE__WRITE, and thus allow writes to .sigstruct.
> >
> >   - Require FILE__EXECMOD on the .sigstruct to map previously writable
> > page as executable (which indirectly includes all EAUG'd pages).
> > Wiring this up will be a little funky, but we again we don't want
> > to require FILE__WRITE on .sigstruct.
> >

I'm lost. Why is EAUG tied to permissions on .sigstruct? 

-Cedric


RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Xing, Cedric
Hi Stephen,

> On 5/24/19 3:24 AM, Xing, Cedric wrote:
> >
> > When we talk about EPC page permissions with SGX2 in mind, I think we
> should distinguish between initial permissions and runtime permissions.
> Initial permissions refer to the page permissions set at EADD. They are
> technically set by "untrusted" code so should go by policies similar to
> those applicable to regular shared objects. Runtime permissions refer to
> the permissions granted by EMODPE, EAUG and EACCEPTCOPY. They are
> resulted from inherent behavior of the enclave, which in theory is
> determined by the enclave's measurements (MRENCLAVE and/or MRSIGNER).
> >
> > And we have 2 distinct files to work with - the enclave file and
> /dev/sgx/enclave. And I consider the enclave file a logical source for
> initial permissions while /dev/sgx/enclave is a means to control runtime
> permissions. Then we can have a simpler approach like the pseudo code
> below.
> >
> > /**
> >   * Summary:
> >   * - The enclave file resembles a shared object that contains
> RO/RX/RW segments
> >   * - FILE__* are assigned to /dev/sgx/enclave, to determine
> acceptable permissions to mmap()/mprotect(), valid combinations are
> >   *   + FILE__READ - Allow SGX1 enclaves only
> >   *   + FILE__READ|FILE__WRITE - Allow SGX2 enclaves to expand data
> segments (e.g. heaps, stacks, etc.)
> >   *   + FILE__READ|FILE__WRITE|FILE__EXECUTE - Allow SGX2 enclaves to
> expend both data and code segments. This is necessary to support
> dynamically linked enclaves (e.g. Graphene)
> >   *   + FILE__READ|FILE__EXECUTE - Allow RW->RX changes for SGX1
> enclaves - necessary to support dynamically linked enclaves (e.g.
> Graphene) on SGX1. EXECMEM is also required for this to work
> 
> I think EXECMOD would fit better than EXECMEM for this case; the former
> is applied for RW->RX changes for private file mappings while the latter
> is applied for WX private file mappings.
> 
> >   *   +  - Disallow the calling process to launch any enclaves
> >   */
> >
> > /* Step 1: mmap() the enclave file according to the segment attributes
> > (similar to what dlopen() would do for regular shared objects) */ int
> > image_fd = open("/path/to/enclave/file", O_RDONLY);
> 
> FILE__READ checked to enclave file upon open().

Yes. We'd like to have the enclave file pass LSM/IMA checks and let EPC pages 
"inherit" the permissions from it as "initial" permissions.

> 
> > foreach phdr in loadable segments /* phdr->p_type == PT_LOAD */ {
> >  /*  below is subject to LSM checks */
> >  loadable_segments[i] = mmap(NULL, phdr->p_memsz, MAP_PRIATE,
> > , image_fd, phdr->p_offset);
> 
> FILE__READ revalidated and FILE__EXECUTE checked to enclave file upon
> mmap() for PROT_READ and PROT_EXEC respectively.  FILE__WRITE not
> checked even for PROT_WRITE mappings since it is a private file mapping
> and writes do not reach the file.  EXECMEM checked if any segment
> permission has both W and X simultaneously.  EXECMOD checked on any
> subsequent mprotect() RW->RX changes (if modified).

Yes. The intention here is to make sure all X pages come directly from file 
(unless EXECMEM or EXECMOD is granted). And because the driver will grant X 
only if the source page also has X, we can assert that all executable EPC pages 
are loaded from a file that has passed LSM/IMA checks.

> 
> > }
> >
> > /* Step 2: Create enclave */
> > int enclave_fd = open("/dev/sgx/enclave", O_RDONLY /* or O_RDWR for
> > SGX2 enclaves */);
> 
> FILE__READ checked (SGX1) or both FILE__READ and FILE__WRITE checked
> (SGX2) to /dev/sgx/enclave upon open().  Assuming that we are returning
> an open file referencing the /dev/sgx/enclave inode and not an anon
> inode, else we lose all subsequent FILE__* checking on mmap/mprotect and
> trigger EXECMEM on any mmap/mprotect PROT_EXEC.

Yes, the returned fd will be referencing /dev/sgx/enclave. The intention here 
is to limit EPC "runtime" permissions by the permissions granted to 
/dev/sgx/enclave, in order to allow user/administrator to specify what kinds of 
enclaves a given process can launch. Per your earlier comments, FILE__EXECMOD 
is probably also needed to support dynamically linked enclaves (that require 
RW->RX changes).

> 
> > void *enclave_base = mmap(NULL, , MAP_SHARED, PROT_READ,
> > enclave_fd, 0); /* Only FILE__READ is required here */
> 
> FILE__READ revalidated to /dev/sgx/enclave upon mmap().

Yes. This mmap() is to set "default" permissions for regions that do *not* have 
EPC pages populated. It is significant only for SGX2, to specify what action to 
take by t

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-24 Thread Xing, Cedric
Hi Andy,

> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Thursday, May 23, 2019 6:18 PM
> 
> On Thu, May 23, 2019 at 4:40 PM Sean Christopherson 
> 
> wrote:
> >
> > On Thu, May 23, 2019 at 08:38:17AM -0700, Andy Lutomirski wrote:
> > > On Thu, May 23, 2019 at 7:17 AM Sean Christopherson
> > >  wrote:
> > > >
> > > > On Thu, May 23, 2019 at 01:26:28PM +0300, Jarkko Sakkinen wrote:
> > > > > On Wed, May 22, 2019 at 07:35:17PM -0700, Sean Christopherson wrote:
> > > > > > But actually, there's no need to disallow mmap() after ECREATE
> > > > > > since the LSM checks also apply to mmap(), e.g. FILE__EXECUTE
> > > > > > would be needed to
> > > > > > mmap() any enclave pages PROT_EXEC.  I guess my past self
> > > > > > thought mmap() bypassed LSM checks?  The real problem is that
> > > > > > mmap()'ng an existing enclave would require FILE__WRITE and
> > > > > > FILE__EXECUTE, which puts us back at square one.
> > > > >
> > > > > I'm lost with the constraints we want to set.
> > > >
> > > > As is today, SELinux policies would require enclave loaders to
> > > > have FILE__WRITE and FILE__EXECUTE permissions on
> > > > /dev/sgx/enclave.  Presumably other LSMs have similar
> > > > requirements.  Requiring all processes to have
> > > > FILE__{WRITE,EXECUTE} permissions means the permissions don't add
> > > > much value, e.g. they can't be used to distinguish between an
> > > > enclave that is being loaded from an unmodified file and an enclave 
> > > > that is being
> generated on the fly, e.g. Graphene.
> > > >
> > > > Looking back at Andy's mail, he was talking about requiring
> > > > FILE__EXECUTE to run an enclave, so perhaps it's only FILE__WRITE
> > > > that we're trying to special case.
> > > >
> > >
> > > I thought about this some more, and I have a new proposal that helps
> > > address the ELRANGE alignment issue and the permission issue at the
> > > cost of some extra verbosity.  Maybe you all can poke holes in it :)
> > > The basic idea is to make everything more explicit from a user's
> > > perspective.  Here's how it works:
> > >
> > > Opening /dev/sgx/enclave gives an enclave_fd that, by design,
> > > doesn't give EXECUTE or WRITE.  mmap() on the enclave_fd only works
> > > if you pass PROT_NONE and gives the correct alignment.  The
> > > resulting VMA cannot be mprotected or mremapped.  It can't be
> > > mmapped at all until
> >
> > I assume you're thinking of clearing all VM_MAY* flags in sgx_mmap()?
> >
> > > after ECREATE because the alignment isn't known before that.
> >
> > I don't follow.  The alignment is known because userspace knows the
> > size of its enclave.  The initial unknown is the address, but that
> > becomes known once the initial mmap() completes.
> 
> [...]
> 
> I think I made the mistake of getting too carried away with implementation 
> details rather
> than just getting to the point.  And I misremembered the ECREATE flow -- 
> oops.  Let me try
> again.  First, here are some problems with some earlier proposals (mine, yours
> Cedric's):
> 
>  - Having the EADD operation always work but have different effects depending 
> on the
> source memory permissions is, at the very least, confusing.

Inheriting permissions from source pages IMHO is the easiest way to validate 
the EPC permissions without any changes to LSM. And the argument about its 
security is also easy to make.

I understand that it may take some effort to document it properly but otherwise 
don't see any practical issues with it.

> 
>  - If we want to encourage user programs to be well-behaved, we want to make 
> it easy to
> map the RX parts of an enclave RX, the RW parts RW, the RO parts R, etc.  But 
> this
> interacts poorly with the sgx_mmap() alignment magic, as you've pointed out.
> 
>  - We don't want to couple LSMs with SGX too tightly.
> 
> So here's how a nice interface might work:
> 
> int enclave_fd = open("/dev/sgx/enclave", O_RDWR);
> 
> /* enclave_fd points to a totally blank enclave. Before ECREATE, we need to 
> decide on an
> address. */
> 
> void *addr = mmap(NULL, size, PROT_NONE, MAP_SHARED, enclave_fd, 0);
> 
> /* we have an address! */
> 
> ioctl(enclave_fd, ECREATE, ...);
> 
> /* now add some data to the enclave.  We want the RWX addition to fail
> immediately unless we have the relevant LSM pemission.   Similarly, we
> want the RX addition to fail immediately unless the source VMA is 
> appropriate. */
> 
> ioctl(enclave_fd, EADD, rx_source_1, MAXPERM=RX, ...);  [the ...
> includes SECINFO, which the kernel doesn't really care about] 
> ioctl(enclave_fd, EADD,
> ro_source_1, MAXPERM=RX ...); ioctl(enclave_fd, EADD, rw_source_1, MAXPERM=RW 
> ...);
> ioctl(enclave_fd, EADD, rwx_source_1, MAXPERM=RWX ...);

If MAXPERM is taken from ioctl parameters, the real question here is how to 
validate MAXPERM. Guess we shouldn't allow arbitrary MAXPERM to be specified by 
user code, and the only logical source I can think of is from the source pages 
(or from the enclave source file, but me

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-16 Thread Xing, Cedric
> From: Andy Lutomirski [mailto:l...@kernel.org]
> 
> On Thu, May 16, 2019 at 3:23 PM Xing, Cedric 
> wrote:
> >
> > Hi Andy,
> >
> > > > SIGSTRUCT isn't necessarily stored on disk so may not always have
> a fd.
> > > How about the following?
> > > > void *ss_pointer = mmap(sigstruct_fd, PROT_READ,...);
> > > > ioctl(enclave_fd, SGX_INIT_THE_ENCLAVE, ss_pointer);
> > > >
> > > > The idea here is SIGSTRUCT will still be passed in memory so it
> > > > works
> > > the same way when no LSM modules are loaded or basing its decision
> > > on the .sigstruct file. Otherwise, an LSM module can figure out the
> > > backing file (and offset within that file) by looking into the VMA
> > > covering ss_pointer.
> > >
> > > I don’t love this approach.  Application authors seem likely to use
> > > read() instead of mmap(), and it’ll still work in many cares. It
> > > would also complicate the kernel implementation, and looking at the
> > > inode backing the vma that backs a pointer is at least rather
> unusual.
> > > Instead, if the sigstruct isn’t on disk because it’s dynamic or came
> > > from a network, the application can put it in a memfd.
> >
> > I understand your concern here. But I guess we are making too much
> assumption on how enclaves are structured/packaged. My concern is, what
> if a SIGSTRUCT really has to be from memory? For example, an enclave
> (along with its SIGSTRUCT) could be embedded inside a shared object (or
> even the "main" executable) so it shows up in memory to begin with.
> 
> Hmm.  That's a fair point, although opening /proc/self/exe could be
> somewhat of a workaround.  It does suffer from a bit of an in-band
> signaling problem, though, in that it's possible that some other random
> bytes in the library resemble a SIGSTRUCT.
> 
> > I was not saying enclaves were exempt to good security practices. What
> I was trying to say was, EPC pages are *not* subject to the same attacks
> as regular pages so I suspect there will be a desire to enforce
> different policies on them, especially after new SGX2
> features/applications become available. So I think it beneficial to
> distinguish between regular vs. enclave virtual ranges. And to do that,
> a new VM_SGX flag in VMA is probably a very simple/easy way. And with
> that VM_SGX flag, we could add a new security_sgx_mprot() hook so that
> LSM modules/policies could act differently.
> 
> I'm not opposed to this, but I also don't think this needs to be in the
> initial upstream driver.  VM_SGX also isn't strictly necessary -- an LSM
> could inspect the VMA to decide whether it's an SGX VMA if it really
> wanted to.

VM_SGX is just what I think is the easiest way for any module to tell enclave 
VMAs from all others. I agree totally with you that doesn't have to be in the 
initial release!

> 
> That being said, do you have any specific behavior differences in mind
> aside from the oddities involved in loading the enclave.

The major thing is dynamically linked enclaves. Say if you want something like 
dlopen() inside an enclave, the driver would need to EAUG a page as RW 
initially, and then change to RX after it has been EACCEPTCOPY'ed by the 
enclave. So it's like a RW->RX transition and an LSM module/policy may want to 
allow it only if it's within an enclave range (ELRANGE), or deny it otherwise.

> 
> >
> > And if you are with me on that bigger picture, the next question is:
> what should be the default behavior of security_sgx_mprot() for
> existing/non-SGX-aware LSM modules/policies? I'd say a reasonable
> default is to allow R, RW and RX, but not anything else. It'd suffice to
> get rid of EXECMEM/EXECMOD requirements on enclave applications. For
> SGX1, EPCM permissions are immutable so it really doesn't matter what
> security_sgx_mprot() does. For SGX2 and beyond, there's still time and
> new SGX-aware LSM modules/policies will probably have emerged by then.
> 
> I hadn't thought about the SGX1 vs SGX2 difference.  If the driver
> initially only wants to support SGX1, then I guess we really could get
> away with constraining the EPC flags based on the source page permission
> and not restricting mprotect() and mmap() permissions on
> /dev/sgx/enclave at all.

This is exactly what I'm going after! 

But I have to apologize for this silly question because I don't know much about 
SELinux: Wouldn't it require changes to existing SELinux policies to *not* 
restrict mprotect() on /dev/sgx/enclave?

-Cedric


RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-16 Thread Xing, Cedric
> > > There is a problem here though. Usually the enclave itself is just a
> > > loader that then loads the application from outside source and
> > > creates the executable pages from the content.
> > >
> > > A great example of this is Graphene that bootstraps unmodified Linux
> > > applications to an enclave:
> > >
> > > https://github.com/oscarlab/graphene
> > >
> >
> > ISTM you should need EXECMEM or similar to run Graphene, then.
> 
> Agreed, Graphene is effectively running arbitrary enclave code.  I'm
> guessing there is nothing that prevents extending/reworking Graphene to
> allow generating the enclave ahead of time so as to avoid populating the
> guts of the enclave at runtime, i.e. it's likely possible to run an
> unmodified application in an enclave without EXECMEM if that's something
> Graphene or its users really care about.

Inefficient use of memory is a problem of running Graphene on SGX1, from at 
least 2 aspects: 1) heaps/stacks have to be pre-allocated but only a small 
portion of those pages will be actually used; and 2) dynamic linking is 
commonly used in *unmodified* applications and all dependent libraries have to 
be loaded, but only a subset of those pages will actually be used - e.g. most 
applications use only a small set of functions in libc.so but the whole library 
still has to be loaded. Hence a practical/efficient solution will 
require/involve EDMM features available in SGX2. I guess we shall look a bit 
further into future in order to address this problem properly. And I think it 
necessary to distinguish enclave virtual ranges from regular ones (probably at 
VMA level) before we could have a practical solution.


RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-16 Thread Xing, Cedric
Hi Andy,

> > SIGSTRUCT isn't necessarily stored on disk so may not always have a fd.
> How about the following?
> > void *ss_pointer = mmap(sigstruct_fd, PROT_READ,...);
> > ioctl(enclave_fd, SGX_INIT_THE_ENCLAVE, ss_pointer);
> >
> > The idea here is SIGSTRUCT will still be passed in memory so it works
> the same way when no LSM modules are loaded or basing its decision on
> the .sigstruct file. Otherwise, an LSM module can figure out the backing
> file (and offset within that file) by looking into the VMA covering
> ss_pointer.
> 
> I don’t love this approach.  Application authors seem likely to use
> read() instead of mmap(), and it’ll still work in many cares. It would
> also complicate the kernel implementation, and looking at the inode
> backing the vma that backs a pointer is at least rather unusual.
> Instead, if the sigstruct isn’t on disk because it’s dynamic or came
> from a network, the application can put it in a memfd.

I understand your concern here. But I guess we are making too much assumption 
on how enclaves are structured/packaged. My concern is, what if a SIGSTRUCT 
really has to be from memory? For example, an enclave (along with its 
SIGSTRUCT) could be embedded inside a shared object (or even the "main" 
executable) so it shows up in memory to begin with. Of course it could be 
copied to a memfd but whatever "attributes" (e.g. path, or SELinux class/type) 
associated with the original file would be lost, so I'm not sure if that would 
work.

I'm also with you that applications tend to use read() instead of mmap() for 
accessing files. But in our case that'd be necessary only if .sigstruct is a 
separate file (hence needs to be read separately). What if (and I guess most 
implementations would) the SIGSTRUCT is embedded in the same file as the 
enclave? mmap() is the more common practice when dealing with executable 
images, and in that case SIGSTRUCT will have already been mmap()'d. 

I'm with you again that it's kind of unprecedented to look at the backing 
inode. But I believe we should strive to allow as large variety of 
applications/usages as possible and I don't see any alternatives without losing 
flexibility.

> >
> >>
> >> /* Actually map the thing */
> >> mmap(enclave_fd RO section, PROT_READ, ...);
> >> mmap(enclave_fd RW section, PROT_READ | PROT_WRITE, ...);
> >> mmap(enclave_fd RX section, PROT_READ | PROT_EXEC, ...);
> >>
> >> /* This should fail unless EXECMOD is available, I think */
> >> mmap(enclave_fd RWX section, PROT_READ | PROT_WRITE | PROT_EXEC);
> >>
> >> And the idea here is that, if the .enclave file isn't mapped
> >> PROT_EXEC, then mmapping the RX section will also require EXECMEM or
> >> EXECMOD.
> >
> > From security perspective, I think it reasonable to give EXECMEM and
> EXECMOD to /dev/sgx/enclave because the actual permissions are guarded
> by EPCM permissions, which are "inherited" from the source pages, whose
> permissions have passed LSM checks.
> 
> I disagree.  If you deny a program EXECMOD, it’s not because you
> distrust the program. It’s because you want to enforce good security
> practices.  (Or you’re Apple and want to disallow third-party JITs.)
> A policy that accepts any sigstruct but requires that enclaves come
> from disk and respect W^X seems entirely reasonable.
> 
> I think that blocking EXECMOD has likely served two very real security
> purposes. It helps force application and library developers to write
> and compile their code in a way that doesn’t rely on dangerous tricks
> like putting executable trampolines on the stack.  It also makes it
> essentially impossible for an exploit to run actual downloaded machine
> code — if there is no way to run code that isn’t appropriately
> labeled, then attackers are more limited in what they can do.

> 
> I don’t think that SGX should become an exception to either of these.
> Code should not have an excuse to use WX memory just because it’s in
> an enclave. Similarly, an exploit should not be able to run an
> attacker-supplied enclave as a way around a policy that would
> otherwise prevent downloaded code from running.

My apology for the confusion here.

I thought EXECMOD applied to files (and memory mappings backed by them) but I 
was probably wrong. It sounds like EXECMOD applies to the whole process so 
would allow all pages within a process's address space to be modified then 
executed, regardless the backing files. Am I correct this time?

I was not saying enclaves were exempt to good security practices. What I was 
trying to say was, EPC pages are *not* subject to the same attacks as regular 
pages so I suspect there will be a desire to enforce different policies on 
them, especially after new SGX2 features/applications become available. So I 
think it beneficial to distinguish between regular vs. enclave virtual ranges. 
And to do that, a new VM_SGX flag in VMA is probably a very simple/easy way. 
And with that VM_SGX flag, we could add a new security_sgx_mprot() hook so that 
L

RE: SGX vs LSM (Re: [PATCH v20 00/28] Intel SGX1 support)

2019-05-15 Thread Xing, Cedric
Hi Andy,

> From: Andy Lutomirski [mailto:l...@kernel.org]
> 
> On Wed, May 15, 2019 at 3:46 PM James Morris  wrote:
> >
> > On Wed, 15 May 2019, Andy Lutomirski wrote:
> >
> > > > Why not just use an xattr, like security.sgx ?
> > >
> > > Wouldn't this make it so that only someone with CAP_MAC_ADMIN could
> > > install an enclave?  I think that this decision should be left up the
> > > administrator, and it should be easy to set up a loose policy where
> > > anyone can load whatever enclave they want.  That's what would happen
> > > in my proposal if there was no LSM loaded or of the LSM policy didn't
> > > restrict what .sigstruct files were acceptable.
> > >
> >
> > You could try user.sigstruct, which does not require any privs.
> >
> 
> I don't think I understand your proposal.  What file would this
> attribute be on?  What would consume it?
> 
> I'm imagining that there's some enclave in a file
> crypto_thingy.enclave.  There's also a file crypto_thingy.sigstruct.
> crypto_thingy.enclave has type lib_t or similar so that it's
> executable.  crypto_thingy.sigstruct has type sgx_sigstruct_t.  The
> enclave loader does, in effect:
> 
> void *source_data = mmap(crypto_thingy.enclave, PROT_READ | PROT_EXEC, ...);
> int sigstruct_fd = open("crypto_thingy.sigstruct", O_RDONLY);
> int enclave_fd = open("/dev/sgx/enclave", O_RDWR);
> 
> ioctl(enclave_fd, SGX_IOC_ADD_SOME_DATA, source_data + source_offset,
> enclave_offset, len, ...);
> ioctl(enclave_fd, SGX_IOC_ADD_SOME_DATA, source_data + source_offset2,
> enclave_offset2, len, ...);
> etc.
> 
> /* Here's where LSMs get to check that the sigstruct is acceptable.
> The CPU will check that the sigstruct matches the enclave. */
> ioctl(enclave_fd, SGX_INIT_THE_ENCLAVE, sigstruct_fd);

SIGSTRUCT isn't necessarily stored on disk so may not always have a fd. How 
about the following?
void *ss_pointer = mmap(sigstruct_fd, PROT_READ,...);
ioctl(enclave_fd, SGX_INIT_THE_ENCLAVE, ss_pointer);

The idea here is SIGSTRUCT will still be passed in memory so it works the same 
way when no LSM modules are loaded or basing its decision on the .sigstruct 
file. Otherwise, an LSM module can figure out the backing file (and offset 
within that file) by looking into the VMA covering ss_pointer.

> 
> /* Actually map the thing */
> mmap(enclave_fd RO section, PROT_READ, ...);
> mmap(enclave_fd RW section, PROT_READ | PROT_WRITE, ...);
> mmap(enclave_fd RX section, PROT_READ | PROT_EXEC, ...);
> 
> /* This should fail unless EXECMOD is available, I think */
> mmap(enclave_fd RWX section, PROT_READ | PROT_WRITE | PROT_EXEC);
> 
> And the idea here is that, if the .enclave file isn't mapped
> PROT_EXEC, then mmapping the RX section will also require EXECMEM or
> EXECMOD.

From security perspective, I think it reasonable to give EXECMEM and EXECMOD to 
/dev/sgx/enclave because the actual permissions are guarded by EPCM 
permissions, which are "inherited" from the source pages, whose permissions 
have passed LSM checks.

Alternatively, I think we could mark enclave VMAs somewhat differently, such as 
defining a new VM_SGX flag. The reason behind that is, enclave ranges differ 
from "regular" virtual ranges in terms of both functionality (i.e. #PF will 
have to be handled quite differently) and security, so I believe demand will 
come up to distinguish them eventually - e.g., LSM modules can then enforce 
different policies on them (by a new security_sgx_mprot() hook?).

-Cedric


RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-14 Thread Xing, Cedric
Hi Andy,

> Let me make sure I'm understanding this correctly: when an enclave tries
> to execute code, it only works if *both* the EPCM and the page tables
> grant the access, right?  This seems to be that 37.3 is trying to say.
> So we should probably just ignore SECINFO for these purposes.
> 
> But thinking this all through, it's a bit more complicated than any of
> this.  Looking at the SELinux code for inspiration, there are quite a
> few paths, but they boil down to two cases: EXECUTE is the right to map
> an unmodified file executably, and EXECMOD/EXECMEM (the distinction
> seems mostly irrelevant) is the right to create (via mmap or mprotect) a
> modified anonymous file mapping or a non-file-backed mapping that is
> executable.  So, if we do nothing, then mapping an enclave with execute
> permission will require either EXECUTE on the enclave inode or
> EXECMOD/EXECMEM, depending on exactly how this gets set up.
> 
> So all is well, sort of.  The problem is that I expect there will be
> people who want enclaves to work in a process that does not have these
> rights.  To make this work, we probably need do some surgery on SELinux.
> ISTM the act of copying (via the EADD ioctl) data from a PROT_EXEC
> mapping to an enclave should not be construed as "modifying"
> the enclave for SELinux purposes.  Actually doing this could be awkward,
> since the same inode will have executable parts and non-executable parts,
> and SELinux can't really tell the difference.
> 

Enclave files are pretty much like shared objects in that they both contain 
executable code mapped into the host process's address space. Do shared objects 
need EXECUTE to be mapped executable? If so, why would people not want EXECUTE 
in enclave files?

Wrt to which part of an executable file is executable, the limitation resides 
in security_mmap_file(), which doesn't take the range of bytes as input. It 
isn't SGX specific. I'd say just let enclaves inherit applicable checks for 
shared objects. Then it will also inherit all future enhancements transparently.

> Maybe the enclave should track a bitmap of which pages have ever been
> either mapped for write or EADDed with a *source* that wasn't PROT_EXEC.
> And then SELinux could learn to allow those pages (and only those pages)
> to be mapped executably without EXECUTE or EXECMOD or whatever
> permission.

What do you mean by "enclave" here? The enclave range (ELRANGE) created by 
mmap()'ing /dev/sgx/enclave device? My argument is, an enclave's 
sanity/insanity is determined at load time (EADD/EEXTEND and EINIT) and all 
page accesses are enforced by EPCM, so PTE permissions really don't matter. As 
I discussed in an earlier email, I'd allow RWX for any range backed by 
/dev/sgx/enclave device file by default, unless an SGX-aware LSM module/policy 
objects to that (e.g. via a new security_sgx_mprotect() LSM hook).

> 
> Does this seem at all reasonable?
> 
> I suppose it's not the end of the world if the initially merged version
> doesn't do this, as long as there's some reasonable path to adding a
> mechanism like this when there's demand for it.

Agreed!

-Cedric


RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-14 Thread Xing, Cedric
Hi Everyone,

I think we are talking about 2 different kinds of criteria for determining the 
sanity of an enclave. 

The first kind determines an enclave's sanity by generally accepted good 
practices. For example, no executable pages shall ever be writable.

The second kind determines an enclave's sanity by "who stands behind it", such 
as whether the file containing SIGSTRUCT has the proper SELinux label/type, or 
whether the signing key is trusted.

I'd say those 2 kinds of criteria should be orthogonal because they don't get 
into each other's way and it could also be beneficial to enable both at the 
same time. For example, a user may want to allow launching enclaves signed by 
himself/herself only, however, as a human being he/she may make mistakes so 
would also like to ensure no RWX pages even for those enclaves explicitly 
authorized.

I think those 2 kinds of criteria could be abstracted as 2 new LSM hooks - 
security_sgx_add_pages() and security_sgx_initialize_enclave().

security_sgx_add_pages() is invoked by SGX driver to determine if a range of 
source pages are allowed to be EADD'ed with the requested EPCM attributes, and 
by default it returns 0 (allowed) iff the requested EPCM attributes are a 
subset of the permissions of the VMA covering that range of source pages. An 
(SGX-aware) LSM module/policy could employ different criteria, such as making 
sure the source pages are backed by an enclave file (using SELinux label, for 
example). 

security_sgx_initialize_enclave() is invoked by SGX driver to determine if a 
given SIGSTRUCT is valid (hence allowed to be EINIT'ed), and it always returns 
0 (allowed) by default. An LSM implementation may enforce custom policies, such 
as whether the signing public key is trusted by the current user, or whether it 
was backed (mmap()'ed) by an authorized file (e.g. of an expected type in the 
case of SELinux, or located in a particular directory in the case of AppArmor).

With regard to SGX2/EDMM (Enclave Dynamic Memory Management) support, RW->RX 
transitions are inevitable to support certain usages such as dynamic 
loading/linking, meaning those usages may be blocked by existing policies. But 
from security perspective, I think they should be allowed by default (i.e. for 
non-SGX-aware LSM modules/policies) because such permission changes are 
inherent behaviors of the enclave itself, which is considered "sane" after 
passing all the checks performed in 
security_sgx_add_pages()/security_sgx_initialize_enclave(). Of course, an 
SGX-aware LSM module/policy shall be allowed to override. How about adding a 
new security_sgx_mprotect() LSM hook?

> From: Andy Lutomirski [mailto:l...@kernel.org]
> 
> > On May 14, 2019, at 8:30 AM, Haitao Huang
>  wrote:
> >
> >> On Tue, 14 May 2019 10:17:29 -0500, Andy Lutomirski 
> wrote:
> >>
> >> On Tue, May 14, 2019 at 7:33 AM Haitao Huang
> >>  wrote:
> >>>
> >>> On Fri, 10 May 2019 14:22:34 -0500, Andy Lutomirski
> >>> 
> >>> wrote:
> >>>
> >>> > On Fri, May 10, 2019 at 12:04 PM Jethro Beekman
> >>> > 
> >>> > wrote:
> >>> >>
> >>> >> On 2019-05-10 11:56, Xing, Cedric wrote:
> >>> >> > Hi Jethro,
> >>> >> >
> >>> >> >> ELF files are explicitly designed such that you can map them
> >>> >> >> (with
> >>> >> mmap)
> >>> >> >> in 4096-byte chunks. However, sometimes there's overlap and
> >>> >> >> you will sometimes see that a particular offset is mapped
> >>> >> >> twice because the
> >>> >> first
> >>> >> >> half of the page in the file belongs to an RX range and the
> >>> >> >> second
> >>> >> half
> >>> >> >> to an R-only range. Also, ELF files don't (normally) describe
> >>> >> >> stack, heap, etc. which you do need for enclaves.
> >>> >> >
> >>> >> > You have probably misread my email. By mmap(), I meant the
> >>> >> > enclave
> >>> >> file would be mapped via *multiple* mmap() calls, in the same way
> >>> >> as what dlopen() would do in loading regular shared object. The
> >>> >> intention here is to make the enclave file subject to the same
> >>> >> checks as regular shared objects.
> >>> >>
> >>> >> No, I didn't misread your email. My original point still stands:
> >>> >> requiring that an enclave's memory is created from one or mo

RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-10 Thread Xing, Cedric
Hi Andy and Jethro,

> > > You have probably misread my email. By mmap(), I meant the enclave
> file would be mapped via *multiple* mmap() calls, in the same way as
> what dlopen() would do in loading regular shared object. The intention
> here is to make the enclave file subject to the same checks as regular
> shared objects.
> >
> > No, I didn't misread your email. My original point still stands:
> > requiring that an enclave's memory is created from one or more mmap
> > calls of a file puts significant restrictions on the enclave's on-disk
> > representation.
> >

I was talking in the context of ELF, with the assumption that changing RW pages 
to RX is disallowed by LSM. And in that case mmap()would be the only way to 
load a page from disk without having to "write" to it. But that's just an 
example but not the focus of the discussion.

The point I was trying to make was, that the driver is going to copy both 
content and permissions from the source so the security properties established 
(by IMA/LSM) around that source page would be carried onto the EPC page being 
EADD'ed. The driver doesn't care how that source page came into existence. It 
could be mapped from an ELF file as in the example, or it could be a result 
from JIT as long as LSM allows it. The driver will be file format agnostic.

> 
> For a tiny bit of background, Linux (AFAIK*) makes no effort to ensure
> the complete integrity of DSOs.  What Linux *does* do (if so
> configured) is to make sure that only approved data is mapped
> executable.  So, if you want to have some bytes be executable, those
> bytes have to come from a file that passes the relevant LSM and IMA
> checks.  So we have two valid approaches, I think.
> 
> Approach 1: we treat SGX exactly the same way and make it so that only
> bytes that pass the relevant checks can be mapped as code within an
> enclave.  This imposes no particular restrictions on the file format
> -- we just need some API that takes an fd, an offset, and a length,
> and adds those bytes as code to an enclave.  (It could also take a
> pointer and a length and make sure that the pointer points to
> executable memory -- same effect.)

I assume "some API" is some user mode API so this approach is the same as what 
I suggested in my last email. Am I correct?

> 
> Approach 2: we decide that we want a stronger guarantee and that we
> *will* ensure the integrity of the enclave.  This means:
> 
> 2a) that we either need to load the entire thing from some approved
> file, and we commit to supporting one or more file formats.
> 
> 2b) we need to check that the eventual enclave hash is approved.  Or
> we could have a much shorter file that is just the hash and we check
> that.  At its simplest, the file could be *only* the hash, and there
> could be an LSM callback to check it.  In the future, if someone wants
> to allow enclaves to be embedded in DSOs, we could have a special ELF
> note or similar that contains an enclave hash or similar.
> 
> 2c) same as 2b except that we expose the whole SIGSTRUCT, not just the
> hash.
> 
> Here are some pros and cons of various bits:
> 
> 1 and 2a allow anti-virus software to scan the enclave code, and 2a
> allows it to scan the whole enclave.  I don't know if this is actually
> imporant.

I guess anti-virus software can scan any enclave file in *all* cases as long as 
it understands the format of that enclave. It doesn't necessary mean the kernel 
has to understand that enclave format (as enclave file could be parsed in user 
mode) or the anti-virus software has to understand all formats (if any) 
supported natively by the kernel.
 
> 
> 2a is by far the most complicated kernel implementation.
> 

Agreed. I don't see any reason 2a would be necessary.

> 2b and 2c are almost file-format agnostic.  1 is completely file
> format agnostic but, in exchange, it's much weaker.

I'd say 1 and (variants of) 2 are orthogonal. SGX always enforces integrities 
so not doing integrity checks at EADD doesn't mean the enclave integrity is not 
being enforced. 1 and 2 are basically 2 different checkpoints where LSM hooks 
could be placed. And a given LSM implementation/policy may enforce either 1 or 
2, or both, or neither. 

> 
> 2b and 2c should solve most (but not all) of the launch control
> complaints that Dr. Greg cares about, in the sense that the LSM policy
> quite literally validates that the enclave is approved.
> 
> As a straw man design, I propose the following, which is mostly 2c.
> The whole loading process works almost as in Jarkko's current driver,
> but the actual ioctl that triggers EINIT changes.  When you issue the
> ioctl, you pass in an fd and the SIGSTRUCT is loaded and checked from
> the fd.  The idea is that software that ships an enclave will ship a
> .sgxsig file that is literally a SIGSTRUCT for the enclave.  With
> SELinux, that file gets labeled something like
> sgx_enclave_sigstruct_t.  And we have the following extra twist: if
> you're calling the EADD ioctl 

RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-10 Thread Xing, Cedric
Hi Jethro,
 
> ELF files are explicitly designed such that you can map them (with mmap)
> in 4096-byte chunks. However, sometimes there's overlap and you will
> sometimes see that a particular offset is mapped twice because the first
> half of the page in the file belongs to an RX range and the second half
> to an R-only range. Also, ELF files don't (normally) describe stack,
> heap, etc. which you do need for enclaves.

You have probably misread my email. By mmap(), I meant the enclave file would 
be mapped via *multiple* mmap() calls, in the same way as what dlopen() would 
do in loading regular shared object. The intention here is to make the enclave 
file subject to the same checks as regular shared objects.

The other enclave components (e.g. TCS, heap, stack, etc.) will be created from 
writable pages within the calling process's address space as before. Similar to 
heaps/stacks in a regular process, those components don't have to be backed by 
disk files.

The core idea is for SGX driver to "copy" not only the content but also 
permissions from a source page to the target EPC page. Given the existence of 
the source page, it can be concluded that the combination of content and 
permissions of that page has passed all IMA/LSM checks applicable to that page, 
hence "copying" that page to EPC with the same (or less) permission is *not* a 
circumvention of IMA/LSM policies.

-Cedric


RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-10 Thread Xing, Cedric
Hi Dave,

> On 5/10/19 10:37 AM, Jethro Beekman wrote:
> > It does assume a specific format, namely, that the memory layout
> > (including page types/permissions) of the enclave can be represented
> in
> > a "flat file" on disk, or at least that the enclave memory contents
> > consist of 4096-byte chunks in that file.
> 
> I _think_ Cedric's point is that, to the kernel,
> /lib/x86_64-linux-gnu/libc.so.6 is a "flat file" because the kernel
> doesn't have any part in parsing the executable format of a shared
> library.
> 
> I actually don't know how it works, though.  Do we just just trust that
> the userspace parsing of the .so format is correct?  Do we just assume
> that any part of a file passing IMA checks can be PROT_EXEC?

The key idea here is for enclave files to "inherit" the checks applicable to 
regular shared objects. And how we are going to do it is for user process to 
map every loadable segment of the enclave file into its address space using 
*multiple* mmap() calls, just in the same way as dlopen() would do for loading 
regular shared objects. Then those open()/mmap() syscalls will trigger all 
applicable checks (by means of security_file_open(), security_mmap_file() and 
ima_file_mmap() hooks) transparently. That said, IMA/LSM 
implementations/policies will dictate the success/failure of those 
open()/mmap() syscalls. And by depending EPCM attributes on permissions of 
source VMAs, no EXEC page could be ever created unless the source page (which 
has to be EXEC, btw) has passed IMA/LSM checks (as if it were loaded from a 
regular shared object file).

-Cedric 



RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-10 Thread Xing, Cedric
Hi Andy,

> So I think we need a better EADD ioctl that explicitly does work on 
> PROT_READ|PROT_EXEC enclave memory but makes up for by validating the
> *source* of the data.  The effect will be similar to mapping a 
> labeled, appraised, etc file as PROT_EXEC.  Maybe, in extreme
> pseudocode:
> 
> fd = open(“/dev/sgx/enclave”);
> ioctl(fd, SGX_CREATE_FROM_FILE, file_fd); // fd now inherits the LSM 
> label from the file, or is otherwise marked.
> mmap(fd, PROT_READ|PROT_EXEC, ...);
> 
> I suppose that an alternative would be to delegate all the EADD calls 
> to a privileged daemon, but that’s nasty.

I agree with you that *source* of EPC pages shall be validated. But instead of 
doing it in the driver, I think an alternative could be to treat an enclave 
file as a regular shared object so all IMA/LSM checks could be 
triggered/performed as part of the loading process, then let the driver "copy" 
those pages to EPC. By "copy", I mean both the page content and _permissions_ 
are copied, which differs from the current driver's behavior of copying page 
content only (while permissions are taken from ioctl parameters). That said, if 
an adversary cannot inject any code into a process in regular pages, then it 
cannot inject any code to any EPC pages in that process either, simply because 
the latter depends on the former. 

Security wise, it is also assumed that duplicating (both content and 
permissions of) an existing page within a process will not threaten the 
security of that process. That assumption may not always be true but in 
practice, the current LSM architecture doesn't seem to have that in scope, so 
this proposal I think still aligns with LSM in that sense. 

If compared to the idea of "enclave loader inside kernel", I think this 
alternative is much simpler and more flexible. In particular,
* It requires minimal change to the driver - just take EPCM permissions from 
source pages' VMA instead of from ioctl parameter.
* It requires little change to user mode enclave loader - just mmap() enclave 
file in the same way as dlopen() would do, then all IMA/LSM checks applicable 
to shared objects will be triggered/performed  transparently.
* It doesn't assume/tie to specific enclave formats.
* It is extensible - Today every regular page within a process is allowed 
implicitly to be the source for an EPC page. In future, if at all 
desirable/necessary, IMA/LSM could be extended to leave a flag somewhere (e.g. 
in VMA) to indicate explicitly whether a regular page (or range) could be a 
source for EPC. Then SGX specific hooks/policies could be supported easily.

How do you think?

-Cedric


RE: [RFC PATCH v2 2/3] x86/vdso: Modify __vdso_sgx_enter_enclave() to allow parameter passing on untrusted stack

2019-04-25 Thread Xing, Cedric
Hi Sean,

> AIUI, you and Andy had an off-list discussion regarding rewriting
> __vdso_sgx_enter_enclave() vs. providing a second vDSO function.  Can
> you
> please summarize the discussion so that it's clear why you're pursuing a
> single function?
> 
> In the end I probably agree that's it's desirable to have a single ABI
> and
> vDSO function since the cost is basically that the non-callback case
> needs
> to pass params via the stack instead of registers, but I do miss the
> simplicity of separate implementations.
> 

The major reason is we don't want to maintain 2 different ABIs (preserving %rsp 
vs. %rbp). Given the new one covers all known use cases, we prefer not to keep 
the other one.

> > Here's the summary of API/ABI changes in this patch. More details
> could be
> > found in arch/x86/entry/vdso/vsgx_enter_enclave.S.
> > * 'struct sgx_enclave_exception' is renamed to 'struct
> sgx_enclave_exinfo'
> >   because it is filled upon both AEX (i.e. exceptions) and normal
> enclave
> >   exits.
> > * __vdso_sgx_enter_enclave() anchors its frame using %rbp (instead
> of %rsp in
> >   the previous implementation).
> > * __vdso_sgx_enter_enclave() takes one more parameter - a callback
> function to
> >   be invoked upon enclave exits. This callback is optional, and if not
> >   supplied, will cause __vdso_sgx_enter_enclave() to return upon
> enclave exits
> >   (same behavior as previous implementation).
> > * The callback function is given as a parameter the value of %rsp at
> enclave
> >   exit to address data "pushed" by the enclave. A positive value
> returned by
> >   the callback will be treated as an ENCLU leaf for re-entering the
> enclave,
> >   while a zero or negative value will be passed through as the return
> >   value of __vdso_sgx_enter_enclave() to its caller. It's also safe to
> >   leave callback by longjmp() or by throwing a C++ exception.
> >
> > Signed-off-by: Cedric Xing 
> > ---
> >  arch/x86/entry/vdso/vsgx_enter_enclave.S | 175 --
> -
> >  arch/x86/include/uapi/asm/sgx.h  |  14 +-
> >  2 files changed, 128 insertions(+), 61 deletions(-)
> >
> > diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > index fe0bf6671d6d..debecb0d785c 100644
> > --- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > +++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > @@ -19,83 +19,150 @@
> >   * __vdso_sgx_enter_enclave() - Enter an SGX enclave
> >   *
> >   * @leaf:  **IN \%eax** - ENCLU leaf, must be EENTER or ERESUME
> > - * @tcs:   **IN \%rbx** - TCS, must be non-NULL
> > - * @ex_info:   **IN \%rcx** - Optional 'struct sgx_enclave_exception'
> pointer
> > + * @tcs:   **IN 0x08(\%rsp)** - TCS, must be non-NULL
> > + * @ex_info:   **IN 0x10(\%rsp)** - Optional 'struct
> sgx_enclave_exinfo'
> > + *  pointer
> > + * @callback:  **IN 0x18(\%rsp)** - Optional callback function to
> be called on
> > + *  enclave exit or exception
> >   *
> >   * Return:
> >   *  **OUT \%eax** -
> > - *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU
> leaf is
> > - *  not allowed or if TCS is NULL, %-EFAULT if ENCLU or the enclave
> faults
> > + *  %0 on a clean entry/exit to/from the enclave, %-EINVAL if ENCLU
> leaf is not
> > + *  allowed, %-EFAULT if ENCLU or the enclave faults, or a non-
> positive value
> > + *  returned from ``callback`` (if one is supplied).
> >   *
> >   * **Important!**  __vdso_sgx_enter_enclave() is **NOT** compliant
> with the
> > - * x86-64 ABI, i.e. cannot be called from standard C code.   As noted
> above,
> > - * input parameters must be passed via ``%eax``, ``%rbx`` and
> ``%rcx``, with
> > - * the return value passed via ``%eax``.  All registers except
> ``%rsp`` must
> > - * be treated as volatile from the caller's perspective, including
> but not
> > - * limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...  Conversely, the
> enclave
> > - * being run **must** preserve the untrusted ``%rsp`` and stack.
> > + * x86-64 ABI, i.e. cannot be called from standard C code. As noted
> above,
> > + * input parameters must be passed via ``%eax``, ``8(%rsp)``,
> ``0x10(%rsp)`` and
> > + * ``0x18(%rsp)``, with the return value passed via ``%eax``. All
> other
> > + * registers will be passed through to the enclave as is. All
> registers except
> > + * ``%rbp`` must be treated as volatile from the caller's perspective,
> including
> 
> Hmm, this is my fault since I wrote the original comment, but stating
> "All registers except %rbp must be treated as volatile" is confusing,
> e.g.
> at first I thought it meant that the assembly blob was mucking with the
> registers and that they couldn't be used to pass information out of the
> enclave.
> 
> Maybe something like:
> 
>  * Register ABI:
>  *  - As noted above, input parameters are passed via %eax and the stack.
>  *  - The return value is passed via %eax.
>  *  - %rbx and %rcx must be treated as volat

RE: [RFC PATCH v1 2/3] x86/vdso: Modify __vdso_sgx_enter_enclave() to allow parameter passing on untrusted stack

2019-04-24 Thread Xing, Cedric
Hi Andy,

I sent out my RFC patch v2 last night, that has your suggestions incorporated, 
plus a new unwind test to single step through the vDSO API to test out the CFI 
directives. Hopefully it is able to address all of your concerns. It's worth 
noting that, given the current patch fixes up #DB and #BP at ENCLU, the unwind 
test cannot run to completion. I assume Sean will revise the fixup code soon.

Thanks!

-Cedric

> -Original Message-
> From: Andy Lutomirski [mailto:l...@kernel.org]
> Sent: Monday, April 22, 2019 6:26 PM
> To: Xing, Cedric 
> Cc: LKML ; X86 ML ; linux-
> s...@vger.kernel.org; Andrew Morton ; Hansen,
> Dave ; Christopherson, Sean J
> ; nhor...@redhat.com;
> npmccal...@redhat.com; Ayoun, Serge ; Katz-zamir,
> Shay ; Huang, Haitao ;
> Andy Shevchenko ; Thomas Gleixner
> ; Svahn, Kai ; Borislav Petkov
> ; Josh Triplett ; Andrew Lutomirski
> ; Huang, Kai ; David Rientjes
> ; Jarkko Sakkinen 
> Subject: Re: [RFC PATCH v1 2/3] x86/vdso: Modify
> __vdso_sgx_enter_enclave() to allow parameter passing on untrusted stack
> 
> On Mon, Apr 22, 2019 at 5:37 PM Cedric Xing 
> wrote:
> >
> > The previous __vdso_sgx_enter_enclave() requires enclaves to preserve
> > %rsp, which prohibits enclaves from allocating and passing parameters
> > for untrusted function calls (aka. o-calls).
> >
> > This patch addresses the problem above by introducing a new ABI that
> > preserves %rbp instead of %rsp. Then __vdso_sgx_enter_enclave() can
> > anchor its frame using %rbp so that enclaves are allowed to allocate
> > space on the untrusted stack by decrementing %rsp. Please note that
> > the stack space allocated in such way will be part of
> > __vdso_sgx_enter_enclave()'s frame so will be freed after
> > __vdso_sgx_enter_enclave() returns. Therefore,
> > __vdso_sgx_enter_enclave() has been changed to take a callback
> > function as an optional parameter, which if supplied, will be invoked
> > upon enclave exits (both AEX (Asynchronous Enclave
> > eXit) and normal exits), with the value of %rsp left off by the
> > enclave as a parameter to the callback.
> >
> > Here's the summary of API/ABI changes in this patch. More details
> > could be found in arch/x86/entry/vdso/vsgx_enter_enclave.S.
> > * 'struct sgx_enclave_exception' is renamed to 'struct
> sgx_enclave_exinfo'
> >   because it is filled upon both AEX (i.e. exceptions) and normal
> enclave
> >   exits.
> > * __vdso_sgx_enter_enclave() anchors its frame using %rbp (instead
> of %rsp in
> >   the previous implementation).
> > * __vdso_sgx_enter_enclave() takes one more parameter - a callback
> function to
> >   be invoked upon enclave exits. This callback is optional, and if not
> >   supplied, will cause __vdso_sgx_enter_enclave() to return upon
> enclave exits
> >   (same behavior as previous implementation).
> > * The callback function is given as a parameter the value of %rsp at
> enclave
> >   exit to address data "pushed" by the enclave. A positive value
> returned by
> >   the callback will be treated as an ENCLU leaf for re-entering the
> enclave,
> >   while a zero or negative value will be passed through as the return
> >   value of __vdso_sgx_enter_enclave() to its caller. It's also safe to
> >   leave callback by longjmp() or by throwing a C++ exception.
> >
> > Signed-off-by: Cedric Xing 
> > ---
> >  arch/x86/entry/vdso/vsgx_enter_enclave.S | 156 ++
> -
> >  arch/x86/include/uapi/asm/sgx.h  |  14 +-
> >  2 files changed, 100 insertions(+), 70 deletions(-)
> >
> > diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > index fe0bf6671d6d..210f4366374a 100644
> > --- a/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > +++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> > @@ -14,88 +14,118 @@
> >  .code64
> >  .section .text, "ax"
> >
> > -#ifdef SGX_KERNEL_DOC
> >  /**
> >   * __vdso_sgx_enter_enclave() - Enter an SGX enclave
> >   *
> >   * @leaf:  **IN \%eax** - ENCLU leaf, must be EENTER or ERESUME
> > - * @tcs:   **IN \%rbx** - TCS, must be non-NULL
> > - * @ex_info:   **IN \%rcx** - Optional 'struct sgx_enclave_exception'
> pointer
> > + * @tcs:   **IN 0x08(\%rsp)** - TCS, must be non-NULL
> > + * @ex_info:   **IN 0x10(\%rsp)** - Optional 'struct
> sgx_enclave_exinfo'
> > + *  pointer
> > + * @callback:  **IN 0x18(\%rsp)** - Optional callback function to be
> called on
&g

RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-28 Thread Xing, Cedric
> It's certainly making progress.  I like the fact that the callback is
> now unconditional (if non-NULL) rather than being used just in case of
> certain exit types.  But, if we go down this route, let's name and
> document it appropriately -- just call the function "callback" and
> document it as a function that is called just before
> __vdso_sgx_enter_enclave returns, to be used if support for legacy
> enclaves that push data onto the untrusted stack is needed.  We should
> further document that it's safe to longjmp out of it.
> 
> Also, the tests in tools/testing/selftests/x86/unwind_vdso.c should be
> augmented to test this code.
> 
> Finally, why does the vDSO code bother checking whether the leaf is
> valid?

I can document it. I'll look into unwind_vdso.c to see what kind of selftests 
will make sense here. And I'll send out a RFC patch with everything included. 
Or would you prefer to have my changes integrated into Jarkko's patch v20?

Different ENCLU leaf has different parameters. This vDSO API knows how to load 
up parameters only for EENTER and ERESUME so it errs on all other positive 
values. 0 and negative values are interpreted as return codes. 


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-27 Thread Xing, Cedric
Hi Andy,

> From: linux-sgx-ow...@vger.kernel.org [mailto:linux-sgx-
> ow...@vger.kernel.org] On Behalf Of Andy Lutomirski
> 
> I suppose the real question is: are there a significant number of
> users who will want to run enclaves created using an old SDK on Linux?
>  And will there actually be support for doing this in the software
> stack?

To your first question, I cannot share information of Intel customers or speak 
for them. But in general, people would like to stay with an old enclave usually 
because of: 1) attestation, because MRENCLAVE will change after rebuild; and/or 
2) the need to support a mix of older and newer Linux kernels. So I'd say it'll 
be commonly desired, especially when this vDSO API is still "new" (so not 
available on every platform).

To your second question, Intel will support all "legacy" enclaves built with 
older SGX SDKs on newer kernels. And that's why we are so eager to find a 
migration path. I can't speak for other companies, but guess backward 
compatibility is always desirable.

> 
> If the answer to both questions is yes, then it seems like it could be
> reasonable to support it in the vDSO.  But I still think it should
> probably be a different vDSO entry point so that the normal case
> doesn't become more complicated.

I'll support whatever you think is more appropriate.

At the end, I'd like to give out the full version of my proposal, with your 
feedbacks (i.e. stack unwinder and Spectre variant 2) addressed. I'm a bit 
concerned by retpoline, which won't work (or be needed) when CET comes online. 
Are you looking to change it again then?

Here's the summary of the changes:
 - Added CFI directives for proper unwinding.
 - Defined sgx_ex_callback - the callback function on enclave exit/exception.
 - Aligned stack properly before calling sgx_ex_callback (per x86_64 ABI).
 - Used retpoline in place of indirect call.
 - The block starting at label "4:" captures all the code necessary to support 
sgx_ex_call. It has grown longer due to retpoline.

/**
 * __vdso_sgx_enter_enclave() - Enter an SGX enclave
 *
 * %eax:ENCLU leaf, must be either EENTER or ERESUME
 * 0x08(%rsp):  TCS
 * 0x10(%rsp):  Optional pointer to 'struct sgx_enclave_exception'
 * 0x18(%rsp):  Optional function pointer to 'sgx_ex_callback', whose
 *  definition will be given below. Note that this function, if
 *  present, shall follow x86_64 ABI.
 * return:  0 (zero) on success, or a negative error code on failure.
 *
 * Note that __vdso_sgx_enter_enclave() is not compatible with x86_64 ABI.
 * All registers except RBP must be treated as volatile from the caller's
 * perspective, including but not limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...
 * Enclave may decrement RSP, but must not increment it - i.e. existing content
 * of the stack shall be preserved.
 *
 * sgx_ex_callback - A callback function to be invoked by
 * __vdso_sgx_enter_enclave() upon exception or after the enclave exits.
 *
 * typedef int (*sgx_ex_callback)(long rdi, long rsi, long rdx,
 *  struct sgx_enclave_exception *ex_info, long r8, long r9,
 *  long rsp, void *tcs);
 *
 * Note that sgx_ex_callback shall be x86_64 ABI compliant.
 *
 * Note that other GPRs (except %rax, %rbx and %rcx) are also passed through to
 * sgx_ex_callback, even though accessing them requires assembly code.
 */
__vdso_sgx_enter_enclave:
/* prolog */
.cfi_startproc
push%rbp
.cfi_adjust_cfa_offset  8
.cfi_rel_offset %rbp, 0
mov %rsp, %rbp
.cfi_def_cfa_register   %rbp

1:  /* EENTER <= leaf <= ERESUME */
cmp $0x2, %eax
jb  5f
cmp $0x3, %eax
ja  5f

/* Load TCS and AEP */
mov 0x10(%rbp), %rbx
lea 2f(%rip), %rcx

2:  enclu

/* EEXIT path */
mov 0x18(%rbp), %rcx
jrcxz   3f
mov %eax, EX_LEAF(%rcx)
/* normalize return value */
3:  xor %eax, %eax

4:  /* call sgx_ex_callback if supplied */
cmpq$0, 0x20(%rbp)
jz  6f
/* align stack per x86_64 ABI */
mov %rsp, %rbx
and $-0x10, %rsp
/* parameters */
push0x10(%rbp)
push%rbx
/* call *0x20(%rbp) using retpoline */
mov 0x20(%rbp), %rax
call41f
/* stack cleanup */
mov %rbx, %rsp
jmp 1b
41: call43f
42: pause
lfence
jmp 42b
43: mov %rax, (%rsp)
ret

5:  /* bad leaf */
cmp $0, %eax
jle 6f
mov $(-EINVAL), %eax

6:  /* epilog */
leave
.cfi_def_cfa%rsp, 8
ret
.cfi_endproc

.pushsection.fixup, "ax"
7:  mov 0x18(%rbp), %rcx
jrcxz   8f
/* fill in ex_info */
mov %eax, EX_LEAF(%rcx)
mov %di, EX_TRAPNR(%rcx)
mov %si, EX_ERROR_CODE

RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-25 Thread Xing, Cedric
> On Mon, Mar 25, 2019 at 11:03 AM Sean Christopherson
>  wrote:
> >
> > On Sun, Mar 24, 2019 at 01:59:48AM -0700, Xing, Cedric wrote:
> > > As said in my previous email, this vDSO API isn't even compliant to
> > > x86_64 ABI and is absolutely NOT for average developers. Instead,
> > > host/enclave communications are expected to be handled by SDKs and
> > > those developers will be very aware of the limitations of their
> > > targeted environments, and will need the freedom to deploy optimal
> solutions.
> 
> > I fully realize that the above approach saddles Cedric and the SDK
> > team with the extra task of justifying the need for two vDSO
> > interfaces, and likely reduces the probability of their proposal being
> > accepted.  But, we don't *force* the SDK to be rewritten, and we gain
> > a vDSO interface that many people want and is acceptable to the
> > maintainers (unless I've horribly misread Andy's position).
> 
> I don't think you've horribly misread it.  I would like to keep the
> stuff in the vDSO as minimal as possible.  If we need to add a fancier
> interface down the line, then that's fine.

Andy, I don't know "many people" is how many in Sean's email. I couldn't tell 
you how long it took us to settle on the current SGX ISA because you would just 
LAUGH! Why? Because it took insanely ridiculously long. Why that long? Because 
the h/w and u-code teams would like to trim down the ISA as much as possible. 
The fact is, whatever new is released, Intel will have to maintain it on all 
future processors FOREVER! That means significant/on-going cost to Intel. So 
any addition to ISA has to undergo extensive reviews that involve all kinds of 
experts from both within Intel and externally, and would usually take years, 
before you can see what you are seeing today. As I said in my earlier emails, 
RBP is NOT needed for interrupt/exception handlers, then how did RBP end up 
being restored at AEX? You can guess how many people were standing behind it! 
Sean has no clue! I can assure you!

Guess we've talked enough on the technical front. So let's talk about it on the 
business front. Let me take a step back. Let's say you are right, all enclaves 
would eventually be coded in the way you want. We (Intel SDK team) were 
convinced to follow your approach. But there were existing enclaves and a 
migration path would be needed. Would you like to support us? It'd be only 9 
LOC on your side but how much would incur on our side? If you believe you are 
doing right thing, then acceptance is the next thing you should think of. You 
should offer an easy path for those who did "wrong" to get on your "right" 
boat. Don't you think so?

-Cedric


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-25 Thread Xing, Cedric
> I think you're misunderstanding me.  I'm not talking about security at
> all here.  SGX isn't a sandbox, full stop.  I'm talking about the degree
> to which an SGX enclave acts like a well-behaved black box.

Any meaningful communication requires an agreement in place. The host and the 
enclave could be either in agreement, or not in an agreement. In the former 
case, the enclave will behave while in the latter case it will misbehave. The 
thing is, if an agreement between them says - "Don't you enclave touch the 
stack", and if the enclave behaves, then it wouldn't touch the stack; or the 
enclave misbehaves, then that "agreement" CANNOT stop it from doing so, 
REGARDLESS what that agreement is.

The point is, an agreement must exist for host/enclave communication. The ABI 
limits what kinds of agreements they can bind to, but can NEVER enforce an 
agreement. The difference between Sean's ABI and mine is that mine is more 
relaxing (i.e. allows larger variety of agreements) but otherwise identical 
functionally. I truly hope you can understand that.
 
> 
> >
> > > I’m going to put my vDSO maintainer hat on for a minute.  Cedric,
> > > your proposal has the following issues related specifically to the
> vDSO:
> > >
> > > It inherently contains indirect branches.  This means that, on
> > > retpoline configurations, it probably needs to use retpolines.  This
> > > is doable, but it’s nasty, and you need to worry about register
> clobbers.
> >
> > Only the weakest link matters in security. With dynamic linking in
> use, this additional indirect CALL can't make things worse. But I'm open
> to, and in fact also willing to, apply whatever mitigation that you
> think is satisfactory (or that has been applied to other indirect
> branches, such as in PLT), such as retpoline. Btw, don't worry about
> register clobbers because we have at least %rax at our disposal.
> 
> There is no actual fundamental reason that dynamic linking has to work
> this way, and in principle, one could even use retpolines to the call
> the vDSO.  In any event, the vDSO is currently compiled with retpolines
> enabled, and if we decide to turn that off, it would be decision to be
> made independently of SGX.

Don't get me wrong! I'm just saying dynamic linking requires indirect branches 
so whatever mitigation used there can also apply here. I'm willing to implement 
the same mechanism as generally accepted in other occasions. If retpoline is 
the one then I will just do it.

Btw, retpoline won't work with CET though.

> 
> >
> > >
> > > It uses effectively unbounded stack space. The vDSO timing functions
> > > are already a problem for Go, and this is worse.
> >
> > If targeting the same functionality (i.e. no exit callback), my API
> uses exactly 24 bytes more than Sean's. Is it really the case that those
> 24 bytes will break Go?
> 
> You're counting wrong.  Your version uses 24 bytes + the stack size of
> the exit handler + the amount of stack consumed by the enclave, which is
> effectively unbounded.  So this whole scheme becomes unusable on
> anything other than a stack that is "large" for a totally undefined
> value of large and that has guard pages.

You misread. I said "targeting the same functionality", meaning no exit 
callback is used (because Sean's ABI doesn't support it). And because no 
callbacks will be made, only 24 more bytes will be needed.

And at this point I'm trying to stress the fact that my proposal is a superset 
of Sean's in terms of functionality - i.e. my proposal can do all that Sean's 
can do. For example, if the enclave is coded NOT to use the untrusted stack, 
e.g. in Fortanix's case, then it won't use the stack and the callback is 
unnecessary (and shall be set to NULL). That is, mine will work exactly the 
same as Sean's in the case of enclave not touching the stack. My apology for 
being excessively verbose here but your comment above prompts me that you may 
NOT have realized that my proposal will work exactly the SAME as Sean's when 
exit callback is absent (NULL).

> 
> >
> > >
> > > Cedric, your proposal allows an enclave to muck with RSP, but not in
> > > a way that’s particularly pleasant.
> >
> > From security perspective, it is SGX ISA, but NOT any particular ABI,
> that allows enclaves "to muck with RSP".
> 
> Again, this has nothing to do with security.  With your proposal, it's
> not possible for the caller of an enclave to decide, in an ocall
> handler, to pause and do something else.  This isn't just theoretical.
> Suppose someone wants to send a network request in an ocall handler.
> With the current RSP approach, it's difficult to do this in a program
> that uses poll / select / epoll -- you can't return out from the ocall
> until you have an answer.

Andy, your comment here further confirms that you have NOT understood my 
proposal.

In the case the o-call parameters are passed in registers (or separate 
buffers), exit handler is NOT needed (and should be set to NULL), and the API 
will j

RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-24 Thread Xing, Cedric
Hi Andy,

Thank you for your valuable feedbacks!

Per what you have been saying, your feedbacks come from different angles - i.e. 
functionality vs. security, but they are mixed up somehow. As an effort to make 
the discussion more constructive going forward, I'd like you to acknowledge 
that, in terms of functionality, my proposal is a superset of the current 
patch, as I have proven by implementing Sean's API using mine as a subroutine. 
That said, if you are satisfied with his, you should be satisfied with mine as 
well, from functional perspective. And because of that, I'll try to 
interpret/address your concerns from security perspective unless otherwise 
noted. I'm aware that there's still subtle difference between Sean's API and 
mine - e.g. my proposal consumes 24 bytes more stack space (for the same 
functionality, i.e. exit callback is null) than his, due to the additional 
parameters. But I don't believe that would become a "make it or break it" 
situation in practice.

> I’m going to put my vDSO maintainer hat on for a minute.  Cedric, your
> proposal has the following issues related specifically to the vDSO:
> 
> It inherently contains indirect branches.  This means that, on retpoline
> configurations, it probably needs to use retpolines.  This is doable,
> but it’s nasty, and you need to worry about register clobbers.

Only the weakest link matters in security. With dynamic linking in use, this 
additional indirect CALL can't make things worse. But I'm open to, and in fact 
also willing to, apply whatever mitigation that you think is satisfactory (or 
that has been applied to other indirect branches, such as in PLT), such as 
retpoline. Btw, don't worry about register clobbers because we have at least 
%rax at our disposal.

> 
> It uses effectively unbounded stack space. The vDSO timing functions are
> already a problem for Go, and this is worse.

If targeting the same functionality (i.e. no exit callback), my API uses 
exactly 24 bytes more than Sean's. Is it really the case that those 24 bytes 
will break Go?

> 
> And with my vDSO hat back off, I find it disappointing that SGX SDKs
> seem willing to couple the SGX enclaves so tightly to their host ABIs.
> An *unmodified* SGX enclave should be able to run, without excessive
> annoyance, in a Windows process, a Linux process, a C process, a Java
> process, a Go process, and pretty much any other process.  Saying “I’ll
> just recompile it” is a bad solution — for enclaves that use MRENCLAVE,
> you can’t, and for enclaves that use MRSIGNER, you need to deal with the
> fact the protecting the signing key is a big deal.
> Someone should be able to port the entire host program to a different
> language without losing secrets and without access to a signing key.

I'm not sure which SGX SDKs you are referring to. But for Intel SGX SDK, we 
defined our own ABI that is consistent between Windows and Linux - i.e. there's 
no technical problem to load on Windows an enclave built on Linux or vice 
versa. In terms of what programming languages they can work with, I have to say 
it was designed exclusively for C/C++. Fortunately, there's usually a "native" 
interface (e.g. JNI, cgo, etc.) supported by a language runtime so it hasn't 
been a roadblock so far. Alternatively, the enclave vendor could ship an 
enclave along with an "interface" shared object that encapsulates all of the 
marshaling specifics, then the combination of that enclave and its "interface" 
shared object may be able to work "universally", which should be close to what 
you want.

The idea we had, when Intel SGX SDK was designed, was that different SDKs would 
be developed for different languages to take advantage of specific language 
features. That is similar to different programming languages were invented to 
target different usages. As we all know, every programming language has both 
advantages and disadvantages, hence no single language dominates. And that same 
idea applies to SGX SDKs. If there existed an SDK that worked with everything, 
probably it wouldn't work well with anything.

> 
> Cedric, your proposal allows an enclave to muck with RSP, but not in a
> way that’s particularly pleasant.

From security perspective, it is SGX ISA, but NOT any particular ABI, that 
allows enclaves "to muck with RSP". 

> Since the ISA is set in stone, we
> can’t do anything about the enclave’s access to its caller’s registers.
> I would love to see a straightforward way to run an enclave such that it
> does not access the main untrusted stack at all — uRSP and uRBP should
> be arbitrary values passed in the untrusted code, and the values the
> enclave sets should be relayed back to the caller but otherwise not have
> any effect.  Sadly I see no way to do this short of using GSBASE to
> store the real untrusted stack pointer.

I understand your sadness. You are "hoping" SGX to be a sandbox technology 
(i.e. to prevent enclave from reaching out into the host) but that wasn't the 
security 

RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-23 Thread Xing, Cedric
Hi Sean,

> Although its just 9 LOC, consider its impact on someone who is looking
> at
> the kernel's SGX support for the first time.  Questions they may have
> when
> looking at the vDSO code/documentation:
> 
>   - What's an exit handler?
>   - Why is an exit handler optional?  Don't I always want to handle
> exits?
>   - What value should my exit handler return?
>   - What should my exit handler do if it detects an error?
>   - Why would I want to preserve %rbp and not %rsp?
>   - Isn't it insecure to use the untrusted stack in my enclave?
> 
> AFAIK, the only reason to preserve %rbp instead of %rsp, i.e. support an
> "exit handler" callback, is to be able to implement an o-call scheme
> using
> the untrusted stack to pass data.  Every idea I came up with for using
> the
> callback, e.g. logging, handling stack corruptiong, testing hooks,
> etc...
> was at worst no more difficult to implement when using a barebones vDSO.
> 
> So, given the choice between a) documenting and maintaining all the
> baggage
> that comes with the exit handler and b) saying "go use signals", I chose
> option b.

Disagreed!

This API is NOT even x86_64 compatible and NOT intended to be used by average 
developers. Instead, this API will be used by SGX SDK vendors who have all the 
needed background/expertise. And flexibility is way more important to them than 
reduced documentation. Just imagine how much one needs to read to understand 
how SGX works, do you really think a function comprised of 20 or so LOC will be 
a big deal? 

Anyway, the documentation needed IMO will not exceed even 1 page, which will be 
way shorter than most of docs in kernel source tree. I'll be more than happy to 
help you out if that's out of your competence!

Regarding maintenance, I see an API may require maintenance for 2 possible 
categories of reasons: 1) its interface cannot satisfy emerging applications; 
or 2) the infrastructure it relies on has changed. Generally speaking, a more 
generic API with less assumption/dependence on other components will impose 
lower maintenance cost in the long run. Comparing our proposals, they share the 
same dependences (i.e. SGX ISA and vDSO extable) but mine is more generic (as 
yours could be implemented using mine as a subroutine). Thus, I bet your 
proposal will impose higher maintenance cost in the long run.

-Cedric


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-22 Thread Xing, Cedric
Hi Andy,

> > >  - It will make it quite unpleasant to call into an enclave in a
> > > coroutine depending on how the host untrusted runtime implements
> > > coroutines.
> >
> > I'm not sure what you are referring to by "coroutine". But this vDSO
> API will be (expected to be) the only routine that actually calls into
> an enclave. Isn't that correct?
> 
> I mean use in languages and runtimes that allow a function and its
> callees to pause and then resume later.  Something like (pseudocode,
> obviously):
> 
> void invoke_the_enclave()
> {
>   do_eenter_through_vdso();
> }
> 
> void some_ocall_handler(void *ptr)
> {
>   yield;
> }

Thank you very much for your detailed explanation. This looks more about 
whether or not the untrusted stack will remain valid after EEXIT, than whether 
the ocall will be paused or not. As in your example code above, a problem may 
occur if "yield" destroys the stack of its caller. But is that a common 
behavior of "yield" (or any scheduler at all)? 

Your point is well received though - Not every enclave can/shall assume 
existence or size of an untrusted stack. Therefore I've made sure my proposal 
will work no matter the enclave touches the untrusted stack or not.

> 
> If the enclave has ptr pointing to the untrusted stack, then this gets
> quite awkward for the runtime to handle efficiently.  IMO a much nicer
> approach would be:
> 
> void invoke_the_enclave()
> {
>   char buffer[1024];
>   while (true)
>   {
> eenter (through vdso);
> if (exit was an ocall request) {
>   some_ocall_handler(buffer);
> }
>   }
> }
> 
> And now there is nothing funny happening behind the runtime's back when
> some_ocall_handler tries to yield.

Agreed. 

In fact, Mr. Christopherson's API could be implemented using mine as a 
subroutine (please see below). So your "nicer approach" will continue to work 
as long as it works with current patch. However, please keep in mind that your 
"nicer" approach doesn't have to be the "only" approach.

The code snippet below shows an equivalent implementation of Mr. 
Christopherson's API using mine as a subroutine, except that RBP cannot be used 
as a parameter to the enclave because it will be overwritten before EENTER. To 
distinguish his API and mine, they are renamed to 
__vdso_sgx_enter_enclave_Christopherson and __vdso_sgx_enter_enclave_Xing, 
respectively. 

__vdso_sgx_enter_enclave_Christopherson:
push$0  /* No "exit callback" provided */
push%rcx/* Optional pointer to 'struct sgx_enclave_exception' */
push%rbx/* TCS */
call__vdso_sgx_enter_enclave_Xing
add $24, %rsp
ret

Thanks!

-Cedric


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-20 Thread Xing, Cedric
> On Wed, Mar 20, 2019 at 12:57:52PM -0700, Xing, Cedric wrote:
> > > Using the untrusted stack as a way to exchange data is very
> > > convenient, but that doesn't mean it's a good idea.  Here are some
> > > problems it
> > > causes:
> > >
> > >  - It prevents using a normal function to wrap enclave entry (as
> > > we're seeing with this patch set).
> >
> > It doesn't prevent.
> 
> Yes it does, keyword being "normal".  Though I guess we could do a bit
> of bikeshedding on the definition of normal...

I don't understand what you mean by "normal". As I said, I tend to have a 
x86_64 ABI compliant version and by saying that I mean it'd be a 100% "normal" 
function callable from C. And the version I provided in this thread is a 
trimmed down version that doesn't preserve any registers except RSP/RBP so a C 
wrapper will be necessary. Other than that I'm not aware of any anomalies. 
Could you elaborate on what "abnormal" operations necessary to invoke this vDSO 
under what circumstances? And it'll be very helpful if you could present a 
"normal" function to demonstrate how your code could work with it while mine 
couldn't.

> Actually, this series doesn't force anything on Intel's SDK, as there is
> nothing in the documentation that states the vDSO *must* be used to
> enter enclaves.  In other words, unless it's expressly forbidden,
> applications are free to enter enclaves directly and do as they wish
> with the untrusted stack.  The catch being that any such usage will need
> to deal with enclave exceptions being delivered as signals, i.e. the
> vDSO implementation is a carrot, not a stick.

If you want to bike-shedding on *must*, well, no one *must* use *anything* from 
the *anyone*! But is that the expectation? Or if you don't expect your API to 
be used then what are you doing here?

Intel SDK doesn't have to use this API. But we (the SDK team) are truly willing 
to use this API because we share the same concern with you over signals and 
would like to move to something better.

> 
> AIUI, excepting libraries that want to manipulate the untrusted stack,
> there is a general consensus that the proposed vDSO implementation is
> the right approach, e.g. full x86_64 ABI compatibility was explored in
> the past and was deemed to add unnecessary complexity and overhead.
> 
> The vDSO *could* be written in such a way that it supports preserving
> RBP or RSP, but for all intents and purposes such an implementation
> would yield two distinct ABIs that just happen to be implemented in a
> single function.
> And *if* we want to deal with maintaining two ABIs, supporting the
> kernel's existing signal ABI is a lot cleaner (from a kernel perspective)
> than polluting the vDSO.

Disagreed! What I'm proposing is one ABI - enclave preserves RBP! No 
requirements on RSP. Of course RSP is still interpreted as the line between 
vacant and used parts of the stack, or nothing will work regardless the 
proposal.

The hosting process may have an agreement with the enclave to preserve RSP. But 
that would be completely between them, and would be just a coincidence instead 
of a consequence of the ABI from the perspective of this vDSO API.

> 
> In other words, if there is a desire to support enclaves which modify
> the untrusted stack, and it sounds like there is, then IMO our time is
> better spent discussing whether or not to officially support the signal
> ABI for enclaves.

Disagreed! We share the same concern over signals so let's work this out! 

> 
> > > It assumes that the untrusted stack isn't further constrained by
> > > various CFI mechanisms (e.g. CET), and, as of last time I checked,
> > > the interaction between CET and SGX was still not specified.
> >
> > I was one of the architects participating in the CET ISA definition.
> > The assembly provided was crafted with CET in mind and will be fully
> > compatible with CET.
> >
> > > It also
> > > assumes that the untrusted stack doesn't have ABI-imposed layout
> > > restrictions related to unwinding, and, as far as I know, this means
> > > that current enclaves with current enclave runtimes can interact
> > > quite poorly with debuggers, exception handling, and various crash
> > > dumping technologies.
> >
> > Per comments from the patch set, I guess it's been agreed that this
> > vDSO function will NOT be x86_64 ABI compatible. So I'm not sure why
> > stacking unwinding is relevant here.
> 
> I think Andy's point is that a single PUSH (to save %rcx) won't break
> unwinding, etc..., but unwind

RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-20 Thread Xing, Cedric
> > By requiring preservation of RSP at both AEX and EEXIT, this precludes
> > the possibility of using the untrusted stack as temporary storage by
> > enclaves. While that looks reasonable at first glance, I'm afraid it
> > isn't the case in reality. The untrusted stack is inarguably the most
> > convenient way for data exchange between an enclave and its enclosing
> > process,
> 
> I vehemently disagree with "inarguably".  IMO, passing data via
> registers is much more convenient.

Which is the most convenient approach is always dependent on data size and/or 
even how the data is produced/consumed. It's kind of a spectrum and we're just 
talking in the sense of probability. You are right that "inarguably" is 
arguable if the buffer is small enough to fit in registers, and the 
producer/consumer also has access to registers.

> 
> Even if you qualify your assertion with "data of arbitrary size unknown
> at build time", I still disagree.  Using the untrusted stack allows for
> trickery when a debugger is involved, other than that I see no
> advantages over allocating virtual memory and handing the pointer to the
> enclave at launch time.  Sure, it requires a few more lines of code to
> setup, but it's literally ~20 LoC out of thousands required to sign,
> build and launch an enclave, but it doesn't require playing games with
> the stack.

I'm NOT ruling out your approach.

And like you said, the untrusted stack enables certain trickery that helps 
debugging and also simplifies enclaves (even just a little). Then why are you 
trying to rule that out? Because of 9 LOC in vDSO?

> 
> Not to mention that the entire concept of using the untrusted stack is
> based on the assumption that the enclave is making ocalls, e.g.
> stateless enclaves or libraries that use a message queue have zero
> need/benefit for using the untrusted stack.

Don't get me wrong. I never said enclaves would require untrusted stack to make 
ocalls, or ocalls would require untrusted stack to make. It's just a generic 
approach for sharing/exchanging data. Some enclaves my need it, others may not.

My question still remains: why do you want to rule it out?

> > and is in fact being used for that purpose by almost all existing
> > enclaves to date.
> 
> That's a bit misleading, since almost all existing enclaves are built
> against Intel's SDK, which just so happens to unconditionally use the
> untrusted stack.  It's not like all enclave developers made a concious
> decision to use the untrusted stack.  If Intel rewrote the SDK to use a
> different method then one could argue that the new approach is the most
> common method of passing data.

Everything exists for a reason. It's unimportant what has been done. What 
matters is why that was done in that particular way. I was trying to inspire 
thinking.

> 
> > Given the expectation that this API will be used by all future SGX
> > application, it looks unwise to ban the most convenient and commonly
> > used approach for data exchange.
> >
> > Given an enclave can touch everything (registers and memory) of the
> > enclosing process, it's reasonable to restrict the enclave by means of
> > "calling convention" to allow the enclosing process to retain its
> > context. And for that purpose, SGX ISA does offer 2 registers (i.e.
> > RSP and RBP) for applications to choose. Instead of preserving RSP,
> > I'd prefer RBP, which will end up with more flexibility in all SGX
> > applications in future.
> 
> I disagree that the SGX ISA intends for applications to choose between
> preserving RSP and RBP, e.g. the SDM description of SSA.UR{B,S}P states:
> 
>   Non-Enclave (outside) {RBP,stack} pointer. Saved by EENTER, restored
> on AEX.
> 
> To me, the "Saved/restored" wording implies that URBP and URSP should
> *never* be touched by the enclave.  Sure, the proposed vDSO interface
> doesn't require RBP to be preserved, but only because the goal was to
> deviate from hardware as little as possible, not because anyone wants to
> encourage enclaves to muck with RBP.

I'm so sorry to tell you that you have misunderstood the SDM. If this is a 
common misunderstanding, I guess I could talk to the architect responsible for 
this SDM chapter to see if we could amend the language.

The purpose of restoring RSP is because software needs a stack to handle 
exception. Well, that's not 100% accurate because it's a user mode stack. 
Anyway, it tells the used part from the unused space in the stack. RBP on the 
other hand is NEVER required from interrupt/exception handling perspective, but 
we decided to add it because we'd like to offer a choice, just like I said 
earlier. The calling thread could then anchor its frame on either RSP or RBP.  


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-20 Thread Xing, Cedric
> > By requiring preservation of RSP at both AEX and EEXIT, this precludes
> the possibility of using the untrusted stack as temporary storage by
> enclaves. While that looks reasonable at first glance, I'm afraid it
> isn't the case in reality. The untrusted stack is inarguably the most
> convenient way for data exchange between an enclave and its enclosing
> process, and is in fact being used for that purpose by almost all
> existing enclaves to date. Given the expectation that this API will be
> used by all future SGX application, it looks unwise to ban the most
> convenient and commonly used approach for data exchange.
> 
> For reference, here's the code in the Intel toolchain responsible for
> this:
> https://github.com/intel/linux-
> sgx/blob/6a0b5ac71f8d16f04e0376f3b2168e80c773dd23/sdk/trts/trts.cpp#L125
> -L140
> 
> Regarding "almost all existing enclaves to date", enclaves built with
> the Fortanix toolchain don't touch the untrusted stack.
> 
> --
> Jethro Beekman | Fortanix

Thanks for providing the references. Yes, not every enclave touches the 
untrusted stack so I used the word "almost".

Everything exists for a reason. By bringing up what is done today, I was trying 
to inspire thinking on the more important question of "why is it done this way 
today?".


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-20 Thread Xing, Cedric
> Using the untrusted stack as a way to exchange data is very convenient,
> but that doesn't mean it's a good idea.  Here are some problems it
> causes:
> 
>  - It prevents using a normal function to wrap enclave entry (as we're
> seeing with this patch set).

It doesn't prevent. It's all about what's agreed between the enclave and its 
hosting process. With the optional "exit/exception callback" set to null, this 
will behave exactly the same as in the current patch. That's what I meant by 
"flexibility" and "superset of functionality".

> 
>  - It makes quite a few unfortunate assumptions about the layout of the
> untrusted stack.  It assumes that the untrusted stack is arbitrarily
> expandable, which is entirely untrue in languages like Go.

I'm with you that stack is not always good thing, hence I'm NOT ruling out any 
other approaches for exchanging data. But is stack "bad" enough to be ruled out 
completely? The point here is flexibility because the stack could be "good" for 
its convenience. After all, only buffers of "reasonable" sizes will be 
exchanged in most cases, and in the rare exceptions of stack overflow they'd 
probably get caught in validation anyway. The point here again is - 
flexibility. I'd say it's better to leave the choice to the SDK implementers 
than to force the choice on them.

> It assumes that the untrusted stack isn't further constrained by various
> CFI mechanisms (e.g. CET), and, as of last time I checked, the
> interaction between CET and SGX was still not specified.  

I was one of the architects participating in the CET ISA definition. The 
assembly provided was crafted with CET in mind and will be fully compatible 
with CET.

> It also
> assumes that the untrusted stack doesn't have ABI-imposed layout
> restrictions related to unwinding, and, as far as I know, this means
> that current enclaves with current enclave runtimes can interact quite
> poorly with debuggers, exception handling, and various crash dumping
> technologies.

Per comments from the patch set, I guess it's been agreed that this vDSO 
function will NOT be x86_64 ABI compatible. So I'm not sure why stacking 
unwinding is relevant here. However, I'm with you that we should take 
debugging/exception handling/reporting/crash dumping into consideration by 
making this vDSO API x86_64 ABI compatible. IMO it's trivial and the 
performance overhead in negligible (dwarfed by ENCLU anyway. I'd be more than 
happy to provide a x86_64 ABI compatible version if that's also your preference.

>  - It will make it quite unpleasant to call into an enclave in a
> coroutine depending on how the host untrusted runtime implements
> coroutines.

I'm not sure what you are referring to by "coroutine". But this vDSO API will 
be (expected to be) the only routine that actually calls into an enclave. Isn't 
that correct?

> 
> So I think it's a *good* thing if the effect is to make enclave SDKs
> change their memory management so that untrusted buffers are explicitly
> supplied by the host runtime.  

Intel SGX SDK will change no matter what. The point is flexibility, is to offer 
choices and let SDK implementers decide, instead of deciding for them ahead of 
time. 

> Honestly, I would have much preferred if
> the architecture did not give the enclave access to RSP and RBP at all.
> (And, for that matter, RIP.)

This reminds me of PUSHA/POPA instructions. We once thought those instruction 
would be appreciated by compilers but the fact turns out that most compilers 
prefer a mix of caller-saved/callee-saved GPRs instead of treating all GPRs 
caller or callee saved. Then when we believed everyone would prefer a mix after 
so many years, an exception emerged as GO was invented. That said, flexibility 
is the point and is the most important thing ISA is always trying to offer. The 
rest is just software convention. So we decided not to enforce RBP/RSP, unless 
there are security implications - e.g. RIP - EEXIT will be considered an 
indirect branch instruction and will have to land on an ENDBR once CET comes 
out.


RE: [PATCH v19,RESEND 24/27] x86/vdso: Add __vdso_sgx_enter_enclave() to wrap SGX enclave transitions

2019-03-20 Thread Xing, Cedric
> +/**
> + * __vdso_sgx_enter_enclave() - Enter an SGX enclave
> + *
> + * %eax:ENCLU leaf, must be EENTER or ERESUME
> + * %rbx:TCS, must be non-NULL
> + * %rcx:Optional pointer to 'struct sgx_enclave_exception'
> + *
> + * Return:
> + *  0 on a clean entry/exit to/from the enclave
> + *  -EINVAL if ENCLU leaf is not allowed or if TCS is NULL
> + *  -EFAULT if ENCLU or the enclave faults
> + *
> + * Note that __vdso_sgx_enter_enclave() is not compliant with the x86-
> 64 ABI.
> + * All registers except RSP must be treated as volatile from the
> +caller's
> + * perspective, including but not limited to GPRs, EFLAGS.DF, MXCSR,
> FCW, etc...
> + * Conversely, the enclave being run must preserve the untrusted RSP
> and stack.

By requiring preservation of RSP at both AEX and EEXIT, this precludes the 
possibility of using the untrusted stack as temporary storage by enclaves. 
While that looks reasonable at first glance, I'm afraid it isn't the case in 
reality. The untrusted stack is inarguably the most convenient way for data 
exchange between an enclave and its enclosing process, and is in fact being 
used for that purpose by almost all existing enclaves to date. Given the 
expectation that this API will be used by all future SGX application, it looks 
unwise to ban the most convenient and commonly used approach for data exchange.

Given an enclave can touch everything (registers and memory) of the enclosing 
process, it's reasonable to restrict the enclave by means of "calling 
convention" to allow the enclosing process to retain its context. And for that 
purpose, SGX ISA does offer 2 registers (i.e. RSP and RBP) for applications to 
choose. Instead of preserving RSP, I'd prefer RBP, which will end up with more 
flexibility in all SGX applications in future.

> + * __vdso_sgx_enter_enclave(u32 leaf, void *tcs,
> + *   struct sgx_enclave_exception *exception_info)
> + * {
> + *   if (leaf != SGX_EENTER && leaf != SGX_ERESUME)
> + *   return -EINVAL;
> + *
> + *   if (!tcs)
> + *   return -EINVAL;
> + *
> + *   try {
> + *   ENCLU[leaf];
> + *   } catch (exception) {
> + *   if (e)
> + *   *e = exception;
> + *   return -EFAULT;
> + *   }
> + *
> + *   return 0;
> + * }
> + */
> +ENTRY(__vdso_sgx_enter_enclave)
> + /* EENTER <= leaf <= ERESUME */
> + cmp $0x2, %eax
> + jb  bad_input
> +
> + cmp $0x3, %eax
> + ja  bad_input
> +
> + /* TCS must be non-NULL */
> + test%rbx, %rbx
> + je  bad_input
> +
> + /* Save @exception_info */
> + push%rcx
> +
> + /* Load AEP for ENCLU */
> + lea 1f(%rip),  %rcx
> +1:   enclu
> +
> + add $0x8, %rsp
> + xor %eax, %eax
> + ret
> +
> +bad_input:
> + mov $(-EINVAL), %rax
> + ret
> +
> +.pushsection .fixup, "ax"
> + /* Re-load @exception_info and fill it (if it's non-NULL) */
> +2:   pop %rcx
> + test%rcx, %rcx
> + je  3f
> +
> + mov %eax, EX_LEAF(%rcx)
> + mov %di,  EX_TRAPNR(%rcx)
> + mov %si,  EX_ERROR_CODE(%rcx)
> + mov %rdx, EX_ADDRESS(%rcx)
> +3:   mov $(-EFAULT), %rax
> + ret
> +.popsection
> +
> +_ASM_VDSO_EXTABLE_HANDLE(1b, 2b)
> +
> +ENDPROC(__vdso_sgx_enter_enclave)

Rather than preserving RSP, an alternative that preserves RBP will allow more 
flexibility inside SGX applications. Below is the assembly code based on that 
idea, that offers a superset of functionality over the current patch, yet at a 
cost of just 9 more lines of code (23 LOC here vs. 14 LOC in the patch).

/**
 * __vdso_sgx_enter_enclave() - Enter an SGX enclave
 *
 * %eax:ENCLU leaf, must be either EENTER or ERESUME
 * 0x08(%rsp):  TCS
 * 0x10(%rsp):  Optional pointer to 'struct sgx_enclave_exception'
 * 0x18(%rsp):  Optional function pointer to 'sgx_exit_handler', defined below
 *  typedef int (*sgx_exit_handler)(struct sgx_enclave_exception 
*ex_info);
 * return:  Non-negative integer to indicate success, or a negative error
 *  code on failure.
 *
 * Note that __vdso_sgx_enter_enclave() is not compatible with x86_64 ABI.
 * All registers except RBP must be treated as volatile from the caller's
 * perspective, including but not limited to GPRs, EFLAGS.DF, MXCSR, FCW, etc...
 * Enclave may decrement RSP, but must not increment it - i.e. existing content
 * of the stack shall be preserved.
 */
__vdso_sgx_enter_enclave:
push%rbp
mov %rsp, %rbp

/* EENTER <= leaf <= ERESUME */
1:  cmp $0x2, %eax
jb  bad_input
cmp $0x3, %eax
ja  bad_leaf

/* Load TCS and AEP */
mov 0x10(%rbp), %rbx
lea 2f(%rip), %rcx

2:  enclu

mov 0x18(%rbp), %rcx
jrcxz   3f
/* Besides leaf, this instruction also zeros trapnr and error_code */
mov %rax, EX_LEAF(%rcx)

3:  mo