Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2022-11-14 Thread Michael Sammler



> We're currently working on a feature in chromium that uses pkeys for
> in-process isolation. Being able to use the pkey state in the seccomp
> filter would be pretty useful for this. For example, it would allow
> us to enforce that no code outside the isolated thread would ever
> map/mprotect executable memory.
> We can probably do something similar by adding instruction pointer
> checks to the seccomp filter, but that feels quite hacky and this
> feature would make a much nicer implementation.
>
> Are there any plans to make a version 2 of this patch?

Thanks for your interest in this patch, but I am now working on other projects 
and currently don't plan to make a version 2 of this patch.


Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2018-10-30 Thread Michael Sammler

On 10/29/2018 11:33 PM, Dave Hansen wrote:

But, that's really an implementation detail.  The effect on the ABI and
how this might constrain future pkeys use is my bigger worry.

I'd also want to make sure that your specific use-case is compatible
with all the oddities of pkeys, like the 'clone' and signal behavior.
Some of that is spelled out here:

http://man7.org/linux/man-pages/man7/pkeys.7.html



I think my specific use case is compatible with these oddities since the 
untrusted code is not allowed to install signal handlers or spawn 
threads himself but only through the trusted code, which ensures, that 
the PKRU has the correct value when control passes back to the untrusted 
code. But I agree that these oddities are something a programmer needs 
to take into account if he uses pkeys in general (and this patch adds 
new considerations to it if the program installs a seccomp filter and 
e.g. wants to do system calls from a signal handler).



One thing that's a worry is that we have never said that you *can't*
write to arbitrary permissions in PKRU.  I can totally see some really
paranoid code saying, "I'm about to do something risky, so I'll turn off
access to *all* pkeys", or " turn off all access except my current
stack".  If they did that, they might also inadvertently disable access
to certain seccomp-restricted syscalls.

We can fix that up by documenting restrictions like "code should never
change the access rights of any pkey other than those that it
allocated", but that doesn't help any old code (of which I hope there is
relatively little).



I can also see paranoid code wanting to do something like you describe 
and I think, that we should try to not forbid this behaviour. 
Documenting restrictions on code which writes to the PKRU as you 
describe would be one way to go, but this would disallow this paranoid 
use case if I understand it correctly because the code would not be 
allowed to disable access to all except of one pkey.


My idea would be to not put the blame on the code which writes to the 
PKRU but on the code which installs the seccomp filter based on the 
PKRU: It has to make sure, that it allows the system calls which the 
paranoid code should be allowed to do when the pkey of the paranoid code 
is the (only) enabled pkey. This could be ensured e.g. by the paranoid 
code providing the pkey it intends to use to the code which installs the 
seccomp filter. This of course means, that you need to know what you are 
doing, when you install a seccomp filter which depends on the PKRU, but 
I think this is already the case when you install a seccomp filter 
without this patch (but maybe someone with more seccomp experience than 
me can say better how much reasoning is needed to install a seccomp 
filter without and with this patch).


-- Michael


Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2018-10-29 Thread Michael Sammler

Am 29.10.2018 um 18:29 schrieb Dave Hansen:


On 10/29/18 9:48 AM, Jann Horn wrote:

On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen  wrote:

I'm not sure this is a great use for PKRU.  I *think* the basic problem
is that you want to communicate some rights information down into a
filter, and you want to communicate it with PKRU.  While it's handy to
have an extra register that nobody (generally) mucks with, I'm not quite
convinced that we want to repurpose it this way.

That's not how I understand it; I believe that the context is probably
https://arxiv.org/pdf/1801.06822.pdf ?
My understanding is that PKRU is used for lightweight in-process
sandboxing, and to extend this sandbox protection to the syscall
interface, it is necessary to expose PKRU state to seccomp filters.
In other words, this isn't using PKRU exclusively for passing rights
into a filter, but it has to use PKRU anyway.

PKRU gives information about rights to various bits of application data.
  From that, a seccomp filter can infer the context, and thus the ability
for the code to call a given syscall at a certain point in time.

This makes PKRU an opt-in part of the syscall ABI, which is pretty
interesting.  We _could_ do the same kind of thing with any callee-saved
general purpose register, but PKRU is particularly attractive because
there is only one instruction that writes to it (well, outside of
XSAVE*), and random library code is very unlikely at this point to be
using it.
I agree with you on the points, why PKRU is particularly attractive but 
I think the most important point is that PKRU is _not_ a general purpose 
register, but is already used to control access to some resource 
(memory). This patch would allow to also control access to another 
resource (system calls) using the PKRU. This is why it makes sense to 
use the PKRU in this patch instead of another callee-saved register.

PKRU getting reset on signals, and the requirement now that it *can't*
be changed if you make syscalls probably needs to get thought about very
carefully before we do this, though.
I am not sure, whether I follow you. Are you saying, that PKRU is 
currently not guaranteed to be preserved across system calls?
This would make it very hard to use protection keys if libc does not 
save and restore the PKRU before/after systemcalls (and I am not aware 
of this).


Or do you mean, that the kernel might want to use the PKRU register for 
its own purposes while it is executing?
Then the solution you proposed in another email in this thread would 
work: instead of providing the seccomp filter with the current value of 
the PKRU (which might be different from what the user space expects) use 
the user space value which must have been saved somewhere (otherwise it 
would not be possible to restore it).


Or are you afraid, that one part of a user space program installs a 
seccomp filter, which blocks system calls based on the PKRU, and another 
part of the same program (maybe a library) changes the PKRU in a way, 
which the first part did not expect and the program dies because it 
tries to do a forbidden system call?
I don't know whether the kernel can (and wants) do anything against 
this. This problem also exists without this patch if you replace system 
call with memory access.


-- Michael


Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2018-10-29 Thread Michael Sammler

On 10/29/2018 05:48 PM, Ram Pai wrote:


On Mon, Oct 29, 2018 at 09:25:15AM -0700, Kees Cook wrote:

On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler  wrote:

Add the current value of an architecture specific protection keys
register (currently PKRU on x86) to data available for seccomp-bpf
programs to work on. This allows filters based on the currently
enabled protection keys.

Support for protection keys on the POWER architecture is not part of
this patch since I do not have access to a PowerPC, but adding support
for it can be achieved by setting sd->pkeys to the AMR register in
populate_seccomp_data.

Maybe you can use a generic read_pkey() function, and each arch can
provide their own implementation.  In the case of powerpc it is
mfspr(SPRN_AMR);

something like the code below might help?

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 92a9962..d79efe3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -89,6 +89,11 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
  #define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \
   pkey_alloc_mask(pkey))
  
+static inline u64 read_pkey()

+{
+return mfspr(SPRN_AMR);
+}
+
  static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
  {
if (pkey < 0 || pkey >= arch_max_pkey())
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 19b137f..feea114 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -9,6 +9,11 @@
  extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
  
+static inline u64 read_pkey()

+{
+return read_pkru();
+}
+
  static inline bool arch_pkeys_enabled(void)
  {
return boot_cpu_has(X86_FEATURE_OSPKE);
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba97..f1538c7 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -13,6 +13,11 @@
  #define PKEY_DEDICATED_EXECUTE_ONLY 0
  #define ARCH_VM_PKEY_FLAGS 0
  
+static inline u64 read_pkey()

+{
+   return 0;
+}
+
  static inline int vma_pkey(struct vm_area_struct *vma)
  {
return 0;

RP
Thank you very much, this helps indeed. I will add this to the next 
version of this patch.



One use case for this patch is disabling unnecessary system calls for a
library (e.g. network i/o for a crypto library) while the library runs
without disabling the system calls for the whole program (by changing
the protection keys before and after the library executes). Using this
one could ensure that the library behaves a expected (e.g. the crypto
library not sending private keys to a malicious server).

This patch also enables lightweight sandboxing of untrusted code using
memory protection keys: Protection keys provide memory isolation but
for a sandbox system call isolation is needed as well. This patch
allows writing a seccomp filter to prevent system calls by the
untrusted code while still allowing system calls for the trusted code.

Isn't PKU instruction based? Couldn't a malicious library just change
the state of the MPK registers? This seems like an easy way to bypass
any filters that used PKU. I'm not convinced this is a meaningful
barrier that should be enforced by seccomp.

This can also be done with a signal handler with SECCOMP_RET_TRAP and
check instruction pointer vs PKU state.

-Kees


An alternative design would be to extend (c)BPF with a new instruction
to read the state of a protection key. This alternate design would
provide a simpler interface to the user space since the BPF program
would not need to deal with the architecture specific pkeys field in
seccomp_data, but the question is, how much of an advantage this would
be as the nr field in seccomp_data is already architecture specific.
Adding a new instruction for BPF programs is more complicated than
this patch and might be a breaking change.

Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
support:

With patch:
Benchmarking 33554432 samples...
28.019505558 - 18.676858522 = 9342647036
getpid native: 278 ns
42.279109885 - 28.019657031 = 14259452854
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Without patch:
Benchmarking 33554432 samples...
28.059619466 - 18.706769155 = 9352850311
getpid native: 278 ns
42.299228279 - 28.059761804 = 14239466475
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Ram Pai 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Michael Sammler 
---
Changes to the previous version:
- added motivation, notes about POWER, alternative design and benchmark results 
to the commit log
- renamed pkru field in seccomp_data to pkeys
- changed size of pkru field to __u64 and removed reserved field
- added test for x86

Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data

2018-10-29 Thread Michael Sammler

On 10/29/2018 05:48 PM, Jann Horn wrote:

On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen  wrote:

On 10/29/18 9:25 AM, Kees Cook wrote:

On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler  wrote:

Add the current value of an architecture specific protection keys
register (currently PKRU on x86) to data available for seccomp-bpf
programs to work on. This allows filters based on the currently
enabled protection keys.

How does the current "assignment" of protection keys to the various uses
get communicated to the filter?

I assume that you first allocate your protection keys, then install the filter?

Yes, but I agree that it should probably be documented, that the filter 
should only look at the parts of the PKRU, which belong to pkeys the 
user space program allocated (if the kernel wants to use some parts of 
the PKRU for its own purposes).

I'm not sure this is a great use for PKRU.  I *think* the basic problem
is that you want to communicate some rights information down into a
filter, and you want to communicate it with PKRU.  While it's handy to
have an extra register that nobody (generally) mucks with, I'm not quite
convinced that we want to repurpose it this way.

That's not how I understand it; I believe that the context is probably
https://arxiv.org/pdf/1801.06822.pdf ?
My understanding is that PKRU is used for lightweight in-process
sandboxing, and to extend this sandbox protection to the syscall
interface, it is necessary to expose PKRU state to seccomp filters.
In other words, this isn't using PKRU exclusively for passing rights
into a filter, but it has to use PKRU anyway.
Yes, https://arxiv.org/pdf/1801.06822.pdf is indeed the context and what 
you say is correct.

Also, I'm not sure the kernel provides the PKRU guarantees you want at
the moment.  Our implementation *probably* works, but it's mostly by
accident.
I don't know, which guarantees about the PKRU are provided at the 
moment, but the only guarantee needed for this patch is, that the kernel 
does not change the bits of the PKRU register, which belong to pkeys 
allocated by the user program, between the syscall entry and the call to 
secure_computing(). Is there are use case where the kernel would like to 
modify these bits of the PKRU?


-- MIchael


[RFC PATCH] seccomp: Add protection keys into seccomp_data

2018-10-29 Thread Michael Sammler
Add the current value of an architecture specific protection keys
register (currently PKRU on x86) to data available for seccomp-bpf
programs to work on. This allows filters based on the currently
enabled protection keys.

Support for protection keys on the POWER architecture is not part of
this patch since I do not have access to a PowerPC, but adding support
for it can be achieved by setting sd->pkeys to the AMR register in
populate_seccomp_data.

One use case for this patch is disabling unnecessary system calls for a
library (e.g. network i/o for a crypto library) while the library runs
without disabling the system calls for the whole program (by changing
the protection keys before and after the library executes). Using this
one could ensure that the library behaves a expected (e.g. the crypto
library not sending private keys to a malicious server).

This patch also enables lightweight sandboxing of untrusted code using
memory protection keys: Protection keys provide memory isolation but
for a sandbox system call isolation is needed as well. This patch
allows writing a seccomp filter to prevent system calls by the
untrusted code while still allowing system calls for the trusted code.

An alternative design would be to extend (c)BPF with a new instruction
to read the state of a protection key. This alternate design would
provide a simpler interface to the user space since the BPF program
would not need to deal with the architecture specific pkeys field in
seccomp_data, but the question is, how much of an advantage this would
be as the nr field in seccomp_data is already architecture specific.
Adding a new instruction for BPF programs is more complicated than
this patch and might be a breaking change.

Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
support:

With patch:
Benchmarking 33554432 samples...
28.019505558 - 18.676858522 = 9342647036
getpid native: 278 ns
42.279109885 - 28.019657031 = 14259452854
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Without patch:
Benchmarking 33554432 samples...
28.059619466 - 18.706769155 = 9352850311
getpid native: 278 ns
42.299228279 - 28.059761804 = 14239466475
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Ram Pai 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Michael Sammler 
---
Changes to the previous version:
- added motivation, notes about POWER, alternative design and benchmark results 
to the commit log
- renamed pkru field in seccomp_data to pkeys
- changed size of pkru field to __u64 and removed reserved field
- added test for x86

 arch/mips/kernel/ptrace.c |   1 +
 arch/x86/entry/common.c   |   1 +
 include/uapi/linux/seccomp.h  |   3 +
 kernel/seccomp.c  |   1 +
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +-
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index e5ba56c0..a58dd04d 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, 
long syscall)
for (i = 0; i < 6; i++)
sd.args[i] = args[i];
sd.instruction_pointer = KSTK_EIP(current);
+   sd.pkeys = 0;

ret = __secure_computing(&sd);
if (ret == -1)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b8..20c51bf2 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
sd.arch = arch;
sd.nr = regs->orig_ax;
sd.instruction_pointer = regs->ip;
+   sd.pkeys = read_pkru();
 #ifdef CONFIG_X86_64
if (arch == AUDIT_ARCH_X86_64) {
sd.args[0] = regs->di;
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 9efc0e73..3aa2d934 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -52,12 +52,15 @@
  * @instruction_pointer: at the time of the system call.
  * @args: up to 6 system call arguments always stored as 64-bit values
  *regardless of the architecture.
+ * @pkeys: value of an architecture specific protection keys register
+ * (currently PKRU on x86)
  */
 struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
+   __u64 pkeys;
 };

 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index fd023ac2..dfb8b0d6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
sd->args[4] = args[4];
  

Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Michael Sammler

On 10/25/2018 11:12 AM, Florian Weimer wrote:

I understand your concern about exposing the number of protection keys
in the ABI. One idea would be to state, that the pkru field (which
should probably be renamed) contains an architecture specific value,
which could then be the PKRU on x86 and AMR (or another register) on
POWER. This new field should probably be extended to __u64 and the
reserved field removed.

POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS
(disable read and write) and PKEY_DISABLE_WRITE like Intel.  It's
currently translated by the kernel, but I really need a
PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory
is write-only.
The idea here would be to simply provide the raw value of the register 
(PKRU on x86, AMR on POWER) to the BPF program and let the BPF program 
(or maybe a higher level library like libseccomp) deal with the 
complications of interpreting this architecture specific value (similar 
how the BPF program currently already has to deal with architecture 
specific system call numbers). If an architecture were to support more 
protection keys than fit into the field, the architecture specific value 
stored in the field might simply be the first protection keys. If there 
was interest, it would be possible to add more architecture specific 
fields to seccomp_data.

Another idea would be to not add a field in the seccomp_data
structure, but instead provide a new BPF instruction, which reads the
value of a specified protection key.

I would prefer that if it's possible.  We should make sure that the bits
are the same as those returned from pkey_get.  I have an implementation
on POWER, but have yet to figure out the implications for 32-bit because
I do not know the AMR register size there.

Thanks,
Florian
I have had a look at how BPF is implemented and it does not seem to be 
easy to just add an BPF instruction for seccomp since (as far as I 
understand) the code of the classical BPF (as used by seccomp) is shared 
with the code of eBPF, which is used in many parts of the kernel and 
there is at least one interpreter and one JIT compiler for BPF. But 
maybe someone with more experience than me can comment on how hard it 
would be to add an instruction to BPF.


- Michael


Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Michael Sammler

On 10/24/2018 08:06 PM, Florian Weimer wrote:


* Michael Sammler:


Add the current value of the PKRU register to data available for
seccomp-bpf programs to work on. This allows filters based on the
currently enabled protection keys.
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 9efc0e73..e8b9ecfc 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -52,12 +52,16 @@
   * @instruction_pointer: at the time of the system call.
   * @args: up to 6 system call arguments always stored as 64-bit values
   *regardless of the architecture.
+ * @pkru: value of the pkru register
+ * @reserved: pad the structure to a multiple of eight bytes
   */
  struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
+   __u32 pkru;
+   __u32 reserved;
  };

This doesn't cover the POWER implementation.  Adding Cc:s.

And I think the kernel shouldn't expose the number of protection keys in
the ABI.

Thanks,
Florian
Thank you for the pointer about the POWER implementation. I am not 
familiar with POWER in general and its protection key feature at all. 
Would the AMR register be the correct register to expose here?


I understand your concern about exposing the number of protection keys 
in the ABI. One idea would be to state, that the pkru field (which 
should probably be renamed) contains an architecture specific value, 
which could then be the PKRU on x86 and AMR (or another register) on 
POWER. This new field should probably be extended to __u64 and the 
reserved field removed.


Another idea would be to not add a field in the seccomp_data structure, 
but instead provide a new BPF instruction, which reads the value of a 
specified protection key.


- Michael