Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
> We're currently working on a feature in chromium that uses pkeys for > in-process isolation. Being able to use the pkey state in the seccomp > filter would be pretty useful for this. For example, it would allow > us to enforce that no code outside the isolated thread would ever > map/mprotect executable memory. > We can probably do something similar by adding instruction pointer > checks to the seccomp filter, but that feels quite hacky and this > feature would make a much nicer implementation. > > Are there any plans to make a version 2 of this patch? Thanks for your interest in this patch, but I am now working on other projects and currently don't plan to make a version 2 of this patch.
Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
On 10/29/2018 11:33 PM, Dave Hansen wrote: But, that's really an implementation detail. The effect on the ABI and how this might constrain future pkeys use is my bigger worry. I'd also want to make sure that your specific use-case is compatible with all the oddities of pkeys, like the 'clone' and signal behavior. Some of that is spelled out here: http://man7.org/linux/man-pages/man7/pkeys.7.html I think my specific use case is compatible with these oddities since the untrusted code is not allowed to install signal handlers or spawn threads himself but only through the trusted code, which ensures, that the PKRU has the correct value when control passes back to the untrusted code. But I agree that these oddities are something a programmer needs to take into account if he uses pkeys in general (and this patch adds new considerations to it if the program installs a seccomp filter and e.g. wants to do system calls from a signal handler). One thing that's a worry is that we have never said that you *can't* write to arbitrary permissions in PKRU. I can totally see some really paranoid code saying, "I'm about to do something risky, so I'll turn off access to *all* pkeys", or " turn off all access except my current stack". If they did that, they might also inadvertently disable access to certain seccomp-restricted syscalls. We can fix that up by documenting restrictions like "code should never change the access rights of any pkey other than those that it allocated", but that doesn't help any old code (of which I hope there is relatively little). I can also see paranoid code wanting to do something like you describe and I think, that we should try to not forbid this behaviour. Documenting restrictions on code which writes to the PKRU as you describe would be one way to go, but this would disallow this paranoid use case if I understand it correctly because the code would not be allowed to disable access to all except of one pkey. My idea would be to not put the blame on the code which writes to the PKRU but on the code which installs the seccomp filter based on the PKRU: It has to make sure, that it allows the system calls which the paranoid code should be allowed to do when the pkey of the paranoid code is the (only) enabled pkey. This could be ensured e.g. by the paranoid code providing the pkey it intends to use to the code which installs the seccomp filter. This of course means, that you need to know what you are doing, when you install a seccomp filter which depends on the PKRU, but I think this is already the case when you install a seccomp filter without this patch (but maybe someone with more seccomp experience than me can say better how much reasoning is needed to install a seccomp filter without and with this patch). -- Michael
Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
Am 29.10.2018 um 18:29 schrieb Dave Hansen: On 10/29/18 9:48 AM, Jann Horn wrote: On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen wrote: I'm not sure this is a great use for PKRU. I *think* the basic problem is that you want to communicate some rights information down into a filter, and you want to communicate it with PKRU. While it's handy to have an extra register that nobody (generally) mucks with, I'm not quite convinced that we want to repurpose it this way. That's not how I understand it; I believe that the context is probably https://arxiv.org/pdf/1801.06822.pdf ? My understanding is that PKRU is used for lightweight in-process sandboxing, and to extend this sandbox protection to the syscall interface, it is necessary to expose PKRU state to seccomp filters. In other words, this isn't using PKRU exclusively for passing rights into a filter, but it has to use PKRU anyway. PKRU gives information about rights to various bits of application data. From that, a seccomp filter can infer the context, and thus the ability for the code to call a given syscall at a certain point in time. This makes PKRU an opt-in part of the syscall ABI, which is pretty interesting. We _could_ do the same kind of thing with any callee-saved general purpose register, but PKRU is particularly attractive because there is only one instruction that writes to it (well, outside of XSAVE*), and random library code is very unlikely at this point to be using it. I agree with you on the points, why PKRU is particularly attractive but I think the most important point is that PKRU is _not_ a general purpose register, but is already used to control access to some resource (memory). This patch would allow to also control access to another resource (system calls) using the PKRU. This is why it makes sense to use the PKRU in this patch instead of another callee-saved register. PKRU getting reset on signals, and the requirement now that it *can't* be changed if you make syscalls probably needs to get thought about very carefully before we do this, though. I am not sure, whether I follow you. Are you saying, that PKRU is currently not guaranteed to be preserved across system calls? This would make it very hard to use protection keys if libc does not save and restore the PKRU before/after systemcalls (and I am not aware of this). Or do you mean, that the kernel might want to use the PKRU register for its own purposes while it is executing? Then the solution you proposed in another email in this thread would work: instead of providing the seccomp filter with the current value of the PKRU (which might be different from what the user space expects) use the user space value which must have been saved somewhere (otherwise it would not be possible to restore it). Or are you afraid, that one part of a user space program installs a seccomp filter, which blocks system calls based on the PKRU, and another part of the same program (maybe a library) changes the PKRU in a way, which the first part did not expect and the program dies because it tries to do a forbidden system call? I don't know whether the kernel can (and wants) do anything against this. This problem also exists without this patch if you replace system call with memory access. -- Michael
Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
On 10/29/2018 05:48 PM, Ram Pai wrote: On Mon, Oct 29, 2018 at 09:25:15AM -0700, Kees Cook wrote: On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler wrote: Add the current value of an architecture specific protection keys register (currently PKRU on x86) to data available for seccomp-bpf programs to work on. This allows filters based on the currently enabled protection keys. Support for protection keys on the POWER architecture is not part of this patch since I do not have access to a PowerPC, but adding support for it can be achieved by setting sd->pkeys to the AMR register in populate_seccomp_data. Maybe you can use a generic read_pkey() function, and each arch can provide their own implementation. In the case of powerpc it is mfspr(SPRN_AMR); something like the code below might help? diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 92a9962..d79efe3 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -89,6 +89,11 @@ static inline u16 pte_to_pkey_bits(u64 pteflags) #define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \ pkey_alloc_mask(pkey)) +static inline u64 read_pkey() +{ +return mfspr(SPRN_AMR); +} + static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey) { if (pkey < 0 || pkey >= arch_max_pkey()) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 19b137f..feea114 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -9,6 +9,11 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val); +static inline u64 read_pkey() +{ +return read_pkru(); +} + static inline bool arch_pkeys_enabled(void) { return boot_cpu_has(X86_FEATURE_OSPKE); diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 2955ba97..f1538c7 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -13,6 +13,11 @@ #define PKEY_DEDICATED_EXECUTE_ONLY 0 #define ARCH_VM_PKEY_FLAGS 0 +static inline u64 read_pkey() +{ + return 0; +} + static inline int vma_pkey(struct vm_area_struct *vma) { return 0; RP Thank you very much, this helps indeed. I will add this to the next version of this patch. One use case for this patch is disabling unnecessary system calls for a library (e.g. network i/o for a crypto library) while the library runs without disabling the system calls for the whole program (by changing the protection keys before and after the library executes). Using this one could ensure that the library behaves a expected (e.g. the crypto library not sending private keys to a malicious server). This patch also enables lightweight sandboxing of untrusted code using memory protection keys: Protection keys provide memory isolation but for a sandbox system call isolation is needed as well. This patch allows writing a seccomp filter to prevent system calls by the untrusted code while still allowing system calls for the trusted code. Isn't PKU instruction based? Couldn't a malicious library just change the state of the MPK registers? This seems like an easy way to bypass any filters that used PKU. I'm not convinced this is a meaningful barrier that should be enforced by seccomp. This can also be done with a signal handler with SECCOMP_RET_TRAP and check instruction pointer vs PKU state. -Kees An alternative design would be to extend (c)BPF with a new instruction to read the state of a protection key. This alternate design would provide a simpler interface to the user space since the BPF program would not need to deal with the architecture specific pkeys field in seccomp_data, but the question is, how much of an advantage this would be as the nr field in seccomp_data is already architecture specific. Adding a new instruction for BPF programs is more complicated than this patch and might be a breaking change. Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys support: With patch: Benchmarking 33554432 samples... 28.019505558 - 18.676858522 = 9342647036 getpid native: 278 ns 42.279109885 - 28.019657031 = 14259452854 getpid RET_ALLOW: 424 ns Estimated seccomp overhead per syscall: 146 ns Without patch: Benchmarking 33554432 samples... 28.059619466 - 18.706769155 = 9352850311 getpid native: 278 ns 42.299228279 - 28.059761804 = 14239466475 getpid RET_ALLOW: 424 ns Estimated seccomp overhead per syscall: 146 ns Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Ram Pai Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-...@vger.kernel.org Signed-off-by: Michael Sammler --- Changes to the previous version: - added motivation, notes about POWER, alternative design and benchmark results to the commit log - renamed pkru field in seccomp_data to pkeys - changed size of pkru field to __u64 and removed reserved field - added test for x86
Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
On 10/29/2018 05:48 PM, Jann Horn wrote: On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen wrote: On 10/29/18 9:25 AM, Kees Cook wrote: On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler wrote: Add the current value of an architecture specific protection keys register (currently PKRU on x86) to data available for seccomp-bpf programs to work on. This allows filters based on the currently enabled protection keys. How does the current "assignment" of protection keys to the various uses get communicated to the filter? I assume that you first allocate your protection keys, then install the filter? Yes, but I agree that it should probably be documented, that the filter should only look at the parts of the PKRU, which belong to pkeys the user space program allocated (if the kernel wants to use some parts of the PKRU for its own purposes). I'm not sure this is a great use for PKRU. I *think* the basic problem is that you want to communicate some rights information down into a filter, and you want to communicate it with PKRU. While it's handy to have an extra register that nobody (generally) mucks with, I'm not quite convinced that we want to repurpose it this way. That's not how I understand it; I believe that the context is probably https://arxiv.org/pdf/1801.06822.pdf ? My understanding is that PKRU is used for lightweight in-process sandboxing, and to extend this sandbox protection to the syscall interface, it is necessary to expose PKRU state to seccomp filters. In other words, this isn't using PKRU exclusively for passing rights into a filter, but it has to use PKRU anyway. Yes, https://arxiv.org/pdf/1801.06822.pdf is indeed the context and what you say is correct. Also, I'm not sure the kernel provides the PKRU guarantees you want at the moment. Our implementation *probably* works, but it's mostly by accident. I don't know, which guarantees about the PKRU are provided at the moment, but the only guarantee needed for this patch is, that the kernel does not change the bits of the PKRU register, which belong to pkeys allocated by the user program, between the syscall entry and the call to secure_computing(). Is there are use case where the kernel would like to modify these bits of the PKRU? -- MIchael
[RFC PATCH] seccomp: Add protection keys into seccomp_data
Add the current value of an architecture specific protection keys register (currently PKRU on x86) to data available for seccomp-bpf programs to work on. This allows filters based on the currently enabled protection keys. Support for protection keys on the POWER architecture is not part of this patch since I do not have access to a PowerPC, but adding support for it can be achieved by setting sd->pkeys to the AMR register in populate_seccomp_data. One use case for this patch is disabling unnecessary system calls for a library (e.g. network i/o for a crypto library) while the library runs without disabling the system calls for the whole program (by changing the protection keys before and after the library executes). Using this one could ensure that the library behaves a expected (e.g. the crypto library not sending private keys to a malicious server). This patch also enables lightweight sandboxing of untrusted code using memory protection keys: Protection keys provide memory isolation but for a sandbox system call isolation is needed as well. This patch allows writing a seccomp filter to prevent system calls by the untrusted code while still allowing system calls for the trusted code. An alternative design would be to extend (c)BPF with a new instruction to read the state of a protection key. This alternate design would provide a simpler interface to the user space since the BPF program would not need to deal with the architecture specific pkeys field in seccomp_data, but the question is, how much of an advantage this would be as the nr field in seccomp_data is already architecture specific. Adding a new instruction for BPF programs is more complicated than this patch and might be a breaking change. Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys support: With patch: Benchmarking 33554432 samples... 28.019505558 - 18.676858522 = 9342647036 getpid native: 278 ns 42.279109885 - 28.019657031 = 14259452854 getpid RET_ALLOW: 424 ns Estimated seccomp overhead per syscall: 146 ns Without patch: Benchmarking 33554432 samples... 28.059619466 - 18.706769155 = 9352850311 getpid native: 278 ns 42.299228279 - 28.059761804 = 14239466475 getpid RET_ALLOW: 424 ns Estimated seccomp overhead per syscall: 146 ns Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Ram Pai Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-...@vger.kernel.org Signed-off-by: Michael Sammler --- Changes to the previous version: - added motivation, notes about POWER, alternative design and benchmark results to the commit log - renamed pkru field in seccomp_data to pkeys - changed size of pkru field to __u64 and removed reserved field - added test for x86 arch/mips/kernel/ptrace.c | 1 + arch/x86/entry/common.c | 1 + include/uapi/linux/seccomp.h | 3 + kernel/seccomp.c | 1 + tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +- 5 files changed, 112 insertions(+), 1 deletion(-) diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c index e5ba56c0..a58dd04d 100644 --- a/arch/mips/kernel/ptrace.c +++ b/arch/mips/kernel/ptrace.c @@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall) for (i = 0; i < 6; i++) sd.args[i] = args[i]; sd.instruction_pointer = KSTK_EIP(current); + sd.pkeys = 0; ret = __secure_computing(&sd); if (ret == -1) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 3b2490b8..20c51bf2 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs) sd.arch = arch; sd.nr = regs->orig_ax; sd.instruction_pointer = regs->ip; + sd.pkeys = read_pkru(); #ifdef CONFIG_X86_64 if (arch == AUDIT_ARCH_X86_64) { sd.args[0] = regs->di; diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 9efc0e73..3aa2d934 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -52,12 +52,15 @@ * @instruction_pointer: at the time of the system call. * @args: up to 6 system call arguments always stored as 64-bit values *regardless of the architecture. + * @pkeys: value of an architecture specific protection keys register + * (currently PKRU on x86) */ struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; + __u64 pkeys; }; #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index fd023ac2..dfb8b0d6 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd) sd->args[4] = args[4];
Re: [PATCH] seccomp: Add pkru into seccomp_data
On 10/25/2018 11:12 AM, Florian Weimer wrote: I understand your concern about exposing the number of protection keys in the ABI. One idea would be to state, that the pkru field (which should probably be renamed) contains an architecture specific value, which could then be the PKRU on x86 and AMR (or another register) on POWER. This new field should probably be extended to __u64 and the reserved field removed. POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS (disable read and write) and PKEY_DISABLE_WRITE like Intel. It's currently translated by the kernel, but I really need a PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory is write-only. The idea here would be to simply provide the raw value of the register (PKRU on x86, AMR on POWER) to the BPF program and let the BPF program (or maybe a higher level library like libseccomp) deal with the complications of interpreting this architecture specific value (similar how the BPF program currently already has to deal with architecture specific system call numbers). If an architecture were to support more protection keys than fit into the field, the architecture specific value stored in the field might simply be the first protection keys. If there was interest, it would be possible to add more architecture specific fields to seccomp_data. Another idea would be to not add a field in the seccomp_data structure, but instead provide a new BPF instruction, which reads the value of a specified protection key. I would prefer that if it's possible. We should make sure that the bits are the same as those returned from pkey_get. I have an implementation on POWER, but have yet to figure out the implications for 32-bit because I do not know the AMR register size there. Thanks, Florian I have had a look at how BPF is implemented and it does not seem to be easy to just add an BPF instruction for seccomp since (as far as I understand) the code of the classical BPF (as used by seccomp) is shared with the code of eBPF, which is used in many parts of the kernel and there is at least one interpreter and one JIT compiler for BPF. But maybe someone with more experience than me can comment on how hard it would be to add an instruction to BPF. - Michael
Re: [PATCH] seccomp: Add pkru into seccomp_data
On 10/24/2018 08:06 PM, Florian Weimer wrote: * Michael Sammler: Add the current value of the PKRU register to data available for seccomp-bpf programs to work on. This allows filters based on the currently enabled protection keys. diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 9efc0e73..e8b9ecfc 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -52,12 +52,16 @@ * @instruction_pointer: at the time of the system call. * @args: up to 6 system call arguments always stored as 64-bit values *regardless of the architecture. + * @pkru: value of the pkru register + * @reserved: pad the structure to a multiple of eight bytes */ struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; + __u32 pkru; + __u32 reserved; }; This doesn't cover the POWER implementation. Adding Cc:s. And I think the kernel shouldn't expose the number of protection keys in the ABI. Thanks, Florian Thank you for the pointer about the POWER implementation. I am not familiar with POWER in general and its protection key feature at all. Would the AMR register be the correct register to expose here? I understand your concern about exposing the number of protection keys in the ABI. One idea would be to state, that the pkru field (which should probably be renamed) contains an architecture specific value, which could then be the PKRU on x86 and AMR (or another register) on POWER. This new field should probably be extended to __u64 and the reserved field removed. Another idea would be to not add a field in the seccomp_data structure, but instead provide a new BPF instruction, which reads the value of a specified protection key. - Michael