[RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
After several recent works [1,2,3] KASLR on x86_64 was basically considered
dead by many researchers. We have been working on an efficient but effective
fix for this problem and found that not mapping the kernel space when
running in user mode is the solution to this problem [4] (the corresponding
paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the flag
CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

From 03c413bc52f1ac253cf0f067605f367f3390d3f4 Mon Sep 17 00:00:00 2001
From: Richard Fellner
Date: Thu, 4 May 2017 10:44:38 +0200
Subject: [PATCH] KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER
---
 arch/x86/entry/entry_64.S            | 17 +
 arch/x86/entry/entry_64_compat.S     |  7 ++-
 arch/x86/include/asm/hw_irq.h        |  2 +-
 arch/x86/include/asm/pgtable.h       |  4
 arch/x86/include/asm/pgtable_64.h    | 21 +
 arch/x86/include/asm/pgtable_types.h | 12 ++--
 arch/x86/include/asm/processor.h     |  7 ++-
 arch/x86/kernel/cpu/common.c         |  4 ++--
 arch/x86/kernel/espfix_64.c          |  6 ++
 arch/x86/kernel/head_64.S            | 16
 arch/x86/kernel/irqinit.c            |  2 +-
 arch/x86/kernel/process.c            |  2 +-
 arch/x86/mm/Makefile                 |  2 +-
 arch/x86/mm/pageattr.c               |  2 +-
 arch/x86/mm/pgtable.c                | 28 +++-
 include/asm-generic/vmlinux.lds.h    | 11 ++-
 include/linux/percpu-defs.h          | 30 ++
 init/main.c                          |  5 +
 kernel/fork.c                        |  8
 security/Kconfig                     |  7 +++
 20 files changed, 176 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18e..631c7bf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 .code64
@@ -141,6 +142,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -223,6 +225,7 @@ entry_SYSCALL_64_fastpath:
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
@@ -318,10 +321,12 @@ return_from_SYSCALL_64:
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -420,6 +425,7 @@ ENTRY(ret_from_fork)
 	leaq	FRAME_OFFSET(%rsp),%rdi	/* pt_regs pointer */
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWITCH_USER_CR3
 	SWAPGS
 	FRAME_END
 	jmp	restore_regs_and_iret
@@ -476,6 +482,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	/*
 	 * We need to tell lockdep that IRQs are off. We can't do this until
@@ -533,6 +540,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
@@ -610,6 +618,7 @@ native_irq_return_ldt:
 	pushq	%rdi				/* Stash user RDI */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -636,6 +645,7 @@ native_irq_return_ldt:
 	 * still points to an RO alias of the ESPFIX stack.
 	 */
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp
@@ -1034,6 +1044,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f				/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1056,6 +1067,7 @@ ENTRY(paranoid_ex
Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Sorry, missed a file in the first mail (from some code cleanup), the full
patch is now attached.

Cheers,
Daniel

From c4b1831d44c6144d3762ccc72f0c4e71a0c713e5 Mon Sep 17 00:00:00 2001
From: Richard Fellner
Date: Thu, 4 May 2017 14:16:44 +0200
Subject: [PATCH] KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER
---
 arch/x86/entry/entry_64.S            |  17
 arch/x86/entry/entry_64_compat.S     |   7 +-
 arch/x86/include/asm/hw_irq.h        |   2 +-
 arch/x86/include/asm/kaiser.h        | 113 +++
 arch/x86/include/asm/pgtable.h       |   4 +
 arch/x86/include/asm/pgtable_64.h    |  21 +
 arch/x86/include/asm/pgtable_types.h |  12 ++-
 arch/x86/include/asm/processor.h     |   7 +-
 arch/x86/kernel/cpu/common.c         |   4 +-
 arch/x86/kernel/espfix_64.c          |   6 ++
 arch/x86/kernel/head_64.S            |  16 +++-
 arch/x86/kernel/irqinit.c            |   2 +-
 arch/x86/kernel/process.c            |   2 +-
 arch/x86/mm/Makefile                 |   2 +-
 arch/x86/mm/kaiser.c                 | 172 +++
 arch/x86/mm/pageattr.c               |   2 +-
 arch/x86/mm/pgtable.c                |  28 +-
 include/asm-generic/vmlinux.lds.h    |  11 ++-
 include/linux/percpu-defs.h          |  30 ++
 init/main.c                          |   5 +
 kernel/fork.c                        |   8 ++
 security/Kconfig                     |   7 ++
 22 files changed, 461 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/include/asm/kaiser.h
 create mode 100644 arch/x86/mm/kaiser.c

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18e..631c7bf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 .code64
@@ -141,6 +142,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -223,6 +225,7 @@ entry_SYSCALL_64_fastpath:
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
@@ -318,10 +321,12 @@ return_from_SYSCALL_64:
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64
 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -420,6 +425,7 @@ ENTRY(ret_from_fork)
 	leaq	FRAME_OFFSET(%rsp),%rdi	/* pt_regs pointer */
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWITCH_USER_CR3
 	SWAPGS
 	FRAME_END
 	jmp	restore_regs_and_iret
@@ -476,6 +482,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	/*
 	 * We need to tell lockdep that IRQs are off. We can't do this until
@@ -533,6 +540,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret
@@ -610,6 +618,7 @@ native_irq_return_ldt:
 	pushq	%rdi				/* Stash user RDI */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -636,6 +645,7 @@ native_irq_return_ldt:
 	 * still points to an RO alias of the ESPFIX stack.
 	 */
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp
@@ -1034,6 +1044,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f				/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1056,6 +1067,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	paranoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 	jmp	paranoid_exit_restore
 paranoid_exit_no_swapgs:
@@ -1085,6 +1097,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 .Lerror_entry_from_usermode_after_swapgs:
 	/*
@@ -1136,6 +1149,7 @@ ENTRY(error_entry)
 	 * Switch to kernel gsbase:
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
@@ -1234,6 +1248,7 @@ ENTRY(nmi)
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1274,6 +1289,7 @@ ENTRY(nmi)
 	 * Return back to user mode. We must *not* do the normal exit
 	 * work, becaus
Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Thu, May 04, 2017 at 12:02:47PM +0200, Daniel Gruss wrote:
> After several recent works [1,2,3] KASLR on x86_64 was basically considered
> dead by many researchers. We have been working on an efficient but effective
> fix for this problem and found that not mapping the kernel space when
> running in user mode is the solution to this problem [4] (the corresponding
> paper [5] will be presented at ESSoS17).

I'll try to read the paper. In the meantime: how different is your
approach from the one here?

	https://lwn.net/Articles/39283/

and how different is the performance impact?
Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 04.05.2017 17:47, Christoph Hellwig wrote:
> I'll try to read the paper. In the meantime: how different is your
> approach from then one here?
>
> 	https://lwn.net/Articles/39283/
>
> and how different is the performance impact?

The approach sounds very similar, but we have fewer changes because we
don't want to change memory allocation but only split the virtual memory -
everything can stay where it is.

We found that the CR3 switch seems to be significantly improved in modern
microarchitectures (we performed our performance tests on a Skylake
i7-6700K). We think the TLB may use the full CR3 base address as a tag,
relaxing the necessity of flushing the entire TLB upon CR3 updates a bit.

Direct runtime overhead is switching the CR3, but that's it.
Indirectly, we're potentially increasing the number of TLB entries that are
required on one or the other level of the TLB. For TLB-intense tasks this
might lead to more significant performance penalties.

I'm sure the overhead on older systems is larger than on recent systems.
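To make the "only split the virtual memory" point concrete, here is a minimal C sketch of how a kernel CR3 value and a user CR3 value could be derived from each other. It assumes that the two top-level page tables sit in adjacent 4 KiB pages of one 8 KiB-aligned allocation, so that a single address bit selects between them. That layout, the bit value and all names below are assumptions made for illustration; the part of the patch quoted in this thread does not show how the switch macros are implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the address bit that distinguishes the user ("shadow")
 * top-level table from the kernel one when both live in one
 * 8 KiB-aligned block. */
#define SHADOW_PGD_BIT 0x1000ULL

/* True if a table pair starting at 'base' is 8 KiB aligned, i.e. the
 * selector bit and everything below it is zero in the kernel table address. */
bool pgd_pair_is_aligned(uint64_t base)
{
	return (base & (2 * SHADOW_PGD_BIT - 1)) == 0;
}

/* Value to load into CR3 on kernel entry: clear the selector bit. */
uint64_t cr3_for_kernel(uint64_t cr3)
{
	return cr3 & ~SHADOW_PGD_BIT;
}

/* Value to load into CR3 before returning to user mode: set the bit. */
uint64_t cr3_for_user(uint64_t cr3)
{
	return cr3 | SHADOW_PGD_BIT;
}
```

Under these assumptions the per-entry and per-exit work is one read, one bit operation and one write of CR3, which matches the statement that the direct overhead is the CR3 switch itself.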
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 2017-05-05 17:47, Thomas Garnier wrote:
> On Fri, May 5, 2017 at 1:23 AM, Daniel Gruss wrote:
>> On 04.05.2017 17:28, Thomas Garnier wrote:
>>> Please read the documentation on submitting patches [1] and coding
>>> style [2].
>>
>> I will have a closer look at that.
>>
>>> - How this approach prevent the hardware attacks you mentioned? You
>>> still have to keep a part of _text in the pagetable and an attacker
>>> could discover it no? (and deduce the kernel base address).
>>
>> These parts are moved to a different section (.user_mapped) which is at a
>> possibly predictable location - the location of the randomized parts of
>> the kernel is independent of the location of .user_mapped.
>> The code/data footprint for .user_mapped is quite small, helping to
>> reduce or eliminate the attack surface...
>
> If I get it right, it means you can leak the per-cpu address instead of
> the kernel. Correct? That would be a problem because you can elevate
> privilege by overwriting per-cpu variables. Leaking this address means
> also defeating KASLR memory randomization [3] (cf paper in the commit).
>
> In theory you could put the code in the fixmap but you still have the
> per-cpu variables and changing that is hard.
>
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=021182e52fe01c1f7b126f97fd6ba048dc4234fd

(Chiming in here, since we worked on something similar)

Assuming that their patch indeed leaks per-cpu addresses.. it might not
necessarily be required to change it. Since an adversary has to leak the
per-cpu addresses based on timing information you can work around that by
inserting dummy entries into the user mappings, with the goal of creating
multiple candidate addresses that show an identical measurement. For
instance, you can create one entry for every possible KASLR slot.

>>> You also need to make it clear that btb attacks are still possible.
>>
>> By just increasing the KASLR randomization range, btb attacks can be
>> mitigated (for free).
>
> Correct, I hope we can do that.
>
>>> - What is the perf impact?
>>
>> It will vary for different machines. We have promising results (<1%) for
>> an i7-6700K with representative benchmarks. However, for older systems or
>> for workloads with a lot of pressure on some TLB levels, the performance
>> may be much worse.
>
> I think including performance data in both cases would be useful.

Best,
David
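The dummy-entry idea can be illustrated with a small model. The C sketch below treats the timing side channel as an oracle that only tells the attacker which KASLR slots appear mapped in the user address space, and counts how many slots the attacker cannot rule out. The slot count and all names are hypothetical and chosen for illustration; this is a model of the argument, not code from any patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 512 /* hypothetical number of possible KASLR slots */

/* What the timing side channel can observe: which slots look mapped. */
struct user_view {
	bool mapped[NUM_SLOTS];
};

/* Only the real slot is present in the user mappings. */
void map_real_only(struct user_view *v, size_t real_slot)
{
	for (size_t i = 0; i < NUM_SLOTS; i++)
		v->mapped[i] = (i == real_slot);
}

/* The real slot plus one dummy entry for every other possible slot. */
void map_with_decoys(struct user_view *v, size_t real_slot)
{
	map_real_only(v, real_slot);
	for (size_t i = 0; i < NUM_SLOTS; i++)
		if (i != real_slot)
			v->mapped[i] = true;
}

/* Number of slots that produce the "mapped" measurement, i.e. the
 * candidates the attacker is left with. */
size_t count_candidates(const struct user_view *v)
{
	size_t n = 0;

	for (size_t i = 0; i < NUM_SLOTS; i++)
		if (v->mapped[i])
			n++;
	return n;
}
```

With only the real entry mapped the attacker is left with a single candidate; with one dummy per slot every slot measures the same, so the observation carries no information about the real one. The model ignores the page-table memory the dummy entries would cost.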
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 2017-05-05 17:53, Jann Horn wrote:
> Ah, I think I understand. The kernel stacks are mapped, but
> cpu_current_top_of_stack isn't, so you can't find the stack until after
> the CR3 switch in the syscall handler?

That's the idea. Only the absolute minimum that is required for a context
switch remains mapped (+ it is mapped at an offset which does not depend on
KASLR -> we do not leak the KASLR offsets).
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 2017-05-06 06:02, David Gens wrote:
> Assuming that their patch indeed leaks per-cpu addresses.. it might not
> necessarily be required to change it.

I think we're not leaking them (unless we still have some bug in our code).
The basic idea is that any part that is required for the context switch is
at a fixed location (unrelated to the location of code / data / per-cpu
data / ...) and thus does not reveal any randomized offsets. Then the
attacker cannot gain any knowledge through the side channel anymore.

For any attack the attacker could then only use the few KBs of memory that
cannot be unmapped because of the way x86 works. Hardening these few KBs
seems like an easier task than doing the same for the entire kernel.

(The best solution would of course be Intel introducing CR3A and CR3B just
like ARM has TTBR0 and TTBR1 - on ARM this entirely prevents any prefetch /
double-fault side-channel attacks.)
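The "fixed location" argument can be written down as a small C sketch. It contrasts the address of a symbol in the kernel's own randomized view with the address of its alias in the user address space, where the alias depends only on the offset inside a fixed region. Both base addresses and both function names are hypothetical values chosen for illustration; the thread does not say where such a region would be placed.

```c
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define KERNEL_LINK_BASE  0xffffffff80000000ULL /* unrandomized kernel base */
#define USER_MAPPED_FIXED 0xfffffe0000000000ULL /* fixed alias region */

/* Address of a symbol as the kernel sees it: shifts with the KASLR offset. */
uint64_t kernel_view_addr(uint64_t kaslr_offset, uint64_t sym_offset)
{
	return KERNEL_LINK_BASE + kaslr_offset + sym_offset;
}

/* Address of the alias that stays mapped in user mode. The KASLR offset is
 * deliberately ignored: the result depends only on where the item sits
 * inside the fixed region. */
uint64_t user_view_addr(uint64_t kaslr_offset, uint64_t region_offset)
{
	(void)kaslr_offset;
	return USER_MAPPED_FIXED + region_offset;
}
```

Two boots with different KASLR offsets give different `kernel_view_addr()` results but the same `user_view_addr()` result, so probing the user-visible alias reveals nothing about the randomized offset.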
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Hi,

On Sat, May 06, 2017 at 10:38:23AM +0200, Daniel Gruss wrote:
> On 2017-05-06 06:02, David Gens wrote:
> >Assuming that their patch indeed leaks per-cpu addresses.. it might not
> >necessarily
> >be required to change it.
>
> I think we're not leaking them (unless we still have some bug in our code).
> The basic idea is that any part that is required for the context switch is
> at a fixed location (unrelated to the location of code / data / per-cpu data
> / ...) and thus does not reveal any randomized offsets. Then the attacker
> cannot gain any knowledge through the side channel anymore.
> For any attack the attacker could then only use the few KBs of memory that
> cannot be unmapped because of the way x86 works. Hardening these few KBs
> seems like an easier task than doing the same for the entire kernel.
>
> (The best solution would of course be Intel introducing CR3A and CR3B just
> like ARM has TTBR0 and TTBR1 - on ARM this entirely prevents any prefetch /
> double-fault side-channel attacks.)

While it may be the case that in practice ARM systems do not have such a
side channel, I think that it is erroneous to believe that the
architectural TTBR{0,1} split ensures this.

The use of TTBR0 for user and TTBR1 for kernel is entirely a SW policy,
and not an architectural requirement. It is possible to map data in
TTBR1 which is accessible to userspace, and data in TTBR0 which is only
accessible by the kernel. In either case, this is determined by the page
tables themselves.

Given this, I think that the statements in the KAISER paper regarding
the TTBRs (in section 2.1) are not quite right. Architecturally,
permission checks and lookups cannot be elided based on the TTBR used.

Having two TTBRs does make it simpler to change the user/kernel address
spaces independently, however.

Thanks,
Mark.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
> While it may be the case that in practice ARM systems do not have such a
> side channel, I think that it is erroneous to believe that the
> architectural TTBR{0,1} split ensures this.
>
> The use of TTBR0 for user and TTBR1 for kernel is entirely a SW policy,
> and not an architectural requirement. It is possible to map data in
> TTBR1 which is accessible to userspace, and data in TTBR0 which is only
> accessible by the kernel. In either case, this is determined by the page
> tables themselves.

Absolutely right, but TTBR0 and TTBR1 are usually used in this way.

> Given this, I think that the statements in the KAISER paper regarding
> the TTBRs (in section 2.1) are not quite right. Architecturally,
> permission checks and lookups cannot be elided based on the TTBR used.

As we say in section 2.1, they are "typically" used in this way, and this
prevents the attacks. It is not just the presence of a second register, but
the way the two registers are used to split the translation tables for
user and kernel.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Mon, May 08, 2017 at 12:51:27PM +0200, Daniel Gruss wrote:
> >While it may be the case that in practice ARM systems do not have such a
> >side channel, I think that it is erroneous to believe that the
> >architectural TTBR{0,1} split ensures this.
> >
> >The use of TTBR0 for user and TTBR1 for kernel is entirely a SW policy,
> >and not an architectural requirement. It is possible to map data in
> >TTBR1 which is accessible to userspace, and data in TTBR0 which is only
> >accessible by the kernel. In either case, this is determined by the page
> >tables themselves.
>
> Absolutely right, but TTBR0 and TTBR1 are usually used in this way.

Sure; if we consider Linux, while userspace is executing, TTBR1 will
(only) contain kernel page tables and TTBR0 will (only) contain user
page tables.

However, as this is not an architectural requirement, the CPU cannot
know that a user access that gets translated via TTBR1 will fault, and
at some point must determine the permissions from the page tables as
required by the architecture.

> >Given this, I think that the statements in the KAISER paper regarding
> >the TTBRs (in section 2.1) are not quite right. Architecturally,
> >permission checks and lookups cannot be elided based on the TTBR used.
>
> As we say in section 2.1, they are "typically" used in this way, and
> this prevents the attacks. Not just the presence of a second
> register, but the way how the two registers are used to split the
> translation tables for user and kernel.

In practice, while userspace is executing, TTBR1 still points to kernel
page tables. If a user program attempts to access an address mapped via
TTBR1, the CPU has to attempt this translation via the TTBR1 page tables
and/or associated TLB entries.

Specifically, I think this does not align with the statement in 2.1
regarding the two TTBRs:

  This simplifies privilege checks and does not require any address
  translation for invalid memory accesses and thus no cache lookups.

... since the use of the TTBRs is orthogonal to privilege checks and/or
the design of the TLBs.

Thanks,
Mark.
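The orthogonality described above can be modelled in a few lines of C. In this sketch the base register is chosen from the virtual address alone, and the permission for a user access is read from the page-table entry alone, so neither function needs the other's result. The single-bit address split and the two-field entry are deliberate simplifications, and all names are hypothetical; this is a model of the argument, not of any real translation hardware.

```c
#include <stdbool.h>
#include <stdint.h>

enum ttbr { TTBR0, TTBR1 };

/* Simplified leaf entry with only the two properties that matter here. */
struct pte {
	bool valid;
	bool el0_accessible; /* user (EL0) access permitted by this entry */
};

/* Simplification: the upper half of the address space is translated via
 * TTBR1, the lower half via TTBR0. */
enum ttbr select_ttbr(uint64_t va)
{
	return (va >> 63) ? TTBR1 : TTBR0;
}

/* The permission for a user access comes from the entry itself. The TTBR
 * that led to the entry is not an input. */
bool user_access_ok(const struct pte *entry)
{
	return entry->valid && entry->el0_accessible;
}
```

In this model an entry reached through TTBR1 can still permit user access, and an entry reached through TTBR0 can still deny it, which is why the translation has to be attempted before the outcome of the access is known.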
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 05.05.2017 10:23, Daniel Gruss wrote:
>> - How this approach prevent the hardware attacks you mentioned? You
>> still have to keep a part of _text in the pagetable and an attacker
>> could discover it no? (and deduce the kernel base address).
>
> These parts are moved to a different section (.user_mapped) which is at a
> possibly predictable location - the location of the randomized parts of
> the kernel is independent of the location of .user_mapped.
> The code/data footprint for .user_mapped is quite small, helping to
> reduce or eliminate the attack surface...

We just discussed that in our group again: although we experimented with
this part, it's not yet included in the patch. The solution we sketched is,
as I wrote, that we map the required (per-thread) variables in the user CR3
to a fixed location in memory. During the context switch, only this fixed
part remains mapped but not the randomized pages. This is not a lot of
work, because it's just mapping a few more pages and fixing one or two
lines in the context switch.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 08.05.2017 15:22, Mark Rutland wrote:
> Specifically, I think this does not align with the statement in 2.1
> regarding the two TTBRs:
>
>   This simplifies privilege checks and does not require any address
>   translation for invalid memory accesses and thus no cache lookups.
>
> ... since the use of the TTBRs is orthogonal to privilege checks and/or
> the design of the TLBs.

Ok, this is a good point, we will try to clarify this in the paper.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 06.05.2017 10:38, Daniel Gruss wrote:
> On 2017-05-06 06:02, David Gens wrote:
>> Assuming that their patch indeed leaks per-cpu addresses.. it might not
>> necessarily be required to change it.
>
> I think we're not leaking them (unless we still have some bug in our
> code).

Just to correct my answer here as well: Although we experimented with fixed
mappings for per-cpu addresses, the current patch does not incorporate this
yet, so it indeed still leaks. However, it is not a severe problem. The
mapping of the required (per-cpu) variables would be at a fixed location in
the user CR3, instead of the ones that are used in the kernel.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Mon, May 8, 2017 at 6:53 AM, Daniel Gruss wrote:
> On 06.05.2017 10:38, Daniel Gruss wrote:
>>
>> On 2017-05-06 06:02, David Gens wrote:
>>>
>>> Assuming that their patch indeed leaks per-cpu addresses.. it might not
>>> necessarily
>>> be required to change it.
>>
>>
>> I think we're not leaking them (unless we still have some bug in our
>> code).
>
>
> Just to correct my answer here as well: Although we experimented with fixed
> mappings for per-cpu addresses, the current patch does not incorporate this
> yet, so it indeed still leaks. However, it is not a severe problem. The
> mapping of the required (per-cpu) variables would be at a fixed location in
> the user CR3, instead of the ones that are used in the kernel.

Why do you think it should be at a fixed location in the user CR3? I
see that you just mirror the entries. You also mirror
__entry_text_start / __entry_text_end which is part of the binary so
will leak the base address of the kernel. Maybe I am missing
something.

--
Thomas
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 08.05.2017 16:09, Thomas Garnier wrote:
>> Just to correct my answer here as well: Although we experimented with
>> fixed mappings for per-cpu addresses, the current patch does not
>> incorporate this yet, so it indeed still leaks. However, it is not a
>> severe problem. The mapping of the required (per-cpu) variables would be
>> at a fixed location in the user CR3, instead of the ones that are used in
>> the kernel.
>
> Why do you think it should be at a fixed location in the user CR3? I
> see that you just mirror the entries. You also mirror
> __entry_text_start / __entry_text_end which is part of the binary so
> will leak the base address of the kernel. Maybe I am missing
> something.

As I said, the current patch does not incorporate this yet, so yes, this
part currently still leaks because we did not implement it yet.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Thu, May 4, 2017 at 3:02 AM, Daniel Gruss wrote:
> After several recent works [1,2,3] KASLR on x86_64 was basically considered
> dead by many researchers. We have been working on an efficient but effective
> fix for this problem and found that not mapping the kernel space when
> running in user mode is the solution to this problem [4] (the corresponding
> paper [5] will be presented at ESSoS17).
>
> With this RFC patch we allow anybody to configure their kernel with the flag
> CONFIG_KAISER to add our defense mechanism.
>
> If there are any questions we would love to answer them.
> We also appreciate any comments!
>
> Cheers,
> Daniel (+ the KAISER team from Graz University of Technology)
>
> [1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
> [2]
> https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
> [3]
> https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
> [4] https://github.com/IAIK/KAISER
> [5] https://gruss.cc/files/kaiser.pdf
>
>

Please read the documentation on submitting patches [1] and coding style [2].

I have two questions:

 - How does this approach prevent the hardware attacks you mentioned? You
still have to keep a part of _text in the pagetable and an attacker
could discover it, no? (and deduce the kernel base address). You also
need to make it clear that btb attacks are still possible.
 - What is the perf impact?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst

Thanks,

--
Thomas
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 04.05.2017 17:28, Thomas Garnier wrote:
> Please read the documentation on submitting patches [1] and coding
> style [2].

I will have a closer look at that.

> - How this approach prevent the hardware attacks you mentioned? You
> still have to keep a part of _text in the pagetable and an attacker
> could discover it no? (and deduce the kernel base address).

These parts are moved to a different section (.user_mapped) which is at a
possibly predictable location - the location of the randomized parts of the
kernel is independent of the location of .user_mapped.
The code/data footprint for .user_mapped is quite small, helping to reduce
or eliminate the attack surface...

> You also need to make it clear that btb attacks are still possible.

By just increasing the KASLR randomization range, btb attacks can be
mitigated (for free).

> - What is the perf impact?

It will vary for different machines. We have promising results (<1%) for an
i7-6700K with representative benchmarks. However, for older systems or for
workloads with a lot of pressure on some TLB levels, the performance may be
much worse.
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Fri, May 5, 2017 at 1:23 AM, Daniel Gruss wrote:
>
> On 04.05.2017 17:28, Thomas Garnier wrote:
>>
>> Please read the documentation on submitting patches [1] and coding style [2].
>
>
> I will have a closer look at that.
>
>> - How this approach prevent the hardware attacks you mentioned? You
>> still have to keep a part of _text in the pagetable and an attacker
>> could discover it no? (and deduce the kernel base address).
>
>
> These parts are moved to a different section (.user_mapped) which is at a
> possibly predictable location - the location of the randomized parts of the
> kernel is independent of the location of .user_mapped.
> The code/data footprint for .user_mapped is quite small, helping to reduce or
> eliminate the attack surface...
>

If I get it right, it means you can leak the per-cpu address instead
of the kernel. Correct? That would be a problem because you can
elevate privilege by overwriting per-cpu variables. Leaking this
address means also defeating KASLR memory randomization [3] (cf paper
in the commit).

In theory you could put the code in the fixmap but you still have the
per-cpu variables and changing that is hard.

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=021182e52fe01c1f7b126f97fd6ba048dc4234fd

>> You also need to make it clear that btb attacks are still possible.
>
>
> By just increasing the KASLR randomization range, btb attacks can be
> mitigated (for free).

Correct, I hope we can do that.

>
>> - What is the perf impact?
>
>
> It will vary for different machines. We have promising results (<1%) for an
> i7-6700K with representative benchmarks. However, for older systems or for
> workloads with a lot of pressure on some TLB levels, the performance may be
> much worse.

I think including performance data in both cases would be useful.

--
Thomas
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Thu, May 4, 2017 at 12:02 PM, Daniel Gruss wrote:
> After several recent works [1,2,3] KASLR on x86_64 was basically considered
> dead by many researchers. We have been working on an efficient but effective
> fix for this problem and found that not mapping the kernel space when
> running in user mode is the solution to this problem [4] (the corresponding
> paper [5] will be presented at ESSoS17).
>
> With this RFC patch we allow anybody to configure their kernel with the flag
> CONFIG_KAISER to add our defense mechanism.
>
> If there are any questions we would love to answer them.
> We also appreciate any comments!

Why do you need this SWITCH_KERNEL_CR3_NO_STACK logic? It would
make sense if the kernel stacks weren't mapped, but if they weren't mapped,
I don't see how the entry_INT80_compat entry point could work at all - the
software interrupt itself already pushes values on the kernel stack. You could
maybe work around that using some sort of trampoline stack, but I don't see
anything like that. Am I missing something?
Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On Fri, May 5, 2017 at 5:49 PM, Jann Horn wrote:
> On Thu, May 4, 2017 at 12:02 PM, Daniel Gruss
> wrote:
>> After several recent works [1,2,3] KASLR on x86_64 was basically considered
>> dead by many researchers. We have been working on an efficient but effective
>> fix for this problem and found that not mapping the kernel space when
>> running in user mode is the solution to this problem [4] (the corresponding
>> paper [5] will be presented at ESSoS17).
>>
>> With this RFC patch we allow anybody to configure their kernel with the flag
>> CONFIG_KAISER to add our defense mechanism.
>>
>> If there are any questions we would love to answer them.
>> We also appreciate any comments!
>
> Why do you need this SWITCH_KERNEL_CR3_NO_STACK logic? It would
> make sense if the kernel stacks weren't mapped, but if they weren't mapped,
> I don't see how the entry_INT80_compat entry point could work at all - the
> software interrupt itself already pushes values on the kernel stack. You could
> maybe work around that using some sort of trampoline stack, but I don't see
> anything like that. Am I missing something?

Ah, I think I understand. The kernel stacks are mapped, but
cpu_current_top_of_stack isn't, so you can't find the stack until after the CR3
switch in the syscall handler?
Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
> Just did a quick test on my main KVM host, a 8 core Intel(R) Xeon(R)
> CPU E3-1240 V2.
> KVM guests are 4.10 w/o CONFIG_KAISER and kvmconfig without CONFIG_PARAVIRT.
> Building a defconfig kernel within that guests is about 10% slower
> when CONFIG_KAISER
> is enabled.

Thank you for testing it! :)

> Is this expected?

It sounds plausible. First, I would expect any form of virtualization to
increase the overhead. Second, for the processor (Ivy Bridge), I would have
expected even higher performance overheads. KAISER utilizes very recent
performance improvements in Intel processors...

> If it helps I can redo the same test also on bare metal.

I'm not sure how we proceed here and if this would help, because I don't
know what everyone expects.
KAISER definitely introduces an overhead, no doubt about that. How much
overhead it is depends on the specific hardware and may be very little on
recent architectures and more on older machines.
We are not proposing to enable KAISER by default, but to provide the config
option to allow easy integration into hardened kernels where performance
overheads may be acceptable (which depends on the specific use case and the
specific hardware).
Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Daniel,

On 07.05.2017 at 23:45, Daniel Gruss wrote:
>> Just did a quick test on my main KVM host, a 8 core Intel(R) Xeon(R)
>> CPU E3-1240 V2.
>> KVM guests are 4.10 w/o CONFIG_KAISER and kvmconfig without CONFIG_PARAVIRT.
>> Building a defconfig kernel within that guests is about 10% slower
>> when CONFIG_KAISER
>> is enabled.
>
> Thank you for testing it! :)
>
>> Is this expected?
>
> It sounds plausible. First, I would expect any form of virtualization to
> increase the overhead. Second, for the processor (Ivy Bridge), I would have
> expected even higher
> performance overheads. KAISER utilizes very recent performance improvements
> in Intel processors...

Ahh, *very* recent is the keyword then. ;)
I was a bit confused since in your paper the overhead is less than 1%.

What platforms did you test?
i.e. how does it perform on recent AMD systems?

Thanks,
//richard
Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Daniel,

On Fri, May 5, 2017 at 9:40 AM, Daniel Gruss wrote:
> I'm sure the overhead on older systems is larger than on recent systems.

Just did a quick test on my main KVM host, an 8-core Intel(R) Xeon(R)
CPU E3-1240 V2.
KVM guests are 4.10 w/o CONFIG_KAISER and kvmconfig without CONFIG_PARAVIRT.
Building a defconfig kernel within those guests is about 10% slower
when CONFIG_KAISER is enabled.

Is this expected?
If it helps I can redo the same test also on bare metal.

--
Thanks,
//richard
Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
On 2017-05-08 00:02, Richard Weinberger wrote:
> Ahh, *very* recent is the keyword then. ;)
> I was a bit confused since in your paper the overhead is less than 1%.

Yes, only for very recent platforms (Skylake). While working on the paper
we were surprised that we found overheads that small.

> What platforms did you test?

We tested it on multiple platforms for stability, but we only ran longer
performance tests on the different Skylake i7-6700K systems mentioned in
the paper.

> i.e. how does it perform on recent AMD systems?

Unfortunately, we don't have any AMD systems at hand. I'm also not sure how
AMD is affected by the issue in the first place. Although unlikely, there
is the possibility that the problem of KASLR information leakage through
microarchitectural side channels might be Intel-specific.