Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O

2019-01-31 Thread Daniel Gruss
On 1/31/19 1:08 PM, Jiri Kosina wrote:
> On Thu, 31 Jan 2019, Daniel Gruss wrote:
> 
>> If I understood it correctly, this patch just removes the advantages of 
>> preadv2 over mmap+access for the attacker.
> 
> Which is the desired effect. We are not trying to solve the timing aspect, 
> as I don't think there is a reasonable way to do it, is there?

There are two building blocks to cache attacks: bringing the cache into
a known state, and observing a state change. You can mitigate an attack
by breaking either of these building blocks.

For most attacks the attacker would be interested in observing *when* a
specific victim page is loaded into the page cache rather than observing
whether it is in the page cache right now (it could be there for ages if
the system was not under memory pressure).
So, one could try to prevent interference in the page cache between
attacker and victim -> working-set algorithms do that to some extent.
A simpler idea (with more side effects) would be to limit the maximum
share of the page cache per user (or per process, depending on the
threat model)...


Cheers,
Daniel


Re: [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O

2019-01-31 Thread Daniel Gruss
On 1/30/19 1:44 PM, Vlastimil Babka wrote:
> Close that sidechannel by always initiating readahead on the cache if we
> encounter a cache miss for preadv2(RWF_NOWAIT); with that in place, probing
> the pagecache residency itself will actually populate the cache, making the
> sidechannel useless.

I fear this does not really close the side channel. You can time the
preadv2 call and infer which path it took, so you just bring it down to
the same level as using mmap and timing accesses.
If I understood it correctly, this patch just removes the advantages of
preadv2 over mmap+access for the attacker.
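
For illustration, a minimal sketch of the probe in question (the target
path is hypothetical, and RWF_NOWAIT needs a reasonably recent kernel and
glibc): preadv2() either returns data (page resident) or fails with EAGAIN
(page not resident), and the timing of the call leaks the same distinction.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns 1 if the page at 'offset' is in the page cache, 0 if not,
 * -1 on other errors. */
static int page_resident(int fd, off_t offset)
{
        char buf;
        struct iovec iov = { .iov_base = &buf, .iov_len = 1 };

        /* RWF_NOWAIT: fail with EAGAIN instead of blocking on I/O, so
         * the return value reveals page cache residency. */
        if (preadv2(fd, &iov, 1, offset, RWF_NOWAIT) >= 0)
                return 1;
        return errno == EAGAIN ? 0 : -1;
}

int main(void)
{
        int fd = open("/usr/lib/victim.so", O_RDONLY); /* hypothetical */

        printf("first page resident: %d\n", page_resident(fd, 0));
        close(fd);
        return 0;
}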


Cheers,
Daniel


Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Daniel Gruss
On 1/7/19 12:08 PM, Dominique Martinet wrote:
>> That's my bigger concern here. In [1] there's described a remote attack
>> (on webserver) using the page fault timing differences for present/not
>> present page cache pages. Noisy but works, and I expect locally it to be
>> much less noisy. Yet the countermeasures section only mentions
>> restricting mincore() as if it was sufficient (and also how to make
>> evictions harder, but that's secondary IMHO).
> 
> I'd suggest making clock rougher for non-root users but javascript tried
> that and it wasn't enough... :)
> Honestly won't be of much help there, good luck?

Restricting mincore() is sufficient to fix the hardware-agnostic part.
If the attack is no longer hardware-agnostic, an attacker could also
just use a hardware cache attack, which has a higher temporal and
spatial resolution, so there is no reason to fall back to page cache
attacks then.
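
For context, the hardware-agnostic probe is as simple as this sketch (the
target path is hypothetical): mmap a shared file and let mincore() report,
per page, whether it is resident in the page cache.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/usr/lib/victim.so", O_RDONLY); /* hypothetical */
        struct stat st;

        fstat(fd, &st);
        size_t pages = (st.st_size + 4095) / 4096;
        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        unsigned char *vec = malloc(pages);

        /* One syscall, no faults, no timing noise: bit 0 of each vector
         * entry tells us whether that page is in the page cache. */
        if (mincore(map, st.st_size, vec) == 0)
                for (size_t i = 0; i < pages; i++)
                        if (vec[i] & 1)
                                printf("page %zu resident\n", i);

        free(vec);
        munmap(map, st.st_size);
        close(fd);
        return 0;
}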


Cheers,
Daniel


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-08 Thread Daniel Gruss

On 08.05.2017 16:09, Thomas Garnier wrote:

Just to correct my answer here as well: although we experimented with fixed
mappings for per-cpu addresses, the current patch does not incorporate this
yet, so it indeed still leaks. However, it is not a severe problem: the
mapping of the required (per-cpu) variables would be at a fixed location in
the user CR3, instead of at the addresses used in the kernel.


Why do you think it should be at a fixed location in the user CR3? I
see that you just mirror the entries. You also mirror
__entry_text_start / __entry_text_end which is part of the binary so
will leak the base address of the kernel. Maybe I am missing
something.


As I said, the current patch does not incorporate this yet, so yes, this
part still leaks because we have not implemented it.


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-08 Thread Daniel Gruss

On 06.05.2017 10:38, Daniel Gruss wrote:

On 2017-05-06 06:02, David Gens wrote:

Assuming that their patch indeed leaks per-cpu addresses... it might not
necessarily be required to change it.


I think we're not leaking them (unless we still have some bug in our code).


Just to correct my answer here as well: although we experimented with fixed
mappings for per-cpu addresses, the current patch does not incorporate this
yet, so it indeed still leaks. However, it is not a severe problem: the
mapping of the required (per-cpu) variables would be at a fixed location in
the user CR3, instead of at the addresses used in the kernel.


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-08 Thread Daniel Gruss

On 08.05.2017 15:22, Mark Rutland wrote:

Specifically, I think this does not align with the statement in 2.1
regarding the two TTBRs:

  This simplifies privilege checks and does not require any address
  translation for invalid memory accesses and thus no cache lookups.

... since the use of the TTBRs is orthogonal to privilege checks and/or
the design of the TLBs.


Ok, this is a good point, we will try to clarify this in the paper.


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-08 Thread Daniel Gruss

On 05.05.2017 10:23, Daniel Gruss wrote:

 - How does this approach prevent the hardware attacks you mentioned? You
still have to keep a part of _text in the page table and an attacker
could discover it, no? (and deduce the kernel base address).


These parts are moved to a different section (.user_mapped) which is at a
possibly predictable location - the location of the randomized parts of
the kernel is independent of the location of .user_mapped.
The code/data footprint for .user_mapped is quite small, helping to reduce
or eliminate the attack surface...


We just discussed this in our group again: although we experimented with
this part, it is not yet included in the patch. The solution we sketched
is, as I wrote, to map the required (per-thread) variables in the user CR3
to a fixed location in memory. During the context switch, only this fixed
part remains mapped, not the randomized pages. This is not a lot of work,
because it is just mapping a few more pages and fixing 1 or 2 lines in the
context switch, as the sketch below illustrates.
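
To make that concrete, a rough sketch with a hypothetical helper (this is
not the actual patch code): mirror the fixed-address region into the
shadow PGD that is active with the user CR3 by copying top-level entries
from the kernel page tables.

#include <linux/mm.h>
#include <asm/pgtable.h>

/* Hypothetical helper: mirror the kernel mapping of [start, end) into
 * the shadow PGD used with the user CR3. A copied top-level entry
 * exposes everything its lower-level tables map, so the fixed
 * .user_mapped region needs dedicated lower-level tables to avoid
 * exposing more than intended. */
static void shadow_mirror_range(pgd_t *shadow_pgd, unsigned long start,
                                unsigned long end)
{
        unsigned long addr, next;

        for (addr = start; addr < end; addr = next) {
                next = pgd_addr_end(addr, end);
                set_pgd(shadow_pgd + pgd_index(addr), *pgd_offset_k(addr));
        }
}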


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-08 Thread Daniel Gruss

While it may be the case that in practice ARM systems do not have such a
side channel, I think that it is erroneous to believe that the
architectural TTBR{0,1} split ensures this.

The use of TTBR0 for user and TTBR1 for kernel is entirely a SW policy,
and not an architectural requirement. It is possible to map data in
TTBR1 which is accessible to userspace, and data in TTBR0 which is only
accessible by the kernel. In either case, this is determined by the page
tables themselves.


Absolutely right, but TTBR0 and TTBR1 are usually used in this way.


Given this, I think that the statements in the KAISER paper regarding
the TTBRs (in section 2.1) are not quite right. Architecturally,
permission checks and lookups cannot be elided based on the TTBR used.


As we say in section 2.1, they are "typically" used in this way, and this
prevents the attacks: not just the presence of a second register, but the
way the two registers are used to split the translation tables for user
and kernel.




Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-07 Thread Daniel Gruss

On 2017-05-08 00:02, Richard Weinberger wrote:

Ahh, *very* recent is the keyword then. ;)
I was a bit confused since in your paper the overhead is less than 1%.


Yes, only for very recent platforms (Skylake). While working on the 
paper we were surprised that we found overheads that small.



What platforms did you test?


We tested it on multiple platforms for stability, but we only ran longer 
performance tests on the Skylake i7-6700K systems mentioned in the paper.



i.e. how does it perform on recent AMD systems?


Unfortunately, we don't have any AMD systems at hand. I'm also not sure 
how AMD is affected by the issue in the first place. Although unlikely, 
there is the possibility that the problem of KASLR information leakage 
through microarchitectural side channels might be Intel-specific.


Re: [kernel-hardening] Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-07 Thread Daniel Gruss

Just did a quick test on my main KVM host, an 8-core Intel(R) Xeon(R)
CPU E3-1240 V2.
KVM guests are 4.10 w/o CONFIG_KAISER and kvmconfig without CONFIG_PARAVIRT.
Building a defconfig kernel within those guests is about 10% slower
when CONFIG_KAISER is enabled.


Thank you for testing it! :)


Is this expected?


It sounds plausible. First, I would expect any form of virtualization to 
increase the overhead. Second, for this processor (Ivy Bridge), I would 
have expected even higher performance overheads. KAISER utilizes very 
recent performance improvements in Intel processors...



If it helps I can redo the same test also on bare metal.


I'm not sure how we should proceed here and whether this would help,
because I don't know what everyone expects.
KAISER definitely introduces an overhead, no doubt about that. How much
overhead depends on the specific hardware; it may be very little on
recent architectures and more on older machines.
We are not proposing to enable KAISER by default, but to provide the 
config option to allow easy integration into hardened kernels where 
performance overheads may be acceptable (which depends on the specific 
use case and the specific hardware).


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-06 Thread Daniel Gruss

On 2017-05-06 06:02, David Gens wrote:

Assuming that their patch indeed leaks per-cpu addresses... it might not
necessarily be required to change it.


I think we're not leaking them (unless we still have some bug in our 
code). The basic idea is that any part that is required for the context 
switch is at a fixed location (unrelated to the location of code / data 
/ per-cpu data / ...) and thus does not reveal any randomized offsets. 
Then the attacker cannot gain any knowledge through the side channel 
anymore.
For any attack the attacker could then only use the few KBs of memory 
that cannot be unmapped because of the way x86 works. Hardening these 
few KBs seems like an easier task than doing the same for the entire kernel.


(The best solution would of course be Intel introducing CR3A and CR3B 
just like ARM has TTBR0 and TTBR1 - on ARM this entirely prevents any 
prefetch / double-fault side-channel attacks.)
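
For reference, the side channel this defends against can be sketched in a
few lines of user code: time a prefetch of a candidate kernel address.
Prefetches never fault, and on affected CPUs they complete measurably
faster for mapped addresses. The candidate range and threshold below are
illustrative, not taken from the paper.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

static uint64_t time_prefetch(const void *addr)
{
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);

        _mm_prefetch((const char *)addr, _MM_HINT_T0); /* never faults */
        return __rdtscp(&aux) - start;
}

int main(void)
{
        uint64_t cand;

        /* Scan a KASLR candidate range for the kernel in 2MB steps. */
        for (cand = 0xffffffff80000000ull; cand < 0xffffffffc0000000ull;
             cand += 0x200000ull) {
                uint64_t t = time_prefetch((const void *)cand);

                if (t < 100) /* threshold is machine-specific */
                        printf("%#llx looks mapped (%llu cycles)\n",
                               (unsigned long long)cand,
                               (unsigned long long)t);
        }
        return 0;
}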


Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-06 Thread Daniel Gruss

On 2017-05-05 17:53, Jann Horn wrote:

Ah, I think I understand. The kernel stacks are mapped, but
cpu_current_top_of_stack isn't, so you can't find the stack until after the CR3
switch in the syscall handler?


That's the idea. Only the absolute minimum that is required for a 
context switch remains mapped (+ it is mapped at an offset which does 
not depend on KASLR -> we do not leak the KASLR offsets).




Re: [kernel-hardening] [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-05 Thread Daniel Gruss

On 04.05.2017 17:28, Thomas Garnier wrote:

Please read the documentation on submitting patches [1] and coding style [2].


I will have a closer look at that.


 - How does this approach prevent the hardware attacks you mentioned? You
still have to keep a part of _text in the page table and an attacker
could discover it, no? (and deduce the kernel base address).


These parts are moved to a different section (.user_mapped) which is at 
a possibly predictable location - the location of the randomized parts 
of the kernel is independent of the location of .user_mapped.
The code/data footprint for .user_mapped is quite small, helping to 
reduce or eliminate the attack surface...



You also need to make it clear that btb attacks are still possible.


By just increasing the KASLR randomization range, btb attacks can be 
mitigated (for free).



 - What is the perf impact?


It will vary for different machines. We have promising results (<1%) for 
an i7-6700K with representative benchmarks. However, for older systems 
or for workloads with a lot of pressure on some TLB levels, the 
performance may be much worse.


Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-05 Thread Daniel Gruss

On 04.05.2017 17:47, Christoph Hellwig wrote:

I'll try to read the paper.  In the meantime: how different is your
approach from the one here?

https://lwn.net/Articles/39283/

and how different is the performance impact?


The approach sounds very similar, but we have fewer changes because we 
don't want to change memory allocation but only split the virtual memory 
- everything can stay where it is.


We found that the CR3 switch seems to be significantly improved in 
modern microarchitectures (we performed our performance tests on a 
Skylake i7-6700K). We think the TLB may use the full CR3 base address 
as a tag, somewhat relaxing the necessity of flushing the entire TLB 
upon CR3 updates.

The direct runtime overhead is switching CR3, but that's it.
Indirectly, we are potentially increasing the number of TLB entries that 
are required on one or the other level of the TLB. For TLB-intensive 
tasks this might lead to more significant performance penalties.
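
If someone wants numbers for the direct cost on their own hardware, a
throwaway kernel module along these lines should do. This is a sketch for
a 4.10-era x86-64 kernel, where read_cr3()/write_cr3() are available;
reloading CR3 with the same value still flushes the non-global TLB
entries, so it approximates the switch cost. Load it and read the
estimate from dmesg.

#include <linux/init.h>
#include <linux/irqflags.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/msr.h>
#include <asm/special_insns.h>

static int __init cr3bench_init(void)
{
        unsigned long cr3, flags;
        u64 start, end;
        int i;

        local_irq_save(flags);
        cr3 = read_cr3();
        start = rdtsc();
        for (i = 0; i < 1000; i++)
                write_cr3(cr3);  /* same value: still a TLB flush */
        end = rdtsc();
        local_irq_restore(flags);

        pr_info("cr3 reload: ~%llu cycles each\n", (end - start) / 1000);
        return 0;
}

static void __exit cr3bench_exit(void)
{
}

module_init(cr3bench_init);
module_exit(cr3bench_exit);
MODULE_LICENSE("GPL");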


I'm sure the overhead on older systems is larger than on recent systems.



Re: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-04 Thread Daniel Gruss
Sorry, I missed a file in the first mail (from some code cleanup); the 
full patch is now attached.



Cheers,
Daniel
From c4b1831d44c6144d3762ccc72f0c4e71a0c713e5 Mon Sep 17 00:00:00 2001
From: Richard Fellner 
Date: Thu, 4 May 2017 14:16:44 +0200
Subject: [PATCH] KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

https://github.com/IAIK/KAISER
---
 arch/x86/entry/entry_64.S|  17 
 arch/x86/entry/entry_64_compat.S |   7 +-
 arch/x86/include/asm/hw_irq.h|   2 +-
 arch/x86/include/asm/kaiser.h| 113 +++
 arch/x86/include/asm/pgtable.h   |   4 +
 arch/x86/include/asm/pgtable_64.h|  21 +
 arch/x86/include/asm/pgtable_types.h |  12 ++-
 arch/x86/include/asm/processor.h |   7 +-
 arch/x86/kernel/cpu/common.c |   4 +-
 arch/x86/kernel/espfix_64.c  |   6 ++
 arch/x86/kernel/head_64.S|  16 +++-
 arch/x86/kernel/irqinit.c|   2 +-
 arch/x86/kernel/process.c|   2 +-
 arch/x86/mm/Makefile |   2 +-
 arch/x86/mm/kaiser.c | 172 +++
 arch/x86/mm/pageattr.c   |   2 +-
 arch/x86/mm/pgtable.c|  28 +-
 include/asm-generic/vmlinux.lds.h|  11 ++-
 include/linux/percpu-defs.h  |  30 ++
 init/main.c  |   5 +
 kernel/fork.c|   8 ++
 security/Kconfig |   7 ++
 22 files changed, 461 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/include/asm/kaiser.h
 create mode 100644 arch/x86/mm/kaiser.c

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18e..631c7bf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include <asm/kaiser.h>
 #include 

 .code64
@@ -141,6 +142,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -223,6 +225,7 @@ entry_SYSCALL_64_fastpath:
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64

@@ -318,10 +321,12 @@ return_from_SYSCALL_64:
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64

 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -420,6 +425,7 @@ ENTRY(ret_from_fork)
 	leaq	FRAME_OFFSET(%rsp),%rdi	/* pt_regs pointer */
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWITCH_USER_CR3
 	SWAPGS
 	FRAME_END
 	jmp	restore_regs_and_iret
@@ -476,6 +482,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3

 	/*
 	 * We need to tell lockdep that IRQs are off.  We can't do this until
@@ -533,6 +540,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret

@@ -610,6 +618,7 @@ native_irq_return_ldt:

 	pushq	%rdi/* Stash user RDI */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -636,6 +645,7 @@ native_irq_return_ldt:
 	 * still points to an RO alias of the ESPFIX stack.
 	 */
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp

@@ -1034,6 +1044,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1056,6 +1067,7 @@ ENTRY(paranoid_exit)
 	testl	%ebx, %ebx			/* swapgs needed? */
 	jnz	paranoid_exit_no_swapgs
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3_NO_STACK
 	SWAPGS_UNSAFE_STACK
 	jmp	paranoid_exit_restore
 paranoid_exit_no_swapgs:
@@ -1085,6 +1097,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3

 .Lerror_entry_from_usermode_after_swapgs:
 	/*
@@ -1136,6 +1149,7 @@ ENTRY(error_entry)
 	 * Switch to kernel gsbase:
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3

 	/*
 	 * Pretend that the exception came from user mode: set up pt_regs
@@ -1234,6 +1248,7 @@ ENTRY(nmi)
 	 */

 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1274,6 +1289,7 @@ ENTRY(nmi)
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, 


[RFC, PATCH] x86_64: KAISER - do not map kernel in user mode

2017-05-04 Thread Daniel Gruss
After several recent works [1,2,3], KASLR on x86_64 was basically 
considered dead by many researchers. We have been working on an 
efficient yet effective fix for this problem and found that not mapping 
the kernel space when running in user mode is the solution to this 
problem [4] (the corresponding paper [5] will be presented at ESSoS17).


With this RFC patch we allow anybody to configure their kernel with the 
flag CONFIG_KAISER to add our defense mechanism.


If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] 
https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] 
https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf

[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf


From 03c413bc52f1ac253cf0f067605f367f3390d3f4 Mon Sep 17 00:00:00 2001
From: Richard Fellner 
Date: Thu, 4 May 2017 10:44:38 +0200
Subject: [PATCH] KAISER: Kernel Address Isolation

This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

https://github.com/IAIK/KAISER
---
 arch/x86/entry/entry_64.S| 17 +
 arch/x86/entry/entry_64_compat.S |  7 ++-
 arch/x86/include/asm/hw_irq.h|  2 +-
 arch/x86/include/asm/pgtable.h   |  4 
 arch/x86/include/asm/pgtable_64.h| 21 +
 arch/x86/include/asm/pgtable_types.h | 12 ++--
 arch/x86/include/asm/processor.h |  7 ++-
 arch/x86/kernel/cpu/common.c |  4 ++--
 arch/x86/kernel/espfix_64.c  |  6 ++
 arch/x86/kernel/head_64.S| 16 
 arch/x86/kernel/irqinit.c|  2 +-
 arch/x86/kernel/process.c|  2 +-
 arch/x86/mm/Makefile |  2 +-
 arch/x86/mm/pageattr.c   |  2 +-
 arch/x86/mm/pgtable.c| 28 +++-
 include/asm-generic/vmlinux.lds.h| 11 ++-
 include/linux/percpu-defs.h  | 30 ++
 init/main.c  |  5 +
 kernel/fork.c|  8 
 security/Kconfig |  7 +++
 20 files changed, 176 insertions(+), 17 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18e..631c7bf 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include <asm/kaiser.h>
 #include 

 .code64
@@ -141,6 +142,7 @@ ENTRY(entry_SYSCALL_64)
 	 * it is too small to ever cause noticeable irq latency.
 	 */
 	SWAPGS_UNSAFE_STACK
+	SWITCH_KERNEL_CR3_NO_STACK
 	/*
 	 * A hypervisor implementation might want to use a label
 	 * after the swapgs, so that it can do the swapgs
@@ -223,6 +225,7 @@ entry_SYSCALL_64_fastpath:
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64

@@ -318,10 +321,12 @@ return_from_SYSCALL_64:
 syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
+	SWITCH_USER_CR3
 	movq	RSP(%rsp), %rsp
 	USERGS_SYSRET64

 opportunistic_sysret_failed:
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
@@ -420,6 +425,7 @@ ENTRY(ret_from_fork)
 	leaq	FRAME_OFFSET(%rsp),%rdi	/* pt_regs pointer */
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWITCH_USER_CR3
 	SWAPGS
 	FRAME_END
 	jmp	restore_regs_and_iret
@@ -476,6 +482,7 @@ END(irq_entries_start)
 	 * tracking that we're in kernel mode.
 	 */
 	SWAPGS
+	SWITCH_KERNEL_CR3

 	/*
 	 * We need to tell lockdep that IRQs are off.  We can't do this until
@@ -533,6 +540,7 @@ GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
 	TRACE_IRQS_IRETQ
+	SWITCH_USER_CR3
 	SWAPGS
 	jmp	restore_regs_and_iret

@@ -610,6 +618,7 @@ native_irq_return_ldt:

 	pushq	%rdi/* Stash user RDI */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	movq	PER_CPU_VAR(espfix_waddr), %rdi
 	movq	%rax, (0*8)(%rdi)		/* user RAX */
 	movq	(1*8)(%rsp), %rax		/* user RIP */
@@ -636,6 +645,7 @@ native_irq_return_ldt:
 	 * still points to an RO alias of the ESPFIX stack.
 	 */
 	orq	PER_CPU_VAR(espfix_stack), %rax
+	SWITCH_USER_CR3
 	SWAPGS
 	movq	%rax, %rsp

@@ -1034,6 +1044,7 @@ ENTRY(paranoid_entry)
 	testl	%edx, %edx
 	js	1f/* negative -> in kernel */
 	SWAPGS
+	SWITCH_KERNEL_CR3
 	xorl	%ebx, %ebx
 1:	ret
 END(paranoid_entry)
@@ -1056,6 +1067,7 @@ 


Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-11-01 Thread Daniel Gruss

On 01.11.2016 09:10, Pavel Machek wrote:

cpu family : 6
model: 23
model name   : Intel(R) Core(TM)2 Duo CPU E7400  @ 2.80GHz
stepping : 10
microcode: 0xa07

so rowhammerjs/native is not available for this system. Bit mapping
for memory hash functions would need to be reverse engineered for more
effective attack.


By coincidence, we wrote a tool to do that in software: 
https://github.com/IAIK/drama ;)




Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-11-01 Thread Daniel Gruss

On 01.11.2016 07:33, Ingo Molnar wrote:

Can you suggest a method to find heavily rowhammer affected hardware? Only by
testing it, or are there some chipset IDs ranges or dmidecode info that will
pinpoint potentially affected machines?


I have worked with many different systems, both running rowhammer 
attacks and testing defense mechanisms. So far, every Ivy Bridge i5 
(DDR3) that I had access to was susceptible to bit flips - you will have 
the highest chances with an Ivy Bridge i5...




Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-10-29 Thread Daniel Gruss

On 30.10.2016 00:01, Pavel Machek wrote:

Hmm, maybe I'm glad I don't have a new machine :-).

I assume you still get _some_ bitflips with generic "rowhammer"?


1 or 2 every 20-30 minutes...



Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-10-29 Thread Daniel Gruss

On 29.10.2016 23:45, Pavel Machek wrote:

ivy/sandy/haswell/skylake, so I'll just use the generic version...?)


Yes, generic might work, but I never tested it on anything that old...

On my system I have >30 bit flips per second (Ivy Bridge i5-3xxx) with 
the rowhammer-ivy test... sometimes even more than 100 per second...


Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-10-29 Thread Daniel Gruss

On 29.10.2016 23:05, Pavel Machek wrote:

So far I did bzip2 and kernel compilation. I believe I can prevent
flips in rowhammer-test with bzip2 going from 4 seconds to 5
seconds... let me see.


can you prevent bitflips in this one? 
https://github.com/IAIK/rowhammerjs/tree/master/native



Ok, let me try that. Problem is that the machine I'm testing on takes
20 minutes to produce bit flip...


will be lots faster with my code above ;)


Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-10-29 Thread Daniel Gruss

On 29.10.2016 21:42, Pavel Machek wrote:

Congratulations. Now I'd like to take away your toys :-).


I would like you to do that, but I'm very confident you won't be 
successful the way you're starting ;)



Not in my testing.


Have you tried music/video reencoding? Games? Anything that works with a 
decent amount of memory but not too much hard disk i/o?

Numbers are very clear there...


First, I'm not at all sure lowest CPU speed would
make any difference at all


It would. I've seen many bitflips but none where the CPU operated in the 
lower frequency range.



Second, going to lowest clock speed will reduce performance


As does the countermeasure you propose...


No, sorry, not going to play this particular whack-a-mole game.


But you already are, with the countermeasure you propose...


Linux is designed for working hardware, and with bit flips, something is
going to break. (Does Flip Feng Shui really depend on dedup?)


Deduplication should be disabled not because of bit flips but because of 
information leakage (deduplication attacks, cache side-channel attacks, ...)


Yes, Flip Feng Shui requires deduplication and does not work without.
Disabling deduplication is what the authors recommend as a countermeasure.


But it will be nowhere near complete fix, right?

Will fix user attacking kernel, but not user1 attacking user2. You
could put each "user" into separate 2MB region, but then you'd have to
track who needs to go where. (Same uid is not enough, probably "can
ptrace"?)


Exactly. But preventing user2kernel is already a good start, and you 
would prevent that without any doubt and without any cost.


user2user is something else to think about and more complicated because 
you have shared libraries + copy on write --> same problems as 
deduplication. I think it might make sense to discuss whether separating 
by uids or even pids would be viable.



That'll still let remote server gain permissions of local user running
web server... using javascript exploit right?  And that's actually
attack that I find most scary. Local user to root exploit is bad, but
getting permissions of web browser from remote web server is very,
very, very bad.


Rowhammer.js skips the browser... it goes from JS to full physical memory 
access. Anyway, preventing Rowhammer from JS should be easy, because even 
the slightest slowdown should be enough to prevent any Rowhammer attack 
from JS.



That is a simple fix that does not cost any runtime performance.


Simple? Not really, I'm afraid. Feel free to try to implement it.


I had a student who already implemented this in another OS; I'm 
confident it can be done in Linux as well...



Cheers,
Daniel


Re: Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]

2016-10-29 Thread Daniel Gruss

I think that this idea to mitigate Rowhammer is not a good approach.

I wrote Rowhammer.js (we published a paper on that), and I had the first 
reproducible bit flips on DDR4 at both increased and default refresh 
rates (published in our DRAMA paper).


We have researched the number of cache misses induced by different 
applications in the past, and there are many applications that cause more 
cache misses than Rowhammer (published in our Flush+Flush paper); they 
just cause them on different rows.
Slowing down a system surely works, but as a mitigation you could also 
just make the CPU core run at the lowest possible frequency. That would 
likely be more effective than the solution you suggest.


Now, every Rowhammer attack exploits not only the DRAM effects but also 
the way the operating system organizes memory.


Some papers exploit page deduplication and disabling page deduplication 
should be the default also for other reasons, such as information 
disclosure attacks. If page deduplication is disabled, attacks like 
Dedup est Machina and Flip Feng Shui are inherently not possible anymore.


Most other attacks target page tables (the Google exploit, Rowhammer.js, 
Drammer). In Rowhammer.js we suggested a very simple fix that is just an 
extension of what Linux already does: unless it is out of memory, the 
kernel does not place page tables and user pages in the same 2MB region. 
We suggested making this behavior stricter even under memory pressure: if 
the OS can only find a page-table page that resides in the same 2MB region 
as a user page, the request should fail instead and the requesting process 
should go out of memory. More generally, the attack surface is gone if the 
OS never places a page table within less than 2MB of a user page.
That is a simple fix that does not cost any runtime performance. It 
mitigates all these scary attacks and won't even incur a memory cost in 
most situations. A minimal sketch of such a policy follows below.
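
A minimal sketch of such a policy, with all names hypothetical (this is
not the existing Linux allocator API): tag each 2MB region of physical
memory by what it holds, and refuse allocations that would mix page
tables and user pages within one region.

#include <stdbool.h>
#include <stdint.h>

#define REGION_SHIFT 21 /* 2MB regions, the relevant blast radius */

enum region_use { REGION_FREE, REGION_USER, REGION_PGTABLE };

/* hypothetical bookkeeping: one tag per 2MB region of physical memory */
extern enum region_use region_tag[];

/* A page-table page may only go into a region that is free or already
 * dedicated to page tables; if no such region exists, the allocation
 * fails and the requesting process goes out of memory, rather than
 * placing a page table next to user pages. */
static bool may_hold_pgtable(uint64_t phys_addr)
{
        return region_tag[phys_addr >> REGION_SHIFT] != REGION_USER;
}

static bool may_hold_user_page(uint64_t phys_addr)
{
        return region_tag[phys_addr >> REGION_SHIFT] != REGION_PGTABLE;
}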

