Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Jann Horn
On Thu, Oct 8, 2020 at 8:10 PM Topi Miettinen  wrote:
> On 8.10.2020 20.13, Jann Horn wrote:
> > On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen  wrote:
> >> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> >> enables full randomization of memory mappings created with mmap(NULL,
> >> ...). With 2, the base of the VMA used for such mappings is random,
> >> but the mappings are created in predictable places within the VMA and
> >> in sequential order. With 3, new VMAs are created to fully randomize
> >> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> >> even if not necessary.
> > [...]
> >> +   if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> >> +   /*
> >> +* Caller is happy with a different address, so let's
> >> +* move even if not necessary!
> >> +*/
> >> +   new_addr = arch_mmap_rnd();
> >> +
> >> +   ret = mremap_to(addr, old_len, new_addr, new_len,
> >> +   &locked, flags, &uf, &uf_unmap_early,
> >> +   &uf_unmap);
> >> +   goto out;
> >> +   }
> >
> > You just pick a random number as the address, and try to place the
> > mapping there? Won't this fail if e.g. the old address range overlaps
> > with the new one, causing mremap_to() to bail out at "if (addr +
> > old_len > new_addr && new_addr + new_len > addr)"?
>
> Thanks for the review. I think overlap would be OK in this case and the
> check should be skipped.

No, mremap() can't deal with overlap (and trying to add such support
would make mremap() unnecessarily complicated).


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Topi Miettinen

On 8.10.2020 20.07, Matthew Wilcox wrote:
> On Thu, Oct 08, 2020 at 07:54:08PM +0300, Topi Miettinen wrote:
> > +3   Additionally enable full randomization of memory mappings created
> > +with mmap(NULL, ...). With 2, the base of the VMA used for such
> > +mappings is random, but the mappings are created in predictable
> > +places within the VMA and in sequential order. With 3, new VMAs
> > +are created to fully randomize the mappings. Also mremap(...,
> > +MREMAP_MAYMOVE) will move the mappings even if not necessary.
> > +
> > +On 32 bit systems this may cause problems due to increased VM
> > +fragmentation if the address space gets crowded.
>
> On all systems, it will reduce performance and increase memory usage due
> to less efficient use of page tables and inability to merge adjacent VMAs
> with compatible attributes.

Right, I'll update the description.

> > +   if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> > +   /*
> > +* Caller is happy with a different address, so let's
> > +* move even if not necessary!
> > +*/
> > +   new_addr = arch_mmap_rnd();
> > +
> > +   ret = mremap_to(addr, old_len, new_addr, new_len,
> > +   &locked, flags, &uf, &uf_unmap_early,
> > +   &uf_unmap);
> > +   goto out;
> > +   }
> > +
> > +
>
> Overly enthusiastic newline

Will remove.

-Topi


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Topi Miettinen

On 8.10.2020 20.13, Jann Horn wrote:
> On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen  wrote:
> > Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> > enables full randomization of memory mappings created with mmap(NULL,
> > ...). With 2, the base of the VMA used for such mappings is random,
> > but the mappings are created in predictable places within the VMA and
> > in sequential order. With 3, new VMAs are created to fully randomize
> > the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> > even if not necessary.
> [...]
> > +   if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> > +   /*
> > +* Caller is happy with a different address, so let's
> > +* move even if not necessary!
> > +*/
> > +   new_addr = arch_mmap_rnd();
> > +
> > +   ret = mremap_to(addr, old_len, new_addr, new_len,
> > +   &locked, flags, &uf, &uf_unmap_early,
> > +   &uf_unmap);
> > +   goto out;
> > +   }
>
> You just pick a random number as the address, and try to place the
> mapping there? Won't this fail if e.g. the old address range overlaps
> with the new one, causing mremap_to() to bail out at "if (addr +
> old_len > new_addr && new_addr + new_len > addr)"?

Thanks for the review. I think overlap would be OK in this case and the
check should be skipped.



> Also, on Linux, the main program stack is (currently) an expanding
> memory mapping that starts out being something like a couple hundred
> kilobytes in size. If you allocate memory too close to the main
> program stack, and someone then recurses deep enough to need more
> memory, the program will crash. It sounds like your patch will
> randomly make such programs crash.

Right, especially on 32 bit systems this could be a real problem. I have
limited the stack for tasks in the whole system to 2MB without problems
(most use only 128kB), and on 48 bit virtual address systems the chance
of colliding with a 2MB area would be roughly 1/2^(48-21), which is a
very small number. But perhaps this should still be avoided by not
picking an address too close to the bottom of the stack, say within 64MB
to be sure. That might also make this more useful for 32 bit systems,
but overall I'm not so optimistic there due to increased fragmentation.



> Also, what's your strategy in general with regards to collisions with
> existing mappings? Is your intention to just fall back to the classic
> algorithm in that case?

Maybe a different address could be tried (but not infinitely, say 5
times) and then fall back to the classic algorithm. This would weaken
the ASLR somewhat, but I haven't seen mremap() used much in my tests.



> You may want to consider whether it would be better to store
> information about free memory per subtree in the VMA tree, together
> with the maximum gap size that is already stored in each node, and
> then walk down the tree randomly, with the randomness weighted by free
> memory in the subtrees, but ignoring subtrees whose gaps are too
> small. And for expanding stacks, it might be a good idea for other
> reasons as well (locking consistency) to refactor them such that the
> size in the VMA tree corresponds to the maximum expansion of the stack
> (and if an allocation is about to fail, shrink such stack mappings).

This would reduce the randomization, which I want to avoid. I think the
extra overhead should be OK: if it is unacceptable for a workload or due
to system constraints, don't use mode '3' but '2'.

Instead of a single global sysctl, this could be implemented as a new
personality (or make this model the default and add a compatibility
personality with no or less randomization), so it could be applied to
some tasks but not all.


-Topi


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Matthew Wilcox
On Thu, Oct 08, 2020 at 07:26:31PM +0200, Jann Horn wrote:
> On Thu, Oct 8, 2020 at 7:23 PM Matthew Wilcox  wrote:
> > On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> > > And for expanding stacks, it might be a good idea for other
> > > reasons as well (locking consistency) to refactor them such that the
> > > size in the VMA tree corresponds to the maximum expansion of the stack
> > > (and if an allocation is about to fail, shrink such stack mappings).
> >
> > We're doing that as part of the B-tree ;-)  Although not the shrink
> > stack mappings part ...
> 
> Wheee, thanks! Finally no more data races on ->vm_start?

Ah, maybe still that.  The B-tree records the start of the mapping in
the tree, but we still keep vma->vm_start as pointing to the current top
of the stack (it's still the top if it grows down ... right?)  The key is
that these numbers may now be different, so from the tree's point of view,
the vm addresses for 1MB below the stack appear to be occupied.  From the
VMA's point of view, the stack finishes where it was last accessed.

We also get rid of the insanity of "return the next VMA if there's no
VMA at this address" which most of the callers don't want and have to
check for.  Again, from the tree's point of view, there is a VMA at this
address, but from the VMA's point of view, it'll need to expand to reach
that address.

I don't think this piece is implemented yet, but it's definitely planned.


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Jann Horn
On Thu, Oct 8, 2020 at 7:23 PM Matthew Wilcox  wrote:
> On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> > And for expanding stacks, it might be a good idea for other
> > reasons as well (locking consistency) to refactor them such that the
> > size in the VMA tree corresponds to the maximum expansion of the stack
> > (and if an allocation is about to fail, shrink such stack mappings).
>
> We're doing that as part of the B-tree ;-)  Although not the shrink
> stack mappings part ...

Wheee, thanks! Finally no more data races on ->vm_start?


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Matthew Wilcox
On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> You may want to consider whether it would be better to store
> information about free memory per subtree in the VMA tree, together
> with the maximum gap size that is already stored in each node, and
> then walk down the tree randomly, with the randomness weighted by free
> memory in the subtrees, but ignoring subtrees whose gaps are too
> small.

Please, no.  We're trying to get rid of the rbtree, not enhance it
further.  The new data structure is a B-tree and we'd rather not burden
it with extra per-node information (... although if we have to, we could)

> And for expanding stacks, it might be a good idea for other
> reasons as well (locking consistency) to refactor them such that the
> size in the VMA tree corresponds to the maximum expansion of the stack
> (and if an allocation is about to fail, shrink such stack mappings).

We're doing that as part of the B-tree ;-)  Although not the shrink
stack mappings part ...


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Jann Horn
On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen  wrote:
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> even if not necessary.
[...]
> +   if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> +   /*
> +* Caller is happy with a different address, so let's
> +* move even if not necessary!
> +*/
> +   new_addr = arch_mmap_rnd();
> +
> +   ret = mremap_to(addr, old_len, new_addr, new_len,
> +   &locked, flags, &uf, &uf_unmap_early,
> +   &uf_unmap);
> +   goto out;
> +   }

You just pick a random number as the address, and try to place the
mapping there? Won't this fail if e.g. the old address range overlaps
with the new one, causing mremap_to() to bail out at "if (addr +
old_len > new_addr && new_addr + new_len > addr)"?

Also, on Linux, the main program stack is (currently) an expanding
memory mapping that starts out being something like a couple hundred
kilobytes in size. If you allocate memory too close to the main
program stack, and someone then recurses deep enough to need more
memory, the program will crash. It sounds like your patch will
randomly make such programs crash.

Also, what's your strategy in general with regards to collisions with
existing mappings? Is your intention to just fall back to the classic
algorithm in that case?

You may want to consider whether it would be better to store
information about free memory per subtree in the VMA tree, together
with the maximum gap size that is already stored in each node, and
then walk down the tree randomly, with the randomness weighted by free
memory in the subtrees, but ignoring subtrees whose gaps are too
small. And for expanding stacks, it might be a good idea for other
reasons as well (locking consistency) to refactor them such that the
size in the VMA tree corresponds to the maximum expansion of the stack
(and if an allocation is about to fail, shrink such stack mappings).


Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Matthew Wilcox
On Thu, Oct 08, 2020 at 07:54:08PM +0300, Topi Miettinen wrote:
> +3   Additionally enable full randomization of memory mappings created
> +with mmap(NULL, ...). With 2, the base of the VMA used for such
> +mappings is random, but the mappings are created in predictable
> +places within the VMA and in sequential order. With 3, new VMAs
> +are created to fully randomize the mappings. Also mremap(...,
> +MREMAP_MAYMOVE) will move the mappings even if not necessary.
> +
> +On 32 bit systems this may cause problems due to increased VM
> +fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory usage due
to less efficient use of page tables and inability to merge adjacent VMAs
with compatible attributes.

> + if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> + /*
> +  * Caller is happy with a different address, so let's
> +  * move even if not necessary!
> +  */
> + new_addr = arch_mmap_rnd();
> +
> + ret = mremap_to(addr, old_len, new_addr, new_len,
> + &locked, flags, &uf, &uf_unmap_early,
> + &uf_unmap);
> + goto out;
> + }
> +
> +

Overly enthusiastic newline


[PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()

2020-10-08 Thread Topi Miettinen
Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
even if not necessary.

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

In this example, with value of 2, ld.so.cache, libc, an anonymous mmap
and locale-archive are located close to each other:
$ strace /bin/sync
...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=189096, ...}) = 0
mmap(NULL, 189096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7d9c1e7f2000
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0n\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1839792, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7d9c1e7f
mmap(NULL, 1852680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7d9c1e62b000
...
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5642592, ...}) = 0
mmap(NULL, 5642592, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7d9c1e0c9000

With 3, they are located in unrelated addresses:
$ echo 3 > /proc/sys/kernel/randomize_va_space
$ /bin/sync
...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=189096, ...}) = 0
mmap(NULL, 189096, PROT_READ, MAP_PRIVATE, 3, 0) = 0xeda4fbea000
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0n\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1839792, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb8fb9c1d000
mmap(NULL, 1852680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xaabd8598000
...
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5642592, ...}) = 0
mmap(NULL, 5642592, PROT_READ, MAP_PRIVATE, 3, 0) = 0xbe351ab8000

Signed-off-by: Topi Miettinen 
---
Resent also to hardening list (hopefully the right one)
v2: also randomize mremap(..., MREMAP_MAYMOVE)
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  6 +++---
 Documentation/admin-guide/sysctl/kernel.rst   | 11 +++
 init/Kconfig  |  2 +-
 mm/mmap.c |  7 ++-
 mm/mremap.c   | 15 +++
 5 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..9ea250522077 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -254,7 +254,7 @@ Spectre variant 2
left by the previous process will also be cleared.
 
User programs should use address space randomization to make attacks
-   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).
 
 3. A virtualized guest attacking the host
 ^
@@ -499,8 +499,8 @@ Spectre variant 2
more overhead and run slower.
 
User programs should use address space randomization
-   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
-   difficult.
+   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
+   more difficult.
 
 3. VM mitigation
 
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d4b32cc32bb7..acd0612155d9 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1060,6 +1060,17 @@ that support this feature.
 Systems with ancient and/or broken binaries should be configured
 with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
 address space randomization.
+
+3   Additionally enable full randomization of memory mappings created
+with mmap(NULL, ...). With 2, the base of the VMA used for such
+mappings is random, but the mappings are created in predictable
+places within the VMA and in sequential order. With 3, new VMAs
+are created to fully randomize the mappings. Also mremap(...,
+MREMAP_MAYMOVE) will move the mappings even if not necessary.
+
+On 32 bit systems this may cause problems due to increased VM
+fragmentation if the address space gets crowded.
+
 ==  ===
 
 
diff --git a/init/K