Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 8, 2020 at 8:10 PM Topi Miettinen wrote:
> On 8.10.2020 20.13, Jann Horn wrote:
> > On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen wrote:
> >> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> >> enables full randomization of memory mappings created with mmap(NULL,
> >> ...). With 2, the base of the VMA used for such mappings is random,
> >> but the mappings are created in predictable places within the VMA and
> >> in sequential order. With 3, new VMAs are created to fully randomize
> >> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> >> even if not necessary.
> > [...]
> >> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> >> +		/*
> >> +		 * Caller is happy with a different address, so let's
> >> +		 * move even if not necessary!
> >> +		 */
> >> +		new_addr = arch_mmap_rnd();
> >> +
> >> +		ret = mremap_to(addr, old_len, new_addr, new_len,
> >> +				&locked, flags, &uf, &uf_unmap_early,
> >> +				&uf_unmap);
> >> +		goto out;
> >> +	}
> >
> > You just pick a random number as the address, and try to place the
> > mapping there? Won't this fail if e.g. the old address range overlaps
> > with the new one, causing mremap_to() to bail out at "if (addr +
> > old_len > new_addr && new_addr + new_len > addr)"?
>
> Thanks for the review. I think overlap would be OK in this case and the
> check should be skipped.

No, mremap() can't deal with overlap (and trying to add such support
would make mremap() unnecessarily complicated).
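(Editorial aside: the bail-out condition quoted above is the standard
half-open interval overlap test. A minimal standalone C illustration,
not kernel code; the addresses in main() are invented:

#include <assert.h>
#include <stdbool.h>

/* Two half-open ranges [addr, addr+old_len) and [new_addr,
 * new_addr+new_len) intersect exactly when each one starts before the
 * other ends -- the same expression mremap_to() bails out on. */
static bool ranges_overlap(unsigned long addr, unsigned long old_len,
			   unsigned long new_addr, unsigned long new_len)
{
	return addr + old_len > new_addr && new_addr + new_len > addr;
}

int main(void)
{
	/* A random new_addr landing inside the old range is rejected... */
	assert(ranges_overlap(0x1000, 0x4000, 0x2000, 0x4000));
	/* ...while merely adjacent ranges are not. */
	assert(!ranges_overlap(0x1000, 0x1000, 0x2000, 0x1000));
	return 0;
}

So with a uniformly random new_addr, any candidate within old_len bytes
below the old mapping or old_len+new_len bytes around it fails this
check, which is why the random pick cannot simply be passed through.)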
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On 8.10.2020 20.07, Matthew Wilcox wrote:
> On Thu, Oct 08, 2020 at 07:54:08PM +0300, Topi Miettinen wrote:
>> +3 Additionally enable full randomization of memory mappings created
>> +with mmap(NULL, ...). With 2, the base of the VMA used for such
>> +mappings is random, but the mappings are created in predictable
>> +places within the VMA and in sequential order. With 3, new VMAs
>> +are created to fully randomize the mappings. Also mremap(...,
>> +MREMAP_MAYMOVE) will move the mappings even if not necessary.
>> +
>> +On 32 bit systems this may cause problems due to increased VM
>> +fragmentation if the address space gets crowded.
>
> On all systems, it will reduce performance and increase memory usage
> due to less efficient use of page tables and inability to merge
> adjacent VMAs with compatible attributes.

Right, I'll update the description.

>> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
>> +		/*
>> +		 * Caller is happy with a different address, so let's
>> +		 * move even if not necessary!
>> +		 */
>> +		new_addr = arch_mmap_rnd();
>> +
>> +		ret = mremap_to(addr, old_len, new_addr, new_len,
>> +				&locked, flags, &uf, &uf_unmap_early,
>> +				&uf_unmap);
>> +		goto out;
>> +	}
>> +
>> +
>
> Overly enthusiastic newline

Will remove.

-Topi
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On 8.10.2020 20.13, Jann Horn wrote:
> On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen wrote:
>> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
>> enables full randomization of memory mappings created with mmap(NULL,
>> ...). With 2, the base of the VMA used for such mappings is random,
>> but the mappings are created in predictable places within the VMA and
>> in sequential order. With 3, new VMAs are created to fully randomize
>> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
>> even if not necessary.
> [...]
>> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
>> +		/*
>> +		 * Caller is happy with a different address, so let's
>> +		 * move even if not necessary!
>> +		 */
>> +		new_addr = arch_mmap_rnd();
>> +
>> +		ret = mremap_to(addr, old_len, new_addr, new_len,
>> +				&locked, flags, &uf, &uf_unmap_early,
>> +				&uf_unmap);
>> +		goto out;
>> +	}
>
> You just pick a random number as the address, and try to place the
> mapping there? Won't this fail if e.g. the old address range overlaps
> with the new one, causing mremap_to() to bail out at "if (addr +
> old_len > new_addr && new_addr + new_len > addr)"?

Thanks for the review. I think overlap would be OK in this case and the
check should be skipped.

> Also, on Linux, the main program stack is (currently) an expanding
> memory mapping that starts out being something like a couple hundred
> kilobytes in size. If you allocate memory too close to the main
> program stack, and someone then recurses deep enough to need more
> memory, the program will crash. It sounds like your patch will
> randomly make such programs crash.

Right, especially on 32 bit systems this could be a real problem. I have
limited the stack for tasks in the whole system to 2MB without problems
(most use only 128kB), and on 48 bit virtual address systems the chance
of colliding with a 2MB area would be roughly 1/2^(48-21), which is a
very small number. But perhaps this should still be avoided by not
picking an address too close to the bottom of the stack, say within 64MB
to be sure. That might also make this more useful for 32 bit systems,
but overall I'm not so optimistic there due to increased fragmentation.

> Also, what's your strategy in general with regards to collisions with
> existing mappings? Is your intention to just fall back to the classic
> algorithm in that case?

Maybe a different address could be tried (but not infinitely, say 5
times) and then fall back to the classic algorithm; a sketch of this
idea follows this message. This would not be good for ASLR, but I
haven't seen mremap() used much in my tests.

> You may want to consider whether it would be better to store
> information about free memory per subtree in the VMA tree, together
> with the maximum gap size that is already stored in each node, and
> then walk down the tree randomly, with the randomness weighted by free
> memory in the subtrees, but ignoring subtrees whose gaps are too
> small. And for expanding stacks, it might be a good idea for other
> reasons as well (locking consistency) to refactor them such that the
> size in the VMA tree corresponds to the maximum expansion of the stack
> (and if an allocation is about to fail, shrink such stack mappings).

This would reduce the randomization, which I want to avoid. I think the
extra overhead should be OK: if it is unacceptable for a workload or
system constraints, don't use mode '3' but '2'.

Instead of a single global sysctl, this could be implemented as a new
personality (or this mode could be made the default, with a
compatibility personality offering no or less randomization), so it
could be applied to some tasks but not all.

-Topi
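(Editorial aside: a minimal userspace sketch of the bounded-retry
strategy Topi describes above. It emulates the idea with
MAP_FIXED_NOREPLACE, which fails with EEXIST on collision; the retry
count of 5, the ~47-bit address range, and random() as the entropy
source are illustrative assumptions, not what the patch does:

#define _GNU_SOURCE		/* for MAP_FIXED_NOREPLACE (Linux >= 4.17) */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>

/* Try a handful of random addresses; on collision try again, and
 * after five failures fall back to the kernel's classic placement. */
static void *random_mmap(size_t len)
{
	for (int i = 0; i < 5; i++) {
		/* random() yields 31 bits; shifting by 16 gives a
		 * 64kB-aligned candidate below ~2^47. */
		uintptr_t cand = (uintptr_t)random() << 16;
		void *p = mmap((void *)cand, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
			       -1, 0);
		if (p != MAP_FAILED)
			return p;	/* the random spot was free */
		if (errno != EEXIST)
			break;		/* unexpected error: stop retrying */
	}
	/* Fall back to the classic algorithm (kernel-chosen address). */
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

int main(void)
{
	srandom((unsigned)time(NULL));
	void *p = random_mmap(1 << 20);
	printf("mapped 1MB at %p\n", p);
	return p == MAP_FAILED;
}

As Topi notes, the fallback weakens ASLR for the retried mappings, but
it bounds the worst-case cost at a constant number of probes.)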
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 08, 2020 at 07:26:31PM +0200, Jann Horn wrote:
> On Thu, Oct 8, 2020 at 7:23 PM Matthew Wilcox wrote:
> > On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> > > And for expanding stacks, it might be a good idea for other
> > > reasons as well (locking consistency) to refactor them such that the
> > > size in the VMA tree corresponds to the maximum expansion of the stack
> > > (and if an allocation is about to fail, shrink such stack mappings).
> >
> > We're doing that as part of the B-tree ;-)  Although not the shrink
> > stack mappings part ...
>
> Wheee, thanks! Finally no more data races on ->vm_start?

Ah, maybe still that. The B-tree records the start of the mapping in
the tree, but we still keep vma->vm_start pointing to the current top
of the stack (it's still the top if it grows down ... right?)  The key
is that these two numbers may now differ, so from the tree's point of
view, the virtual addresses for 1MB below the stack appear to be
occupied, while from the VMA's point of view, the stack finishes where
it was last accessed.

We also get rid of the insanity of "return the next VMA if there's no
VMA at this address", which most of the callers don't want and have to
check for. Again, from the tree's point of view, there is a VMA at
this address, but from the VMA's point of view, it'll need to expand
to reach that address. I don't think this piece is implemented yet,
but it's definitely planned.
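(Editorial aside: an invented illustration of the two views Matthew
describes for a grows-down stack; these are not the real VMA or maple
tree fields, and the addresses in main() are made up:

#include <stdbool.h>

struct stack_vma_view {
	unsigned long tree_start;	/* lowest address recorded in the
					 * tree: current top minus the
					 * maximum allowed expansion */
	unsigned long vm_start;		/* current top of the stack */
	unsigned long vm_end;		/* base (highest address) of stack */
};

/* Tree's view: the whole reserved range looks occupied, so gap
 * searches never place a new mapping inside the expansion area. */
static bool tree_sees_occupied(const struct stack_vma_view *v,
			       unsigned long addr)
{
	return addr >= v->tree_start && addr < v->vm_end;
}

/* VMA's view: an address below vm_start is not mapped yet; the stack
 * must expand down to reach it. */
static bool needs_expansion(const struct stack_vma_view *v,
			    unsigned long addr)
{
	return addr >= v->tree_start && addr < v->vm_start;
}

int main(void)
{
	struct stack_vma_view v = {
		.tree_start = 0x7fffffe00000UL,	/* top minus 1MB reserve */
		.vm_start   = 0x7ffffff00000UL,	/* current top */
		.vm_end     = 0x7ffffff21000UL,	/* stack base */
	};
	unsigned long probe = v.vm_start - 0x1000;

	/* 4kB below the current top: occupied for the tree,
	 * expansion-pending for the VMA. */
	return !(tree_sees_occupied(&v, probe) && needs_expansion(&v, probe));
}

The point being made: the tree and the VMA can answer "is this address
taken?" differently without either being wrong.)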
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 8, 2020 at 7:23 PM Matthew Wilcox wrote:
> On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> > And for expanding stacks, it might be a good idea for other
> > reasons as well (locking consistency) to refactor them such that the
> > size in the VMA tree corresponds to the maximum expansion of the stack
> > (and if an allocation is about to fail, shrink such stack mappings).
>
> We're doing that as part of the B-tree ;-)  Although not the shrink
> stack mappings part ...

Wheee, thanks! Finally no more data races on ->vm_start?
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 08, 2020 at 07:13:51PM +0200, Jann Horn wrote:
> You may want to consider whether it would be better to store
> information about free memory per subtree in the VMA tree, together
> with the maximum gap size that is already stored in each node, and
> then walk down the tree randomly, with the randomness weighted by free
> memory in the subtrees, but ignoring subtrees whose gaps are too
> small.

Please, no. We're trying to get rid of the rbtree, not enhance it
further. The new data structure is a B-tree, and we'd rather not burden
it with extra per-node information (... although if we have to, we
could).

> And for expanding stacks, it might be a good idea for other
> reasons as well (locking consistency) to refactor them such that the
> size in the VMA tree corresponds to the maximum expansion of the stack
> (and if an allocation is about to fail, shrink such stack mappings).

We're doing that as part of the B-tree ;-)  Although not the shrink
stack mappings part ...
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 8, 2020 at 6:54 PM Topi Miettinen wrote:
> Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
> enables full randomization of memory mappings created with mmap(NULL,
> ...). With 2, the base of the VMA used for such mappings is random,
> but the mappings are created in predictable places within the VMA and
> in sequential order. With 3, new VMAs are created to fully randomize
> the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
> even if not necessary.
[...]
> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> +		/*
> +		 * Caller is happy with a different address, so let's
> +		 * move even if not necessary!
> +		 */
> +		new_addr = arch_mmap_rnd();
> +
> +		ret = mremap_to(addr, old_len, new_addr, new_len,
> +				&locked, flags, &uf, &uf_unmap_early,
> +				&uf_unmap);
> +		goto out;
> +	}

You just pick a random number as the address, and try to place the
mapping there? Won't this fail if e.g. the old address range overlaps
with the new one, causing mremap_to() to bail out at "if (addr +
old_len > new_addr && new_addr + new_len > addr)"?

Also, on Linux, the main program stack is (currently) an expanding
memory mapping that starts out being something like a couple hundred
kilobytes in size. If you allocate memory too close to the main
program stack, and someone then recurses deep enough to need more
memory, the program will crash. It sounds like your patch will
randomly make such programs crash.

Also, what's your strategy in general with regards to collisions with
existing mappings? Is your intention to just fall back to the classic
algorithm in that case?

You may want to consider whether it would be better to store
information about free memory per subtree in the VMA tree, together
with the maximum gap size that is already stored in each node, and
then walk down the tree randomly, with the randomness weighted by free
memory in the subtrees, but ignoring subtrees whose gaps are too
small. And for expanding stacks, it might be a good idea for other
reasons as well (locking consistency) to refactor them such that the
size in the VMA tree corresponds to the maximum expansion of the stack
(and if an allocation is about to fail, shrink such stack mappings).
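(Editorial aside: a rough standalone sketch of the weighted random
descent Jann suggests in the last paragraph. The node layout and all
names are invented for illustration; nothing like this exists in mm,
and the modulo bias of random() is ignored for brevity:

#include <stdlib.h>

/* subtree_free is the total free address space under (and including)
 * this node; subtree_max_gap is the largest single gap there;
 * gap_start/gap_len describe the gap directly at this node. */
struct gap_node {
	struct gap_node *left, *right;
	unsigned long subtree_free;
	unsigned long subtree_max_gap;
	unsigned long gap_start, gap_len;
};

/* Descend into each subtree with probability proportional to its free
 * space, skipping subtrees whose largest gap cannot fit the request;
 * returns 0 if nothing fits. */
static unsigned long pick_weighted(struct gap_node *n, unsigned long len)
{
	while (n) {
		unsigned long lfree = (n->left && n->left->subtree_max_gap >= len)
					? n->left->subtree_free : 0;
		unsigned long rfree = (n->right && n->right->subtree_max_gap >= len)
					? n->right->subtree_free : 0;
		unsigned long here = n->gap_len >= len ? n->gap_len : 0;
		unsigned long total = lfree + rfree + here;

		if (!total)
			return 0;	/* no gap large enough anywhere */

		unsigned long r = (unsigned long)random() % total;
		if (r < lfree)
			n = n->left;
		else if (r < lfree + rfree)
			n = n->right;
		else	/* place uniformly inside this node's own gap */
			return n->gap_start +
			       (unsigned long)random() % (n->gap_len - len + 1);
	}
	return 0;
}

int main(void)
{
	struct gap_node l = { .subtree_free = 0x200000,
			      .subtree_max_gap = 0x200000,
			      .gap_start = 0x10000000, .gap_len = 0x200000 };
	struct gap_node r = { .subtree_free = 0x1000,
			      .subtree_max_gap = 0x1000,
			      .gap_start = 0x30000000, .gap_len = 0x1000 };
	struct gap_node root = { .left = &l, .right = &r,
				 .subtree_free = 0x301000,
				 .subtree_max_gap = 0x200000,
				 .gap_start = 0x20000000, .gap_len = 0x100000 };

	/* Request 1MB: the 4kB gap on the right is never considered. */
	return pick_weighted(&root, 0x100000) == 0;
}

Weighting by free space keeps the placement close to uniform over all
free addresses while every probe stays O(tree height), which is why it
avoids both the collision retries and the bias of a naive random pick.)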
Re: [PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
On Thu, Oct 08, 2020 at 07:54:08PM +0300, Topi Miettinen wrote:
> +3 Additionally enable full randomization of memory mappings created
> +with mmap(NULL, ...). With 2, the base of the VMA used for such
> +mappings is random, but the mappings are created in predictable
> +places within the VMA and in sequential order. With 3, new VMAs
> +are created to fully randomize the mappings. Also mremap(...,
> +MREMAP_MAYMOVE) will move the mappings even if not necessary.
> +
> +On 32 bit systems this may cause problems due to increased VM
> +fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory usage
due to less efficient use of page tables and inability to merge
adjacent VMAs with compatible attributes.

> +	if ((flags & MREMAP_MAYMOVE) && randomize_va_space >= 3) {
> +		/*
> +		 * Caller is happy with a different address, so let's
> +		 * move even if not necessary!
> +		 */
> +		new_addr = arch_mmap_rnd();
> +
> +		ret = mremap_to(addr, old_len, new_addr, new_len,
> +				&locked, flags, &uf, &uf_unmap_early,
> +				&uf_unmap);
> +		goto out;
> +	}
> +
> +

Overly enthusiastic newline
[PATCH RESEND v2] mm: Optional full ASLR for mmap() and mremap()
Writing a new value of 3 to /proc/sys/kernel/randomize_va_space
enables full randomization of memory mappings created with mmap(NULL,
...). With 2, the base of the VMA used for such mappings is random,
but the mappings are created in predictable places within the VMA and
in sequential order. With 3, new VMAs are created to fully randomize
the mappings. Also mremap(..., MREMAP_MAYMOVE) will move the mappings
even if not necessary.

On 32 bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

In this example, with value of 2, ld.so.cache, libc, an anonymous mmap
and locale-archive are located close to each other:

$ strace /bin/sync
...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=189096, ...}) = 0
mmap(NULL, 189096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7d9c1e7f2000
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0n\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1839792, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7d9c1e7f
mmap(NULL, 1852680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7d9c1e62b000
...
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5642592, ...}) = 0
mmap(NULL, 5642592, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7d9c1e0c9000

With 3, they are located at unrelated addresses:

$ echo 3 > /proc/sys/kernel/randomize_va_space
$ /bin/sync
...
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=189096, ...}) = 0
mmap(NULL, 189096, PROT_READ, MAP_PRIVATE, 3, 0) = 0xeda4fbea000
...
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0n\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1839792, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb8fb9c1d000
mmap(NULL, 1852680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xaabd8598000
...
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5642592, ...}) = 0
mmap(NULL, 5642592, PROT_READ, MAP_PRIVATE, 3, 0) = 0xbe351ab8000

Signed-off-by: Topi Miettinen

---
Resent also to hardening list (hopefully the right one)

v2: also randomize mremap(..., MREMAP_MAYMOVE)
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  6 +++---
 Documentation/admin-guide/sysctl/kernel.rst   | 11 +++++++++++
 init/Kconfig                                  |  2 +-
 mm/mmap.c                                     |  7 ++++++-
 mm/mremap.c                                   | 15 +++++++++++++++
 5 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..9ea250522077 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -254,7 +254,7 @@ Spectre variant 2
    left by the previous process will also be cleared.

    User programs should use address space randomization to make attacks
-   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+   more difficult (Set /proc/sys/kernel/randomize_va_space = 1, 2 or 3).

 3. A virtualized guest attacking the host
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -499,8 +499,8 @@ Spectre variant 2
    more overhead and run slower.

    User programs should use address space randomization
-   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
-   difficult.
+   (/proc/sys/kernel/randomize_va_space = 1, 2 or 3) to make attacks
+   more difficult.

 3. VM mitigation

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d4b32cc32bb7..acd0612155d9 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1060,6 +1060,17 @@ that support this feature.

 Systems with ancient and/or broken binaries should be configured
 with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
 address space randomization.
+
+3 Additionally enable full randomization of memory mappings created
+with mmap(NULL, ...). With 2, the base of the VMA used for such
+mappings is random, but the mappings are created in predictable
+places within the VMA and in sequential order. With 3, new VMAs
+are created to fully randomize the mappings. Also mremap(...,
+MREMAP_MAYMOVE) will move the mappings even if not necessary.
+
+On 32 bit systems this may cause problems due to increased VM
+fragmentation if the address space gets crowded.
+
 == ===

diff --git a/init/K