Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-16 Thread Uladzislau Rezki
On Tue, Mar 16, 2021 at 10:01:46AM +0200, Topi Miettinen wrote:
> On 15.3.2021 19.47, Uladzislau Rezki wrote:
> > On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:
> > > On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> > > > On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > > > > What's the problem with that? It seems to me that nothing relies on
> > > > > specific addresses of the chunks, so it should be possible to
> > > > > randomize these too. Also the alignment is honored.
> > > > > 
> > > > My concerns are:
> > > > 
> > > > - it is not the vmalloc allocator;
> > > > - the per-cpu allocator allocates chunks, so this may happen only once;
> > > > it does not allocate often;
> > > 
> > > That's actually the reason to randomize it: if it always ends up in the
> > > same place at every boot, it becomes a stable target for attackers.
> > > 
> > Probably we can randomize the base address only once, when the pcpu
> > allocator allocates its first chunk during boot.
> > 
> > > > - changing it will likely introduce issues you are not aware of;
> > > > - it is not supposed to interact with the vmalloc allocator. Read the
> > > >   comment under pcpu_get_vm_areas();
> > > > 
> > > > Therefore I propose just not touching it.
> > > 
> > > How about splitting it from this patch instead? Then it can get separate
> > > testing, etc.
> > > 
> > It should be split out and tested separately.
> 
> Would you prefer another kernel option `randomize_percpu_allocator=1`, or
> would it be OK to make it a flag in `randomize_vmalloc`, like
> `randomize_vmalloc=3`? Maybe the latter would not be compatible with static
> branches.
> 
I think it is better to have a separate option, because there are two
different allocators.
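
For illustration, a separate knob could look roughly like the sketch below.
This is not code from the posted patch; the parameter name and the variable
only mirror the randomize_percpu_allocator naming discussed above, and the
per-cpu allocator would consult the flag once when placing its first chunk:

    /* Sketch only: a separate boot option for the per-cpu allocator. */
    #include <linux/cache.h>
    #include <linux/init.h>

    static bool randomize_percpu_allocator __ro_after_init;

    static int __init set_randomize_percpu_allocator(char *str)
    {
            /* enabled with "randomize_percpu_allocator=1" on the command line */
            if (str && *str == '1')
                    randomize_percpu_allocator = true;
            return 0;
    }
    early_param("randomize_percpu_allocator", set_randomize_percpu_allocator);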

--
Vlad Rezki


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-16 Thread Topi Miettinen

On 15.3.2021 19.47, Uladzislau Rezki wrote:

On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:

On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:

On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:

What's the problem with that? It seems to me that nothing relies on specific
addresses of the chunks, so it should be possible to randomize these too.
Also the alignment is honored.


My concerns are:

- it is not the vmalloc allocator;
- the per-cpu allocator allocates chunks, so this may happen only once; it
does not allocate often;


That's actually the reason to randomize it: if it always ends up in the
same place at every boot, it becomes a stable target for attackers.


Probably we can randomize the base address only once, when the pcpu allocator
allocates its first chunk during boot.


- changing it will likely introduce issues you are not aware of;
- it is not supposed to interact with the vmalloc allocator. Read the
   comment under pcpu_get_vm_areas();

Therefore I propose just not touching it.


How about splitting it from this patch instead? Then it can get separate
testing, etc.


It should be split out and tested separately.


Would you prefer another kernel option `randomize_percpu_allocator=1`, 
or would it be OK to make it a flag in `randomize_vmalloc`, like 
`randomize_vmalloc=3`? Maybe the latter would not be compatible with 
static branches.


-Topi



--
Vlad Rezki





Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-16 Thread Topi Miettinen

On 15.3.2021 20.02, Uladzislau Rezki wrote:

On Mon, Mar 15, 2021 at 06:23:37PM +0200, Topi Miettinen wrote:

On 15.3.2021 17.35, Uladzislau Rezki wrote:

On 14.3.2021 19.23, Uladzislau Rezki wrote:

Also, using the vmalloc test driver I can trigger a kernel BUG:


[   24.627577] kernel BUG at mm/vmalloc.c:1272!


It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
very robust in the face of fragmented virtual memory. What could be done
to fix that?


Your patch is broken in the context of checking "vend" when you try to
allocate again after the first attempt. The "vend" passed there differs
from the one that is checked later to figure out whether the allocation
failed or not:


  if (unlikely(addr == vend))
  goto overflow;




Thanks, I'll fix that.





In this patch, I could retry __alloc_vmap_area() with the whole region after
failure of both [random, vend] and [vstart, random] but I'm not sure that
would help much. Worth a try of course.


There is no need for your second [vstart, random] attempt. If the first,
bigger range was not successful, the smaller one will never succeed anyway.
The best way to go here is to repeat with the real [vstart:vend]; if it
still fails on the real range, then it is not possible to satisfy the
allocation request with the given parameters.



By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
failure, for example in full_fit_alloc_test().


Where?


Something like this:

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 5cf2fe9aab9e..27e5db9a96b4 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
 if (!ptr)
 return rv;

-   for (i = 0; i < 15000; i++)
+   for (i = 0; i < 15000; i++) {
 ptr[i] = vmalloc(1 * PAGE_SIZE);

+   if (!ptr[i])
+   goto leave;
+   }
+


Hmm. That is for creating a long list of allocated areas before running
a test. For example, if one allocation among 15 000 fails, some index will
be set to NULL. Later on, after the "leave" label, vfree() will simply skip
the NULL entries.

Whether we have 15 000 extra elements or 10 000 does not really matter;
it is considered a corner case that probably never happens. Yes, you can
simulate such a precondition, but then regular vmalloc()s will likely also
fail, so the final results will be screwed up.


I'd argue that if the allocations fail, the test should be aborted 
immediately since the results are not representative.


-Topi




+
 for (i = 0; i < test_loop_count; i++) {
 ptr_1 = vmalloc(100 * PAGE_SIZE);
 if (!ptr_1)
@@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)

 for (i = 0; i < junk_length; i++) {
 ptr[i] = vmalloc(1 * PAGE_SIZE);
+   if (!ptr[i])
+   goto error;
 junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
+   if (!junk_ptr[i])
+   goto error;
 }

 for (i = 0; i < junk_length; i++)
@@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
 rv = 0;

  error:
-   for (i = 0; i < junk_length; i++)
+   for (i = 0; i < junk_length; i++) {
 vfree(ptr[i]);
+   vfree(junk_ptr[i]);
+   }

 vfree(ptr);
 vfree(junk_ptr);


Same here.

--
Vlad Rezki





Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Uladzislau Rezki
On Mon, Mar 15, 2021 at 06:23:37PM +0200, Topi Miettinen wrote:
> On 15.3.2021 17.35, Uladzislau Rezki wrote:
> > > On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > > > Also, using the vmalloc test driver I can trigger a kernel BUG:
> > > > 
> > > > 
> > > > [   24.627577] kernel BUG at mm/vmalloc.c:1272!
> > > 
> > > It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
> > > very robust in the face of fragmented virtual memory. What could be done
> > > to fix that?
> > > 
> > Your patch is broken in the context of checking "vend" when you try to
> > allocate again after the first attempt. The "vend" passed there differs
> > from the one that is checked later to figure out whether the allocation
> > failed or not:
> > 
> > 
> >  if (unlikely(addr == vend))
> >  goto overflow;
> > 
> 
> 
> Thanks, I'll fix that.
> 
> > 
> > > 
> > > In this patch, I could retry __alloc_vmap_area() with the whole region
> > > after failure of both [random, vend] and [vstart, random] but I'm not sure
> > > that would help much. Worth a try of course.
> > > 
> > There is no need for your second [vstart, random] attempt. If the first,
> > bigger range was not successful, the smaller one will never succeed anyway.
> > The best way to go here is to repeat with the real [vstart:vend]; if it
> > still fails on the real range, then it is not possible to satisfy the
> > allocation request with the given parameters.
> > 
> > > 
> > > By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
> > > failure, for example in full_fit_alloc_test().
> > > 
> > Where?
> 
> Something like this:
> 
> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> index 5cf2fe9aab9e..27e5db9a96b4 100644
> --- a/lib/test_vmalloc.c
> +++ b/lib/test_vmalloc.c
> @@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
> if (!ptr)
> return rv;
> 
> -   for (i = 0; i < 15000; i++)
> +   for (i = 0; i < 15000; i++) {
> ptr[i] = vmalloc(1 * PAGE_SIZE);
> 
> +   if (!ptr[i])
> +   goto leave;
> +   }
> +
>
Hmm. That is for creating a long list of allocated areas before running
a test. For example, if one allocation among 15 000 fails, some index will
be set to NULL. Later on, after the "leave" label, vfree() will simply skip
the NULL entries.

Whether we have 15 000 extra elements or 10 000 does not really matter;
it is considered a corner case that probably never happens. Yes, you can
simulate such a precondition, but then regular vmalloc()s will likely also
fail, so the final results will be screwed up.
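
To make that concrete, a small sketch of the cleanup path (not part of the
diff above; the "leave" label is the one the diff proposes, and vfree(NULL)
is a no-op, which is why the bypass works):

    /*
     * Sketch: after the proposed "leave" label the cleanup loop can walk
     * the whole array even if some allocations failed, because vfree()
     * simply ignores NULL pointers.
     */
    leave:
            for (i = 0; i < 15000; i++)
                    vfree(ptr[i]);  /* NULL entries are skipped */

            vfree(ptr);
            return rv;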

> +
> for (i = 0; i < test_loop_count; i++) {
> ptr_1 = vmalloc(100 * PAGE_SIZE);
> if (!ptr_1)
> @@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)
> 
> for (i = 0; i < junk_length; i++) {
> ptr[i] = vmalloc(1 * PAGE_SIZE);
> +   if (!ptr[i])
> +   goto error;
> junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
> +   if (!junk_ptr[i])
> +   goto error;
> }
> 
> for (i = 0; i < junk_length; i++)
> @@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
> rv = 0;
> 
>  error:
> -   for (i = 0; i < junk_length; i++)
> +   for (i = 0; i < junk_length; i++) {
> vfree(ptr[i]);
> +   vfree(junk_ptr[i]);
> +   }
> 
> vfree(ptr);
> vfree(junk_ptr);
> 
Same here.

--
Vlad Rezki


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Uladzislau Rezki
On Mon, Mar 15, 2021 at 09:16:26AM -0700, Kees Cook wrote:
> On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> > On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > > What's the problem with that? It seems to me that nothing relies on
> > > specific addresses of the chunks, so it should be possible to randomize
> > > these too. Also the alignment is honored.
> > > 
> > My concerns are:
> > 
> > - it is not the vmalloc allocator;
> > - the per-cpu allocator allocates chunks, so this may happen only once;
> > it does not allocate often;
> 
> That's actually the reason to randomize it: if it always ends up in the
> same place at every boot, it becomes a stable target for attackers.
> 
Probably we can randomize the base address only once, when the pcpu allocator
allocates its first chunk during boot.

> > - changing it will likely introduce issues you are not aware of;
> > - it is not supposed to interact with the vmalloc allocator. Read the
> >   comment under pcpu_get_vm_areas();
> > 
> > Therefore I propose just not touching it.
> 
> How about splitting it from this patch instead? Then it can get separate
> testing, etc.
> 
It should be split out and tested separately.

--
Vlad Rezki


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Topi Miettinen

On 15.3.2021 17.35, Uladzislau Rezki wrote:

On 14.3.2021 19.23, Uladzislau Rezki wrote:

Also, using the vmalloc test driver I can trigger a kernel BUG:


[   24.627577] kernel BUG at mm/vmalloc.c:1272!


It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
very robust in the face of fragmented virtual memory. What could be done
to fix that?


Your patch is broken in the context of checking "vend" when you try to
allocate again after the first attempt. The "vend" passed there differs
from the one that is checked later to figure out whether the allocation
failed or not:


 if (unlikely(addr == vend))
 goto overflow;




Thanks, I'll fix that.





In this patch, I could retry __alloc_vmap_area() with the whole region after
failure of both [random, vend] and [vstart, random] but I'm not sure that
would help much. Worth a try of course.


There is no need for your second [vstart, random] attempt. If the first,
bigger range was not successful, the smaller one will never succeed anyway.
The best way to go here is to repeat with the real [vstart:vend]; if it
still fails on the real range, then it is not possible to satisfy the
allocation request with the given parameters.



By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
failure, for example in full_fit_alloc_test().


Where?


Something like this:

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 5cf2fe9aab9e..27e5db9a96b4 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -182,9 +182,14 @@ static int long_busy_list_alloc_test(void)
if (!ptr)
return rv;

-   for (i = 0; i < 15000; i++)
+   for (i = 0; i < 15000; i++) {
ptr[i] = vmalloc(1 * PAGE_SIZE);

+   if (!ptr[i])
+   goto leave;
+   }
+
+
for (i = 0; i < test_loop_count; i++) {
ptr_1 = vmalloc(100 * PAGE_SIZE);
if (!ptr_1)
@@ -236,7 +241,11 @@ static int full_fit_alloc_test(void)

for (i = 0; i < junk_length; i++) {
ptr[i] = vmalloc(1 * PAGE_SIZE);
+   if (!ptr[i])
+   goto error;
junk_ptr[i] = vmalloc(1 * PAGE_SIZE);
+   if (!junk_ptr[i])
+   goto error;
}

for (i = 0; i < junk_length; i++)
@@ -256,8 +265,10 @@ static int full_fit_alloc_test(void)
rv = 0;

 error:
-   for (i = 0; i < junk_length; i++)
+   for (i = 0; i < junk_length; i++) {
vfree(ptr[i]);
+   vfree(junk_ptr[i]);
+   }

vfree(ptr);
vfree(junk_ptr);

-Topi


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Kees Cook
On Mon, Mar 15, 2021 at 01:24:10PM +0100, Uladzislau Rezki wrote:
> On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> > What's the problem with that? It seems to me that nothing relies on specific
> > addresses of the chunks, so it should be possible to randomize these too.
> > Also the alignment is honored.
> > 
> My concerns are:
> 
> - it is not the vmalloc allocator;
> - the per-cpu allocator allocates chunks, so this may happen only once;
> it does not allocate often;

That's actually the reason to randomize it: if it always ends up in the
same place at every boot, it becomes a stable target for attackers.

> - changing it will likely introduce issues you are not aware of;
> - it is not supposed to interact with the vmalloc allocator. Read the
>   comment under pcpu_get_vm_areas();
> 
> Therefore I propose just not touching it.

How about splitting it from this patch instead? Then it can get separate
testing, etc.

-- 
Kees Cook


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Uladzislau Rezki
> On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > Also, using the vmalloc test driver I can trigger a kernel BUG:
> > 
> > 
> > [   24.627577] kernel BUG at mm/vmalloc.c:1272!
> 
> It seems that most tests indeed fail. Perhaps the vmalloc subsystem isn't
> very robust in the face of fragmented virtual memory. What could be done
> to fix that?
> 
Your patch is broken in the context of checking "vend" when you try to
allocate again after the first attempt. The "vend" passed there differs
from the one that is checked later to figure out whether the allocation
failed or not:


if (unlikely(addr == vend))
goto overflow;


>
> In this patch, I could retry __alloc_vmap_area() with the whole region after
> failure of both [random, vend] and [vstart, random] but I'm not sure that
> would help much. Worth a try of course.
> 
There is no need for your second [vstart, random] attempt. If the first,
bigger range was not successful, the smaller one will never succeed anyway.
The best way to go here is to repeat with the real [vstart:vend]; if it
still fails on the real range, then it is not possible to satisfy the
allocation request with the given parameters.
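
For illustration, the fallback could look roughly like the sketch below. It
is not the patch itself: the __alloc_vmap_area() call is taken from the
discussion above, the surrounding code is simplified, and "random_start"
stands for a random, aligned address inside [vstart, vend) chosen by the
randomization. The point is that both attempts pass the caller's original
vend, so the overflow check always compares against the right value:

    /*
     * Sketch: try a randomized start first, then fall back to the real
     * [vstart:vend] range before declaring failure.
     */
    addr = __alloc_vmap_area(size, align, random_start, vend);
    if (unlikely(addr == vend)) {
            /* retry once over the whole region */
            addr = __alloc_vmap_area(size, align, vstart, vend);
            if (unlikely(addr == vend))
                    goto overflow;
    }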

>
> By the way, some of the tests in test_vmalloc.c don't check for vmalloc()
> failure, for example in full_fit_alloc_test().
> 
Where?

--
Vlad Rezki


Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Uladzislau Rezki
On Mon, Mar 15, 2021 at 11:04:42AM +0200, Topi Miettinen wrote:
> On 14.3.2021 19.23, Uladzislau Rezki wrote:
> > > Memory mappings inside the kernel allocated with vmalloc() are in a
> > > predictable order and packed tightly toward the low addresses, except
> > > for per-cpu areas, which start from the top of the vmalloc area. With
> > > the new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> > > used randomly to make the allocations less predictable and harder to
> > > guess for attackers. Module and BPF code locations also get randomized
> > > (within their dedicated and rather small area, though) and, if
> > > CONFIG_VMAP_STACK is enabled, so do kernel thread stack locations.
> > > 
> > > On 32-bit systems this may cause problems due to increased VM
> > > fragmentation if the address space gets crowded.
> > > 
> > > On all systems, it will reduce performance and increase memory and
> > > cache usage due to less efficient use of page tables and the inability
> > > to merge adjacent VMAs with compatible attributes. On x86_64 with
> > > 5-level page tables, in the worst case, additional page table entries
> > > of up to 4 pages are created for each mapping, so with small mappings
> > > there is a considerable penalty.
> > > 
> > > Without randomize_vmalloc=1:
> > > $ grep -v kernel_clone /proc/vmallocinfo
> > > 0xc900-0xc9009000   36864 
> > > irq_init_percpu_irqstack+0x176/0x1c0 vmap
> > > 0xc9009000-0xc900b0008192 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0x1ffe1000 ioremap
> > > 0xc900c000-0xc900f000   12288 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0x1ffe ioremap
> > > 0xc900f000-0xc90110008192 hpet_enable+0x31/0x4a4 
> > > phys=0xfed0 ioremap
> > > 0xc9011000-0xc90130008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xc9013000-0xc90150008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xc9015000-0xc90170008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xc9021000-0xc90230008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xc9023000-0xc90250008192 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0xfed0 ioremap
> > > 0xc9025000-0xc90270008192 memremap+0x19c/0x280 
> > > phys=0x000f5000 ioremap
> > > 0xc9031000-0xc9036000   20480 
> > > pcpu_create_chunk+0xe8/0x260 pages=4 vmalloc
> > > 0xc9043000-0xc9047000   16384 n_tty_open+0x11/0xe0 
> > > pages=3 vmalloc
> > > 0xc9211000-0xc9232000  135168 
> > > crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> > > 0xc9232000-0xc9253000  135168 
> > > crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> > > 0xc95a9000-0xc95ba000   69632 
> > > pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> > > 0xc95ba000-0xc95cc000   73728 
> > > pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> > > 0xe8c0-0xe8e0 2097152 
> > > pcpu_get_vm_areas+0x0/0x2290 vmalloc
> > > 
> > > With randomize_vmalloc=1, the allocations are randomized:
> > > $ grep -v kernel_clone /proc/vmallocinfo
> > > 0xc9759d443000-0xc9759d4450008192 hpet_enable+0x31/0x4a4 
> > > phys=0xfed0 ioremap
> > > 0xccf1e9f66000-0xccf1e9f680008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xcd2fc02a4000-0xcd2fc02a60008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xcdaefb898000-0xcdaefb89b000   12288 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0x1ffe ioremap
> > > 0xcef8074c3000-0xcef8074cc000   36864 
> > > irq_init_percpu_irqstack+0x176/0x1c0 vmap
> > > 0xcf725ca2e000-0xcf725ca4f000  135168 
> > > crypto_scomp_init_tfm+0xc6/0xf0 pages=32 vmalloc
> > > 0xd0efb25e1000-0xd0efb25f2000   69632 
> > > pcpu_create_chunk+0x7b/0x260 pages=16 vmalloc
> > > 0xd27054678000-0xd2705467c000   16384 n_tty_open+0x11/0xe0 
> > > pages=3 vmalloc
> > > 0xd2adf716e000-0xd2adf718   73728 
> > > pcpu_create_chunk+0xb2/0x260 pages=17 vmalloc
> > > 0xd4ba5fb6b000-0xd4ba5fb6d0008192 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0x1ffe1000 ioremap
> > > 0xded126192000-0xded1261940008192 memremap+0x19c/0x280 
> > > phys=0x000f5000 ioremap
> > > 0xe01a4dbcd000-0xe01a4dbcf0008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xe4b649952000-0xe4b6499540008192 
> > > acpi_os_map_iomem+0x2ac/0x2d0 phys=0xfed0 ioremap
> > > 0xe71ed592a000-0xe71ed592c0008192 
> > > gen_pool_add_owner+0x49/0x130 pages=1 vmalloc
> > > 0xe7dc5824f000-0xe7dc5827  135168 
> > > crypto_scomp_init_tfm+0x67/0xf0 pages=32 vmalloc
> > > 0xe8f4f980-0xe8f4f9a0 2097152 
> > > pcpu_get_vm_areas+0x0/0x2290 

Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Topi Miettinen

On 14.3.2021 19.23, Uladzislau Rezki wrote:

Also, using the vmalloc test driver I can trigger a kernel BUG:


[   24.627577] kernel BUG at mm/vmalloc.c:1272!


It seems that most tests indeed fail. Perhaps the vmalloc subsystem
isn't very robust in the face of fragmented virtual memory. What could be
done to fix that?


In this patch, I could retry __alloc_vmap_area() with the whole region 
after failure of both [random, vend] and [vstart, random] but I'm not 
sure that would help much. Worth a try of course.


By the way, some of the tests in test_vmalloc.c don't check for 
vmalloc() failure, for example in full_fit_alloc_test().


-Topi



Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-15 Thread Topi Miettinen

On 14.3.2021 19.23, Uladzislau Rezki wrote:

Memory mappings inside the kernel allocated with vmalloc() are in a
predictable order and packed tightly toward the low addresses, except
for per-cpu areas, which start from the top of the vmalloc area. With
the new kernel boot parameter 'randomize_vmalloc=1', the entire area is
used randomly to make the allocations less predictable and harder to
guess for attackers. Module and BPF code locations also get randomized
(within their dedicated and rather small area, though) and, if
CONFIG_VMAP_STACK is enabled, so do kernel thread stack locations.

On 32-bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory and
cache usage due to less efficient use of page tables and the inability
to merge adjacent VMAs with compatible attributes. On x86_64 with
5-level page tables, in the worst case, additional page table entries
of up to 4 pages are created for each mapping, so with small mappings
there is a considerable penalty.

Without randomize_vmalloc=1:
$ grep -v kernel_clone /proc/vmallocinfo
0xc900-0xc9009000   36864 
irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xc9009000-0xc900b0008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe1000 ioremap
0xc900c000-0xc900f000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe ioremap
0xc900f000-0xc90110008192 hpet_enable+0x31/0x4a4 
phys=0xfed0 ioremap
0xc9011000-0xc90130008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9013000-0xc90150008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9015000-0xc90170008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9021000-0xc90230008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9023000-0xc90250008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0xfed0 ioremap
0xc9025000-0xc90270008192 memremap+0x19c/0x280 
phys=0x000f5000 ioremap
0xc9031000-0xc9036000   20480 pcpu_create_chunk+0xe8/0x260 
pages=4 vmalloc
0xc9043000-0xc9047000   16384 n_tty_open+0x11/0xe0 pages=3 
vmalloc
0xc9211000-0xc9232000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
pages=32 vmalloc
0xc9232000-0xc9253000  135168 crypto_scomp_init_tfm+0x67/0xf0 
pages=32 vmalloc
0xc95a9000-0xc95ba000   69632 pcpu_create_chunk+0x7b/0x260 
pages=16 vmalloc
0xc95ba000-0xc95cc000   73728 pcpu_create_chunk+0xb2/0x260 
pages=17 vmalloc
0xe8c0-0xe8e0 2097152 pcpu_get_vm_areas+0x0/0x2290 
vmalloc

With randomize_vmalloc=1, the allocations are randomized:
$ grep -v kernel_clone /proc/vmallocinfo
0xc9759d443000-0xc9759d4450008192 hpet_enable+0x31/0x4a4 
phys=0xfed0 ioremap
0xccf1e9f66000-0xccf1e9f680008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xcd2fc02a4000-0xcd2fc02a60008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xcdaefb898000-0xcdaefb89b000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe ioremap
0xcef8074c3000-0xcef8074cc000   36864 
irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xcf725ca2e000-0xcf725ca4f000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
pages=32 vmalloc
0xd0efb25e1000-0xd0efb25f2000   69632 pcpu_create_chunk+0x7b/0x260 
pages=16 vmalloc
0xd27054678000-0xd2705467c000   16384 n_tty_open+0x11/0xe0 pages=3 
vmalloc
0xd2adf716e000-0xd2adf718   73728 pcpu_create_chunk+0xb2/0x260 
pages=17 vmalloc
0xd4ba5fb6b000-0xd4ba5fb6d0008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe1000 ioremap
0xded126192000-0xded1261940008192 memremap+0x19c/0x280 
phys=0x000f5000 ioremap
0xe01a4dbcd000-0xe01a4dbcf0008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xe4b649952000-0xe4b6499540008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0xfed0 ioremap
0xe71ed592a000-0xe71ed592c0008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xe7dc5824f000-0xe7dc5827  135168 crypto_scomp_init_tfm+0x67/0xf0 
pages=32 vmalloc
0xe8f4f980-0xe8f4f9a0 2097152 pcpu_get_vm_areas+0x0/0x2290 
vmalloc
0xe8f4f9a19000-0xe8f4f9a1e000   20480 pcpu_create_chunk+0xe8/0x260 
pages=4 vmalloc

With CONFIG_VMAP_STACK, kernel thread stacks are also placed in the
vmalloc area and therefore they get randomized as well (only one example
line from /proc/vmallocinfo is shown for brevity):

unrandomized:
0xc9018000-0xc9021000   36864 kernel_clone+0xf9/0x560 pages=8 
vmalloc

randomized:
0xcb57611a8000-0xcb57611b1000   36864 kernel_clone+0xf9/0x560 pages=8 
vmalloc

CC: Andrew Morton 
CC: Andy Lutomirski 
CC: Jann Horn 
CC: Kees Cook 
CC: Linux API 
CC: Matthew Wilcox 
CC: Mike Rapoport 

Re: [PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-14 Thread Uladzislau Rezki
> Memory mappings inside the kernel allocated with vmalloc() are in a
> predictable order and packed tightly toward the low addresses, except
> for per-cpu areas, which start from the top of the vmalloc area. With
> the new kernel boot parameter 'randomize_vmalloc=1', the entire area is
> used randomly to make the allocations less predictable and harder to
> guess for attackers. Module and BPF code locations also get randomized
> (within their dedicated and rather small area, though) and, if
> CONFIG_VMAP_STACK is enabled, so do kernel thread stack locations.
> 
> On 32-bit systems this may cause problems due to increased VM
> fragmentation if the address space gets crowded.
> 
> On all systems, it will reduce performance and increase memory and
> cache usage due to less efficient use of page tables and the inability
> to merge adjacent VMAs with compatible attributes. On x86_64 with
> 5-level page tables, in the worst case, additional page table entries
> of up to 4 pages are created for each mapping, so with small mappings
> there is a considerable penalty.
> 
> Without randomize_vmalloc=1:
> $ grep -v kernel_clone /proc/vmallocinfo
> 0xc900-0xc9009000   36864 
> irq_init_percpu_irqstack+0x176/0x1c0 vmap
> 0xc9009000-0xc900b0008192 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0x1ffe1000 ioremap
> 0xc900c000-0xc900f000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0x1ffe ioremap
> 0xc900f000-0xc90110008192 hpet_enable+0x31/0x4a4 
> phys=0xfed0 ioremap
> 0xc9011000-0xc90130008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xc9013000-0xc90150008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xc9015000-0xc90170008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xc9021000-0xc90230008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xc9023000-0xc90250008192 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0xfed0 ioremap
> 0xc9025000-0xc90270008192 memremap+0x19c/0x280 
> phys=0x000f5000 ioremap
> 0xc9031000-0xc9036000   20480 pcpu_create_chunk+0xe8/0x260 
> pages=4 vmalloc
> 0xc9043000-0xc9047000   16384 n_tty_open+0x11/0xe0 pages=3 
> vmalloc
> 0xc9211000-0xc9232000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
> pages=32 vmalloc
> 0xc9232000-0xc9253000  135168 crypto_scomp_init_tfm+0x67/0xf0 
> pages=32 vmalloc
> 0xc95a9000-0xc95ba000   69632 pcpu_create_chunk+0x7b/0x260 
> pages=16 vmalloc
> 0xc95ba000-0xc95cc000   73728 pcpu_create_chunk+0xb2/0x260 
> pages=17 vmalloc
> 0xe8c0-0xe8e0 2097152 pcpu_get_vm_areas+0x0/0x2290 
> vmalloc
> 
> With randomize_vmalloc=1, the allocations are randomized:
> $ grep -v kernel_clone /proc/vmallocinfo
> 0xc9759d443000-0xc9759d4450008192 hpet_enable+0x31/0x4a4 
> phys=0xfed0 ioremap
> 0xccf1e9f66000-0xccf1e9f680008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xcd2fc02a4000-0xcd2fc02a60008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xcdaefb898000-0xcdaefb89b000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0x1ffe ioremap
> 0xcef8074c3000-0xcef8074cc000   36864 
> irq_init_percpu_irqstack+0x176/0x1c0 vmap
> 0xcf725ca2e000-0xcf725ca4f000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
> pages=32 vmalloc
> 0xd0efb25e1000-0xd0efb25f2000   69632 pcpu_create_chunk+0x7b/0x260 
> pages=16 vmalloc
> 0xd27054678000-0xd2705467c000   16384 n_tty_open+0x11/0xe0 pages=3 
> vmalloc
> 0xd2adf716e000-0xd2adf718   73728 pcpu_create_chunk+0xb2/0x260 
> pages=17 vmalloc
> 0xd4ba5fb6b000-0xd4ba5fb6d0008192 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0x1ffe1000 ioremap
> 0xded126192000-0xded1261940008192 memremap+0x19c/0x280 
> phys=0x000f5000 ioremap
> 0xe01a4dbcd000-0xe01a4dbcf0008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xe4b649952000-0xe4b6499540008192 acpi_os_map_iomem+0x2ac/0x2d0 
> phys=0xfed0 ioremap
> 0xe71ed592a000-0xe71ed592c0008192 gen_pool_add_owner+0x49/0x130 
> pages=1 vmalloc
> 0xe7dc5824f000-0xe7dc5827  135168 crypto_scomp_init_tfm+0x67/0xf0 
> pages=32 vmalloc
> 0xe8f4f980-0xe8f4f9a0 2097152 pcpu_get_vm_areas+0x0/0x2290 
> vmalloc
> 0xe8f4f9a19000-0xe8f4f9a1e000   20480 pcpu_create_chunk+0xe8/0x260 
> pages=4 vmalloc
> 
> With CONFIG_VMAP_STACK, kernel thread stacks are also placed in the
> vmalloc area and therefore they get randomized as well (only one example
> line from /proc/vmallocinfo is shown for brevity):
> 
> unrandomized:
> 0xc9018000-0xc9021000   36864 kernel_clone+0xf9/0x560 pages=8 
> vmalloc
> 
> randomized:
> 0xcb57611a8000-0xcb57611b1000   36864 

[PATCH v4] mm/vmalloc: randomize vmalloc() allocations

2021-03-09 Thread Topi Miettinen
Memory mappings inside the kernel allocated with vmalloc() are in a
predictable order and packed tightly toward the low addresses, except
for per-cpu areas, which start from the top of the vmalloc area. With
the new kernel boot parameter 'randomize_vmalloc=1', the entire area is
used randomly to make the allocations less predictable and harder to
guess for attackers. Module and BPF code locations also get randomized
(within their dedicated and rather small area, though) and, if
CONFIG_VMAP_STACK is enabled, so do kernel thread stack locations.

On 32-bit systems this may cause problems due to increased VM
fragmentation if the address space gets crowded.

On all systems, it will reduce performance and increase memory and
cache usage due to less efficient use of page tables and the inability
to merge adjacent VMAs with compatible attributes. On x86_64 with
5-level page tables, in the worst case, additional page table entries
of up to 4 pages are created for each mapping, so with small mappings
there is a considerable penalty.

Without randomize_vmalloc=1:
$ grep -v kernel_clone /proc/vmallocinfo
0xc900-0xc9009000   36864 
irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xc9009000-0xc900b0008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe1000 ioremap
0xc900c000-0xc900f000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe ioremap
0xc900f000-0xc90110008192 hpet_enable+0x31/0x4a4 
phys=0xfed0 ioremap
0xc9011000-0xc90130008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9013000-0xc90150008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9015000-0xc90170008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9021000-0xc90230008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xc9023000-0xc90250008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0xfed0 ioremap
0xc9025000-0xc90270008192 memremap+0x19c/0x280 
phys=0x000f5000 ioremap
0xc9031000-0xc9036000   20480 pcpu_create_chunk+0xe8/0x260 
pages=4 vmalloc
0xc9043000-0xc9047000   16384 n_tty_open+0x11/0xe0 pages=3 
vmalloc
0xc9211000-0xc9232000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
pages=32 vmalloc
0xc9232000-0xc9253000  135168 crypto_scomp_init_tfm+0x67/0xf0 
pages=32 vmalloc
0xc95a9000-0xc95ba000   69632 pcpu_create_chunk+0x7b/0x260 
pages=16 vmalloc
0xc95ba000-0xc95cc000   73728 pcpu_create_chunk+0xb2/0x260 
pages=17 vmalloc
0xe8c0-0xe8e0 2097152 pcpu_get_vm_areas+0x0/0x2290 
vmalloc

With randomize_vmalloc=1, the allocations are randomized:
$ grep -v kernel_clone /proc/vmallocinfo
0xc9759d443000-0xc9759d4450008192 hpet_enable+0x31/0x4a4 
phys=0xfed0 ioremap
0xccf1e9f66000-0xccf1e9f680008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xcd2fc02a4000-0xcd2fc02a60008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xcdaefb898000-0xcdaefb89b000   12288 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe ioremap
0xcef8074c3000-0xcef8074cc000   36864 
irq_init_percpu_irqstack+0x176/0x1c0 vmap
0xcf725ca2e000-0xcf725ca4f000  135168 crypto_scomp_init_tfm+0xc6/0xf0 
pages=32 vmalloc
0xd0efb25e1000-0xd0efb25f2000   69632 pcpu_create_chunk+0x7b/0x260 
pages=16 vmalloc
0xd27054678000-0xd2705467c000   16384 n_tty_open+0x11/0xe0 pages=3 
vmalloc
0xd2adf716e000-0xd2adf718   73728 pcpu_create_chunk+0xb2/0x260 
pages=17 vmalloc
0xd4ba5fb6b000-0xd4ba5fb6d0008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0x1ffe1000 ioremap
0xded126192000-0xded1261940008192 memremap+0x19c/0x280 
phys=0x000f5000 ioremap
0xe01a4dbcd000-0xe01a4dbcf0008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xe4b649952000-0xe4b6499540008192 acpi_os_map_iomem+0x2ac/0x2d0 
phys=0xfed0 ioremap
0xe71ed592a000-0xe71ed592c0008192 gen_pool_add_owner+0x49/0x130 
pages=1 vmalloc
0xe7dc5824f000-0xe7dc5827  135168 crypto_scomp_init_tfm+0x67/0xf0 
pages=32 vmalloc
0xe8f4f980-0xe8f4f9a0 2097152 pcpu_get_vm_areas+0x0/0x2290 
vmalloc
0xe8f4f9a19000-0xe8f4f9a1e000   20480 pcpu_create_chunk+0xe8/0x260 
pages=4 vmalloc

With CONFIG_VMAP_STACK, kernel thread stacks are also placed in the
vmalloc area and therefore they get randomized as well (only one example
line from /proc/vmallocinfo is shown for brevity):

unrandomized:
0xc9018000-0xc9021000   36864 kernel_clone+0xf9/0x560 pages=8 
vmalloc

randomized:
0xcb57611a8000-0xcb57611b1000   36864 kernel_clone+0xf9/0x560 pages=8 
vmalloc
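
As a rough illustration of the placement policy described above (a
simplified sketch, not the code in this patch; the helper name, the
fallback to vstart, and the exact bias handling are assumptions), a
random, page-aligned starting point inside the vmalloc range could be
chosen like this before the normal free-area search:

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/random.h>

    /* Sketch: pick a random, aligned start in [vstart, vend - size]. */
    static unsigned long random_vmalloc_start(unsigned long size,
                                              unsigned long align,
                                              unsigned long vstart,
                                              unsigned long vend)
    {
            unsigned long pages, addr;

            if (size >= vend - vstart)
                    return vstart;

            pages = (vend - vstart - size) >> PAGE_SHIFT;
            if (!pages)
                    return vstart;

            /* the modulo is slightly biased; good enough for a sketch */
            addr = vstart + ((get_random_long() % pages) << PAGE_SHIFT);

            /* align is assumed to be a power of two, as in the vmalloc path */
            return max(ALIGN_DOWN(addr, align), vstart);
    }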

CC: Andrew Morton 
CC: Andy Lutomirski 
CC: Jann Horn 
CC: Kees Cook 
CC: Linux API 
CC: Matthew Wilcox 
CC: Mike Rapoport 
CC: Vlad Rezki 
Signed-off-by: Topi Miettinen