from:"Igor Stoppa"

Re: [PATCH 0/2] mm: tweaks for improving use of vmap_area

2018-05-01 Thread Igor Stoppa



On 01/05/18 03:15, Andrew Morton wrote:

On Fri, 27 Apr 2018 03:42:41 +0400 Igor Stoppa <igor.sto...@gmail.com> wrote:


These two patches were written in preparation for the creation of
protectable memory, however their use is not limited to pmalloc and can
improve the use of virtually contiguous memory.

The first provides a faster path from struct page to the vm_struct that
tracks it.

The second patch renames a single linked list field inside of vmap_area.
The list is currently used only for disposing of the data structure, once
it is not in use anymore.
Which means that it cold be used for other purposes while it's not queued
for destruction.


The patches look benign to me (feel free to add my ack),


thank you


but I'm not seeing a reason to apply them at this time?


I thought they might come useful to others playing with vmap_areas, I'll 
resubmit them anyway with the protected memory set.


But I was also hoping to get some more review, especially for the 
second, which had not received any definitive ACK/NACK, till now.


So, I'm also ok if they can be merged once the others are ACK'ed.

--
igor

[PATCH 3/3] genalloc: selftest

2018-05-01 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

While it is possible to compile and executethe test as kenrel module, it
is mostly useful to confirm that there are no problems.
In case there were problems, the system is likely to crash well before
modules can be loaded. When troubleshooting a crash, it is recommended
to compile the tests into the monolithic kernel.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 lib/Kconfig.debug   |  23 +++
 lib/Makefile|   1 +
 lib/test_genalloc.c | 419 
 3 files changed, 443 insertions(+)
 create mode 100644 lib/test_genalloc.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b734cd1..4f511ac20869 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1943,6 +1943,29 @@ config TEST_DEBUG_VIRTUAL
 
  If unsure, say N.
 
+config TEST_GENERIC_ALLOCATOR
+   tristate "genalloc tester"
+   default n
+   depends on GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space,
+ in particular, it tests that the size of each allcoation can be
+ determined correctly.
+
+ If unsure, say N.
+
+config TEST_GENERIC_ALLOCATOR_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   depends on TEST_GENERIC_ALLOCATOR
+   help
+ During the self-testing, it will be possibe to visualize the bit
+ patterns that are expected to be produced by the sequence of
+ memory-oriented operations.
+
+ If unsure, say N
+
 endif # RUNTIME_TESTING_MENU
 
 config MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index 384713ff70d3..2c66346ab246 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -114,6 +114,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_TEST_GENERIC_ALLOCATOR) += test_genalloc.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/test_genalloc.c b/lib/test_genalloc.c
new file mode 100644
index ..46ab7796c9ec
--- /dev/null
+++ b/lib/test_genalloc.c
@@ -0,0 +1,419 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_genalloc.c
+ *
+ * (C) Copyright 2017-18 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_TEST_GENERIC_ALLOCATOR_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   for (i = 1; i <= BITMAP_SIZE_C; i++)
+   bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+   *bm = '\0';
+   pr_notice("chunk: %pbitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+   CMD_ALLOCATOR,
+   CMD_ALLOCATE,
+   CMD_FLUSH,
+   CMD_FREE,
+   CMD_NUMBER,
+   CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+   void *null;
+};
+
+struct test_allocator {
+   genpool_algo_t algo;
+   union {
+   struct genpool_data_align align;
+   struct genpool_data_fixed offset;
+   struct null_struct null;
+   } data;
+};
+
+struct test_action {
+   unsigned int location;
+   char pattern[BITMAP_SIZE_C];
+   unsigned int size;
+};
+
+
+struct test_command {
+   enum test_commands command;
+   union {
+   struct test_allocator allocator;
+   struct test_action action;
+   };
+};
+
+
+/*
+ * To pass an array literal as parameter to a macro, it m

[PATCH 0/3 v2] linux-next: mm: Track genalloc allocations

2018-05-01 Thread Igor Stoppa

This patchset was created as part of an older version of pmalloc, however
it has value per-se, as it hardens the memory management for the generic
allocator genalloc.

Genalloc does not currently track the size of the allocations it hands
out.

Either by mistake, or due to an attack, it is possible that more memory
than what was initially allocated is freed, leaving behind dangling
pointers, ready for an use-after-free attack.

With this patch, genalloc becomes capable of tracking the size of each
allocation it has handed out, when it's time to free it.

It can either verify that the size received is correct, when free is
invoked, or it can decide autonomously how much memory to free, if the
value received for the size parameter is 0.

These patches are proposed for beign merged into linux-next, to verify
that they do not introduce regressions, by comparing the value received
from the callers of the free function with the internal tracking.

For this reason, the patchset does not contain the removal of the size
parameter from users of the free() function.

Later on, the "size" parameter can be dropped, and each caller can be
adjusted accordingly.

However, I do not have access to most of the HW required for confirming
that all of its users are not negatively affected.
This is where I believe having the patches in linux-next would help to
coordinate with the maintaiers of the code that uses gen_alloc.

Since there were comments about the (lack-of) efficiency introduced by
this patchset, I have added some more explanations and calculations to the
description of the first patch, the one adding the bitmap.
My conclusion is that this patch should not cause any major perfomance
problem.

Regarding the possibility of completely changing genalloc into some other
type of allocator, I think it should not be a showstopper for this
patchset, which aims to plug a security hole in genalloc, without
introducing any major regression.

The security flaw is clear and present, while the benefit of introducing a
new allocator is not clear, at least for the current users of genalloc.

And anyway the users of genalloc should be fixed to not pass any size
parameter, which can be done after this patch is merged.

A newer, more efficient allocator will still benefit from not receiving a
spurious parameter (size), when freeing memory.

Changes since v1:

[http://www.openwall.com/lists/kernel-hardening/2018/04/29/1]

* make the tester code a kernel module
* turn selftest BUG() error exit paths into WARN()
* add analysis of impact on current users of genalloc


Igor Stoppa (3):
  genalloc: track beginning of allocations
  Add label and license to genalloc.rst
  genalloc: selftest

 Documentation/core-api/genalloc.rst |   4 +
 include/linux/genalloc.h| 112 +++---
 lib/Kconfig.debug   |  23 ++
 lib/Makefile|   1 +
 lib/genalloc.c  | 742 ++--
 lib/test_genalloc.c | 419 
 6 files changed, 1046 insertions(+), 255 deletions(-)
 create mode 100644 lib/test_genalloc.c

-- 
2.14.1

[PATCH 1/3] genalloc: track beginning of allocations

2018-05-01 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
>From a security standpoint, such errors could be exploited - or even
forcibly caused - by an attacker, trying to obtain memory still in use
by live pointer.

Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed (potential attack)
* The memory being freed doesn't start at the beginning of an
  allocation. (potential attack)

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

The primary effect of this patch is that code using the gen_alloc
library does not need anymore to keep track of the size of the
allocations it makes.

Prior to this patch, it was necessary to keep track of the size of the
allocation, so that it would be possible, later on, to know how much
space should be freed.

Now, users of the api can choose to etiher still specify explicitly the
size, or let the library determine it, by giving a value of 0.

However, even when the value is specified, the library still uses its on
understanding of the space associated with a certain allocation, to
confirm that they are consistent.

This verification also confirms that the patch works correctly.

Eventually, the extra parameter (and the corresponding verification)
must be dropped, in favor of a simplified API.

But this patch allows to transition them individually, while ensuring
that the memory being freed has both a legal start and size.

>From the perspective of performance imapact, the patch doubles the size
of the bitmap that must be parsed, however the lockless allocations is
preserved.
Regarding the actual amount of bits occupied, the current users of
genalloc are of 2 main types and the size of one allocation unit in the
bitmap corresponds respectively to the following:

* SRAM / MURAM / small memory allocators, one allocation unit corresponds
to either 1 or few bytes of special memory.
However these kind of memory come in small sizes (few tens of kbytes),
because they are often on-die and tend to take up space.
Ex: 48kB SRAM, each byte mapped individually.
48kB -> 6kB bitmap prepatch, 12kB bitmap after patch
This type of memory is usually allocated during init or anyway at system
startup and then used to preserve system status while the rest of the
core/SoC is powered down.

* DMA memory pools, one allocation unit corresponds to 1 page
Ex: 1MB allocation request, with PAGE_SIZE = 4kB
1MB -> 32B bitmap prepatch, 64B bitmap after patch.
Assuming a cache line size of 64B, they would both fit in one.
Also in this case, DMA buffers are typically established when either the
system or a specific peripheral is brought up, but they are not
continuously allocated and destroyed. Even if the allocation were to
become slightly slower, it would not bring any significant performance
loss.

Furthermore, since the data in the bitmap is accessed sequentially, it
should be prefetched for read, which means tha the increased time spent
parsing the bitmap should be negligible, compared to the actual use of the
memory being allocated.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h | 112 +++
 lib/genalloc.c   | 742 ++-
 2 files changed, 599 insertions(+), 255 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..ff7229520656 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
@@ -93,74 +93,82 @@ struct genpool_data_fixed {
unsigned long offset;   /* The offset of the specific region */
 }

[PATCH 2/3] Add label and license to genalloc.rst

2018-05-01 Thread Igor Stoppa

Add SPDX license to genalloc.rst, then a label, to allow cross-referencing.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/genalloc.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/core-api/genalloc.rst 
b/Documentation/core-api/genalloc.rst
index 6b38a39fab24..0b5ade832ee8 100644
--- a/Documentation/core-api/genalloc.rst
+++ b/Documentation/core-api/genalloc.rst
@@ -1,3 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _genalloc:
+
 The genalloc/genpool subsystem
 ==
 
-- 
2.14.1

Re: [PATCH 0/3 v2] linux-next: mm: Track genalloc allocations

2018-05-02 Thread Igor Stoppa




On 03/05/18 01:50, Andrew Morton wrote:

On Wed,  2 May 2018 05:05:19 +0400 Igor Stoppa <igor.sto...@gmail.com> wrote:


This patchset was created as part of an older version of pmalloc, however
it has value per-se, as it hardens the memory management for the generic
allocator genalloc.

Genalloc does not currently track the size of the allocations it hands
out.

Either by mistake, or due to an attack, it is possible that more memory
than what was initially allocated is freed, leaving behind dangling
pointers, ready for an use-after-free attack.

With this patch, genalloc becomes capable of tracking the size of each
allocation it has handed out, when it's time to free it.

It can either verify that the size received is correct, when free is
invoked, or it can decide autonomously how much memory to free, if the
value received for the size parameter is 0.

These patches are proposed for beign merged into linux-next, to verify
that they do not introduce regressions, by comparing the value received
from the callers of the free function with the internal tracking.

For this reason, the patchset does not contain the removal of the size
parameter from users of the free() function.

Later on, the "size" parameter can be dropped, and each caller can be
adjusted accordingly.

However, I do not have access to most of the HW required for confirming
that all of its users are not negatively affected.
This is where I believe having the patches in linux-next would help to
coordinate with the maintaiers of the code that uses gen_alloc.

Since there were comments about the (lack-of) efficiency introduced by
this patchset, I have added some more explanations and calculations to the
description of the first patch, the one adding the bitmap.
My conclusion is that this patch should not cause any major perfomance
problem.

Regarding the possibility of completely changing genalloc into some other
type of allocator, I think it should not be a showstopper for this
patchset, which aims to plug a security hole in genalloc, without
introducing any major regression.

The security flaw is clear and present, while the benefit of introducing a
new allocator is not clear, at least for the current users of genalloc.

And anyway the users of genalloc should be fixed to not pass any size
parameter, which can be done after this patch is merged.

A newer, more efficient allocator will still benefit from not receiving a
spurious parameter (size), when freeing memory.

...

  Documentation/core-api/genalloc.rst |   4 +
  include/linux/genalloc.h| 112 +++---
  lib/Kconfig.debug   |  23 ++
  lib/Makefile|   1 +
  lib/genalloc.c  | 742 ++--
  lib/test_genalloc.c | 419 


That's a big patch,


True, but I am afraid I do not see how to split it further without 
braking bisection.


 and I'm having trouble believing that it's

justified?  We're trying to reduce the harm in bugs (none of which are
known to exist) in a small number of drivers to avoid exploits, none of
which are known to exist and which may not even be possible.


Should I create one, to justify the patch?
Maybe, what we are really discussing if security should be reactive or 
preventive. And what amount of extra complexity is acceptable, without a 
current, present threat that has already materialized.


My personal take is, if I see something that I think I could exploit, 
most likely those who do write exploits for a (really well paid) living 
can do much more harm than I can even think of.



Or something like that.  Perhaps all this is taking defensiveness a bit
too far?


My main goal was to remove the "size" parameter from the free() call, 
without introducing noticeable performance regression.


Is that a reasonable endeavor?

After all, we have IOMMUs also for preventing similar types of attack.

The current users of genalloc are primarily:
* SRAM memory managers, which are attractive because they are used for 
example to store system wide state inbetween transitions to off, when 
some components (like the MMU) might not be even active.


* DMA page allocators, another nice side channel, where a DMA controller 
could be used to completely side-step the type of protection enforced by 
the MMU



And a bitmap is a pretty crappy way of managing memory anyway, surely?


I did not put it there :-P
It also depends what one needs it for and if it's good enough.
Or if something better is justified.


If this code is indeed performance-sensitive then perhaps a
reimplementation with some standard textbook allocator(?) is warranted?


But, is it really performance sensitive?

I might be wrong, but I think this change that I am proposing is not 
really affecting performance.


I did get a question/comment about performance implications.

I have explained why I think my patches are not adding any real 
performance problem, i

[PATCH 0/2] mm: tweaks for improving use of vmap_area

2018-04-26 Thread Igor Stoppa

These two patches were written in preparation for the creation of
protectable memory, however their use is not limited to pmalloc and can
improve the use of virtally contigous memory.

The first provides a faster path from struct page to the vm_struct that
tracks it.

The second patch renames a single linked list field inside of vmap_area.
The list is currently used only for disposing of the data structure, once
it is not in use anymore.
Which means that it cold be used for other purposes while it'not queued
for destruction.

The patches can also be obtained from here:

https://github.com/Igor-security/linux/tree/preparations-for-mm


Igor Stoppa (2):
  struct page: add field for vm_struct
  vmalloc: rename llist field in vmap_area

 include/linux/mm_types.h | 1 +
 include/linux/vmalloc.h  | 2 +-
 mm/vmalloc.c | 8 +---
 3 files changed, 7 insertions(+), 4 deletions(-)

-- 
2.14.1

[PATCH 2/2] vmalloc: rename llist field in vmap_area

2018-04-26 Thread Igor Stoppa

The vmap_area structure has a field of type struct llist_node, named
purge_list and is used when performing lazy purge of the area.

Such field is left unused during the actual utilization of the
structure.

This patch renames the field to a more generic "area_list", to allow for
utilization outside of the purging phase.

Since the purging happens after the vmap_area is dismissed, its use is
mutually exclusive with any use performed while the area is allocated.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/vmalloc.h | 2 +-
 mm/vmalloc.c| 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..2d07dfef3cfd 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -47,7 +47,7 @@ struct vmap_area {
unsigned long flags;
struct rb_node rb_node; /* address sorted rbtree */
struct list_head list;  /* address sorted list */
-   struct llist_node purge_list;/* "lazy purge" list */
+   struct llist_node area_list;/* generic list of areas */
struct vm_struct *vm;
struct rcu_head rcu_head;
 };
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61a1ca22b0f6..1bb2233bb262 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -682,7 +682,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, 
unsigned long end)
lockdep_assert_held(_purge_lock);
 
valist = llist_del_all(_purge_list);
-   llist_for_each_entry(va, valist, purge_list) {
+   llist_for_each_entry(va, valist, area_list) {
if (va->va_start < start)
start = va->va_start;
if (va->va_end > end)
@@ -696,7 +696,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, 
unsigned long end)
flush_tlb_kernel_range(start, end);
 
spin_lock(_area_lock);
-   llist_for_each_entry_safe(va, n_va, valist, purge_list) {
+   llist_for_each_entry_safe(va, n_va, valist, area_list) {
int nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 
__free_vmap_area(va);
@@ -743,7 +743,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
_lazy_nr);
 
/* After this point, we may free va at any time */
-   llist_add(>purge_list, _purge_list);
+   llist_add(>area_list, _purge_list);
 
if (unlikely(nr_lazy > lazy_max_pages()))
try_purge_vmap_area_lazy();
-- 
2.14.1

[PATCH 1/2] struct page: add field for vm_struct

2018-04-26 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtain
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area.

This will avoid more expensive searches, later on.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
Reviewed-by: Jay Freyensee <why2jjj.li...@gmail.com>
Reviewed-by: Matthew Wilcox <mawil...@microsoft.com>
---
 include/linux/mm_types.h | 1 +
 mm/vmalloc.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..c74e2aa9a48b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -86,6 +86,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ebff729cc956..61a1ca22b0f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1536,6 +1536,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1705,6 +1706,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, 
gfp_t gfp_mask,
area->nr_pages = i;
goto fail;
}
+   page->area = area;
area->pages[i] = page;
if (gfpflags_allow_blocking(gfp_mask|highmem_mask))
cond_resched();
-- 
2.14.1

Re: [PATCH 0/3] linux-next: mm: hardening: Track genalloc allocations

2018-04-29 Thread Igor Stoppa


oops, sorry, I forgot the references :-(

On 29/04/18 20:39, Igor Stoppa wrote:



On 29/04/18 07:09, Matthew Wilcox wrote:

On Sun, Apr 29, 2018 at 06:45:39AM +0400, Igor Stoppa wrote:
This patchset was created as part of an older version of pmalloc, 
however

it has value per-se, as it hardens the memory management for the generic
allocator genalloc.

Genalloc does not currently track the size of the allocations it 
hands out.


Either by mistake, or due to an attack, it is possible that more memory
than what was initially allocated is freed, leaving behind dangling
pointers, ready for an use-after-free attack.


This is a good point.  It is worth hardening genalloc.
But I still don't like the cost of the bitmap.  I've been
reading about allocators and I found Bonwick's paper from 2001:
https://www.usenix.org/legacy/event/usenix01/full_papers/bonwick/bonwick.pdf 


Section 4 describes the vmem allocator which would seem to have superior
performance and lower memory overhead than the current genalloc 
allocator,

never mind the hardened allocator.

Maybe there's been an advnace in resource allocator technology since
then that someone more familiar with CS research can point out.


A quick search on google shows that there have been tons of improvements.

I found various implementation of vmem, not all with GPLv2 compatible 
license.


The most interesting one seems to be a libvmem from Intel[1], made to 
use jemalloc[2], for persistent memory.


jemalloc is, apaprently, the coolest kid on the block, when it comes to 
modern memory management.


But this is clearly a very large lump of work.

First of all, it should be assessed if jemalloc is really what the 
kernel could benefit from (my guess is yes, but it's just a guess), then 
if the license is compatible or if it can be relicensed for use in the 
kernel.


And, last, but not least, how to integrate the ongoing work in a way 
that doesn't require lots of effort to upgrade to new releases.


Even if it looks very interesting, I simply do not have time to do this, 
not for the next 5-6 months, for sure.


What I *can* offer to do, is the cleanup of the users of genalloc, by 
working with the various maintainers to remove the "size" parameter in 
the calls to gen_pool_free(), iff the patch I submitted can be merged.


This is work that has to be done anyway and does not preclude, later on, 
to phase out genalloc in favor of jemalloc or whatever is deemed to be 
the most effective solution.


There are 2 goals here:
1) plug potential security holes in the use of genalloc
2) see if some new allocator can improve the performance (and it might 
well be that the improvement can be extended also to other allocators 
used in kernel)


We seem to agree that 1) is a real need.
Regarding 2), I think it should have a separate life.

going back to 1), your objections so far, as far as I can tell are:

a) it will use more memory for the bitmap
b) it will be slower, because the bitmap is doubled
c) the "ends" or "starts" bitmaps should be separate

I think I have already answered them, but I'll recap my replies:

a) yes, it will double it, but if it was ok to "waste" some memory when 
I was asked to rewrite the pmalloc allocator to not use genalloc, in 
favor of speed, I think the same criteria applies here: on average it 
will probably take at most one more page per pool. It doesn't seem a 
huge loss.


b) the bitmap size is doubled, that much is true, however interleaving 
the "busy" and "start" bitmaps will ensure locality of the meta data and 
between the cache prefetch algorithm and the hints give to the compiler, 
it shouldn't make a huge difference, compared to the pre-patch case.
Oh, and the size of a bitmap seems to be overall negligible, from what 
users I saw.


c) "busy" and "start" are interleaved, to avoid having to do explicit 
locking, instead of relying on the intrinsic atomicity of accessing 
bitfields coming from the same word, as it is now.


And I'm anyway proposing to merge this into linux-next, so that there 
are more eyeballs looking for problems. I'm not proposing to merge it 
straight in the next kernel release.
Removing the size parameter from the various gen_pool_free() will impact 
not only the direct callers, but also their callers and so on, which 
means that it will take some time to purge all the layers of calls from 
"size".


During this time, it's likely that issues will surface, if there is any 
lurking.


And the removal of the parameter will require getting ACK from each 
user, so it should be enough to ensure that everyone is happy about the 
overall performance.


But I would start addressing the security issue, since I think the cost 
of doubling the bitmap will not be noticeable.


I'd like to hear if you disagree with my reasoning.

And did I overlook some other objection?



[1] https://github.com/pmem/pmdk/tree/master/src/libvmem
[2] http://jemalloc.net/

Re: [PATCH 0/3] linux-next: mm: hardening: Track genalloc allocations

2018-04-29 Thread Igor Stoppa




On 29/04/18 07:09, Matthew Wilcox wrote:

On Sun, Apr 29, 2018 at 06:45:39AM +0400, Igor Stoppa wrote:

This patchset was created as part of an older version of pmalloc, however
it has value per-se, as it hardens the memory management for the generic
allocator genalloc.

Genalloc does not currently track the size of the allocations it hands out.

Either by mistake, or due to an attack, it is possible that more memory
than what was initially allocated is freed, leaving behind dangling
pointers, ready for an use-after-free attack.


This is a good point.  It is worth hardening genalloc.
But I still don't like the cost of the bitmap.  I've been
reading about allocators and I found Bonwick's paper from 2001:
https://www.usenix.org/legacy/event/usenix01/full_papers/bonwick/bonwick.pdf
Section 4 describes the vmem allocator which would seem to have superior
performance and lower memory overhead than the current genalloc allocator,
never mind the hardened allocator.

Maybe there's been an advnace in resource allocator technology since
then that someone more familiar with CS research can point out.


A quick search on google shows that there have been tons of improvements.

I found various implementation of vmem, not all with GPLv2 compatible 
license.


The most interesting one seems to be a libvmem from Intel[1], made to 
use jemalloc[2], for persistent memory.


jemalloc is, apaprently, the coolest kid on the block, when it comes to 
modern memory management.


But this is clearly a very large lump of work.

First of all, it should be assessed if jemalloc is really what the 
kernel could benefit from (my guess is yes, but it's just a guess), then 
if the license is compatible or if it can be relicensed for use in the 
kernel.


And, last, but not least, how to integrate the ongoing work in a way 
that doesn't require lots of effort to upgrade to new releases.


Even if it looks very interesting, I simply do not have time to do this, 
not for the next 5-6 months, for sure.


What I *can* offer to do, is the cleanup of the users of genalloc, by 
working with the various maintainers to remove the "size" parameter in 
the calls to gen_pool_free(), iff the patch I submitted can be merged.


This is work that has to be done anyway and does not preclude, later on, 
to phase out genalloc in favor of jemalloc or whatever is deemed to be 
the most effective solution.


There are 2 goals here:
1) plug potential security holes in the use of genalloc
2) see if some new allocator can improve the performance (and it might 
well be that the improvement can be extended also to other allocators 
used in kernel)


We seem to agree that 1) is a real need.
Regarding 2), I think it should have a separate life.

going back to 1), your objections so far, as far as I can tell are:

a) it will use more memory for the bitmap
b) it will be slower, because the bitmap is doubled
c) the "ends" or "starts" bitmaps should be separate

I think I have already answered them, but I'll recap my replies:

a) yes, it will double it, but if it was ok to "waste" some memory when 
I was asked to rewrite the pmalloc allocator to not use genalloc, in 
favor of speed, I think the same criteria applies here: on average it 
will probably take at most one more page per pool. It doesn't seem a 
huge loss.


b) the bitmap size is doubled, that much is true, however interleaving 
the "busy" and "start" bitmaps will ensure locality of the meta data and 
between the cache prefetch algorithm and the hints give to the compiler, 
it shouldn't make a huge difference, compared to the pre-patch case.
Oh, and the size of a bitmap seems to be overall negligible, from what 
users I saw.


c) "busy" and "start" are interleaved, to avoid having to do explicit 
locking, instead of relying on the intrinsic atomicity of accessing 
bitfields coming from the same word, as it is now.


And I'm anyway proposing to merge this into linux-next, so that there 
are more eyeballs looking for problems. I'm not proposing to merge it 
straight in the next kernel release.
Removing the size parameter from the various gen_pool_free() will impact 
not only the direct callers, but also their callers and so on, which 
means that it will take some time to purge all the layers of calls from 
"size".


During this time, it's likely that issues will surface, if there is any 
lurking.


And the removal of the parameter will require getting ACK from each 
user, so it should be enough to ensure that everyone is happy about the 
overall performance.


But I would start addressing the security issue, since I think the cost 
of doubling the bitmap will not be noticeable.


I'd like to hear if you disagree with my reasoning.

And did I overlook some other objection?

--
igor

Re: [PATCH 3/3] genalloc: selftest

2018-04-29 Thread Igor Stoppa


On 29/04/18 07:36, Randy Dunlap wrote:

On 04/28/2018 07:45 PM, Igor Stoppa wrote:


[...]


+   test_genalloc();


Is there a stub for test_genalloc() when its config option is not enabled?
I don't see it.


I failed to add to the patch include/linux/test_genalloc.h :-/
That's where the stub is hiding.


diff --git a/lib/Kconfig b/lib/Kconfig
index 09565d779324..2bf89af50728 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -303,6 +303,21 @@ config DECOMPRESS_LZ4
  config GENERIC_ALLOCATOR
bool
  


These TEST_ kconfig symbols should be in lib/Kconfig.debug, not lib/Kconfig.


ok, I will fix it


+config TEST_GENERIC_ALLOCATOR
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR


This should depend on GENERIC_ALLOCATOR, not select it.

See TEST_PARMAN, TEST_BPF, TEST_FIRMWARE, TEST_SYSCTL, TEST_DEBUG_VIRTUAL
in lib/Kconfig.debug.


I was actually wondering about this.
The dependency I came up with allows to perform the test even if nothing 
is selecting genalloc, but ok, if this is how it is done, I'll adjust to 
it.



+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config TEST_GENERIC_ALLOCATOR_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select TEST_GENERIC_ALLOCATOR


depends on TEST_GENERIC_ALLOCATOR


ok

[...]


+ * guarranteed; allowing the boot to continue means risking to corrupt


   guaranteed;


hmmm

--

thanks, igor

Re: Correct way to access the physmap? - Was: Re: [PATCH 7/9] Pmalloc Rare Write: modify selected pools

2018-05-03 Thread Igor Stoppa




On 04/05/18 01:55, Dave Hansen wrote:

On 05/03/2018 02:52 PM, Igor Stoppa wrote:

At the end of the summit, we agreed that I would go through the physmap.


Do you mean the kernel linear map? 


Apparently I did mean it. It was confusing, because I couldn't find a 
single place stating it explicitly, like you just did.



That's just another name for the
virtual address that you get back from page_to_virt():

int *j = page_to_virt(vmalloc_to_page(i));



One reason why I was not sure is that also the linear mapping gets 
protected, when I protect hte corresponding page in the vmap_area:


if i do:

int *i = vmalloc(sizeof(int));
int *j = page_to_virt(vmalloc_to_page(i));
*i = 1;
set_memory_ro(i, 1);
*j = 2;

I get an error, because also *j has become read only.
I was expecting to have to do the protection of the linear mapping in a 
second phase.


It turns out that - at least on x86_64 - it's already in place.

But it invalidates what we agreed, which was based on the assumption 
that the linear mapping was writable all the time.


I see two options:

1) continue to go through the linear mapping, unprotecting it for the 
time it takes to make the write.


2) use the code I already wrote, which creates an additional, temporary 
mapping in R/W mode at a random address.



I'd prefer 2) because it is already designed to make life harder for 
someone trying to attack the data in the page: even if one manages to 
take over a core and busy loop on it, option 2) will use a random 
temporary address, that is harder to figure out.


Option 1) re-uses the linear mapping and therefore the attacker really 
only needs to get lucky and, depending on the target, write over some 
other data that was in the same page being unprotected, or overwrite the 
same data being updated, after the update has taken place.



Is there any objection if I continue with options 2?

--
igor

Correct way to access the physmap? - Was: Re: [PATCH 7/9] Pmalloc Rare Write: modify selected pools

2018-05-03 Thread Igor Stoppa




On 24/04/18 15:50, Matthew Wilcox wrote:

On Mon, Apr 23, 2018 at 04:54:56PM +0400, Igor Stoppa wrote:

While the vanilla version of pmalloc provides support for permanently
transitioning between writable and read-only of a memory pool, this
patch seeks to support a separate class of data, which would still
benefit from write protection, most of the time, but it still needs to
be modifiable. Maybe very seldom, but still cannot be permanently marked
as read-only.


This seems like a horrible idea that basically makes this feature useless.
I would say the right way to do this is to have:

struct modifiable_data {
struct immutable_data *d;
...
};

Then allocate a new pool, change d and destroy the old pool.


At the end of the summit, we agreed that I would go through the physmap.

But I'm not sure of what is the correct way to access it :-/

Starting from a vmalloc address, say:

int *i = vmalloc(sizeof(int));

I can get its linear counterpart:

int *j = page_to_virt(vmalloc_to_page(i));

and the physical address:

int *k = virt_to_phys(j);

But how do I get to the physmap?

I did not find much about it, apart from papers that talk about specific 
hardcoded addresses, but I would expect that if there is any hardcoded 
constant, by now, it's hidden behind some macro.


What I have verified, so far, at least on qemu x86_64, is that 
protecting "i" will also make "j" unwritable.


--
igor

[PATCH 2/3] Add label and license to genalloc.rst

2018-04-28 Thread Igor Stoppa

Add SPDX license to genalloc.rst, then a label, to allow cross-referencing.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/genalloc.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/core-api/genalloc.rst 
b/Documentation/core-api/genalloc.rst
index 6b38a39fab24..0b5ade832ee8 100644
--- a/Documentation/core-api/genalloc.rst
+++ b/Documentation/core-api/genalloc.rst
@@ -1,3 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _genalloc:
+
 The genalloc/genpool subsystem
 ==
 
-- 
2.14.1

[PATCH 1/3] genalloc: track beginning of allocations

2018-04-28 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

The primary effect of this patch is that code using the gen_alloc
library does not need anymore to keep track of the size of the
allocations it makes.

Prior to this patch, it was necessary to keep track of the size of the
allocation, so that it would be possible, later on, to know how much
space should be freed.

Now, users of the api can choose to etiher still specify explicitly the
size, or let the library determine it, by giving a value of 0.

However, even when the value is specified, the library still uses its on
understanding of the space associated with a certain allocation, to
confirm that they are consistent.

This verification also confirms that the patch works correctly.

Eventually, the extra parameter (and the corresponding verification)
could be dropped, in favor of a simplified API.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h | 112 +++
 lib/genalloc.c   | 742 ++-
 2 files changed, 599 insertions(+), 255 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..ff7229520656 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
@@ -93,74 +93,82 @@ struct genpool_data_fixed {
unsigned long offset;   /* The offset of the specific region */
 };
 
-extern struct gen_pool *gen_pool_create(int, int);
-extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
-extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
-size_t, int);
-/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- *   allocated on, or -1
- *
- * Add a new chunk of special memory to the specified pool.
- *
- * Returns 0 on success or a -ve errno on failure.
- */
+struct gen_pool *gen_pool_create(int min_alloc_order, int nid);
+
+int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
+ phys_addr_t phys, size_t size, int nid);
+
+
 static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr,
   size_t size, int nid)
 {
return gen_pool_add_virt(pool, addr, -1, size, nid);
 }
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
-   genpool_algo_t algo, void *data);
-extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
-   dma_addr_t *dma);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
-extern void gen_pool_for_each_chunk(struct gen_pool *,
-   void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
-extern size_t gen_pool_avail(struct gen_pool *);
-extern size_t gen_pool_size(struct gen_pool *);
 
-extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
-   void *data);
+phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr);
 
-extern un

[PATCH 3/3] genalloc: selftest

2018-04-28 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

The testing takes place in the very early stages of main.c, to ensure
that failures in genalloc are caught before they can cause unexplained
erratic behavior in any of genalloc users.

Therefore, it would not be advisable to implement it as module.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 init/main.c |   2 +
 lib/Kconfig |  15 ++
 lib/Makefile|   1 +
 lib/test_genalloc.c | 410 
 4 files changed, 428 insertions(+)
 create mode 100644 lib/test_genalloc.c

diff --git a/init/main.c b/init/main.c
index b795aa341a3a..b3b319d91b0e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -91,6 +91,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -679,6 +680,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   test_genalloc();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index 09565d779324..2bf89af50728 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -303,6 +303,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config TEST_GENERIC_ALLOCATOR
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config TEST_GENERIC_ALLOCATOR_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select TEST_GENERIC_ALLOCATOR
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index 384713ff70d3..2c66346ab246 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -114,6 +114,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_TEST_GENERIC_ALLOCATOR) += test_genalloc.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/test_genalloc.c b/lib/test_genalloc.c
new file mode 100644
index ..ab9984861517
--- /dev/null
+++ b/lib/test_genalloc.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_genalloc.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@...wei.com>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+/*
+ * In case of failure of any of these tests, memory corruption is almost
+ * guarranteed; allowing the boot to continue means risking to corrupt
+ * also any filesystem/block device accessed write mode.
+ * Therefore, BUG_ON() is used, when testing.
+ */
+
+
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_TEST_GENERIC_ALLOCATOR_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   for (i = 1; i <= BITMAP_SIZE_C; i++)
+   bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+   *bm = '\0';
+   pr_notice("chunk: %pbitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+   CMD_ALLOCATOR,
+   CMD_ALLOCATE,
+   CMD_FLUSH,
+   CMD_FREE,
+   CMD_NUMBER,
+   CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+   void *null;
+};
+
+struct test_allocator {
+   genpool_algo_t algo;
+   union {
+   struct genpool_data_align align;
+   struct genpool_

[PATCH 0/3] linux-next: mm: hardening: Track genalloc allocations

2018-04-28 Thread Igor Stoppa

This patchset was created as part of an older version of pmalloc, however
it has value per-se, as it hardens the memory management for the generic
allocator genalloc.

Genalloc does not currently track the size of the allocations it hands out.

Either by mistake, or due to an attack, it is possible that more memory
than what was initially allocated is freed, leaving behind dangling
pointers, ready for an use-after-free attack.

With this patch, genalloc becomes capable of tracking the size of each
allocation it has handed out, when it's time to free it.

It can either verify that the size received, when free is invoked, is
correct, or it can decide autonomously how much memory to free, if the
value received for the size parameter is 0.

These patches are proposed for beign merged into linux-next, to verify
that they do not introduce regressions, by comparing the value received
from the callers of the free function with the internal tracking.

Later on, the "size" parameter can be dropped, and each caller can be
adjusted accordingly.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>

Igor Stoppa (3):
  genalloc: track beginning of allocations
  Add label and license to genalloc.rst
  genalloc: selftest

 Documentation/core-api/genalloc.rst |   4 +
 include/linux/genalloc.h| 112 +++---
 init/main.c |   2 +
 lib/Kconfig |  15 +
 lib/Makefile|   1 +
 lib/genalloc.c  | 742 ++--
 lib/test_genalloc.c | 410 
 7 files changed, 1031 insertions(+), 255 deletions(-)
 create mode 100644 lib/test_genalloc.c

-- 
2.14.1

[PATCH 5/6] Documentation for Pmalloc

2018-01-24 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/pmalloc.txt | 104 +
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.txt

diff --git a/Documentation/core-api/pmalloc.txt 
b/Documentation/core-api/pmalloc.txt
new file mode 100644
index 000..9c39672
--- /dev/null
+++ b/Documentation/core-api/pmalloc.txt
@@ -0,0 +1,104 @@
+
+Protectable memory allocator
+
+
+Introduction
+
+
+When trying to perform an attack toward a system, the attacker typically
+wants to alter the execution flow, in a way that allows actions which
+would otherwise be forbidden.
+
+In recent years there has been lots of effort in preventing the execution
+of arbitrary code, so the attacker is progressively pushed to look for
+alternatives.
+
+If code changes are either detected or even prevented, what is left is to
+alter kernel data.
+
+As countermeasure, constant data is collected in a section which is then
+marked as readonly.
+To expand on this, also statically allocated variables which are tagged
+as __ro_after_init will receive a similar treatment.
+The difference from constant data is that such variables can be still
+altered freely during the kernel init phase.
+
+However, such solution does not address those variables which could be
+treated essentially as read-only, but whose size is not known at compile
+time or cannot be fully initialized during the init phase.
+
+
+Design
+--
+
+pmalloc builds on top of genalloc, using the same concept of memory pools
+A pool is a handle to a group of chunks of memory of various sizes.
+When created, a pool is empty. It will be populated by allocating chunks
+of memory, either when the first memory allocation request is received, or
+when a pre-allocation is performed.
+
+Either way, one or more memory pages will be obtaiend from vmalloc and
+registered in the pool as chunk. Subsequent requests will be satisfied by
+either using any available free space from the current chunks, or by
+allocating more vmalloc pages, should the current free space not suffice.
+
+This is the key point of pmalloc: it groups data that must be protected
+into a set of pages. The protection is performed through the mmu, which
+is a prerequisite and has a minimum granularity of one page.
+
+If the relevant variables were not grouped, there would be a problem of
+allowing writes to other variables that might happen to share the same
+page, but require further alterations over time.
+
+A pool is a group of pages that are write protected at the same time.
+Ideally, they have some high level correlation (ex: they belong to the
+same module), which justifies write protecting them all together.
+
+To keep it to a minimum, locking is left to the user of the API, in
+those cases where it's not strictly needed.
+Ideally, no further locking is required, since each module can have own
+pool (or pools), which should, for example, avoid the need for cross
+module or cross thread synchronization about write protecting a pool.
+
+The overhead of creating an additional pool is minimal: a handful of bytes
+from kmalloc space for the metadata and then what is left unused from the
+page(s) registered as chunks.
+
+Compared to plain use of vmalloc, genalloc has the advantage of tightly
+packing the allocations, reducing the number of pages used and therefore
+the pressure on the TLB. The slight overhead in execution time of the
+allocation should be mostly irrelevant, because pmalloc memory is not
+meant to be allocated/freed in tight loops. Rather it ought to be taken
+in use, initialized and write protected. Possibly destroyed.
+
+Considering that not much data is supposed to be dynamically allocated
+and then marked as read-only, it shouldn't be an issue that the address
+range for pmalloc is limited, on 32-bit systemd.
+
+Regarding SMP systems, the allocations are expected to happen mostly
+during an initial transient, after which there should be no more need to
+perform cross-processor synchronizations of page tables.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+2. [optional] pre-allocate some memory in the pool
+3. issue one or more allocation requests to the pool
+4. initialize the memory obtained
+   - iterate over points 3 & 4 as needed -
+5. write protect the pool
+6. use in read-only mode the handlers obtained throguh the allocations
+7. [optional] destroy the pool
+
+
+In a scenario where, for example due to some error, part or all of the
+allocations performed at point 3 must be reverted, it is possible to free
+them, as long as point 5 has not been executed, and the pool is still
+modifiable. Such freed memory can be re-used.
+Performing a free operation on a write-protected pool will, instead,
+simply release the corresponding m

[PATCH 6/6] Pmalloc: self-test

2018-01-24 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 mm/Kconfig|  7 ++
 mm/Makefile   |  1 +
 mm/pmalloc-selftest.c | 65 +++
 mm/pmalloc-selftest.h | 30 
 mm/pmalloc.c  |  3 +++
 5 files changed, 106 insertions(+)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8f..1de6ea6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,10 @@ config GUP_BENCHMARK
  performance of get_user_pages_fast().
 
  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY_SELFTEST
+   bool "Run self test for pmalloc memory allocator"
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index a6a47e1..1e76a9b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index 000..1c025f3
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,65 @@
+/*
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)   \
+   pr_notice("must be " expected ": %s",   \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)\
+   validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+   struct gen_pool *pool_unprot;
+   struct gen_pool *pool_prot;
+   void *var_prot, *var_unprot, *var_vmall;
+
+   pr_notice("pmalloc self-test");
+   pool_unprot = pmalloc_create_pool("unprotected", 0);
+   pool_prot = pmalloc_create_pool("protected", 0);
+   BUG_ON(!(pool_unprot && pool_prot));
+
+   var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+   var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+   var_vmall = vmalloc(SIZE_2);
+   is_alloc_ok(var_unprot, 10);
+   is_alloc_ok(var_unprot, SIZE_1);
+   is_alloc_ok(var_unprot, PAGE_SIZE);
+   is_alloc_no(var_unprot, SIZE_1 + 1);
+   is_alloc_no(var_vmall, 10);
+
+
+   pfree(pool_unprot, var_unprot);
+   vfree(var_vmall);
+
+   pmalloc_protect_pool(pool_prot);
+
+   /* This will intentionally trigger a WARN because the pool being
+* destroyed is not protected, which is unusual and should happen
+* on error paths only, where probably other warnings are already
+* displayed.
+*/
+   pmalloc_destroy_pool(pool_unprot);
+
+   /* This must not cause WARNings */
+   pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index 000..3673d23
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __PMALLOC_SELFTEST_H__
+#define __PMALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+#include 
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index a64ac49..a722d7b 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include "pmalloc-selftest.h"
+
 /**
  * pmalloc_data contains the data specific to a pmalloc pool,
  * in a format compatible with the design of gen_alloc.
@@ -508,6 +510,7 @@ static int __init pmalloc_late_init(void)
}
}
mutex_unlock(_mutex);
+   pmalloc_selftest();
return 0;
 }
 late_initcall(pmalloc_late_init);
-- 
2.9.3

[PATCH 4/6] Protectable Memory

2018-01-24 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 215 
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Makefile  |   1 +
 mm/pmalloc.c | 513 +++
 mm/usercopy.c|  25 ++-
 7 files changed, 781 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index a8fdabf..9f2974f 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000..cb18739
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,215 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * a

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-01-26 Thread Igor Stoppa

On 26/01/18 07:35, Matthew Wilcox wrote:
> On Wed, Jan 24, 2018 at 08:10:53PM +0100, Jann Horn wrote:
>> I'm not entirely convinced by the approach of marking small parts of
>> kernel memory as readonly for hardening.
> 
> It depends how significant the data stored in there are.  For example,
> storing function pointers in read-only memory provides significant
> hardening.
> 
>> You're allocating with vmalloc(), which, as far as I know, establishes
>> a second mapping in the vmalloc area for pages that are already mapped
>> as RW through the physmap. AFAICS, later, when you're trying to make
>> pages readonly, you're only changing the protections on the second
>> mapping in the vmalloc area, therefore leaving the memory writable
>> through the physmap. Is that correct? If so, please either document
>> the reasoning why this is okay or change it.
> 
> Yes, this is still vulnerable to attacks through the physmap.  That's also
> true for marking structs as const.  We should probably fix that at some
> point, but at least they're not vulnerable to heap overruns by small
> amounts ... you have to be able to overrun some other array by terabytes.

Actually, I think there is something to say in favor of using a vmalloc
based approach, precisely because of the physmap :-P

If I understood correctly, the physmap is primarily meant to speed up
access to physical memory through the TLB. In particular, for kmalloc
based allocations.

Which means that, to perform a physmap-based attack to a kmalloced
allocation, one needs to know:

- the address of the target variable in the kmalloc range
- the randomized offset of the kernel
- the location of the physmap

But, for a vmalloc based allocation, there is one extra hoop: since the
mapping is really per page, now the attacker has actually to walk the
page table, to figure out where to poke in the physmap.

One more thought about physmap: does it map also code?
Because, if it does, and one wants to use it for an attack, isn't it
easier to look for some security test and replace a bne with be or
equivalent?

> It's worth having a discussion about whether we want the pmalloc API
> or whether we want a slab-based API.

pmalloc is meant to be useful where the attack surface is made up of
lots of small allocations - my first use case was the SE Linux policy
DB, where there is a variety of elements being allocated, in large
amount. To the point where having ready made caches would be wasteful.

Then there is the issue I already mentioned about arm/arm64 which would
require to break down large mappings, which seems to be against current
policy, as described in my previous mail:

http://www.openwall.com/lists/kernel-hardening/2018/01/24/11

I do not know exactly what you have in mind wrt slab, but my impression
is that it will most likely gravitate toward the pmalloc implementation.
It will need:

- "pools" or anyway some means to lock only a certain group of pages,
related to a specific kernel user

- (mostly) lockless allocation

- a way to manage granularity (or order of allocation)

Most of this is already provided by genalloc, which is what I ended up
almost re-implementing, before being pointed to it :-)

I only had to add the tracking of end of allocations, which is what the
patch 1/6 does - as side note, is anybody maintaining it?
I could not find an entry in MAINTAINERS

As I mentioned above, using vmalloc adds even an extra layer of protection.

The major downside is the increased TLB use, however this is not so
relevant for the volumes of data that I had to deal with so far:
only few 4K pages.

But you might have in mind something else.
I'd be interested to know what and what would be an obstacle in using
pmalloc. Maybe it can be solved.

--
igor

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-01-26 Thread Igor Stoppa

On 25/01/18 17:38, Jerome Glisse wrote:
> On Thu, Jan 25, 2018 at 10:14:28AM -0500, Boris Lukashev wrote:
>> On Thu, Jan 25, 2018 at 6:59 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:
> 
> [...]
> 
>> DMA/physmap access coupled with a knowledge of which virtual mappings
>> are in the physical space should be enough for an attacker to bypass
>> the gating mechanism this work imposes. Not trivial, but not
>> impossible. Since there's no way to prevent that sort of access in
>> current hardware (especially something like a NIC or GPU working
>> independently of the CPU altogether)

[...]

> I am not saying that this can not happen but that we are trying our best
> to avoid it.

How about an opt-in verification, similar to what proposed by Boris
Lukashev?

When reading back the data, one could access the pointer directly and
bypass the verification, or could use a function that explicitly checks
the integrity of the data.

Starting from an unprotected kmalloc allocation, even just turning the
data into R/O is an improvement, but if one can afford the overhead of
performing the verification, why not?

It would still be better if the service was provided by the library,
instead than implemented by individual users, I think.

--
igor

Re: [PATCH 4/6] Protectable Memory

2018-01-26 Thread Igor Stoppa

On 24/01/18 19:56, Igor Stoppa wrote:

[...]

> +bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
> +{

[...]

> +abort:
> + vfree(chunk);

this should be vfree_atomic()

[...]

> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
> +{

[...]

> +free:
> + vfree(chunk);

and this one too

I will fix them in the next iteration.
I am waiting to see if any more comments arrive.
Otherwise, I'll send it out probably next Tuesday.

--
igor

Re: [PATCH 5/6] Documentation for Pmalloc

2018-01-24 Thread Igor Stoppa



On 24/01/18 21:14, Ralph Campbell wrote:
> 2 Minor typos inline below:

thanks for proof-reading, will fix accordingly.

--
igor

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-01-25 Thread Igor Stoppa

Hi,

thanks for the review. My reply below.

On 24/01/18 21:10, Jann Horn wrote:

> I'm not entirely convinced by the approach of marking small parts of
> kernel memory as readonly for hardening.

Because of the physmap you mention later?

Regarding small parts vs big parts (what is big enough?) I did propose
the use of a custom zone at the very beginning, however I met 2 objections:

1. It's not a special case and there was no will to reserve another zone
   This might be mitigated by aliasing with a zone that is already
   defined, but not in use. For example DMA or DMA32.
   But it looks like a good way to replicate the confusion that is page
   struct. Anyway, I found the next objection more convincing.

2. What would be the size of this zone? It would become something that
   is really application specific. At the very least it should become a
   command line parameter. A distro would have to allocate a lot of
   memory for it, because it cannot really know upfront what its users
   will do. But, most likely, the vast majority of users would never
   need that much.

If you have some idea of how to address these objections without using
vmalloc, or at least without using the same page provider that vmalloc
is using now, I'd be interested to hear it.

Besides the double mapping problem, the major benefit I can see from
having a contiguous area is that it simplifies the hardened user copy
verification, because there is a fixed range to test for overlap.

> Comments on some details are inline.

thank you

>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 1e5d8c3..116d280 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -20,6 +20,7 @@ struct notifier_block;/* in notifier.h */
>>  #define VM_UNINITIALIZED   0x0020  /* vm_struct is not fully 
>> initialized */
>>  #define VM_NO_GUARD0x0040  /* don't add guard page */
>>  #define VM_KASAN   0x0080  /* has allocated kasan 
>> shadow memory */
>> +#define VM_PMALLOC 0x0100  /* pmalloc area - see docs */
> 
> Is "see docs" specific enough to actually guide the reader to the
> right documentation?

The doc file is named pmalloc.txt, but I can be more explicit.

>> +#define pmalloc_attr_init(data, attr_name) \
>> +do { \
>> +   sysfs_attr_init(>attr_##attr_name.attr); \
>> +   data->attr_##attr_name.attr.name = #attr_name; \
>> +   data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
>> +   data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
>> +} while (0)
> 
> Is there a good reason for making all these files mode 0444 (as
> opposed to setting them to 0400 and then allowing userspace to make
> them accessible if desired)? /proc/slabinfo contains vaguely similar
> data and is mode 0400 (or mode 0600, depending on the kernel config)
> AFAICS.

ok, you do have a point, so far I have been mostly focusing on the

"drop-in replacement for kmalloc" aspect.

>> +void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
>> +{
> [...]
>> +   /* Expand pool */
>> +   chunk_size = roundup(size, PAGE_SIZE);
>> +   chunk = vmalloc(chunk_size);
> 
> You're allocating with vmalloc(), which, as far as I know, establishes
> a second mapping in the vmalloc area for pages that are already mapped
> as RW through the physmap. AFAICS, later, when you're trying to make
> pages readonly, you're only changing the protections on the second
> mapping in the vmalloc area, therefore leaving the memory writable
> through the physmap. Is that correct? If so, please either document
> the reasoning why this is okay or change it.

About why vmalloc as backend for pmalloc, please refer to this:

http://www.openwall.com/lists/kernel-hardening/2018/01/24/11

I tried to give a short summary of what took me toward vmalloc.
vmalloc is also a convenient way of obtaining arbitrarily (within
reason) large amounts of virtually contiguous memory.

Your objection is toward the unprotected access, through the alternate
mapping, rather than to the idea of having pools that can be protected
individually, right?

In the mail I linked, I explained that I could not use kmalloc because
of the problem of splitting huge pages on ARM.

kmalloc does require the physmap, for performance reason.

However, vmalloc is already doing mapping of individual pages, because
it must ensure that they are virtually contiguous, so would it be
possible to have vmalloc _always_ outside of the physmap?

If I have understood correctly, the actual extension of physmap is
highly architecture and platform dependant, so it might be (but I have
not checked) that in some cases (like some 32bit systems) vmalloc is
typically outside of physmap, but probably that is not the case on 64bit?

Also, I need to understand how physmap works against vmalloc vs how it
works against kernel text and const/__ro_after_init sections.

Can they also be accessed

[RFC PATCH v11 0/6] mm: security: ro protection for dynamic data

2018-01-24 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since the v10 version:

Initially I tried to provide support for hardening the LSM hooks, but the
LSM code was too much in a flux to have some chance to be merged.

Several drop-in replacement for kmalloc based functions, for example
kzalloc.

>From this perspective I have also modified genalloc, to make its free
functionality follow more closely the kfree, which doesn't need to be told
the size of the allocation being released. This was sent out for review
twice, but it has not received any feedback, so far.

Also genalloc now comes with self-testing.

The latest can be found also here:

https://www.spinics.net/lists/kernel/msg2696152.html

The need to integrate with hardened user copy has driven an optimization
in the management of vmap_areas, where each struct page in a vmalloc area
has a reference to it, saving the search through the various areas.

I was planning - and can still do it - to provide hardening for some IMA
data, but in the meanwhile it seems that the XFS developers might be
interested in htis functionality:

http://www.openwall.com/lists/kernel-hardening/2018/01/24/1

So I'm sending it out as preview.


Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Documentation for Pmalloc
  Pmalloc: self-test

 Documentation/core-api/pmalloc.txt | 104 
 include/linux/genalloc-selftest.h  |  30 +++
 include/linux/genalloc.h   |   6 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 215 
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 ++
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 402 +
 lib/genalloc.c | 444 +--
 mm/Kconfig |   7 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  65 +
 mm/pmalloc-selftest.h  |  30 +++
 mm/pmalloc.c   | 516 +
 mm/usercopy.c  |  25 +-
 mm/vmalloc.c   |  18 +-
 18 files changed, 1744 insertions(+), 140 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.txt
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.9.3

[PATCH 1/6] genalloc: track beginning of allocations

2018-01-24 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

--

This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")

11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid

Ex, with the same notation as above - MSb...LSb:

 ...100010101011   <-- Read in this direction.
\__|\__|\|\|\__|
   |   | | |   \___ 4 used allocation units
   |   | | \___ 3 empty allocation units
   |   | \_ 1 used allocation unit
   |   \___ 2 used allocation units
   \___ 2 empty allocation units

Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.

When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).

The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.

About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.

This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +-
 lib/genalloc.c   | 417 ---
 2 files changed, 289 insertions(+), 131 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 6dfec4d..a8fdabf 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,6 +32,7 @@
 
 #include 
 #include 
+#include 
 
 struct device;
 struct device_node;
@@ -75,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 144fe6b..13bc8cf 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
 #include 
 #include 
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LON

[PATCH 2/6] genalloc: selftest

2018-01-24 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  30 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 ++
 5 files changed, 450 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index 000..7af1901
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index 0e4d39c..2bdacb9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -87,6 +87,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -649,6 +650,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index b1445b2..1c535f4 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -291,6 +291,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index b8f2c16..ff7ad5f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000..007a0cf
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   f

[PATCH 3/6] struct page: add field for vm_struct

2018-01-24 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b..c3a4825 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6739420..44c5dfc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int page_counter;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+   area->pages[page_counter]->area = area;
+
return addr;
 
 fail:
-- 
2.9.3

[PATCH 2/2] genalloc: selftest

2018-01-11 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  30 +++
 init/main.c   |   2 +
 lib/Kconfig   |  14 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 ++
 5 files changed, 449 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index 000..7af1901
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index 0e4d39c..2bdacb9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -87,6 +87,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -649,6 +650,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index b1445b2..89fa195 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -291,6 +291,20 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index b8f2c16..ff7ad5f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000..007a0cf
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   for (i = 1; i <= BITMAP_SIZE_C; i++)
+   b

[RESEND PATCH v2 0/2] mm: genalloc - track beginning of allocations

2018-01-11 Thread Igor Stoppa

This is a partial resend:
- the primary functionality (PATCH 1/2) is unmodified
- while waiting for review, I added selftest capability for genalloc (2/2)


During the effort of introducing in the kernel an allocator for
protectable memory (pmalloc), it was noticed that genalloc can be
improved, to know how to separate the memory use by adjacent allocations

However, it seems that the functionality could have a value of its own.

It can:
- verify that the freeing of memory is consistent with previous allocations
- relieve the user of the API from tracking the size of each allocation
- enable use cases where generic code can free memory allocations received
  through a pointer (provided that the reference pool is known)

Details about the implementation are provided in the comment for the patch.

I mentioned this idea few months ago, as part of the pmalloc discussion,
but then I did not have time to follow-up immediately, as I had hoped.

This is an implementation of what I had in mind.
It seems to withstand several test cases i put together, in the form of
self-test, but it definitely would need thorough review.


I hope I have added as reviewer all the relevant people.
If I missed someone, please include them to the recipients.



Igor Stoppa (2):
  genalloc: track beginning of allocations
  genalloc: selftest

 include/linux/genalloc-selftest.h |  30 +++
 include/linux/genalloc.h  |   3 +-
 init/main.c   |   2 +
 lib/Kconfig   |  14 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 
 lib/genalloc.c| 417 ++
 7 files changed, 738 insertions(+), 131 deletions(-)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

-- 
2.9.3

[PATCH 1/2] genalloc: track beginning of allocations

2018-01-11 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

--

This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")

11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid

Ex, with the same notation as above - MSb...LSb:

 ...100010101011   <-- Read in this direction.
\__|\__|\|\|\__|
   |   | | |   \___ 4 used allocation units
   |   | | \___ 3 empty allocation units
   |   | \_ 1 used allocation unit
   |   \___ 2 used allocation units
   \___ 2 empty allocation units

Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.

When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).

The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.

About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.

This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +-
 lib/genalloc.c   | 417 ---
 2 files changed, 289 insertions(+), 131 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 6dfec4d..a8fdabf 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,6 +32,7 @@
 
 #include 
 #include 
+#include 
 
 struct device;
 struct device_node;
@@ -75,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 144fe6b..13bc8cf 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
 #include 
 #include 
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LON

[PATCH 1/6] genalloc: track beginning of allocations

2018-01-30 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

--

This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")

11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid

Ex, with the same notation as above - MSb...LSb:

 ...100010101011   <-- Read in this direction.
\__|\__|\|\|\__|
   |   | | |   \___ 4 used allocation units
   |   | | \___ 3 empty allocation units
   |   | \_ 1 used allocation unit
   |   \___ 2 used allocation units
   \___ 2 empty allocation units

Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.

When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).

The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.

About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.

This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   2 +-
 lib/genalloc.c   | 417 ---
 2 files changed, 288 insertions(+), 131 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930..0377681 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc..dde7830 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
 #include 
 #include 
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRI

[PATCH 2/6] genalloc: selftest

2018-01-30 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  30 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 ++
 5 files changed, 450 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index 000..7af1901
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b9..fb844aa 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index c5e84fb..430026d0 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index d11c48e..ba06e83 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000..007a0cf
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+

[PATCH 4/6] Protectable Memory

2018-01-30 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 216 
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Makefile  |   1 +
 mm/pmalloc.c | 513 +++
 mm/usercopy.c|  25 ++-
 7 files changed, 782 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include 
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later mem

[PATCH 3/6] struct page: add field for vm_struct

2018-01-30 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..2abd540 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -56,6 +56,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6739420..44c5dfc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int page_counter;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+   area->pages[page_counter]->area = area;
+
return addr;
 
 fail:
-- 
2.9.3

[PATCH 5/6] Documentation for Pmalloc

2018-01-30 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/pmalloc.txt | 104 +
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.txt

diff --git a/Documentation/core-api/pmalloc.txt 
b/Documentation/core-api/pmalloc.txt
new file mode 100644
index 000..934d356
--- /dev/null
+++ b/Documentation/core-api/pmalloc.txt
@@ -0,0 +1,104 @@
+
+Protectable memory allocator
+
+
+Introduction
+
+
+When trying to perform an attack toward a system, the attacker typically
+wants to alter the execution flow, in a way that allows actions which
+would otherwise be forbidden.
+
+In recent years there has been lots of effort in preventing the execution
+of arbitrary code, so the attacker is progressively pushed to look for
+alternatives.
+
+If code changes are either detected or even prevented, what is left is to
+alter kernel data.
+
+As countermeasure, constant data is collected in a section which is then
+marked as readonly.
+To expand on this, also statically allocated variables which are tagged
+as __ro_after_init will receive a similar treatment.
+The difference from constant data is that such variables can be still
+altered freely during the kernel init phase.
+
+However, such solution does not address those variables which could be
+treated essentially as read-only, but whose size is not known at compile
+time or cannot be fully initialized during the init phase.
+
+
+Design
+--
+
+pmalloc builds on top of genalloc, using the same concept of memory pools
+A pool is a handle to a group of chunks of memory of various sizes.
+When created, a pool is empty. It will be populated by allocating chunks
+of memory, either when the first memory allocation request is received, or
+when a pre-allocation is performed.
+
+Either way, one or more memory pages will be obtained from vmalloc and
+registered in the pool as chunk. Subsequent requests will be satisfied by
+either using any available free space from the current chunks, or by
+allocating more vmalloc pages, should the current free space not suffice.
+
+This is the key point of pmalloc: it groups data that must be protected
+into a set of pages. The protection is performed through the mmu, which
+is a prerequisite and has a minimum granularity of one page.
+
+If the relevant variables were not grouped, there would be a problem of
+allowing writes to other variables that might happen to share the same
+page, but require further alterations over time.
+
+A pool is a group of pages that are write protected at the same time.
+Ideally, they have some high level correlation (ex: they belong to the
+same module), which justifies write protecting them all together.
+
+To keep it to a minimum, locking is left to the user of the API, in
+those cases where it's not strictly needed.
+Ideally, no further locking is required, since each module can have own
+pool (or pools), which should, for example, avoid the need for cross
+module or cross thread synchronization about write protecting a pool.
+
+The overhead of creating an additional pool is minimal: a handful of bytes
+from kmalloc space for the metadata and then what is left unused from the
+page(s) registered as chunks.
+
+Compared to plain use of vmalloc, genalloc has the advantage of tightly
+packing the allocations, reducing the number of pages used and therefore
+the pressure on the TLB. The slight overhead in execution time of the
+allocation should be mostly irrelevant, because pmalloc memory is not
+meant to be allocated/freed in tight loops. Rather it ought to be taken
+in use, initialized and write protected. Possibly destroyed.
+
+Considering that not much data is supposed to be dynamically allocated
+and then marked as read-only, it shouldn't be an issue that the address
+range for pmalloc is limited, on 32-bit systems.
+
+Regarding SMP systems, the allocations are expected to happen mostly
+during an initial transient, after which there should be no more need to
+perform cross-processor synchronizations of page tables.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+2. [optional] pre-allocate some memory in the pool
+3. issue one or more allocation requests to the pool
+4. initialize the memory obtained
+   - iterate over points 3 & 4 as needed -
+5. write protect the pool
+6. use in read-only mode the handlers obtained through the allocations
+7. [optional] destroy the pool
+
+
+In a scenario where, for example due to some error, part or all of the
+allocations performed at point 3 must be reverted, it is possible to free
+them, as long as point 5 has not been executed, and the pool is still
+modifiable. Such freed memory can be re-used.
+Performing a free operation on a write-protected pool will, instead,
+simply release the corresponding m

[RFC PATCH v12 0/6] mm: security: ro protection for dynamic data

2018-01-30 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since the v11 version:
[http://www.openwall.com/lists/kernel-hardening/2018/01/24/4]

- restricted access to sysfs entries created (444 -> 400)
- more explicit reference to documentation
- couple of typos

Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Documentation for Pmalloc
  Pmalloc: self-test

 Documentation/core-api/pmalloc.txt | 104 
 include/linux/genalloc-selftest.h  |  30 +++
 include/linux/genalloc.h   |   5 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 216 
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 ++
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 402 +
 lib/genalloc.c | 444 +--
 mm/Kconfig |   7 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  65 +
 mm/pmalloc-selftest.h  |  30 +++
 mm/pmalloc.c   | 516 +
 mm/usercopy.c  |  25 +-
 mm/vmalloc.c   |  18 +-
 18 files changed, 1744 insertions(+), 140 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.txt
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.9.3

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-01-30 Thread Igor Stoppa

On 26/01/18 18:36, Boris Lukashev wrote:
> I like the idea of making the verification call optional for consumers
> allowing for fast/slow+hard paths depending on their needs.
> Cant see any additional vectors for abuse (other than the original
> ones effecting out-of-band modification) introduced by having
> verify/normal callers, but i've not had enough coffee yet. Any access
> races or things like that come to mind for anyone?

Well, the devil is in the details.
In this case, the question is how to perform the verification in a way
that is sufficiently robust against races.

After thinking about it for a while, I doubt it can be done reliably.
It might work for some small data types, but the typical use case I have
found myself dealing with, is protecting data structures.

That also brings up a separate problem: what would be the size of data
to hash? At one extreme there is a page, but it's probably too much, so
what is the correct size? it cannot be smaller than a specific
allocation, however that would imply looking for the hash related to the
data being accessed, with extra overhead.

And the data being accessed might be a field in a struct, for which we
would not have any hash.
There would be a hash only for the containing struct that was allocated ...

Overall, it seems a good idea in theory, but when I think about its
implementation, it seems like the overhead is so big that it would
discourage its use for almost any practical purpose.

If one really wants to be paranoid could, otoh have redundancy in a
different pool.

--
igor

[PATCH 6/6] Pmalloc: self-test

2018-01-30 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 lib/genalloc.c|  2 +-
 mm/Kconfig|  7 ++
 mm/Makefile   |  1 +
 mm/pmalloc-selftest.c | 65 +++
 mm/pmalloc-selftest.h | 30 
 mm/pmalloc.c  |  9 ---
 6 files changed, 110 insertions(+), 4 deletions(-)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/lib/genalloc.c b/lib/genalloc.c
index 62f69b3..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -542,7 +542,7 @@ void gen_pool_flush_chunk(struct gen_pool *pool,
memset(chunk->entries, 0,
   DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
BITS_PER_BYTE));
-   atomic_set(>avail, size);
+   atomic_long_set(>avail, size);
 }
 
 
diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff770..f0c960e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -765,3 +765,10 @@ config GUP_BENCHMARK
  performance of get_user_pages_fast().
 
  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY_SELFTEST
+   bool "Run self test for pmalloc memory allocator"
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index a6a47e1..1e76a9b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index 000..1c025f3
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,65 @@
+/*
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)   \
+   pr_notice("must be " expected ": %s",   \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)\
+   validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+   struct gen_pool *pool_unprot;
+   struct gen_pool *pool_prot;
+   void *var_prot, *var_unprot, *var_vmall;
+
+   pr_notice("pmalloc self-test");
+   pool_unprot = pmalloc_create_pool("unprotected", 0);
+   pool_prot = pmalloc_create_pool("protected", 0);
+   BUG_ON(!(pool_unprot && pool_prot));
+
+   var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+   var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+   var_vmall = vmalloc(SIZE_2);
+   is_alloc_ok(var_unprot, 10);
+   is_alloc_ok(var_unprot, SIZE_1);
+   is_alloc_ok(var_unprot, PAGE_SIZE);
+   is_alloc_no(var_unprot, SIZE_1 + 1);
+   is_alloc_no(var_vmall, 10);
+
+
+   pfree(pool_unprot, var_unprot);
+   vfree(var_vmall);
+
+   pmalloc_protect_pool(pool_prot);
+
+   /* This will intentionally trigger a WARN because the pool being
+* destroyed is not protected, which is unusual and should happen
+* on error paths only, where probably other warnings are already
+* displayed.
+*/
+   pmalloc_destroy_pool(pool_unprot);
+
+   /* This must not cause WARNings */
+   pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index 000..3673d23
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __PMALLOC_SELFTEST_H__
+#define __PMALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+#include 
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/mm/pmallo

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-01 Thread Igor Stoppa

On 01/02/18 02:00, Christopher Lameter wrote:
> On Tue, 30 Jan 2018, Igor Stoppa wrote:
> 
>> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, 
>> unsigned long align,
>>
>>  kmemleak_vmalloc(area, size, gfp_mask);
>>
>> +for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
>> +area->pages[page_counter]->area = area;
>> +
>>  return addr;
> 
> Well this introduces significant overhead for large sized allocation. Does
> this not matter because the areas are small?

Relatively significant?
I do not object to your comment, but in practice i see that:

- vmalloc is used relatively little
- allocations do not seem to be huge
- there seem to be way larger overheads in the handling of virtual pages
  (see my proposal for the LFS/m summit, about collapsing struct
   vm_struct and struct vmap_area)

> Would it not be better to use compound page allocations here?
> page_head(whatever) gets you the head page where you can store all sorts
> of information about the chunk of memory.

Can you please point me to this function/macro? I don't seem to be able
to find it, at least not in 4.15

During hardened user copy permission check, I need to confirm if the
memory range that would be exposed to userspace is a legitimate
sub-range of a pmalloc allocation.

So, I start with the pair (address, size) and I must end up to something
I can compare it against.
The idea here is to pass through struct_page and then the related
vm_struct/vmap_area, which already has the information about the
specific chunk of virtual memory.

I cannot comment on your proposal because I do not know where to find
the reference you made, or maybe I do not understand what you mean :-(

--
igor

Re: [PATCH v6] checkpatch.pl: Add SPDX license tag check

2018-02-02 Thread Igor Stoppa

On 02/02/18 17:40, Rob Herring wrote:
> Add SPDX license tag check based on the rules defined in

Shouldn't it also check that the license is compatible?

[...]

> + } elsif ($realfile =~ /\.rst$/) {
> + $comment = '..';

What is the correct/acceptable license for documentation?
Creative Commons? AFAIK GPL is for source code.

Googling didn't bring the wished-for enlightenment.

--
igor

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-02 Thread Igor Stoppa



On 01/02/18 23:11, Kees Cook wrote:

> IIUC, he means PageHead(), which is also hard to grep for, since it is
> a constructed name, via Page##uname in include/linux/page-flags.h:
> 
> __PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)

Thank you, I'll try to provide a meaningful reply soon, but I'll be AFK
during most of next 2 weeks, so it might be delayed :-(

--
igor

Re: [PATCH 5/6] Documentation for Pmalloc

2018-02-02 Thread Igor Stoppa

Thanks for the review and apologies for the delay.
Replies inlined below.

On 30/01/18 19:08, Jonathan Corbet wrote:
> On Tue, 30 Jan 2018 17:14:45 +0200
> Igor Stoppa <igor.sto...@huawei.com> wrote:

[...]

> Please don't put plain-text files into core-api - that's a directory full

ok

>> diff --git a/Documentation/core-api/pmalloc.txt 
>> b/Documentation/core-api/pmalloc.txt
>> new file mode 100644
>> index 000..934d356
>> --- /dev/null
>> +++ b/Documentation/core-api/pmalloc.txt
>> @@ -0,0 +1,104 @@
> 
> We might as well put the SPDX tag here, it's a new file.

ok, this is all new stuff to me ... I suppose I should do it also for
all the other new files I create

But what is the license for the documentation? It's not code, so GPL
seems wrong. Creative commons?

I just noticed a patch for checkpatch.pl about SPDX and asked the same
question there.

https://lkml.org/lkml/2018/2/2/365

>> +
>> +Protectable memory allocator
>> +
>> +
>> +Introduction
>> +

[...]

> This is all good information, but I'd suggest it belongs more in the 0/n
> patch posting than here.  The introduction of *this* document should say
> what it actually covers.

ok

> 
>> +
>> +Design
>> +--

[...]

>> +To keep it to a minimum, locking is left to the user of the API, in
>> +those cases where it's not strictly needed.
> 
> This seems like a relevant and important aspect of the API that shouldn't
> be buried in the middle of a section talking about random things.

I'll move it to the Use section.

[...]

>> +Use
>> +---
>> +
>> +The typical sequence, when using pmalloc, is:
>> +
>> +1. create a pool
>> +2. [optional] pre-allocate some memory in the pool
>> +3. issue one or more allocation requests to the pool
>> +4. initialize the memory obtained
>> +   - iterate over points 3 & 4 as needed -
>> +5. write protect the pool
>> +6. use in read-only mode the handlers obtained through the allocations
>> +7. [optional] destroy the pool
> 
> So one gets this far, but has no actual idea of how to do these things.
> Which leads me to wonder: what is this document for?  Who are you expecting
> to read it?

I will add a reference to the selftest file.
In practice it can also work as example.

> You could improve things a lot by (once again) going to RST and using
> directives to bring in the kerneldoc comments from the source (which, I
> note, do exist).  But I'd suggest rethinking this document and its
> audience.  Most of the people reading it are likely wanting to learn how to
> *use* this API; I think it would be best to not leave them frustrated.

ok, the example route should be more explicative.

--
thanks again for the review, igor

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-03 Thread Igor Stoppa



On 02/02/18 20:43, Christopher Lameter wrote:
> On Thu, 1 Feb 2018, Igor Stoppa wrote:
> 
>>> Would it not be better to use compound page allocations here?

[...]

> Ok its compound_head(). See also the use in the SLAB and SLUB allocator.
> 
>> During hardened user copy permission check, I need to confirm if the
>> memory range that would be exposed to userspace is a legitimate
>> sub-range of a pmalloc allocation.
> 
> If you save the size in the head page struct then you could do that pretty
> fast.

Ok, now I get what you mean.
But it doesn't seem to fit the intended use case, for other reasons
(maybe the same, from 2 different POV):

- compound pages are aggregates of regular pages, in numbers that are
powers of 2, while the amount of pages to allocate is not known upfront.
One *could* give a hint to pmalloc about how many pages to allocate
every time there is a need to grow the pool.
Iow it would be the size of a chunk. But I'm afraid the granularity
would still be pretty low, so maybe it would be 2-4 times less.

- the property of the compound page will affect the property of all the
pages in the compound, so when one is write protected, it can generate a
lot of wasted memory, if there is too much slack (because of the order)
With vmalloc, I can allocate any number of pages, minimizing the waste.


Finally, there was a discussion about optimization:
http://www.openwall.com/lists/kernel-hardening/2017/08/07/2

The patch I sent does indeed take advantage of the new information, not
just for pmalloc use.

I have not measured if/where/what there is gain, but it does look like
the extra info can be exploited also elsewhere.

--
igor

Re: [PATCH v6] checkpatch.pl: Add SPDX license tag check

2018-02-03 Thread Igor Stoppa

On 02/02/18 21:06, Joe Perches wrote:
> On Fri, 2018-02-02 at 12:27 -0600, Rob Herring wrote:
>> On Fri, Feb 2, 2018 at 9:49 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:
>>> On 02/02/18 17:40, Rob Herring wrote:
>>>> Add SPDX license tag check based on the rules defined in
>>>
>>> Shouldn't it also check that the license is compatible?
>>>
>>
>> Perhaps we shouldn't try to script legal advice.
> 
> True.
> 
> I believe what was meant was that the
> entry was a valid SPDX License entry
> that already exists as a specific file
> in the LICENSES/ path.

I expect that there is a finite number of compatible licenses.
Maybe I'm too optimistic about what can be taken as legal advice or not,
but I would expect that a warning about unmatched license type does not
constitute legal advice.

Is it too optimistic? :-D

--
igor

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-04 Thread Igor Stoppa

On 04/02/18 00:29, Boris Lukashev wrote:
> On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa <igor.sto...@huawei.com> wrote:

[...]

>> What you are suggesting, if I have understood it correctly, is that,
>> when the pool is protected, the addresses already given out, will become
>> traps that get resolved through a lookup table that is built based on
>> the content of each allocation.
>>
>> That seems to generate a lot of overhead, not to mention the fact that
>> it might not play very well with the MMU.
> 
> That is effectively what i'm suggesting - as a form of protection for
> consumers against direct reads of data which may have been corrupted
> by some irrelevant means. In the context of pmalloc, it would probably
> be a separate type of ro+verified pool
ok, that seems more like an extension though.

ATM I am having problems gaining traction to get even the basic merged :-)

I would consider this as a possibility for future work, unless it is
said that it's necessary for pmalloc to be accepted ...

--
igor

[PATCH 5/6] Pmalloc: self-test

2018-02-04 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 mm/Kconfig|  9 
 mm/Makefile   |  1 +
 mm/pmalloc-selftest.c | 61 +++
 mm/pmalloc-selftest.h | 26 ++
 4 files changed, 97 insertions(+)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff7703d322..f5edfadd036a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -765,3 +765,12 @@ config GUP_BENCHMARK
  performance of get_user_pages_fast().
 
  See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY_SELFTEST
+   bool "Run self test for pmalloc memory allocator"
+   depends on ARCH_HAS_SET_MEMORY
+   select GENERIC_ALLOCATOR
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index a6a47e1b6e66..1e76a9b96180 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index ..9f6401ebd8d1
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)   \
+   pr_notice("must be " expected ": %s",   \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)\
+   validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+   struct gen_pool *pool_unprot;
+   struct gen_pool *pool_prot;
+   void *var_prot, *var_unprot, *var_vmall;
+
+   pr_notice("pmalloc self-test");
+   pool_unprot = pmalloc_create_pool("unprotected", 0);
+   pool_prot = pmalloc_create_pool("protected", 0);
+   BUG_ON(!(pool_unprot && pool_prot));
+
+   var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+   var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+   var_vmall = vmalloc(SIZE_2);
+   is_alloc_ok(var_unprot, 10);
+   is_alloc_ok(var_unprot, SIZE_1);
+   is_alloc_ok(var_unprot, PAGE_SIZE);
+   is_alloc_no(var_unprot, SIZE_1 + 1);
+   is_alloc_no(var_vmall, 10);
+
+
+   pfree(pool_unprot, var_unprot);
+   vfree(var_vmall);
+
+   pmalloc_protect_pool(pool_prot);
+
+   /* This will intentionally trigger a WARN because the pool being
+* destroyed is not protected, which is unusual and should happen
+* on error paths only, where probably other warnings are already
+* displayed.
+*/
+   pmalloc_destroy_pool(pool_unprot);
+
+   /* This must not cause WARNings */
+   pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index ..b34ac86e14a8
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+
+#ifndef __PMALLOC_SELFTEST_H__
+#define __PMALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+#include 
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
-- 
2.16.0

[PATCH 6/6] Documentation for Pmalloc

2018-02-04 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d5bbe035316d..7244ddeb540f 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -21,6 +21,7 @@ Core utilities
flexible-arrays
librs
genalloc
+   pmalloc
 
 Interfaces for kernel debugging
 ===
diff --git a/Documentation/core-api/pmalloc.rst 
b/Documentation/core-api/pmalloc.rst
new file mode 100644
index ..8dabb5e18d8f
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: GPL-2.0
+
+Protectable memory allocator
+
+
+Purpose
+---
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor it could take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Ex: A policy that is loaded from userspace.
+
+
+Concept
+---
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel idrivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+---
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requestor, however it will not become avaiable for
+  further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support wrt allocating vs protecting
+  an individual pool, for performance reason. It is recommended to not
+  share the same pool between unrelated functions. Should sharing be a
+  necessity, the user of the shared pool is expected to implement locking
+  for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using directly vmalloc. The exact number depends on size
+  of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] iterate over points 3 & 4 as needed
+
+6. write protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use in read-only mode the handlers obtained through the allocations
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, but depends on point 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.16.0

[PATCH 1/6] genalloc: track beginning of allocations

2018-02-04 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

--

This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")

11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid

Ex, with the same notation as above - MSb...LSb:

 ...100010101011   <-- Read in this direction.
\__|\__|\|\|\__|
   |   | | |   \___ 4 used allocation units
   |   | | \___ 3 empty allocation units
   |   | \_ 1 used allocation unit
   |   \___ 2 used allocation units
   \___ 2 empty allocation units

Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.

When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).

The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.

About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.

This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c   | 417 ---
 2 files changed, 289 insertions(+), 132 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..dde78307b093 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
 #include 
 #include 
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PE

[PATCH 2/6] genalloc: selftest

2018-02-04 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  30 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 ++
 5 files changed, 450 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index ..7af1901e57dc
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index c5e84fbcb30b..430026d0815d 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index d11c48ec8ffd..ba06e8366705 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index ..007a0cfb3d77
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_

[PATCH 3/6] struct page: add field for vm_struct

2018-02-04 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4e5e0e..2abd540b969f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -56,6 +56,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..44c5dfcb2fd7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int page_counter;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+   area->pages[page_counter]->area = area;
+
return addr;
 
 fail:
-- 
2.16.0

[PATCH 4/6] Protectable Memory

2018-02-04 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 213 
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Makefile  |   1 +
 mm/pmalloc.c | 514 +++
 mm/usercopy.c|  25 ++-
 7 files changed, 780 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index ..5fa8a78be819
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposite to what is allocated on-demand when pmal

[RFC PATCH v14 0/6] mm: security: ro protection for dynamic data

2018-02-04 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v13:
[http://www.openwall.com/lists/kernel-hardening/2018/02/03/3]

- fixed warnings from "make htmldocs"
- added documentation to core-api index


Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc: self-test
  Documentation for Pmalloc

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 
 include/linux/genalloc-selftest.h  |  30 +++
 include/linux/genalloc.h   |   7 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 213 +++
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 ++
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 402 +
 lib/genalloc.c | 444 ++--
 mm/Kconfig |   9 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  61 +
 mm/pmalloc-selftest.h  |  26 ++
 mm/pmalloc.c   | 514 +
 mm/usercopy.c  |  25 +-
 mm/vmalloc.c   |  18 +-
 19 files changed, 1745 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.16.0

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-03 Thread Igor Stoppa

+Boris Lukashev

On 02/02/18 20:39, Christopher Lameter wrote:
> On Thu, 25 Jan 2018, Matthew Wilcox wrote:
> 
>> It's worth having a discussion about whether we want the pmalloc API
>> or whether we want a slab-based API.  We can have a separate discussion
>> about an API to remove pages from the physmap.
> 
> We could even do this in a more thorough way. Can we use a ring 1 / 2
> distinction to create a hardened OS core that policies the rest of
> the ever expanding kernel with all its modules and this and that feature?

What would be the differentiating criteria? Furthermore, what are the
chances
of invalidating the entire concept, because there is already an
hypervisor using
the higher level features?
That is what you are proposing, if I understand correctly.

But more on this below ...

> I think that will long term be a better approach and allow more than the
> current hardening approaches can get you. It seems that we are willing to
> tolerate significant performance regressions now. So lets use the
> protection mechanisms that the hardware offers.

I would rather *not* propose significant performance regression :-P

There might be some one-off case or anyway rare event which is
penalized, but my preference goes to not introducing any significant
performance penalty, during regular use.
After all, the lower the penalty, the wider the (potential) adoption.

More in detail: there are 2 major cases for wanting some form of
read-only protection.

1) extra ward against accidental corruption
The kernel provides many debugging tools and they can detect lots of
errors during development, but they require time and knowledge to use
them, which are not always available.
Furthermore, it is objectively true that not all the code has the same
level of maturity, especially when non-upstream code is used in some
custom product. It's not my main goal, but it would be nice if that case
too could be addressed by the protection.
Corruption *can* happen.
Having live guards against it, will definitely help spotting bugs or, at
the very least, crash/reboot a device before it can cause permanent data
corruption.
Protection against accidental corruption should be used as widely as
possible, therefore it cannot have an high price tag, in terms of lost
performance. Otherwise, there's the risk that it will be just a debug
feature, more like lockdep or ubsan.

2) protection against malicious attacks
This is harder, of course, but what is realistically to be expected?

If an attacker can gain full control of the kernel, the only way to do
damage control is to have HW and/or higher privilege SW that can somehow
limit the reach of the attacker.
To make it work for real, it should be mandated that either these extra
HW/SW means can tell apart legitimate kernel activity from rogue
actions, or they operate so independently from the kernel that a
compromise kernel cannot use any API to influence them.

The consensus seems to be to put aside (for now) this concern and
instead focus on what is a typical scenario:
- some bug is found that allows to read/write kernel memory
- some other bug is found, which leaks the address of a well known
variable, effectively revealing the randomized offset of each symbol
placed in linear memory, once their relative location is known.

What is described above is a toolkit that effectively can allow - with
patience - to attack anything that is writable by the kernel.
Including page tables and permissions.

However the typical attack is more like: "let's flip some bit(s)".
Which is where __ro_after_init has its purpose to exist.

My proposal is to extend the same sort of protection also to variables
allocated dynamically.

* make the pages read only, once the data is initialized
* use vmalloc to prevent that exfiltrating the address of an unrelated
variable can easily give away the location of the real target, because
of the individual page mapping vs linear mapping.

Boris Lukashev proposed additional hardening, when accessing a certain
variable, in the form of hash/checksum, but I could not come up with an
implementation that did not have too much overhead.

Re-considering this, one option would be to have a function
"pool_validate()" - probably expensive - that could be invoked by a
piece of code before using the data from the pool.

Not perfect, because it would not be atomic, but it could be used once,
at the beginning of a function, without adding overhead to each access
to the pool that the function would perform.

An attacker would have to time the attack so that the corruption of the
data wold happen after the pool is validated and before the data is read
from it.
Possible, but way tricker than the current unprotected situation.

What I am trying to say, is that even after having multi-ring
implementation (which would be more dependent on HW features), there
would be still the problem of validating the legitimacy of the use of
the API that such implementation would expose.

I'd rather try to

[PATCH 4/6] Protectable Memory

2018-02-03 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 211 +++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Makefile  |   1 +
 mm/pmalloc.c | 514 +++
 mm/usercopy.c|  25 ++-
 7 files changed, 778 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e..b6c4cea 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000..6d4a24e
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,211 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposite to what is allocated on-demand when pmalloc runs out of fr

[PATCH 2/6] genalloc: selftest

2018-02-03 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  30 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 402 ++
 5 files changed, 450 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index 000..7af1901
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b9..fb844aa 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index c5e84fb..430026d0 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index d11c48e..ba06e83 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 000..007a0cf
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+

[PATCH 1/6] genalloc: track beginning of allocations

2018-02-03 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

--

This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")

11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid

Ex, with the same notation as above - MSb...LSb:

 ...100010101011   <-- Read in this direction.
\__|\__|\|\|\__|
   |   | | |   \___ 4 used allocation units
   |   | | \___ 3 empty allocation units
   |   | \_ 1 used allocation unit
   |   \___ 2 used allocation units
   \___ 2 empty allocation units

Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.

When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).

The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.

About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.

This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c   | 417 ---
 2 files changed, 289 insertions(+), 132 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930..dcaa33e 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc..dde7830 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
 #include 
 #include 
 
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define E

[PATCH 3/6] struct page: add field for vm_struct

2018-02-03 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..2abd540 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -56,6 +56,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6739420..44c5dfc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int page_counter;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+   area->pages[page_counter]->area = area;
+
return addr;
 
 fail:
-- 
2.9.3

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-03 Thread Igor Stoppa

>> On Thu, 25 Jan 2018, Matthew Wilcox wrote:

>>> It's worth having a discussion about whether we want the pmalloc API
>>> or whether we want a slab-based API.  
I'd love to have some feedback specifically about the API.

I have also some idea about userspace and how to extend the pmalloc
concept to it:

http://www.openwall.com/lists/kernel-hardening/2018/01/30/20

I'll be AFK intermittently for about 2 weeks, so i might not be able to
reply immediately, but from my perspective this would be just the
beginning of a broader hardening of both kernel and userspace that I'd
like to pursue.

--
igor

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-03 Thread Igor Stoppa

On 03/02/18 22:12, Boris Lukashev wrote:

> Regarding the notion of validated protected memory, is there a method
> by which the resulting checksum could be used in a lookup
> table/function to resolve the location of the protected data?

What I have in mind is a checksum at page/vmap_area level, so there
would be no 1:1 mapping between a specific allocation and the checksum.

An extreme case would be the one where an allocation crosses one or more
page boundaries, while the checksum refers to a (partially) overlapping
memory area.

Code accessing a pool could perform one (relatively expensive)
validation. But still something that would require a more sophisticated
attack, to subvert.

> Effectively a hash table of protected allocations, with a benefit of
> dedup since any data matching the same key would be the same data
> (multiple identical cred structs being pushed around). Should leave
> the resolver address/csum in recent memory to check against, right?

I see where you are trying to land, but I do not see how it would work
without a further intermediate step.

pmalloc dishes out virtual memory addresses, when called.

It doesn't know what the user of the allocation will put in it.
The user, otoh, has the direct address of the memory it got.

What you are suggesting, if I have understood it correctly, is that,
when the pool is protected, the addresses already given out, will become
traps that get resolved through a lookup table that is built based on
the content of each allocation.

That seems to generate a lot of overhead, not to mention the fact that
it might not play very well with the MMU.

If I misunderstood, then I'd need a step by step description of what
happens, because it's not clear to me how else the data would be
accessed if not through the address that was obtained when pmalloc was
invoked.

--
igor

[RFC PATCH v13 0/6] mm: security: ro protection for dynamic data

2018-02-03 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v12
[https://lkml.org/lkml/2018/1/30/397]

- fixed Kconfig dependency for pmalloc-test
- fixed warning for size_t treated as %ul on i386
- moved to SPDX license reference
- rewrote pmalloc docs

Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc: self-test
  Documentation for Pmalloc

 Documentation/core-api/pmalloc.rst | 114 
 include/linux/genalloc-selftest.h  |  30 +++
 include/linux/genalloc.h   |   7 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 211 +++
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 ++
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 402 +
 lib/genalloc.c | 444 ++--
 mm/Kconfig |   9 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  61 +
 mm/pmalloc-selftest.h  |  26 ++
 mm/pmalloc.c   | 514 +
 mm/usercopy.c  |  25 +-
 mm/vmalloc.c   |  18 +-
 18 files changed, 1742 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.9.3

Re: [PATCH 1/6] genalloc: track beginning of allocations

2018-02-12 Thread Igor Stoppa

On 11/02/18 14:24, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:15AM +0200, Igor Stoppa wrote:
[...]

>> +/**
>> + * mem_to_units - convert references to memory into orders of allocation
> 
> Documentation/doc-guide/kernel-doc.rst recommends to to include brackets
> for function comments. I haven't noticed any difference in the resulting
> html, so I'm not sure if the brackets are actually required.

This is what I see in the example from mailine docs:

/**
 * foobar() - Brief description of foobar.
 * @argument1: Description of parameter argument1 of foobar.
 * @argument2: Description of parameter argument2 of foobar.
 *
 * Longer description of foobar.
 *
 * Return: Description of return value of foobar.
 */
int foobar(int argument1, char *argument2)

What are you referring to?

[...]

>> + * @size: amount in bytes
>> + * @order: power of 2 represented by each entry in the bitmap
>> + *
>> + * Returns the number of units representing the size.
> 
> Please s/Return/Return:/

:-( I thought I had fixed them all. thanks for spotting this.

[...]

>> + * Return: If two users alter the same bit, to one it will return
>> + * remaining entries, to the other it will return 0.
> 
> And what if there are three or four concurrent users? ;-)
> 
> I believe that a more elaborate description about what happens with
> concurrent attempts to alter the bitmap would be really helpful.

ok

--
thanks, igor

Re: [PATCH 4/6] Protectable Memory

2018-02-12 Thread Igor Stoppa

On 11/02/18 14:37, Mike Rapoport wrote:
> On Sun, Feb 11, 2018 at 05:19:18AM +0200, Igor Stoppa wrote:

>> + * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
>> + * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.
> 
> typo:^ memory

thanks :-(

[...]

>> +/**
>> + * When the sysfs is ready to receive registrations, connect all the
>> + * pools previously created. Also enable further pools to be connected
>> + * right away.
>> + */
> 
> This does not seem as kernel-doc comment. Please either remove the second *
> from the opening comment mark or reformat the comment.

For this too, I thought I had caught them all, but I was wrong ...

I didn't find any mention of automated checking for comments.
Is there such tool?

--
thanks, igor

Re: [PATCH 6/6] Documentation for Pmalloc

2018-02-12 Thread Igor Stoppa



On 11/02/18 23:17, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 05:19:20AM +0200, Igor Stoppa wrote:
>> @@ -0,0 +1,114 @@
>> +SPDX-License-Identifier: CC-BY-SA-4.0
> 
> You need the '.. ' before the 'SPDX'.  See
> Documentation/process/license-rules.rst

yes, sorry, I thought I had understood how it works,
but clearly I hadn't :-(

--
igor

Re: [PATCH 6/6] Documentation for Pmalloc

2018-02-09 Thread Igor Stoppa

On 04/02/18 23:37, Randy Dunlap wrote:

[...]

>> +reason, could neither be declared as constant, nor it could take advantage
> 
>   nor could it

ok

[...]

>> +Ex: A policy that is loaded from userspace.
> 
> Either
>Example:
> or
>E.g.:
> (meaning For example)

ok

[...]

>> +Different kernel idrivers and threads can use different pools, for finer
> 
> drivers

:-( ok

[...]

>> +  in use anymore by the requestor, however it will not become avaiable for
> 
>requester; however,   available

ok

[...]

>> +- pmalloc does not provide locking support wrt allocating vs protecting
> 
> Write out "wrt" -> with respect to.

ok

>> +  an individual pool, for performance reason. It is recommended to not
> 
>  reasons.  not to

ok & ok

[...]

>> +  in the case of using directly vmalloc. The exact number depends on size
> 
>  of using vmalloc directly.  on the 
> size

ok & ok

[...]

>> +6. write protect the pool
> 
>   write-protect

ok

[...]

>> +7. use in read-only mode the handlers obtained through the allocations
> 
> handles ??

yes

---
thanks, igor

Re: [PATCH 4/6] Protectable Memory

2018-02-12 Thread Igor Stoppa

On 12/02/18 17:31, Mike Rapoport wrote:

[...]

> Seems that kernel-doc does not consider () as a valid match for the
> identifier :)
>  
> Can you please check with the below patch?

yes, it works now, than you!

--
igor

Re: [PATCH 4/6] Protectable Memory

2018-02-12 Thread Igor Stoppa



On 12/02/18 14:53, Mike Rapoport wrote:
> 'scripts/kernel-doc -v -none 

That has a quite interesting behavior.

I run it on genalloc.c while I am in the process of adding the brackets
to the function names in the kernel-doc description.

The brackets confuse the script and it fails to output the name of the
function in the log:

lib/genalloc.c:123: info: Scanning doc for get_bitmap_entry
lib/genalloc.c:139: info: Scanning doc for
lib/genalloc.c:152: info: Scanning doc for
lib/genalloc.c:164: info: Scanning doc for


The first function does not have the brackets.
The others do. So what should I do with the missing brackets?
Add them, according to the kernel docs, or leave them out?

I'd lean toward adding them.

--
igor

[PATCH 5/6] Pmalloc: self-test

2018-02-12 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 mm/Kconfig|  9 
 mm/Makefile   |  1 +
 mm/pmalloc-selftest.c | 64 +++
 mm/pmalloc-selftest.h | 24 +++
 mm/pmalloc.c  |  2 ++
 5 files changed, 100 insertions(+)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/mm/Kconfig b/mm/Kconfig
index be578fbdce6d..098aefef78b1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -766,3 +766,12 @@ config PROTECTABLE_MEMORY
 depends on ARCH_HAS_SET_MEMORY
 select GENERIC_ALLOCATOR
 default y
+
+config PROTECTABLE_MEMORY_SELFTEST
+   bool "Run self test for pmalloc memory allocator"
+   depends on ARCH_HAS_SET_MEMORY
+   select PROTECTABLE_MEMORY
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 959fdbdac118..f7bbbfde6967 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index ..97ba52d17f69
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+
+#include "pmalloc-selftest.h"
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)   \
+   pr_notice("must be " expected ": %s",   \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)\
+   validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+   struct gen_pool *pool_unprot;
+   struct gen_pool *pool_prot;
+   void *var_prot, *var_unprot, *var_vmall;
+
+   pr_notice("pmalloc self-test");
+   pool_unprot = pmalloc_create_pool("unprotected", 0);
+   pool_prot = pmalloc_create_pool("protected", 0);
+   BUG_ON(!(pool_unprot && pool_prot));
+
+   var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+   var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+   *(int *)var_prot = 0;
+   var_vmall = vmalloc(SIZE_2);
+   is_alloc_ok(var_unprot, 10);
+   is_alloc_ok(var_unprot, SIZE_1);
+   is_alloc_ok(var_unprot, PAGE_SIZE);
+   is_alloc_no(var_unprot, SIZE_1 + 1);
+   is_alloc_no(var_vmall, 10);
+
+
+   pfree(pool_unprot, var_unprot);
+   vfree(var_vmall);
+
+   pmalloc_protect_pool(pool_prot);
+
+   /*
+* This will intentionally trigger a WARN because the pool being
+* destroyed is not protected, which is unusual and should happen
+* on error paths only, where probably other warnings are already
+* displayed.
+*/
+   pmalloc_destroy_pool(pool_unprot);
+
+   /* This must not cause WARNings */
+   pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index ..58a5a0cbec14
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+
+#ifndef __MM_PMALLOC_SELFTEST_H
+#define __MM_PMALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index abddba90a9f6..eb445c574b19 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include "pmalloc-selftest.h"
 /*
  * pmalloc_data contains the data specific to a pmalloc pool,
  * in a format compatible with the design of gen_alloc.
@@ -494,6 +495,7 @@ static int __init pmalloc_late_init(void)
}
}
mutex_unlock(_mutex);
+   pmalloc_selftest();
return 0;
 }
 late_initcall(pmalloc_late_init);
-- 
2.14.1

[PATCH 6/6] Documentation for Pmalloc

2018-02-12 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
genalloc
errseq
printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===
diff --git a/Documentation/core-api/pmalloc.rst 
b/Documentation/core-api/pmalloc.rst
new file mode 100644
index ..d9725870444e
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Protectable memory allocator
+
+
+Purpose
+---
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor could it take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+---
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+---
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support with respect to allocating vs
+  protecting an individual pool, for performance reasons.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using vmalloc directly. The exact number depends on the
+  size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] iterate over points 3 & 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use in read-only mode the handles obtained through the allocations
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, but depends on point 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

[PATCH 1/6] genalloc: track beginning of allocations

2018-02-12 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c   | 631 ++-
 2 files changed, 465 insertions(+), 170 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..87f62f31b52f 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * ---
+ *
+ * The bitmap is composed of units of allocations.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequennce of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb...LSb:
+ *
+ *  ...100010101011   <-- Read in this direction.
+ * \__|\__|\|\|\__|
+ *|   | | |   \___ 4 used allocation units
+ *|   | | \___ 3 empty allocation units
+ *|   | \_ 1 used allocation unit
+ *|   \___ 2 used allocation units
+ *\___ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The alignment at which to perform the research for sequence of empty
+ * allocation units (marked as zeros in the bitmap) is 2^1.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated to an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the ending can also be performed locklessly.
+ * It's sufficient to identify the first mapped allocation unit
+ * that is represented either as free (00) or busy (11).
+ * Even if the allocation status should change in the meanwhile, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified agaisnt what
+ * the library is tracking i

[RFC PATCH v16 0/6] mm: security: ro protection for dynamic data

2018-02-12 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v15:
[http://www.openwall.com/lists/kernel-hardening/2018/02/11/4]

- Fixed remaining broken comments
- Fixed remaining broken "Returns" instead of "Return:" in kernel-doc
- Converted "Return:" values to lists
- Fixed SPDX license statements

Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc: self-test
  Documentation for Pmalloc

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +++
 include/linux/genalloc-selftest.h  |  26 ++
 include/linux/genalloc.h   |   7 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 242 ++
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 +
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 400 ++
 lib/genalloc.c | 658 +++--
 mm/Kconfig |  15 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  64 
 mm/pmalloc-selftest.h  |  24 ++
 mm/pmalloc.c   | 501 
 mm/usercopy.c  |  33 ++
 mm/vmalloc.c   |  18 +-
 19 files changed, 1950 insertions(+), 175 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.14.1

[PATCH 4/6] Protectable Memory

2018-02-12 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 242 +++
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Kconfig   |   6 +
 mm/Makefile  |   1 +
 mm/pmalloc.c | 499 +++
 mm/usercopy.c|  33 
 8 files changed, 812 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index ..afc2068d5545
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool   - success
+ * * NULL  - error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0 - the object does not belong to pmalloc
+ * * 1 - the object belongs to pmalloc
+ * * \-1   - the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc() - tries to allocate a memory chunk of the re

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-12 Thread Igor Stoppa



On 11/02/18 23:16, Matthew Wilcox wrote:
> On Sun, Feb 11, 2018 at 05:19:17AM +0200, Igor Stoppa wrote:
>> The struct page has a "mapping" field, which can be re-used, to store a
>> pointer to the parent area. This will avoid more expensive searches.
>>
>> As example, the function find_vm_area is reimplemented, to take advantage
>> of the newly introduced field.
> 
> Umm.  Is it more efficient?  You're replacing an rb-tree search with a
> page-table walk.  You eliminate a spinlock, which is great, but is the
> page-table walk more efficient?  I suppose it'll depend on the depth of
> the rb-tree, and (at least on x86), the page tables should already be
> in cache.

I thought the tradeoff favorable. How to verify it?

> Unrelated to this patch, I'm working on a patch to give us page_type,
> and I think I'll allocate a bit to mark pages which are vmalloced.

pmalloced too?

--
igor

[PATCH 2/6] genalloc: selftest

2018-02-12 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  26 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 400 ++
 5 files changed, 444 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index ..e0ac8f963abc
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+
+#ifndef __LINUX_GENALLOC_SELFTEST_H
+#define __LINUX_GENALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..0d526c004e81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..fadb30abde08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index ..e86be22c5512
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,400 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   for (i = 1; i <= BITMAP_SIZE_C; i++)
+   bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+   *bm = '\0';
+   pr_notice("chunk: %pbitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+   CMD_ALLOCATOR,
+   CMD_

[PATCH 3/6] struct page: add field for vm_struct

2018-02-12 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..c3a4825e10c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..9404ffd0ee98 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int i;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (i = 0; i < area->nr_pages; i++)
+   area->pages[i]->area = area;
+
return addr;
 
 fail:
-- 
2.14.1

Re: [PATCH 1/6] genalloc: track beginning of allocations

2018-02-09 Thread Igor Stoppa



On 05/02/18 00:34, Randy Dunlap wrote:
> On 02/04/2018 08:47 AM, Igor Stoppa wrote:

[...]

> It would be good for a lot of this to be in a source file or the
> pmalloc.rst documentation file instead of living only in the git repository.

This is actually about genalloc. The genalloc documentation is high
level and mostly about the API, while this talks about the guts of the
library. The part modified by the patch. This text doesn't seem to
belong to the generic genalloc documentation.
I will move it to the .c file, but isn't it too much text in a source file?

[...]

>> + * @order: pow of 2 represented by each entry in the bitmap
> 
>   power

ok

[...]

>> + * chunk_size - dimension of a chunk of memory
> 
> can this be more explicit about which dimension?

I'll put "size in bytes of a chunk of memory"


[...]

>> + * cleart_bits_ll - according to the mask, clears the bits specified by
> 
>   clear_bits_ll

yes :-(

[...]

>> - * bitmap_clear_ll - clear the specified number of bits at the specified 
>> position
>> + * alter_bitmap_ll - set or clear the entries associated to an allocation
> 
> with an allocation

ok


>> + * @alteration: selection if the bits selected should be set or cleared
> 
>indicates if

ok


[...]

>> +/* Prepare for writing the initial part of the allocation, from
>> + * starting entry, to the end of the UL bitmap element which
>> + * contains it. It might be larger than the actual allocation.
>> + */
> 
> Use kernel multi-line comment style.

ok, also for further occurrences

[...]

>> +index =  BITS_DIV_LONGS(start_bit);
> 
>   index = BITS_DIV_LONGS
> (only 1 space after '=')

oops, yes

--
thank you, igor

[RFC PATCH v15 0/6] mm: security: ro protection for dynamic data

2018-02-10 Thread Igor Stoppa

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v14:
[http://www.openwall.com/lists/kernel-hardening/2018/02/04/2]

- fix various warnings from sparse
- multiline comments
- fix naming of headers guards
- fix compilation of individual patches, for bisect
- split genalloc documentation about bitmap for allocation
- fix headers to match kerneldoc format for "Return:" field
- fix variable naming according to coding guidelines
- fix wrong default value for pmalloc Kconfig option
- refreshed integration of pmalloc with hardened usercopy
- removed unnecessary include that was causing compilation failures
- changed license of pmalloc documentation from GPL 2.0 to CC-BY-SA-4.0

Igor Stoppa (6):
  genalloc: track beginning of allocations
  genalloc: selftest
  struct page: add field for vm_struct
  Protectable Memory
  Pmalloc: self-test
  Documentation for Pmalloc

 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 
 include/linux/genalloc-selftest.h  |  26 ++
 include/linux/genalloc.h   |   7 +-
 include/linux/mm_types.h   |   1 +
 include/linux/pmalloc.h| 222 +++
 include/linux/vmalloc.h|   1 +
 init/main.c|   2 +
 lib/Kconfig|  15 +
 lib/Makefile   |   1 +
 lib/genalloc-selftest.c| 400 ++
 lib/genalloc.c | 554 +++--
 mm/Kconfig |  15 +
 mm/Makefile|   2 +
 mm/pmalloc-selftest.c  |  63 +
 mm/pmalloc-selftest.h  |  24 ++
 mm/pmalloc.c   | 499 +
 mm/usercopy.c  |  33 +++
 mm/vmalloc.c   |  18 +-
 19 files changed, 1852 insertions(+), 146 deletions(-)
 create mode 100644 Documentation/core-api/pmalloc.rst
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 lib/genalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h
 create mode 100644 mm/pmalloc.c

-- 
2.14.1

[PATCH 1/6] genalloc: track beginning of allocations

2018-02-10 Thread Igor Stoppa

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
  allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   4 +-
 lib/genalloc.c   | 527 ++-
 2 files changed, 390 insertions(+), 141 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..dcaa33e74b1c 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 struct device;
 struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr;  /* physical starting address of memory 
chunk */
unsigned long start_addr;   /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk 
(inclusive) */
-   unsigned long bits[0];  /* bitmap for allocating memory chunk */
+   unsigned long entries[0];   /* bitmap for allocating memory chunk */
 };
 
 /*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..044347163acb 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -26,6 +26,74 @@
  *
  * This source code is licensed under the GNU General Public License,
  * Version 2.  See the file COPYING for more details.
+ *
+ *
+ *
+ * Encoding of the bitmap tracking the allocations
+ * ---
+ *
+ * The bitmap is composed of units of allocations.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ *  - allocation status (busy/free)
+ *  - beginning of a sequennce of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb...LSb:
+ *
+ *  ...100010101011   <-- Read in this direction.
+ * \__|\__|\|\|\__|
+ *|   | | |   \___ 4 used allocation units
+ *|   | | \___ 3 empty allocation units
+ *|   | \_ 1 used allocation unit
+ *|   \___ 2 used allocation units
+ *\___ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The alignment at which to perform the research for sequence of empty
+ * allocation units (marked as zeros in the bitmap) is 2^1.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated to an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the ending can also be performed locklessly.
+ * It's sufficient to identify the first mapped allocation unit
+ * that is represented either as free (00) or busy (11).
+ * Even if the allocation status should change in the meanwhile, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified agaisnt what
+ * the library is tracking i

[PATCH 4/6] Protectable Memory

2018-02-10 Thread Igor Stoppa

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc.h |   3 +
 include/linux/pmalloc.h  | 222 +
 include/linux/vmalloc.h  |   1 +
 lib/genalloc.c   |  27 +++
 mm/Kconfig   |   6 +
 mm/Makefile  |   1 +
 mm/pmalloc.c | 497 +++
 mm/usercopy.c|  33 
 8 files changed, 790 insertions(+)
 create mode 100644 include/linux/pmalloc.h
 create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index dcaa33e74b1c..b6c4cea9fbd8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, 
size_t,
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+struct gen_pool_chunk *chunk);
 extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
 extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index ..624379a937c5
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include 
+#include 
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ *   from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return: pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+int min_alloc_order);
+
+/**
+ * is_pmalloc_object - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return: 0 if the object does not belong to pmalloc, 1 if it belongs to
+ * pmalloc, -1 if it partially overlaps pmalloc meory, but incorectly.
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handle to the pool to be used for memory allocati

[PATCH 6/6] Documentation for Pmalloc

2018-02-10 Thread Igor Stoppa

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 Documentation/core-api/index.rst   |   1 +
 Documentation/core-api/pmalloc.rst | 114 +
 2 files changed, 115 insertions(+)
 create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
genalloc
errseq
printk-formats
+   pmalloc
 
 Interfaces for kernel debugging
 ===
diff --git a/Documentation/core-api/pmalloc.rst 
b/Documentation/core-api/pmalloc.rst
new file mode 100644
index ..1afc13b6a74d
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,114 @@
+SPDX-License-Identifier: CC-BY-SA-4.0
+
+Protectable memory allocator
+
+
+Purpose
+---
+
+The pmalloc library is meant to provide R/O status to data that, for some
+reason, could neither be declared as constant, nor could it take advantage
+of the qualifier __ro_after_init, but is write-once and read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+---
+
+pmalloc builds on top of genalloc, using the same concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become R/O, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes R/O and when. And for improved lockless concurrency.
+
+
+Caveats
+---
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+  from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+  in use anymore by the requester; however, it will not become available
+  for further use, until the pool is destroyed.
+
+- Before destroying a pool, all the memory allocated from it must be
+  released.
+
+- pmalloc does not provide locking support with respect to allocating vs
+  protecting an individual pool, for performance reasons.
+  It is recommended not to share the same pool between unrelated functions.
+  Should sharing be a necessity, the user of the shared pool is expected
+  to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+  through vmalloc. Some more TLB entries will be used, however less than
+  in the case of using vmalloc directly. The exact number depends on the
+  size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+  and then marked as read-only, it shouldn't be an issue that the address
+  range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+  during an initial transient, after which there should be no more need to
+  perform cross-processor synchronizations of page tables.
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+  helper functions are provided, mirroring their kmalloc counterparts.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_create_pool
+
+2. [optional] pre-allocate some memory in the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_prealloc
+
+3. issue one or more allocation requests to the pool with locking as needed
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pzalloc
+
+4. initialize the memory obtained with desired values
+
+5. [optional] iterate over points 3 & 4 as needed
+
+6. write-protect the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_protect_pool
+
+7. use in read-only mode the handles obtained through the allocations
+
+8. [optional] release all the memory allocated
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pfree
+
+9. [optional, but depends on point 8] destroy the pool
+
+.. kernel-doc:: include/linux/pmalloc.h
+   :functions: pmalloc_destroy_pool
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
-- 
2.14.1

[PATCH 5/6] Pmalloc: self-test

2018-02-10 Thread Igor Stoppa

Add basic self-test functionality for pmalloc.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 mm/Kconfig|  9 
 mm/Makefile   |  1 +
 mm/pmalloc-selftest.c | 63 +++
 mm/pmalloc-selftest.h | 24 
 mm/pmalloc.c  |  2 ++
 5 files changed, 99 insertions(+)
 create mode 100644 mm/pmalloc-selftest.c
 create mode 100644 mm/pmalloc-selftest.h

diff --git a/mm/Kconfig b/mm/Kconfig
index be578fbdce6d..098aefef78b1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -766,3 +766,12 @@ config PROTECTABLE_MEMORY
 depends on ARCH_HAS_SET_MEMORY
 select GENERIC_ALLOCATOR
 default y
+
+config PROTECTABLE_MEMORY_SELFTEST
+   bool "Run self test for pmalloc memory allocator"
+   depends on ARCH_HAS_SET_MEMORY
+   select PROTECTABLE_MEMORY
+   default n
+   help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 959fdbdac118..f7bbbfde6967 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
 obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
 obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += page_poison.o
 obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index ..05acd62d23ec
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+
+#include "pmalloc-selftest.h"
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size)   \
+   pr_notice("must be " expected ": %s",   \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size)\
+   validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size)\
+   validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+   struct gen_pool *pool_unprot;
+   struct gen_pool *pool_prot;
+   void *var_prot, *var_unprot, *var_vmall;
+
+   pr_notice("pmalloc self-test");
+   pool_unprot = pmalloc_create_pool("unprotected", 0);
+   pool_prot = pmalloc_create_pool("protected", 0);
+   BUG_ON(!(pool_unprot && pool_prot));
+
+   var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
+   var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
+   *(int *)var_prot = 0;
+   var_vmall = vmalloc(SIZE_2);
+   is_alloc_ok(var_unprot, 10);
+   is_alloc_ok(var_unprot, SIZE_1);
+   is_alloc_ok(var_unprot, PAGE_SIZE);
+   is_alloc_no(var_unprot, SIZE_1 + 1);
+   is_alloc_no(var_vmall, 10);
+
+
+   pfree(pool_unprot, var_unprot);
+   vfree(var_vmall);
+
+   pmalloc_protect_pool(pool_prot);
+
+   /* This will intentionally trigger a WARN because the pool being
+* destroyed is not protected, which is unusual and should happen
+* on error paths only, where probably other warnings are already
+* displayed.
+*/
+   pmalloc_destroy_pool(pool_unprot);
+
+   /* This must not cause WARNings */
+   pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index ..63e430de74f8
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+
+#ifndef __MM_PMALLOC_SELFTEST_H
+#define __MM_PMALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index e94bfb407c92..c9a472730afc 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include "pmalloc-selftest.h"
 /*
  * pmalloc_data contains the data specific to a pmalloc pool,
  * in a format compatible with the design of gen_alloc.
@@ -492,6 +493,7 @@ static int __init pmalloc_late_init(void)
}
}
mutex_unlock(_mutex);
+   pmalloc_selftest();
return 0;
 }
 late_initcall(pmalloc_late_init);
-- 
2.14.1

[PATCH 2/6] genalloc: selftest

2018-02-10 Thread Igor Stoppa

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/genalloc-selftest.h |  26 +++
 init/main.c   |   2 +
 lib/Kconfig   |  15 ++
 lib/Makefile  |   1 +
 lib/genalloc-selftest.c   | 400 ++
 5 files changed, 444 insertions(+)
 create mode 100644 include/linux/genalloc-selftest.h
 create mode 100644 lib/genalloc-selftest.c

diff --git a/include/linux/genalloc-selftest.h 
b/include/linux/genalloc-selftest.h
new file mode 100644
index ..d8b9ae56a54e
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+
+#ifndef __LINUX_GENALLOC_SELFTEST_H
+#define __LINUX_GENALLOC_SELFTEST_H
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include 
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b954839..fb844aa3eb8c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
 */
mem_encrypt_init();
 
+   genalloc_selftest();
 #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..0d526c004e81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
 config GENERIC_ALLOCATOR
bool
 
+config GENERIC_ALLOCATOR_SELFTEST
+   bool "genalloc tester"
+   default n
+   select GENERIC_ALLOCATOR
+   help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+   bool "make the genalloc tester more verbose"
+   default n
+   select GENERIC_ALLOCATOR_SELFTEST
+   help
+ More information will be displayed during the self-testing.
+
 #
 # reed solomon support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..fadb30abde08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
 obj-$(CONFIG_CRC8) += crc8.o
 obj-$(CONFIG_XXHASH)   += xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index ..420ed067940b
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,400 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <igor.sto...@huawei.com>
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE  (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+   struct gen_pool_chunk *chunk;
+   char bitmap[BITMAP_SIZE_C * 2 + 1];
+   unsigned long i;
+   char *bm = bitmap;
+   char *entry;
+
+   if (unlikely(pool == NULL || pool->chunks.next == NULL))
+   return;
+
+   chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+next_chunk);
+   entry = (void *)chunk->entries;
+   for (i = 1; i <= BITMAP_SIZE_C; i++)
+   bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+   *bm = '\0';
+   pr_notice("chunk: %pbitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+   CMD_ALLOCATOR,
+   CMD_

[PATCH 3/6] struct page: add field for vm_struct

2018-02-10 Thread Igor Stoppa

When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.

As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.

Signed-off-by: Igor Stoppa <igor.sto...@huawei.com>
---
 include/linux/mm_types.h |  1 +
 mm/vmalloc.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..c3a4825e10c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
void *s_mem;/* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+   struct vm_struct *area;
};
 
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 673942094328..9404ffd0ee98 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long 
size, unsigned long flags,
  */
 struct vm_struct *find_vm_area(const void *addr)
 {
-   struct vmap_area *va;
+   struct page *page;
 
-   va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA)
-   return va->vm;
+   if (unlikely(!is_vmalloc_addr(addr)))
+   return NULL;
 
-   return NULL;
+   page = vmalloc_to_page(addr);
+   if (unlikely(!page))
+   return NULL;
+
+   return page->area;
 }
 
 /**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int 
deallocate_pages)
struct page *page = area->pages[i];
 
BUG_ON(!page);
+   page->area = NULL;
__free_pages(page, 0);
}
 
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
const void *caller)
 {
struct vm_struct *area;
+   unsigned int i;
void *addr;
unsigned long real_size = size;
 
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
 
kmemleak_vmalloc(area, size, gfp_mask);
 
+   for (i = 0; i < area->nr_pages; i++)
+   area->pages[i]->area = area;
+
return addr;
 
 fail:
-- 
2.14.1

Re: [PATCH 4/6] Protectable Memory

2018-02-10 Thread Igor Stoppa

On 05/02/18 00:06, Randy Dunlap wrote:
> On 02/04/2018 08:47 AM, Igor Stoppa wrote:

[...]

>> + * pmalloc_create_pool - create a new protectable memory pool -
> 
> Drop trailing " -".

yes

>> + * @name: the name of the pool, must be unique
> 
> Is that enforced?  Will return NULL if @name is duplicated?

ok, I'll state it more clearly that it's enforced

[...]

>> + * @pool: handler to the pool to be used for memory allocation
> 
>  handle (I think)

yes, also for all the other ones

[...]

>> + * avoid sleping during allocation.
> 
> sleeping

yes

[...]

>> + * opposite to what is allocated on-demand when pmalloc runs out of free
> 
>   opposed to

yes

>> + * space already existing in the pool and has to invoke vmalloc.
>> + *
>> + * Returns true if the vmalloc call was successful, false otherwise.
> 
> Where is the allocated memory (pointer)?  I.e., how does the caller know
> where that memory is?
> Oh, that memory isn't yet available to the caller until it calls pmalloc(), 
> right?

yes, it's a way to:
- preemptively beef up the pool, before entering atomic context
(unlikely that it will be needed, but possible), so that there is no
need to allocate extra pages (assuming one can estimate the max memory
that will be requested)
- avoid fragmentation caused by allocating smaller groups of pages


I'll add explanation for this.

[...]

>> + * @size: amount of memory (in bytes) requested
>> + * @gfp: flags for page allocation
>> + *
>> + * Allocates memory from an unprotected pool. If the pool doesn't have
>> + * enough memory, and the request did not include GFP_ATOMIC, an attempt
>> + * is made to add a new chunk of memory to the pool
>> + * (a multiple of PAGE_SIZE), in order to fit the new request.
> 
>  fill
> What if @size is > PAGE_SIZE?

Nothing special, it gets rounded up to the nearest multiple of
PAGE_SIZE. vmalloc doesn't have only drawbacks ;-)

[...]

>> + * Returns the pointer to the memory requested upon success,
>> + * NULL otherwise (either no memory available or pool already read-only).
> 
> It would be good to use the
> * Return:
> kernel-doc notation for return values.

yes, good point, I'm fixing it everywhere in the patchset

[...]

>> + * will be availabel for further allocations.
> 
>   available

yes

[...]

>> +/**
> 
> /** means that the following comments are kernel-doc notation, but these
> comments are not, so just use /* there, please.

yes, also to the others

[...]

>> +/* is_pmalloc_object gets called pretty late, so chances are high
>> + * that the object is indeed of vmalloc type
>> + */
> 
> Multi-line comment style is
>   /*
>* comment1
>* comment..N
>*/

yes, also to the others

--
igor

Re: [PATCH 2/6] genalloc: selftest

2018-02-10 Thread Igor Stoppa



On 07/02/18 22:25, kbuild test robot wrote:

[...]

>>> lib/genalloc-selftest.c:17:10: fatal error: asm/set_memory.h: No such file 
>>> or directory
> #include 

This header is unnecessary and will be removed.

--
igor

Re: [PATCH 2/6] genalloc: selftest

2018-02-10 Thread Igor Stoppa

On 05/02/18 00:19, Randy Dunlap wrote:
> On 02/04/2018 08:47 AM, Igor Stoppa wrote:

[...]

> Please use kernel multi-line comment style.

ok for all of them

[...]

>> +BUG_ON(!locations[action->location]);
>> +print_first_chunk_bitmap(pool);
>> +BUG_ON(compare_bitmaps(pool, action->pattern));
> 
> BUG_ON() seems harsh to me, but some of the other self-tests also do that.

I would expect that the test never fails, if one is not modifying
anything related to genalloc.

But if an error slips in during development of genalloc or anything
related (like the functions used to scan the bitmaps), I think it's
better to pull the handbrake immediately, because failure in tracking
correctly the memory allocation is likely to cause corruption and every
sort of mysterious weird errors.

--
igor

Re: [PATCH 5/6] Pmalloc: self-test

2018-02-10 Thread Igor Stoppa



On 07/02/18 19:18, kbuild test robot wrote:
> Hi Igor,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on kees/for-next/pstore]
> [also build test ERROR on v4.15]
> [cannot apply to linus/master mmotm/master next-20180207]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180207-171252
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
> for-next/pstore
> config: i386-randconfig-s1-201805+bisect_validate (attached as .config)
> compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>mm/pmalloc.o: In function `pmalloc_pool_show_chunks':
>>> mm/pmalloc.c:100: undefined reference to `gen_pool_for_each_chunk'
>mm/pmalloc.o: In function `pmalloc_pool_show_size':
>>> mm/pmalloc.c:81: undefined reference to `gen_pool_size'
>mm/pmalloc.o: In function `pmalloc_pool_show_avail':
>>> mm/pmalloc.c:70: undefined reference to `gen_pool_avail'
>mm/pmalloc.o: In function `pmalloc_chunk_free':
>>> mm/pmalloc.c:459: undefined reference to `gen_pool_flush_chunk'
>mm/pmalloc.o: In function `pmalloc_create_pool':
>>> mm/pmalloc.c:173: undefined reference to `gen_pool_create'
>>> mm/pmalloc.c:210: undefined reference to `gen_pool_destroy'
>mm/pmalloc.o: In function `gen_pool_add':
>>> include/linux/genalloc.h:115: undefined reference to `gen_pool_add_virt'
>mm/pmalloc.o: In function `pmalloc':
>>> mm/pmalloc.c:357: undefined reference to `gen_pool_alloc'
>mm/pmalloc.o: In function `gen_pool_add':
>>> include/linux/genalloc.h:115: undefined reference to `gen_pool_add_virt'
>mm/pmalloc.o: In function `pmalloc':
>mm/pmalloc.c:386: undefined reference to `gen_pool_alloc'
>mm/pmalloc.o: In function `pmalloc_destroy_pool':
>mm/pmalloc.c:484: undefined reference to `gen_pool_for_each_chunk'
>mm/pmalloc.c:485: undefined reference to `gen_pool_destroy'

Wrong default value for the pmalloc Kconfig option, it must default to
true. Will fix it.

--
igor

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-09 Thread Igor Stoppa

On 05/02/18 17:40, Christopher Lameter wrote:
> On Sat, 3 Feb 2018, Igor Stoppa wrote:
> 
>>> We could even do this in a more thorough way. Can we use a ring 1 / 2
>>> distinction to create a hardened OS core that policies the rest of
>>> the ever expanding kernel with all its modules and this and that feature?
>>
>> What would be the differentiating criteria? Furthermore, what are the
>> chances
>> of invalidating the entire concept, because there is already an
>> hypervisor using
>> the higher level features?
>> That is what you are proposing, if I understand correctly.
> 
> Were there not 4 rings as well as methods by the processor vendors to
> virtualize them as well?

I think you are talking x86, mostly.
On ARM there are ELx and they are often (typically?) already used.
For x86 I cannot comment.

>>> I think that will long term be a better approach and allow more than the
>>> current hardening approaches can get you. It seems that we are willing to
>>> tolerate significant performance regressions now. So lets use the
>>> protection mechanisms that the hardware offers.
>>
>> I would rather *not* propose significant performance regression :-P
> 
> But we already have implemented significant kernel hardening which causes
> performance regressions. Using hardware capabilities allows the processor
> vendor to further optimize these mechanisms whereas the software
> preventative measures are eating up more and more performance as the pile
> them on. Plus these are methods that can be worked around. Restrictions
> implemented in a higher ring can be enforced and are much better than
> just "hardening" (which is making life difficult for the hackers and
> throwing away performannce for the average user).

What you are proposing requires major restructuring of the memory
management - at the very least - provided that it doesn't cause the
conflicts I mentioned above.

Even after you do that, the system will still be working with memory
pages, there will be still a need to segregate data within certain
pages, or pay the penalty of handling exceptions, when data with
different permissions coexist within the same page.

The way the pmalloc API is designed is meant to facilitate the
segregation and to actually improve performance, by grouping types of
data with same scope and permission.

WRT the implementation, there is a minimal exposure to the memory
provider, both for allocation and release.

Same goes for the protection mechanism.
It's a single call to the function which makes pages read only.
It would be trivial to swap it out with a call to whatever framework you
want to come up with, for implementing ring/EL based protection.

>From this perspective, you can easily provide patches that implement
what you are proposing, against pmalloc, if you really think that it's
the way to go.

I'll be happy to use them, if they provide improved performance and same
or better protection.

The way I designed pmalloc was really to be able to switch to some
alternate memory provider and/or protection mechanism, should a better
one arise.

But it can be done in a separate step, I think, since you are not
proposing to just change pmalloc, you are proposing to re-design how the
overall kernel memory hardening works (including executable pages, const
data, __ro_after_init, etc.)

--
igor

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-09 Thread Igor Stoppa

On 05/02/18 17:33, Christopher Lameter wrote:
> On Sat, 3 Feb 2018, Igor Stoppa wrote:
> 
>> - the property of the compound page will affect the property of all the
>> pages in the compound, so when one is write protected, it can generate a
>> lot of wasted memory, if there is too much slack (because of the order)
>> With vmalloc, I can allocate any number of pages, minimizing the waste.
> 
> I thought the intend here is to create a pool where the whole pool becomes
> RO?

Yes, but why would I force the number of pages in the pool to be a power
of 2, when it can be any number?

If a need, say, 17 pages, I would have to allocate 32.
But it can be worse than that.
Since the size of the overall allocated memory is not known upfront, I
wold have a problem to decide how many pages to allocate, every time
there is need to grow the pool.

Or push the problem to the user of the API, who might be equally unaware.

Notice that there is already a function (prealloc) available to the user
of the API, if the size is known upfront.

So I do not really see how using compound pages would make memory
utilization better or even not worse.

--
igor

Re: [PATCH 1/6] genalloc: track beginning of allocations

2018-02-09 Thread Igor Stoppa



On 05/02/18 05:45, Matthew Wilcox wrote:
> On Sun, Feb 04, 2018 at 02:34:08PM -0800, Randy Dunlap wrote:
>>> +/**
>>> + * cleart_bits_ll - according to the mask, clears the bits specified by
>>
>>   clear_bits_ll
> 
> 'make W=1' should catch this ... yes?
> 
> (hint: building with 'make C=1 W=1' finds all kinds of interesting issues
> in your code.  W=12 or W=123 finds too many false positives for my tastes)

ok, thank you, I will start using it

--
igor

Re: [PATCH 2/6] genalloc: selftest

2018-02-09 Thread Igor Stoppa



On 05/02/18 02:14, Randy Dunlap wrote:
> On 02/04/2018 03:03 PM, Matthew Wilcox wrote:
>> On Sun, Feb 04, 2018 at 02:19:22PM -0800, Randy Dunlap wrote:
 +#ifndef __GENALLOC_SELFTEST_H__
 +#define __GENALLOC_SELFTEST_H__
>>>
>>> Please use _LINUX_GENALLOC_SELFTEST_H_
>>
>> willy@bobo:~/kernel/linux$ git grep define.*_H__$ include/linux/*.h |wc -l
>> 98
>> willy@bobo:~/kernel/linux$ git grep define.*_H_$ include/linux/*.h |wc -l
>> 110
>> willy@bobo:~/kernel/linux$ git grep define.*_H$ include/linux/*.h |wc -l
>> 885
>>
>> No trailing underscore is 8x as common as one trailing underscore.
> 
> OK, ack.

ok, I'll move to _LINUX_xxx_yyy_H format

--
thanks, igor

Re: [PATCH 3/6] struct page: add field for vm_struct

2018-02-09 Thread Igor Stoppa

On 06/02/18 14:37, Matthew Wilcox wrote:

[...]

> LOCAL variable names should be short, and to the point.

[...]

> (Documentation/process/coding-style.rst)

ok, will do, thanks for the pointer!

--
igor

Re: [RFC PATCH v16 0/6] mm: security: ro protection for dynamic data

2018-02-21 Thread Igor Stoppa

On 21/02/18 03:36, Dave Chinner wrote:
> On Tue, Feb 20, 2018 at 03:56:00PM -0800, Matthew Wilcox wrote:
>> On Wed, Feb 21, 2018 at 08:36:04AM +1100, Dave Chinner wrote:
>>> FWIW, I'm not wanting to use it to replace static variables. All the
>>> structures are dynamically allocated right now, and get assigned to
>>> other dynamically allocated pointers. I'd likely split the current
>>> structures into a "ro after init" 

I would prefer to use a different terminology, because, if I have
understood the use case, this is not exactly the same as __ro_after_init

So, this is my understanding:

* "const" needs to be known at link time - there might be some
adjustments later on, ex: patching of "const" pointers, after relocation
has taken place - I am assuming we are not planning to patch const data
The compiler can perform whatever optimization it feels like and it is
allowed to do, on this.

* __ro_after_init is almost the same as a const, from a r/w perspective,
but it will become effectively read_only after the completion of the
init phase. The compiler cannot use it in any way to detect errors,
AFAIK. The system will just generate a runtime error is someone tries to
alter some __ro_after_init data, when it's read-only.
The only trick available is to use, after the protection, a different
type of handle, const.

* pmalloc pools can be protected (hence the "p") at any time, but they
start as r/w. Also, they cannot be declared statically.

* data which is either const or __ro_after_init is placed into specific
sections (on arm64 it's actually the same) and its pages are then marked
as read-only.

>>> structure and rw structure, so
>>> how does the "__ro_after_init" attribute work in that case? Is it
>>> something like this?
>>>
>>> struct xfs_mount {
>>> struct xfs_mount_ro{
>>> ...
>>> } *ro __ro_after_init;
>
> 
> pointer, not embedded structure

I doubt this would work, because I think it's not possible to put a
field of a structure into a separate section, afaik.

__ro_after_init would refer to the ro field, not to the memory it refers to.

>>> ..
>>
>> No, you'd do:
>>
>> struct xfs_mount_ro {
>>  [...]
>> };

is this something that is readonly from the beginning and then shared
among mount points or is it specific to each mount point?

>> struct xfs_mount {
>>  const struct xfs_mount_ro *ro;
>>  [...]
>> };
> 
>  so that's pretty much the same thing :P

The "const" modifier is a nice way to catch errors through the compiler,
iff the ro data will not be initialized through this handle, when it's
still writable.

>>> Also, what compile time checks are in place to catch writes to
>>> ro structure members? Is sparse going to be able to check this sort
>>> of thing, like is does with endian-specific variables?
>>
>> Just labelling the pointer const should be enough for the compiler to
>> catch unintended writes.
> 
> Ok.

yes, anyway the first one trying to alter it at run time, is in for some
surprise.

 I'd be interested to have your review of the pmalloc API, if you think
 something is missing, once I send out the next revision.
>>>
>>> I'll look at it in more depth when it comes past again. :P
>>
>> I think the key question is whether you want a slab-style interface
>> or whether you want a kmalloc-style interface.  I'd been assuming
>> the former, but Igor has implemented the latter already.
> 
> Slabs are rally only useful when you have lots of a specific type of
> object. I'm concerned mostly about one-off per-mount point
> structures, of which there are relatively few. A heap-like pool per
> mount is fine for this.

That was my same sentiment.

Actually it would be even possible to simulate caches with pools: each
pool supports a granularity parameter, during creation. One could have
multiple pools, each with different granularity, but it would probably
lead to a proliferation of pools.

Instead, I preferred to have pmalloc as a drop-in replacement for the
variants of k/v/kv malloc.

The only real issue was the - previous - inability of tracking the size
of an allocation, given its address, but that is taken care of by the
patch for the genalloc bitmap.

If I could have a pointer to a good candidate for the pmalloc treatment,
I could come up with a patch, to show how it could be done.

Then it might be easier to discuss if the API needs to be modified
and/or extended somehow.

--
igor

Re: [PATCH 5/6] Pmalloc: self-test

2018-02-22 Thread Igor Stoppa



On 22/02/18 00:24, Kees Cook wrote:
> On Tue, Feb 20, 2018 at 8:40 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:

[...]

>> sorry for being dense ... are you proposing that I do something to
>> lkdtm_rodata.c ? An example would probably help me understand.
> 
> It would likely live in lkdtm_perms.c (or maybe lkdtm_heap.c). Namely,
> use the pmalloc API and then attempt to write to a read-only variable
> in the pmalloc region (to prove that the permission adjustment
> actually happened). Likely a good example is
> lkdtm_WRITE_RO_AFTER_INIT().


ok, thanks for the explanation, I will do it

--
igor

Re: [RFC PATCH v16 0/6] mm: security: ro protection for dynamic data

2018-02-22 Thread Igor Stoppa

On 21/02/18 23:36, Dave Chinner wrote:
> On Wed, Feb 21, 2018 at 11:56:22AM +0200, Igor Stoppa wrote:

[...]

> It seems lots of people get confused when discussing concepts vs
> implementation... :)

IMHO, if possible, it's better to use unambiguous terms at every point.
__ro_after_init is already taken :-P

In this specific case, I wanted to be absolutely sure I understood
correctly what you need. I think I have now, thanks.

>> is this something that is readonly from the beginning and then shared
>> among mount points or is it specific to each mount point?
> 
> It's dynamically allocated for each mount point, made read-only
> before the mount completes and lives for the length of the mount
> point.

ok. And destroyed when the mount point is unmounted, I expect.

[...]

>> The "const" modifier is a nice way to catch errors through the compiler,
>> iff the ro data will not be initialized through this handle, when it's
>> still writable.
> 
> That's kinda implied by the const, isn't it? If we don't do it that
> way, then the compiler will throw errors

I might be splitting the hair, but since I'm advertising something I
worte, I don't want to look like a peddler of snake oil, in hindsight :-P

To clarify my previous comment:

* const can mean the world to the compiler, but that doesn't
automatically translate into write-protected memory, yet I do appreciate
the advantage of teaching the compiler what should not be altered.
And I have nothing against doing it.

* even if some handle will be const, it still needs to be aliased to
some other pointer that is not const, at the beginning, because it must
be initialized and it's anyway writable. So, this cannot be avoided.

--
igor

Re: [PATCH 2/6] genalloc: selftest

2018-02-22 Thread Igor Stoppa

On 22/02/18 00:28, Kees Cook wrote:
> On Tue, Feb 20, 2018 at 8:59 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:
>>
>>
>> On 13/02/18 01:50, Kees Cook wrote:
>>> On Mon, Feb 12, 2018 at 8:52 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:

[...]

>>>> +   genalloc_selftest();
>>>
>>> I wonder if it's possible to make this module-loadable instead? That
>>> way it could be built and tested separately.
>>
>> In my case modules are not an option.
>> Of course it could be still built in, but what is the real gain?
> 
> The gain for it being a module is that it can be loaded and tested
> separately from the final kernel image and module collection. For
> example, Chrome OS builds lots of debugging test modules but doesn't
> include them on the final image. They're only used for testing, and
> can be separate from the kernel and "production" modules.

ok

> 
>> [...]
>>
>>>> +   BUG_ON(compare_bitmaps(pool, action->pattern));
>>>
>>> There's been a lot recently on BUG vs WARN. It does seem crazy to not
>>> BUG for an allocator selftest, but if we can avoid it, we should.
>>
>> If this fails, I would expect that memory corruption is almost guaranteed.
>> Do we really want to allow the boot to continue, possibly mounting a
>> filesystem, only to corrupt it? It seems very dangerous.
> 
> I would include the rationale in either a comment or the commit log.
> BUG() tends to need to be very well justified these days.

ok

> 
>>> Also, I wonder if it might make sense to split this series up a little
>>> more, as in:
>>>
>>> 1/n: add genalloc selftest
>>> 2/n: update bitmaps
>>> 3/n: add/change bitmap tests to selftest

[...]

> I'll leave it up to the -mm folks, but that was just my thought.

The reasons why I have doubts about your proposal are:

1) leaving 2/n and 3/n separate mean that the selftest could be broken,
if one does bisections with selftest enabled

2) since I would need to change the selftest, to make it work with the
updated bitmap, it would not really prove that the change is correct.

If there was a selftest that can work without changes, after the bitmap
update, I would definitely agree to introduce it first.

OTOH, as I wrote, the bitmap patch itself is already introducing
verification of the calculated value (from the bitmap) against the
provided value (from the call parameters).

This, unfortunately, cannot be split, because it still relies on the
"find end of the allocation" capability introduced by the very same patch.

Anyway, putting aside concerns about the verification of the correctness
of the patch, I still see point #1 as a problem: breaking bisectability.

Or did I not understand correctly how this works?

--
igor

Re: [PATCH 1/6] genalloc: track beginning of allocations

2018-02-20 Thread Igor Stoppa

On 13/02/18 01:52, Kees Cook wrote:
> On Mon, Feb 12, 2018 at 8:52 AM, Igor Stoppa <igor.sto...@huawei.com> wrote:
>> @@ -738,14 +1031,16 @@ EXPORT_SYMBOL(devm_gen_pool_create);
>>
>>  #ifdef CONFIG_OF
>>  /**
>> - * of_gen_pool_get - find a pool by phandle property
>> + * of_gen_pool_get() - find a pool by phandle property
>>   * @np: device node
>>   * @propname: property name containing phandle(s)
>>   * @index: index into the phandle array
>>   *
>> - * Returns the pool that contains the chunk starting at the physical
>> - * address of the device tree node pointed at by the phandle property,
>> - * or NULL if not found.
>> + * Return:
>> + * * pool address  - it contains the chunk starting at the physical
>> + *   address of the device tree node pointed at by
>> + *   the phandle property
>> + * * NULL  - otherwise
>>   */
>>  struct gen_pool *of_gen_pool_get(struct device_node *np,
>> const char *propname, int index)
> 
> I wonder if this might be more readable by splitting the kernel-doc
> changes from the bitmap changes? I.e. fix all the kernel-doc in one
> patch, and in the following, make the bitmap changes. Maybe it's such
> a small part that it doesn't matter, though?

I had the same thought, but then I would have made most of the kerneldoc
changes to something that would be altered by the following patch,
because it would have made little sense to fix only those parts that
would have survived.

If it is really a problem to keep them together, I could put these
changes in a following patch. Would that be ok?

--
igor

Re: arm64 physmap (was Re: [kernel-hardening] [PATCH 4/6] Protectable Memory)

2018-02-20 Thread Igor Stoppa

On 14/02/18 21:29, Kees Cook wrote:
> On Wed, Feb 14, 2018 at 11:06 AM, Laura Abbott  wrote:

[...]

>> Kernel code should be fine, if it isn't that is a bug that should be
>> fixed. Modules yes are not fully protected. The conclusion from past
> 
> I think that's a pretty serious problem: we can't have aliases with
> mismatched permissions; this degrades a deterministic protection
> (read-only) to a probabilistic protection (knowing where the alias of
> a target is mapped). Having an attack be "needs some info leaks"
> instead of "need execution control to change perms" is a much lower
> bar, IMO.

Why "need execution control to change permission"?
Or, iow, what does it mean exactly?
ROP/JOP? Data-oriented control flow hijack?

Unless I misunderstand the meaning of "need execution control", I think
that "need write capability to arbitrary data address" should be
sufficient, albeit uncomfortable to use.

OTOH, "need read/write capability from/to arbitrary data address" would
be enough, I think, assuming that one knows the offset where to write to
- but that information could be inferred, for example, by scanning the
memory for known patterns.

IMHO the attack surface is so vast that it's not unreasonable to expect
that it will be possible to fish out means to perform arbitrary R/W into
kernel address space. Ex: some more recent/less tested driver.

One can argue that this sort of R/W activity probably does require some
form of execution control, but AFAIK, the only way to to prevent it, is
to have CFI - btw, is there any standardization in that sense?

So, from my (pessimistic?) perspective, the best that can be hoped for,
is to make it much harder to figure out where the data is located.

Virtual mapping has this side effect, compared to linear mapping.

But, once easier attack targets are removed, I suspect the page mapping
will become the next target.

--
igor

< 1 2 3 4 5 6 7 8 9 >

101 - 200 of 850 matches

Mail list logo