Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-06 Thread Laura Abbott
On 06/06/2017 04:34 AM, Igor Stoppa wrote:
> On 06/06/17 09:25, Christoph Hellwig wrote:
>> On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:
> 
> [..]
> 
>>> As far as I know, not all CONFIG_MMU=y architectures provide
>>> set_memory_ro()/set_memory_rw(). You need to provide a fallback for
>>> architectures which do not provide set_memory_ro()/set_memory_rw(),
>>> or for kernels built with CONFIG_MMU=n.
>>
>> I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
>> ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.
> 
> Would STRICT_KERNEL_RWX work? It's already present.
> If both kernel text and rodata can be protected, so can pmalloc data.
> 
> ---
> igor
> 


There's already ARCH_HAS_SET_MEMORY for this purpose.
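
A rough sketch of how pmalloc could key off it (the fallback stubs below
are only an assumption about one possible shape, not part of the patch):

#ifdef CONFIG_ARCH_HAS_SET_MEMORY
#include <asm/set_memory.h>     /* the real set_memory_ro()/set_memory_rw() */
#else
/* No-op fallbacks for architectures without the hooks; on such
 * configurations pmalloc would simply not protect anything. */
static inline int set_memory_ro(unsigned long addr, int numpages)
{
        return 0;
}
static inline int set_memory_rw(unsigned long addr, int numpages)
{
        return 0;
}
#endif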

Thanks,
Laura


Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-06 Thread Igor Stoppa

On 06/06/17 15:08, Tetsuo Handa wrote:
> Igor Stoppa wrote:
>>>> +struct pmalloc_node {
>>>> +        struct hlist_node nodes_list;
>>>> +        atomic_t used_words;
>>>> +        unsigned int total_words;
>>>> +        __PMALLOC_ALIGNED align_t data[];
>>>> +};
>>>
>>> Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG"?
>>
>> In an earlier version I actually asked the same question.
>> It is currently there because I just don't know enough about various
>> architectures. The idea of having "align_t" was that it could be tied
>> into what is the most desirable alignment for each architecture.
>> But I'm actually looking for advice on this.
> 
> I think that letting the compiler use natural alignment is OK.

On a 64-bit machine the preferred alignment might be either 32 or 64
bits, depending on the application. How can the compiler choose?
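
For illustration, one possible shape for a per-architecture hook
(PMALLOC_ALIGN is a hypothetical name, not something from the patch):

/* Architectures could override PMALLOC_ALIGN; the natural
 * alignment of a long is only the default. */
#ifndef PMALLOC_ALIGN
#define PMALLOC_ALIGN __alignof__(long)
#endif
#define __PMALLOC_ALIGNED __aligned(PMALLOC_ALIGN)

typedef unsigned long align_t;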


>>> You need to check for node != NULL before dereferencing it.
>>
>> So, if I understood correctly, there shouldn't be a case where node is
>> NULL, right?
>> Unless it has been tampered with or damaged. Is that what you mean?
> 
> I meant to say
> 
> + node = __pmalloc_create_node(req_words);
> // this location.
> + starting_word = atomic_fetch_add(req_words, &node->used_words);

argh, yes
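
So it would become something like:

+        node = __pmalloc_create_node(req_words);
+        if (!node)      /* creation failed, e.g. out of memory */
+                return NULL;
+        starting_word = atomic_fetch_add(req_words, &node->used_words);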


>>>> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
>>>> +{
>>>> +        unsigned long p;
>>>> +
>>>> +        p = (unsigned long)ptr;
>>>> +        n += (unsigned long)ptr;
>>>> +        for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
>>>> +                if (is_vmalloc_addr((void *)p)) {
>>>> +                        struct page *page;
>>>> +
>>>> +                        page = vmalloc_to_page((void *)p);
>>>> +                        if (!(page && PagePmalloc(page)))
>>>> +                                return msg;
>>>> +                }
>>>> +        }
>>>> +        return NULL;
>>>> +}
>>>
>>> I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
>>> according to check_page_span().
>>
>> It seems to work. If I am missing your point, could you please
>> use the same format as the example I made, to explain it to me?
> 
> If ptr == NULL and n == PAGE_SIZE so that (ptr + n) % PAGE_SIZE == 0,
> this loop will access two pages (one page containing p == 0 and another
> page containing p == PAGE_SIZE) when this loop should access only one
> page containing p == 0. When checking n bytes, its range is 0 to n - 1.

oh, so:

p = (unsigned long) ptr;
n = p + n - 1;
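
i.e. the whole check would become (sketch, untested):

const char *__pmalloc_check_object(const void *ptr, unsigned long n)
{
        unsigned long p;

        p = (unsigned long)ptr;
        n = p + n - 1;  /* address of the last byte in the range */
        for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
                if (is_vmalloc_addr((void *)p)) {
                        struct page *page;

                        page = vmalloc_to_page((void *)p);
                        if (!(page && PagePmalloc(page)))
                                return msg;
                }
        }
        return NULL;
}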


--
igor


Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-06 Thread Tetsuo Handa
Igor Stoppa wrote:
> >> +struct pmalloc_node {
> >> +        struct hlist_node nodes_list;
> >> +        atomic_t used_words;
> >> +        unsigned int total_words;
> >> +        __PMALLOC_ALIGNED align_t data[];
> >> +};
> >
> > Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG"?
> 
> In an earlier version I actually asked the same question.
> It is currently there because I just don't know enough about various
> architectures. The idea of having "align_t" was that it could be tied
> into what is the most desirable alignment for each architecture.
> But I'm actually looking for advice on this.

I think that letting the compiler use natural alignment is OK.



> > You need to check for node != NULL before dereferencing it.
> 
> So, if I understood correctly, there shouldn't be a case where node is
> NULL, right?
> Unless it has been tampered with or damaged. Is that what you mean?

I meant to say

+   node = __pmalloc_create_node(req_words);
// this location.
+   starting_word = atomic_fetch_add(req_words, &node->used_words);



> >> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
> >> +{
> >> +        unsigned long p;
> >> +
> >> +        p = (unsigned long)ptr;
> >> +        n += (unsigned long)ptr;
> >> +        for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
> >> +                if (is_vmalloc_addr((void *)p)) {
> >> +                        struct page *page;
> >> +
> >> +                        page = vmalloc_to_page((void *)p);
> >> +                        if (!(page && PagePmalloc(page)))
> >> +                                return msg;
> >> +                }
> >> +        }
> >> +        return NULL;
> >> +}
> > 
> > I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
> > according to check_page_span().
> 
> It seems to work. If I am missing your point, could you please
> use the same format as the example I made, to explain it to me?

If ptr == NULL and n == PAGE_SIZE so that (ptr + n) % PAGE_SIZE == 0,
this loop will access two pages (one page containing p == 0 and another
page containing p == PAGE_SIZE) when this loop should access only one
page containing p == 0. When checking n bytes, its range is 0 to n - 1.


Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-06 Thread Igor Stoppa
Hi,
thanks a lot for the review. My answers are in-line below.
I have rearranged your comments because I wasn't sure how to reply to
them inline.

On 06/06/17 07:44, Tetsuo Handa wrote:
> Igor Stoppa wrote:

[...]

> As far as I know, not all CONFIG_MMU=y architectures provide
> set_memory_ro()/set_memory_rw().

I'll follow up on this in the existing separate thread.

[...]

>> +struct pmalloc_node {
>> +        struct hlist_node nodes_list;
>> +        atomic_t used_words;
>> +        unsigned int total_words;
>> +        __PMALLOC_ALIGNED align_t data[];
>> +};
>
> Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG"?

In an earlier version I actually asked the same question.
It is currently there because I just don't know enough about various
architectures. The idea of having "align_t" was that it could be tied
into what is the most desirable alignment for each architecture.
But I'm actually looking for advice on this.


>> +        size = ((HEADER_SIZE - 1 + PAGE_SIZE) +
>> +                WORD_SIZE * (unsigned long) words) & PAGE_MASK;
> 
>> +        req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
>
> Please use macros for rounding up/down.

ok
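
E.g. with the existing PAGE_ALIGN() and DIV_ROUND_UP() helpers it could
become something like:

        size = PAGE_ALIGN(HEADER_SIZE + WORD_SIZE * (unsigned long)words);
        req_words = DIV_ROUND_UP(size, WORD_SIZE);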

[...]


> +        rcu_read_lock();
> +        hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
> +                starting_word = atomic_fetch_add(req_words, &node->used_words);
> +                if (starting_word + req_words > node->total_words)
> +                        atomic_sub(req_words, &node->used_words);
> +                else
> +                        goto found_node;
> +        }
> +        rcu_read_unlock();
> 
> You need to check for node != NULL before dereferencing it.

This was intentionally left out, on the grounds that I'm using the kernel
macros for both populating and walking the list.
So, if I understood correctly, there shouldn't be a case where node is
NULL, right?
Unless it has been tampered with or damaged. Is that what you mean?


> Also, why rcu_read_lock()/rcu_read_unlock()?
> I can't find the corresponding synchronize_rcu() etc. in this patch.

oops. Thanks for spotting it.


> pmalloc() won't be a hot path. Enclosing the whole thing in a mutex might be OK.
> If there is any reason to use RCU, note that rcu_read_unlock() is missing
> when coming from the "goto".

If there are no strong objections, I'd rather fix it and keep it as RCU.
Kees Cook mentioned the possibility of also implementing
"write seldom" in a similar fashion.
In that case this path is likely to warm up.
It might be premature optimization, but I'd prefer to avoid knowingly
introducing performance issues.
That said, I agree on the bug you spotted.
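
One possible shape for the fix, keeping RCU (a sketch, untested):

        rcu_read_lock();
        hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
                starting_word = atomic_fetch_add(req_words, &node->used_words);
                if (starting_word + req_words > node->total_words) {
                        atomic_sub(req_words, &node->used_words);
                } else {
                        /* drop the read-side lock on this path too */
                        rcu_read_unlock();
                        goto found_node;
                }
        }
        rcu_read_unlock();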


>> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
>> +{
>> +        unsigned long p;
>> +
>> +        p = (unsigned long)ptr;
>> +        n += (unsigned long)ptr;
>> +        for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
>> +                if (is_vmalloc_addr((void *)p)) {
>> +                        struct page *page;
>> +
>> +                        page = vmalloc_to_page((void *)p);
>> +                        if (!(page && PagePmalloc(page)))
>> +                                return msg;
>> +                }
>> +        }
>> +        return NULL;
>> +}
> 
> I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
> according to check_page_span().

Hmm,
let's say PAGE_SIZE is 0x0100 and PAGE_MASK is 0xFF00

Here are some cases (N = number of pages found; "Test" is the loop
condition (PAGE_MASK & p) <= (PAGE_MASK & n)):

    p       ptr + n   Pages          Test           N

 0x0005     0x00FF      1     0x0000 <= 0x0000  true   1
 0x0105     0x00FF            0x0100 <= 0x0000  false  1

 0x0005     0x0100      2     0x0000 <= 0x0100  true   1
 0x0105     0x0100            0x0100 <= 0x0100  true   2
 0x0205     0x0100            0x0200 <= 0x0100  false  2

 0x0005     0x01FF      2     0x0000 <= 0x0100  true   1
 0x0105     0x01FF            0x0100 <= 0x0100  true   2
 0x0205     0x01FF            0x0200 <= 0x0100  false  2

It seems to work. If I am missing your point, could you please
use the same format as the example I made, to explain it to me?

I might be able to understand better.

>> +int __init pmalloc_init(void)
>> +{
>> +        pmalloc_data = vmalloc(sizeof(struct pmalloc_data));
>> +        if (!pmalloc_data)
>> +                return -ENOMEM;
>> +        INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
>> +        mutex_init(&pmalloc_data->pools_list_mutex);
>> +        atomic_set(&pmalloc_data->pools_count, 0);
>> +        return 0;
>> +}
>> +EXPORT_SYMBOL(pmalloc_init);
>
> Why does pmalloc_init() need to be callable from a loadable kernel module?
> It has to be called only once, very early during boot.

Yes, this is a bug.
Actually, I forgot to include in this patch the real call to
pmalloc_init(), which is in init/main.c, right before the security init.
I should see if I can move it earlier, to allow for earlier users of
pmalloc.
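
For reference, the intended call site would look roughly like this
(a sketch, not the actual hunk):

        /* init/main.c, in start_kernel() */
        pmalloc_init();         /* must run before the first pmalloc user */
        security_init();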

> Since pmalloc_data is a globally shared variable, why does it need to
> be allocated dynamically? If it is for randomizing the address of
> pmalloc_data, it does not make sense to continue, because a vmalloc()
> failure causes a subsequent oops.

Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-06 Thread Igor Stoppa
On 06/06/17 09:25, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:

[..]

>> As far as I know, not all CONFIG_MMU=y architectures provide
>> set_memory_ro()/set_memory_rw(). You need to provide a fallback for
>> architectures which do not provide set_memory_ro()/set_memory_rw(),
>> or for kernels built with CONFIG_MMU=n.
> 
> I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
> ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.

Would STRICT_KERNEL_RWX work? It's already present.
If both kernel text and rodata can be protected, so can pmalloc data.

---
igor



Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-05 Thread Christoph Hellwig
On Tue, Jun 06, 2017 at 01:44:32PM +0900, Tetsuo Handa wrote:
> Igor Stoppa wrote:
> > +int pmalloc_protect_pool(struct pmalloc_pool *pool)
> > +{
> > +        struct pmalloc_node *node;
> > +
> > +        if (!pool)
> > +                return -EINVAL;
> > +        mutex_lock(&pool->nodes_list_mutex);
> > +        hlist_for_each_entry(node, &pool->nodes_list_head, nodes_list) {
> > +                unsigned long size, pages;
> > +
> > +                size = WORD_SIZE * node->total_words + HEADER_SIZE;
> > +                pages = size / PAGE_SIZE;
> > +                set_memory_ro((unsigned long)node, pages);
> > +        }
> > +        pool->protected = true;
> > +        mutex_unlock(&pool->nodes_list_mutex);
> > +        return 0;
> > +}
> 
> As far as I know, not all CONFIG_MMU=y architectures provide
> set_memory_ro()/set_memory_rw(). You need to provide a fallback for
> architectures which do not provide set_memory_ro()/set_memory_rw(),
> or for kernels built with CONFIG_MMU=n.

I think we'll just need to generalize CONFIG_STRICT_MODULE_RWX and/or
ARCH_HAS_STRICT_MODULE_RWX so there is a symbol to key this off.


Re: [PATCH 2/5] Protectable Memory Allocator

2017-06-05 Thread Tetsuo Handa
Igor Stoppa wrote:
> +int pmalloc_protect_pool(struct pmalloc_pool *pool)
> +{
> +        struct pmalloc_node *node;
> +
> +        if (!pool)
> +                return -EINVAL;
> +        mutex_lock(&pool->nodes_list_mutex);
> +        hlist_for_each_entry(node, &pool->nodes_list_head, nodes_list) {
> +                unsigned long size, pages;
> +
> +                size = WORD_SIZE * node->total_words + HEADER_SIZE;
> +                pages = size / PAGE_SIZE;
> +                set_memory_ro((unsigned long)node, pages);
> +        }
> +        pool->protected = true;
> +        mutex_unlock(&pool->nodes_list_mutex);
> +        return 0;
> +}

As far as I know, not all CONFIG_MMU=y architectures provide
set_memory_ro()/set_memory_rw(). You need to provide a fallback for
architectures which do not provide set_memory_ro()/set_memory_rw(),
or for kernels built with CONFIG_MMU=n.

>  mmu-$(CONFIG_MMU)   := gup.o highmem.o memory.o mincore.o \
>             mlock.o mmap.o mprotect.o mremap.o msync.o \
>             page_vma_mapped.o pagewalk.o pgtable-generic.o \
> -           rmap.o vmalloc.o
> +           rmap.o vmalloc.o pmalloc.o



Is this __PMALLOC_ALIGNED needed? Why not use "long" and "BITS_PER_LONG"?

> +struct pmalloc_node {
> +        struct hlist_node nodes_list;
> +        atomic_t used_words;
> +        unsigned int total_words;
> +        __PMALLOC_ALIGNED align_t data[];
> +};



Please use macros for rounding up/down.

> +        size = ((HEADER_SIZE - 1 + PAGE_SIZE) +
> +                WORD_SIZE * (unsigned long) words) & PAGE_MASK;

> +        req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;



You need to check for node != NULL before dereferencing it.
Also, why rcu_read_lock()/rcu_read_unlock()?
I can't find the corresponding synchronize_rcu() etc. in this patch.
pmalloc() won't be a hot path. Enclosing the whole thing in a mutex might be OK.
If there is any reason to use RCU, note that rcu_read_unlock() is missing
when coming from the "goto".

+void *pmalloc(unsigned long size, struct pmalloc_pool *pool)
+{
+        struct pmalloc_node *node;
+        int req_words;
+        int starting_word;
+
+        if (size > INT_MAX || size == 0)
+                return NULL;
+        req_words = (((int)size) + WORD_SIZE - 1) / WORD_SIZE;
+        rcu_read_lock();
+        hlist_for_each_entry_rcu(node, &pool->nodes_list_head, nodes_list) {
+                starting_word = atomic_fetch_add(req_words, &node->used_words);
+                if (starting_word + req_words > node->total_words)
+                        atomic_sub(req_words, &node->used_words);
+                else
+                        goto found_node;
+        }
+        rcu_read_unlock();
+        node = __pmalloc_create_node(req_words);
+        starting_word = atomic_fetch_add(req_words, &node->used_words);
+        mutex_lock(&pool->nodes_list_mutex);
+        hlist_add_head_rcu(&node->nodes_list, &pool->nodes_list_head);
+        mutex_unlock(&pool->nodes_list_mutex);
+        atomic_inc(&pool->nodes_count);
+found_node:
+        return node->data + starting_word;
+}
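
For context, the usage pattern this API seems to aim at is roughly the
following (pmalloc_create_pool() is a guessed name; the pool-creation
call is not visible in this hunk):

static struct pmalloc_pool *pool;
static int *protected_value;

static int __init pmalloc_demo_init(void)
{
        pool = pmalloc_create_pool();
        if (!pool)
                return -ENOMEM;
        protected_value = pmalloc(sizeof(*protected_value), pool);
        if (!protected_value)
                return -ENOMEM;
        *protected_value = 42;          /* write while still writable */
        pmalloc_protect_pool(pool);     /* from here on, read-only */
        return 0;
}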



I feel that n is off-by-one if (ptr + n) % PAGE_SIZE == 0
according to check_page_span().

> +const char *__pmalloc_check_object(const void *ptr, unsigned long n)
> +{
> +        unsigned long p;
> +
> +        p = (unsigned long)ptr;
> +        n += (unsigned long)ptr;
> +        for (; (PAGE_MASK & p) <= (PAGE_MASK & n); p += PAGE_SIZE) {
> +                if (is_vmalloc_addr((void *)p)) {
> +                        struct page *page;
> +
> +                        page = vmalloc_to_page((void *)p);
> +                        if (!(page && PagePmalloc(page)))
> +                                return msg;
> +                }
> +        }
> +        return NULL;
> +}



Why does pmalloc_init() need to be callable from a loadable kernel module?
It has to be called only once, very early during boot.

> +int __init pmalloc_init(void)
> +{
> +        pmalloc_data = vmalloc(sizeof(struct pmalloc_data));
> +        if (!pmalloc_data)
> +                return -ENOMEM;
> +        INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
> +        mutex_init(&pmalloc_data->pools_list_mutex);
> +        atomic_set(&pmalloc_data->pools_count, 0);
> +        return 0;
> +}
> +EXPORT_SYMBOL(pmalloc_init);

Since pmalloc_data is a globally shared variable, why does it need to
be allocated dynamically? If it is for randomizing the address of
pmalloc_data, it does not make sense to continue, because a vmalloc()
failure causes a subsequent oops.
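
A static definition would avoid that failure mode entirely, e.g.
(sketch):

static struct pmalloc_data pmalloc_data_instance;
static struct pmalloc_data *pmalloc_data = &pmalloc_data_instance;

int __init pmalloc_init(void)
{
        INIT_HLIST_HEAD(&pmalloc_data->pools_list_head);
        mutex_init(&pmalloc_data->pools_list_mutex);
        atomic_set(&pmalloc_data->pools_count, 0);
        return 0;
}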