Re: [Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Stanislav Kinsburskiy


14.09.2017 02:38, Ian Kent пишет:
> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>> Signed-off-by: Stanislav Kinsburskiy 
>> ---
>>  fs/autofs4/autofs_i.h  |3 +++
>>  fs/autofs4/dev-ioctl.c |3 +++
>>  fs/autofs4/inode.c |4 +++-
>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index 4737615..3da105f 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>  struct list_head active_list;
>>  struct list_head expiring_list;
>>  struct rcu_head rcu;
>> +#ifdef CONFIG_COMPAT
>> +unsigned is32bit:1;
>> +#endif
>>  };
>>  
>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index b7c816f..467d6c4 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>  sbi->pipefd = pipefd;
>>  sbi->pipe = pipe;
>>  sbi->catatonic = 0;
>> +#ifdef CONFIG_COMPAT
>> +sbi->is32bit = is_compat_task();
>> +#endif
>>  }
>>  out:
>>  put_pid(new_pid);
>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>> index 09e7d68..21d3c0b 100644
>> --- a/fs/autofs4/inode.c
>> +++ b/fs/autofs4/inode.c
>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>> *data, int silent)
>>  } else {
>>  sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>  }
>> -
>> +#ifdef CONFIG_COMPAT
>> +sbi->is32bit = is_compat_task();
>> +#endif
>>  if (autofs_type_trigger(sbi->type))
>>  __managed_dentry_set_managed(root);
>>  
>>
> 
> Not sure about this.
> 
> Don't you think it would be better to avoid the in code #ifdefs by doing some
>> checks and defines in the header file and defining what's needed to just use
> is_compat_task().
> 

Yes, might be...
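
A minimal sketch of the header-side approach suggested above (purely
illustrative; the helper name is hypothetical and not part of the posted
patch) could keep the #ifdef in fs/autofs4/autofs_i.h:

#ifdef CONFIG_COMPAT
#include <linux/compat.h>
#define autofs_compat_task()	is_compat_task()
#else
#define autofs_compat_task()	(0)
#endif

With the is32bit field made unconditional in struct autofs_sb_info,
dev-ioctl.c and inode.c could then simply do:

	sbi->is32bit = autofs_compat_task();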

> Not sure 2 patches are needed for this either ..
> 

Well, I found this issue by accident.
And, frankly speaking, it's not clear to me whether this issue is important at 
all, so I wanted to clarify this first.
Thanks to O_DIRECT, the only way to catch the issue is to try to read more 
than expected in a compat task (that's how I found it).
I don't see any other flaw so far. And if so, we probably shouldn't care 
about the issue at all.
What do you think?


> Ian
> 


 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 02:38, Ian Kent пишет:
>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>>> Signed-off-by: Stanislav Kinsburskiy 
>>> ---
>>>  fs/autofs4/autofs_i.h  |3 +++
>>>  fs/autofs4/dev-ioctl.c |3 +++
>>>  fs/autofs4/inode.c |4 +++-
>>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>>> index 4737615..3da105f 100644
>>> --- a/fs/autofs4/autofs_i.h
>>> +++ b/fs/autofs4/autofs_i.h
>>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>> struct list_head active_list;
>>> struct list_head expiring_list;
>>> struct rcu_head rcu;
>>> +#ifdef CONFIG_COMPAT
>>> +   unsigned is32bit:1;
>>> +#endif
>>>  };
>>>  
>>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>>> index b7c816f..467d6c4 100644
>>> --- a/fs/autofs4/dev-ioctl.c
>>> +++ b/fs/autofs4/dev-ioctl.c
>>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>>> sbi->pipefd = pipefd;
>>> sbi->pipe = pipe;
>>> sbi->catatonic = 0;
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> }
>>>  out:
>>> put_pid(new_pid);
>>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>>> index 09e7d68..21d3c0b 100644
>>> --- a/fs/autofs4/inode.c
>>> +++ b/fs/autofs4/inode.c
>>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>>> *data, int silent)
>>> } else {
>>> sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>> }
>>> -
>>> +#ifdef CONFIG_COMPAT
>>> +   sbi->is32bit = is_compat_task();
>>> +#endif
>>> if (autofs_type_trigger(sbi->type))
>>> __managed_dentry_set_managed(root);
>>>  
>>>
>>
>> Not sure about this.
>>
>> Don't you think it would be better to avoid the in code #ifdefs by doing some
>> checks and defines in the header file and defining what's needed to just use
>> is_compat_task().
>>
> 
> Yes, might be...
> 
>> Not sure 2 patches are needed for this either ..
>>
> 
> Well, I found this issue by accident.

I'm wondering what the symptoms are?

> And, frankly speaking, it's not clear to me whether this issue is important 
> at all, so I wanted to clarify this first.
> Thanks to O_DIRECT, the only way to catch the issue is to try to read more 
> than expected in a compat task (that's how I found it).

Right, the O_DIRECT patch from Linus was expected to fix the structure
alignment problem. The struct field offsets are ok, aren't they?

> I don't see any other flaw so far. And if so, we probably shouldn't 
> care about the issue at all.
> What do you think?

If we are seeing hangs, incorrect struct fields or similar, something
should be done about it, but if all is actually working ok then the
O_DIRECT fix is doing its job and further changes aren't necessary.

Ian
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Stanislav Kinsburskiy


14.09.2017 13:29, Ian Kent пишет:
> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
>>
>>
>> 14.09.2017 02:38, Ian Kent пишет:
>>> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
 Signed-off-by: Stanislav Kinsburskiy 
 ---
  fs/autofs4/autofs_i.h  |3 +++
  fs/autofs4/dev-ioctl.c |3 +++
  fs/autofs4/inode.c |4 +++-
  3 files changed, 9 insertions(+), 1 deletion(-)

 diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
 index 4737615..3da105f 100644
 --- a/fs/autofs4/autofs_i.h
 +++ b/fs/autofs4/autofs_i.h
 @@ -120,6 +120,9 @@ struct autofs_sb_info {
struct list_head active_list;
struct list_head expiring_list;
struct rcu_head rcu;
 +#ifdef CONFIG_COMPAT
 +  unsigned is32bit:1;
 +#endif
  };
  
  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
 diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
 index b7c816f..467d6c4 100644
 --- a/fs/autofs4/dev-ioctl.c
 +++ b/fs/autofs4/dev-ioctl.c
 @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
sbi->pipefd = pipefd;
sbi->pipe = pipe;
sbi->catatonic = 0;
 +#ifdef CONFIG_COMPAT
 +  sbi->is32bit = is_compat_task();
 +#endif
}
  out:
put_pid(new_pid);
 diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
 index 09e7d68..21d3c0b 100644
 --- a/fs/autofs4/inode.c
 +++ b/fs/autofs4/inode.c
 @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
 *data, int silent)
} else {
sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
}
 -
 +#ifdef CONFIG_COMPAT
 +  sbi->is32bit = is_compat_task();
 +#endif
if (autofs_type_trigger(sbi->type))
__managed_dentry_set_managed(root);
  

>>>
>>> Not sure about this.
>>>
>>> Don't you think it would be better to avoid the in code #ifdefs by doing 
>>> some
>>> checks and defines in the header file and defining what's needed to just use
>>> is_compat_task().
>>>
>>
>> Yes, might be...
>>
>>> Not sure 2 patches are needed for this either ..
>>>
>>
>> Well, I found this issue by accident.
> 
> I'm wondering what the symptoms are?
> 

The size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for x86_64.
Which means that a 32bit task can read more than the size of autofs_v5_packet on 
a 64bit kernel.
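
A small stand-alone program (an approximation of the packet layout, not the
real uapi header) illustrates where the extra 4 bytes come from: the __u64
ino member raises the structure alignment to 8 on x86_64, so the 300 bytes
of fields get 4 bytes of tail padding, while on 32-bit x86 a __u64 is only
4-byte aligned and no padding is added.

/* Illustrative approximation of the autofs_v5_packet layout. */
#include <stdio.h>
#include <stdint.h>

struct demo_v5_packet {
	uint32_t proto_version;		/* autofs_packet_hdr */
	uint32_t type;
	uint32_t wait_queue_token;
	uint32_t dev;
	uint64_t ino;			/* 8-byte alignment on x86_64 */
	uint32_t uid, gid, pid, tgid, len;
	char name[256];			/* NAME_MAX + 1 */
};

int main(void)
{
	/* Prints 300 when built with gcc -m32 and 304 with -m64 on x86. */
	printf("sizeof(demo_v5_packet) = %zu\n", sizeof(struct demo_v5_packet));
	return 0;
}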

>> And, frankly speaking, it's not clear to me whether this issue is important 
>> at all, so I wanted to clarify this first.
>> Thanks to O_DIRECT, the only way to catch the issue is to try to read more 
>> than expected in a compat task (that's how I found it).
> 
> Right, the O_DIRECT patch from Linus was expected to fix the structure
> alignment problem. The struct field offsets are ok, aren't they?
> 

Yes, they are ok.

>> I don't see any other flaw so far. And if so, we probably shouldn't 
>> care about the issue at all.
>> What do you think?
> 
> If we are seeing hangs, incorrect struct fields or similar, something
> should be done about it, but if all is actually working ok then the
> O_DIRECT fix is doing its job and further changes aren't necessary.
> 

Well, yes. The O_DIRECT fix covers the issue.
Ok then.
Thanks for the clarification!
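
For reference, the pipe behaviour the O_DIRECT fix relies on can be observed
from user space: on a packetized (O_DIRECT) pipe, a read smaller than the
next packet returns the requested bytes and discards the rest of that packet,
so a daemon reading 300 bytes never sees the 4 padding bytes leak into the
next packet.  A small illustrative demo (not autofs code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fds[2];
	char packet[304], buf[300];

	if (pipe2(fds, O_DIRECT) < 0) {
		perror("pipe2");
		return 1;
	}
	memset(packet, 'x', sizeof(packet));
	/* Write two 304-byte "kernel" packets. */
	if (write(fds[1], packet, sizeof(packet)) != sizeof(packet) ||
	    write(fds[1], packet, sizeof(packet)) != sizeof(packet))
		return 1;

	/* Each 300-byte read consumes one whole packet; the excess is dropped. */
	printf("first read:  %zd bytes\n", read(fds[0], buf, sizeof(buf)));
	printf("second read: %zd bytes\n", read(fds[0], buf, sizeof(buf)));
	return 0;
}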

> Ian
> 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Ian Kent
On 14/09/17 19:39, Stanislav Kinsburskiy wrote:
> 
> 
> 14.09.2017 13:29, Ian Kent пишет:
>> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:
>>>
>>>
>>> 14.09.2017 02:38, Ian Kent пишет:
 On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
> Signed-off-by: Stanislav Kinsburskiy 
> ---
>  fs/autofs4/autofs_i.h  |3 +++
>  fs/autofs4/dev-ioctl.c |3 +++
>  fs/autofs4/inode.c |4 +++-
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
> index 4737615..3da105f 100644
> --- a/fs/autofs4/autofs_i.h
> +++ b/fs/autofs4/autofs_i.h
> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>   struct list_head active_list;
>   struct list_head expiring_list;
>   struct rcu_head rcu;
> +#ifdef CONFIG_COMPAT
> + unsigned is32bit:1;
> +#endif
>  };
>  
>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
> index b7c816f..467d6c4 100644
> --- a/fs/autofs4/dev-ioctl.c
> +++ b/fs/autofs4/dev-ioctl.c
> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
>   sbi->pipefd = pipefd;
>   sbi->pipe = pipe;
>   sbi->catatonic = 0;
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   }
>  out:
>   put_pid(new_pid);
> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
> index 09e7d68..21d3c0b 100644
> --- a/fs/autofs4/inode.c
> +++ b/fs/autofs4/inode.c
> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
> *data, int silent)
>   } else {
>   sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>   }
> -
> +#ifdef CONFIG_COMPAT
> + sbi->is32bit = is_compat_task();
> +#endif
>   if (autofs_type_trigger(sbi->type))
>   __managed_dentry_set_managed(root);
>  
>

 Not sure about this.

 Don't you think it would be better to avoid the in code #ifdefs by doing 
 some
 checks and defines in the header file and defining what's needed to just use
 is_compat_task().

>>>
>>> Yes, might be...
>>>
 Not sure 2 patches are needed for this either ..

>>>
>>> Well, I found this issue by accident.
>>
>> I'm wondering what the symptoms are?
>>
> 
> The size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for x86_64.
> Which means that a 32bit task can read more than the size of autofs_v5_packet on 
> a 64bit kernel.

Are you sure?

Shouldn't that be a short read on the x86 side of a 4-byte-longer
structure on the x86_64 side?

I didn't think you could have a 64 bit client on a 32 bit kernel
so the converse (the read past the end of the struct) doesn't apply.

Ian
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32bit addressation

2017-09-14 Thread Stanislav Kinsburskiy


14.09.2017 13:45, Ian Kent пишет:
> On 14/09/17 19:39, Stanislav Kinsburskiy wrote:
>>
>>
>> 14.09.2017 13:29, Ian Kent пишет:
>>> On 14/09/17 17:24, Stanislav Kinsburskiy wrote:


 14.09.2017 02:38, Ian Kent пишет:
> On 01/09/17 19:21, Stanislav Kinsburskiy wrote:
>> Signed-off-by: Stanislav Kinsburskiy 
>> ---
>>  fs/autofs4/autofs_i.h  |3 +++
>>  fs/autofs4/dev-ioctl.c |3 +++
>>  fs/autofs4/inode.c |4 +++-
>>  3 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
>> index 4737615..3da105f 100644
>> --- a/fs/autofs4/autofs_i.h
>> +++ b/fs/autofs4/autofs_i.h
>> @@ -120,6 +120,9 @@ struct autofs_sb_info {
>>  struct list_head active_list;
>>  struct list_head expiring_list;
>>  struct rcu_head rcu;
>> +#ifdef CONFIG_COMPAT
>> +unsigned is32bit:1;
>> +#endif
>>  };
>>  
>>  static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
>> index b7c816f..467d6c4 100644
>> --- a/fs/autofs4/dev-ioctl.c
>> +++ b/fs/autofs4/dev-ioctl.c
>> @@ -397,6 +397,9 @@ static int autofs_dev_ioctl_setpipefd(struct file 
>> *fp,
>>  sbi->pipefd = pipefd;
>>  sbi->pipe = pipe;
>>  sbi->catatonic = 0;
>> +#ifdef CONFIG_COMPAT
>> +sbi->is32bit = is_compat_task();
>> +#endif
>>  }
>>  out:
>>  put_pid(new_pid);
>> diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
>> index 09e7d68..21d3c0b 100644
>> --- a/fs/autofs4/inode.c
>> +++ b/fs/autofs4/inode.c
>> @@ -301,7 +301,9 @@ int autofs4_fill_super(struct super_block *s, void 
>> *data, int silent)
>>  } else {
>>  sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
>>  }
>> -
>> +#ifdef CONFIG_COMPAT
>> +sbi->is32bit = is_compat_task();
>> +#endif
>>  if (autofs_type_trigger(sbi->type))
>>  __managed_dentry_set_managed(root);
>>  
>>
>
> Not sure about this.
>
> Don't you think it would be better to avoid the in code #ifdefs by doing 
> some
> checks and defines in the header file and defining what's needed to just use
> is_compat_task().
>

 Yes, might be...

> Not sure 2 patches are needed for this either ..
>

 Well, I found this issue by accident.
>>>
>>> I'm wondering what the symptoms are?
>>>
>>
>> The size of struct autofs_v5_packet is 300 bytes for x86 and 304 bytes for 
>> x86_64.
>> Which means that a 32bit task can read more than the size of autofs_v5_packet on 
>> a 64bit kernel.
> 
> Are you sure?
> 
> Shouldn't that be a short read on the x86 side of a 4-byte-longer
> structure on the x86_64 side?
> 
> I didn't think you could have a 64 bit client on a 32 bit kernel
> so the converse (the read past the end of the struct) doesn't apply.
> 

Sorry for the confusion, I should have added brackets, like this:

> Which means that a 32bit task can read more than the size of autofs_v5_packet (on 
> a 64bit kernel).

IOW, a 32bit task expects to read 300 bytes (the size of struct autofs_v5_packet) 
while there are 304 bytes available on the "wire" from the 64bit kernel.


> Ian
> 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 05/39] mm/mempool: avoid KASAN marking mempool poison checks as use-after-free

2017-09-14 Thread Andrey Ryabinin
From: Matthew Dawson 

When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging.  Otherwise KASAN
will flag the reads that check the element's use-after-free poison
pattern as use-after-free reads.

Signed-off-by: Matthew Dawson 
Acked-by: Andrey Ryabinin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 7640131032db9118a78af715ac77ba2debeeb17c)
Signed-off-by: Andrey Ryabinin 
---
 mm/mempool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index d9fa60d6f098..791ec8a148b4 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -135,8 +135,8 @@ static void *remove_element(mempool_t *pool)
void *element = pool->elements[--pool->curr_nr];
 
BUG_ON(pool->curr_nr < 0);
-   check_element(pool, element);
kasan_unpoison_element(pool, element);
+   check_element(pool, element);
return element;
 }
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 04/39] mm, kasan: SLAB support

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Add KASAN hooks to SLAB allocator.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Steven Rostedt 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 0295fd5d570626817d10deadf5a2ad5e49c36a1d)
Signed-off-by: Andrey Ryabinin 
---
 Documentation/kasan.txt  |   7 ++--
 include/linux/kasan.h|  12 ++
 include/linux/slab.h |   6 +++
 include/linux/slab_def.h |  14 +++
 include/linux/slub_def.h |  11 +
 lib/Kconfig.kasan|   8 ++--
 mm/Makefile  |   1 +
 mm/kasan/kasan.c | 102 +++
 mm/kasan/kasan.h |  34 
 mm/kasan/report.c|  54 -
 mm/slab.c|  39 +++---
 mm/slub.c|   2 +-
 12 files changed, 266 insertions(+), 24 deletions(-)

diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index 310746718f21..7dd95b35cd7c 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -12,8 +12,7 @@ KASAN uses compile-time instrumentation for checking every 
memory access,
 therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
 required for detection of out-of-bounds accesses to stack or global variables.
 
-Currently KASAN is supported only for x86_64 architecture and requires the
-kernel to be built with the SLUB allocator.
+Currently KASAN is supported only for x86_64 architecture.
 
 1. Usage
 
@@ -27,8 +26,8 @@ inline are compiler instrumentation types. The former 
produces smaller binary
 the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
 version 5.0 or later.
 
-Currently KASAN works only with the SLUB memory allocator.
-For better bug detection and nicer report and enable CONFIG_STACKTRACE.
+KASAN works with both SLUB and SLAB memory allocators.
+For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
 
 To disable instrumentation for specific files or directories, add a line
 similar to the following to the respective kernel Makefile:
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 5486d777b706..f55c31becdb6 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -38,6 +38,9 @@ void kasan_unpoison_shadow(const void *address, size_t size);
 void kasan_alloc_pages(struct page *page, unsigned int order);
 void kasan_free_pages(struct page *page, unsigned int order);
 
+void kasan_cache_create(struct kmem_cache *cache, size_t *size,
+   unsigned long *flags);
+
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
@@ -51,6 +54,11 @@ void kasan_krealloc(const void *object, size_t new_size);
 void kasan_slab_alloc(struct kmem_cache *s, void *object);
 void kasan_slab_free(struct kmem_cache *s, void *object);
 
+struct kasan_cache {
+   int alloc_meta_offset;
+   int free_meta_offset;
+};
+
 int kasan_module_alloc(void *addr, size_t size);
 void kasan_free_shadow(const struct vm_struct *vm);
 
@@ -64,6 +72,10 @@ static inline void kasan_disable_current(void) {}
 static inline void kasan_alloc_pages(struct page *page, unsigned int order) {}
 static inline void kasan_free_pages(struct page *page, unsigned int order) {}
 
+static inline void kasan_cache_create(struct kmem_cache *cache,
+ size_t *size,
+ unsigned long *flags) {}
+
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
void *object) {}
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1d956669b013..7dc1b73cdcec 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -85,6 +85,12 @@
 # define SLAB_ACCOUNT  0xUL
 #endif
 
+#ifdef CONFIG_KASAN
+#define SLAB_KASAN 0x0800UL
+#else
+#define SLAB_KASAN 0xUL
+#endif
+
 /* The following flags affect the page allocator grouping pages by mobility */
 #define SLAB_RECLAIM_ACCOUNT   0x0002UL/* Objects are 
reclaimable */
 #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT/* Objects are 
short-lived */
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index bf41210f8f54..13c72b34c6f4 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -70,6 +70,9 @@ struct kmem_cache {
 #ifdef CONFIG_MEMCG_KMEM
struct memcg_cache_params memcg_params;
 #

[Devel] [PATCH rh7 12/39] lib/stackdepot: avoid to return 0 handle

2017-09-14 Thread Andrey Ryabinin
From: Joonsoo Kim 

Recently, we allowed saving stack traces whose hashed value is 0.  This
causes a problem: stackdepot could return 0 even on success, so a user
of stackdepot cannot distinguish success from failure.  In this patch,
1 bit is added to the handle and every valid handle is made non-zero by
setting this bit.  After that, a valid handle will never be 0 and a 0
handle will correctly represent failure.
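
A hedged illustration of what this enables on the caller side (the wrapper
below is hypothetical, not from the patch): since a valid handle is now
guaranteed to be non-zero, callers can treat 0 uniformly as "not stored".

#include <linux/gfp.h>
#include <linux/printk.h>
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

static depot_stack_handle_t save_or_note(struct stack_trace *trace, gfp_t flags)
{
	depot_stack_handle_t handle = depot_save_stack(trace, flags);

	/* 0 now always means the trace was not stored (e.g. allocation failed). */
	if (!handle)
		pr_debug("stackdepot: trace not stored\n");
	return handle;
}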

Fixes: 4e25769c ("lib/stackdepot.c: allow the stack trace hash to be zero")
Link: 
http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo@lge.com
Signed-off-by: Joonsoo Kim 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 7c31190bcfdbff225950902a9f226e4eb79ca94f)
Signed-off-by: Andrey Ryabinin 
---
 lib/stackdepot.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 9e0b0315a724..53ad6c0831ae 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -42,12 +42,14 @@
 
 #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
 
+#define STACK_ALLOC_NULL_PROTECTION_BITS 1
 #define STACK_ALLOC_ORDER 2 /* 'Slab' size order for stack depot, 4 pages */
 #define STACK_ALLOC_SIZE (1LL << (PAGE_SHIFT + STACK_ALLOC_ORDER))
 #define STACK_ALLOC_ALIGN 4
 #define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
STACK_ALLOC_ALIGN)
-#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - STACK_ALLOC_OFFSET_BITS)
+#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
+   STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
 #define STACK_ALLOC_SLABS_CAP 1024
 #define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -59,6 +61,7 @@ union handle_parts {
struct {
u32 slabindex : STACK_ALLOC_INDEX_BITS;
u32 offset : STACK_ALLOC_OFFSET_BITS;
+   u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
};
 };
 
@@ -136,6 +139,7 @@ static struct stack_record *depot_alloc_stack(unsigned long 
*entries, int size,
stack->size = size;
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
+   stack->handle.valid = 1;
memcpy(stack->entries, entries, size * sizeof(unsigned long));
depot_offset += required_size;
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 19/39] mm/kasan, slub: don't disable interrupts when object leaves quarantine

2017-09-14 Thread Andrey Ryabinin
SLUB doesn't require disabled interrupts to call ___cache_free().

Link: 
http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabi...@virtuozzo.com
Signed-off-by: Andrey Ryabinin 
Acked-by: Alexander Potapenko 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit f7376aed6c032aab820fa36806a89e16e353a0d9)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 65793f150d1f..4852625ff851 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -147,10 +147,14 @@ static void qlink_free(struct qlist_node *qlink, struct 
kmem_cache *cache)
struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
unsigned long flags;
 
-   local_irq_save(flags);
+   if (IS_ENABLED(CONFIG_SLAB))
+   local_irq_save(flags);
+
alloc_info->state = KASAN_STATE_FREE;
___cache_free(cache, object, _THIS_IP_);
-   local_irq_restore(flags);
+
+   if (IS_ENABLED(CONFIG_SLAB))
+   local_irq_restore(flags);
 }
 
 static void qlist_free_all(struct qlist_head *q, struct kmem_cache *cache)
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 09/39] mm, kasan: fix compilation for CONFIG_SLAB

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Add the missing argument to set_track().

Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot 
for SLAB")
Signed-off-by: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Christoph Lameter 
Cc: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Steven Rostedt 
Cc: Joonsoo Kim 
Cc: Konstantin Serebryany 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 0b355ee9bb8bb08b563ef55ecb23a4d743da)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/kasan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 82c8d58b9a7d..8805ce61e9d0 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -475,7 +475,7 @@ void kasan_slab_free(struct kmem_cache *cache, void *object)
struct kasan_alloc_meta *alloc_info =
get_alloc_info(cache, object);
alloc_info->state = KASAN_STATE_FREE;
-   set_track(&free_info->track);
+   set_track(&free_info->track, GFP_NOWAIT);
}
 #endif
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 11/39] lib/stackdepot.c: allow the stack trace hash to be zero

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.

Reported-by: Joonsoo Kim 
Signed-off-by: Alexander Potapenko 
Acked-by: Andrey Ryabinin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 4e25769c6ad69b983379578f42581d99a2f9)
Signed-off-by: Andrey Ryabinin 
---
 lib/stackdepot.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 654c9d87e83a..9e0b0315a724 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -210,10 +210,6 @@ depot_stack_handle_t depot_save_stack(struct stack_trace 
*trace,
goto fast_exit;
 
hash = hash_stack(trace->entries, trace->nr_entries);
-   /* Bad luck, we won't store this stack. */
-   if (hash == 0)
-   goto exit;
-
bucket = &stack_table[hash & STACK_HASH_MASK];
 
/*
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 03/39] kasan: various fixes in documentation

2017-09-14 Thread Andrey Ryabinin
From: Andrey Konovalov 

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrey Konovalov 
Cc: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Alexander Potapenko 
Cc: Konstantin Serebryany 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 0295fd5d570626817d10deadf5a2ad5e49c36a1d)
Signed-off-by: Andrey Ryabinin 
---
 Documentation/kasan.txt | 43 ++-
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index 82ed25f9d23c..310746718f21 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -1,32 +1,31 @@
-Kernel address sanitizer
-
+KernelAddressSanitizer (KASAN)
+==
 
 0. Overview
 ===
 
-Kernel Address sanitizer (KASan) is a dynamic memory error detector. It 
provides
+KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
 a fast and comprehensive solution for finding use-after-free and out-of-bounds
 bugs.
 
-KASan uses compile-time instrumentation for checking every memory access,
-therefore you will need a gcc version of 4.9.2 or later. KASan could detect out
-of bounds accesses to stack or global variables, but only if gcc 5.0 or later 
was
-used to built the kernel.
+KASAN uses compile-time instrumentation for checking every memory access,
+therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
+required for detection of out-of-bounds accesses to stack or global variables.
 
-Currently KASan is supported only for x86_64 architecture and requires that the
-kernel be built with the SLUB allocator.
+Currently KASAN is supported only for x86_64 architecture and requires the
+kernel to be built with the SLUB allocator.
 
 1. Usage
-=
+
 
 To enable KASAN configure kernel with:
 
  CONFIG_KASAN = y
 
-and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline/inline
-is compiler instrumentation types. The former produces smaller binary the
-latter is 1.1 - 2 times faster. Inline instrumentation requires a gcc version
-of 5.0 or later.
+and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
+inline are compiler instrumentation types. The former produces smaller binary
+the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
+version 5.0 or later.
 
 Currently KASAN works only with the SLUB memory allocator.
 For better bug detection and nicer report and enable CONFIG_STACKTRACE.
@@ -41,7 +40,7 @@ similar to the following to the respective kernel Makefile:
 KASAN_SANITIZE := n
 
 1.1 Error reports
-==
+=
 
 A typical out of bounds access report looks like this:
 
@@ -118,14 +117,16 @@ Memory state around the buggy address:
  8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==
 
-First sections describe slub object where bad access happened.
-See 'SLUB Debug output' section in Documentation/vm/slub.txt for details.
+The header of the report discribe what kind of bug happened and what kind of
+access caused it. It's followed by the description of the accessed slub object
+(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and
+the description of the accessed memory page.
 
 In the last section the report shows memory state around the accessed address.
-Reading this part requires some more understanding of how KASAN works.
+Reading this part requires some understanding of how KASAN works.
 
-Each 8 bytes of memory are encoded in one shadow byte as accessible,
-partially accessible, freed or they can be part of a redzone.
+The state of each 8 aligned bytes of memory is encoded in one shadow byte.
+Those 8 bytes can be accessible, partially accessible, freed or be a redzone.
 We use the following encoding for each shadow byte: 0 means that all 8 bytes
 of the corresponding memory region are accessible; number N (1 <= N <= 7) means
 that the first N bytes are accessible, and other (8 - N) bytes are not;
@@ -138,7 +139,7 @@ the accessed address is partially accessible.
 
 
 2. Implementation details
-
+=
 
 From a high level, our approach to memory error detection is similar to that
 of kmemcheck: use shadow memory to record whether each byte of memory is safe
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 24/39] x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Dmitry Vyukov has reported unexpected KASAN stackdepot growth:

  https://github.com/google/kasan/issues/36

... which is caused by the APIC handlers not being present in .irqentry.text:

When building with CONFIG_FUNCTION_GRAPH_TRACER=y or CONFIG_KASAN=y, put the
APIC interrupt handlers into the .irqentry.text section. This is needed
because both KASAN and function graph tracer use __irqentry_text_start and
__irqentry_text_end to determine whether a function is an IRQ entry point.

Reported-by: Dmitry Vyukov 
Signed-off-by: Alexander Potapenko 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: aryabi...@virtuozzo.com
Cc: kasan-...@googlegroups.com
Cc: k...@google.com
Cc: rost...@goodmis.org
Link: 
http://lkml.kernel.org/r/1468575763-144889-1-git-send-email-gli...@google.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 469f00231278da68062a809306df0bac95a27507)
Signed-off-by: Andrey Ryabinin 
---
 arch/x86/kernel/entry_64.S | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index d9f78516a26c..dd755f8037ca 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -960,9 +960,20 @@ apicinterrupt3 \num trace(\sym) smp_trace(\sym)
 .endm
 #endif
 
+/* Make sure APIC interrupt handlers end up in the irqentry section: */
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+# define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax"
+# define POP_SECTION_IRQENTRY  .popsection
+#else
+# define PUSH_SECTION_IRQENTRY
+# define POP_SECTION_IRQENTRY
+#endif
+
 .macro apicinterrupt num sym do_sym
+PUSH_SECTION_IRQENTRY
 apicinterrupt3 \num \sym \do_sym
 trace_apicinterrupt \num \sym
+POP_SECTION_IRQENTRY
 .endm
 
 #ifdef CONFIG_SMP
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 29/39] kasan: update kasan_global for gcc 7

2017-09-14 Thread Andrey Ryabinin
From: Dmitry Vyukov 

kasan_global struct is part of compiler/runtime ABI.  gcc revision
241983 has added a new field to kasan_global struct.  Update kernel
definition of kasan_global struct to include the new field.

Without this patch KASAN is broken with gcc 7.

Link: 
http://lkml.kernel.org/r/1479219743-28682-1-git-send-email-dvyu...@google.com
Signed-off-by: Dmitry Vyukov 
Acked-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: [4.0+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 045d599a286bc01daa3510d59272440a17b23c2e)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/compiler-gcc.h | 4 +++-
 mm/kasan/kasan.h | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 8efb40e61d6e..0f61ed053056 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -231,7 +231,9 @@
 #endif
 #endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP */
 
-#if GCC_VERSION >= 5
+#if GCC_VERSION >= 7
+#define KASAN_ABI_VERSION 5
+#elif GCC_VERSION >= 5
 #define KASAN_ABI_VERSION 4
 #elif GCC_VERSION >= 40902
 #define KASAN_ABI_VERSION 3
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index ddce58734098..b0ae78fa79e5 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -54,6 +54,9 @@ struct kasan_global {
 #if KASAN_ABI_VERSION >= 4
struct kasan_source_location *location;
 #endif
+#if KASAN_ABI_VERSION >= 5
+   char *odr_indicator;
+#endif
 };
 
 /**
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 13/39] mm, kasan: don't call kasan_krealloc() from ksize().

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Instead of calling kasan_krealloc(), which replaces the memory
allocation stack ID (if stack depot is used), just unpoison the whole
memory chunk.

Signed-off-by: Alexander Potapenko 
Acked-by: Andrey Ryabinin 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Christoph Lameter 
Cc: Konstantin Serebryany 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 4ebb31a42ffa03912447fe1aabbdb28242f909ba)
Signed-off-by: Andrey Ryabinin 
---
 mm/slab.c | 2 +-
 mm/slub.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index ba35acc00df1..7f5b2a30c9aa 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4590,7 +4590,7 @@ size_t ksize(const void *objp)
/* We assume that ksize callers could use the whole allocated area,
 * so we need to unpoison this area.
 */
-   kasan_krealloc(objp, size, GFP_NOWAIT);
+   kasan_unpoison_shadow(objp, size);
 
return size;
 }
diff --git a/mm/slub.c b/mm/slub.c
index e32920fa85d1..2a97b19b8855 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3591,8 +3591,9 @@ size_t ksize(const void *object)
 {
size_t size = __ksize(object);
/* We assume that ksize callers could use whole allocated area,
-  so we need unpoison this area. */
-   kasan_krealloc(object, size, GFP_NOWAIT);
+* so we need to unpoison this area.
+*/
+   kasan_unpoison_shadow(object, size);
return size;
 }
 EXPORT_SYMBOL(ksize);
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 10/39] mm: kasan: initial memory quarantine implementation

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

When the object is freed, its state changes from KASAN_STATE_ALLOC to
KASAN_STATE_QUARANTINE.  The object is poisoned and put into quarantine
instead of being returned to the allocator, therefore every subsequent
access to that object triggers a KASAN error, and the error handler is
able to say where the object has been allocated and deallocated.

When it's time for the object to leave quarantine, its state becomes
KASAN_STATE_FREE and it's returned to the allocator.  From now on the
allocator may reuse it for another allocation.  Before that happens,
it's still possible to detect a use-after free on that object (it
retains the allocation/deallocation stacks).

When the allocator reuses this object, the shadow is unpoisoned and old
allocation/deallocation stacks are wiped.  Therefore a use of this
object, even an incorrect one, won't trigger ASan warning.

Without the quarantine, it's not guaranteed that the objects aren't
reused immediately, that's why the probability of catching a
use-after-free is lower than with quarantine in place.

Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

Freed objects are first added to per-cpu quarantine queues.  When a
cache is destroyed or memory shrinking is requested, the objects are
moved into the global quarantine queue.  Whenever a kmalloc call allows
memory reclaiming, the oldest objects are popped out of the global queue
until the total size of objects in quarantine is less than 3/4 of the
maximum quarantine size (which is a fraction of installed physical
memory).

As long as an object remains in the quarantine, KASAN is able to report
accesses to it, so the chance of reporting a use-after-free is
increased.  Once the object leaves quarantine, the allocator may reuse
it, in which case the object is unpoisoned and KASAN can't detect
incorrect accesses to it.

Right now quarantine support is only enabled in SLAB allocator.
Unification of KASAN features in SLAB and SLUB will be done later.

This patch is based on the "mm: kasan: quarantine" patch originally
prepared by Dmitry Chernenkov.  A number of improvements have been
suggested by Andrey Ryabinin.
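
The idea is easy to model outside the kernel.  A toy user-space sketch of the
deferred-free queue described above (illustrative only, not the patch code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUARANTINE_BUDGET	(1 << 20)	/* bytes kept in quarantine */

struct qnode {
	struct qnode *next;
	size_t size;
};

static struct qnode *q_head, *q_tail;
static size_t q_bytes;

static void quarantine_put(void *obj, size_t size)
{
	struct qnode *node = obj;	/* reuse the dead object as the link */

	memset(obj, 0xFB, size);	/* "poison" the freed object */
	node->size = size;
	node->next = NULL;
	if (q_tail)
		q_tail->next = node;
	else
		q_head = node;
	q_tail = node;
	q_bytes += size;

	/* Only when the budget is exceeded do objects really get freed. */
	while (q_bytes > QUARANTINE_BUDGET && q_head) {
		struct qnode *victim = q_head;

		q_head = victim->next;
		if (!q_head)
			q_tail = NULL;
		q_bytes -= victim->size;
		free(victim);
	}
}

int main(void)
{
	for (int i = 0; i < 64; i++)
		quarantine_put(malloc(64 * 1024), 64 * 1024);
	printf("bytes still quarantined: %zu\n", q_bytes);
	return 0;
}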

[gli...@google.com: v9]
  Link: 
http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-gli...@google.com
Signed-off-by: Alexander Potapenko 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Steven Rostedt 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 55834c59098d0c5a97b0f3247e55832b67facdcf)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/kasan.h |  13 ++-
 mm/kasan/Makefile |   2 +
 mm/kasan/kasan.c  |  57 --
 mm/kasan/kasan.h  |  21 +++-
 mm/kasan/quarantine.c | 291 ++
 mm/kasan/report.c |   1 +
 mm/mempool.c  |   2 +-
 mm/slab.c |  12 ++-
 mm/slab.h |   1 +
 mm/slab_common.c  |   3 +
 10 files changed, 388 insertions(+), 15 deletions(-)
 create mode 100644 mm/kasan/quarantine.c

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index ab45598049da..9ab426991c4e 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,6 +40,8 @@ void kasan_free_pages(struct page *page, unsigned int order);
 
 void kasan_cache_create(struct kmem_cache *cache, size_t *size,
unsigned long *flags);
+void kasan_cache_shrink(struct kmem_cache *cache);
+void kasan_cache_destroy(struct kmem_cache *cache);
 
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
@@ -53,7 +55,8 @@ void kasan_kmalloc(struct kmem_cache *s, const void *object, 
size_t size,
 void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
 
 void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
-void kasan_slab_free(struct kmem_cache *s, void *object);
+bool kasan_slab_free(struct kmem_cache *s, void *object);
+void kasan_poison_slab_free(struct kmem_cache *s, void *object);
 
 struct kasan_cache {
int alloc_meta_offset;
@@ -76,6 +79,8 @@ static inline void kasan_free_pages(struct page *page, 
unsigned int order) {}
 static inline void kasan_cache_create(struct kmem_cache *cache,
  size_t *size,
  unsigned long *flags) {}
+static inline void kasan_cache_shrink(struct kmem_cache *cache) {}
+static inline void kasan_cache_destroy(struct kmem_cache *cache) {}
 
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_

[Devel] [PATCH rh7 17/39] lib/stackdepot.c: use __GFP_NOWARN for stack allocations

2017-09-14 Thread Andrey Ryabinin
From: "Kirill A. Shutemov" 

This (large, atomic) allocation attempt can fail.  We expect and handle
that, so avoid the scary warning.

Link: http://lkml.kernel.org/r/20160720151905.gb19...@node.shutemov.name
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Michal Hocko 
Cc: Rik van Riel 
Cc: David Rientjes 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 87cc271d5e4320d705cfdf59f68d4d037b3511b2)
Signed-off-by: Andrey Ryabinin 
---
 lib/stackdepot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 53ad6c0831ae..60f77f1d470a 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -242,6 +242,7 @@ depot_stack_handle_t depot_save_stack(struct stack_trace 
*trace,
 */
alloc_flags &= ~GFP_ZONEMASK;
alloc_flags &= (GFP_ATOMIC | GFP_KERNEL);
+   alloc_flags |= __GFP_NOWARN;
page = alloc_pages(alloc_flags, STACK_ALLOC_ORDER);
if (page)
prealloc = page_address(page);
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 08/39] mm, kasan: stackdepot implementation. Enable stackdepot for SLAB

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks.  The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator.  Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.
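
A hedged usage sketch of the new interface (depot_save_stack() and
depot_fetch_stack() are declared in the header added below; the wrapper
helpers themselves are hypothetical):

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

static depot_stack_handle_t save_current_stack(gfp_t flags)
{
	unsigned long entries[16];
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= ARRAY_SIZE(entries),
		.skip		= 2,	/* drop the save helpers themselves */
	};

	save_stack_trace(&trace);
	return depot_save_stack(&trace, flags);	/* 0 if nothing was stored */
}

static void print_saved_stack(depot_stack_handle_t handle)
{
	struct stack_trace trace;

	if (!handle)
		return;
	depot_fetch_stack(handle, &trace);
	print_stack_trace(&trace, 0);
}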

[a...@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabi...@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko 
Signed-off-by: Andrey Ryabinin 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Steven Rostedt 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit cd11016e5f5212c13c0cec7384a525edc93b4921)
Signed-off-by: Andrey Ryabinin 
---
 arch/x86/kernel/Makefile   |   1 +
 include/linux/stackdepot.h |  32 +
 lib/Kconfig|   4 +
 lib/Kconfig.kasan  |   1 +
 lib/Makefile   |   4 +
 lib/stackdepot.c   | 284 +
 mm/kasan/kasan.c   |  55 -
 mm/kasan/kasan.h   |  11 +-
 mm/kasan/report.c  |  12 +-
 9 files changed, 392 insertions(+), 12 deletions(-)
 create mode 100644 include/linux/stackdepot.h
 create mode 100644 lib/stackdepot.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2a23dc9eda7a..a6981d800222 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -25,6 +25,7 @@ OBJECT_FILES_NON_STANDARD_entry_$(BITS).o := y
 KASAN_SANITIZE_head$(BITS).o := n
 KASAN_SANITIZE_dumpstack.o := n
 KASAN_SANITIZE_dumpstack_$(BITS).o := n
+KASAN_SANITIZE_stacktrace.o := n
 
 # If instrumentation of this dir is enabled, boot hangs during first second.
 # Probably could be more selective here, but note that files related to irqs,
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
new file mode 100644
index ..7978b3e2c1e1
--- /dev/null
+++ b/include/linux/stackdepot.h
@@ -0,0 +1,32 @@
+/*
+ * A generic stack depot implementation
+ *
+ * Author: Alexander Potapenko 
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * Based on code by Dmitry Chernenkov.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_STACKDEPOT_H
+#define _LINUX_STACKDEPOT_H
+
+typedef u32 depot_stack_handle_t;
+
+struct stack_trace;
+
+depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags);
+
+void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 932be006a0ed..4ac97849f562 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -492,4 +492,8 @@ config ARCH_HAS_MMIO_FLUSH
 config PARMAN
tristate
 
+config STACKDEPOT
+   bool
+   select STACKTRACE
+
 endmenu
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 6471d772c243..670504a50612 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -7,6 +7,7 @@ config KASAN
bool "KASan: runtime memory debugger"
depends on SLUB_DEBUG || (SLAB && !DEBUG_SLAB)
select CONSTRUCTORS
+   select STACKDEPOT if SLAB
help
  Enables kernel address sanitizer - runtime memory debugger,
  designed to find out-of-bounds accesses and use-after-free bugs.
diff --git a/lib/Makefile b/lib/Makefile
index c02b909c6239..cfe21bd255b4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -166,7 +166,11 @@ obj-$(CONFIG_SG_POOL) += sg_pool.o
 obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
 obj-$(CONFIG_IRQ_POLL) += irq_poll.o
 
+obj-$(CONFIG_STACKDEPOT) += stackdepot.o
+KASAN_SANITIZE_stackdepot.o := n
+
 libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o
+
 $(foreach file, $(libfdt_files), \
$(eval CFLAGS_$(file) = -I$(src)/../scripts/dtc/libfdt))
 lib-$(CONFIG_LIBFDT) += 

[Devel] [PATCH rh7 25/39] kasan: remove the unnecessary WARN_ONCE from quarantine.c

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

It's quite unlikely that the user will have so little memory that the
per-CPU quarantines won't fit into the given fraction of the available
memory.  Even in that case, he won't be able to do anything with the
information given in the warning.

Link: 
http://lkml.kernel.org/r/1470929182-101413-1-git-send-email-gli...@google.com
Signed-off-by: Alexander Potapenko 
Acked-by: Andrey Ryabinin 
Cc: Dmitry Vyukov 
Cc: Andrey Konovalov 
Cc: Christoph Lameter 
Cc: Joonsoo Kim 
Cc: Kuthonuzo Luruo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit bcbf0d566b6e59a6e873bfe415cc415111a819e2)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index b6728a33a4ac..baabaad4a4aa 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -217,11 +217,8 @@ void quarantine_reduce(void)
new_quarantine_size = (READ_ONCE(totalram_pages) << PAGE_SHIFT) /
QUARANTINE_FRACTION;
percpu_quarantines = QUARANTINE_PERCPU_SIZE * num_online_cpus();
-   if (WARN_ONCE(new_quarantine_size < percpu_quarantines,
-   "Too little memory, disabling global KASAN quarantine.\n"))
-   new_quarantine_size = 0;
-   else
-   new_quarantine_size -= percpu_quarantines;
+   new_quarantine_size = (new_quarantine_size < percpu_quarantines) ?
+   0 : new_quarantine_size - percpu_quarantines;
WRITE_ONCE(quarantine_size, new_quarantine_size);
 
last = global_quarantine.head;
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 16/39] mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

For KASAN builds:
 - switch SLUB allocator to using stackdepot instead of storing the
   allocation/deallocation stacks in the objects;
 - change the freelist hook so that parts of the freelist can be put
   into the quarantine.

[aryabi...@virtuozzo.com: fixes]
  Link: 
http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabi...@virtuozzo.com
Link: 
http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-gli...@google.com
Signed-off-by: Alexander Potapenko 
Cc: Andrey Konovalov 
Cc: Christoph Lameter 
Cc: Dmitry Vyukov 
Cc: Steven Rostedt (Red Hat) 
Cc: Joonsoo Kim 
Cc: Kostya Serebryany 
Cc: Andrey Ryabinin 
Cc: Kuthonuzo Luruo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 80a9201a5965f4715d5c09790862e0df84ce0614)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/kasan.h|  4 +++
 include/linux/slab_def.h |  3 ++-
 include/linux/slub_def.h |  4 +++
 lib/Kconfig.kasan|  4 +--
 mm/kasan/Makefile|  4 +--
 mm/kasan/kasan.c | 63 
 mm/kasan/kasan.h |  3 +--
 mm/kasan/report.c|  8 +++---
 mm/slub.c| 60 +++--
 9 files changed, 96 insertions(+), 57 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 9ab426991c4e..1122a7ff724b 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -66,6 +66,8 @@ struct kasan_cache {
 int kasan_module_alloc(void *addr, size_t size);
 void kasan_free_shadow(const struct vm_struct *vm);
 
+size_t kasan_metadata_size(struct kmem_cache *cache);
+
 #else /* CONFIG_KASAN */
 
 static inline void kasan_unpoison_shadow(const void *address, size_t size) {}
@@ -107,6 +109,8 @@ static inline void kasan_poison_slab_free(struct kmem_cache 
*s, void *object) {}
 static inline int kasan_module_alloc(void *addr, size_t size) { return 0; }
 static inline void kasan_free_shadow(const struct vm_struct *vm) {}
 
+static inline size_t kasan_metadata_size(struct kmem_cache *cache) { return 0; 
}
+
 #endif /* CONFIG_KASAN */
 
 #endif /* LINUX_KASAN_H */
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 13c72b34c6f4..b2e694e3db4d 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -94,7 +94,8 @@ struct kmem_cache {
 };
 
 static inline void *nearest_obj(struct kmem_cache *cache, struct page *page,
-   void *x) {
+   void *x)
+{
void *object = x - (x - page->s_mem) % cache->size;
void *last_object = page->s_mem + (cache->num - 1) * cache->size;
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 7188ba07139e..919acd6ed29d 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -98,6 +98,10 @@ struct kmem_cache {
 */
int remote_node_defrag_ratio;
 #endif
+#ifdef CONFIG_KASAN
+   struct kasan_cache kasan_info;
+#endif
+
struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 670504a50612..da48f37ad788 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -5,9 +5,9 @@ if HAVE_ARCH_KASAN
 
 config KASAN
bool "KASan: runtime memory debugger"
-   depends on SLUB_DEBUG || (SLAB && !DEBUG_SLAB)
+   depends on SLUB || (SLAB && !DEBUG_SLAB)
select CONSTRUCTORS
-   select STACKDEPOT if SLAB
+   select STACKDEPOT
help
  Enables kernel address sanitizer - runtime memory debugger,
  designed to find out-of-bounds accesses and use-after-free bugs.
diff --git a/mm/kasan/Makefile b/mm/kasan/Makefile
index 7096981108a6..ac9cc9665e57 100644
--- a/mm/kasan/Makefile
+++ b/mm/kasan/Makefile
@@ -7,6 +7,4 @@ CFLAGS_REMOVE_kasan.o = -pg
 # see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63533
 CFLAGS_kasan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
 
-obj-y := kasan.o report.o
-obj-$(CONFIG_SLAB) += quarantine.o
-
+obj-y := kasan.o report.o quarantine.o
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 014897fe6f06..8a57f22560a4 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -312,7 +312,6 @@ void kasan_free_pages(struct page *page, unsigned int order)
KASAN_FREE_PAGE);
 }
 
-#ifdef CONFIG_SLAB
 /*
  * Adaptive redzone policy taken from the userspace AddressSanitizer runtime.
  * For larger allocations larger redzones are used.
@@ -334,16 +333,8 @@ void kasan_cache_create(struct kmem_cache *cache, size_t 
*size,
unsigned long *flags)
 {
int redzone_adjust;
-   /* Make sure the adjusted size is still less than
-* KMALLOC_MAX_CACHE_SIZE.
-* TODO: this check is only useful for SLAB, but not SLUB. We'll need
-* to skip it for SLUB when it starts using kasan_cache_create().
-*/
-   if (*size > KMALLOC_MAX_CACHE_

[Devel] [PATCH rh7 27/39] kcov: do not instrument lib/stackdepot.c

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

There's no point in collecting coverage from lib/stackdepot.c, as it is
not a function of syscall inputs.  Disabling kcov instrumentation for that
file will reduce the coverage noise level.

Link: 
http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-gli...@google.com
Signed-off-by: Alexander Potapenko 
Acked-by: Dmitry Vyukov 
Cc: Kostya Serebryany 
Cc: Andrey Konovalov 
Cc: syzkaller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 65deb8af76defeae4b114a75242ed15b0bcba173)
Signed-off-by: Andrey Ryabinin 
---
 lib/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/Makefile b/lib/Makefile
index cfe21bd255b4..9b8233e61ee0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -168,6 +168,7 @@ obj-$(CONFIG_IRQ_POLL) += irq_poll.o
 
 obj-$(CONFIG_STACKDEPOT) += stackdepot.o
 KASAN_SANITIZE_stackdepot.o := n
+KCOV_INSTRUMENT_stackdepot.o := n
 
 libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 01/39] kasan: show gcc version requirements in Kconfig and Documentation

2017-09-14 Thread Andrey Ryabinin
From: Joe Perches 

The documentation shows a need for gcc > 4.9.2, but it's really >=.  The
Kconfig entries don't show required versions, so add them.  Correct a
latter/later typo too.  Also mention that gcc 5 is required to catch
out-of-bounds accesses to global and stack variables.

Signed-off-by: Joe Perches 
Signed-off-by: Andrey Ryabinin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 01e76903f655a4d88c2e09d3182436c65f6e1213)
Signed-off-by: Andrey Ryabinin 
---
 Documentation/kasan.txt | 8 +---
 lib/Kconfig.kasan   | 8 ++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index ee36ef1a64c0..67e62ed6a198 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -9,7 +9,9 @@ a fast and comprehensive solution for finding use-after-free 
and out-of-bounds
 bugs.
 
 KASan uses compile-time instrumentation for checking every memory access,
-therefore you will need a certain version of GCC > 4.9.2
+therefore you will need a gcc version of 4.9.2 or later. KASan could detect out
+of bounds accesses to stack or global variables, but only if gcc 5.0 or later 
was
+used to built the kernel.
 
 Currently KASan is supported only for x86_64 architecture and requires that the
 kernel be built with the SLUB allocator.
@@ -23,8 +25,8 @@ To enable KASAN configure kernel with:
 
 and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline/inline
 is compiler instrumentation types. The former produces smaller binary the
-latter is 1.1 - 2 times faster. Inline instrumentation requires GCC 5.0 or
-latter.
+latter is 1.1 - 2 times faster. Inline instrumentation requires a gcc version
+of 5.0 or later.
 
 Currently KASAN works only with the SLUB memory allocator.
 For better bug detection and nicer report and enable CONFIG_STACKTRACE.
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 4fecaedc80a2..777eda7d1ab4 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -10,8 +10,11 @@ config KASAN
help
  Enables kernel address sanitizer - runtime memory debugger,
  designed to find out-of-bounds accesses and use-after-free bugs.
- This is strictly debugging feature. It consumes about 1/8
- of available memory and brings about ~x3 performance slowdown.
+ This is strictly a debugging feature and it requires a gcc version
+ of 4.9.2 or later. Detection of out of bounds accesses to stack or
+ global variables requires gcc 5.0 or later.
+ This feature consumes about 1/8 of available memory and brings about
+ ~x3 performance slowdown.
  For better error detection enable CONFIG_STACKTRACE,
  and add slub_debug=U to boot cmdline.
 
@@ -40,6 +43,7 @@ config KASAN_INLINE
  memory accesses. This is faster than outline (in some workloads
  it gives about x2 boost over outline instrumentation), but
  make kernel's .text size much bigger.
+ This requires a gcc version of 5.0 or later.
 
 endchoice
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 15/39] kasan/quarantine: fix bugs on qlist_move_cache()

2017-09-14 Thread Andrey Ryabinin
From: Joonsoo Kim 

There are two bugs in qlist_move_cache().  One is that the qlist's tail
isn't set properly: curr->next can be NULL since it is a singly linked
list, and a NULL tail is invalid if there is one item on the qlist.
The other is that if the cache matches, qlist_put() is called and it
sets curr->next to NULL, which stops the loop prematurely.

These problems come from the complicated implementation, so I'd like to
re-implement it completely.  The implementation in this patch is really
simple: iterate over all qlist_nodes and put each one on the appropriate
list.

Unfortunately, I hit this bug some time ago and lost the oops message.
But the bug looks trivial, so there is no need to attach an oops.

Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Link: 
http://lkml.kernel.org/r/1467766348-22419-1-git-send-email-iamjoonsoo@lge.com
Signed-off-by: Joonsoo Kim 
Reviewed-by: Dmitry Vyukov 
Acked-by: Andrey Ryabinin 
Acked-by: Alexander Potapenko 
Cc: Kuthonuzo Luruo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 0ab686d8c8303069e80300663b3be6201a8697fb)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 4973505a9bdd..65793f150d1f 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -238,30 +238,23 @@ static void qlist_move_cache(struct qlist_head *from,
   struct qlist_head *to,
   struct kmem_cache *cache)
 {
-   struct qlist_node *prev = NULL, *curr;
+   struct qlist_node *curr;
 
if (unlikely(qlist_empty(from)))
return;
 
curr = from->head;
+   qlist_init(from);
while (curr) {
-   struct qlist_node *qlink = curr;
-   struct kmem_cache *obj_cache = qlink_to_cache(qlink);
-
-   if (obj_cache == cache) {
-   if (unlikely(from->head == qlink)) {
-   from->head = curr->next;
-   prev = curr;
-   } else
-   prev->next = curr->next;
-   if (unlikely(from->tail == qlink))
-   from->tail = curr->next;
-   from->bytes -= cache->size;
-   qlist_put(to, qlink, cache->size);
-   } else {
-   prev = curr;
-   }
-   curr = curr->next;
+   struct qlist_node *next = curr->next;
+   struct kmem_cache *obj_cache = qlink_to_cache(curr);
+
+   if (obj_cache == cache)
+   qlist_put(to, curr, obj_cache->size);
+   else
+   qlist_put(from, curr, obj_cache->size);
+
+   curr = next;
}
 }
 
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 26/39] mm, mempolicy: task->mempolicy must be NULL before dropping final reference

2017-09-14 Thread Andrey Ryabinin
From: David Rientjes 

KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current->mempolicy through any
number of allocation functions.  It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:

BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr 
88010b48102c
CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
...
Call Trace:
dump_stack
kasan_object_err
kasan_report_error
__asan_report_load2_noabort
alloc_pages_current <-- use after free
depot_save_stack
save_stack
kasan_slab_free
kmem_cache_free
__mpol_put  <-- free
do_exit

This patch sets current->mempolicy to NULL before dropping the final
reference.

Link: 
http://lkml.kernel.org/r/alpine.deb.2.10.1608301442180.63...@chino.kir.corp.google.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot 
for SLAB")
Signed-off-by: David Rientjes 
Reported-by: Vegard Nossum 
Acked-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: [4.6+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit c11600e4fed67ae4cd6a8096936afd445410e8ed)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/mempolicy.h |  4 
 kernel/exit.c |  7 +--
 mm/mempolicy.c| 17 +
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 7f26526c488b..7e47465520f4 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -196,6 +196,7 @@ static inline int vma_migratable(struct vm_area_struct *vma)
 }
 
 extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned 
long);
+extern void mpol_put_task_policy(struct task_struct *);
 
 #else
 
@@ -320,5 +321,8 @@ static inline int mpol_misplaced(struct page *page, struct 
vm_area_struct *vma,
return -1; /* no node preference */
 }
 
+static inline void mpol_put_task_policy(struct task_struct *task)
+{
+}
 #endif /* CONFIG_NUMA */
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 668cacf375d2..32b7ba21d203 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -866,12 +866,7 @@ void do_exit(long code)
ptrace_put_breakpoints(tsk);
 
exit_notify(tsk, group_dead);
-#ifdef CONFIG_NUMA
-   task_lock(tsk);
-   mpol_put(tsk->mempolicy);
-   tsk->mempolicy = NULL;
-   task_unlock(tsk);
-#endif
+   mpol_put_task_policy(tsk);
 #ifdef CONFIG_FUTEX
if (unlikely(current->pi_state_cache))
kfree(current->pi_state_cache);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9b7800695b72..a2e2422f63c7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2377,6 +2377,23 @@ out:
return ret;
 }
 
+/*
+ * Drop the (possibly final) reference to task->mempolicy.  It needs to be
+ * dropped after task->mempolicy is set to NULL so that any allocation done as
+ * part of its kmem_cache_free(), such as by KASAN, doesn't reference a freed
+ * policy.
+ */
+void mpol_put_task_policy(struct task_struct *task)
+{
+   struct mempolicy *pol;
+
+   task_lock(task);
+   pol = task->mempolicy;
+   task->mempolicy = NULL;
+   task_unlock(task);
+   mpol_put(pol);
+}
+
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
pr_debug("deleting %lx-l%lx\n", n->start, n->end);
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 30/39] kasan: eliminate long stalls during quarantine reduction

2017-09-14 Thread Andrey Ryabinin
From: Dmitry Vyukov 

Currently we dedicate 1/32 of RAM for quarantine and then reduce it by
1/4 of total quarantine size.  This can be a significant amount of
memory.  For example, with 4GB of RAM total quarantine size is 128MB and
it is reduced by 32MB at a time.  With 128GB of RAM total quarantine
size is 4GB and it is reduced by 1GB.  This leads to several problems:

 - freeing 1GB can take tens of seconds, causes rcu stall warnings and
   just introduces unexpected long delays at random places
 - if kmalloc() is called under a mutex, other threads stall on that
   mutex while a thread reduces quarantine
 - threads wait on quarantine_lock while one thread grabs a large batch
   of objects to evict
 - we walk the uncached list of object to free twice which makes all of
   the above worse
 - when a thread frees objects, they are already not accounted against
   global_quarantine.bytes; as the result we can have quarantine_size
   bytes in quarantine + unbounded amount of memory in large batches in
   threads that are in process of freeing

Reduce size of quarantine in smaller batches to reduce the delays.  The
only reason to reduce it in batches is amortization of overheads, the
new batch size of 1MB should be well enough to amortize spinlock
lock/unlock and few function calls.

Plus, organize the quarantine as a FIFO array of batches.  This makes it
possible to avoid walking the list in quarantine_reduce() under
quarantine_lock, which in turn reduces contention and is just faster.

This improves performance of heavy load (syzkaller fuzzing) by ~20% with
4 CPUs and 32GB of RAM.  Also this eliminates frequent (every 5 sec)
drops of CPU consumption from ~400% to ~100% (one thread reduces
quarantine while others are waiting on a mutex).

Some reference numbers:
1. Machine with 4 CPUs and 4GB of memory. Quarantine size 128MB.
   Currently we free 32MB at a time.
   With new code we free 1MB at a time (1024 batches, ~128 are used).
2. Machine with 32 CPUs and 128GB of memory. Quarantine size 4GB.
   Currently we free 1GB at a time.
   With new code we free 8MB at a time (1024 batches, ~512 are used).
3. Machine with 4096 CPUs and 1TB of memory. Quarantine size 32GB.
   Currently we free 8GB at a time.
   With new code we free 4MB at a time (16K batches, ~8K are used).
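
To make the reference numbers above concrete, here is a small user-space
sketch (not part of the patch).  The constants mirror QUARANTINE_PERCPU_SIZE,
QUARANTINE_FRACTION and QUARANTINE_BATCHES from the diff below; the per-batch
target of 2 * total / batches is an assumption, chosen only because it
reproduces the three reference points:

#include <stdio.h>

#define PERCPU_SIZE (1ULL << 20)	/* 1MB per-cpu quarantine   */
#define FRACTION    32ULL		/* quarantine = 1/32 of RAM */

static void show(unsigned long long ram_gb, unsigned long long cpus)
{
	unsigned long long batches = 1024 > 4 * cpus ? 1024 : 4 * cpus;
	unsigned long long total   = (ram_gb << 30) / FRACTION;
	unsigned long long batch   = 2 * total / batches;

	if (batch < PERCPU_SIZE)	/* never reduce in chunks below 1MB */
		batch = PERCPU_SIZE;

	printf("%4llu GB RAM, %4llu CPUs: quarantine %5llu MB, "
	       "%5llu batches, ~%llu MB freed at a time\n",
	       ram_gb, cpus, total >> 20, batches, batch >> 20);
}

int main(void)
{
	show(4, 4);		/* reference point 1 */
	show(128, 32);		/* reference point 2 */
	show(1024, 4096);	/* reference point 3 */
	return 0;
}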

Link: 
http://lkml.kernel.org/r/1478756952-18695-1-git-send-email-dvyu...@google.com
Signed-off-by: Dmitry Vyukov 
Cc: Eric Dumazet 
Cc: Greg Thelen 
Cc: Alexander Potapenko 
Cc: Andrey Ryabinin 
Cc: Andrey Konovalov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 64abdcb24351a27bed6e2b6a3c27348fe532c73f)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 94 ++-
 1 file changed, 48 insertions(+), 46 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index baabaad4a4aa..dae929c02bbb 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -86,24 +86,9 @@ static void qlist_move_all(struct qlist_head *from, struct 
qlist_head *to)
qlist_init(from);
 }
 
-static void qlist_move(struct qlist_head *from, struct qlist_node *last,
-   struct qlist_head *to, size_t size)
-{
-   if (unlikely(last == from->tail)) {
-   qlist_move_all(from, to);
-   return;
-   }
-   if (qlist_empty(to))
-   to->head = from->head;
-   else
-   to->tail->next = from->head;
-   to->tail = last;
-   from->head = last->next;
-   last->next = NULL;
-   from->bytes -= size;
-   to->bytes += size;
-}
-
+#define QUARANTINE_PERCPU_SIZE (1 << 20)
+#define QUARANTINE_BATCHES \
+   (1024 > 4 * CONFIG_NR_CPUS ? 1024 : 4 * CONFIG_NR_CPUS)
 
 /*
  * The object quarantine consists of per-cpu queues and a global queue,
@@ -111,11 +96,22 @@ static void qlist_move(struct qlist_head *from, struct 
qlist_node *last,
  */
 static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
 
-static struct qlist_head global_quarantine;
+/* Round-robin FIFO array of batches. */
+static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
+static int quarantine_head;
+static int quarantine_tail;
+/* Total size of all objects in global_quarantine across all batches. */
+static unsigned long quarantine_size;
 static DEFINE_SPINLOCK(quarantine_lock);
 
 /* Maximum size of the global queue. */
-static unsigned long quarantine_size;
+static unsigned long quarantine_max_size;
+
+/*
+ * Target size of a batch in global_quarantine.
+ * Usually equal to QUARANTINE_PERCPU_SIZE unless we have too much RAM.
+ */
+static unsigned long quarantine_batch_size;
 
 /*
  * The fraction of physical memory the quarantine is allowed to occupy.
@@ -124,9 +120,6 @@ static unsigned long quarantine_size;
  */
 #define QUARANTINE_FRACTION 32
 
-#define QUARANTINE_LOW_SIZE (READ_ONCE(quarantine_size) * 3 / 4)
-#define QUARANTINE_PERCPU_SIZE (1 << 20)
-
 static struct kmem_cach

[Devel] [PATCH rh7 21/39] mm/kasan: get rid of ->state in struct kasan_alloc_meta

2017-09-14 Thread Andrey Ryabinin
The state of an object is currently tracked in two places - shadow memory,
and the ->state field in struct kasan_alloc_meta.  We can get rid of the
latter.  This will save us a little bit of memory.  Also, this allows us
to move the free stack into struct kasan_alloc_meta without increasing
memory consumption.  So now we should always know when the object was
last freed.  This may be useful for long-delayed use-after-free bugs.

As a side effect this fixes following UBSAN warning:
UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
member access within misaligned address 88000d1efebc for type 
'struct qlist_node'
which requires 8 byte alignment
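
As a rough illustration (not part of the patch), with ->state gone an
object's lifecycle is derived from its shadow byte alone, exactly as the
new check in kasan_slab_free() below does.  A hypothetical helper, assuming
the usual mm/kasan/kasan.h environment, might look like:

/* Hypothetical sketch: 0 means fully addressable, 1..7 a partially
 * addressable last granule; freed/quarantined objects carry dedicated
 * negative magic values instead. */
static bool kasan_object_is_live(const void *object)
{
	s8 shadow = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));

	return shadow >= 0 && shadow < KASAN_SHADOW_SCALE_SIZE;
}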

Link: 
http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabi...@virtuozzo.com
Reported-by: kernel test robot 
Signed-off-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit b3cbd9bf77cd1888114dbee1653e79aa23fd4068)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/kasan.h |  3 +++
 mm/kasan/kasan.c  | 61 +++
 mm/kasan/kasan.h  | 12 ++
 mm/kasan/quarantine.c |  2 --
 mm/kasan/report.c | 23 +--
 mm/slab.c |  1 +
 mm/slub.c |  2 ++
 7 files changed, 41 insertions(+), 63 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 1122a7ff724b..536a400d1d39 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -46,6 +46,7 @@ void kasan_cache_destroy(struct kmem_cache *cache);
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
+void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
 
 void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
 void kasan_kfree_large(const void *ptr);
@@ -89,6 +90,8 @@ static inline void kasan_unpoison_object_data(struct 
kmem_cache *cache,
void *object) {}
 static inline void kasan_poison_object_data(struct kmem_cache *cache,
void *object) {}
+static inline void kasan_init_slab_obj(struct kmem_cache *cache,
+   const void *object) {}
 
 static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
 static inline void kasan_kfree_large(const void *ptr) {}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index a8d3e087dad3..7fa1643e83df 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -403,11 +403,6 @@ void kasan_poison_object_data(struct kmem_cache *cache, 
void *object)
kasan_poison_shadow(object,
round_up(cache->object_size, KASAN_SHADOW_SCALE_SIZE),
KASAN_KMALLOC_REDZONE);
-   if (cache->flags & SLAB_KASAN) {
-   struct kasan_alloc_meta *alloc_info =
-   get_alloc_info(cache, object);
-   alloc_info->state = KASAN_STATE_INIT;
-   }
 }
 
 static inline int in_irqentry_text(unsigned long ptr)
@@ -471,6 +466,17 @@ struct kasan_free_meta *get_free_info(struct kmem_cache 
*cache,
return (void *)object + cache->kasan_info.free_meta_offset;
 }
 
+void kasan_init_slab_obj(struct kmem_cache *cache, const void *object)
+{
+   struct kasan_alloc_meta *alloc_info;
+
+   if (!(cache->flags & SLAB_KASAN))
+   return;
+
+   alloc_info = get_alloc_info(cache, object);
+   __memset(alloc_info, 0, sizeof(*alloc_info));
+}
+
 void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
 {
kasan_kmalloc(cache, object, cache->object_size, flags);
@@ -490,34 +496,27 @@ void kasan_poison_slab_free(struct kmem_cache *cache, 
void *object)
 
 bool kasan_slab_free(struct kmem_cache *cache, void *object)
 {
+   s8 shadow_byte;
+
/* RCU slabs could be legally used after free within the RCU period */
if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
return false;
 
-   if (likely(cache->flags & SLAB_KASAN)) {
-   struct kasan_alloc_meta *alloc_info;
-   struct kasan_free_meta *free_info;
+   shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
+   if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
+   pr_err("Double free");
+   dump_stack();
+   return true;
+   }
 
-   alloc_info = get_alloc_info(cache, object);
-   free_info = get_free_info(cache, object);
+   kasan_poison_slab_free(cache, object);
 
-   switch (alloc_info->state) {
-   case KASAN_STATE_ALLOC:
-   alloc_info->state = KASAN_STATE_QUARANTINE;
-  

[Devel] [PATCH rh7 22/39] kasan: improve double-free reports

2017-09-14 Thread Andrey Ryabinin
Currently we just dump the stack in case of a double-free bug.
Let's dump all the info about the object that we have.

[aryabi...@virtuozzo.com: change double free message per Alexander]
  Link: 
http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabi...@virtuozzo.com
Link: 
http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabi...@virtuozzo.com
Signed-off-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 7e088978933ee186533355ae03a9dc1de99cf6c7)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/kasan.c  |  3 +--
 mm/kasan/kasan.h  |  2 ++
 mm/kasan/report.c | 51 ++-
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 7fa1643e83df..8f350a2edcb6 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -504,8 +504,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
 
shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
-   pr_err("Double free");
-   dump_stack();
+   kasan_report_double_free(cache, object, shadow_byte);
return true;
}
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index e4c0e91524b1..ddce58734098 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -100,6 +100,8 @@ static inline bool kasan_enabled(void)
 
 void kasan_report(unsigned long addr, size_t size,
bool is_write, unsigned long ip);
+void kasan_report_double_free(struct kmem_cache *cache, void *object,
+   s8 shadow);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
 void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 94bb359fd0f3..cbd7f6e50cc1 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -98,6 +98,26 @@ static inline bool init_task_stack_addr(const void *addr)
sizeof(init_thread_union.stack));
 }
 
+static DEFINE_SPINLOCK(report_lock);
+
+static void kasan_start_report(unsigned long *flags)
+{
+   /*
+* Make sure we don't end up in loop.
+*/
+   kasan_disable_current();
+   spin_lock_irqsave(&report_lock, *flags);
+   
pr_err("==\n");
+}
+
+static void kasan_end_report(unsigned long *flags)
+{
+   
pr_err("==\n");
+   add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+   spin_unlock_irqrestore(&report_lock, *flags);
+   kasan_enable_current();
+}
+
 static void print_track(struct kasan_track *track)
 {
pr_err("PID = %u\n", track->pid);
@@ -111,8 +131,7 @@ static void print_track(struct kasan_track *track)
}
 }
 
-static void kasan_object_err(struct kmem_cache *cache, struct page *page,
-   void *object, char *unused_reason)
+static void kasan_object_err(struct kmem_cache *cache, void *object)
 {
struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
 
@@ -129,6 +148,18 @@ static void kasan_object_err(struct kmem_cache *cache, 
struct page *page,
print_track(&alloc_info->free_track);
 }
 
+void kasan_report_double_free(struct kmem_cache *cache, void *object,
+   s8 shadow)
+{
+   unsigned long flags;
+
+   kasan_start_report(&flags);
+   pr_err("BUG: Double free or freeing an invalid pointer\n");
+   pr_err("Unexpected shadow byte: 0x%hhX\n", shadow);
+   kasan_object_err(cache, object);
+   kasan_end_report(&flags);
+}
+
 static void print_address_description(struct kasan_access_info *info)
 {
const void *addr = info->access_addr;
@@ -142,8 +173,7 @@ static void print_address_description(struct 
kasan_access_info *info)
struct kmem_cache *cache = page->slab_cache;
object = nearest_obj(cache, page,
(void *)info->access_addr);
-   kasan_object_err(cache, page, object,
-   "kasan: bad access detected");
+   kasan_object_err(cache, object);
return;
}
dump_page(page, "kasan: bad access detected");
@@ -204,16 +234,13 @@ static void print_shadow_for_address(const void *addr)
}
 }
 
-static DEFINE_SPINLOCK(report_lock);
-
 static void kasan_report_error(struct kasan_access_info *info)
 {
unsigned long flags;
const char *bug_type;
 
-   spin_lock_irqsave(&report_lock, flags);
-   pr_err("="
-   "=\n");
+   kasan_start_report(&flags);
+

[Devel] [PATCH rh7 07/39] arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces that need to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
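
As a hedged sketch of how the annotation is meant to be used (the handler
below is hypothetical, not taken from this patch): any function marked
__softirq_entry ends up in the .softirqentry.text section, so KASAN can
recognize it as a softirq entry point and truncate recorded stack traces
there.  In the real patch the annotation lands on the kernel's actual
softirq entry code rather than on individual handlers.

#include <linux/interrupt.h>

/* Hypothetical example only. */
static __softirq_entry void demo_softirq_action(struct softirq_action *a)
{
	/* allocations made here are attributed to softirq context,
	 * and stack traces stored by KASAN stop at this entry point */
}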

Signed-off-by: Alexander Potapenko 
Acked-by: Steven Rostedt 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit be7635e7287e0e8013af3c89a6354a9e0182594c)
Signed-off-by: Andrey Ryabinin 
---
 arch/arm/include/asm/exception.h |  2 +-
 arch/arm/kernel/vmlinux.lds.S|  1 +
 arch/arm64/kernel/vmlinux.lds.S  |  1 +
 arch/blackfin/kernel/vmlinux.lds.S   |  1 +
 arch/c6x/kernel/vmlinux.lds.S|  1 +
 arch/metag/kernel/vmlinux.lds.S  |  1 +
 arch/microblaze/kernel/vmlinux.lds.S |  1 +
 arch/mips/kernel/vmlinux.lds.S   |  1 +
 arch/openrisc/kernel/vmlinux.lds.S   |  1 +
 arch/parisc/kernel/vmlinux.lds.S |  1 +
 arch/powerpc/kernel/vmlinux.lds.S|  1 +
 arch/s390/kernel/vmlinux.lds.S   |  1 +
 arch/sh/kernel/vmlinux.lds.S |  1 +
 arch/sparc/kernel/vmlinux.lds.S  |  1 +
 arch/tile/kernel/vmlinux.lds.S   |  1 +
 arch/x86/kernel/vmlinux.lds.S|  1 +
 include/asm-generic/vmlinux.lds.h| 12 +++-
 include/linux/ftrace.h   | 11 ---
 include/linux/interrupt.h| 20 
 kernel/softirq.c |  2 +-
 kernel/trace/trace_functions_graph.c |  1 +
 21 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/exception.h b/arch/arm/include/asm/exception.h
index 5abaf5bbd985..bf1991263d2d 100644
--- a/arch/arm/include/asm/exception.h
+++ b/arch/arm/include/asm/exception.h
@@ -7,7 +7,7 @@
 #ifndef __ASM_ARM_EXCEPTION_H
 #define __ASM_ARM_EXCEPTION_H
 
-#include <linux/ftrace.h>
+#include <linux/interrupt.h>
 
 #define __exception __attribute__((section(".exception.text")))
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 33f2ea32f5a0..b3428ce67bd0 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -100,6 +100,7 @@ SECTIONS
*(.exception.text)
__exception_text_end = .;
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3fae2be8b016..96b19d8d264d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS
*(.exception.text)
__exception_text_end = .;
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
LOCK_TEXT
diff --git a/arch/blackfin/kernel/vmlinux.lds.S 
b/arch/blackfin/kernel/vmlinux.lds.S
index ba35864b2b74..f7f4c3ae3f3e 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -35,6 +35,7 @@ SECTIONS
 #endif
LOCK_TEXT
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
KPROBES_TEXT
 #ifdef CONFIG_ROMKERNEL
__sinittext = .;
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 1d81c4c129ec..5a05a725331f 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -78,6 +78,7 @@ SECTIONS
SCHED_TEXT
LOCK_TEXT
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
KPROBES_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index e12055e88bfe..150ace92c7ad 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -24,6 +24,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
*(.text.*)
*(.gnu.warning)
}
diff --git a/arch/microblaze/kernel/vmlinux.lds.S 
b/arch/microblaze/kernel/vmlinux.lds.S
index 936d01a689d7..f8ee75888d9c 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS {
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
+   SOFTIRQENTRY_TEXT
. = ALIGN (4) ;
_etext = . ;
}
diff --git a/arch/mi

[Devel] [PATCH rh7 23/39] kasan: avoid overflowing quarantine size on low memory systems

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

If the total amount of memory assigned to quarantine is less than the
amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
may overflow.  Instead, set it to zero.

[a...@linux-foundation.org: cleanup: use WARN_ONCE return value]
Link: 
http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-gli...@google.com
Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Signed-off-by: Alexander Potapenko 
Reported-by: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit c3cee372282cb6bcdf19ac1457581d5dd5ecb554)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 7fd121d13b88..b6728a33a4ac 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -198,7 +198,7 @@ void quarantine_put(struct kasan_free_meta *info, struct 
kmem_cache *cache)
 
 void quarantine_reduce(void)
 {
-   size_t new_quarantine_size;
+   size_t new_quarantine_size, percpu_quarantines;
unsigned long flags;
struct qlist_head to_free = QLIST_INIT;
size_t size_to_free = 0;
@@ -216,7 +216,12 @@ void quarantine_reduce(void)
 */
new_quarantine_size = (READ_ONCE(totalram_pages) << PAGE_SHIFT) /
QUARANTINE_FRACTION;
-   new_quarantine_size -= QUARANTINE_PERCPU_SIZE * num_online_cpus();
+   percpu_quarantines = QUARANTINE_PERCPU_SIZE * num_online_cpus();
+   if (WARN_ONCE(new_quarantine_size < percpu_quarantines,
+   "Too little memory, disabling global KASAN quarantine.\n"))
+   new_quarantine_size = 0;
+   else
+   new_quarantine_size -= percpu_quarantines;
WRITE_ONCE(quarantine_size, new_quarantine_size);
 
last = global_quarantine.head;
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 18/39] mm/kasan: fix corruptions and false positive reports

2017-09-14 Thread Andrey Ryabinin
Once an object is put into quarantine, we no longer own it, i.e. the object
could leave the quarantine and be reallocated.  So having the set_track()
call after quarantine_put() may corrupt slab objects.

 BUG kmalloc-4096 (Not tainted): Poison overwritten
 -
 Disabling lock debugging due to kernel taint
 INFO: 0x8804540de850-0x8804540de857. First byte 0xb5 instead of 0x6b
...
 INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492
  __slab_free+0x1d6/0x2e0
  ___cache_free+0xb6/0xd0
  qlist_free_all+0x83/0x100
  quarantine_reduce+0x177/0x1b0
  kasan_kmalloc+0xf3/0x100
  kasan_slab_alloc+0x12/0x20
  kmem_cache_alloc+0x109/0x3e0
  mmap_region+0x53e/0xe40
  do_mmap+0x70f/0xa50
  vm_mmap_pgoff+0x147/0x1b0
  SyS_mmap_pgoff+0x2c7/0x5b0
  SyS_mmap+0x1b/0x30
  do_syscall_64+0x1a0/0x4e0
  return_from_SYSCALL_64+0x0/0x7a
 INFO: Slab 0xea0011503600 objects=7 used=7 fp=0x  (null) 
flags=0x80004080
 INFO: Object 0x8804540de848 @offset=26696 fp=0x8804540dc588
 Redzone 8804540de840: bb bb bb bb bb bb bb bb  

 Object 8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc  
.R`.

Similarly, poisoning after the quarantine_put() leads to false positive
use-after-free reports:

 BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at 
addr 880405c540a0
 Read of size 8 by task trinity-c0/3036
 CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9
 Call Trace:
   dump_stack+0x68/0x96
   kasan_report_error+0x222/0x600
   __asan_report_load8_noabort+0x61/0x70
   anon_vma_interval_tree_insert+0x304/0x430
   anon_vma_chain_link+0x91/0xd0
   anon_vma_clone+0x136/0x3f0
   anon_vma_fork+0x81/0x4c0
   copy_process.part.47+0x2c43/0x5b20
   _do_fork+0x16d/0xbd0
   SyS_clone+0x19/0x20
   do_syscall_64+0x1a0/0x4e0
   entry_SYSCALL64_slow_path+0x25/0x25

Fix this by putting an object in the quarantine after all other
operations.

Fixes: 80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory 
quarantine for SLUB")
Link: 
http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabi...@virtuozzo.com
Signed-off-by: Andrey Ryabinin 
Reported-by: Dave Jones 
Reported-by: Vegard Nossum 
Reported-by: Sasha Levin 
Acked-by: Alexander Potapenko 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 4a3d308d6674fabf213bce9c1a661ef43a85e515)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/kasan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 8a57f22560a4..d7c814309c3e 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -504,9 +504,9 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
switch (alloc_info->state) {
case KASAN_STATE_ALLOC:
alloc_info->state = KASAN_STATE_QUARANTINE;
-   quarantine_put(free_info, cache);
set_track(&free_info->track, GFP_NOWAIT);
kasan_poison_slab_free(cache, object);
+   quarantine_put(free_info, cache);
return true;
case KASAN_STATE_QUARANTINE:
case KASAN_STATE_FREE:
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 20/39] mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta

2017-09-14 Thread Andrey Ryabinin
The size of a slab object is already stored in cache->object_size.

Note that kmalloc() internally rounds up the size of an allocation, so
object_size may not be equal to alloc_size, but usually we don't need
to know the exact size of the allocated object.  In case we do need that
information, we can still figure it out from the report.  The dump of
shadow memory allows us to identify the end of the allocated memory, and
thereby the exact allocation size.

Link: 
http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabi...@virtuozzo.com
Signed-off-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 47b5c2a0f021e90a79845d1a1353780e5edd0bce)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/kasan.c  | 1 -
 mm/kasan/kasan.h  | 4 +---
 mm/kasan/report.c | 8 +++-
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index d7c814309c3e..a8d3e087dad3 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -545,7 +545,6 @@ void kasan_kmalloc(struct kmem_cache *cache, const void 
*object, size_t size,
get_alloc_info(cache, object);
 
alloc_info->state = KASAN_STATE_ALLOC;
-   alloc_info->alloc_size = size;
set_track(&alloc_info->track, flags);
}
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 1143e64b6a34..1175fa05f8a6 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -76,9 +76,7 @@ struct kasan_track {
 
 struct kasan_alloc_meta {
struct kasan_track track;
-   u32 state : 2;  /* enum kasan_state */
-   u32 alloc_size : 30;
-   u32 reserved;
+   u32 state;
 };
 
 struct qlist_node {
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index ef85919f4326..45f17623677e 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -118,7 +118,9 @@ static void kasan_object_err(struct kmem_cache *cache, 
struct page *page,
struct kasan_free_meta *free_info;
 
dump_stack();
-   pr_err("Object at %p, in cache %s\n", object, cache->name);
+   pr_err("Object at %p, in cache %s size: %d\n", object, cache->name,
+   cache->object_size);
+
if (!(cache->flags & SLAB_KASAN))
return;
switch (alloc_info->state) {
@@ -126,15 +128,11 @@ static void kasan_object_err(struct kmem_cache *cache, 
struct page *page,
pr_err("Object not allocated yet\n");
break;
case KASAN_STATE_ALLOC:
-   pr_err("Object allocated with size %u bytes.\n",
-  alloc_info->alloc_size);
pr_err("Allocation:\n");
print_track(&alloc_info->track);
break;
case KASAN_STATE_FREE:
case KASAN_STATE_QUARANTINE:
-   pr_err("Object freed, allocated with size %u bytes\n",
-  alloc_info->alloc_size);
free_info = get_free_info(cache, object);
pr_err("Allocation:\n");
print_track(&alloc_info->track);
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 06/39] mm, kasan: add GFP flags to KASAN API

2017-09-14 Thread Andrey Ryabinin
From: Alexander Potapenko 

Add GFP flags to KASAN hooks for future patches to use.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrey Konovalov 
Cc: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Steven Rostedt 
Cc: Konstantin Serebryany 
Cc: Dmitry Chernenkov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 505f5dcb1c419e55a9621a01f83eb5745d8d7398)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/kasan.h | 19 +++
 include/linux/slab.h  |  4 ++--
 mm/kasan/kasan.c  | 15 ---
 mm/mempool.c  | 16 
 mm/slab.c | 15 ---
 mm/slab_common.c  |  4 ++--
 mm/slub.c | 17 +
 7 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index f55c31becdb6..ab45598049da 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -45,13 +45,14 @@ void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
 
-void kasan_kmalloc_large(const void *ptr, size_t size);
+void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
 void kasan_kfree_large(const void *ptr);
 void kasan_kfree(void *ptr);
-void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size);
-void kasan_krealloc(const void *object, size_t new_size);
+void kasan_kmalloc(struct kmem_cache *s, const void *object, size_t size,
+ gfp_t flags);
+void kasan_krealloc(const void *object, size_t new_size, gfp_t flags);
 
-void kasan_slab_alloc(struct kmem_cache *s, void *object);
+void kasan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
 void kasan_slab_free(struct kmem_cache *s, void *object);
 
 struct kasan_cache {
@@ -82,14 +83,16 @@ static inline void kasan_unpoison_object_data(struct 
kmem_cache *cache,
 static inline void kasan_poison_object_data(struct kmem_cache *cache,
void *object) {}
 
-static inline void kasan_kmalloc_large(void *ptr, size_t size) {}
+static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
 static inline void kasan_kfree_large(const void *ptr) {}
 static inline void kasan_kfree(void *ptr) {}
 static inline void kasan_kmalloc(struct kmem_cache *s, const void *object,
-   size_t size) {}
-static inline void kasan_krealloc(const void *object, size_t new_size) {}
+   size_t size, gfp_t flags) {}
+static inline void kasan_krealloc(const void *object, size_t new_size,
+gfp_t flags) {}
 
-static inline void kasan_slab_alloc(struct kmem_cache *s, void *object) {}
+static inline void kasan_slab_alloc(struct kmem_cache *s, void *object,
+  gfp_t flags) {}
 static inline void kasan_slab_free(struct kmem_cache *s, void *object) {}
 
 static inline int kasan_module_alloc(void *addr, size_t size) { return 0; }
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 7dc1b73cdcec..d4946a66d15b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -322,7 +322,7 @@ static __always_inline void *kmem_cache_alloc_trace(struct 
kmem_cache *s,
 {
void *ret = kmem_cache_alloc(s, flags);
 
-   kasan_kmalloc(s, ret, size);
+   kasan_kmalloc(s, ret, size, flags);
return ret;
 }
 
@@ -333,7 +333,7 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s,
 {
void *ret = kmem_cache_alloc_node(s, gfpflags, node);
 
-   kasan_kmalloc(s, ret, size);
+   kasan_kmalloc(s, ret, size, gfpflags);
return ret;
 }
 #endif /* CONFIG_TRACING */
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 2e1a640f8772..03a856d1af12 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -411,9 +411,9 @@ struct kasan_free_meta *get_free_info(struct kmem_cache 
*cache,
 }
 #endif
 
-void kasan_slab_alloc(struct kmem_cache *cache, void *object)
+void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
 {
-   kasan_kmalloc(cache, object, cache->object_size);
+   kasan_kmalloc(cache, object, cache->object_size, flags);
 }
 
 void kasan_slab_free(struct kmem_cache *cache, void *object)
@@ -439,7 +439,8 @@ void kasan_slab_free(struct kmem_cache *cache, void *object)
kasan_poison_shadow(object, rounded_up_size, KASAN_KMALLOC_FREE);
 }
 
-void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size)
+void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
+  gfp_t flags)
 {
unsigned long redzone_start;
unsigned long redzone_end;
@@ -468,

[Devel] [PATCH rh7 14/39] kasan: add newline to messages

2017-09-14 Thread Andrey Ryabinin
From: Dmitry Vyukov 

Currently GPF messages with KASAN look as follows:

  kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral 
protection fault:  [#1] SMP DEBUG_PAGEALLOC KASAN

Add newlines.

Link: 
http://lkml.kernel.org/r/1467294357-98002-1-git-send-email-dvyu...@google.com
Signed-off-by: Dmitry Vyukov 
Acked-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 2ba78056acfe8d63a29565f91dae4678ed6b81ca)
Signed-off-by: Andrey Ryabinin 
---
 arch/x86/mm/kasan_init_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index f9fb08ed645a..dbe2a7156d94 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -168,8 +168,8 @@ static int kasan_die_handler(struct notifier_block *self,
 void *data)
 {
if (val == DIE_GPF) {
-   pr_emerg("CONFIG_KASAN_INLINE enabled");
-   pr_emerg("GPF could be caused by NULL-ptr deref or user memory 
access");
+   pr_emerg("CONFIG_KASAN_INLINE enabled\n");
+   pr_emerg("GPF could be caused by NULL-ptr deref or user memory 
access\n");
}
return NOTIFY_OK;
 }
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 02/39] Documentation: kasan: fix a typo

2017-09-14 Thread Andrey Ryabinin
From: Wang Long 

Fix a couple of typos in the kasan document.

Signed-off-by: Wang Long 
Signed-off-by: Jonathan Corbet 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit f66fa08bf9e59b1231aba9e3c2ec28dcf08f0389)
Signed-off-by: Andrey Ryabinin 
---
 Documentation/kasan.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/kasan.txt b/Documentation/kasan.txt
index 67e62ed6a198..82ed25f9d23c 100644
--- a/Documentation/kasan.txt
+++ b/Documentation/kasan.txt
@@ -149,7 +149,7 @@ AddressSanitizer dedicates 1/8 of kernel memory to its 
shadow memory
 (e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and
 offset to translate a memory address to its corresponding shadow address.
 
-Here is the function witch translate an address to its corresponding shadow
+Here is the function which translates an address to its corresponding shadow
 address:
 
 static inline void *kasan_mem_to_shadow(const void *addr)
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 28/39] lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB

2017-09-14 Thread Andrey Ryabinin
From: Dmitry Vyukov 

KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level).  The size of each stack is (num_frames + 3) * sizeof(long),
which gives us ~84K stacks.  This capacity was chosen empirically and it
is enough to run the kernel normally.

However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes.  I've
tested for a long time with the number of top-level entries bumped 4x
(4096), and I think I've seen an overflow only once.  But I don't have all
configs enabled and code coverage has not reached its maximum yet.  So bump
it 8x to 8192.

Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.

Here is some approx math.

128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
alloc/free call sites are reachable with only one stack.  But some
utility functions can have large fanout.  Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.
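
A quick back-of-the-envelope check of these figures (illustrative user-space
arithmetic only; the 4-pages-per-slab second level is an assumption that
matches 1024 slabs = 16MB before the change):

#include <stdio.h>

int main(void)
{
	double slab  = 4 * 4096;	/* assumed: 4 pages per second-level slab */
	double stack = 200;		/* ~200 bytes per saved stack             */

	printf("before: %.0f MB, ~%.0fK stacks\n",
	       1024 * slab / (1 << 20), 1024 * slab / stack / 1000);
	printf("after:  %.0f MB, ~%.0fK stacks\n",
	       8192 * slab / (1 << 20), 8192 * slab / stack / 1000);
	return 0;
}

This prints roughly 16MB / ~84K stacks before and 128MB / ~671K stacks after,
matching the numbers above.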

Link: 
http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyu...@google.com
Signed-off-by: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Joonsoo Kim 
Cc: Baozeng Ding 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 02754e0a484a50a92d44c38879f2cb2792ebc572)
Signed-off-by: Andrey Ryabinin 
---
 lib/stackdepot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 60f77f1d470a..4d830e299989 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -50,7 +50,7 @@
STACK_ALLOC_ALIGN)
 #define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
-#define STACK_ALLOC_SLABS_CAP 1024
+#define STACK_ALLOC_SLABS_CAP 8192
 #define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
 (1LL << (STACK_ALLOC_INDEX_BITS)) : STACK_ALLOC_SLABS_CAP)
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 39/39] module: Fix load_module() error path

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

The load_module() error path frees a module but forgets to take it out
of the mod_tree, leaving a dangling entry in the tree and causing havoc.

Cc: Mathieu Desnoyers 
Reported-by: Arthur Marsh 
Tested-by: Arthur Marsh 
Fixes: 93c2e105f6bc ("module: Optimize __module_address() using a latched 
RB-tree")
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 758556bdc1c8a8dffea0ea9f9df891878cc2468c)
Signed-off-by: Andrey Ryabinin 
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index 952a9582f840..a5ee99f0f7a0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3643,6 +3643,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
+   mod_tree_remove(mod);
wake_up_all(&module_wq);
mutex_unlock(&module_mutex);
  free_module:
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 33/39] rbtree: Make lockless searches non-fatal

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

Change the insert and erase code such that lockless searches are
non-fatal.

In and of itself an rbtree cannot be correctly searched while it is
being modified; we can, however, provide weaker guarantees that will
allow the rbtree to be used in conjunction with other techniques, such
as latches; see 9b0fd802e8c0 ("seqcount: Add raw_write_seqcount_latch()").

For this to work we need the following guarantees from the rbtree
code:

 1) a lockless reader must not see partial stores, this would allow it
to observe nodes that are invalid memory.

 2) there must not be (temporary) loops in the tree structure in the
modifier's program order, this would cause a lookup which
interrupts the modifier to get stuck indefinitely.

For 1) we must use WRITE_ONCE() for all updates to the tree structure;
in particular this patch only does rb_{left,right} as those are the
only element required for simple searches.

It generates slightly worse code, probably because of volatile.  But in
pointer-chasing-heavy code a few more instructions should not matter.

For 2) I have carefully audited the code and drawn every intermediate
link state and not found a loop.
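
To illustrate what a lockless reader gets from this (a minimal sketch, not
from the patch; struct demo_node and demo_lockless_find() are hypothetical
and assume <linux/rbtree.h>): a search racing with a modification may return
a stale or wrong answer, but it never dereferences a torn pointer, so the
caller can detect the race and retry, e.g. under a seqcount.

struct demo_node {
	struct rb_node rb;
	unsigned long key;
};

static struct demo_node *demo_lockless_find(struct rb_root *root,
					    unsigned long key)
{
	struct rb_node *node = READ_ONCE(root->rb_node);

	while (node) {
		struct demo_node *e = rb_entry(node, struct demo_node, rb);

		if (key < e->key)
			node = READ_ONCE(node->rb_left);
		else if (key > e->key)
			node = READ_ONCE(node->rb_right);
		else
			return e;
	}
	return NULL;	/* caller revalidates, e.g. with read_seqcount_retry() */
}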

Cc: Mathieu Desnoyers 
Cc: "Paul E. McKenney" 
Cc: Oleg Nesterov 
Cc: Andrea Arcangeli 
Cc: David Woodhouse 
Cc: Rik van Riel 
Reviewed-by: Michel Lespinasse 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit d72da4a4d973d8a0a0d3c97e7cdebf287fbe3a99)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/rbtree.h   | 16 +++--
 include/linux/rbtree_augmented.h | 21 +++
 lib/rbtree.c | 76 
 3 files changed, 81 insertions(+), 32 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 57e75ae9910f..829c5a8b41c0 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -31,6 +31,7 @@
 
 #include <linux/kernel.h>
 #include <linux/stddef.h>
+#include <linux/rcupdate.h>
 
 struct rb_node {
unsigned long  __rb_parent_color;
@@ -73,11 +74,11 @@ extern struct rb_node *rb_first_postorder(const struct 
rb_root *);
 extern struct rb_node *rb_next_postorder(const struct rb_node *);
 
 /* Fast replacement of a single node without remove/rebalance/add/rebalance */
-extern void rb_replace_node(struct rb_node *victim, struct rb_node *new, 
+extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
 
-static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
-   struct rb_node ** rb_link)
+static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
+   struct rb_node **rb_link)
 {
node->__rb_parent_color = (unsigned long)parent;
node->rb_left = node->rb_right = NULL;
@@ -85,6 +86,15 @@ static inline void rb_link_node(struct rb_node * node, 
struct rb_node * parent,
*rb_link = node;
 }
 
+static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node 
*parent,
+   struct rb_node **rb_link)
+{
+   node->__rb_parent_color = (unsigned long)parent;
+   node->rb_left = node->rb_right = NULL;
+
+   rcu_assign_pointer(*rb_link, node);
+}
+
 #define rb_entry_safe(ptr, type, member) \
({ typeof(ptr) ptr = (ptr); \
   ptr ? rb_entry(ptr, type, member) : NULL; \
diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
index fea49b5da12a..1690f2612449 100644
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -113,11 +113,11 @@ __rb_change_child(struct rb_node *old, struct rb_node 
*new,
 {
if (parent) {
if (parent->rb_left == old)
-   parent->rb_left = new;
+   WRITE_ONCE(parent->rb_left, new);
else
-   parent->rb_right = new;
+   WRITE_ONCE(parent->rb_right, new);
} else
-   root->rb_node = new;
+   WRITE_ONCE(root->rb_node, new);
 }
 
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -127,7 +127,8 @@ static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 const struct rb_augment_callbacks *augment)
 {
-   struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+   struct rb_node *child = node->rb_right;
+   struct rb_node *tmp = node->rb_left;
struct rb_node *parent, *rebalance;
unsigned long pc;
 
@@ -157,6 +158,7 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root 
*root,
tmp = parent;
} else {
struct rb_node *successor = child, *child2;
+
tmp = child->rb_left;
if (!tmp) {
/*
@@ -170,6 +172,7 @@ __rb_erase_augmented(struct rb_n

[Devel] [PATCH rh7 37/39] rbtree: Implement generic latch_tree

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

Implement a latched RB-tree in order to get unconditional RCU/lockless
lookups.

Cc: Oleg Nesterov 
Cc: Michel Lespinasse 
Cc: Andrea Arcangeli 
Cc: David Woodhouse 
Cc: Rik van Riel 
Cc: Mathieu Desnoyers 
Cc: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit ade3f510f93a5613b672febe88eff8ea7f1c63b7)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/rbtree_latch.h | 212 +++
 1 file changed, 212 insertions(+)
 create mode 100644 include/linux/rbtree_latch.h

diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
new file mode 100644
index ..4f3432c61d12
--- /dev/null
+++ b/include/linux/rbtree_latch.h
@@ -0,0 +1,212 @@
+/*
+ * Latched RB-trees
+ *
+ * Copyright (C) 2015 Intel Corp., Peter Zijlstra 
+ *
+ * Since RB-trees have non-atomic modifications they're not immediately suited
+ * for RCU/lockless queries. Even though we made RB-tree lookups non-fatal for
+ * lockless lookups; we cannot guarantee they return a correct result.
+ *
+ * The simplest solution is a seqlock + RB-tree, this will allow lockless
+ * lookups; but has the constraint (inherent to the seqlock) that read sides
+ * cannot nest in write sides.
+ *
+ * If we need to allow unconditional lookups (say as required for NMI context
+ * usage) we need a more complex setup; this data structure provides this by
+ * employing the latch technique -- see @raw_write_seqcount_latch -- to
+ * implement a latched RB-tree which does allow for unconditional lookups by
+ * virtue of always having (at least) one stable copy of the tree.
+ *
+ * However, while we have the guarantee that there is at all times one stable
+ * copy, this does not guarantee an iteration will not observe modifications.
+ * What might have been a stable copy at the start of the iteration, need not
+ * remain so for the duration of the iteration.
+ *
+ * Therefore, this does require a lockless RB-tree iteration to be non-fatal;
+ * see the comment in lib/rbtree.c. Note however that we only require the first
+ * condition -- not seeing partial stores -- because the latch thing isolates
+ * us from loops. If we were to interrupt a modification the lookup would be
+ * pointed at the stable tree and complete while the modification was halted.
+ */
+
+#ifndef RB_TREE_LATCH_H
+#define RB_TREE_LATCH_H
+
+#include <linux/rbtree.h>
+#include <linux/seqlock.h>
+
+struct latch_tree_node {
+   struct rb_node node[2];
+};
+
+struct latch_tree_root {
+   seqcount_t  seq;
+   struct rb_root  tree[2];
+};
+
+/**
+ * latch_tree_ops - operators to define the tree order
+ * @less: used for insertion; provides the (partial) order between two 
elements.
+ * @comp: used for lookups; provides the order between the search key and an 
element.
+ *
+ * The operators are related like:
+ *
+ * comp(a->key,b) < 0  := less(a,b)
+ * comp(a->key,b) > 0  := less(b,a)
+ * comp(a->key,b) == 0 := !less(a,b) && !less(b,a)
+ *
+ * If these operators define a partial order on the elements we make no
+ * guarantee on which of the elements matching the key is found. See
+ * latch_tree_find().
+ */
+struct latch_tree_ops {
+   bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b);
+   int  (*comp)(void *key, struct latch_tree_node *b);
+};
+
+static __always_inline struct latch_tree_node *
+__lt_from_rb(struct rb_node *node, int idx)
+{
+   return container_of(node, struct latch_tree_node, node[idx]);
+}
+
+static __always_inline void
+__lt_insert(struct latch_tree_node *ltn, struct latch_tree_root *ltr, int idx,
+   bool (*less)(struct latch_tree_node *a, struct latch_tree_node *b))
+{
+   struct rb_root *root = &ltr->tree[idx];
+   struct rb_node **link = &root->rb_node;
+   struct rb_node *node = &ltn->node[idx];
+   struct rb_node *parent = NULL;
+   struct latch_tree_node *ltp;
+
+   while (*link) {
+   parent = *link;
+   ltp = __lt_from_rb(parent, idx);
+
+   if (less(ltn, ltp))
+   link = &parent->rb_left;
+   else
+   link = &parent->rb_right;
+   }
+
+   rb_link_node_rcu(node, parent, link);
+   rb_insert_color(node, root);
+}
+
+static __always_inline void
+__lt_erase(struct latch_tree_node *ltn, struct latch_tree_root *ltr, int idx)
+{
+   rb_erase(&ltn->node[idx], &ltr->tree[idx]);
+}
+
+static __always_inline struct latch_tree_node *
+__lt_find(void *key, struct latch_tree_root *ltr, int idx,
+ int (*comp)(void *key, struct latch_tree_node *node))
+{
+   struct rb_node *node = rcu_dereference_raw(ltr->tree[idx].rb_node);
+   struct latch_tree_node *ltn;
+   int c;
+
+   while (node) {
+   ltn = __lt_from_rb(node, idx);
+   c = comp(key, ltn);
+
+   if (c < 0)
+   node = rcu_

[Devel] [PATCH rh7 34/39] seqlock: Better document raw_write_seqcount_latch()

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

Improve the documentation of the latch technique as used in the
current timekeeping code, such that it can be readily employed
elsewhere.

Borrow from the comments in timekeeping and replace those with a
reference to this more generic comment.

Cc: Andrea Arcangeli 
Cc: David Woodhouse 
Cc: Rik van Riel 
Cc: "Paul E. McKenney" 
Cc: Oleg Nesterov 
Reviewed-by: Mathieu Desnoyers 
Acked-by: Michel Lespinasse 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 6695b92a60bc7160c92d6dc5b17cc79673017c2f)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/seqlock.h   | 76 ++-
 kernel/time/timekeeping.c | 27 +
 2 files changed, 76 insertions(+), 27 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 48f2f69e3867..ee088ed20a6c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -171,9 +171,83 @@ static inline int read_seqcount_retry(const seqcount_t *s, 
unsigned start)
 }
 
 
-/*
+/**
  * raw_write_seqcount_latch - redirect readers to even/odd copy
  * @s: pointer to seqcount_t
+ *
+ * The latch technique is a multiversion concurrency control method that allows
+ * queries during non-atomic modifications. If you can guarantee queries never
+ * interrupt the modification -- e.g. the concurrency is strictly between CPUs
+ * -- you most likely do not need this.
+ *
+ * Where the traditional RCU/lockless data structures rely on atomic
+ * modifications to ensure queries observe either the old or the new state the
+ * latch allows the same for non-atomic updates. The trade-off is doubling the
+ * cost of storage; we have to maintain two copies of the entire data
+ * structure.
+ *
+ * Very simply put: we first modify one copy and then the other. This ensures
+ * there is always one copy in a stable state, ready to give us an answer.
+ *
+ * The basic form is a data structure like:
+ *
+ * struct latch_struct {
+ * seqcount_t  seq;
+ * struct data_struct  data[2];
+ * };
+ *
+ * Where a modification, which is assumed to be externally serialized, does the
+ * following:
+ *
+ * void latch_modify(struct latch_struct *latch, ...)
+ * {
+ * smp_wmb();  <- Ensure that the last data[1] update is visible
+ * latch->seq++;
+ * smp_wmb();  <- Ensure that the seqcount update is visible
+ *
+ * modify(latch->data[0], ...);
+ *
+ * smp_wmb();  <- Ensure that the data[0] update is visible
+ * latch->seq++;
+ * smp_wmb();  <- Ensure that the seqcount update is visible
+ *
+ * modify(latch->data[1], ...);
+ * }
+ *
+ * The query will have a form like:
+ *
+ * struct entry *latch_query(struct latch_struct *latch, ...)
+ * {
+ * struct entry *entry;
+ * unsigned seq, idx;
+ *
+ * do {
+ * seq = latch->seq;
+ * smp_rmb();
+ *
+ * idx = seq & 0x01;
+ * entry = data_query(latch->data[idx], ...);
+ *
+ * smp_rmb();
+ * } while (seq != latch->seq);
+ *
+ * return entry;
+ * }
+ *
+ * So during the modification, queries are first redirected to data[1]. Then we
+ * modify data[0]. When that is complete, we redirect queries back to data[0]
+ * and we can modify data[1].
+ *
+ * NOTE: The non-requirement for atomic modifications does _NOT_ include
+ *   the publishing of new entries in the case where data is a dynamic
+ *   data structure.
+ *
+ *   An iteration might start in data[0] and get suspended long enough
+ *   to miss an entire modification sequence, once it resumes it might
+ *   observe the new entry.
+ *
+ * NOTE: When data is a dynamic data structure; one should use regular RCU
+ *   patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e79a23a1bd03..8e5b95064209 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -229,32 +229,7 @@ static inline s64 timekeeping_get_ns(struct tk_read_base 
*tkr)
  * We want to use this from any context including NMI and tracing /
  * instrumenting the timekeeping code itself.
  *
- * So we handle this differently than the other timekeeping accessor
- * functions which retry when the sequence count has changed. The
- * update side does:
- *
- * smp_wmb();  <- Ensure that the last base[1] update is visible
- * tkf->seq++;
- * smp_wmb();  <- Ensure that the seqcount update is visible
- * update(tkf->base[0], tkr);
- * smp_wmb();  <- Ensure that the base[0] update is visible
- * tkf->seq++;
- * smp_wmb();  <- Ensure that the seqcount update is visible
- * update(tkf->base[1], tkr);
- *
- * The reader side does:
- *
- * do {
- * seq = tkf->seq;
- * smp_rmb();
- * idx = seq & 0x01;
- * now = now(tkf->base[idx]);
- * smp_rmb();
- * } while (s
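
As a concrete (and purely illustrative) rendering of the pattern documented
above, the two-copy structure can be driven with the seqlock helpers directly.
The struct and function names below are invented for the example; it only
assumes <linux/seqlock.h> and the raw_write_seqcount_latch() helper this patch
documents.

    struct kv_latch {
        seqcount_t      seq;
        struct {
            unsigned long   key;
            unsigned long   val;
        }               data[2];
    };

    /* Update side -- callers are assumed to serialize against each other. */
    static void kv_latch_set(struct kv_latch *l, unsigned long key,
                             unsigned long val)
    {
        raw_write_seqcount_latch(&l->seq);      /* readers now use data[1] */
        l->data[0].key = key;
        l->data[0].val = val;

        raw_write_seqcount_latch(&l->seq);      /* readers now use data[0] */
        l->data[1].key = key;
        l->data[1].val = val;
    }

    /* Query side -- safe against a concurrent kv_latch_set(). */
    static unsigned long kv_latch_get(struct kv_latch *l, unsigned long *key)
    {
        unsigned long k, v;
        unsigned int seq, idx;

        do {
            seq = raw_read_seqcount(&l->seq);   /* includes smp_rmb() */
            idx = seq & 0x01;
            k = l->data[idx].key;
            v = l->data[idx].val;
        } while (read_seqcount_retry(&l->seq, seq));

        *key = k;
        return v;
    }

The two raw_write_seqcount_latch() calls provide the smp_wmb()/increment pairs
spelled out in the comment block above, so a reader always finds one stable
copy.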

[Devel] [PATCH rh7 38/39] module: Optimize __module_address() using a latched RB-tree

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

Currently __module_address() is using a linear search through all
modules in order to find the module corresponding to the provided
address. With a lot of modules this can take a lot of time.

One of the users of this is kernel_text_address(), which is employed
by many stack unwinders, which in turn are used by perf-callchain and
ftrace (possibly from NMI context).

So by optimizing __module_address() we optimize many stack unwinders
which are used by both perf and tracing in performance sensitive code.

Cc: Rusty Russell 
Cc: Steven Rostedt 
Cc: Mathieu Desnoyers 
Cc: Oleg Nesterov 
Cc: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 93c2e105f6bcee231c951ba0e56e84505c4b0483)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/module.h |  32 +++---
 kernel/module.c| 117 ++---
 2 files changed, 138 insertions(+), 11 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index a4155ca70d1a..48c7335b05c8 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <linux/rbtree_latch.h>
 
 #include 
 #include 
@@ -236,8 +237,14 @@ struct module_ext {
 #endif
 };
 
-struct module
-{
+struct module;
+
+struct mod_tree_node {
+   struct module *mod;
+   struct latch_tree_node node;
+};
+
+struct module {
enum module_state state;
 
/* Member of list of modules */
@@ -296,8 +303,15 @@ struct module
/* Startup function. */
int (*init)(void);
 
-   /* If this is non-NULL, vfree after init() returns */
-   void *module_init;
+   /*
+* If this is non-NULL, vfree() after init() returns.
+*
+* Cacheline align here, such that:
+*   module_init, module_core, init_size, core_size,
+*   init_text_size, core_text_size and mtn_core.node[0]
+* are on the same cacheline.
+*/
+   void *module_init   ____cacheline_aligned;
 
/* Here is the actual code + data, vfree'd on unload. */
void *module_core;
@@ -308,6 +322,14 @@ struct module
/* The size of the executable code in each section.  */
unsigned int init_text_size, core_text_size;
 
+   /*
+* We want mtn_core::{mod,node[0]} to be in the same cacheline as the
+* above entries such that a regular lookup will only touch one
+* cacheline.
+*/
+   struct mod_tree_nodemtn_core;
+   struct mod_tree_nodemtn_init;
+
/* Size of RO sections of the module (text+rodata) */
unsigned int init_ro_size, core_ro_size;
 
@@ -392,7 +414,7 @@ struct module
ctor_fn_t *ctors;
unsigned int num_ctors;
 #endif
-};
+} ____cacheline_aligned;
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
 #endif
diff --git a/kernel/module.c b/kernel/module.c
index 3f5edae1edac..952a9582f840 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -105,6 +105,108 @@
 DEFINE_MUTEX(module_mutex);
 EXPORT_SYMBOL_GPL(module_mutex);
 static LIST_HEAD(modules);
+
+/*
+ * Use a latched RB-tree for __module_address(); this allows us to use
+ * RCU-sched lookups of the address from any context.
+ *
+ * Because modules have two address ranges: init and core, we need two
+ * latch_tree_nodes entries. Therefore we need the back-pointer from
+ * mod_tree_node.
+ *
+ * Because init ranges are short lived we mark them unlikely and have placed
+ * them outside the critical cacheline in struct module.
+ */
+
+static __always_inline unsigned long __mod_tree_val(struct latch_tree_node *n)
+{
+   struct mod_tree_node *mtn = container_of(n, struct mod_tree_node, node);
+   struct module *mod = mtn->mod;
+
+   if (unlikely(mtn == &mod->mtn_init))
+   return (unsigned long)mod->module_init;
+
+   return (unsigned long)mod->module_core;
+}
+
+static __always_inline unsigned long __mod_tree_size(struct latch_tree_node *n)
+{
+   struct mod_tree_node *mtn = container_of(n, struct mod_tree_node, node);
+   struct module *mod = mtn->mod;
+
+   if (unlikely(mtn == &mod->mtn_init))
+   return (unsigned long)mod->init_size;
+
+   return (unsigned long)mod->core_size;
+}
+
+static __always_inline bool
+mod_tree_less(struct latch_tree_node *a, struct latch_tree_node *b)
+{
+   return __mod_tree_val(a) < __mod_tree_val(b);
+}
+
+static __always_inline int
+mod_tree_comp(void *key, struct latch_tree_node *n)
+{
+   unsigned long val = (unsigned long)key;
+   unsigned long start, end;
+
+   start = __mod_tree_val(n);
+   if (val < start)
+   return -1;
+
+   end = start + __mod_tree_size(n);
+   if (val >= end)
+   return 1;
+
+   return 0;
+}
+
+static const struct latch_tree_ops mod_tree_ops = {
+   .less = mod_tree_less,
+   .comp = mod_tree_comp,
+};
+
+stati
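
The remainder of the patch is truncated above; roughly, the ops are wired into
the latch-tree API added earlier in this series. The sketch below shows the
likely shape of that glue, assuming the latch_tree_insert()/latch_tree_find()
helpers from the rbtree patch; details may differ from the actual hunk.

    static struct latch_tree_root mod_tree __cacheline_aligned;

    /* Insertions and removals are serialized by module_mutex. */
    static void mod_tree_insert(struct module *mod)
    {
        mod->mtn_core.mod = mod;
        mod->mtn_init.mod = mod;

        latch_tree_insert(&mod->mtn_core.node, &mod_tree, &mod_tree_ops);
        if (mod->init_size)
            latch_tree_insert(&mod->mtn_init.node, &mod_tree, &mod_tree_ops);
    }

    /* Lockless lookup; safe from any context that disables preemption. */
    static struct module *mod_tree_find(unsigned long addr)
    {
        struct latch_tree_node *ltn;

        ltn = latch_tree_find((void *)addr, &mod_tree, &mod_tree_ops);

        return ltn ? container_of(ltn, struct mod_tree_node, node)->mod : NULL;
    }

__module_address() can then consult mod_tree_find() instead of walking the
whole module list, which is what turns the lookup into O(log n).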

[Devel] [PATCH rh7 32/39] kasan: fix races in quarantine_remove_cache()

2017-09-14 Thread Andrey Ryabinin
From: Dmitry Vyukov 

quarantine_remove_cache() frees all pending objects that belong to the
cache, before we destroy the cache itself.  However there are currently
two possibilities how it can fail to do so.

First, another thread can hold some of the objects from the cache in a
temp list in quarantine_put().  quarantine_put() has a window of
enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can
finish right in that window.  These objects will later be freed into the
destroyed cache.

Then, quarantine_reduce() has the same problem.  It grabs a batch of
objects from the global quarantine, then unlocks quarantine_lock and
then frees the batch.  quarantine_remove_cache() can finish while some
objects from the cache are still in the local to_free list in
quarantine_reduce().

Fix the race with quarantine_put() by disabling interrupts for the whole
duration of quarantine_put().  In combination with on_each_cpu() in
quarantine_remove_cache() it ensures that quarantine_remove_cache()
either sees the objects in the per-cpu list or in the global list.

Fix the race with quarantine_reduce() by protecting quarantine_reduce()
with srcu critical section and then doing synchronize_srcu() at the end
of quarantine_remove_cache().
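
In outline, the SRCU side of the fix follows the standard pattern. A minimal
sketch is below; the function bodies are abbreviated, and the srcu_struct name
is an assumption, since the corresponding hunk is truncated in this message.

    static DEFINE_SRCU(remove_cache_srcu);

    void quarantine_reduce(void)
    {
        int srcu_idx;

        /*
         * The read-side critical section pins the cache: a concurrent
         * quarantine_remove_cache() cannot return while objects of that
         * cache may still sit on our local to_free list.
         */
        srcu_idx = srcu_read_lock(&remove_cache_srcu);

        /* ... grab a batch from the global quarantine and free it ... */

        srcu_read_unlock(&remove_cache_srcu, srcu_idx);
    }

    void quarantine_remove_cache(struct kmem_cache *cache)
    {
        /* ... flush the per-cpu and global lists via on_each_cpu() ... */

        /* wait for any quarantine_reduce() still touching this cache */
        synchronize_srcu(&remove_cache_srcu);
    }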

I've done some assessment of how well synchronize_srcu() works in this
case.  On a 4 CPU VM I see that it blocks waiting for pending read
critical sections in about 2-3% of cases, which looks good to me.

I suspect that these races are the root cause of some GPFs that I
episodically hit.  Previously I did not have any explanation for them.

  BUG: unable to handle kernel NULL pointer dereference at 00c8
  IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
  PGD 6aeea067
  PUD 60ed7067
  PMD 0
  Oops:  [#1] SMP KASAN
  Dumping ftrace buffer:
 (ftrace buffer empty)
  Modules linked in:
  CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: 88005f948040 task.stack: 880069818000
  RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
  RSP: 0018:88006981f298 EFLAGS: 00010246
  RAX: ea00 RBX:  RCX: ea1f
  RDX:  RSI: 88003fffc3e0 RDI: 
  RBP: 88006981f2c0 R08: 88002fed7bd8 R09: 0001001f000d
  R10: 001f000d R11: 88006981f000 R12: 88003fffc3e0
  R13: 88006981f2d0 R14: 81877fae R15: 8000
  FS:  7fb911a2d700() GS:88003ec0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 00c8 CR3: 60ed6000 CR4: 06f0
  Call Trace:
   quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239
   kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590
   kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
   slab_post_alloc_hook mm/slab.h:456 [inline]
   slab_alloc_node mm/slub.c:2718 [inline]
   kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754
   __alloc_skb+0x10f/0x770 net/core/skbuff.c:219
   alloc_skb include/linux/skbuff.h:932 [inline]
   _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388
   sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline]
   sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746
   sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266
   sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962
   inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
   sock_sendmsg_nosec net/socket.c:633 [inline]
   sock_sendmsg+0xca/0x110 net/socket.c:643
   SYSC_sendto+0x660/0x810 net/socket.c:1685
   SyS_sendto+0x40/0x50 net/socket.c:1653

I am not sure about backporting.  The bug is quite hard to trigger; I've
seen it a few times during our massive continuous testing (however, it
could be the cause of some other episodic stray crashes, as it leads to
memory corruption...).  If it is triggered, the consequences are very
bad -- almost certainly bad memory corruption.  The fix is non-trivial
and has a chance of introducing new bugs.  I am also not sure how
actively people use KASAN on older releases.

[dvyu...@google.com: - sorted includes]
  Link: http://lkml.kernel.org/r/20170309094028.51088-1-dvyu...@google.com
Link: http://lkml.kernel.org/r/20170308151532.5070-1-dvyu...@google.com
Signed-off-by: Dmitry Vyukov 
Acked-by: Andrey Ryabinin 
Cc: Greg Thelen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit ce5bec54bb5debbbe51b40270d8f209a23cadae4)
Signed-off-by: Andrey Ryabinin 
---
 mm/kasan/quarantine.c | 42 --
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 6f1ed1630873..5c44c08f46b6 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/srcu.h>
 #include 
 #include 
 
@@ -103,6 +104,7 @@ static int quarantine_tail;
 /*

[Devel] [PATCH rh7 31/39] kasan: drain quarantine of memcg slab objects

2017-09-14 Thread Andrey Ryabinin
From: Greg Thelen 

Per memcg slab accounting and kasan have a problem with kmem_cache
destruction.
 - kmem_cache_create() allocates a kmem_cache, which is used for
   allocations from processes running in root (top) memcg.
 - Processes running in non root memcg and allocating with either
   __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
   kmem_cache.
 - Kasan catches use-after-free by having kfree() and kmem_cache_free()
   defer freeing of objects. Objects are placed in a quarantine.
 - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
   care to drain the quarantine of objects from the root memcg's
   kmem_cache, but ignores objects associated with non root memcg. This
   causes leaks because quarantined per memcg objects refer to per memcg
   kmem cache being destroyed.

To see the problem:

 1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
 2) from non root memcg, allocate and free a few objects from cache
 3) dispose of the cache with kmem_cache_destroy(); kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning, indicating
    that the per memcg kmem_cache structure was leaked (see the sketch below).
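
A minimal reproducer for these steps might look like the sketch below. The
module boilerplate, cache name and object size are invented; step 2 has to run
in a task attached to a non-root memory cgroup, which the sketch only notes in
a comment.

    #include <linux/module.h>
    #include <linux/slab.h>

    static int __init kasan_memcg_repro_init(void)
    {
        struct kmem_cache *cache;
        void *obj;

        /* 1) create an accounted slab cache */
        cache = kmem_cache_create("kasan_memcg_repro", 256, 0,
                                  SLAB_ACCOUNT, NULL);
        if (!cache)
            return -ENOMEM;

        /*
         * 2) allocate and free a few objects; when the current task runs
         *    in a non-root memcg, these go through the per-memcg copy of
         *    the cache and end up in the KASAN quarantine.
         */
        obj = kmem_cache_alloc(cache, GFP_KERNEL);
        if (obj)
            kmem_cache_free(cache, obj);

        /*
         * 3) without the fix this warns "Slab cache still has objects",
         *    because the quarantined per-memcg objects were never drained.
         */
        kmem_cache_destroy(cache);
        return 0;
    }
    module_init(kasan_memcg_repro_init);
    MODULE_LICENSE("GPL");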

Fix the leak by draining kasan quarantined objects allocated from non
root memcg.

Racing memcg deletion is tricky, but handled.  kmem_cache_destroy() =>
shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
flushes per memcg quarantined objects, even if that memcg has been
rmdir'd and gone through memcg_deactivate_kmem_caches().

This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
enabled.  So I don't think it's worth patching stable kernels.

Link: 
http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthe...@google.com
Signed-off-by: Greg Thelen 
Reviewed-by: Vladimir Davydov 
Acked-by: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: Dmitry Vyukov 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit f9fa1d919c696e90c887d8742198023e7639d139)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/kasan.h | 4 ++--
 mm/kasan/kasan.c  | 2 +-
 mm/kasan/quarantine.c | 1 +
 mm/slab_common.c  | 6 --
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 536a400d1d39..21cedc322d9a 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -41,7 +41,7 @@ void kasan_free_pages(struct page *page, unsigned int order);
 void kasan_cache_create(struct kmem_cache *cache, size_t *size,
unsigned long *flags);
 void kasan_cache_shrink(struct kmem_cache *cache);
-void kasan_cache_destroy(struct kmem_cache *cache);
+void kasan_cache_shutdown(struct kmem_cache *cache);
 
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
@@ -83,7 +83,7 @@ static inline void kasan_cache_create(struct kmem_cache 
*cache,
  size_t *size,
  unsigned long *flags) {}
 static inline void kasan_cache_shrink(struct kmem_cache *cache) {}
-static inline void kasan_cache_destroy(struct kmem_cache *cache) {}
+static inline void kasan_cache_shutdown(struct kmem_cache *cache) {}
 
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 8f350a2edcb6..8b9531312417 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -373,7 +373,7 @@ void kasan_cache_shrink(struct kmem_cache *cache)
quarantine_remove_cache(cache);
 }
 
-void kasan_cache_destroy(struct kmem_cache *cache)
+void kasan_cache_shutdown(struct kmem_cache *cache)
 {
quarantine_remove_cache(cache);
 }
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index dae929c02bbb..6f1ed1630873 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -274,6 +274,7 @@ static void per_cpu_remove_cache(void *arg)
qlist_free_all(&to_free, cache);
 }
 
+/* Free all quarantined objects belonging to cache. */
 void quarantine_remove_cache(struct kmem_cache *cache)
 {
unsigned long flags, i;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8c8c99b9db05..b24d35d85e58 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -359,6 +359,10 @@ EXPORT_SYMBOL(kmem_cache_create);
 static int do_kmem_cache_shutdown(struct kmem_cache *s,
struct list_head *release, bool *need_rcu_barrier)
 {
+
+   /* free asan quarantined objects */
+   kasan_cache_shutdown(s);
+
if (__kmem_cache_shutdown(s) != 0) {
printk(KERN_ERR "kmem_cache_destroy %s: "
   "Slab cache still has objects\n", s->name);
@@ -544,8 +548,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
 
BUG_ON(!is_root_cache(s));
 
-   kasan_ca

[Devel] [PATCH rh7 35/39] rcu: Move lockless_dereference() out of rcupdate.h

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

I want to use lockless_dereference() from seqlock.h, which would mean
including rcupdate.h from it; however, rcupdate.h already includes
seqlock.h.

Avoid this by moving lockless_dereference() into compiler.h. This is
somewhat tricky since it uses smp_read_barrier_depends(), which isn't
available there, but it's a CPP macro so we can get away with it.

The alternative would be moving it into asm/barrier.h, but that would
mean updating each arch (I can do that if people feel it is more
appropriate).
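
For illustration, lockless_dereference() is used like rcu_dereference(), but
for a pointer whose target's lifetime is guaranteed by something other than
RCU. The example below is hypothetical; the names and the "published once,
never freed" lifetime rule are invented for the sketch.

    struct config {
        int threshold;
        /* published once, never freed ("immortal") */
    };

    static struct config *active_config;

    /* Publisher: order the initialization before the pointer store. */
    static void publish_config(struct config *c)
    {
        c->threshold = 42;
        smp_wmb();
        ACCESS_ONCE(active_config) = c;
    }

    /* Reader: the address dependency is enough, no rcu_read_lock() needed. */
    static int read_threshold(void)
    {
        struct config *c = lockless_dereference(active_config);

        return c ? c->threshold : 0;
    }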

Cc: Paul McKenney 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 0a04b0166929405cd833c1cc40f99e862b965ddc)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/compiler.h | 15 +++
 include/linux/rcupdate.h | 15 ---
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 73647b4cd947..7ce904c040dd 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -466,6 +466,21 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
  */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
+/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+   typeof(p) _p1 = ACCESS_ONCE(p); \
+   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+   (_p1); \
+})
+
 /* Ignore/forbid kprobes attach on very low level functions marked by this 
attribute: */
 #ifdef CONFIG_KPROBES
 # define __kprobes __attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 68df10240cb4..981261775a41 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -580,21 +580,6 @@ static inline void rcu_preempt_sleep_check(void)
} while (0)
 
 /**
- * lockless_dereference() - safely load a pointer for later dereference
- * @p: The pointer to load
- *
- * Similar to rcu_dereference(), but for situations where the pointed-to
- * object's lifetime is managed by something other than RCU.  That
- * "something other" might be reference counting or simple immortality.
- */
-#define lockless_dereference(p) \
-({ \
-   typeof(p) _p1 = ACCESS_ONCE(p); \
-   smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
-   (_p1); \
-})
-
-/**
  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
  * @p: The pointer to read
  *
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH rh7 36/39] seqlock: Introduce raw_read_seqcount_latch()

2017-09-14 Thread Andrey Ryabinin
From: Peter Zijlstra 

Because with latches there is a strict data dependency on the seq load,
we can avoid the rmb in favour of a read_barrier_depends.
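
Applied to a latch reader such as the illustrative kv_latch sketch added after
the raw_write_seqcount_latch() documentation patch earlier in this series, the
query loop becomes (still only a sketch):

    static unsigned long kv_latch_get(struct kv_latch *l, unsigned long *key)
    {
        unsigned long k, v;
        unsigned int seq, idx;

        do {
            /* data dependency through idx, so no leading smp_rmb() */
            seq = raw_read_seqcount_latch(&l->seq);
            idx = seq & 0x01;
            k = l->data[idx].key;
            v = l->data[idx].val;
        } while (read_seqcount_retry(&l->seq, seq));    /* closing smp_rmb() */

        *key = k;
        return v;
    }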

Suggested-by: Ingo Molnar 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Rusty Russell 

https://jira.sw.ru/browse/PSBM-69081
(cherry picked from commit 7fc26327b75685f37f58d64bdb061460f834f80d)
Signed-off-by: Andrey Ryabinin 
---
 include/linux/seqlock.h   | 8 ++--
 kernel/time/timekeeping.c | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index ee088ed20a6c..9d8997027263 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include <linux/compiler.h>
 #include 
 
 /*
@@ -170,6 +171,10 @@ static inline int read_seqcount_retry(const seqcount_t *s, 
unsigned start)
return __read_seqcount_retry(s, start);
 }
 
+static inline int raw_read_seqcount_latch(seqcount_t *s)
+{
+   return lockless_dereference(s->sequence);
+}
 
 /**
  * raw_write_seqcount_latch - redirect readers to even/odd copy
@@ -222,8 +227,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, 
unsigned start)
  * unsigned seq, idx;
  *
  * do {
- * seq = latch->seq;
- * smp_rmb();
+ * seq = lockless_dereference(latch->seq);
  *
  * idx = seq & 0x01;
  * entry = data_query(latch->data[idx], ...);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8e5b95064209..d99c89095bfd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -292,7 +292,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct 
tk_fast *tkf)
u64 now;
 
do {
-   seq = raw_read_seqcount(&tkf->seq);
+   seq = raw_read_seqcount_latch(&tkf->seq);
tkr = tkf->base + (seq & 0x01);
now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
} while (read_seqcount_retry(&tkf->seq, seq));
-- 
2.13.5

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel