Re: [Devel] [RFC PATCH 2/2] autofs: sent 32-bit sized packet for 32-bit process
On Thu, Aug 31, 2017 at 05:57:11PM +0400, Stanislav Kinsburskiy wrote:
> The structure autofs_v5_packet (except name) is not aligned by 8 bytes, which
> leads to different sizes on 32- and 64-bit architectures.
> Let's form a 32-bit compatible packet when the daemon uses 32-bit addressing.
>
> Signed-off-by: Stanislav Kinsburskiy
> ---
>  fs/autofs4/waitq.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 309ca6b..484cf2e 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -153,12 +153,19 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  {
>      struct autofs_v5_packet *packet = &pkt.v5_pkt.v5_packet;
>      struct user_namespace *user_ns = sbi->pipe->f_cred->user_ns;
> +    size_t name_offset;
>
> -    pktsz = sizeof(*packet);
> +    if (sbi->is32bit)
> +        name_offset = offsetof(struct autofs_v5_packet, len) +
> +                sizeof(packet->len);
> +    else
> +        name_offset = offsetof(struct autofs_v5_packet, name);

This doesn't help at all because the offset of struct autofs_v5_packet.name
does not change.

> +    pktsz = name_offset + sizeof(packet->name);

What changes is pktsz: it's either sizeof(struct autofs_v5_packet) or 4 bytes
less, depending on the architecture.  For example,

#ifdef CONFIG_COMPAT
    if (__alignof__(compat_u64) < __alignof__(u64) && sbi->is32bit)
        pktsz = offsetofend(struct autofs_v5_packet, name);
    else
#endif
        pktsz = sizeof(*packet);

--
ldv

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] mm: Count list_lru_one::nr_items lockless
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 6fd774dbf6fd05eca8cfa192753bf35dac694368
Author: Kirill Tkhai
Date:   Thu Aug 31 18:25:20 2017 +0300

    mm: Count list_lru_one::nr_items lockless

    During slab reclaim of a memcg, shrink_slab() iterates over all
    registered shrinkers in the system and tries to count and consume
    objects related to the cgroup. Under memory pressure, this behaves
    badly: I observe high system time and much time spent in
    list_lru_count_one() for many processes:

      0,50%  nixstatsagent  [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
      0,26%  nixstatsagent  [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
      0,23%  nixstatsagent  [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
      0,15%  nixstatsagent  [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
      0,15%  nixstatsagent  [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

      0,94%  mysqld         [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
      0,57%  mysqld         [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
      0,51%  mysqld         [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
      0,32%  mysqld         [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
      0,32%  mysqld         [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

      0,73%  sshd           [kernel.vmlinux]  [k] _raw_spin_lock               [k] _raw_spin_lock
      0,35%  sshd           [kernel.vmlinux]  [k] shrink_slab                  [k] shrink_slab
      0,32%  sshd           [kernel.vmlinux]  [k] super_cache_count            [k] super_cache_count
      0,21%  sshd           [kernel.vmlinux]  [k] __list_lru_count_one.isra.2  [k] _raw_spin_lock
      0,21%  sshd           [kernel.vmlinux]  [k] list_lru_count_one           [k] __list_lru_count_one.isra.2

    This patch aims to make super_cache_count() more effective. It makes
    __list_lru_count_one() count nr_items locklessly, to minimize the
    overhead introduced by the locking operation and to make parallel
    reclaims more scalable.

    The lock won't be taken in shrinker::count_objects(); it will be taken
    only for the real shrink by the thread that performs it.

    https://jira.sw.ru/browse/PSBM-69296

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/list_lru.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index b166eff..5adc6621 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -160,10 +160,10 @@ static unsigned long __list_lru_count_one(struct list_lru *lru,
     struct list_lru_one *l;
     unsigned long count;

-    spin_lock(&nlru->lock);
+    rcu_read_lock();
     l = list_lru_from_memcg_idx(nlru, memcg_idx);
     count = l->nr_items;
-    spin_unlock(&nlru->lock);
+    rcu_read_unlock();

     return count;
 }
[Devel] [PATCH RHEL7 COMMIT] mm: Make list_lru_node::memcg_lrus RCU protected
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 5db3da0bf7112c551ca9ce90b1c0e8a2bcad9ac1
Author: Kirill Tkhai
Date:   Thu Aug 31 18:25:20 2017 +0300

    mm: Make list_lru_node::memcg_lrus RCU protected

    The array list_lru_node::memcg_lrus::list_lru_one[] only grows and
    never shrinks. The growth happens in memcg_update_list_lru_node(),
    and the old array's members remain the same after it. So access to
    the array's members may become RCU protected, making it possible to
    avoid taking list_lru_node::lock to dereference it. This will be
    used in the next patch to get a list's nr_items locklessly.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 include/linux/list_lru.h |  2 +-
 mm/list_lru.c            | 59 ++++++++++++++++++++++++++---------------
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 7bf4251..00a339b 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -43,7 +43,7 @@ struct list_lru_node {
     struct list_lru_one lru;
 #ifdef CONFIG_MEMCG_KMEM
     /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
-    struct list_lru_memcg *memcg_lrus;
+    struct list_lru_memcg __rcu *memcg_lrus;
 #endif
 } cacheline_aligned_in_smp;

diff --git a/mm/list_lru.c b/mm/list_lru.c
index cb53462..b166eff 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -42,19 +42,24 @@ static void list_lru_unregister(struct list_lru *lru)
 #ifdef CONFIG_MEMCG_KMEM
 static inline bool list_lru_memcg_aware(struct list_lru *lru)
 {
-    return !!lru->node[0].memcg_lrus;
+    struct list_lru_memcg *memcg_lrus;
+
+    /* Here we only check the pointer is not NULL, so RCU lock isn't need */
+    memcg_lrus = rcu_dereference_check(lru->node[0].memcg_lrus, true);
+    return !!memcg_lrus;
 }

 static inline struct list_lru_one *
 list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
 {
+    struct list_lru_memcg *memcg_lrus;
     /*
-     * The lock protects the array of per cgroup lists from relocation
-     * (see memcg_update_list_lru_node).
+     * Either lock and RCU protects the array of per cgroup lists
+     * from relocation (see memcg_update_list_lru_node).
      */
-    lockdep_assert_held(&nlru->lock);
-    if (nlru->memcg_lrus && idx >= 0)
-        return nlru->memcg_lrus->lru[idx];
+    memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
+                       lockdep_is_held(&nlru->lock));
+    if (memcg_lrus && idx >= 0)
+        return memcg_lrus->lru[idx];
     return &nlru->lru;
 }

@@ -62,9 +67,12 @@ list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx)
 static inline struct list_lru_one *
 list_lru_from_kmem(struct list_lru_node *nlru, void *ptr)
 {
+    struct list_lru_memcg *memcg_lrus;
     struct mem_cgroup *memcg;

-    if (!nlru->memcg_lrus)
+    memcg_lrus = rcu_dereference_check(nlru->memcg_lrus,
+                       lockdep_is_held(&nlru->lock));
+    if (!memcg_lrus)
         return &nlru->lru;

     memcg = mem_cgroup_from_kmem(ptr);
@@ -311,25 +319,34 @@ static int __memcg_init_list_lru_node(struct list_lru_memcg *memcg_lrus,

 static int memcg_init_list_lru_node(struct list_lru_node *nlru)
 {
+    struct list_lru_memcg *memcg_lrus;
     int size = memcg_nr_cache_ids;

-    nlru->memcg_lrus = kmalloc(sizeof(struct list_lru_memcg) +
-                   size * sizeof(void *), GFP_KERNEL);
-    if (!nlru->memcg_lrus)
+    memcg_lrus = kmalloc(sizeof(*memcg_lrus) +
+                 size * sizeof(void *), GFP_KERNEL);
+    if (!memcg_lrus)
         return -ENOMEM;

-    if (__memcg_init_list_lru_node(nlru->memcg_lrus, 0, size)) {
-        kfree(nlru->memcg_lrus);
+    if (__memcg_init_list_lru_node(memcg_lrus, 0, size)) {
+        kfree(memcg_lrus);
         return -ENOMEM;
     }
+    rcu_assign_pointer(nlru->memcg_lrus, memcg_lrus);

     return 0;
 }

 static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
 {
-    __memcg_destroy_list_lru_node(nlru->memcg_lrus, 0, memcg_nr_cache_ids);
-    kfree(nlru->memcg_lrus);
+    struct list_lru_memcg *memcg_lrus;
+
+    /*
+     * This is called when shrinker has already been unregistered,
+     * so nobody can use it.
+     */
+    memcg_lrus = rcu_dereference_check(nlru->memcg_lrus, true);
+    __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
+    kfree(memcg_lrus);
 }

 static int memcg_update_list_lru_node(struct list_lru_node *nlru,
@@ -338,8 +355,10 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
     struct list_lr
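The publish side of the pattern above — fully initialize a grown copy of the array, then make it visible in one pointer store so readers see either the old or the new array but never a half-built one — has a close userspace analog. This is a hypothetical sketch, with `rcu_assign_pointer`/`rcu_dereference` approximated by C11 release/acquire atomics (real RCU also defers freeing the old array past a grace period, which the sketch simply leaks):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Grow-only array of per-cgroup list pointers, like list_lru_memcg::lru[].
 * Old elements keep their slots after a grow, so a reader holding an index
 * stays valid across republications. Names are illustrative. */
struct arr {
    size_t size;
    int *slot[];                /* flexible array member */
};

static _Atomic(struct arr *) published;

static void publish_grown(size_t new_size)
{
    struct arr *old = atomic_load_explicit(&published, memory_order_acquire);
    struct arr *new = malloc(sizeof(*new) + new_size * sizeof(int *));

    new->size = new_size;
    memset(new->slot, 0, new_size * sizeof(int *));
    if (old)
        memcpy(new->slot, old->slot, old->size * sizeof(int *));
    /* Initialize fully, then publish (rcu_assign_pointer analog). */
    atomic_store_explicit(&published, new, memory_order_release);
    /* Real code would kfree_rcu(old) after a grace period; leaked here. */
}

static void set_slot(size_t idx, int *p)
{
    struct arr *a = atomic_load_explicit(&published, memory_order_acquire);
    a->slot[idx] = p;           /* writer-side update; real code holds a lock */
}

static int *read_slot(size_t idx)  /* rcu_dereference analog */
{
    struct arr *a = atomic_load_explicit(&published, memory_order_acquire);
    return idx < a->size ? a->slot[idx] : NULL;
}
```

The release store pairs with the acquire load exactly as `rcu_assign_pointer()` pairs with `rcu_dereference()`: a reader that sees the new pointer is guaranteed to see the copied slot contents too.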
[Devel] [PATCH RHEL7 COMMIT] mm: Add rcu field to struct list_lru_memcg
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit b3b3ea1125f07f57ea0f95b29ad368934cc7bb53
Author: Kirill Tkhai
Date:   Thu Aug 31 18:25:19 2017 +0300

    mm: Add rcu field to struct list_lru_memcg

    Patchset description: Make count list_lru_one::nr_items lockless

    This series aims to improve the scalability of list_lru shrinking and
    to make list_lru_count_one() work more effectively.

    Kirill Tkhai (3):
      mm: Add rcu field to struct list_lru_memcg
      mm: Make list_lru_node::memcg_lrus RCU protected
      mm: Count list_lru_one::nr_items lockless

    https://jira.sw.ru/browse/PSBM-69296

    =
    This patch description:

    This patch adds the new field and teaches kmalloc() to allocate
    memory for it.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 include/linux/list_lru.h | 1 +
 mm/list_lru.c            | 7 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 2a6b994..7bf4251 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -31,6 +31,7 @@ struct list_lru_one {
 };

 struct list_lru_memcg {
+    struct rcu_head rcu;
     /* array of per cgroup lists, indexed by memcg_cache_id */
     struct list_lru_one *lru[0];
 };

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 84b4c21..cb53462 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -313,7 +313,8 @@ static int memcg_init_list_lru_node(struct list_lru_node *nlru)
 {
     int size = memcg_nr_cache_ids;

-    nlru->memcg_lrus = kmalloc(size * sizeof(void *), GFP_KERNEL);
+    nlru->memcg_lrus = kmalloc(sizeof(struct list_lru_memcg) +
+                   size * sizeof(void *), GFP_KERNEL);
     if (!nlru->memcg_lrus)
         return -ENOMEM;

@@ -339,7 +340,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
     BUG_ON(old_size > new_size);

     old = nlru->memcg_lrus;
-    new = kmalloc(new_size * sizeof(void *), GFP_KERNEL);
+    new = kmalloc(sizeof(*new) + new_size * sizeof(void *), GFP_KERNEL);
     if (!new)
         return -ENOMEM;

@@ -348,7 +349,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
         return -ENOMEM;
     }

-    memcpy(new, old, old_size * sizeof(void *));
+    memcpy(&new->lru, &old->lru, old_size * sizeof(void *));

     /*
      * The lock guarantees that we won't race with a reader
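The two hunks above encode one invariant: once a header field precedes a flexible array, both the allocation size and the copy of old entries must account for the header. A minimal userspace sketch (names loosely mirror the patch; `fake_rcu_head` is a stand-in for `struct rcu_head`):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fake_rcu_head { void *next; void (*func)(void *); };

struct list_lru_memcg {
    struct fake_rcu_head rcu;   /* new header field from the patch */
    void *lru[];                /* per-cgroup lists, flexible array */
};

static struct list_lru_memcg *alloc_lrus(size_t n)
{
    /* sizeof(*p) covers the header; n * sizeof(void *) covers the array. */
    return calloc(1, sizeof(struct list_lru_memcg) + n * sizeof(void *));
}

static struct list_lru_memcg *grow_lrus(struct list_lru_memcg *old,
                                        size_t old_n, size_t new_n)
{
    struct list_lru_memcg *new = alloc_lrus(new_n);

    /* Copy new->lru, not 'new': copying from the struct base would
     * clobber the rcu header with array entries -- the mistake the
     * memcpy(&new->lru, &old->lru, ...) hunk avoids. */
    memcpy(new->lru, old->lru, old_n * sizeof(void *));
    return new;
}
```

Without the `sizeof(*new)` term, writes to the last array slots would run past the allocation; without the `&new->lru` base, the copy would overwrite the `rcu` header.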
[Devel] [PATCH RHEL7 COMMIT] tcache: Cleanup unused expression from tcache_lru_isolate()
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 389f1b056f987726601af0399791b18f107436c5
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:18 2017 +0300

    tcache: Cleanup unused expression from tcache_lru_isolate()

    Nobody uses nr_to_isolate after this point. It seems to be a
    historical leftover.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index ab70af2..0e57ae6 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -1049,7 +1049,6 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     nr = __tcache_lru_isolate(pni, pages, nr_to_isolate);
     ni->nr_pages -= nr;
     nr_isolated += nr;
-    nr_to_isolate -= nr;

     if (!list_empty(&pni->lru))
         __tcache_insert_reclaim_node(ni, pni);
[Devel] [PATCH RHEL7 COMMIT] tcache: Make tcache_lru_isolate() keep ni->lock less
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 05e159ab7e5981fc76950e8e999d5f855d9313f7
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:20 2017 +0300

    tcache: Make tcache_lru_isolate() keep ni->lock less

    Grab the pool using RCU techniques, and do not use ni->lock for that.
    This refactors the function and will be used further on.

    v2: Use tcache_nodeinfo::rb_first

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 40608ec..3d9c5ac 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -1044,33 +1044,49 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     int nr_isolated = 0;
     struct rb_node *rbn;

-    spin_lock_irq(&ni->lock);
+    rcu_read_lock();
 again:
-    rbn = rb_first(&ni->reclaim_tree);
-    if (!rbn)
+    rbn = rcu_dereference(ni->rb_first);
+    if (!rbn) {
+        rcu_read_unlock();
         goto out;
-
-    rb_erase(rbn, &ni->reclaim_tree);
-    RB_CLEAR_NODE(rbn);
-    update_ni_rb_first(ni);
+    }

     pni = rb_entry(rbn, struct tcache_pool_nodeinfo, reclaim_node);
-    if (!tcache_grab_pool(pni->pool))
+    if (!tcache_grab_pool(pni->pool)) {
+        spin_lock_irq(&ni->lock);
+        if (!RB_EMPTY_NODE(rbn) && list_empty(&pni->lru)) {
+            rb_erase(rbn, &ni->reclaim_tree);
+            RB_CLEAR_NODE(rbn);
+            update_ni_rb_first(ni);
+        }
+        spin_unlock_irq(&ni->lock);
         goto again;
+    }
+    rcu_read_unlock();

+    spin_lock_irq(&ni->lock);
     spin_lock(&pni->lock);
     nr_isolated = __tcache_lru_isolate(pni, pages, nr_to_isolate);
+
+    if (!nr_isolated)
+        goto unlock;
+
     ni->nr_pages -= nr_isolated;

-    if (!list_empty(&pni->lru)) {
-        __tcache_insert_reclaim_node(ni, pni);
-        update_ni_rb_first(ni);
+    if (!RB_EMPTY_NODE(rbn)) {
+        rb_erase(rbn, &ni->reclaim_tree);
+        RB_CLEAR_NODE(rbn);
     }

+    if (!list_empty(&pni->lru))
+        __tcache_insert_reclaim_node(ni, pni);
+    update_ni_rb_first(ni);
+unlock:
     spin_unlock(&pni->lock);
+    spin_unlock_irq(&ni->lock);
     tcache_put_pool(pni->pool);
 out:
-    spin_unlock_irq(&ni->lock);
     return nr_isolated;
 }
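The `tcache_grab_pool()` call above is a speculative "get-unless-dead" reference grab: under RCU the pool structure is guaranteed to still exist, but it may already be on its way to destruction, so the grab succeeds only if the refcount is still nonzero and the caller retries with the next candidate otherwise. A hedged userspace sketch of that primitive (names illustrative, equivalent in spirit to the kernel's `atomic_inc_not_zero()`):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative pool object: refcnt == 0 means it is being destroyed and
 * must never be resurrected by a racing reader. */
struct pool {
    atomic_int refcnt;
};

static bool pool_grab(struct pool *p)
{
    int c = atomic_load_explicit(&p->refcnt, memory_order_relaxed);

    /* Take a reference only if the object is still live. */
    while (c != 0) {
        if (atomic_compare_exchange_weak_explicit(&p->refcnt, &c, c + 1,
                memory_order_acquire, memory_order_relaxed))
            return true;        /* c was live; reference taken */
    }
    return false;               /* dead: caller retries with another pool */
}

static void pool_put(struct pool *p)
{
    atomic_fetch_sub_explicit(&p->refcnt, 1, memory_order_release);
}
```

This is why the `goto again` loop is safe: a failed grab means the pool is dying, its tree node gets erased under `ni->lock` if still linked, and the next `rb_first` candidate is tried.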
[Devel] [PATCH RHEL7 COMMIT] tcache: Use ni->lock only for inserting and erasing from rbtree.
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 342e800a8b114e74c372374268812ce2612a26aa
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:22 2017 +0300

    tcache: Use ni->lock only for inserting and erasing from rbtree.

    This patch completes the splitting of ni->lock into ni->lock and
    pni->lock. Now the global ni->lock is used only for insertion into
    tcache_nodeinfo::reclaim_tree, which happens just once per ~1024 page
    insertions or erasures. For the other LRU operations pni->lock is
    used; it is per-filesystem (i.e., per-container) and does not affect
    other containers.

    Also, the lock order is changed to:

        spin_lock(&pni->lock);
        spin_lock(&ni->lock);

    v3: Disable irqs in tcache_lru_isolate().

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 202834c..5faa390 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -261,7 +261,6 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     struct tcache_nodeinfo *ni = &tcache_nodeinfo[nid];
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];

-    spin_lock(&ni->lock);
     spin_lock(&pni->lock);
     atomic_long_inc(&ni->nr_pages);
     pni->nr_pages++;
@@ -274,13 +273,14 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     }

     if (tcache_check_events(pni) || RB_EMPTY_NODE(&pni->reclaim_node)) {
+        spin_lock(&ni->lock);
         if (!RB_EMPTY_NODE(&pni->reclaim_node))
             rb_erase(&pni->reclaim_node, &ni->reclaim_tree);
         __tcache_insert_reclaim_node(ni, pni);
         update_ni_rb_first(ni);
+        spin_unlock(&ni->lock);
     }
     spin_unlock(&pni->lock);
-    spin_unlock(&ni->lock);
 }

 static void __tcache_lru_del(struct tcache_pool_nodeinfo *pni,
@@ -301,7 +301,6 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     struct tcache_nodeinfo *ni = &tcache_nodeinfo[nid];
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];

-    spin_lock(&ni->lock);
     spin_lock(&pni->lock);

     /* Raced with reclaimer? */
@@ -315,14 +314,15 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     pni->recent_gets++;

     if (tcache_check_events(pni)) {
+        spin_lock(&ni->lock);
         if (!RB_EMPTY_NODE(&pni->reclaim_node))
             rb_erase(&pni->reclaim_node, &ni->reclaim_tree);
         __tcache_insert_reclaim_node(ni, pni);
         update_ni_rb_first(ni);
+        spin_unlock(&ni->lock);
     }
 out:
     spin_unlock(&pni->lock);
-    spin_unlock(&ni->lock);
 }

 static int tcache_create_pool(void)
@@ -1065,8 +1065,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     }
     rcu_read_unlock();

-    spin_lock_irq(&ni->lock);
-    spin_lock(&pni->lock);
+    spin_lock_irq(&pni->lock);
     nr_isolated = __tcache_lru_isolate(pni, pages, nr_to_isolate);

     if (!nr_isolated)
@@ -1074,17 +1073,19 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)

     atomic_long_sub(nr_isolated, &ni->nr_pages);

-    if (!RB_EMPTY_NODE(rbn)) {
-        rb_erase(rbn, &ni->reclaim_tree);
-        RB_CLEAR_NODE(rbn);
+    if (!RB_EMPTY_NODE(rbn) || !list_empty(&pni->lru)) {
+        spin_lock(&ni->lock);
+        if (!RB_EMPTY_NODE(rbn))
+            rb_erase(rbn, &ni->reclaim_tree);
+        if (!list_empty(&pni->lru))
+            __tcache_insert_reclaim_node(ni, pni);
+        else
+            RB_CLEAR_NODE(rbn);
+        update_ni_rb_first(ni);
+        spin_unlock(&ni->lock);
     }

-    if (!list_empty(&pni->lru))
-        __tcache_insert_reclaim_node(ni, pni);
-    update_ni_rb_first(ni);
-
 unlock:
-    spin_unlock(&pni->lock);
-    spin_unlock_irq(&ni->lock);
+    spin_unlock_irq(&pni->lock);
     tcache_put_pool(pni->pool);
 out:
     return nr_isolated;
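The structure of this change — take the fine-grained per-container lock on every operation, and nest the global lock inside it only for the rare tree rebalance — can be sketched in userspace. This is an illustrative analog (names and the 1024 threshold mirror the commit message, everything else is hypothetical); the key point is that every path uses the same lock order, which is what keeps the two-lock scheme deadlock-free.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ni_lock  = PTHREAD_MUTEX_INITIALIZER; /* global, per-node  */
static pthread_mutex_t pni_lock = PTHREAD_MUTEX_INITIALIZER; /* per-container     */
static long pni_nr_pages;       /* protected by pni_lock */
static long tree_updates;       /* protected by ni_lock  */

static void lru_add_page(void)
{
    pthread_mutex_lock(&pni_lock);          /* fine-grained lock first */
    pni_nr_pages++;
    if (pni_nr_pages % 1024 == 0) {         /* rare: rebalance the global tree */
        pthread_mutex_lock(&ni_lock);       /* global lock nested inside */
        tree_updates++;                     /* stands in for rb_erase/insert */
        pthread_mutex_unlock(&ni_lock);
    }
    pthread_mutex_unlock(&pni_lock);
}
```

With the old order (global first), every page add in every container serialized on `ni_lock`; with this order the global lock is touched roughly once per 1024 operations, so containers mostly contend only on their own lock.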
[Devel] [PATCH RHEL7 COMMIT] tcache: Remove excess variable from tcache_lru_isolate()
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit e6c8082f25609c977202364e85e72f6c2442d4b5
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:19 2017 +0300

    tcache: Remove excess variable from tcache_lru_isolate()

    We have two variables (nr and nr_isolated) which hold the same value.
    Kill one of them.

    v2: new

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 0e57ae6..0f15e8e 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -1029,7 +1029,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
 {
     struct tcache_nodeinfo *ni = &tcache_nodeinfo[nid];
     struct tcache_pool_nodeinfo *pni;
-    int nr, nr_isolated = 0;
+    int nr_isolated = 0;
     struct rb_node *rbn;

     spin_lock_irq(&ni->lock);
@@ -1046,9 +1046,8 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
         goto again;

     spin_lock(&pni->lock);
-    nr = __tcache_lru_isolate(pni, pages, nr_to_isolate);
-    ni->nr_pages -= nr;
-    nr_isolated += nr;
+    nr_isolated = __tcache_lru_isolate(pni, pages, nr_to_isolate);
+    ni->nr_pages -= nr_isolated;

     if (!list_empty(&pni->lru))
         __tcache_insert_reclaim_node(ni, pni);
[Devel] [PATCH RHEL7 COMMIT] tcache: Cache rb_first() of reclaim tree in tcache_nodeinfo::rb_first
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 5a95787003bdb2cbd00fa9111a3ef67aec05468c
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:20 2017 +0300

    tcache: Cache rb_first() of reclaim tree in tcache_nodeinfo::rb_first

    Set rb_first via RCU and, thus, allow lockless access to it.

    v3: Move update_ni_rb_first() from patch "tcache: Move erase-insert
        logic out of tcache_check_events()".
    v2: New

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 0f15e8e..40608ec 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -157,6 +157,7 @@ struct tcache_nodeinfo {

     /* tree of pools, sorted by reclaim prio */
     struct rb_root reclaim_tree;
+    struct rb_node __rcu *rb_first;

     /* total number of pages on all LRU lists corresponding to this node */
     unsigned long nr_pages;
@@ -205,6 +206,13 @@ node_tree_from_key(struct tcache_pool *pool,
     return &pool->node_tree[key_hash(key) & (num_node_trees - 1)];
 }

+static struct rb_node *update_ni_rb_first(struct tcache_nodeinfo *ni)
+{
+    struct rb_node *first = rb_first(&ni->reclaim_tree);
+    rcu_assign_pointer(ni->rb_first, first);
+    return first;
+}
+
 static void __tcache_insert_reclaim_node(struct tcache_nodeinfo *ni,
                      struct tcache_pool_nodeinfo *pni);

@@ -242,6 +250,7 @@ static inline void __tcache_check_events(struct tcache_nodeinfo *ni,
         rb_erase(&pni->reclaim_node, &ni->reclaim_tree);

     __tcache_insert_reclaim_node(ni, pni);
+    update_ni_rb_first(ni);
 }

 /*
@@ -270,8 +279,10 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)

     __tcache_check_events(ni, pni);

-    if (unlikely(RB_EMPTY_NODE(&pni->reclaim_node)))
+    if (unlikely(RB_EMPTY_NODE(&pni->reclaim_node))) {
         __tcache_insert_reclaim_node(ni, pni);
+        update_ni_rb_first(ni);
+    }

     spin_unlock(&pni->lock);
     spin_unlock(&ni->lock);
@@ -934,6 +945,7 @@ tcache_remove_from_reclaim_trees(struct tcache_pool *pool)
     spin_lock_irq(&ni->lock);
     if (!RB_EMPTY_NODE(&pni->reclaim_node)) {
         rb_erase(&pni->reclaim_node, &ni->reclaim_tree);
+        update_ni_rb_first(ni);
         /*
          * Clear the node for __tcache_check_events() not to
          * reinsert the pool back into the tree.
@@ -1040,6 +1052,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)

     rb_erase(rbn, &ni->reclaim_tree);
     RB_CLEAR_NODE(rbn);
+    update_ni_rb_first(ni);

     pni = rb_entry(rbn, struct tcache_pool_nodeinfo, reclaim_node);
     if (!tcache_grab_pool(pni->pool))
@@ -1049,8 +1062,10 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     nr_isolated = __tcache_lru_isolate(pni, pages, nr_to_isolate);
     ni->nr_pages -= nr_isolated;

-    if (!list_empty(&pni->lru))
+    if (!list_empty(&pni->lru)) {
         __tcache_insert_reclaim_node(ni, pni);
+        update_ni_rb_first(ni);
+    }

     spin_unlock(&pni->lock);
     tcache_put_pool(pni->pool);
@@ -1349,6 +1364,7 @@ static int __init tcache_nodeinfo_init(void)
         ni = &tcache_nodeinfo[i];
         spin_lock_init(&ni->lock);
         ni->reclaim_tree = RB_ROOT;
+        update_ni_rb_first(ni);
     }
     return 0;
 }
[Devel] [PATCH RHEL7 COMMIT] tcache: Add tcache_pool_nodeinfo::lock
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 13afaf53ede5cb733a5dba3319bcffea95fe9f48
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:18 2017 +0300

    tcache: Add tcache_pool_nodeinfo::lock

    Currently, tcache_nodeinfo::lock is used to protect all LRU lists.
    There is only one such lock per NUMA node, shared by all containers,
    and it is taken whenever any container adds a page to an LRU list.
    This makes it a "big tcache lock" which does not scale well.

    The patch introduces a new lock protecting the fields of struct
    tcache_pool_nodeinfo, in particular its LRU list. The LRU lists of
    different filesystems (i.e., containers) are independent of each
    other, so separate locks allow better scaling.

    This patch only introduces the lock; the lock order at the moment is:

        tcache_nodeinfo::lock -> tcache_pool_nodeinfo::lock

    The next patches will gradually allow changing it vice versa.

    Note that updates of tcache_pool_nodeinfo::nr_pages and
    tcache_nodeinfo::nr_pages now happen under different locks.

    v3: Add spin_lock_init() for lockdep

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 17 ++++++++++++-----
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 9f296dc..ab70af2 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -66,6 +66,7 @@ struct tcache_pool_nodeinfo {
     /* increased on every LRU add/del, reset once it gets big enough;
      * used for rate limiting rebalancing of reclaim_tree */
     unsigned long events;
+    spinlock_t lock;
 } cacheline_aligned_in_smp;

 /*
@@ -255,6 +256,7 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];

     spin_lock(&ni->lock);
+    spin_lock(&pni->lock);
     ni->nr_pages++;
     pni->nr_pages++;

@@ -271,6 +273,7 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     if (unlikely(RB_EMPTY_NODE(&pni->reclaim_node)))
         __tcache_insert_reclaim_node(ni, pni);

+    spin_unlock(&pni->lock);
     spin_unlock(&ni->lock);
 }

@@ -293,6 +296,7 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];

     spin_lock(&ni->lock);
+    spin_lock(&pni->lock);

     /* Raced with reclaimer? */
     if (unlikely(list_empty(&page->lru)))
@@ -306,6 +310,7 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     __tcache_check_events(ni, pni);

 out:
+    spin_unlock(&pni->lock);
     spin_unlock(&ni->lock);
 }

@@ -342,6 +347,7 @@ static int tcache_create_pool(void)
         pni->pool = pool;
         RB_CLEAR_NODE(&pni->reclaim_node);
         INIT_LIST_HEAD(&pni->lru);
+        spin_lock_init(&pni->lock);
     }

     idr_preload(GFP_KERNEL);
@@ -1039,6 +1045,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     if (!tcache_grab_pool(pni->pool))
         goto again;

+    spin_lock(&pni->lock);
     nr = __tcache_lru_isolate(pni, pages, nr_to_isolate);
     ni->nr_pages -= nr;
     nr_isolated += nr;
@@ -1047,6 +1054,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     if (!list_empty(&pni->lru))
         __tcache_insert_reclaim_node(ni, pni);

+    spin_unlock(&pni->lock);
     tcache_put_pool(pni->pool);
 out:
     spin_unlock_irq(&ni->lock);
@@ -1091,14 +1099,17 @@ tcache_try_to_reclaim_page(struct tcache_pool *pool, int nid)

     local_irq_save(flags);

-    spin_lock(&ni->lock);
+    spin_lock(&pni->lock);
     ret = __tcache_lru_isolate(pni, &page, 1);
-    ni->nr_pages -= ret;
-    spin_unlock(&ni->lock);
+    spin_unlock(&pni->lock);

     if (!ret)
         goto out;

+    spin_lock(&ni->lock);
+    ni->nr_pages -= ret;
+    spin_unlock(&ni->lock);
+
     if (!__tcache_reclaim_page(page))
         page = NULL;
     else

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] tcache: Move add/sub out of pni->lock
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 673358de1fce85596dcd17e1bde8b7a9639fcc1c
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:22 2017 +0300

    tcache: Move add/sub out of pni->lock

    This minimizes the number of operations happening under pni->lock.
    Note that we do the add before linking to the list, so a parallel
    shrink cannot make nr_pages negative.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 5faa390..d1a2c53 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -261,8 +261,9 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     struct tcache_nodeinfo *ni = &tcache_nodeinfo[nid];
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];

-    spin_lock(&pni->lock);
     atomic_long_inc(&ni->nr_pages);
+
+    spin_lock(&pni->lock);
     pni->nr_pages++;

     list_add_tail(&page->lru, &pni->lru);
@@ -300,6 +301,7 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     int nid = page_to_nid(page);
     struct tcache_nodeinfo *ni = &tcache_nodeinfo[nid];
     struct tcache_pool_nodeinfo *pni = &pool->nodeinfo[nid];
+    bool deleted = false;

     spin_lock(&pni->lock);

@@ -308,7 +310,7 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
         goto out;

     __tcache_lru_del(pni, page);
-    atomic_long_dec(&ni->nr_pages);
+    deleted = true;

     if (reused)
         pni->recent_gets++;
@@ -323,6 +325,8 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     }
 out:
     spin_unlock(&pni->lock);
+    if (deleted)
+        atomic_long_dec(&ni->nr_pages);
 }

 static int tcache_create_pool(void)
@@ -1071,8 +1075,6 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     if (!nr_isolated)
         goto unlock;

-    atomic_long_sub(nr_isolated, &ni->nr_pages);
-
     if (!RB_EMPTY_NODE(rbn)) {
@@ -1088,6 +1090,8 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     spin_unlock_irq(&pni->lock);
     tcache_put_pool(pni->pool);
 out:
+    if (nr_isolated)
+        atomic_long_sub(nr_isolated, &ni->nr_pages);
     return nr_isolated;
 }
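The invariant called out in the commit message — increment the global counter *before* the page becomes visible on the list, decrement only *after* it is taken off — is what lets the counter move outside the lock without ever going negative. A hedged userspace sketch (all names illustrative; a plain `long` guarded by a mutex stands in for the LRU list):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_long ni_nr_pages;                         /* global, lock-free */
static pthread_mutex_t pni_lock = PTHREAD_MUTEX_INITIALIZER;
static long pni_list_len;                               /* stands in for the LRU list */

static void lru_add(void)
{
    atomic_fetch_add(&ni_nr_pages, 1);  /* add BEFORE the page is visible */
    pthread_mutex_lock(&pni_lock);
    pni_list_len++;                     /* "link into the list" */
    pthread_mutex_unlock(&pni_lock);
}

static long lru_isolate(long want)
{
    long got;

    pthread_mutex_lock(&pni_lock);
    got = want < pni_list_len ? want : pni_list_len;
    pni_list_len -= got;                /* "unlink from the list" */
    pthread_mutex_unlock(&pni_lock);
    if (got)
        atomic_fetch_sub(&ni_nr_pages, got); /* sub AFTER unlinking */
    return got;
}
```

A shrinker can only isolate pages that are already linked, and every linked page has already been counted, so at any instant the atomic counter is greater than or equal to the number of linked pages and never dips below zero.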
[Devel] [PATCH RHEL7 COMMIT] tcache: Decrement removed from LRU pages out of __tcache_lru_del()
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit eb34be224b7ca575751cc6f9752a7f8171c5c4f7
Author: Kirill Tkhai
Date:   Thu Aug 31 18:18:17 2017 +0300

    tcache: Decrement removed from LRU pages out of __tcache_lru_del()

    Patchset description: tcache: Manage LRU lists under per-filesystem lock

    Changes to v2:
      - Disable irqs in tcache_lru_isolate() [9/10]
      - Move update_ni_rb_first() to "tcache: Cache rb_first() of reclaim
        tree in tcache_nodeinfo::rb_first"
      - Add spin_lock_init() for lockdep [2/10]

    Kirill Tkhai (10):
      tcache: Decrement removed from LRU pages out of __tcache_lru_del()
      tcache: Add tcache_pool_nodeinfo::lock
      tcache: Cleanup unused expression from tcache_lru_isolate()
      tcache: Remove excess variable from tcache_lru_isolate()
      tcache: Cache rb_first() of reclaim tree in tcache_nodeinfo::rb_first
      tcache: Make tcache_lru_isolate() keep ni->lock less
      tcache: Move erase-insert logic out of tcache_check_events()
      tcache: Make tcache_nodeinfo::nr_pages atomic_long_t
      tcache: Use ni->lock only for inserting and erasing from rbtree.
      tcache: Move add/sub out of pni->lock

    https://jira.sw.ru/browse/PSBM-69296

    This patchset decreases the CPU usage on writing big files in
    Containers.

    ==
    This patch description:

    Move the subtraction out of __tcache_lru_del(); this will be used in
    the next patches. Also, delete the ni argument of the function.

    Signed-off-by: Kirill Tkhai
    Acked-by: Andrey Ryabinin
---
 mm/tcache.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 0bfbb69..9f296dc 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -274,11 +274,9 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page)
     spin_unlock(&ni->lock);
 }

-static void __tcache_lru_del(struct tcache_nodeinfo *ni,
-                 struct tcache_pool_nodeinfo *pni,
+static void __tcache_lru_del(struct tcache_pool_nodeinfo *pni,
                  struct page *page)
 {
-    ni->nr_pages--;
     pni->nr_pages--;
     list_del_init(&page->lru);
 }
@@ -300,7 +298,8 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page,
     if (unlikely(list_empty(&page->lru)))
         goto out;

-    __tcache_lru_del(ni, pni, page);
+    __tcache_lru_del(pni, page);
+    ni->nr_pages--;

     if (reused)
         pni->recent_gets++;
@@ -988,8 +987,7 @@ __tcache_insert_reclaim_node(struct tcache_nodeinfo *ni,
 }

 static noinline_for_stack int
-__tcache_lru_isolate(struct tcache_nodeinfo *ni,
-             struct tcache_pool_nodeinfo *pni,
+__tcache_lru_isolate(struct tcache_pool_nodeinfo *pni,
              struct page **pages, int nr_to_scan)
 {
     struct tcache_node *node;
@@ -1002,7 +1000,7 @@ __tcache_lru_isolate(struct tcache_nodeinfo *ni,
         if (unlikely(!page_cache_get_speculative(page)))
             continue;

-        __tcache_lru_del(ni, pni, page);
+        __tcache_lru_del(pni, page);

         /*
          * A node can be destroyed only if all its pages have been
@@ -1041,7 +1039,8 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate)
     if (!tcache_grab_pool(pni->pool))
         goto again;

-    nr = __tcache_lru_isolate(ni, pni, pages, nr_to_isolate);
+    nr = __tcache_lru_isolate(pni, pages, nr_to_isolate);
+    ni->nr_pages -= nr;
     nr_isolated += nr;
     nr_to_isolate -= nr;

@@ -1093,7 +1092,8 @@ tcache_try_to_reclaim_page(struct tcache_pool *pool, int nid)
     local_irq_save(flags);

     spin_lock(&ni->lock);
-    ret = __tcache_lru_isolate(ni, pni, &page, 1);
+    ret = __tcache_lru_isolate(pni, &page, 1);
+    ni->nr_pages -= ret;
     spin_unlock(&ni->lock);

     if (!ret)

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] tcache: Make tcache_nodeinfo::nr_pages atomic_long_t
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 89f8a885e1deeff230554cf1c4dcd323fcbaa9ea Author: Kirill Tkhai Date: Thu Aug 31 18:18:21 2017 +0300 tcache: Make tcache_nodeinfo::nr_pages atomic_long_t This allows nr_pages to be changed without taking tcache_nodeinfo::lock. Signed-off-by: Kirill Tkhai Acked-by: Andrey Ryabinin --- mm/tcache.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/mm/tcache.c b/mm/tcache.c index 6962097..202834c 100644 --- a/mm/tcache.c +++ b/mm/tcache.c @@ -160,7 +160,7 @@ struct tcache_nodeinfo { struct rb_node __rcu *rb_first; /* total number of pages on all LRU lists corresponding to this node */ - unsigned long nr_pages; + atomic_long_t nr_pages; } cacheline_aligned_in_smp; /* @@ -263,8 +263,7 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page) spin_lock(&ni->lock); spin_lock(&pni->lock); - - ni->nr_pages++; + atomic_long_inc(&ni->nr_pages); pni->nr_pages++; list_add_tail(&page->lru, &pni->lru); @@ -310,7 +309,7 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page, goto out; __tcache_lru_del(pni, page); - ni->nr_pages--; + atomic_long_dec(&ni->nr_pages); if (reused) pni->recent_gets++; @@ -1073,7 +1072,7 @@ tcache_lru_isolate(int nid, struct page **pages, int nr_to_isolate) if (!nr_isolated) goto unlock; - ni->nr_pages -= nr_isolated; + atomic_long_sub(nr_isolated, &ni->nr_pages); if (!RB_EMPTY_NODE(rbn)) { rb_erase(rbn, &ni->reclaim_tree); @@ -1136,9 +1135,7 @@ tcache_try_to_reclaim_page(struct tcache_pool *pool, int nid) if (!ret) goto out; - spin_lock(&ni->lock); - ni->nr_pages -= ret; - spin_unlock(&ni->lock); + atomic_long_dec(&ni->nr_pages); if (!__tcache_reclaim_page(page)) page = NULL; @@ -1163,7 +1160,12 @@ static struct page *tcache_alloc_page(struct tcache_pool *pool) static unsigned long tcache_shrink_count(struct shrinker *shrink, struct
shrink_control *sc) { - return tcache_nodeinfo[sc->nid].nr_pages; + atomic_long_t *nr_pages = &tcache_nodeinfo[sc->nid].nr_pages; + long ret; + + ret = atomic_long_read(nr_pages); + WARN_ON(ret < 0); + return ret >= 0 ? ret : 0; } #define TCACHE_SCAN_BATCH 128UL @@ -1380,6 +1382,7 @@ static int __init tcache_nodeinfo_init(void) for (i = 0; i < nr_node_ids; i++) { ni = &tcache_nodeinfo[i]; spin_lock_init(&ni->lock); + atomic_long_set(&ni->nr_pages, 0); ni->reclaim_tree = RB_ROOT; update_ni_rb_first(ni); } ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] tcache: Move erase-insert logic out of tcache_check_events()
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit e6e93d14b403bd4176358427d6f7f0e1c252ea5e Author: Kirill Tkhai Date: Thu Aug 31 18:18:21 2017 +0300 tcache: Move erase-insert logic out of tcache_check_events() Make the function return true, when erase-insert (requeue) should be executed. Move erase-insert out of the function. v3: Move update_ni_rb_first() to "tcache: Cache rb_first() of reclaim tree in tcache_nodeinfo::rb_first". Signed-off-by: Kirill Tkhai Acked-by: Andrey Ryabinin --- mm/tcache.c | 29 +++-- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/mm/tcache.c b/mm/tcache.c index 3d9c5ac..6962097 100644 --- a/mm/tcache.c +++ b/mm/tcache.c @@ -216,8 +216,7 @@ static struct rb_node *update_ni_rb_first(struct tcache_nodeinfo *ni) static void __tcache_insert_reclaim_node(struct tcache_nodeinfo *ni, struct tcache_pool_nodeinfo *pni); -static inline void __tcache_check_events(struct tcache_nodeinfo *ni, -struct tcache_pool_nodeinfo *pni) +static inline bool tcache_check_events(struct tcache_pool_nodeinfo *pni) { /* * We don't want to rebalance reclaim_tree on each get/put, because it @@ -228,7 +227,7 @@ static inline void __tcache_check_events(struct tcache_nodeinfo *ni, */ pni->events++; if (likely(pni->events < 1024)) - return; + return false; pni->events = 0; @@ -238,7 +237,7 @@ static inline void __tcache_check_events(struct tcache_nodeinfo *ni, * it will be done by the shrinker once it tries to scan it. */ if (unlikely(list_empty(&pni->lru))) - return; + return false; /* * This can only happen if the node was removed from the tree on pool @@ -246,11 +245,9 @@ static inline void __tcache_check_events(struct tcache_nodeinfo *ni, * then. 
*/ if (unlikely(RB_EMPTY_NODE(&pni->reclaim_node))) - return; + return false; - rb_erase(&pni->reclaim_node, &ni->reclaim_tree); - __tcache_insert_reclaim_node(ni, pni); - update_ni_rb_first(ni); + return true; } /* @@ -277,13 +274,12 @@ static void tcache_lru_add(struct tcache_pool *pool, struct page *page) pni->recent_puts /= 2; } - __tcache_check_events(ni, pni); - - if (unlikely(RB_EMPTY_NODE(&pni->reclaim_node))) { + if (tcache_check_events(pni) || RB_EMPTY_NODE(&pni->reclaim_node)) { + if (!RB_EMPTY_NODE(&pni->reclaim_node)) + rb_erase(&pni->reclaim_node, &ni->reclaim_tree); __tcache_insert_reclaim_node(ni, pni); update_ni_rb_first(ni); } - spin_unlock(&pni->lock); spin_unlock(&ni->lock); } @@ -319,7 +315,12 @@ static void tcache_lru_del(struct tcache_pool *pool, struct page *page, if (reused) pni->recent_gets++; - __tcache_check_events(ni, pni); + if (tcache_check_events(pni)) { + if (!RB_EMPTY_NODE(&pni->reclaim_node)) + rb_erase(&pni->reclaim_node, &ni->reclaim_tree); + __tcache_insert_reclaim_node(ni, pni); + update_ni_rb_first(ni); + } out: spin_unlock(&pni->lock); spin_unlock(&ni->lock); @@ -947,7 +948,7 @@ tcache_remove_from_reclaim_trees(struct tcache_pool *pool) rb_erase(&pni->reclaim_node, &ni->reclaim_tree); update_ni_rb_first(ni); /* -* Clear the node for __tcache_check_events() not to +* Clear the node for tcache_check_events() not to * reinsert the pool back into the tree. */ RB_CLEAR_NODE(&pni->reclaim_node); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] Revert "autofs: fix autofs_v5_packet structure for compat mode"
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit a24e586ec36bf182a3261a9608e8515d424b242e Author: Konstantin Khorenko Date: Thu Aug 31 17:56:44 2017 +0300 Revert "autofs: fix autofs_v5_packet structure for compat mode" This reverts commit e484b0abe8af8793f58e6434060a3779261d3151. The patch in question increases the offsetof(struct autofs_v5_packet, name) by 4, which is not good; the patch is to be reworked. Thanks to Dmitry V. Levin for noticing it. https://jira.sw.ru/browse/PSBM-71078 Signed-off-by: Konstantin Khorenko --- include/uapi/linux/auto_fs4.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h index 8729a47..e02982f 100644 --- a/include/uapi/linux/auto_fs4.h +++ b/include/uapi/linux/auto_fs4.h @@ -137,8 +137,6 @@ struct autofs_v5_packet { __u32 pid; __u32 tgid; __u32 len; - __u32 blob; /* This is needed to align structure up to 8 - bytes for ALL archs including 32-bit */ char name[NAME_MAX+1]; }; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH RHEL7 COMMIT] ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
Please consider preparing a ReadyKernel patch for it. https://readykernel.com/ -- Best regards, Konstantin Khorenko, Virtuozzo Linux Kernel Team On 08/31/2017 05:51 PM, Konstantin Khorenko wrote: The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit f5b413ea4e53d819c8b4e4a4927fb563bd3ec24f Author: Keno Fischer Date: Thu Aug 31 17:51:25 2017 +0300 ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream. In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immediately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true). While in this state the process is still SIGKILLable, but little else works (e.g. no ptrace attach, no other signals).
This is easily reproduced with the following code (assuming thp are set to always): #include #include #include #include #include #include #include #include #include #include #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [a...@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170106015025.ga38...@juliacomputing.com Signed-off-by: Keno Fischer Acked-by: Kirill A. Shutemov Cc: Greg Thelen Cc: Nicholas Piggin Cc: Willy Tarreau Cc: Oleg Nesterov Cc: Kees Cook Cc: Andy Lutomirski Cc: Michal Hocko Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: - Drop change to follow_devmap_pmd() - pmd_dirty() is not available; check the page flags as in can_follow_write_pte() - Adjust context] Signed-off-by: Ben Hutchings [mhocko: This has been forward ported from the 3.2 stable tree. And fixed to return NULL.] 
Reviewed-by: Michal Hocko Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau https://jira.sw.ru/browse/PSBM-70151 Signed-off-by: Andrey Ryabinin --- mm/huge_memory.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 477610d..5a07e76 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1317,6 +1317,18 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, return ret; } +/* + * foll_force can write to even unwritable pmd's, but only + * after we've gone through a cow cycle and they are dirty. + */ +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, + unsigned int flags) +{ + return pmd_write(pmd) || + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && +page && PageAnon(page)); +} + struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, @@ -1327,9 +1339,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !pmd_write(*pmd)) -
[Devel] [PATCH RHEL7 COMMIT] ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit f5b413ea4e53d819c8b4e4a4927fb563bd3ec24f Author: Keno Fischer Date: Thu Aug 31 17:51:25 2017 +0300 ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream. In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immediately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true). While in this state the process is still SIGKILLable, but little else works (e.g. no ptrace attach, no other signals).
This is easily reproduced with the following code (assuming thp are set to always): #include #include #include #include #include #include #include #include #include #include #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [a...@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170106015025.ga38...@juliacomputing.com Signed-off-by: Keno Fischer Acked-by: Kirill A. Shutemov Cc: Greg Thelen Cc: Nicholas Piggin Cc: Willy Tarreau Cc: Oleg Nesterov Cc: Kees Cook Cc: Andy Lutomirski Cc: Michal Hocko Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: - Drop change to follow_devmap_pmd() - pmd_dirty() is not available; check the page flags as in can_follow_write_pte() - Adjust context] Signed-off-by: Ben Hutchings [mhocko: This has been forward ported from the 3.2 stable tree. And fixed to return NULL.] 
Reviewed-by: Michal Hocko Signed-off-by: Jiri Slaby Signed-off-by: Willy Tarreau https://jira.sw.ru/browse/PSBM-70151 Signed-off-by: Andrey Ryabinin --- mm/huge_memory.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 477610d..5a07e76 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1317,6 +1317,18 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, return ret; } +/* + * foll_force can write to even unwritable pmd's, but only + * after we've gone through a cow cycle and they are dirty. + */ +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, + unsigned int flags) +{ + return pmd_write(pmd) || + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && +page && PageAnon(page)); +} + struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, @@ -1327,9 +1339,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !pmd_write(*pmd)) - goto out; - /* Avoid dumping huge zero page */ if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd)) return ERR_PTR(-EFAULT)
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for coredump event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit b6d449038da008a26835e1ae16292869b1fe80aa Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:36 2017 +0300 proc connector: use generic event helper for coredump event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 28 +++- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 2d5ff7c..312f30f 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -222,31 +222,17 @@ void proc_comm_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_COMM, 0, fill_comm_event); } -void proc_coredump_connector(struct task_struct *task) +static bool fill_coredump_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - struct timespec ts; - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_COREDUMP; ev->event_data.coredump.process_pid = task->pid; ev->event_data.coredump.process_tgid = task->tgid; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_coredump_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_COREDUMP, 0, fill_coredump_event); } void proc_exit_connector(struct task_struct *task) ___ Devel mailing list Devel@openvz.org 
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: take number of listeners and per-cpu counters from VE
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 06bed1f6e4442906e11d86763920b00f107a2112 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:40 2017 +0300 proc connector: take number of listeners and per-cpu conters from VE Instead of static variables. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 50 + 1 file changed, 32 insertions(+), 18 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 7a1124a..ff99f06 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -50,21 +50,17 @@ static inline struct cn_msg *buffer_to_cn_msg(__u8 *buffer) return (struct cn_msg *)(buffer + 4); } -static atomic_t proc_event_num_listeners = ATOMIC_INIT(0); static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC }; -/* proc_event_counts is used as the sequence number of the netlink message */ -static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 }; - -static inline void get_seq(__u32 *ts, int *cpu) +static inline void get_seq(struct ve_struct *ve, __u32 *ts, int *cpu) { preempt_disable(); - *ts = __this_cpu_inc_return(proc_event_counts) - 1; + *ts = __this_cpu_inc_return(*ve->cn->proc_event_counts) - 1; *cpu = smp_processor_id(); preempt_enable(); } -static struct cn_msg *cn_msg_fill(__u8 *buffer, +static struct cn_msg *cn_msg_fill(__u8 *buffer, struct ve_struct *ve, struct task_struct *task, int what, int cookie, bool (*fill_event)(struct proc_event *ev, @@ -78,7 +74,7 @@ static struct cn_msg *cn_msg_fill(__u8 *buffer, msg = buffer_to_cn_msg(buffer); ev = (struct proc_event *)msg->data; - get_seq(&msg->seq, &ev->cpu); + get_seq(ve, &msg->seq, &ev->cpu); memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); @@ -92,6 +88,13 @@ static struct cn_msg *cn_msg_fill(__u8 *buffer, return 
fill_event(ev, task, cookie) ? msg : NULL; } +static int proc_event_num_listeners(struct ve_struct *ve) +{ + if (ve->cn) + return atomic_read(&ve->cn->proc_event_num_listeners); + return 0; +} + static void proc_event_connector(struct task_struct *task, int what, int cookie, bool (*fill_event)(struct proc_event *ev, @@ -100,11 +103,12 @@ static void proc_event_connector(struct task_struct *task, { struct cn_msg *msg; __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); + struct ve_struct *ve = task->task_ve; - if (atomic_read(&proc_event_num_listeners) < 1) + if (proc_event_num_listeners(ve) < 1) return; - msg = cn_msg_fill(buffer, task, what, cookie, fill_event); + msg = cn_msg_fill(buffer, ve, task, what, cookie, fill_event); if (!msg) return; @@ -258,14 +262,14 @@ void proc_exit_connector(struct task_struct *task) * values because it's not being returned via syscall return * mechanisms. */ -static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack) +static void cn_proc_ack(struct ve_struct *ve, int err, int rcvd_seq, int rcvd_ack) { struct cn_msg *msg; struct proc_event *ev; __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); struct timespec ts; - if (atomic_read(&proc_event_num_listeners) < 1) + if (proc_event_num_listeners(ve) < 1) return; msg = buffer_to_cn_msg(buffer); @@ -292,6 +296,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, struct netlink_skb_parms *nsp) { enum proc_cn_mcast_op *mc_op = NULL; + struct ve_struct *ve = get_exec_env(); int err = 0; if (msg->len != sizeof(*mc_op)) @@ -315,10 +320,10 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, mc_op = (enum proc_cn_mcast_op *)msg->data; switch (*mc_op) { case PROC_CN_MCAST_LISTEN: - atomic_inc(&proc_event_num_listeners); + atomic_inc(&ve->cn->proc_event_num_listeners); break; case PROC_CN_MCAST_IGNORE: - atomic_dec(&proc_event_num_listeners); + atomic_dec(&ve->cn->proc_event_num_listeners); break; default: err = EINVAL; @@ -326,22 +331,31 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, } out: - 
cn_proc_ack(err, msg->seq, msg->ack); + cn_proc_ack(ve, err, msg->seq, msg->ack); } int cn_proc_init_ve(struct ve_struct *ve) { - int err = cn_add_callback_ve(ve, &cn_proc_event_id, -
[Devel] [PATCH RHEL7 COMMIT] connector: store all private data on VE structure
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit a74b5b56cac3c2212351dbc1e9ca957789221347 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:28 2017 +0300 connector: store all private data on VE structure This is needed to containerize connector and its proc part. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- include/linux/connector.h | 9 + include/linux/ve.h| 4 2 files changed, 13 insertions(+) diff --git a/include/linux/connector.h b/include/linux/connector.h index 4c4d2b9..9e05e28 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -67,6 +67,15 @@ struct cn_dev { struct cn_queue_dev *cbdev; }; +struct cn_private { + struct cn_dev cdev; + int cn_already_initialized; + + atomic_tproc_event_num_listeners; + u32 __percpu*proc_event_counts; + +}; + int cn_add_callback(struct cb_id *id, const char *name, void (*callback)(struct cn_msg *, struct netlink_skb_parms *)); void cn_del_callback(struct cb_id *); diff --git a/include/linux/ve.h b/include/linux/ve.h index c9b0af4..d63edee 100644 --- a/include/linux/ve.h +++ b/include/linux/ve.h @@ -30,6 +30,7 @@ struct file_system_type; struct veip_struct; struct nsproxy; struct user_namespace; +struct cn_private; extern struct user_namespace init_user_ns; struct ve_struct { @@ -123,6 +124,9 @@ struct ve_struct { #ifdef CONFIG_COREDUMP charcore_pattern[CORENAME_MAX_SIZE]; #endif +#ifdef CONFIG_CONNECTOR + struct cn_private *cn; +#endif }; struct ve_devmnt { ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for comm event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 5e8a090a6347dc8364c23612aaf6a225254a0c53 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:36 2017 +0300 proc connector: use generic event helper for comm event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 28 +++- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 36a53fd..2d5ff7c 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -208,32 +208,18 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id) fill_ptrace_event); } -void proc_comm_connector(struct task_struct *task) +static bool fill_comm_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - struct timespec ts; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_COMM; ev->event_data.comm.process_pid = task->pid; ev->event_data.comm.process_tgid = task->tgid; get_task_comm(ev->event_data.comm.comm, task); + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_comm_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_COMM, 0, fill_comm_event); } void proc_coredump_connector(struct task_struct *task) ___ Devel mailing list Devel@openvz.org 
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] connector: add VE SS hook
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 07e77673691685713f04bd6b84fc0e07eae57158 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:45 2017 +0300 connector: add VE SS hook And thus containerize connector finally. https://jira.sw.ru/browse/PSBM-60227 Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 81854bf..752c692 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -273,8 +273,9 @@ static const struct file_operations cn_file_ops = { .release = single_release }; -static int cn_init_ve(struct ve_struct *ve) +static int cn_init_ve(void *data) { + struct ve_struct *ve = data; struct cn_dev *dev; struct netlink_kernel_cfg cfg = { .groups = CN_NETLINK_USERS + 0xf, @@ -326,8 +327,9 @@ static int cn_init_ve(struct ve_struct *ve) return err; } -static void cn_fini_ve(struct ve_struct *ve) +static void cn_fini_ve(void *data) { + struct ve_struct *ve = data; struct cn_dev *dev = get_cdev(ve); struct net *net = ve->ve_netns; @@ -344,13 +346,28 @@ static void cn_fini_ve(struct ve_struct *ve) ve->cn = NULL; } +static struct ve_hook cn_ss_hook = { + .init = cn_init_ve, + .fini = cn_fini_ve, + .priority = HOOK_PRIO_DEFAULT, + .owner = THIS_MODULE, +}; + static int cn_init(void) { - return cn_init_ve(get_ve0()); + int err; + + err = cn_init_ve(get_ve0()); + if (err) + return err; + + ve_hook_register(VE_SS_CHAIN, &cn_ss_hook); + return 0; } static void cn_fini(void) { + ve_hook_unregister(&cn_ss_hook); return cn_fini_ve(get_ve0()); } ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: send events to both VEs if not in VE#0
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 4ba0afbe02d33bf2e906209521bb59e7fa0def73 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:43 2017 +0300 proc connector: send events to both VEs if not in VE#0 This is needed to preserve current behaviour, when process in initial pid and user namespaces (i.e. in VE#0) can receive events from all the processes in the system. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 29 ++--- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 17e0247..81f2e56 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -96,16 +96,16 @@ static int proc_event_num_listeners(struct ve_struct *ve) return 0; } -static void proc_event_connector(struct task_struct *task, -int what, int cookie, -bool (*fill_event)(struct proc_event *ev, - struct ve_struct *ve, - struct task_struct *task, - int cookie)) +static void proc_event_connector_ve(struct task_struct *task, + struct ve_struct *ve, + int what, int cookie, + bool (*fill_event)(struct proc_event *ev, + struct ve_struct *ve, + struct task_struct *task, + int cookie)) { struct cn_msg *msg; __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - struct ve_struct *ve = task->task_ve; if (proc_event_num_listeners(ve) < 1) return; @@ -118,6 +118,21 @@ static void proc_event_connector(struct task_struct *task, cn_netlink_send_ve(ve, msg, CN_IDX_PROC, GFP_KERNEL); } +static void proc_event_connector(struct task_struct *task, +int what, int cookie, +bool (*fill_event)(struct proc_event *ev, + struct ve_struct *ve, + struct task_struct *task, + int cookie)) +{ + struct ve_struct *ve = task->task_ve; + + if (!ve_is_super(ve)) + proc_event_connector_ve(task, ve, what, cookie, fill_event); + + proc_event_connector_ve(task, get_ve0(), what, 
cookie, fill_event); +} + static bool fill_fork_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int unused) { ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: add pid namespace awareness
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit d53ad1ca8439459567dbb732ea568ae75cb9a6b3 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:38 2017 +0300 proc connector: add pid namespace awareness This is precursor patch. Later VE pid ns will be used. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 4ee1640..17a8c8c 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -119,11 +119,11 @@ static bool fill_fork_event(struct proc_event *ev, struct task_struct *task, rcu_read_lock(); parent = rcu_dereference(task->real_parent); - ev->event_data.fork.parent_pid = parent->pid; - ev->event_data.fork.parent_tgid = parent->tgid; + ev->event_data.fork.parent_pid = task_pid_nr_ns(parent, &init_pid_ns); + ev->event_data.fork.parent_tgid = task_tgid_nr_ns(parent, &init_pid_ns); rcu_read_unlock(); - ev->event_data.fork.child_pid = task->pid; - ev->event_data.fork.child_tgid = task->tgid; + ev->event_data.fork.child_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.fork.child_tgid = task_tgid_nr_ns(task, &init_pid_ns); return true; } @@ -135,8 +135,8 @@ void proc_fork_connector(struct task_struct *task) static bool fill_exec_event(struct proc_event *ev, struct task_struct *task, int unused) { - ev->event_data.exec.process_pid = task->pid; - ev->event_data.exec.process_tgid = task->tgid; + ev->event_data.exec.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.exec.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); return true; } @@ -150,8 +150,8 @@ static bool fill_id_event(struct proc_event *ev, struct task_struct *task, { const struct cred *cred; - ev->event_data.id.process_pid = task->pid; - ev->event_data.id.process_tgid = 
task->tgid; + ev->event_data.id.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.id.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); rcu_read_lock(); cred = __task_cred(task); if (which_id == PROC_EVENT_UID) { @@ -176,8 +176,8 @@ void proc_id_connector(struct task_struct *task, int which_id) static bool fill_sid_event(struct proc_event *ev, struct task_struct *task, int unused) { - ev->event_data.sid.process_pid = task->pid; - ev->event_data.sid.process_tgid = task->tgid; + ev->event_data.sid.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.sid.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); return true; } @@ -189,11 +189,11 @@ void proc_sid_connector(struct task_struct *task) static bool fill_ptrace_event(struct proc_event *ev, struct task_struct *task, int ptrace_id) { - ev->event_data.ptrace.process_pid = task->pid; - ev->event_data.ptrace.process_tgid = task->tgid; + ev->event_data.ptrace.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.ptrace.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); if (ptrace_id == PTRACE_ATTACH) { - ev->event_data.ptrace.tracer_pid = current->pid; - ev->event_data.ptrace.tracer_tgid = current->tgid; + ev->event_data.ptrace.tracer_pid = task_pid_nr_ns(current, &init_pid_ns); + ev->event_data.ptrace.tracer_tgid = task_tgid_nr_ns(current, &init_pid_ns); } else if (ptrace_id == PTRACE_DETACH) { ev->event_data.ptrace.tracer_pid = 0; ev->event_data.ptrace.tracer_tgid = 0; @@ -211,8 +211,8 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id) static bool fill_comm_event(struct proc_event *ev, struct task_struct *task, int unused) { - ev->event_data.comm.process_pid = task->pid; - ev->event_data.comm.process_tgid = task->tgid; + ev->event_data.comm.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.comm.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); get_task_comm(ev->event_data.comm.comm, task); return true; } @@ -225,8 +225,8 @@ void 
proc_comm_connector(struct task_struct *task) static bool fill_coredump_event(struct proc_event *ev, struct task_struct *task, int unused) { - ev->event_data.coredump.process_pid = task->pid; - ev->event_data.coredump.process_tgid = task->tgid; + ev->event_data.coredump.process_pid = task_pid_nr_ns(task, &init_pid_ns); + ev->event_data.coredump.pro
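The patch above swaps raw `task->pid` reads for `task_pid_nr_ns(task, &init_pid_ns)`. The idea behind that call is that one task has a different numeric pid at each pid-namespace nesting level, and the caller must name the namespace it wants the number for. The following userspace sketch models that with a toy struct (all names here are illustrative, not kernel API; the kernel's real structure is `struct pid` with its `numbers[]` array of `struct upid`):

```c
/* Toy model of a namespaced pid: one numeric id per pid-namespace
 * level.  Level 0 stands for the init namespace, deeper levels for
 * nested containers.  Illustrative only, not the kernel layout. */
#define TOY_MAX_LEVELS 4

struct toy_pid {
	int level;                /* deepest namespace this pid exists in */
	int nr[TOY_MAX_LEVELS];   /* numeric id as seen at each level */
};

/* Toy counterpart of task_pid_nr_ns(): the id the task has when viewed
 * from the namespace at ns_level, or 0 if it is not visible there. */
static int toy_pid_nr_ns(const struct toy_pid *pid, int ns_level)
{
	if (ns_level > pid->level)
		return 0;	/* pid not visible from a deeper namespace */
	return pid->nr[ns_level];
}
```

A containerized task might be pid 4321 at level 0 (the host view) and pid 7 at level 1 (inside its container); reporting the level-0 number is what this patch pins down explicitly with `&init_pid_ns`, so that a later patch can substitute the container's namespace.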
[Devel] [PATCH RHEL7 COMMIT] connector: take VE from socket upon callback
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5

--> commit d7f362627da257bcb656a806fa0ece3743371fd4
Author: Stanislav Kinsburskiy
Date: Thu Aug 31 17:40:44 2017 +0300

    connector: take VE from socket upon callback

    This is needed to attach the listener to the right device, i.e. to the
    right source of events (in terms of the container).

    Signed-off-by: Stanislav Kinsburskiy
    Reviewed-by: Andrey Ryabinin
---
 drivers/connector/connector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 771dadf..81854bf 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -130,7 +130,7 @@ EXPORT_SYMBOL_GPL(cn_netlink_send);
 static int cn_call_callback(struct sk_buff *skb)
 {
 	struct cn_callback_entry *i, *cbq = NULL;
-	struct cn_dev *dev = get_cdev(get_ve0());
+	struct cn_dev *dev = get_cdev(skb->sk->sk_net->owner_ve);
 	struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
 	struct netlink_skb_parms *nsp = &NETLINK_CB(skb);
 	int err = -ENODEV;
[Devel] [PATCH RHEL7 COMMIT] connector: use device stored in VE
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 0773323bf46b0b99e6095a74cc1e1cd46dd18752 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:30 2017 +0300 connector: use device stored in VE Instead of global static device. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index f5484b2..bc2308a 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -38,8 +38,6 @@ MODULE_AUTHOR("Evgeniy Polyakov "); MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector."); MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_CONNECTOR); -static struct cn_dev cdev; - static int cn_already_initialized; /* @@ -66,7 +64,7 @@ static int cn_already_initialized; static struct cn_dev *get_cdev(struct ve_struct *ve) { - return &cdev; + return &ve->cn->cdev; } int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) @@ -261,7 +259,7 @@ static const struct file_operations cn_file_ops = { static int cn_init_ve(struct ve_struct *ve) { - struct cn_dev *dev = get_cdev(get_ve0()); + struct cn_dev *dev; struct netlink_kernel_cfg cfg = { .groups = CN_NETLINK_USERS + 0xf, .input = cn_rx_skb, @@ -272,6 +270,8 @@ static int cn_init_ve(struct ve_struct *ve) if (!ve->cn) return -ENOMEM; + dev = &ve->cn->cdev; + dev->nls = netlink_kernel_create(net, NETLINK_CONNECTOR, &cfg); if (!dev->nls) return -EIO; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: pass VE to event fillers
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 472f0bf7498a2c07fb5e3764cda8036314497bf9 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:40 2017 +0300 proc connector: pass VE to event fillers Precursor patch. VE will be used later to get proper pid and user namespaces for correct event generation. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index ff99f06..b66fde8 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -64,6 +64,7 @@ static struct cn_msg *cn_msg_fill(__u8 *buffer, struct ve_struct *ve, struct task_struct *task, int what, int cookie, bool (*fill_event)(struct proc_event *ev, +struct ve_struct *ve, struct task_struct *task, int cookie)) { @@ -85,7 +86,7 @@ static struct cn_msg *cn_msg_fill(__u8 *buffer, struct ve_struct *ve, ev->timestamp_ns = timespec_to_ns(&ts); ev->what = what; - return fill_event(ev, task, cookie) ? msg : NULL; + return fill_event(ev, ve, task, cookie) ? 
msg : NULL; } static int proc_event_num_listeners(struct ve_struct *ve) @@ -98,6 +99,7 @@ static int proc_event_num_listeners(struct ve_struct *ve) static void proc_event_connector(struct task_struct *task, int what, int cookie, bool (*fill_event)(struct proc_event *ev, + struct ve_struct *ve, struct task_struct *task, int cookie)) { @@ -116,8 +118,8 @@ static void proc_event_connector(struct task_struct *task, cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } -static bool fill_fork_event(struct proc_event *ev, struct task_struct *task, - int unused) +static bool fill_fork_event(struct proc_event *ev, struct ve_struct *ve, + struct task_struct *task, int unused) { struct task_struct *parent; @@ -136,8 +138,8 @@ void proc_fork_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_FORK, 0, fill_fork_event); } -static bool fill_exec_event(struct proc_event *ev, struct task_struct *task, - int unused) +static bool fill_exec_event(struct proc_event *ev, struct ve_struct *ve, + struct task_struct *task, int unused) { ev->event_data.exec.process_pid = task_pid_nr_ns(task, &init_pid_ns); ev->event_data.exec.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); @@ -149,8 +151,8 @@ void proc_exec_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_EXEC, 0, fill_exec_event); } -static bool fill_id_event(struct proc_event *ev, struct task_struct *task, - int which_id) +static bool fill_id_event(struct proc_event *ev, struct ve_struct *ve, + struct task_struct *task, int which_id) { const struct cred *cred; @@ -177,8 +179,8 @@ void proc_id_connector(struct task_struct *task, int which_id) proc_event_connector(task, which_id, which_id, fill_id_event); } -static bool fill_sid_event(struct proc_event *ev, struct task_struct *task, - int unused) +static bool fill_sid_event(struct proc_event *ev, struct ve_struct *ve, + struct task_struct *task, int unused) { ev->event_data.sid.process_pid = task_pid_nr_ns(task, &init_pid_ns); 
ev->event_data.sid.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); @@ -190,8 +192,8 @@ void proc_sid_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_SID, 0, fill_sid_event); } -static bool fill_ptrace_event(struct proc_event *ev, struct task_struct *task, - int ptrace_id) +static bool fill_ptrace_event(struct proc_event *ev, struct ve_struct *ve, + struct task_struct *task, int ptrace_id) { ev->event_data.ptrace.process_pid = task_pid_nr_ns(task, &init_pid_ns); ev->event_data.ptrace.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); @@ -212,8 +214,8 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id) fill_ptrace_event); } -static bool fill_comm_event(struct proc_event *ev, struct task_struct *task, -
[Devel] [PATCH RHEL7 COMMIT] proc connector: call proc-related init and fini routines explicitly
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 33a6978beb7622e8e97837904db45d7432776bb5 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:39 2017 +0300 proc connector: call proc-related init and fini routines explicitly This allows to support per-container connector creation and destruction. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 19 --- drivers/connector/connector.c | 33 - 2 files changed, 28 insertions(+), 24 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 8998335..7a1124a 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -345,22 +345,3 @@ void cn_proc_fini_ve(struct ve_struct *ve) { cn_del_callback_ve(ve, &cn_proc_event_id); } - -/* - * cn_proc_init - initialization entry point - * - * Adds the connector callback to the connector driver. 
- */ -static int __init cn_proc_init(void) -{ - int err = cn_add_callback(&cn_proc_event_id, - "cn_proc", - &cn_proc_mcast_ctl); - if (err) { - pr_warn("cn_proc failed to register\n"); - return err; - } - return 0; -} - -module_init(cn_proc_init); diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 110637b..59d81a3 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -281,6 +281,7 @@ static int cn_init_ve(struct ve_struct *ve) .input = cn_rx_skb, }; struct net *net = ve->ve_netns; + int err; ve->cn = kzalloc(sizeof(*ve->cn), GFP_KERNEL); if (!ve->cn) @@ -289,20 +290,40 @@ static int cn_init_ve(struct ve_struct *ve) dev = &ve->cn->cdev; dev->nls = netlink_kernel_create(net, NETLINK_CONNECTOR, &cfg); - if (!dev->nls) - return -EIO; + if (!dev->nls) { + err = -EIO; + goto free_cn; + } dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls); if (!dev->cbdev) { - netlink_kernel_release(dev->nls); - return -EINVAL; + err = -EINVAL; + goto netlink_release; } ve->cn->cn_already_initialized = 1; - proc_create("connector", S_IRUGO, net->proc_net, &cn_file_ops); + if (!proc_create("connector", S_IRUGO, net->proc_net, &cn_file_ops)) { + err = -ENOMEM; + goto free_cdev; + } + + err = cn_proc_init_ve(ve); + if (err) + goto remove_proc; return 0; + +remove_proc: + remove_proc_entry("connector", net->proc_net); +free_cdev: + cn_queue_free_dev(dev->cbdev); +netlink_release: + netlink_kernel_release(dev->nls); +free_cn: + kfree(ve->cn); + ve->cn = NULL; + return err; } static void cn_fini_ve(struct ve_struct *ve) @@ -312,6 +333,8 @@ static void cn_fini_ve(struct ve_struct *ve) ve->cn->cn_already_initialized = 0; + cn_proc_fini_ve(ve); + remove_proc_entry("connector", net->proc_net); cn_queue_free_dev(dev->cbdev); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
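The reworked `cn_init_ve()` above is a textbook example of kernel-style staged initialization: each allocation step gets a matching unwind label, and a failure at step N releases steps N-1..1 in reverse order before returning. A minimal userspace sketch of the same control flow, with plain `malloc()` standing in for the netlink socket, queue device, and proc entry (the `fail_at` parameter is an illustrative fault-injection knob, not part of the original code):

```c
#include <stdlib.h>

struct toy_ctx { void *state; void *dev; void *entry; };

/* fail_at: 0 = no injected failure, N = allocation step N fails.
 * Mirrors the goto-unwinding shape of the reworked cn_init_ve(). */
static int toy_init(struct toy_ctx *ctx, int fail_at)
{
	int err;

	ctx->state = (fail_at == 1) ? NULL : malloc(16);
	if (!ctx->state) { err = -1; goto out; }

	ctx->dev = (fail_at == 2) ? NULL : malloc(16);
	if (!ctx->dev) { err = -2; goto free_state; }

	ctx->entry = (fail_at == 3) ? NULL : malloc(16);
	if (!ctx->entry) { err = -3; goto free_dev; }

	return 0;		/* fully initialized */

free_dev:
	free(ctx->dev);
	ctx->dev = NULL;
free_state:
	free(ctx->state);
	ctx->state = NULL;
out:
	return err;
}
```

The payoff is exactly what the patch needed: adding one more init step (`cn_proc_init_ve()`) only required one more label (`remove_proc:`) at the top of the unwind chain, with no duplication of the earlier cleanup calls.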
[Devel] [PATCH RHEL7 COMMIT] proc connector: take namespaces from VE
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit ea9dfef19a855fe11f8caab1aaee1ca8263176fe Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:41 2017 +0300 proc connector: take namespaces from VE Intead of hardcoded "init" namespaces. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 69 +++-- 1 file changed, 42 insertions(+), 27 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index b66fde8..df6553d 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -122,14 +122,15 @@ static bool fill_fork_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int unused) { struct task_struct *parent; + struct pid_namespace *pid_ns = ve->ve_ns->pid_ns; rcu_read_lock(); parent = rcu_dereference(task->real_parent); - ev->event_data.fork.parent_pid = task_pid_nr_ns(parent, &init_pid_ns); - ev->event_data.fork.parent_tgid = task_tgid_nr_ns(parent, &init_pid_ns); + ev->event_data.fork.parent_pid = task_pid_nr_ns(parent, pid_ns); + ev->event_data.fork.parent_tgid = task_tgid_nr_ns(parent, pid_ns); rcu_read_unlock(); - ev->event_data.fork.child_pid = task_pid_nr_ns(task, &init_pid_ns); - ev->event_data.fork.child_tgid = task_tgid_nr_ns(task, &init_pid_ns); + ev->event_data.fork.child_pid = task_pid_nr_ns(task, pid_ns); + ev->event_data.fork.child_tgid = task_tgid_nr_ns(task, pid_ns); return true; } @@ -141,8 +142,10 @@ void proc_fork_connector(struct task_struct *task) static bool fill_exec_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int unused) { - ev->event_data.exec.process_pid = task_pid_nr_ns(task, &init_pid_ns); - ev->event_data.exec.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); + struct pid_namespace *pid_ns = ve->ve_ns->pid_ns; + + ev->event_data.exec.process_pid = task_pid_nr_ns(task, 
pid_ns); + ev->event_data.exec.process_tgid = task_tgid_nr_ns(task, pid_ns); return true; } @@ -155,17 +158,19 @@ static bool fill_id_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int which_id) { const struct cred *cred; + struct pid_namespace *pid_ns = ve->ve_ns->pid_ns; + struct user_namespace *user_ns = ve->init_cred->user_ns; - ev->event_data.id.process_pid = task_pid_nr_ns(task, &init_pid_ns); - ev->event_data.id.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); + ev->event_data.id.process_pid = task_pid_nr_ns(task, pid_ns); + ev->event_data.id.process_tgid = task_tgid_nr_ns(task, pid_ns); rcu_read_lock(); cred = __task_cred(task); if (which_id == PROC_EVENT_UID) { - ev->event_data.id.r.ruid = from_kuid_munged(&init_user_ns, cred->uid); - ev->event_data.id.e.euid = from_kuid_munged(&init_user_ns, cred->euid); + ev->event_data.id.r.ruid = from_kuid_munged(user_ns, cred->uid); + ev->event_data.id.e.euid = from_kuid_munged(user_ns, cred->euid); } else if (which_id == PROC_EVENT_GID) { - ev->event_data.id.r.rgid = from_kgid_munged(&init_user_ns, cred->gid); - ev->event_data.id.e.egid = from_kgid_munged(&init_user_ns, cred->egid); + ev->event_data.id.r.rgid = from_kgid_munged(user_ns, cred->gid); + ev->event_data.id.e.egid = from_kgid_munged(user_ns, cred->egid); } else { rcu_read_unlock(); return false; @@ -182,8 +187,10 @@ void proc_id_connector(struct task_struct *task, int which_id) static bool fill_sid_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int unused) { - ev->event_data.sid.process_pid = task_pid_nr_ns(task, &init_pid_ns); - ev->event_data.sid.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); + struct pid_namespace *pid_ns = ve->ve_ns->pid_ns; + + ev->event_data.sid.process_pid = task_pid_nr_ns(task, pid_ns); + ev->event_data.sid.process_tgid = task_tgid_nr_ns(task, pid_ns); return true; } @@ -195,11 +202,13 @@ void proc_sid_connector(struct task_struct *task) static bool 
fill_ptrace_event(struct proc_event *ev, struct ve_struct *ve, struct task_struct *task, int ptrace_id) { - ev->event_data.ptrace.process_pid = task_pid_nr_ns(task, &init_pid_ns); - ev->event_data.ptrace.process_tgid = task_tgid_nr_ns(task, &init_pid_ns); + struct pid_namespace *pid_ns = ve->ve_ns->pid_ns; + + ev->event_data.ptrace.process_p
[Devel] [PATCH RHEL7 COMMIT] connector: introduce VE-aware get_cdev() helper
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 1dd02e8904050497fc1eb9c74485c526184679b0 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:28 2017 +0300 connector: introduce VE-aware get_cdev() helper Once containerized, device won't be one and for all. Thus make a helper template and use it instead of direct device object access. Use ve0 for now. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index da26064..407fe52 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -63,6 +63,12 @@ static int cn_already_initialized; * a new message. * */ + +static struct cn_dev *get_cdev(struct ve_struct *ve) +{ + return &cdev; +} + int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; @@ -70,7 +76,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) struct sk_buff *skb; struct nlmsghdr *nlh; struct cn_msg *data; - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); u32 group = 0; int found = 0; @@ -123,7 +129,7 @@ EXPORT_SYMBOL_GPL(cn_netlink_send); static int cn_call_callback(struct sk_buff *skb) { struct cn_callback_entry *i, *cbq = NULL; - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb)); struct netlink_skb_parms *nsp = &NETLINK_CB(skb); int err = -ENODEV; @@ -190,7 +196,7 @@ int cn_add_callback(struct cb_id *id, const char *name, struct netlink_skb_parms *)) { int err; - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); if (!cn_already_initialized) return -EAGAIN; @@ -213,7 +219,7 @@ EXPORT_SYMBOL_GPL(cn_add_callback); */ void cn_del_callback(struct cb_id 
*id) { - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); cn_queue_del_callback(dev->cbdev, id); } @@ -221,7 +227,7 @@ EXPORT_SYMBOL_GPL(cn_del_callback); static int cn_proc_show(struct seq_file *m, void *v) { - struct cn_queue_dev *dev = cdev.cbdev; + struct cn_queue_dev *dev = get_cdev(get_ve0())->cbdev; struct cn_callback_entry *cbq; seq_printf(m, "NameID\n"); @@ -255,7 +261,7 @@ static const struct file_operations cn_file_ops = { static int cn_init(void) { - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); struct netlink_kernel_cfg cfg = { .groups = CN_NETLINK_USERS + 0xf, .input = cn_rx_skb, @@ -280,7 +286,7 @@ static int cn_init(void) static void cn_fini(void) { - struct cn_dev *dev = &cdev; + struct cn_dev *dev = get_cdev(get_ve0()); cn_already_initialized = 0; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
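The `get_cdev()` helper above is a deliberate two-step refactoring: first route every caller through an accessor that already takes the future per-container argument (while still returning the global), then, in a later patch, change only the accessor body. A toy sketch of both steps, with illustrative names standing in for `cn_dev`/`ve_struct`:

```c
/* Step-wise containerization of a global singleton.  Names are
 * illustrative stand-ins for cn_dev / ve_struct / get_cdev(). */
struct toy_dev { int id; };
struct toy_ve  { struct toy_dev *dev; };	/* later: per-VE storage */

static struct toy_dev toy_global_dev = { .id = 0 };

/* Step 1 (this patch): accept the context but ignore it. */
static struct toy_dev *toy_get_dev_v1(struct toy_ve *ve)
{
	(void)ve;
	return &toy_global_dev;
}

/* Step 2 (a later patch): identical signature, per-context lookup.
 * No caller has to change between the two steps. */
static struct toy_dev *toy_get_dev_v2(struct toy_ve *ve)
{
	return ve->dev;
}
```

Because the signature is fixed from the start, the churn of touching every call site happens once, in a patch that is trivially correct (behavior unchanged), and the behavioral change later is a one-line diff.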
[Devel] [PATCH RHEL7 COMMIT] proc connector: use per-ve netlink sender helper
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit df6a3526acfae69476e008569e659ac52374950c Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:42 2017 +0300 proc connector: use per-ve netlink sender helper Required to send event in the network to the right listener. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index df6553d..17e0247 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -115,7 +115,7 @@ static void proc_event_connector(struct task_struct *task, return; /* If cn_netlink_send() failed, the data is not sent */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + cn_netlink_send_ve(ve, msg, CN_IDX_PROC, GFP_KERNEL); } static bool fill_fork_event(struct proc_event *ev, struct ve_struct *ve, @@ -302,7 +302,7 @@ static void cn_proc_ack(struct ve_struct *ve, int err, int rcvd_seq, int rcvd_ac msg->ack = rcvd_ack + 1; msg->len = sizeof(*ev); msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + cn_netlink_send_ve(ve, msg, CN_IDX_PROC, GFP_KERNEL); } /** ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
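The `cn_netlink_send()` / `cn_netlink_send_ve()` split used above follows a common pattern: the context-aware variant does the real work, and the legacy entry point becomes a thin wrapper that supplies the host (VE0) context, so existing callers keep compiling and behaving as before. A hedged userspace sketch (all names are toy stand-ins; the "send" just encodes which container saw the message so the routing is observable):

```c
/* Wrapper-with-default-context pattern behind cn_netlink_send{,_ve}().
 * Toy stand-ins only: "sending" returns veid * 1000 + msg so a test
 * can observe which context handled the message. */
struct toy_ve { int veid; };

static struct toy_ve toy_ve0 = { .veid = 0 };	/* host context */

static int toy_send_ve(struct toy_ve *ve, int msg)
{
	return ve->veid * 1000 + msg;	/* kernel: netlink_broadcast() */
}

/* Legacy API: unchanged signature, always targets the host. */
static int toy_send(int msg)
{
	return toy_send_ve(&toy_ve0, msg);
}
```

Call sites that genuinely care about the container (like `proc_event_connector()` in this patch) migrate to the `_ve` variant; everything else stays on the wrapper untouched.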
[Devel] [PATCH RHEL7 COMMIT] proc connector: add per-ve init and fini routines

The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5

--> commit ed6801f36adefd236c8d87418518763e876fb1ad
Author: Stanislav Kinsburskiy
Date: Thu Aug 31 17:40:38 2017 +0300

    proc connector: add per-ve init and fini routines

    These routines will be called from the main connector per-ve init and
    fini routines.

    Signed-off-by: Stanislav Kinsburskiy
    Reviewed-by: Andrey Ryabinin
---
 drivers/connector/cn_proc.c | 17 +++++++++++++++++
 include/linux/connector.h   |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 17a8c8c..8998335 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -329,6 +329,23 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg,
 	cn_proc_ack(err, msg->seq, msg->ack);
 }

+int cn_proc_init_ve(struct ve_struct *ve)
+{
+	int err = cn_add_callback_ve(ve, &cn_proc_event_id,
+				     "cn_proc",
+				     &cn_proc_mcast_ctl);
+	if (err) {
+		pr_warn("VE#%d: cn_proc failed to register\n", ve->veid);
+		return err;
+	}
+	return 0;
+}
+
+void cn_proc_fini_ve(struct ve_struct *ve)
+{
+	cn_del_callback_ve(ve, &cn_proc_event_id);
+}
+
 /*
  * cn_proc_init - initialization entry point
  *
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 8b44bf0..60eb089 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -76,6 +76,9 @@ struct cn_private {
 };

+int cn_proc_init_ve(struct ve_struct *ve);
+void cn_proc_fini_ve(struct ve_struct *ve);
+
 int cn_add_callback_ve(struct ve_struct *ve, struct cb_id *id,
 		       const char *name,
 		       void (*callback)(struct cn_msg *,
[Devel] [PATCH RHEL7 COMMIT] connector: remove redundant input callback from cn_dev
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit b36f8c16abd69f33268c7b57613f529252a28075 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:27 2017 +0300 connector: remove redundant input callback from cn_dev Patchset description: proc connector: containerize on per-VE basis This feature is requested by customer and needed by cgred service. https://jira.sw.ru/browse/PSBM-60227 What's ne in v2: 1) Containerization is done on per-VE basis 2) Event in container is also sent to VE#0 Stanislav Kinsburskiy (27): connector: remove redundant input callback from cn_dev connector: store all private data on VE structure connector: introduce VE-aware get_cdev() helper connector: per-ve init and fini helpers introduced connector: use device stored in VE connector: per-ve helpers intoruduced connector: take cn_already_initialized from VE proc connector: generic proc_event_connector() helper introduced proc connector: use generic event helper for fork event proc connector: use generic event helper for exec event proc connector: use generic event helper for id event proc connector: use generic event helper for sid event proc connector: use generic event helper for ptrace event proc connector: use generic event helper for comm event proc connector: use generic event helper for coredump event proc connector: use generic event helper for exit event proc connector: add pid namespace awareness proc connector: add per-ve init and fini foutines proc connector: call proc-related init and fini routines explicitly proc connector: take number of listeners and per-cpu conters from VE proc connector: pass VE to event fillers proc connector: take namespaces from VE proc connector: use per-ve netlink sender helper proc connector: send events to both VEs if not in VE#0 connector: containerize "connector" proc entry connector: take VE from socket upon callback connector: 
add VE SS hook = This patch description: A small cleanup: this callback is never used. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 6 +- include/linux/connector.h | 1 - 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 0daa11e..da26064 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -253,16 +253,12 @@ static const struct file_operations cn_file_ops = { .release = single_release }; -static struct cn_dev cdev = { - .input = cn_rx_skb, -}; - static int cn_init(void) { struct cn_dev *dev = &cdev; struct netlink_kernel_cfg cfg = { .groups = CN_NETLINK_USERS + 0xf, - .input = dev->input, + .input = cn_rx_skb, }; dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg); diff --git a/include/linux/connector.h b/include/linux/connector.h index b2b5a41..4c4d2b9 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -63,7 +63,6 @@ struct cn_dev { u32 seq, groups; struct sock *nls; - void (*input) (struct sk_buff *skb); struct cn_queue_dev *cbdev; }; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: generic proc_event_connector() helper introduced
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit b4a281062d0770311132bc2a19b6797f63abe161 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:32 2017 +0300 proc connector: generic proc_event_connector() helper introduced A lot of code is duplicated in proc connector events handling. This patch introduces generic even handler, which will be used by different events. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 50 + 1 file changed, 50 insertions(+) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 3165811..808b22a 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -64,6 +64,54 @@ static inline void get_seq(__u32 *ts, int *cpu) preempt_enable(); } +static struct cn_msg *cn_msg_fill(__u8 *buffer, + struct task_struct *task, + int what, int cookie, + bool (*fill_event)(struct proc_event *ev, +struct task_struct *task, +int cookie)) +{ + struct cn_msg *msg; + struct proc_event *ev; + struct timespec ts; + + msg = buffer_to_cn_msg(buffer); + ev = (struct proc_event *)msg->data; + + get_seq(&msg->seq, &ev->cpu); + memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); + msg->ack = 0; /* not used */ + msg->len = sizeof(*ev); + msg->flags = 0; /* not used */ + + memset(&ev->event_data, 0, sizeof(ev->event_data)); + ktime_get_ts(&ts); /* get high res monotonic timestamp */ + ev->timestamp_ns = timespec_to_ns(&ts); + ev->what = what; + + return fill_event(ev, task, cookie) ? 
msg : NULL; +} + +static void proc_event_connector(struct task_struct *task, +int what, int cookie, +bool (*fill_event)(struct proc_event *ev, + struct task_struct *task, + int cookie)) +{ + struct cn_msg *msg; + __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); + + if (atomic_read(&proc_event_num_listeners) < 1) + return; + + msg = cn_msg_fill(buffer, task, what, cookie, fill_event); + if (!msg) + return; + + /* If cn_netlink_send() failed, the data is not sent */ + cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +} + void proc_fork_connector(struct task_struct *task) { struct cn_msg *msg; @@ -72,6 +120,8 @@ void proc_fork_connector(struct task_struct *task) struct timespec ts; struct task_struct *parent; + (void) proc_event_connector; + if (atomic_read(&proc_event_num_listeners) < 1) return; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
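The `proc_event_connector()` helper introduced above removes the copy-pasted header setup from every event path by taking a per-event "filler" callback: the common code builds the message skeleton, the callback contributes only the type-specific payload, and returning `false` vetoes the event (as the id filler does for unknown id kinds). A compact userspace sketch of that callback-based deduplication, with toy types standing in for `cn_msg`/`proc_event`:

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for struct proc_event. */
struct toy_event { int what; int pid; int cookie; };

typedef bool (*toy_fill_fn)(struct toy_event *ev, int pid, int cookie);

/* One filler per event type; this one mimics an exit-style event. */
static bool toy_fill_exit(struct toy_event *ev, int pid, int cookie)
{
	ev->pid = pid;
	ev->cookie = cookie;	/* e.g. the exit code */
	return true;
}

/* Common path: builds the event skeleton, delegates the type-specific
 * part, and only "sends" (here: returns 0) if the filler agreed. */
static int toy_emit_event(int what, int pid, int cookie,
			  toy_fill_fn fill, struct toy_event *out)
{
	memset(out, 0, sizeof(*out));
	out->what = what;
	if (!fill(out, pid, cookie))
		return -1;	/* filler vetoed the event */
	return 0;		/* kernel: cn_netlink_send() here */
}
```

The temporary `(void) proc_event_connector;` cast in the patch exists only to silence the unused-function warning until the follow-up patches convert each event path to the helper.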
[Devel] [PATCH RHEL7 COMMIT] connector: containerize "connector" proc entry
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5

--> commit 2100a680437f0c26d65ab4e304cc274399ccbcf3
Author: Stanislav Kinsburskiy
Date: Thu Aug 31 17:40:43 2017 +0300

    connector: containerize "connector" proc entry

    Needed to expose "/proc/net/connector" in CT and show the right content.

    Signed-off-by: Stanislav Kinsburskiy
    Reviewed-by: Andrey Ryabinin
---
 drivers/connector/connector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 59d81a3..771dadf 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -241,7 +241,7 @@ EXPORT_SYMBOL_GPL(cn_del_callback);
 static int cn_proc_show(struct seq_file *m, void *v)
 {
-	struct cn_queue_dev *dev = get_cdev(get_ve0())->cbdev;
+	struct cn_queue_dev *dev = get_cdev(get_exec_env())->cbdev;
 	struct cn_callback_entry *cbq;

 	seq_printf(m, "NameID\n");
@@ -303,7 +303,7 @@ static int cn_init_ve(struct ve_struct *ve)

 	ve->cn->cn_already_initialized = 1;

-	if (!proc_create("connector", S_IRUGO, net->proc_net, &cn_file_ops)) {
+	if (!proc_net_create("connector", S_IRUGO, net->proc_net, &cn_file_ops)) {
 		err = -ENOMEM;
 		goto free_cdev;
 	}
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for id event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 34e9dc939d3adba763404a3e97b38a4255dd1e02 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:34 2017 +0300 proc connector: use generic event helper for id event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 29 - 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 06fd6b3..0647fcf 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -145,21 +145,11 @@ void proc_exec_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_EXEC, 0, fill_exec_event); } -void proc_id_connector(struct task_struct *task, int which_id) +static bool fill_id_event(struct proc_event *ev, struct task_struct *task, + int which_id) { - struct cn_msg *msg; - struct proc_event *ev; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - struct timespec ts; const struct cred *cred; - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - ev->what = which_id; ev->event_data.id.process_pid = task->pid; ev->event_data.id.process_tgid = task->tgid; rcu_read_lock(); @@ -172,18 +162,15 @@ void proc_id_connector(struct task_struct *task, int which_id) ev->event_data.id.e.egid = from_kgid_munged(&init_user_ns, cred->egid); } else { rcu_read_unlock(); - return; + return false; } rcu_read_unlock(); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void 
proc_id_connector(struct task_struct *task, int which_id) +{ + proc_event_connector(task, which_id, which_id, fill_id_event); } void proc_sid_connector(struct task_struct *task) ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] connector: per-ve helpers introduced
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 596e20e4cfc9660a390027c3d5b5d2d9fc61b203 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:30 2017 +0300 connector: per-ve helpers intoruduced This is precursor patch. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 48 +-- include/linux/connector.h | 7 +++ 2 files changed, 40 insertions(+), 15 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index bc2308a..bba667d 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -67,14 +67,14 @@ static struct cn_dev *get_cdev(struct ve_struct *ve) return &ve->cn->cdev; } -int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) +int cn_netlink_send_ve(struct ve_struct *ve, struct cn_msg *msg, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; unsigned int size; struct sk_buff *skb; struct nlmsghdr *nlh; struct cn_msg *data; - struct cn_dev *dev = get_cdev(get_ve0()); + struct cn_dev *dev = get_cdev(ve); u32 group = 0; int found = 0; @@ -119,6 +119,11 @@ int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) return netlink_broadcast(dev->nls, skb, 0, group, gfp_mask); } + +int cn_netlink_send(struct cn_msg *msg, u32 __group, gfp_t gfp_mask) +{ + return cn_netlink_send_ve(get_ve0(), msg, __group, gfp_mask); +} EXPORT_SYMBOL_GPL(cn_netlink_send); /* @@ -183,18 +188,13 @@ static void cn_rx_skb(struct sk_buff *__skb) } } -/* - * Callback add routing - adds callback with given ID and name. - * If there is registered callback with the same ID it will not be added. - * - * May sleep. 
- */ -int cn_add_callback(struct cb_id *id, const char *name, - void (*callback)(struct cn_msg *, -struct netlink_skb_parms *)) +int cn_add_callback_ve(struct ve_struct *ve, + struct cb_id *id, const char *name, + void (*callback)(struct cn_msg *, + struct netlink_skb_parms *)) { int err; - struct cn_dev *dev = get_cdev(get_ve0()); + struct cn_dev *dev = get_cdev(ve); if (!cn_already_initialized) return -EAGAIN; @@ -205,8 +205,28 @@ int cn_add_callback(struct cb_id *id, const char *name, return 0; } + +/* + * Callback add routing - adds callback with given ID and name. + * If there is registered callback with the same ID it will not be added. + * + * May sleep. + */ +int cn_add_callback(struct cb_id *id, const char *name, + void (*callback)(struct cn_msg *, +struct netlink_skb_parms *)) +{ + return cn_add_callback_ve(get_ve0(), id, name, callback); +} EXPORT_SYMBOL_GPL(cn_add_callback); +void cn_del_callback_ve(struct ve_struct *ve, struct cb_id *id) +{ + struct cn_dev *dev = get_cdev(ve); + + cn_queue_del_callback(dev->cbdev, id); +} + /* * Callback remove routing - removes callback * with given ID. 
@@ -217,9 +237,7 @@ EXPORT_SYMBOL_GPL(cn_add_callback); */ void cn_del_callback(struct cb_id *id) { - struct cn_dev *dev = get_cdev(get_ve0()); - - cn_queue_del_callback(dev->cbdev, id); + cn_del_callback_ve(get_ve0(), id); } EXPORT_SYMBOL_GPL(cn_del_callback); diff --git a/include/linux/connector.h b/include/linux/connector.h index 9e05e28..8b44bf0 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -76,6 +76,13 @@ struct cn_private { }; +int cn_add_callback_ve(struct ve_struct *ve, + struct cb_id *id, const char *name, + void (*callback)(struct cn_msg *, + struct netlink_skb_parms *)); +void cn_del_callback_ve(struct ve_struct *ve, struct cb_id *id); +int cn_netlink_send_ve(struct ve_struct *ve, struct cn_msg *, u32, gfp_t); + int cn_add_callback(struct cb_id *id, const char *name, void (*callback)(struct cn_msg *, struct netlink_skb_parms *)); void cn_del_callback(struct cb_id *); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for exit event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 72677a7d7de095a6c32f7f1c41e32fc3173337fd Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:37 2017 +0300 proc connector: use generic event helper for exit event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 28 +++- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 312f30f..4ee1640 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -235,33 +235,19 @@ void proc_coredump_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_COREDUMP, 0, fill_coredump_event); } -void proc_exit_connector(struct task_struct *task) +static bool fill_exit_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - struct timespec ts; - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_EXIT; ev->event_data.exit.process_pid = task->pid; ev->event_data.exit.process_tgid = task->tgid; ev->event_data.exit.exit_code = task->exit_code; ev->event_data.exit.exit_signal = task->exit_signal; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_exit_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_EXIT, 0, fill_exit_event); } /* ___ Devel mailing list 
Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for sid event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 9de4dc2591367ad8b7276ba1b8c723cb9960e9e9 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:34 2017 +0300 proc connector: use generic event helper for sid event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 28 +++- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 0647fcf..2ad2587 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -173,31 +173,17 @@ void proc_id_connector(struct task_struct *task, int which_id) proc_event_connector(task, which_id, which_id, fill_id_event); } -void proc_sid_connector(struct task_struct *task) +static bool fill_sid_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - struct timespec ts; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_SID; ev->event_data.sid.process_pid = task->pid; ev->event_data.sid.process_tgid = task->tgid; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_sid_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_SID, 0, fill_sid_event); } void proc_ptrace_connector(struct task_struct *task, int ptrace_id) ___ Devel mailing list Devel@openvz.org 
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for ptrace event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit c4c0ba8521053013532de1e7db5ec3b5d27276c2 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:35 2017 +0300 proc connector: use generic event helper for ptrace event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 31 +-- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 2ad2587..36a53fd 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -186,23 +186,9 @@ void proc_sid_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_SID, 0, fill_sid_event); } -void proc_ptrace_connector(struct task_struct *task, int ptrace_id) +static bool fill_ptrace_event(struct proc_event *ev, struct task_struct *task, + int ptrace_id) { - struct cn_msg *msg; - struct proc_event *ev; - struct timespec ts; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_PTRACE; ev->event_data.ptrace.process_pid = task->pid; ev->event_data.ptrace.process_tgid = task->tgid; if (ptrace_id == PTRACE_ATTACH) { @@ -212,13 +198,14 @@ void proc_ptrace_connector(struct task_struct *task, int ptrace_id) ev->event_data.ptrace.tracer_pid = 0; ev->event_data.ptrace.tracer_tgid = 0; } else - return; + return false; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void 
proc_ptrace_connector(struct task_struct *task, int ptrace_id) +{ + proc_event_connector(task, PROC_EVENT_PTRACE, ptrace_id, +fill_ptrace_event); } void proc_comm_connector(struct task_struct *task) ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for exec event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit ea2114f455580db5ab66460c31c19efbb7f716b2 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:33 2017 +0300 proc connector: use generic event helper for exec event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 28 +++- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index ffda79b..06fd6b3 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -132,31 +132,17 @@ void proc_fork_connector(struct task_struct *task) proc_event_connector(task, PROC_EVENT_FORK, 0, fill_fork_event); } -void proc_exec_connector(struct task_struct *task) +static bool fill_exec_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - struct timespec ts; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_EXEC; ev->event_data.exec.process_pid = task->pid; ev->event_data.exec.process_tgid = task->tgid; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_exec_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_EXEC, 0, fill_exec_event); } void proc_id_connector(struct task_struct *task, int which_id) ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] connector: take cn_already_initialized from VE
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit a66190eb61ac389d0060e3cff22f76cff0bf4c3d Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:31 2017 +0300 connector: take cn_already_initialized from VE Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index bba667d..110637b 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -38,8 +38,6 @@ MODULE_AUTHOR("Evgeniy Polyakov "); MODULE_DESCRIPTION("Generic userspace <-> kernelspace connector."); MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_CONNECTOR); -static int cn_already_initialized; - /* * msg->seq and msg->ack are used to determine message genealogy. * When someone sends message it puts there locally unique sequence @@ -196,7 +194,7 @@ int cn_add_callback_ve(struct ve_struct *ve, int err; struct cn_dev *dev = get_cdev(ve); - if (!cn_already_initialized) + if (!ve->cn->cn_already_initialized) return -EAGAIN; err = cn_queue_add_callback(dev->cbdev, name, id, callback); @@ -300,7 +298,7 @@ static int cn_init_ve(struct ve_struct *ve) return -EINVAL; } - cn_already_initialized = 1; + ve->cn->cn_already_initialized = 1; proc_create("connector", S_IRUGO, net->proc_net, &cn_file_ops); @@ -312,7 +310,7 @@ static void cn_fini_ve(struct ve_struct *ve) struct cn_dev *dev = get_cdev(ve); struct net *net = ve->ve_netns; - cn_already_initialized = 0; + ve->cn->cn_already_initialized = 0; remove_proc_entry("connector", net->proc_net); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] proc connector: use generic event helper for fork event
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit b9b0ba3dfa697a80078cbef06b13caf3c14ec249 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:32 2017 +0300 proc connector: use generic event helper for fork event Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/cn_proc.c | 30 +++--- 1 file changed, 7 insertions(+), 23 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index 808b22a..ffda79b 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -112,26 +112,11 @@ static void proc_event_connector(struct task_struct *task, cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); } -void proc_fork_connector(struct task_struct *task) +static bool fill_fork_event(struct proc_event *ev, struct task_struct *task, + int unused) { - struct cn_msg *msg; - struct proc_event *ev; - __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - struct timespec ts; struct task_struct *parent; - (void) proc_event_connector; - - if (atomic_read(&proc_event_num_listeners) < 1) - return; - - msg = buffer_to_cn_msg(buffer); - ev = (struct proc_event *)msg->data; - memset(&ev->event_data, 0, sizeof(ev->event_data)); - get_seq(&msg->seq, &ev->cpu); - ktime_get_ts(&ts); /* get high res monotonic timestamp */ - ev->timestamp_ns = timespec_to_ns(&ts); - ev->what = PROC_EVENT_FORK; rcu_read_lock(); parent = rcu_dereference(task->real_parent); ev->event_data.fork.parent_pid = parent->pid; @@ -139,13 +124,12 @@ void proc_fork_connector(struct task_struct *task) rcu_read_unlock(); ev->event_data.fork.child_pid = task->pid; ev->event_data.fork.child_tgid = task->tgid; + return true; +} - memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); - msg->ack = 0; /* not used */ - msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ - /* If cn_netlink_send() failed, the data is not sent */ - 
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); +void proc_fork_connector(struct task_struct *task) +{ + proc_event_connector(task, PROC_EVENT_FORK, 0, fill_fork_event); } void proc_exec_connector(struct task_struct *task) ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] connector: per-ve init and fini helpers introduced
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 37c6a11416ce88290d381482e0b8bf568dc59e97 Author: Stanislav Kinsburskiy Date: Thu Aug 31 17:40:29 2017 +0300 connector: per-ve init and fini helpers introduced This helpers will be used later to initialize per-container connector. Signed-off-by: Stanislav Kinsburskiy Reviewed-by: Andrey Ryabinin --- drivers/connector/connector.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 407fe52..f5484b2 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -259,15 +259,20 @@ static const struct file_operations cn_file_ops = { .release = single_release }; -static int cn_init(void) +static int cn_init_ve(struct ve_struct *ve) { struct cn_dev *dev = get_cdev(get_ve0()); struct netlink_kernel_cfg cfg = { .groups = CN_NETLINK_USERS + 0xf, .input = cn_rx_skb, }; + struct net *net = ve->ve_netns; + + ve->cn = kzalloc(sizeof(*ve->cn), GFP_KERNEL); + if (!ve->cn) + return -ENOMEM; - dev->nls = netlink_kernel_create(&init_net, NETLINK_CONNECTOR, &cfg); + dev->nls = netlink_kernel_create(net, NETLINK_CONNECTOR, &cfg); if (!dev->nls) return -EIO; @@ -279,21 +284,35 @@ static int cn_init(void) cn_already_initialized = 1; - proc_create("connector", S_IRUGO, init_net.proc_net, &cn_file_ops); + proc_create("connector", S_IRUGO, net->proc_net, &cn_file_ops); return 0; } -static void cn_fini(void) +static void cn_fini_ve(struct ve_struct *ve) { - struct cn_dev *dev = get_cdev(get_ve0()); + struct cn_dev *dev = get_cdev(ve); + struct net *net = ve->ve_netns; cn_already_initialized = 0; - remove_proc_entry("connector", init_net.proc_net); + remove_proc_entry("connector", net->proc_net); cn_queue_free_dev(dev->cbdev); netlink_kernel_release(dev->nls); + + kfree(ve->cn); + ve->cn = NULL; +} + +static int 
cn_init(void) +{ + return cn_init_ve(get_ve0()); +} + +static void cn_fini(void) +{ + return cn_fini_ve(get_ve0()); } subsys_initcall(cn_init); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [RFC PATCH 2/2] autofs: send 32-bit sized packet for 32-bit process
The structure autofs_v5_packet (except name) is not aligned by 8 bytes, which leads to different sizes on 32- and 64-bit architectures. Let's form a 32-bit compatible packet when the daemon uses 32-bit addressing.

Signed-off-by: Stanislav Kinsburskiy
---
 fs/autofs4/waitq.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 309ca6b..484cf2e 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -153,12 +153,19 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 {
 	struct autofs_v5_packet *packet = &pkt.v5_pkt.v5_packet;
 	struct user_namespace *user_ns = sbi->pipe->f_cred->user_ns;
+	size_t name_offset;

-	pktsz = sizeof(*packet);
+	if (sbi->is32bit)
+		name_offset = offsetof(struct autofs_v5_packet, len) +
+				sizeof(packet->len);
+	else
+		name_offset = offsetof(struct autofs_v5_packet, name);
+
+	pktsz = name_offset + sizeof(packet->name);

 	packet->wait_queue_token = wq->wait_queue_token;
 	packet->len = wq->name.len;
-	memcpy(packet->name, wq->name.name, wq->name.len);
+	memcpy(packet + name_offset, wq->name.name, wq->name.len);
 	packet->name[wq->name.len] = '\0';
 	packet->dev = wq->dev;
 	packet->ino = wq->ino;

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [RFC PATCH 1/2] autofs: set compat flag on sbi when daemon uses 32-bit addressing
Signed-off-by: Stanislav Kinsburskiy
---
 fs/autofs4/inode.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index b23cf2a..989ac38 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -217,6 +217,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 	int pgrp;
 	bool pgrp_set = false;
 	int ret = -EINVAL;
+	struct task_struct *tsk;

 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 	if (!sbi)
@@ -281,10 +282,25 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 				pgrp);
 			goto fail_dput;
 		}
+		tsk = get_pid_task(sbi->oz_pgrp, PIDTYPE_PGID);
+		if (!tsk) {
+			pr_warn("autofs: could not find process group leader %d\n",
+				pgrp);
+			goto fail_put_pid;
+		}
 	} else {
 		sbi->oz_pgrp = get_task_pid(current, PIDTYPE_PGID);
+		get_task_struct(current);
+		tsk = current;
 	}

+	if (test_tsk_thread_flag(tsk, TIF_ADDR32))
+		sbi->is32bit = 1;
+	else
+		sbi->is32bit = 0;
+
+	put_task_struct(tsk);
+
 	if (autofs_type_trigger(sbi->type))
 		__managed_dentry_set_managed(root);

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
[Devel] [RFC PATCH 0/2] autofs: add "compat" support
The idea is simple: reduce autofs_v5_packet for a 32-bit daemon on 64-bit architectures.

---

Stanislav Kinsburskiy (2):
      autofs: set compat flag on sbi when daemon uses 32-bit addressing
      autofs: send 32-bit sized packet for 32-bit process

 fs/autofs4/inode.c | 16 
 fs/autofs4/waitq.c | 11 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

--

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH rh7] tswap: Add support for zero-filled pages
On 08/03/2017 12:54 PM, Kirill Tkhai wrote:
>  static int tswap_frontswap_store(unsigned type, pgoff_t offset,
>  			struct page *page)
>  {
>  	swp_entry_t entry = swp_entry(type, offset);
> +	int zero_filled = -1, err = 0;
>  	struct page *cache_page;
> -	int err = 0;
>
>  	if (!tswap_active)
>  		return -1;
>
>  	cache_page = tswap_lookup_page(entry);
> -	if (cache_page)
> -		goto copy;
> +	if (cache_page) {
> +		zero_filled = is_zero_filled_page(page);
> +		/* If type of page has not changed, just reuse it */
> +		if (zero_filled == (cache_page == ZERO_PAGE(0)))
> +			goto copy;
> +		tswap_delete_page(entry, NULL);
> +		put_page(cache_page);

I think if we race with tswap_frontswap_load() this will lead to double put_page().

> +	}
>
>  	if (!(current->flags & PF_MEMCG_RECLAIM))
>  		return -1;

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
31.08.2017 15:05, Dmitry V. Levin пишет: > On Thu, Aug 31, 2017 at 02:40:23PM +0300, Dmitry V. Levin wrote: >> On Thu, Aug 31, 2017 at 01:48:27PM +0300, Stanislav Kinsburskiy wrote: >>> >>> >>> 31.08.2017 13:38, Dmitry V. Levin пишет: On Thu, Aug 31, 2017 at 02:11:34PM +0400, Stanislav Kinsburskiy wrote: > Due to integer variables alignment size of struct autofs_v5_packet in 300 > bytes in 32-bit architectures (instead of 304 bytes in 64-bits > architectures). > > This may lead to memory corruption (64 bits kernel always send 304 bytes, > while 32-bit userspace application expects for 300). > > https://jira.sw.ru/browse/PSBM-71078 > > Signed-off-by: Stanislav Kinsburskiy > --- > include/uapi/linux/auto_fs4.h |2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h > index e02982f..8729a47 100644 > --- a/include/uapi/linux/auto_fs4.h > +++ b/include/uapi/linux/auto_fs4.h > @@ -137,6 +137,8 @@ struct autofs_v5_packet { > __u32 pid; > __u32 tgid; > __u32 len; > + __u32 blob; /* This is needed to align structure up to 8 > +bytes for ALL archs including 32-bit */ > char name[NAME_MAX+1]; > }; This change breaks ABI because it changes offsetof(struct autofs_v5_packet, name). If you need to fix the alignment, use __attribute__((aligned(8))). >>> >>> Nice to know you're watching. >>> Yes, attribute is better. >>> But how ABI is broken? On x86_64 this alignment is implied, so nothing is >>> changed. >> >> Your change increases offsetof(struct autofs_v5_packet, name) by 4 on all >> architectures. On architectures where the structure is 32-bit aligned >> this also leads to increase of its size by 4. >> An alignment change would also be an ABI breakage on 32-bit architectures, though. >>> >>> True. >>> But from my POW better have it working on 64bit archs for 32bit apps. >>> But anyway, upstream guys will device, whether they want 32-bit autofs >>> applications properly work on 64 or 32 bits. 
>> >> Let's fix old bugs without introducing new bugs. >> The right fix here seems to be a compat structure, that is, both 64-bit >> and 32-bit kernels should send the same 32-bit aligned structure, and >> it has to be the same structure sent by traditional 32-bit kernels. > > Alternatively, a much more simple fix would be to change 64-bit kernels > not to send the trailing 4 padding bytes of 64-bit aligned > struct autofs_v5_packet. That is, just send > offsetofend(struct autofs_v5_packet, name) bytes instead of > sizeof(struct autofs_v5_packet) regardless of architecture. > Fair enough, thanks! But this approach won't work, because autofs pipe has O_DIRECT flag. Compat structure looks more promising, but not yet clear to me, how to define one properly. Probably by replacing "len" with char array like this: /* autofs v5 common packet struct */ struct autofs_v5_packet { struct autofs_packet_hdr hdr; autofs_wqt_t wait_queue_token; __u32 dev; __u64 ino; __u32 uid; __u32 gid; __u32 pid; __u32 tgid; __u8 len[4]; char name[NAME_MAX+1]; }; What do you think? ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH rh7 2/2] mm/memcg: reclaim only kmem if kmem limit reached.
On 08/31/2017 12:58 PM, Konstantin Khorenko wrote:
> Do we want to push it to mainstream as well?
>

I don't think so. Distributions are slowly moving towards the v2 cgroup, where the kmem limit simply doesn't exist. And for legacy cgroup v1, the lack of reclaim on kmem limit hit wasn't a mistake but a deliberate choice. There is no clear use case for this, and it adds a lot of complexity to the reclaim code and just looks a bit ugly.

> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
On Thu, Aug 31, 2017 at 02:40:23PM +0300, Dmitry V. Levin wrote: > On Thu, Aug 31, 2017 at 01:48:27PM +0300, Stanislav Kinsburskiy wrote: > > > > > > 31.08.2017 13:38, Dmitry V. Levin пишет: > > > On Thu, Aug 31, 2017 at 02:11:34PM +0400, Stanislav Kinsburskiy wrote: > > >> Due to integer variables alignment size of struct autofs_v5_packet in 300 > > >> bytes in 32-bit architectures (instead of 304 bytes in 64-bits > > >> architectures). > > >> > > >> This may lead to memory corruption (64 bits kernel always send 304 bytes, > > >> while 32-bit userspace application expects for 300). > > >> > > >> https://jira.sw.ru/browse/PSBM-71078 > > >> > > >> Signed-off-by: Stanislav Kinsburskiy > > >> --- > > >> include/uapi/linux/auto_fs4.h |2 ++ > > >> 1 file changed, 2 insertions(+) > > >> > > >> diff --git a/include/uapi/linux/auto_fs4.h > > >> b/include/uapi/linux/auto_fs4.h > > >> index e02982f..8729a47 100644 > > >> --- a/include/uapi/linux/auto_fs4.h > > >> +++ b/include/uapi/linux/auto_fs4.h > > >> @@ -137,6 +137,8 @@ struct autofs_v5_packet { > > >> __u32 pid; > > >> __u32 tgid; > > >> __u32 len; > > >> +__u32 blob; /* This is needed to align structure up > > >> to 8 > > >> + bytes for ALL archs including 32-bit > > >> */ > > >> char name[NAME_MAX+1]; > > >> }; > > > > > > This change breaks ABI because it changes offsetof(struct > > > autofs_v5_packet, name). > > > If you need to fix the alignment, use __attribute__((aligned(8))). > > > > > > > Nice to know you're watching. > > Yes, attribute is better. > > But how ABI is broken? On x86_64 this alignment is implied, so nothing is > > changed. > > Your change increases offsetof(struct autofs_v5_packet, name) by 4 on all > architectures. On architectures where the structure is 32-bit aligned > this also leads to increase of its size by 4. > > > > An alignment change would also be an ABI breakage on 32-bit architectures, > > > though. > > > > > > > True. 
> > But from my POW better have it working on 64bit archs for 32bit apps. > > But anyway, upstream guys will device, whether they want 32-bit autofs > > applications properly work on 64 or 32 bits. > > Let's fix old bugs without introducing new bugs. > The right fix here seems to be a compat structure, that is, both 64-bit > and 32-bit kernels should send the same 32-bit aligned structure, and > it has to be the same structure sent by traditional 32-bit kernels. Alternatively, a much more simple fix would be to change 64-bit kernels not to send the trailing 4 padding bytes of 64-bit aligned struct autofs_v5_packet. That is, just send offsetofend(struct autofs_v5_packet, name) bytes instead of sizeof(struct autofs_v5_packet) regardless of architecture. -- ldv signature.asc Description: PGP signature ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH rh7] tswap: Add support for zero-filled pages
Andrey, please review the patch. -- Best regards, Konstantin Khorenko, Virtuozzo Linux Kernel Team On 08/03/2017 12:54 PM, Kirill Tkhai wrote: This patch makes tswap to do not allocate a new page, if swapped page is zero-filled, and to use ZERO_PAGE() pointer to decode it instead. The same optimization is made in zram, and it may help VMs to reduce memory usage in some way. Signed-off-by: Kirill Tkhai --- mm/tswap.c | 65 +++- 1 file changed, 51 insertions(+), 14 deletions(-) diff --git a/mm/tswap.c b/mm/tswap.c index 15f5adc2dc9..6a3cb917059 100644 --- a/mm/tswap.c +++ b/mm/tswap.c @@ -54,16 +54,20 @@ static void tswap_lru_add(struct page *page) { struct tswap_lru *lru = &tswap_lru_node[page_to_nid(page)]; - list_add_tail(&page->lru, &lru->list); - lru->nr_items++; + if (page != ZERO_PAGE(0)) { + list_add_tail(&page->lru, &lru->list); + lru->nr_items++; + } } static void tswap_lru_del(struct page *page) { struct tswap_lru *lru = &tswap_lru_node[page_to_nid(page)]; - list_del(&page->lru); - lru->nr_items--; + if (page != ZERO_PAGE(0)) { + list_del(&page->lru); + lru->nr_items--; + } } static struct page *tswap_lookup_page(swp_entry_t entry) @@ -73,7 +77,7 @@ static struct page *tswap_lookup_page(swp_entry_t entry) spin_lock(&tswap_lock); page = radix_tree_lookup(&tswap_page_tree, entry.val); spin_unlock(&tswap_lock); - BUG_ON(page && page_private(page) != entry.val); + BUG_ON(page && page != ZERO_PAGE(0) && page_private(page) != entry.val); return page; } @@ -85,7 +89,8 @@ static int tswap_insert_page(swp_entry_t entry, struct page *page) if (err) return err; - set_page_private(page, entry.val); + if (page != ZERO_PAGE(0)) + set_page_private(page, entry.val); spin_lock(&tswap_lock); err = radix_tree_insert(&tswap_page_tree, entry.val, page); if (!err) { @@ -111,7 +116,7 @@ static struct page *tswap_delete_page(swp_entry_t entry, struct page *expected) spin_unlock(&tswap_lock); if (page) { BUG_ON(expected && page != expected); - BUG_ON(page_private(page) != 
entry.val); + BUG_ON(page_private(page) != entry.val && page != ZERO_PAGE(0)); } return page; } @@ -274,26 +279,57 @@ static void tswap_frontswap_init(unsigned type) */ } +static bool is_zero_filled_page(struct page *page) +{ + bool zero_filled = true; + unsigned long *v; + int i; + + v = kmap_atomic(page); + for (i = 0; i < PAGE_SIZE / sizeof(*v); i++) { + if (v[i] != 0) { + zero_filled = false; + break; + } + } + kunmap_atomic(v); + return zero_filled; +} + static int tswap_frontswap_store(unsigned type, pgoff_t offset, struct page *page) { swp_entry_t entry = swp_entry(type, offset); + int zero_filled = -1, err = 0; struct page *cache_page; - int err = 0; if (!tswap_active) return -1; cache_page = tswap_lookup_page(entry); - if (cache_page) - goto copy; + if (cache_page) { + zero_filled = is_zero_filled_page(page); + /* If type of page has not changed, just reuse it */ + if (zero_filled == (cache_page == ZERO_PAGE(0))) + goto copy; + tswap_delete_page(entry, NULL); + put_page(cache_page); + } if (!(current->flags & PF_MEMCG_RECLAIM)) return -1; - cache_page = alloc_page(TSWAP_GFP_MASK | __GFP_HIGHMEM); - if (!cache_page) - return -1; + if (zero_filled == -1) + zero_filled = is_zero_filled_page(page); + + if (!zero_filled) { + cache_page = alloc_page(TSWAP_GFP_MASK | __GFP_HIGHMEM); + if (!cache_page) + return -1; + } else { + cache_page = ZERO_PAGE(0); + get_page(cache_page); + } err = tswap_insert_page(entry, cache_page); if (err) { @@ -306,7 +342,8 @@ static int tswap_frontswap_store(unsigned type, pgoff_t offset, return -1; } copy: - copy_highpage(cache_page, page); + if (cache_page != ZERO_PAGE(0)) + copy_highpage(cache_page, page); return 0; } . ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
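The zero-detection at the core of the patch can be sketched in userspace. The following is a minimal analog of the patch's is_zero_filled_page(): scan the page one machine word at a time and bail out on the first non-zero word (the kernel version additionally wraps the scan in kmap_atomic()/kunmap_atomic()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Userspace analog of is_zero_filled_page() from the patch: return
 * true only if every word of the page is zero. */
static bool is_zero_filled(const void *page)
{
	const unsigned long *v = page;
	size_t i;

	for (i = 0; i < PAGE_SIZE / sizeof(*v); i++) {
		if (v[i] != 0)
			return false;
	}
	return true;
}
```

When this check succeeds, the store path takes a reference on ZERO_PAGE(0) instead of allocating a cache page, which is where the memory saving comes from.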
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
On Thu, Aug 31, 2017 at 01:48:27PM +0300, Stanislav Kinsburskiy wrote: > > > 31.08.2017 13:38, Dmitry V. Levin пишет: > > On Thu, Aug 31, 2017 at 02:11:34PM +0400, Stanislav Kinsburskiy wrote: > >> Due to integer variables alignment size of struct autofs_v5_packet in 300 > >> bytes in 32-bit architectures (instead of 304 bytes in 64-bits > >> architectures). > >> > >> This may lead to memory corruption (64 bits kernel always send 304 bytes, > >> while 32-bit userspace application expects for 300). > >> > >> https://jira.sw.ru/browse/PSBM-71078 > >> > >> Signed-off-by: Stanislav Kinsburskiy > >> --- > >> include/uapi/linux/auto_fs4.h |2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h > >> index e02982f..8729a47 100644 > >> --- a/include/uapi/linux/auto_fs4.h > >> +++ b/include/uapi/linux/auto_fs4.h > >> @@ -137,6 +137,8 @@ struct autofs_v5_packet { > >>__u32 pid; > >>__u32 tgid; > >>__u32 len; > >> + __u32 blob; /* This is needed to align structure up to 8 > >> + bytes for ALL archs including 32-bit */ > >>char name[NAME_MAX+1]; > >> }; > > > > This change breaks ABI because it changes offsetof(struct autofs_v5_packet, > > name). > > If you need to fix the alignment, use __attribute__((aligned(8))). > > > > Nice to know you're watching. > Yes, attribute is better. > But how ABI is broken? On x86_64 this alignment is implied, so nothing is > changed. Your change increases offsetof(struct autofs_v5_packet, name) by 4 on all architectures. On architectures where the structure is 32-bit aligned this also leads to increase of its size by 4. > > An alignment change would also be an ABI breakage on 32-bit architectures, > > though. > > > > True. > But from my POW better have it working on 64bit archs for 32bit apps. > But anyway, upstream guys will device, whether they want 32-bit autofs > applications properly work on 64 or 32 bits. Let's fix old bugs without introducing new bugs. 
The right fix here seems to be a compat structure, that is, both 64-bit and 32-bit kernels should send the same 32-bit aligned structure, and it has to be the same structure sent by traditional 32-bit kernels. -- ldv signature.asc Description: PGP signature ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
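The size computation behind the suggested compat fix can be sketched in userspace. The struct below is a simplified stand-in (not the real autofs_v5_packet layout): its trailing u64-aligned member forces tail padding on 64-bit ABIs, and a 32-bit daemon should receive a packet truncated right after name[], with no tail padding. offsetofend() is reproduced here as defined in the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_MAX 255

/* offsetofend() as defined in the kernel: offset of the first byte
 * past MEMBER. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Simplified stand-in for autofs_v5_packet: the u64 member gives the
 * struct 8-byte alignment on 64-bit ABIs, so sizeof() includes tail
 * padding that a 32-bit daemon does not expect. */
struct v5_packet {
	uint64_t wait_queue_token;
	uint32_t len;
	char name[NAME_MAX + 1];
};

/* Sketch of the pktsz computation: send a packet that ends exactly
 * after name[] to a 32-bit daemon, the full padded struct otherwise. */
static size_t v5_pktsz(int is32bit)
{
	if (is32bit)
		return offsetofend(struct v5_packet, name);
	return sizeof(struct v5_packet);
}
```

The key property is that both sizes cover the same members at the same offsets; only the trailing alignment padding is dropped for the 32-bit consumer.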
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
On 31.08.2017 13:38, Dmitry V. Levin wrote: > On Thu, Aug 31, 2017 at 02:11:34PM +0400, Stanislav Kinsburskiy wrote: >> Due to integer variables alignment size of struct autofs_v5_packet in 300 >> bytes in 32-bit architectures (instead of 304 bytes in 64-bits >> architectures). >> >> This may lead to memory corruption (64 bits kernel always send 304 bytes, >> while 32-bit userspace application expects for 300). >> >> https://jira.sw.ru/browse/PSBM-71078 >> >> Signed-off-by: Stanislav Kinsburskiy >> --- >> include/uapi/linux/auto_fs4.h |2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h >> index e02982f..8729a47 100644 >> --- a/include/uapi/linux/auto_fs4.h >> +++ b/include/uapi/linux/auto_fs4.h >> @@ -137,6 +137,8 @@ struct autofs_v5_packet { >> __u32 pid; >> __u32 tgid; >> __u32 len; >> +__u32 blob; /* This is needed to align structure up to 8 >> + bytes for ALL archs including 32-bit */ >> char name[NAME_MAX+1]; >> }; > > This change breaks ABI because it changes offsetof(struct autofs_v5_packet, name). > If you need to fix the alignment, use __attribute__((aligned(8))). > Nice to know you're watching. Yes, the attribute is better. But how is the ABI broken? On x86_64 this alignment is implied, so nothing is changed. > An alignment change would also be an ABI breakage on 32-bit architectures, > though. > True. But from my POV it's better to have it working on 64-bit archs for 32-bit apps. But anyway, the upstream guys will decide whether they want 32-bit autofs applications to work properly on 64 or 32 bits. ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
On Thu, Aug 31, 2017 at 02:11:34PM +0400, Stanislav Kinsburskiy wrote: > Due to integer variables alignment size of struct autofs_v5_packet in 300 > bytes in 32-bit architectures (instead of 304 bytes in 64-bits architectures). > > This may lead to memory corruption (64 bits kernel always send 304 bytes, > while 32-bit userspace application expects for 300). > > https://jira.sw.ru/browse/PSBM-71078 > > Signed-off-by: Stanislav Kinsburskiy > --- > include/uapi/linux/auto_fs4.h |2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h > index e02982f..8729a47 100644 > --- a/include/uapi/linux/auto_fs4.h > +++ b/include/uapi/linux/auto_fs4.h > @@ -137,6 +137,8 @@ struct autofs_v5_packet { > __u32 pid; > __u32 tgid; > __u32 len; > + __u32 blob; /* This is needed to align structure up to 8 > +bytes for ALL archs including 32-bit */ > char name[NAME_MAX+1]; > }; This change breaks ABI because it changes offsetof(struct autofs_v5_packet, name). If you need to fix the alignment, use __attribute__((aligned(8))). An alignment change would also be an ABI breakage on 32-bit architectures, though. -- ldv signature.asc Description: PGP signature ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
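The ABI breakage described above is easy to demonstrate. The structs below are hypothetical stand-ins keeping only the trailing members of autofs_v5_packet; inserting the proposed blob field moves name by 4 bytes on every architecture, so every existing binary that computes the name offset from the old layout reads the wrong bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_MAX 255

/* Tail of the packet as shipped: name follows len directly. */
struct packet_old {
	uint32_t pid;
	uint32_t tgid;
	uint32_t len;
	char name[NAME_MAX + 1];
};

/* Tail of the packet with the proposed padding field inserted. */
struct packet_blob {
	uint32_t pid;
	uint32_t tgid;
	uint32_t len;
	uint32_t blob;	/* shifts name by 4 bytes on ALL architectures */
	char name[NAME_MAX + 1];
};
```

This is why a field insertion in the middle of a UAPI struct is never ABI-safe, regardless of what it does to the overall alignment.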
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
Yes. On 31.08.2017 13:37, Konstantin Khorenko wrote: > Will you send it to mainstream as well? > > -- > Best regards, > > Konstantin Khorenko, > Virtuozzo Linux Kernel Team > > On 08/31/2017 01:11 PM, Stanislav Kinsburskiy wrote: >> Due to integer variables alignment size of struct autofs_v5_packet in 300 >> bytes in 32-bit architectures (instead of 304 bytes in 64-bits >> architectures). >> >> This may lead to memory corruption (64 bits kernel always send 304 bytes, >> while 32-bit userspace application expects for 300). >> >> https://jira.sw.ru/browse/PSBM-71078 >> >> Signed-off-by: Stanislav Kinsburskiy >> --- >> include/uapi/linux/auto_fs4.h |2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h >> index e02982f..8729a47 100644 >> --- a/include/uapi/linux/auto_fs4.h >> +++ b/include/uapi/linux/auto_fs4.h >> @@ -137,6 +137,8 @@ struct autofs_v5_packet { >> __u32 pid; >> __u32 tgid; >> __u32 len; >> +__u32 blob;/* This is needed to align structure up to 8 >> + bytes for ALL archs including 32-bit */ >> char name[NAME_MAX+1]; >> }; >> >> >> . >> ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
Will you send it to mainstream as well? -- Best regards, Konstantin Khorenko, Virtuozzo Linux Kernel Team On 08/31/2017 01:11 PM, Stanislav Kinsburskiy wrote: Due to integer variables alignment size of struct autofs_v5_packet in 300 bytes in 32-bit architectures (instead of 304 bytes in 64-bits architectures). This may lead to memory corruption (64 bits kernel always send 304 bytes, while 32-bit userspace application expects for 300). https://jira.sw.ru/browse/PSBM-71078 Signed-off-by: Stanislav Kinsburskiy --- include/uapi/linux/auto_fs4.h |2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h index e02982f..8729a47 100644 --- a/include/uapi/linux/auto_fs4.h +++ b/include/uapi/linux/auto_fs4.h @@ -137,6 +137,8 @@ struct autofs_v5_packet { __u32 pid; __u32 tgid; __u32 len; + __u32 blob; /* This is needed to align structure up to 8 + bytes for ALL archs including 32-bit */ char name[NAME_MAX+1]; }; . ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] autofs: fix autofs_v5_packet structure for compat mode
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit e484b0abe8af8793f58e6434060a3779261d3151 Author: Stanislav Kinsburskiy Date: Thu Aug 31 13:36:59 2017 +0300 autofs: fix autofs_v5_packet structure for compat mode Due to integer variable alignment, the size of struct autofs_v5_packet is 300 bytes on 32-bit architectures (instead of 304 bytes on 64-bit architectures). This may lead to memory corruption (a 64-bit kernel always sends 304 bytes, while a 32-bit userspace application expects 300). https://jira.sw.ru/browse/PSBM-71078 Signed-off-by: Stanislav Kinsburskiy --- include/uapi/linux/auto_fs4.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h index e02982f..8729a47 100644 --- a/include/uapi/linux/auto_fs4.h +++ b/include/uapi/linux/auto_fs4.h @@ -137,6 +137,8 @@ struct autofs_v5_packet { __u32 pid; __u32 tgid; __u32 len; + __u32 blob; /* This is needed to align structure up to 8 + bytes for ALL archs including 32-bit */ char name[NAME_MAX+1]; }; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH RHEL7 COMMIT] ms/workqueue: fix ghost PENDING flag while doing MQ IO
Please consider to release it as a ReadyKernel patch. https://readykernel.com/ -- Best regards, Konstantin Khorenko, Virtuozzo Linux Kernel Team On 08/31/2017 01:28 PM, Konstantin Khorenko wrote: The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit f24bbb53d5035c7b13b5ecb61728d5f12240f139 Author: Roman Pen Date: Thu Aug 31 13:28:46 2017 +0300 ms/workqueue: fix ghost PENDING flag while doing MQ IO We have the hole node hang, many processes hang on similar stack as here: crash> ps -m 8802b7f0 [0 00:20:36.663] [UN] PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" crash> bt 8802b7f0 PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" #0 [88031b04f980] __schedule at 8256cdd1 #1 [88031b04f9f8] schedule at 8256e239 #2 [88031b04fa18] schedule_timeout at 82561cea #3 [88031b04fb88] io_schedule_timeout at 8256c0d9 #4 [88031b04fbb8] wait_for_completion_io at 8256f3e0 #5 [88031b04fc90] blkdev_issue_flush at 8193a207 #6 [88031b04fe08] ext4_sync_file at a0af6d34 [ext4] #7 [88031b04fe68] vfs_fsync_range at 8173212c #8 [88031b04fec8] do_fsync at 817330dc #9 [88031b04ff68] sys_fdatasync at 8173437e RIP: 7f474a581ddd RSP: 7f46ba3fe8a0 RFLAGS: 0282 RAX: 004b RBX: 8258f609 RCX: RDX: 7f4754ffd458 RSI: RDI: 0011 RBP: R8: R9: 58b9 R10: 7f46ba3fe8b0 R11: 0293 R12: 7f475be25d80 R13: 8173437e R14: 88031b04ff78 R15: 7f4755141452 ORIG_RAX: 004b CS: 0033 SS: 002b crash> ps -m 8802b7f0 [0 00:20:36.663] [UN] PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" Sleeps for 20 minutes on bio completion: blkdev_issue_flush: submit_bio(WRITE_FLUSH, bio); here>wait_for_completion_io(&wait); As bio->bi_rw = (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH), we had: submit_bio->generic_make_request->dm_make_request->queue_io->queue_work So in wait_for_completion_io we wait for dm_wq_work to complete these bio. 
But work is not in the workqueue already, as work->entry is empty list, so the work seem completed. That could happen only if md->flags had DMF_BLOCK_IO_FOR_SUSPEND bit set. But it is already unset, when we clear the bit we queue another dm_wq_work on these wq in dm_queue_flush. So what could've happened here is that operation reordering loads DMF_BLOCK_IO_FOR_SUSPEND bit in dm_wq_work before it was cleared in dm_queue_flush. Adding smp_mb in set_work_pool_and_clear_pending should order operations properly. https://jira.sw.ru/browse/PSBM-69788 original commit message: The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D 880803077988 0 1636 2 0x [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] 880803077988 88080466b900 8808033f9c80 880803078000 [ 601.349662] 880807c95000 7fff 815b0920 880803077ad0 [ 601.350333] 8808030779a0 815b01d5 880803077a38 [ 601.350965] Call Trace: [ 601.351203] [] ? bit_wait+0x60/0x60 [ 601.351444] [] schedule+0x35/0x80 [ 601.351709] [] schedule_timeout+0x192/0x230 [ 601.351958] [] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [] ? ktime_get+0x37/0xa0 [ 601.352446] [] ? bit_wait+0x60/0x60 [ 601.352688] [] io_schedule_timeout+0xa4/0x110 [ 601.352951] [] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [] bit_wait_io+0x1b/0x70 [ 601.353440] [] __wait_on_bit+0x5d/0x90 [ 601.353689] [] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [] ? 
autoremove_wake_function+0x40/0x40 [ 601.354200] [] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [] blkdev_fsync+0x1b/0x50 [ 601.355193] [] vfs_fsync_range+0x49/0xa0 [ 601.355432] [] blkdev_write_iter+0xca/0x100 [ 601.355679] [] __vfs_write+0xaa/0xe0 [ 601.355925] [] vfs_write+0xa9/0x1a0 [ 601.356164] []
[Devel] [PATCH RHEL7 COMMIT] ms/workqueue: fix ghost PENDING flag while doing MQ IO
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit f24bbb53d5035c7b13b5ecb61728d5f12240f139 Author: Roman Pen Date: Thu Aug 31 13:28:46 2017 +0300 ms/workqueue: fix ghost PENDING flag while doing MQ IO We have the hole node hang, many processes hang on similar stack as here: crash> ps -m 8802b7f0 [0 00:20:36.663] [UN] PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" crash> bt 8802b7f0 PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" #0 [88031b04f980] __schedule at 8256cdd1 #1 [88031b04f9f8] schedule at 8256e239 #2 [88031b04fa18] schedule_timeout at 82561cea #3 [88031b04fb88] io_schedule_timeout at 8256c0d9 #4 [88031b04fbb8] wait_for_completion_io at 8256f3e0 #5 [88031b04fc90] blkdev_issue_flush at 8193a207 #6 [88031b04fe08] ext4_sync_file at a0af6d34 [ext4] #7 [88031b04fe68] vfs_fsync_range at 8173212c #8 [88031b04fec8] do_fsync at 817330dc #9 [88031b04ff68] sys_fdatasync at 8173437e RIP: 7f474a581ddd RSP: 7f46ba3fe8a0 RFLAGS: 0282 RAX: 004b RBX: 8258f609 RCX: RDX: 7f4754ffd458 RSI: RDI: 0011 RBP: R8: R9: 58b9 R10: 7f46ba3fe8b0 R11: 0293 R12: 7f475be25d80 R13: 8173437e R14: 88031b04ff78 R15: 7f4755141452 ORIG_RAX: 004b CS: 0033 SS: 002b crash> ps -m 8802b7f0 [0 00:20:36.663] [UN] PID: 22713 TASK: 8802b7f0 CPU: 1 COMMAND: "worker" Sleeps for 20 minutes on bio completion: blkdev_issue_flush: submit_bio(WRITE_FLUSH, bio); here> wait_for_completion_io(&wait); As bio->bi_rw = (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH), we had: submit_bio->generic_make_request->dm_make_request->queue_io->queue_work So in wait_for_completion_io we wait for dm_wq_work to complete these bio. But work is not in the workqueue already, as work->entry is empty list, so the work seem completed. That could happen only if md->flags had DMF_BLOCK_IO_FOR_SUSPEND bit set. 
But it is already unset, when we clear the bit we queue another dm_wq_work on these wq in dm_queue_flush. So what could've happened here is that operation reordering loads DMF_BLOCK_IO_FOR_SUSPEND bit in dm_wq_work before it was cleared in dm_queue_flush. Adding smp_mb in set_work_pool_and_clear_pending should order operations properly. https://jira.sw.ru/browse/PSBM-69788 original commit message: The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D 880803077988 0 1636 2 0x [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] 880803077988 88080466b900 8808033f9c80 880803078000 [ 601.349662] 880807c95000 7fff 815b0920 880803077ad0 [ 601.350333] 8808030779a0 815b01d5 880803077a38 [ 601.350965] Call Trace: [ 601.351203] [] ? bit_wait+0x60/0x60 [ 601.351444] [] schedule+0x35/0x80 [ 601.351709] [] schedule_timeout+0x192/0x230 [ 601.351958] [] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [] ? ktime_get+0x37/0xa0 [ 601.352446] [] ? bit_wait+0x60/0x60 [ 601.352688] [] io_schedule_timeout+0xa4/0x110 [ 601.352951] [] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [] bit_wait_io+0x1b/0x70 [ 601.353440] [] __wait_on_bit+0x5d/0x90 [ 601.353689] [] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [] ? 
autoremove_wake_function+0x40/0x40 [ 601.354200] [] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [] blkdev_fsync+0x1b/0x50 [ 601.355193] [] vfs_fsync_range+0x49/0xa0 [ 601.355432] [] blkdev_write_iter+0xca/0x100 [ 601.355679] [] __vfs_write+0xaa/0xe0 [ 601.355925] [] vfs_write+0xa9/0x1a0 [ 601.356164] [] kernel_write+0x38/0x50 The underlying device is a null_blk, with default parameters: queue_mode= MQ submit_queues
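The ordering requirement the patch enforces can be illustrated with C11 atomics. This is not the kernel code and does not reproduce the race itself; it is a sketch, using atomic_thread_fence(memory_order_seq_cst) as a stand-in for smp_mb(), of the pattern: the flag clear in dm_queue_flush() must be globally visible before the requeued dm_wq_work() loads it, otherwise the work function can see a stale DMF_BLOCK_IO_FOR_SUSPEND bit and leave the queued bios unprocessed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BLOCK_IO_BIT 0x1u

/* Flags word shared between the flush path and the work item. */
static atomic_uint flags;

/* Analog of dm_queue_flush(): clear the "block io" flag, then fence so
 * that the subsequent queue_work() cannot be ordered before the clear. */
static void queue_flush(void)
{
	atomic_fetch_and_explicit(&flags, ~BLOCK_IO_BIT, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() stand-in */
}

/* Analog of the check in dm_wq_work(): with a matching fence (the one
 * the patch adds in set_work_pool_and_clear_pending()), this load is
 * guaranteed to observe the clear made before the work was queued. */
static bool io_blocked(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&flags, memory_order_relaxed) & BLOCK_IO_BIT;
}
```

A single-threaded test can only exercise the functional behavior; the actual bug is a cross-CPU reordering window that the paired barriers close.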
[Devel] [PATCH RHEL7 COMMIT] fs-writeback: add endless writeback debug
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 1069e544ff85161d41fd3679c3d3b47dc3af5139 Author: Dmitry Monakhov Date: Fri Aug 25 13:16:52 2017 +0400 fs-writeback: add endless writeback debug This is temporary debug patch, it will be rolled back before the release. https://jira.sw.ru/browse/PSBM-69587 Signed-off-by: Dmitry Monakhov --- fs/fs-writeback.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index f34ae6c..a54c0bd 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -787,11 +787,15 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb, { unsigned long start_time = jiffies; long wrote = 0; + int trace = 0; while (!list_empty(&wb->b_io)) { struct inode *inode = wb_inode(wb->b_io.prev); struct super_block *sb = inode->i_sb; + if (time_is_before_jiffies(start_time + 15* HZ)) + trace = 1; + if (!grab_super_passive(sb)) { /* * grab_super_passive() may fail consistently due to @@ -799,6 +803,9 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb, * requeue_io() to avoid busy retrying the inode/sb. */ redirty_tail(inode, wb); + if (trace) + printk("%s:%d writeback is taking too long ino:%ld sb(%p):%s\n", + __FUNCTION__, __LINE__, inode->i_ino, sb, sb->s_id); continue; } wrote += writeback_sb_inodes(sb, wb, work); @@ -890,6 +897,7 @@ static long wb_writeback(struct bdi_writeback *wb, unsigned long oldest_jif; struct inode *inode; long progress; + int trace = 0; oldest_jif = jiffies; work->older_than_this = &oldest_jif; @@ -902,6 +910,9 @@ static long wb_writeback(struct bdi_writeback *wb, if (work->nr_pages <= 0) break; + if (time_is_before_jiffies(wb_start + 15* HZ)) + trace = 1; + /* * Background writeout and kupdate-style writeback may * run forever. 
Stop them if there is other work to do @@ -973,6 +984,10 @@ static long wb_writeback(struct bdi_writeback *wb, inode = wb_inode(wb->b_more_io.prev); spin_lock(&inode->i_lock); spin_unlock(&wb->list_lock); + if (trace) + printk("%s:%d writeback is taking too long ino:%ld st:%ld sb(%p):%s\n", + __FUNCTION__, __LINE__, inode->i_ino, + inode->i_state, inode->i_sb, inode->i_sb->s_id); /* This function drops i_lock... */ inode_sleep_on_writeback(inode); spin_lock(&wb->list_lock); ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
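The debug pattern the patch adds is: remember when the writeback loop started, and flip a trace flag once it has been running longer than 15 seconds — time_is_before_jiffies(start_time + 15 * HZ) becomes true once jiffies has passed that deadline. A userspace sketch of the same predicate, using plain seconds instead of jiffies:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define TRACE_AFTER_SECS 15

/* Userspace analog of
 *	if (time_is_before_jiffies(start_time + 15 * HZ))
 *		trace = 1;
 * i.e. start printing diagnostics once the loop has run for more than
 * 15 seconds. */
static bool should_trace(time_t start, time_t now)
{
	return now - start > TRACE_AFTER_SECS;
}
```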
[Devel] [PATCH] autofs: fix autofs_v5_packet structure for compat mode
Due to integer variable alignment, the size of struct autofs_v5_packet is 300 bytes on 32-bit architectures (instead of 304 bytes on 64-bit architectures). This may lead to memory corruption (a 64-bit kernel always sends 304 bytes, while a 32-bit userspace application expects 300). https://jira.sw.ru/browse/PSBM-71078 Signed-off-by: Stanislav Kinsburskiy --- include/uapi/linux/auto_fs4.h |2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/auto_fs4.h b/include/uapi/linux/auto_fs4.h index e02982f..8729a47 100644 --- a/include/uapi/linux/auto_fs4.h +++ b/include/uapi/linux/auto_fs4.h @@ -137,6 +137,8 @@ struct autofs_v5_packet { __u32 pid; __u32 tgid; __u32 len; + __u32 blob; /* This is needed to align structure up to 8 + bytes for ALL archs including 32-bit */ char name[NAME_MAX+1]; }; ___ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel
[Devel] [PATCH RHEL7 COMMIT] mm/memcg: reclaim only kmem if kmem limit reached
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit aa84e9472d88646f993f8bf1f2eb03a6abad93cd Author: Andrey Ryabinin Date: Thu Aug 31 13:03:24 2017 +0300 mm/memcg: reclaim only kmem if kmem limit reached If kmem limit on memcg reached, we go into memory reclaim, and reclaim everything we can, including page cache and anon. Reclaiming page cache or anon won't help since we need to lower only kmem usage. This patch fixes the problem by avoiding non-kmem reclaim on hitting the kmem limit. https://jira.sw.ru/browse/PSBM-69226 Signed-off-by: Andrey Ryabinin --- include/linux/memcontrol.h | 10 ++ include/linux/swap.h | 2 +- mm/memcontrol.c| 30 -- mm/vmscan.c| 31 --- 4 files changed, 51 insertions(+), 22 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 1a52e58..1d6bc80 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -45,6 +45,16 @@ struct mem_cgroup_reclaim_cookie { unsigned int generation; }; +/* + * Reclaim flags for mem_cgroup_hierarchical_reclaim + */ +#define MEM_CGROUP_RECLAIM_NOSWAP_BIT 0x0 +#define MEM_CGROUP_RECLAIM_NOSWAP (1 << MEM_CGROUP_RECLAIM_NOSWAP_BIT) +#define MEM_CGROUP_RECLAIM_SHRINK_BIT 0x1 +#define MEM_CGROUP_RECLAIM_SHRINK (1 << MEM_CGROUP_RECLAIM_SHRINK_BIT) +#define MEM_CGROUP_RECLAIM_KMEM_BIT0x2 +#define MEM_CGROUP_RECLAIM_KMEM(1 << MEM_CGROUP_RECLAIM_KMEM_BIT) + #ifdef CONFIG_MEMCG int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask, struct mem_cgroup **memcgp); diff --git a/include/linux/swap.h b/include/linux/swap.h index bd162f9..bd47451 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -324,7 +324,7 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, extern int __isolate_lru_page(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct 
mem_cgroup *mem, unsigned long nr_pages, - gfp_t gfp_mask, bool noswap); + gfp_t gfp_mask, int flags); extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem, gfp_t gfp_mask, bool noswap, struct zone *zone, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 09ce016..5372151 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -511,16 +511,6 @@ enum res_type { #define OOM_CONTROL(0) /* - * Reclaim flags for mem_cgroup_hierarchical_reclaim - */ -#define MEM_CGROUP_RECLAIM_NOSWAP_BIT 0x0 -#define MEM_CGROUP_RECLAIM_NOSWAP (1 << MEM_CGROUP_RECLAIM_NOSWAP_BIT) -#define MEM_CGROUP_RECLAIM_SHRINK_BIT 0x1 -#define MEM_CGROUP_RECLAIM_SHRINK (1 << MEM_CGROUP_RECLAIM_SHRINK_BIT) -#define MEM_CGROUP_RECLAIM_KMEM_BIT0x2 -#define MEM_CGROUP_RECLAIM_KMEM(1 << MEM_CGROUP_RECLAIM_KMEM_BIT) - -/* * The memcg_create_mutex will be held whenever a new cgroup is created. * As a consequence, any change that needs to protect against new child cgroups * appearing has to hold it as well. @@ -2137,7 +2127,7 @@ static unsigned long mem_cgroup_reclaim(struct mem_cgroup *memcg, if (loop) drain_all_stock_async(memcg); total += try_to_free_mem_cgroup_pages(memcg, SWAP_CLUSTER_MAX, - gfp_mask, noswap); + gfp_mask, flags); if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current)) return 1; @@ -2150,6 +2140,16 @@ static unsigned long mem_cgroup_reclaim(struct mem_cgroup *memcg, break; if (mem_cgroup_margin(memcg, flags & MEM_CGROUP_RECLAIM_KMEM)) break; + + /* +* Try harder to reclaim dcache. dcache reclaim may +* temporarly fail due to dcache->dlock being held +* by someone else. We must try harder to avoid premature +* slab allocation failures. +*/ + if (flags & MEM_CGROUP_RECLAIM_KMEM && + page_counter_read(&memcg->dcache)) + continue; /* * If nothing was reclaimed after two attempts, there * may be no reclaimable pages in this hierarchy. @@ -2778,11 +2778,13 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge struct mem_cg
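The decision the patch makes can be reduced to one flag check: when the reclaim was triggered by the kmem limit, freeing page cache or anon cannot lower kmem usage, so the scan-control setup should restrict reclaim to slab. The function below is a hypothetical simplification (the field names on struct scan_control here are illustrative, not the kernel's) of that logic:

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_CGROUP_RECLAIM_NOSWAP	(1 << 0)
#define MEM_CGROUP_RECLAIM_SHRINK	(1 << 1)
#define MEM_CGROUP_RECLAIM_KMEM	(1 << 2)

/* Illustrative scan-control knobs; not the real struct scan_control. */
struct scan_control {
	bool may_swap;		/* scan anonymous pages */
	bool may_pagecache;	/* scan page cache */
	bool slab_only;		/* only shrink slab/kmem */
};

/* Sketch of the patch's reclaim-mode decision: a kmem-limit hit
 * reclaims only kernel memory. */
static void setup_scan_control(int flags, struct scan_control *sc)
{
	sc->may_swap = !(flags & MEM_CGROUP_RECLAIM_NOSWAP);
	sc->slab_only = flags & MEM_CGROUP_RECLAIM_KMEM;
	sc->may_pagecache = !sc->slab_only;
}
```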
[Devel] [PATCH RHEL7 COMMIT] ms/mm: use sc->priority for slab shrink targets
The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-514.26.1.vz7.35.5 --> commit 5a99b388025b5981f44c29588a22dc37607f990c Author: Josef Bacik Date: Thu Aug 31 13:03:23 2017 +0300 ms/mm: use sc->priority for slab shrink targets Previously we were using the ratio of the number of lru pages scanned to the number of eligible lru pages to determine the number of slab objects to scan. The problem with this is that these two things have nothing to do with each other, so in slab heavy work loads where there is little to no page cache we can end up with the pages scanned being a very low number. This means that we reclaim next to no slab pages and waste a lot of time reclaiming small amounts of space. Instead use sc->priority in the same way we use it to determine scan amounts for the lru's. This generally equates to pages. Consider the following slab_pages = (nr_objects * object_size) / PAGE_SIZE What we would like to do is scan = slab_pages >> sc->priority but we don't know the number of slab pages each shrinker controls, only the objects. However say that theoretically we knew how many pages a shrinker controlled, we'd still have to convert this to objects, which would look like the following scan = shrinker_pages >> sc->priority scan_objects = (PAGE_SIZE / object_size) * scan or written another way scan_objects = (shrinker_pages >> sc->priority) * (PAGE_SIZE / object_size) which can thus be written scan_objects = ((shrinker_pages * PAGE_SIZE) / object_size) >> sc->priority which is just scan_objects = nr_objects >> sc->priority We don't need to know exactly how many pages each shrinker represents, it's objects are all the information we need. Making this change allows us to place an appropriate amount of pressure on the shrinker pools for their relative size. 
Signed-off-by: Josef Bacik https://jira.sw.ru/browse/PSBM-69226 Signed-off-by: Andrey Ryabinin --- include/trace/events/vmscan.h | 23 ++ mm/vmscan.c | 44 --- 2 files changed, 22 insertions(+), 45 deletions(-) diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h index 132a985..d98fb0a 100644 --- a/include/trace/events/vmscan.h +++ b/include/trace/events/vmscan.h @@ -181,23 +181,22 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re TRACE_EVENT(mm_shrink_slab_start, TP_PROTO(struct shrinker *shr, struct shrink_control *sc, - long nr_objects_to_shrink, unsigned long pgs_scanned, - unsigned long lru_pgs, unsigned long cache_items, - unsigned long long delta, unsigned long total_scan), + long nr_objects_to_shrink, unsigned long cache_items, + unsigned long long delta, unsigned long total_scan, + int priority), - TP_ARGS(shr, sc, nr_objects_to_shrink, pgs_scanned, lru_pgs, - cache_items, delta, total_scan), + TP_ARGS(shr, sc, nr_objects_to_shrink, cache_items, delta, total_scan, + priority), TP_STRUCT__entry( __field(struct shrinker *, shr) __field(void *, shrink) __field(long, nr_objects_to_shrink) __field(gfp_t, gfp_flags) - __field(unsigned long, pgs_scanned) - __field(unsigned long, lru_pgs) __field(unsigned long, cache_items) __field(unsigned long long, delta) __field(unsigned long, total_scan) + __field(int, priority) ), TP_fast_assign( @@ -205,23 +204,21 @@ TRACE_EVENT(mm_shrink_slab_start, __entry->shrink = shr->scan_objects; __entry->nr_objects_to_shrink = nr_objects_to_shrink; __entry->gfp_flags = sc->gfp_mask; - __entry->pgs_scanned = pgs_scanned; - __entry->lru_pgs = lru_pgs; __entry->cache_items = cache_items; __entry->delta = delta; __entry->total_scan = total_scan; + __entry->priority = priority; ), - TP_printk("%pF %p: objects to shrink %ld gfp_flags %s pgs_scanned %ld lru_pgs %ld cache items %ld delta %lld total_scan %ld", + TP_printk("%pF %p: objects to shrink %ld gfp_flags %s cache items %ld delta 
%lld total_scan %ld priority %d", __entry->shrink, __entry->shr, __entry->nr_objects_to_shrink, show_gfp_flags(__entry->gfp_flags), - __entry->pgs_scanned, - __entry->lru_pgs, __entry->cach
Re: [Devel] [PATCH rh7 2/2] mm/memcg: reclaim only kmem if kmem limit reached.
Do we want to push it to mainstream as well?

--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 08/25/2017 06:38 PM, Andrey Ryabinin wrote:

If the kmem limit on a memcg is reached, we go into memory reclaim and
reclaim everything we can, including page cache and anon. Reclaiming page
cache or anon won't help, since we need to lower only the kmem usage.
This patch fixes the problem by avoiding non-kmem reclaim on hitting the
kmem limit.

https://jira.sw.ru/browse/PSBM-69226
Signed-off-by: Andrey Ryabinin
---
 include/linux/memcontrol.h | 10 ++
 include/linux/swap.h       |  2 +-
 mm/memcontrol.c            | 30 --
 mm/vmscan.c                | 31 ---
 4 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1a52e58ab7de..1d6bc80c4c90 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -45,6 +45,16 @@ struct mem_cgroup_reclaim_cookie {
 	unsigned int generation;
 };
 
+/*
+ * Reclaim flags for mem_cgroup_hierarchical_reclaim
+ */
+#define MEM_CGROUP_RECLAIM_NOSWAP_BIT	0x0
+#define MEM_CGROUP_RECLAIM_NOSWAP	(1 << MEM_CGROUP_RECLAIM_NOSWAP_BIT)
+#define MEM_CGROUP_RECLAIM_SHRINK_BIT	0x1
+#define MEM_CGROUP_RECLAIM_SHRINK	(1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
+#define MEM_CGROUP_RECLAIM_KMEM_BIT	0x2
+#define MEM_CGROUP_RECLAIM_KMEM		(1 << MEM_CGROUP_RECLAIM_KMEM_BIT)
+
 #ifdef CONFIG_MEMCG
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp);

diff --git a/include/linux/swap.h b/include/linux/swap.h
index bd162f9bef0d..bd47451ec95a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -324,7 +324,7 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  unsigned long nr_pages,
-						  gfp_t gfp_mask, bool noswap);
+						  gfp_t gfp_mask, int flags);
 extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 						 gfp_t gfp_mask, bool noswap,
 						 struct zone *zone,

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 97824e281d7a..f9a5f3819a31 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -511,16 +511,6 @@ enum res_type {
 #define OOM_CONTROL		(0)
 
 /*
- * Reclaim flags for mem_cgroup_hierarchical_reclaim
- */
-#define MEM_CGROUP_RECLAIM_NOSWAP_BIT	0x0
-#define MEM_CGROUP_RECLAIM_NOSWAP	(1 << MEM_CGROUP_RECLAIM_NOSWAP_BIT)
-#define MEM_CGROUP_RECLAIM_SHRINK_BIT	0x1
-#define MEM_CGROUP_RECLAIM_SHRINK	(1 << MEM_CGROUP_RECLAIM_SHRINK_BIT)
-#define MEM_CGROUP_RECLAIM_KMEM_BIT	0x2
-#define MEM_CGROUP_RECLAIM_KMEM		(1 << MEM_CGROUP_RECLAIM_KMEM_BIT)
-
-/*
  * The memcg_create_mutex will be held whenever a new cgroup is created.
  * As a consequence, any change that needs to protect against new child cgroups
  * appearing has to hold it as well.
@@ -2137,7 +2127,7 @@ static unsigned long mem_cgroup_reclaim(struct mem_cgroup *memcg,
 		if (loop)
 			drain_all_stock_async(memcg);
 		total += try_to_free_mem_cgroup_pages(memcg, SWAP_CLUSTER_MAX,
-						      gfp_mask, noswap);
+						      gfp_mask, flags);
 		if (test_thread_flag(TIF_MEMDIE) ||
 		    fatal_signal_pending(current))
 			return 1;
@@ -2150,6 +2140,16 @@ static unsigned long mem_cgroup_reclaim(struct mem_cgroup *memcg,
 			break;
 		if (mem_cgroup_margin(memcg, flags & MEM_CGROUP_RECLAIM_KMEM))
 			break;
+
+		/*
+		 * Try harder to reclaim dcache. dcache reclaim may
+		 * temporarly fail due to dcache->dlock being held
+		 * by someone else. We must try harder to avoid premature
+		 * slab allocation failures.
+		 */
+		if (flags & MEM_CGROUP_RECLAIM_KMEM &&
+		    page_counter_read(&memcg->dcache))
+			continue;
 		/*
 		 * If nothing was reclaimed after two attempts, there
 		 * may be no reclaimable pages in this hierarchy.
@@ -2778,11 +2778,13 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, bool kmem_charge
 	struct mem_cgroup *mem_over_limit;
 	struct page_counter *counter;
 	unsigned long nr_reclaimed;
-	unsigned long flags = 0;
+	unsigned long flags;
 
 	if (mem_cgroup_is_r
[Devel] [PATCH] zdtm: fix packet memory allocation in autofs.c
Plus some cleanup.

https://jira.sw.ru/browse/PSBM-71078
Signed-off-by: Stanislav Kinsburskiy
---
 test/zdtm/static/autofs.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/test/zdtm/static/autofs.c b/test/zdtm/static/autofs.c
index 8d917ee..882289f 100644
--- a/test/zdtm/static/autofs.c
+++ b/test/zdtm/static/autofs.c
@@ -460,10 +460,10 @@ static int automountd_loop(int pipe, const char *mountpoint, struct autofs_param
 {
 	union autofs_v5_packet_union *packet;
 	ssize_t bytes;
-	size_t psize = sizeof(*packet) * 2;
+	size_t psize = sizeof(*packet);
 	int err = 0;
 
-	packet = malloc(psize);
+	packet = malloc(psize * 2);
 	if (!packet) {
 		pr_err("failed to allocate autofs packet\n");
 		return -ENOMEM;
@@ -473,7 +473,7 @@ static int automountd_loop(int pipe, const char *mountpoint, struct autofs_param
 	siginterrupt(SIGUSR2, 1);
 
 	while (!stop && !err) {
-		memset(packet, 0, sizeof(*packet));
+		memset(packet, 0, psize * 2);
 
 		bytes = read(pipe, packet, psize);
 		if (bytes < 0) {
@@ -483,12 +483,12 @@ static int automountd_loop(int pipe, const char *mountpoint, struct autofs_param
 			}
 			continue;
 		}
-		if (bytes > psize) {
-			pr_err("read more that expected: %zd > %zd\n", bytes, psize);
-			return -EINVAL;
-		}
-		if (bytes != sizeof(*packet)) {
-			pr_err("read less than expected: %zd\n", bytes);
+		if (bytes != psize) {
+			pr_err("read %s that expected: %zd %s %zd\n",
+			       (bytes > psize) ? "more" : "less",
+			       bytes,
+			       (bytes > psize) ? ">" : "<",
+			       psize);
 			return -EINVAL;
 		}
 		err = automountd_serve(mountpoint, param, packet);
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
Re: [Devel] [PATCH RHEL7 COMMIT] ms/block: Check for gaps on front and back merges
Please consider releasing a RK patch for it.
https://readykernel.com/

--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 08/31/2017 11:59 AM, Konstantin Khorenko wrote:

The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and
will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 36178f3003689a99f0b58b6c12e235186952d9a9
Author: Maxim Patlasov
Date: Thu Aug 31 11:59:13 2017 +0300

    ms/block: Check for gaps on front and back merges

    Backport 5e7c4274a70aa2d6f485996d0ca1dad52d0039ca from ml.

    Before the patch, front merge incorrectly used the same
    req_gap_to_prev() as back merge.

    Original patch description:

    block: Check for gaps on front and back merges

    We are checking for gaps to previous bio_vec, which can only detect
    back merge gaps. Moreover, at the point where we check for a gap, we
    don't know if we will attempt a back or a front merge. Thus, check
    for a gap to prev in a back merge attempt and check for a gap to
    next in a front merge attempt.

    Signed-off-by: Jens Axboe
    [sagig: Minor rename change]
    Signed-off-by: Sagi Grimberg

    https://jira.sw.ru/browse/PSBM-70321
    Signed-off-by: Maxim Patlasov
---
 block/blk-merge.c      | 18 +-
 include/linux/blkdev.h | 20 
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b0ce46d..0e8b7f2 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -294,6 +294,8 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		     struct bio *bio)
 {
+	if (req_gap_back_merge(req, bio))
+		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
 	    blk_rq_get_max_sectors(req)) {
 		req->cmd_flags |= REQ_NOMERGE;
@@ -312,6 +314,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
 int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		      struct bio *bio)
 {
+	if (req_gap_front_merge(req, bio))
+		return 0;
 	if (blk_rq_sectors(req) + bio_sectors(bio) >
 	    blk_rq_get_max_sectors(req)) {
 		req->cmd_flags |= REQ_NOMERGE;
@@ -338,14 +342,6 @@ static bool req_no_special_merge(struct request *req)
 	return !q->mq_ops && req->special;
 }
 
-static int req_gap_to_prev(struct request *req, struct bio *next)
-{
-	struct bio *prev = req->biotail;
-
-	return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1],
-				next->bi_io_vec[0].bv_offset);
-}
-
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
@@ -360,7 +356,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	if (req_no_special_merge(req) || req_no_special_merge(next))
 		return 0;
 
-	if (req_gap_to_prev(req, next->bio))
+	if (req_gap_back_merge(req, next->bio))
 		return 0;
 
 	/*
@@ -568,10 +564,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	    !blk_write_same_mergeable(rq->bio, bio))
 		return false;
 
-	/* Only check gaps if the bio carries data */
-	if (bio_has_data(bio) && req_gap_to_prev(rq, bio))
-		return false;
-
 	return true;
 }

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e1662f9..2b9bc88 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1493,6 +1493,26 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
 		((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q));
 }
 
+static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
+				struct bio *next)
+{
+	if (!bio_has_data(prev))
+		return false;
+
+	return bvec_gap_to_prev(q, &prev->bi_io_vec[prev->bi_vcnt - 1],
+				next->bi_io_vec[0].bv_offset);
+}
+
+static inline bool req_gap_back_merge(struct request *req, struct bio *bio)
+{
+	return bio_will_gap(req->q, req->biotail, bio);
+}
+
+static inline bool req_gap_front_merge(struct request *req, struct bio *bio)
+{
+	return bio_will_gap(req->q, bio, req->bio);
+}
+
 struct work_struct;
 int kblockd_schedule_work(struct work_struct *work);
 int kblockd_schedule_delayed_work(struct delayed_work *dwork, unsigned long delay);
Re: [Devel] [PATCH RHEL7 COMMIT] mm/memcg: add missing kmem charge
Please consider releasing it as a ReadyKernel patch.
https://readykernel.com/
(required only for the vz7.33.22 kernel)

--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 08/31/2017 11:52 AM, Konstantin Khorenko wrote:

The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and
will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-514.26.1.vz7.35.5
--> commit 822bec288dcaf5f69c1ed3e64734230320c798ba
Author: Andrey Ryabinin
Date: Thu Aug 31 11:52:13 2017 +0300

    mm/memcg: add missing kmem charge

    Since de3a106e28d5 ("mm/memcg: reclaim memory on reaching kmem limit."),
    if try_charge() decides to bypass the memory limit, memcg_charge_kmem()
    will charge only ->memory/->memsw but not ->kmem. This may lead to
    deadlocks during cgroup destruction, as the condition

        (page_counter_read(&memcg->memory) - page_counter_read(&memcg->kmem) > 0)

    in mem_cgroup_reparent_charges() won't ever come true.

    https://jira.sw.ru/browse/PSBM-70556
    Fixes: de3a106e28d5 ("mm/memcg: reclaim memory on reaching kmem limit.")
    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Vasily Averin
---
 mm/memcontrol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 97824e2..09ce016 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3087,6 +3087,8 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
 		page_counter_charge(&memcg->memory, nr_pages);
 		if (do_swap_account)
 			page_counter_charge(&memcg->memsw, nr_pages);
+		page_counter_charge(&memcg->kmem, nr_pages);
+		ret = 0;
 	}