[Devel] [PATCH RHEL7 COMMIT] Revert "nfs: protect callback execution against per-net callback thread shutdown"

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.26
-->
commit a80a0dd37f0f64499413e888c21066a7f21c27aa
Author: Konstantin Khorenko 
Date:   Mon Nov 13 11:25:18 2017 +0300

Revert "nfs: protect callback execution against per-net callback thread 
shutdown"

This reverts commit 2149800a70af636b2b22289fc5aa977420b392c2.

Temporary revert due to
https://jira.sw.ru/browse/PSBM-77114

Signed-off-by: Konstantin Khorenko 
---
 fs/nfs/callback.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index e18d774..0beb275 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -99,8 +99,6 @@ nfs4_callback_up(struct svc_serv *serv)
 }
 
 #if defined(CONFIG_NFS_V4_1)
-static DEFINE_MUTEX(nfs41_callback_mutex);
-
 /*
  * The callback service for NFSv4.1 callbacks
  */
@@ -119,12 +117,6 @@ nfs41_callback_svc(void *vrqstp)
if (try_to_freeze())
continue;
 
-   mutex_lock(&nfs41_callback_mutex);
-   if (kthread_should_stop()) {
-   mutex_unlock(&nfs41_callback_mutex);
-   return 0;
-   }
-
prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
spin_lock_bh(&serv->sv_cb_lock);
if (!list_empty(&serv->sv_cb_list)) {
@@ -137,10 +129,8 @@ nfs41_callback_svc(void *vrqstp)
error = bc_svc_process(serv, req, rqstp);
dprintk("bc_svc_process() returned w/ error code= %d\n",
error);
-   mutex_unlock(&nfs41_callback_mutex);
} else {
spin_unlock_bh(&serv->sv_cb_lock);
-   mutex_unlock(&nfs41_callback_mutex);
schedule();
finish_wait(&serv->sv_cb_waitq, &wq);
}
@@ -252,7 +242,6 @@ static void nfs_callback_down_net(u32 minorversion, struct 
svc_serv *serv, struc
return;
 
dprintk("NFS: destroy per-net callback data; net=%p\n", net);
-   bc_svc_flush_queue_net(serv, net);
svc_shutdown_net(serv, net);
 }
 
@@ -388,13 +377,7 @@ void nfs_callback_down(int minorversion, struct net *net)
struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
 
mutex_lock(&nfs_callback_mutex);
-#if defined(CONFIG_NFS_V4_1)
-   mutex_lock(&nfs41_callback_mutex);
-   nfs_callback_down_net(minorversion, cb_info->serv, net);
-   mutex_unlock(&nfs41_callback_mutex);
-#else
nfs_callback_down_net(minorversion, cb_info->serv, net);
-#endif
cb_info->users--;
if (cb_info->users == 0 && cb_info->task != NULL) {
kthread_stop(cb_info->task);
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL7 COMMIT] tcache: Increase seeks more

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.26
-->
commit 8e1f4aef14d0149513cd079a4811e4b5c52391d6
Author: Kirill Tkhai 
Date:   Mon Nov 13 11:30:02 2017 +0300

tcache: Increase seeks more

One node from the test plan showed that the number of seeks is still not
enough. Increase it twice more to fit the new formula introduced in commit
e008b95a28ef ("ms/mm: use sc->priority for slab shrink targets").

https://jira.sw.ru/browse/PSBM-77015

Fixes: d6cdd02 ("tswap, tcache: Increase shrinkers seeks")

Signed-off-by: Kirill Tkhai 
---
 mm/tcache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index 8b893e6..b5157d9 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -1202,7 +1202,7 @@ static unsigned long tcache_shrink_scan(struct shrinker 
*shrink,
 struct shrinker tcache_shrinker = {
.count_objects  = tcache_shrink_count,
.scan_objects   = tcache_shrink_scan,
-   .seeks  = 4,
+   .seeks  = 8,
.batch  = TCACHE_SCAN_BATCH,
.flags  = SHRINKER_NUMA_AWARE,
 };
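For reference, the shrink-target arithmetic this tunes against can be sketched as follows. This is an illustrative model based on our reading of the upstream formula from "ms/mm: use sc->priority for slab shrink targets"; `shrink_delta` is our own name, not a kernel symbol:

```python
# Illustrative sketch of the slab shrink-target arithmetic:
#     delta = (freeable >> priority) * 4 / shrinker->seeks
# The scan target is inversely proportional to .seeks, so doubling
# .seeks from 4 to 8 halves reclaim pressure on tcache at any priority.
def shrink_delta(freeable, priority, seeks):
    delta = freeable >> priority
    return (delta * 4) // seeks

print(shrink_delta(1 << 20, 12, 4))  # 256
print(shrink_delta(1 << 20, 12, 8))  # 128
```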


[Devel] [PATCH RHEL7 COMMIT] Revert "sunrpc: bc_svc_flush_queue_net() helper introduced"

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.26
-->
commit 8742ccbe2d3e6aa8b821639ab915909780c9e790
Author: Konstantin Khorenko 
Date:   Mon Nov 13 11:25:44 2017 +0300

Revert "sunrpc: bc_svc_flush_queue_net() helper introduced"

This reverts commit 99bfffc43ae40204d445f2c138071176d3e7b03d.

Temporary revert due to
https://jira.sw.ru/browse/PSBM-77114

Signed-off-by: Konstantin Khorenko 
---
 include/linux/sunrpc/svc.h |  2 --
 net/sunrpc/svc.c   | 15 ---
 2 files changed, 17 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index fe70ff0..2b30868 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -484,8 +484,6 @@ void   svc_reserve(struct svc_rqst *rqstp, 
int space);
 struct svc_pool *  svc_pool_for_cpu(struct svc_serv *serv, int cpu);
 char *svc_print_addr(struct svc_rqst *, char *, size_t);
 
-void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net);
-
 #defineRPC_MAX_ADDRBUFLEN  (63U)
 
 /*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 2ca4ff7..de8cded1 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1338,21 +1338,6 @@ svc_process(struct svc_rqst *rqstp)
 EXPORT_SYMBOL_GPL(svc_process);
 
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
-void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net)
-{
-   struct rpc_rqst *req, *tmp;
-
-   spin_lock_bh(&serv->sv_cb_lock);
-   list_for_each_entry_safe(req, tmp, &serv->sv_cb_list, rq_bc_list) {
-   if (req->rq_xprt->xprt_net == net) {
-   list_del(&req->rq_bc_list);
-   xprt_free_bc_request(req);
-   }
-   }
-   spin_unlock_bh(&serv->sv_cb_lock);
-}
-EXPORT_SYMBOL_GPL(bc_svc_flush_queue_net);
-
 /*
  * Process a backchannel RPC request that arrived over an existing
  * outbound connection


[Devel] [PATCH RHEL7 COMMIT] ploop: fix dio_invalidate_cache()

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.26
-->
commit 887eece598408a684fb08e932ebc7246723939b2
Author: Maxim Patlasov 
Date:   Mon Nov 13 11:35:22 2017 +0300

ploop: fix dio_invalidate_cache()

The patch fixes two critical bugs in dio_invalidate_cache():

1) The "bdev" arg points to the block_device of the underlying block device
(where the image file resides), not the ploop block device. Hence, the
statement:

> struct ploop_device *plo = bdev->bd_disk->private_data;

is a mistake -- that private_data is not our ploop private_data.

2) dio_invalidate_cache() is always called with plo->ctl_mutex held. Hence,
we cannot use ploop_get_dm_crypt_bdev(), which tries to acquire the lock again.

https://jira.sw.ru/browse/PSBM-73999
Signed-off-by: Maxim Patlasov 
---
 drivers/block/ploop/io_direct.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index fb594c8..d6b1118 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -860,16 +860,17 @@ static int dio_fsync_thread(void * data)
  * must not be quiesced.
  */
 
-static int dio_invalidate_cache(struct address_space * mapping,
-   struct block_device * bdev)
+static int dio_invalidate_cache(struct ploop_io * io)
 {
+   struct address_space *mapping = io->files.mapping;
+   struct block_device  *bdev= io->files.bdev;
int err;
int attempt2 = 0;
 
 retry:
err = invalidate_inode_pages2(mapping);
if (err) {
-   struct ploop_device *plo = bdev->bd_disk->private_data;
+   struct ploop_device *plo = io->plo;
struct block_device *dm_crypt_bdev;
 
printk("PLOOP: failed to invalidate page cache %d/%d\n", err, 
attempt2);
@@ -879,7 +880,8 @@ static int dio_invalidate_cache(struct address_space * 
mapping,
 
mutex_unlock(&mapping->host->i_mutex);
 
-   dm_crypt_bdev = ploop_get_dm_crypt_bdev(plo);
+   WARN_ONCE(!mutex_is_locked(&plo->ctl_mutex), "ctl_mutex is not 
held");
+   dm_crypt_bdev = __ploop_get_dm_crypt_bdev(plo);
if (dm_crypt_bdev)
bdev = dm_crypt_bdev;
else
@@ -928,7 +930,7 @@ static void dio_destroy(struct ploop_io * io)
io->files.em_tree = NULL;
mutex_lock(&io->files.inode->i_mutex);
ploop_dio_close(io, delta->flags & PLOOP_FMT_RDONLY);
-   (void)dio_invalidate_cache(io->files.mapping, 
io->files.bdev);
+   (void)dio_invalidate_cache(io);
mutex_unlock(&io->files.inode->i_mutex);
}
 
@@ -991,7 +993,7 @@ static int dio_open(struct ploop_io * io)
 
io->files.em_tree = em_tree;
 
-   err = dio_invalidate_cache(io->files.mapping, io->files.bdev);
+   err = dio_invalidate_cache(io);
if (err) {
io->files.em_tree = NULL;
ploop_dio_close(io, 0);
@@ -1637,7 +1639,7 @@ static int dio_prepare_snapshot(struct ploop_io * io, 
struct ploop_snapdata *sd)
}
 
mutex_lock(&io->files.inode->i_mutex);
-   err = dio_invalidate_cache(io->files.mapping, io->files.bdev);
+   err = dio_invalidate_cache(io);
mutex_unlock(&io->files.inode->i_mutex);
 
if (err) {
@@ -1709,7 +1711,7 @@ static int dio_prepare_merge(struct ploop_io * io, struct 
ploop_snapdata *sd)
 
mutex_lock(&io->files.inode->i_mutex);
 
-   err = dio_invalidate_cache(io->files.mapping, io->files.bdev);
+   err = dio_invalidate_cache(io);
if (err) {
mutex_unlock(&io->files.inode->i_mutex);
fput(file);


[Devel] [PATCH] mm: Fix misaccounting of isolated pages in memcg_numa_isolate_pages()

2017-11-13 Thread Kirill Tkhai
When split_huge_page_to_list() fails, and a huge page is going back
to the LRU, the number of isolated pages decreases. So we must
subtract HPAGE_PMD_NR from the NR_ISOLATED_ANON counter, not add it.

Otherwise, we may bump into a situation where the number of isolated
pages grows up to the number of inactive pages, and direct reclaim hangs in:

  shrink_inactive_list()
     while (too_many_isolated())
        congestion_wait(BLK_RW_ASYNC, HZ/10),

waiting for the counter to decrease. But it has no chance
to finish, and hangs forever. Fix that.

https://jira.sw.ru/browse/PSBM-76970

Signed-off-by: Kirill Tkhai 
---
 mm/memcontrol.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a7fa84a9980..a165a221e87 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4963,7 +4963,7 @@ static long memcg_numa_isolate_pages(struct lruvec 
*lruvec, enum lru_list lru,
if (PageTransHuge(page) && split_huge_page_to_list(page, dst)) {
list_del(&page->lru);
mod_zone_page_state(zone, NR_ISOLATED_ANON,
-   HPAGE_PMD_NR);
+   -HPAGE_PMD_NR);
putback_lru_page(page);
}
}
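The accounting imbalance can be illustrated with a toy model (illustrative Python, not kernel code; HPAGE_PMD_NR is 512 for 2M huge pages with 4K base pages on x86-64):

```python
HPAGE_PMD_NR = 512  # pages per 2M huge page, 4K base pages (x86-64)

def isolate_then_fail_split(nr_isolated_anon, buggy=False):
    """Toy model of NR_ISOLATED_ANON around a failed huge-page split."""
    # Isolating the huge page accounted +HPAGE_PMD_NR earlier.
    nr_isolated_anon += HPAGE_PMD_NR
    # The split failed and the page goes back to the LRU, so the
    # counter must be decremented; the bug added instead.
    nr_isolated_anon += HPAGE_PMD_NR if buggy else -HPAGE_PMD_NR
    return nr_isolated_anon

assert isolate_then_fail_split(0) == 0                         # fixed: balances
assert isolate_then_fail_split(0, buggy=True) == 2 * HPAGE_PMD_NR  # drifts up
```

With the buggy sign, each failed split leaks 2 * HPAGE_PMD_NR into the counter, so NR_ISOLATED_ANON eventually exceeds the inactive-page count and too_many_isolated() never becomes false.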



Re: [Devel] [PATCH] mm: Fix misaccounting of isolated pages in memcg_numa_isolate_pages()

2017-11-13 Thread Andrey Ryabinin


On 11/13/2017 02:50 PM, Kirill Tkhai wrote:
> When split_huge_page_to_list() fails, and a huge page is going back
> to the LRU, the number of isolated pages decreases. So we must
> subtract HPAGE_PMD_NR from the NR_ISOLATED_ANON counter, not add it.
> 
> Otherwise, we may bump into a situation where the number of isolated
> pages grows up to the number of inactive pages, and direct reclaim hangs in:
> 
>   shrink_inactive_list()
>      while (too_many_isolated())
>         congestion_wait(BLK_RW_ASYNC, HZ/10),
> 
> waiting for the counter to decrease. But it has no chance
> to finish, and hangs forever. Fix that.
> 
> https://jira.sw.ru/browse/PSBM-76970
> 
> Signed-off-by: Kirill Tkhai 

Acked-by: Andrey Ryabinin 


[Devel] [PATCH RHEL7 COMMIT] mm: Fix misaccounting of isolated pages in memcg_numa_isolate_pages()

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.27
-->
commit 01b3a23d466d41c3d54e8700e58785e2357e053b
Author: Kirill Tkhai 
Date:   Mon Nov 13 15:24:40 2017 +0300

mm: Fix misaccounting of isolated pages in memcg_numa_isolate_pages()

When split_huge_page_to_list() fails, and a huge page is going back
to the LRU, the number of isolated pages decreases. So we must
subtract HPAGE_PMD_NR from the NR_ISOLATED_ANON counter, not add it.

Otherwise, we may bump into a situation where the number of isolated
pages grows up to the number of inactive pages, and direct reclaim hangs in:

  shrink_inactive_list()
     while (too_many_isolated())
        congestion_wait(BLK_RW_ASYNC, HZ/10),

waiting for the counter to decrease. But it has no chance
to finish, and hangs forever. Fix that.

https://jira.sw.ru/browse/PSBM-76970

Signed-off-by: Kirill Tkhai 
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a7fa84a..a165a22 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4963,7 +4963,7 @@ static long memcg_numa_isolate_pages(struct lruvec 
*lruvec, enum lru_list lru,
if (PageTransHuge(page) && split_huge_page_to_list(page, dst)) {
list_del(&page->lru);
mod_zone_page_state(zone, NR_ISOLATED_ANON,
-   HPAGE_PMD_NR);
+   -HPAGE_PMD_NR);
putback_lru_page(page);
}
}


[Devel] [PATCH v6 1/2] sunrpc: bc_svc_flush_queue_net() helper introduced

2017-11-13 Thread Stanislav Kinsburskiy
From: Stanislav Kinsburskiy 

This helper can be used to remove backchannel requests from callback queue on
per-net basis.

Signed-off-by: Stanislav Kinsburskiy 
---
 include/linux/sunrpc/svc.h |2 ++
 net/sunrpc/svc.c   |   15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 2b30868..fe70ff0 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -484,6 +484,8 @@ void   svc_reserve(struct svc_rqst *rqstp, 
int space);
 struct svc_pool *  svc_pool_for_cpu(struct svc_serv *serv, int cpu);
 char *svc_print_addr(struct svc_rqst *, char *, size_t);
 
+void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net);
+
 #defineRPC_MAX_ADDRBUFLEN  (63U)
 
 /*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index de8cded..2ca4ff7 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1338,6 +1338,21 @@ svc_process(struct svc_rqst *rqstp)
 EXPORT_SYMBOL_GPL(svc_process);
 
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
+void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net)
+{
+   struct rpc_rqst *req, *tmp;
+
+   spin_lock_bh(&serv->sv_cb_lock);
+   list_for_each_entry_safe(req, tmp, &serv->sv_cb_list, rq_bc_list) {
+   if (req->rq_xprt->xprt_net == net) {
+   list_del(&req->rq_bc_list);
+   xprt_free_bc_request(req);
+   }
+   }
+   spin_unlock_bh(&serv->sv_cb_lock);
+}
+EXPORT_SYMBOL_GPL(bc_svc_flush_queue_net);
+
 /*
  * Process a backchannel RPC request that arrived over an existing
  * outbound connection



[Devel] [PATCH v6 0/2] nfs: fix race between callback shutdown and execution

2017-11-13 Thread Stanislav Kinsburskiy
The idea is to use a mutex to protect callback execution against per-net
callback shutdown, and to destroy all the net-related backchannel requests
before transport destruction.

---

Stanislav Kinsburskiy (2):
  sunrpc: bc_svc_flush_queue_net() helper introduced
  nfs: protect callback execution against per-net callback thread shutdown


 fs/nfs/callback.c  |   20 
 include/linux/sunrpc/svc.h |3 +++
 net/sunrpc/svc.c   |   15 +++
 3 files changed, 38 insertions(+)

--


[Devel] [PATCH v6 2/2] nfs: protect callback execution against per-net callback thread shutdown

2017-11-13 Thread Stanislav Kinsburskiy
From: Stanislav Kinsburskiy 

Here is the race:

CPU #0  CPU#1

cleanup_mnt nfs41_callback_svc (get xprt from the list)
nfs_callback_down   ...
... ...
svc_close_net   ...
... ...
svc_xprt_free   ...
svc_bc_sock_freebc_svc_process
kfree(xprt) svc_process_common
rqstp->rq_xprt->xpt_ops (use after free)

The problem is that per-net SUNRPC transport shutdown is done regardless
of current callback execution. This is a race leading to transport
use-after-free in the callback handler.
This patch fixes it in a straightforward way, i.e. it protects callback
execution with the same mutex used for per-net data creation and destruction.
Hopefully, it won't slow down the NFS client significantly.

https://jira.sw.ru/browse/PSBM-75751

v6: destroy all per-net backchannel requests only for NFSv4.1

v5: destroy all per-net backchannel requests before transports in
nfs_callback_down_net

v4: use another mutex to protect callback execution against per-net
transport shutdown.
This guarantees that transports won't be destroyed by the shutdown callback
while execution is in progress, and vice versa.

v3: fix a mutex deadlock, where the shutdown callback waits for the thread to
exit (with the mutex taken), while the thread waits for the mutex.
The idea is to simply check whether the thread has to exit if the mutex lock
has failed. This is a busy loop, but it shouldn't happen often or for long.

Signed-off-by: Stanislav Kinsburskiy 
---
 fs/nfs/callback.c  |   20 
 include/linux/sunrpc/svc.h |1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 0beb275..feffccf 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -99,6 +99,8 @@ nfs4_callback_up(struct svc_serv *serv)
 }
 
 #if defined(CONFIG_NFS_V4_1)
+static DEFINE_MUTEX(nfs41_callback_mutex);
+
 /*
  * The callback service for NFSv4.1 callbacks
  */
@@ -117,6 +119,12 @@ nfs41_callback_svc(void *vrqstp)
if (try_to_freeze())
continue;
 
+   mutex_lock(&nfs41_callback_mutex);
+   if (kthread_should_stop()) {
+   mutex_unlock(&nfs41_callback_mutex);
+   return 0;
+   }
+
prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
spin_lock_bh(&serv->sv_cb_lock);
if (!list_empty(&serv->sv_cb_list)) {
@@ -129,8 +137,10 @@ nfs41_callback_svc(void *vrqstp)
error = bc_svc_process(serv, req, rqstp);
dprintk("bc_svc_process() returned w/ error code= %d\n",
error);
+   mutex_unlock(&nfs41_callback_mutex);
} else {
spin_unlock_bh(&serv->sv_cb_lock);
+   mutex_unlock(&nfs41_callback_mutex);
schedule();
finish_wait(&serv->sv_cb_waitq, &wq);
}
@@ -139,6 +149,13 @@ nfs41_callback_svc(void *vrqstp)
return 0;
 }
 
+static void nfs41_callback_down_net(struct svc_serv *serv, struct net *net)
+{
+   mutex_lock(&nfs41_callback_mutex);
+   bc_svc_flush_queue_net(serv, net);
+   mutex_unlock(&nfs41_callback_mutex);
+}
+
 /*
  * Bring up the NFSv4.1 callback service
  */
@@ -150,6 +167,7 @@ nfs41_callback_up(struct svc_serv *serv)
INIT_LIST_HEAD(&serv->sv_cb_list);
spin_lock_init(&serv->sv_cb_lock);
init_waitqueue_head(&serv->sv_cb_waitq);
+   serv->svc_cb_down_net = nfs41_callback_down_net;
rqstp = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
dprintk("--> %s return %d\n", __func__, PTR_ERR_OR_ZERO(rqstp));
return rqstp;
@@ -242,6 +260,8 @@ static void nfs_callback_down_net(u32 minorversion, struct 
svc_serv *serv, struc
return;
 
dprintk("NFS: destroy per-net callback data; net=%p\n", net);
+   if (serv->svc_cb_down_net)
+   serv->svc_cb_down_net(serv, net);
svc_shutdown_net(serv, net);
 }
 
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index fe70ff0..c04ef80 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -108,6 +108,7 @@ struct svc_serv {
wait_queue_head_t   sv_cb_waitq;/* sleep here if there are no
 * entries in the svc_cb_list */
struct svc_xprt *sv_bc_xprt;/* callback on fore channel */
+   void(*svc_cb_down_net)(struct svc_serv *serv, 
struct net *net);
 #endif /* CONFIG_SUNRPC_BACKCHANNEL */
 };
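The locking scheme above can be sketched in user space, with Python threading standing in for the kernel mutex and kthread (all names are illustrative, not the real SUNRPC API): the worker dereferences the shared transport only under the same lock the shutdown path holds while tearing it down, and re-checks the stop flag under that lock, mirroring the kthread_should_stop() check the patch adds.

```python
import threading

class Xprt:
    """Stand-in for a backchannel transport (illustrative)."""
    def __init__(self, name):
        self.name = name

cb_lock = threading.Lock()  # plays the role of nfs41_callback_mutex
state = {"xprt": Xprt("xprt0"), "stop": False, "processed": 0}

def callback_svc():
    """Worker: takes the mutex, checks the stop flag, and only then
    touches the transport -- like nfs41_callback_svc after the patch."""
    while True:
        with cb_lock:
            if state["stop"]:           # kthread_should_stop() analogue
                return
            if state["xprt"] is not None:
                _ = state["xprt"].name  # safe: teardown holds cb_lock too
                state["processed"] += 1

def callback_down():
    """Shutdown path: tears the transport down under the same mutex,
    so the worker can never observe a freed object."""
    with cb_lock:
        state["xprt"] = None            # "free" cannot race with the worker
        state["stop"] = True

t = threading.Thread(target=callback_svc)
t.start()
callback_down()
t.join()
```

Without the shared lock (or with the transport freed before the flag is visible to the worker), the worker could dereference the object after teardown, which is the use-after-free in the race diagram.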
 



Re: [Devel] [PATCH v6 0/2] nfs: fix race between callback shutdown and execution

2017-11-13 Thread Kirill Tkhai
On 13.11.2017 16:16, Stanislav Kinsburskiy wrote:
> The idea is to use a mutex to protect callback execution against per-net
> callback shutdown, and to destroy all the net-related backchannel requests
> before transport destruction.
> 
> ---
> 
> Stanislav Kinsburskiy (2):
>   sunrpc: bc_svc_flush_queue_net() helper introduced
>   nfs: protect callback execution against per-net callback thread shutdown
> 
> 
>  fs/nfs/callback.c  |   20 
>  include/linux/sunrpc/svc.h |3 +++
>  net/sunrpc/svc.c   |   15 +++
>  3 files changed, 38 insertions(+)

Reviewed-by: Kirill Tkhai 


[Devel] [PATCH RHEL7 COMMIT] nfs: protect callback execution against per-net callback thread shutdown

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.27
-->
commit c1cd5be61d00591c64888d4ae012d675b3073d8e
Author: Stanislav Kinsburskiy 
Date:   Mon Nov 13 17:11:42 2017 +0300

nfs: protect callback execution against per-net callback thread shutdown

Patchset description:
nfs: fix race between callback shutdown and execution

The idea is to use a mutex to protect callback execution against per-net
callback shutdown, and to destroy all the net-related backchannel requests
before transport destruction.

Stanislav Kinsburskiy (2):
sunrpc: bc_svc_flush_queue_net() helper introduced
nfs: protect callback execution against per-net callback thread 
shutdown

==
This patch description:

Here is the race:

CPU #0  CPU#1

cleanup_mnt nfs41_callback_svc (get xprt from the list)
nfs_callback_down   ...
... ...
svc_close_net   ...
... ...
svc_xprt_free   ...
svc_bc_sock_freebc_svc_process
kfree(xprt) svc_process_common
rqstp->rq_xprt->xpt_ops (use after free)

The problem is that per-net SUNRPC transport shutdown is done regardless
of current callback execution. This is a race leading to transport
use-after-free in the callback handler.
This patch fixes it in a straightforward way, i.e. it protects callback
execution with the same mutex used for per-net data creation and destruction.
Hopefully, it won't slow down the NFS client significantly.

https://jira.sw.ru/browse/PSBM-75751

v6: destroy all per-net backchannel requests only for NFSv4.1

v5: destroy all per-net backchannel requests before transports in
nfs_callback_down_net

v4: use another mutex to protect callback execution against per-net
transport shutdown.
This guarantees that transports won't be destroyed by the shutdown callback
while execution is in progress, and vice versa.

v3: fix a mutex deadlock, where the shutdown callback waits for the thread to
exit (with the mutex taken), while the thread waits for the mutex.
The idea is to simply check whether the thread has to exit if the mutex lock
has failed. This is a busy loop, but it shouldn't happen often or for long.

Signed-off-by: Stanislav Kinsburskiy 
Reviewed-by: Kirill Tkhai 
---
 fs/nfs/callback.c  | 20 
 include/linux/sunrpc/svc.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 0beb275..feffccf 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -99,6 +99,8 @@ nfs4_callback_up(struct svc_serv *serv)
 }
 
 #if defined(CONFIG_NFS_V4_1)
+static DEFINE_MUTEX(nfs41_callback_mutex);
+
 /*
  * The callback service for NFSv4.1 callbacks
  */
@@ -117,6 +119,12 @@ nfs41_callback_svc(void *vrqstp)
if (try_to_freeze())
continue;
 
+   mutex_lock(&nfs41_callback_mutex);
+   if (kthread_should_stop()) {
+   mutex_unlock(&nfs41_callback_mutex);
+   return 0;
+   }
+
prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
spin_lock_bh(&serv->sv_cb_lock);
if (!list_empty(&serv->sv_cb_list)) {
@@ -129,8 +137,10 @@ nfs41_callback_svc(void *vrqstp)
error = bc_svc_process(serv, req, rqstp);
dprintk("bc_svc_process() returned w/ error code= %d\n",
error);
+   mutex_unlock(&nfs41_callback_mutex);
} else {
spin_unlock_bh(&serv->sv_cb_lock);
+   mutex_unlock(&nfs41_callback_mutex);
schedule();
finish_wait(&serv->sv_cb_waitq, &wq);
}
@@ -139,6 +149,13 @@ nfs41_callback_svc(void *vrqstp)
return 0;
 }
 
+static void nfs41_callback_down_net(struct svc_serv *serv, struct net *net)
+{
+   mutex_lock(&nfs41_callback_mutex);
+   bc_svc_flush_queue_net(serv, net);
+   mutex_unlock(&nfs41_callback_mutex);
+}
+
 /*
  * Bring up the NFSv4.1 callback service
  */
@@ -150,6 +167,7 @@ nfs41_callback_up(struct svc_serv *serv)
INIT_LIST_HEAD(&serv->sv_cb_list);
spin_lock_init(&serv->sv_cb_lock);
init_waitqueue_head(&serv->sv_cb_waitq);
+   serv->svc_cb_down_net = nfs41_callback_down_net;
rqstp = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
dprintk("--> %s return %d\n", __func__, PTR_ERR_OR_ZERO(rqstp));
return rqstp;
@@ -242,6 +260,8 @@ static void nfs

[Devel] [PATCH RHEL7 COMMIT] sunrpc: bc_svc_flush_queue_net() helper introduced

2017-11-13 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.27
-->
commit ca2dc121100718f4f175e5613b7511d3758790a1
Author: Stanislav Kinsburskiy 
Date:   Mon Nov 13 17:11:18 2017 +0300

sunrpc: bc_svc_flush_queue_net() helper introduced

Patchset description:
nfs: fix race between callback shutdown and execution

The idea is to use a mutex to protect callback execution against per-net
callback shutdown, and to destroy all the net-related backchannel requests
before transport destruction.

Stanislav Kinsburskiy (2):
sunrpc: bc_svc_flush_queue_net() helper introduced
nfs: protect callback execution against per-net callback thread shutdown

==
This patch description:

This helper can be used to remove backchannel requests from the callback queue
on a per-net basis.

https://jira.sw.ru/browse/PSBM-75751

Signed-off-by: Stanislav Kinsburskiy 
Reviewed-by: Kirill Tkhai 
---
 include/linux/sunrpc/svc.h |  2 ++
 net/sunrpc/svc.c   | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 2b30868..fe70ff0 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -484,6 +484,8 @@ void   svc_reserve(struct svc_rqst *rqstp, 
int space);
 struct svc_pool *  svc_pool_for_cpu(struct svc_serv *serv, int cpu);
 char *svc_print_addr(struct svc_rqst *, char *, size_t);
 
+void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net);
+
 #defineRPC_MAX_ADDRBUFLEN  (63U)
 
 /*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index de8cded1..2ca4ff7 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1338,6 +1338,21 @@ svc_process(struct svc_rqst *rqstp)
 EXPORT_SYMBOL_GPL(svc_process);
 
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
+void bc_svc_flush_queue_net(struct svc_serv *serv, struct net *net)
+{
+   struct rpc_rqst *req, *tmp;
+
+   spin_lock_bh(&serv->sv_cb_lock);
+   list_for_each_entry_safe(req, tmp, &serv->sv_cb_list, rq_bc_list) {
+   if (req->rq_xprt->xprt_net == net) {
+   list_del(&req->rq_bc_list);
+   xprt_free_bc_request(req);
+   }
+   }
+   spin_unlock_bh(&serv->sv_cb_lock);
+}
+EXPORT_SYMBOL_GPL(bc_svc_flush_queue_net);
+
 /*
  * Process a backchannel RPC request that arrived over an existing
  * outbound connection


[Devel] [PATCH p.haul] Increase the limit for opened files for criu pre-dump and page-server

2017-11-13 Thread Andrei Vagin
criu restore has to be started with a standard limit, because the kernel
doesn't shrink the fdtable when a limit is reduced. fdtables are charged
to kmem, so if we run criu restore with a big limit, all restored
processes are forked with this limit and only then restore their own
limits, but their fdtables are allocated for the initial limit, so they
eat much more kernel memory than they have to.

https://jira.sw.ru/browse/PSBM-67194

Cc: Cyrill Gorcunov 
Cc: Pavel Vokhmyanin 
Signed-off-by: Andrei Vagin 
---
 phaul/criu_api.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/phaul/criu_api.py b/phaul/criu_api.py
index 73c642a..4627d5f 100644
--- a/phaul/criu_api.py
+++ b/phaul/criu_api.py
@@ -9,6 +9,7 @@ import re
 import socket
 import subprocess
 import util
+import resource
 
 import pycriu
 
@@ -36,9 +37,16 @@ class criu_conn(object):
util.set_cloexec(css[1])
logging.info("Passing (ctl:%d, data:%d) pair to CRIU",
css[0].fileno(), mem_sk.fileno())
+
+# criu uses a lot of pipes to pre-dump memory, so we need to
+# increase a limit for opened files.
+   fileno_max = int(open("/proc/sys/fs/nr_open").read())
+   fileno_old = resource.getrlimit(resource.RLIMIT_NOFILE)
+   resource.setrlimit(resource.RLIMIT_NOFILE, (fileno_max, 
fileno_max))
self._swrk = subprocess.Popen([criu_binary,
"swrk", 
"%d" % css[0].fileno()])
css[0].close()
+   resource.setrlimit(resource.RLIMIT_NOFILE, fileno_old)
self._cs = css[1]
self._last_req = -1
self._mem_fd = mem_sk.fileno()
-- 
2.13.6
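The pattern of temporarily raising the limit around the spawn can be sketched like this. This is a hedged sketch, not the p.haul code: it only raises the soft limit to the current hard limit, which works unprivileged, whereas the patch raises both limits to /proc/sys/fs/nr_open, which needs CAP_SYS_RESOURCE; the function name is ours.

```python
import resource
import subprocess

def spawn_with_raised_nofile(cmd):
    """Spawn cmd with RLIMIT_NOFILE raised, then restore the parent's
    limits so later children don't inherit an oversized fdtable."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Raising the soft limit up to the hard limit needs no privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    try:
        # The child inherits the raised limit across fork+exec.
        return subprocess.Popen(cmd)
    finally:
        # Restore the original limits in the parent.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Restoring in the parent matters for the same kmem reason the commit message gives: any process forked later would otherwise carry the big limit and the correspondingly large fdtable.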
