[Devel] [PATCH rh7] cbt: blk_cbt_update_size() must return if cbt->block_max not changed

2016-06-08 Thread Maxim Patlasov
It's useless to recreate the cbt every time we're called for the same
block-device size. Actually, recreation is worthwhile only if
cbt->block_max increases.

Since commit b8e560a299 (fix cbt->block_max calculation), we calculate
cbt->block_max precisely:

>   cbt->block_max  = (size + blocksize - 1) >> cbt->block_bits;

Hence, the following check:

if ((new_sz + bsz) >> cbt->block_bits <= cbt->block_max)
goto err_mtx;

must be corrected accordingly.
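
For illustration, the arithmetic as a standalone userspace sketch (not
kernel code; block_bits and the sizes are made-up values):

    #include <stdio.h>

    int main(void)
    {
        unsigned int block_bits = 16;                  /* 64K cbt blocks */
        unsigned long long bsz = 1ULL << block_bits;
        unsigned long long size = bsz;                 /* exactly one block */
        unsigned long long block_max = (size + bsz - 1) >> block_bits; /* 1 */
        unsigned long long new_sz = size;              /* size unchanged */

        /* old check: (new_sz + bsz) >> block_bits == 2 > block_max,
         * so the cbt is needlessly recreated */
        printf("%llu\n", (new_sz + bsz) >> block_bits);
        /* fixed check: (new_sz + bsz - 1) >> block_bits == 1 <= block_max,
         * so we bail out early */
        printf("%llu\n", (new_sz + bsz - 1) >> block_bits);
        return 0;
    }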

Signed-off-by: Maxim Patlasov 
---
 block/blk-cbt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-cbt.c b/block/blk-cbt.c
index 3a2b197..4f2ce26 100644
--- a/block/blk-cbt.c
+++ b/block/blk-cbt.c
@@ -440,7 +440,7 @@ void blk_cbt_update_size(struct block_device *bdev)
return;
}
bsz = 1 << cbt->block_bits;
-   if ((new_sz + bsz) >> cbt->block_bits <= cbt->block_max)
+   if ((new_sz + bsz - 1) >> cbt->block_bits <= cbt->block_max)
goto err_mtx;
 
new = do_cbt_alloc(q, cbt->uuid, new_sz, bsz);



[Devel] [PATCH rh7] cbt: blk_cbt_update_size() should not copy uninitialized data

2016-06-08 Thread Maxim Patlasov
to_cpy is the number of page pointers to copy from the current cbt to the new one.
The following check:

>   if ((new_sz + bsz) >> cbt->block_bits <= cbt->block_max)
>   goto err_mtx;

ensures that the copy is done only when the new cbt is bigger than the
current one. So we have to calculate to_cpy based on the current (smaller)
cbt. The rest of the new cbt is fine because it was zeroed by do_cbt_alloc().

The bug has existed since the very first version of CBT (commit ad7ba3dfe).

https://jira.sw.ru/browse/PSBM-48120
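
For illustration, a simplified userspace model of the copy step (the
struct layout, the nr_pages() helper, and the page-size math below are
hypothetical stand-ins for the real code in block/blk-cbt.c):

    #include <stddef.h>

    struct cbt_model {
        unsigned long block_max;  /* number of cbt blocks tracked */
        void **map;               /* one page pointer per bitmap chunk */
    };

    /* hypothetical stand-in for NR_PAGES(): one page covers 4096 * 8 blocks */
    static size_t nr_pages(unsigned long block_max)
    {
        const unsigned long bits_per_page = 4096UL * 8;
        return (block_max + bits_per_page - 1) / bits_per_page;
    }

    static void copy_step(struct cbt_model *new, const struct cbt_model *cur)
    {
        /* bounding by new->block_max would read past the end of cur->map;
         * the fix bounds the copy by the current (smaller) cbt */
        size_t to_cpy = nr_pages(cur->block_max);
        size_t i;

        for (i = 0; i < to_cpy; i++)
            new->map[i] = cur->map[i];
        /* new->map[to_cpy..] stays NULL: do_cbt_alloc() zeroed it */
    }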

Signed-off-by: Maxim Patlasov 
---
 block/blk-cbt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-cbt.c b/block/blk-cbt.c
index 001dbfd..3a2b197 100644
--- a/block/blk-cbt.c
+++ b/block/blk-cbt.c
@@ -448,7 +448,7 @@ void blk_cbt_update_size(struct block_device *bdev)
set_bit(CBT_ERROR, &cbt->flags);
goto err_mtx;
}
-   to_cpy = NR_PAGES(new->block_max);
+   to_cpy = NR_PAGES(cbt->block_max);
set_bit(CBT_NOCACHE, &cbt->flags);
cbt_flush_cache(cbt);
spin_lock_irq(&cbt->lock);



[Devel] [PATCH] scripts: vz-rst-action -- Restore certain members before creating namespaces

2016-06-08 Thread Cyrill Gorcunov
When restoring tasks we call clone() and unshare() with the needed flags,
but some VE settings, such as @iptables_mask, affect how the creation of
a new namespace proceeds. Thus we need to restore this member at the very
early pre-restore stage. I put @features here as well; for example, the
sitX net_init action depends on it.

Signed-off-by: Cyrill Gorcunov 
---
Igor, please don't apply this until an explicit Ack from the CC'ed list.

 scripts/vz-rst-action.in | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/scripts/vz-rst-action.in b/scripts/vz-rst-action.in
index 4e408f2..f0b6aca 100755
--- a/scripts/vz-rst-action.in
+++ b/scripts/vz-rst-action.in
@@ -68,6 +68,12 @@ fi
 
 set -e
 case "$CRTOOLS_SCRIPT_ACTION" in
+"pre-restore")
+   if [ -n "$VEID" ]; then
+   [ -n "$VE_IPTABLES_MASK" ] && cgset -r 
ve.iptables_mask="$VE_IPTABLES_MASK" $VEID
+   [ -n "$VE_FEATURES" ] && cgset -r ve.features="$VE_FEATURES" 
$VEID
+   fi
+   ;;
 "setup-namespaces")
pid=$(cat $VE_PIDFILE)
ln -s /proc/$pid/ns/net $VE_NETNS_FILE
@@ -75,8 +81,6 @@ case "$CRTOOLS_SCRIPT_ACTION" in
if [ -n "$VEID" ]; then
[ -n "$VE_CLOCK_BOOTBASED" ] && cgset -r 
ve.clock_bootbased="$VE_CLOCK_BOOTBASED" $VEID
[ -n "$VE_CLOCK_MONOTONIC" ] && cgset -r 
ve.clock_monotonic="$VE_CLOCK_MONOTONIC" $VEID
-   [ -n "$VE_IPTABLES_MASK" ] && cgset -r 
ve.iptables_mask="$VE_IPTABLES_MASK" $VEID
-   [ -n "$VE_FEATURES" ] && cgset -r ve.features="$VE_FEATURES" 
$VEID
[ -n "$VE_AIO_MAX_NR" ] && cgset -r 
ve.aio_max_nr="$VE_AIO_MAX_NR" $VEID
cgset -r ve.state="START $pid" $VEID || { echo "Failed to start 
$VEID"; exit 1; }
fi
-- 
2.5.5



Re: [Devel] [patch vz7] do not allow rootfs umount

2016-06-08 Thread Andrey Vagin
On Mon, Jun 06, 2016 at 03:45:11PM +0300, Vasily Averin wrote:
> In mainline, rootfs is always marked MNT_LOCKED; sys_umount checks this
> flag and fails the operation.
> Our kernel lacks the MNT_LOCKED flag, so we use another kind of check
> to prevent the incorrect operation.
> 
> https://jira.sw.ru/browse/PSBM-46437
> 
> Signed-off-by: Vasily Averin 

> diff --git a/fs/namespace.c b/fs/namespace.c
> index 988320b..6f05245 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1355,6 +1355,8 @@ SYSCALL_DEFINE2(umount, char __user *, name, int, flags)
>   goto dput_and_out;
>   if (!check_mnt(mnt))
>   goto dput_and_out;
> + if (path.mnt->mnt_parent == path.mnt)

We can use mnt_has_parent() here

> + goto dput_and_out;
>  
>   retval = do_umount(mnt, flags);
>  dput_and_out:
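
For reference, a from-memory sketch of the mainline helper (fs/mount.h);
with it, the open-coded check above would read
!mnt_has_parent(real_mount(path.mnt)):

    /* a mount is the root of its tree iff it is its own parent */
    static inline int mnt_has_parent(struct mount *mnt)
    {
        return mnt != mnt->mnt_parent;
    }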



Re: [Devel] [vzlin-dev] [PATCH rh7 2/2] ploop: push_backup: rework lockout machinery

2016-06-08 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> It was not a very nice idea to reuse plo->lockout_tree for push_backup:
> by design only one preq (for any given req_cluster) can sit in the
> lockout tree, but while we're reusing the tree for a WRITE request, a
> READ from the backup tool may come. Such a READ may want to use the
> tree: see how map_index_fault calls add_lockout for the snapshot
> configuration.
>
> The patch introduces a separate, ad-hoc push_backup lockout tree. This
> fixes the issue (PSBM-47680) and makes the code much easier to
> understand.
>
> https://jira.sw.ru/browse/PSBM-47680
ACK
>
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/dev.c|  111 ++
>  drivers/block/ploop/events.h |1 
>  include/linux/ploop/ploop.h  |3 +
>  3 files changed, 95 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index d3f0ec0..27827a8 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -1117,20 +1117,25 @@ static int ploop_congested(void *data, int bits)
>   return ret;
>  }
>  
> -static int check_lockout(struct ploop_request *preq)
> +static int __check_lockout(struct ploop_request *preq, bool pb)
>  {
>   struct ploop_device * plo = preq->plo;
> - struct rb_node * n = plo->lockout_tree.rb_node;
> + struct rb_node * n = pb ? plo->lockout_pb_tree.rb_node :
> +   plo->lockout_tree.rb_node;
>   struct ploop_request * p;
> + int lockout_bit = pb ? PLOOP_REQ_PB_LOCKOUT : PLOOP_REQ_LOCKOUT;
>  
>   if (n == NULL)
>   return 0;
>  
> - if (test_bit(PLOOP_REQ_LOCKOUT, &preq->state))
> + if (test_bit(lockout_bit, &preq->state))
>   return 0;
>  
>   while (n) {
> - p = rb_entry(n, struct ploop_request, lockout_link);
> + if (pb)
> + p = rb_entry(n, struct ploop_request, lockout_pb_link);
> + else
> + p = rb_entry(n, struct ploop_request, lockout_link);
>  
>   if (preq->req_cluster < p->req_cluster)
>   n = n->rb_left;
> @@ -1146,19 +1151,51 @@ static int check_lockout(struct ploop_request *preq)
>   return 0;
>  }
>  
> -int ploop_add_lockout(struct ploop_request *preq, int try)
> +static int check_lockout(struct ploop_request *preq)
> +{
> + if (__check_lockout(preq, false))
> + return 1;
> +
> + /* push_backup passes READs intact */
> + if (!(preq->req_rw & REQ_WRITE))
> + return 0;
> +
> + if (__check_lockout(preq, true))
> + return 1;
> +
> + return 0;
> +}
> +
> +static int __ploop_add_lockout(struct ploop_request *preq, int try, bool pb)
>  {
>   struct ploop_device * plo = preq->plo;
> - struct rb_node ** p = &plo->lockout_tree.rb_node;
> + struct rb_node ** p;
>   struct rb_node *parent = NULL;
>   struct ploop_request * pr;
> + struct rb_node *link;
> + struct rb_root *tree;
> + int lockout_bit;
> +
> + if (pb) {
> + link = &preq->lockout_pb_link;
> + tree = &plo->lockout_pb_tree;
> + lockout_bit = PLOOP_REQ_PB_LOCKOUT;
> + } else {
> + link = &preq->lockout_link;
> + tree = &plo->lockout_tree;
> + lockout_bit = PLOOP_REQ_LOCKOUT;
> + }
>  
> - if (test_bit(PLOOP_REQ_LOCKOUT, &preq->state))
> + if (test_bit(lockout_bit, &preq->state))
>   return 0;
>  
> + p = &tree->rb_node;
>   while (*p) {
>   parent = *p;
> - pr = rb_entry(parent, struct ploop_request, lockout_link);
> + if (pb)
> + pr = rb_entry(parent, struct ploop_request, 
> lockout_pb_link);
> + else
> + pr = rb_entry(parent, struct ploop_request, 
> lockout_link);
>  
>   if (preq->req_cluster == pr->req_cluster) {
>   if (try)
> @@ -1174,23 +1211,56 @@ int ploop_add_lockout(struct ploop_request *preq, int try)
>  
>   trace_add_lockout(preq);
>  
> - rb_link_node(&preq->lockout_link, parent, p);
> - rb_insert_color(&preq->lockout_link, &plo->lockout_tree);
> - __set_bit(PLOOP_REQ_LOCKOUT, &preq->state);
> + rb_link_node(link, parent, p);
> + rb_insert_color(link, tree);
> + __set_bit(lockout_bit, &preq->state);
>   return 0;
>  }
> +
> +int ploop_add_lockout(struct ploop_request *preq, int try)
> +{
> + return __ploop_add_lockout(preq, try, false);
> +}
>  EXPORT_SYMBOL(ploop_add_lockout);
>  
> -void del_lockout(struct ploop_request *preq)
> +static void ploop_add_pb_lockout(struct ploop_request *preq)
> +{
> + __ploop_add_lockout(preq, 0, true);
> +}
> +
> +static void __del_lockout(struct ploop_request *preq, bool pb)
>  {
>   struct ploop_device * plo = preq->plo;
> + struct rb_node *link;
> + struct rb_root *tree;
> + int lockout_bit;
> +
> +   

Re: [Devel] [vzlin-dev] [PATCH rh7 1/2] ploop: push_backup: roll back ALLOW_READS patch

2016-06-08 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> The patch reverts:
>
> Subject: [PATCH rh7] ploop: push_backup must pass READs intact
>
> If push_backup is in progress (whether "full" or "incremental") and the
> ploop state machine detects an incoming WRITE request to a cluster-block
> that has not been push_backup-ed yet, it suspends the request until
> userspace reports it as "processed".
>
> The above is fine, but while such a WRITE request is suspended, only
> subsequent WRITEs (to the given cluster-block) must be suspended too;
> READs must not be. Otherwise the userspace backup tool will be blocked
> infinitely trying to push_backup the given cluster-block.
>
> Passing READs while blocking WRITEs must be OK because: 1) ploop has not
> finalized that first WRITE yet; 2) the given cluster-block is kept
> intact (unmodified) while the WRITE is suspended.
>
> https://jira.sw.ru/browse/PSBM-46775
ACK
>
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/dev.c   |7 ---
>  include/linux/ploop/ploop.h |1 -
>  2 files changed, 8 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index 96f7850..d3f0ec0 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -1137,11 +1137,6 @@ static int check_lockout(struct ploop_request *preq)
>   else if (preq->req_cluster > p->req_cluster)
>   n = n->rb_right;
>   else {
> - /* do not block backup tool READs from /dev/ploop */
> - if (!(preq->req_rw & REQ_WRITE) &&
> - test_bit(PLOOP_REQ_ALLOW_READS, &p->state))
> - return 0;
> -
>   list_add_tail(&preq->list, &p->delay_list);
>   plo->st.bio_lockouts++;
>   trace_preq_lockout(preq, p);
> @@ -2053,7 +2048,6 @@ restart:
>   ploop_pb_clear_bit(plo->pbd, preq->req_cluster);
>   } else {
>   spin_lock_irq(&plo->lock);
> - __set_bit(PLOOP_REQ_ALLOW_READS, &preq->state);
>   ploop_add_lockout(preq, 0);
>   spin_unlock_irq(&plo->lock);
>   /*
> @@ -2072,7 +2066,6 @@ restart:
>  
>   spin_lock_irq(&plo->lock);
>   del_lockout(preq);
> - __clear_bit(PLOOP_REQ_ALLOW_READS, &preq->state);
>   if (!list_empty(&preq->delay_list))
> list_splice_init(&preq->delay_list, plo->ready_queue.prev);
>   spin_unlock_irq(&plo->lock);
> diff --git a/include/linux/ploop/ploop.h b/include/linux/ploop/ploop.h
> index 0fba25e..77fd833 100644
> --- a/include/linux/ploop/ploop.h
> +++ b/include/linux/ploop/ploop.h
> @@ -470,7 +470,6 @@ enum
>   PLOOP_REQ_KAIO_FSYNC,   /*force image fsync by KAIO module */
>   PLOOP_REQ_POST_SUBMIT, /* preq needs post_submit processing */
>   PLOOP_REQ_PUSH_BACKUP, /* preq was ACKed by userspace push_backup */
> - PLOOP_REQ_ALLOW_READS, /* READs are allowed for given req_cluster */
>   PLOOP_REQ_FSYNC_DONE,  /* fsync_thread() performed f_op->fsync() */
>  };
>  




Re: [Devel] [PATCH rh7] sched/core/cfs: don't reset nr_cpus while setting cpu limits

2016-06-08 Thread Vladimir Davydov
On Tue, Jun 07, 2016 at 04:50:38PM +0300, Andrey Ryabinin wrote:
> Setting cpu limits resets the number of cpus:
> # echo 2 >/sys/fs/cgroup/cpu,cpuacct/101/cpu.nr_cpus
> # vzctl exec 101 cat /proc/cpuinfo |grep -c processor
>  2
> # echo 16 >/sys/fs/cgroup/cpu,cpuacct/101/cpu.cfs_quota_us
> # vzctl exec 101 cat /proc/cpuinfo |grep -c processor
>  4
> # cat /sys/fs/cgroup/cpu,cpuacct/101/cpu.nr_cpus
>  0
> 
> tg_update_cpu_limit() does that without any apparent reason,
> so let's fix it.
> 
> https://jira.sw.ru/browse/PSBM-48061
> 
> Signed-off-by: Andrey Ryabinin 
> ---
>  kernel/sched/core.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2c147c8..51ebed2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8696,7 +8696,6 @@ static void tg_update_cpu_limit(struct task_group *tg)
>   }
>  
>   tg->cpu_rate = rate;
> - tg->nr_cpus = 0;

This is incorrect. Suppose nr_cpus = 2 and you set cfs_quota to
4 * cfs_period. If you don't reset nr_cpus, you'll get a cpu limit equal
to 400, although it should be min(nr_cpus * 100, cpu_rate) = 200.
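
For illustration, the intended clamping as a standalone C sketch (the
function and its arguments are made up, not the scheduler's actual code):

    #include <stdio.h>

    /* effective limit in percent: quota-derived rate capped by nr_cpus */
    static unsigned int effective_limit(unsigned int nr_cpus, unsigned int rate)
    {
        unsigned int cap = nr_cpus ? nr_cpus * 100 : rate;
        return rate < cap ? rate : cap;
    }

    int main(void)
    {
        /* nr_cpus = 2, cfs_quota = 4 * cfs_period => rate = 400 */
        printf("%u\n", effective_limit(2, 400)); /* prints 200, not 400 */
        return 0;
    }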

>  }
>  
>  static int tg_set_cpu_limit(struct task_group *tg,