Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3

2016-06-22 Thread Maxim Patlasov

Dima,

I'm uneasy that the handling of RELOC_A|S requests is still broken. It seems we
have full agreement that for such requests we can do an unconditional
FLUSH|FUA when we call write_page from ploop_index_update() and
map_wb_complete(). And your idea to implement it by passing FLUSH|FUA
for io_direct and post_fsync=1 for io_kaio is smart and OK. Will you
send a patch for that (fix barriers for RELOC_A|S requests)?


Thanks,
Maxim

On 06/21/2016 04:56 PM, Maxim Patlasov wrote:

Dima,

After more thinking I realized that the whole idea of
PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply
do not have enough incoming FUAs to make delaying worthwhile.
This patch actually mixes three things: 1) fix barriers for RELOC_A|S
requests, 2) fix barriers for ordinary requests, 3) DELAYED_FLUSH
optimization. So, please, split the patch into three and take some
measurements demonstrating that applying the "DELAYED_FLUSH optimization"
patch on top of the previous patches improves performance.


I have an idea about how to fix barriers for ordinary requests -- please
see the patch I'll send soon. The key point is that the handling of
FLUSHes is broken in the same way as FUA: if you observe (rw &
REQ_FLUSH) and send the first bio marked as REQ_FLUSH, it guarantees
nothing unless you wait for its completion before submitting further
bios! And ploop simply does not have the logic to wait for the first
bio before sending the others. To make things worse, not only dio_submit
is affected; dio_submit_pad and dio_io_page have to be fixed too.
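
A small userspace model may make the completion point concrete. This is
only a sketch, assuming that a cache flush covers exactly those writes
whose completion has already been reported. It is not ploop code, and
every toy_* name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WRITES 8

/* Toy model, not ploop code. A write is in flight until its completion
 * is reported, then sits in the volatile device cache until a flush. */
enum toy_wstate { TOY_IN_FLIGHT, TOY_IN_CACHE, TOY_STABLE };

struct toy_dev {
	enum toy_wstate w[TOY_MAX_WRITES];
	size_t n;
};

/* Submit one write; returns its index. */
static size_t toy_submit(struct toy_dev *d)
{
	assert(d->n < TOY_MAX_WRITES);
	d->w[d->n] = TOY_IN_FLIGHT;
	return d->n++;
}

/* Report completion: the data is now in the volatile cache. */
static void toy_complete(struct toy_dev *d, size_t i)
{
	if (d->w[i] == TOY_IN_FLIGHT)
		d->w[i] = TOY_IN_CACHE;
}

/* Flush: only writes that have already completed become stable. */
static void toy_flush(struct toy_dev *d)
{
	for (size_t i = 0; i < d->n; i++)
		if (d->w[i] == TOY_IN_CACHE)
			d->w[i] = TOY_STABLE;
}

/* True if every submitted write has reached stable media. */
static bool toy_all_stable(const struct toy_dev *d)
{
	for (size_t i = 0; i < d->n; i++)
		if (d->w[i] != TOY_STABLE)
			return false;
	return true;
}

/* The flush is issued while four writes are still in flight. */
static bool toy_flush_without_waiting(void)
{
	struct toy_dev d = { .n = 0 };

	for (int i = 0; i < 4; i++)
		toy_submit(&d);
	toy_flush(&d);			/* nothing has completed yet */
	for (size_t i = 0; i < d.n; i++)
		toy_complete(&d, i);
	return toy_all_stable(&d);
}

/* All four completions are awaited first, then the flush is issued. */
static bool toy_flush_after_waiting(void)
{
	struct toy_dev d = { .n = 0 };

	for (int i = 0; i < 4; i++)
		toy_submit(&d);
	for (size_t i = 0; i < d.n; i++)
		toy_complete(&d, i);
	toy_flush(&d);
	return toy_all_stable(&d);
}
```

In this model toy_flush_without_waiting() leaves the writes in the
volatile cache, while toy_flush_after_waiting() makes all of them stable.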


There are also some inline comments below...

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:

The barrier code is broken in many ways:
currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA}
correctly. But a request can also go through
->dio_submit_alloc()->dio_submit_pad and write_page (for indexes).

So in case of grow_dev we have the following sequence:

E_RELOC_DATA_READ:
  ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
  ->delta->allocate
    ->io->submit_alloc: dio_submit_alloc
      ->dio_submit_pad
E_DATA_WBI : data written, time to update index
  ->delta->allocate_complete:ploop_index_update
    ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
    ->write_page
    ->ploop_map_wb_complete
      ->ploop_wb_complete_post_process
        ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
E_RELOC_NULLIFY:
  ->submit()

BUG#2: currently kaio write_page silently ignores REQ_FLUSH
BUG#3: io_direct:dio_submit: if fua_delay is not possible we MUST tag
all bios with REQ_FUA, not just the last one.

This patch unifies barrier handling as follows:
- Get rid of FORCE_{FLUSH,FUA}
- Introduce DELAYED_FLUSH
- fix fua handling for dio_submit
- BUG_ON for REQ_FLUSH in kaio_page_write

This makes reloc sequence optimal:
io_direct
RELOC_S: R1, W2, WBI:FLUSH|FUA
RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
io_kaio
RELOC_S: R1, W2:FUA, WBI:FUA
RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA

https://jira.sw.ru/browse/PSBM-47107
Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/dev.c   |  8 +---
  drivers/block/ploop/io_direct.c | 30 ++-
  drivers/block/ploop/io_kaio.c   | 23 +
  drivers/block/ploop/map.c   | 45 ++---
  include/linux/ploop/ploop.h | 19 +
  5 files changed, 60 insertions(+), 65 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 96f7850..fbc5f2f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq)
 	__TRACE("Z %p %u\n", preq, preq->req_cluster);
 
+	if (!preq->error) {
+		WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
+	}
  while (preq->bl.head) {
  struct bio * bio = preq->bl.head;
  preq->bl.head = bio->bi_next;
@@ -2530,9 +2533,8 @@ restart:
  top_delta = ploop_top_delta(plo);
  sbl.head = sbl.tail = preq->aux_bio;
-	/* Relocated data write required sync before BAT updatee */
-	set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
-
+	/* Relocated data write required sync before BAT updatee
+	 * this will happen inside index_update */
 	if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
  preq->eng_state = PLOOP_E_DATA_WBI;
  plo->st.bio_out++;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index a6d83fe..303eb70 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
  int err;
  struct bio_list_walk bw;
  int preflush;
-int postfua = 0;
+int fua = 0;
  int write = !!(rw & REQ_WRITE);
  int bio_num;


Your patch obsoletes bio_num. 

Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests

2016-06-22 Thread Maxim Patlasov

On 06/22/2016 06:41 AM, Dmitry Monakhov wrote:

Maxim Patlasov  writes:


The way io_direct.c handles FLUSH|FUA (b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA)
is completely wrong: to make sure that b1:FLUSH took effect we have to
wait for its completion. Similarly, even if we're sure that FUA will be
processed as post-FLUSH (also dubious!), we have to wait for completion of
b1..b4 to make sure that the flush will cover them.

The patch fixes all these issues quite simply: let's mark outgoing
bios with FLUSH|FUA based on those flags in the *corresponding* incoming
bios.
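
The per-bio marking described above can be sketched in userspace as
follows. This is an illustration only: the flag values, struct prop_bio
and prop_copy_flags() are hypothetical, and the one-to-one mapping
between incoming and outgoing bios is a simplifying assumption (the real
code may split or merge bios while walking extents).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; the kernel defines its own. */
#define PROP_WRITE (1u << 0)
#define PROP_FLUSH (1u << 1)
#define PROP_FUA   (1u << 2)

struct prop_bio {
	unsigned int rw;
};

/* Each outgoing bio inherits FLUSH|FUA only from the incoming bio whose
 * data it carries, instead of tagging just the first or the last one. */
static void prop_copy_flags(const struct prop_bio *in, struct prop_bio *out,
			    size_t n)
{
	assert(in && out);
	for (size_t i = 0; i < n; i++) {
		out[i].rw = PROP_WRITE;
		out[i].rw |= in[i].rw & (PROP_FLUSH | PROP_FUA);
	}
}
```

With incoming flags FLUSH, none, FUA, the outgoing bios end up with
exactly FLUSH, none, FUA on top of WRITE.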

One more thing, please see below.

Signed-off-by: Maxim Patlasov 
---
  drivers/block/ploop/dev.c   |1 -
  drivers/block/ploop/io_direct.c |   47 ---
  2 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 2ef1449..6b5702f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
preq->req_sector = bio->bi_sector;
preq->req_size = bio->bi_size >> 9;
preq->req_rw = bio->bi_rw;
-   bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
preq->eng_state = PLOOP_E_ENTRY;
preq->state = 0;
preq->error = 0;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index 6ef9cd8..84c9a48 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
int preflush;
int postfua = 0;
int write = !!(rw & REQ_WRITE);
-   int bio_num;
  
  	trace_submit(preq);
  
@@ -233,13 +232,13 @@ flush_bio:

goto flush_bio;
}
  
+		bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);

bw.bv_off += copy;
size -= copy >> 9;
sec += copy >> 9;
}
ploop_extent_put(em);
  
-	bio_num = 0;

while (bl.head) {
struct bio * b = bl.head;
unsigned long rw2 = rw;
@@ -255,11 +254,10 @@ flush_bio:
preflush = 0;
}
if (unlikely(postfua && !bl.head))
-   rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+   rw2 |= REQ_FUA;
  
  		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);

-   submit_bio(rw2, b);
-   bio_num++;
+   submit_bio(rw2 | b->bi_rw, b);
}
  
  	ploop_complete_io_request(preq);

@@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * 
preq,
sector_t sec, end_sec, nsec, start, end;
struct bio_list_walk bw;
int err;
-   int preflush = !!(preq->req_rw & REQ_FLUSH);
  
  	bio_list_init();
  
@@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,

while (sec < end_sec) {
struct page * page;
unsigned int poff, plen;
+   bool zero_page;
  
  		if (sec < start) {

+   zero_page = true;
page = ZERO_PAGE(0);
poff = 0;
plen = start - sec;
if (plen > (PAGE_SIZE>>9))
plen = (PAGE_SIZE>>9);
} else if (sec >= end) {
+   zero_page = true;
page = ZERO_PAGE(0);
poff = 0;
plen = end_sec - sec;
@@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * 
preq,
} else {
/* sec >= start && sec < end */
struct bio_vec * bv;
+   zero_page = false;
  
  			if (sec == start) {

bw.cur = sbl->head;
@@ -672,6 +673,10 @@ flush_bio:
goto flush_bio;
}
  
+		/* Handle FLUSH here, dio_post_submit will handle FUA */

submit_pad may be called w/o post_submit flag from here:
->dio_submit_alloc
   if (io->files.em_tree->_get_extent) {
->dio_fallocate
->dio_submit_pad
   ..
  }


We never have _get_extent set. This is legacy code for PCSS support;
we'll remove it. For now, we can safely ignore this.


Thanks,
Maxim
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [PATCH 2/3] ploop: deadcode cleanup

2016-06-22 Thread Maxim Patlasov

Kostya,

The patch is OK per se, please commit it with:

Acked-by: Maxim Patlasov 

Thanks,
Maxim

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:

The (rw & REQ_FUA) branch is impossible because REQ_FUA was cleared on the
line above. The logic was moved to ploop_req_delay_fua_possible() a long
time ago.
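
Why the branch is dead can be checked with a tiny userspace sketch. The
flag values and dead_fua_branch_taken() are hypothetical and only mirror
the shape of the removed code.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the kernel defines its own. */
#define DEAD_FLUSH (1ul << 1)
#define DEAD_FUA   (1ul << 2)

/* Returns 1 if the FUA branch would run, 0 otherwise. */
static int dead_fua_branch_taken(unsigned long rw)
{
	rw &= ~(DEAD_FLUSH | DEAD_FUA);
	if (rw & DEAD_FUA)	/* never true: the bit was just cleared */
		return 1;
	return 0;
}
```

For every possible combination of the low flag bits the function
returns 0, which is why the branch can be dropped safely.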

Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/io_direct.c | 9 -
  1 file changed, 9 deletions(-)

diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index 58d7580..a6d83fe 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -108,15 +108,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * 
preq,
rw &= ~(REQ_FLUSH | REQ_FUA);
  
  
-	/* In case of eng_state != COMPLETE, we'll do FUA in

-* ploop_index_update(). Otherwise, we should mark
-* last bio as FUA here. */
-   if (rw & REQ_FUA) {
-   rw &= ~REQ_FUA;
-   if (preq->eng_state == PLOOP_E_COMPLETE)
-   postfua = 1;
-   }
-
bio_list_init();
  
  	if (iblk == PLOOP_ZERO_INDEX)




Re: [Devel] [PATCH 1/3] ploop: skip redundant fsync for REQ_FUA in post_submit

2016-06-22 Thread Maxim Patlasov

Kostya,

The patch is OK per se, please commit it with:

Acked-by: Maxim Patlasov 

Thanks,
Maxim

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:

Signed-off-by: Dmitry Monakhov 
---
  drivers/block/ploop/io_direct.c | 24 ++--
  1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index b844a80..58d7580 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -517,27 +517,31 @@ dio_post_submit(struct ploop_io *io, struct ploop_request * preq)
struct ploop_device *plo = preq->plo;
sector_t sec = (sector_t)preq->iblock << preq->plo->cluster_log;
loff_t clu_siz = 1 << (preq->plo->cluster_log + 9);
+   int force_sync = preq->req_rw & REQ_FUA;
int err;
  
  	file_start_write(io->files.file);
  
-	/* Here io->io_count is even ... */
-	spin_lock_irq(&plo->lock);
-	io->io_count++;
-	set_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
-	spin_unlock_irq(&plo->lock);
-
+	if (!force_sync) {
+		/* Here io->io_count is even ... */
+		spin_lock_irq(&plo->lock);
+		io->io_count++;
+		set_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
+		spin_unlock_irq(&plo->lock);
+	}
err = io->files.file->f_op->fallocate(io->files.file,
  FALLOC_FL_CONVERT_UNWRITTEN,
  (loff_t)sec << 9, clu_siz);
  
  	/* highly unlikely case: FUA coming to a block not provisioned yet */

-   if (!err && (preq->req_rw & REQ_FUA))
+   if (!err && force_sync)
err = io->ops->sync(io);
  
-	spin_lock_irq(&plo->lock);
-	io->io_count++;
-	spin_unlock_irq(&plo->lock);
+	if (!force_sync) {
+		spin_lock_irq(&plo->lock);
+		io->io_count++;
+		spin_unlock_irq(&plo->lock);
+	}
/* and here io->io_count is even (+2) again. */
  
  	file_end_write(io->files.file);




[Devel] [NEW KERNEL] 3.10.0-327.18.2.vz7.14.19 (rhel7)

2016-06-22 Thread builder
Changelog:

OpenVZ kernel rh7-3.10.0-327.18.2.vz7.14.19

* cpustat: an attempt to update vcpustats for root_task_group led to
  kernel panic

Changelog since kernel rh7-3.10.0-327.18.2.vz7.14.17:

* cleanup: dropped CAP_VE_ADMIN and CAP_VE_NET_ADMIN, mark DEF_PERMS feature
  deprecated
* mm: when setting memory.high below usage, reclaim right now
* locks: NFS shared locks to be shown in fdinfo (required for CRIU)
* fs: ignore device permissions in CRIU restore stage

* ploop: fix barriers for ordinary requests


Generated changelog:

* Wed Jun 22 2016 Konstantin Khorenko [3.10.0-327.18.2.vz7.14.19]
- ve/cpustat: don't try to update vcpustats for root_task_group (Andrey Ryabinin) [PSBM-48721]


Built packages: 
http://kojistorage.eng.sw.ru/packages/vzkernel/3.10.0/327.18.2.vz7.14.19/


Re: [Devel] [PATCH] binfmt_misc: Allow mount if capable(CAP_SYS_ADMIN)

2016-06-22 Thread Andrey Ryabinin
On 06/22/2016 04:42 PM, Kirill Tkhai wrote:
> The patch allows to mount binfmt_misc in a CT with ve0's admin caps,
> and it's need that for CRIU dump. This time, unmounted binfmt_misc
> may be forced mounted back, and we don't want to change CRIU's user_ns
> to do that.
> 
> Signed-off-by: Kirill Tkhai 

Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests

2016-06-22 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> The way how io_direct.c handles FLUSH|FUA: b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA
> is completely wrong: to make sure that b1:FLUSH made effect we have to
> wait for its completion. Similarly, even if we're sure that FUA will be
> processed as post-FLUSH (also dubious!), we have to wait for completion
> b1..b4 to make sure that that flush will cover them.
>
> The patch fixes all these issues pretty simple: let's mark outgouing
> bio-s with FLUSH|FUA based on those flags in *corresponing* incoming
> bio-s.
One more thing please see below.
>
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/dev.c   |1 -
>  drivers/block/ploop/io_direct.c |   47 
> ---
>  2 files changed, 15 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index 2ef1449..6b5702f 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * 
> bio,
>   preq->req_sector = bio->bi_sector;
>   preq->req_size = bio->bi_size >> 9;
>   preq->req_rw = bio->bi_rw;
> - bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
>   preq->eng_state = PLOOP_E_ENTRY;
>   preq->state = 0;
>   preq->error = 0;
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index 6ef9cd8..84c9a48 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>   int preflush;
>   int postfua = 0;
>   int write = !!(rw & REQ_WRITE);
> - int bio_num;
>  
>   trace_submit(preq);
>  
> @@ -233,13 +232,13 @@ flush_bio:
>   goto flush_bio;
>   }
>  
> + bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
>   bw.bv_off += copy;
>   size -= copy >> 9;
>   sec += copy >> 9;
>   }
>   ploop_extent_put(em);
>  
> - bio_num = 0;
>   while (bl.head) {
>   struct bio * b = bl.head;
>   unsigned long rw2 = rw;
> @@ -255,11 +254,10 @@ flush_bio:
>   preflush = 0;
>   }
>   if (unlikely(postfua && !bl.head))
> - rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
> + rw2 |= REQ_FUA;
>  
>   ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
> - submit_bio(rw2, b);
> - bio_num++;
> + submit_bio(rw2 | b->bi_rw, b);
>   }
>  
>   ploop_complete_io_request(preq);
> @@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request 
> * preq,
>   sector_t sec, end_sec, nsec, start, end;
>   struct bio_list_walk bw;
>   int err;
> - int preflush = !!(preq->req_rw & REQ_FLUSH);
>  
>   bio_list_init();
>  
> @@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct 
> ploop_request * preq,
>   while (sec < end_sec) {
>   struct page * page;
>   unsigned int poff, plen;
> + bool zero_page;
>  
>   if (sec < start) {
> + zero_page = true;
>   page = ZERO_PAGE(0);
>   poff = 0;
>   plen = start - sec;
>   if (plen > (PAGE_SIZE>>9))
>   plen = (PAGE_SIZE>>9);
>   } else if (sec >= end) {
> + zero_page = true;
>   page = ZERO_PAGE(0);
>   poff = 0;
>   plen = end_sec - sec;
> @@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request 
> * preq,
>   } else {
>   /* sec >= start && sec < end */
>   struct bio_vec * bv;
> + zero_page = false;
>  
>   if (sec == start) {
>   bw.cur = sbl->head;
> @@ -672,6 +673,10 @@ flush_bio:
>   goto flush_bio;
>   }
>  
> + /* Handle FLUSH here, dio_post_submit will handle FUA */

submit_pad may be called w/o post_submit flag from here:
->dio_submit_alloc
  if (io->files.em_tree->_get_extent) {
   ->dio_fallocate
   ->dio_submit_pad
  ..
 }
> + if (!zero_page)
> + bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
> +
>   bw.bv_off += (plen<<9);
>   BUG_ON(plen == 0);
>   sec += plen;
> @@ -688,13 +693,9 @@ flush_bio:
>   b->bi_private = preq;
>   b->bi_end_io = dio_endio_async;
>  
> - rw = sbl->head->bi_rw | WRITE;
> - if (unlikely(preflush)) {
> - rw |= REQ_FLUSH;
> - preflush = 0;
> - }
> + rw = preq->req_rw & ~(REQ_FLUSH | REQ_FUA);
>   

[Devel] [PATCH] binfmt_misc: Allow mount if capable(CAP_SYS_ADMIN)

2016-06-22 Thread Kirill Tkhai
The patch allows mounting binfmt_misc in a CT with ve0's admin caps;
this is needed for CRIU dump. At that point, an unmounted binfmt_misc
may have to be forcibly mounted back, and we don't want to change CRIU's
user_ns to do that.
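
The resulting permission rule can be written down as a small predicate.
This is a userspace sketch only: bm_mount_allowed() and its two boolean
parameters are hypothetical stand-ins for current_user_ns_initial() and
capable(CAP_SYS_ADMIN).

```c
#include <assert.h>	/* only needed for the checks in the note below */
#include <errno.h>
#include <stdbool.h>

/* Returns 0 if the mount may proceed, -EPERM otherwise. The mount is
 * refused only when the caller is outside the initial user namespace
 * AND lacks CAP_SYS_ADMIN. */
static int bm_mount_allowed(bool in_initial_user_ns, bool has_cap_sys_admin)
{
	if (!in_initial_user_ns && !has_cap_sys_admin)
		return -EPERM;
	return 0;
}
```

Of the four input combinations, only (false, false) yields -EPERM; before
the patch, (false, true) was refused as well.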

Signed-off-by: Kirill Tkhai 
---
 fs/binfmt_misc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fd5227f..e259022 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -735,7 +735,7 @@ static int bm_fill_super(struct super_block * sb, void * data, int silent)
 static struct dentry *bm_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
 {
-   if (!current_user_ns_initial())
+   if (!current_user_ns_initial() && !capable(CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
return mount_ns(fs_type, flags, get_exec_env(), bm_fill_super);
 }



[Devel] [PATCH RHEL7 COMMIT] ve/cpustat: don't try to update vcpustats for root_task_group

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.18
-->
commit 6a887128c0ff214571da1451d7336e3c9bb8d86a
Author: Andrey Ryabinin 
Date:   Wed Jun 22 17:19:39 2016 +0400

ve/cpustat: don't try to update vcpustats for root_task_group

root_task_group doesn't have vcpu stats. An attempt to update them leads
to a NULL-ptr deref:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
...
Call Trace:
 [] cpu_cgroup_get_stat+0x7b/0x180
 [] ve_get_cpu_stat+0x27/0x70
 [] fill_cpu_stat+0x91/0x1e0 [vzmon]
 [] vzcalls_ioctl+0x2bb/0x430 [vzmon]
 [] vzctl_ioctl+0x45/0x60 [vzdev]
 [] do_vfs_ioctl+0x255/0x4f0
 [] SyS_ioctl+0x54/0xa0
 [] system_call_fastpath+0x16/0x1b

So, return -ENOENT if we are asked for vcpu stats of root_task_group.

https://jira.sw.ru/browse/PSBM-48721

Signed-off-by: Andrey Ryabinin 
Reviewed-by: Vladimir Davydov 
---
 kernel/sched/core.c | 10 --
 kernel/ve/ve.c  |  7 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e885549..94deef4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9120,20 +9120,26 @@ int cpu_cgroup_proc_loadavg(struct cgroup *cgrp, struct cftype *cft,
return 0;
 }
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
 {
struct task_group *tg = cgroup_tg(cgrp);
int nr_vcpus = tg->nr_cpus ?: num_online_cpus();
int i;
 
+   kernel_cpustat_zero(kstat);
+
+	if (tg == &root_task_group)
+   return -ENOENT;
+
for_each_possible_cpu(i)
cpu_cgroup_update_stat(cgrp, i);
 
cpu_cgroup_update_vcpustat(cgrp);
 
-   kernel_cpustat_zero(kstat);
for (i = 0; i < nr_vcpus; i++)
kernel_cpustat_add(tg->vcpustat + i, kstat, kstat);
+
+   return 0;
 }
 
 int cpu_cgroup_get_avenrun(struct cgroup *cgrp, unsigned long *avenrun)
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb5..d196e3e 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1448,16 +1448,17 @@ int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun)
 }
 EXPORT_SYMBOL(ve_get_cpu_avenrun);
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
 
 int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat)
 {
struct cgroup_subsys_state *css;
+   int err;
 
css = ve_get_init_css(ve, cpu_cgroup_subsys_id);
-   cpu_cgroup_get_stat(css->cgroup, kstat);
+   err = cpu_cgroup_get_stat(css->cgroup, kstat);
css_put(css);
-   return 0;
+   return err;
 }
 EXPORT_SYMBOL(ve_get_cpu_stat);
 #endif /* CONFIG_CGROUP_SCHED */


Re: [Devel] [PATCH rh7] ve/cpustat: don't try to update vcpustats for root_task_group

2016-06-22 Thread Vladimir Davydov
On Wed, Jun 22, 2016 at 03:59:05PM +0300, Andrey Ryabinin wrote:
> root_task_group doesn't have vcpu stats. Attempt to upate those leads
> to NULL-ptr deref:
> 
>   BUG: unable to handle kernel NULL pointer dereference at   
> (null)
>   IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
>   ...
>   Call Trace:
>[] cpu_cgroup_get_stat+0x7b/0x180
>[] ve_get_cpu_stat+0x27/0x70
>[] fill_cpu_stat+0x91/0x1e0 [vzmon]
>[] vzcalls_ioctl+0x2bb/0x430 [vzmon]
>[] vzctl_ioctl+0x45/0x60 [vzdev]
>[] do_vfs_ioctl+0x255/0x4f0
>[] SyS_ioctl+0x54/0xa0
>[] system_call_fastpath+0x16/0x1b
> 
> So, return -ENOENT if we asked for vcpu stats of root_task_group.
> 
> https://jira.sw.ru/browse/PSBM-48721
> 
> Signed-off-by: Andrey Ryabinin 

Reviewed-by: Vladimir Davydov 


[Devel] [PATCH rh7] ve/cpustat: don't try to update vcpustats for root_task_group

2016-06-22 Thread Andrey Ryabinin
root_task_group doesn't have vcpu stats. An attempt to update them leads
to a NULL-ptr deref:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
...
Call Trace:
 [] cpu_cgroup_get_stat+0x7b/0x180
 [] ve_get_cpu_stat+0x27/0x70
 [] fill_cpu_stat+0x91/0x1e0 [vzmon]
 [] vzcalls_ioctl+0x2bb/0x430 [vzmon]
 [] vzctl_ioctl+0x45/0x60 [vzdev]
 [] do_vfs_ioctl+0x255/0x4f0
 [] SyS_ioctl+0x54/0xa0
 [] system_call_fastpath+0x16/0x1b

So, return -ENOENT if we are asked for vcpu stats of root_task_group.
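
The pattern (zero the output first, bail out early for a group without
per-vcpu stats, propagate the error to the caller) can be sketched in
userspace like this. All vs_* names are hypothetical, and a NULL
vcpustat pointer stands in for the comparison with root_task_group that
the real patch performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct vs_cpustat {
	unsigned long long user, system;
};

struct vs_group {
	struct vs_cpustat *vcpustat;	/* NULL models a group without vcpu stats */
	int nr_vcpus;
};

/* Sums per-vcpu stats into *out; returns 0 or -ENOENT. */
static int vs_get_stat(const struct vs_group *tg, struct vs_cpustat *out)
{
	assert(tg && out);
	memset(out, 0, sizeof(*out));	/* caller never sees stale data */

	if (tg->vcpustat == NULL)
		return -ENOENT;

	for (int i = 0; i < tg->nr_vcpus; i++) {
		out->user += tg->vcpustat[i].user;
		out->system += tg->vcpustat[i].system;
	}
	return 0;
}
```

Zeroing before the early return means a caller that ignores the error
code still gets well-defined (all-zero) statistics.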

https://jira.sw.ru/browse/PSBM-48721

Signed-off-by: Andrey Ryabinin 
---
 kernel/sched/core.c | 10 --
 kernel/ve/ve.c  |  7 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e885549..94deef4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9120,20 +9120,26 @@ int cpu_cgroup_proc_loadavg(struct cgroup *cgrp, struct cftype *cft,
return 0;
 }
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
 {
struct task_group *tg = cgroup_tg(cgrp);
int nr_vcpus = tg->nr_cpus ?: num_online_cpus();
int i;
 
+   kernel_cpustat_zero(kstat);
+
+	if (tg == &root_task_group)
+   return -ENOENT;
+
for_each_possible_cpu(i)
cpu_cgroup_update_stat(cgrp, i);
 
cpu_cgroup_update_vcpustat(cgrp);
 
-   kernel_cpustat_zero(kstat);
for (i = 0; i < nr_vcpus; i++)
kernel_cpustat_add(tg->vcpustat + i, kstat, kstat);
+
+   return 0;
 }
 
 int cpu_cgroup_get_avenrun(struct cgroup *cgrp, unsigned long *avenrun)
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb5..d196e3e 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1448,16 +1448,17 @@ int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun)
 }
 EXPORT_SYMBOL(ve_get_cpu_avenrun);
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
 
 int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat)
 {
struct cgroup_subsys_state *css;
+   int err;
 
css = ve_get_init_css(ve, cpu_cgroup_subsys_id);
-   cpu_cgroup_get_stat(css->cgroup, kstat);
+   err = cpu_cgroup_get_stat(css->cgroup, kstat);
css_put(css);
-   return 0;
+   return err;
 }
 EXPORT_SYMBOL(ve_get_cpu_stat);
 #endif /* CONFIG_CGROUP_SCHED */
-- 
2.7.3



[Devel] [PATCH RHEL7 COMMIT] ve: drop not used CAP_VE_ADMIN and CAP_VE_NET_ADMIN

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit cb6242e909e43182b9bdcd08342b50500d5bad84
Author: Vladimir Davydov 
Date:   Wed Jun 22 16:48:45 2016 +0400

ve: drop not used CAP_VE_ADMIN and CAP_VE_NET_ADMIN

Not needed anymore as we use user ns for capability checking.
Also, move capable_setveid() helper to ve.h so as not to pollute
generic headers.

Signed-off-by: Vladimir Davydov 
---
 include/linux/ve.h  |  3 +++
 include/uapi/linux/capability.h | 55 -
 2 files changed, 3 insertions(+), 55 deletions(-)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index cea3a87..247cadb 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -138,6 +138,9 @@ struct ve_devmnt {
 #define VE_MEMINFO_DEFAULT  1   /* default behaviour */
 #define VE_MEMINFO_SYSTEM   0   /* disable meminfo virtualization */
 
+#define capable_setveid() \
+   (ve_is_super(get_exec_env()) && capable(CAP_SYS_ADMIN))
+
 extern int nr_ve;
 extern struct proc_dir_entry *proc_vz_dir;
 extern struct cgroup_subsys ve_subsys;
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index cadbfe6..b3d37bb 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -307,61 +307,6 @@ struct vfs_cap_data {
 
 #define CAP_SETFCAP 31
 
-#ifdef __KERNEL__
-/*
- * Important note: VZ capabilities do intersect with CAP_AUDIT
- * this is due to compatibility reasons. Nothing bad.
- * Both VZ and Audit/SELinux caps are disabled in VPSs.
- */
-
-/* Allow access to all information. In the other case some structures will be
- * hiding to ensure different Virtual Environment non-interaction on the same
- * node (NOW OBSOLETED)
- */
-#define CAP_SETVEID 29
-
-#define capable_setveid()  ({  \
-   ve_is_super(get_exec_env()) &&  \
-   (capable(CAP_SYS_ADMIN) ||  \
-capable(CAP_VE_ADMIN));\
-   })
-
-/*
- * coinsides with CAP_AUDIT_CONTROL but we don't care, since
- * audit is disabled in Virtuozzo
- */
-#define CAP_VE_ADMIN30
-
-#ifdef CONFIG_VE
-
-/* Replacement for CAP_NET_ADMIN:
-   delegated rights to the Virtual environment of its network administration.
-   For now the following rights have been delegated:
-
-   Allow setting arbitrary process / process group ownership on sockets
-   Allow interface configuration
- */
-#define CAP_VE_NET_ADMIN CAP_VE_ADMIN
-
-/* Replacement for CAP_SYS_ADMIN:
-   delegated rights to the Virtual environment of its administration.
-   For now the following rights have been delegated:
- */
-/* Allow mount/umount/remount */
-/* Allow examination and configuration of disk quotas */
-/* Allow removing semaphores */
-/* Used instead of CAP_CHOWN to "chown" IPC message queues, semaphores
-   and shared memory */
-/* Allow locking/unlocking of shared memory segment */
-/* Allow forged pids on socket credentials passing */
-
-#define CAP_VE_SYS_ADMIN CAP_VE_ADMIN
-#else
-#define CAP_VE_NET_ADMIN CAP_NET_ADMIN
-#define CAP_VE_SYS_ADMIN CAP_SYS_ADMIN
-#endif
-#endif
-
 /* Override MAC access.
The base kernel enforces no MAC policy.
An LSM may enforce a MAC policy, and if it does and it chooses


[Devel] [PATCH RHEL7 COMMIT] ploop: fix barriers for ordinary requests

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit c2247f374581173c476bf88e82b9650ba263b37d
Author: Maxim Patlasov 
Date:   Wed Jun 22 16:42:43 2016 +0400

ploop: fix barriers for ordinary requests

The way io_direct.c handles FLUSH|FUA (b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA)
is completely wrong: to make sure that b1:FLUSH took effect we have to
wait for its completion. Similarly, even if we're sure that FUA will be
processed as post-FLUSH (also dubious!), we have to wait for completion of
b1..b4 to make sure that the flush will cover them.

The patch fixes all these issues quite simply: let's mark outgoing
bios with FLUSH|FUA based on those flags in the *corresponding* incoming
bios.

Signed-off-by: Maxim Patlasov 
Acked-by: Dmitry Monakhov 

khorenko@:
v2 changes:
Drop 2 hunks like:
> - submit_bio(rw, b);
> + submit_bio(rw | b->bi_rw, b);
This is redundant  ^
submit_bio looks like the following:
void submit_bio(int rw, struct bio *bio)
{
bio->bi_rw |= rw;
...
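
The redundancy can be checked with a toy stand-in for submit_bio(). This
is a userspace sketch: struct sb_bio and sb_submit() are hypothetical and
reproduce only the OR shown in the excerpt above, not the real submission
path.

```c
#include <assert.h>

struct sb_bio {
	unsigned long bi_rw;
};

/* Toy stand-in: ORs rw into the bio's flags, as in the excerpt above,
 * and returns the flags the bio would be submitted with. */
static unsigned long sb_submit(unsigned long rw, struct sb_bio *bio)
{
	assert(bio);
	bio->bi_rw |= rw;
	return bio->bi_rw;
}
```

Calling sb_submit(rw, b) and sb_submit(rw | b->bi_rw, b) on bios with
identical flags gives the same result, since OR-ing the bio's own flags
into rw beforehand adds nothing new.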
---
 drivers/block/ploop/dev.c   |  1 -
 drivers/block/ploop/io_direct.c | 43 +
 2 files changed, 13 insertions(+), 31 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 2ef1449..6b5702f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
preq->req_sector = bio->bi_sector;
preq->req_size = bio->bi_size >> 9;
preq->req_rw = bio->bi_rw;
-   bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
preq->eng_state = PLOOP_E_ENTRY;
preq->state = 0;
preq->error = 0;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index 6ef9cd8..50c0ed1 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
int preflush;
int postfua = 0;
int write = !!(rw & REQ_WRITE);
-   int bio_num;
 
trace_submit(preq);
 
@@ -233,13 +232,13 @@ flush_bio:
goto flush_bio;
}
 
+   bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
bw.bv_off += copy;
size -= copy >> 9;
sec += copy >> 9;
}
ploop_extent_put(em);
 
-   bio_num = 0;
while (bl.head) {
struct bio * b = bl.head;
unsigned long rw2 = rw;
@@ -255,11 +254,10 @@ flush_bio:
preflush = 0;
}
if (unlikely(postfua && !bl.head))
-   rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+   rw2 |= REQ_FUA;
 
ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
submit_bio(rw2, b);
-   bio_num++;
}
 
ploop_complete_io_request(preq);
@@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * 
preq,
sector_t sec, end_sec, nsec, start, end;
struct bio_list_walk bw;
int err;
-   int preflush = !!(preq->req_rw & REQ_FLUSH);
 
bio_list_init();
 
@@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request 
* preq,
while (sec < end_sec) {
struct page * page;
unsigned int poff, plen;
+   bool zero_page;
 
if (sec < start) {
+   zero_page = true;
page = ZERO_PAGE(0);
poff = 0;
plen = start - sec;
if (plen > (PAGE_SIZE>>9))
plen = (PAGE_SIZE>>9);
} else if (sec >= end) {
+   zero_page = true;
page = ZERO_PAGE(0);
poff = 0;
plen = end_sec - sec;
@@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * 
preq,
} else {
/* sec >= start && sec < end */
struct bio_vec * bv;
+   zero_page = false;
 
if (sec == start) {
bw.cur = sbl->head;
@@ -672,6 +673,10 @@ flush_bio:
goto flush_bio;
}
 
+   /* Handle FLUSH here, dio_post_submit will handle FUA */
+   if (!zero_page)
+   bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
+
bw.bv_off += (plen<<9);
  

[Devel] [PATCH RHEL7 COMMIT] ms/mm: memcontrol: reclaim when shrinking memory.high below usage

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit d1074e8489380ff3087edc2ca86e88a2fdb25b05
Author: Johannes Weiner 
Date:   Wed Jun 22 16:15:13 2016 +0400

ms/mm: memcontrol: reclaim when shrinking memory.high below usage

When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high.  This can cause
groups to stay in excess of their memory.high indefinitely.

To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.

https://jira.sw.ru/browse/PSBM-48546

Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
Cc: Vladimir Davydov 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
(cherry picked from commit 588083bb37a3cea8533c392370a554417c8f29cb)
Signed-off-by: Vladimir Davydov 

Conflicts:
mm/memcontrol.c
---
 mm/memcontrol.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index de7c362..1f525f2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5314,7 +5314,7 @@ static int mem_cgroup_high_write(struct cgroup *cont, struct cftype *cft,
 const char *buffer)
 {
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-   unsigned long long val;
+   unsigned long long val, usage;
int ret;
 
ret = res_counter_memparse_write_strategy(buffer, );
@@ -5322,6 +5322,12 @@ static int mem_cgroup_high_write(struct cgroup *cont, struct cftype *cft,
return ret;
 
memcg->high = val;
+
+   usage = res_counter_read_u64(>res, RES_USAGE);
+   if (usage > val)
+   try_to_free_mem_cgroup_pages(memcg,
+(usage - val) >> PAGE_SHIFT,
+GFP_KERNEL, false);
return 0;
 }
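
For illustration only, the size of the reclaim request computed in the hunk above can be modelled in user space. This is a minimal sketch, not kernel code: `excess_pages()` is a hypothetical helper, and `SKETCH_PAGE_SHIFT` assumes 4 KiB pages, whereas the kernel's PAGE_SHIFT depends on the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The real PAGE_SHIFT is architecture-dependent. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of whole pages by which usage exceeds the
 * new memory.high value, mirroring (usage - val) >> PAGE_SHIFT in the patch.
 * Returns 0 when usage is at or below the limit, i.e. nothing to reclaim.
 */
static uint64_t excess_pages(uint64_t usage, uint64_t high)
{
	if (usage <= high)
		return 0;
	return (usage - high) >> SKETCH_PAGE_SHIFT;
}

/* Usage example: three pages in use, limit lowered to one page. */
void excess_pages_example(void)
{
	assert(excess_pages(3 * 4096, 4096) == 2);
	assert(excess_pages(4096, 8192) == 0);
}
```

Note that the shift rounds down, so an excess smaller than one page yields a request for zero pages.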
 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL7 COMMIT] rh/locks: check for fl->fl_owner != filp in show_fd_locks

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit 71684d5261234b7aa0287dd8fb99b06644147172
Author: Stanislav Kinsburskiy 
Date:   Wed Jun 22 16:07:48 2016 +0400

rh/locks: check for fl->fl_owner != filp in show_fd_locks

NFS emulates flocks via POSIX locks on the server, and fl->fl_owner is
set to filp.

khorenko@: prior to this patch NFS shared locks were not shown in
fdinfo and thus could not be migrated using CRIU.

https://jira.sw.ru/browse/PSBM-48727

Signed-off-by: Stanislav Kinsburskiy 
---
 fs/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/locks.c b/fs/locks.c
index cb7da61..a5ab0c0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2503,6 +2503,7 @@ void show_fd_locks(struct seq_file *f,
 * matches ->fl_file.
 */
if (fl->fl_owner != files &&
+   fl->fl_owner != (fl_owner_t)filp &&
fl->fl_owner != NULL)
continue;
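
As a hedged illustration, the owner check after this patch can be written as a standalone predicate. The sketch below is a user-space model; `owner_check_passes()` is a hypothetical name, and plain `void *` pointers stand in for the kernel's fl_owner_t, files and filp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the condition in show_fd_locks(): the lock is
 * skipped ("continue") unless its owner is the files table, the struct
 * file itself (the NFS case added by the patch), or NULL.
 */
static bool owner_check_passes(const void *fl_owner, const void *files,
			       const void *filp)
{
	return fl_owner == files || fl_owner == filp || fl_owner == NULL;
}

/* Usage example with dummy objects standing in for files, filp and a stranger. */
void owner_check_example(void)
{
	int files, filp, other;

	assert(owner_check_passes(&filp, &files, &filp));   /* NFS-style owner */
	assert(!owner_check_passes(&other, &files, &filp)); /* unrelated owner */
}
```

Before the patch the middle comparison was absent, so a lock owned by filp failed the check and was not printed.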
 


[Devel] [PATCH RHEL7 COMMIT] ve/fs: namespace -- Ignore device permissions during restore

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit 7eeb5b4afa8db5a2f2e1e47ab6b84e55fc8c5661
Author: Cyrill Gorcunov 
Date:   Wed Jun 22 14:45:43 2016 +0400

ve/fs: namespace -- Ignore device permissions during restore

To support several storage backends (ploops) inside a container,
libvzctl has hacks that set up the "old" permissions when the
restore procedure is initiated. But the original idea was simply to
let CRIU do all the work and restore ploop mounts on its own
(since CRIU fetches all mount options and such).

For this purpose we provisionally turn off mount options filtering
if @is_pseudosuper is set, so that CRIU restores the mounts as
regular ones.

https://jira.sw.ru/browse/PSBM-48188

Signed-off-by: Cyrill Gorcunov 

CC: Igor Sukhih 
CC: Vladimir Davydov 
CC: Konstantin Khorenko 
---
 fs/namespace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4fb935a..3df0ac5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1933,7 +1933,12 @@ again:
if (devmnt->dev == dev) {
err = ve_devmnt_check(data, devmnt->allowed_options);
 
-   if (!err && !remount)
+   /*
+* In case of @is_pseudosuper set, i.e. restore procedure,
+* we don't check for allowed options filtering, since
+* restore mode is special.
+*/
+   if ((ve->is_pseudosuper || !err) && !remount)
err = ve_devmnt_insert(data, devmnt->hidden_options);
 
break;
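
The changed condition can be checked case by case with a small user-space model. This is a sketch under stated assumptions: `should_insert_hidden_options()` is a hypothetical name, and the three parameters stand in for ve->is_pseudosuper, the result of ve_devmnt_check() and the remount flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of: if ((ve->is_pseudosuper || !err) && !remount)
 * err == 0 means the allowed-options check passed.
 */
static bool should_insert_hidden_options(bool is_pseudosuper, int err,
					 bool remount)
{
	return (is_pseudosuper || err == 0) && !remount;
}

/* Usage example: a failed check blocks the insert only outside restore mode. */
void should_insert_example(void)
{
	assert(should_insert_hidden_options(true, -1, false));
	assert(!should_insert_hidden_options(false, -1, false));
}
```

In the restore case a failed options check no longer blocks the insert, and err is then overwritten by the return value of ve_devmnt_insert().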


[Devel] [PATCH RHEL7 COMMIT] ve: mark DEF_PERMS feature deprecated

2016-06-22 Thread Konstantin Khorenko
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit f4255eae32883b47b6f3027cc07e66d5b982df6a
Author: Evgenii Shatokhin 
Date:   Wed Jun 22 14:40:09 2016 +0400

ve: mark DEF_PERMS feature deprecated

"def_perms" is not mentioned in the man pages for prlctl and vzctl.

VE_FEATURE_DEF_PERMS is only used in the kernel code as a part of
VE_FEATURES_DEF ("ve->features = VE_FEATURES_DEF;" in ve_create()).
No code checks if the bit for this feature is set in ve->features.

Let us mark this feature deprecated, similar to SYSFS and IPGRE
features.

https://jira.sw.ru/browse/PSBM-40280

Signed-off-by: Evgenii Shatokhin 
Reviewed-by: Kirill Tkhai 
---
 include/uapi/linux/vzcalluser.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vzcalluser.h b/include/uapi/linux/vzcalluser.h
index 2b340cf..bc55bb3 100644
--- a/include/uapi/linux/vzcalluser.h
+++ b/include/uapi/linux/vzcalluser.h
@@ -115,7 +115,7 @@ struct env_create_param3 {
 
 #define VE_FEATURE_SYSFS   (1ULL << 0) /* deprecated */
 #define VE_FEATURE_NFS (1ULL << 1)
-#define VE_FEATURE_DEF_PERMS   (1ULL << 2)
+#define VE_FEATURE_DEF_PERMS   (1ULL << 2) /* deprecated */
 #define VE_FEATURE_SIT  (1ULL << 3)
 #define VE_FEATURE_IPIP (1ULL << 4)
 #define VE_FEATURE_PPP (1ULL << 5)
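
For reference, a check of such a feature bit would look roughly like the sketch below; per the commit message, no such check exists in the kernel for DEF_PERMS. The `SKETCH_` constants copy the bit positions from the hunk above, and `feature_enabled()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the hunk above; names are local to this sketch. */
#define SKETCH_FEATURE_NFS       (1ULL << 1)
#define SKETCH_FEATURE_DEF_PERMS (1ULL << 2)

/* Hypothetical helper: true if the given feature bit is set in the mask. */
static bool feature_enabled(uint64_t features, uint64_t bit)
{
	return (features & bit) != 0;
}

/* Usage example: a mask with only DEF_PERMS set. */
void feature_enabled_example(void)
{
	uint64_t features = SKETCH_FEATURE_DEF_PERMS;

	assert(feature_enabled(features, SKETCH_FEATURE_DEF_PERMS));
	assert(!feature_enabled(features, SKETCH_FEATURE_NFS));
}
```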


[Devel] [PATCH rh7 2/2] cgroup: un-export cgroup_kernel_* and zap cgroup_kernel_remove

2016-06-22 Thread Vladimir Davydov
Now that fairsched is gone, cgroup_kernel_remove is not used any more, so
drop it. The cgroup_kernel_* family of functions is now used only by
beancounters, which are part of the kernel, so un-export them.

Signed-off-by: Vladimir Davydov 
---
 include/linux/cgroup.h |  1 -
 kernel/cgroup.c| 26 --
 2 files changed, 27 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 730ca9091bfb..b34239dcdb52 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -55,7 +55,6 @@ struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
const char *pathname);
 struct cgroup *cgroup_kernel_open(struct cgroup *parent,
enum cgroup_open_flags flags, const char *name);
-int cgroup_kernel_remove(struct cgroup *parent, const char *name);
 int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk);
 void cgroup_kernel_close(struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 581924e7af9e..1c047b9bb1fb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5669,13 +5669,11 @@ struct vfsmount *cgroup_kernel_mount(struct cgroup_sb_opts *opts)
 {
return kern_mount_data(_fs_type, opts);
 }
-EXPORT_SYMBOL(cgroup_kernel_mount);
 
 struct cgroup *cgroup_get_root(struct vfsmount *mnt)
 {
return mnt->mnt_root->d_fsdata;
 }
-EXPORT_SYMBOL(cgroup_get_root);
 
 struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
const char *pathname)
@@ -5698,7 +5696,6 @@ struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
path_put();
return cgrp;
 }
-EXPORT_SYMBOL(cgroup_kernel_lookup);
 
 struct cgroup *cgroup_kernel_open(struct cgroup *parent,
enum cgroup_open_flags flags, const char *name)
@@ -5729,27 +5726,6 @@ out:
mutex_unlock(>dentry->d_inode->i_mutex);
return cgrp;
 }
-EXPORT_SYMBOL(cgroup_kernel_open);
-
-int cgroup_kernel_remove(struct cgroup *parent, const char *name)
-{
-   struct dentry *dentry;
-   int ret;
-
-   mutex_lock_nested(>dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-   dentry = lookup_one_len(name, parent->dentry, strlen(name));
-   ret = PTR_ERR(dentry);
-   if (IS_ERR(dentry))
-   goto out;
-   ret = -ENOENT;
-   if (dentry->d_inode)
-   ret = vfs_rmdir(parent->dentry->d_inode, dentry);
-   dput(dentry);
-out:
-   mutex_unlock(>dentry->d_inode->i_mutex);
-   return ret;
-}
-EXPORT_SYMBOL(cgroup_kernel_remove);
 
 int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk)
 {
@@ -5761,7 +5737,6 @@ int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk)
mutex_unlock(_mutex);
return ret;
 }
-EXPORT_SYMBOL(cgroup_kernel_attach);
 
 void cgroup_kernel_close(struct cgroup *cgrp)
 {
@@ -5770,4 +5745,3 @@ void cgroup_kernel_close(struct cgroup *cgrp)
check_for_release(cgrp);
}
 }
-EXPORT_SYMBOL(cgroup_kernel_close);
-- 
2.1.4



[Devel] [PATCH rh7] Remove container and beancounter directories from /proc/vz

2016-06-22 Thread Vladimir Davydov
In PCS6, cgroups were mounted there. Now they are unused, as all cgroups
are supposed to be mounted by systemd under /sys/fs/cgroup.

Signed-off-by: Vladimir Davydov 
---
 kernel/bc/proc.c| 1 -
 kernel/ve/veowner.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/kernel/bc/proc.c b/kernel/bc/proc.c
index 3a3b4e3f28c8..9f60d9991e0a 100644
--- a/kernel/bc/proc.c
+++ b/kernel/bc/proc.c
@@ -754,7 +754,6 @@ static int __init ub_init_proc(void)
entry = proc_create("user_beancounters",
S_IRUSR|S_ISVTX, NULL, _file_operations);
proc_create("vswap", S_IRUSR, proc_vz_dir, _vswap_fops);
-   proc_mkdir_mode("beancounter", 0, proc_vz_dir);
return 0;
 }
 
diff --git a/kernel/ve/veowner.c b/kernel/ve/veowner.c
index 86065072a9ca..7642191bf517 100644
--- a/kernel/ve/veowner.c
+++ b/kernel/ve/veowner.c
@@ -41,7 +41,6 @@ static void prepare_proc(void)
proc_vz_dir = proc_mkdir_mode("vz", S_ISVTX | S_IRUGO | S_IXUGO, NULL);
if (!proc_vz_dir)
panic("Can't create /proc/vz dir\n");
-   proc_mkdir_mode("container", 0, proc_vz_dir);
 }
 #endif
 
-- 
2.1.4



[Devel] [PATCH rh7 1/2] ve: drop ve_cgroup_open and ve_cgroup_remove

2016-06-22 Thread Vladimir Davydov
Fairsched was the last user of these functions. After it's gone, we
don't need them any longer.

Signed-off-by: Vladimir Davydov 
---
 include/linux/ve_proto.h |  2 --
 kernel/ve/ve.c   | 21 -
 2 files changed, 23 deletions(-)

diff --git a/include/linux/ve_proto.h b/include/linux/ve_proto.h
index 8cc7fe3ba2a3..d2dc12d2f2c2 100644
--- a/include/linux/ve_proto.h
+++ b/include/linux/ve_proto.h
@@ -50,8 +50,6 @@ extern struct list_head ve_list_head;
 #define for_each_ve(ve)list_for_each_entry((ve), _list_head, ve_list)
 extern struct mutex ve_list_lock;
 extern struct ve_struct *get_ve_by_id(envid_t);
-extern struct cgroup *ve_cgroup_open(struct cgroup *root, int flags, envid_t veid);
-extern int ve_cgroup_remove(struct cgroup *root, envid_t veid);
 
 extern int nr_threads_ve(struct ve_struct *ve);
 
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb53a665..9995dbcd1623 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -156,27 +156,6 @@ const char *ve_name(struct ve_struct *ve)
 }
 EXPORT_SYMBOL(ve_name);
 
-/* Cgroup must be closed with cgroup_kernel_close */
-struct cgroup *ve_cgroup_open(struct cgroup *root, int flags, envid_t veid)
-{
-   char name[16];
-   struct cgroup *cgrp;
-
-   snprintf(name, sizeof(name), "%u", veid);
-   cgrp = cgroup_kernel_open(root, flags, name);
-   return cgrp ? cgrp : ERR_PTR(-ENOENT);
-}
-EXPORT_SYMBOL(ve_cgroup_open);
-
-int ve_cgroup_remove(struct cgroup *root, envid_t veid)
-{
-   char name[16];
-
-   snprintf(name, sizeof(name), "%u", veid);
-   return cgroup_kernel_remove(root, name);
-}
-EXPORT_SYMBOL(ve_cgroup_remove);
-
 /* under rcu_read_lock if task != current */
 const char *task_ve_name(struct task_struct *task)
 {
-- 
2.1.4



[Devel] [PATCH rh7] Drop CAP_VE_ADMIN and CAP_VE_NET_ADMIN

2016-06-22 Thread Vladimir Davydov
Not needed anymore as we use user ns for capability checking.
Also, move capable_setveid() helper to ve.h so as not to pollute
generic headers.

Signed-off-by: Vladimir Davydov 
---
 include/linux/ve.h  |  3 +++
 include/uapi/linux/capability.h | 55 -
 2 files changed, 3 insertions(+), 55 deletions(-)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index cea3a87cb9c0..247cadb78c06 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -138,6 +138,9 @@ struct ve_devmnt {
 #define VE_MEMINFO_DEFAULT  1   /* default behaviour */
 #define VE_MEMINFO_SYSTEM   0   /* disable meminfo virtualization */
 
+#define capable_setveid() \
+   (ve_is_super(get_exec_env()) && capable(CAP_SYS_ADMIN))
+
 extern int nr_ve;
 extern struct proc_dir_entry *proc_vz_dir;
 extern struct cgroup_subsys ve_subsys;
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index cadbfe6109e8..b3d37bb108b8 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -307,61 +307,6 @@ struct vfs_cap_data {
 
 #define CAP_SETFCAP 31
 
-#ifdef __KERNEL__
-/*
- * Important note: VZ capabilities do intersect with CAP_AUDIT
- * this is due to compatibility reasons. Nothing bad.
- * Both VZ and Audit/SELinux caps are disabled in VPSs.
- */
-
-/* Allow access to all information. In the other case some structures will be
- * hiding to ensure different Virtual Environment non-interaction on the same
- * node (NOW OBSOLETED)
- */
-#define CAP_SETVEID 29
-
-#define capable_setveid()  ({  \
-   ve_is_super(get_exec_env()) &&  \
-   (capable(CAP_SYS_ADMIN) ||  \
-capable(CAP_VE_ADMIN));\
-   })
-
-/*
- * coinsides with CAP_AUDIT_CONTROL but we don't care, since
- * audit is disabled in Virtuozzo
- */
-#define CAP_VE_ADMIN30
-
-#ifdef CONFIG_VE
-
-/* Replacement for CAP_NET_ADMIN:
-   delegated rights to the Virtual environment of its network administration.
-   For now the following rights have been delegated:
-
-   Allow setting arbitrary process / process group ownership on sockets
-   Allow interface configuration
- */
-#define CAP_VE_NET_ADMIN CAP_VE_ADMIN
-
-/* Replacement for CAP_SYS_ADMIN:
-   delegated rights to the Virtual environment of its administration.
-   For now the following rights have been delegated:
- */
-/* Allow mount/umount/remount */
-/* Allow examination and configuration of disk quotas */
-/* Allow removing semaphores */
-/* Used instead of CAP_CHOWN to "chown" IPC message queues, semaphores
-   and shared memory */
-/* Allow locking/unlocking of shared memory segment */
-/* Allow forged pids on socket credentials passing */
-
-#define CAP_VE_SYS_ADMIN CAP_VE_ADMIN
-#else
-#define CAP_VE_NET_ADMIN CAP_NET_ADMIN
-#define CAP_VE_SYS_ADMIN CAP_SYS_ADMIN
-#endif
-#endif
-
 /* Override MAC access.
The base kernel enforces no MAC policy.
An LSM may enforce a MAC policy, and if it does and it chooses
-- 
2.1.4



Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests

2016-06-22 Thread Dmitry Monakhov
Maxim Patlasov  writes:

> The way io_direct.c handles FLUSH|FUA: b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA
> is completely wrong: to make sure that b1:FLUSH took effect we have to
> wait for its completion. Similarly, even if we're sure that FUA will be
> processed as post-FLUSH (also dubious!), we have to wait for completion
> of b1..b4 to make sure that the flush will cover them.
>
> The patch fixes all these issues quite simply: let's mark outgoing
> bio-s with FLUSH|FUA based on those flags in the *corresponding* incoming
> bio-s.
>
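
For readers following the thread, the flag propagation described above can be sketched in user space. This is only a model under stated assumptions: the `SK_REQ_*` bit values are made up for the sketch and are not the kernel's, and `inherit_barrier_flags()` is a hypothetical helper corresponding to the patch's `bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA)`.

```c
#include <assert.h>

/* Assumption: illustrative bit values, not the kernel's REQ_* definitions. */
#define SK_REQ_WRITE (1u << 0)
#define SK_REQ_FUA   (1u << 1)
#define SK_REQ_FLUSH (1u << 2)

/*
 * Hypothetical helper: the outgoing bio keeps its own flags and inherits
 * only the barrier flags (FLUSH, FUA) of the corresponding incoming bio.
 */
static unsigned int inherit_barrier_flags(unsigned int out_rw,
					  unsigned int in_rw)
{
	return out_rw | (in_rw & (SK_REQ_FLUSH | SK_REQ_FUA));
}

/* Usage example: an incoming FLUSH|FUA write marks the outgoing bio. */
void inherit_barrier_flags_example(void)
{
	unsigned int in = SK_REQ_WRITE | SK_REQ_FLUSH | SK_REQ_FUA;

	assert(inherit_barrier_flags(0, in) == (SK_REQ_FLUSH | SK_REQ_FUA));
	assert(inherit_barrier_flags(0, SK_REQ_WRITE) == 0);
}
```

Because each outgoing bio carries the flags of its own incoming bio, no cross-bio ordering has to be reconstructed in dio_submit().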
> Signed-off-by: Maxim Patlasov 
> ---
>  drivers/block/ploop/dev.c   |1 -
>  drivers/block/ploop/io_direct.c |   47 
> ---
>  2 files changed, 15 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index 2ef1449..6b5702f 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
>   preq->req_sector = bio->bi_sector;
>   preq->req_size = bio->bi_size >> 9;
>   preq->req_rw = bio->bi_rw;
> - bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
Wow. I could not even imagine that we were clearing the barrier flags from the original bios.
>   preq->eng_state = PLOOP_E_ENTRY;
>   preq->state = 0;
>   preq->error = 0;
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index 6ef9cd8..84c9a48 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>   int preflush;
>   int postfua = 0;
>   int write = !!(rw & REQ_WRITE);
> - int bio_num;
>  
>   trace_submit(preq);
>  
> @@ -233,13 +232,13 @@ flush_bio:
>   goto flush_bio;
>   }
>  
> + bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
>   bw.bv_off += copy;
>   size -= copy >> 9;
>   sec += copy >> 9;
>   }
>   ploop_extent_put(em);
>  
> - bio_num = 0;
>   while (bl.head) {
>   struct bio * b = bl.head;
>   unsigned long rw2 = rw;
> @@ -255,11 +254,10 @@ flush_bio:
>   preflush = 0;
>   }
>   if (unlikely(postfua && !bl.head))
> - rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
> + rw2 |= REQ_FUA;
>  
>   ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
> - submit_bio(rw2, b);
> - bio_num++;
> + submit_bio(rw2 | b->bi_rw, b);
>   }
>  
>   ploop_complete_io_request(preq);
> @@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>   sector_t sec, end_sec, nsec, start, end;
>   struct bio_list_walk bw;
>   int err;
> - int preflush = !!(preq->req_rw & REQ_FLUSH);
>  
>   bio_list_init();
>  
> @@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>   while (sec < end_sec) {
>   struct page * page;
>   unsigned int poff, plen;
> + bool zero_page;
>  
>   if (sec < start) {
> + zero_page = true;
>   page = ZERO_PAGE(0);
>   poff = 0;
>   plen = start - sec;
>   if (plen > (PAGE_SIZE>>9))
>   plen = (PAGE_SIZE>>9);
>   } else if (sec >= end) {
> + zero_page = true;
>   page = ZERO_PAGE(0);
>   poff = 0;
>   plen = end_sec - sec;
> @@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>   } else {
>   /* sec >= start && sec < end */
>   struct bio_vec * bv;
> + zero_page = false;
>  
>   if (sec == start) {
>   bw.cur = sbl->head;
> @@ -672,6 +673,10 @@ flush_bio:
>   goto flush_bio;
>   }
>  
> + /* Handle FLUSH here, dio_post_submit will handle FUA */
> + if (!zero_page)
> + bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
> +
>   bw.bv_off += (plen<<9);
>   BUG_ON(plen == 0);
>   sec += plen;
> @@ -688,13 +693,9 @@ flush_bio:
>   b->bi_private = preq;
>   b->bi_end_io = dio_endio_async;
>  
> - rw = sbl->head->bi_rw | WRITE;
> - if (unlikely(preflush)) {
> - rw |= REQ_FLUSH;
> - preflush = 0;
> - }
> + rw = preq->req_rw & ~(REQ_FLUSH | REQ_FUA);
>   ploop_acc_ff_out(preq->plo, rw | b->bi_rw);
> - submit_bio(rw, b);
> + submit_bio(rw |