Re: [Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v3
Dima,

I'm uneasy that the handling of RELOC_A|S requests is still broken. It seems we fully agree that for such requests we can do an unconditional FLUSH|FUA when we call write_page from ploop_index_update() and map_wb_complete(). And your idea to implement it by passing FLUSH|FUA for io_direct and post_fsync=1 for io_kaio is smart and OK. Will you send a patch for that (fix barriers for RELOC_A|S requests)?

Thanks,
Maxim

On 06/21/2016 04:56 PM, Maxim Patlasov wrote:
> Dima,
>
> After more thinking I realized that the whole idea of PLOOP_REQ_DELAYED_FLUSH might be bogus: it is possible that we simply do not have enough incoming FUA-s to make delaying worthwhile. This patch actually mixes three things:
>
> 1) fix barriers for RELOC_A|S requests,
> 2) fix barriers for ordinary requests,
> 3) DELAYED_FLUSH optimization.
>
> So, please, split the patch into three and make some measurements demonstrating that applying the "DELAYED_FLUSH optimization" patch on top of the previous patches improves performance.
>
> I have an idea about how to fix barriers for ordinary requests -- please see the patch I'll send soon. The key point is that handling of FLUSH-es is broken the same way as FUA: if you observe (rw & REQ_FLUSH) and send the first bio marked as REQ_FLUSH, it guarantees nothing unless you wait for its completion before submitting further bio-s! And ploop simply does not have the logic of waiting for the first bio before sending the others. And, to make things worse, not only dio_submit is affected: dio_submit_pad and dio_io_page have to be fixed too.
>
> There are also some inline comments below...
>
> On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:
> > barrier code is broken in many ways:
> > Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
> > But a request can also go through ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes).
> > So in case of grow_dev we have the following sequence:
> >
> > E_RELOC_DATA_READ:
> >   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
> >   ->delta->allocate
> >   ->io->submit_alloc: dio_submit_alloc
> >   ->dio_submit_pad
> > E_DATA_WBI : data written, time to update index
> >   ->delta->allocate_complete:ploop_index_update
> >   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
> >   ->write_page
> >   ->ploop_map_wb_complete
> >   ->ploop_wb_complete_post_process
> >   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
> > E_RELOC_NULLIFY:
> >   ->submit()
> >
> > BUG#2: currently kaio write_page silently ignores REQ_FLUSH
> > BUG#3: io_direct:dio_submit: if fua_delay is not possible we MUST tag all bios via REQ_FUA, not just the latest one.
> >
> > This patch unifies barrier handling as follows:
> > - Get rid of FORCE_{FLUSH,FUA}
> > - Introduce DELAYED_FLUSH
> > - fix fua handling for dio_submit
> > - BUG_ON for REQ_FLUSH in kaio_page_write
> >
> > This makes the reloc sequence optimal:
> > io_direct
> >   RELOC_S: R1, W2, WBI:FLUSH|FUA
> >   RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
> > io_kaio
> >   RELOC_S: R1, W2:FUA, WBI:FUA
> >   RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA
> >
> > https://jira.sw.ru/browse/PSBM-47107
> > Signed-off-by: Dmitry Monakhov
> > ---
> >  drivers/block/ploop/dev.c | 8 +---
> >  drivers/block/ploop/io_direct.c | 30 ++-
> >  drivers/block/ploop/io_kaio.c | 23 +
> >  drivers/block/ploop/map.c | 45 ++---
> >  include/linux/ploop/ploop.h | 19 +
> >  5 files changed, 60 insertions(+), 65 deletions(-)
> >
> > diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> > index 96f7850..fbc5f2f 100644
> > --- a/drivers/block/ploop/dev.c
> > +++ b/drivers/block/ploop/dev.c
> > @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq)
> >
> >  	__TRACE("Z %p %u\n", preq, preq->req_cluster);
> >
> > +	if (!preq->error) {
> > +		WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
> > +	}
> >  	while (preq->bl.head) {
> >  		struct bio * bio = preq->bl.head;
> >  		preq->bl.head = bio->bi_next;
> > @@ -2530,9 +2533,8 @@ restart:
> >  	top_delta = ploop_top_delta(plo);
> >  	sbl.head = sbl.tail = preq->aux_bio;
> >
> > -	/* Relocated data write required sync before BAT updatee */
> > -	set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
> > -
> > +	/* Relocated data write required sync before BAT updatee
> > +	 * this will happen inside index_update */
> >  	if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
> >  		preq->eng_state = PLOOP_E_DATA_WBI;
> >  		plo->st.bio_out++;
> > diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> > index a6d83fe..303eb70 100644
> > --- a/drivers/block/ploop/io_direct.c
> > +++ b/drivers/block/ploop/io_direct.c
> > @@ -83,28 +83,19 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
> >  	int err;
> >  	struct bio_list_walk bw;
> >  	int preflush;
> > -	int postfua = 0;
> > +	int fua = 0;
> >  	int write = !!(rw & REQ_WRITE);
> >  	int bio_num;
>
> Your patch obsoletes bio_num.
Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests
On 06/22/2016 06:41 AM, Dmitry Monakhov wrote:
> Maxim Patlasov writes:
>
> > The way io_direct.c handles FLUSH|FUA (b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA)
> > is completely wrong: to make sure that b1:FLUSH has taken effect we have
> > to wait for its completion. Similarly, even if we're sure that FUA will be
> > processed as a post-FLUSH (also dubious!), we have to wait for completion
> > of b1..b4 to make sure that the flush will cover them.
> >
> > The patch fixes all these issues pretty simply: let's mark outgoing
> > bio-s with FLUSH|FUA based on those flags in the *corresponding* incoming
> > bio-s.
>
> One more thing, please see below.
>
> > Signed-off-by: Maxim Patlasov
> > ---
> >  drivers/block/ploop/dev.c |1 -
> >  drivers/block/ploop/io_direct.c | 47 ---
> >  2 files changed, 15 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> > index 2ef1449..6b5702f 100644
> > --- a/drivers/block/ploop/dev.c
> > +++ b/drivers/block/ploop/dev.c
> > @@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
> >  	preq->req_sector = bio->bi_sector;
> >  	preq->req_size = bio->bi_size >> 9;
> >  	preq->req_rw = bio->bi_rw;
> > -	bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
> >  	preq->eng_state = PLOOP_E_ENTRY;
> >  	preq->state = 0;
> >  	preq->error = 0;
> > diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> > index 6ef9cd8..84c9a48 100644
> > --- a/drivers/block/ploop/io_direct.c
> > +++ b/drivers/block/ploop/io_direct.c
> > @@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
> >  	int preflush;
> >  	int postfua = 0;
> >  	int write = !!(rw & REQ_WRITE);
> > -	int bio_num;
> >
> >  	trace_submit(preq);
> >
> > @@ -233,13 +232,13 @@ flush_bio:
> >  			goto flush_bio;
> >  		}
> >
> > +		bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
> >  		bw.bv_off += copy;
> >  		size -= copy >> 9;
> >  		sec += copy >> 9;
> >  	}
> >  	ploop_extent_put(em);
> >
> > -	bio_num = 0;
> >  	while (bl.head) {
> >  		struct bio * b = bl.head;
> >  		unsigned long rw2 = rw;
> > @@ -255,11 +254,10 @@ flush_bio:
> >  			preflush = 0;
> >  		}
> >  		if (unlikely(postfua && !bl.head))
> > -			rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
> > +			rw2 |= REQ_FUA;
> >
> >  		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
> > -		submit_bio(rw2, b);
> > -		bio_num++;
> > +		submit_bio(rw2 | b->bi_rw, b);
> >  	}
> >
> >  	ploop_complete_io_request(preq);
> > @@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
> >  	sector_t sec, end_sec, nsec, start, end;
> >  	struct bio_list_walk bw;
> >  	int err;
> > -	int preflush = !!(preq->req_rw & REQ_FLUSH);
> >
> >  	bio_list_init();
> >
> > @@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
> >  	while (sec < end_sec) {
> >  		struct page * page;
> >  		unsigned int poff, plen;
> > +		bool zero_page;
> >
> >  		if (sec < start) {
> > +			zero_page = true;
> >  			page = ZERO_PAGE(0);
> >  			poff = 0;
> >  			plen = start - sec;
> >  			if (plen > (PAGE_SIZE>>9))
> >  				plen = (PAGE_SIZE>>9);
> >  		} else if (sec >= end) {
> > +			zero_page = true;
> >  			page = ZERO_PAGE(0);
> >  			poff = 0;
> >  			plen = end_sec - sec;
> > @@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
> >  		} else {
> >  			/* sec >= start && sec < end */
> >  			struct bio_vec * bv;
> > +			zero_page = false;
> >
> >  			if (sec == start) {
> >  				bw.cur = sbl->head;
> > @@ -672,6 +673,10 @@ flush_bio:
> >  			goto flush_bio;
> >  		}
> >
> > +		/* Handle FLUSH here, dio_post_submit will handle FUA */
>
> submit_pad may be called w/o post_submit flag from here:
> ->dio_submit_alloc
>   if (io->files.em_tree->_get_extent) {
>       ->dio_fallocate
>       ->dio_submit_pad
>       ..
>   }

We never have _get_extent set. This is legacy code for PCSS support; we'll remove it. For now, we can safely ignore this.

Thanks,
Maxim
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
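The per-bio flag propagation discussed in this thread can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct demo_bio`, the `DEMO_REQ_*` values and `demo_propagate_barriers()` are hypothetical stand-ins invented for illustration, not ploop or block-layer definitions. It shows the idea of copying only the barrier bits from the incoming bio that an outgoing bio was built from, as the hunk `bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);` does.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's REQ_* bits differ. */
#define DEMO_REQ_WRITE 0x1u
#define DEMO_REQ_FLUSH 0x2u
#define DEMO_REQ_FUA   0x4u

/* Hypothetical stand-in for struct bio; rw plays the role of bi_rw. */
struct demo_bio {
	unsigned int rw;
};

/*
 * For every outgoing bio, OR in the FLUSH/FUA bits of the incoming bio
 * it was built from. src[i] is the index of that incoming bio.
 * No other bits of the incoming bio are copied.
 */
static void demo_propagate_barriers(const struct demo_bio *in, size_t n_in,
				    struct demo_bio *out, const size_t *src,
				    size_t n_out)
{
	for (size_t i = 0; i < n_out; i++) {
		assert(src[i] < n_in);
		out[i].rw |= in[src[i]].rw & (DEMO_REQ_FLUSH | DEMO_REQ_FUA);
	}
}
```

With this scheme no outgoing bio carries a barrier that its own incoming bio did not ask for, so the submitter does not have to wait for one bio to complete before sending the next in order to preserve the caller's semantics.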
Re: [Devel] [PATCH 2/3] ploop: deadcode cleanup
Kostya,

The patch is OK per se, please commit it with:

Acked-by: Maxim Patlasov

Thanks,
Maxim

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:
> The (rw & REQ_FUA) branch is impossible because REQ_FUA was cleared on the line above.
> The logic was moved to ploop_req_delay_fua_possible() a long time ago.
>
> Signed-off-by: Dmitry Monakhov
> ---
>  drivers/block/ploop/io_direct.c | 9 -
>  1 file changed, 9 deletions(-)
>
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index 58d7580..a6d83fe 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -108,15 +108,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>
>  	rw &= ~(REQ_FLUSH | REQ_FUA);
>
> -	/* In case of eng_state != COMPLETE, we'll do FUA in
> -	 * ploop_index_update(). Otherwise, we should mark
> -	 * last bio as FUA here. */
> -	if (rw & REQ_FUA) {
> -		rw &= ~REQ_FUA;
> -		if (preq->eng_state == PLOOP_E_COMPLETE)
> -			postfua = 1;
> -	}
> -
>  	bio_list_init(&bl);
>
>  	if (iblk == PLOOP_ZERO_INDEX)
Re: [Devel] [PATCH 1/3] ploop: skip redundant fsync for REQ_FUA in post_submit
Kostya,

The patch is OK per se, please commit it with:

Acked-by: Maxim Patlasov

Thanks,
Maxim

On 06/21/2016 06:55 AM, Dmitry Monakhov wrote:
> Signed-off-by: Dmitry Monakhov
> ---
>  drivers/block/ploop/io_direct.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index b844a80..58d7580 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -517,27 +517,31 @@ dio_post_submit(struct ploop_io *io, struct ploop_request * preq)
>  	struct ploop_device *plo = preq->plo;
>  	sector_t sec = (sector_t)preq->iblock << preq->plo->cluster_log;
>  	loff_t clu_siz = 1 << (preq->plo->cluster_log + 9);
> +	int force_sync = preq->req_rw & REQ_FUA;
>  	int err;
>
>  	file_start_write(io->files.file);
>
> -	/* Here io->io_count is even ... */
> -	spin_lock_irq(&plo->lock);
> -	io->io_count++;
> -	set_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
> -	spin_unlock_irq(&plo->lock);
> -
> +	if (!force_sync) {
> +		/* Here io->io_count is even ... */
> +		spin_lock_irq(&plo->lock);
> +		io->io_count++;
> +		set_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
> +		spin_unlock_irq(&plo->lock);
> +	}
>  	err = io->files.file->f_op->fallocate(io->files.file,
>  					      FALLOC_FL_CONVERT_UNWRITTEN,
>  					      (loff_t)sec << 9, clu_siz);
>
>  	/* highly unlikely case: FUA coming to a block not provisioned yet */
> -	if (!err && (preq->req_rw & REQ_FUA))
> +	if (!err && force_sync)
>  		err = io->ops->sync(io);
>
> -	spin_lock_irq(&plo->lock);
> -	io->io_count++;
> -	spin_unlock_irq(&plo->lock);
> +	if (!force_sync) {
> +		spin_lock_irq(&plo->lock);
> +		io->io_count++;
> +		spin_unlock_irq(&plo->lock);
> +	}
>  	/* and here io->io_count is even (+2) again. */
>
>  	file_end_write(io->files.file);
[Devel] [NEW KERNEL] 3.10.0-327.18.2.vz7.14.19 (rhel7)
Changelog:

OpenVZ kernel rh7-3.10.0-327.18.2.vz7.14.19

* cpustat: an attempt to update vcpustats for root_task_group led to kernel panic

Changelog since kernel rh7-3.10.0-327.18.2.vz7.14.17:

* cleanup: dropped CAP_VE_ADMIN and CAP_VE_NET_ADMIN, mark DEF_PERMS feature deprecated
* mm: when setting memory.high below usage, reclaim right now
* locks: NFS shared locks to be shown in fdinfo (required for CRIU)
* fs: ignore device permissions in CRIU restore stage
* ploop: fix barriers for ordinary requests

Generated changelog:

* Wed Jun 22 2016 Konstantin Khorenko [3.10.0-327.18.2.vz7.14.19]
- ve/cpustat: don't try to update vcpustats for root_task_group (Andrey Ryabinin) [PSBM-48721]

Built packages:
http://kojistorage.eng.sw.ru/packages/vzkernel/3.10.0/327.18.2.vz7.14.19/
Re: [Devel] [PATCH] binfmt_misc: Allow mount if capable(CAP_SYS_ADMIN)
On 06/22/2016 04:42 PM, Kirill Tkhai wrote:
> The patch allows to mount binfmt_misc in a CT with ve0's admin caps,
> and it's need that for CRIU dump. This time, unmounted binfmt_misc
> may be forced mounted back, and we don't want to change CRIU's user_ns
> to do that.
>
> Signed-off-by: Kirill Tkhai

Reviewed-by: Andrey Ryabinin
Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests
Maxim Patlasov writes:

> The way io_direct.c handles FLUSH|FUA (b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA)
> is completely wrong: to make sure that b1:FLUSH has taken effect we have
> to wait for its completion. Similarly, even if we're sure that FUA will be
> processed as a post-FLUSH (also dubious!), we have to wait for completion
> of b1..b4 to make sure that the flush will cover them.
>
> The patch fixes all these issues pretty simply: let's mark outgoing
> bio-s with FLUSH|FUA based on those flags in the *corresponding* incoming
> bio-s.

One more thing, please see below.

> Signed-off-by: Maxim Patlasov
> ---
>  drivers/block/ploop/dev.c |1 -
>  drivers/block/ploop/io_direct.c | 47 ---
>  2 files changed, 15 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index 2ef1449..6b5702f 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
>  	preq->req_sector = bio->bi_sector;
>  	preq->req_size = bio->bi_size >> 9;
>  	preq->req_rw = bio->bi_rw;
> -	bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
>  	preq->eng_state = PLOOP_E_ENTRY;
>  	preq->state = 0;
>  	preq->error = 0;
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index 6ef9cd8..84c9a48 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>  	int preflush;
>  	int postfua = 0;
>  	int write = !!(rw & REQ_WRITE);
> -	int bio_num;
>
>  	trace_submit(preq);
>
> @@ -233,13 +232,13 @@ flush_bio:
>  			goto flush_bio;
>  		}
>
> +		bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
>  		bw.bv_off += copy;
>  		size -= copy >> 9;
>  		sec += copy >> 9;
>  	}
>  	ploop_extent_put(em);
>
> -	bio_num = 0;
>  	while (bl.head) {
>  		struct bio * b = bl.head;
>  		unsigned long rw2 = rw;
> @@ -255,11 +254,10 @@ flush_bio:
>  			preflush = 0;
>  		}
>  		if (unlikely(postfua && !bl.head))
> -			rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
> +			rw2 |= REQ_FUA;
>
>  		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
> -		submit_bio(rw2, b);
> -		bio_num++;
> +		submit_bio(rw2 | b->bi_rw, b);
>  	}
>
>  	ploop_complete_io_request(preq);
> @@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  	sector_t sec, end_sec, nsec, start, end;
>  	struct bio_list_walk bw;
>  	int err;
> -	int preflush = !!(preq->req_rw & REQ_FLUSH);
>
>  	bio_list_init();
>
> @@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  	while (sec < end_sec) {
>  		struct page * page;
>  		unsigned int poff, plen;
> +		bool zero_page;
>
>  		if (sec < start) {
> +			zero_page = true;
>  			page = ZERO_PAGE(0);
>  			poff = 0;
>  			plen = start - sec;
>  			if (plen > (PAGE_SIZE>>9))
>  				plen = (PAGE_SIZE>>9);
>  		} else if (sec >= end) {
> +			zero_page = true;
>  			page = ZERO_PAGE(0);
>  			poff = 0;
>  			plen = end_sec - sec;
> @@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  		} else {
>  			/* sec >= start && sec < end */
>  			struct bio_vec * bv;
> +			zero_page = false;
>
>  			if (sec == start) {
>  				bw.cur = sbl->head;
> @@ -672,6 +673,10 @@ flush_bio:
>  			goto flush_bio;
>  		}
>
> +		/* Handle FLUSH here, dio_post_submit will handle FUA */

submit_pad may be called w/o post_submit flag from here:
->dio_submit_alloc
  if (io->files.em_tree->_get_extent) {
      ->dio_fallocate
      ->dio_submit_pad
      ..
  }

> +		if (!zero_page)
> +			bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
> +
>  		bw.bv_off += (plen<<9);
>  		BUG_ON(plen == 0);
>  		sec += plen;
> @@ -688,13 +693,9 @@ flush_bio:
>  		b->bi_private = preq;
>  		b->bi_end_io = dio_endio_async;
>
> -		rw = sbl->head->bi_rw | WRITE;
> -		if (unlikely(preflush)) {
> -			rw |= REQ_FLUSH;
> -			preflush = 0;
> -		}
> +		rw = preq->req_rw & ~(REQ_FLUSH | REQ_FUA);
>
[Devel] [PATCH] binfmt_misc: Allow mount if capable(CAP_SYS_ADMIN)
The patch allows mounting binfmt_misc in a CT with ve0's admin caps, which is needed for CRIU dump. At that time, an unmounted binfmt_misc may be forcibly mounted back, and we don't want to change CRIU's user_ns to do that.

Signed-off-by: Kirill Tkhai
---
 fs/binfmt_misc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fd5227f..e259022 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -735,7 +735,7 @@ static int bm_fill_super(struct super_block * sb, void * data, int silent)
 static struct dentry *bm_mount(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data)
 {
-	if (!current_user_ns_initial())
+	if (!current_user_ns_initial() && !capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 	return mount_ns(fs_type, flags, get_exec_env(), bm_fill_super);
 }
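The effect of the changed condition is easiest to see as a truth table. The sketch below is a userspace model only: `demo_bm_mount_check()` and its two boolean parameters are hypothetical names standing in for `current_user_ns_initial()` and `capable(CAP_SYS_ADMIN)`; it is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the patched check: the mount is refused only when the caller
 * is neither in the initial user namespace nor holds CAP_SYS_ADMIN.
 * Returns 0 when the mount may proceed, -EPERM otherwise.
 */
static int demo_bm_mount_check(bool in_initial_user_ns, bool has_cap_sys_admin)
{
	if (!in_initial_user_ns && !has_cap_sys_admin)
		return -EPERM;
	return 0;
}

/* Usage: the four possible input combinations. */
static void demo_bm_mount_examples(void)
{
	assert(demo_bm_mount_check(true, false) == 0);        /* allowed before and after */
	assert(demo_bm_mount_check(true, true) == 0);         /* allowed before and after */
	assert(demo_bm_mount_check(false, true) == 0);        /* newly allowed by the patch */
	assert(demo_bm_mount_check(false, false) == -EPERM);  /* still refused */
}
```

Only the third row changes behaviour: a caller outside the initial user namespace that has CAP_SYS_ADMIN was previously refused.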
[Devel] [PATCH RHEL7 COMMIT] ve/cpustat: don't try to update vcpustats for root_task_group
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.18
-->
commit 6a887128c0ff214571da1451d7336e3c9bb8d86a
Author: Andrey Ryabinin
Date:   Wed Jun 22 17:19:39 2016 +0400

    ve/cpustat: don't try to update vcpustats for root_task_group

    root_task_group doesn't have vcpu stats. Attempt to update them leads
    to NULL-ptr deref:

        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
        ...
        Call Trace:
         [] cpu_cgroup_get_stat+0x7b/0x180
         [] ve_get_cpu_stat+0x27/0x70
         [] fill_cpu_stat+0x91/0x1e0 [vzmon]
         [] vzcalls_ioctl+0x2bb/0x430 [vzmon]
         [] vzctl_ioctl+0x45/0x60 [vzdev]
         [] do_vfs_ioctl+0x255/0x4f0
         [] SyS_ioctl+0x54/0xa0
         [] system_call_fastpath+0x16/0x1b

    So, return -ENOENT if we asked for vcpu stats of root_task_group.

    https://jira.sw.ru/browse/PSBM-48721

    Signed-off-by: Andrey Ryabinin
    Reviewed-by: Vladimir Davydov
---
 kernel/sched/core.c | 10 --
 kernel/ve/ve.c | 7 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e885549..94deef4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9120,20 +9120,26 @@ int cpu_cgroup_proc_loadavg(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
 {
 	struct task_group *tg = cgroup_tg(cgrp);
 	int nr_vcpus = tg->nr_cpus ?: num_online_cpus();
 	int i;
 
+	kernel_cpustat_zero(kstat);
+
+	if (tg == &root_task_group)
+		return -ENOENT;
+
 	for_each_possible_cpu(i)
 		cpu_cgroup_update_stat(cgrp, i);
 
 	cpu_cgroup_update_vcpustat(cgrp);
 
-	kernel_cpustat_zero(kstat);
 	for (i = 0; i < nr_vcpus; i++)
 		kernel_cpustat_add(tg->vcpustat + i, kstat, kstat);
+
+	return 0;
 }
 
 int cpu_cgroup_get_avenrun(struct cgroup *cgrp, unsigned long *avenrun)
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb5..d196e3e 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1448,16 +1448,17 @@ int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun)
 }
 EXPORT_SYMBOL(ve_get_cpu_avenrun);
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
 
 int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat)
 {
 	struct cgroup_subsys_state *css;
+	int err;
 
 	css = ve_get_init_css(ve, cpu_cgroup_subsys_id);
-	cpu_cgroup_get_stat(css->cgroup, kstat);
+	err = cpu_cgroup_get_stat(css->cgroup, kstat);
 	css_put(css);
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(ve_get_cpu_stat);
 #endif /* CONFIG_CGROUP_SCHED */
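The shape of this fix (zero the output first, bail out early for the group that has no per-vcpu array, and propagate the error through the wrapper) can be sketched in plain C. Everything below is a hypothetical model: `struct demo_cpustat`, `struct demo_task_group`, `demo_root_task_group` and the two functions are invented for illustration and only loosely mirror `cpu_cgroup_get_stat()` and `ve_get_cpu_stat()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_cpustat {
	unsigned long long user, system;
};

struct demo_task_group {
	struct demo_cpustat *vcpustat;  /* NULL for the root group */
	int nr_vcpus;
};

/* Stand-in for root_task_group: it has no per-vcpu statistics. */
static struct demo_task_group demo_root_task_group = { NULL, 0 };

static int demo_get_stat(struct demo_task_group *tg, struct demo_cpustat *out)
{
	assert(tg != NULL && out != NULL);

	/* Zero first, so the caller sees defined data even on error. */
	memset(out, 0, sizeof(*out));

	/* The root group has no vcpustat array; do not dereference it. */
	if (tg == &demo_root_task_group)
		return -ENOENT;

	for (int i = 0; i < tg->nr_vcpus; i++) {
		out->user += tg->vcpustat[i].user;
		out->system += tg->vcpustat[i].system;
	}
	return 0;
}

/* The wrapper passes the error on instead of unconditionally returning 0. */
static int demo_ve_get_stat(struct demo_task_group *tg, struct demo_cpustat *out)
{
	return demo_get_stat(tg, out);
}
```

Moving the zeroing ahead of the early return matters: a caller that ignores the error code still reads zeros rather than whatever happened to be in the buffer.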
Re: [Devel] [PATCH rh7] ve/cpustat: don't try to update vcpustats for root_task_group
On Wed, Jun 22, 2016 at 03:59:05PM +0300, Andrey Ryabinin wrote:
> root_task_group doesn't have vcpu stats. Attempt to update those leads
> to NULL-ptr deref:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
> ...
> Call Trace:
>  [] cpu_cgroup_get_stat+0x7b/0x180
>  [] ve_get_cpu_stat+0x27/0x70
>  [] fill_cpu_stat+0x91/0x1e0 [vzmon]
>  [] vzcalls_ioctl+0x2bb/0x430 [vzmon]
>  [] vzctl_ioctl+0x45/0x60 [vzdev]
>  [] do_vfs_ioctl+0x255/0x4f0
>  [] SyS_ioctl+0x54/0xa0
>  [] system_call_fastpath+0x16/0x1b
>
> So, return -ENOENT if we asked for vcpu stats of root_task_group.
>
> https://jira.sw.ru/browse/PSBM-48721
>
> Signed-off-by: Andrey Ryabinin

Reviewed-by: Vladimir Davydov
[Devel] [PATCH rh7] ve/cpustat: don't try to update vcpustats for root_task_group
root_task_group doesn't have vcpu stats. Attempt to update those leads
to NULL-ptr deref:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] cpu_cgroup_update_vcpustat+0x13c/0x620
    ...
    Call Trace:
     [] cpu_cgroup_get_stat+0x7b/0x180
     [] ve_get_cpu_stat+0x27/0x70
     [] fill_cpu_stat+0x91/0x1e0 [vzmon]
     [] vzcalls_ioctl+0x2bb/0x430 [vzmon]
     [] vzctl_ioctl+0x45/0x60 [vzdev]
     [] do_vfs_ioctl+0x255/0x4f0
     [] SyS_ioctl+0x54/0xa0
     [] system_call_fastpath+0x16/0x1b

So, return -ENOENT if we asked for vcpu stats of root_task_group.

https://jira.sw.ru/browse/PSBM-48721

Signed-off-by: Andrey Ryabinin
---
 kernel/sched/core.c | 10 --
 kernel/ve/ve.c | 7 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e885549..94deef4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9120,20 +9120,26 @@ int cpu_cgroup_proc_loadavg(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat)
 {
 	struct task_group *tg = cgroup_tg(cgrp);
 	int nr_vcpus = tg->nr_cpus ?: num_online_cpus();
 	int i;
 
+	kernel_cpustat_zero(kstat);
+
+	if (tg == &root_task_group)
+		return -ENOENT;
+
 	for_each_possible_cpu(i)
 		cpu_cgroup_update_stat(cgrp, i);
 
 	cpu_cgroup_update_vcpustat(cgrp);
 
-	kernel_cpustat_zero(kstat);
 	for (i = 0; i < nr_vcpus; i++)
 		kernel_cpustat_add(tg->vcpustat + i, kstat, kstat);
+
+	return 0;
 }
 
 int cpu_cgroup_get_avenrun(struct cgroup *cgrp, unsigned long *avenrun)
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb5..d196e3e 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1448,16 +1448,17 @@ int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun)
 }
 EXPORT_SYMBOL(ve_get_cpu_avenrun);
 
-void cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
+int cpu_cgroup_get_stat(struct cgroup *cgrp, struct kernel_cpustat *kstat);
 
 int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat)
 {
 	struct cgroup_subsys_state *css;
+	int err;
 
 	css = ve_get_init_css(ve, cpu_cgroup_subsys_id);
-	cpu_cgroup_get_stat(css->cgroup, kstat);
+	err = cpu_cgroup_get_stat(css->cgroup, kstat);
 	css_put(css);
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(ve_get_cpu_stat);
 #endif /* CONFIG_CGROUP_SCHED */
-- 
2.7.3
[Devel] [PATCH RHEL7 COMMIT] ve: drop not used CAP_VE_ADMIN and CAP_VE_NET_ADMIN
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit cb6242e909e43182b9bdcd08342b50500d5bad84
Author: Vladimir Davydov
Date:   Wed Jun 22 16:48:45 2016 +0400

    ve: drop not used CAP_VE_ADMIN and CAP_VE_NET_ADMIN

    Not needed anymore as we use user ns for capability checking.

    Also, move capable_setveid() helper to ve.h so as not to pollute generic
    headers.

    Signed-off-by: Vladimir Davydov
---
 include/linux/ve.h | 3 +++
 include/uapi/linux/capability.h | 55 -
 2 files changed, 3 insertions(+), 55 deletions(-)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index cea3a87..247cadb 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -138,6 +138,9 @@ struct ve_devmnt {
 #define VE_MEMINFO_DEFAULT	1	/* default behaviour */
 #define VE_MEMINFO_SYSTEM	0	/* disable meminfo virtualization */
 
+#define capable_setveid()	\
+	(ve_is_super(get_exec_env()) && capable(CAP_SYS_ADMIN))
+
 extern int nr_ve;
 extern struct proc_dir_entry *proc_vz_dir;
 extern struct cgroup_subsys ve_subsys;
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index cadbfe6..b3d37bb 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -307,61 +307,6 @@ struct vfs_cap_data {
 
 #define CAP_SETFCAP	31
 
-#ifdef __KERNEL__
-/*
- * Important note: VZ capabilities do intersect with CAP_AUDIT
- * this is due to compatibility reasons. Nothing bad.
- * Both VZ and Audit/SELinux caps are disabled in VPSs.
- */
-
-/* Allow access to all information. In the other case some structures will be
- * hiding to ensure different Virtual Environment non-interaction on the same
- * node (NOW OBSOLETED)
- */
-#define CAP_SETVEID	29
-
-#define capable_setveid() ({			\
-		ve_is_super(get_exec_env()) &&	\
-		(capable(CAP_SYS_ADMIN) ||	\
-		 capable(CAP_VE_ADMIN));	\
-	})
-
-/*
- * coinsides with CAP_AUDIT_CONTROL but we don't care, since
- * audit is disabled in Virtuozzo
- */
-#define CAP_VE_ADMIN	30
-
-#ifdef CONFIG_VE
-
-/* Replacement for CAP_NET_ADMIN:
-   delegated rights to the Virtual environment of its network administration.
-   For now the following rights have been delegated:
-
-   Allow setting arbitrary process / process group ownership on sockets
-   Allow interface configuration
- */
-#define CAP_VE_NET_ADMIN	CAP_VE_ADMIN
-
-/* Replacement for CAP_SYS_ADMIN:
-   delegated rights to the Virtual environment of its administration.
-   For now the following rights have been delegated:
- */
-/* Allow mount/umount/remount */
-/* Allow examination and configuration of disk quotas */
-/* Allow removing semaphores */
-/* Used instead of CAP_CHOWN to "chown" IPC message queues, semaphores
-   and shared memory */
-/* Allow locking/unlocking of shared memory segment */
-/* Allow forged pids on socket credentials passing */
-
-#define CAP_VE_SYS_ADMIN	CAP_VE_ADMIN
-#else
-#define CAP_VE_NET_ADMIN	CAP_NET_ADMIN
-#define CAP_VE_SYS_ADMIN	CAP_SYS_ADMIN
-#endif
-#endif
-
 /* Override MAC access.
    The base kernel enforces no MAC policy.
    An LSM may enforce a MAC policy, and if it does and it chooses
[Devel] [PATCH RHEL7 COMMIT] ploop: fix barriers for ordinary requests
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit c2247f374581173c476bf88e82b9650ba263b37d
Author: Maxim Patlasov
Date:   Wed Jun 22 16:42:43 2016 +0400

    ploop: fix barriers for ordinary requests

    The way io_direct.c handles FLUSH|FUA (b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA)
    is completely wrong: to make sure that b1:FLUSH has taken effect we have
    to wait for its completion. Similarly, even if we're sure that FUA will be
    processed as a post-FLUSH (also dubious!), we have to wait for completion
    of b1..b4 to make sure that the flush will cover them.

    The patch fixes all these issues pretty simply: let's mark outgoing
    bio-s with FLUSH|FUA based on those flags in the *corresponding* incoming
    bio-s.

    Signed-off-by: Maxim Patlasov
    Acked-by: Dmitry Monakhov

    khorenko@: v2 changes:
    Drop 2 hunks like:
    > - submit_bio(rw, b);
    > + submit_bio(rw | b->bi_rw, b);

    This is redundant ^
    submit_bio looks like the following:
    void submit_bio(int rw, struct bio *bio)
    {
            bio->bi_rw |= rw;
    ...
---
 drivers/block/ploop/dev.c | 1 -
 drivers/block/ploop/io_direct.c | 43 +
 2 files changed, 13 insertions(+), 31 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index 2ef1449..6b5702f 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
 	preq->req_sector = bio->bi_sector;
 	preq->req_size = bio->bi_size >> 9;
 	preq->req_rw = bio->bi_rw;
-	bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
 	preq->eng_state = PLOOP_E_ENTRY;
 	preq->state = 0;
 	preq->error = 0;
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index 6ef9cd8..50c0ed1 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
 	int preflush;
 	int postfua = 0;
 	int write = !!(rw & REQ_WRITE);
-	int bio_num;
 
 	trace_submit(preq);
 
@@ -233,13 +232,13 @@ flush_bio:
 			goto flush_bio;
 		}
 
+		bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
 		bw.bv_off += copy;
 		size -= copy >> 9;
 		sec += copy >> 9;
 	}
 	ploop_extent_put(em);
 
-	bio_num = 0;
 	while (bl.head) {
 		struct bio * b = bl.head;
 		unsigned long rw2 = rw;
@@ -255,11 +254,10 @@ flush_bio:
 			preflush = 0;
 		}
 		if (unlikely(postfua && !bl.head))
-			rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
+			rw2 |= REQ_FUA;
 
 		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
 		submit_bio(rw2, b);
-		bio_num++;
 	}
 
 	ploop_complete_io_request(preq);
@@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
 	sector_t sec, end_sec, nsec, start, end;
 	struct bio_list_walk bw;
 	int err;
-	int preflush = !!(preq->req_rw & REQ_FLUSH);
 
 	bio_list_init();
 
@@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
 	while (sec < end_sec) {
 		struct page * page;
 		unsigned int poff, plen;
+		bool zero_page;
 
 		if (sec < start) {
+			zero_page = true;
 			page = ZERO_PAGE(0);
 			poff = 0;
 			plen = start - sec;
 			if (plen > (PAGE_SIZE>>9))
 				plen = (PAGE_SIZE>>9);
 		} else if (sec >= end) {
+			zero_page = true;
 			page = ZERO_PAGE(0);
 			poff = 0;
 			plen = end_sec - sec;
@@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
 		} else {
 			/* sec >= start && sec < end */
 			struct bio_vec * bv;
+			zero_page = false;
 
 			if (sec == start) {
 				bw.cur = sbl->head;
@@ -672,6 +673,10 @@ flush_bio:
 			goto flush_bio;
 		}
 
+		/* Handle FLUSH here, dio_post_submit will handle FUA */
+		if (!zero_page)
+			bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
+
 		bw.bv_off += (plen<<9);
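The v2 note in the commit message argues that `submit_bio(rw | b->bi_rw, b)` is redundant because `submit_bio()` already ORs `rw` into `bio->bi_rw`. A small model makes that concrete. This is a sketch under assumptions: `struct demo_bio`, the `DEMO_REQ_*` values and `demo_submit_bio()` are hypothetical and reproduce only the single line quoted in the commit message, not the real block-layer function.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's REQ_* bits differ. */
#define DEMO_REQ_WRITE 0x1ul
#define DEMO_REQ_FLUSH 0x2ul
#define DEMO_REQ_FUA   0x4ul

/* Hypothetical stand-in for struct bio. */
struct demo_bio {
	unsigned long bi_rw;
};

/* Models only the "bio->bi_rw |= rw;" step quoted in the commit message. */
static void demo_submit_bio(unsigned long rw, struct demo_bio *bio)
{
	assert(bio != NULL);
	bio->bi_rw |= rw;
}
```

Since OR is idempotent, passing `rw` or `rw | bio->bi_rw` leaves `bi_rw` with the same final value, which is why the two hunks could be dropped without changing behaviour.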
[Devel] [PATCH RHEL7 COMMIT] ms/mm: memcontrol: reclaim when shrinking memory.high below usage
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit d1074e8489380ff3087edc2ca86e88a2fdb25b05
Author: Johannes Weiner
Date:   Wed Jun 22 16:15:13 2016 +0400

    ms/mm: memcontrol: reclaim when shrinking memory.high below usage

    When setting memory.high below usage, nothing happens until the next
    charge comes along, and then it will only reclaim its own charge and not
    the now potentially huge excess of the new memory.high. This can cause
    groups to stay in excess of their memory.high indefinitely.

    To fix that, when shrinking memory.high, kick off a reclaim cycle that
    goes after the delta.

    https://jira.sw.ru/browse/PSBM-48546

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    (cherry picked from commit 588083bb37a3cea8533c392370a554417c8f29cb)
    Signed-off-by: Vladimir Davydov

    Conflicts:
            mm/memcontrol.c
---
 mm/memcontrol.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index de7c362..1f525f2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5314,7 +5314,7 @@ static int mem_cgroup_high_write(struct cgroup *cont, struct cftype *cft,
 				 const char *buffer)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
-	unsigned long long val;
+	unsigned long long val, usage;
 	int ret;
 
 	ret = res_counter_memparse_write_strategy(buffer, &val);
@@ -5322,6 +5322,12 @@ static int mem_cgroup_high_write(struct cgroup *cont, struct cftype *cft,
 		return ret;
 
 	memcg->high = val;
+
+	usage = res_counter_read_u64(&memcg->res, RES_USAGE);
+	if (usage > val)
+		try_to_free_mem_cgroup_pages(memcg,
+					     (usage - val) >> PAGE_SHIFT,
+					     GFP_KERNEL, false);
 	return 0;
 }
 
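The reclaim target in this patch is the excess over the new limit converted from bytes to pages, `(usage - val) >> PAGE_SHIFT`. A worked version of that arithmetic, as a sketch only: `demo_pages_to_reclaim()` and `DEMO_PAGE_SHIFT` are hypothetical names, and the 4 KiB page size is an assumption, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/*
 * Number of whole pages by which usage exceeds the new memory.high.
 * Returns 0 when usage is at or below the limit, matching the
 * "if (usage > val)" guard in the patch.
 */
static uint64_t demo_pages_to_reclaim(uint64_t usage_bytes, uint64_t high_bytes)
{
	if (usage_bytes <= high_bytes)
		return 0;
	return (usage_bytes - high_bytes) >> DEMO_PAGE_SHIFT;
}

/* Usage examples. */
static void demo_reclaim_examples(void)
{
	/* 512 MiB in use, limit lowered to 256 MiB: 256 MiB / 4 KiB = 65536 pages. */
	assert(demo_pages_to_reclaim(512ULL << 20, 256ULL << 20) == 65536);
	/* Usage already below the new limit: nothing to reclaim. */
	assert(demo_pages_to_reclaim(100, 200) == 0);
}
```

The shift rounds down, so an excess smaller than one page yields a target of zero pages.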
[Devel] [PATCH RHEL7 COMMIT] rh/locks: check for fl->fl_owner != filp in show_fd_locks
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit 71684d5261234b7aa0287dd8fb99b06644147172
Author: Stanislav Kinsburskiy
Date:   Wed Jun 22 16:07:48 2016 +0400

    rh/locks: check for fl->fl_owner != filp in show_fd_locks

    NFS emulates flocks via posix lock on server and fl->fl_owner is set to filp.

    khorenko@: prior to this patch NFS shared locks were not shown in fdinfo and
    thus could not be migrated using CRIU.

    https://jira.sw.ru/browse/PSBM-48727

    Signed-off-by: Stanislav Kinsburskiy
---
 fs/locks.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/locks.c b/fs/locks.c
index cb7da61..a5ab0c0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2503,6 +2503,7 @@ void show_fd_locks(struct seq_file *f,
 		 * matches ->fl_file.
 		 */
 		if (fl->fl_owner != files &&
+		    fl->fl_owner != (fl_owner_t)filp &&
 		    fl->fl_owner != NULL)
 			continue;
 
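The owner check in this hunk skips a lock only when its owner matches none of three values. A userspace model of just that condition, as a sketch: lock_is_shown() is a hypothetical name, plain pointers stand in for the kernel types, and any other checks show_fd_locks() performs are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the owner filter: a lock is listed when its
 * owner is the files table, the open file itself (the NFS flock
 * emulation case added by the patch), or NULL. This is the negation of
 * the "continue" condition in the hunk.
 */
static bool lock_is_shown(const void *fl_owner, const void *files,
			  const void *filp)
{
	return fl_owner == files || fl_owner == filp || fl_owner == NULL;
}
```

Before the patch, the middle comparison was absent, so a lock owned by filp fell through to "continue" and was not listed.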
[Devel] [PATCH RHEL7 COMMIT] ve/fs: namespace -- Ignore device permissions during restore
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit 7eeb5b4afa8db5a2f2e1e47ab6b84e55fc8c5661
Author: Cyrill Gorcunov
Date:   Wed Jun 22 14:45:43 2016 +0400

    ve/fs: namespace -- Ignore device permissions during restore

    To support several storage backends (ploops) inside container we've
    hacks in libvzctl which setup "old" permissions when restore procedure
    initiated. But the former idea was simply allow CRIU to do all the works
    and restore ploops mounts by its own (since CRIU fetches all mount options
    and such).

    For this sake we turn off mount options filtering provisionally if
    @is_pseudosuper is set, and CRIU restore mounts as regular ones.

    https://jira.sw.ru/browse/PSBM-48188

    Signed-off-by: Cyrill Gorcunov
    CC: Igor Sukhih
    CC: Vladimir Davydov
    CC: Konstantin Khorenko
---
 fs/namespace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4fb935a..3df0ac5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1933,7 +1933,12 @@ again:
 		if (devmnt->dev == dev) {
 			err = ve_devmnt_check(data, devmnt->allowed_options);
-			if (!err && !remount)
+			/*
+			 * In case of @is_pseudouser set, ie restore procedure,
+			 * we don't check for allowed options filtering, since
+			 * restore mode is special.
+			 */
+			if ((ve->is_pseudosuper || !err) && !remount)
 				err = ve_devmnt_insert(data, devmnt->hidden_options);
 			break;
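The changed condition is easier to check as a truth table. A small sketch of the predicate alone; should_insert_hidden_options() is a hypothetical name, and err stands for the return value of ve_devmnt_check().

```c
#include <stdbool.h>

/*
 * Hypothetical model of the new condition: hidden options are inserted
 * when this is not a remount and either the allowed-options check
 * passed (err == 0) or the container is in pseudosuper (restore) mode.
 */
static bool should_insert_hidden_options(bool is_pseudosuper, int err,
					 bool remount)
{
	return (is_pseudosuper || !err) && !remount;
}
```

As the hunk is written, when the branch is taken in restore mode the earlier error from ve_devmnt_check() is overwritten by the return value of ve_devmnt_insert().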
[Devel] [PATCH RHEL7 COMMIT] ve: mark DEF_PERMS feature deprecated
The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.17
-->
commit f4255eae32883b47b6f3027cc07e66d5b982df6a
Author: Evgenii Shatokhin
Date:   Wed Jun 22 14:40:09 2016 +0400

    ve: mark DEF_PERMS feature deprecated

    "def_perms" is not mentioned in the man pages for prlctl and vzctl.
    VE_FEATURE_DEF_PERMS is only used in the kernel code as a part of
    VE_FEATURES_DEF ("ve->features = VE_FEATURES_DEF;" in ve_create()).
    No code checks if the bit for this feature is set in ve->features.

    Let us mark this feature deprecated, similar to SYSFS and IPGRE features.

    https://jira.sw.ru/browse/PSBM-40280

    Signed-off-by: Evgenii Shatokhin
    Reviewed-by: Kirill Tkhai
---
 include/uapi/linux/vzcalluser.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vzcalluser.h b/include/uapi/linux/vzcalluser.h
index 2b340cf..bc55bb3 100644
--- a/include/uapi/linux/vzcalluser.h
+++ b/include/uapi/linux/vzcalluser.h
@@ -115,7 +115,7 @@ struct env_create_param3 {
 
 #define VE_FEATURE_SYSFS	(1ULL << 0)	/* deprecated */
 #define VE_FEATURE_NFS		(1ULL << 1)
-#define VE_FEATURE_DEF_PERMS	(1ULL << 2)
+#define VE_FEATURE_DEF_PERMS	(1ULL << 2)	/* deprecated */
 #define VE_FEATURE_SIT		(1ULL << 3)
 #define VE_FEATURE_IPIP		(1ULL << 4)
 #define VE_FEATURE_PPP		(1ULL << 5)
[Devel] [PATCH rh7 2/2] cgroup: un-export cgroup_kernel_* and zap cgroup_kernel_remove
After fairsched's gone, cgroup_kernel_remove is not used any more, so
drop it. cgroup_kernel_* family of functions are now used only by
beancounters, which is a part of the kernel, so un-export them.

Signed-off-by: Vladimir Davydov
---
 include/linux/cgroup.h |  1 -
 kernel/cgroup.c        | 26 --
 2 files changed, 27 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 730ca9091bfb..b34239dcdb52 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -55,7 +55,6 @@ struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
 				    const char *pathname);
 struct cgroup *cgroup_kernel_open(struct cgroup *parent,
 		enum cgroup_open_flags flags, const char *name);
-int cgroup_kernel_remove(struct cgroup *parent, const char *name);
 int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk);
 void cgroup_kernel_close(struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 581924e7af9e..1c047b9bb1fb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5669,13 +5669,11 @@ struct vfsmount *cgroup_kernel_mount(struct cgroup_sb_opts *opts)
 {
 	return kern_mount_data(&cgroup_fs_type, opts);
 }
-EXPORT_SYMBOL(cgroup_kernel_mount);
 
 struct cgroup *cgroup_get_root(struct vfsmount *mnt)
 {
 	return mnt->mnt_root->d_fsdata;
 }
-EXPORT_SYMBOL(cgroup_get_root);
 
 struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
 				    const char *pathname)
@@ -5698,7 +5696,6 @@ struct cgroup *cgroup_kernel_lookup(struct vfsmount *mnt,
 	path_put(&path);
 	return cgrp;
 }
-EXPORT_SYMBOL(cgroup_kernel_lookup);
 
 struct cgroup *cgroup_kernel_open(struct cgroup *parent,
 		enum cgroup_open_flags flags, const char *name)
@@ -5729,27 +5726,6 @@ out:
 	mutex_unlock(&parent->dentry->d_inode->i_mutex);
 	return cgrp;
 }
-EXPORT_SYMBOL(cgroup_kernel_open);
-
-int cgroup_kernel_remove(struct cgroup *parent, const char *name)
-{
-	struct dentry *dentry;
-	int ret;
-
-	mutex_lock_nested(&parent->dentry->d_inode->i_mutex, I_MUTEX_PARENT);
-	dentry = lookup_one_len(name, parent->dentry, strlen(name));
-	ret = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
-		goto out;
-	ret = -ENOENT;
-	if (dentry->d_inode)
-		ret = vfs_rmdir(parent->dentry->d_inode, dentry);
-	dput(dentry);
-out:
-	mutex_unlock(&parent->dentry->d_inode->i_mutex);
-	return ret;
-}
-EXPORT_SYMBOL(cgroup_kernel_remove);
 
 int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk)
 {
@@ -5761,7 +5737,6 @@ int cgroup_kernel_attach(struct cgroup *cgrp, struct task_struct *tsk)
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
-EXPORT_SYMBOL(cgroup_kernel_attach);
 
 void cgroup_kernel_close(struct cgroup *cgrp)
 {
@@ -5770,4 +5745,3 @@ void cgroup_kernel_close(struct cgroup *cgrp)
 		check_for_release(cgrp);
 	}
 }
-EXPORT_SYMBOL(cgroup_kernel_close);
-- 
2.1.4
[Devel] [PATCH rh7] Remove container and beancounter directories from /proc/vz
In PCS6, cgroups were mounted there. Now they are unused, as all cgroups
are supposed to be mounted by systemd under /sys/fs/cgroup.

Signed-off-by: Vladimir Davydov
---
 kernel/bc/proc.c    | 1 -
 kernel/ve/veowner.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/kernel/bc/proc.c b/kernel/bc/proc.c
index 3a3b4e3f28c8..9f60d9991e0a 100644
--- a/kernel/bc/proc.c
+++ b/kernel/bc/proc.c
@@ -754,7 +754,6 @@ static int __init ub_init_proc(void)
 	entry = proc_create("user_beancounters", S_IRUSR|S_ISVTX, NULL, _file_operations);
 	proc_create("vswap", S_IRUSR, proc_vz_dir, _vswap_fops);
-	proc_mkdir_mode("beancounter", 0, proc_vz_dir);
 	return 0;
 }
 
diff --git a/kernel/ve/veowner.c b/kernel/ve/veowner.c
index 86065072a9ca..7642191bf517 100644
--- a/kernel/ve/veowner.c
+++ b/kernel/ve/veowner.c
@@ -41,7 +41,6 @@ static void prepare_proc(void)
 	proc_vz_dir = proc_mkdir_mode("vz", S_ISVTX | S_IRUGO | S_IXUGO, NULL);
 	if (!proc_vz_dir)
 		panic("Can't create /proc/vz dir\n");
-	proc_mkdir_mode("container", 0, proc_vz_dir);
 }
 #endif
 
-- 
2.1.4
[Devel] [PATCH rh7 1/2] ve: drop ve_cgroup_open and ve_cgroup_remove
Fairsched was the last user of these functions. After it's gone, we
don't need them any longer.

Signed-off-by: Vladimir Davydov
---
 include/linux/ve_proto.h |  2 --
 kernel/ve/ve.c           | 21 -
 2 files changed, 23 deletions(-)

diff --git a/include/linux/ve_proto.h b/include/linux/ve_proto.h
index 8cc7fe3ba2a3..d2dc12d2f2c2 100644
--- a/include/linux/ve_proto.h
+++ b/include/linux/ve_proto.h
@@ -50,8 +50,6 @@ extern struct list_head ve_list_head;
 #define for_each_ve(ve)	list_for_each_entry((ve), &ve_list_head, ve_list)
 extern struct mutex ve_list_lock;
 extern struct ve_struct *get_ve_by_id(envid_t);
-extern struct cgroup *ve_cgroup_open(struct cgroup *root, int flags, envid_t veid);
-extern int ve_cgroup_remove(struct cgroup *root, envid_t veid);
 
 extern int nr_threads_ve(struct ve_struct *ve);
 
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 2459cb53a665..9995dbcd1623 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -156,27 +156,6 @@ const char *ve_name(struct ve_struct *ve)
 }
 EXPORT_SYMBOL(ve_name);
 
-/* Cgroup must be closed with cgroup_kernel_close */
-struct cgroup *ve_cgroup_open(struct cgroup *root, int flags, envid_t veid)
-{
-	char name[16];
-	struct cgroup *cgrp;
-
-	snprintf(name, sizeof(name), "%u", veid);
-	cgrp = cgroup_kernel_open(root, flags, name);
-	return cgrp ? cgrp : ERR_PTR(-ENOENT);
-}
-EXPORT_SYMBOL(ve_cgroup_open);
-
-int ve_cgroup_remove(struct cgroup *root, envid_t veid)
-{
-	char name[16];
-
-	snprintf(name, sizeof(name), "%u", veid);
-	return cgroup_kernel_remove(root, name);
-}
-EXPORT_SYMBOL(ve_cgroup_remove);
-
 /* under rcu_read_lock if task != current */
 const char *task_ve_name(struct task_struct *task)
 {
-- 
2.1.4
[Devel] [PATCH rh7] Drop CAP_VE_ADMIN and CAP_VE_NET_ADMIN
Not needed anymore as we use user ns for capability checking.

Also, move capable_setveid() helper to ve.h so as not to pollute generic
headers.

Signed-off-by: Vladimir Davydov
---
 include/linux/ve.h              |  3 +++
 include/uapi/linux/capability.h | 55 -
 2 files changed, 3 insertions(+), 55 deletions(-)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index cea3a87cb9c0..247cadb78c06 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -138,6 +138,9 @@ struct ve_devmnt {
 #define VE_MEMINFO_DEFAULT	1	/* default behaviour */
 #define VE_MEMINFO_SYSTEM	0	/* disable meminfo virtualization */
 
+#define capable_setveid()	\
+	(ve_is_super(get_exec_env()) && capable(CAP_SYS_ADMIN))
+
 extern int nr_ve;
 extern struct proc_dir_entry *proc_vz_dir;
 extern struct cgroup_subsys ve_subsys;
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index cadbfe6109e8..b3d37bb108b8 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -307,61 +307,6 @@ struct vfs_cap_data {
 
 #define CAP_SETFCAP	     31
 
-#ifdef __KERNEL__
-/*
- * Important note: VZ capabilities do intersect with CAP_AUDIT
- * this is due to compatibility reasons. Nothing bad.
- * Both VZ and Audit/SELinux caps are disabled in VPSs.
- */
-
-/* Allow access to all information. In the other case some structures will be
- * hiding to ensure different Virtual Environment non-interaction on the same
- * node (NOW OBSOLETED)
- */
-#define CAP_SETVEID	     29
-
-#define capable_setveid()	({			\
-		ve_is_super(get_exec_env()) &&		\
-			(capable(CAP_SYS_ADMIN) ||	\
-			 capable(CAP_VE_ADMIN));	\
-	})
-
-/*
- * coinsides with CAP_AUDIT_CONTROL but we don't care, since
- * audit is disabled in Virtuozzo
- */
-#define CAP_VE_ADMIN	30
-
-#ifdef CONFIG_VE
-
-/* Replacement for CAP_NET_ADMIN:
-   delegated rights to the Virtual environment of its network administration.
-   For now the following rights have been delegated:
-
-   Allow setting arbitrary process / process group ownership on sockets
-   Allow interface configuration
- */
-#define CAP_VE_NET_ADMIN	CAP_VE_ADMIN
-
-/* Replacement for CAP_SYS_ADMIN:
-   delegated rights to the Virtual environment of its administration.
-   For now the following rights have been delegated:
- */
-/* Allow mount/umount/remount */
-/* Allow examination and configuration of disk quotas */
-/* Allow removing semaphores */
-/* Used instead of CAP_CHOWN to "chown" IPC message queues, semaphores
-   and shared memory */
-/* Allow locking/unlocking of shared memory segment */
-/* Allow forged pids on socket credentials passing */
-
-#define CAP_VE_SYS_ADMIN	CAP_VE_ADMIN
-#else
-#define CAP_VE_NET_ADMIN	CAP_NET_ADMIN
-#define CAP_VE_SYS_ADMIN	CAP_SYS_ADMIN
-#endif
-#endif
-
 /* Override MAC access.
    The base kernel enforces no MAC policy.
    An LSM may enforce a MAC policy, and if it does and it chooses
-- 
2.1.4
Re: [Devel] [PATCH rh7] ploop: fix barriers for ordinary requests
Maxim Patlasov writes:

> The way how io_direct.c handles FLUSH|FUA: b1:FLUSH,b2,b3,b4,b5:FLUSH|FUA
> is completely wrong: to make sure that b1:FLUSH made effect we have to
> wait for its completion. Similarly, even if we're sure that FUA will be
> processed as post-FLUSH (also dubious!), we have to wait for completion
> b1..b4 to make sure that that flush will cover them.
>
> The patch fixes all these issues pretty simple: let's mark outgouing
> bio-s with FLUSH|FUA based on those flags in *corresponing* incoming
> bio-s.
>
> Signed-off-by: Maxim Patlasov
> ---
>  drivers/block/ploop/dev.c       |    1 -
>  drivers/block/ploop/io_direct.c |   47 ---
>  2 files changed, 15 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
> index 2ef1449..6b5702f 100644
> --- a/drivers/block/ploop/dev.c
> +++ b/drivers/block/ploop/dev.c
> @@ -498,7 +498,6 @@ ploop_bio_queue(struct ploop_device * plo, struct bio * bio,
>  	preq->req_sector = bio->bi_sector;
>  	preq->req_size = bio->bi_size >> 9;
>  	preq->req_rw = bio->bi_rw;
> -	bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
Wow. I can't even imagine that we clear barrier flags from original bios
>  	preq->eng_state = PLOOP_E_ENTRY;
>  	preq->state = 0;
>  	preq->error = 0;
> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
> index 6ef9cd8..84c9a48 100644
> --- a/drivers/block/ploop/io_direct.c
> +++ b/drivers/block/ploop/io_direct.c
> @@ -92,7 +92,6 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>  	int preflush;
>  	int postfua = 0;
>  	int write = !!(rw & REQ_WRITE);
> -	int bio_num;
>  
>  	trace_submit(preq);
>  
> @@ -233,13 +232,13 @@ flush_bio:
>  			goto flush_bio;
>  		}
>  
> +		bio->bi_rw |= bw.cur->bi_rw & (REQ_FLUSH | REQ_FUA);
>  		bw.bv_off += copy;
>  		size -= copy >> 9;
>  		sec += copy >> 9;
>  	}
>  	ploop_extent_put(em);
>  
> -	bio_num = 0;
>  	while (bl.head) {
>  		struct bio * b = bl.head;
>  		unsigned long rw2 = rw;
> @@ -255,11 +254,10 @@ flush_bio:
>  			preflush = 0;
>  		}
>  		if (unlikely(postfua && !bl.head))
> -			rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
> +			rw2 |= REQ_FUA;
>  
>  		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
> -		submit_bio(rw2, b);
> -		bio_num++;
> +		submit_bio(rw2 | b->bi_rw, b);
>  	}
>  
>  	ploop_complete_io_request(preq);
> @@ -567,7 +565,6 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  	sector_t sec, end_sec, nsec, start, end;
>  	struct bio_list_walk bw;
>  	int err;
> -	int preflush = !!(preq->req_rw & REQ_FLUSH);
>  
>  	bio_list_init();
>  
> @@ -598,14 +595,17 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  	while (sec < end_sec) {
>  		struct page * page;
>  		unsigned int poff, plen;
> +		bool zero_page;
>  
>  		if (sec < start) {
> +			zero_page = true;
>  			page = ZERO_PAGE(0);
>  			poff = 0;
>  			plen = start - sec;
>  			if (plen > (PAGE_SIZE>>9))
>  				plen = (PAGE_SIZE>>9);
>  		} else if (sec >= end) {
> +			zero_page = true;
>  			page = ZERO_PAGE(0);
>  			poff = 0;
>  			plen = end_sec - sec;
> @@ -614,6 +614,7 @@ dio_submit_pad(struct ploop_io *io, struct ploop_request * preq,
>  		} else {
>  			/* sec >= start && sec < end */
>  			struct bio_vec * bv;
> +			zero_page = false;
>  
>  			if (sec == start) {
>  				bw.cur = sbl->head;
> @@ -672,6 +673,10 @@ flush_bio:
>  			goto flush_bio;
>  		}
>  
> +		/* Handle FLUSH here, dio_post_submit will handle FUA */
> +		if (!zero_page)
> +			bio->bi_rw |= bw.cur->bi_rw & REQ_FLUSH;
> +
>  		bw.bv_off += (plen<<9);
>  		BUG_ON(plen == 0);
>  		sec += plen;
> @@ -688,13 +693,9 @@ flush_bio:
>  		b->bi_private = preq;
>  		b->bi_end_io = dio_endio_async;
>  
> -		rw = sbl->head->bi_rw | WRITE;
> -		if (unlikely(preflush)) {
> -			rw |= REQ_FLUSH;
> -			preflush = 0;
> -		}
> +		rw = preq->req_rw & ~(REQ_FLUSH | REQ_FUA);
>  		ploop_acc_ff_out(preq->plo, rw | b->bi_rw);
> -		submit_bio(rw, b);
> +		submit_bio(rw |