[Devel] [PATCH rh7] overlayfs: fix dentry reference leak
Without this patch it is easy to crash a node by fiddling with overlayfs dirs.

Backport commit ab79efab0 from ms:

From: David Howells

In ovl_copy_up_locked(), newdentry is leaked if the function exits through
out_cleanup as this just jumps to out after calling ovl_cleanup() - which
doesn't actually release the ref on newdentry.

The out_cleanup segment should instead exit through out2 as certainly
newdentry leaks - and possibly upper does also, though this isn't caught
given the catch of newdentry.

Without this fix, something like the following is seen:

	BUG: Dentry 880023e9eb20{i=f861,n=#880023e82d90} still in use (1) [unmount of tmpfs tmpfs]
	BUG: Dentry 880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs]

when unmounting the upper layer after an error occurred in copyup.

An error can be induced by creating a big file in a lower layer with
something like:

	dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000))

to create a large file (4.1G).  Overlay an upper layer that is too small
(on tmpfs might do) and then induce a copy up by opening it writably.

Reported-by: Ulrich Obergfell
Signed-off-by: David Howells
Signed-off-by: Miklos Szeredi

https://jira.sw.ru/browse/PSBM-47981
---
 fs/overlayfs/copy_up.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 3f3d1b0..afed35c 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -299,7 +299,7 @@ out:
 
 out_cleanup:
 	ovl_cleanup(wdir, newdentry);
-	goto out;
+	goto out2;
 }
 
 /*
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
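The leak pattern described in the overlayfs patch can be illustrated outside
the kernel. What follows is a minimal user-space sketch, not the overlayfs
code: struct obj, obj_get(), obj_put() and copy_up_model() are hypothetical
names, and the label layout (out2 drops the references, out only returns) is
an assumption modelled on the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted dentry. */
struct obj {
	int refs;
};

static void obj_get(struct obj *o) { o->refs++; }
static void obj_put(struct obj *o) { o->refs--; }

/*
 * Model of a copy-up style function. It takes a reference on both
 * objects, runs a step that may fail, and has to drop both references
 * on every exit path.
 *
 * fail  - simulate an error in the middle of the operation
 * leaky - use the pre-patch jump target (out) instead of out2
 */
static int copy_up_model(struct obj *upper, struct obj *newdentry,
			 int fail, int leaky)
{
	int err = 0;

	obj_get(upper);
	obj_get(newdentry);

	if (fail) {
		err = -1;
		goto out_cleanup;
	}

out2:
	/* Normal exit: release the references taken above. */
	obj_put(upper);
	obj_put(newdentry);
out:
	return err;

out_cleanup:
	/* Undo partial work here; this step itself drops no references. */
	if (leaky)
		goto out;	/* pre-patch behaviour: references leak */
	goto out2;		/* patched behaviour: references released */
}
```

With leaky set, a failing call leaves both counters at 1, which corresponds
to the "still in use" report at unmount time. With the patched jump target
both counters return to 0.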
Re: [Devel] [PATCH rh7] ext4: ext4_mkdir must set S_IOPS_WRAPPER bit
Kostya,

ms is not affected, RedHat bz ticket:

https://bugzilla.redhat.com/show_bug.cgi?id=1361682

On 07/29/2016 08:15 AM, Konstantin Khorenko wrote:
> Maxim,
>
> will you send the patch to mainstream as well?
>
> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
>
> On 07/26/2016 12:01 AM, Maxim Patlasov wrote:
>> ext4_iget() sets this bit for directories. Let's do the same in
>> ext4_mkdir(). Otherwise, the behaviour of vfs_rename (on top of ext4)
>> varies depending on how the in-core inode was born: via lookup or mkdir.
>>
>> The key place in vfs_rename sensitive to the change is:
>>
>>	if (flags && !rename2)
>>		return -EINVAL;
>>
>> Signed-off-by: Maxim Patlasov
>> ---
>>  fs/ext4/namei.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
>> index 0adc6df..bebe698 100644
>> --- a/fs/ext4/namei.c
>> +++ b/fs/ext4/namei.c
>> @@ -2413,6 +2413,7 @@ retry:
>>  
>>  	inode->i_op = &ext4_dir_inode_operations.ops;
>>  	inode->i_fop = &ext4_dir_operations;
>> +	inode->i_flags |= S_IOPS_WRAPPER;
>>  	err = ext4_init_new_dir(handle, dir, inode);
>>  	if (err)
>>  		goto out_clear_inode;
[Devel] [NEW KERNEL] 3.10.0-327.22.2.vz7.16.2 (rhel7)
Changelog:

OpenVZ kernel rh7-3.10.0-327.22.2.vz7.16.2

* kernel.spec: restore building of kernel headers and debug kernels by default
* ext4: set S_IOPS_WRAPPER inode flag on directory creation via "mkdir"
* net: the "bridge" CT feature must control creation of bridges inside a Container in both ways: via ioctl and via netlink

Generated changelog:

* Fri Jul 29 2016 Konstantin Khorenko [3.10.0-327.22.2.vz7.16.2]
- ve/bridge: br_dev_init: check if "bridge" feature is enabled (Evgenii Shatokhin) [PSBM-50009]
- ext4: ext4_mkdir must set S_IOPS_WRAPPER bit (Maxim Patlasov)

Built packages: http://kojistorage.eng.sw.ru/packages/vzkernel/3.10.0/327.22.2.vz7.16.2/
[Devel] [PATCH RHEL7 COMMIT] ve/bridge: br_dev_init: check if "bridge" feature is enabled
The commit is pushed to "branch-rh7-3.10.0-327.22.2.vz7.16.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.22.2.vz7.16.1
-->
commit 420fc7211bffd87d83cd4c8877ea446d9bc9222a
Author: Evgenii Shatokhin
Date:   Fri Jul 29 19:16:34 2016 +0400

    ve/bridge: br_dev_init: check if "bridge" feature is enabled

    Currently, the feature is checked in br_ioctl_deviceless_stub() which is
    called when "brctl addbr" runs. However, "ip link add br1 type bridge"
    goes a different path and still succeeds even if the feature is disabled
    for a CT:

        rtnl_newlink
        rtnl_create_link
        br_dev_setup
        register_netdevice
        br_dev_init
        ...

    Let us check the "bridge" feature in br_dev_init() instead, to cover
    both cases.

    https://jira.sw.ru/browse/PSBM-50009

    Signed-off-by: Evgenii Shatokhin
    Acked-by: Kirill Tkhai
---
 net/bridge/br_device.c | 4 ++++
 net/bridge/br_ioctl.c  | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 5e3347b..db206a3 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -88,8 +88,12 @@ out:
 static int br_dev_init(struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
+	struct net *net = dev_net(dev);
 	int err;
 
+	if (!(net->owner_ve->features & VE_FEATURE_BRIDGE))
+		return -EACCES;
+
 	br->stats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
 	if (!br->stats)
 		return -ENOMEM;
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index 98447b8..cd8c3a4 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -351,9 +351,6 @@ static int old_deviceless(struct net *net, void __user *uarg)
 
 int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *uarg)
 {
-	if (!(net->owner_ve->features & VE_FEATURE_BRIDGE))
-		return -ENOTTY;
-
 	switch (cmd) {
 	case SIOCGIFBR:
 	case SIOCSIFBR:
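The reasoning behind moving the check can be shown with a small user-space
model: when every creation path ends in one common init hook, a single check
there covers all of them. This is only a sketch under stated assumptions:
struct container, FEATURE_BRIDGE, bridge_dev_init(), create_via_ioctl() and
create_via_netlink() are hypothetical stand-ins, not the kernel's types or
functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for VE_FEATURE_BRIDGE. */
#define FEATURE_BRIDGE 0x1u

/* Hypothetical stand-in for the container that owns the network namespace. */
struct container {
	unsigned int features;
};

/*
 * Common init hook. In this model both creation paths reach it, so the
 * feature check placed here cannot be bypassed by choosing another path.
 */
static int bridge_dev_init(const struct container *ct)
{
	if (!(ct->features & FEATURE_BRIDGE))
		return -EACCES;
	return 0;
}

/* Path 1: models "brctl addbr" (ioctl based). */
static int create_via_ioctl(const struct container *ct)
{
	return bridge_dev_init(ct);
}

/* Path 2: models "ip link add br1 type bridge" (netlink based). */
static int create_via_netlink(const struct container *ct)
{
	return bridge_dev_init(ct);
}
```

In this model a container without the feature gets -EACCES from both entry
points, while a container with the feature succeeds on both.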
[Devel] [PATCH RHEL7 COMMIT] ext4: ext4_mkdir must set S_IOPS_WRAPPER bit
The commit is pushed to "branch-rh7-3.10.0-327.22.2.vz7.16.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.22.2.vz7.16.1
-->
commit e8421e9d99ccbc3c8d2b3e79e1ebf3c70f9ec43c
Author: Maxim Patlasov
Date:   Fri Jul 29 19:16:33 2016 +0400

    ext4: ext4_mkdir must set S_IOPS_WRAPPER bit

    ext4_iget() sets this bit for directories. Let's do the same in
    ext4_mkdir(). Otherwise, the behaviour of vfs_rename (on top of ext4)
    varies depending on how the in-core inode was born: via lookup or mkdir.

    The key place in vfs_rename sensitive to the change is:

    >	if (flags && !rename2)
    >		return -EINVAL;

    Signed-off-by: Maxim Patlasov
---
 fs/ext4/namei.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0adc6df..bebe698 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2413,6 +2413,7 @@ retry:
 
 	inode->i_op = &ext4_dir_inode_operations.ops;
 	inode->i_fop = &ext4_dir_operations;
+	inode->i_flags |= S_IOPS_WRAPPER;
 	err = ext4_init_new_dir(handle, dir, inode);
 	if (err)
 		goto out_clear_inode;
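The dependence of rename on how the inode was created can be modelled in user
space. The sketch below is not the RHEL7 VFS code. It assumes that the
extended rename2 operation is only looked up when the directory inode carries
the wrapper flag, which is one plausible reading of the commit message.
IOPS_WRAPPER, struct inode_model, get_rename2(), do_rename2() and
rename_model() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for S_IOPS_WRAPPER. */
#define IOPS_WRAPPER 0x1u

typedef int (*rename2_fn)(unsigned int flags);

/* Hypothetical model of a directory inode. */
struct inode_model {
	unsigned int i_flags;
	rename2_fn rename2;	/* extended op, reachable only via the wrapper */
};

/* Trivial extended rename that always succeeds. */
static int do_rename2(unsigned int flags)
{
	(void)flags;
	return 0;
}

/* Assumption: without the flag the extended op is treated as absent. */
static rename2_fn get_rename2(const struct inode_model *dir)
{
	return (dir->i_flags & IOPS_WRAPPER) ? dir->rename2 : NULL;
}

/* Contains the check quoted in the commit message. */
static int rename_model(const struct inode_model *dir, unsigned int flags)
{
	rename2_fn rename2 = get_rename2(dir);

	if (flags && !rename2)
		return -EINVAL;
	return rename2 ? rename2(flags) : 0;
}
```

Under these assumptions a directory inode set up without the flag rejects any
rename with non-zero flags, while an otherwise identical inode with the flag
accepts it, which is the inconsistency the patch removes.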
Re: [Devel] [PATCH rh7] ext4: ext4_mkdir must set S_IOPS_WRAPPER bit
Maxim,

will you send the patch to mainstream as well?

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 07/26/2016 12:01 AM, Maxim Patlasov wrote:
> ext4_iget() sets this bit for directories. Let's do the same in
> ext4_mkdir(). Otherwise, the behaviour of vfs_rename (on top of ext4)
> varies depending on how the in-core inode was born: via lookup or mkdir.
>
> The key place in vfs_rename sensitive to the change is:
>
>	if (flags && !rename2)
>		return -EINVAL;
>
> Signed-off-by: Maxim Patlasov
> ---
>  fs/ext4/namei.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 0adc6df..bebe698 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2413,6 +2413,7 @@ retry:
>  
>  	inode->i_op = &ext4_dir_inode_operations.ops;
>  	inode->i_fop = &ext4_dir_operations;
> +	inode->i_flags |= S_IOPS_WRAPPER;
>  	err = ext4_init_new_dir(handle, dir, inode);
>  	if (err)
>  		goto out_clear_inode;
Re: [Devel] [PATCH rh7 3/3] ploop: io_direct: delay f_op->fsync() until index_update for reloc requests (v3)
Maxim Patlasov writes:

> Dima,
>
>
> One week elapsed, still no feedback from you. Do you have something
> against this patch?
Sorry for the delay, Max. I was overloaded by pended crap I've collected
before vacations, and lost your email. Again, sorry.
The whole patch looks good. Thank you for your rede
BTW: We definitely need regression testing for the original bug (broken
barriers and others). I'm working on that.
>
>
> Thanks,
>
> Maxim
>
>
> On 07/20/2016 11:21 PM, Maxim Patlasov wrote:
>> Commit 9f860e606 introduced an engine to delay fsync: doing
>> fallocate(FALLOC_FL_CONVERT_UNWRITTEN) dio_post_submit marks
>> io as PLOOP_IO_FSYNC_DELAYED to ensure that fsync happens
>> later, when incoming FLUSH|FUA comes.
>>
>> That was deemed as important because (PSBM-47026):
>>
>>> This optimization becomes more important due to the fact that customers
>>> tend to use pcompact heavily => ploop images grow each day.
>> Now, we can easily re-use the engine to delay fsync for reloc
>> requests as well. As explained in the description of commit
>> 5aa3fe09:
>>
>>> 1->read_data_from_old_post
>>> 2->write_to_new_pos
>>>    ->submit_alloc
>>>       ->submit_pad
>>>    ->post_submit->convert_unwritten
>>> 3->update_index ->write_page with FLUSH|FUA
>>> 4->nullify_old_pos
>>> 5->issue_flush
>> by the time of step 3 extent conversion is not yet stable because
>> it belongs to an uncommitted transaction. But instead of doing fsync
>> inside ->post_submit, we can fsync later, as the very first step
>> of write_page for index_update.
>>
>> Changed in v2:
>>   - process delayed fsync asynchronously, via PLOOP_E_FSYNC_PENDED eng_state
>>
>> Changed in v3:
>>   - use extra arg for ploop_index_wb_proceed_or_delay() instead of ad-hoc
>>     PLOOP_REQ_FSYNC_IF_DELAYED
>>
>> https://jira.sw.ru/browse/PSBM-47026
>>
>> Signed-off-by: Maxim Patlasov
>> ---
>>  drivers/block/ploop/dev.c   |  9 +++--
>>  drivers/block/ploop/map.c   | 32
>>  include/linux/ploop/ploop.h |  1 +
>>  3 files changed, 36 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>> index df3eec9..ed60b1f 100644
>> --- a/drivers/block/ploop/dev.c
>> +++ b/drivers/block/ploop/dev.c
>> @@ -2720,6 +2720,11 @@ restart:
>>  		ploop_index_wb_complete(preq);
>>  		break;
>>  
>> +	case PLOOP_E_FSYNC_PENDED:
>> +		/* fsync done */
>> +		ploop_index_wb_proceed(preq);
>> +		break;
>> +
>>  	default:
>>  		BUG();
>>  	}
>> @@ -4106,7 +4111,7 @@ static void ploop_relocate(struct ploop_device * plo)
>>  	preq->bl.tail = preq->bl.head = NULL;
>>  	preq->req_cluster = 0;
>>  	preq->req_size = 0;
>> -	preq->req_rw = WRITE_SYNC|REQ_FUA;
>> +	preq->req_rw = WRITE_SYNC;
>>  	preq->eng_state = PLOOP_E_ENTRY;
>>  	preq->state = (1 << PLOOP_REQ_SYNC) | (1 << PLOOP_REQ_RELOC_A);
>>  	preq->error = 0;
>> @@ -4410,7 +4415,7 @@ static void ploop_relocblks_process(struct ploop_device *plo)
>>  	preq->bl.tail = preq->bl.head = NULL;
>>  	preq->req_cluster = ~0U; /* uninitialized */
>>  	preq->req_size = 0;
>> -	preq->req_rw = WRITE_SYNC|REQ_FUA;
>> +	preq->req_rw = WRITE_SYNC;
>>  	preq->eng_state = PLOOP_E_ENTRY;
>>  	preq->state = (1 << PLOOP_REQ_SYNC) | (1 << PLOOP_REQ_RELOC_S);
>>  	preq->error = 0;
>> diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
>> index 5f7fd66..715dc15 100644
>> --- a/drivers/block/ploop/map.c
>> +++ b/drivers/block/ploop/map.c
>> @@ -915,6 +915,24 @@ void ploop_index_wb_proceed(struct ploop_request * preq)
>>  	put_page(page);
>>  }
>>  
>> +static void ploop_index_wb_proceed_or_delay(struct ploop_request * preq,
>> +					    int do_fsync_if_delayed)
>> +{
>> +	if (do_fsync_if_delayed) {
>> +		struct map_node * m = preq->map;
>> +		struct ploop_delta * top_delta = map_top_delta(m->parent);
>> +		struct ploop_io * top_io = &top_delta->io;
>> +
>> +		if (test_bit(PLOOP_IO_FSYNC_DELAYED, &top_io->io_state)) {
>> +			preq->eng_state = PLOOP_E_FSYNC_PENDED;
>> +			ploop_add_req_to_fsync_queue(preq);
>> +			return;
>> +		}
>> +	}
>> +
>> +	ploop_index_wb_proceed(preq);
>> +}
>> +
>>  /* Data write is commited. Now we need to update index. */
>>  
>>  void ploop_index_update(struct ploop_request * preq)
>> @@ -927,6 +945,7 @@ void ploop_index_update(struct ploop_request * preq)
>>  	int old_level;
>>  	struct page * page;
>>  	unsigned long state = READ_ONCE(preq->state);
>> +	int do_fsync_if_delayed = 0;
>>  
>>  	/* No way back, we are going to initiate index write. */
>>  
>> @@ -985,10 +1004,12 @@ void ploop_index_update(struct ploop_request * preq)