[Devel] [PATCH RHEL8 COMMIT] configs: Set overlayfs nfs_export option to true

2020-09-23 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.7
-->
commit f8e1f0f833f17767764b2fc2c7e2e4f8cb55d95e
Author: Valeriy Vdovin 
Date:   Wed Jul 29 10:13:47 2020 +0300

configs: Set overlayfs nfs_export option to true

+CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_NFS_EXPORT=y

Docker uses overlayfs. Lately, I've been testing checkpoint/restore of
fs notifications that were set up on top of overlayfs. I've found out
that criu is not able to dump open fanotify file descriptors when the
nfs_export parameter is off.

The details of the problem: at checkpoint/dump, criu wants to dump a
fanotify fd and calls 'open_by_handle_at' with the fhandle of the subject
descriptor. The kernel needs to decode the fhandle to convert it to an
inode; for that it uses mnt->mnt_sb->s_export_op->fh_to_dentry.

For an overlayfs mount, s_export_op is only filled with valid exportfs
functions if the nfs_export flag is true. nfs_export in turn depends on
the index=on option.

One way to enable them is to add the string "nfs_export=on,index=on" to
the mount options when calling mount. Another way, which we discussed,
is to change the defaults for both values to true.
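
A minimal userspace sketch of the first approach (explicit mount options),
for illustration only. The helper name build_overlay_opts() and the
directory arguments are hypothetical; only the option names lowerdir,
upperdir, workdir, index and nfs_export come from overlayfs itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an overlayfs option string that turns the
 * index and nfs_export features on explicitly.
 * Returns 0 on success, -1 if a path contains a comma or buf is too small.
 */
int build_overlay_opts(char *buf, size_t size, const char *lower,
		       const char *upper, const char *work)
{
	int n;

	assert(buf && lower && upper && work);

	/* A raw comma would be parsed as an option separator. */
	if (strchr(lower, ',') || strchr(upper, ',') || strchr(work, ','))
		return -1;

	n = snprintf(buf, size,
		     "lowerdir=%s,upperdir=%s,workdir=%s,index=on,nfs_export=on",
		     lower, upper, work);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output was truncated */
	return 0;
}
```

The resulting string would be passed as the data argument of mount(2)
with filesystem type "overlay". With the config change in this commit the
two trailing options become the default and need not be spelled out.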

https://jira.sw.ru/browse/PSBM-104961

Signed-off-by: Valeriy Vdovin 
---
 configs/kernel-4.18.0-x86_64-KVM-minimal.config | 3 ++-
 configs/kernel-4.18.0-x86_64-debug.config   | 3 +++
 configs/kernel-4.18.0-x86_64.config | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/configs/kernel-4.18.0-x86_64-KVM-minimal.config 
b/configs/kernel-4.18.0-x86_64-KVM-minimal.config
index 42b7cd818bb6..cb2562cf64f5 100644
--- a/configs/kernel-4.18.0-x86_64-KVM-minimal.config
+++ b/configs/kernel-4.18.0-x86_64-KVM-minimal.config
@@ -4029,7 +4029,8 @@ CONFIG_FUSE_KIO_PCS=y
 CONFIG_OVERLAY_FS=y
 # CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
 CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
-# CONFIG_OVERLAY_FS_INDEX is not set
+CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_NFS_EXPORT=y
 # CONFIG_OVERLAY_FS_XINO_AUTO is not set
 # CONFIG_OVERLAY_FS_METACOPY is not set
 CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS=y
diff --git a/configs/kernel-4.18.0-x86_64-debug.config 
b/configs/kernel-4.18.0-x86_64-debug.config
index 5db56c96b8da..9f576da4cff7 100644
--- a/configs/kernel-4.18.0-x86_64-debug.config
+++ b/configs/kernel-4.18.0-x86_64-debug.config
@@ -7730,6 +7730,9 @@ CONFIG_LEGACY_PTY_COUNT=12
 CONFIG_LSM_MMAP_MIN_ADDR=69632
 CONFIG_DEFAULT_MMAP_MIN_ADDR=69632
 
+CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_NFS_EXPORT=y
+
 CONFIG_PANIC_ON_OOPS=y
 CONFIG_PANIC_ON_OOPS_VALUE=1
 
diff --git a/configs/kernel-4.18.0-x86_64.config 
b/configs/kernel-4.18.0-x86_64.config
index 90347efc7a93..504b5734aa12 100644
--- a/configs/kernel-4.18.0-x86_64.config
+++ b/configs/kernel-4.18.0-x86_64.config
@@ -7685,6 +7685,9 @@ CONFIG_LEGACY_PTY_COUNT=12
 CONFIG_LSM_MMAP_MIN_ADDR=69632
 CONFIG_DEFAULT_MMAP_MIN_ADDR=69632
 
+CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_NFS_EXPORT=y
+
 #
 # OpenVZ
 #
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL8 COMMIT] net/mlx5: suppress high order allocation

2020-09-23 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.7
-->
commit 6b685433bed93f3f826ff45ba91979078c00e423
Author: Vasily Averin 
Date:   Wed Jul 29 09:39:06 2020 +0300

net/mlx5: suppress high order allocation

v2: added lost kvfree()

WARNING: CPU: 0 PID: 9668 at mm/page_alloc.c:3548 
__alloc_pages_nodemask+0x1b1/0x600
order 4 >= 3, gfp 0xc0d0
CPU: 0 PID: 9668 Comm: ip ve: 0 Not tainted 3.10.0-1127.8.2.vz7.158.3 #1 
158.3
Call Trace:
 [] dump_stack+0x19/0x1b
 [] __warn+0xd8/0x100
 [] warn_slowpath_fmt+0x5f/0x80
 [] __alloc_pages_nodemask+0x1b1/0x600
 [] alloc_pages_current+0x98/0x110
 [] kmalloc_order+0x18/0x40
 [] kmalloc_order_trace+0x26/0xa0
 [] __kmalloc+0x281/0x2a0
 [] mlx5_frag_buf_alloc_node+0x61/0x320 [mlx5_core]
 [] mlx5_cqwq_create+0xaa/0x1c0 [mlx5_core]
 [] mlx5e_alloc_cq_common+0x7a/0x130 [mlx5_core]
 [] mlx5e_open_cq.isra.50+0x81/0x100 [mlx5_core]
 [] mlx5e_open_channels+0x49e/0xd60 [mlx5_core]
 [] mlx5e_open_locked+0x2d/0xb0 [mlx5_core]
 [] mlx5e_open+0x28/0xb0 [mlx5_core]
 [] __dev_open+0xd2/0x150
 [] __dev_change_flags+0xa3/0x180
 [] dev_change_flags+0x29/0x60
 [] do_setlink+0x385/0xe50
 [] rtnl_newlink+0x532/0x890
 [] rtnetlink_rcv_msg+0xc5/0x280
 [] netlink_rcv_skb+0xab/0xc0
 [] rtnetlink_rcv+0x28/0x30
 [] netlink_unicast+0x1bc/0x240
 [] netlink_sendmsg+0x34e/0x460
 [] sock_sendmsg+0xb0/0xf0
 [] ___sys_sendmsg+0x3e9/0x400
 [] __sys_sendmsg+0x51/0x90
 [] SyS_sendmsg+0x12/0x20
 [] system_call_fastpath+0x25/0x2a
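
For readers unfamiliar with the "order 4 >= 3" line in the warning: the
order is the power-of-two number of contiguous pages requested. The sketch
below computes it in userspace; alloc_order() is a hypothetical helper, not
a kernel function, and the sizes in the usage note are made-up examples.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= bytes.
 */
unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(page_size > 0);
	while (span < bytes) {
		span <<= 1;	/* double the contiguous span */
		order++;
	}
	return order;
}
```

For example, assuming 4 KiB pages, a table of 4096 entries of 16 bytes
each needs 64 KiB, which alloc_order() reports as order 4. kvcalloc()
avoids depending on such a contiguous allocation because it can fall back
to vmalloc, which is why the frees become kvfree().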

https://pmc.acronis.com/browse/VSTOR-35452
Signed-off-by: Vasily Averin 
---
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index c4179dc8c335..a900d9d0b9d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -126,8 +126,8 @@ int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int 
size,
buf->size = size;
buf->npages = DIV_ROUND_UP(size, PAGE_SIZE);
buf->page_shift = PAGE_SHIFT;
-   buf->frags = kcalloc(buf->npages, sizeof(struct mlx5_buf_list),
-GFP_KERNEL);
+   buf->frags = kvcalloc(buf->npages, sizeof(struct mlx5_buf_list),
+ GFP_KERNEL);
if (!buf->frags)
goto err_out;
 
@@ -155,7 +155,7 @@ int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int 
size,
while (i--)
dma_free_coherent(dev->device, PAGE_SIZE, buf->frags[i].buf,
  buf->frags[i].map);
-   kfree(buf->frags);
+   kvfree(buf->frags);
 err_out:
return -ENOMEM;
 }
@@ -173,7 +173,7 @@ void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct 
mlx5_frag_buf *buf)
  buf->frags[i].map);
size -= frag_sz;
}
-   kfree(buf->frags);
+   kvfree(buf->frags);
 }
 EXPORT_SYMBOL_GPL(mlx5_frag_buf_free);
 


[Devel] [PATCH RH7 2/2] ploop: Wait till fsync_thread goes to sleep on ploop_quiesce()

2020-09-23 Thread Kirill Tkhai
fsync_thread not only performs sync(); it may also modify data.
See kaio_fsync_thread: kaio_resubmit() submits writes, and this
is not coordinated with the other synchronization points.
This is a problem for backup (but not only for backup), as
ploop_quiesce() never waits for fsync_thread, and we may see
unexpected data changes.

This patch makes ploop_quiesce() also wait for fsync_thread,
until the last preq has been submitted.
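
The patch signals "work in progress" by keeping a placeholder entry on
fsync_queue while the thread handles the requests it took off the queue.
A single-threaded userspace model of that idea, without any locking, is
sketched below. All names (struct worker, worker_begin() and so on) are
hypothetical and only mirror the technique, not the ploop code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *prev, *next; };

static void list_init(struct node *head) { head->prev = head->next = head; }

static bool list_is_empty(const struct node *head)
{
	return head->next == head;
}

static void list_add_head(struct node *entry, struct node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_remove(struct node *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->prev = entry->next = entry;
}

/* Move every entry of 'from' onto 'to', leaving 'from' empty. */
static void list_take_all(struct node *from, struct node *to)
{
	list_init(to);
	if (list_is_empty(from))
		return;
	to->next = from->next;
	to->prev = from->prev;
	to->next->prev = to;
	to->prev->next = to;
	list_init(from);
}

struct worker {
	struct node queue;	/* pending requests */
	struct node batch;	/* requests currently being processed */
	struct node marker;	/* placeholder, never a real request */
};

void worker_init(struct worker *w)
{
	list_init(&w->queue);
	list_init(&w->batch);
	list_init(&w->marker);
}

void worker_submit(struct worker *w, struct node *req)
{
	list_add_head(req, &w->queue);
}

/* Take the pending requests, but leave the marker on the queue. */
void worker_begin(struct worker *w)
{
	list_take_all(&w->queue, &w->batch);
	list_add_head(&w->marker, &w->queue);
}

/* Processing finished: drop the marker so the queue can read as empty. */
void worker_end(struct worker *w)
{
	list_remove(&w->marker);
	list_init(&w->batch);
}

/* What a waiter checks: a non-empty queue means work is still pending. */
bool worker_busy(const struct worker *w)
{
	return !list_is_empty(&w->queue);
}
```

A waiter that polls worker_busy() keeps seeing "busy" between
worker_begin() and worker_end(), even though the real requests have
already left the queue.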

https://jira.sw.ru/browse/PSBM-108101

Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/dev.c   |   30 ++
 drivers/block/ploop/io_direct.c |9 -
 drivers/block/ploop/io_kaio.c   |   23 +--
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index ac4d142197d9..097ae6c73aac 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -224,6 +224,23 @@ static int disable_and_wait_discard(struct ploop_device 
*plo)
return ret;
 }
 
+static bool has_pending_fsync_reqs(struct ploop_device *plo, struct ploop_io 
*io)
+{
+   unsigned long flags;
+   bool ret;
+
+   spin_lock_irqsave(&plo->lock, flags);
+   ret = !list_empty(&io->fsync_queue);
+   spin_unlock_irqrestore(&plo->lock, flags);
+
+   return ret;
+}
+
+static void wait_fsync_thread(struct ploop_device *plo, struct ploop_io *io)
+{
+   wait_event(plo->waitq, !has_pending_fsync_reqs(plo, io));
+}
+
 static void ploop_init_request(struct ploop_request *preq)
 {
preq->eng_state = PLOOP_E_ENTRY;
@@ -3602,6 +3619,7 @@ void ploop_quiesce(struct ploop_device * plo)
 {
struct completion qcomp;
struct ploop_request * preq;
+   struct ploop_io *io;
 
if (!test_bit(PLOOP_S_RUNNING, &plo->state))
return;
@@ -3628,6 +3646,18 @@ void ploop_quiesce(struct ploop_device * plo)
 
wait_fast_path_reqs(plo);
wait_for_completion(&qcomp);
+
+   /*
+* Main thread is sleeping, so no one may submit a new preq
+* for fsync_thread, except delayed timer. Let it fire.
+*/
+   io = &ploop_top_delta(plo)->io;
+   if (io->fsync_timer.function) {
+   del_timer_sync(&io->fsync_timer);
+   io->fsync_timer.function(io->fsync_timer.data);
+   }
+   wait_fsync_thread(plo, io);
+
plo->quiesce_comp = NULL;
 }
 EXPORT_SYMBOL(ploop_quiesce);
diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
index 4f82b8b5ce36..f67ce47e0562 100644
--- a/drivers/block/ploop/io_direct.c
+++ b/drivers/block/ploop/io_direct.c
@@ -785,8 +785,9 @@ static int dio_fsync_thread(void * data)
 
spin_lock_irq(&plo->lock);
while (!kthread_should_stop() || !list_empty(&io->fsync_queue)) {
-   int err;
+   struct list_head fake_entry;
LIST_HEAD(list);
+   int err;
 
DEFINE_WAIT(_wait);
for (;;) {
@@ -806,6 +807,11 @@ static int dio_fsync_thread(void * data)
 
INIT_LIST_HEAD(&list);
list_splice_init(&io->fsync_queue, &list);
+   /*
+* Not empty io->fsync_queue means fsync_thread has
+* pending work. See ploop_quiesce() for details.
+*/
+   list_add(&fake_entry, &io->fsync_queue);
io_count = io->io_count;
spin_unlock_irq(&plo->lock);
 
@@ -822,6 +828,7 @@ static int dio_fsync_thread(void * data)
 */
 
spin_lock_irq(&plo->lock);
+   list_del(&fake_entry);
 
if (io_count == io->io_count && !(io_count & 1))
clear_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index c2b7e8feaedf..2c2fb90d2b53 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -497,8 +497,10 @@ static int kaio_fsync_thread(void * data)
 
spin_lock_irq(&plo->lock);
while (!kthread_should_stop() || !list_empty(&io->fsync_queue)) {
+   struct list_head fake_entry;
+   struct ploop_request *preq;
+   bool skip_wakeup = false;
int err;
-   struct ploop_request * preq;
 
DEFINE_WAIT(_wait);
for (;;) {
@@ -518,6 +520,12 @@ static int kaio_fsync_thread(void * data)
 
preq = list_entry(io->fsync_queue.next, struct ploop_request, 
list);
list_del(&preq->list);
+   /*
+* Not empty io->fsync_queue means fsync_thread has pending 
work.
+* See ploop_quiesce() for details.
+*/
+   list_add(&fake_entry, &io->fsync_queue);
+
io->fsync_qlen--;
if (!preq->prealloc_size)
plo->st.bio_fsync++;
@@ -557,20 +565,22 @@ static int kaio_fsync_thread(void * data)
preq->req_rw &= ~REQ_FLUSH;
/*

[Devel] [PATCH RH7 1/2] ploop: Rename label in kaio_fsync_thread()

2020-09-23 Thread Kirill Tkhai
Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/io_kaio.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index a60b18742dd5..c2b7e8feaedf 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -535,7 +535,7 @@ static int kaio_fsync_thread(void * data)
 */
WARN_ON_ONCE(io->prealloced_size);
preq->prealloc_size = isize;
-   goto out;
+   goto ready;
}
err = __kaio_truncate(io, io->files.file,
  preq->prealloc_size);
@@ -566,7 +566,7 @@ static int kaio_fsync_thread(void * data)
}
}
}
-out:
+ready:
spin_lock_irq(&plo->lock);
list_add_tail(&preq->list, &plo->ready_queue);
 




[Devel] [PATCH RH7] ovl: introduce new "index=nouuid" option for inodes index feature

2020-09-23 Thread Pavel Tikhomirov
This relaxes the uuid checks for the overlay index feature. It is only
possible when there is a single filesystem for all the work/upper/lower
directories and bare file handles from this backing filesystem are unique.
If there are multiple filesystems, just fall back to normal "index=on".

This is needed when overlayfs is/was mounted in a container with
index enabled (e.g. to be able to resolve inotify watch file handles on
it to paths in CRIU), and this container is copied and started alongside
the original one. The "copy" container can't have the same uuid on the
superblock, so mounting the overlayfs from it later would fail. (See
https://jira.sw.ru/browse/PSBM-11961 for more info on why it is
needed.)
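
A rough userspace sketch of the option semantics described above, not the
kernel parsing code: the enum, parse_index_opt() and
effective_index_mode() are hypothetical names. Only the values "on",
"off" and "nouuid" and the fallback rule come from the description.

```c
#include <assert.h>
#include <string.h>

enum index_mode { INDEX_OFF, INDEX_ON, INDEX_NOUUID, INDEX_INVALID };

/* Map the value of an "index=" mount option to a mode. */
enum index_mode parse_index_opt(const char *value)
{
	assert(value != NULL);
	if (strcmp(value, "off") == 0)
		return INDEX_OFF;
	if (strcmp(value, "on") == 0)
		return INDEX_ON;
	if (strcmp(value, "nouuid") == 0)
		return INDEX_NOUUID;
	return INDEX_INVALID;
}

/*
 * "nouuid" is only honoured when all layers share one backing
 * filesystem; otherwise fall back to plain "index=on".
 */
enum index_mode effective_index_mode(enum index_mode requested,
				     int nr_backing_fs)
{
	if (requested == INDEX_NOUUID && nr_backing_fs > 1)
		return INDEX_ON;
	return requested;
}
```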

https://jira.sw.ru/browse/PSBM-108115

To mainstream:
https://lkml.org/lkml/2020/9/23/565

Signed-off-by: Pavel Tikhomirov 
---
 .../kernel-3.10.0-x86_64-debug-minimal.config |  1 +
 configs/kernel-3.10.0-x86_64-debug.config |  1 +
 configs/kernel-3.10.0-x86_64-minimal.config   |  1 +
 configs/kernel-3.10.0-x86_64.config   |  1 +
 fs/overlayfs/Kconfig  | 16 ++
 fs/overlayfs/export.c |  6 ++-
 fs/overlayfs/namei.c  | 35 
 fs/overlayfs/overlayfs.h  | 23 +---
 fs/overlayfs/ovl_entry.h  |  2 +-
 fs/overlayfs/super.c  | 54 ++-
 10 files changed, 106 insertions(+), 34 deletions(-)

diff --git a/configs/kernel-3.10.0-x86_64-debug-minimal.config 
b/configs/kernel-3.10.0-x86_64-debug-minimal.config
index b7fc84975c5e..6d0fce8947bb 100644
--- a/configs/kernel-3.10.0-x86_64-debug-minimal.config
+++ b/configs/kernel-3.10.0-x86_64-debug-minimal.config
@@ -3792,6 +3792,7 @@ CONFIG_OVERLAY_FS=y
 # CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
 CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
 CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_INDEX_NOUUID=y
 CONFIG_OVERLAY_FS_NFS_EXPORT=y
 # CONFIG_OVERLAY_FS_XINO_AUTO is not set
 CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS=y
diff --git a/configs/kernel-3.10.0-x86_64-debug.config 
b/configs/kernel-3.10.0-x86_64-debug.config
index f1f83f69aa0c..8b77ad652b23 100644
--- a/configs/kernel-3.10.0-x86_64-debug.config
+++ b/configs/kernel-3.10.0-x86_64-debug.config
@@ -5784,6 +5784,7 @@ CONFIG_OVERLAY_FS=m
 # CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
 # CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
 CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_INDEX_NOUUID=y
 CONFIG_OVERLAY_FS_NFS_EXPORT=y
 # CONFIG_OVERLAY_FS_XINO_AUTO is not set
 CONFIG_GENERIC_ACL=y
diff --git a/configs/kernel-3.10.0-x86_64-minimal.config 
b/configs/kernel-3.10.0-x86_64-minimal.config
index 8dc256b1e4bc..879a9ad21c96 100644
--- a/configs/kernel-3.10.0-x86_64-minimal.config
+++ b/configs/kernel-3.10.0-x86_64-minimal.config
@@ -3793,6 +3793,7 @@ CONFIG_OVERLAY_FS=y
 # CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
 CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW=y
 CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_INDEX_NOUUID=y
 CONFIG_OVERLAY_FS_NFS_EXPORT=y
 # CONFIG_OVERLAY_FS_XINO_AUTO is not set
 CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS=y
diff --git a/configs/kernel-3.10.0-x86_64.config 
b/configs/kernel-3.10.0-x86_64.config
index 8e285b0d31c7..8b91f73c5484 100644
--- a/configs/kernel-3.10.0-x86_64.config
+++ b/configs/kernel-3.10.0-x86_64.config
@@ -5783,6 +5783,7 @@ CONFIG_OVERLAY_FS=m
 # CONFIG_OVERLAY_FS_REDIRECT_DIR is not set
 # CONFIG_OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW is not set
 CONFIG_OVERLAY_FS_INDEX=y
+CONFIG_OVERLAY_FS_INDEX_NOUUID=y
 CONFIG_OVERLAY_FS_NFS_EXPORT=y
 # CONFIG_OVERLAY_FS_XINO_AUTO is not set
 CONFIG_GENERIC_ACL=y
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index 32e8195a0cbe..05dc2906d60d 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -60,6 +60,22 @@ config OVERLAY_FS_INDEX
 
  If unsure, say N.
 
+config OVERLAY_FS_INDEX_NOUUID
+   bool "Overlayfs: relax uuid checks of inodes index feature"
+   depends on OVERLAY_FS
+   depends on OVERLAY_FS_INDEX
+   help
+ If this config option is enabled then overlay will skip uuid checks
+ for index lower to upper inode map, this only can be done if all
+ upper and lower directories are on the same filesystem where basic
+ fhandles are uniq.
+
+ It is needed to overcome possible change of uuid on superblock of the
+ backing filesystem, e.g. when you copied the virtual disk and mount
+ both the copy of the disk and the original one at the same time.
+
+ If unsure, say N.
+
 config OVERLAY_FS_NFS_EXPORT
bool "Overlayfs: turn on NFS export feature by default"
depends on OVERLAY_FS
diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
index 03acd05ba642..c32ba29c86fd 100644
--- a/fs/overlayfs/export.c
+++ b/fs/overlayfs/export.c
@@ -683,11 +683,12 @@ static struct dentry *ovl_upper_fh_to_d(struct 
super_block *sb,
struct 

Re: [Devel] [PATCH RH7] ipset: enable memory accounting for ipset memory allocations

2020-09-23 Thread Vasily Averin
On 9/23/20 4:41 PM, Evgenii Shatokhin wrote:
> On 23.09.2020 15:54, Vasily Averin wrote:
>> currently root inside non-trusted network namespace can consume
>> all node's memory for ipset hashtable.
>>
>> https://jira.sw.ru/browse/PSBM-108091
>> Signed-off-by: Vasily Averin 
>> ---
>>   net/netfilter/ipset/ip_set_core.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Thanks for the fix!
> 
> Do we need something like this in VZ8 as well?

Yes, both rh8 and mainline are affected too. I'm going to prepare a patch
for upstream.

>> diff --git a/net/netfilter/ipset/ip_set_core.c 
>> b/net/netfilter/ipset/ip_set_core.c
>> index 6b93a8978cb2..0fb19b95b507 100644
>> --- a/net/netfilter/ipset/ip_set_core.c
>> +++ b/net/netfilter/ipset/ip_set_core.c
>> @@ -251,14 +251,14 @@ ip_set_alloc(size_t size)
>>   void *members = NULL;
>>     if (size < KMALLOC_MAX_SIZE)
>> -    members = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
>> +    members = kzalloc(size, GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
>>     if (members) {
>>   pr_debug("%p: allocated with kmalloc\n", members);
>>   return members;
>>   }
>>   -    members = vzalloc(size);
>> +    members = vzalloc_account(size);
>>   if (!members)
>>   return NULL;
>>   pr_debug("%p: allocated with vmalloc\n", members);
>>
> 


Re: [Devel] [PATCH RH7] ipset: enable memory accounting for ipset memory allocations

2020-09-23 Thread Evgenii Shatokhin

On 23.09.2020 15:54, Vasily Averin wrote:
> currently root inside non-trusted network namespace can consume
> all node's memory for ipset hashtable.
>
> https://jira.sw.ru/browse/PSBM-108091
> Signed-off-by: Vasily Averin 
> ---
>   net/netfilter/ipset/ip_set_core.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for the fix!

Do we need something like this in VZ8 as well?

> diff --git a/net/netfilter/ipset/ip_set_core.c 
> b/net/netfilter/ipset/ip_set_core.c
> index 6b93a8978cb2..0fb19b95b507 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -251,14 +251,14 @@ ip_set_alloc(size_t size)
> void *members = NULL;
>
> if (size < KMALLOC_MAX_SIZE)
> -   members = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +   members = kzalloc(size, GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
>
> if (members) {
> pr_debug("%p: allocated with kmalloc\n", members);
> return members;
> }
>
> -   members = vzalloc(size);
> +   members = vzalloc_account(size);
> if (!members)
> return NULL;
> pr_debug("%p: allocated with vmalloc\n", members);





[Devel] [PATCH RHEL7 COMMIT] ploop: zero-out block device statistics at ploop_stop

2020-09-23 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.28
-->
commit 1a3cd07a8bcd508ee6c02450bfcde8780d99b7e8
Author: Valeriy Vdovin 
Date:   Wed Sep 23 15:57:10 2020 +0300

ploop: zero-out block device statistics at ploop_stop

A ploop block device is represented by a block device file in /dev, but
its lifecycle is separated from the file itself by the PLOOP_IOC_START and
PLOOP_IOC_STOP ioctls. This way the ploop file in /dev can be an empty
placeholder after a PLOOP_IOC_STOP ioctl and be reinitialized later by
PLOOP_IOC_START. Because of that, some of the important data structures
stay allocated after stop and keep their old values until and after restart.
This is also true for the block device statistics, which remain unchanged
after the end of the ploop device lifecycle. A freshly started ploop device
is considered a new entity with stats equal to zero, so we zero out the
stats at ploop_stop.
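
A toy userspace model of that lifecycle, assuming a device object that
outlives stop/start cycles. struct fake_dev and its functions are
hypothetical and only illustrate why the counters are cleared at stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct dev_stats { unsigned long reads, writes; };

/* The object stays allocated across stop/start, like the /dev node. */
struct fake_dev {
	bool running;
	struct dev_stats st;
};

void fake_dev_start(struct fake_dev *d)
{
	d->running = true;
}

void fake_dev_io(struct fake_dev *d, bool is_write)
{
	assert(d->running);
	if (is_write)
		d->st.writes++;
	else
		d->st.reads++;
}

/* Clear the counters here so the next start begins from zero. */
void fake_dev_stop(struct fake_dev *d)
{
	d->running = false;
	memset(&d->st, 0, sizeof(d->st));
}
```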

https://jira.sw.ru/browse/PSBM-95605

Signed-off-by: Valeriy.Vdovin 
Reviewed-by: Kirill Tkhai 
---
 drivers/block/ploop/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index ac4d142..c54ff90 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -4373,6 +4373,9 @@ static int ploop_stop(struct ploop_device * plo, struct 
block_device *bdev)
 
clear_bit(PLOOP_S_RUNNING, &plo->state);
 
+   part_stat_set_all(&plo->disk->part0, 0);
+   memset(&plo->st, 0, sizeof(plo->st));
+
del_timer_sync(&plo->mitigation_timer);
del_timer_sync(&plo->freeze_timer);
 


[Devel] [PATCH RHEL7 COMMIT] mm, memcg: add oom counter to memory.stat memcgroup file #PSBM-107731

2020-09-23 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.28
-->
commit 545507995ca12b24a7d1fd886a6c82b28bb47e2c
Author: Andrey Ryabinin 
Date:   Wed Sep 23 15:56:08 2020 +0300

mm, memcg: add oom counter to memory.stat memcgroup file #PSBM-107731

Add an oom counter to the memory.stat file. oom shows the number of oom
kills triggered by the cgroup's own memory limit. total_oom shows the sum
of oom kills triggered by the memory limits of the cgroup and its sub-groups.

memory.stat in the root cgroup counts global oom kills.

E.g:
 # mkdir /sys/fs/cgroup/memory/test/
 # echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
 # echo 100M > /sys/fs/cgroup/memory/test/memory.memsw.limit_in_bytes
 # echo $$ > /sys/fs/cgroup/memory/test/tasks
 # ./vm-scalability/usemem -O 200M
 # grep oom /sys/fs/cgroup/memory/test/memory.stat
   oom 1
   total_oom 1
 # echo -1 > /sys/fs/cgroup/memory/test/memory.memsw.limit_in_bytes
 # echo -1 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
 # ./vm-scalability/usemem -O 1000G
 # grep oom /sys/fs/cgroup/memory/memory.stat
oom 1
total_oom 2
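
The relation between the two counters can be modelled in userspace as a
sum over a subtree. This is a sketch only: struct group and its helpers
are hypothetical stand-ins for the memcg hierarchy, and plain counters
replace the atomics used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

struct group {
	unsigned long oom;	/* kills caused by this group's own limit */
	struct group *children[MAX_CHILDREN];
	size_t nr_children;
};

void group_add_child(struct group *parent, struct group *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
}

/* total_oom: this group's counter plus those of all its descendants. */
unsigned long group_total_oom(const struct group *g)
{
	unsigned long total = g->oom;
	size_t i;

	for (i = 0; i < g->nr_children; i++)
		total += group_total_oom(g->children[i]);
	return total;
}
```

Replaying the session above: one kill charged to "test" gives it oom 1
and total_oom 1; one further global kill charged to the root gives the
root oom 1 and total_oom 2.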

https://jira.sw.ru/browse/PSBM-107731
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6587cc2..fe06c7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -400,6 +400,7 @@ struct mem_cgroup {
struct mem_cgroup_stat_cpu __percpu *stat;
struct mem_cgroup_stat2_cpu stat2;
spinlock_t pcp_counter_lock;
+   atomic_long_t   oom;
 
atomic_t   dead_count;
 #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
@@ -2005,6 +2006,7 @@ void mem_cgroup_note_oom_kill(struct mem_cgroup 
*root_memcg,
if (memcg == root_memcg)
break;
}
+   atomic_long_inc(&root_memcg->oom);
 
if (memcg_to_put)
css_put(&memcg_to_put->css);
@@ -5691,6 +5693,7 @@ static int memcg_stat_show(struct cgroup *cont, struct 
cftype *cft,
for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++)
seq_printf(m, "%s %lu\n", mem_cgroup_events_names[i],
   mem_cgroup_read_events(memcg, i));
+   seq_printf(m, "oom %lu\n", atomic_long_read(&memcg->oom));
 
for (i = 0; i < NR_LRU_LISTS; i++)
seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i],
@@ -5733,6 +5736,12 @@ static int memcg_stat_show(struct cgroup *cont, struct 
cftype *cft,
seq_printf(m, "total_%s %llu\n",
   mem_cgroup_events_names[i], val);
}
+   {
+   unsigned long val = 0;
+   for_each_mem_cgroup_tree(mi, memcg)
+   val += atomic_long_read(&mi->oom);
+   seq_printf(m, "total_oom %lu\n", val);
+   }
 
for (i = 0; i < NR_LRU_LISTS; i++) {
unsigned long long val = 0;


[Devel] [PATCH RHEL7 COMMIT] ipset: enable memory accounting for ipset memory allocations

2020-09-23 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.28
-->
commit b85b3e0c99926241ad2fe32d51694b6c7405f493
Author: Vasily Averin 
Date:   Wed Sep 23 15:55:48 2020 +0300

ipset: enable memory accounting for ipset memory allocations

currently root inside non-trusted network namespace can consume
all node's memory for ipset hashtable.

https://jira.sw.ru/browse/PSBM-108091
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c 
b/net/netfilter/ipset/ip_set_core.c
index 6b93a89..0fb19b9 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -251,14 +251,14 @@ ip_set_alloc(size_t size)
void *members = NULL;
 
if (size < KMALLOC_MAX_SIZE)
-   members = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+   members = kzalloc(size, GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
 
if (members) {
pr_debug("%p: allocated with kmalloc\n", members);
return members;
}
 
-   members = vzalloc(size);
+   members = vzalloc_account(size);
if (!members)
return NULL;
pr_debug("%p: allocated with vmalloc\n", members);


[Devel] [PATCH RHEL7 COMMIT] keys, user: fix NULL-ptr dereference in user_destroy() #PSBM-108198

2020-09-23 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.28
-->
commit a0e271fd8929312b1c5dab72fbc8bc336a296b45
Author: Andrey Ryabinin 
Date:   Wed Sep 23 15:55:59 2020 +0300

keys,user: fix NULL-ptr dereference in user_destroy() #PSBM-108198

key->payload.data could be NULL

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: user_destroy+0x13/0x30

Call Trace:
  key_gc_unused_keys.constprop.1+0xfd/0x110
  key_garbage_collector+0x1d7/0x390
  process_one_work+0x185/0x440
  worker_thread+0x126/0x3c0
  kthread+0xd1/0xe0
  ret_from_fork_nospec_begin+0x7/0x21

Add the necessary check to fix this.
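
A userspace sketch of the same guard. It assumes, as in mainline, that
the payload starts with a two-pointer RCU head followed by datalen; under
that assumption datalen sits at offset 0x10 on 64-bit, which would be
consistent with the fault address in the oops when the payload pointer is
NULL. The mock_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's two-pointer struct rcu_head. */
struct mock_rcu_head { void *next; void *func; };

struct mock_payload {
	struct mock_rcu_head rcu;
	unsigned short datalen;
	char data[];
};

struct mock_payload *mock_payload_new(const void *data, unsigned short len)
{
	struct mock_payload *p = malloc(sizeof(*p) + len);

	if (!p)
		return NULL;
	memset(&p->rcu, 0, sizeof(p->rcu));
	p->datalen = len;
	memcpy(p->data, data, len);
	return p;
}

/* Returns 1 if a payload was wiped and freed, 0 if there was none. */
int mock_payload_destroy(struct mock_payload *p)
{
	if (!p)
		return 0;	/* reading p->datalen here would fault */
	memset(p, 0, sizeof(*p) + p->datalen);
	free(p);
	return 1;
}
```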

https://jira.sw.ru/browse/PSBM-108198
Fixes: 499126f3b029 ("keys, user: Fix high order allocation in 
user_instantiate()")
Signed-off-by: Andrey Ryabinin 
---
 security/keys/user_defined.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/keys/user_defined.c b/security/keys/user_defined.c
index b13d70b..c3196db 100644
--- a/security/keys/user_defined.c
+++ b/security/keys/user_defined.c
@@ -184,8 +184,10 @@ void user_destroy(struct key *key)
 {
struct user_key_payload *upayload = key->payload.data;
 
-   memset(upayload, 0, sizeof(*upayload) + upayload->datalen);
-   kvfree(upayload);
+   if (upayload) {
+   memset(upayload, 0, sizeof(*upayload) + upayload->datalen);
+   kvfree(upayload);
+   }
 }
 
 EXPORT_SYMBOL_GPL(user_destroy);


[Devel] [PATCH RH7] ipset: enable memory accounting for ipset memory allocations

2020-09-23 Thread Vasily Averin
currently root inside non-trusted network namespace can consume
all node's memory for ipset hashtable.
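
The effect of accounting can be pictured with a small userspace model in
which every allocation is charged against a per-container budget and is
refused once the budget is exhausted. struct mem_budget, budget_zalloc()
and budget_free() are hypothetical; they only imitate what charging to a
memory cgroup achieves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mem_budget {
	size_t limit;	/* maximum bytes this container may hold */
	size_t used;	/* bytes currently charged, always <= limit */
};

/* Zeroed allocation that fails instead of exceeding the budget. */
void *budget_zalloc(struct mem_budget *b, size_t size)
{
	void *p;

	assert(b != NULL);
	if (size > b->limit - b->used)
		return NULL;
	p = calloc(1, size);
	if (p)
		b->used += size;
	return p;
}

/* Free the memory and return its charge to the budget. */
void budget_free(struct mem_budget *b, void *p, size_t size)
{
	assert(b != NULL && size <= b->used);
	free(p);
	b->used -= size;
}
```

Without the charge, an allocation of any size would succeed until the
whole node ran out of memory, which is the problem described above.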

https://jira.sw.ru/browse/PSBM-108091
Signed-off-by: Vasily Averin 
---
 net/netfilter/ipset/ip_set_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c 
b/net/netfilter/ipset/ip_set_core.c
index 6b93a8978cb2..0fb19b95b507 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -251,14 +251,14 @@ ip_set_alloc(size_t size)
void *members = NULL;
 
if (size < KMALLOC_MAX_SIZE)
-   members = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+   members = kzalloc(size, GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
 
if (members) {
pr_debug("%p: allocated with kmalloc\n", members);
return members;
}
 
-   members = vzalloc(size);
+   members = vzalloc_account(size);
if (!members)
return NULL;
pr_debug("%p: allocated with vmalloc\n", members);
-- 
2.17.1



[Devel] [PATCH rh7] keys, user: fix NULL-ptr dereference in user_destroy() #PSBM-108198

2020-09-23 Thread Andrey Ryabinin
key->payload.data could be NULL

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: user_destroy+0x13/0x30

Call Trace:
  key_gc_unused_keys.constprop.1+0xfd/0x110
  key_garbage_collector+0x1d7/0x390
  process_one_work+0x185/0x440
  worker_thread+0x126/0x3c0
  kthread+0xd1/0xe0
  ret_from_fork_nospec_begin+0x7/0x21

Add the necessary check to fix this.

https://jira.sw.ru/browse/PSBM-108198
Fixes: 499126f3b029 ("keys, user: Fix high order allocation in 
user_instantiate()")
Signed-off-by: Andrey Ryabinin 
---
 security/keys/user_defined.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/keys/user_defined.c b/security/keys/user_defined.c
index b13d70b69069..c3196db50e30 100644
--- a/security/keys/user_defined.c
+++ b/security/keys/user_defined.c
@@ -184,8 +184,10 @@ void user_destroy(struct key *key)
 {
struct user_key_payload *upayload = key->payload.data;
 
-   memset(upayload, 0, sizeof(*upayload) + upayload->datalen);
-   kvfree(upayload);
+   if (upayload) {
+   memset(upayload, 0, sizeof(*upayload) + upayload->datalen);
+   kvfree(upayload);
+   }
 }
 
 EXPORT_SYMBOL_GPL(user_destroy);
-- 
2.26.2



[Devel] /vz on nfs support

2020-09-23 Thread Mihai-Alexandru Vintila
Hi All,

Is /vz on NFS still supported on VZ 7? I get the following error no
matter which NFS version I use, 3 or 4.1:

ploop init -s 1g -t ext4 /vz/private/tst
Creating delta /vz/private/tst bs=2048 size=2097152 sectors v2
Adding snapshot {5fbaabe3-6958-40ff-92a7-860e329aab41}
Storing /vz/private/DiskDescriptor.xml
Opening delta /vz/private/tst
Opening delta /vz/private/tst
Adding delta dev=/dev/ploop35301 img=/vz/private/tst (rw)
Error in add_delta (ploop.c:1580): Can't add image /vz/private/tst: Invalid
argument
Error in read_line (util.c:155): Can't open or read
/sys/block/ploop35301/pdelta/-1/image: No such file or directory
Error in ploop_umount (ploop.c:2583): Failed to find top delta name and
format
Dropping image /vz/private/tst



 vzctl create 100 --ostemplate centos-7-x86_64 --config vswap.2048MB
Creating Container private area (centos-7-x86_64) with applications from
config (ve-vswap.2048MB.conf-sample)
Resize the image /vz/private/100.private_temporary/root.hdd to 10485760K
Failed to resize image: Error in add_delta (ploop.c:1580): Can't add image
/vz/private/100.private_temporary/root.hdd/root.hds: unsupported underlying
filesystem [3]
VE_PRIVATE is not set
Creation of Container private area failed

Virtuozzo Linux release 7.8

Linux enya-dal03.4psacloud.com 3.10.0-1127.8.2.vz7.151.14 #1 SMP Tue Jun 9
12:58:54 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
Name: ploop
Version : 7.0.193.7
Release : 1.vz7
Architecture: x86_64
Install Date: Wed 23 Sep 2020 03:41:26 AM CDT
Group   : Applications/System
Size: 151858
License : GPLv2



Best regards,

Mihai