Re: [Devel] [PATCH rh8 0/3] vecalls: Implement VZCTL_GET_CPU_STAT ioctl

2020-11-10 Thread Andrey Ryabinin



On 11/10/20 12:44 PM, Konstantin Khorenko wrote:
> Used by vzstat/dispatcher/libvirt.
> Faster than parsing Container's cpu cgroup files.
> 
> Konstantin Khorenko (3):
>   vecalls: Add cpu stat measurement units comments to header
>   ve/sched/loadavg: Provide task_group parameter to get_avenrun_ve()
>   vecalls: Introduce VZCTL_GET_CPU_STAT ioctl
> 
>  include/linux/sched/loadavg.h   |  2 -
>  include/linux/ve.h              |  2 +
>  include/uapi/linux/vzcalluser.h | 14 +++
>  kernel/sched/loadavg.c          | 12 +-
>  kernel/sys.c                    |  6 ++-
>  kernel/time/time.c              |  1 +
>  kernel/ve/ve.c                  | 18 +
>  kernel/ve/vecalls.c             | 66 +
>  8 files changed, 109 insertions(+), 12 deletions(-)
> 
Reviewed-by: Andrey Ryabinin 
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


Re: [Devel] [PATCH RH7] ploop: Fix crash in purge_lru_warn()

2020-11-10 Thread Andrey Ryabinin



On 11/10/20 5:47 PM, Kirill Tkhai wrote:
> do_div() works incorrectly when its second argument is a long: the divisor
> is truncated to 32 bits. We don't need the remainder, so we don't need
> do_div() at all.
> 
> https://jira.sw.ru/browse/PSBM-122035
> 
> Reported-by: Evgenii Shatokhin 
> Signed-off-by: Kirill Tkhai 

Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH RH7] ploop: Fix crash in purge_lru_warn()

2020-11-10 Thread Evgenii Shatokhin

On 10.11.2020 17:47, Kirill Tkhai wrote:

do_div() works incorrectly when its second argument is a long: the divisor
is truncated to 32 bits. We don't need the remainder, so we don't need
do_div() at all.

https://jira.sw.ru/browse/PSBM-122035

Reported-by: Evgenii Shatokhin 
Signed-off-by: Kirill Tkhai 
---
  drivers/block/ploop/io_direct_map.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/ploop/io_direct_map.c 
b/drivers/block/ploop/io_direct_map.c
index 5528e86aab43..8f09ab083315 100644
--- a/drivers/block/ploop/io_direct_map.c
+++ b/drivers/block/ploop/io_direct_map.c
@@ -377,7 +377,7 @@ static inline void purge_lru_warn(struct extent_map_tree 
*tree)
loff_t ratio = i_size_read(tree->mapping->host) * 100;
long images_size = atomic_long_read(_io_images_size) ? : 1;
  
-	do_div(ratio, images_size);
+	ratio /= images_size;
 
 	printk(KERN_WARNING "Purging lru entry from extent tree for inode %ld "
 	       "(map_size=%d ratio=%lld%%)\n",



Looks good to me. The simpler the better.

Reviewed-by: Evgenii Shatokhin 



[Devel] [PATCH rh8] ve/proc: Added separate start time field to task_struct to show in container

2020-11-10 Thread Konstantin Khorenko
From: Valeriy Vdovin 

Introduce the 'real_start_time_ct' field in task_struct.

The value is READ:
1. When the process lives inside a ve group and any process
inside the same ve group wants to know its start time by reading
its /proc/[pid]/stat file.
2. At container suspend, to store this value in the dump image.

The value is WRITTEN:
1. At creation time (copy_process function).
1.1. If a process is being created outside of a ve group / on the host,
this value is initialized to 0.
1.2. If a process is being created by a process already living in a ve
group, this value is calculated as the ve uptime at that moment, i.e.
host_uptime - ve_start_time (see ve_get_uptime()).

2. During attach to a ve (ve_attach function). The process can be created
on the host and later attached to a ve. Its container-side start_time value
has already been initialized to 0 at creation time. After the process enters
the domain of a ve, the value should be initialized.
Note that the process can be attached to a non-running container, in which
case its start_time value should not be calculated and is left as 0.

3. At container restore via prctl (prctl_set_task_ct_fields function).
In this case the value is only settable from outside of a container.
During restore the processes are created from the dump image.
At the restore step each process executes prctl to set its start_time
value, read from the dump. This is only permitted in
pseudosuper ve mode. The value is set as is (read from the dump), without
any calculations.
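To make the read/write rules above concrete, here is a minimal userspace sketch of how real_start_time_ct is set and then reported. The model_* structs and functions are hypothetical stand-ins, not kernel code: a NULL ve stands for a task created on the host, and now_ns stands for ktime_get_boot_ns() at creation or attach time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-ins for ve_struct and task_struct. */
struct model_ve {
	bool is_running;
	uint64_t real_start_time;	/* boot-based ns when the CT started */
};

struct model_task {
	uint64_t real_start_time;	/* boot-based ns, host view */
	uint64_t real_start_time_ct;	/* ns since CT start, CT view */
};

/* Write paths 1 and 2: creation inside a VE, or attach to a VE. */
static void model_set_ct_start_time(const struct model_ve *ve,
				    struct model_task *t, uint64_t now_ns)
{
	/* Host task, stopped CT, or CT start time not visible yet: 0. */
	if (!ve || !ve->is_running || !ve->real_start_time)
		t->real_start_time_ct = 0;
	else
		t->real_start_time_ct = now_ns - ve->real_start_time;
}

/* Write path 3: restore copies the dumped value verbatim. */
static void model_restore_ct_start_time(struct model_task *t, uint64_t dumped)
{
	t->real_start_time_ct = dumped;
}

/* Read path: value shown in /proc/[pid]/stat before ns -> ticks conversion. */
static uint64_t model_stat_start_time(const struct model_task *t, bool is_super)
{
	return is_super ? t->real_start_time : t->real_start_time_ct;
}
```

A task forked 4000 ns after its Container started therefore reports 4000 inside the Container, while the host keeps seeing the boot-based value.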

https://jira.sw.ru/browse/PSBM-64123

Signed-off-by: Valeriy Vdovin 

(cherry picked from vz7 commit eca790eaed527bae7029b4ae1cd557ce847ac6c0)
Signed-off-by: Konstantin Khorenko 
---
 fs/proc/array.c            | 12 +++-
 include/linux/sched.h      |  7 ++-
 include/linux/ve.h         | 16 
 include/uapi/linux/prctl.h |  7 +++
 kernel/fork.c              | 11 +++
 kernel/sys.c               | 23 +++
 kernel/ve/ve.c             |  2 ++
 7 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 5e7152d21a9d..2a292d54e804 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -544,16 +544,10 @@ static int do_task_stat(struct seq_file *m, struct 
pid_namespace *ns,
start_time = task->real_start_time;
 
 #ifdef CONFIG_VE
-   if (!is_super) {
-   u64 offset = get_exec_env()->real_start_time;
-   start_time -= (unsigned long long)offset;
-   }
-   /* tasks inside a CT can have negative start time e.g. if the CT was
-* migrated from another hw node, in which case we will report 0 in
-* order not to confuse userspace */
-   if ((s64)start_time < 0)
-   start_time = 0;
+   if (!is_super)
+   start_time = task->real_start_time_ct;
 #endif
+
/* convert nsec -> ticks */
start_time = nsec_to_clock_t(start_time);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index cabed6a47a70..a0616888a5ca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -847,7 +847,6 @@ struct task_struct {
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
struct vtimevtime;
 #endif
-
 #ifdef CONFIG_NO_HZ_FULL
atomic_ttick_dep_mask;
 #endif
@@ -861,6 +860,12 @@ struct task_struct {
/* Boot based time in nsecs: */
u64 real_start_time;
 
+   /*
+* This is a Container-side copy of 'real_start_time' field
+* shown from inside of a Container and modified by host.
+*/
+   u64 real_start_time_ct;
+
/* MM fault and swap info: this can arguably be seen as either 
mm-specific or thread-specific: */
unsigned long   min_flt;
unsigned long   maj_flt;
diff --git a/include/linux/ve.h b/include/linux/ve.h
index 3aa0ea0b1bab..ab8da4dceec1 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -148,6 +148,22 @@ static u64 ve_get_uptime(struct ve_struct *ve)
return ktime_get_boot_ns() - ve->real_start_time;
 }
 
+static inline void ve_set_task_start_time(struct ve_struct *ve,
+ struct task_struct *t)
+{
+   /*
+* Mitigate memory access reordering risks by doing double check,
+* 'is_running' could be read as 1 before we see
+* 'real_start_time' updated here. If it's still 0,
+* we know 'is_running' is being modified right NOW in
+* parallel so it's safe to say that start time is also 0.
+*/
+   if (!ve->is_running || !ve->real_start_time)
+   t->real_start_time_ct = 0;
+   else
+   t->real_start_time_ct = ve_get_uptime(ve);
+}
+
 extern void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp);
 extern void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp);
 
diff --git a/include/uapi/linux/prctl.h 

[Devel] [PATCH RH7] ploop: Fix crash in purge_lru_warn()

2020-11-10 Thread Kirill Tkhai
do_div() works incorrectly when its second argument is a long: the divisor
is truncated to 32 bits. We don't need the remainder, so we don't need
do_div() at all.
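The failure mode can be illustrated with a small userspace sketch. It assumes the generic do_div() semantics (divisor narrowed to 32 bits, quotient stored back into the dividend); model_do_div() and the two ratio helpers are hypothetical names, not ploop code. If the low 32 bits of images_size happen to be zero, the narrowed divisor becomes 0, which is one plausible way to end up with a crash.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the generic do_div(n, base): the
 * divisor is narrowed to uint32_t, n is overwritten with the quotient
 * and the remainder is returned.
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);	/* the kernel would fault on a zero divisor */
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Old code path: the 64-bit divisor is silently narrowed to 32 bits. */
static uint64_t ratio_with_do_div(int64_t ratio, int64_t images_size)
{
	uint64_t n = (uint64_t)ratio;

	model_do_div(&n, (uint32_t)images_size);
	return n;
}

/* Fixed code path: a plain 64-bit division, as in the patch. */
static int64_t ratio_with_division(int64_t ratio, int64_t images_size)
{
	return ratio / images_size;
}
```

With images_size = 0x100000002 the do_div() path divides by 2 and returns 500 for a ratio of 1000, while the plain division correctly returns 0.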

https://jira.sw.ru/browse/PSBM-122035

Reported-by: Evgenii Shatokhin 
Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/io_direct_map.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/ploop/io_direct_map.c 
b/drivers/block/ploop/io_direct_map.c
index 5528e86aab43..8f09ab083315 100644
--- a/drivers/block/ploop/io_direct_map.c
+++ b/drivers/block/ploop/io_direct_map.c
@@ -377,7 +377,7 @@ static inline void purge_lru_warn(struct extent_map_tree 
*tree)
loff_t ratio = i_size_read(tree->mapping->host) * 100;
long images_size = atomic_long_read(_io_images_size) ? : 1;
 
-   do_div(ratio, images_size);
+   ratio /= images_size;
 
printk(KERN_WARNING "Purging lru entry from extent tree for inode %ld "
   "(map_size=%d ratio=%lld%%)\n",




[Devel] [PATCH RHEL7 COMMIT] ploop: Invalidate pagecache on service operations

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit 24aef273109a6076d82eceb8f9417cdb2e9149fd
Author: Kirill Tkhai 
Date:   Tue Nov 10 13:43:52 2020 +0300

ploop: Invalidate pagecache on service operations

This allows fastmap to work; otherwise it fails on the
mapping_needs_writeback() check.
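A minimal userspace model of why invalidating the cache unblocks fastmap. The struct and functions below are hypothetical; for illustration it assumes that any cached page trips mapping_needs_writeback() (the real check may be narrower) and that ext4_fastmap() reports this as -EBUSY, as in the companion statistics patch in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode/mapping state that matters here. */
struct model_inode {
	bool has_fastmap;		/* models inode->i_op->fastmap != NULL */
	unsigned long cached_pages;	/* models pages in the page cache */
};

/*
 * Models kaio_invalidate_cache(): only files whose filesystem provides
 * fastmap get their page cache dropped. The real code calls
 * invalidate_inode_pages2() with the inode's i_mutex held.
 */
static int model_invalidate_cache(struct model_inode *inode)
{
	if (!inode->has_fastmap)
		return 0;
	inode->cached_pages = 0;
	return 0;
}

/* Models the early bail-out in ext4_fastmap() when cached pages exist. */
static int model_fastmap(const struct model_inode *inode)
{
	if (inode->cached_pages)	/* ~ mapping_needs_writeback() */
		return -EBUSY;
	return 0;
}
```

Doing the invalidation at open, snapshot and merge time means later fastmap calls start from an empty cache instead of falling back to the slow path.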

Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/io_kaio.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index be74b2e..112fd92 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -937,6 +937,15 @@ static int kaio_alloc_sync(struct ploop_io * io, loff_t 
pos, loff_t len)
 
return err;
 }
+static int kaio_invalidate_cache(struct ploop_io *io)
+{
+   struct inode *inode = io->files.inode;
+
+   if (!inode->i_op->fastmap)
+   return 0;
+
+   return invalidate_inode_pages2(io->files.mapping);
+}
 
 static int kaio_open(struct ploop_io * io)
 {
@@ -952,6 +961,7 @@ static int kaio_open(struct ploop_io * io)
io->files.bdev = io->files.inode->i_sb->s_bdev;
 
mutex_lock(&io->files.inode->i_mutex);
+   kaio_invalidate_cache(io);
err = ploop_kaio_open(file, delta->flags & PLOOP_FMT_RDONLY);
mutex_unlock(&io->files.inode->i_mutex);
 
@@ -1004,6 +1014,10 @@ static int kaio_prepare_snapshot(struct ploop_io * io, 
struct ploop_snapdata *sd
return err;
}
 
+   mutex_lock(&io->files.inode->i_mutex);
+   kaio_invalidate_cache(io);
+   mutex_unlock(&io->files.inode->i_mutex);
+
sd->file = file;
return 0;
 }
@@ -1024,6 +1038,10 @@ static int kaio_complete_snapshot(struct ploop_io * io, 
struct ploop_snapdata *s
 
ploop_kaio_downgrade(io->files.mapping);
 
+   mutex_lock(&io->files.inode->i_mutex);
+   kaio_invalidate_cache(io);
+   mutex_unlock(&io->files.inode->i_mutex);
+
if (io->fsync_thread) {
kthread_stop(io->fsync_thread);
io->fsync_thread = NULL;
@@ -1057,6 +1075,10 @@ static int kaio_prepare_merge(struct ploop_io * io, 
struct ploop_snapdata *sd)
if (err)
goto prep_merge_done;
 
+   mutex_lock(&io->files.inode->i_mutex);
+   kaio_invalidate_cache(io);
+   mutex_unlock(&io->files.inode->i_mutex);
+
err = ploop_kaio_upgrade(io->files.mapping);
if (err)
goto prep_merge_done;


[Devel] [PATCH RHEL7 COMMIT] fs/fuse kio: post rdma work requests only after connection is established

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit c9c8d2e99a62df4b3ecfa7a6fbc27c491b5bba5f
Author: Ildar Ismagilov 
Date:   Tue Nov 10 13:43:59 2020 +0300

fs/fuse kio: post rdma work requests only after connection is established

Some RDMA drivers only generate completions for work requests
that are posted after the connection is established. If the connection
is rejected, the posted work requests are never completed, which results
in a resource leak. To fix this problem, don't post work requests
until the connection is established.
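The new control flow can be sketched as a tiny userspace model: receive buffers are posted only from the ESTABLISHED handler, and a partial post turns the event into a rejection. All model_* names are hypothetical, and post_limit is an invented test knob that simulates rio_rx_post() failing after a number of successful posts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the parts of struct pcs_rdmaio used here. */
struct model_rio {
	int recv_queue_depth;
	int n_rx_posted;
	int post_limit;		/* test knob: how many posts succeed */
};

/* Models rio_rx_post(): returns nonzero when a post fails. */
static int model_rx_post(struct model_rio *rio)
{
	if (rio->n_rx_posted >= rio->post_limit)
		return -ENOMEM;
	rio->n_rx_posted++;
	return 0;
}

/* Models pcs_rdma_established(): post every receive buffer at once. */
static int model_rdma_established(struct model_rio *rio)
{
	for (int i = 0; i < rio->recv_queue_depth; i++)
		if (model_rx_post(rio))
			break;
	return rio->n_rx_posted == rio->recv_queue_depth ? 0 : -EINVAL;
}

enum model_cm_event { MODEL_ESTABLISHED, MODEL_REJECTED };

/* Models the ESTABLISHED branch of pcs_rdma_cm_event_handler(). */
static enum model_cm_event model_on_established(struct model_rio *rio)
{
	return model_rdma_established(rio) ? MODEL_REJECTED : MODEL_ESTABLISHED;
}
```

Because nothing is posted before ESTABLISHED, a connection rejected earlier leaves no outstanding receive work requests behind.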

https://pmc.acronis.com/browse/VSTOR-38116

Signed-off-by: Ildar Ismagilov 
Reviewed-by: Andrey Zaitsev 
---
 fs/fuse/kio/pcs/pcs_rdma_conn.c |  4 
 fs/fuse/kio/pcs/pcs_rdma_io.c   | 25 +
 fs/fuse/kio/pcs/pcs_rdma_io.h   |  1 +
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/kio/pcs/pcs_rdma_conn.c b/fs/fuse/kio/pcs/pcs_rdma_conn.c
index 521089c..4db9031 100644
--- a/fs/fuse/kio/pcs/pcs_rdma_conn.c
+++ b/fs/fuse/kio/pcs/pcs_rdma_conn.c
@@ -80,6 +80,10 @@ static int pcs_rdma_cm_event_handler(struct rdma_cm_id *cmid,
break;
case RDMA_CM_EVENT_ESTABLISHED:
cmid->context = &rc->rio->id;
+   if (pcs_rdma_established(rc->rio)) {
+   TRACE("pcs_rdma_established failed, rio: 0x%p\n", rc->rio);
+   rc->cm_event = RDMA_CM_EVENT_REJECTED;
+   }
complete(&rc->cm_done);
break;
case RDMA_CM_EVENT_REJECTED:
diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c
index dc55a9c..5955716 100644
--- a/fs/fuse/kio/pcs/pcs_rdma_io.c
+++ b/fs/fuse/kio/pcs/pcs_rdma_io.c
@@ -1183,7 +1183,6 @@ struct pcs_rdmaio* pcs_rdma_create(int hdr_size, struct 
rdma_cm_id *cmid,
   int queue_depth, struct pcs_rpc *ep)
 {
struct pcs_rdmaio *rio;
-   struct rio_rx *rx;
struct ib_cq_init_attr cq_attr = {};
struct ib_qp_init_attr qp_attr = {};
int recv_queue_depth = queue_depth * 2 + 2;
@@ -1304,21 +1303,10 @@ struct pcs_rdmaio* pcs_rdma_create(int hdr_size, struct 
rdma_cm_id *cmid,
goto free_cq;
}
 
-   for (rx = rio->rx_descs; rx - rio->rx_descs < recv_queue_depth; rx++)
-   if (rio_rx_post(rio, rx, RIO_MSG_SIZE)) {
-   TRACE("rio_rx_post failed: rio: 0x%p\n", rio);
-   break;
-   }
-
-   if (rio->n_rx_posted != recv_queue_depth)
-   goto free_qp;
-
TRACE("rio: 0x%p, dev: 0x%p, queue_depth: %d\n", rio, rio->dev, queue_depth);
 
return rio;
 
-free_qp:
-   rdma_destroy_qp(rio->cmid);
 free_cq:
ib_destroy_cq(rio->cq);
 free_dev:
@@ -1333,6 +1321,19 @@ free_rio:
return NULL;
 }
 
+int pcs_rdma_established(struct pcs_rdmaio *rio)
+{
+   struct rio_rx *rx;
+
+   for (rx = rio->rx_descs; rx - rio->rx_descs < rio->recv_queue_depth; rx++)
+   if (rio_rx_post(rio, rx, RIO_MSG_SIZE)) {
+   TRACE("rio_rx_post failed: rio: 0x%p\n", rio);
+   break;
+   }
+
+   return rio->n_rx_posted == rio->recv_queue_depth ? 0 : -EINVAL;
+}
+
 static void rio_cleanup(struct pcs_rdmaio *rio)
 {
rio_perform_tx_jobs(rio);
diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.h b/fs/fuse/kio/pcs/pcs_rdma_io.h
index b411098e..69d817d 100644
--- a/fs/fuse/kio/pcs/pcs_rdma_io.h
+++ b/fs/fuse/kio/pcs/pcs_rdma_io.h
@@ -106,6 +106,7 @@ struct pcs_rdmaio
 
 struct pcs_rdmaio* pcs_rdma_create(int hdr_size, struct rdma_cm_id *cmid,
int queue_depth, struct pcs_rpc *ep);
+int pcs_rdma_established(struct pcs_rdmaio *rio);
 void pcs_rdma_destroy(struct pcs_rdmaio *rio);
 void pcs_rdma_ioconn_destruct(struct pcs_ioconn *ioconn);
 


[Devel] [PATCH RHEL7 COMMIT] fs/fuse kio: fix processing order of RDMA works during throttle/unthrottle

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit 0aa759d3db6c7abce84121c6bdd8447e9967b335
Author: Ildar Ismagilov 
Date:   Tue Nov 10 13:44:14 2020 +0300

fs/fuse kio: fix processing order of RDMA works during throttle/unthrottle

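RDMA works could be processed out of order around throttle/unthrottle:
some were processed while the connection was throttled, and pended ones
were added to the head of rio->pended_rxs. To fix it, let's skip
processing the RDMA works during throttle and add them to the
rio->pended_rxs list in FIFO order.

This fix avoids hitting BUG_ON(tx->xid != rack->xid) in userspace.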
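The ordering half of the fix comes down to list_add() versus list_add_tail() on a kernel-style circular list. The sketch below is a minimal userspace copy of those two helpers (same insertion logic as include/linux/list.h) plus a hypothetical model_rx descriptor tagged with its arrival order.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_between(struct list_head *new, struct list_head *prev,
			     struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert right after the head: the list behaves like a stack (LIFO). */
static void list_add(struct list_head *new, struct list_head *head)
{
	list_add_between(new, head, head->next);
}

/* Insert right before the head: the list behaves like a queue (FIFO). */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	list_add_between(new, head->prev, head);
}

/* Hypothetical pended receive descriptor, tagged with its arrival order. */
struct model_rx {
	struct list_head list;	/* first member, so a cast recovers the rx */
	int seq;
};

/* Returns the seq of the descriptor that would be processed first. */
static int first_seq(const struct list_head *head)
{
	return ((const struct model_rx *)head->next)->seq;
}
```

Walking from head->next, a list filled with list_add() yields the newest descriptor first, while list_add_tail() preserves arrival order, which is what matching replies by xid relies on.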

https://pmc.acronis.com/browse/VSTOR-38354

Signed-off-by: Ildar Ismagilov 
Reviewed-by: Andrey Zaitsev 
---
 fs/fuse/kio/pcs/pcs_rdma_io.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c
index 5955716..4622198 100644
--- a/fs/fuse/kio/pcs/pcs_rdma_io.c
+++ b/fs/fuse/kio/pcs/pcs_rdma_io.c
@@ -798,6 +798,11 @@ static int rio_handle_rx_immediate(struct pcs_rdmaio *rio, 
char *buf, int len,
int offset = rio->hdr_size;
struct iov_iter it;
 
+   if (rio->throttled) {
+   *throttle = 1;
+   return 0;
+   }
+
if (len < rio->hdr_size) {
TRACE("rio read short msg: %d < %d, rio: 0x%p\n", len,
  rio->hdr_size, rio);
@@ -949,7 +954,7 @@ static void rio_handle_rx(struct pcs_rdmaio *rio, struct 
rio_rx *rx,
return;
}
} else
-   list_add(&rx->list, &rio->pended_rxs);
+   list_add_tail(&rx->list, &rio->pended_rxs);
 
if (!pended)
rio->n_peer_credits += credits;


[Devel] [PATCH RHEL7 COMMIT] ploop: Add statistics of fastmap requests, which fails because of cache

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit 56a50079f0317675af4d9c1018f38b47ca2d476c
Author: Kirill Tkhai 
Date:   Tue Nov 10 13:43:43 2020 +0300

ploop: Add statistics of fastmap requests, which fails because of cache

Normally, mapping_needs_writeback() should not return true. But in case
of some problem, or if userspace touches root.hds without direct I/O mode,
the page cache may get populated. Count such failed fastmaps.
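The accounting added to kaio_fastmap() can be modelled in a few lines. The counter names come from ploop_stat.h in the patch; model_ploop_stat and model_account_fastmap() are hypothetical, and returning 0 on success is an assumption of the sketch (the real function carries on with the mapping).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of ploop's per-device counters (see ploop_stat.h). */
struct model_ploop_stat {
	unsigned long fast_neg_noem;
	unsigned long write_back_pending;
};

/*
 * Models the error branch of kaio_fastmap(): every failed fastmap bumps
 * fast_neg_noem, and failures reported by ext4_fastmap() as -EBUSY
 * (cache needs writeback) are also counted in write_back_pending.
 * Returns 1 for "fall back to the slow path", 0 otherwise.
 */
static int model_account_fastmap(struct model_ploop_stat *st, int ret)
{
	if (ret < 0) {
		st->fast_neg_noem++;
		if (ret == -EBUSY)
			st->write_back_pending++;
		return 1;
	}
	return 0;
}
```

So write_back_pending is a subset of fast_neg_noem, and any growth in it points at a populated page cache rather than at a missing extent.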

Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/io_kaio.c    | 2 ++
 fs/ext4/file.c                   | 5 -
 include/linux/ploop/ploop_stat.h | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
index 4c4a0c6..be74b2e 100644
--- a/drivers/block/ploop/io_kaio.c
+++ b/drivers/block/ploop/io_kaio.c
@@ -1236,6 +1236,8 @@ kaio_fastmap(struct ploop_io *io, struct bio *orig_bio,
   orig_bio->bi_rw & REQ_WRITE);
if (ret < 0) {
io->plo->st.fast_neg_noem++;
+   if (ret == -EBUSY)
+   io->plo->st.write_back_pending++;
return 1;
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8f5fb6d..67a385e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -132,6 +132,7 @@ static int ext4_fastmap(struct inode *inode, sector_t 
lblk_sec,
bool unaligned_aio, found, locked = false;
struct ext4_map_blocks map;
loff_t pos = lblk_sec << 9;
+   int err;
 
if (!S_ISREG(inode->i_mode))
return -ENOENT;
@@ -152,6 +153,7 @@ static int ext4_fastmap(struct inode *inode, sector_t 
lblk_sec,
locked = true;
}
 
+   err = -EBUSY;
if (unlikely(mapping_needs_writeback(mapping)))
goto err_maybe_unlock;
 
@@ -163,6 +165,7 @@ static int ext4_fastmap(struct inode *inode, sector_t 
lblk_sec,
locked = false;
}
 
+   err = -ENOENT;
if (unlikely(ext4_test_inode_state(inode,
EXT4_STATE_DIOREAD_LOCK))) {
goto err_dio_end;
@@ -181,7 +184,7 @@ err_dio_end:
 err_maybe_unlock:
if (locked)
mutex_unlock(&inode->i_mutex);
-   return -ENOENT;
+   return err;
 }
 
 static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *iter, 
loff_t *pos)
diff --git a/include/linux/ploop/ploop_stat.h b/include/linux/ploop/ploop_stat.h
index bed910a..92543a9 100644
--- a/include/linux/ploop/ploop_stat.h
+++ b/include/linux/ploop/ploop_stat.h
@@ -34,6 +34,7 @@ __DO(merge_neg_cluster)
 __DO(merge_neg_disable)
 __DO(fast_neg_nomap)
 __DO(fast_neg_noem)
+__DO(write_back_pending)
 __DO(fast_neg_shortem)
 __DO(fast_neg_backing)
 __DO(bio_lockouts)


[Devel] [PATCH RHEL7 COMMIT] kvm: fix AMD IBRS/IBPB/STIBP/SSBD reporting #PSBM-120787

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit 7b92b48f1bde33ff94e9cfbf375bfa85af6fe5fd
Author: Denis V. Lunev 
Date:   Tue Nov 10 13:42:39 2020 +0300

kvm: fix AMD IBRS/IBPB/STIBP/SSBD reporting #PSBM-120787

We should report these bits in CPUID 0x80000008 EBX on AMD only, i.e.
when the AMD-specific feature bits are enabled.
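A sketch of the fixed reporting logic, reduced to the bits this patch touches. The bit positions are the ones AMD documents for CPUID 0x80000008 EBX and that Linux uses for X86_FEATURE_AMD_*; model_host_caps and model_ebx_80000008() are hypothetical names. The point is that generic IBRS/IBPB/STIBP/SSBD flags, which can also be set on non-AMD hosts, no longer leak into this leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID 0x80000008 EBX (AMD_* feature bits). */
#define AMD_IBPB_BIT	12
#define AMD_IBRS_BIT	14
#define AMD_STIBP_BIT	15
#define AMD_SSBD_BIT	24
#define AMD_SSB_NO_BIT	26

/* Hypothetical summary of the AMD-specific host flags and the SSB bug. */
struct model_host_caps {
	bool amd_ibpb, amd_ibrs, amd_stibp, amd_ssbd;
	bool spec_store_bypass_bug;
};

/* Builds the EBX bits the fixed code would expose to the guest. */
static uint32_t model_ebx_80000008(const struct model_host_caps *c)
{
	uint32_t ebx = 0;

	if (c->amd_ibpb)
		ebx |= 1u << AMD_IBPB_BIT;
	if (c->amd_ibrs)
		ebx |= 1u << AMD_IBRS_BIT;
	if (c->amd_stibp)
		ebx |= 1u << AMD_STIBP_BIT;
	if (c->amd_ssbd)
		ebx |= 1u << AMD_SSBD_BIT;
	if (!c->spec_store_bypass_bug)
		ebx |= 1u << AMD_SSB_NO_BIT;
	return ebx;
}
```

A host without AMD-specific flags that is affected by speculative store bypass now gets an empty EBX for these bits.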

Signed-off-by: Denis V. Lunev 
CC: Vasily Averin 
CC: Konstantin Khorenko 

Signed-off-by: "Denis V. Lunev" 
---
 arch/x86/kvm/cpuid.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 96a6bac..d876f18 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -642,13 +642,13 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 * arch/x86/kernel/cpu/bugs.c is kind enough to
 * record that in cpufeatures so use them.
 */
-   if (boot_cpu_has(X86_FEATURE_IBPB))
+   if (boot_cpu_has(X86_FEATURE_AMD_IBPB))
entry->ebx |= F(AMD_IBPB);
-   if (boot_cpu_has(X86_FEATURE_IBRS))
+   if (boot_cpu_has(X86_FEATURE_AMD_IBRS))
entry->ebx |= F(AMD_IBRS);
-   if (boot_cpu_has(X86_FEATURE_STIBP))
+   if (boot_cpu_has(X86_FEATURE_AMD_STIBP))
entry->ebx |= F(AMD_STIBP);
-   if (boot_cpu_has(X86_FEATURE_SSBD))
+   if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
entry->ebx |= F(AMD_SSBD);
if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
entry->ebx |= F(AMD_SSB_NO);


[Devel] [PATCH RHEL7 COMMIT] ve/pidns: Use proper ns in sys_getppid()

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit a152b818bc87a0db588acfda333d01e62866f243
Author: Konstantin Khorenko 
Date:   Tue Nov 10 13:42:50 2020 +0300

ve/pidns: Use proper ns in sys_getppid()

struct nsproxy::pid_ns is in fact the pid namespace for child processes,
so use task_active_pid_ns() instead.

The struct field has already been renamed in mainstream by commit
c2b1df2eb429 ("Rename nsproxy.pid_ns to nsproxy.pid_ns_for_children").

The problem appears only when a process has called unshare(CLONE_NEWPID)
and then calls sys_getppid().
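Why the wrong namespace matters can be shown with a simplified model of pid_nr_ns(): a pid has one number per namespace level, and a namespace deeper than the pid's own level cannot see it. All model_* names are hypothetical; the real pid_nr_ns() also checks that the upid's namespace matches, which is omitted here, and the 0-to-1 substitution models the ve workaround named in the Fixes: tag.

```c
#include <assert.h>

#define MODEL_MAX_LEVELS 4

/* Hypothetical pid namespace: only its nesting level matters here. */
struct model_pid_ns {
	int level;
};

/* Hypothetical struct pid: one number per namespace level it is visible in. */
struct model_pid {
	int level;			/* deepest namespace level */
	int nr[MODEL_MAX_LEVELS];	/* pid number at each level */
};

/* Models pid_nr_ns(): 0 if the pid is not visible in ns. */
static int model_pid_nr_ns(const struct model_pid *pid,
			   const struct model_pid_ns *ns)
{
	if (pid && ns->level <= pid->level)
		return pid->nr[ns->level];
	return 0;
}

/* Models the ve wrapper: report 1 instead of 0 for "bad utils". */
static int model_ve_ppid_nr_ns(const struct model_pid *parent,
			       const struct model_pid_ns *ns)
{
	int nr = model_pid_nr_ns(parent, ns);

	return nr ? nr : 1;
}
```

After unshare(CLONE_NEWPID) the namespace for children sits one level deeper than the caller's active namespace, so the parent is invisible there and getppid() would wrongly return 1.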

https://jira.sw.ru/browse/PSBM-121530
Fixes: 762f3e6a33f3 ("ve: Replace 0 ppid with 1 (workaround for bad utils)")
Signed-off-by: Konstantin Khorenko 
---
 kernel/sys.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 2ce16c7..8687707 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1231,7 +1231,7 @@ SYSCALL_DEFINE0(getppid)
int pid;
 
rcu_read_lock();
-   pid = ve_task_ppid_nr_ns(current, current->nsproxy->pid_ns);
+   pid = ve_task_ppid_nr_ns(current, task_active_pid_ns(current));
rcu_read_unlock();
 
return pid;


[Devel] [PATCH RHEL7 COMMIT] pid: Use proper ns in proc_dointvec_pidmax()

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit 5d351a1f64cb1e192a94b51aff5d7efebe596487
Author: Kirill Tkhai 
Date:   Tue Nov 10 13:43:02 2020 +0300

pid: Use proper ns in proc_dointvec_pidmax()

Use the current task's pid ns instead of the pid ns for future children.

https://jira.sw.ru/browse/PSBM-121530
Signed-off-by: Kirill Tkhai 
---
 kernel/sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index a958d57..6e2645d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -339,7 +339,7 @@ static int proc_dointvec_pidmax(struct ctl_table *table, 
int write,
struct ctl_table tmp;
 
tmp = *table;
-   tmp.data = &current->nsproxy->pid_ns->pid_max;
+   tmp.data = &task_active_pid_ns(current)->pid_max;
return proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
 }
 


[Devel] [PATCH RHEL7 COMMIT] ve: Reorder ve->ve_ns assignment in ve_grab_context()

2020-11-10 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1160.2.2.vz7.170.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.2.2.el7
-->
commit e3f54fa771eb942f51e1a1260fbfd8d6ec2d5c75
Author: Kirill Tkhai 
Date:   Tue Nov 10 13:43:31 2020 +0300

ve: Reorder ve->ve_ns assignment in ve_grab_context()

This function must guarantee to readers that
ve_ns != NULL under rcu_read_lock means the rest of the context
(say, ve->init_task) is stable.

But the current order is wrong and does not guarantee that. Fix it.

v2: Use a local variable for ve_ns; otherwise reading ve->ve_ns->net_ns
results in a NULL pointer dereference.
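The reordering follows the usual "initialize, then publish" pattern. Below is a userspace sketch with C11 atomics: the release store plays the role of rcu_assign_pointer(), and an acquire load stands in for rcu_dereference() (the kernel primitive relies on address dependencies, so acquire is a conservative substitute). The model_* types are hypothetical, and the synchronize_rcu() call from the patch is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified context published by ve_grab_context(). */
struct model_ns {
	int net_ns_id;
};

struct model_ve {
	int init_task_pid;			/* stands in for ve->init_task */
	int netns_id;				/* stands in for ve->ve_netns */
	_Atomic(struct model_ns *) ve_ns;	/* readers test this pointer */
};

/*
 * Writer: fill in every field first, reading net_ns through the local
 * pointer, then publish ve_ns with a release store.
 */
static void model_grab_context(struct model_ve *ve, struct model_ns *ns,
			       int init_pid)
{
	ve->init_task_pid = init_pid;
	ve->netns_id = ns->net_ns_id;
	atomic_store_explicit(&ve->ve_ns, ns, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store, so a non-NULL
 * ve_ns guarantees the fields above are visible. Returns -1 if the
 * context has not been published yet.
 */
static int model_read_init_pid(struct model_ve *ve)
{
	struct model_ns *ns = atomic_load_explicit(&ve->ve_ns,
						   memory_order_acquire);

	return ns ? ve->init_task_pid : -1;
}
```

Publishing the pointer first, as the old code did, would let a concurrent reader observe a non-NULL ve_ns while init_task and ve_netns were still unset.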

Signed-off-by: Kirill Tkhai 
---
 kernel/ve/ve.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 482d658..068b7b5 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -580,15 +580,18 @@ static void ve_stop_kthread(struct ve_struct *ve)
 static void ve_grab_context(struct ve_struct *ve)
 {
struct task_struct *tsk = current;
+   struct nsproxy *ve_ns;
 
get_task_struct(tsk);
ve->init_task = tsk;
ve->root_css_set = tsk->cgroups;
get_css_set(ve->root_css_set);
ve->init_cred = (struct cred *)get_current_cred();
-   rcu_assign_pointer(ve->ve_ns, get_nsproxy(tsk->nsproxy));
-   ve->ve_netns =  get_net(ve->ve_ns->net_ns);
+   ve_ns = get_nsproxy(tsk->nsproxy);
+   ve->ve_netns =  get_net(ve_ns->net_ns);
synchronize_rcu();
+
+   rcu_assign_pointer(ve->ve_ns, ve_ns);
 }
 
 static void ve_drop_context(struct ve_struct *ve)


[Devel] [PATCH rh8 2/3] ve/sched/loadavg: Provide task_group parameter to get_avenrun_ve()

2020-11-10 Thread Konstantin Khorenko
Rename get_avenrun_ve() to get_avenrun_tg() and give it a
task_group argument, so that later it can be used for any VE, not only
for the current one.
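get_avenrun_tg() keeps the existing offset/shift interface. A small model of that arithmetic for one task group, assuming the standard kernel constants FSHIFT = 11 and SI_LOAD_SHIFT = 16; model_get_avenrun() is a hypothetical name.

```c
#include <assert.h>

#define FSHIFT		11		/* nr of bits of precision in avenrun[] */
#define FIXED_1		(1 << FSHIFT)	/* 1.0 in avenrun[] fixed point */
#define SI_LOAD_SHIFT	16		/* precision of struct sysinfo loads[] */

/*
 * Hypothetical model of get_avenrun_tg(tg, loads, offset, shift) for one
 * task group: each fixed-point average is offset and then rescaled.
 */
static void model_get_avenrun(const unsigned long avenrun[3],
			      unsigned long loads[3],
			      unsigned long offset, int shift)
{
	for (int i = 0; i < 3; i++)
		loads[i] = (avenrun[i] + offset) << shift;
}
```

do_sysinfo() passes shift = SI_LOAD_SHIFT - FSHIFT = 5, so a load of 1.0 (2048) becomes 65536 in sysinfo.loads[], whereas ve_get_cpu_avenrun() in patch 3/3 passes offset 0 and shift 0 to get the raw values.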

Fixes: f52cf2752bca ("ve/sched/loadavg: Calculate avenrun for Containers
root cpu cgroups")

Signed-off-by: Konstantin Khorenko 
---
 include/linux/sched/loadavg.h |  2 --
 kernel/sched/loadavg.c        | 12 ++--
 kernel/sys.c                  |  6 +-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/loadavg.h b/include/linux/sched/loadavg.h
index 1da5768389b7..25fb3344cdbf 100644
--- a/include/linux/sched/loadavg.h
+++ b/include/linux/sched/loadavg.h
@@ -16,8 +16,6 @@
  */
 extern unsigned long avenrun[];/* Load averages */
 extern void get_avenrun(unsigned long *loads, unsigned long offset, int shift);
-extern void get_avenrun_ve(unsigned long *loads,
-  unsigned long offset, int shift);
 
 #define FSHIFT 11  /* nr of bits of precision */
 #define FIXED_1(1avenrun[1] + offset) << shift;
loads[2] = (tg->avenrun[2] + offset) << shift;
+
+   return 0;
 }
 
 long calc_load_fold_active(struct rq *this_rq, long adjust)
diff --git a/kernel/sys.c b/kernel/sys.c
index e7e07ea8d7ef..8560e5bcb6c2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2543,6 +2543,8 @@ SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned 
__user *, nodep,
 }
 
 extern void si_meminfo_ve(struct sysinfo *si, struct ve_struct *ve);
+extern int get_avenrun_tg(struct task_group *tg, unsigned long *loads,
+ unsigned long offset, int shift);
 
 /**
  * do_sysinfo - fill in sysinfo struct
@@ -2575,7 +2577,9 @@ static int do_sysinfo(struct sysinfo *info)
 
info->procs = nr_threads_ve(ve);
 
-   get_avenrun_ve(info->loads, 0, SI_LOAD_SHIFT - FSHIFT);
+   /* does not fail on non-VE0 task group */
+   (void)get_avenrun_tg(NULL, info->loads,
+0, SI_LOAD_SHIFT - FSHIFT);
}
 
/*
-- 
2.28.0



[Devel] [PATCH rh8 3/3] vecalls: Introduce VZCTL_GET_CPU_STAT ioctl

2020-11-10 Thread Konstantin Khorenko
This vzctl ioctl is still used by the vzstat utility and by dispatcher/libvirt
statistics reporting.
From one point of view, almost all of the data can be obtained from the cpu
cgroup of a Container (missing data could be exported additionally),
but statistics are gathered often, and the ioctl is faster and requires less
CPU power, so let's keep it for now.
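fill_cpu_stat() (cut off in this archive right after tmp = avnrun[i] + (FIXED_1/200)) appears to follow the same rounding step the kernel uses for /proc/loadavg. A sketch of that conversion with the kernel's LOAD_INT()/LOAD_FRAC() macros; model_load and model_split_load() are hypothetical, since the real vz_load_avg layout is not shown here.

```c
#include <assert.h>

#define FSHIFT		11
#define FIXED_1		(1 << FSHIFT)
#define LOAD_INT(x)	((x) >> FSHIFT)
#define LOAD_FRAC(x)	LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Hypothetical "int.frac" pair for one load average sample. */
struct model_load {
	unsigned long int_part;
	unsigned long frac_part;	/* hundredths */
};

/*
 * Splits one raw avenrun[] value (FSHIFT fixed point) into an integer
 * part and hundredths. Adding FIXED_1/200 (about 0.005) first rounds to
 * the nearest hundredth instead of truncating.
 */
static struct model_load model_split_load(unsigned long avnrun)
{
	unsigned long tmp = avnrun + (FIXED_1 / 200);
	struct model_load l = { LOAD_INT(tmp), LOAD_FRAC(tmp) };

	return l;
}
```

For example, a raw value of 3072 yields 1.50, and 2047 (just under 1.0) rounds up to 1.00.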

The current patch is based on following vz7 commits:
  ecdce58b214c ("sched: Export per task_group statistics_work")
  a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
  75fc174adc36 ("sched: Port cpustat related patches")

Signed-off-by: Konstantin Khorenko 
---
 include/linux/ve.h  |  2 ++
 kernel/time/time.c  |  1 +
 kernel/ve/ve.c      | 18 +
 kernel/ve/vecalls.c | 66 +
 4 files changed, 87 insertions(+)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index 656ee43e383e..7cb416f342e7 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -201,10 +201,12 @@ struct seq_file;
 #if defined(CONFIG_VE) && defined(CONFIG_CGROUP_SCHED)
 int ve_show_cpu_stat(struct ve_struct *ve, struct seq_file *p);
 int ve_show_loadavg(struct ve_struct *ve, struct seq_file *p);
+int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun);
 int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat);
 #else
 static inline int ve_show_cpu_stat(struct ve_struct *ve, struct seq_file *p) { return -ENOSYS; }
 static inline int ve_show_loadavg(struct ve_struct *ve, struct seq_file *p) { return -ENOSYS; }
+static inline int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avenrun) { return -ENOSYS; }
 static inline int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat) { return -ENOSYS; }
 #endif
 
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 2b41e8e2d31d..ff1db0ba0c39 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -770,6 +770,7 @@ u64 nsec_to_clock_t(u64 x)
return div_u64(x * 9, (9ull * NSEC_PER_SEC + (USER_HZ / 2)) / USER_HZ);
 #endif
 }
+EXPORT_SYMBOL(nsec_to_clock_t);
 
 u64 jiffies64_to_nsecs(u64 j)
 {
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index a9afefc5b9de..29e98e6396dc 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1430,6 +1430,24 @@ int ve_show_loadavg(struct ve_struct *ve, struct 
seq_file *p)
return err;
 }
 
+inline struct task_group *css_tg(struct cgroup_subsys_state *css);
+int get_avenrun_tg(struct task_group *tg, unsigned long *loads,
+  unsigned long offset, int shift);
+
+int ve_get_cpu_avenrun(struct ve_struct *ve, unsigned long *avnrun)
+{
+   struct cgroup_subsys_state *css;
+   struct task_group *tg;
+   int err;
+
+   css = ve_get_init_css(ve, cpu_cgrp_id);
+   tg = css_tg(css);
+   err = get_avenrun_tg(tg, avnrun, 0, 0);
+   css_put(css);
+   return err;
+}
+EXPORT_SYMBOL(ve_get_cpu_avenrun);
+
 int cpu_cgroup_get_stat(struct cgroup_subsys_state *cpu_css,
struct cgroup_subsys_state *cpuacct_css,
struct kernel_cpustat *kstat);
diff --git a/kernel/ve/vecalls.c b/kernel/ve/vecalls.c
index 3258b49b15b2..786a743faa1a 100644
--- a/kernel/ve/vecalls.c
+++ b/kernel/ve/vecalls.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
@@ -35,6 +37,62 @@ static u64 ve_get_uptime(struct ve_struct *ve)
return ktime_get_boot_ns() - ve->real_start_time;
 }
 
+static int fill_cpu_stat(envid_t veid, struct vz_cpu_stat __user *buf)
+{
+   struct ve_struct *ve;
+   struct vz_cpu_stat *vstat;
+   int retval;
+   int i;
+   unsigned long tmp;
+   unsigned long avnrun[3];
+   struct kernel_cpustat kstat;
+
+   if (!ve_is_super(get_exec_env()) && (veid != get_exec_env()->veid))
+   return -EPERM;
+   ve = get_ve_by_id(veid);
+   if (!ve)
+   return -ESRCH;
+
+   retval = -ENOMEM;
+   vstat = kzalloc(sizeof(*vstat), GFP_KERNEL);
+   if (!vstat)
+   goto out_put_ve;
+
+   retval = ve_get_cpu_stat(ve, &kstat);
+   if (retval)
+   goto out_free;
+
+   retval = ve_get_cpu_avenrun(ve, avnrun);
+   if (retval)
+   goto out_free;
+
+   vstat->user_jif   = (unsigned long)nsec_to_clock_t(
+  kstat.cpustat[CPUTIME_USER]);
+   vstat->nice_jif   = (unsigned long)nsec_to_clock_t(
+  kstat.cpustat[CPUTIME_NICE]);
+   vstat->system_jif = (unsigned long)nsec_to_clock_t(
+  kstat.cpustat[CPUTIME_SYSTEM]);
+   vstat->idle_clk   = kstat.cpustat[CPUTIME_IDLE];
+   vstat->uptime_clk = ve_get_uptime(ve);
+
+   vstat->uptime_jif = (unsigned long)jiffies_64_to_clock_t(
+   get_jiffies_64() - ve->start_jiffies);
+   for (i = 0; i < 3; i++) {
+   tmp = avnrun[i] + (FIXED_1/200);
+   

[Devel] [PATCH rh8 1/3] vecalls: Add cpu stat measurement units comments to header

2020-11-10 Thread Konstantin Khorenko
It's not obvious why, say, the "user_jif" field does not contain
time in jiffies, so add clarifying comments.
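The "_jif" fields actually hold clock_t values, i.e. USER_HZ ticks as produced by nsec_to_clock_t(), not kernel jiffies (HZ). A sketch of that conversion for the common case where NSEC_PER_SEC is an exact multiple of USER_HZ; the USER_HZ value of 100 is an assumption (typical on x86), and model_nsec_to_clock_t() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define MODEL_USER_HZ	100	/* assumed USER_HZ; typical on x86 */

/*
 * Models nsec_to_clock_t() when NSEC_PER_SEC % USER_HZ == 0:
 * nanoseconds are divided down to whole USER_HZ ticks (clock_t).
 */
static uint64_t model_nsec_to_clock_t(uint64_t ns)
{
	return ns / (NSEC_PER_SEC / MODEL_USER_HZ);
}
```

With these assumptions 1.5 s of user time (1500000000 ns) is reported as user_jif = 150, independent of the kernel's HZ, while idle_clk and uptime_clk stay in nanoseconds.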

Fixes: 248ed6b2a193 ("ve: Add vecalls")

Signed-off-by: Konstantin Khorenko 
---
 include/uapi/linux/vzcalluser.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/vzcalluser.h b/include/uapi/linux/vzcalluser.h
index f2584b4b284f..8a4ff0015e40 100644
--- a/include/uapi/linux/vzcalluser.h
+++ b/include/uapi/linux/vzcalluser.h
@@ -55,13 +55,13 @@ struct vz_load_avg {
 };
 
 struct vz_cpu_stat {
-   unsigned long   user_jif;
-   unsigned long   nice_jif;
-   unsigned long   system_jif;
-   unsigned long   uptime_jif;
-   __u64   idle_clk;
-   __u64   strv_clk;
-   __u64   uptime_clk;
+   unsigned long   user_jif;   /* clock_t */
+   unsigned long   nice_jif;   /* clock_t */
+   unsigned long   system_jif; /* clock_t */
+   unsigned long   uptime_jif; /* clock_t */
+   __u64   idle_clk;   /* ns */
+   __u64   strv_clk;   /* deprecated */
+   __u64   uptime_clk; /* ns */
struct vz_load_avg  avenrun[3]; /* loadavg data */
 };
 
-- 
2.28.0



[Devel] [PATCH rh8 0/3] vecalls: Implement VZCTL_GET_CPU_STAT ioctl

2020-11-10 Thread Konstantin Khorenko
Used by vzstat/dispatcher/libvirt.
Faster than parsing Container's cpu cgroup files.

Konstantin Khorenko (3):
  vecalls: Add cpu stat measurement units comments to header
  ve/sched/loadavg: Provide task_group parameter to get_avenrun_ve()
  vecalls: Introduce VZCTL_GET_CPU_STAT ioctl

 include/linux/sched/loadavg.h   |  2 -
 include/linux/ve.h              |  2 +
 include/uapi/linux/vzcalluser.h | 14 +++
 kernel/sched/loadavg.c          | 12 +-
 kernel/sys.c                    |  6 ++-
 kernel/time/time.c              |  1 +
 kernel/ve/ve.c                  | 18 +
 kernel/ve/vecalls.c             | 66 +
 8 files changed, 109 insertions(+), 12 deletions(-)

-- 
2.28.0
