[Devel] [PATCH rh8] sched/stat: account forks per task group

2020-10-30 Thread Konstantin Khorenko
From: Vladimir Davydov 

This is a backport of diff-sched-account-forks-per-task-group:

 Subject: sched: account forks per task group
 Date: Fri, 28 Dec 2012 15:09:46 +0400

* [sched] the number of processes should be reported correctly
inside a CT in /proc/stat (PSBM-18113)

For /proc/stat:processes to be correct inside containers.

https://jira.sw.ru/browse/PSBM-18113

Signed-off-by: Vladimir Davydov 

(cherry picked from vz7 commit 0a927bf02fd873f4e9bad7c4df0c201bf9b48274)
Signed-off-by: Konstantin Khorenko 
---
 kernel/sched/cpuacct.c | 4 +++-
 kernel/sched/fair.c   | 1 +
 kernel/sched/sched.h   | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 2814ea059bb3..0ba19cce9fac 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -652,6 +652,7 @@ int cpu_cgroup_proc_stat(struct cgroup_subsys_state *cpu_css,
unsigned long tg_nr_running = 0;
unsigned long tg_nr_iowait = 0;
unsigned long long tg_nr_switches = 0;
+   unsigned long tg_nr_forks = 0;
 
getboottime64(&boottime);
 
@@ -671,6 +672,7 @@ int cpu_cgroup_proc_stat(struct cgroup_subsys_state *cpu_css,
tg_nr_running += tg->cfs_rq[i]->h_nr_running;
tg_nr_iowait  += tg->cfs_rq[i]->nr_iowait;
tg_nr_switches += tg->cfs_rq[i]->nr_switches;
+   tg_nr_forks   += tg->cfs_rq[i]->nr_forks;
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
tg_nr_running += tg->rt_rq[i]->rt_nr_running;
@@ -746,7 +748,7 @@ int cpu_cgroup_proc_stat(struct cgroup_subsys_state *cpu_css,
   "procs_blocked %lu\n",
   tg_nr_switches,
   (unsigned long long)boot_sec,
-  total_forks,
+  tg_nr_forks,
   tg_nr_running,
   tg_nr_iowait);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b9bb108625a..892329471df1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10300,6 +10300,7 @@ static void task_fork_fair(struct task_struct *p)
}
 
se->vruntime -= cfs_rq->min_vruntime;
+   cfs_rq->nr_forks++;
rq_unlock(rq, &rf);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d55b45f1ea6..ccd8ad478a08 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -545,6 +545,7 @@ struct cfs_rq {
struct sched_entity *prev;
 
u64 nr_switches;
+   unsigned long nr_forks;
 
 #ifdef CONFIG_SCHED_DEBUG
unsigned intnr_spread_over;
-- 
2.28.0

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RH7 v2] ve: Reorder ve->ve_ns assignment in ve_grab_context()

2020-10-30 Thread Kirill Tkhai
This function must provide a guarantee for readers: seeing
ve_ns != NULL under rcu_read_lock() means the rest of the context
(say, ve->init_task) is stable.

But currently the order is wrong and does not guarantee that. Fix it.

v2: Use a local variable for ve_ns, otherwise the net_ns write results
in a NULL pointer dereference.

Signed-off-by: Kirill Tkhai 
---
 kernel/ve/ve.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index db26cbd41d3f..cfc3039bb85b 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -579,15 +579,18 @@ static void ve_stop_kthread(struct ve_struct *ve)
 static void ve_grab_context(struct ve_struct *ve)
 {
struct task_struct *tsk = current;
+   struct nsproxy *ve_ns;
 
get_task_struct(tsk);
ve->init_task = tsk;
ve->root_css_set = tsk->cgroups;
get_css_set(ve->root_css_set);
ve->init_cred = (struct cred *)get_current_cred();
-   rcu_assign_pointer(ve->ve_ns, get_nsproxy(tsk->nsproxy));
-   ve->ve_netns =  get_net(ve->ve_ns->net_ns);
+   ve_ns = get_nsproxy(tsk->nsproxy);
+   ve->ve_netns =  get_net(ve_ns->net_ns);
synchronize_rcu();
+
+   rcu_assign_pointer(ve->ve_ns, ve_ns);
 }
 
 static void ve_drop_context(struct ve_struct *ve)


___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH vz8 1/3] arch/x86: introduce cpuid override

2020-10-30 Thread Andrey Ryabinin
From: Vladimir Davydov 

Port diff-arch-x86-introduce-cpuid-override

Recent Intel CPUs have dropped CPUID masking, which is required for flex
migration, in favor of CPUID faulting, so we need to support the latter
in the kernel.

This patch adds user writable file /proc/vz/cpuid_override, which
contains CPUID override table. Each table entry must have the following
format:

  op[ count]: eax ebx ecx edx

where @op and an optional @count define the CPUID function whose output
one would like to override (@op and @count are loaded into the EAX and
ECX registers respectively before calling CPUID); @eax, @ebx, @ecx and
@edx are the desired CPUID output for the specified function. All
values must be in hex; the 0x prefix is optional.

Notes:

 - the file is only present on hosts that support CPUID faulting;
 - CPUID faulting is always enabled if it is supported;
 - CPUID output is overridden on all present CPUs;
 - at most 16 entries can be overridden;
 - each write(2) to the file removes all existing entries before adding
   new ones, so the whole table must be written in one write(2); in
   particular writing an empty line to the file removes all existing
   rules.

Example:

Suppose we want to mask out SSE2 (CPUID.01H:EDX:26) and RDTSCP
(CPUID.80000001H:EDX:27). Then we should execute the following sequence:

 - get the current cpuid value:

   # cpuid -r | grep -e '^\s*0x00000001' -e '^\s*0x80000001' | head -n 2
  0x00000001 0x00: eax=0x000306e4 ebx=0x00200800 ecx=0x7fbee3ff edx=0xbfebfbff
  0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x2c100800

 - clear the feature bits we want to mask out and write the result to
   /proc/vz/cpuid_override:

   # cat >/proc/vz/cpuid_override 

[Devel] [PATCH vz8 3/3] x86: Show vcpu cpuflags in cpuinfo

2020-10-30 Thread Andrey Ryabinin
From: Kirill Tkhai 

Show cpu_i flags as flags of vcpu_i.

Extracted from "Initial patch". Merged several reworks.

TODO: Maybe replace/rework on_each_cpu() with smp_call_function_single().
Then we won't need the c_start() split from the previous patch (as the
call function will be invoked right before a specific cpu is prepared
for showing). This should be rather easy.
[aryabinin: Don't see what it buys us, so I didn't try to implement it]

Signed-off-by: Kirill Tkhai 

https://jira.sw.ru/browse/PSBM-121823
[aryabinin:vz8 rebase]
Signed-off-by: Andrey Ryabinin 
---
 arch/x86/kernel/cpu/proc.c | 63 +++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index d6b17a60acf6..4fe1577d5e6f 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -4,6 +4,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "cpu.h"
 
@@ -58,10 +60,54 @@ extern void __do_cpuid_fault(unsigned int op, unsigned int count,
 unsigned int *eax, unsigned int *ebx,
 unsigned int *ecx, unsigned int *edx);
 
+struct cpu_flags {
+   u32 val[NCAPINTS];
+};
+
+static DEFINE_PER_CPU(struct cpu_flags, cpu_flags);
+
+static void init_cpu_flags(void *dummy)
+{
+   int cpu = smp_processor_id();
+   struct cpu_flags *flags = &per_cpu(cpu_flags, cpu);
+   struct cpuinfo_x86 *c = &cpu_data(cpu);
+   unsigned int eax, ebx, ecx, edx;
+
+   memcpy(flags->val, c->x86_capability, NCAPINTS * sizeof(u32));
+
+   /*
+* Clear feature bits masked using cpuid masking/faulting.
+*/
+
+   if (c->cpuid_level >= 0x00000001) {
+   __do_cpuid_fault(0x00000001, 0, &eax, &ebx, &ecx, &edx);
+   flags->val[4] &= ecx;
+   flags->val[0] &= edx;
+   }
+
+   if (c->cpuid_level >= 0x00000007) {
+   __do_cpuid_fault(0x00000007, 0, &eax, &ebx, &ecx, &edx);
+   flags->val[9] &= ebx;
+   }
+
+   if ((c->extended_cpuid_level & 0xffff0000) == 0x80000000 &&
+   c->extended_cpuid_level >= 0x80000001) {
+   __do_cpuid_fault(0x80000001, 0, &eax, &ebx, &ecx, &edx);
+   flags->val[6] &= ecx;
+   flags->val[1] &= edx;
+   }
+
+   if (c->cpuid_level >= 0x0000000d) {
+   __do_cpuid_fault(0x0000000d, 1, &eax, &ebx, &ecx, &edx);
+   flags->val[10] &= eax;
+   }
+}
+
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
struct cpuinfo_x86 *c = v;
unsigned int cpu;
+   int is_super = ve_is_super(get_exec_env());
int i;
 
cpu = c->cpu_index;
@@ -103,7 +149,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 
seq_puts(m, "flags\t\t:");
for (i = 0; i < 32*NCAPINTS; i++)
-   if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
+   if (x86_cap_flags[i] != NULL &&
+   ((is_super && cpu_has(c, i)) ||
+(!is_super && test_bit(i, (unsigned long *)
+   &per_cpu(cpu_flags, cpu)))))
seq_printf(m, " %s", x86_cap_flags[i]);
 
seq_puts(m, "\nbugs\t\t:");
@@ -145,18 +194,24 @@ static int show_cpuinfo(struct seq_file *m, void *v)
return 0;
 }
 
-static void *c_start(struct seq_file *m, loff_t *pos)
+static void *__c_start(struct seq_file *m, loff_t *pos)
 {
*pos = cpumask_next(*pos - 1, cpu_online_mask);
-   if ((*pos) < nr_cpu_ids)
+   if (bitmap_weight(cpumask_bits(cpu_online_mask), *pos) < num_online_vcpus())
return &cpu_data(*pos);
return NULL;
 }
 
+static void *c_start(struct seq_file *m, loff_t *pos)
+{
+   on_each_cpu(init_cpu_flags, NULL, 1);
+   return __c_start(m, pos);
+}
+
 static void *c_next(struct seq_file *m, void *v, loff_t *pos)
 {
(*pos)++;
-   return c_start(m, pos);
+   return __c_start(m, pos);
 }
 
 static void c_stop(struct seq_file *m, void *v)
-- 
2.26.2



[Devel] [PATCH vz8 2/3] x86: make ARCH_[SET|GET]_CPUID friends with /proc/vz/cpuid_override

2020-10-30 Thread Andrey Ryabinin
We use cpuid faults to emulate cpuid in containers. This conflicts
with arch_prctl(ARCH_SET_CPUID, 0), which enables cpuid faulting so
that the cpuid instruction causes SIGSEGV.

Add a TIF_CPUID_OVERRIDE thread info flag, which is set on all
!ve0 tasks, and check it along with TIF_NOCPUID to decide whether
cpuid faulting needs to be enabled or disabled.

https://jira.sw.ru/browse/PSBM-121823
Signed-off-by: Andrey Ryabinin 
---
 arch/x86/include/asm/thread_info.h |  4 +++-
 arch/x86/kernel/cpuid_fault.c  |  3 ++-
 arch/x86/kernel/process.c  | 13 +
 arch/x86/kernel/traps.c|  3 +++
 kernel/ve/ve.c |  1 +
 5 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index c0da378eed8b..6ffb64d25383 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC  16  /* TSC is not accessible in userland */
 #define TIF_IA32   17  /* IA32 compatibility process */
+#define TIF_CPUID_OVERRIDE 18  /* CPUID emulation enabled */
 #define TIF_NOHZ   19  /* in adaptive nohz mode */
 #define TIF_MEMDIE 20  /* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID   (1 << TIF_NOCPUID)
 #define _TIF_NOTSC (1 << TIF_NOTSC)
 #define _TIF_IA32  (1 << TIF_IA32)
+#define _TIF_CPUID_OVERRIDE(1 << TIF_CPUID_OVERRIDE)
 #define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -153,7 +155,7 @@ struct thread_info {
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE   \
(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \
-_TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
+_TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_CPUID_OVERRIDE)
 
 /*
  * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
diff --git a/arch/x86/kernel/cpuid_fault.c b/arch/x86/kernel/cpuid_fault.c
index 339e2638c3b8..1e8ffacc4412 100644
--- a/arch/x86/kernel/cpuid_fault.c
+++ b/arch/x86/kernel/cpuid_fault.c
@@ -6,7 +6,8 @@
 #include 
 #include 
 #include 
-#include 
+#include 
+#include 
 
 struct cpuid_override_entry {
unsigned int op;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e5c5b1d724ab..788b9b8f8f9c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,7 +209,8 @@ static void set_cpuid_faulting(bool on)
 static void disable_cpuid(void)
 {
preempt_disable();
-   if (!test_and_set_thread_flag(TIF_NOCPUID)) {
+   if (!test_and_set_thread_flag(TIF_NOCPUID) ||
+   test_thread_flag(TIF_CPUID_OVERRIDE)) {
/*
 * Must flip the CPU state synchronously with
 * TIF_NOCPUID in the current running context.
@@ -222,7 +223,8 @@ static void disable_cpuid(void)
 static void enable_cpuid(void)
 {
preempt_disable();
-   if (test_and_clear_thread_flag(TIF_NOCPUID)) {
+   if (test_and_clear_thread_flag(TIF_NOCPUID) &&
+   !test_thread_flag(TIF_CPUID_OVERRIDE)) {
/*
 * Must flip the CPU state synchronously with
 * TIF_NOCPUID in the current running context.
@@ -505,6 +507,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 {
struct thread_struct *prev, *next;
unsigned long tifp, tifn;
+   bool prev_cpuid, next_cpuid;
 
prev = &prev_p->thread;
next = &next_p->thread;
@@ -529,8 +532,10 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
if ((tifp ^ tifn) & _TIF_NOTSC)
cr4_toggle_bits_irqsoff(X86_CR4_TSD);
 
-   if ((tifp ^ tifn) & _TIF_NOCPUID)
-   set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
+   prev_cpuid = (tifp & _TIF_NOCPUID) || (tifp & _TIF_CPUID_OVERRIDE);
+   next_cpuid = (tifn & _TIF_NOCPUID) || (tifn & _TIF_CPUID_OVERRIDE);
+   if (prev_cpuid != next_cpuid)
+   set_cpuid_faulting(next_cpuid);
 
if (likely(!((tifp | tifn) & _TIF_SPEC_FORCE_UPDATE))) {
__speculation_ctrl_update(tifp, tifn);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c43e3b80e50f..d0b379cf0484 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -526,6 +526,9 @@ static int check_cpuid_fault(struct pt_regs *regs, long error_code)
if (error_code != 0)
return 0;
 
+   if (test_thread_flag(TIF_NOCPUID))
+

[Devel] [PATCH rh8 2/3] ve/time/stat: idle time virtualization in /proc/loadavg

2020-10-30 Thread Konstantin Khorenko
The patch is based on following vz7 commits:
  a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
  75fc174adc36 ("sched: Port cpustat related patches")

Fixes: a3c4d1d8f383 ("ve/time: Customize VE uptime")

TODO: separate the FIXME hunks from a3c4d1d8f383 ("ve/time: Customize VE
uptime") and merge them into this commit.

Signed-off-by: Konstantin Khorenko 
---
 fs/proc/uptime.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c
index bc07d42ce9f5..dae407953903 100644
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -23,37 +23,24 @@ static inline void get_ve0_idle(struct timespec64 *idle)
idle->tv_nsec = rem;
 }
 
-static inline void get_veX_idle(struct timespec *idle, struct cgroup* cgrp)
+static inline void get_veX_idle(struct ve_struct *ve, struct timespec64 *idle)
 {
-#if 0
-FIXME: to be reworked anyway in
-   "Use ve init task's css instead of opening cgroup via vfs"
-
struct kernel_cpustat kstat;
 
-   cpu_cgroup_get_stat(cgrp, &kstat);
-   *idle = ns_to_timespec(kstat.cpustat[CPUTIME_IDLE]);
-#endif
+   ve_get_cpu_stat(ve, &kstat);
+   *idle = ns_to_timespec64(kstat.cpustat[CPUTIME_IDLE]);
 }
 
 static int uptime_proc_show(struct seq_file *m, void *v)
 {
struct timespec uptime, offset;
struct timespec64 idle;
+   struct ve_struct *ve = get_exec_env();
 
-   if (ve_is_super(get_exec_env()))
-   get_ve0_idle(&idle);
-   else {
+   if (ve_is_super(ve))
get_ve0_idle(&idle);
-#if 0
-FIXME:  to be reworked anyway in
-"Use ve init task's css instead of opening cgroup via vfs"
-
-   rcu_read_lock();
-   get_veX_idle(&idle, task_cgroup(current, cpu_cgroup_subsys_id));
-   rcu_read_unlock();
-#endif
-   }
+   else
+   get_veX_idle(ve, &idle);
 
get_monotonic_boottime(&uptime);
 #ifdef CONFIG_VE
-- 
2.28.0



[Devel] [PATCH rh8 1/3] ve/sched/stat: Introduce handler for getting CT cpu statistics

2020-10-30 Thread Konstantin Khorenko
It will be used later in
  * idle cpu stat virtualization in /proc/loadavg
  * /proc/vz/vestat output
  * VZCTL_GET_CPU_STAT ioctl

The patch is based on following vz7 commits:
  ecdce58b214c ("sched: Export per task_group statistics_work")
  75fc174adc36 ("sched: Port cpustat related patches")
  a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")

Signed-off-by: Konstantin Khorenko 
---
 include/linux/ve.h |  2 ++
 kernel/sched/cpuacct.c | 24 
 kernel/ve/ve.c | 18 ++
 3 files changed, 44 insertions(+)

diff --git a/include/linux/ve.h b/include/linux/ve.h
index d88e4715a222..656ee43e383e 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -201,9 +201,11 @@ struct seq_file;
 #if defined(CONFIG_VE) && defined(CONFIG_CGROUP_SCHED)
 int ve_show_cpu_stat(struct ve_struct *ve, struct seq_file *p);
 int ve_show_loadavg(struct ve_struct *ve, struct seq_file *p);
+int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat);
 #else
 static inline int ve_show_cpu_stat(struct ve_struct *ve, struct seq_file *p) { return -ENOSYS; }
 static inline int ve_show_loadavg(struct ve_struct *ve, struct seq_file *p) { return -ENOSYS; }
+static inline int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat) { return -ENOSYS; }
 #endif
 
 #endif /* _LINUX_VE_H */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 0ba19cce9fac..df5fe01c8f24 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -754,3 +754,27 @@ int cpu_cgroup_proc_stat(struct cgroup_subsys_state *cpu_css,
 
return 0;
 }
+
+int cpu_cgroup_get_stat(struct cgroup_subsys_state *cpu_css,
+   struct cgroup_subsys_state *cpuacct_css,
+   struct kernel_cpustat *kstat)
+{
+   struct task_group *tg = css_tg(cpu_css);
+   int nr_vcpus = tg_nr_cpus(tg);
+   int i;
+
+   kernel_cpustat_zero(kstat);
+
+   if (tg == &root_task_group)
+   return -ENOENT;
+
+   for_each_possible_cpu(i)
+   cpu_cgroup_update_stat(cpu_css, cpuacct_css, i);
+
+   cpu_cgroup_update_vcpustat(cpu_css, cpuacct_css);
+
+   for (i = 0; i < nr_vcpus; i++)
+   kernel_cpustat_add(tg->vcpustat + i, kstat, kstat);
+
+   return 0;
+}
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index eeb1947a7d53..10cebe10beab 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1425,4 +1425,22 @@ int ve_show_loadavg(struct ve_struct *ve, struct seq_file *p)
css_put(css);
return err;
 }
+
+int cpu_cgroup_get_stat(struct cgroup_subsys_state *cpu_css,
+   struct cgroup_subsys_state *cpuacct_css,
+   struct kernel_cpustat *kstat);
+
+int ve_get_cpu_stat(struct ve_struct *ve, struct kernel_cpustat *kstat)
+{
+   struct cgroup_subsys_state *cpu_css, *cpuacct_css;
+   int err;
+
+   cpu_css = ve_get_init_css(ve, cpu_cgrp_id);
+   cpuacct_css = ve_get_init_css(ve, cpuacct_cgrp_id);
+   err = cpu_cgroup_get_stat(cpu_css, cpuacct_css, kstat);
+   css_put(cpuacct_css);
+   css_put(cpu_css);
+   return err;
+}
+EXPORT_SYMBOL(ve_get_cpu_stat);
 #endif /* CONFIG_CGROUP_SCHED */
-- 
2.28.0



[Devel] [PATCH rh8 3/3] ve/vestat: Introduce /proc/vz/vestat

2020-10-30 Thread Konstantin Khorenko
The patch is based on following vz7 commits:

  f997bf6c613a ("ve: initial patch")
  75fc174adc36 ("sched: Port cpustat related patches")
  09e1cb4a7d4d ("ve/proc: restricted proc-entries scope")
  a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")

Signed-off-by: Konstantin Khorenko 
---
 kernel/ve/vecalls.c | 98 +
 1 file changed, 98 insertions(+)

diff --git a/kernel/ve/vecalls.c b/kernel/ve/vecalls.c
index 78773c21b8db..3258b49b15b2 100644
--- a/kernel/ve/vecalls.c
+++ b/kernel/ve/vecalls.c
@@ -30,6 +30,11 @@
 #include 
 #include 
 
+static u64 ve_get_uptime(struct ve_struct *ve)
+{
+   return ktime_get_boot_ns() - ve->real_start_time;
+}
+
 /**
  **
  *
@@ -38,6 +43,74 @@
  **
  **/
 #ifdef CONFIG_PROC_FS
+#if BITS_PER_LONG == 32
+#define VESTAT_LINE_WIDTH (6 * 11 + 6 * 21)
+#define VESTAT_LINE_FMT "%10s %10lu %10lu %10lu %10Lu %20Lu %20Lu %20Lu %20Lu %20Lu %20Lu %10lu\n"
+#define VESTAT_HEAD_FMT "%10s %10s %10s %10s %10s %20s %20s %20s %20s %20s %20s %10s\n"
+#else
+#define VESTAT_LINE_WIDTH (12 * 21)
+#define VESTAT_LINE_FMT "%20s %20lu %20lu %20lu %20Lu %20Lu %20Lu %20Lu %20Lu %20Lu %20Lu %20lu\n"
+#define VESTAT_HEAD_FMT "%20s %20s %20s %20s %20s %20s %20s %20s %20s %20s %20s %20s\n"
+#endif
+
+static int vestat_seq_show(struct seq_file *m, void *v)
+{
+   struct list_head *entry;
+   struct ve_struct *ve;
+   struct ve_struct *curve;
+   int ret;
+   unsigned long user_ve, nice_ve, system_ve;
+   unsigned long long uptime;
+   u64 uptime_cycles, idle_time, strv_time, used;
+   struct kernel_cpustat kstat;
+
+   entry = (struct list_head *)v;
+   ve = list_entry(entry, struct ve_struct, ve_list);
+
+   curve = get_exec_env();
+   if (entry == ve_list_head.next ||
+   (!ve_is_super(curve) && ve == curve)) {
+   /* print header */
+   seq_printf(m, "%-*s\n",
+  VESTAT_LINE_WIDTH - 1,
+  "Version: 2.2");
+   seq_printf(m, VESTAT_HEAD_FMT, "VEID",
+  "user", "nice", "system",
+  "uptime", "idle",
+  "strv", "uptime", "used",
+  "maxlat", "totlat", "numsched");
+   }
+
+   if (ve == get_ve0())
+   return 0;
+
+   ret = ve_get_cpu_stat(ve, &kstat);
+   if (ret)
+   return ret;
+
+   strv_time   = 0;
+   user_ve = nsecs_to_jiffies(kstat.cpustat[CPUTIME_USER]);
+   nice_ve = nsecs_to_jiffies(kstat.cpustat[CPUTIME_NICE]);
+   system_ve   = nsecs_to_jiffies(kstat.cpustat[CPUTIME_SYSTEM]);
+   used= kstat.cpustat[CPUTIME_USED];
+   idle_time   = kstat.cpustat[CPUTIME_IDLE];
+
+   uptime_cycles = ve_get_uptime(ve);
+   uptime = get_jiffies_64() - ve->start_jiffies;
+
+   seq_printf(m, VESTAT_LINE_FMT, ve_name(ve),
+  user_ve, nice_ve, system_ve,
+  (unsigned long long)uptime,
+  (unsigned long long)idle_time,
+  (unsigned long long)strv_time,
+  (unsigned long long)uptime_cycles,
+  (unsigned long long)used,
+  (unsigned long long)ve->sched_lat_ve.last.maxlat,
+  (unsigned long long)ve->sched_lat_ve.last.totlat,
+  ve->sched_lat_ve.last.count);
+   return 0;
+}
+
 static void *ve_seq_start(struct seq_file *m, loff_t *pos)
 {
struct ve_struct *curve;
@@ -66,6 +139,25 @@ static void ve_seq_stop(struct seq_file *m, void *v)
mutex_unlock(&ve_list_lock);
 }
 
+static struct seq_operations vestat_seq_op = {
+   .start  = ve_seq_start,
+   .next   = ve_seq_next,
+   .stop   = ve_seq_stop,
+   .show   = vestat_seq_show
+};
+
+static int vestat_open(struct inode *inode, struct file *file)
+{
+   return seq_open(file, &vestat_seq_op);
+}
+
+static struct file_operations proc_vestat_operations = {
+   .open   = vestat_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release = seq_release
+};
+
 static int devperms_seq_show(struct seq_file *m, void *v)
 {
struct ve_struct *ve = list_entry(v, struct ve_struct, ve_list);
@@ -181,6 +273,11 @@ static int __init init_vecalls_proc(void)
 {
struct proc_dir_entry *de;
 
+   de = proc_create("vestat", S_IFREG | S_IRUSR | S_ISVTX, proc_vz_dir,
+   &proc_vestat_operations);
+   if (!de)
+   printk(KERN_WARNING "VZMON: can't make vestat proc entry\n");
+
de = proc_create("devperms", S_IFREG | S_IRUSR, proc_vz_dir,
 

Re: [Devel] [PATCH rh8] sched/stat: account ctxsw per task group

2020-10-30 Thread Andrey Ryabinin



On 10/29/20 6:46 PM, Konstantin Khorenko wrote:
> From: Vladimir Davydov 
> 
> This is a backport of diff-sched-account-ctxsw-per-task-group:
> 
>  Subject: sched: account ctxsw per task group
>  Date: Fri, 28 Dec 2012 15:09:45 +0400
> 
> * [sched] the number of context switches should be reported correctly
> inside a CT in /proc/stat (PSBM-18113)
> 
> For /proc/stat:ctxt to be correct inside containers.
> 
> https://jira.sw.ru/browse/PSBM-18113
> 
> Signed-off-by: Vladimir Davydov 
> 
> (cherry picked from vz7 commit d388f0bf64adb74cd62c4deff58e181bd63d62ac)
> Signed-off-by: Konstantin Khorenko 
> ---

Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH rh8 0/8] ve/proc/sched/stat: Virtualize /proc/stat in a Container

2020-10-30 Thread Andrey Ryabinin



On 10/28/20 6:57 PM, Konstantin Khorenko wrote:
> This patchset contains of parts of following vz7 commits:
> 
>   a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
>   ecdce58b214c ("sched: Export per task_group statistics_work")
>   fc24d1785a28 ("fs/proc: print fairshed stat")
>   75fc174adc36 ("sched: Port cpustat related patches")
>   3c7b1e52294c ("ve/sched: Hide steal time from inside CT")
>   3d34f0d3b529 ("proc/cpu/cgroup: make boottime in CT reveal the real start time")
>   715f311fdb4a ("sched: Account task_group::cpustat,taskstats,avenrun")
> 
> Known issues:
>  - context switches ("ctxt") and number of forks ("processes")
>virtualization is TBD
>  - "procs_blocked" reported is incorrect, to be fixed by later patches
> 
> Konstantin Khorenko (8):
>   ve/cgroup: export cgroup_get_ve_root1() + cleanup
>   kernel/stat: Introduce kernel_cpustat operation wrappers
>   ve/sched/stat: Add basic infrastructure for vcpu statistics
>   ve/sched/stat: Introduce functions to calculate vcpustat data
>   ve/proc/stat: Introduce /proc/stat virtualized handler for Containers
>   ve/proc/stat: Wire virtualized /proc/stat handler
>   ve/proc/stat: Introduce CPUTIME_USED field in cpustat statistic
>   sched: Fix task_group "iowait_sum" statistic accounting
> 
>  fs/proc/stat.c  |  10 +
>  include/linux/kernel_stat.h |  37 
>  include/linux/ve.h  |   8 +
>  kernel/cgroup/cgroup.c  |   6 +-
>  kernel/sched/core.c |  17 +-
>  kernel/sched/cpuacct.c  | 377 
>  kernel/sched/fair.c |   3 +-
>  kernel/sched/sched.h|   5 +
>  kernel/ve/ve.c  |  17 ++
>  9 files changed, 475 insertions(+), 5 deletions(-)
> 

Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH rh8 1/3] ve/sched/stat: Introduce handler for getting CT cpu statistics

2020-10-30 Thread Andrey Ryabinin



On 10/30/20 4:08 PM, Konstantin Khorenko wrote:
> It will be used later in
>   * idle cpu stat virtualization in /proc/loadavg
>   * /proc/vz/vestat output
>   * VZCTL_GET_CPU_STAT ioctl
> 
> The patch is based on following vz7 commits:
>   ecdce58b214c ("sched: Export per task_group statistics_work")
>   75fc174adc36 ("sched: Port cpustat related patches")
>   a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
> 
> Signed-off-by: Konstantin Khorenko 

Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH rh8 2/3] ve/time/stat: idle time virtualization in /proc/loadavg

2020-10-30 Thread Andrey Ryabinin



On 10/30/20 4:08 PM, Konstantin Khorenko wrote:
> The patch is based on following vz7 commits:
>   a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
>   75fc174adc36 ("sched: Port cpustat related patches")
> 
> Fixes: a3c4d1d8f383 ("ve/time: Customize VE uptime")
> 
> TODO: to separate FIXME hunks from a3c4d1d8f383 ("ve/time: Customize VE
> uptime") and merge them into this commit
> 
> Signed-off-by: Konstantin Khorenko 
> ---
Reviewed-by: Andrey Ryabinin 


Re: [Devel] [PATCH rh8 3/3] ve/vestat: Introduce /proc/vz/vestat

2020-10-30 Thread Andrey Ryabinin



On 10/30/20 4:08 PM, Konstantin Khorenko wrote:
> The patch is based on following vz7 commits:
> 
>   f997bf6c613a ("ve: initial patch")
>   75fc174adc36 ("sched: Port cpustat related patches")
>   09e1cb4a7d4d ("ve/proc: restricted proc-entries scope")
>   a58fb58bff1c ("Use ve init task's css instead of opening cgroup via vfs")
> 
> Signed-off-by: Konstantin Khorenko 
> ---

Reviewed-by: Andrey Ryabinin 