[Devel] [PATCH RHEL7 COMMIT] mm/memcg: optimize mem_cgroup_enough_memory()

2020-10-12 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.36
-->
commit 42198a2e87454cd34c2a61a86e4ffb90d0b19e0c
Author: Andrey Ryabinin 
Date:   Mon Oct 12 19:07:53 2020 +0300

mm/memcg: optimize mem_cgroup_enough_memory()

mem_cgroup_enough_memory() iterates memcg's subtree to account
'MEM_CGROUP_STAT_CACHE - MEM_CGROUP_STAT_SHMEM'.

Fortunately we can just read memcg->cache counter instead
as it's hierarchical (includes subgroups) and doesn't account
shmem.

https://jira.sw.ru/browse/PSBM-120968
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe06c7d..e0e113b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4723,11 +4723,7 @@ int mem_cgroup_enough_memory(struct mem_cgroup *memcg, long pages)
 	free += page_counter_read(&memcg->dcache);
 
 	/* assume file cache is reclaimable */
-	free += mem_cgroup_recursive_stat2(memcg, MEM_CGROUP_STAT_CACHE);
-
-	/* but do not count shmem pages as they can't be purged,
-	 * only swapped out */
-	free -= mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_SHMEM);
+	free += page_counter_read(&memcg->cache);
 
 	return free < pages ? -ENOMEM : 0;
 }
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RHEL7 COMMIT] bcache: fix NULL pointer deref in blk_add_request_payload

2020-10-12 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.36
-->
commit c71f88419bd7eca486997ccc8d4e377240163145
Author: Lars Ellenberg 
Date:   Mon Oct 12 19:07:47 2020 +0300

bcache: fix NULL pointer deref in blk_add_request_payload

[https://lkml.org/lkml/2014/2/19/264]

bch_generic_make_request_hack() tries to be smart
and fakes bi_max_vecs = bi_vcnt.

If those bios have been REQ_DISCARD, and get submitted to a driver
(md raid) that uses bio_clone, the clone will end up with bi_io_vec == NULL,
passed down the stack, end up in sd_prep_fn and blk_add_request_payload,
which then tries to use bio->bi_io_vec->page.

Fix: try to be even smarter in bch_generic_make_request_hack(),
and always pretend to have at least bi_max_vecs of 1,
unless the incoming bio was already created without a single bvec.

Signed-off-by: Lars Ellenberg 

https://jira.sw.ru/browse/PSBM-121142

The fix did not make it into the mainline or stable kernels but it was not
rejected either, just forgotten.

The problem was fixed in kernel 3.14 by commit
e90abc8ec323 ("block: Remove bi_idx hacks") and its prerequisites, which are
rather invasive.

Signed-off-by: Evgenii Shatokhin 
---
 drivers/md/bcache/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index d285cd4..4482c09 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -45,7 +45,7 @@ static void bch_generic_make_request_hack(struct bio *bio)
 	 *
 	 * To be taken out once immutable bvec stuff is in.
 	 */
-	bio->bi_max_vecs = bio->bi_vcnt;
+	bio->bi_max_vecs = bio->bi_vcnt ?: (bio->bi_io_vec ? 1 : 0);
 
 	generic_make_request(bio);
 }


[Devel] [PATCH RHEL7 COMMIT] ploop: Wake up on last discard bio from ploop_complete_request()

2020-10-12 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.36
-->
commit 3d869e1c074a5ca9cecee94d1b604e247f35be7e
Author: Kirill Tkhai 
Date:   Mon Oct 12 19:07:59 2020 +0300

ploop: Wake up on last discard bio from ploop_complete_request()

A concurrent thread waits for this on pending_waitq.

https://jira.sw.ru/browse/PSBM-121135
Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index d6edbfb..320a5d5 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1461,8 +1461,11 @@ static void ploop_complete_request(struct ploop_request * preq)
 	 */
 	spin_lock_irq(&plo->lock);
 	plo->active_reqs--;
-	if (preq->req_rw & REQ_DISCARD)
-		plo->discard_inflight_reqs--;
+	if (preq->req_rw & REQ_DISCARD) {
+		if (!--plo->discard_inflight_reqs &&
+		    waitqueue_active(&plo->pending_waitq))
+			wake_up(&plo->pending_waitq);
+	}
 	spin_unlock_irq(&plo->lock);
 
 	while (preq->bl.head) {


Re: [Devel] [PATCH rh8 v2] ve/posix-timers: reference ve monotonic clock from ve start (v2)

2020-10-12 Thread Kirill Tkhai
On 12.10.2020 16:47, Konstantin Khorenko wrote:
> From: Kirill Tkhai 
> 
> So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to
> another hw node.
> 
> Note, translating ve <-> abs time in clock_settime and timer_settime is
> not necessary because (1) clock_settime won't set monotonic clock and
> (2) timer_gettime always returns relative time.
> 
> https://jira.sw.ru/browse/PSBM-13860
> 
> diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
> 
> Signed-off-by: Vladimir Davydov 
> 
> Acked-by: Pavel Emelyanov 
> Signed-off-by: Kirill Tkhai 
> 
> +++
> ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep
> 
> This is an addition to 
> diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
> 
> Otherwise, apps that use sys_clock_nanosleep() to suspend their
> execution can hang after ve migration.
> 
> diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep
> 
> Signed-off-by: Vladimir Davydov 
> 
> Acked-by: Konstantin Khlebnikov 
> Acked-by: Pavel Emelyanov 
> Signed-off-by: Kirill Tkhai 
> 
> +++
> timers: Port 
> diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2
> 
> Need this for Docker, as sometimes systemd-tmpfiles-clean.timer inside
> a PCS7 CT spams dbus with requests to start the corresponding service,
> while at the same time Docker tries to create a cgroup for the container
> and attach it to hierarchies like memory and blkio.
>
> That is because the systemd timer was triggered by a non-virtualized
> timerfd using the plain host clock, but the check that the timer has
> elapsed uses the virtualized clock_gettime() and does not pass before the
> proper (in-container) activation time. So the timer fires again and again,
> starts the service each time, and gets into a busy loop.
> 
> https://jira.sw.ru/browse/PSBM-34017
> 
> v2: move the stubs to ve.h
> 
> Port the following RH6 commit:
> 
>   Author: Vladimir Davydov
>   Email: vdavy...@parallels.com
>   Subject: fs: convert ve monotonic to abs time when setting timerfd
>   Date: Fri, 15 Feb 2013 11:57:09 +0400
> 
>   * [timers] corrected TFD_TIMER_ABSTIME timer handling,
> the issue led to high cpu usage inside a Fedora 18 CT
> by 'init' process (PSBM-18284)
> 
>   Monotonic time inside a container, as obtained via various system calls
>   such as clock_gettime, is reported since the start of the container, not
>   since the start of the whole system. This was done to avoid time issues
>   when a container is migrated between different physical hosts, but it
>   also introduced a lot of problems in time-related system calls, because
>   absolute monotonic time passed to those system calls, which is in fact
>   relative to the container, must be converted to the system-wide monotonic
>   time used by kernel hrtimers.
> 
>   One of those buggy system calls is timerfd_settime which accepts as an
>   argument absolute time if flag TFD_TIMER_ABSTIME is specified.
> 
>   The patch fixes it by converting container monotonic time to system-
>   wide monotonic time using the monotonic_ve_to_abs() function, which was
>   introduced earlier and is now exported for that reason.
> 
>   https://jira.sw.ru/browse/PSBM-18284
> 
>   Signed-off-by: Vladimir Davydov 
> 
> Signed-off-by: Pavel Tikhomirov 
> Signed-off-by: Kirill Tkhai 
> Reviewed-by: Vladimir Davydov 
> 
> (cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1)
> Signed-off-by: Konstantin Khorenko 
> 
> +++
> timers should not get negative argument
> 
> This patch fixes a 25-second delay on login into systemd-based containers.
>
> A userspace application can set a timer in the past and expect it to
> expire immediately.
>
> This may not work as expected inside migrated containers: the translated
> argument provided to the timer can become negative, and the corresponding
> timer then sleeps for a very long time.
> 
> https://jira.sw.ru/browse/PSBM-48475
> 
> CC: Vladimir Davydov 
> CC: Konstantin Khorenko 
> Signed-off-by: Vasily Averin 
> Acked-by: Cyrill Gorcunov 
> 
> (cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732)
> Signed-off-by: Konstantin Khorenko 

Reviewed-by: Kirill Tkhai 

> 
> Changes:
> v2: dropped the redundant second line "if (flags & TIMER_ABSTIME)"
> 
> ---
>  fs/timerfd.c   |  8 --
>  include/linux/ve.h |  8 ++
>  kernel/time/posix-timers.c | 54 --
>  3 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index cdad49da3ff7..59ed38c29941 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct timerfd_ctx {
>  	union {
> @@ -432,8 +433,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
>  	return ufd;
>  }
>  
> -static int do_timerfd_settime(int ufd, int flags, 
> -		const struct itimerspec64 *new,
> +static int do_timerfd_settime(int ufd, int flags,
> +

[Devel] [PATCH rh8 v2] ve/posix-timers: reference ve monotonic clock from ve start (v2)

2020-10-12 Thread Konstantin Khorenko
From: Kirill Tkhai 

So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to
another hw node.

Note, translating ve <-> abs time in clock_settime and timer_settime is
not necessary because (1) clock_settime won't set monotonic clock and
(2) timer_gettime always returns relative time.

https://jira.sw.ru/browse/PSBM-13860

diff-posix_timers-reference-ct-monotonic-clock-from-ct-start

Signed-off-by: Vladimir Davydov 

Acked-by: Pavel Emelyanov 
Signed-off-by: Kirill Tkhai 

+++
ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep

This is an addition to 
diff-posix_timers-reference-ct-monotonic-clock-from-ct-start

Otherwise, apps that use sys_clock_nanosleep() to suspend their
execution can hang after ve migration.

diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep

Signed-off-by: Vladimir Davydov 

Acked-by: Konstantin Khlebnikov 
Acked-by: Pavel Emelyanov 
Signed-off-by: Kirill Tkhai 

+++
timers: Port 
diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2

Need this for Docker, as sometimes systemd-tmpfiles-clean.timer inside
a PCS7 CT spams dbus with requests to start the corresponding service,
while at the same time Docker tries to create a cgroup for the container
and attach it to hierarchies like memory and blkio.

That is because the systemd timer was triggered by a non-virtualized
timerfd using the plain host clock, but the check that the timer has
elapsed uses the virtualized clock_gettime() and does not pass before the
proper (in-container) activation time. So the timer fires again and again,
starts the service each time, and gets into a busy loop.

https://jira.sw.ru/browse/PSBM-34017

v2: move the stubs to ve.h

Port the following RH6 commit:

  Author: Vladimir Davydov
  Email: vdavy...@parallels.com
  Subject: fs: convert ve monotonic to abs time when setting timerfd
  Date: Fri, 15 Feb 2013 11:57:09 +0400

  * [timers] corrected TFD_TIMER_ABSTIME timer handling,
the issue led to high cpu usage inside a Fedora 18 CT
by 'init' process (PSBM-18284)

  Monotonic time inside a container, as obtained via various system calls
  such as clock_gettime, is reported since the start of the container, not
  since the start of the whole system. This was done to avoid time issues
  when a container is migrated between different physical hosts, but it
  also introduced a lot of problems in time-related system calls, because
  absolute monotonic time passed to those system calls, which is in fact
  relative to the container, must be converted to the system-wide monotonic
  time used by kernel hrtimers.

  One of those buggy system calls is timerfd_settime which accepts as an
  argument absolute time if flag TFD_TIMER_ABSTIME is specified.

  The patch fixes it by converting container monotonic time to system-
  wide monotonic time using the monotonic_ve_to_abs() function, which was
  introduced earlier and is now exported for that reason.

  https://jira.sw.ru/browse/PSBM-18284

  Signed-off-by: Vladimir Davydov 

Signed-off-by: Pavel Tikhomirov 
Signed-off-by: Kirill Tkhai 
Reviewed-by: Vladimir Davydov 

(cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1)
Signed-off-by: Konstantin Khorenko 

+++
timers should not get negative argument

This patch fixes a 25-second delay on login into systemd-based containers.

A userspace application can set a timer in the past and expect it to
expire immediately.

This may not work as expected inside migrated containers: the translated
argument provided to the timer can become negative, and the corresponding
timer then sleeps for a very long time.

https://jira.sw.ru/browse/PSBM-48475

CC: Vladimir Davydov 
CC: Konstantin Khorenko 
Signed-off-by: Vasily Averin 
Acked-by: Cyrill Gorcunov 

(cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732)
Signed-off-by: Konstantin Khorenko 

Changes:
v2: dropped the redundant second line "if (flags & TIMER_ABSTIME)"

---
 fs/timerfd.c   |  8 --
 include/linux/ve.h |  8 ++
 kernel/time/posix-timers.c | 54 --
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index cdad49da3ff7..59ed38c29941 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct timerfd_ctx {
 	union {
@@ -432,8 +433,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 	return ufd;
 }
 
-static int do_timerfd_settime(int ufd, int flags, 
-		const struct itimerspec64 *new,
+static int do_timerfd_settime(int ufd, int flags,
+		struct itimerspec64 *new,
 		struct itimerspec64 *old)
 {
 	struct fd f;
@@ -493,6 +494,9 @@ static int do_timerfd_settime(int ufd, int flags,
 	/*
 	 * Re-program the timer to the new value ...
 	 */
+	if ((flags & TFD_TIMER_ABSTIME) &&
+	    (new->it_value.tv_sec || new->it_value.tv_nsec))
+

Re: [Devel] [PATCH rh8] ve/posix-timers: reference ve monotonic clock from ve start (v2)

2020-10-12 Thread Konstantin Khorenko


--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 10/12/2020 03:49 PM, Kirill Tkhai wrote:

@@ -1233,8 +1279,13 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 
 	if (!timespec64_valid())
 		return -EINVAL;
+
 	if (flags & TIMER_ABSTIME)

Is this line going to be deleted?

Very strange, not even a compilation warning. :\
Will update the patch for sure, thank you for noticing.




+
+	if (flags & TIMER_ABSTIME) {
+		monotonic_ve_to_abs(which_clock, );
 		rmtp = NULL;
+	}
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;




Re: [Devel] [PATCH rh8] ve/posix-timers: reference ve monotonic clock from ve start (v2)

2020-10-12 Thread Kirill Tkhai
On 09.10.2020 14:02, Konstantin Khorenko wrote:
> From: Kirill Tkhai 
> 
> So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to
> another hw node.
> 
> Note, translating ve <-> abs time in clock_settime and timer_settime is
> not necessary because (1) clock_settime won't set monotonic clock and
> (2) timer_gettime always returns relative time.
> 
> https://jira.sw.ru/browse/PSBM-13860
> 
> diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
> 
> Signed-off-by: Vladimir Davydov 
> 
> Acked-by: Pavel Emelyanov 
> Signed-off-by: Kirill Tkhai 
> 
> +++
> ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep
> 
> This is an addition to 
> diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
> 
> Otherwise, apps that use sys_clock_nanosleep() to suspend their
> execution can hang after ve migration.
> 
> diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep
> 
> Signed-off-by: Vladimir Davydov 
> 
> Acked-by: Konstantin Khlebnikov 
> Acked-by: Pavel Emelyanov 
> Signed-off-by: Kirill Tkhai 
> 
> +++
> timers: Port 
> diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2
> 
> Need this for Docker, as sometimes systemd-tmpfiles-clean.timer inside
> a PCS7 CT spams dbus with requests to start the corresponding service,
> while at the same time Docker tries to create a cgroup for the container
> and attach it to hierarchies like memory and blkio.
>
> That is because the systemd timer was triggered by a non-virtualized
> timerfd using the plain host clock, but the check that the timer has
> elapsed uses the virtualized clock_gettime() and does not pass before the
> proper (in-container) activation time. So the timer fires again and again,
> starts the service each time, and gets into a busy loop.
> 
> https://jira.sw.ru/browse/PSBM-34017
> 
> v2: move the stubs to ve.h
> 
> Port the following RH6 commit:
> 
>   Author: Vladimir Davydov
>   Email: vdavy...@parallels.com
>   Subject: fs: convert ve monotonic to abs time when setting timerfd
>   Date: Fri, 15 Feb 2013 11:57:09 +0400
> 
>   * [timers] corrected TFD_TIMER_ABSTIME timer handling,
> the issue led to high cpu usage inside a Fedora 18 CT
> by 'init' process (PSBM-18284)
> 
>   Monotonic time inside a container, as obtained via various system calls
>   such as clock_gettime, is reported since the start of the container, not
>   since the start of the whole system. This was done to avoid time issues
>   when a container is migrated between different physical hosts, but it
>   also introduced a lot of problems in time-related system calls, because
>   absolute monotonic time passed to those system calls, which is in fact
>   relative to the container, must be converted to the system-wide monotonic
>   time used by kernel hrtimers.
> 
>   One of those buggy system calls is timerfd_settime which accepts as an
>   argument absolute time if flag TFD_TIMER_ABSTIME is specified.
> 
>   The patch fixes it by converting container monotonic time to system-
>   wide monotonic time using the monotonic_ve_to_abs() function, which was
>   introduced earlier and is now exported for that reason.
> 
>   https://jira.sw.ru/browse/PSBM-18284
> 
>   Signed-off-by: Vladimir Davydov 
> 
> Signed-off-by: Pavel Tikhomirov 
> Signed-off-by: Kirill Tkhai 
> Reviewed-by: Vladimir Davydov 
> 
> (cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1)
> Signed-off-by: Konstantin Khorenko 
> 
> +++
> timers should not get negative argument
> 
> This patch fixes a 25-second delay on login into systemd-based containers.
>
> A userspace application can set a timer in the past and expect it to
> expire immediately.
>
> This may not work as expected inside migrated containers: the translated
> argument provided to the timer can become negative, and the corresponding
> timer then sleeps for a very long time.
> 
> https://jira.sw.ru/browse/PSBM-48475
> 
> CC: Vladimir Davydov 
> CC: Konstantin Khorenko 
> Signed-off-by: Vasily Averin 
> Acked-by: Cyrill Gorcunov 
> 
> (cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732)
> Signed-off-by: Konstantin Khorenko 
> ---
>  fs/timerfd.c   |  8 --
>  include/linux/ve.h |  8 ++
>  kernel/time/posix-timers.c | 55 +-
>  3 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index cdad49da3ff7..59ed38c29941 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct timerfd_ctx {
>  	union {
> @@ -432,8 +433,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
>  	return ufd;
>  }
>  
> -static int do_timerfd_settime(int ufd, int flags, 
> -		const struct itimerspec64 *new,
> +static int do_timerfd_settime(int ufd, int flags,
> +		struct itimerspec64 *new,
>  		struct itimerspec64 *old)
>  {
>  	struct fd f;
> @@ -493,6 +494,9 

[Devel] [PATCH rh8] ve: Add interface for ve::clock_[monotonic|bootbased] adjustment

2020-10-12 Thread Konstantin Khorenko
From: Cyrill Gorcunov 

These two members represent the monotonic and bootbased clocks for the
container's uptime. When a container is suspended (or being moved to
another node), we treat the monotonic and bootbased clocks as stopped,
so we need to account for the time delta on restore and adjust these
members accordingly.

Moreover, these timestamps are involved in posix-timers setup, so once
an application tries to set up monotonic clocks after the restore (with
an absolute time specification), we adjust the values as well.

The application that migrates a container must fetch the current
settings from /sys/fs/cgroup/ve/$VE/ve.clock_bootbased and
/sys/fs/cgroup/ve/$VE/ve.clock_monotonic, then write them back on
restore.

https://jira.sw.ru/browse/PSBM-41311
https://jira.sw.ru/browse/PSBM-41406

v2:
 - use clock_[monotonic|bootbased] for cgroup entry names instead

Original-by: Andrew Vagin 
Signed-off-by: Cyrill Gorcunov 
Reviewed-by: Vladimir Davydov 

(cherry picked from vz7 commit 43f4b0c752abd84aa1b346373d152941123d2446
("ve: Add interface for @start_timespec and @real_start_timespec
adjustmen"))

Signed-off-by: Konstantin Khorenko 
---
 kernel/ve/ve.c | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index ac2252445841..cc26d3b2fa9b 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -925,6 +925,66 @@ static ssize_t ve_os_release_write(struct kernfs_open_file *of, char *buf,
 	return ret ? ret : nbytes;
 }
 
+enum {
+	VE_CF_CLOCK_MONOTONIC,
+	VE_CF_CLOCK_BOOTBASED,
+};
+
+static int ve_ts_read(struct seq_file *sf, void *v)
+{
+	struct ve_struct *ve = css_to_ve(seq_css(sf));
+	struct timespec ts;
+	u64 now, delta;
+
+	switch (seq_cft(sf)->private) {
+	case VE_CF_CLOCK_MONOTONIC:
+		now = ktime_get_ns();
+		delta = ve->start_time;
+		break;
+	case VE_CF_CLOCK_BOOTBASED:
+		now = ktime_get_boot_ns();
+		delta = ve->real_start_time;
+		break;
+	default:
+		now = delta = 0;
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	ts = ns_to_timespec(now - delta);
+	seq_printf(sf, "%ld %ld", ts.tv_sec, ts.tv_nsec);
+	return 0;
+}
+
+static ssize_t ve_ts_write(struct kernfs_open_file *of, char *buf,
+			   size_t nbytes, loff_t off)
+{
+	struct ve_struct *ve = css_to_ve(of_css(of));
+	struct timespec delta;
+	u64 delta_ns, now, *target;
+
+	if (sscanf(buf, "%ld %ld", &delta.tv_sec, &delta.tv_nsec) != 2)
+		return -EINVAL;
+	delta_ns = timespec_to_ns(&delta);
+
+	switch (of_cft(of)->private) {
+	case VE_CF_CLOCK_MONOTONIC:
+		now = ktime_get_ns();
+		target = &ve->start_time;
+		break;
+	case VE_CF_CLOCK_BOOTBASED:
+		now = ktime_get_boot_ns();
+		target = &ve->real_start_time;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	*target = now - delta_ns;
+	return nbytes;
+}
+
 static u64 ve_netns_max_nr_read(struct cgroup_subsys_state *css, struct cftype *cft)
 {
 	return css_to_ve(css)->netns_max_nr;
@@ -994,6 +1054,20 @@ static struct cftype ve_cftypes[] = {
 		.read_u64	= ve_iptables_mask_read,
 		.write_u64	= ve_iptables_mask_write,
 	},
+	{
+		.name		= "clock_monotonic",
+		.flags		= CFTYPE_NOT_ON_ROOT,
+		.seq_show	= ve_ts_read,
+		.write		= ve_ts_write,
+		.private	= VE_CF_CLOCK_MONOTONIC,
+	},
+	{
+		.name		= "clock_bootbased",
+		.flags		= CFTYPE_NOT_ON_ROOT,
+		.seq_show	= ve_ts_read,
+		.write		= ve_ts_write,
+		.private	= VE_CF_CLOCK_BOOTBASED,
+	},
 #endif
 	{
 		.name		= "netns_max_nr",
-- 
2.28.0



[Devel] [PATCH rh7] mm/memcg: optimize mem_cgroup_enough_memory()

2020-10-12 Thread Andrey Ryabinin
mem_cgroup_enough_memory() iterates memcg's subtree to account
'MEM_CGROUP_STAT_CACHE - MEM_CGROUP_STAT_SHMEM'.

Fortunately we can just read memcg->cache counter instead
as it's hierarchical (includes subgroups) and doesn't account
shmem.

https://jira.sw.ru/browse/PSBM-120968
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6587cc2ef019..e36ad592b3c7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4721,11 +4721,7 @@ int mem_cgroup_enough_memory(struct mem_cgroup *memcg, long pages)
 	free += page_counter_read(&memcg->dcache);
 
 	/* assume file cache is reclaimable */
-	free += mem_cgroup_recursive_stat2(memcg, MEM_CGROUP_STAT_CACHE);
-
-	/* but do not count shmem pages as they can't be purged,
-	 * only swapped out */
-	free -= mem_cgroup_recursive_stat(memcg, MEM_CGROUP_STAT_SHMEM);
+	free += page_counter_read(&memcg->cache);
 
 	return free < pages ? -ENOMEM : 0;
 }
-- 
2.26.2



[Devel] [PATCH RH7] ploop: Wake up on last discard bio from ploop_complete_request()

2020-10-12 Thread Kirill Tkhai
A concurrent thread waits for this on pending_waitq.

https://jira.sw.ru/browse/PSBM-121135

Signed-off-by: Kirill Tkhai 
---
 drivers/block/ploop/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
index d6edbfbe4422..320a5d55d65b 100644
--- a/drivers/block/ploop/dev.c
+++ b/drivers/block/ploop/dev.c
@@ -1461,8 +1461,11 @@ static void ploop_complete_request(struct ploop_request * preq)
 	 */
 	spin_lock_irq(&plo->lock);
 	plo->active_reqs--;
-	if (preq->req_rw & REQ_DISCARD)
-		plo->discard_inflight_reqs--;
+	if (preq->req_rw & REQ_DISCARD) {
+		if (!--plo->discard_inflight_reqs &&
+		    waitqueue_active(&plo->pending_waitq))
+			wake_up(&plo->pending_waitq);
+	}
 	spin_unlock_irq(&plo->lock);
 
 	while (preq->bl.head) {

