On 12.10.2020 16:47, Konstantin Khorenko wrote: > From: Kirill Tkhai <ktk...@parallels.com> > > So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to > another hw node. > > Note, translating ve <-> abs time in clock_settime and timer_settime is > not necessary because (1) clock_settime won't set monotonic clock and > (2) timer_gettime always returns relative time. > > https://jira.sw.ru/browse/PSBM-13860 > > diff-posix_timers-reference-ct-monotonic-clock-from-ct-start > > Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> > > Acked-by: Pavel Emelyanov <xe...@parallels.com> > Signed-off-by: Kirill Tkhai <ktk...@parallels.com> > > +++ > ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep > > This is an addition to > diff-posix_timers-reference-ct-monotonic-clock-from-ct-start > > Otherwise, apps that use sys_clock_nanosleep() to suspend their > execution can hang after ve migration. > > diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep > > Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> > > Acked-by: Konstantin Khlebnikov <khlebni...@openvz.org> > Acked-by: Pavel Emelyanov <xe...@parallels.com> > Signed-off-by: Kirill Tkhai <ktk...@parallels.com> > > +++ > timers: Port > diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2 > > Need this for docker, as sometimes systemd-tmpfiles-clean.timer inside > a PCS7 CT is spamming dbus with requests to start corresponding service. > And at the same time Docker tries to create cgroup for container and > attach it to hierarchies like memory and blkio. > > That is because systemd timer was triggered with non-virtualized timerfd > using plain host clock but check that timer is successfull uses > virtualized clock_gettime and don't pass before proper(in-container) > timer activation. And timers charges again and again starts service > got in busy loop. > > https://jira.sw.ru/browse/PSBM-34017 > > v2: move the stubs to ve.h > > Port the following RH6 commit: > > Author: Vladimir Davydov > Email: vdavy...@parallels.com > Subject: fs: convert ve monotonic to abs time when setting timerfd > Date: Fri, 15 Feb 2013 11:57:09 +0400 > > * [timers] corrected TFD_TIMER_ABSTIME timer handling, > the issue led to high cpu usage inside a Fedora 18 CT > by 'init' process (PSBM-18284) > > Monotonic time inside container, as it can be obtained using various > system calls such as clock_gettime, is reported since start of the > container, > not since start of the whole system. This was made in order to avoid time > issues while a container is migrated between different physical hosts, but > this > also introduced a lot of problems in time- related system calls because > absolute monotonic time, which is in fact relative to container, passed to > those > system calls must be converted to system-wide monotonic time, which is used > by > kernel hrtimers. > > One of those buggy system calls is timerfd_settime which accepts as an > argument absolute time if flag TFD_TIMER_ABSTIME is specified. > > The patch fixes it by converting container monotonic time to system- > wide monotonic time using the monotonic_ve_to_abs() function, which was > introduced earlier and is now exported for that reason. > > https://jira.sw.ru/browse/PSBM-18284 > > Signed-off-by: Vladimir Davydov <vdavy...@parallels.com> > > Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com> > Signed-off-by: Kirill Tkhai <ktk...@odin.com> > Reviewed-by: Vladimir Davydov <vdavy...@parallels.com> > > (cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1) > Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com> > > +++ > timers should not get negative argument > > This patch fixes 25-sec delay on login into systemd based containers. > > Userspace application can set timer for past > and expect that the timer will be expired immediately. > > This can do not work as expected inside migrated containers. > Translated argument provided to timer can become negative, > and according timer will sleep a very long time. > > https://jira.sw.ru/browse/PSBM-48475 > > CC: Vladimir Davydov <vdavy...@virtuozzo.com> > CC: Konstantin Khorenko <khore...@virtuozzo.com> > Signed-off-by: Vasily Averin <v...@virtuozzo.com> > Acked-by: Cyrill Gorcunov <gorcu...@openvz.org> > > (cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732) > Signed-off-by: Konstantin Khorenko <khore...@virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktk...@virtuozzo.com> > > Changes: > v2: dropped second redundant string "if (flags & TIMER_ABSTIME)" > > --- > fs/timerfd.c | 8 ++++-- > include/linux/ve.h | 8 ++++++ > kernel/time/posix-timers.c | 54 ++++++++++++++++++++++++++++++++++++-- > 3 files changed, 66 insertions(+), 4 deletions(-) > > diff --git a/fs/timerfd.c b/fs/timerfd.c > index cdad49da3ff7..59ed38c29941 100644 > --- a/fs/timerfd.c > +++ b/fs/timerfd.c > @@ -26,6 +26,7 @@ > #include <linux/syscalls.h> > #include <linux/compat.h> > #include <linux/rcupdate.h> > +#include <linux/ve.h> > > struct timerfd_ctx { > union { > @@ -432,8 +433,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags) > return ufd; > } > > -static int do_timerfd_settime(int ufd, int flags, > - const struct itimerspec64 *new, > +static int do_timerfd_settime(int ufd, int flags, > + struct itimerspec64 *new, > struct itimerspec64 *old) > { > struct fd f; > @@ -493,6 +494,9 @@ static int do_timerfd_settime(int ufd, int flags, > /* > * Re-program the timer to the new value ... > */ > + if ((flags & TFD_TIMER_ABSTIME) && > + (new->it_value.tv_sec || new->it_value.tv_nsec)) > + monotonic_ve_to_abs(ctx->clockid, &new->it_value); > ret = timerfd_setup(ctx, flags, new); > > spin_unlock_irq(&ctx->wqh.lock); > diff --git a/include/linux/ve.h b/include/linux/ve.h > index 0db98e8e08c1..ec7dc522ac1f 100644 > --- a/include/linux/ve.h > +++ b/include/linux/ve.h > @@ -129,6 +129,9 @@ static inline struct ve_struct *css_to_ve(struct > cgroup_subsys_state *css) > > extern struct cgroup_subsys_state *ve_get_init_css(struct ve_struct *ve, int > subsys_id); > > +extern void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 > *tp); > +extern void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 > *tp); > + > #define ve_feature_set(ve, f) \ > !!((ve)->features & VE_FEATURE_##f) > > @@ -166,6 +169,11 @@ static inline struct user_namespace > *ve_init_user_ns(void) > static inline int vz_security_family_check(struct net *net, int family) { > return 0; } > static inline int vz_security_protocol_check(struct net *net, int protocol) > { return 0; } > > +static inline void monotonic_abs_to_ve(clockid_t which_clock, > + struct timespec64 *tp) { } > +static inline void monotonic_ve_to_abs(clockid_t which_clock, > + struct timepsec64 *tp) { } > + > #endif /* CONFIG_VE */ > > #endif /* _LINUX_VE_H */ > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c > index 3d54de0a28bc..ddc848490263 100644 > --- a/kernel/time/posix-timers.c > +++ b/kernel/time/posix-timers.c > @@ -51,6 +51,7 @@ > #include <linux/hashtable.h> > #include <linux/compat.h> > #include <linux/nospec.h> > +#include <linux/ve.h> > > #include "timekeeping.h" > #include "posix-timers.h" > @@ -76,6 +77,46 @@ static const struct k_clock * const posix_clocks[]; > static const struct k_clock *clockid_to_kclock(const clockid_t id); > static const struct k_clock clock_realtime, clock_monotonic; > > +#define clock_is_monotonic(which_clock) \ > + ((which_clock) == CLOCK_MONOTONIC || \ > + (which_clock) == CLOCK_MONOTONIC_RAW || \ > + (which_clock) == CLOCK_MONOTONIC_COARSE) > + > +#ifdef CONFIG_VE > +static struct timespec64 zero_time; > + > +void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp) > +{ > + struct timespec64 offset; > + > + if (!clock_is_monotonic(which_clock)) > + return; > + > + offset = ns_to_timespec64(get_exec_env()->start_time); > + set_normalized_timespec64(tp, > + tp->tv_sec - offset.tv_sec, > + tp->tv_nsec - offset.tv_nsec); > +} > + > +void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp) > +{ > + struct timespec64 offset; > + > + if (!clock_is_monotonic(which_clock)) > + return; > + > + offset = ns_to_timespec64(get_exec_env()->start_time); > + set_normalized_timespec64(tp, > + tp->tv_sec + offset.tv_sec, > + tp->tv_nsec + offset.tv_nsec); > + > + if (timespec64_compare(tp, &zero_time) <= 0) { > + tp->tv_sec = 0; > + tp->tv_nsec = 1; > + } > +} > +#endif > + > /* > * we assume that the new SIGEV_THREAD_ID shares no bits with the other > * SIGEV values. Here we put out an error if this assumption fails. > @@ -895,6 +936,9 @@ static int do_timer_settime(timer_t timer_id, int flags, > if (!timr) > return -EINVAL; > > + if ((flags & TIMER_ABSTIME) && > + (new_spec64->it_value.tv_sec || new_spec64->it_value.tv_nsec)) > + monotonic_ve_to_abs(timr->it_clock, &new_spec64->it_value); > kc = timr->kclock; > if (WARN_ON_ONCE(!kc || !kc->timer_set)) > error = -EINVAL; > @@ -1072,6 +1116,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, > which_clock, > > error = kc->clock_get(which_clock, &kernel_tp); > > + monotonic_abs_to_ve(which_clock, &kernel_tp); > if (!error && put_timespec64(&kernel_tp, tp)) > error = -EFAULT; > > @@ -1148,6 +1193,7 @@ COMPAT_SYSCALL_DEFINE2(clock_gettime, clockid_t, > which_clock, > > err = kc->clock_get(which_clock, &ts); > > + monotonic_abs_to_ve(which_clock, &ts); > if (!err && compat_put_timespec64(&ts, tp)) > err = -EFAULT; > > @@ -1233,8 +1279,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, > which_clock, int, flags, > > if (!timespec64_valid(&t)) > return -EINVAL; > - if (flags & TIMER_ABSTIME) > + if (flags & TIMER_ABSTIME) { > + monotonic_ve_to_abs(which_clock, &t); > rmtp = NULL; > + } > current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE; > current->restart_block.nanosleep.rmtp = rmtp; > > @@ -1260,8 +1308,10 @@ COMPAT_SYSCALL_DEFINE4(clock_nanosleep, clockid_t, > which_clock, int, flags, > > if (!timespec64_valid(&t)) > return -EINVAL; > - if (flags & TIMER_ABSTIME) > + if (flags & TIMER_ABSTIME) { > + monotonic_ve_to_abs(which_clock, &t); > rmtp = NULL; > + } > current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE; > current->restart_block.nanosleep.compat_rmtp = rmtp; > > _______________________________________________ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel