From: Nadav Har'El <n...@scylladb.com>
Committer: Nadav Har'El <n...@scylladb.com>
Branch: master

Fix SMP initialization of paravirtual KVM clock

Issue #869 reported that the bug which was supposedly fixed by commit
34232de9 has returned when running OSv in a Linux 4.10 host. This patch
fixes the same issue yet again, more correctly, and fixes #869:

For the paravirtual KVM (or Xen) clock, our implementation of uptime()
initially returns 0, until the time that _smp_init is set. Afterwards,
uptime() returns:

     system_time() - _boot_systemtime

The thinking was that when _smp_init is set, so is _boot_systemtime.
That is indeed correct, but the problem is that at this point, system_time()
doesn't necessarily work! The first CPU to call pv_based_clock::setup_cpu()
will set up _boot_systemtime and _smp_init, but at that point other CPUs who
have not yet called setup_cpu() will have a zero system_time(), and as
a result the subtraction above will be negative. So uptime() will return
a negative time on those CPUs, after having previously returned time 0.
Our timer implementation can detect the clock going backward in this manner,
and cause a crash, as reported in issue #869 (and in a long time ago in
commit 34232de9).

The fix is to to set _smp_init not when the first CPU's clock is initialized
(as we did until now), but only only after all CPUs' clocks have been set up -
at which point both system_time() and _boot_systemtime will be correct on
all CPUs.

The downside of this fix is that our clock continues to be stuck at time 0
for even longer during early boot - in my benchmark this time can grow to
almost 0.3 seconds for 32 vcpus. However, this "stuck time" does not cause
any real problems as code which needs functional clocks and preemptive
multitasking will run later anyway.

Signed-off-by: Nadav Har'El <n...@scylladb.com>
Message-Id: <20170418122956.7702-1-...@scylladb.com>
Reviewed-by: Benoît Canet <ben...@scylladb.com>

---
diff --git a/drivers/clock-common.cc b/drivers/clock-common.cc
--- a/drivers/clock-common.cc
+++ b/drivers/clock-common.cc
@@ -9,6 +9,7 @@

 pv_based_clock::pv_based_clock()
     : _smp_init(false)
+    , _boot_systemtime_init_counter(sched::cpus.size())
     , cpu_notifier([&] { setup_cpu(); })
 {
 }
@@ -17,10 +18,15 @@ void pv_based_clock::setup_cpu()
 {
     init_on_cpu();

-    std::call_once(_boot_systemtime_init_flag, [&] {
+    // We need to do the following once, after all CPUs ran their
+    // init_init_on_cpu(), so any CPU calling uptime() will see not only
+    // _boot_systemtime set, but also a functional system_time(). Until
+    // all CPUs are set up, all of the will see zero uptime().
+ if (_boot_systemtime_init_counter.fetch_sub(1, std::memory_order_relaxed)
+            == 1) {
         _boot_systemtime = system_time();
         _smp_init.store(true, std::memory_order_release);
-    });
+    }
 }

 s64 pv_based_clock::time()
diff --git a/drivers/clock-common.hh b/drivers/clock-common.hh
--- a/drivers/clock-common.hh
+++ b/drivers/clock-common.hh
@@ -21,7 +21,7 @@ public:
virtual s64 boot_time() override __attribute__((no_instrument_function));
 private:
     std::atomic<bool> _smp_init;
-    std::once_flag _boot_systemtime_init_flag;
+    std::atomic<int> _boot_systemtime_init_counter;
     s64 _boot_systemtime;
     sched::cpu::notifier cpu_notifier;
     void setup_cpu();

--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to