On 10.04.14 16:31, Alexey Kardashevskiy wrote:
On 04/10/2014 10:34 PM, Alexander Graf wrote:
On 03.04.14 15:14, Alexey Kardashevskiy wrote:
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.
This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.
This adds kvm_access_one_reg() to access a special register which is not
in env->spr.
The feature must be present in the host kernel.
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v4:
* made it per machine timebase offser rather than per CPU
v3:
* kvm_access_one_reg moved out to a separate patch
* tb_offset and host_timebase were replaced with guest_timebase as
the destionation does not really care of offset on the source
v2:
* bumped the vmstate_ppc_cpu version
* defined version for the env.tb_env field
---
hw/ppc/ppc.c | 120
+++++++++++++++++++++++++++++++++++++++++++++++++
hw/ppc/spapr.c | 3 +-
include/hw/ppc/spapr.h | 2 +
target-ppc/cpu-qom.h | 16 +++++++
target-ppc/kvm.c | 5 +++
target-ppc/machine.c | 4 +-
trace-events | 3 ++
7 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 9c2a132..b51db1b 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -29,9 +29,11 @@
#include "sysemu/cpus.h"
#include "hw/timer/m48t59.h"
#include "qemu/log.h"
+#include "qemu/error-report.h"
#include "hw/loader.h"
#include "sysemu/kvm.h"
#include "kvm_ppc.h"
+#include "trace.h"
//#define PPC_DEBUG_IRQ
//#define PPC_DEBUG_TB
@@ -797,6 +799,124 @@ static void cpu_ppc_set_tb_clk (void *opaque,
uint32_t freq)
cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
}
+/*
+ * Calculate timebase on the destination side of migration
+ *
+ * We calculate new timebase offset as shown below:
+ * 1) Gtb2 = Gtb1 + max(tod2 - tod1, 0)
+ * Gtb2 = tb2 + off2
+ * 2) tb2 + off2 = Gtb1 + max(tod2 - tod1, 0)
+ * 3) off2 = Gtb1 - tb2 + max(tod2 - tod1, 0)
+ *
+ * where:
+ * Gtb2 - destination guest timebase
+ * tb2 - destination host timebase
+ * off2 - destination timebase offset
+ * tod2 - destination time of the day
+ * Gtb1 - source guest timebase
+ * tod1 - source time of the day
+ *
+ * The result we want is in @off2
+ *
+ * Two conditions must be met for @off2:
+ * 1) off2 must be multiple of 2^24 ticks as it will be set via TBU40 SPR
+ * 2) Gtb2 >= Gtb1
+ */
+static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
+{
+ uint64_t tb2, tod2;
+ int64_t off2;
+ int ratio = tb->freq / 1000000;
+ struct timeval tv;
+
+ tb2 = cpu_get_real_ticks();
+ gettimeofday(&tv, NULL);
+ tod2 = tv.tv_sec * 1000000 + tv.tv_usec;
+
+ off2 = tb->guest_timebase - tb2;
+ if ((tod2 > tb->time_of_the_day) &&
+ (tod2 - tb->time_of_the_day < 1000000)) {
+ off2 += (tod2 - tb->time_of_the_day) * ratio;
+ }
+ off2 = ROUND_UP(off2, 1 << 24);
+
+ return off2;
+}
I *think* what you're trying to say here is that you want
assert(source_timebase_freq == timebase_freq);
migration_duration_ns = host_ns - source_host_ns;
guest_tb = source_guest_tb + ns_scaled_to_tb(min(0, migration_duration_ns);
kvm_set_guest_tb(guest_tb);
-> kvm_set_one_reg(KVM_REG_PPC_TB_OFFSET, guest_tb - mftb());
But I honestly have not managed to read that from the code. Either this
really is what you're trying to do and the code is just very hard to read
(which means it needs to be written more easily) or you're doing something
different which I don't understand.
Is this any better?
static int64_t cpu_ppc_adjust_tb_offset(PPCTimebaseOffset *tb)
{
struct timeval tv;
int64_t migration_duration_ns, migration_duration_tb;
If I read the code correctly you're operating in us, not ns, no?
int64_t guest_tb, host_ns;
int ratio = tb->freq / 1000000;
#define USEC_PER_SEC 1000000
You're also losing quite a bit of precision here, no?
int64_t off;
gettimeofday(&tv, NULL);
host_ns = tv.tv_sec * 1000000 + tv.tv_usec;
host_us = get_clock_realtime() / 1000; ?
migration_duration_ns = MIN(1000000,
Why is it MIN(1000000)? Is a migration supposed to last at least 1sec? Why?
host_ns - tb->time_of_the_day);
migration_duration_tb = migration_duration_ns * ratio;
guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
off = guest_tb - cpu_get_real_ticks();
It's probably easier to read when you create one function that just
returns a guest TB value adjusted by the time the last measurement
happened. The fact that the KVM register wants an offset is a KVM
implementation detail. The TB adjustment should happen generically.
return off;
}
We also designed the PPC_TB_OFFSET ONE_REG in a way that it always rounds
up to its 40 bit granularity, so no need to do this in QEMU. In fact, we
don't want to do it in QEMU in case there will be a more fine-grained SPR
in the future.
I believe rounding was not in the kernel when I started making this...
And from all I understand the timebase frequency is now architecturally
specified, so it won't change for newer cores, no?
I asked people in our lab. Everyone says that it should not change but
noone would bet on it too much.
When it changes and you want to live migrate, you'll need to implement a
guest TB scale register and the whole idea of a "TB offset" ONE_REG is
absurd.
The more I think about this the more I realize we should have created a
"guest TB value", not a "guest TB offset" ONE_REG.
And if we migrate TCG
guests it will be the same between two hosts.
And G5 uses 33333333. I really do not understand why it is bad to
send-and-check timer frequency. Why?
Because the guest will continue to run at a different TB frequency on
the new host and break.
Is the rest ok? Thanks for review!
Not sure. Please rework everything according to the comments, make the
code readable enough that your wife understands it and then resend it :).
Alex