On 14.04.14 14:53, Alexey Kardashevskiy wrote:
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.
This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.
This adds kvm_access_one_reg() to access a special register which is not
in env->spr. This requires kvm_set_one_reg/kvm_get_one_reg patch.
The feature must be present in the host kernel.
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v6:
* time_of_the_day is now time_of_the_day_ns and measured in ns instead of us
* VMSTATE_PPC_TIMEBASE_V supports versions now
v5:
* fixed multiple comments in cpu_ppc_get_adjusted_tb and merged it
into timebase_post_load()
* removed round_up(1<<24) as KVM is expected to do this anyway
* removed @freq from migration stream
* renamed PPCTimebaseOffset to PPCTimebase
* CLOCKS_PER_SEC is used as a constant which is 1000000us/s (man clock)
v4:
* made it per machine timebase offset rather than per CPU
v3:
* kvm_access_one_reg moved out to a separate patch
* tb_offset and host_timebase were replaced with guest_timebase as
the destination does not really care about the offset on the source
v2:
* bumped the vmstate_ppc_cpu version
* defined version for the env.tb_env field
---
hw/ppc/ppc.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
hw/ppc/spapr.c | 4 +--
include/hw/ppc/spapr.h | 1 +
target-ppc/cpu-qom.h | 16 +++++++++++
target-ppc/kvm.c | 5 ++++
trace-events | 3 ++
6 files changed, 105 insertions(+), 2 deletions(-)
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 71df471..3be4d8c 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -29,9 +29,11 @@
#include "sysemu/cpus.h"
#include "hw/timer/m48t59.h"
#include "qemu/log.h"
+#include "qemu/error-report.h"
#include "hw/loader.h"
#include "sysemu/kvm.h"
#include "kvm_ppc.h"
+#include "trace.h"
//#define PPC_DEBUG_IRQ
//#define PPC_DEBUG_TB
@@ -49,6 +51,8 @@
# define LOG_TB(...) do { } while (0)
#endif
+#define NSEC_PER_SEC 1000000000LL
+
static void cpu_ppc_tb_stop (CPUPPCState *env);
static void cpu_ppc_tb_start (CPUPPCState *env);
@@ -829,6 +833,80 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
cpu_ppc_store_purr(cpu, 0x0000000000000000ULL);
}
+static void timebase_pre_save(void *opaque)
+{
+ PPCTimebase *tb = opaque;
+ uint64_t ticks = cpu_get_real_ticks();
+ PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
+
+ if (!first_ppc_cpu->env.tb_env) {
+ error_report("No timebase object");
+ return;
+ }
+
+ tb->time_of_the_day_ns = get_clock_realtime();
+ /*
+ * tb_offset is only expected to be changed by migration so
+ * there is no need to update it from KVM here
+ */
+ tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
+}
+
+static int timebase_post_load(void *opaque, int version_id)
+{
+ PPCTimebase *tb = opaque;
I think if we call this "remote" or so, it becomes more obvious what data
this contains.
+ CPUState *cpu;
+ PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
+ int64_t tb_off_adj, tb_off;
+ int64_t migration_duration_ns, migration_duration_tb, guest_tb, host_ns;
+ unsigned long freq;
+
+ if (!first_ppc_cpu->env.tb_env) {
+ error_report("No timebase object");
+ return -1;
+ }
+
+ freq = first_ppc_cpu->env.tb_env->tb_freq;
+ /*
+ * Calculate timebase on the destination side of migration.
+ * The destination timebase must be not less than the source timebase.
+ * We try to adjust timebase by downtime if host clocks are not
+ * too much out of sync (1 second for now).
+ */
+ host_ns = get_clock_realtime();
+ migration_duration_ns = MIN(NSEC_PER_SEC, host_ns - tb->time_of_the_day_ns);
+ migration_duration_tb = muldiv64(migration_duration_ns, freq, NSEC_PER_SEC);
+ guest_tb = tb->guest_timebase + MIN(0, migration_duration_tb);
This means the shift has to be negative, or it gets truncated to 0, no?
What we really want is for the time not to go backwards, so we need
MAX, right? Or did I mess things up again? ;)
In fact, I think it would make things more obvious if we force
migration_duration_ns to always be within [0...1s], rather than clamp
migration_duration_ns to [-inf...1s] and then clamp
migration_duration_tb to [0...inf] when computing guest_tb.
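To make the suggestion concrete, here is a rough standalone sketch of the clamping I have in mind. It is not meant as the final code: clamp_migration_ns() and adjusted_guest_tb() are hypothetical helper names, plain 64-bit arithmetic stands in for muldiv64(), and the clock values are passed in as parameters instead of being read via get_clock_realtime().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: clamp the observed downtime to [0, 1s].
 * A negative delta means the host clocks disagree, so we treat it as 0.
 */
static int64_t clamp_migration_ns(int64_t host_ns, int64_t saved_ns)
{
    int64_t delta = host_ns - saved_ns;

    if (delta < 0) {
        return 0;
    }
    if (delta > NSEC_PER_SEC) {
        return NSEC_PER_SEC;
    }
    return delta;
}

/*
 * Hypothetical helper: guest timebase on the destination.
 * The result is never less than the guest timebase saved by the source.
 */
static uint64_t adjusted_guest_tb(uint64_t guest_tb, int64_t host_ns,
                                  int64_t saved_ns, uint32_t freq)
{
    int64_t ns = clamp_migration_ns(host_ns, saved_ns);
    uint64_t ticks;

    assert(freq > 0);
    /* ns <= 10^9 and freq < 2^32, so the product fits in 64 bits */
    ticks = (uint64_t)ns * freq / (uint64_t)NSEC_PER_SEC;
    return guest_tb + ticks;
}
```

With the clamp applied to the nanosecond value up front, the tick conversion only ever sees a non-negative input of at most one second, so no second MIN/MAX is needed when computing guest_tb.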
+
+ tb_off_adj = guest_tb - cpu_get_real_ticks();
+
+ tb_off = first_ppc_cpu->env.tb_env->tb_offset;
+ trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
+ (tb_off_adj - tb_off) / freq);
+
+ /* Set new offset to all CPUs */
+ CPU_FOREACH(cpu) {
+ PowerPCCPU *pcpu = POWERPC_CPU(cpu);
+ pcpu->env.tb_env->tb_offset = tb_off_adj;
+ }
+
+ return 0;
+}
+
+const VMStateDescription vmstate_ppc_timebase = {
+ .name = "timebase",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .minimum_version_id_old = 1,
+ .pre_save = timebase_pre_save,
+ .post_load = timebase_post_load,
+ .fields = (VMStateField []) {
+ VMSTATE_UINT64(guest_timebase, PPCTimebase),
+ VMSTATE_UINT64(time_of_the_day_ns, PPCTimebase),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
/* Set up (once) timebase frequency (in Hz) */
clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
{
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 451c473..297fc6f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -818,7 +818,7 @@ static int spapr_vga_init(PCIBus *pci_bus)
static const VMStateDescription vmstate_spapr = {
.name = "spapr",
- .version_id = 1,
+ .version_id = 2,
.minimum_version_id = 1,
So we do not need to bump up the minimum version because the timebase is
a subsection and we have an explicit check for the version there?
Alex