On Mon, Jan 25, 2016 at 05:48:02AM +0000, Mark Cave-Ayland wrote: > On 18/01/16 04:51, David Gibson wrote: > > > On Fri, Jan 15, 2016 at 05:46:10PM +0000, Mark Cave-Ayland wrote: > >> On 12/01/16 02:44, David Gibson wrote: > >> > >>>>> In other words, isn't this just skipping the decrementer interrupts at > >>>>> the qemu level rather than the guest level? > >>>>> > >>>>> It seems that instead we should be reconstructing the decrementer on > >>>>> the destination based on an offset from the timebase. > >>>> > >>>> Well I haven't really looked at how time warping works during in > >>>> migration for QEMU, however this seems to be the method used by > >>>> hw/ppc/ppc.c's timebase_post_load() function but my understanding is > >>>> that this isn't currently available for the g3beige/mac99 machines? > >>> > >>> Ah.. yes, it looks like the timebase migration stuff is only hooked in > >>> on the pseries machine type. As far as I can tell it should be > >>> trivial to add it to other machines though - it doesn't appear to rely > >>> on anything outside the common ppc timebase stuff. > >>> > >>>> Should the patch in fact do this but also add decrementer support? And > >>>> if it did, would this have a negative effect on pseries? > >>> > >>> Yes, I think that's the right approach. Note that rather than > >>> duplicating the logic to adjust the decrementer over migration, it > >>> should be possible to encode the decrementer as a diff from the > >>> timebase across the migration. > >>> > >>> In fact.. I'm not sure it ever makes sense to store the decrementer > >>> value as a direct value, since it's constantly changing - probably > >>> makes more sense to derive it from the timebase whenever it is needed. > >>> > >>> As far as I know that should be fine for pseries. I think the current > >>> behaviour is probably technically wrong for pseries as well, but the > >>> timing code of our Linux guests is robust enough to handle a small > >>> displacement to the time of the next decrementer interrupt. > >> > >> I've had a bit of an experiment trying to implement something suitable, > >> but I'm not 100% certain I've got this right. > >> > >> >From the code my understanding is that the timebase is effectively > >> free-running and so if a migration takes 5s then you use tb_offset to > >> calculate the difference between the timebase before migration, and > >> subsequently apply the offset for all future reads of the timebase for > >> the lifetime of the CPU (i.e. the migrated guest is effectively living > >> at a point in the past where the timebase is consistent). > > > > Um.. no. At least in the usual configuration, the timebase represents > > real, wall-clock time, so we expect it to jump forward across the > > migration downtime. This is important because the guest will use the > > timebase to calculate real time differences. > > > > However, the absolute value of the timebase may be different on the > > *host* between source and destination for migration. So what we need > > to do is before migration we work out the delta between host and guest > > notions of wall clock time (as defined by the guest timebase), and > > transfer that in the migration stream. > > > > On the destination we initialize the guest timebase so that the guest > > maintains the same realtime offset from the host. This means that as > > long as source and destination system time is synchronized, guest > > real-time tracking will continue correctly across the migration. > > > > We do also make sure that the guest timebase never goes backwards, but > > that would only happen if the source and destination host times were > > badly out of sync. > > I had a poke at trying to include the timebase state in the Mac machines > but found that enabling this caused migration to freeze, even on top of > my existing (known working) patchset. > > Looking closer at the code, I don't think it can work on an x86 TCG host > in its current form since the host/guest clocks are used interchangeably > in places, e.g. in timebase_pre_save() we have this: > > uint64_t ticks = cpu_get_host_ticks(); > ... > tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset; > > So this implies that guest_timebase is set in higher resolution host > ticks. But then in timebase_post_load() we have this: > > migration_duration_tb = muldiv64(migration_duration_ns, freq, > NANOSECONDS_PER_SECOND); > > which calculates the migration time in timebase ticks but then: > > guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb); > tb_off_adj = guest_tb - cpu_get_host_ticks(); > > which mixes tb_remote->guest_timebase in host ticks with > migration_duration_tb in timebase ticks. > > I think that tb_off_adj should be in host ticks so should > migration_duration_tb be dropped and guest_tb be calculated from > migration_duration_ns instead? At least given the current mixing of host > ticks and timebase ticks, I think this can only work correctly on PPC > where the two are the same. > > One other thing I noticed is that this code "broke" my normal testing > pattern when I tend to launch qemu-system-ppc with -loadvm foo -S with > the machine stopped. In this case the migration duration calculation is > performed at the moment state is restored, and not the point where the > CPU is started. > > Should the adjustment calculation not take place in a cpu_start() > function instead? Otherwise at the point when I resume the paused > machine the tb_off_adj calculation will be based in the past and > therefore wrong.
Um.. so the migration duration is a complete red herring, regardless of the units. Remember, we only ever compute the guest timebase value at the moment the guest requests it - actually maintaining a current timebase value makes sense in hardware, but would be nuts in software. The timebase is a function of real, wall-clock time, and the migration destination has a notion of wall-clock time without reference to the source. So what you need to transmit for migration is enough information to compute the guest timebase from real-time - essentially just an offset between real-time and the timebase. The guest can potentially observe the migration duration as a jump in timebase values, but qemu doesn't need to do any calculations with it. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature