Hi,
  I've just spent an unreasonable amount of time debugging an rtc issue and have come to the conclusion that it's probably more of a documentation problem than actual code - but I wondered if anyone disagrees.
(ref: https://bugzilla.redhat.com/show_bug.cgi?id=1714143 )

The question revolves around -rtc base= and what the base= passed to a destination qemu after migration should be (particularly with the 'host' clock).

At startup, QEMU (vl.c) calculates offsets from the host clock to the base - that value isn't migrated. Most rtc calculations done afterwards don't reference it - they're all based on the time since we last read the clock and a rolling time since then.

There's code to detect host clock jumps and trigger a notifier - the only user of that is the mc146818rtc used on x86. When the notifier fires, it reuses the base offset to reset the rtc to the current host clock time.

a) If you start a destination qemu with the same base= value as the source, then the internal offset value will differ by however much later you started the destination.

b) If you can trigger the host clock jump update, then on x86 that difference from (a) becomes visible when reading the rtc (hwclock), and thus the rtc will appear to have fallen behind.

c) libvirt (when using an 'adjustment', as oVirt does) recalculates the base on the destination, so the base passed to the destination qemu is different from the source; so even when (b) happens you get a consistent value. This may be an accident!

d) The host clock jump detection used in (b) is broken - it correctly detects backwards jumps, but its detection of a forward jump is based on two readings of the host clock being more than 60s apart - and often on a qemu running a Linux guest the host clock isn't read at all; so reading hwclock, waiting a minute and reading it again will trigger the jump code.

So what to do?

1) Tell people to do what libvirt does and specify base= differently on the dest.
2) Migrate the offset value such that the base= on the destination is ignored.
3) Fix the host clock jump detection.

(3) is probably independent - the easiest fix would seem to be just to set a timer to read the host clock at, say, 20 second intervals, which is wasteful but would avoid the false trigger.

Is (2) worth it, or do we just go with (1)? I'm tempted to just specify the behaviour.

Mind you, we could kill the host clock jump detection code - only the mc146818 registers on the notifier for it - so presumably aarch/ppc/s390 etc don't see it.

Thoughts?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
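
To make (a)-(d) and the timer idea in (3) concrete, here is a toy model in Python. It's only a sketch under stated assumptions, not the QEMU code: the names (base_offset, rtc_after_reset, HostClock) are made up for illustration, times are plain integer seconds, and the libvirt case assumes the recalculated base is simply the original base plus the start delay. Only the 60s threshold comes from the description above.

```python
MAX_CLOCK_JUMP = 60  # seconds; the forward-jump threshold described in (d)


def base_offset(host_now, base):
    """Offset from the host clock to the requested base, computed once at startup."""
    return host_now - base


def rtc_after_reset(host_now, offset):
    """RTC value if the jump notifier resets the rtc from the host clock."""
    return host_now - offset


class HostClock:
    """Toy jump detector: each read is compared only with the previous read."""

    def __init__(self, now):
        self.last = now

    def read(self, now):
        last, self.last = self.last, now
        # Backwards jump, or more than MAX_CLOCK_JUMP since the last read.
        return now < last or now > last + MAX_CLOCK_JUMP


# (a) Same base= on both sides, destination started 300s after the source.
BASE, SRC_START, DST_START = 500, 1000, 1300
src_offset = base_offset(SRC_START, BASE)
dst_offset = base_offset(DST_START, BASE)
assert dst_offset - src_offset == DST_START - SRC_START

# (b) A reset on the destination at host time 1400 uses the destination's
# offset, so the rtc lands 300s behind what the source's offset would give.
expected = rtc_after_reset(1400, src_offset)
actual = rtc_after_reset(1400, dst_offset)
assert expected - actual == 300

# (c) Assumption: the recalculated base is base + start delay. That yields
# the same offset as the source, so a reset gives a consistent value.
adjusted_offset = base_offset(DST_START, BASE + (DST_START - SRC_START))
assert adjusted_offset == src_offset

# (d) No real jump, but two reads 70s apart are reported as one.
idle = HostClock(1400)
false_trigger = idle.read(1470)
assert false_trigger

# (3) Reading every 20s keeps consecutive reads under the threshold.
polled = HostClock(1400)
polled_triggers = [polled.read(t) for t in (1420, 1440, 1460, 1470)]
assert not any(polled_triggers)
```

The model ignores the migrated rtc state entirely; it only shows that the startup offset is the one value that differs between the two sides, and that the forward-jump check depends on how often the host clock happens to be read.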