Re: [Qemu-devel] -rtc base=, migration and time jumps
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> So here's my understanding: "-rtc base=" says what the RTC value is
> when the guest starts. This value is only used by qemu_get_timedate,
> and most RTCs only use it on startup or reset. However, there are
> exceptions (the PC RTC's host clock notifier, the ds1338's set time
> functionality, and all reads of m41t80/m48t59/twl92230) and this causes
> the bug.

Yes, I think so.

> On 19/07/19 14:36, Dr. David Alan Gilbert wrote:
> > d) The host clock jump detection (b) is broken - it correctly detects
> > backwards jumps, but its detection of a forward jump is based
> > on two readings of the host clock being more than 60s apart - but
> > often on a QEMU running a Linux guest the host clock isn't read at all,
> > so reading hwclock, waiting a minute and reading it again will trigger
> > the jump code.
>
> Oops. Back when the detection was added, there were two QEMU_CLOCK_HOST
> timers firing every second so the clock jump detection happened promptly.
>
> These timers were then removed as a power-saving optimization, and that
> broke the jump detection.

Ah OK; I'm a bit cautious that perhaps we're still getting some benefit
from them; maybe in snapshots?

> > 1) Tell people to do what libvirt does and specify base= differently
> > on the dest.
>
> This is racy; the user does not have a good way to know the exact base
> on the destination.

Right.

> > 2) Migrate the offset value such that the base= on the destination
> > is ignored
>
> At least on some RTCs the offset is already being migrated indirectly.
> For example on x86 the (base_rtc, last_update) pair might be usable to
> reconstruct the offset?

Yes it probably is.

> > 3) Fix the host clock jump detection
> >
> > (3) is probably independent - the easiest fix would seem to be just
> > to set a timer to read the host clock at say 20 second intervals
> > which is wasteful but would avoid the false trigger.
> >
> > Is (2) worth it or do we just go with (1) - I'm tempted to just
> > specify the behaviour.
> >
> > Mind you, we could kill the host clock jump detection code - only
> > the mc146818 registers on the notifier for it - so presumably
> > aarch/ppc/s390 etc don't see it.
>
> I would just remove the host clock jump detection code. IIUC that
> should fix your bug so you don't even need to do the above-mentioned
> reconstruction of the offset (let's call it 2b) in the PC RTC.

OK, I'll do that.

> That still leaves the problem that the base goes out of sync on
> migration on m41t80/m48t59/twl92230. For that, I think that the
> simplest thing to do would be to fix those to store and migrate the
> offset themselves just like all other RTC implementations.

I'll put those on my TODO - I don't think they're actually used by any
versioned machine so keeping migration compat shouldn't be an issue.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] -rtc base=, migration and time jumps
So here's my understanding: "-rtc base=" says what the RTC value is when
the guest starts. This value is only used by qemu_get_timedate, and most
RTCs only use it on startup or reset. However, there are exceptions (the
PC RTC's host clock notifier, the ds1338's set time functionality, and
all reads of m41t80/m48t59/twl92230) and this causes the bug.

On 19/07/19 14:36, Dr. David Alan Gilbert wrote:
> d) The host clock jump detection (b) is broken - it correctly detects
> backwards jumps, but its detection of a forward jump is based
> on two readings of the host clock being more than 60s apart - but
> often on a QEMU running a Linux guest the host clock isn't read at all,
> so reading hwclock, waiting a minute and reading it again will trigger
> the jump code.

Oops. Back when the detection was added, there were two QEMU_CLOCK_HOST
timers firing every second so the clock jump detection happened promptly.

These timers were then removed as a power-saving optimization, and that
broke the jump detection.

> 1) Tell people to do what libvirt does and specify base= differently
> on the dest.

This is racy; the user does not have a good way to know the exact base
on the destination.

> 2) Migrate the offset value such that the base= on the destination
> is ignored

At least on some RTCs the offset is already being migrated indirectly.
For example on x86 the (base_rtc, last_update) pair might be usable to
reconstruct the offset?

> 3) Fix the host clock jump detection
>
> (3) is probably independent - the easiest fix would seem to be just
> to set a timer to read the host clock at say 20 second intervals
> which is wasteful but would avoid the false trigger.
>
> Is (2) worth it or do we just go with (1) - I'm tempted to just
> specify the behaviour.
>
> Mind you, we could kill the host clock jump detection code - only
> the mc146818 registers on the notifier for it - so presumably
> aarch/ppc/s390 etc don't see it.

I would just remove the host clock jump detection code. IIUC that
should fix your bug so you don't even need to do the above-mentioned
reconstruction of the offset (let's call it 2b) in the PC RTC.

That still leaves the problem that the base goes out of sync on
migration on m41t80/m48t59/twl92230. For that, I think that the
simplest thing to do would be to fix those to store and migrate the
offset themselves just like all other RTC implementations.

Paolo
[Qemu-devel] -rtc base=, migration and time jumps
Hi,
  I've just spent an unreasonable amount of time debugging an RTC issue
and come to the conclusion it's probably more of a documentation problem
than actual code - but I wondered if anyone disagrees.
(ref: https://bugzilla.redhat.com/show_bug.cgi?id=1714143 )

The question revolves around -rtc base= and what the base= passed to a
destination QEMU after migration should be (particularly with the 'host'
clock).

At startup, QEMU (vl.c) calculates the offset from the host clock to the
base - that value isn't migrated. Most RTC calculations done afterwards
don't reference it - they're all based on the time since we last read the
clock and a rolling time since then.

There's code to detect host clock jumps and trigger a notifier - the only
user of that is the mc146818rtc used on x86. It then reuses the base
offset to reset the RTC to the current host clock time.

  a) If you start a destination QEMU with the same base= value as the
     source, then the internal offset value will differ by how much
     later you started the destination.

  b) If you can trigger the host clock jump update, then on x86 that
     difference from (a) becomes visible when reading the RTC (hwclock),
     and so the RTC will appear to have fallen behind.

  c) libvirt (when using an 'adjustment' as oVirt does) recalculates the
     base on the destination, so the base passed to the destination QEMU
     differs from the source's; so even when (b) happens you get a
     consistent value. This may be an accident!

  d) The host clock jump detection (b) is broken - it correctly detects
     backwards jumps, but its detection of a forward jump is based
     on two readings of the host clock being more than 60s apart - but
     often on a QEMU running a Linux guest the host clock isn't read at all,
     so reading hwclock, waiting a minute and reading it again will trigger
     the jump code.

So what to do?
  1) Tell people to do what libvirt does and specify base= differently
     on the dest.
  2) Migrate the offset value such that the base= on the destination
     is ignored
  3) Fix the host clock jump detection

(3) is probably independent - the easiest fix would seem to be just
to set a timer to read the host clock at say 20 second intervals
which is wasteful but would avoid the false trigger.

Is (2) worth it or do we just go with (1) - I'm tempted to just
specify the behaviour.

Mind you, we could kill the host clock jump detection code - only
the mc146818 registers on the notifier for it - so presumably
aarch/ppc/s390 etc don't see it.

Thoughts?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK