Re: [Qemu-devel] [PATCH v2] We should check the virtio_load return code
While I investigated and reproduced the problem, the qemu process itself did not crash/abort. I observed that a Linux guest (KVM virtual machine) became unresponsive after a migration because requests to a virtio disk did not complete. If virtio_load() returns via the following section of code ... if (virtio_set_features(vdev, features) < 0) { supported_features = vdev->binding->get_features(vdev->binding_opaque); error_report("Features 0x%x unsupported. Allowed features: 0x%x", features, supported_features); return -1; } ... various data structures are not fully set up. Execution of the above code is - for example - triggered by passing "scsi=on" to the source qemu and "scsi=off" to the destination qemu process. Since the data structures are in an incomplete state, virtio disk requests from the guest cannot be processed. Regards, Uli - Original Message - > From: "Orit Wasserman" ... > Sent: Thursday, March 1, 2012 12:28:08 PM > Subject: [PATCH v2] We should check the virtio_load return code > > Otherwise we crash on error. > Instruction to reporduce the crash with migration: > 1) run a guest with -device virtio-blk-pci,drive=drive_name,scsi=on > 2) run destination with >-device virtio-blk-pci,drive=drive_name,scsi=off ... -incoming ... > 3) migrate from 1 to 2. > > Signed-off-by: Ulrich Obergfell > Signed-off-by: Orit Wasserman ...
Re: [Qemu-devel] Re: [RFC: 0/2] patch for QEMU HPET periodic timer emulation to alleviate time drift
Anthony, in reply to: > My point is that there's really low hanging fruit and while for some > curious reason I don't actually see this patch, I believe that a patch > like this probably can help us quite a lot in the short term. I've sent the patch in two separate emails: - code part 1 (introduces the qemu command line option) http://article.gmane.org/gmane.comp.emulators.kvm.devel/67347 - code part 2 (implements compensation of lost interrupts) http://article.gmane.org/gmane.comp.emulators.kvm.devel/67348 Regards, Uli Obergfell
Re: [Qemu-devel] Re: [RFC: 0/2] patch for QEMU HPET periodic timer emulation to alleviate time drift
On 02/03/2011 9:07 PM, Anthony Liguori wrote: > On 02/03/2011 09:28 AM, Jan Kiszka wrote: >> On 2011-02-03 14:43, Ulrich Obergfell wrote: ... >>> This is an RFC through which I would like to get feedback on how the >>> idea of a patch to compensate those lost interrupts would be received: >>> >>> The patch determines the number of lost timer interrupts based on the >>> number of elapsed comparator periods. Lost interrupts are compensated >>> >> That neglects coalescing of the HPET IRQs: If the timer is run regularly >> but the guest is not able to retrieve the injected IRQs, you should >> still see drifts with your patches. >> > > FWIW, this isn't the most common failure scenario. This is only really > prominent when you have rapid reinject like we do with the in-kernel > PIT. This generally shouldn't be an issue with gradual reinjection. > >>> by gradually injecting additional interrupts during the subsequent >>> timer intervals, starting at a rate of one additional interrupt per >>> interval. If further interrupts are lost while compensation is still >>> in progress, the rate is increased. The algorithm imposes a limit on >>> the rate and on the 'backlog' of lost interrupts to be injected. Anthony, Jan, many thanks for your feedback. I'm sorry for not responding earlier. I used the following KVM kernel module trace point in __apic_accept_irq() to trace coalesced interrupts related to HPET timer 0: result = !apic_test_and_set_irr(vector, apic); trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector, !result); While running the test outlined in my original email, I observed between 5 and 15 of these events occurring during one minute: qemu-system-x86-7632 [000] 3755.888909: kvm_apic_accept_irq: apicid 0 vec 209 (Fixed|edge) (coalesced) kvm->arch->vioapic.redirtbl[2] indicated that IRQ 2 was routed to vector number 209. With a comparator period of 15.6 milliseconds, the coalesced interrupts apparently contribute less than 250 milliseconds of drift. However, the total drift that I observed during one minute was up to 15 seconds. I think the patch could possibly handle coalesced interrupts too: - In update_irq(), use a similar method as in RTC emulation to detect coalesced interrupts. Return a delivery status to hpet_timer(). - In hpet_timer(), decrement t->irqs_to_inject (number of interrupts remaining to inject) only if the interrupt was not coalesced. Regards, Uli Obergfell
[Qemu-devel] [PATCH 0/3] alleviate time drift with HPET periodic timers
Hi, By the beginning of February I posted an RFC regarding an approach to alleviate time drift with HPET periodic timers. Ref.: http://article.gmane.org/gmane.comp.emulators.kvm.devel/67346 http://lists.gnu.org/archive/html/qemu-devel/2011-02/msg00327.html During the discussion it became apparent that a generic API that could facilitate (re)injection of lost timer interrupts for any timer device emulation would be preferred over an isolated fix for a specific device. However, at this stage there appears to be no plan to actually implement the generic API. Therefore, we may want to reconsider an improved version of the code that I originally included in the RFC. This improved version compensates the loss of timer interrupts caused by ... - delayed timer callback. - interrupt coalescing (x86 only). Lost timer interrupts are compensated by gradually injecting additional interrupts during the subsequent timer intervals, starting at a rate of one additional interrupt per interval. If further interrupts are lost while compensation is still in progress, the rate is increased. A limit is imposed on the rate and on the 'backlog' of lost interrupts that are to be injected. Compensation of lost timer interrupts can be enabled via a new QEMU command line option [1]. -hpet [device=none|present][,driftfix=none|slew] The code is enclosed in '#ifdef CONFIG_HPET_DRIFTFIX ... #endif' and it is configurable via a new option '--enable-hpet-driftfix' for the 'configure' script. This also makes it easy to identify and to remove if the generic API would be implemented some day. I've tested this code primarily by playing a list of example video files continuously in a Windows Vista(tm) guest. In some cases I observed the guest time fall 15+ seconds behind KVM host time in one minute without the patch, whereas with the patch there was no visible difference between the guest time and KVM host time after several hours of testing [2]. The 'qemu-system-x86_64' test binary image file was built from snapshot ... http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=e700699e9f6f1e163985a8e4cd76d8568d24c152 ... using only the '--prefix' and '--enable-hpet-driftfix' options of the 'configure' script (other options default). The series of patches is based on the aforementioned snapshot and it consists of three parts. - Part 1 introduces the new QEMU command line option. - Part 2 implements the method to compensate lost timer interrupts. - Part 3 introduces the new option for the 'configure' script. Please review and please comment. Regards, Uli Obergfell [1] Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour is acceptable. [2] Testing required a patch in tight_compress_data(), ui/vnc-enc-tight.c to avoid segmentation fault / core dump of qemu. However, this needs to be discussed in a separate post.
[Qemu-devel] [PATCH 2/3] alleviate time drift with HPET periodic timers
Part 2 of the patch compensates loss of timer interrupts caused by ... - delayed timer callback. - interrupt coalescing (x86 only). Lost timer interrupts are compensated by gradually injecting additional interrupts during the subsequent timer intervals, starting at a rate of one additional interrupt per interval. If further interrupts are lost while compensation is still in progress, the rate is increased. A limit is imposed on the rate and on the 'backlog' of lost interrupts that are to be injected. Signed-off-by: Ulrich Obergfell diff -up ./hw/apic.c.orig2 ./hw/apic.c --- ./hw/apic.c.orig2 2011-02-18 22:48:06.0 +0100 +++ ./hw/apic.c 2011-03-13 12:40:56.110991702 +0100 @@ -24,6 +24,9 @@ #include "sysbus.h" #include "trace.h" #include "kvm.h" +#ifdef CONFIG_HPET_DRIFTFIX +#include "sysemu.h" +#endif /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -1143,6 +1146,10 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +#ifdef CONFIG_HPET_DRIFTFIX +qemu_get_irq_delivered = apic_get_irq_delivered; +qemu_reset_irq_delivered = apic_reset_irq_delivered; +#endif sysbus_register_withprop(&apic_info); } diff -up ./hw/hpet.c.orig2 ./hw/hpet.c --- ./hw/hpet.c.orig2 2011-02-18 22:48:06.0 +0100 +++ ./hw/hpet.c 2011-03-13 12:40:56.111991655 +0100 @@ -31,6 +31,9 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#ifdef CONFIG_HPET_DRIFTFIX +#include "sysemu.h" +#endif //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -41,6 +44,13 @@ #define HPET_MSI_SUPPORT0 +#ifdef CONFIG_HPET_DRIFTFIX +#define MAX_IRQS_TO_INJECT (uint32_t)5000 +#define MAX_IRQ_RATE (uint32_t)10 + +extern int hpet_driftfix; +#endif + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -55,6 +65,12 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +#ifdef CONFIG_HPET_DRIFTFIX +uint64_t saved_period; +uint32_t irqs_to_inject; +uint32_t irq_rate; +uint32_t divisor; +#endif } HPETTimer; typedef struct HPETState { @@ -169,11 +185,12 @@ static inline uint64_t hpet_calculate_di } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -198,8 +215,19 @@ static void update_irq(struct HPETTimer qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; +#ifdef CONFIG_HPET_DRIFTFIX +if (hpet_driftfix && qemu_get_irq_delivered + && qemu_reset_irq_delivered) { +qemu_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = qemu_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} +else +#endif qemu_irq_pulse(s->irqs[route]); } +return irq_delivered; } static void hpet_pre_save(void *opaque) @@ -246,7 +274,11 @@ static int hpet_post_load(void *opaque, static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", +#ifdef CONFIG_HPET_DRIFTFIX +.version_id = 3, +#else .version_id = 1, +#endif .minimum_version_id = 1, .minimum_version_id_old = 1, .fields = (VMStateField []) { @@ -256,6 +288,12 @@ static const VMStateDescription vmstate_ VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), +#ifdef CONFIG_HPET_DRIFTFIX +VMSTATE_UINT64_V(saved_period, HPETTimer, 3), +VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), +VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), +VMSTATE_UINT32_V(divisor, HPETTimer, 3), +#endif VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() } @@ -263,7 +301,11 @@ static const VMStateDescription vmstate_ static const VMStateDescription vmstate_hpet = { .name = "hpet", +#ifdef CONFIG_HPET_DRIFTFIX +.version_id = 3, +#else .version_id = 2, +#endif .minimum_version_id = 1, .minimum_version_id_old = 1, .pre_save = hpet_pre_save, @@ -287,7 +329,10 @@ static void hpet_timer(void *opaque) { HPETTimer *t = opaque; uint64_t diff; - +#ifdef CONFIG_HPET_DRIFTFIX +uint32_t irq_count = 0; +int irq_delivered = 0; +#endif uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); @@ -295,1
[Qemu-devel] [PATCH 1/3] alleviate time drift with HPET periodic timers
Part 1 of the patch implements the following QEMU command line option. -hpet [device=none|present][,driftfix=none|slew] Signed-off-by: Ulrich Obergfell diff -up ./qemu-config.c.orig1 ./qemu-config.c --- ./qemu-config.c.orig1 2011-02-18 22:48:06.0 +0100 +++ ./qemu-config.c 2011-03-13 12:38:22.813976639 +0100 @@ -261,6 +261,23 @@ static QemuOptsList qemu_rtc_opts = { }, }; +#ifdef CONFIG_HPET_DRIFTFIX +static QemuOptsList qemu_hpet_opts = { +.name = "hpet", +.head = QTAILQ_HEAD_INITIALIZER(qemu_hpet_opts.head), +.desc = { +{ +.name = "device", +.type = QEMU_OPT_STRING, +},{ +.name = "driftfix", +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; +#endif + static QemuOptsList qemu_global_opts = { .name = "global", .head = QTAILQ_HEAD_INITIALIZER(qemu_global_opts.head), @@ -462,6 +479,9 @@ static QemuOptsList *vm_config_groups[32 &qemu_netdev_opts, &qemu_net_opts, &qemu_rtc_opts, +#ifdef CONFIG_HPET_DRIFTFIX +&qemu_hpet_opts, +#endif &qemu_global_opts, &qemu_mon_opts, &qemu_cpudef_opts, diff -up ./qemu-options.hx.orig1 ./qemu-options.hx --- ./qemu-options.hx.orig1 2011-02-18 22:48:06.0 +0100 +++ ./qemu-options.hx 2011-03-13 12:38:22.815977096 +0100 @@ -972,6 +972,13 @@ STEXI Disable HPET support. ETEXI +#ifdef CONFIG_HPET_DRIFTFIX +DEF("hpet", HAS_ARG, QEMU_OPTION_hpet, \ +"-hpet [device=none|present][,driftfix=none|slew]\n" \ +"disable or enable HPET, disable or enable drift fix for periodic timers\n", +QEMU_ARCH_ALL) +#endif + DEF("balloon", HAS_ARG, QEMU_OPTION_balloon, "-balloon none disable balloon device\n" "-balloon virtio[,addr=str]\n" diff -up ./vl.c.orig1 ./vl.c --- ./vl.c.orig12011-02-18 22:48:06.0 +0100 +++ ./vl.c 2011-03-13 12:38:35.167984285 +0100 @@ -203,6 +203,9 @@ CharDriverState *parallel_hds[MAX_PARALL CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES]; int win2k_install_hack = 0; int rtc_td_hack = 0; +#ifdef CONFIG_HPET_DRIFTFIX +int hpet_driftfix = 0; +#endif int usb_enabled = 0; int singlestep = 0; int smp_cpus = 1; @@ -431,6 +434,41 @@ static void configure_rtc(QemuOpts *opts } } +#ifdef CONFIG_HPET_DRIFTFIX +static void configure_hpet(QemuOpts *opts) +{ +const char *value, *opt; + +opt = "device"; +value = qemu_opt_get(opts, opt); +if (value) { +if (!strcmp(value, "present")) { +no_hpet = 0; +} else if (!strcmp(value, "none")) { +no_hpet = 1; +} else { +goto error_exit; +} +} +opt = "driftfix"; +value = qemu_opt_get(opts, opt); +if (value) { +if (!strcmp(value, "slew")) { +hpet_driftfix = 1; +} else if (!strcmp(value, "none")) { +hpet_driftfix = 0; +} else { +goto error_exit; +} +} +return; + +error_exit: +fprintf(stderr, "qemu: -hpet option '%s': value missing or invalid\n", opt); +exit(1); +} +#endif + /***/ /* Bluetooth support */ static int nb_hcis; @@ -2644,6 +2682,15 @@ int main(int argc, char **argv, char **e case QEMU_OPTION_no_hpet: no_hpet = 1; break; +#ifdef CONFIG_HPET_DRIFTFIX +case QEMU_OPTION_hpet: +opts = qemu_opts_parse(qemu_find_opts("hpet"), optarg, 0); +if (!opts) { +exit(1); +} +configure_hpet(opts); +break; +#endif case QEMU_OPTION_balloon: if (balloon_parse(optarg) < 0) { fprintf(stderr, "Unknown -balloon argument %s\n", optarg);
[Qemu-devel] [PATCH 3/3] alleviate time drift with HPET periodic timers
Part 3 of the patch implements the following options for the 'configure' script. --disable-hpet-driftfix --enable-hpet-driftfix Signed-off-by: Ulrich Obergfell diff -up ./configure.orig3 ./configure --- ./configure.orig3 2011-02-18 22:48:06.0 +0100 +++ ./configure 2011-03-13 12:43:44.870966181 +0100 @@ -140,6 +140,7 @@ linux_aio="" attr="" vhost_net="" xfs="" +hpet_driftfix="no" gprof="no" debug_tcg="no" @@ -749,6 +750,10 @@ for opt do ;; --enable-rbd) rbd="yes" ;; + --disable-hpet-driftfix) hpet_driftfix="no" + ;; + --enable-hpet-driftfix) hpet_driftfix="yes" + ;; *) echo "ERROR: unknown option $opt"; show_help="yes" ;; esac @@ -2602,6 +2607,7 @@ echo "Trace output file $trace_file- Your SDL version is too old - please upgrade to have SDL support" @@ -2893,6 +2899,10 @@ if test "$rbd" = "yes" ; then echo "CONFIG_RBD=y" >> $config_host_mak fi +if test "$hpet_driftfix" = "yes" ; then + echo "CONFIG_HPET_DRIFTFIX=y" >> $config_host_mak +fi + # USB host support case "$usb" in linux)
[Qemu-devel] [PATCH] vnc: segmentation fault caused by incorrect 'bytes' count calculated in tight_compress_data()
tight_compress_data() calculates an incorrect 'bytes' count if 'zstream->total_out' is greater than 0x7fff, because the type of the variable 'previous_out' is 'int'. 852 int previous_out; : 872 previous_out = zstream->total_out; : 881 bytes = zstream->total_out - previous_out; 882 883 tight_send_compact_size(vs, bytes); 884 vnc_write(vs, vs->tight.zlib.buffer, bytes); The incorrect 'bytes' count causes segmentation faults in functions called by tight_compress_data(). For example: (gdb) bt #0 0x00396ab3d2b6 in __memcpy_ssse3_back () from /lib64/libc.so.6 #1 0x004b3b91 in buffer_append (buffer=0x322d660, data=, len=) at /usr/include/bits/string3.h:52 #2 0x004bbf02 in tight_compress_data (vs=0x32215b0, stream_id=, bytes=0x100024881, level=, strategy=) at ui/vnc-enc-tight.c:884 ... (gdb) x/i $rip => 0x396ab3d2b6 <__memcpy_ssse3_back+6710>: movdqu -0x10(%rsi),%xmm0 (gdb) print $rsi+0x10 $1 = 0x103f2ab21 (gdb) x/xb 0x103f2ab21 0x103f2ab21:Cannot access memory at address 0x103f2ab21 The following program illustrates the problem. $ cat t.c #include main() { int previous_out; unsigned long total_out = 0x8000; size_t bytes; previous_out = total_out; total_out += 0x1; bytes = total_out - previous_out; printf("%lx\n", bytes); } $ cc t.c -o t $ ./t 10001 The patch changes the type of 'previous_out' to 'uLong' which is the same as the type of 'zstream->total_out'. Signed-off-by: Ulrich Obergfell diff -up ./ui/vnc-enc-tight.c.orig0 ./ui/vnc-enc-tight.c --- ./ui/vnc-enc-tight.c.orig0 2011-03-15 03:53:22.0 +0100 +++ ./ui/vnc-enc-tight.c2011-03-20 12:14:48.013560009 +0100 @@ -849,7 +849,7 @@ static int tight_compress_data(VncState int level, int strategy) { z_streamp zstream = &vs->tight.stream[stream_id]; -int previous_out; +uLong previous_out; if (bytes < VNC_TIGHT_MIN_TO_COMPRESS) { vnc_write(vs, vs->tight.tight.buffer, vs->tight.tight.offset);
Re: [Qemu-devel] [PATCH] vnc: segmentation fault caused by incorrect 'bytes' count calculated in tight_compress_data()
> Hi Ulrich, > Looks a lot like "vnc: tight: Fix crash after 2GB of output", right ? > > -- > Corentin Chary > http://xf.iksaif.net Hi Corentin, yes, this appears to be the same issue as: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg02044.html You posted your patch only a few minutes before I posted mine. Hence, I wasn't aware that a fix was already available. Regards, Uli
[Qemu-devel] vnc: severe memory leak caused by broken palette_destroy() function
The following commit breaks the code of the function palette_destroy(). http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=e31e3694afef58ba191cbcc6875ec243e5971268 The broken code causes a severe memory leak of 'VncPalette' structures because it never frees anything: 70 void palette_destroy(VncPalette *palette) 71 { 72 if (palette == NULL) { 73 qemu_free(palette); 74 } 75 } Calling qemu_free() unconditionally could be considered. However, the original code (prior to the aforementioned commit) returned immediately if 'palette' was NULL. In order to be closer to the original code, the proposed patch corrects the 'if' statement. Signed-off-by: Ulrich Obergfell diff -up ./ui/vnc-palette.c.orig0 ./ui/vnc-palette.c --- ./ui/vnc-palette.c.orig02011-03-15 03:53:22.0 +0100 +++ ./ui/vnc-palette.c 2011-03-20 11:52:57.257560295 +0100 @@ -69,7 +69,7 @@ void palette_init(VncPalette *palette, s void palette_destroy(VncPalette *palette) { -if (palette == NULL) { +if (palette) { qemu_free(palette); } }
[Qemu-devel] Re: [PATCH 1/3] alleviate time drift with HPET periodic timers
>> Part 1 of the patch implements the following QEMU command line option. >> >> -hpet [device=none|present][,driftfix=none|slew] > > Just define driftfix as property of the hpet device. That way it can be > controlled both globally (-global hpet.driftfix=...) and per hpet block > (once we support instantiating >1 of them). Many Thanks Jan, I started investigating code changes. I'm thinking of ... - adding a new field to the HPETState structure. uint32_t driftfix; - adding the property 'driftfix' to the DeviceInfo structure. DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false) Using a single bit so that the option syntax would be, e.g.: -global hpet.driftfix=on (Default is 'off') - Replace all 'if (hpet_driftfix ...)' by: if ((HPETState)s->driftfix ...) Regards, Uli
[Qemu-devel] Re: [PATCH 3/3] alleviate time drift with HPET periodic timers
>> Part 3 of the patch implements the following options for the >> 'configure' script. >> >> --disable-hpet-driftfix >> --enable-hpet-driftfix > > I see no benefit in this configurability. Just make the driftfix > unconditionally available, runtime-disabled by default for now until it > matured and there is no downside in enabling it all the time. Many Thanks Jan, I enclosed the code in '#ifdef CONFIG_HPET_DRIFTFIX ... #endif' so that it can be easily identified (and removed if the generic API would be implemented some day). Since the ifdef's are already there I added the configuration option for convenience. As you don't see any benefit in this option, I can remove that part of the patch. However, I'd suggest to keep the ifdef's and do the following: - Rename to '#ifdef HPET_DRIFTFIX ... #endif' to make it clear that this is not controlled via a configuration option. - Add '#define HPET_DRIFTFIX' to hw/hpet_emul.h. Do you agree ? Regards, Uli
[Qemu-devel] [PATCH v2] severe memory leak caused by broken palette_destroy() function
This is version 2 of the patch that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg02063.html [Sorry, I missed to include the keyword 'PATCH' in the subject of the original post.] The following commit breaks the code of the function palette_destroy(). http://git.kernel.org/?p=virt/kvm/qemu-kvm.git;a=commit;h=e31e3694afef58ba191cbcc6875ec243e5971268 The broken code causes a severe memory leak of 'VncPalette' structures because it never frees anything: 70 void palette_destroy(VncPalette *palette) 71 { 72 if (palette == NULL) { 73 qemu_free(palette); 74 } 75 } Version 2 of the patch calls qemu_free() unconditionally. Signed-off-by: Ulrich Obergfell diff -up ./ui/vnc-palette.c.orig0 ./ui/vnc-palette.c --- ./ui/vnc-palette.c.orig02011-03-15 03:53:22.0 +0100 +++ ./ui/vnc-palette.c 2011-03-21 20:19:02.736948725 +0100 @@ -69,9 +69,7 @@ void palette_init(VncPalette *palette, s void palette_destroy(VncPalette *palette) { -if (palette == NULL) { -qemu_free(palette); -} +qemu_free(palette); } int palette_put(VncPalette *palette, uint32_t color)
[Qemu-devel] [PATCH v2 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option -global hpet.driftfix=on driftfix is 'off' (disabled) by default. Signed-off-by: Ulrich Obergfell --- hw/hpet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index e92b775..45847ed 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -72,6 +72,8 @@ typedef struct HPETState { uint64_t isr; /* interrupt status reg */ uint64_t hpet_counter; /* main counter */ uint8_t hpet_id; /* instance id */ + +uint32_t driftfix; } HPETState; static uint32_t hpet_in_legacy_mode(HPETState *s) @@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = { .qdev.props = (Property[]) { DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS), DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false), +DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false), DEFINE_PROP_END_OF_LIST(), }, }; -- 1.6.2.5
[Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 45847ed..c150da5 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -55,6 +55,11 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +uint64_t saved_period; +uint64_t ticks_not_accounted; +uint32_t irqs_to_inject; +uint32_t irq_rate; +uint32_t divisor; } HPETTimer; typedef struct HPETState { @@ -248,7 +253,7 @@ static int hpet_post_load(void *opaque, int version_id) static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", -.version_id = 1, +.version_id = 3, .minimum_version_id = 1, .minimum_version_id_old = 1, .fields = (VMStateField []) { @@ -258,6 +263,11 @@ static const VMStateDescription vmstate_hpet_timer = { VMSTATE_UINT64(fsb, HPETTimer), VMSTATE_UINT64(period, HPETTimer), VMSTATE_UINT8(wrap_flag, HPETTimer), +VMSTATE_UINT64_V(saved_period, HPETTimer, 3), +VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), +VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), +VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), +VMSTATE_UINT32_V(divisor, HPETTimer, 3), VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() } @@ -265,7 +275,7 @@ static const VMStateDescription vmstate_hpet_timer = { static const VMStateDescription vmstate_hpet = { .name = "hpet", -.version_id = 2, +.version_id = 3, .minimum_version_id = 1, .minimum_version_id_old = 1, .pre_save = hpet_pre_save, -- 1.6.2.5
[Qemu-devel] [PATCH v2 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi, This is version 2 of a series of patches that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328 I implemented the proposed changes (remove configurability and ifdefs, define 'driftfix' as a property of the HPET device) and I've split the patch into smaller parts (originally 3, now 5). I also included a fix related to scaling 't->irqs_to_inject' when the guest o/s modifies the comparator register value. Please review and please comment. Regards, Uli Ulrich Obergfell (5): hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add driftfix property to HPETState and DeviceInfo hpet 'driftfix': add fields to HPETTimer and VMStateDescription hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts hw/apic.c |4 +++ hw/hpet.c | 90 +--- sysemu.h |3 ++ vl.c |3 ++ 4 files changed, 95 insertions(+), 5 deletions(-)
[Qemu-devel] [PATCH v2 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' contain entry addresses of functions that are utilized by update_irq() to detect coalesced interrupts. apic code loads these pointers during initialization. Signed-off-by: Ulrich Obergfell --- hw/apic.c |4 sysemu.h |3 +++ vl.c |3 +++ 3 files changed, 10 insertions(+), 0 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 00907e0..44d8cb3 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -24,6 +24,7 @@ #include "sysbus.h" #include "trace.h" #include "kvm.h" +#include "sysemu.h" /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +target_get_irq_delivered = apic_get_irq_delivered; +target_reset_irq_delivered = apic_reset_irq_delivered; + sysbus_register_withprop(&apic_info); } diff --git a/sysemu.h b/sysemu.h index 3f7e17e..b5d4f90 100644 --- a/sysemu.h +++ b/sysemu.h @@ -96,6 +96,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); int qemu_loadvm_state(QEMUFile *f); +extern int (*target_get_irq_delivered)(void); +extern void (*target_reset_irq_delivered)(void); + /* SLIRP */ void do_info_slirp(Monitor *mon); diff --git a/vl.c b/vl.c index 8bcf2ae..548912a 100644 --- a/vl.c +++ b/vl.c @@ -231,6 +231,9 @@ const char *prom_envs[MAX_PROM_ENVS]; const char *nvram = NULL; int boot_menu; +int (*target_get_irq_delivered)(void) = 0; +void (*target_reset_irq_delivered)(void) = 0; + typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { -- 1.6.2.5
[Qemu-devel] [PATCH v2 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index c150da5..b5625fb 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#include "sysemu.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -176,11 +177,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -205,8 +207,17 @@ static void update_irq(struct HPETTimer *timer, int set) qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; -qemu_irq_pulse(s->irqs[route]); +if (s->driftfix && target_get_irq_delivered +&& target_reset_irq_delivered) { +target_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = target_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} else { +qemu_irq_pulse(s->irqs[route]); +} } +return irq_delivered; } static void hpet_pre_save(void *opaque) -- 1.6.2.5
[Qemu-devel] [PATCH v2 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. If further interrupts are lost while compensation is in progress, the rate is increased. A limit is imposed on the rate and on the 'backlog' of lost interrupts that are to be injected. If a guest o/s modifies the comparator register value while compensation is in progress, the 'backlog' of lost interrupts that are to be injected is scaled to the new value. Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour is acceptable. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 58 +- 1 files changed, 57 insertions(+), 1 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index b5625fb..5a25a96 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -42,6 +42,9 @@ #define HPET_MSI_SUPPORT0 +#define MAX_IRQS_TO_INJECT (uint32_t)5000 +#define MAX_IRQ_RATE (uint32_t)10 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -309,8 +312,11 @@ static const VMStateDescription vmstate_hpet = { static void hpet_timer(void *opaque) { HPETTimer *t = opaque; +HPETState *s = t->state; uint64_t diff; +int irq_delivered = 0; +uint32_t irq_count = 0; uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); @@ -318,13 +324,36 @@ static void hpet_timer(void *opaque) if (t->config & HPET_TN_32BIT) { while (hpet_time_after(cur_tick, t->cmp)) { t->cmp = (uint32_t)(t->cmp + t->period); +irq_count++; } } else { while (hpet_time_after64(cur_tick, t->cmp)) { t->cmp += period; +irq_count++; } } diff = hpet_calculate_diff(t, cur_tick); +if (s->driftfix) { +if (t->saved_period != t->period) { +uint64_t tmp = (t->irqs_to_inject * t->saved_period) + + t->ticks_not_accounted; +t->irqs_to_inject = tmp / t->period; +t->ticks_not_accounted = tmp % t->period; +t->saved_period = t->period; +} +t->irqs_to_inject += irq_count; +t->irqs_to_inject = MIN(t->irqs_to_inject, MAX_IRQS_TO_INJECT); +if (t->irqs_to_inject > 1) { +if (irq_count > 1) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +if (irq_count || t->divisor == 0) { +t->divisor = t->irq_rate; +} +diff /= t->divisor--; +} +} qemu_mod_timer(t->qemu_timer, qemu_get_clock(vm_clock) + (int64_t)ticks_to_ns(diff)); } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) { @@ -335,7 +364,22 @@ static void hpet_timer(void *opaque) t->wrap_flag = 0; } } -update_irq(t, 1); +if (s->driftfix && timer_is_periodic(t) && period != 0) { +if (t->irqs_to_inject) { +irq_delivered = update_irq(t, 1); +if (irq_delivered) { +t->irq_rate = MIN(t->irq_rate, t->irqs_to_inject); +t->irqs_to_inject--; +} else { +if (irq_count) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +} +} +} else { +update_irq(t, 1); +} } static void hpet_set_timer(HPETTimer *t) @@ -674,6 +718,12 @@ static void hpet_reset(DeviceState *d) timer->config |= 0x0004ULL << 32; timer->period = 0ULL; timer->wrap_flag = 0; + +timer->saved_period = 0; +timer->ticks_not_accounted = 0; +timer->irqs_to_inject = 0; +timer->irq_rate = 1; +timer->divisor = 1; } s->hpet_counter = 0ULL; @@ -734,6 +784,12 @@ static int hpet_init(SysBusDevice *dev) timer->qemu_timer = qemu_new_timer(vm_clock, hpet_timer, timer); timer->tn = i; timer->state = s; + +timer->saved_period = 0; +timer->ticks_not_accounted = 0; +timer->irqs_to_inject = 0; +timer->irq_rate = 1; +timer->divisor = 1; } /* 64-bit main counter; LegacyReplacementRoute. */ -- 1.6.2.5
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
>> typedef struct HPETState { >> @@ -248,7 +253,7 @@ static int hpet_post_load(void *opaque, int >> version_id) >> >> static const VMStateDescription vmstate_hpet_timer = { >> .name = "hpet_timer", >> - .version_id = 1, >> + .version_id = 3, > > Why jump from 1 to 3? > >> .minimum_version_id = 1, >> .minimum_version_id_old = 1, >> .fields = (VMStateField []) { >> @@ -258,6 +263,11 @@ static const VMStateDescription >> vmstate_hpet_timer = { >> VMSTATE_UINT64(fsb, HPETTimer), >> VMSTATE_UINT64(period, HPETTimer), >> VMSTATE_UINT8(wrap_flag, HPETTimer), >> + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), >> + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), >> + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), >> + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), >> + VMSTATE_UINT32_V(divisor, HPETTimer, 3), Anthony, I incremented the version ID of 'vmstate_hpet' from 2 to 3 to make sure that migrations from a QEMU process that is capable of 'driftfix' to a QEMU process that is _not_ capable of 'driftfix' will fail. I assigned version ID 3 to 'vmstate_hpet_timer' and to the new fields in there too to indicate that adding those fields was the reason why the version ID of 'vmstate_hpet' was incremented to 3. As far as the flow of execution in vmstate_load_state() is concerned, I think it does not matter whether the version ID of 'vmstate_hpet_timer' and the new fields in there is 2 or 3 (as long as they are consistent). When the 'while(field->name)' loop in vmstate_load_state() gets to the following field in 'vmstate_hpet' ... VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0, vmstate_hpet_timer, HPETTimer), ... it calls itself recursively ... if (field->flags & VMS_STRUCT) { ret = vmstate_load_state(f, field->vmsd, addr, field->vmsd->version_id); 'field->vmsd->version_id' is the version ID of 'vmstate_hpet_timer' [1]. Hence 'vmstate_hpet_timer.version_id' is being checked against itself ... if (version_id > vmsd->version_id) { return -EINVAL; } ... and the version IDs of the new fields are also being checked against 'vmstate_hpet_timer.version_id' ... if ((field->field_exists && field->field_exists(opaque, version_id)) || (!field->field_exists && field->version_id <= version_id)) { If you want me to change the version ID of 'vmstate_hpet_timer' and the new fields in there from 3 to 2, I can do that. Regards, Uli [1] Ref.: commit fa3aad24d94a6cf894db52d83f72a399324a17bb
Re: [Qemu-devel] [PATCH v2 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
>> vmstate_hpet_timer = { >> VMSTATE_UINT64(fsb, HPETTimer), >> VMSTATE_UINT64(period, HPETTimer), >> VMSTATE_UINT8(wrap_flag, HPETTimer), >> + VMSTATE_UINT64_V(saved_period, HPETTimer, 3), >> + VMSTATE_UINT64_V(ticks_not_accounted, HPETTimer, 3), >> + VMSTATE_UINT32_V(irqs_to_inject, HPETTimer, 3), >> + VMSTATE_UINT32_V(irq_rate, HPETTimer, 3), >> + VMSTATE_UINT32_V(divisor, HPETTimer, 3), > > We ought to be able to use a subsection keyed off of whether any ticks > are currently accumulated, no? Anthony, I'm not sure if I understand your question correctly. Are you suggesting to migrate the driftfix-related state conditionally / only if there are any ticks accumulated in 'ticks_not_accounted' and 'irqs_to_inject' ? The size of the driftfix-related state is 28 bytes per timer and we have 32 timers per HPETState, i.e. 896 additional bytes per HPETState. With a maximum number of 8 HPET blocks (HPETState), this amounts to 7168 bytes. Hence, unconditional migration of the driftfix-related state should not cause significant additional overhead. Maybe I missed something. Could you please explain which benefit you see in using a subsection ? Regards, Uli
[Qemu-devel] [PATCH v3 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi, This is version 3 of a series of patches that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328 Changes since version 2: - The vmstate related to 'driftfix' is now in a separate subsection that can be migrated conditionally. - The new fields of the HPETTimer structure are no longer initialized in hpet_init(). This is now done in hpet_reset() only. - Compensation of lost timer interrupts is now based on a backlog of unaccounted HPET clock periods instead of 'irqs_to_inject'. This eliminates the need to scale 'irqs_to_inject' when a guest o/s modifies the comparator register value. Please review and please comment. Regards, Uli Ulrich Obergfell (5): hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add driftfix property to HPETState and DeviceInfo hpet 'driftfix': add fields to HPETTimer and VMStateDescription hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts hw/apic.c |4 ++ hw/hpet.c | 114 ++-- sysemu.h |3 ++ vl.c |3 ++ 4 files changed, 120 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' contain entry addresses of functions that are utilized by update_irq() to detect coalesced interrupts. apic code loads these pointers during initialization. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/apic.c |4 sysemu.h |3 +++ vl.c |3 +++ 3 files changed, 10 insertions(+), 0 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index a45b57f..eb0f6d9 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -24,6 +24,7 @@ #include "sysbus.h" #include "trace.h" #include "kvm.h" +#include "sysemu.h" /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +target_get_irq_delivered = apic_get_irq_delivered; +target_reset_irq_delivered = apic_reset_irq_delivered; + sysbus_register_withprop(&apic_info); } diff --git a/sysemu.h b/sysemu.h index 07d85cd..75b0139 100644 --- a/sysemu.h +++ b/sysemu.h @@ -98,6 +98,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); int qemu_loadvm_state(QEMUFile *f); +extern int (*target_get_irq_delivered)(void); +extern void (*target_reset_irq_delivered)(void); + /* SLIRP */ void do_info_slirp(Monitor *mon); diff --git a/vl.c b/vl.c index 0c24e07..7bab315 100644 --- a/vl.c +++ b/vl.c @@ -233,6 +233,9 @@ const char *prom_envs[MAX_PROM_ENVS]; const char *nvram = NULL; int boot_menu; +int (*target_get_irq_delivered)(void) = 0; +void (*target_reset_irq_delivered)(void) = 0; + typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { -- 1.6.2.5
[Qemu-devel] [PATCH v3 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
The new fields in HPETTimer are covered by a separate VMStateDescription which is a subsection of 'vmstate_hpet_timer'. They are only migrated if -global hpet.driftfix=on Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 33 + 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7513065..7ab6e62 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -55,6 +55,10 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +uint64_t prev_period; +uint64_t ticks_not_accounted; +uint32_t irq_rate; +uint32_t divisor; } HPETTimer; typedef struct HPETState { @@ -246,6 +250,27 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_timer_driftfix_vmstate_needed(void *opaque) +{ +HPETTimer *t = opaque; + +return (t->state->driftfix != 0); +} + +static const VMStateDescription vmstate_hpet_timer_driftfix = { +.name = "hpet_timer_driftfix", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(prev_period, HPETTimer), +VMSTATE_UINT64(ticks_not_accounted, HPETTimer), +VMSTATE_UINT32(irq_rate, HPETTimer), +VMSTATE_UINT32(divisor, HPETTimer), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", .version_id = 1, @@ -260,6 +285,14 @@ static const VMStateDescription vmstate_hpet_timer = { VMSTATE_UINT8(wrap_flag, HPETTimer), VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = &vmstate_hpet_timer_driftfix, +.needed = hpet_timer_driftfix_vmstate_needed, +}, { +/* empty */ +} } }; -- 1.6.2.5
[Qemu-devel] [PATCH v3 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7ab6e62..35466ae 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#include "sysemu.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -175,11 +176,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -204,8 +206,17 @@ static void update_irq(struct HPETTimer *timer, int set) qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; -qemu_irq_pulse(s->irqs[route]); +if (s->driftfix && target_get_irq_delivered +&& target_reset_irq_delivered) { +target_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = target_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} else { +qemu_irq_pulse(s->irqs[route]); +} } +return irq_delivered; } static void hpet_pre_save(void *opaque) -- 1.6.2.5
[Qemu-devel] [PATCH v3 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option -global hpet.driftfix=on driftfix is 'off' (disabled) by default. Signed-off-by: Ulrich Obergfell --- hw/hpet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 6ce07bc..7513065 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -72,6 +72,8 @@ typedef struct HPETState { uint64_t isr; /* interrupt status reg */ uint64_t hpet_counter; /* main counter */ uint8_t hpet_id; /* instance id */ + +uint32_t driftfix; } HPETState; static uint32_t hpet_in_legacy_mode(HPETState *s) @@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = { .qdev.props = (Property[]) { DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS), DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false), +DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false), DEFINE_PROP_END_OF_LIST(), }, }; -- 1.6.2.5
[Qemu-devel] [PATCH v3 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. The injection of additional interrupts is based on a backlog of unaccounted HPET clock periods (new HPETTimer field 'ticks_not_accounted'). The backlog increases due to delayed callbacks and coalesced interrupts, and it decreases if an interrupt was injected successfully. If the backlog increases while compensation is still in progress, the rate at which additional interrupts are injected is increased too. A limit is imposed on the backlog and on the rate. Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass slower and faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour may be acceptable. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 63 +++- 1 files changed, 61 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 35466ae..92d5f58 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -32,6 +32,7 @@ #include "sysbus.h" #include "mc146818rtc.h" #include "sysemu.h" +#include //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -42,6 +43,9 @@ #define HPET_MSI_SUPPORT0 +#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */ +#define MAX_IRQ_RATE(uint32_t)10 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -326,28 +330,63 @@ static const VMStateDescription vmstate_hpet = { } }; +static bool hpet_timer_has_tick_backlog(HPETTimer *t) +{ +uint64_t backlog = t->ticks_not_accounted - (t->period + t->prev_period); +return (backlog >= t->period); +} + /* * timer expiration callback */ static void hpet_timer(void *opaque) { HPETTimer *t = opaque; +HPETState *s = t->state; uint64_t diff; - +int irq_delivered = 0; +uint32_t irq_count = 0; uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); +if (s->driftfix && !t->ticks_not_accounted) { +t->ticks_not_accounted = t->prev_period = t->period; +} if (timer_is_periodic(t) && period != 0) { if (t->config & HPET_TN_32BIT) { while (hpet_time_after(cur_tick, t->cmp)) { t->cmp = (uint32_t)(t->cmp + t->period); +t->ticks_not_accounted += t->period; +irq_count++; } } else { while (hpet_time_after64(cur_tick, t->cmp)) { t->cmp += period; +t->ticks_not_accounted += period; +irq_count++; } } diff = hpet_calculate_diff(t, cur_tick); +if (s->driftfix) { +if (t->ticks_not_accounted > MAX_TICKS_NOT_ACCOUNTED) { +t->ticks_not_accounted = t->period + t->prev_period; +} +if (hpet_timer_has_tick_backlog(t)) { +if (t->irq_rate == 1 || irq_count > 1) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +if (t->divisor == 0) { +assert(irq_count); +} +if (irq_count) { +t->divisor = t->irq_rate; +} +diff /= t->divisor--; +} else { +t->irq_rate = 1; +} +} qemu_mod_timer(t->qemu_timer, qemu_get_clock_ns(vm_clock) + (int64_t)ticks_to_ns(diff)); } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) { @@ -358,7 +397,22 @@ static void hpet_timer(void *opaque) t->wrap_flag = 0; } } -update_irq(t, 1); +if (s->driftfix && timer_is_periodic(t) && period != 0) { +if (t->ticks_not_accounted >= t->period + t->prev_period) { +irq_delivered = update_irq(t, 1); +if (irq_delivered) { +t->ticks_not_accounted -= t->prev_period; +t->prev_period = t->period; +} else { +if (irq_count) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +} +} +} else { +update_irq(t, 1); +} } static void hpet_set_timer(HPE
Re: [Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
> On 2011-04-28 20:51, Blue Swirl wrote: >> On Thu, Apr 28, 2011 at 5:24 PM, Ulrich Obergfell wrote: >>> 'target_get_irq_delivered' and 'target_reset_irq_delivered' contain >>> entry addresses of functions that are utilized by update_irq() to >>> detect coalesced interrupts. apic code loads these pointers during >>> initialization. >> >> I'm not so happy with this approach, but probably then the i386 >> dependencies can be removed from RTC and it can be compiled only once >> for all targets. > > This whole series is really the minimalistic approach. The callbacks > defined here must remain a temporary "shortcut". Just like proper > abstraction of periodic tick compensation for reuse in other timers has > to be added later on. And the limitation to edge-triggered legacy HPET > INTs has to be removed. Since QEMU doesn't have a generic infrastructure to track interrupt delivery, I decided to reuse something that is currently available. The only mechanism that I'm aware of is the one that is utilized by RTC code ('rtc_td_hack'), i.e. apic_get_irq_delivered() etc. The changes that would be introduced by part 1/5 and part 4/5 of this patch series could be replaced if a generic infrastructure to track interrupt delivery becomes available. Regards, Uli
Re: [Qemu-devel] [PATCH v3 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Hi Marcelo, > Whats prev_period for, since in practice the period will not change > between interrupts (OS programs comparator once, or perhaps twice > during bootup) ? 'prev_period' is needed if a guest o/s changes the comparator period 'on the fly' (without stopping and restarting the timer). guest o/s changes period | ti(n-1) |ti(n) ti(n+1) | v | | +-+--+ <--- prev_period ---> <-- period --> The idea is that each timer interrupt represents a certain quantum of time (the comparator period). If a guest o/s changes the period between timer interrupt 'n-1' and timer interrupt 'n', I think the new value should not take effect before timer interrupt 'n'. Timer interrupt 'n' still represents the old/previous quantum, and timer interrupt 'n+1' represents the new quantum. Hence, the patch decrements 'ticks_not_accounted' by 'prev_period' and sets 'prev_period' to 'period' when an interrupt was delivered to the guest o/s. +irq_delivered = update_irq(t, 1); +if (irq_delivered) { +t->ticks_not_accounted -= t->prev_period; +t->prev_period = t->period; +} else { Most of the time 'prev_period' is equal to 'period'. It should only be different in the scenario shown above. Regards, Uli
Re: [Qemu-devel] [PATCH v3 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Hi Marcelo, > Other than that, shouldnt reset accounting variables to init state on > write to GLOBAL_ENABLE_CFG / writes to main counter? I'd suggest to initialize/reset the driftfix-related fields in the 'HPETTimer' structure (including the backlog of unaccounted ticks) in the following situations. - When the guest o/s sets the 'CFG_ENABLE' bit (overall enable) in the General Configuration Register. This is the place in hpet_ram_writel(): case HPET_CFG: ... if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) { ... for (i = 0; i < s->num_timers; i++) { if ((&s->timer[i])->cmp != ~0ULL) { // initialize / reset fields here hpet_set_timer(&s->timer[i]); - When the guest o/s sets the 'TN_ENABLE' bit (timer N interrupt enable) in the Timer N Configuration and Capabilities Register. This is the place in hpet_ram_writel(): case HPET_TN_CFG: ... if (activating_bit(old_val, new_val, HPET_TN_ENABLE)) { // initialize / reset fields here hpet_set_timer(timer); This should cover cases such as ... - first time initialization of HPET & timers during guest o/s boot. (includes kexec and reboot) - if a guest o/s stops and restarts the timer. (for example, to change the comparator register value or to switch a timer between periodic mode and non-periodic mode) Regards, Uli
[Qemu-devel] [PATCH v4 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi, This is version 4 of a series of patches that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328 Changes since version 3: in patch part 1/5 and part 4/5 - Added stub functions for 'target_reset_irq_delivered' and 'target_get_irq_delivered'. Added registration functions that are used by apic code to replace the stubs. - Removed NULL pointer checks from update_irq(). in patch part 5/5 - A minor modification in hpet_timer_has_tick_backlog(). - Renamed the local variable 'irq_count' in hpet_timer() to 'period_count'. - Driftfix-related fields in struct 'HPETTimer' are no longer being initialized/reset in hpet_reset(). Added the function hpet_timer_driftfix_reset() which is called when the guest . sets the 'CFG_ENABLE' bit (overall enable) in the General Configuration Register. . sets the 'TN_ENABLE' bit (timer N interrupt enable) in the Timer N Configuration and Capabilities Register. Please review and please comment. Regards, Uli Ulrich Obergfell (5): hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add driftfix property to HPETState and DeviceInfo hpet 'driftfix': add fields to HPETTimer and VMStateDescription hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts hw/apic.c |4 ++ hw/hpet.c | 119 ++-- hw/pc.h | 13 +++ vl.c | 13 +++ 4 files changed, 145 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH v4 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' point to functions that are called by update_irq() to detect coalesced interrupts. Initially they point to stub functions which pretend successful interrupt injection. apic code calls two registration functions to replace the stubs with apic_get_irq_delivered() and apic_reset_irq_delivered(). This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/apic.c |4 hw/pc.h | 13 + vl.c | 13 + 3 files changed, 30 insertions(+), 0 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index a45b57f..94b1d15 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -17,6 +17,7 @@ * License along with this library; if not, see <http://www.gnu.org/licenses/> */ #include "hw.h" +#include "pc.h" #include "apic.h" #include "ioapic.h" #include "qemu-timer.h" @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +register_target_get_irq_delivered(apic_get_irq_delivered); +register_target_reset_irq_delivered(apic_reset_irq_delivered); + sysbus_register_withprop(&apic_info); } diff --git a/hw/pc.h b/hw/pc.h index 1291e2d..79885f4 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -7,6 +7,19 @@ #include "fdc.h" #include "net.h" +extern int (*target_get_irq_delivered)(void); +extern void (*target_reset_irq_delivered)(void); + +static inline void register_target_get_irq_delivered(int (*func)(void)) +{ +target_get_irq_delivered = func; +} + +static inline void register_target_reset_irq_delivered(void (*func)(void)) +{ +target_reset_irq_delivered = func; +} + /* PC-style peripherals (also used by other machines). */ /* serial.c */ diff --git a/vl.c b/vl.c index a143250..a2bbc61 100644 --- a/vl.c +++ b/vl.c @@ -233,6 +233,19 @@ const char *prom_envs[MAX_PROM_ENVS]; const char *nvram = NULL; int boot_menu; +static int target_get_irq_delivered_stub(void) +{ +return 1; +} + +static void target_reset_irq_delivered_stub(void) +{ +return; +} + +int (*target_get_irq_delivered)(void) = target_get_irq_delivered_stub; +void (*target_reset_irq_delivered)(void) = target_reset_irq_delivered_stub; + typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { -- 1.6.2.5
[Qemu-devel] [PATCH v4 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
The new fields in HPETTimer are covered by a separate VMStateDescription which is a subsection of 'vmstate_hpet_timer'. They are only migrated if -global hpet.driftfix=on Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 33 + 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7513065..7ab6e62 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -55,6 +55,10 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +uint64_t prev_period; +uint64_t ticks_not_accounted; +uint32_t irq_rate; +uint32_t divisor; } HPETTimer; typedef struct HPETState { @@ -246,6 +250,27 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_timer_driftfix_vmstate_needed(void *opaque) +{ +HPETTimer *t = opaque; + +return (t->state->driftfix != 0); +} + +static const VMStateDescription vmstate_hpet_timer_driftfix = { +.name = "hpet_timer_driftfix", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(prev_period, HPETTimer), +VMSTATE_UINT64(ticks_not_accounted, HPETTimer), +VMSTATE_UINT32(irq_rate, HPETTimer), +VMSTATE_UINT32(divisor, HPETTimer), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", .version_id = 1, @@ -260,6 +285,14 @@ static const VMStateDescription vmstate_hpet_timer = { VMSTATE_UINT8(wrap_flag, HPETTimer), VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = &vmstate_hpet_timer_driftfix, +.needed = hpet_timer_driftfix_vmstate_needed, +}, { +/* empty */ +} } }; -- 1.6.2.5
[Qemu-devel] [PATCH v4 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option -global hpet.driftfix=on driftfix is 'off' (disabled) by default. Signed-off-by: Ulrich Obergfell --- hw/hpet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 6ce07bc..7513065 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -72,6 +72,8 @@ typedef struct HPETState { uint64_t isr; /* interrupt status reg */ uint64_t hpet_counter; /* main counter */ uint8_t hpet_id; /* instance id */ + +uint32_t driftfix; } HPETState; static uint32_t hpet_in_legacy_mode(HPETState *s) @@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = { .qdev.props = (Property[]) { DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS), DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false), +DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false), DEFINE_PROP_END_OF_LIST(), }, }; -- 1.6.2.5
[Qemu-devel] [PATCH v4 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. The injection of additional interrupts is based on a backlog of unaccounted HPET clock periods (new HPETTimer field 'ticks_not_accounted'). The backlog increases due to delayed callbacks and coalesced interrupts, and it decreases if an interrupt was injected successfully. If the backlog increases while compensation is still in progress, the rate at which additional interrupts are injected is increased too. A limit is imposed on the backlog and on the rate. Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass slower and faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour may be acceptable. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 70 +++- 1 files changed, 68 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index e57c654..519fc6b 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#include //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -41,6 +42,9 @@ #define HPET_MSI_SUPPORT0 +#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */ +#define MAX_IRQ_RATE(uint32_t)10 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -324,14 +328,35 @@ static const VMStateDescription vmstate_hpet = { } }; +static void hpet_timer_driftfix_reset(HPETTimer *t) +{ +if (t->state->driftfix && timer_is_periodic(t)) { +t->ticks_not_accounted = t->prev_period = t->period; +t->irq_rate = 1; +t->divisor = 1; +} +} + +static bool hpet_timer_has_tick_backlog(HPETTimer *t) +{ +uint64_t backlog = 0; + +if (t->ticks_not_accounted >= t->period + t->prev_period) { +backlog = t->ticks_not_accounted - (t->period + t->prev_period); +} +return (backlog >= t->period); +} + /* * timer expiration callback */ static void hpet_timer(void *opaque) { HPETTimer *t = opaque; +HPETState *s = t->state; uint64_t diff; - +int irq_delivered = 0; +uint32_t period_count = 0; uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); @@ -339,13 +364,37 @@ static void hpet_timer(void *opaque) if (t->config & HPET_TN_32BIT) { while (hpet_time_after(cur_tick, t->cmp)) { t->cmp = (uint32_t)(t->cmp + t->period); +t->ticks_not_accounted += t->period; +period_count++; } } else { while (hpet_time_after64(cur_tick, t->cmp)) { t->cmp += period; +t->ticks_not_accounted += period; +period_count++; } } diff = hpet_calculate_diff(t, cur_tick); +if (s->driftfix) { +if (t->ticks_not_accounted > MAX_TICKS_NOT_ACCOUNTED) { +t->ticks_not_accounted = t->period + t->prev_period; +} +if (hpet_timer_has_tick_backlog(t)) { +if (t->irq_rate == 1 || period_count > 1) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +if (t->divisor == 0) { +assert(period_count); +} +if (period_count) { +t->divisor = t->irq_rate; +} +diff /= t->divisor--; +} else { +t->irq_rate = 1; +} +} qemu_mod_timer(t->qemu_timer, qemu_get_clock_ns(vm_clock) + (int64_t)ticks_to_ns(diff)); } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) { @@ -356,7 +405,22 @@ static void hpet_timer(void *opaque) t->wrap_flag = 0; } } -update_irq(t, 1); +if (s->driftfix && timer_is_periodic(t) && period != 0) { +if (t->ticks_not_accounted >= t->period + t->prev_period) { +irq_delivered = update_irq(t, 1); +if (irq_delivered) { +t->ticks_not_accounted -= t->prev_period; +t->prev_period = t->period; +} else { +if (p
[Qemu-devel] [PATCH v4 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7ab6e62..e57c654 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -175,11 +175,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -204,8 +205,16 @@ static void update_irq(struct HPETTimer *timer, int set) qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; -qemu_irq_pulse(s->irqs[route]); +if (s->driftfix) { +target_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = target_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} else { +qemu_irq_pulse(s->irqs[route]); +} } +return irq_delivered; } static void hpet_pre_save(void *opaque) -- 1.6.2.5
Re: [Qemu-devel] [PATCH v4 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Hi Zachary, 1. re.: >> +static void hpet_timer_driftfix_reset(HPETTimer *t) >> +{ >> +if (t->state->driftfix&& timer_is_periodic(t)) { >> +t->ticks_not_accounted = t->prev_period = t->period; >> > > This is rather confusing. Clearly, ticks_not_accounted isn't actually > ticks not accounted, it's a different variable entirely which is based > on the current period. > > If it were actually ticks_not_accounted, it should be reset to zero. hpet_timer_driftfix_reset() is called at two sites before the periodic timer is actually started via hpet_set_timer(). An interrupt shall be delivered at the first callback of hpet_timer(), after t->period amount of time has passed. The ticks are recorded as 'not accounted' until the interrupt is delivered to the guest. The following message explains why t->prev_period is needed in addition to t->period and how it is used. http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg00275.html On entry to hpet_timer(), t->ticks_not_accounted is >= t->prev_period, and is it increased by t->period while the comparator register is being advanced. This is also the place where we can detect whether the current callback was delayed. If the period_count is > 1, the delay was so large that we missed to inject an interrupt (which needs to be compensated). while (hpet_time_after(cur_tick, t->cmp)) { t->cmp = (uint32_t)(t->cmp + t->period); +t->ticks_not_accounted += t->period; +period_count++; } Please note that period_count can be zero. This happens during callbacks that inject additional interrupts 'inside of' a period interval in order to compensate missed and coalesced interrupts. t->ticks_not_accounted is decreased only if an interrupt was delivered to the guest. If an interrupt could not be delivered, the ticks that are represented by that interrupt remain recorded as 'not accounted' (this also triggers compensation of coalesced interrupts). +if (irq_delivered) { +t->ticks_not_accounted -= t->prev_period; +t->prev_period = t->period; +} else { +if (period_count) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +} 2. re.: >> +if (period_count) { >> +t->divisor = t->irq_rate; >> +} >> +diff /= t->divisor--; >> > > Why subtracting from the divisor? Shouldn't the divisor and irq_rate > change in lockstep? t->irq_rate is the rate at which interrupts are delivered to the guest, relative to the period length. If t->irq_rate is 1, then one interrupt shall be injected per period interval. If t->irq_rate is > 1, we are in 'compensation mode' (trying to inject additional interrupts 'inside of' an interval). - If t->irq_rate is 2, then two interrupts shall be injected during a period interval (one regular and one additional). - If t->irq_rate is 3, then three interrupts shall be injected during a period interval (one regular and two additional). - etc. A non-zero period_count marks the start of an interval, at which the divisor is set to t->irq_rate. Let's take a look at an example where t->divisor = t->irq_rate = 3. - The current period starts at t(p), the next period starts at t(p+1). We are now at t(p). The first additional interrupt shall be injected at a(1), the second at a(2). Hence, the next callback is scheduled to occur at a(1) = t(p) + diff / 3. t(p) a(1) a(2) t(p+1) ++++ time <--- diff ---> < diff / 3 > - We are now in the callback at a(1). A new diff has been calculated, which is equal to the remaining time in the interval from a(1) to t(p+1). The second additional interrupt shall be injected at a(2). Hence, the next callback is scheduled to occur at a(2) = a(1) + diff / 2. a(1) a(2) t(p+1) +++ time <-- diff ---> < diff / 2 > - We are now in the callback at a(2). A new diff has been calculated, which is equal to the remaining time in the interval from a(2) to t(p+1). The next callback marks the beginning of period t(p+1). a(2) t(p+1) ++ time <-- diff --> < diff / 1 > At t(p), the divisor is set to irq_rate (= 3). diff is divided by 3 and the divisor is decremented by one. At a(1), the divisor is 2. diff is divided by 2 and
[Qemu-devel] [PATCH v5 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi, This is version 5 of a series of patches that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328 Changes since version 4: Added comments to patch part 3 and part 5. No changes in the actual code. Please review and please comment. Regards, Uli Ulrich Obergfell (5): hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add driftfix property to HPETState and DeviceInfo hpet 'driftfix': add fields to HPETTimer and VMStateDescription hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts hw/apic.c |4 ++ hw/hpet.c | 178 +++-- hw/pc.h | 13 + vl.c | 13 + 4 files changed, 204 insertions(+), 4 deletions(-)
[Qemu-devel] [PATCH v5 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index dba9370..0428290 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -184,11 +184,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -213,8 +214,16 @@ static void update_irq(struct HPETTimer *timer, int set) qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; -qemu_irq_pulse(s->irqs[route]); +if (s->driftfix) { +target_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = target_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} else { +qemu_irq_pulse(s->irqs[route]); +} } +return irq_delivered; } static void hpet_pre_save(void *opaque) -- 1.6.2.5
[Qemu-devel] [PATCH v5 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. The injection of additional interrupts is based on a backlog of unaccounted HPET clock periods (new HPETTimer field 'ticks_not_accounted'). The backlog increases due to delayed callbacks and coalesced interrupts, and it decreases if an interrupt was injected successfully. If the backlog increases while compensation is still in progress, the rate at which additional interrupts are injected is increased too. A limit is imposed on the backlog and on the rate. Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass slower and faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour may be acceptable. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 120 +++- 1 files changed, 118 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 0428290..bc2a21a 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#include //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -41,6 +42,9 @@ #define HPET_MSI_SUPPORT0 +#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */ +#define MAX_IRQ_RATE(uint32_t)10 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -334,13 +338,68 @@ static const VMStateDescription vmstate_hpet = { }; /* + * This function resets the driftfix state in the following situations. + * + * - When the guest o/s changes the 'CFG_ENABLE' bit (overall enable) + * in the General Configuration Register from 0 to 1. + * + * - When the guest o/s changes the 'TN_ENABLE' bit (timer N interrupt enable) + * in the Timer N Configuration and Capabilities Register from 0 to 1. + */ +static void hpet_timer_driftfix_reset(HPETTimer *t) +{ +if (t->state->driftfix && timer_is_periodic(t)) { +t->ticks_not_accounted = t->prev_period = t->period; +t->irq_rate = 1; +t->divisor = 1; +} +} + +/* + * This function determines whether there is a backlog of ticks for which + * no interrupts have been delivered to the guest o/s yet. If the backlog + * is equal to or greater than the current period length, then additional + * interrupts will be delivered to the guest o/s inside of the subsequent + * period interval to compensate missed interrupts. + * + * 'ticks_not_accounted' increases by 'N * period' when the comparator is + * being advanced, and it decreases by 'prev_period' when an interrupt is + * delivered to the guest o/s. Normally 'prev_period' is equal to 'period' + * and 'N' is 1. 'prev_period' is different from 'period' if a guest o/s + * has changed the comparator value during the previous period interval. + * 'N' is greater than 1 if the callback was delayed by 'N - 1' periods, + * and 'N' is zero while additional interrupts are delivered inside of an + * interval. + * + * This function is called after the comparator has been advanced but before + * the interrupt is delivered to the guest o/s. Hence, 'ticks_not_accounted' + * is equal to 'prev_period' plus 'period' if there is no backlog. + */ +static bool hpet_timer_has_tick_backlog(HPETTimer *t) +{ +uint64_t backlog = 0; + +if (t->ticks_not_accounted >= t->period + t->prev_period) { +backlog = t->ticks_not_accounted - (t->period + t->prev_period); +} +return (backlog >= t->period); +} + +/* * timer expiration callback */ static void hpet_timer(void *opaque) { HPETTimer *t = opaque; +HPETState *s = t->state; uint64_t diff; - +int irq_delivered = 0; +uint32_t period_count = 0; /* elapsed periods since last callback + * 1: normal case + * > 1: missed 'period_count - 1' interrupts + * due to delayed callback + * 0: callback inside of an interval + * to deliver additional interrupts */ uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); @@ -348,13 +407,48 @@ static void hpet_timer(void *opaque) if (t->config & HPET_TN_32BIT) {
[Qemu-devel] [PATCH v5 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' point to functions that are called by update_irq() to detect coalesced interrupts. Initially they point to stub functions which pretend successful interrupt injection. apic code calls two registration functions to replace the stubs with apic_get_irq_delivered() and apic_reset_irq_delivered(). This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/apic.c |4 hw/pc.h | 13 + vl.c | 13 + 3 files changed, 30 insertions(+), 0 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index a45b57f..94b1d15 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -17,6 +17,7 @@ * License along with this library; if not, see <http://www.gnu.org/licenses/> */ #include "hw.h" +#include "pc.h" #include "apic.h" #include "ioapic.h" #include "qemu-timer.h" @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +register_target_get_irq_delivered(apic_get_irq_delivered); +register_target_reset_irq_delivered(apic_reset_irq_delivered); + sysbus_register_withprop(&apic_info); } diff --git a/hw/pc.h b/hw/pc.h index bc8fcec..7511f28 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -7,6 +7,19 @@ #include "fdc.h" #include "net.h" +extern int (*target_get_irq_delivered)(void); +extern void (*target_reset_irq_delivered)(void); + +static inline void register_target_get_irq_delivered(int (*func)(void)) +{ +target_get_irq_delivered = func; +} + +static inline void register_target_reset_irq_delivered(void (*func)(void)) +{ +target_reset_irq_delivered = func; +} + /* PC-style peripherals (also used by other machines). */ /* serial.c */ diff --git a/vl.c b/vl.c index 73e147f..456e320 100644 --- a/vl.c +++ b/vl.c @@ -232,6 +232,19 @@ const char *prom_envs[MAX_PROM_ENVS]; const char *nvram = NULL; int boot_menu; +static int target_get_irq_delivered_stub(void) +{ +return 1; +} + +static void target_reset_irq_delivered_stub(void) +{ +return; +} + +int (*target_get_irq_delivered)(void) = target_get_irq_delivered_stub; +void (*target_reset_irq_delivered)(void) = target_reset_irq_delivered_stub; + typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { -- 1.6.2.5
[Qemu-devel] [PATCH v5 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option -global hpet.driftfix=on driftfix is 'off' (disabled) by default. Signed-off-by: Ulrich Obergfell --- hw/hpet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 6ce07bc..7513065 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -72,6 +72,8 @@ typedef struct HPETState { uint64_t isr; /* interrupt status reg */ uint64_t hpet_counter; /* main counter */ uint8_t hpet_id; /* instance id */ + +uint32_t driftfix; } HPETState; static uint32_t hpet_in_legacy_mode(HPETState *s) @@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = { .qdev.props = (Property[]) { DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS), DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false), +DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false), DEFINE_PROP_END_OF_LIST(), }, }; -- 1.6.2.5
[Qemu-devel] [PATCH v5 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
The new fields in HPETTimer are covered by a separate VMStateDescription which is a subsection of 'vmstate_hpet_timer'. They are only migrated if -global hpet.driftfix=on Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 42 ++ 1 files changed, 42 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7513065..dba9370 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -55,6 +55,19 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +/* driftfix state */ +uint64_t prev_period;/* needed when the guest o/s changes the + * comparator value */ +uint64_t ticks_not_accounted;/* ticks for which no interrupts have been + * delivered to the guest o/s yet */ +uint32_t irq_rate; /* rate at which interrupts are delivered + * to the guest o/s during one period + * interval; if rate is greater than 1, + * additional interrupts are delivered + * to compensate missed interrupts */ +uint32_t divisor;/* needed to determine when the next + * timer callback should occur while + * rate is greater than 1 */ } HPETTimer; typedef struct HPETState { @@ -246,6 +259,27 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_timer_driftfix_vmstate_needed(void *opaque) +{ +HPETTimer *t = opaque; + +return (t->state->driftfix != 0); +} + +static const VMStateDescription vmstate_hpet_timer_driftfix = { +.name = "hpet_timer_driftfix", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(prev_period, HPETTimer), +VMSTATE_UINT64(ticks_not_accounted, HPETTimer), +VMSTATE_UINT32(irq_rate, HPETTimer), +VMSTATE_UINT32(divisor, HPETTimer), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", .version_id = 1, @@ -260,6 +294,14 @@ static const VMStateDescription vmstate_hpet_timer = { VMSTATE_UINT8(wrap_flag, HPETTimer), VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = &vmstate_hpet_timer_driftfix, +.needed = hpet_timer_driftfix_vmstate_needed, +}, { +/* empty */ +} } }; -- 1.6.2.5
[Qemu-devel] [PATCH] scsi-disk: fix bug in scsi_block_new_request() introduced by commit 137745c
This patch fixes a bug in scsi_block_new_request() that was introduced by commit 137745c5c60f083ec982fe9e861e8c16ebca1ba8. If the host cache is used - i.e. if BDRV_O_NOCACHE is _not_ set - the 'break' statement needs to be executed to 'fall back' to SG_IO. Signed-off-by: Ulrich Obergfell --- hw/scsi/scsi-disk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 4bcef55..7e62dbd 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2526,7 +2526,7 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag, * ones (such as WRITE SAME or EXTENDED COPY, etc.). So, without * O_DIRECT everything must go through SG_IO. */ -if (bdrv_get_flags(s->qdev.conf.bs) & BDRV_O_NOCACHE) { +if (!(bdrv_get_flags(s->qdev.conf.bs) & BDRV_O_NOCACHE)) { break; } -- 1.7.11.7