Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

-- Gleb.
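To make the two-element queue concrete, here is a minimal sketch in C of a single interrupt line with one IRR bit and one ISR bit. All names (`Line`, `line_raise`, `line_accept`, `line_eoi`, and the result constants) are hypothetical and are not QEMU's API, and priorities and multiple vectors are ignored. It only illustrates when a raise would count as coalesced under the IRR-based rule.

```c
#include <stdbool.h>

/* Hypothetical result codes, for illustration only. */
enum { RAISE_DELIVERED = 0, RAISE_COALESCED = 1 };

/* One interrupt line: irr = pending, isr = delivered but not yet acked. */
typedef struct Line {
    bool irr;
    bool isr;
} Line;

/* Device raises the line. Only a raise while irr is still set is lost. */
static int line_raise(Line *l)
{
    if (l->irr) {
        return RAISE_COALESCED;   /* pending slot already occupied */
    }
    l->irr = true;
    return RAISE_DELIVERED;
}

/* CPU accepts the interrupt: pending moves to in-service. */
static bool line_accept(Line *l)
{
    if (!l->irr || l->isr) {
        return false;             /* nothing pending, or previous one not acked */
    }
    l->irr = false;
    l->isr = true;
    return true;
}

/* Guest signals end of interrupt. */
static void line_eoi(Line *l)
{
    l->isr = false;
}
```

Under this rule a raise that arrives after delivery but before the EOI still fits into the free IRR slot. A purely EOI-based rule would have counted that raise as coalesced, which is the objection raised above.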
[Qemu-devel] Re: sun framebuffer selection (was option-rom)
On Sat, Jun 5, 2010 at 11:10 PM, Bob Breuer breu...@mc.net wrote:

Blue Swirl wrote:

but again: should we have a new machine with cg14 or some switch to select TCX vs. cg14? Maybe the recently proposed machine subtype patches could help here.

Well, let's try to figure out a method of selecting the framebuffer type. I'll try to list some of the options, even if they might be ridiculous.

1) Use the -vga option. I know TCX and cg14 are not vga, but I think it's the closest existing command line option available.

2) Switch based on the -g WxH option. At the moment, the TCX emulation doesn't really handle anything other than 1024x768, so switch to cg14 for other resolutions if supported.

3) Use some other existing command line option, -device, -set or -global? Might work, but the syntax may not be easy to remember. We don't have an equivalent of -chardev, -netdev and -drive for displays.

4) Machine subtype.

5) New command line option. Anything above might be better.

6) New machine type. Is it a big enough feature to demand its own machine type? Maybe, but see next option.

7) Select as default video for SS-20. The SS-10 and SS-600MP are already very similar. This would allow for some differentiation between the machines, but there could still be an option to switch back to TCX. Note that TCX was really only available for the SS-4 and SS-5.

Is there anything else that I missed?

Combined 7 & 6: make cg14 default for SS-20, add a deprecated compatibility machine for SS-20 with TCX.

I'm going to go ahead with option 2 in the short term. I'm inclined to narrow it down to options 1, 4, and 7. I know that 7 would have backwards compatibility concerns. The cg14 seems to have at least the same capabilities as TCX so there shouldn't be any loss of functionality. Even though SS-20 is not the default machine, do you know of any OS that works with the sun4m implementation today but doesn't have a cg14 driver?

A possible downside to cg14 for video is that any acceleration is handled by the SX pixel processor, which has no available documentation. TCX also has some amount of unimplemented acceleration. It would be nice to use some basic device with well-defined acceleration, or just a frame buffer, as default.
[Qemu-devel] Re: [PATCH 6/6] apic: avoid using CPUState internals
Blue Swirl wrote:

Use only an opaque CPUState pointer and move the actual CPUState contents handling to cpu.h and cpuid.c. Set env->halted in pc.c and add a function to get the local APIC state of the current CPU for the MMIO.

Signed-off-by: Blue Swirl blauwir...@gmail.com
---
 hw/apic.c           |   40 +++-
 hw/apic.h           |    9 -
 hw/pc.c             |   12 +++-
 target-i386/cpu.h   |   27 ---
 target-i386/cpuid.c |    6 ++
 5 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 91c8d93..332c66e 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -95,7 +95,7 @@
 #define MSI_ADDR_SIZE 0x10

 struct APICState {
-    CPUState *cpu_env;
+    void *cpu_env;
     uint32_t apicbase;
     uint8_t id;
     uint8_t arb_id;
@@ -320,7 +320,7 @@ void cpu_set_apic_base(APICState *s, uint64_t val)
     /* if disabled, cannot be enabled again */
     if (!(val & MSR_IA32_APICBASE_ENABLE)) {
         s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
-        s->cpu_env->cpuid_features &= ~CPUID_APIC;
+        cpu_clear_apic_feature(s->cpu_env);
         s->spurious_vec &= ~APIC_SV_ENABLE;
     }
 }
@@ -508,8 +508,6 @@ void apic_init_reset(APICState *s)
     s->initial_count_load_time = 0;
     s->next_time = 0;
     s->wait_for_sipi = 1;
-
-    s->cpu_env->halted = !(s->apicbase & MSR_IA32_APICBASE_BSP);

We are now lacking 'halted' initialization after system reset. Could be addressed by a special reset handler in hw/pc.c, I guess.

Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote:

On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

I read this twice but I still don't get your plan. Do you like or dislike using EOI for de-coalescing? And how should these notifiers work?

Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 6, 2010 at 7:15 AM, Gleb Natapov g...@redhat.com wrote:

On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

Because translation at the IOAPIC may be lossy, with IRQs from many devices pointing to the same vector? With IRQMsg you know where a specific message came from. The situation is different inside the kernel: it manages both translation and registration, whereas in QEMU we could only control registration.
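As a rough illustration of that point, the sketch below shows two senders sharing one vector while each still gets its own delivery result through a per-message callback. It is a simplification of the struct quoted above: `src` is a plain string here, and `Vector`, `vector_deliver`, `count_coalesced` and the result constants are hypothetical names, not code from the patch series.

```c
typedef struct IRQMsg IRQMsg;

/* Simplified from the struct quoted above; src is a plain name here. */
struct IRQMsg {
    const char *src;                              /* hypothetical: device name */
    void (*delivery_cb)(IRQMsg *msg, int result);
    void *src_opaque;
};

enum { RESULT_DELIVERED = 0, RESULT_COALESCED = 1 };  /* assumed values */

/* Hypothetical sink for one shared vector with a single pending slot. */
typedef struct Vector {
    int pending;
} Vector;

static void vector_deliver(Vector *v, IRQMsg *msg)
{
    int result = v->pending ? RESULT_COALESCED : RESULT_DELIVERED;

    v->pending = 1;
    if (msg && msg->delivery_cb) {
        msg->delivery_cb(msg, result);   /* feedback goes to this sender only */
    }
}

/* Example callback: count coalesced deliveries in the sender's own state. */
static void count_coalesced(IRQMsg *msg, int result)
{
    if (result == RESULT_COALESCED) {
        ++*(int *)msg->src_opaque;
    }
}
```

A notifier registered per irqchip/pin could not tell the two senders apart once both map to the same vector; the message carries that information along with it.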
[Qemu-devel] Re: [PATCH 6/6] apic: avoid using CPUState internals
On Sun, Jun 6, 2010 at 7:36 AM, Jan Kiszka jan.kis...@web.de wrote:

Blue Swirl wrote:

Use only an opaque CPUState pointer and move the actual CPUState contents handling to cpu.h and cpuid.c. Set env->halted in pc.c and add a function to get the local APIC state of the current CPU for the MMIO.

Signed-off-by: Blue Swirl blauwir...@gmail.com
---
 hw/apic.c           |   40 +++-
 hw/apic.h           |    9 -
 hw/pc.c             |   12 +++-
 target-i386/cpu.h   |   27 ---
 target-i386/cpuid.c |    6 ++
 5 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 91c8d93..332c66e 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -95,7 +95,7 @@
 #define MSI_ADDR_SIZE 0x10

 struct APICState {
-    CPUState *cpu_env;
+    void *cpu_env;
     uint32_t apicbase;
     uint8_t id;
     uint8_t arb_id;
@@ -320,7 +320,7 @@ void cpu_set_apic_base(APICState *s, uint64_t val)
     /* if disabled, cannot be enabled again */
     if (!(val & MSR_IA32_APICBASE_ENABLE)) {
         s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
-        s->cpu_env->cpuid_features &= ~CPUID_APIC;
+        cpu_clear_apic_feature(s->cpu_env);
         s->spurious_vec &= ~APIC_SV_ENABLE;
     }
 }
@@ -508,8 +508,6 @@ void apic_init_reset(APICState *s)
     s->initial_count_load_time = 0;
     s->next_time = 0;
     s->wait_for_sipi = 1;
-
-    s->cpu_env->halted = !(s->apicbase & MSR_IA32_APICBASE_BSP);

We are now lacking 'halted' initialization after system reset. Could be addressed by a special reset handler in hw/pc.c, I guess.

Good catch, I forgot to do that.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote:

Gleb Natapov wrote:

On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

I read this twice but I still don't get your plan. Do you like or dislike using EOI for de-coalescing? And how should these notifiers work?

That's because I confused myself :) I _dislike_ them to be used, but since device assignment requires ack notifiers anyway, maybe it is better to introduce one mechanism for device assignment + de-coalescing instead of introducing two different mechanisms.

Using ack notifiers should be easy: the RTC registers an ack notifier and keeps track of delivered interrupts. If the timer triggers after the previous IRQ was set but before it was acked, the coalesced counter is incremented. In the ack notifier callback the coalesced counter is checked, and if it is not zero a new IRQ is set.

-- Gleb.
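The ack-notifier scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RTC model: `TimerDev`, `timer_tick`, `timer_set_irq` and `timer_ack_notifier` are hypothetical names, and decrementing the counter by one per reinjection is an assumption that the description leaves open.

```c
#include <stdbool.h>

/* Hypothetical timer device state; not the real RTC model. */
typedef struct TimerDev {
    bool irq_outstanding;   /* irq was set and not yet acked by the guest */
    unsigned coalesced;     /* ticks that arrived while irq was outstanding */
    unsigned injected;      /* how many times the irq line was raised */
} TimerDev;

/* Stand-in for raising the interrupt line. */
static void timer_set_irq(TimerDev *t)
{
    t->irq_outstanding = true;
    t->injected++;
}

/* Called on every timer tick. */
static void timer_tick(TimerDev *t)
{
    if (t->irq_outstanding) {
        t->coalesced++;      /* previous irq not acked yet: remember the tick */
        return;
    }
    timer_set_irq(t);
}

/* Ack notifier callback: runs when the guest acknowledges the interrupt. */
static void timer_ack_notifier(TimerDev *t)
{
    t->irq_outstanding = false;
    if (t->coalesced > 0) {
        t->coalesced--;      /* assumption: reinject one missed tick per ack */
        timer_set_irq(t);
    }
}
```

Three ticks without an intervening ack raise the line once and leave two ticks in the counter; each later ack then reinjects one of them until the counter reaches zero.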
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 07:39:49AM +0000, Blue Swirl wrote:

On Sun, Jun 6, 2010 at 7:15 AM, Gleb Natapov g...@redhat.com wrote:

On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

Because translation at the IOAPIC may be lossy, with IRQs from many devices pointing to the same vector? With IRQMsg you know where a specific message came from. The situation is different inside the kernel: it manages both translation and registration, whereas in QEMU we could only control registration.

Configuring the IOAPIC like that is against the x86 architecture. The OS will not be able to map from the interrupt vector back to the device.

-- Gleb.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote:

On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote:

Gleb Natapov wrote:

On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote:

I'd like to also support EOI handling. When the guest clears the interrupt condition, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case.

If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message.

struct IRQMsg {
    DeviceState *src;
    void (*delivery_cb)(IRQMsg *msg, int result);
    void (*eoi_cb)(IRQMsg *msg, int result);
    void *src_opaque;
    void *payload;
};

Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eoi_cb at all.

I dislike the use of EOI for reinjecting missed interrupts since it eliminates use of the internal PIC/APIC queue of not yet delivered interrupts. The PIC and APIC have an internal queue that can handle two elements: one is the delivered but not yet acked interrupt in ISR, and the other is the pending interrupt in IRR. With an EOI callback (or ack notifier, as it's called inside the kernel), an interrupt will be considered coalesced even if IRR is cleared but no ack was received for the previously delivered interrupt.

But ack notifiers actually have another use: device assignment. There is a plan to move device assignment from kernel to userspace, and for that ack notifiers will have to be extended to userspace too. If so, we can use them to do IRQ de-coalescing as well. I doubt they should be part of IRQMsg though. Why not do what the kernel does: have a globally registered notifier based on irqchip/pin?

I read this twice but I still don't get your plan. Do you like or dislike using EOI for de-coalescing? And how should these notifiers work?

That's because I confused myself :) I _dislike_ them to be used, but since device assignment requires ack notifiers anyway, maybe it is better to introduce one mechanism for device assignment + de-coalescing instead of introducing two different mechanisms.

Using ack notifiers should be easy: the RTC registers an ack notifier and keeps track of delivered interrupts. If the timer triggers after the previous IRQ was set but before it was acked, the coalesced counter is incremented. In the ack notifier callback the coalesced counter is checked, and if it is not zero a new IRQ is set.

Ack notifier registrations and event deliveries still need to be routed. Piggy-backing this on IRQ messages may be unavoidable for that reason.

Anyway, I'm going to post my HPET updates with the infrastructure for IRQMsg now. Maybe it's helpful to see the other option in reality.

Jan
[Qemu-devel] [PATCH 04/16] hpet: Move static timer field initialization
From: Jan Kiszka jan.kis...@siemens.com

Properly initialize HPETTimer::tn and HPETTimer::state once during hpet_init instead of (re-)writing them on every reset.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/hpet.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index bcb160b..fd7a1fd 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -575,12 +575,10 @@ static void hpet_reset(void *opaque)
         HPETTimer *timer = &s->timer[i];

         hpet_del_timer(timer);
-        timer->tn = i;
         timer->cmp = ~0ULL;
         timer->config = HPET_TN_PERIODIC_CAP | HPET_TN_SIZE_CAP;
         /* advertise availability of ioapic inti2 */
         timer->config |= 0x0004ULL << 32;
-        timer->state = s;
         timer->period = 0ULL;
         timer->wrap_flag = 0;
     }
@@ -617,6 +615,8 @@ void hpet_init(qemu_irq *irq)
     for (i = 0; i < HPET_NUM_TIMERS; i++) {
         timer = &s->timer[i];
         timer->qemu_timer = qemu_new_timer(vm_clock, hpet_timer, timer);
+        timer->tn = i;
+        timer->state = s;
     }
     vmstate_register(-1, &vmstate_hpet, s);
     qemu_register_reset(hpet_reset, s);
--
1.6.0.2
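The split this patch makes, static fields set once at init and only mutable state rewritten on reset, can be shown in isolation with the sketch below. The `Dev` and `Timer` types, `NUM_TIMERS`, and the two functions are hypothetical stand-ins that mirror only the shape of the HPET code, not its real definitions.

```c
#include <stdint.h>

#define NUM_TIMERS 3   /* assumed count for the sketch */

typedef struct Dev Dev;

typedef struct Timer {
    uint8_t tn;        /* static: index of this timer */
    Dev *state;        /* static: back-pointer to the owning device */
    uint64_t cmp;      /* dynamic: reset to a default */
    uint64_t period;   /* dynamic: reset to a default */
} Timer;

struct Dev {
    Timer timer[NUM_TIMERS];
};

/* Runs once: set fields that never change over the device's lifetime. */
static void dev_init(Dev *s)
{
    for (int i = 0; i < NUM_TIMERS; i++) {
        s->timer[i].tn = (uint8_t)i;
        s->timer[i].state = s;
    }
}

/* Runs on every reset: touch only guest-visible, mutable state. */
static void dev_reset(Dev *s)
{
    for (int i = 0; i < NUM_TIMERS; i++) {
        s->timer[i].cmp = ~0ULL;
        s->timer[i].period = 0;
    }
}
```

Keeping the two kinds of fields apart means a reset cannot disturb the wiring between a timer and its device, and the reset path stays limited to state the guest can observe.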
[Qemu-devel] [PATCH 00/16] HPET cleanups, fixes, enhancements
Second round, specifically addressing:
 - IRQMsg framework to refactor existing de-coalescing code
 - RTC IRQ output as GPIO pin (routed depending on HPET or -no-hpet)
 - ISA reservation for RTC IRQ

If discussion around IRQMsg and de-coalescing happens to continue, I would suggest merging patches 1..7, as they are likely uncontroversial and also fix bugs.

Jan Kiszka (16):
  hpet: Catch out-of-bounds timer access
  hpet: Coding style cleanups and some refactorings
  hpet: Silence warning on write to running main counter
  hpet: Move static timer field initialization
  hpet: Convert to qdev
  hpet: Start/stop timer when HPET_TN_ENABLE is modified
  monitor/QMP: Drop info hpet / query-hpet
  Pass IRQ object on handler invocation
  Enable message delivery via IRQs
  x86: Refactor RTC IRQ coalescing workaround
  hpet/rtc: Rework RTC IRQ replacement by HPET
  hpet: Drop static state
  hpet: Add support for level-triggered interrupts
  vmstate: Add VMSTATE_STRUCT_VARRAY_UINT8
  hpet: Make number of timers configurable
  hpet: Add MSI support

 QMP/vm-info             |    2 +-
 hw/acpi_piix4.c         |    3 +-
 hw/apic.c               |   66 +++---
 hw/apic.h               |   11 +-
 hw/arm11mpcore.c        |   12 +-
 hw/arm_gic.c            |   18 +-
 hw/arm_pic.c            |    6 +-
 hw/arm_timer.c          |    4 +-
 hw/bitbang_i2c.c        |    4 +-
 hw/bt-hci-csr.c         |    2 +-
 hw/cbus.c               |    6 +-
 hw/cris_pic_cpu.c       |    4 +-
 hw/esp.c                |    2 +-
 hw/etraxfs_pic.c        |   16 +-
 hw/fdc.c                |    2 +-
 hw/heathrow_pic.c       |    3 +-
 hw/hpet.c               |  595 ++-
 hw/hpet_emul.h          |   46 +---
 hw/hw.h                 |   10 +
 hw/i8259.c              |   28 ++-
 hw/ide/cmd646.c         |    2 +-
 hw/ide/microdrive.c     |    2 +-
 hw/integratorcp.c       |   10 +-
 hw/ioapic.c             |   22 ++-
 hw/irq.c                |   48 -
 hw/irq.h                |   42 +++-
 hw/lance.c              |    2 +-
 hw/max7310.c            |    2 +-
 hw/mc146818rtc.c        |  111 +-
 hw/mc146818rtc.h        |    4 +-
 hw/mcf5206.c            |    6 +-
 hw/mcf_intc.c           |   14 +-
 hw/microblaze_pic_cpu.c |    5 +-
 hw/mips_int.c           |   10 +-
 hw/mips_jazz.c          |    4 +-
 hw/mips_malta.c         |    4 +-
 hw/mips_r4k.c           |    2 +-
 hw/mst_fpga.c           |   10 +-
 hw/musicpal.c           |   16 +-
 hw/nseries.c            |    4 +-
 hw/omap.h               |    2 +-
 hw/omap1.c              |   34 ++--
 hw/omap2.c              |    8 +-
 hw/omap_dma.c           |    8 +-
 hw/omap_mmc.c           |    2 +-
 hw/openpic.c            |    6 +-
 hw/palm.c               |    2 +-
 hw/pc.c                 |   59 --
 hw/pc.h                 |    8 +-
 hw/pci.c                |    4 +-
 hw/pl061.c              |    4 +-
 hw/pl190.c              |    6 +-
 hw/ppc.c                |    8 +-
 hw/ppc4xx_devs.c        |    2 +-
 hw/ppc_prep.c           |    4 +-
 hw/pxa2xx.c             |    2 +-
 hw/pxa2xx_gpio.c        |    2 +-
 hw/pxa2xx_pcmcia.c      |    3 +-
 hw/pxa2xx_pic.c         |   10 +-
 hw/r2d.c                |    2 +-
 hw/rc4030.c             |    7 +-
 hw/sbi.c                |    2 +-
 hw/sh_intc.c            |    4 +-
 hw/sh_intc.h            |    2 +-
 hw/sharpsl.h            |    1 -
 hw/slavio_intctl.c      |   16 +-
 hw/slavio_misc.c        |    3 +-
 hw/sparc32_dma.c        |    2 +-
 hw/spitz.c              |   14 +-
 hw/ssd0323.c            |    2 +-
 hw/stellaris.c          |    6 +-
 hw/sun4c_intctl.c       |    8 +-
 hw/sun4m.c              |   14 +-
 hw/sun4u.c              |   12 +-
 hw/syborg_interrupt.c   |    8 +-
 hw/tc6393xb.c           |    7 +-
 hw/tosa.c               |    2 +-
 hw/tusb6010.c           |    3 +-
 hw/twl92230.c           |    5 +-
 hw/versatilepb.c        |   10 +-
 hw/xilinx_intc.c        |    8 +-
 hw/zaurus.c             |    2 +-
 monitor.c               |   22 --
 qemu-monitor.hx         |   21 --
 84 files changed, 874 insertions(+), 643 deletions(-)
[Qemu-devel] [PATCH 02/16] hpet: Coding style cleanups and some refactorings
From: Jan Kiszka jan.kis...@siemens.com

This moves the private HPET structures into the C module, simplifies some helper functions and fixes most coding style issues (biggest chunk was improper switch-case indention). No functional changes.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Reviewed-by: Juan Quintela quint...@redhat.com
---
 hw/hpet.c      |  413 ++-
 hw/hpet_emul.h |   31 +
 2 files changed, 226 insertions(+), 218 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 1980906..2836fb0 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -37,21 +37,47 @@
 #define DPRINTF(...)
 #endif

+struct HPETState;
+typedef struct HPETTimer {  /* timers */
+    uint8_t tn;             /*timer number*/
+    QEMUTimer *qemu_timer;
+    struct HPETState *state;
+    /* Memory-mapped, software visible timer registers */
+    uint64_t config;        /* configuration/cap */
+    uint64_t cmp;           /* comparator */
+    uint64_t fsb;           /* FSB route, not supported now */
+    /* Hidden register state */
+    uint64_t period;        /* Last value written to comparator */
+    uint8_t wrap_flag;      /* timer pop will indicate wrap for one-shot 32-bit
+                             * mode. Next pop will be actual timer expiration.
+                             */
+} HPETTimer;
+
+typedef struct HPETState {
+    uint64_t hpet_offset;
+    qemu_irq *irqs;
+    HPETTimer timer[HPET_NUM_TIMERS];
+
+    /* Memory-mapped, software visible registers */
+    uint64_t capability;        /* capabilities */
+    uint64_t config;            /* configuration */
+    uint64_t isr;               /* interrupt status reg */
+    uint64_t hpet_counter;      /* main counter */
+} HPETState;
+
 static HPETState *hpet_statep;

 uint32_t hpet_in_legacy_mode(void)
 {
-    if (hpet_statep)
-        return hpet_statep->config & HPET_CFG_LEGACY;
-    else
+    if (!hpet_statep) {
         return 0;
+    }
+    return hpet_statep->config & HPET_CFG_LEGACY;
 }

 static uint32_t timer_int_route(struct HPETTimer *timer)
 {
-    uint32_t route;
-    route = (timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT;
-    return route;
+    return (timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT;
 }

 static uint32_t hpet_enabled(void)
@@ -108,9 +134,7 @@ static int deactivating_bit(uint64_t old, uint64_t new, uint64_t mask)

 static uint64_t hpet_get_ticks(void)
 {
-    uint64_t ticks;
-    ticks = ns_to_ticks(qemu_get_clock(vm_clock) + hpet_statep->hpet_offset);
-    return ticks;
+    return ns_to_ticks(qemu_get_clock(vm_clock) + hpet_statep->hpet_offset);
 }

 /*
@@ -121,12 +145,14 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current)

     if (t->config & HPET_TN_32BIT) {
         uint32_t diff, cmp;
+
         cmp = (uint32_t)t->cmp;
         diff = cmp - (uint32_t)current;
         diff = (int32_t)diff > 0 ? diff : (uint32_t)0;
         return (uint64_t)diff;
     } else {
         uint64_t diff, cmp;
+
         cmp = t->cmp;
         diff = cmp - current;
         diff = (int64_t)diff > 0 ? diff : (uint64_t)0;
@@ -136,7 +162,6 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current)

 static void update_irq(struct HPETTimer *timer)
 {
-    qemu_irq irq;
     int route;

     if (timer->tn <= 1 && hpet_in_legacy_mode()) {
@@ -144,22 +169,20 @@ static void update_irq(struct HPETTimer *timer)
          * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC,
          * timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC.
          */
-        if (timer->tn == 0) {
-            irq=timer->state->irqs[0];
-        } else
-            irq=timer->state->irqs[8];
+        route = (timer->tn == 0) ? 0 : 8;
     } else {
-        route=timer_int_route(timer);
-        irq=timer->state->irqs[route];
+        route = timer_int_route(timer);
     }
-    if (timer_enabled(timer) && hpet_enabled()) {
-        qemu_irq_pulse(irq);
+    if (!timer_enabled(timer) || !hpet_enabled()) {
+        return;
     }
+    qemu_irq_pulse(timer->state->irqs[route]);
 }

 static void hpet_pre_save(void *opaque)
 {
     HPETState *s = opaque;
+
     /* save current counter value */
     s->hpet_counter = hpet_get_ticks();
 }
@@ -212,7 +235,7 @@ static const VMStateDescription vmstate_hpet = {
  */
 static void hpet_timer(void *opaque)
 {
-    HPETTimer *t = (HPETTimer*)opaque;
+    HPETTimer *t = opaque;
     uint64_t diff;

     uint64_t period = t->period;
@@ -220,20 +243,22 @@ static void hpet_timer(void *opaque)

     if (timer_is_periodic(t) && period != 0) {
         if (t->config & HPET_TN_32BIT) {
-            while (hpet_time_after(cur_tick, t->cmp))
+            while (hpet_time_after(cur_tick, t->cmp)) {
                 t->cmp = (uint32_t)(t->cmp + t->period);
-        } else
-            while (hpet_time_after64(cur_tick, t->cmp))
+            }
+        } else {
+            while (hpet_time_after64(cur_tick, t->cmp)) {
[Qemu-devel] [PATCH 05/16] hpet: Convert to qdev
From: Jan Kiszka jan.kis...@siemens.com

Register the HPET as a sysbus device and create it that way. As it can route its IRQs to any ISA IRQ, we need to connect it to all 24 of them. Once converted to qdev, we can move reset handler and vmstate registration into its hands as well.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/hpet.c      |   43 ++-
 hw/hpet_emul.h |    3 ++-
 hw/pc.c        |    7 ++-
 3 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index fd7a1fd..6974935 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -29,6 +29,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "sysbus.h"

 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -54,8 +55,9 @@ typedef struct HPETTimer {  /* timers */
 } HPETTimer;

 typedef struct HPETState {
+    SysBusDevice busdev;
     uint64_t hpet_offset;
-    qemu_irq *irqs;
+    qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
     HPETTimer timer[HPET_NUM_TIMERS];

     /* Memory-mapped, software visible registers */
@@ -565,9 +567,9 @@ static CPUWriteMemoryFunc * const hpet_ram_write[] = {
     hpet_ram_writel,
 };

-static void hpet_reset(void *opaque)
+static void hpet_reset(DeviceState *d)
 {
-    HPETState *s = opaque;
+    HPETState *s = FROM_SYSBUS(HPETState, sysbus_from_qdev(d));
     int i;
     static int count = 0;

@@ -600,28 +602,43 @@ static void hpet_reset(void *opaque)
     count = 1;
 }

-
-void hpet_init(qemu_irq *irq)
+static int hpet_init(SysBusDevice *dev)
 {
+    HPETState *s = FROM_SYSBUS(HPETState, dev);
     int i, iomemtype;
     HPETTimer *timer;
-    HPETState *s;
-
-    DPRINTF ("hpet_init\n");

-    s = qemu_mallocz(sizeof(HPETState));
+    assert(!hpet_statep);
     hpet_statep = s;
-    s->irqs = irq;
+    for (i = 0; i < HPET_NUM_IRQ_ROUTES; i++) {
+        sysbus_init_irq(dev, &s->irqs[i]);
+    }
     for (i = 0; i < HPET_NUM_TIMERS; i++) {
         timer = &s->timer[i];
         timer->qemu_timer = qemu_new_timer(vm_clock, hpet_timer, timer);
         timer->tn = i;
         timer->state = s;
     }
-    vmstate_register(-1, &vmstate_hpet, s);
-    qemu_register_reset(hpet_reset, s);
+
     /* HPET Area */
     iomemtype = cpu_register_io_memory(hpet_ram_read,
                                        hpet_ram_write, s);
-    cpu_register_physical_memory(HPET_BASE, 0x400, iomemtype);
+    sysbus_init_mmio(dev, 0x400, iomemtype);
+    return 0;
 }
+
+static SysBusDeviceInfo hpet_device_info = {
+    .qdev.name    = "hpet",
+    .qdev.size    = sizeof(HPETState),
+    .qdev.no_user = 1,
+    .qdev.vmsd    = &vmstate_hpet,
+    .qdev.reset   = hpet_reset,
+    .init         = hpet_init,
+};
+
+static void hpet_register_device(void)
+{
+    sysbus_register_withprop(&hpet_device_info);
+}
+
+device_init(hpet_register_device)
diff --git a/hw/hpet_emul.h b/hw/hpet_emul.h
index 2f5f8ba..785f850 100644
--- a/hw/hpet_emul.h
+++ b/hw/hpet_emul.h
@@ -19,6 +19,8 @@
 #define FS_PER_NS 100
 #define HPET_NUM_TIMERS 3

+#define HPET_NUM_IRQ_ROUTES 32
+
 #define HPET_CFG_ENABLE 0x001
 #define HPET_CFG_LEGACY 0x002

@@ -47,7 +49,6 @@

 #if defined TARGET_I386
 extern uint32_t hpet_in_legacy_mode(void);
-extern void hpet_init(qemu_irq *irq);
 #endif

 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 9b85c42..ae31e2e 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -35,6 +35,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
+#include "sysbus.h"

 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -957,7 +958,11 @@ void pc_basic_device_init(qemu_irq *isa_irq,
     pit = pit_init(0x40, isa_reserve_irq(0));
     pcspk_init(pit);
     if (!no_hpet) {
-        hpet_init(isa_irq);
+        DeviceState *hpet = sysbus_create_simple("hpet", HPET_BASE, NULL);
+
+        for (i = 0; i < 24; i++) {
+            sysbus_connect_irq(sysbus_from_qdev(hpet), i, isa_irq[i]);
+        }
     }

     for(i = 0; i < MAX_SERIAL_PORTS; i++) {
--
1.6.0.2
[Qemu-devel] [PATCH 10/16] x86: Refactor RTC IRQ coalescing workaround
From: Jan Kiszka jan.kis...@siemens.com Make use of the new IRQ message and report delivery results from the sink to the source. As a by-product, this also adds de-coalescing support to the PIC. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic.c| 64 +++-- hw/apic.h|9 ++ hw/i8259.c | 16 ++- hw/ioapic.c | 20 ++--- hw/mc146818rtc.c | 83 ++--- hw/pc.c | 29 -- 6 files changed, 141 insertions(+), 80 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 7fbd79b..f9587d1 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -123,10 +123,8 @@ typedef struct APICState { static int apic_io_memory; static APICState *local_apics[MAX_APICS + 1]; static int last_apic_idx = 0; -static int apic_irq_delivered; - -static void apic_set_irq(APICState *s, int vector_num, int trigger_mode); +static int apic_set_irq(APICState *s, int vector_num, int trigger_mode); static void apic_update_irq(APICState *s); static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, uint8_t dest, uint8_t dest_mode); @@ -239,12 +237,12 @@ void apic_deliver_pic_intr(CPUState *env, int level) }\ } -static void apic_bus_deliver(const uint32_t *deliver_bitmask, - uint8_t delivery_mode, - uint8_t vector_num, uint8_t polarity, - uint8_t trigger_mode) +static int apic_bus_deliver(const uint32_t *deliver_bitmask, +uint8_t delivery_mode, uint8_t vector_num, +uint8_t polarity, uint8_t trigger_mode) { APICState *apic_iter; +int ret; switch (delivery_mode) { case APIC_DM_LOWPRI: @@ -261,11 +259,12 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, if (d = 0) { apic_iter = local_apics[d]; if (apic_iter) { -apic_set_irq(apic_iter, vector_num, trigger_mode); +return apic_set_irq(apic_iter, vector_num, +trigger_mode); } } } -return; +return QEMU_IRQ_MASKED; case APIC_DM_FIXED: break; @@ -273,34 +272,42 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, case APIC_DM_SMI: foreach_apic(apic_iter, deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_SMI) ); -return; +return 
QEMU_IRQ_DELIVERED; case APIC_DM_NMI: foreach_apic(apic_iter, deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_NMI) ); -return; +return QEMU_IRQ_DELIVERED; case APIC_DM_INIT: /* normal INIT IPI sent to processors */ foreach_apic(apic_iter, deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_INIT) ); -return; +return QEMU_IRQ_DELIVERED; case APIC_DM_EXTINT: /* handled in I/O APIC code */ break; default: -return; +return QEMU_IRQ_MASKED; } +ret = QEMU_IRQ_MASKED; foreach_apic(apic_iter, deliver_bitmask, - apic_set_irq(apic_iter, vector_num, trigger_mode) ); +if (ret == QEMU_IRQ_MASKED) +ret = QEMU_IRQ_COALESCED; +if (apic_set_irq(apic_iter, vector_num, + trigger_mode) == QEMU_IRQ_DELIVERED) { +ret = QEMU_IRQ_DELIVERED; +} +); +return ret; } -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, - uint8_t delivery_mode, uint8_t vector_num, - uint8_t polarity, uint8_t trigger_mode) +int apic_deliver_irq(uint8_t dest, uint8_t dest_mode, + uint8_t delivery_mode, uint8_t vector_num, + uint8_t polarity, uint8_t trigger_mode) { uint32_t deliver_bitmask[MAX_APIC_WORDS]; @@ -308,8 +315,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, polarity %d trigger_mode %d\n, __func__, dest, dest_mode, delivery_mode, vector_num, polarity, trigger_mode); apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode); -apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, polarity, - trigger_mode); +return apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, +polarity, trigger_mode); } void cpu_set_apic_base(CPUState *env, uint64_t val) @@ -402,22 +409,10 @@ static void apic_update_irq(APICState *s) cpu_interrupt(s-cpu_env, CPU_INTERRUPT_HARD); } -void apic_reset_irq_delivered(void) -{ -DPRINTF_C(%s: old
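The fixed-mode loop at the end of apic_bus_deliver() folds the per-APIC results into one return value. A minimal standalone sketch of that folding rule follows, assuming the result codes from patch 09/16; the helper name aggregate_results and the array-based interface are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes with the values introduced in patch 09/16. */
#define QEMU_IRQ_DELIVERED   0
#define QEMU_IRQ_COALESCED (-1)
#define QEMU_IRQ_MASKED    (-2)

/*
 * Hypothetical helper: combine the results reported by each targeted APIC.
 * - no target at all           -> MASKED
 * - at least one target        -> at least COALESCED
 * - any target delivered       -> DELIVERED
 * This mirrors the loop in the patch, where visiting a target upgrades
 * MASKED to COALESCED regardless of what that target returned.
 */
static int aggregate_results(const int *per_target, size_t n)
{
    int ret = QEMU_IRQ_MASKED;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ret == QEMU_IRQ_MASKED) {
            ret = QEMU_IRQ_COALESCED;
        }
        if (per_target[i] == QEMU_IRQ_DELIVERED) {
            ret = QEMU_IRQ_DELIVERED;
        }
    }
    return ret;
}
```

With this rule a source only sees COALESCED when every targeted APIC still had the vector pending, which is the case the RTC workaround wants to detect.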
[Qemu-devel] [PATCH 09/16] Enable message delivery via IRQs
From: Jan Kiszka jan.kis...@siemens.com

This patch allows a message to be optionally attached to an IRQ event. The
message can contain a payload reference and a callback that the IRQ handler
may invoke to report the delivery result. The former can be used to model
message-signaled interrupts, the latter to cleanly implement IRQ
de-coalescing logic.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/irq.c |   37 ++++++++++++++++++++++++++++++++++++-
 hw/irq.h |   38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/hw/irq.c b/hw/irq.c
index 24fb09d..db5136b 100644
--- a/hw/irq.c
+++ b/hw/irq.c
@@ -28,12 +28,31 @@ struct IRQState {
     qemu_irq_handler handler;
     void *opaque;
     int n;
+    IRQMsg *msg;
 };
 
-void qemu_set_irq(qemu_irq irq, int level)
+void qemu_set_irq_msg(qemu_irq irq, int level, IRQMsg *msg)
 {
     if (irq) {
+        irq->msg = msg;
         irq->handler(irq, irq->opaque, irq->n, level);
+        irq->msg = NULL;
+    }
+}
+
+void *qemu_irq_get_payload(qemu_irq irq)
+{
+    IRQMsg *msg = irq->msg;
+
+    return msg ? msg->payload : NULL;
+}
+
+void qemu_irq_fire_delivery_cb(qemu_irq irq, int level, int result)
+{
+    IRQMsg *msg = irq->msg;
+
+    if (msg && msg->delivery_cb) {
+        msg->delivery_cb(irq, msg->delivery_opaque, irq->n, level, result);
     }
 }
 
@@ -61,11 +80,27 @@ void qemu_free_irqs(qemu_irq *s)
     qemu_free(s);
 }
 
+static void qemu_notirq_delivery_cb(qemu_irq irq, void *opaque, int line,
+                                    int level, int result)
+{
+    qemu_irq orig_irq = opaque;
+
+    qemu_irq_fire_delivery_cb(orig_irq, !level, result);
+}
+
 static void qemu_notirq(qemu_irq irq, void *opaque, int line, int level)
 {
     struct IRQState *inv_irq = opaque;
+    IRQMsg msg;
 
+    if (irq->msg) {
+        msg.delivery_cb = qemu_notirq_delivery_cb;
+        msg.delivery_opaque = irq;
+        msg.payload = irq->msg->payload;
+        inv_irq->msg = &msg;
+    }
     inv_irq->handler(inv_irq, inv_irq->opaque, inv_irq->n, !level);
+    inv_irq->msg = NULL;
 }
 
 qemu_irq qemu_irq_invert(qemu_irq irq)
diff --git a/hw/irq.h b/hw/irq.h
index d0f83e3..01f96af 100644
--- a/hw/irq.h
+++ b/hw/irq.h
@@ -3,26 +3,62 @@
 
 /* Generic IRQ/GPIO pin infrastructure.  */
 
+#define QEMU_IRQ_DELIVERED      0
+#define QEMU_IRQ_COALESCED      (-1)
+#define QEMU_IRQ_MASKED         (-2)
+
+typedef void (*qemu_irq_delivery_cb)(qemu_irq irq, void *opaque, int n,
+                                     int level, int result);
 typedef void (*qemu_irq_handler)(qemu_irq irq, void *opaque, int n, int level);
 
-void qemu_set_irq(qemu_irq irq, int level);
+typedef struct IRQMsg {
+    qemu_irq_delivery_cb delivery_cb;
+    void *delivery_opaque;
+    void *payload;
+} IRQMsg;
+
+void qemu_set_irq_msg(qemu_irq irq, int level, IRQMsg *msg);
+
+static inline void qemu_set_irq(qemu_irq irq, int level)
+{
+    qemu_set_irq_msg(irq, level, NULL);
+}
 
 static inline void qemu_irq_raise(qemu_irq irq)
 {
     qemu_set_irq(irq, 1);
 }
 
+static inline void qemu_irq_raise_msg(qemu_irq irq, IRQMsg *msg)
+{
+    qemu_set_irq_msg(irq, 1, msg);
+}
+
 static inline void qemu_irq_lower(qemu_irq irq)
 {
     qemu_set_irq(irq, 0);
 }
 
+static inline void qemu_irq_lower_msg(qemu_irq irq, IRQMsg *msg)
+{
+    qemu_set_irq_msg(irq, 0, msg);
+}
+
 static inline void qemu_irq_pulse(qemu_irq irq)
 {
     qemu_set_irq(irq, 1);
     qemu_set_irq(irq, 0);
 }
 
+static inline void qemu_irq_pulse_msg(qemu_irq irq, IRQMsg *msg)
+{
+    qemu_set_irq_msg(irq, 1, msg);
+    qemu_set_irq_msg(irq, 0, msg);
+}
+
+void qemu_irq_fire_delivery_cb(qemu_irq irq, int level, int result);
+void *qemu_irq_get_payload(qemu_irq irq);
+
 /* Returns an array of N IRQs.  */
 qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, int n);
 void qemu_free_irqs(qemu_irq *s);
-- 
1.6.0.2
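To illustrate the delivery-feedback idea of this patch, here is a minimal standalone sketch. It is not QEMU code: the types Line, Msg and TickSource and the functions line_set_msg, line_report, sink_handler and demo_lost_ticks are hypothetical stand-ins, and only the three result values mirror the patch. The sketch shows a message that is attached to the line for the duration of the handler call, and a sink that reports back whether the event was delivered or coalesced.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as QEMU_IRQ_DELIVERED/COALESCED/MASKED in the patch. */
#define IRQ_DELIVERED   0
#define IRQ_COALESCED (-1)
#define IRQ_MASKED    (-2)

typedef struct Line Line;
typedef void (*delivery_cb)(Line *line, void *opaque, int level, int result);

typedef struct Msg {
    delivery_cb cb;   /* invoked by the sink to report the result */
    void *opaque;     /* source-private data handed back to cb */
} Msg;

struct Line {
    void (*handler)(Line *line, int level);
    Msg *msg;         /* valid only while the handler runs */
    int pending;      /* sink state: raised but not yet acknowledged */
    int masked;       /* sink state: interrupt masked */
};

/* Attach the message only for the duration of the handler call. */
static void line_set_msg(Line *line, int level, Msg *msg)
{
    line->msg = msg;
    line->handler(line, level);
    line->msg = NULL;
}

/* Called by a sink; does nothing if the source attached no callback. */
static void line_report(Line *line, int level, int result)
{
    Msg *msg = line->msg;

    if (msg && msg->cb) {
        msg->cb(line, msg->opaque, level, result);
    }
}

/* Hypothetical sink: reports a result for every rising event. */
static void sink_handler(Line *line, int level)
{
    int result;

    if (!level) {
        return;
    }
    if (line->masked) {
        result = IRQ_MASKED;
    } else if (line->pending) {
        result = IRQ_COALESCED;
    } else {
        line->pending = 1;
        result = IRQ_DELIVERED;
    }
    line_report(line, level, result);
}

/* Hypothetical source: counts ticks it would have to re-inject later. */
typedef struct TickSource {
    int coalesced;
} TickSource;

static void tick_result(Line *line, void *opaque, int level, int result)
{
    TickSource *src = opaque;

    (void)line;
    (void)level;
    if (result == IRQ_COALESCED) {
        src->coalesced++;
    }
}

static int demo_lost_ticks(void)
{
    TickSource src = { 0 };
    Msg msg = { tick_result, &src };
    Line line = { sink_handler, NULL, 0, 0 };

    line_set_msg(&line, 1, &msg);   /* delivered */
    line_set_msg(&line, 1, &msg);   /* coalesced: previous one still pending */
    line.pending = 0;               /* guest acknowledges */
    line_set_msg(&line, 1, &msg);   /* delivered again */
    return src.coalesced;
}
```

Because the message lives on the caller's stack and is detached as soon as the handler returns, nothing has to be allocated per delivery, which is the property discussed earlier in the thread.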
[Qemu-devel] [PATCH 06/16] hpet: Start/stop timer when HPET_TN_ENABLE is modified
From: Jan Kiszka jan.kis...@siemens.com

We have to update the qemu timer when the per-timer enable bit is toggled,
just like for HPET_CFG_ENABLE changes.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/hpet.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 6974935..041dd84 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -430,6 +430,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 printf("qemu: level-triggered hpet not supported\n");
                 exit (-1);
             }
+            if (activating_bit(old_val, new_val, HPET_TN_ENABLE)) {
+                hpet_set_timer(timer);
+            } else if (deactivating_bit(old_val, new_val, HPET_TN_ENABLE)) {
+                hpet_del_timer(timer);
+            }
             break;
         case HPET_TN_CFG + 4: // Interrupt capabilities
             DPRINTF("qemu: invalid HPET_TN_CFG+4 write\n");
-- 
1.6.0.2
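The enable-bit handling above relies on detecting a 0 to 1 or 1 to 0 transition between the old and the new register value. A minimal standalone sketch of that edge detection follows. The two helpers are modeled on the activating_bit()/deactivating_bit() functions used in hw/hpet.c; the TN_ENABLE value, the TimerAction enum and on_config_write() are illustrative assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TN_ENABLE 0x004u   /* assumed bit for the per-timer enable flag */

/* Bit was clear in the old value and is set in the new one. */
static int activating_bit(uint64_t old, uint64_t new_val, uint64_t mask)
{
    return !(old & mask) && (new_val & mask);
}

/* Bit was set in the old value and is clear in the new one. */
static int deactivating_bit(uint64_t old, uint64_t new_val, uint64_t mask)
{
    return (old & mask) && !(new_val & mask);
}

typedef enum { TIMER_UNCHANGED, TIMER_ARM, TIMER_DISARM } TimerAction;

/* Hypothetical wrapper: decide what a config write means for the timer. */
static TimerAction on_config_write(uint64_t old, uint64_t new_val)
{
    if (activating_bit(old, new_val, TN_ENABLE)) {
        return TIMER_ARM;      /* corresponds to hpet_set_timer() */
    } else if (deactivating_bit(old, new_val, TN_ENABLE)) {
        return TIMER_DISARM;   /* corresponds to hpet_del_timer() */
    }
    return TIMER_UNCHANGED;    /* bit did not change: leave the timer alone */
}
```

Acting only on transitions means a guest that rewrites the register with the enable bit already set does not restart a running timer.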
[Qemu-devel] [PATCH 07/16] monitor/QMP: Drop info hpet / query-hpet
From: Jan Kiszka jan.kis...@siemens.com

This command was of minimal use before; now it is useless, as the hpet
became a qdev device and is thus easily discoverable. We should definitely
not set query-hpet in QMP's stone, and there is also no good reason to keep
it for the interactive monitor.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 QMP/vm-info     |    2 +-
 monitor.c       |   22 ----------------------
 qemu-monitor.hx |   21 ---------------------
 3 files changed, 1 insertions(+), 44 deletions(-)

diff --git a/QMP/vm-info b/QMP/vm-info
index b150d82..8ebaeb3 100755
--- a/QMP/vm-info
+++ b/QMP/vm-info
@@ -25,7 +25,7 @@ def main():
     qemu = qmp.QEMUMonitorProtocol(argv[1])
     qemu.connect()
 
-    for cmd in [ 'version', 'hpet', 'kvm', 'status', 'uuid', 'balloon' ]:
+    for cmd in [ 'version', 'kvm', 'status', 'uuid', 'balloon' ]:
         print cmd + ': ' + str(qemu.send('query-' + cmd))
 
 if __name__ == '__main__':
diff --git a/monitor.c b/monitor.c
index 15b53b9..14f77bd 100644
--- a/monitor.c
+++ b/monitor.c
@@ -740,20 +740,6 @@ static void do_info_commands(Monitor *mon, QObject **ret_data)
     *ret_data = QOBJECT(cmd_list);
 }
 
-#if defined(TARGET_I386)
-static void do_info_hpet_print(Monitor *mon, const QObject *data)
-{
-    monitor_printf(mon, "HPET is %s by QEMU\n",
-                   qdict_get_bool(qobject_to_qdict(data), "enabled") ?
-                   "enabled" : "disabled");
-}
-
-static void do_info_hpet(Monitor *mon, QObject **ret_data)
-{
-    *ret_data = qobject_from_jsonf("{ 'enabled': %i }", !no_hpet);
-}
-#endif
-
 static void do_info_uuid_print(Monitor *mon, const QObject *data)
 {
     monitor_printf(mon, "%s\n", qdict_get_str(qobject_to_qdict(data), "UUID"));
@@ -2509,14 +2495,6 @@ static const mon_cmd_t info_cmds[] = {
         .help       = "show the active virtual memory mappings",
         .mhandler.info = mem_info,
     },
-    {
-        .name       = "hpet",
-        .args_type  = "",
-        .params     = "",
-        .help       = "show state of HPET",
-        .user_print = do_info_hpet_print,
-        .mhandler.info_new = do_info_hpet,
-    },
 #endif
     {
         .name       = "jit",
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index f6a94f2..9f62b94 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -2144,27 +2144,6 @@ show the active virtual memory mappings (i386 only)
 ETEXI
 
 STEXI
-...@item info hpet
-show state of HPET (i386 only)
-ETEXI
-SQMP
-query-hpet
-----------
-
-Show HPET state.
-
-Return a json-object with the following information:
-
-- "enabled": true if hpet if enabled, false otherwise (json-bool)
-
-Example:
-
--> { "execute": "query-hpet" }
-<- { "return": { "enabled": true } }
-
-EQMP
-
-STEXI
 @item info jit
 show dynamic compiler info
 @item info kvm
-- 
1.6.0.2
[Qemu-devel] [PATCH 14/16] vmstate: Add VMSTATE_STRUCT_VARRAY_UINT8
From: Jan Kiszka jan.kis...@siemens.com

Required for hpet.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/hw.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index fc2d184..36be0be 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -474,6 +474,16 @@ extern const VMStateInfo vmstate_info_unused_buffer;
     .offset     = vmstate_offset_array(_state, _field, _type, _num), \
 }
 
+#define VMSTATE_STRUCT_VARRAY_UINT8(_field, _state, _field_num, _version, _vmsd, _type) { \
+    .name       = (stringify(_field)),                               \
+    .num_offset = vmstate_offset_value(_state, _field_num, uint8_t), \
+    .version_id = (_version),                                        \
+    .vmsd       = (_vmsd),                                           \
+    .size       = sizeof(_type),                                     \
+    .flags      = VMS_STRUCT|VMS_VARRAY_INT32,                       \
+    .offset     = offsetof(_state, _field),                          \
+}
+
 #define VMSTATE_STATIC_BUFFER(_field, _state, _version, _test, _start, _size) { \
     .name       = (stringify(_field)),                               \
     .version_id = (_version),                                        \
-- 
1.6.0.2
[Qemu-devel] [PATCH 12/16] hpet: Drop static state
From: Jan Kiszka jan.kis...@siemens.com Instead of keeping a static reference around, pass the state to hpet_enabled and hpet_get_ticks. All callers now have it at hand. Will once allow to instantiate the HPET more than a single time. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c | 38 +- 1 files changed, 17 insertions(+), 21 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index d26cad5..3866061 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -69,8 +69,6 @@ typedef struct HPETState { uint64_t hpet_counter; /* main counter */ } HPETState; -static HPETState *hpet_statep; - static uint32_t hpet_in_legacy_mode(HPETState *s) { return s-config HPET_CFG_LEGACY; @@ -81,9 +79,9 @@ static uint32_t timer_int_route(struct HPETTimer *timer) return (timer-config HPET_TN_INT_ROUTE_MASK) HPET_TN_INT_ROUTE_SHIFT; } -static uint32_t hpet_enabled(void) +static uint32_t hpet_enabled(HPETState *s) { -return hpet_statep-config HPET_CFG_ENABLE; +return s-config HPET_CFG_ENABLE; } static uint32_t timer_is_periodic(HPETTimer *t) @@ -133,9 +131,9 @@ static int deactivating_bit(uint64_t old, uint64_t new, uint64_t mask) return ((old mask) !(new mask)); } -static uint64_t hpet_get_ticks(void) +static uint64_t hpet_get_ticks(HPETState *s) { -return ns_to_ticks(qemu_get_clock(vm_clock) + hpet_statep-hpet_offset); +return ns_to_ticks(qemu_get_clock(vm_clock) + s-hpet_offset); } /* @@ -174,7 +172,7 @@ static void update_irq(struct HPETTimer *timer) } else { route = timer_int_route(timer); } -if (!timer_enabled(timer) || !hpet_enabled()) { +if (!timer_enabled(timer) || !hpet_enabled(timer-state)) { return; } qemu_irq_pulse(timer-state-irqs[route]); @@ -185,7 +183,7 @@ static void hpet_pre_save(void *opaque) HPETState *s = opaque; /* save current counter value */ -s-hpet_counter = hpet_get_ticks(); +s-hpet_counter = hpet_get_ticks(s); } static int hpet_post_load(void *opaque, int version_id) @@ -240,7 +238,7 @@ static void hpet_timer(void *opaque) uint64_t diff; uint64_t period = 
t-period; -uint64_t cur_tick = hpet_get_ticks(); +uint64_t cur_tick = hpet_get_ticks(t-state); if (timer_is_periodic(t) period != 0) { if (t-config HPET_TN_32BIT) { @@ -270,7 +268,7 @@ static void hpet_set_timer(HPETTimer *t) { uint64_t diff; uint32_t wrap_diff; /* how many ticks until we wrap? */ -uint64_t cur_tick = hpet_get_ticks(); +uint64_t cur_tick = hpet_get_ticks(t-state); /* whenever new timer is being set up, make sure wrap_flag is 0 */ t-wrap_flag = 0; @@ -353,16 +351,16 @@ static uint32_t hpet_ram_readl(void *opaque, target_phys_addr_t addr) DPRINTF(qemu: invalid HPET_CFG + 4 hpet_ram_readl \n); return 0; case HPET_COUNTER: -if (hpet_enabled()) { -cur_tick = hpet_get_ticks(); +if (hpet_enabled(s)) { +cur_tick = hpet_get_ticks(s); } else { cur_tick = s-hpet_counter; } DPRINTF(qemu: reading counter = % PRIx64 \n, cur_tick); return cur_tick; case HPET_COUNTER + 4: -if (hpet_enabled()) { -cur_tick = hpet_get_ticks(); +if (hpet_enabled(s)) { +cur_tick = hpet_get_ticks(s); } else { cur_tick = s-hpet_counter; } @@ -457,7 +455,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, (timer-period 0xULL) | new_val; } timer-config = ~HPET_TN_SETVAL; -if (hpet_enabled()) { +if (hpet_enabled(s)) { hpet_set_timer(timer); } break; @@ -476,7 +474,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, (timer-period 0xULL) | new_val 32; } timer-config = ~HPET_TN_SETVAL; -if (hpet_enabled()) { +if (hpet_enabled(s)) { hpet_set_timer(timer); } break; @@ -506,7 +504,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } } else if (deactivating_bit(old_val, new_val, HPET_CFG_ENABLE)) { /* Halt main counter and disable interrupt generation. 
*/ -s-hpet_counter = hpet_get_ticks(); +s-hpet_counter = hpet_get_ticks(s); for (i = 0; i HPET_NUM_TIMERS; i++) { hpet_del_timer(s-timer[i]); } @@ -527,7 +525,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, /* FIXME: need to handle level-triggered interrupts */ break; case HPET_COUNTER: -
[Qemu-devel] [PATCH 11/16] hpet/rtc: Rework RTC IRQ replacement by HPET
From: Jan Kiszka jan.kis...@siemens.com Allow the intercept the RTC IRQ for the HPET legacy mode. Then push routing to IRQ8 completely into the HPET. This allows to turn hpet_in_legacy_mode() into a private function. Furthermore, this stops the RTC from clearing IRQ8 even if the HPET is in control. This patch comes with a side effect: The RTC timers will no longer be stoppend when there is no IRQ consumer, possibly causing a minor performance degration. But as the guest may want to redirect the RTC to the SCI in that mode, it should normally disable unused IRQ source anyway. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c| 42 +++--- hw/hpet_emul.h |4 hw/mc146818rtc.c | 54 +++--- hw/mc146818rtc.h |4 +++- hw/mips_jazz.c |2 +- hw/mips_malta.c |2 +- hw/mips_r4k.c|2 +- hw/pc.c | 14 -- hw/ppc_prep.c|2 +- 9 files changed, 65 insertions(+), 61 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 041dd84..d26cad5 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -30,6 +30,7 @@ #include qemu-timer.h #include hpet_emul.h #include sysbus.h +#include mc146818rtc.h //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -58,6 +59,7 @@ typedef struct HPETState { SysBusDevice busdev; uint64_t hpet_offset; qemu_irq irqs[HPET_NUM_IRQ_ROUTES]; +uint8_t rtc_irq_level; HPETTimer timer[HPET_NUM_TIMERS]; /* Memory-mapped, software visible registers */ @@ -69,12 +71,9 @@ typedef struct HPETState { static HPETState *hpet_statep; -uint32_t hpet_in_legacy_mode(void) +static uint32_t hpet_in_legacy_mode(HPETState *s) { -if (!hpet_statep) { -return 0; -} -return hpet_statep-config HPET_CFG_LEGACY; +return s-config HPET_CFG_LEGACY; } static uint32_t timer_int_route(struct HPETTimer *timer) @@ -166,12 +165,12 @@ static void update_irq(struct HPETTimer *timer) { int route; -if (timer-tn = 1 hpet_in_legacy_mode()) { +if (timer-tn = 1 hpet_in_legacy_mode(timer-state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC, * 
timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC. */ -route = (timer-tn == 0) ? 0 : 8; +route = (timer-tn == 0) ? 0 : RTC_ISA_IRQ; } else { route = timer_int_route(timer); } @@ -515,8 +514,10 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { hpet_pit_disable(); +qemu_irq_lower(s-irqs[RTC_ISA_IRQ]); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { hpet_pit_enable(); +qemu_set_irq(s-irqs[RTC_ISA_IRQ], s-rtc_irq_level); } break; case HPET_CFG + 4: @@ -607,6 +608,30 @@ static void hpet_reset(DeviceState *d) count = 1; } +static void hpet_rtc_delivery_cb(qemu_irq irq, void *opaque, int n, int level, + int result) +{ +qemu_irq orig_irq = opaque; + +qemu_irq_fire_delivery_cb(orig_irq, level, result); +} + +static void hpet_handle_rtc_irq(qemu_irq irq, void *opaque, int n, int level) +{ +HPETState *s = FROM_SYSBUS(HPETState, opaque); +IRQMsg msg = { +.delivery_cb = hpet_rtc_delivery_cb, +.delivery_opaque = irq, +}; + +s-rtc_irq_level = level; +if (hpet_in_legacy_mode(s)) { +qemu_irq_fire_delivery_cb(irq, level, QEMU_IRQ_MASKED); +} else { +qemu_set_irq_msg(s-irqs[RTC_ISA_IRQ], level, msg); +} +} + static int hpet_init(SysBusDevice *dev) { HPETState *s = FROM_SYSBUS(HPETState, dev); @@ -625,6 +650,9 @@ static int hpet_init(SysBusDevice *dev) timer-state = s; } +isa_reserve_irq(RTC_ISA_IRQ); +qdev_init_gpio_in(dev-qdev, hpet_handle_rtc_irq, 1); + /* HPET Area */ iomemtype = cpu_register_io_memory(hpet_ram_read, hpet_ram_write, s); diff --git a/hw/hpet_emul.h b/hw/hpet_emul.h index 785f850..9c268cc 100644 --- a/hw/hpet_emul.h +++ b/hw/hpet_emul.h @@ -47,8 +47,4 @@ #define HPET_TN_INT_ROUTE_CAP_SHIFT 32 #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U -#if defined TARGET_I386 -extern uint32_t hpet_in_legacy_mode(void); -#endif - #endif diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c index cbb98a4..ac82810 
100644 --- a/hw/mc146818rtc.c +++ b/hw/mc146818rtc.c @@ -26,7 +26,6 @@ #include sysemu.h #include pc.h #include isa.h -#include hpet_emul.h #include mc146818rtc.h //#define DEBUG_CMOS @@ -100,24 +99,6 @@ typedef struct RTCState { QEMUTimer *second_timer2; } RTCState; -static void rtc_irq_raise(RTCState *s,
[Qemu-devel] [PATCH 13/16] hpet: Add support for level-triggered interrupts
From: Jan Kiszka jan.kis...@siemens.com By implementing this feature we can also remove a nasty way to kill qemu (by trying to enable level-triggered hpet interrupts). Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c | 32 ++-- 1 files changed, 22 insertions(+), 10 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 3866061..eafdccb 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -159,8 +159,10 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer) +static void update_irq(struct HPETTimer *timer, int set) { +uint64_t mask; +HPETState *s; int route; if (timer-tn = 1 hpet_in_legacy_mode(timer-state)) { @@ -172,10 +174,18 @@ static void update_irq(struct HPETTimer *timer) } else { route = timer_int_route(timer); } -if (!timer_enabled(timer) || !hpet_enabled(timer-state)) { -return; +s = timer-state; +mask = 1 timer-tn; +if (!set || !timer_enabled(timer) || !hpet_enabled(timer-state)) { +s-isr = ~mask; +qemu_irq_lower(s-irqs[route]); +} else if (timer-config HPET_TN_TYPE_LEVEL) { +s-isr |= mask; +qemu_irq_raise(s-irqs[route]); +} else { +s-isr = ~mask; +qemu_irq_pulse(s-irqs[route]); } -qemu_irq_pulse(timer-state-irqs[route]); } static void hpet_pre_save(void *opaque) @@ -261,7 +271,7 @@ static void hpet_timer(void *opaque) t-wrap_flag = 0; } } -update_irq(t); +update_irq(t, 1); } static void hpet_set_timer(HPETTimer *t) @@ -291,6 +301,7 @@ static void hpet_set_timer(HPETTimer *t) static void hpet_del_timer(HPETTimer *t) { qemu_del_timer(t-qemu_timer); +update_irq(t, 0); } #ifdef HPET_DEBUG @@ -423,10 +434,6 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, timer-cmp = (uint32_t)timer-cmp; timer-period = (uint32_t)timer-period; } -if (new_val HPET_TN_TYPE_LEVEL) { -printf(qemu: level-triggered hpet not supported\n); -exit (-1); -} if (activating_bit(old_val, new_val, HPET_TN_ENABLE)) { hpet_set_timer(timer); } else if (deactivating_bit(old_val, new_val, 
HPET_TN_ENABLE)) { @@ -522,7 +529,12 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, DPRINTF(qemu: invalid HPET_CFG+4 write \n); break; case HPET_STATUS: -/* FIXME: need to handle level-triggered interrupts */ +val = new_val s-isr; +for (i = 0; i HPET_NUM_TIMERS; i++) { +if (val (1 i)) { +update_irq(s-timer[i], 0); +} +} break; case HPET_COUNTER: if (hpet_enabled(s)) { -- 1.6.0.2
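The decision logic that this patch adds to update_irq(), together with the write-1-to-clear handling of the status register, can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the patched function: TimerModel, LineOp, update_irq_model() and status_write() are hypothetical names, the line operation is returned instead of being applied to a qemu_irq, and timer i is assumed to have timer number i.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LINE_LOWER, LINE_RAISE, LINE_PULSE } LineOp;

typedef struct TimerModel {
    unsigned tn;          /* timer number, selects the bit in isr */
    int enabled;          /* per-timer interrupt enable */
    int level_triggered;  /* level (1) or edge (0) mode */
} TimerModel;

/*
 * Decide what to do with the interrupt line and keep the status bit in
 * sync: only an asserted level-triggered interrupt sets its isr bit.
 */
static LineOp update_irq_model(uint64_t *isr, const TimerModel *t,
                               int hpet_enabled, int set)
{
    uint64_t mask = (uint64_t)1 << t->tn;

    if (!set || !t->enabled || !hpet_enabled) {
        *isr &= ~mask;
        return LINE_LOWER;
    }
    if (t->level_triggered) {
        *isr |= mask;
        return LINE_RAISE;
    }
    *isr &= ~mask;
    return LINE_PULSE;
}

/* Guest write to the status register: a 1 clears a bit that is set. */
static void status_write(uint64_t *isr, uint64_t new_val,
                         const TimerModel *timers, unsigned n,
                         int hpet_enabled)
{
    uint64_t val = new_val & *isr;
    unsigned i;

    for (i = 0; i < n; i++) {
        if (val & ((uint64_t)1 << i)) {
            update_irq_model(isr, &timers[i], hpet_enabled, 0);
        }
    }
}
```

In this model an edge-triggered timer never leaves a bit set in the status register, so a guest write to that register only affects level-triggered timers.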
[Qemu-devel] [PATCH 15/16] hpet: Make number of timers configurable
From: Jan Kiszka jan.kis...@siemens.com One HPET block supports up to 32 timers. Allow to instantiate more than the recommended and implemented minimum of 3. The number is configured via the qdev property timers. It is also saved/restored so that it need not match between migration peers. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c | 53 - hw/hpet_emul.h |6 +- 2 files changed, 45 insertions(+), 14 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index eafdccb..7219967 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -60,7 +60,8 @@ typedef struct HPETState { uint64_t hpet_offset; qemu_irq irqs[HPET_NUM_IRQ_ROUTES]; uint8_t rtc_irq_level; -HPETTimer timer[HPET_NUM_TIMERS]; +uint8_t num_timers; +HPETTimer timer[HPET_MAX_TIMERS]; /* Memory-mapped, software visible registers */ uint64_t capability;/* capabilities */ @@ -196,12 +197,25 @@ static void hpet_pre_save(void *opaque) s-hpet_counter = hpet_get_ticks(s); } +static int hpet_pre_load(void *opaque) +{ +HPETState *s = opaque; + +/* version 1 only supports 3, later versions will load the actual value */ +s-num_timers = HPET_MIN_TIMERS; +return 0; +} + static int hpet_post_load(void *opaque, int version_id) { HPETState *s = opaque; /* Recalculate the offset between the main counter and guest time */ s-hpet_offset = ticks_to_ns(s-hpet_counter) - qemu_get_clock(vm_clock); + +/* Push number of timers into capability returned via HPET_ID */ +s-capability = ~HPET_ID_NUM_TIM_MASK; +s-capability |= (s-num_timers - 1) HPET_ID_NUM_TIM_SHIFT; return 0; } @@ -224,17 +238,19 @@ static const VMStateDescription vmstate_hpet_timer = { static const VMStateDescription vmstate_hpet = { .name = hpet, -.version_id = 1, +.version_id = 2, .minimum_version_id = 1, .minimum_version_id_old = 1, .pre_save = hpet_pre_save, +.pre_load = hpet_pre_load, .post_load = hpet_post_load, .fields = (VMStateField []) { VMSTATE_UINT64(config, HPETState), VMSTATE_UINT64(isr, HPETState), VMSTATE_UINT64(hpet_counter, HPETState), 
-VMSTATE_STRUCT_ARRAY(timer, HPETState, HPET_NUM_TIMERS, 0, - vmstate_hpet_timer, HPETTimer), +VMSTATE_UINT8_V(num_timers, HPETState, 2), +VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0, +vmstate_hpet_timer, HPETTimer), VMSTATE_END_OF_LIST() } }; @@ -330,7 +346,7 @@ static uint32_t hpet_ram_readl(void *opaque, target_phys_addr_t addr) uint8_t timer_id = (addr - 0x100) / 0x20; HPETTimer *timer = s-timer[timer_id]; -if (timer_id HPET_NUM_TIMERS - 1) { +if (timer_id s-num_timers) { DPRINTF(qemu: timer id out of range\n); return 0; } @@ -421,7 +437,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, HPETTimer *timer = s-timer[timer_id]; DPRINTF(qemu: hpet_ram_writel timer_id = %#x \n, timer_id); -if (timer_id HPET_NUM_TIMERS - 1) { +if (timer_id s-num_timers) { DPRINTF(qemu: timer id out of range\n); return; } @@ -504,7 +520,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, /* Enable main counter and interrupt generation. */ s-hpet_offset = ticks_to_ns(s-hpet_counter) - qemu_get_clock(vm_clock); -for (i = 0; i HPET_NUM_TIMERS; i++) { +for (i = 0; i s-num_timers; i++) { if ((s-timer[i])-cmp != ~0ULL) { hpet_set_timer(s-timer[i]); } @@ -512,7 +528,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } else if (deactivating_bit(old_val, new_val, HPET_CFG_ENABLE)) { /* Halt main counter and disable interrupt generation. 
*/ s-hpet_counter = hpet_get_ticks(s); -for (i = 0; i HPET_NUM_TIMERS; i++) { +for (i = 0; i s-num_timers; i++) { hpet_del_timer(s-timer[i]); } } @@ -530,7 +546,7 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, break; case HPET_STATUS: val = new_val s-isr; -for (i = 0; i HPET_NUM_TIMERS; i++) { +for (i = 0; i s-num_timers; i++) { if (val (1 i)) { update_irq(s-timer[i], 0); } @@ -589,7 +605,7 @@ static void hpet_reset(DeviceState *d) int i; static int count = 0; -for (i = 0; i HPET_NUM_TIMERS; i++) { +for (i = 0; i s-num_timers; i++) { HPETTimer *timer = s-timer[i]; hpet_del_timer(timer); @@ -603,8 +619,9 @@ static
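The post-load hook in this patch pushes the timer count into the capability register as "number of timers minus one". A minimal standalone sketch of that encoding follows. The shift and mask values are assumptions based on the HPET register layout (a 5-bit field starting at bit 8, consistent with the limit of 32 timers stated in the commit message), and set_num_timers()/get_num_timers() are hypothetical helpers, not functions from hw/hpet.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the "number of timers" field in the capability register. */
#define ID_NUM_TIM_SHIFT 8
#define ID_NUM_TIM_MASK  0x1f00ULL

/*
 * Store the index of the last timer (count - 1) in the field and leave all
 * other capability bits untouched. num_timers is assumed to be in 1..32.
 */
static uint64_t set_num_timers(uint64_t capability, uint8_t num_timers)
{
    capability &= ~ID_NUM_TIM_MASK;
    capability |= ((uint64_t)(num_timers - 1) << ID_NUM_TIM_SHIFT)
                  & ID_NUM_TIM_MASK;
    return capability;
}

/* Recover the timer count a guest would derive from the field. */
static unsigned get_num_timers(uint64_t capability)
{
    return (unsigned)((capability & ID_NUM_TIM_MASK) >> ID_NUM_TIM_SHIFT) + 1;
}
```

Clearing the field before setting it matters on migration: a version 1 stream first resets the count to the minimum of 3, and a stale larger value must not survive in the capability bits.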
[Qemu-devel] Re: RFC: blockdev_add friends, brief rationale, QMP docs
On 06/04/2010 05:16 PM, Markus Armbruster wrote:

- protocol: json-array of json-object
  Each element object has a member name
  - Possible values: file, nbd, ...
  Additional members depend on the value of name.
  For name = file:
  - file: file name (json-string)
  For name = nbd:
  - domain: address family (json-string, optional)
    - Possible values: inet (default), unix
  - file: file name (json-string), only with domain = unix
  - host: host name (json-string), only with domain = inet
  - port: port (json-int), only with domain = inet
  ...

This loses the nesting that protocols have. I'd like to see each nested
protocol as a member of its parent protocol. Besides the lovely } } }s in
the json representation, this allows us to have more complicated protocols,
for example a mirror protocol that has two child protocols, each specifying
a different backing store.

-- 
error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 16/16] hpet: Add MSI support
From: Jan Kiszka jan.kis...@siemens.com This implements the HPET capability of routing IRQs to the front-side bus, aka MSI support. This feature can be enabled via the qdev property msi and is off by default. Note that switching it on can cause guests (at least Linux) to use the HPET as timer instead of the LAPIC. KVM users should recall that only the latter is currently available as fast in-kernel model. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic.c |2 +- hw/apic.h |1 + hw/hpet.c | 39 +++ hw/hpet_emul.h |4 +++- 4 files changed, 40 insertions(+), 6 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index f9587d1..f33d20a 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -776,7 +776,7 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr) return val; } -static void apic_send_msi(target_phys_addr_t addr, uint32 data) +void apic_send_msi(target_phys_addr_t addr, uint32 data) { uint8_t dest = (addr MSI_ADDR_DEST_ID_MASK) MSI_ADDR_DEST_ID_SHIFT; uint8_t vector = (data MSI_DATA_VECTOR_MASK) MSI_DATA_VECTOR_SHIFT; diff --git a/hw/apic.h b/hw/apic.h index 738d98a..9c646f0 100644 --- a/hw/apic.h +++ b/hw/apic.h @@ -5,6 +5,7 @@ typedef struct IOAPICState IOAPICState; int apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t polarity, uint8_t trigger_mode); +void apic_send_msi(target_phys_addr_t addr, uint32 data); int apic_init(CPUState *env); int apic_accept_pic_intr(CPUState *env); void apic_deliver_pic_intr(CPUState *env, int level); diff --git a/hw/hpet.c b/hw/hpet.c index 7219967..490a804 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include hpet_emul.h #include sysbus.h #include mc146818rtc.h +#include apic.h //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -39,6 +40,8 @@ #define DPRINTF(...) 
#endif +#define HPET_MSI_SUPPORT0 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -47,7 +50,7 @@ typedef struct HPETTimer { /* timers */ /* Memory-mapped, software visible timer registers */ uint64_t config;/* configuration/cap */ uint64_t cmp; /* comparator */ -uint64_t fsb; /* FSB route, not supported now */ +uint64_t fsb; /* FSB route */ /* Hidden register state */ uint64_t period;/* Last value written to comparator */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit @@ -59,6 +62,7 @@ typedef struct HPETState { SysBusDevice busdev; uint64_t hpet_offset; qemu_irq irqs[HPET_NUM_IRQ_ROUTES]; +uint32_t flags; uint8_t rtc_irq_level; uint8_t num_timers; HPETTimer timer[HPET_MAX_TIMERS]; @@ -80,6 +84,11 @@ static uint32_t timer_int_route(struct HPETTimer *timer) return (timer-config HPET_TN_INT_ROUTE_MASK) HPET_TN_INT_ROUTE_SHIFT; } +static uint32_t timer_fsb_route(HPETTimer *t) +{ +return t-config HPET_TN_FSB_ENABLE; +} + static uint32_t hpet_enabled(HPETState *s) { return s-config HPET_CFG_ENABLE; @@ -179,7 +188,11 @@ static void update_irq(struct HPETTimer *timer, int set) mask = 1 timer-tn; if (!set || !timer_enabled(timer) || !hpet_enabled(timer-state)) { s-isr = ~mask; -qemu_irq_lower(s-irqs[route]); +if (!timer_fsb_route(timer)) { +qemu_irq_lower(s-irqs[route]); +} +} else if (timer_fsb_route(timer)) { +apic_send_msi(timer-fsb 32, timer-fsb 0x); } else if (timer-config HPET_TN_TYPE_LEVEL) { s-isr |= mask; qemu_irq_raise(s-irqs[route]); @@ -216,6 +229,12 @@ static int hpet_post_load(void *opaque, int version_id) /* Push number of timers into capability returned via HPET_ID */ s-capability = ~HPET_ID_NUM_TIM_MASK; s-capability |= (s-num_timers - 1) HPET_ID_NUM_TIM_SHIFT; + +/* Derive HPET_MSI_SUPPORT from the capability of the first timer. 
*/ +s-flags = ~(1 HPET_MSI_SUPPORT); +if (s-timer[0].config HPET_TN_FSB_CAP) { +s-flags |= 1 HPET_MSI_SUPPORT; +} return 0; } @@ -361,6 +380,8 @@ static uint32_t hpet_ram_readl(void *opaque, target_phys_addr_t addr) case HPET_TN_CMP + 4: return timer-cmp 32; case HPET_TN_ROUTE: +return timer-fsb; +case HPET_TN_ROUTE + 4: return timer-fsb 32; default: DPRINTF(qemu: invalid hpet_ram_readl\n); @@ -444,6 +465,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, switch ((addr - 0x100) % 0x20) { case HPET_TN_CFG: DPRINTF(qemu: hpet_ram_writel HPET_TN_CFG\n); +if (activating_bit(old_val, new_val, HPET_TN_FSB_ENABLE)) { +update_irq(timer, 0); +} val = hpet_fixup_reg(new_val, old_val,
[Qemu-devel] Re: [PATCH 10/16] x86: Refactor RTC IRQ coalescing workaround
On Sun, Jun 6, 2010 at 8:10 AM, Jan Kiszka jan.kis...@web.de wrote:
> From: Jan Kiszka jan.kis...@siemens.com
>
> Make use of the new IRQ message and report delivery results from the
> sink to the source. As a by-product, this also adds de-coalescing
> support to the PIC.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>  hw/apic.c        | 64 +++--
>  hw/apic.h        |  9 ++
>  hw/i8259.c       | 16 ++-
>  hw/ioapic.c      | 20 ++---
>  hw/mc146818rtc.c | 83 ++---
>  hw/pc.c          | 29 --
>  6 files changed, 141 insertions(+), 80 deletions(-)
>
> diff --git a/hw/apic.c b/hw/apic.c
> index 7fbd79b..f9587d1 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -123,10 +123,8 @@ typedef struct APICState {
>  static int apic_io_memory;
>  static APICState *local_apics[MAX_APICS + 1];
>  static int last_apic_idx = 0;
> -static int apic_irq_delivered;
> -
>
> -static void apic_set_irq(APICState *s, int vector_num, int trigger_mode);
> +static int apic_set_irq(APICState *s, int vector_num, int trigger_mode);
>  static void apic_update_irq(APICState *s);
>  static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>                                        uint8_t dest, uint8_t dest_mode);
> @@ -239,12 +237,12 @@ void apic_deliver_pic_intr(CPUState *env, int level)
>          }\
>      }
>
> -static void apic_bus_deliver(const uint32_t *deliver_bitmask,
> -                             uint8_t delivery_mode,
> -                             uint8_t vector_num, uint8_t polarity,
> -                             uint8_t trigger_mode)
> +static int apic_bus_deliver(const uint32_t *deliver_bitmask,
> +                            uint8_t delivery_mode, uint8_t vector_num,
> +                            uint8_t polarity, uint8_t trigger_mode)
>  {
>      APICState *apic_iter;
> +    int ret;
>
>      switch (delivery_mode) {
>          case APIC_DM_LOWPRI:
> @@ -261,11 +259,12 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
>                  if (d >= 0) {
>                      apic_iter = local_apics[d];
>                      if (apic_iter) {
> -                        apic_set_irq(apic_iter, vector_num, trigger_mode);
> +                        return apic_set_irq(apic_iter, vector_num,
> +                                            trigger_mode);
>                      }
>                  }
>              }
> -            return;
> +            return QEMU_IRQ_MASKED;
>
>          case APIC_DM_FIXED:
>              break;
> @@ -273,34 +272,42 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
>          case APIC_DM_SMI:
>              foreach_apic(apic_iter, deliver_bitmask,
>                  cpu_interrupt(apic_iter->cpu_env, CPU_INTERRUPT_SMI) );
> -            return;
> +            return QEMU_IRQ_DELIVERED;
>
>          case APIC_DM_NMI:
>              foreach_apic(apic_iter, deliver_bitmask,
>                  cpu_interrupt(apic_iter->cpu_env, CPU_INTERRUPT_NMI) );
> -            return;
> +            return QEMU_IRQ_DELIVERED;
>
>          case APIC_DM_INIT:
>              /* normal INIT IPI sent to processors */
>              foreach_apic(apic_iter, deliver_bitmask,
>                           cpu_interrupt(apic_iter->cpu_env, CPU_INTERRUPT_INIT) );
> -            return;
> +            return QEMU_IRQ_DELIVERED;
>
>          case APIC_DM_EXTINT:
>              /* handled in I/O APIC code */
>              break;
>
>          default:
> -            return;
> +            return QEMU_IRQ_MASKED;
>      }
>
> +    ret = QEMU_IRQ_MASKED;
>      foreach_apic(apic_iter, deliver_bitmask,
> -                 apic_set_irq(apic_iter, vector_num, trigger_mode) );
> +        if (ret == QEMU_IRQ_MASKED)
> +            ret = QEMU_IRQ_COALESCED;
> +        if (apic_set_irq(apic_iter, vector_num,
> +                         trigger_mode) == QEMU_IRQ_DELIVERED) {
> +            ret = QEMU_IRQ_DELIVERED;
> +        }
> +    );
> +    return ret;
>  }
>
> -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
> -                      uint8_t delivery_mode, uint8_t vector_num,
> -                      uint8_t polarity, uint8_t trigger_mode)
> +int apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
> +                     uint8_t delivery_mode, uint8_t vector_num,
> +                     uint8_t polarity, uint8_t trigger_mode)
>  {
>      uint32_t deliver_bitmask[MAX_APIC_WORDS];
>
> @@ -308,8 +315,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
>              " polarity %d trigger_mode %d\n", __func__, dest, dest_mode,
>              delivery_mode, vector_num, polarity, trigger_mode);
>      apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
> -    apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, polarity,
> -                     trigger_mode);
> +    return apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num,
> +                            polarity, trigger_mode);
>  }
>
>  void cpu_set_apic_base(CPUState *env, uint64_t val)
> @@ -402,22 +409,10 @@ static void
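The way the patch above folds the per-APIC results into a single return value can be looked at in isolation. What follows is a minimal sketch, not the QEMU code: the enum only borrows the MASKED/COALESCED/DELIVERED naming from the patch, and its numeric values as well as the helper sketch_aggregate() are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the names follow the patch, the values
 * are an assumption of this sketch. */
enum sketch_irq_result {
    SKETCH_IRQ_MASKED,
    SKETCH_IRQ_COALESCED,
    SKETCH_IRQ_DELIVERED,
};

/*
 * Fold the results of all destinations into one:
 *  - no destination at all             -> MASKED
 *  - destinations, but none delivered  -> COALESCED
 *  - at least one delivered            -> DELIVERED
 */
static enum sketch_irq_result
sketch_aggregate(const enum sketch_irq_result *per_dest, size_t n)
{
    enum sketch_irq_result ret = SKETCH_IRQ_MASKED;
    size_t i;

    assert(per_dest != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ret == SKETCH_IRQ_MASKED) {
            ret = SKETCH_IRQ_COALESCED;
        }
        if (per_dest[i] == SKETCH_IRQ_DELIVERED) {
            ret = SKETCH_IRQ_DELIVERED;
        }
    }
    return ret;
}
```

Note that under this rule a single successful destination is enough to report the interrupt as delivered, even if every other destination coalesced it.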
[Qemu-devel] Re: [PATCH 11/16] hpet/rtc: Rework RTC IRQ replacement by HPET
On Sun, Jun 6, 2010 at 8:11 AM, Jan Kiszka jan.kis...@web.de wrote:
> From: Jan Kiszka jan.kis...@siemens.com
>
> Allow to intercept the RTC IRQ for the HPET legacy mode. Then push
> routing to IRQ8 completely into the HPET. This allows to turn
> hpet_in_legacy_mode() into a private function. Furthermore, this stops
> the RTC from clearing IRQ8 even if the HPET is in control.
>
> This patch comes with a side effect: The RTC timers will no longer be
> stopped when there is no IRQ consumer, possibly causing a minor
> performance degradation. But as the guest may want to redirect the RTC
> to the SCI in that mode, it should normally disable unused IRQ sources
> anyway.
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>  hw/hpet.c        | 42 +++---
>  hw/hpet_emul.h   |  4
>  hw/mc146818rtc.c | 54 +++---
>  hw/mc146818rtc.h |  4 +++-
>  hw/mips_jazz.c   |  2 +-
>  hw/mips_malta.c  |  2 +-
>  hw/mips_r4k.c    |  2 +-
>  hw/pc.c          | 14 --
>  hw/ppc_prep.c    |  2 +-
>  9 files changed, 65 insertions(+), 61 deletions(-)
>
> diff --git a/hw/hpet.c b/hw/hpet.c
> index 041dd84..d26cad5 100644
> --- a/hw/hpet.c
> +++ b/hw/hpet.c
> @@ -30,6 +30,7 @@
>  #include "qemu-timer.h"
>  #include "hpet_emul.h"
>  #include "sysbus.h"
> +#include "mc146818rtc.h"
>
>  //#define HPET_DEBUG
>  #ifdef HPET_DEBUG
> @@ -58,6 +59,7 @@ typedef struct HPETState {
>      SysBusDevice busdev;
>      uint64_t hpet_offset;
>      qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
> +    uint8_t rtc_irq_level;
>      HPETTimer timer[HPET_NUM_TIMERS];
>
>      /* Memory-mapped, software visible registers */
> @@ -69,12 +71,9 @@ typedef struct HPETState {
>
>  static HPETState *hpet_statep;
>
> -uint32_t hpet_in_legacy_mode(void)
> +static uint32_t hpet_in_legacy_mode(HPETState *s)
>  {
> -    if (!hpet_statep) {
> -        return 0;
> -    }
> -    return hpet_statep->config & HPET_CFG_LEGACY;
> +    return s->config & HPET_CFG_LEGACY;
>  }
>
>  static uint32_t timer_int_route(struct HPETTimer *timer)
> @@ -166,12 +165,12 @@ static void update_irq(struct HPETTimer *timer)
>  {
>      int route;
>
> -    if (timer->tn <= 1 && hpet_in_legacy_mode()) {
> +    if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) {
>          /* if LegacyReplacementRoute bit is set, HPET specification requires
>           * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC,
>           * timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC.
>           */
> -        route = (timer->tn == 0) ? 0 : 8;
> +        route = (timer->tn == 0) ? 0 : RTC_ISA_IRQ;
>      } else {
>          route = timer_int_route(timer);
>      }
> @@ -515,8 +514,10 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
>              /* i8254 and RTC are disabled when HPET is in legacy mode */
>              if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
>                  hpet_pit_disable();
> +                qemu_irq_lower(s->irqs[RTC_ISA_IRQ]);
>              } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
>                  hpet_pit_enable();
> +                qemu_set_irq(s->irqs[RTC_ISA_IRQ], s->rtc_irq_level);
>              }
>              break;
>          case HPET_CFG + 4:
> @@ -607,6 +608,30 @@ static void hpet_reset(DeviceState *d)
>      count = 1;
>  }
>
> +static void hpet_rtc_delivery_cb(qemu_irq irq, void *opaque, int n, int level,
> +                                 int result)
> +{
> +    qemu_irq orig_irq = opaque;
> +
> +    qemu_irq_fire_delivery_cb(orig_irq, level, result);
> +}
> +
> +static void hpet_handle_rtc_irq(qemu_irq irq, void *opaque, int n, int level)
> +{
> +    HPETState *s = FROM_SYSBUS(HPETState, opaque);
> +    IRQMsg msg = {
> +        .delivery_cb = hpet_rtc_delivery_cb,
> +        .delivery_opaque = irq,
> +    };
> +
> +    s->rtc_irq_level = level;
> +    if (hpet_in_legacy_mode(s)) {
> +        qemu_irq_fire_delivery_cb(irq, level, QEMU_IRQ_MASKED);
> +    } else {
> +        qemu_set_irq_msg(s->irqs[RTC_ISA_IRQ], level, &msg);

This is the problem with passing around stack allocated objects: after
this function finishes, s->irqs[RTC_ISA_IRQ].msg is a dangling pointer
to some stack space.

> +    }
> +}
> +
>  static int hpet_init(SysBusDevice *dev)
>  {
>      HPETState *s = FROM_SYSBUS(HPETState, dev);
> @@ -625,6 +650,9 @@ static int hpet_init(SysBusDevice *dev)
>          timer->state = s;
>      }
>
> +    isa_reserve_irq(RTC_ISA_IRQ);
> +    qdev_init_gpio_in(&dev->qdev, hpet_handle_rtc_irq, 1);
> +
>      /* HPET Area */
>      iomemtype = cpu_register_io_memory(hpet_ram_read,
>                                         hpet_ram_write, s);
> diff --git a/hw/hpet_emul.h b/hw/hpet_emul.h
> index 785f850..9c268cc 100644
> --- a/hw/hpet_emul.h
> +++ b/hw/hpet_emul.h
> @@ -47,8 +47,4 @@
>  #define HPET_TN_INT_ROUTE_CAP_SHIFT 32
>  #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U
>
> -#if defined TARGET_I386
> -extern uint32_t hpet_in_legacy_mode(void);
> -#endif
> -
>  #endif
> diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
> index
[Qemu-devel] Re: [PATCH 00/16] HPET cleanups, fixes, enhancements
On Sun, Jun 6, 2010 at 8:10 AM, Jan Kiszka jan.kis...@web.de wrote: Second round, specifically adressing: - IRQMsg framework to refactor existing de-coalescing code - RTC IRQ output as GPIO pin (routed depening on HPET or -no-hpet) - ISA reservation for RTC IRQ If discussion around IRQMsg and de-coalescing happens to continue, I would suggest to merge patches 1..7 as they are likely uncontroversial and also fix bugs. Otherwise everything looks fine to me, but 10 and 11 had minor problems. Nice work! I'd suppose one possible cleanup could be to use the message payload in place of apic_deliver_irq()? Jan Kiszka (16): hpet: Catch out-of-bounds timer access hpet: Coding style cleanups and some refactorings hpet: Silence warning on write to running main counter hpet: Move static timer field initialization hpet: Convert to qdev hpet: Start/stop timer when HPET_TN_ENABLE is modified monitor/QMP: Drop info hpet / query-hpet Pass IRQ object on handler invocation Enable message delivery via IRQs x86: Refactor RTC IRQ coalescing workaround hpet/rtc: Rework RTC IRQ replacement by HPET hpet: Drop static state hpet: Add support for level-triggered interrupts vmstate: Add VMSTATE_STRUCT_VARRAY_UINT8 hpet: Make number of timers configurable hpet: Add MSI support QMP/vm-info | 2 +- hw/acpi_piix4.c | 3 +- hw/apic.c | 66 +++--- hw/apic.h | 11 +- hw/arm11mpcore.c | 12 +- hw/arm_gic.c | 18 +- hw/arm_pic.c | 6 +- hw/arm_timer.c | 4 +- hw/bitbang_i2c.c | 4 +- hw/bt-hci-csr.c | 2 +- hw/cbus.c | 6 +- hw/cris_pic_cpu.c | 4 +- hw/esp.c | 2 +- hw/etraxfs_pic.c | 16 +- hw/fdc.c | 2 +- hw/heathrow_pic.c | 3 +- hw/hpet.c | 595 ++- hw/hpet_emul.h | 46 +--- hw/hw.h | 10 + hw/i8259.c | 28 ++- hw/ide/cmd646.c | 2 +- hw/ide/microdrive.c | 2 +- hw/integratorcp.c | 10 +- hw/ioapic.c | 22 ++- hw/irq.c | 48 - hw/irq.h | 42 +++- hw/lance.c | 2 +- hw/max7310.c | 2 +- hw/mc146818rtc.c | 111 +- hw/mc146818rtc.h | 4 +- hw/mcf5206.c | 6 +- hw/mcf_intc.c | 14 +- hw/microblaze_pic_cpu.c | 5 +- hw/mips_int.c | 10 
+- hw/mips_jazz.c | 4 +- hw/mips_malta.c | 4 +- hw/mips_r4k.c | 2 +- hw/mst_fpga.c | 10 +- hw/musicpal.c | 16 +- hw/nseries.c | 4 +- hw/omap.h | 2 +- hw/omap1.c | 34 ++-- hw/omap2.c | 8 +- hw/omap_dma.c | 8 +- hw/omap_mmc.c | 2 +- hw/openpic.c | 6 +- hw/palm.c | 2 +- hw/pc.c | 59 -- hw/pc.h | 8 +- hw/pci.c | 4 +- hw/pl061.c | 4 +- hw/pl190.c | 6 +- hw/ppc.c | 8 +- hw/ppc4xx_devs.c | 2 +- hw/ppc_prep.c | 4 +- hw/pxa2xx.c | 2 +- hw/pxa2xx_gpio.c | 2 +- hw/pxa2xx_pcmcia.c | 3 +- hw/pxa2xx_pic.c | 10 +- hw/r2d.c | 2 +- hw/rc4030.c | 7 +- hw/sbi.c | 2 +- hw/sh_intc.c | 4 +- hw/sh_intc.h | 2 +- hw/sharpsl.h | 1 - hw/slavio_intctl.c | 16 +- hw/slavio_misc.c | 3 +- hw/sparc32_dma.c | 2 +- hw/spitz.c | 14 +- hw/ssd0323.c | 2 +- hw/stellaris.c | 6 +- hw/sun4c_intctl.c | 8 +- hw/sun4m.c | 14 +- hw/sun4u.c | 12 +- hw/syborg_interrupt.c | 8 +- hw/tc6393xb.c | 7 +- hw/tosa.c | 2 +- hw/tusb6010.c | 3 +- hw/twl92230.c | 5 +- hw/versatilepb.c | 10 +- hw/xilinx_intc.c | 8 +- hw/zaurus.c | 2 +- monitor.c | 22 -- qemu-monitor.hx | 21 -- 84 files changed, 874 insertions(+), 643 deletions(-)
[Qemu-devel] Re: [PATCH 10/16] x86: Refactor RTC IRQ coalescing workaround
Blue Swirl wrote: On Sun, Jun 6, 2010 at 8:10 AM, Jan Kiszka jan.kis...@web.de wrote: From: Jan Kiszka jan.kis...@siemens.com Make use of the new IRQ message and report delivery results from the sink to the source. As a by-product, this also adds de-coalescing support to the PIC. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic.c| 64 +++-- hw/apic.h|9 ++ hw/i8259.c | 16 ++- hw/ioapic.c | 20 ++--- hw/mc146818rtc.c | 83 ++--- hw/pc.c | 29 -- 6 files changed, 141 insertions(+), 80 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 7fbd79b..f9587d1 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -123,10 +123,8 @@ typedef struct APICState { static int apic_io_memory; static APICState *local_apics[MAX_APICS + 1]; static int last_apic_idx = 0; -static int apic_irq_delivered; - -static void apic_set_irq(APICState *s, int vector_num, int trigger_mode); +static int apic_set_irq(APICState *s, int vector_num, int trigger_mode); static void apic_update_irq(APICState *s); static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, uint8_t dest, uint8_t dest_mode); @@ -239,12 +237,12 @@ void apic_deliver_pic_intr(CPUState *env, int level) }\ } -static void apic_bus_deliver(const uint32_t *deliver_bitmask, - uint8_t delivery_mode, - uint8_t vector_num, uint8_t polarity, - uint8_t trigger_mode) +static int apic_bus_deliver(const uint32_t *deliver_bitmask, +uint8_t delivery_mode, uint8_t vector_num, +uint8_t polarity, uint8_t trigger_mode) { APICState *apic_iter; +int ret; switch (delivery_mode) { case APIC_DM_LOWPRI: @@ -261,11 +259,12 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, if (d = 0) { apic_iter = local_apics[d]; if (apic_iter) { -apic_set_irq(apic_iter, vector_num, trigger_mode); +return apic_set_irq(apic_iter, vector_num, +trigger_mode); } } } -return; +return QEMU_IRQ_MASKED; case APIC_DM_FIXED: break; @@ -273,34 +272,42 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, case APIC_DM_SMI: foreach_apic(apic_iter, 
deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_SMI) ); -return; +return QEMU_IRQ_DELIVERED; case APIC_DM_NMI: foreach_apic(apic_iter, deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_NMI) ); -return; +return QEMU_IRQ_DELIVERED; case APIC_DM_INIT: /* normal INIT IPI sent to processors */ foreach_apic(apic_iter, deliver_bitmask, cpu_interrupt(apic_iter-cpu_env, CPU_INTERRUPT_INIT) ); -return; +return QEMU_IRQ_DELIVERED; case APIC_DM_EXTINT: /* handled in I/O APIC code */ break; default: -return; +return QEMU_IRQ_MASKED; } +ret = QEMU_IRQ_MASKED; foreach_apic(apic_iter, deliver_bitmask, - apic_set_irq(apic_iter, vector_num, trigger_mode) ); +if (ret == QEMU_IRQ_MASKED) +ret = QEMU_IRQ_COALESCED; +if (apic_set_irq(apic_iter, vector_num, + trigger_mode) == QEMU_IRQ_DELIVERED) { +ret = QEMU_IRQ_DELIVERED; +} +); +return ret; } -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, - uint8_t delivery_mode, uint8_t vector_num, - uint8_t polarity, uint8_t trigger_mode) +int apic_deliver_irq(uint8_t dest, uint8_t dest_mode, + uint8_t delivery_mode, uint8_t vector_num, + uint8_t polarity, uint8_t trigger_mode) { uint32_t deliver_bitmask[MAX_APIC_WORDS]; @@ -308,8 +315,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, polarity %d trigger_mode %d\n, __func__, dest, dest_mode, delivery_mode, vector_num, polarity, trigger_mode); apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode); -apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, polarity, - trigger_mode); +return apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, +polarity, trigger_mode); } void cpu_set_apic_base(CPUState *env, uint64_t val) @@ -402,22 +409,10 @@
[Qemu-devel] Re: [PATCH 11/16] hpet/rtc: Rework RTC IRQ replacement by HPET
Blue Swirl wrote:
> On Sun, Jun 6, 2010 at 8:11 AM, Jan Kiszka jan.kis...@web.de wrote:
>> From: Jan Kiszka jan.kis...@siemens.com
>>
>> Allow to intercept the RTC IRQ for the HPET legacy mode. Then push
>> routing to IRQ8 completely into the HPET. This allows to turn
>> hpet_in_legacy_mode() into a private function. Furthermore, this stops
>> the RTC from clearing IRQ8 even if the HPET is in control.
>>
>> This patch comes with a side effect: The RTC timers will no longer be
>> stopped when there is no IRQ consumer, possibly causing a minor
>> performance degradation. But as the guest may want to redirect the RTC
>> to the SCI in that mode, it should normally disable unused IRQ sources
>> anyway.
>>
>> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>> ---
>>  hw/hpet.c        | 42 +++---
>>  hw/hpet_emul.h   |  4
>>  hw/mc146818rtc.c | 54 +++---
>>  hw/mc146818rtc.h |  4 +++-
>>  hw/mips_jazz.c   |  2 +-
>>  hw/mips_malta.c  |  2 +-
>>  hw/mips_r4k.c    |  2 +-
>>  hw/pc.c          | 14 --
>>  hw/ppc_prep.c    |  2 +-
>>  9 files changed, 65 insertions(+), 61 deletions(-)
>>
>> diff --git a/hw/hpet.c b/hw/hpet.c
>> index 041dd84..d26cad5 100644
>> --- a/hw/hpet.c
>> +++ b/hw/hpet.c
>> @@ -30,6 +30,7 @@
>>  #include "qemu-timer.h"
>>  #include "hpet_emul.h"
>>  #include "sysbus.h"
>> +#include "mc146818rtc.h"
>>
>>  //#define HPET_DEBUG
>>  #ifdef HPET_DEBUG
>> @@ -58,6 +59,7 @@ typedef struct HPETState {
>>      SysBusDevice busdev;
>>      uint64_t hpet_offset;
>>      qemu_irq irqs[HPET_NUM_IRQ_ROUTES];
>> +    uint8_t rtc_irq_level;
>>      HPETTimer timer[HPET_NUM_TIMERS];
>>
>>      /* Memory-mapped, software visible registers */
>> @@ -69,12 +71,9 @@ typedef struct HPETState {
>>
>>  static HPETState *hpet_statep;
>>
>> -uint32_t hpet_in_legacy_mode(void)
>> +static uint32_t hpet_in_legacy_mode(HPETState *s)
>>  {
>> -    if (!hpet_statep) {
>> -        return 0;
>> -    }
>> -    return hpet_statep->config & HPET_CFG_LEGACY;
>> +    return s->config & HPET_CFG_LEGACY;
>>  }
>>
>>  static uint32_t timer_int_route(struct HPETTimer *timer)
>> @@ -166,12 +165,12 @@ static void update_irq(struct HPETTimer *timer)
>>  {
>>      int route;
>>
>> -    if (timer->tn <= 1 && hpet_in_legacy_mode()) {
>> +    if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) {
>>          /* if LegacyReplacementRoute bit is set, HPET specification requires
>>           * timer0 be routed to IRQ0 in NON-APIC or IRQ2 in the I/O APIC,
>>           * timer1 be routed to IRQ8 in NON-APIC or IRQ8 in the I/O APIC.
>>           */
>> -        route = (timer->tn == 0) ? 0 : 8;
>> +        route = (timer->tn == 0) ? 0 : RTC_ISA_IRQ;
>>      } else {
>>          route = timer_int_route(timer);
>>      }
>> @@ -515,8 +514,10 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
>>              /* i8254 and RTC are disabled when HPET is in legacy mode */
>>              if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
>>                  hpet_pit_disable();
>> +                qemu_irq_lower(s->irqs[RTC_ISA_IRQ]);
>>              } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
>>                  hpet_pit_enable();
>> +                qemu_set_irq(s->irqs[RTC_ISA_IRQ], s->rtc_irq_level);
>>              }
>>              break;
>>          case HPET_CFG + 4:
>> @@ -607,6 +608,30 @@ static void hpet_reset(DeviceState *d)
>>      count = 1;
>>  }
>>
>> +static void hpet_rtc_delivery_cb(qemu_irq irq, void *opaque, int n, int level,
>> +                                 int result)
>> +{
>> +    qemu_irq orig_irq = opaque;
>> +
>> +    qemu_irq_fire_delivery_cb(orig_irq, level, result);
>> +}
>> +
>> +static void hpet_handle_rtc_irq(qemu_irq irq, void *opaque, int n, int level)
>> +{
>> +    HPETState *s = FROM_SYSBUS(HPETState, opaque);
>> +    IRQMsg msg = {
>> +        .delivery_cb = hpet_rtc_delivery_cb,
>> +        .delivery_opaque = irq,
>> +    };
>> +
>> +    s->rtc_irq_level = level;
>> +    if (hpet_in_legacy_mode(s)) {
>> +        qemu_irq_fire_delivery_cb(irq, level, QEMU_IRQ_MASKED);
>> +    } else {
>> +        qemu_set_irq_msg(s->irqs[RTC_ISA_IRQ], level, &msg);
>
> This is the problem with passing around stack allocated objects: after
> this function finishes, s->irqs[RTC_ISA_IRQ].msg is a dangling pointer
> to some stack space.

s->irqs[RTC_ISA_IRQ].msg is NULL once qemu_set_irq_msg has returned; msg
itself will not leak out of the qemu_irq subsystem.

Jan
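To make the lifetime argument above concrete, here is a minimal sketch of a "message attached only for the duration of the call" scheme. It is an illustration under assumptions, not the code from the series: struct sketch_msg, struct sketch_irq and all sketch_* functions are hypothetical stand-ins for IRQMsg, qemu_irq, qemu_set_irq_msg() and qemu_irq_fire_delivery_cb(), whose real definitions are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IRQ message discussed in the thread. */
struct sketch_msg {
    void (*delivery_cb)(void *opaque, int level, int result);
    void *delivery_opaque;
};

/* Hypothetical stand-in for an IRQ line object. */
struct sketch_irq {
    void (*handler)(struct sketch_irq *irq, int level);
    struct sketch_msg *msg;   /* non-NULL only while a delivery is running */
};

/* Called by a sink to report the outcome back to the message's sender. */
static void sketch_fire_delivery_cb(struct sketch_irq *irq, int level,
                                    int result)
{
    if (irq->msg != NULL && irq->msg->delivery_cb != NULL) {
        irq->msg->delivery_cb(irq->msg->delivery_opaque, level, result);
    }
}

/* Attach the message for the duration of the handler call only. */
static void sketch_set_irq_msg(struct sketch_irq *irq, int level,
                               struct sketch_msg *msg)
{
    assert(irq->handler != NULL);
    irq->msg = msg;
    irq->handler(irq, level);
    irq->msg = NULL;          /* caller's stack object is no longer referenced */
}

/* Example sink: always reports result code 1. */
static void sketch_sink(struct sketch_irq *irq, int level)
{
    sketch_fire_delivery_cb(irq, level, 1);
}

/* Example source callback: stores the reported result in an int. */
static void sketch_record_result(void *opaque, int level, int result)
{
    (void)level;
    *(int *)opaque = result;
}
```

With this shape a caller can keep the message on its own stack, as long as no sink stores the pointer beyond the handler call; a sink that wanted to report later (for example at EOI time) would need a heap-allocated or otherwise longer-lived message.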
[Qemu-devel] Re: [PATCH 00/16] HPET cleanups, fixes, enhancements
Blue Swirl wrote:
> On Sun, Jun 6, 2010 at 8:10 AM, Jan Kiszka jan.kis...@web.de wrote:
>> Second round, specifically addressing:
>> - IRQMsg framework to refactor existing de-coalescing code
>> - RTC IRQ output as GPIO pin (routed depending on HPET or -no-hpet)
>> - ISA reservation for RTC IRQ
>>
>> If discussion around IRQMsg and de-coalescing happens to continue, I
>> would suggest to merge patches 1..7 as they are likely uncontroversial
>> and also fix bugs.
>
> Otherwise everything looks fine to me, but 10 and 11 had minor
> problems. Nice work!

Thanks for the quick feedback!

> I'd suppose one possible cleanup could be to use the message payload
> in place of apic_deliver_irq()?

Haven't looked into such things yet, the series is already long enough. :)
I could also imagine that we may avoid exporting the APIC MSI functions
for HPET use and instead provide a single MSI qemu_irq object, pushing
the vector information into the message payload.

Jan
[Qemu-devel] Re: [PATCHv3 1/2] virtio: support layout with avail ring before idx
On Sat, Jun 05, 2010 at 01:40:26PM +0930, Rusty Russell wrote:
> On Fri, 4 Jun 2010 09:12:05 pm Michael S. Tsirkin wrote:
>> On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote:
>>> I'm uncomfortable with moving a field. We haven't done that before and
>>> I wonder what will break with old code.
>>
>> With e.g. my patch, we only do this conditionally when the bit is
>> negotiated.
>
> Of course, but see this change:
>
> commit ef688e151c00e5d529703be9a04fd506df8bc54e
> Author: Rusty Russell ru...@rustcorp.com.au
> Date:   Fri Jun 12 22:16:35 2009 -0600
>
>     virtio: meet virtio spec by finalizing features before using device
>
>     Virtio devices are supposed to negotiate features before they start
>     using the device, but the current code doesn't do this. This is
>     because the driver's probe() function invariably has to add buffers
>     to a virtqueue, or probe the disk (virtio_blk).
>
>     This currently doesn't matter since no existing backend is strict
>     about the feature negotiation. But it's possible to imagine a future
>     feature which completely changes how a device operates: in this
>     case, we'd need to acknowledge it before using the device.
>
>     Signed-off-by: Rusty Russell ru...@rustcorp.com.au
>
> Now, this isn't impossible to overcome: we know that if they use the
> ring before completing feature negotiation then they don't understand
> the new format. But we have to be aware of that on the qemu side.
> Are we?

I think we are ok. virtqueue_init, which sets the avail/used pointers, is
called when we write the base address. So we only need to be careful and
not change this feature bit after creating the rings.

>>> Should we instead just abandon the flags field and use last_used only?
>>> Or, more radically, put flags == last_used when the feature is on?
>>>
>>> Thoughts?
>>> Rusty.
>>
>> Hmm, e.g. with TX and virtio net, we almost never want interrupts,
>> whatever the index value.
>
> Good point. OK, I give in, I'll take your patch which moves the fields
> to the end. Is that your preference?

Yes, I think so. You mean PATCHv3 unchanged with 254 byte padding?

> Please be careful with the qemu side though...
>
> It's not inconceivable that I'll write that virtio cacheline simulator
> this (coming) week, too...
>
> Thanks.
> Rusty.
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: I'd like to also support EOI handling. When the guest clears the interrupt condtion, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case. If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message. struct IRQMsg { DeviceState *src; void (*delivery_cb)(IRQMsg *msg, int result); void (*eoi_cb)(IRQMsg *msg, int result); void *src_opaque; void *payload; }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. I dislike use of eoi for reinfecting missing interrupts since it eliminates use of internal PIC/APIC queue of not yet delivered interrupts. PIC and APIC has internal queue that can handle two elements: one is delivered, but not yet acked interrupt in isr and another is pending interrupt in irr. Using eoi callback (or ack notifier as it's called inside kernel) interrupt will be considered coalesced even if irr is cleared, but no ack was received for previously delivered interrupt. But ack notifiers actually has another use: device assignment. There is a plan to move device assignment from kernel to userspace and for that ack notifiers will have to be extended to userspace too. If so we can use them to do irq decoalescing as well. I doubt they should be part of IRQMsg though. Why not do what kernel does: have globally registered notifier based on irqchip/pin. 
>>> I read this twice but I still don't get your plan. Do you like or
>>> dislike using EOI for de-coalescing? And how should these notifiers
>>> work?
>>
>> That's because I confused myself :) I _dislike_ using them, but since
>> device assignment requires ack notifiers anyway, maybe it is better to
>> introduce one mechanism for device assignment + de-coalescing instead
>> of introducing two different mechanisms.
>>
>> Using ack notifiers should be easy: the RTC registers an ack notifier
>> and keeps track of delivered interrupts. If the timer triggers after
>> the previous irq was set, but before it was acked, the coalesced
>> counter is incremented. In the ack notifier callback the coalesced
>> counter is checked, and if it is not zero a new irq is set.
>
> Ack notifier registrations and event deliveries still need to be
> routed. Piggy-backing this on IRQ messages may be unavoidable for that
> reason.

It is done in the kernel without piggy-backing.

> Anyway, I'm going to post my HPET updates with the infrastructure for
> IRQMsg now. Maybe it's helpful to see the other option in reality.

One other thing to consider: the current approach does not always work.
Win2K3-64bit-smp and Win2k8-64bit-smp configure the RTC interrupt to be
broadcast to all cpus, but only the boot cpu does the time calculation.
With the current approach, if the interrupt is delivered to at least one
vcpu it will not be considered coalesced, but if the cpu it was delivered
to is not the cpu that does time accounting, then the clock will drift.

--
Gleb.
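The ack-notifier scheme described above ("RTC registers ack notifier and keeps track of delivered interrupts") can be sketched as a small state machine. This is only an illustration under assumptions: struct sketch_source and the sketch_* functions are hypothetical, there is no real notifier registration here, and raising the interrupt is modelled by a counter instead of a call into an interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical periodic interrupt source (think: the RTC). */
struct sketch_source {
    bool in_flight;      /* an IRQ was raised and has not been acked yet */
    unsigned coalesced;  /* ticks that fired while one was in flight */
    unsigned injected;   /* IRQs actually raised (stand-in for the real raise) */
};

/* Raise the interrupt line once. */
static void sketch_raise(struct sketch_source *s)
{
    s->in_flight = true;
    s->injected++;
}

/* Periodic timer callback of the source. */
static void sketch_tick(struct sketch_source *s)
{
    if (s->in_flight) {
        s->coalesced++;          /* previous IRQ not acked: remember the tick */
    } else {
        sketch_raise(s);
    }
}

/* Ack notifier: runs when the guest acknowledges the interrupt. */
static void sketch_ack(struct sketch_source *s)
{
    assert(s->in_flight);
    s->in_flight = false;
    if (s->coalesced > 0) {
        s->coalesced--;
        sketch_raise(s);         /* re-inject one missed tick */
    }
}
```

As noted in the discussion, a tick is counted as coalesced here whenever the previous interrupt is still unacked, even if the controller could have queued one more pending interrupt itself.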
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
Gleb Natapov wrote: On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: I'd like to also support EOI handling. When the guest clears the interrupt condtion, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case. If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message. struct IRQMsg { DeviceState *src; void (*delivery_cb)(IRQMsg *msg, int result); void (*eoi_cb)(IRQMsg *msg, int result); void *src_opaque; void *payload; }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. I dislike use of eoi for reinfecting missing interrupts since it eliminates use of internal PIC/APIC queue of not yet delivered interrupts. PIC and APIC has internal queue that can handle two elements: one is delivered, but not yet acked interrupt in isr and another is pending interrupt in irr. Using eoi callback (or ack notifier as it's called inside kernel) interrupt will be considered coalesced even if irr is cleared, but no ack was received for previously delivered interrupt. But ack notifiers actually has another use: device assignment. There is a plan to move device assignment from kernel to userspace and for that ack notifiers will have to be extended to userspace too. If so we can use them to do irq decoalescing as well. I doubt they should be part of IRQMsg though. Why not do what kernel does: have globally registered notifier based on irqchip/pin. 
>>>> I read this twice but I still don't get your plan. Do you like or
>>>> dislike using EOI for de-coalescing? And how should these notifiers
>>>> work?
>>>
>>> That's because I confused myself :) I _dislike_ using them, but since
>>> device assignment requires ack notifiers anyway, maybe it is better to
>>> introduce one mechanism for device assignment + de-coalescing instead
>>> of introducing two different mechanisms.
>>>
>>> Using ack notifiers should be easy: the RTC registers an ack notifier
>>> and keeps track of delivered interrupts. If the timer triggers after
>>> the previous irq was set, but before it was acked, the coalesced
>>> counter is incremented. In the ack notifier callback the coalesced
>>> counter is checked, and if it is not zero a new irq is set.
>>
>> Ack notifier registrations and event deliveries still need to be
>> routed. Piggy-backing this on IRQ messages may be unavoidable for that
>> reason.
>
> It is done in the kernel without piggy-backing.

As it does not include any IRQ routers in front of the interrupt
controller. Maybe it works for x86, but it is no generic solution.

Also, periodic timer sources get no information about the fact that their
interrupt is masked somewhere along the path to the VCPUs and will
possibly replay countless IRQs when the masking ends, no?

>> Anyway, I'm going to post my HPET updates with the infrastructure for
>> IRQMsg now. Maybe it's helpful to see the other option in reality.
>
> One other thing to consider: the current approach does not always work.
> Win2K3-64bit-smp and Win2k8-64bit-smp configure the RTC interrupt to be
> broadcast to all cpus, but only the boot cpu does the time calculation.
> With the current approach, if the interrupt is delivered to at least
> one vcpu it will not be considered coalesced, but if the cpu it was
> delivered to is not the cpu that does time accounting, then the clock
> will drift.

That means we would have to fire callbacks per receiving CPU and report
its number back. Is there a way to find out if we are running such a
guest without an '-enable-win2k[38]-64bit-smp-rtc-drift-fix'?

Jan
Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers with delivery feedback
On Sun, Jun 06, 2010 at 12:10:07PM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sun, Jun 06, 2010 at 10:07:48AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sun, Jun 06, 2010 at 09:39:04AM +0200, Jan Kiszka wrote: Gleb Natapov wrote: On Sat, Jun 05, 2010 at 02:04:01AM +0200, Jan Kiszka wrote: I'd like to also support EOI handling. When the guest clears the interrupt condtion, the EOI callback would be called. This could occur much later than the IRQ delivery time. I'm not sure if we need the result code in that case. If any intermediate device (IOAPIC?) needs to be informed about either delivery or EOI also, it could create a proxy message with its callbacks in place. But we need then a separate opaque field (in addition to payload) to store the original message. struct IRQMsg { DeviceState *src; void (*delivery_cb)(IRQMsg *msg, int result); void (*eoi_cb)(IRQMsg *msg, int result); void *src_opaque; void *payload; }; Extending the lifetime of IRQMsg objects beyond the delivery call stack means qemu_malloc/free for every delivery. I think it takes a _very_ appealing reason to justify this. But so far I do not see any use case for eio_cb at all. I dislike use of eoi for reinfecting missing interrupts since it eliminates use of internal PIC/APIC queue of not yet delivered interrupts. PIC and APIC has internal queue that can handle two elements: one is delivered, but not yet acked interrupt in isr and another is pending interrupt in irr. Using eoi callback (or ack notifier as it's called inside kernel) interrupt will be considered coalesced even if irr is cleared, but no ack was received for previously delivered interrupt. But ack notifiers actually has another use: device assignment. There is a plan to move device assignment from kernel to userspace and for that ack notifiers will have to be extended to userspace too. If so we can use them to do irq decoalescing as well. I doubt they should be part of IRQMsg though. 
>>>>>> Why not do what the kernel does: have a globally registered
>>>>>> notifier based on irqchip/pin.
>>>>>
>>>>> I read this twice but I still don't get your plan. Do you like or
>>>>> dislike using EOI for de-coalescing? And how should these notifiers
>>>>> work?
>>>>
>>>> That's because I confused myself :) I _dislike_ using them, but since
>>>> device assignment requires ack notifiers anyway, maybe it is better
>>>> to introduce one mechanism for device assignment + de-coalescing
>>>> instead of introducing two different mechanisms.
>>>>
>>>> Using ack notifiers should be easy: the RTC registers an ack notifier
>>>> and keeps track of delivered interrupts. If the timer triggers after
>>>> the previous irq was set, but before it was acked, the coalesced
>>>> counter is incremented. In the ack notifier callback the coalesced
>>>> counter is checked, and if it is not zero a new irq is set.
>>>
>>> Ack notifier registrations and event deliveries still need to be
>>> routed. Piggy-backing this on IRQ messages may be unavoidable for
>>> that reason.
>>
>> It is done in the kernel without piggy-backing.
>
> As it does not include any IRQ routers in front of the interrupt
> controller. Maybe it works for x86, but it is no generic solution.

x86 has an IRQ router in front of the interrupt controller, inside the
pci host bridge.

> Also, periodic timer sources get no information about the fact that
> their interrupt is masked somewhere along the path to the VCPUs and
> will possibly replay countless IRQs when the masking ends, no?

Correct, for that we have mask notifiers in the kernel. Gets ugly by the
minute.

>>> Anyway, I'm going to post my HPET updates with the infrastructure for
>>> IRQMsg now. Maybe it's helpful to see the other option in reality.
>>
>> One other thing to consider: the current approach does not always
>> work. Win2K3-64bit-smp and Win2k8-64bit-smp configure the RTC
>> interrupt to be broadcast to all cpus, but only the boot cpu does the
>> time calculation. With the current approach, if the interrupt is
>> delivered to at least one vcpu it will not be considered coalesced,
>> but if the cpu it was delivered to is not the cpu that does time
>> accounting, then the clock will drift.
>
> That means we would have to fire callbacks per receiving CPU and report
> its number back. Is there a way to find out if we are running such a
> guest without an '-enable-win2k[38]-64bit-smp-rtc-drift-fix'?

Not that I know of.

--
Gleb.
Re: [Qemu-devel] [PATCH 01/17] vl.c: Remove double include of netinet/in.h for Solaris
On 04.06.2010 at 18:08, jes.soren...@redhat.com wrote:
> From: Jes Sorensen jes.soren...@redhat.com
>
> vl.c: netinet/in.h is already included once above, in the generic POSIX
> section.
>
> Signed-off-by: Jes Sorensen jes.soren...@redhat.com

Acked-by: Andreas Faerber afaer...@opensolaris.org

> ---
>  vl.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index 417554f..7c4298a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -70,7 +70,6 @@
>  #include <sys/ethernet.h>
>  #include <sys/sockio.h>
>  #include <netinet/arp.h>
> -#include <netinet/in.h>
>  #include <netinet/in_systm.h>
>  #include <netinet/ip.h>
>  #include <netinet/ip_icmp.h> // must come after ip.h
> --
> 1.6.5.2
[Qemu-devel] just one more qemu problem
Hi. Well, just one more problem I need help with, then I'll not post these things to the list again. I upgraded qemu, and got sound in the console using the curses option, so Windows works. But when starting in a gnome terminal with qemu windows.ovl -soundhw all -localtime the Microsoft Windows screen comes up, and Windows tries to boot, but nothing happens. What have I missed now? /Kristoffer Kristoffer Gustafsson Trelleborgsvägen 1b 514 33 Tranemo tel: 0325-42093 mobile: 073-8226473 e-mail: k...@dreamwld.com or kristoffer_gustafs...@allmail.net
Re: [Qemu-devel] Re: [PATCH v2 2/2] vnc: threaded VNC server
On 06/05/2010 11:03 AM, Corentin Chary wrote: So it's disabled by default? Sounds like a pretty cool and useful feature to me that should be enabled by default. Because it does not work on Windows (qemu-thread.c only uses pthread) and because I don't want to break everything :) One option is to disable vnc on Windows and let a Windows maintainer materialize and add the corresponding support. Introducing more and more config options is not a good approach. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2 2/2] vnc: threaded VNC server
On 06/04/2010 04:20 PM, Corentin Chary wrote: +if (vnc_trylock_display(vd)) { +vd->timer_interval = VNC_REFRESH_INTERVAL_BASE; +qemu_mod_timer(vd->timer, qemu_get_clock(rt_clock) + + vd->timer_interval); +return; +} + has_dirty = vnc_refresh_server_surface(vd); +vnc_unlock_display(vd); This could delay the update by quite a bit, no? A more elaborate approach would be to enqueue the refresh job into the queue. May need the iothread enabled so we have qemu_mutex. btw, I could not find other uses of vd->mutex, shouldn't it protect against the work thread? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [PATCH v2 2/2] vnc: threaded VNC server
On Sun, Jun 6, 2010 at 3:54 PM, Avi Kivity a...@redhat.com wrote: On 06/05/2010 11:03 AM, Corentin Chary wrote: So it's disabled by default? Sounds like a pretty cool and useful feature to me that should be enabled by default. Because it does not work on Windows (qemu-thread.c only uses pthread) and because I don't want to break everything :) One option is to disable vnc on Windows and let a Windows maintainer materialize and add the corresponding support. Introducing more and more config options is not a good approach. I think keeping the non-threaded code is a good thing (and there is not much code). There are probably cases where you want to avoid threads. -- Corentin Chary http://xf.iksaif.net
Re: [Qemu-devel] [PATCH v2 2/2] vnc: threaded VNC server
On Sun, Jun 6, 2010 at 4:11 PM, Avi Kivity a...@redhat.com wrote: On 06/04/2010 04:20 PM, Corentin Chary wrote: + if (vnc_trylock_display(vd)) { + vd->timer_interval = VNC_REFRESH_INTERVAL_BASE; + qemu_mod_timer(vd->timer, qemu_get_clock(rt_clock) + + vd->timer_interval); + return; + } + has_dirty = vnc_refresh_server_surface(vd); + vnc_unlock_display(vd); This could delay the update by quite a bit, no? Yep, but it's far better than waiting for the lock because it doesn't slow down the main thread. I played the Big Buck Bunny trailer (33 sec) in mplayer with tight encoding: - ~40 sec with the non-threaded server - ~37 sec with a lock - ~33 sec with a try_lock A more elaborate approach would be to enqueue the refresh job into the queue. May need the iothread enabled so we have qemu_mutex. Maybe, but I'd like to wait for the generic async work subsystem before adding different kinds of jobs to the queue. And it's already a big improvement over the current code :). btw, I could not find other uses of vd->mutex, shouldn't it protect against the work thread? Check vnc-jobs.c, there is a qemu_mutex_lock(vs->vd->mutex); -- Corentin Chary http://xf.iksaif.net
Re: [Qemu-devel] [PATCH v2 2/2] vnc: threaded VNC server
On 06/06/2010 05:48 PM, Corentin Chary wrote: On Sun, Jun 6, 2010 at 4:11 PM, Avi Kivity a...@redhat.com wrote: On 06/04/2010 04:20 PM, Corentin Chary wrote: +if (vnc_trylock_display(vd)) { +vd->timer_interval = VNC_REFRESH_INTERVAL_BASE; +qemu_mod_timer(vd->timer, qemu_get_clock(rt_clock) + + vd->timer_interval); +return; +} + has_dirty = vnc_refresh_server_surface(vd); +vnc_unlock_display(vd); This could delay the update by quite a bit, no? Yep, but it's far better than waiting for the lock because it doesn't slow down the main thread. I played the Big Buck Bunny trailer (33 sec) in mplayer with tight encoding: - ~40 sec with the non-threaded server - ~37 sec with a lock - ~33 sec with a try_lock Definitely, blocking the main thread is a no-no. A more elaborate approach would be to enqueue the refresh job into the queue. May need the iothread enabled so we have qemu_mutex. Maybe, but I'd like to wait for the generic async work subsystem before adding different kinds of jobs to the queue. And it's already a big improvement over the current code :). Hm, ok. btw, I could not find other uses of vd->mutex, shouldn't it protect against the work thread? Check vnc-jobs.c, there is a qemu_mutex_lock(vs->vd->mutex); Shouldn't it use vnc_lock_display()? That's why I missed it. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2 2/2] vnc: threaded VNC server
On Sun, Jun 6, 2010 at 4:53 PM, Avi Kivity a...@redhat.com wrote: On 06/06/2010 05:48 PM, Corentin Chary wrote: On Sun, Jun 6, 2010 at 4:11 PM, Avi Kivity a...@redhat.com wrote: On 06/04/2010 04:20 PM, Corentin Chary wrote: + if (vnc_trylock_display(vd)) { + vd->timer_interval = VNC_REFRESH_INTERVAL_BASE; + qemu_mod_timer(vd->timer, qemu_get_clock(rt_clock) + + vd->timer_interval); + return; + } + has_dirty = vnc_refresh_server_surface(vd); + vnc_unlock_display(vd); This could delay the update by quite a bit, no? Yep, but it's far better than waiting for the lock because it doesn't slow down the main thread. I played the Big Buck Bunny trailer (33 sec) in mplayer with tight encoding: - ~40 sec with the non-threaded server - ~37 sec with a lock - ~33 sec with a try_lock Definitely, blocking the main thread is a no-no. A more elaborate approach would be to enqueue the refresh job into the queue. May need the iothread enabled so we have qemu_mutex. Maybe, but I'd like to wait for the generic async work subsystem before adding different kinds of jobs to the queue. And it's already a big improvement over the current code :). Hm, ok. btw, I could not find other uses of vd->mutex, shouldn't it protect against the work thread? Check vnc-jobs.c, there is a qemu_mutex_lock(vs->vd->mutex); Shouldn't it use vnc_lock_display()? That's why I missed it. I didn't use vnc_lock_display because I didn't want to export it first. Maybe I should also use vnc_lock_output() in vnc-jobs.c ... -- Corentin Chary http://xf.iksaif.net
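The try-lock-or-reschedule pattern discussed in this thread can be sketched in isolation as below. This is a simplified illustration under several assumptions: `Display`, `display_trylock` and `refresh_timer_cb` are hypothetical names, a C11 `atomic_flag` stands in for the real mutex, and the 30 ms base interval is an assumed value. Note also that this `display_trylock` returns true on success, whereas the `vnc_trylock_display()` call in the quoted patch appears to return nonzero on failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define REFRESH_INTERVAL_BASE_MS 30 /* assumed value, for illustration only */

/* Hypothetical stand-in for the display state shared with a worker thread. */
typedef struct Display {
    atomic_flag busy;      /* set while somebody owns the display */
    int timer_interval_ms; /* delay until the next refresh attempt */
    int refreshes;         /* number of refreshes actually performed */
} Display;

static void display_init(Display *d)
{
    assert(d != NULL);
    atomic_flag_clear(&d->busy);
    d->timer_interval_ms = 0;
    d->refreshes = 0;
}

/* Returns true if the lock was acquired, false if it is already held. */
static bool display_trylock(Display *d)
{
    return !atomic_flag_test_and_set(&d->busy);
}

static void display_unlock(Display *d)
{
    atomic_flag_clear(&d->busy);
}

/*
 * Timer callback run from the main loop. It never blocks: if the worker
 * holds the display, the refresh is retried after the base interval.
 * Returns true if a refresh happened, false if it was postponed.
 */
static bool refresh_timer_cb(Display *d)
{
    if (!display_trylock(d)) {
        d->timer_interval_ms = REFRESH_INTERVAL_BASE_MS;
        return false;
    }
    d->refreshes++; /* placeholder for the real surface refresh */
    display_unlock(d);
    return true;
}
```

The trade-off is the one raised in the review: a postponed refresh delays the update by at least one interval, but the main thread never waits on the worker.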
Re: [Qemu-devel] [PATCH v6 5/6] Inter-VM shared memory PCI device
On 06/05/2010 12:44 PM, Blue Swirl wrote: On Fri, Jun 4, 2010 at 9:45 PM, Cam Macdonellc...@cs.ualberta.ca wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id (shared memory server is qemu.git/contrib/ivshmem-server) Sample programs and init scripts are in a git repo here: Why is this KVM specific BTW, Posix SHM is available on many platforms? What would happen if kvm_set_foobar functions were not called when KVM is not being used? Is host eventfd support essential? It's not kvm specific, it's atomic-ops-on-shared-memory-are-visible-as-atomic-ops specific, which is currently only available with kvm. When tcg gains true smp support (and not just against other tcg threads) this can work with tcg as well. I guess that needs a host with at least 32/64 bit CAS for 32/64 bit targets respectively, and double that if the target has DCAS. Not sure how targets with ll/sc can be implemented, especially if there are limits as to what can go in between. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [RFC] QMP: Introduce query-netdevices documentation
On 06/04/2010 05:06 PM, Miguel Di Ciurcio Filho wrote: This introduces the protocol specification for querying information about network devices available on a VM and a new monitor command that show the same information. Signed-off-by: Miguel Di Ciurcio Filhomiguel.fi...@gmail.com --- qemu-monitor.hx | 69 +++ 1 files changed, 69 insertions(+), 0 deletions(-) diff --git a/qemu-monitor.hx b/qemu-monitor.hx index f6a94f2..8600129 100644 --- a/qemu-monitor.hx +++ b/qemu-monitor.hx @@ -1674,6 +1674,75 @@ show the various VLANs and the associated devices ETEXI STEXI +...@item info netdevices +show information about network devices +ETEXI +SQMP +query-netdevices + + +Each device is represented by a json-object. The returned value is a json-array +of all devices. + +Each json-object contain the following: + +- device: device name (json-string) +- vlan: only present if the device is attached to a VLAN (json-int) +- info: json-object containing the following: + - model: type of the device (json-string) + - Possible values: tap, socket, xen, slirp, dump, + vde, ne2k_pci, i82551, i82557b, + i82559er, rtl8139, e1000, pcnet, + virtio, dp83932, lan9118, mcf_fec, + xilinx-ethlite, lance, stellaris, + smc91c111, ne2k_isa, mv88w8618, + mipsnet, fseth, dp83932, usb This casts the vlan model into concrete. I thought we wanted to move away from it? Instead have separate entries for host and guest devices. -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: sun framebuffer selection (was option-rom)
2010/6/6 Blue Swirl blauwir...@gmail.com: On Sat, Jun 5, 2010 at 11:10 PM, Bob Breuer breu...@mc.net wrote: Blue Swirl wrote: but again: should we have a new machine with cg14 or some switch to select TCX vs. cg14? Why not just probe for both devices? OpenBIOS has the intention to run one day on a real hardware, doesn't it? Maybe the recently proposed machine subtype patches could help here. How is the graphic card different from cpu or a disk drive? Well, let's try to figure out a method of selecting the framebuffer type. I'll try to list some of the options, even if they might be ridiculous. 1) Use the -vga option. I know TCX and cg14 are not vga, but I think it's the closest existing command line option available. 2) Switch based on the -g WxH option. At the moment, the TCX emulation doesn't really handle anything other than 1024x768, so switch to cg14 for other resolutions if supported. 3) Use some other existing command line option, -device, -set or -global? Might work, but the syntax may not be easy to remember. We don't have an equivalent of -chardev, -netdev and -drive for displays. I guess only cause the other emulated platforms don't have that much of choice (yet). Why not use just the generic -device option? 4) Machine subtype. 5) New command line option. Anything above might be better. 6) New machine type. Is it a big enough feature to demand it's own machine type? Maybe, but see next option. 7) Select as default video for SS-20. The SS-10 and SS-600MP are already very similar. This would allow for some differentiation between the machines, but there could still be an option to switch back to TCX. Note that TCX was really only available for the SS-4 and SS-5. They are similar in qemu. But it's rather a bug than a feature. The real SS-600 is much more complex VME-bus machine. Is there anything else that I missed? Combined 7 6: make cg14 default for SS-20, add a deprecated compatibility machine for SS-20 with TCX. 
I'm going to go ahead with option 2 in the short term. I'm inclined to narrow it down to options 1, 4, and 7. I know that 7 would have backwards compatibility concerns. The cg14 seems to have at least the same capabilities as TCX so there shouldn't be any loss of functionality. Even though SS-20 is not the default machine, do you know of any OS that works with the sun4m implementation today but doesn't have a cg14 driver? Possible downside to cg14 for video is that any acceleration is handled by the SX pixel processor which has no available documentation. TCX also has some amount of unimplemented acceleration. It would be nice to use some basic device with well defined acceleration or just a frame buffer as default. AFAIK the open source OSes don't use the cg14 acceleration anyway. So we'll only have potential problems with Solaris and NeXTStep here. -- Regards, Artyom Tarasenko solaris/sparc under qemu blog: http://tyom.blogspot.com/
[Qemu-devel] [PATCH] virtio-net: truncating packet
virtio net attempts to peek into the virtio queue to determine that we have enough space for the complete packet to fit. However, it fails to account for space consumed by the virtio net header when it does this. Under stress this results in a failure with a message 'truncating packet'. redhat bz 591494.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-net.c | 15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6a9d560..bf67e73 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -532,16 +532,17 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
     if (!virtio_net_can_receive(&n->nic->nc))
         return -1;

-    if (!virtio_net_has_buffers(n, size))
+    /* hdr_len refers to the header we supply to the guest */
+    hdr_len = n->mergeable_rx_bufs ?
+        sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
+
+
+    if (!virtio_net_has_buffers(n, size + hdr_len))
         return 0;

     if (!receive_filter(n, buf, size))
         return size;

-    /* hdr_len refers to the header we supply to the guest */
-    hdr_len = n->mergeable_rx_bufs ?
-        sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr);
-
     offset = i = 0;

     while (offset < size) {
@@ -555,7 +556,9 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_
             virtqueue_pop(n->rx_vq, &elem) == 0) {
             if (i == 0)
                 return -1;
-            fprintf(stderr, "virtio-net truncating packet\n");
+            fprintf(stderr, "virtio-net truncating packet: "
+                    "offset %zd, size %zd, hdr %zd\n",
+                    offset, size, hdr_len);
             exit(1);
         }

--
1.7.1.12.g42b7f
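The accounting error this patch addresses is easy to see in a stripped-down form: the space check has to cover the payload plus the header that is prepended for the guest. The sketch below is only an illustration; `rx_hdr_len` and `rx_fits` are hypothetical helpers, and the two header sizes are assumed values rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed header sizes, for illustration only. */
#define HDR_LEN_PLAIN 10 /* header without mergeable rx buffers */
#define HDR_LEN_MRG   12 /* header with mergeable rx buffers */

/* Length of the header supplied to the guest in front of each packet. */
static size_t rx_hdr_len(bool mergeable_rx_bufs)
{
    return mergeable_rx_bufs ? HDR_LEN_MRG : HDR_LEN_PLAIN;
}

/*
 * True if `avail` bytes of guest buffer space can hold the packet.
 * Checking `avail >= payload` alone is the bug: it ignores the header.
 */
static bool rx_fits(size_t avail, size_t payload, bool mergeable_rx_bufs)
{
    assert(payload > 0);
    return avail >= payload + rx_hdr_len(mergeable_rx_bufs);
}
```

With exactly `payload` bytes available the old check would pass and the copy would later run out of room, which matches the 'truncating packet' failure described above.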
[Qemu-devel] Re: [RFC PATCH v4 3/3] block: add sheepdog driver for distributed storage support
At Fri, 04 Jun 2010 13:04:00 +0200, Kevin Wolf wrote: On 03.06.2010 18:23, MORITA Kazutaka wrote: +static void sd_aio_cancel(BlockDriverAIOCB *blockacb) +{ + SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb; + + acb->canceled = 1; +} Does this provide the right semantics? You haven't really cancelled the request, but you pretend to. So you actually complete the request in the background and then throw the return code away. I seem to remember that posix-aio-compat.c waits at this point for completion of the requests, calls the callbacks and only afterwards returns from aio_cancel when no more requests are in flight. Or if you can really cancel requests, it would be the best option, of course. Sheepdog cannot cancel the requests which are already sent to the servers. So, as you say, we pretend to cancel the requests without waiting for completion of them. However, is there any situation where pretending to cancel causes problems in practice? I'm not sure how often it would happen in practice, but if the guest OS thinks the old value is on disk when in fact the new one is, this could lead to corruption. I think if it can happen, even without evidence that it actually does, it's already relevant enough. I agree. To wait for completion of the requests here, we may need to create another thread for processing I/O like posix-aio-compat.c. I don't think you need a thread to get the same behaviour, you just need to call the fd handlers like in the main loop. It would probably be the first driver doing this, though, and it's not an often used code path, so it might be a bad idea. Maybe it's reasonable to just complete the request with -EIO? This way the guest couldn't make any assumption about the data written. On the other hand, it could be unhappy about failed requests, but that's probably better than corruption. Completing with -EIO looks good to me. Thanks for the advice. I'll send an updated patch tomorrow. Regards, Kazutaka
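The conclusion of this exchange (complete a cancelled request with -EIO and ignore the late reply from the server) might look roughly like the sketch below. It is an illustration of the idea only, not the sheepdog driver or the QEMU block layer API: `Request`, `request_cancel`, `request_finish` and `record_result` are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void CompletionFn(void *opaque, int ret);

/* Hypothetical in-flight request that cannot be recalled from the server. */
typedef struct Request {
    CompletionFn *cb; /* completion callback of the caller */
    void *opaque;     /* caller context handed back to cb */
    bool canceled;    /* set once the caller has been told -EIO */
} Request;

/* Example callback: store the result where the caller can inspect it. */
static void record_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}

/*
 * Cancel: the request is already on the wire, so report -EIO right away.
 * The caller can then make no assumption about what reached the disk.
 */
static void request_cancel(Request *r)
{
    assert(r != NULL);
    if (!r->canceled) {
        r->canceled = true;
        r->cb(r->opaque, -EIO);
    }
}

/* Called when the server reply finally arrives. */
static void request_finish(Request *r, int ret)
{
    if (r->canceled) {
        return; /* already completed with -EIO; drop the late result */
    }
    r->cb(r->opaque, ret);
}
```

The callback runs exactly once per request in either order of events, which is the property the "pretend to cancel" version lacked from the guest's point of view.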
[Qemu-devel] [Bug 590456] [NEW] qemu forum (http://qemu-forum.ipi.fi/) not available
Public bug reported: receive an error message since this week: --- General Error SQL ERROR [ mysql4 ] Host 'www-hostnet' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts' [1129] An sql error occurred while fetching this page. Please contact an administrator if this problem persists. --- It's not a bug of qemu, but on project's related website. ** Affects: qemu Importance: Undecided Status: New -- qemu forum (http://qemu-forum.ipi.fi/) not available https://bugs.launchpad.net/bugs/590456 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: receive an error message since this week: --- General Error SQL ERROR [ mysql4 ] Host 'www-hostnet' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts' [1129] An sql error occurred while fetching this page. Please contact an administrator if this problem persists. --- It's not a bug of qemu, but on project's related website.
[Qemu-devel] Re: [PATCH 6/6] apic: avoid using CPUState internals
On 06/05/2010 11:31 PM, Blue Swirl wrote: Use only an opaque CPUState pointer and move the actual CPUState contents handling to cpu.h and cpuid.c. Set env->halted in pc.c and add a function to get the local APIC state of the current CPU for the MMIO. Signed-off-by: Blue Swirl blauwir...@gmail.com --- hw/apic.c | 40 +++- hw/apic.h |9 - hw/pc.c | 12 +++- target-i386/cpu.h | 27 --- target-i386/cpuid.c |6 ++ 5 files changed, 56 insertions(+), 38 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 91c8d93..332c66e 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -95,7 +95,7 @@ #define MSI_ADDR_SIZE 0x10 struct APICState { -CPUState *cpu_env; +void *cpu_env; I proposed having an opaque CPUState type in hw/ but it was rejected. But I don't think using a void pointer is any better. Paolo
[Qemu-devel] Moving a memory mapping
Hi all, I create a mmio memory mapping with cpu_register_io_memory followed by cpu_register_physical_memory. The host then twiddles a control register that changes the mapping address. How can I move the mapping to the new address? I could not find a cpu_unregister_physical_memory function or equivalent. Best, OG.
Re: [Qemu-devel] qemu:virtio-9p: [RFC] [PATCH 01/02] Send iounit to client for read/write operations
Sripathi Kodi wrote: On Tue, 1 Jun 2010 19:47:14 +0530 M. Mohan Kumar mo...@in.ibm.com wrote: Compute iounit based on the host filesystem block size and pass it to client with open/create response. Also return iounit as statfs's f_bsize for optimal block size transfers. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- hw/virtio-9p.c | 56 ++-- hw/virtio-9p.h |3 +++ 2 files changed, 45 insertions(+), 14 deletions(-) diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c index f087122..4357f1f 100644 --- a/hw/virtio-9p.c +++ b/hw/virtio-9p.c @@ -1,4 +1,4 @@ -/* +/* * Virtio 9p backend * * Copyright IBM, Corp. 2010 @@ -269,6 +269,11 @@ static int v9fs_do_fsync(V9fsState *s, int fd) return s-ops-fsync(s-ctx, fd); } +static int v9fs_do_statfs(V9fsState *s, V9fsString *path, struct statfs *stbuf) +{ +return s-ops-statfs(s-ctx, path-data, stbuf); +} + static void v9fs_string_init(V9fsString *str) { str-data = NULL; @@ -1035,11 +1040,10 @@ static void v9fs_fix_path(V9fsString *dst, V9fsString *src, int len) static void v9fs_version(V9fsState *s, V9fsPDU *pdu) { -int32_t msize; V9fsString version; size_t offset = 7; -pdu_unmarshal(pdu, offset, ds, msize, version); +pdu_unmarshal(pdu, offset, ds, s-msize, version); if (!strcmp(version.data, 9P2000.u)) { s-proto_version = V9FS_PROTO_2000U; @@ -1049,7 +1053,7 @@ static void v9fs_version(V9fsState *s, V9fsPDU *pdu) v9fs_string_sprintf(version, unknown); } -offset += pdu_marshal(pdu, offset, ds, msize, version); +offset += pdu_marshal(pdu, offset, ds, s-msize, version); complete_pdu(s, pdu, offset); v9fs_string_free(version); @@ -1304,6 +1308,20 @@ out: v9fs_walk_complete(s, vs, err); } +static int32_t get_iounit(V9fsState *s, V9fsString *name) +{ +struct statfs stbuf; +int32_t iounit = 0; + + +if (!v9fs_do_statfs(s, name, stbuf)) { +iounit = stbuf.f_bsize; +iounit *= (s-msize - P9_IOHDRSZ)/stbuf.f_bsize; If (s-msize - P9_IOHDRSZ) is less than stbuf.f_bsize iounit becomes zero. See below. 
+} + +return iounit; +} + static void v9fs_open_post_opendir(V9fsState *s, V9fsOpenState *vs, int err) { if (vs-fidp-dir == NULL) { @@ -1321,12 +1339,15 @@ out: static void v9fs_open_post_open(V9fsState *s, V9fsOpenState *vs, int err) { +int32_t iounit; + if (vs-fidp-fd == -1) { err = -errno; goto out; } -vs-offset += pdu_marshal(vs-pdu, vs-offset, Qd, vs-qid, 0); +iounit = get_iounit(s, vs-fidp-path); +vs-offset += pdu_marshal(vs-pdu, vs-offset, Qd, vs-qid, iounit); err = vs-offset; out: complete_pdu(s, vs-pdu, err); @@ -1800,11 +1821,16 @@ out: static void v9fs_post_create(V9fsState *s, V9fsCreateState *vs, int err) { +int32_t iounit; + +iounit = get_iounit(s, vs-fidp-path); + if (err == 0) { v9fs_string_copy(vs-fidp-path, vs-fullname); stat_to_qid(vs-stbuf, vs-qid); -vs-offset += pdu_marshal(vs-pdu, vs-offset, Qd, vs-qid, 0); +vs-offset += pdu_marshal(vs-pdu, vs-offset, Qd, vs-qid, +iounit); err = vs-offset; } @@ -2295,23 +2321,25 @@ out: qemu_free(vs); } -static int v9fs_do_statfs(V9fsState *s, V9fsString *path, struct statfs *stbuf) -{ -return s-ops-statfs(s-ctx, path-data, stbuf); -} - static void v9fs_statfs_post_statfs(V9fsState *s, V9fsStatfsState *vs, int err) { +int32_t bsize_factor; + if (err) { err = -errno; goto out; } +bsize_factor = (s-msize - P9_IOHDRSZ)/vs-stbuf.f_bsize; +if (!bsize_factor) { +bsize_factor = 1; +} Again, if (s-msize - P9_IOHDRSZ) is less than stbuf.f_bsize bsize_factor becomes zero. The following divisions become divide by zero! Yes, I think we should leave iounit alone return it with open/create and handle it as per the 9P protocol..and return whatever the fileserver gives for stat and satatfs. - JV Thanks, Sripathi. 
vs-v9statfs.f_type = vs-stbuf.f_type; vs-v9statfs.f_bsize = vs-stbuf.f_bsize; -vs-v9statfs.f_blocks = vs-stbuf.f_blocks; -vs-v9statfs.f_bfree = vs-stbuf.f_bfree; -vs-v9statfs.f_bavail = vs-stbuf.f_bavail; +vs-v9statfs.f_bsize *= bsize_factor; +vs-v9statfs.f_blocks = vs-stbuf.f_blocks/bsize_factor; +vs-v9statfs.f_bfree = vs-stbuf.f_bfree/bsize_factor; +vs-v9statfs.f_bavail = vs-stbuf.f_bavail/bsize_factor; vs-v9statfs.f_files = vs-stbuf.f_files; vs-v9statfs.f_ffree = vs-stbuf.f_ffree; vs-v9statfs.fsid_val = (unsigned int) vs-stbuf.f_fsid.__val[0] | diff --git a/hw/virtio-9p.h b/hw/virtio-9p.h index 6b3d4a4..9264163 100644 --- a/hw/virtio-9p.h +++ b/hw/virtio-9p.h @@ -72,6 +72,8 @@ enum p9_proto_version {
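The review comments in this thread turn on integer division: when the payload room (msize minus the I/O header) is smaller than the filesystem block size, the quotient is zero. A minimal sketch of the two computations with that case handled explicitly is given below. The helper names are hypothetical, `IO_HDR_SIZE` is an assumed stand-in for the patch's `P9_IOHDRSZ`, and treating an iounit of 0 as "no hint" is my reading of the 9P convention rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define IO_HDR_SIZE 24 /* assumed value standing in for P9_IOHDRSZ */

/*
 * Largest multiple of the block size that fits in one message payload.
 * Returns 0 (no hint) when not even one block fits.
 */
static int32_t compute_iounit(int32_t msize, int32_t bsize)
{
    int32_t payload = msize - IO_HDR_SIZE;

    assert(bsize > 0);
    if (payload < bsize) {
        return 0;
    }
    return bsize * (payload / bsize);
}

/*
 * Scaling factor for statfs block counts, clamped to 1 so that the
 * later divisions by the factor can never divide by zero.
 */
static int32_t compute_bsize_factor(int32_t msize, int32_t bsize)
{
    int32_t factor;

    assert(bsize > 0);
    factor = (msize - IO_HDR_SIZE) / bsize;
    return factor > 0 ? factor : 1;
}
```

For example, an msize of 4096 with 4096-byte blocks leaves less than one block of payload, so the iounit is 0 and the statfs factor is clamped to 1.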
[Qemu-devel] [Bug 590552] [NEW] New default network card doesn't work with tap networking
Public bug reported: Unfortunately, I can provide very little information. Hope this will be useful anyway. I've upgraded qemu using debian apt to latest unstable (QEMU PC emulator version 0.12.4 (Debian 0.12.4+dfsg-2), Copyright (c) 2003-2008 Fabrice Bellard): looks like at some point the default network card for -net nic option was switched to intel gigabit instead of the good old ne2k_pci. I was using -net tap -net nic options and my network stopped working. When not working, - tcpdump on the host shows me that all packets are sent and received fine from guest - tcpdump on guest shows that packets from host are NOT received. Obviously, both host tap interface and guest eth0 interfaces, routing tables, dns, firewall, etc... are well configured. Having banged my head for a while, I finally stopped the host and started it again using -net nic,model=ne2k_pci option, then my network magically started working again. ** Affects: qemu Importance: Undecided Status: New -- New default network card doesn't work with tap networking https://bugs.launchpad.net/bugs/590552 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: Unfortunately, I can provide very little information. Hope this will be useful anyway. I've upgraded qemu using debian apt to latest unstable (QEMU PC emulator version 0.12.4 (Debian 0.12.4+dfsg-2), Copyright (c) 2003-2008 Fabrice Bellard): looks like at some point the default network card for -net nic option was switched to intel gigabit instead of the good old ne2k_pci. I was using -net tap -net nic options and my network stopped working. When not working, - tcpdump on the host shows me that all packets are sent and received fine from guest - tcpdump on guest shows that packets from host are NOT received. Obviously, both host tap interface and guest eth0 interfaces, routing tables, dns, firewall, etc... are well configured.
Having banged my head for a while, I finally stopped the host and started it again using -net nic,model=ne2k_pci option, then my network magically started working again.
[Qemu-devel] [Bug 590456] Re: qemu forum (http://qemu-forum.ipi.fi/) not available
The QEMU forum is in no way officially associated with QEMU. We have no access to the server and no ability to fix it. I don't even know who runs it. ** Changed in: qemu Status: New = Invalid -- qemu forum (http://qemu-forum.ipi.fi/) not available https://bugs.launchpad.net/bugs/590456 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: Invalid Bug description: receive an error message since this week: --- General Error SQL ERROR [ mysql4 ] Host 'www-hostnet' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts' [1129] An sql error occurred while fetching this page. Please contact an administrator if this problem persists. --- It's not a bug of qemu, but on project's related website.
[Qemu-devel] [PATCH v5 RESEND 2/4] Introduce cpu_physical_memory_get_dirty_range().
It checks the first row and puts dirty addr in the array. If the first row is empty, it skips to the first non-dirty row or the end addr, and put the length in the first entry of the array. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- cpu-all.h |4 +++ exec.c| 67 + 2 files changed, 71 insertions(+), 0 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index 77ef939..6ac8fc2 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -1019,6 +1019,10 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start, } } +int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, +ram_addr_t *dirty_rams, int length, +int dirty_flags); + void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end, int dirty_flags); void cpu_tlb_update_dirty(CPUState *env); diff --git a/exec.c b/exec.c index 21299be..5fabb24 100644 --- a/exec.c +++ b/exec.c @@ -2023,6 +2023,73 @@ static inline void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, } } +/* It checks the first row and puts dirty addrs in the array. + If the first row is empty, it skips to the first non-dirty row + or the end addr, and put the length in the first entry of the array. 
*/
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end,
+                                        ram_addr_t *dirty_rams, int length,
+                                        int dirty_flag)
+{
+    unsigned long p = 0, page_number;
+    ram_addr_t addr;
+    ram_addr_t s_idx = (start >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    ram_addr_t e_idx = (end >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int i, j, offset, dirty_idx = dirty_flag_to_idx(dirty_flag);
+
+    /* mask bits before the start addr */
+    offset = (start >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+    cpu_physical_memory_sync_master(s_idx);
+    p |= phys_ram_dirty[dirty_idx][s_idx] & ~((1UL << offset) - 1);
+
+    if (s_idx == e_idx) {
+        /* mask bits after the end addr */
+        offset = (end >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+        p &= (1UL << offset) - 1;
+    }
+
+    if (p == 0) {
+        /* when the row is empty */
+        ram_addr_t skip;
+        if (s_idx == e_idx) {
+            skip = end;
+        } else {
+            /* skip empty rows */
+            while (s_idx < e_idx) {
+                s_idx++;
+                cpu_physical_memory_sync_master(s_idx);
+
+                if (phys_ram_dirty[dirty_idx][s_idx] != 0) {
+                    break;
+                }
+            }
+            skip = (s_idx * HOST_LONG_BITS * TARGET_PAGE_SIZE);
+        }
+        dirty_rams[0] = skip - start;
+        i = 0;
+
+    } else if (p == ~0UL) {
+        /* when the row is fully dirtied */
+        addr = start;
+        for (i = 0; i < length; i++) {
+            dirty_rams[i] = addr;
+            addr += TARGET_PAGE_SIZE;
+        }
+    } else {
+        /* when the row is partially dirtied */
+        i = 0;
+        do {
+            j = ffsl(p) - 1;
+            p &= ~(1UL << j);
+            page_number = s_idx * HOST_LONG_BITS + j;
+            addr = page_number * TARGET_PAGE_SIZE;
+            dirty_rams[i] = addr;
+            i++;
+        } while (p != 0 && i < length);
+    }
+
+    return i;
+}
+
 /* Note: start and end must be within the same ram block. */
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
                                      int dirty_flags)
--
1.7.0.31.g1df487
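The "partially dirtied row" case in this patch walks the set bits of one bitmap word and turns each into a page number. A standalone sketch of just that step follows. It is an illustration under assumptions: `lowest_set_bit` and `collect_dirty_pages` are hypothetical helpers, a portable shift loop is swapped in for the `ffsl()` call the patch uses, and pages are reported as page numbers rather than ram addresses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; the word must be nonzero. */
static unsigned lowest_set_bit(unsigned long word)
{
    unsigned j = 0;

    assert(word != 0);
    while (!(word & 1UL)) {
        word >>= 1;
        j++;
    }
    return j;
}

/*
 * Collect the page numbers of the dirty bits in one bitmap row.
 * `row_idx` is the index of the row in the bitmap, so bit j of the row
 * describes page row_idx * LONG_BITS + j. At most `max` entries are
 * written to `pages`; the number written is returned.
 */
static size_t collect_dirty_pages(unsigned long row, size_t row_idx,
                                  size_t *pages, size_t max)
{
    size_t n = 0;

    while (row != 0 && n < max) {
        unsigned j = lowest_set_bit(row);

        row &= ~(1UL << j); /* clear the bit just reported */
        pages[n++] = row_idx * LONG_BITS + j;
    }
    return n;
}
```

An empty row returns 0 immediately, which is the case where the patch instead skips ahead and reports the distance to the next non-empty row.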
[Qemu-devel] [PATCH v5 RESEND 4/4] Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages.
Modifies ram_save_block() and ram_save_remaining() to use cpu_physical_memory_get_dirty_range() to check multiple dirty and non-dirty pages at once. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- arch_init.c | 57 +++-- 1 files changed, 35 insertions(+), 22 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8e849a8..6f2ed29 100644 --- a/arch_init.c +++ b/arch_init.c @@ -108,32 +108,39 @@ static int ram_save_block(QEMUFile *f) static ram_addr_t current_addr = 0; ram_addr_t saved_addr = current_addr; ram_addr_t addr = 0; -int bytes_sent = 0; +ram_addr_t dirty_rams[HOST_LONG_BITS]; +int i, found, bytes_sent = 0; while (addr last_ram_offset) { -if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) { +if ((found = cpu_physical_memory_get_dirty_range( + current_addr, last_ram_offset, dirty_rams, HOST_LONG_BITS, + MIGRATION_DIRTY_FLAG))) { uint8_t *p; -cpu_physical_memory_reset_dirty(current_addr, -current_addr + TARGET_PAGE_SIZE, -MIGRATION_DIRTY_FLAG); - -p = qemu_get_ram_ptr(current_addr); - -if (is_dup_page(p, *p)) { -qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_COMPRESS); -qemu_put_byte(f, *p); -bytes_sent = 1; -} else { -qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_PAGE); -qemu_put_buffer(f, p, TARGET_PAGE_SIZE); -bytes_sent = TARGET_PAGE_SIZE; +for (i = 0; i found; i++) { +ram_addr_t page_addr = dirty_rams[i]; +cpu_physical_memory_reset_dirty(page_addr, +page_addr + TARGET_PAGE_SIZE, +MIGRATION_DIRTY_FLAG); + +p = qemu_get_ram_ptr(page_addr); + +if (is_dup_page(p, *p)) { +qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_COMPRESS); +qemu_put_byte(f, *p); +bytes_sent++; +} else { +qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_PAGE); +qemu_put_buffer(f, p, TARGET_PAGE_SIZE); +bytes_sent += TARGET_PAGE_SIZE; +} } break; +} else { +addr += dirty_rams[0]; +current_addr = (saved_addr + addr) % last_ram_offset; } -addr += TARGET_PAGE_SIZE; -current_addr = (saved_addr + addr) % last_ram_offset; 
} return bytes_sent; @@ -143,12 +150,18 @@ static uint64_t bytes_transferred; static ram_addr_t ram_save_remaining(void) { -ram_addr_t addr; +ram_addr_t addr = 0; ram_addr_t count = 0; +ram_addr_t dirty_rams[HOST_LONG_BITS]; +int found = 0; -for (addr = 0; addr last_ram_offset; addr += TARGET_PAGE_SIZE) { -if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) { -count++; +while (addr last_ram_offset) { +if ((found = cpu_physical_memory_get_dirty_range(addr, last_ram_offset, +dirty_rams, HOST_LONG_BITS, MIGRATION_DIRTY_FLAG))) { +count += found; +addr = dirty_rams[found - 1] + TARGET_PAGE_SIZE; +} else { +addr += dirty_rams[0]; } } -- 1.7.0.31.g1df487
[Qemu-devel] [PATCH v5 RESEND 1/4] Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of bit-based phys_ram_dirty.
Replaces the byte-based phys_ram_dirty bitmap with four (MASTER, VGA, CODE,
MIGRATION) bit-based phys_ram_dirty bitmaps. On allocation, it sets all bits
in the bitmaps. It uses ffs() to convert DIRTY_FLAG to DIRTY_IDX. Modifies
the wrapper functions for the byte-based phys_ram_dirty bitmap to work on the
bit-based phys_ram_dirty bitmaps. MASTER works as a buffer, and upon
get_dirty() or get_dirty_flags(), it calls cpu_physical_memory_sync_master()
to update VGA and MIGRATION.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 cpu-all.h     |  128 -
 exec.c        |   15 --
 qemu-common.h |    3 +
 3 files changed, 121 insertions(+), 25 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 77eaf85..77ef939 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -37,6 +37,9 @@
 
 #include "softfloat.h"
 
+/* to use ffs in flag_to_idx() */
+#include <strings.h>
+
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
 #define BSWAP_NEEDED
 #endif
@@ -859,7 +862,6 @@ target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr);
 /* memory API */
 
 extern int phys_ram_fd;
-extern uint8_t *phys_ram_dirty;
 extern ram_addr_t ram_size;
 extern ram_addr_t last_ram_offset;
 
@@ -884,51 +886,137 @@ extern int mem_prealloc;
 /* Set if TLB entry is an IO callback.  */
 #define TLB_MMIO        (1 << 5)
 
-#define VGA_DIRTY_FLAG       0x01
-#define CODE_DIRTY_FLAG      0x02
-#define MIGRATION_DIRTY_FLAG 0x08
+/* Use DIRTY_IDX as indexes of bit-based phys_ram_dirty. */
+#define MASTER_DIRTY_IDX    0
+#define VGA_DIRTY_IDX       1
+#define CODE_DIRTY_IDX      2
+#define MIGRATION_DIRTY_IDX 3
+#define NUM_DIRTY_IDX       4
+
+#define MASTER_DIRTY_FLAG    (1 << MASTER_DIRTY_IDX)
+#define VGA_DIRTY_FLAG       (1 << VGA_DIRTY_IDX)
+#define CODE_DIRTY_FLAG      (1 << CODE_DIRTY_IDX)
+#define MIGRATION_DIRTY_FLAG (1 << MIGRATION_DIRTY_IDX)
+
+extern unsigned long *phys_ram_dirty[NUM_DIRTY_IDX];
+
+static inline int dirty_flag_to_idx(int flag)
+{
+    return ffs(flag) - 1;
+}
+
+static inline int dirty_idx_to_flag(int idx)
+{
+    return 1 << idx;
+}
 
 /* read dirty bit (return 0 or 1) */
 static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
 {
-    return phys_ram_dirty[addr >> TARGET_PAGE_BITS] == 0xff;
+    unsigned long mask;
+    ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+
+    mask = 1UL << offset;
+    return (phys_ram_dirty[MASTER_DIRTY_IDX][index] & mask) == mask;
+}
+
+static inline void cpu_physical_memory_sync_master(ram_addr_t index)
+{
+    if (phys_ram_dirty[MASTER_DIRTY_IDX][index]) {
+        phys_ram_dirty[VGA_DIRTY_IDX][index]
+            |= phys_ram_dirty[MASTER_DIRTY_IDX][index];
+        phys_ram_dirty[MIGRATION_DIRTY_IDX][index]
+            |= phys_ram_dirty[MASTER_DIRTY_IDX][index];
+        phys_ram_dirty[MASTER_DIRTY_IDX][index] = 0UL;
+    }
 }
 
 static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
 {
-    return phys_ram_dirty[addr >> TARGET_PAGE_BITS];
+    unsigned long mask;
+    ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+    int ret = 0, i;
+
+    mask = 1UL << offset;
+    cpu_physical_memory_sync_master(index);
+
+    for (i = VGA_DIRTY_IDX; i <= MIGRATION_DIRTY_IDX; i++) {
+        if (phys_ram_dirty[i][index] & mask) {
+            ret |= dirty_idx_to_flag(i);
+        }
+    }
+
+    return ret;
+}
+
+static inline int cpu_physical_memory_get_dirty_idx(ram_addr_t addr,
+                                                    int dirty_idx)
+{
+    unsigned long mask;
+    ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+
+    mask = 1UL << offset;
+    cpu_physical_memory_sync_master(index);
+    return (phys_ram_dirty[dirty_idx][index] & mask) == mask;
 }
 
 static inline int cpu_physical_memory_get_dirty(ram_addr_t addr,
                                                 int dirty_flags)
 {
-    return phys_ram_dirty[addr >> TARGET_PAGE_BITS] & dirty_flags;
+    return cpu_physical_memory_get_dirty_idx(addr,
+                                             dirty_flag_to_idx(dirty_flags));
 }
 
 static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
 {
-    phys_ram_dirty[addr >> TARGET_PAGE_BITS] = 0xff;
+    unsigned long mask;
+    ram_addr_t index = (addr >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int offset = (addr >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+
+    mask = 1UL << offset;
+    phys_ram_dirty[MASTER_DIRTY_IDX][index] |= mask;
 }
 
-static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
-                                                      int dirty_flags)
+static inline void cpu_physical_memory_set_dirty_range(ram_addr_t addr,
+                                                       unsigned long mask)
 {
-    return
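The two ideas in this patch, converting a single-bit flag to an array index with ffs() and treating MASTER as a write buffer that is folded into VGA and MIGRATION on read, can be tried outside QEMU. The following is a minimal sketch with shortened, hypothetical names (dirty[], set_dirty(), sync_master(), get_dirty()) and a tiny fixed-size bitmap. As in the quoted patch, the sync step does not touch the CODE bitmap.

```c
#include <limits.h>
#include <strings.h>   /* ffs() */

enum { MASTER_IDX, VGA_IDX, CODE_IDX, MIGRATION_IDX, NUM_IDX };

#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define WORDS 4        /* illustrative size: WORDS * LONG_BITS pages */

/* One bitmap per client, indexed by IDX (hypothetical stand-in). */
static unsigned long dirty[NUM_IDX][WORDS];

/* A flag has exactly one bit set, so ffs() gives its index plus one. */
static int flag_to_idx(int flag) { return ffs(flag) - 1; }
static int idx_to_flag(int idx)  { return 1 << idx; }

/* Writers only ever touch the MASTER bitmap. */
static void set_dirty(unsigned long page)
{
    dirty[MASTER_IDX][page / LONG_BITS] |= 1UL << (page % LONG_BITS);
}

/* Fold one MASTER word into VGA and MIGRATION, then clear it. */
static void sync_master(unsigned long index)
{
    unsigned long m = dirty[MASTER_IDX][index];

    if (m) {
        dirty[VGA_IDX][index] |= m;
        dirty[MIGRATION_IDX][index] |= m;
        dirty[MASTER_IDX][index] = 0UL;
    }
}

/* Readers sync first, then test the bit in the client's own bitmap. */
static int get_dirty(unsigned long page, int flag)
{
    unsigned long index = page / LONG_BITS;

    sync_master(index);
    return (int)((dirty[flag_to_idx(flag)][index] >> (page % LONG_BITS)) & 1UL);
}
```

One consequence of buffering in MASTER is that a client can clear its own bit without affecting the others: the pending writes have already been copied into each client's bitmap by the time it reads.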
[Qemu-devel] [PATCH v5 RESEND 3/4] Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
Modifies kvm_physical_sync_dirty_bitmap to use
cpu_physical_memory_set_dirty_range() to update the row of the bit-based
phys_ram_dirty bitmap at once.

Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 kvm-all.c | 24 +---
 1 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index c238f54..0d29798 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -290,8 +290,8 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned long offset,
                                          unsigned long mem_size)
 {
-    unsigned int i, j;
-    unsigned long page_number, addr, addr1, c;
+    unsigned int i;
+    unsigned long page_number, addr, addr1;
     ram_addr_t ram_addr;
     unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
         HOST_LONG_BITS;
@@ -302,23 +302,17 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
      */
     for (i = 0; i < len; i++) {
         if (bitmap[i] != 0) {
-            c = leul_to_cpu(bitmap[i]);
-            do {
-                j = ffsl(c) - 1;
-                c &= ~(1ul << j);
-                page_number = i * HOST_LONG_BITS + j;
-                addr1 = page_number * TARGET_PAGE_SIZE;
-                addr = offset + addr1;
-                ram_addr = cpu_get_physical_page_desc(addr);
-                cpu_physical_memory_set_dirty(ram_addr);
-            } while (c != 0);
+            page_number = i * HOST_LONG_BITS;
+            addr1 = page_number * TARGET_PAGE_SIZE;
+            addr = offset + addr1;
+            ram_addr = cpu_get_physical_page_desc(addr);
+            cpu_physical_memory_set_dirty_range(ram_addr,
+                leul_to_cpu(bitmap[i]));
         }
     }
     return 0;
 }
 
-#define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
-
 /**
  * kvm_physical_sync_dirty_bitmap - Grab dirty bitmap from kernel space
  * This function updates qemu's dirty bitmap using cpu_physical_memory_set_dirty().
@@ -343,7 +337,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }
 
-        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;
+        size = BITMAP_SIZE(mem->memory_size);
         if (!d.dirty_bitmap) {
             d.dirty_bitmap = qemu_malloc(size);
         } else if (size > allocated_size) {
--
1.7.0.31.g1df487
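The change in kvm_get_dirty_pages_log_range() replaces a loop that sets one dirty bit per iteration with a single update of a whole bitmap row. The following is a minimal sketch of why the two give the same result, using hypothetical helper names (set_bits_one_by_one(), set_bits_at_once()) on a bare unsigned long. It assumes the row starts on a word boundary, so that bit j of the kernel's word corresponds to bit j of the destination word; whether that always holds for ram_addr in QEMU is not something this sketch checks.

```c
/*
 * Old approach: peel off the lowest set bit of c and set the matching
 * bit in the destination, one bit per iteration.
 */
static void set_bits_one_by_one(unsigned long *row, unsigned long c)
{
    while (c != 0) {
        unsigned j = 0;

        /* Index of the lowest set bit, i.e. what ffsl(c) - 1 returns. */
        while (((c >> j) & 1UL) == 0) {
            j++;
        }
        c &= ~(1UL << j);       /* consume that bit */
        *row |= 1UL << j;       /* mark the corresponding page dirty */
    }
}

/* New approach: merge the whole word into the row with one OR. */
static void set_bits_at_once(unsigned long *row, unsigned long mask)
{
    *row |= mask;
}
```

The per-bit version does work proportional to the number of dirty pages in the word, while the row-wide version does the same amount of work regardless of how many bits are set.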
[Qemu-devel] [PATCH v5 RESEND 0/4] Introduce bit-based phys_ram_dirty, and bit-based dirty page checker.
Currently, the dirty and non-dirty pages are checked one by one. When most of
the memory is not dirty, checking the dirty and non-dirty pages in chunks of
multiple pages should be much faster than checking them one by one. We
introduced bit-based phys_ram_dirty for VGA, CODE, MIGRATION, MASTER, and
cpu_physical_memory_get_dirty_range() for this purpose.

The following numbers show the speedup of bit-based phys_ram_dirty. The
speedup grows as the number of rows whose contents are 0 gets larger.

Test Environment:
CPU: 4x Intel Xeon Quad Core 2.66GHz
Mem size: 96GB
Host OS: CentOS (kernel 2.6.33)
Guest OS: Debian/GNU Linux lenny (kernel 2.6.26)
Guest Mem size: 512MB

Conditions of experiments are as follows:
Cond1: Guest OS periodically makes 256MB of contiguous pages dirty.
Cond2: Guest OS periodically makes 256MB of dirty pages, alternating with
       non-dirty pages.
Cond3: Guest OS reads a 1GB file, which is bigger than memory.
Cond4: Guest OS writes a 1GB file, which is bigger than memory.

Experimental results:
Cond1: 5 ~ 83 times speed up
Cond2: 5 ~ 52 times speed up
Cond3: 5 ~ 132 times speed up
Cond4: 5 ~ 57 times speed up

Changes from v4 to v5 are:
- Rebased to HEAD (0ffbba357c557d9fa5caf9476878a4b9c155a614)
- Use BITMAP_SIZE() in kvm_physical_sync_dirty_bitmap() (3/4)

Changes from v3 to v4 are:
- Merged {1,2,3}/6 to compile correctly.
- Fix setting bits after phys_ram_dirty allocation.
- Renamed DIRTY_FLAG and DIRTY_IDX converter function.

Changes from v2 to v3 are:
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffs to convert FLAGS to IDX.
- Add a helper function which takes IDX.
- Change the behavior of MASTER as a buffer.
- Change dirty bitmap access to a loop.
- Add brace after if ()

Yoshiaki Tamura (4):
  Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of
    bit-based phys_ram_dirty.
  Introduce cpu_physical_memory_get_dirty_range().
  Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
  Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages.

 arch_init.c   |   57 +++-
 cpu-all.h     |  132 -
 exec.c        |   82 +--
 kvm-all.c     |   24 --
 qemu-common.h |    3 +
 5 files changed, 236 insertions(+), 62 deletions(-)
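To make the argument about mostly clean memory concrete, the following sketch counts how many checks each strategy needs on the same bitmap. It is not the benchmark behind the numbers above and does not attempt to reproduce them. The function names, the 64-bit bitmap word and the `probes` counter are assumptions for illustration, and any bits in the last word beyond `pages` are assumed to be zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Per-page scan: one bit test for every page, dirty or not.
 * *probes receives the number of tests performed.
 */
static size_t count_per_page(const uint64_t *bm, size_t pages, size_t *probes)
{
    size_t count = 0;
    size_t p;

    *probes = 0;
    for (p = 0; p < pages; p++) {
        (*probes)++;
        count += (size_t)((bm[p / 64] >> (p % 64)) & 1);
    }
    return count;
}

/*
 * Per-word scan: one comparison rules out 64 clean pages at a time.
 * Set bits are only visited inside words that are non-zero.
 * *probes receives the number of words examined.
 */
static size_t count_per_word(const uint64_t *bm, size_t pages, size_t *probes)
{
    size_t count = 0;
    size_t words = (pages + 63) / 64;
    size_t w;

    *probes = 0;
    for (w = 0; w < words; w++) {
        uint64_t word = bm[w];

        (*probes)++;
        for (; word != 0; word &= word - 1) {   /* clear lowest set bit */
            count++;
        }
    }
    return count;
}
```

For a 512MB guest with 4 KiB pages (an assumed page size), there are 131072 pages, so the per-page scan performs 131072 tests while the per-word scan examines 2048 words plus a few extra steps for each dirty page it finds.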
[Qemu-devel] Few Questions about QEMU JSON
Hello,

Basically, I want to separate QEMU (instruction translation, hardware emulation drivers, etc.) from the simulators (UI, events, etc.). Someone suggested that I use the JSON mechanism. I want to understand more about JSON; can you please give me some insight? If there is any document or something similar, it would be helpful. Also, is it possible to separate QEMU and the simulator? If yes, can we use JSON?

Warm Regards,
Akshay