No longer working on HPET
I have decided to take a job outside of IBM and so will not be involved with
HPET any longer. Working on KVM has been great fun... top-notch people and a
top-notch technology. Wishing KVM and you all the best!

--
Regards,
Beth Kon
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The HPET issue on Linux
Beth Kon wrote:
Dor Laor wrote:
On 01/06/2010 12:09 PM, Gleb Natapov wrote:
On Wed, Jan 06, 2010 at 05:48:52PM +0800, Sheng Yang wrote:

Hi Beth

I still found the emulated HPET would result in some boot failure. For
example, on my 2.6.30, with HPET enabled, the kernel would fail
check_timer(), especially in timer_irq_works().

The testing of timer_irq_works() is let 10 ticks pass(using mdelay()), and
want to confirm the clock source with at least 5 ticks advanced in jiffies.
I've checked that, on my machine, it would mostly get only 4 ticks when HPET
enabled, then fail the test. On the other hand, if I using PIT, it would get
more than 10 ticks(maybe understandable if some complementary ticks there).
Of course, extend the ticks count/mdelay() time can work.

I think it's a major issue of HPET. And it maybe just due to a too long
userspace path for interrupt injection... If it's true, I think it's not
easy to deal with it.

PIT tick are reinjected automatically, HPET should probably do the same
although it may just create another set of problems. Older Linux do
automatic adjustment for lost ticks so automatic reinjection causes time to
run too fast.

This is why we added the -no-kvm-pit-reinject flag... It took lots of time
to pit/rtc to stabilize, in order of seriously consider the hpet emulation,
lots of testing should be done.

I will try to look into this. Since HPET is edge-triggered, looks like this
problem is of a different nature than PIT. Is this a solid failure or
intermittent?

Anthony just explained that on x86, even edge-triggered interrupts are
queued in the apic and an eoi will occur, so this is not different than the
PIT.

--
Gleb.
--
Regards,
Beth Kon
Re: The HPET issue on Linux
Dor Laor wrote:
On 01/06/2010 12:09 PM, Gleb Natapov wrote:
On Wed, Jan 06, 2010 at 05:48:52PM +0800, Sheng Yang wrote:

Hi Beth

I still found the emulated HPET would result in some boot failure. For
example, on my 2.6.30, with HPET enabled, the kernel would fail
check_timer(), especially in timer_irq_works().

The testing of timer_irq_works() is let 10 ticks pass(using mdelay()), and
want to confirm the clock source with at least 5 ticks advanced in jiffies.
I've checked that, on my machine, it would mostly get only 4 ticks when HPET
enabled, then fail the test. On the other hand, if I using PIT, it would get
more than 10 ticks(maybe understandable if some complementary ticks there).
Of course, extend the ticks count/mdelay() time can work.

I think it's a major issue of HPET. And it maybe just due to a too long
userspace path for interrupt injection... If it's true, I think it's not
easy to deal with it.

PIT tick are reinjected automatically, HPET should probably do the same
although it may just create another set of problems. Older Linux do
automatic adjustment for lost ticks so automatic reinjection causes time to
run too fast.

This is why we added the -no-kvm-pit-reinject flag... It took lots of time
to pit/rtc to stabilize, in order of seriously consider the hpet emulation,
lots of testing should be done.

I will try to look into this. Since HPET is edge-triggered, looks like this
problem is of a different nature than PIT. Is this a solid failure or
intermittent?

--
Gleb.
--
Regards,
Beth Kon
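For anyone reading this thread later, the check Sheng describes can be
modelled in a few lines of C. This is only a sketch of the arithmetic quoted
in the thread (about 10 tick periods of delay, at least 5 ticks required in
jiffies), not the kernel's timer_irq_works() itself. The names
timer_check_passes, ticks_seen, TICKS_WAITED and MIN_TICKS_REQUIRED are made
up for illustration.

```c
#include <stdbool.h>

/* Hypothetical constants, taken from the numbers quoted in the thread:
 * the guest waits roughly 10 tick periods and expects jiffies to have
 * advanced by at least 5. */
#define TICKS_WAITED       10UL
#define MIN_TICKS_REQUIRED 5UL

/* Returns true when enough timer interrupts were delivered during the wait.
 * Unsigned subtraction keeps the result correct across counter wraparound. */
static bool timer_check_passes(unsigned long jiffies_before,
                               unsigned long jiffies_after)
{
    return (jiffies_after - jiffies_before) >= MIN_TICKS_REQUIRED;
}

/* Ticks the guest observes if `lost` of the TICKS_WAITED interrupts
 * are dropped and never reinjected. */
static unsigned long ticks_seen(unsigned long lost)
{
    return lost >= TICKS_WAITED ? 0UL : TICKS_WAITED - lost;
}
```

In this model, losing 6 of the 10 ticks leaves 4 and the check fails, which
is the HPET case reported above; losing 5 or fewer still passes.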
[PATCH 5/5] Kernel changes for HPET legacy support(v9)
When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt
and the pit should not be. So in legacy mode, the pit timer is destroyed,
but the *state* of the pit is maintained. So if kvm or the guest tries to
modify the state of the pit, this modification is accepted, *except* that
the timer isn't actually started. When we exit hpet_legacy_mode, the
current state of the pit (which is up to date since we've been accepting
modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff
in order to destroy the timer, but then restores the actual value, again
maintaining "current" state of the pit for possible later reenablement.

Changes from v7:
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

changes from v6:
- added ioctl interface for legacy mode in order not to break the abi.

Signed-off-by: Beth Kon
---
 arch/x86/include/asm/kvm.h |    8 ++
 arch/x86/kvm/i8254.c       |   22 ++---
 arch/x86/kvm/i8254.h       |    3 +-
 arch/x86/kvm/x86.c         |   55 +++-
 include/linux/kvm.h        |    6
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..f5554dd 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -18,6 +18,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
+#define __KVM_HAVE_PIT_STATE2

 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -237,6 +238,13 @@ struct kvm_pit_state {
 	struct kvm_pit_channel_state channels[3];
 };

+#define KPIT_FLAGS_HPET_LEGACY 0x0001
+
+struct kvm_pit_state2 {
+	struct kvm_pit_channel_state channels[3];
+	__u32 flags;
+};
+
 struct kvm_reinject_control {
 	__u8 pit_reinject;
 	__u8 reserved[31];
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6e0a203..0b0a761 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -329,20 +329,33 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	case 1:
 	/* FIXME: enhance mode 4 precision */
 	case 4:
-		create_pit_timer(ps, val, 0);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)) {
+			create_pit_timer(ps, val, 0);
+		}
 		break;
 	case 2:
 	case 3:
-		create_pit_timer(ps, val, 1);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)){
+			create_pit_timer(ps, val, 1);
+		}
 		break;
 	default:
 		destroy_pit_timer(&ps->pit_timer);
 	}
 }

-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
-	pit_load_count(kvm, channel, val);
+	u8 saved_mode;
+	if (hpet_legacy_start) {
+		/* save existing mode for later reenablement */
+		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(kvm, channel, val);
+		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+	} else {
+		pit_load_count(kvm, channel, val);
+	}
 }

 static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
@@ -546,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	struct kvm_kpit_channel_state *c;

 	mutex_lock(&pit->pit_state.lock);
+	pit->pit_state.flags = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..d4c1c7f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {

 struct kvm_kpit_state {
 	struct kvm_kpit_channel_state channels[3];
+	u32 flags;
 	struct kvm_timer pit_timer;
 	bool is_periodic;
 	u32 speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK 0x3

 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af53f64..aa75466 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1196,6 +1196,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_IR
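The "keep the state, suppress the timer" behaviour described in the commit
message can be modelled outside the kernel as follows. This is a simplified
sketch under stated assumptions, not the kvm code: struct pit_sketch and the
sketch_* functions are hypothetical, a single channel is modelled, and
treating modes 0 through 4 as "start a timer" is a choice made for this
model. Only the flag value and the 0xff trick come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_HPET_LEGACY 0x0001u /* same value as KPIT_FLAGS_HPET_LEGACY */
#define SKETCH_MODE_NONE        0xff    /* mode value that falls to "no timer" */

/* Hypothetical single-channel stand-in for the in-kernel PIT state. */
struct pit_sketch {
    uint32_t flags;
    uint8_t  mode;
    uint32_t count;
    bool     timer_running;
};

/* State is always recorded; a timer is only started outside legacy mode. */
static void sketch_load_count(struct pit_sketch *p, uint32_t val)
{
    p->count = val;
    switch (p->mode) {
    case 0: case 1: case 2: case 3: case 4:
        if (!(p->flags & SKETCH_FLAG_HPET_LEGACY))
            p->timer_running = true;
        break;
    default:
        p->timer_running = false; /* models destroy_pit_timer() */
        break;
    }
}

/* Entering legacy mode: force the "no timer" branch once, then restore
 * the real mode so the recorded state stays current. */
static void sketch_enter_legacy(struct pit_sketch *p)
{
    uint8_t saved_mode = p->mode;

    p->flags |= SKETCH_FLAG_HPET_LEGACY;
    p->mode = SKETCH_MODE_NONE;
    sketch_load_count(p, p->count);
    p->mode = saved_mode;
}

/* Leaving legacy mode: restart the timer from whatever state was recorded. */
static void sketch_leave_legacy(struct pit_sketch *p)
{
    p->flags &= ~SKETCH_FLAG_HPET_LEGACY;
    sketch_load_count(p, p->count);
}
```

A count written while the flag is set is remembered but starts nothing;
clearing the flag restarts the timer with that remembered count.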
[PATCH 2/5] Userspace changes for irq0->inti2 override support (v9)
Select irq0->irq2 override based on kernel gsi routing availability

If the kernel does not support gsi routing, we cannot do the irq0->irq2
override, so disable it in that case.

Signed-off-by: Beth Kon
Signed-off-by: Avi Kivity
---
 hw/ioapic.c    |    6 +++---
 hw/pc.c        |    2 ++
 qemu-kvm-x86.c |    6 +-
 qemu-kvm.h     |    2 ++
 sysemu.h       |    1 +
 vl.c           |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index a7a5ef9..c894b72 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@

 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"

@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;

-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2. GSI maps to ioapic 1-1. This is not
      * the cleanest way of doing it but it should work. */

-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }

     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 05d05e0..043a0da 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)

 #define MAX_IDE_BUS 2

@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);

     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a78073e..f7c66d1 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1561,7 +1561,11 @@ int kvm_arch_init_irq_routing(void)
             return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index eb99bc4..b044ead 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -167,6 +167,7 @@ int kvm_has_sync_mmu(void);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
 #else
@@ -175,6 +176,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 2824b0d..5b42506 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -111,6 +111,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index df583b7..d8b7198 100644
--- a/vl.c
+++ b/vl.c
@@ -255,6 +255,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6199,8 +6200,14 @@ int main(int argc, char **argv, char **envp)

     module_call_init(MODULE_INIT_DEVICE);

-    if (kvm_enabled())
-        kvm_init_ap();
+    if (kvm_enabled()) {
+        kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }

     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
[PATCH 4/5] Userspace changes for qemu-kvm HPET support(v9)
The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the earlier versions by coming
closer to just disabling PIT interrupts. It allows the PIT state to change
if the OS modifies it, even while PIT is disabled, but does not allow a pit
timer to start. Then if HPET legacy mode is disabled, whatever the PIT state
is at that point, the PIT timer is restarted accordingly.

Changes from v8:
- incremented PIT_SAVEVM_VERSION
- changed pit_load to check for version_id != PIT_SAVEVM_VERSION
- removed unnecessary return

Changes from v7:
- added flags field to PITState
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

Changes from v6:
- added ioctl interface for setting hpet legacy mode in kernel pit
- moved check for hpet_legacy_mode in pit_load_count to allow state info to
  be copied before returning if legacy mode is enabled.
- sprinkled in some #ifdef TARGET_I386

Signed-off-by: Beth Kon
---
 hw/hpet.c                 |   16 +++--
 hw/i8254-kvm.c            |   25 ++
 hw/i8254.c                |   79
 hw/i8254.h                |    5 ++-
 hw/pc.h                   |    4 +-
 kvm/include/linux/kvm.h   |    4 ++
 kvm/include/x86/asm/kvm.h |    7
 libkvm-all.h              |   32 +-
 qemu-kvm-x86.c            |   38 +
 qemu-kvm.c                |   20 +++
 qemu-kvm.h                |    8
 vl.c                      |   21 ++--
 12 files changed, 218 insertions(+), 41 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index e0be486..462e6db 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_timer(f, s->timer[i].qemu_timer);
         }
     }
+    if (hpet_in_legacy_mode()) {
+        hpet_disable_pit();
+    }
     return 0;
 }

@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 }
                 /* i8254 and RTC are disabled when HPET is in legacy mode */
                 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_disable();
+                    hpet_disable_pit();
+                    dprintf("qemu: hpet disabled pit\n");
                 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_enable();
+                    hpet_enable_pit();
+                    dprintf("qemu: hpet enabled pit\n");
                 }
                 break;
             case HPET_CFG + 4:
@@ -554,13 +559,16 @@ static void hpet_reset(void *opaque) {
     /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
     s->capability = 0x8086a201ULL;
     s->capability |= ((HPET_CLK_PERIOD) << 32);
-    if (count > 0)
+    s->config = 0ULL;
+    if (count > 0) {
         /* we don't enable pit when hpet_reset is first called (by hpet_init)
          * because hpet is taking over for pit here. On subsequent invocations,
          * hpet_reset is called due to system reset. At this point control must
          * be returned to pit until SW reenables hpet.
          */
-        hpet_pit_enable();
+        hpet_enable_pit();
+        dprintf("qemu: hpet enabled pit\n");
+    }
     count = 1;
 }

diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8390d75..af26e4f 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -33,15 +33,20 @@ static PITState pit_state;
 static void kvm_pit_save(QEMUFile *f, void *opaque)
 {
     PITState *s = opaque;
-    struct kvm_pit_state pit;
+    struct kvm_pit_state2 pit2;
     struct kvm_pit_channel_state *c;
     struct PITChannelState *sc;
     int i;

-    kvm_get_pit(kvm_context, &pit);
-
+    if(qemu_kvm_has_pit_state2()) {
+        kvm_get_pit2(kvm_context, &pit2);
+        s->flags = pit2.flags;
+    } else {
+        /* pit2 is superset of pit struct so just cast it and use it */
+        kvm_get_pit(kvm_context, (struct kvm_pit_state *)&pit2);
+    }
     for (i = 0; i < 3; i++) {
-        c = &pit.channels[i];
+        c = &pit2.channels[i];
         sc = &s->channels[i];
         sc->count = c->count;
         sc->latched_count = c->latched_count;
@@ -64,15 +69,16 @@ static void kvm_pit_save(QEMUFile *f, void *opaque)
 static int kvm_pit_loa
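The enable/disable decision in hpet_ram_writel depends only on whether the
legacy bit changes between the old and new config values. A minimal sketch
of that edge detection is below. The helper bodies are an assumption about
what activating_bit/deactivating_bit do (their definitions are not part of
this patch), SKETCH_CFG_LEGACY assumes the legacy replacement bit is bit 1
of the general configuration register, and enum pit_action is made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask for the legacy replacement bit (bit 1 of the config register). */
#define SKETCH_CFG_LEGACY 0x2ULL

/* Hypothetical result type: what the caller should do with the PIT. */
enum pit_action { PIT_UNCHANGED, PIT_DISABLE, PIT_ENABLE };

/* True only on a 0 -> 1 transition of the masked bit. */
static bool bit_turns_on(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return !(old_val & mask) && (new_val & mask);
}

/* True only on a 1 -> 0 transition of the masked bit. */
static bool bit_turns_off(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return (old_val & mask) && !(new_val & mask);
}

/* Entering legacy mode disables the PIT, leaving it re-enables the PIT,
 * and writes that do not change the bit leave the PIT alone. */
static enum pit_action pit_action_for_cfg_write(uint64_t old_val,
                                                uint64_t new_val)
{
    if (bit_turns_on(old_val, new_val, SKETCH_CFG_LEGACY))
        return PIT_DISABLE;
    if (bit_turns_off(old_val, new_val, SKETCH_CFG_LEGACY))
        return PIT_ENABLE;
    return PIT_UNCHANGED;
}
```

Acting on transitions rather than on the bit's level means a guest that
rewrites the config register with the legacy bit already set does not
trigger a second disable.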
[PATCH 3/5] BIOS changes for qemu-kvm hpet support (v9)
Advertise HPET in ACPI HPET table

Signed-off-by: Beth Kon
Signed-off-by: Avi Kivity
---
 kvm/bios/acpi-dsdt.dsl |    2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index 3560baa..26fc7ad 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -194,7 +194,6 @@ DefinitionBlock (
             })
         }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
         Device(HPET) {
             Name(_HID, EISAID("PNP0103"))
             Name(_UID, 0)
@@ -214,7 +213,6 @@ DefinitionBlock (
             })
         }
 #endif
-#endif
     }

     Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index f9e0452..28f2b21 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1526,8 +1526,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));

 /*
- * * HPET Description Table
- * */
+ * HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF /* ACPI common table header */
     uint32_t timer_block_id;
@@ -1716,13 +1716,11 @@ void acpi_bios_init(void)
     addr += madt_size;

 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif

     /* RSDP */
     memset(rsdp, 0, sizeof(*rsdp));
@@ -1900,7 +1898,6 @@ void acpi_bios_init(void)
     }

     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1909,7 +1906,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct acpi_table_header *)hpet,
                             "HPET", sizeof(*hpet), 1);
-#endif
 #endif

@@ -1919,8 +1915,7 @@ void acpi_bios_init(void)
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(madt_addr);
 #ifdef BX_QEMU
-    /* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
     acpi_additional_tables(); /* resets cfg to required entry */
[PATCH 1/5] BIOS changes for irq0->inti2 override (v9)
bios: allow qemu to configure irq0->inti2 override

Win2k8 expects the HPET interrupt on inti2, regardless of whether an
override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2. The irq0->inti2 override will always be used
unless the kernel cannot do irq routing (i.e., compatibility with old
kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the
irq routing interface, and adds the irq0->inti2 override to the MADT
interrupt source override table, and the mp table (for the no-acpi case).

Changes from v8
- Incorporated Gleb's comments to patch 1/5 and 4/5. In 1/5, removed a
  "return" per Gleb's comment. See 4/5 for v8->v9 change description.

Signed-off-by: Beth Kon
---
 kvm/bios/rombios32.c |   66 +
 1 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 0369111..f9e0452 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2)

 int qemu_cfg_port;

@@ -555,6 +559,16 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif

+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1167,14 @@ static void mptable_init(void)
     putstr(&q, "0.1 "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee0); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1218,13 @@ static void mptable_init(void)

     /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0->inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1232,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+        putb(&q, i); /* dest I/O APIC interrupt in */
     }
     /* patch length */
     len = q - mp_config_table;
@@ -1768,23 +1801,21 @@ void acpi_bios_init(void)
     io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec0);
     io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     io_apic++;
-
-    int_override = (void *)io_apic;
-    int_override->type = APIC_XRUPT_OVERRIDE;
-    int_override->length = sizeof(*int_override);
-    int_override->bus = cpu_to_le32(0);
-    int_override->source = cpu_to_le32(0);
-    int_override->gsi = cpu_to_le32(2);
-    int_override->flags = cpu_to_le32(0);
-#endif
+    int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+    if (irq0_override) {
+        memset(int_override, 0, sizeof(*int_override));
+        int_override->type = APIC_XRUPT_OVERRIDE;
+        int_override->length = sizeof(*int_override);
+        int_override->source = 0;
+        int_override->gsi = 2;
+        int_override->flags = 0; /* conforms to bus specifications */
+        int_override++;
+    }
 #endif
-
-    int_override = (struct madt_int_override*)(io_apic + 1);
-    for ( i = 0; i < 16; i++ ) {
-        if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+fo
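The change to the MP table entry count (MAX_CPUS + 17 with the override,
MAX_CPUS + 18 without) follows from the ISA interrupt loop emitting one
entry fewer when source IRQ 2 is skipped. A standalone sketch of that loop
is below. It only records the (source IRQ, destination pin) pairs rather
than writing real MP table bytes, and struct mp_irq_entry and
build_isa_irq_entries are hypothetical names used for illustration.

```c
#include <stdint.h>

#define ISA_IRQS 16

/* Hypothetical reduced view of an MP table I/O interrupt entry. */
struct mp_irq_entry {
    uint8_t src_irq; /* source bus IRQ */
    uint8_t dst_pin; /* destination I/O APIC interrupt input */
};

/* Fills `out` with one entry per ISA IRQ and returns the entry count.
 * With the override enabled, IRQ 0 is delivered on pin 2 and source
 * IRQ 2 gets no entry, so one entry fewer is produced. */
static int build_isa_irq_entries(int irq0_override,
                                 struct mp_irq_entry out[ISA_IRQS])
{
    int n = 0;

    for (int i = 0; i < ISA_IRQS; i++) {
        if (irq0_override && i == 2)
            continue; /* pin 2 is already taken by IRQ 0 */
        out[n].src_irq = (uint8_t)i;
        out[n].dst_pin = (uint8_t)((irq0_override && i == 0) ? 2 : i);
        n++;
    }
    return n;
}
```

With the override the function returns 15 entries instead of 16, which is
the difference between the two entry counts written in mptable_init.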
[PATCH 4/5] Userspace changes for qemu-kvm HPET support(v8)
The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the earlier versions by coming
closer to just disabling PIT interrupts. It allows the PIT state to change
if the OS modifies it, even while PIT is disabled, but does not allow a pit
timer to start. Then if HPET legacy mode is disabled, whatever the PIT state
is at that point, the PIT timer is restarted accordingly.

Changes from v7:
- added flags field to PITState
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

Changes from v6:
- added ioctl interface for setting hpet legacy mode in kernel pit
- moved check for hpet_legacy_mode in pit_load_count to allow state info to
  be copied before returning if legacy mode is enabled.
- sprinkled in some #ifdef TARGET_I386

Signed-off-by: Beth Kon
---
 hw/hpet.c                 |   16 +++--
 hw/i8254-kvm.c            |   26 ++-
 hw/i8254.c                |   77
 hw/i8254.h                |    3 ++
 hw/pc.h                   |    4 +-
 kvm/include/linux/kvm.h   |    4 ++
 kvm/include/x86/asm/kvm.h |    7
 libkvm-all.h              |   32 ++-
 qemu-kvm-x86.c            |   38 ++
 qemu-kvm.c                |   20
 qemu-kvm.h                |    8 +
 vl.c                      |   21 ++--
 12 files changed, 217 insertions(+), 39 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index e0be486..462e6db 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_timer(f, s->timer[i].qemu_timer);
         }
     }
+    if (hpet_in_legacy_mode()) {
+        hpet_disable_pit();
+    }
     return 0;
 }

@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 }
                 /* i8254 and RTC are disabled when HPET is in legacy mode */
                 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_disable();
+                    hpet_disable_pit();
+                    dprintf("qemu: hpet disabled pit\n");
                 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_enable();
+                    hpet_enable_pit();
+                    dprintf("qemu: hpet enabled pit\n");
                 }
                 break;
             case HPET_CFG + 4:
@@ -554,13 +559,16 @@ static void hpet_reset(void *opaque) {
     /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
     s->capability = 0x8086a201ULL;
     s->capability |= ((HPET_CLK_PERIOD) << 32);
-    if (count > 0)
+    s->config = 0ULL;
+    if (count > 0) {
         /* we don't enable pit when hpet_reset is first called (by hpet_init)
          * because hpet is taking over for pit here. On subsequent invocations,
          * hpet_reset is called due to system reset. At this point control must
          * be returned to pit until SW reenables hpet.
          */
-        hpet_pit_enable();
+        hpet_enable_pit();
+        dprintf("qemu: hpet enabled pit\n");
+    }
     count = 1;
 }

diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8390d75..8145658 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -33,15 +33,20 @@ static PITState pit_state;
 static void kvm_pit_save(QEMUFile *f, void *opaque)
 {
     PITState *s = opaque;
-    struct kvm_pit_state pit;
+    struct kvm_pit_state2 pit2;
     struct kvm_pit_channel_state *c;
     struct PITChannelState *sc;
     int i;

-    kvm_get_pit(kvm_context, &pit);
-
+    if(qemu_kvm_has_pit_state2()) {
+        kvm_get_pit2(kvm_context, &pit2);
+        s->flags = pit2.flags;
+    } else {
+        /* pit2 is superset of pit struct so just cast it and use it */
+        kvm_get_pit(kvm_context, (struct kvm_pit_state *)&pit2);
+    }
     for (i = 0; i < 3; i++) {
-        c = &pit.channels[i];
+        c = &pit2.channels[i];
         sc = &s->channels[i];
         sc->count = c->count;
         sc->latched_count = c->latched_count;
@@ -64,15 +69,16 @@ static void kvm_pit_save(QEMUFile *f, void *opaque)
 static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id)
 {
     PITState *s = opaque;
-    struct kvm_pit_state pit;
+    struct kvm_pit_state2 pit2;
[PATCH 5/5] Kernel changes for HPET legacy support(v8)
When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt
and the pit should not be. So in legacy mode, the pit timer is destroyed,
but the *state* of the pit is maintained. So if kvm or the guest tries to
modify the state of the pit, this modification is accepted, *except* that
the timer isn't actually started. When we exit hpet_legacy_mode, the
current state of the pit (which is up to date since we've been accepting
modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff
in order to destroy the timer, but then restores the actual value, again
maintaining "current" state of the pit for possible later reenablement.

Changes from v7:
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

changes from v6:
- added ioctl interface for legacy mode in order not to break the abi.

Signed-off-by: Beth Kon
---
 arch/x86/include/asm/kvm.h |    8 ++
 arch/x86/kvm/i8254.c       |   22 ++---
 arch/x86/kvm/i8254.h       |    3 +-
 arch/x86/kvm/x86.c         |   55 +++-
 include/linux/kvm.h        |    6
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..f5554dd 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -18,6 +18,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
+#define __KVM_HAVE_PIT_STATE2

 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -237,6 +238,13 @@ struct kvm_pit_state {
 	struct kvm_pit_channel_state channels[3];
 };

+#define KPIT_FLAGS_HPET_LEGACY 0x0001
+
+struct kvm_pit_state2 {
+	struct kvm_pit_channel_state channels[3];
+	__u32 flags;
+};
+
 struct kvm_reinject_control {
 	__u8 pit_reinject;
 	__u8 reserved[31];
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6e0a203..0b0a761 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -329,20 +329,33 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	case 1:
 	/* FIXME: enhance mode 4 precision */
 	case 4:
-		create_pit_timer(ps, val, 0);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)) {
+			create_pit_timer(ps, val, 0);
+		}
 		break;
 	case 2:
 	case 3:
-		create_pit_timer(ps, val, 1);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)){
+			create_pit_timer(ps, val, 1);
+		}
 		break;
 	default:
 		destroy_pit_timer(&ps->pit_timer);
 	}
 }

-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
-	pit_load_count(kvm, channel, val);
+	u8 saved_mode;
+	if (hpet_legacy_start) {
+		/* save existing mode for later reenablement */
+		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(kvm, channel, val);
+		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+	} else {
+		pit_load_count(kvm, channel, val);
+	}
 }

 static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
@@ -546,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	struct kvm_kpit_channel_state *c;

 	mutex_lock(&pit->pit_state.lock);
+	pit->pit_state.flags = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..d4c1c7f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {

 struct kvm_kpit_state {
 	struct kvm_kpit_channel_state channels[3];
+	u32 flags;
 	struct kvm_timer pit_timer;
 	bool is_periodic;
 	u32 speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK 0x3

 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af53f64..aa75466 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1196,6 +1196,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_IR
[PATCH 3/5] BIOS changes for qemu-kvm hpet support (v8)
Advertise HPET in ACPI HPET table

Signed-off-by: Beth Kon
Signed-off-by: Avi Kivity
---
 kvm/bios/acpi-dsdt.dsl |    2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index 3560baa..26fc7ad 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -194,7 +194,6 @@ DefinitionBlock (
             })
         }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
         Device(HPET) {
             Name(_HID, EISAID("PNP0103"))
             Name(_UID, 0)
@@ -214,7 +213,6 @@ DefinitionBlock (
             })
         }
 #endif
-#endif
     }

     Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9e5370e..110d130 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1527,8 +1527,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));

 /*
- * * HPET Description Table
- * */
+ * HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF /* ACPI common table header */
     uint32_t timer_block_id;
@@ -1717,13 +1717,11 @@ void acpi_bios_init(void)
     addr += madt_size;

 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif

     /* RSDP */
     memset(rsdp, 0, sizeof(*rsdp));
@@ -1901,7 +1899,6 @@ void acpi_bios_init(void)
     }

     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1910,7 +1907,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct acpi_table_header *)hpet,
                             "HPET", sizeof(*hpet), 1);
-#endif
 #endif

@@ -1920,8 +1916,7 @@ void acpi_bios_init(void)
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(madt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-// rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
     acpi_additional_tables(); /* resets cfg to required entry */
[PATCH 2/5] Userspace changes for irq0->inti2 override support (v8)
Select irq0->irq2 override based on kernel gsi routing availability

If the kernel does not support gsi routing, we cannot do the irq0->irq2 override, so disable it in that case.

Signed-off-by: Beth Kon
Signed-off-by: Avi Kivity
---
 hw/ioapic.c    |    6 +++---
 hw/pc.c        |    2 ++
 qemu-kvm-x86.c |    6 +-
 qemu-kvm.h     |    2 ++
 sysemu.h       |    1 +
 vl.c           |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index a7a5ef9..c894b72 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@

 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"

@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;

-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2. GSI maps to ioapic 1-1. This is not
      * the cleanest way of doing it but it should work. */

-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }

     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 05d05e0..043a0da 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)

 #define MAX_IDE_BUS 2

@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);

     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a78073e..f7c66d1 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1561,7 +1561,11 @@ int kvm_arch_init_irq_routing(void)
             return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index eb99bc4..b044ead 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -167,6 +167,7 @@ int kvm_has_sync_mmu(void);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
 #else
@@ -175,6 +176,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 2824b0d..5b42506 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -111,6 +111,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index df583b7..d8b7198 100644
--- a/vl.c
+++ b/vl.c
@@ -255,6 +255,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6199,8 +6200,14 @@ int main(int argc, char **argv, char **envp)

     module_call_init(MODULE_INIT_DEVICE);

-    if (kvm_enabled())
-        kvm_init_ap();
+    if (kvm_enabled()) {
+        kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }

     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
[PATCH 1/5] BIOS changes for irq0->inti2 override (v8)
bios: allow qemu to configure irq0->inti2 override Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS. And the HPET spec states that in legacy mode, timer interrupt is on inti2. The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case). Signed-off-by: Beth Kon Signed-off-by: Avi Kivity --- kvm/bios/rombios32.c | 67 ++ 1 files changed, 51 insertions(+), 16 deletions(-) diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 0369111..9e5370e 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -446,6 +446,9 @@ uint32_t cpuid_features; uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; +#ifdef BX_QEMU +uint8_t irq0_override; +#endif #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_ARCH_LOCAL 0x8000 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0) #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) +#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2) int qemu_cfg_port; @@ -555,6 +559,17 @@ uint64_t qemu_cfg_get64 (void) } #endif +#ifdef BX_QEMU +void irq0_override_probe(void) +{ +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +} +#endif + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1153,7 +1168,14 @@ static void mptable_init(void) putstr(&q, "0.1 "); /* vendor id */ putle32(&q, 0); /* OEM table ptr */ putle16(&q, 0); /* OEM table size */ +#ifdef BX_QEMU +if (irq0_override) +putle16(&q, MAX_CPUS + 17); /* entry count */ +else +putle16(&q, MAX_CPUS + 18); /* entry count */ +#else putle16(&q, MAX_CPUS + 18); /* entry count */ +#endif 
putle32(&q, 0xfee0); /* local APIC addr */ putle16(&q, 0); /* ext table length */ putb(&q, 0); /* ext table checksum */ @@ -1197,6 +1219,13 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +#ifdef BX_QEMU +/* One entry per ioapic interrupt destination. Destination 2 is covered + * by irq0->inti2 override (i == 0). Source IRQ 2 is unused + */ +if (irq0_override && i == 2) +continue; +#endif putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1204,7 +1233,12 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +#ifdef BX_QEMU +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +#endif +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1768,23 +1802,21 @@ void acpi_bios_init(void) io_apic->io_apic_id = smp_cpus; io_apic->address = cpu_to_le32(0xfec0); io_apic->interrupt = cpu_to_le32(0); -#ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM io_apic++; - -int_override = (void *)io_apic; -int_override->type = APIC_XRUPT_OVERRIDE; -int_override->length = sizeof(*int_override); -int_override->bus = cpu_to_le32(0); -int_override->source = cpu_to_le32(0); -int_override->gsi = cpu_to_le32(2); -int_override->flags = cpu_to_le32(0); -#endif +int_override = (struct madt_int_override*)(io_apic); +#ifdef BX_QEMU +if (irq0_override) { +memset(int_override, 0, sizeof(*int_override)); +int_override->type = APIC_XRUPT_OVERRIDE; +int_override->length = sizeof(*int_override); +int_override->source = 0; +int_override->gsi = 2; +int_override->flags = 0; /* conforms to bus specifications */ +int_override++; +} #endif - -int_override = (struct madt_int_override*)(io_apic + 1); -for ( i = 0; i < 16; i++ ) { -if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +for (i = 0; i < 16; i++) { 
+if (PCI_ISA_IRQ_MASK & (1U << i)) { memset(int_override, 0, sizeof(*int_override)); int_override->type
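The MP-table part of this patch can be read as a small mapping rule: with the override active, source IRQ 0 is delivered on I/O APIC input 2 and source IRQ 2 gets no entry at all, which is why the entry count drops from MAX_CPUS + 18 to MAX_CPUS + 17. A minimal standalone sketch of that rule follows. The struct isa_int_entry and the function build_isa_int_entries() are hypothetical names invented for illustration; they are not part of rombios32.c, and the real entries carry more fields than shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define ISA_IRQS 16

/* Hypothetical, reduced form of an MP-table I/O interrupt entry. */
struct isa_int_entry {
    uint8_t src_irq;    /* source bus IRQ */
    uint8_t dst_intin;  /* destination I/O APIC interrupt input */
};

/*
 * Fill out[] with one entry per ISA IRQ and return how many were written.
 * With irq0_override set, IRQ 0 is routed to input 2 and source IRQ 2 is
 * skipped, mirroring the loop in mptable_init() from the patch above.
 */
static size_t build_isa_int_entries(struct isa_int_entry out[ISA_IRQS],
                                    int irq0_override)
{
    size_t n = 0;

    for (uint8_t i = 0; i < ISA_IRQS; i++) {
        if (irq0_override && i == 2)
            continue;                    /* input 2 is already taken by IRQ 0 */
        out[n].src_irq = i;
        out[n].dst_intin = (irq0_override && i == 0) ? 2 : i;
        n++;
    }
    return n;
}
```

With the override enabled the function writes 15 entries instead of 16, which matches the one-entry difference in the patched entry count.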
Re: [PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)
Avi Kivity wrote:
On 06/22/2009 12:14 PM, Jan Kiszka wrote:

Hmm, instead of introducing a new pair of single-purpose IOCTLs, why not add KVM_GET/SET_PIT2 which exchanges an extended kvm_pit_state2. And that struct should also include some flags field and enough padding to be potentially extended yet again in the future. In that case I see no problem having also a mode read-back interface.

We'd only add kernel hpet if we were forced to (I imagine the same applications/kernels that forced the PIT into the kernel will do the same for HPET).

Answer and citation does not yet correlate for me.

Misquote. I meant to reply to your 'Is it planned to add in-kernel hpet support?' question. Must be early in the morning in some timezone.

Could you comment more explicitly if you are fine with Beth's proposed interface, rather prefer something like my suggestion or even want something totally different?

GET/SET PIT2 looks like the best choice to me, at least until I find whoever designed the HPET/PIT interdependency and make him take it back.

It seems to me that GET/SET PIT2 adds a good deal of complexity without any gain. PIT is not a very dynamic technology. GET/SET PIT works for PIT operational needs as defined by the PIT specifications. This whole problem exists because of the unfortunate requirement that hpet disable PIT interrupts, which is quite outside normal PIT operation.

If I create a GET/SET PIT2, and a PITState2 that is a superset of PITState1, then I need to address all the cases where PITState is currently set/referenced and properly use PITState2/PITState, depending on whether the kernel supports PITState2. It just seems unnecessary, since hpet_legacy, and probably any other future "control" of the PIT, will likely be outside of "normal" PIT operation. I really think a separate ioctl that transfers just control information (of which one of the flag bits would be hpet_legacy) would be preferable and cleaner.

Am I missing some advantage to PITState2? Or is there some simple way to implement this that I'm missing?
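For readers weighing the two proposals, the extended-state alternative (a superset struct with a flags field and padding) might look roughly like the sketch below. Everything here is an assumption made for illustration: pit_channel_sketch, pit_state2_sketch, PIT_FLAG_HPET_LEGACY, the choice of nine reserved words and pit_state2_validate() are hypothetical and are not taken from any posted patch.

```c
#include <stddef.h>
#include <stdint.h>

#define PIT_FLAG_HPET_LEGACY 0x0001u  /* hypothetical control bit */

/* Hypothetical stand-in for the per-channel PIT state. */
struct pit_channel_sketch {
    uint32_t count;
    uint8_t  mode;
};

/* Hypothetical extended state: channels plus control flags plus padding. */
struct pit_state2_sketch {
    struct pit_channel_sketch channels[3];
    uint32_t flags;        /* control bits such as PIT_FLAG_HPET_LEGACY */
    uint32_t reserved[9];  /* arbitrary padding so the size can stay fixed */
};

/*
 * Return 0 if the state only uses bits this sketch understands, -1 otherwise.
 * Rejecting unknown flags and non-zero padding is one way to keep room for
 * later extensions without changing the struct size.
 */
static int pit_state2_validate(const struct pit_state2_sketch *ps)
{
    if (ps->flags & ~PIT_FLAG_HPET_LEGACY)
        return -1;
    for (size_t i = 0; i < sizeof(ps->reserved) / sizeof(ps->reserved[0]); i++) {
        if (ps->reserved[i] != 0)
            return -1;
    }
    return 0;
}
```

The trade-off raised in the message still applies: every caller that handles the old state struct has to pick between the two layouts depending on what the kernel supports, whereas a separate control ioctl leaves the existing GET/SET PIT path untouched.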
Re: [PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)
Jan Kiszka wrote:
Beth Kon wrote:

When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt and the pit should not be. So in legacy mode, the pit timer is destroyed, but the *state* of the pit is maintained. So if kvm or the guest tries to modify the state of the pit, this modification is accepted, *except* that the timer isn't actually started. When we exit hpet_legacy_mode, the current state of the pit (which is up to date since we've been accepting modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff in order to destroy the timer, but then restores the actual value, again maintaining "current" state of the pit for possible later reenablement.

changes from v6:
- Added ioctl interface for legacy mode in order not to break the abi.

Signed-off-by: Beth Kon

...

@@ -1986,7 +1987,24 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
     int r = 0;

     memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-    kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+    kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0);
+    return r;
+}
+
+static int kvm_vm_ioctl_get_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+    int r = 0;
+    *mode = kvm->arch.vpit->pit_state.hpet_legacy_mode;
+    return r;
+}

This only applies if we go for a separate mode IOCTL: The legacy mode is not directly modifiable by the guest. Is it planned to add in-kernel hpet support? Otherwise get_hpet_legacy_mode looks a bit like overkill given that user space could easily track the state.

Assuming I will at least generalize the ioctl, I'll leave this question for the time being.

+
+static int kvm_vm_ioctl_set_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+    int r = 0, start = 0;
+    if (kvm->arch.vpit->pit_state.hpet_legacy_mode == 0 && *mode == 1)

Here you check for mode == 1, but legacy_mode is only checked for != 0. I would make this consistent.

ok

+        start = 1;
+    kvm->arch.vpit->pit_state.hpet_legacy_mode = *mode;
+    kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start);
     return r;
 }
@@ -2047,6 +2065,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
     struct kvm_pit_state ps;
     struct kvm_memory_alias alias;
     struct kvm_pit_config pit_config;
+    u8 hpet_legacy_mode;

Hmm, instead of introducing a new pair of single-purpose IOCTLs, why not add KVM_GET/SET_PIT2 which exchanges an extended kvm_pit_state2. And that struct should also include some flags field and enough padding to be potentially extended yet again in the future. In that case I see no problem having also a mode read-back interface.

I thought about that, but it seemed to add unnecessary complexity, since this legacy control is really outside of normal PIT operation, which is embodied by KVM_GET/SET_PIT. It might be worth making this ioctl more general. Rather than SET_HPET_LEGACY, have SET_PIT_CONTROLS and pass a bit field, one of which is HPET_LEGACY. But, if general consensus is that it would be better to create a kvm_pit_state2, and get/set that, I can do that.

Jan
[PATCH 1/2][RFC] Userspace changes for KVM HPET (v7)
This patch series must be applied on top of the hpet branch. The big change here is handling of enabling/disabling of hpet legacy mode. When hpet enters legacy mode, the spec says that the pit stops generating interrupts. In practice, we want to stop the pit periodic timer from running because it is wasteful in a virtual environment. We also have to worry about the hpet leaving legacy mode (which, at least in linux, happens only during a shutdown or crash). At this point, according to the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT timer needs to be restarted. This patch handles this situation better than the earlier versions by coming closer to just disabling PIT interrupts. It allows the PIT state to change if the OS modifies it, even while PIT is disabled, but does not allow a pit timer to start. Then if HPET legacy mode is disabled, whatever the PIT state is at that point, the PIT timer is restarted accordingly. Changes from v6: - added ioctl interface for setting hpet legacy mode in kernel pit - moved check for hpet_legacy_mode in pit_load_count to allow state info to be copied before returning if legacy mode is enabled. 
- sprinkled in some #ifdef TARGET_I386 Signed-off-by: Beth Kon diff --git a/hw/hpet.c b/hw/hpet.c index 29db325..2f5255f 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id) qemu_get_timer(f, s->timer[i].qemu_timer); } } +if (hpet_in_legacy_mode()) { +hpet_disable_pit(); +} return 0; } @@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +hpet_disable_pit(); +dprintf("qemu: hpet disabled pit\n"); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +hpet_enable_pit(); +dprintf("qemu: hpet enabled pit\n"); } break; case HPET_CFG + 4: @@ -554,13 +559,16 @@ static void hpet_reset(void *opaque) { /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */ s->capability = 0x8086a201ULL; s->capability |= ((HPET_CLK_PERIOD) << 32); -if (count > 0) +s->config = 0ULL; +if (count > 0) { /* we don't enable pit when hpet_reset is first called (by hpet_init) * because hpet is taking over for pit here. On subsequent invocations, * hpet_reset is called due to system reset. At this point control must * be returned to pit until SW reenables hpet. 
*/ -hpet_pit_enable(); +hpet_enable_pit(); +dprintf("qemu: hpet enabled pit\n"); +} count = 1; } diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c index 8390d75..76ce6f2 100644 --- a/hw/i8254-kvm.c +++ b/hw/i8254-kvm.c @@ -36,6 +36,7 @@ static void kvm_pit_save(QEMUFile *f, void *opaque) struct kvm_pit_state pit; struct kvm_pit_channel_state *c; struct PITChannelState *sc; +__u8 hpet_legacy_mode; int i; kvm_get_pit(kvm_context, &pit); @@ -59,6 +60,10 @@ static void kvm_pit_save(QEMUFile *f, void *opaque) } pit_save(f, s); +if (kvm_has_hpet_legacy_mode(kvm_context)) { +kvm_get_hpet_legacy_mode(kvm_context, &hpet_legacy_mode); +qemu_put_8s(f, &hpet_legacy_mode); +} } static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id) @@ -67,6 +72,7 @@ static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id) struct kvm_pit_state pit; struct kvm_pit_channel_state *c; struct PITChannelState *sc; +__u8 hpet_legacy_mode; int i; pit_load(f, s, version_id); @@ -89,8 +95,13 @@ static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id) c->count_load_time = sc->count_load_time; } -kvm_set_pit(kvm_context, &pit); +if (kvm_has_hpet_legacy_mode(kvm_context)) { +qemu_get_8s(f, &hpet_legacy_mode); +kvm_get_hpet_legacy_mode(kvm_context, &hpet_legacy_mode); +} +kvm_set_hpet_legacy_mode(kvm_context, &hpet_legacy_mode); +kvm_set_pit(kvm_context, &pit); return 0; } diff --git a/hw/i8254.c b/hw/i8254.c index 2f229f9..0136c64 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -25,6 +25,7 @@ #include "pc.h" #include "isa.h" #include "qemu-timer.h" +#include "qemu-kvm.h" #include "i8254.h" //#define DEBUG_PIT @@ -202,6 +203,11 @@ static inline void pit_load_count(PITChannelState *s, int val) val = 0x1;
[PATCH 0/2][RFC] Completing HPET in KVM (v7)
There is a problem in the latest git with savevm (it aborts). So I've been unable to test savevm with these patches, but am submitting them RFC. Everything else has been tested, including compatibility testing between old/new kernel/userspace combinations.
[PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)
When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt and the pit should not be. So in legacy mode, the pit timer is destroyed, but the *state* of the pit is maintained. So if kvm or the guest tries to modify the state of the pit, this modification is accepted, *except* that the timer isn't actually started. When we exit hpet_legacy_mode, the current state of the pit (which is up to date since we've been accepting modifications) is used to restart the pit timer. The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff in order to destroy the timer, but then restores the actual value, again maintaining "current" state of the pit for possible later reenablement. changes from v6: - Added ioctl interface for legacy mode in order not to break the abi. Signed-off-by: Beth Kon diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h index 708b9c3..25cae50 100644 --- a/arch/x86/include/asm/kvm.h +++ b/arch/x86/include/asm/kvm.h @@ -18,6 +18,7 @@ #define __KVM_HAVE_GUEST_DEBUG #define __KVM_HAVE_MSIX #define __KVM_HAVE_MCE +#define __KVM_HAVE_HPET_LEGACY_MODE /* Architectural interrupt line count. 
*/ #define KVM_NR_INTERRUPTS 256 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 331705f..02de293 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -329,21 +329,32 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val) case 1: /* FIXME: enhance mode 4 precision */ case 4: - create_pit_timer(ps, val, 0); + if (!ps->hpet_legacy_mode) + create_pit_timer(ps, val, 0); break; case 2: case 3: - create_pit_timer(ps, val, 1); + if (!ps->hpet_legacy_mode) + create_pit_timer(ps, val, 1); break; default: destroy_pit_timer(&ps->pit_timer); } } -void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val) +void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start) { + u8 saved_mode; mutex_lock(&kvm->arch.vpit->pit_state.lock); - pit_load_count(kvm, channel, val); + if (hpet_legacy_start) { + /* save existing mode for later reenablement */ + saved_mode = kvm->arch.vpit->pit_state.channels[0].mode; + kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */ + pit_load_count(kvm, channel, val); + kvm->arch.vpit->pit_state.channels[0].mode = saved_mode; + } else { + pit_load_count(kvm, channel, val); + } mutex_unlock(&kvm->arch.vpit->pit_state.lock); } @@ -548,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit) struct kvm_kpit_channel_state *c; mutex_lock(&pit->pit_state.lock); + pit->pit_state.hpet_legacy_mode = 0; for (i = 0; i < 3; i++) { c = &pit->pit_state.channels[i]; c->mode = 0xff; diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h index b267018..b5967ca 100644 --- a/arch/x86/kvm/i8254.h +++ b/arch/x86/kvm/i8254.h @@ -21,6 +21,7 @@ struct kvm_kpit_channel_state { struct kvm_kpit_state { struct kvm_kpit_channel_state channels[3]; + u8 hpet_legacy_mode; struct kvm_timer pit_timer; bool is_periodic; u32speaker_data_on; @@ -49,7 +50,7 @@ struct kvm_pit { #define KVM_PIT_CHANNEL_MASK 0x3 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu); -void kvm_pit_load_count(struct kvm *kvm, int 
channel, u32 val); +void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start); struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags); void kvm_free_pit(struct kvm *kvm); void kvm_pit_reset(struct kvm_pit *pit); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6025e5b..8562eeb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1134,6 +1134,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: case KVM_CAP_PIT2: + case KVM_CAP_HPET_LEGACY_MODE: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -1986,7 +1987,24 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps) int r = 0; memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); - kvm_pit_load_count(kvm, 0, ps->channels[0].count); + kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0); + return r; +} + +static int kvm_vm_ioctl_get_hpet_legacy_mode(struct kvm *kvm, u8 *mode) +{ + int r = 0; + *mode = kvm->arch.vpit->pit_state.hpet_legacy_mode; + return r; +} + +static int kvm_vm_ioctl_set_hpet_legacy_mode(struct kvm *kvm,
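The behaviour described in the commit message (PIT state keeps being updated in legacy mode, but no timer is started until legacy mode ends) can be modelled in a few lines. This is only a sketch under simplifying assumptions: struct pit_model and both functions are hypothetical, only channel 0 is modelled, every programmed mode is treated the same, and the timer is reduced to a boolean. Unlike the patch, which uses the saved_mode trick to destroy the timer on entry, the model simply re-evaluates the timer from the current state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of PIT channel 0 as seen by the legacy-mode logic. */
struct pit_model {
    uint8_t  mode;           /* 0xff means "not programmed", as after reset */
    uint32_t count;          /* last count written by the guest or userspace */
    bool     hpet_legacy;    /* true while the HPET owns the timer interrupt */
    bool     timer_running;  /* stands in for the real host timer */
};

/* Record the new count; start a timer only when the PIT is allowed to run. */
static void pit_model_load_count(struct pit_model *p, uint32_t val)
{
    p->count = val;  /* state is always accepted, even in legacy mode */
    p->timer_running = (p->mode != 0xff) && !p->hpet_legacy;
}

/*
 * Enter or leave legacy mode. Entering stops the timer but keeps mode and
 * count; leaving restarts the timer from whatever the state is by then.
 */
static void pit_model_set_legacy(struct pit_model *p, bool legacy)
{
    p->hpet_legacy = legacy;
    pit_model_load_count(p, p->count);
}
```

A typical sequence would be: program the channel, enter legacy mode (timer stops), let the guest reprogram the count (recorded, timer still stopped), then leave legacy mode (timer restarts with the reprogrammed count).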
ioctl number overlap?
kvm.h has

#define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug)

and

#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *)

Seems that these could conflict?
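One way to reason about the question: an ioctl request number built with _IOW encodes the direction and the argument size in addition to the type and sequence number, so two definitions sharing 0x9b are numerically identical only if their argument sizes also match. The sketch below re-implements that encoding by hand so it does not depend on kernel headers. It assumes the generic Linux field layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction); some architectures use a different split. The SK_ names and sk_iow() are hypothetical, and 0xAE is assumed to be the value of KVMIO.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generic Linux ioctl field layout; names are hypothetical. */
#define SK_NRSHIFT    0
#define SK_TYPESHIFT  8
#define SK_SIZESHIFT  16
#define SK_DIRSHIFT   30
#define SK_SIZEMASK   0x3fffu
#define SK_DIR_WRITE  1u

#define SK_KVMIO      0xAEu  /* assumed value of KVMIO */

/* Hand-rolled equivalent of _IOW(type, nr, <argument of the given size>). */
static uint32_t sk_iow(uint32_t type, uint32_t nr, size_t size)
{
    return (SK_DIR_WRITE << SK_DIRSHIFT) |
           (((uint32_t)size & SK_SIZEMASK) << SK_SIZESHIFT) |
           (type << SK_TYPESHIFT) |
           (nr << SK_NRSHIFT);
}
```

Under these assumptions the two requests above collide only where sizeof(struct kvm_guest_debug) equals sizeof(void *). The IA64 prefix also suggests the second definition is only reachable on a different architecture, but sharing a sequence number still makes the header harder to audit.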
[PATCH 4/5] Userspace changes for KVM HPET (v6)
The big change here is handling of enabling/disabling of hpet legacy mode. When hpet enters legacy mode, the spec says that the pit stops generating interrupts. In practice, we want to stop the pit periodic timer from running because it is wasteful in a virtual environment. We also have to worry about the hpet leaving legacy mode (which, at least in linux, happens only during a shutdown or crash). At this point, according to the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT timer needs to be restarted. This patch handles this situation better than the previous version by coming closer to just disabling PIT interrupts. It allows the PIT state to change if the OS modifies it, even while PIT is disabled, but does not allow a pit timer to start. Then if HPET legacy mode is disabled, whatever the PIT state is at that point, the PIT timer is restarted accordingly. Signed-off-by: Beth Kon --- diff --git a/hw/hpet.c b/hw/hpet.c index 29db325..043b92b 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id) qemu_get_timer(f, s->timer[i].qemu_timer); } } +if (hpet_in_legacy_mode()) { +hpet_disable_pit(); +} return 0; } @@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +hpet_disable_pit(); +dprintf("qemu: hpet disabled pit\n"); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +hpet_enable_pit(); +dprintf("qemu: hpet enabled pit\n"); } break; case HPET_CFG + 4: @@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) { /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. 
*/ s->capability = 0x8086a201ULL; s->capability |= ((HPET_CLK_PERIOD) << 32); -if (count > 0) +if (count > 0) { /* we don't enable pit when hpet_reset is first called (by hpet_init) * because hpet is taking over for pit here. On subsequent invocations, * hpet_reset is called due to system reset. At this point control must * be returned to pit until SW reenables hpet. */ -hpet_pit_enable(); +hpet_enable_pit(); +dprintf("qemu: hpet enabled pit\n"); +} count = 1; } diff --git a/hw/i8254.c b/hw/i8254.c index 2f229f9..8c8076f 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -25,6 +25,7 @@ #include "pc.h" #include "isa.h" #include "qemu-timer.h" +#include "qemu-kvm.h" #include "i8254.h" //#define DEBUG_PIT @@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel) static inline void pit_load_count(PITChannelState *s, int val) { +if (s->channel == 0 && pit_state.hpet_legacy_mode) { +return; +} if (val == 0) val = 0x1; s->count_load_time = qemu_get_clock(vm_clock); @@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time) (double)(expire_time - current_time) / ticks_per_sec); #endif s->next_transition_time = expire_time; -if (expire_time != -1) +if (expire_time != -1) { qemu_mod_timer(s->irq_timer, expire_time); -else +} else { qemu_del_timer(s->irq_timer); +} } static void pit_irq_timer(void *opaque) @@ -451,6 +456,7 @@ void pit_reset(void *opaque) PITChannelState *s; int i; +pit->hpet_legacy_mode = 0; for(i = 0;i < 3; i++) { s = &pit->channels[i]; s->mode = 3; @@ -460,32 +466,43 @@ void pit_reset(void *opaque) } /* When HPET is operating in legacy mode, i8254 timer0 is disabled */ -void hpet_pit_disable(void) { -PITChannelState *s; -s = &pit_state.channels[0]; -if (s->irq_timer) -qemu_del_timer(s->irq_timer); + +void hpet_disable_pit(void) +{ +PITChannelState *s = &pit_state.channels[0]; +if (qemu_kvm_pit_in_kernel()) { +kvm_hpet_disable_kpit(); +} else { +if (s->irq_timer) { +qemu_del_timer(s->irq_timer); +} +} } /* When HPET is 
reset or leaving legacy mode, it must reenable i8254 * timer 0 */ -void hpet_pit_enable(void) +void hpet_enable_pit(void) { PITState *pit = &pit_state; -PITChannelState *s; -s = &pit->channels[0]; -s->mode = 3; -s->gate = 1; -pit_load_count(s, 0); +PITChannelState *s = &pit->channels[0]; +if (qemu_kvm_pit_i
[PATCH 5/5] HPET interaction with in-kernel PIT (v6)
Signed-off-by: Beth Kon
---
 arch/x86/include/asm/kvm.h | 1 +
 arch/x86/kvm/i8254.c | 24 +++-
 arch/x86/kvm/i8254.h | 3 ++-
 arch/x86/kvm/x86.c | 5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 struct kvm_pit_state {
     struct kvm_pit_channel_state channels[3];
+    u8 hpet_legacy_mode;
 };
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
     }
 }
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+    u8 saved_mode;
     mutex_lock(&kvm->arch.vpit->pit_state.lock);
-    pit_load_count(kvm, channel, val);
+    if (hpet_legacy_start) {
+        /* save existing mode for later reenablement */
+        saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+        kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+        pit_load_count(kvm, channel, val);
+        kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+    } else {
+        if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+            pit_load_count(kvm, channel, val);
+    }
     mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
         switch (s->write_state) {
         default:
         case RW_STATE_LSB:
-            pit_load_count(kvm, addr, val);
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, val);
             break;
         case RW_STATE_MSB:
-            pit_load_count(kvm, addr, val << 8);
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, val << 8);
             break;
         case RW_STATE_WORD0:
             s->write_latch = val;
             s->write_state = RW_STATE_WORD1;
             break;
         case RW_STATE_WORD1:
-            pit_load_count(kvm, addr, s->write_latch | (val << 8));
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, s->write_latch | (val << 8));
             s->write_state = RW_STATE_WORD0;
             break;
         }
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
     struct kvm_kpit_channel_state *c;
     mutex_lock(&pit->pit_state.lock);
+    pit->pit_state.hpet_legacy_mode = 0;
     for (i = 0; i < 3; i++) {
         c = &pit->pit_state.channels[i];
         c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 struct kvm_kpit_state {
     struct kvm_kpit_channel_state channels[3];
+    u8 hpet_legacy_mode;
     struct kvm_timer pit_timer;
     bool is_periodic;
     u32 speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK 0x3
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
     int r = 0;
+    int hpet_legacy_start = 0;
+    if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+        hpet_legacy_start = 1;
     memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-    kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+    kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_leg
[PATCH 3/5] BIOS changes for KVM HPET (v6)
Signed-off-by: Beth Kon
---
 kvm/bios/acpi-dsdt.dsl | 2 --
 kvm/bios/rombios32.c | 11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
         })
     }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     Device(HPET) {
         Name(_HID, EISAID("PNP0103"))
         Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
         })
     }
 #endif
-#endif
 }
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 /*
- * * HPET Description Table
- * */
+ * HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF /* ACPI common table header */
     uint32_t timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
     addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
     /* RSDP */
     memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
     }
     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1);
-#endif
     acpi_additional_tables(); /* resets cfg to required entry */
     for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
     /* kvm has no ssdt (processors are in dsdt) */
 // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-// rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif
[PATCH 1/5] Userspace changes for configuring irq0->inti2 override (v6)
These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm. Override and HPET changes are sent as a series because HPET depends on the override.

Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS. And the HPET spec states that in legacy mode, timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case).

Changes from v3:
- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best approximates a multiplexer that disables PIT interrupts when HPET is in legacy mode (as described by HPET spec). Any changes to the PIT that may occur while HPET is operating in legacy mode are saved, so if HPET leaves legacy mode, the PIT is just reenabled, with mode set to whatever the last setting from guest was. Legacy mode is disabled at least during crash and shutdown (in Linux), so this needs to be handled properly.

Changes from v4:
- Modify mp_table entry count depending on whether irq_override is enabled.
Signed-off-by: Beth Kon
---
 kvm/bios/rombios32.c | 67 ++
 1 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7db91d8..d6886ee 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2)
 int qemu_cfg_port;
@@ -555,6 +559,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+        return;
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1168,14 @@ static void mptable_init(void)
     putstr(&q, "0.1 "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee0); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1219,13 @@ static void mptable_init(void)
     /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0->inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1233,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+            putb(&q, i); /* dest I/O APIC interrupt in */
     }
     /* patch length */
     len = q - mp_config_table;
@@ -1760,23 +1794,21 @@ void acpi_bios_init(void)
     io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec0);
     io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     io_apic++;
-
-    int_override = (void *)io_apic;
-    int_override->type = APIC_XRUPT_OVERRIDE;
-    int_override->length = sizeof(*int_override);
-    int_override->bus = cpu_to_le32(0);
-    int_override->source = cpu_to_le32(0);
-    int_override->gsi = cpu_to_le32(2);
-    int_override->flags = cpu_to_le32(0);
-#endif
+    int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+    if (irq0_override) {
+        memset(int_o
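For readers following the series, the irq0->inti2 rule described in the cover text of this patch can be sketched as a small mapping function. This is only an illustration, not code from the posted patches: the helper name isa_irq_to_gsi, the ISA_NUM_IRQS constant and the -1 "no entry" return value are all hypothetical. The behaviour it models (IRQ 0 lands on I/O APIC input 2, source IRQ 2 gets no entry, everything else maps 1:1) is taken from the mp table loop in this patch and the routing loop in patch 2/5.

```c
#include <assert.h>
#include <stdbool.h>

#define ISA_NUM_IRQS 16   /* hypothetical constant: number of ISA IRQ lines */

/*
 * Hypothetical helper: return the I/O APIC input pin (GSI) that an ISA IRQ
 * is delivered to, or -1 when no entry/route should be created for it.
 */
static int isa_irq_to_gsi(int isa_irq, bool irq0_override)
{
    assert(isa_irq >= 0 && isa_irq < ISA_NUM_IRQS);

    if (!irq0_override)
        return isa_irq;   /* old kernels: plain identity mapping */
    if (isa_irq == 0)
        return 2;         /* timer interrupt is overridden to inti2 */
    if (isa_irq == 2)
        return -1;        /* source IRQ 2 is unused, so it gets no entry */
    return isa_irq;       /* all other ISA IRQs map 1:1 */
}
```

With the override enabled, a caller building the mp table or the routing table would simply skip any IRQ for which the helper returns -1.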
[PATCH 2/5] Userspace changes for configuring irq0->inti2 override (v6)
Signed-off-by: Beth Kon
---
 hw/ioapic.c | 6 +++---
 hw/pc.c | 2 ++
 qemu-kvm-x86.c | 6 +-
 qemu-kvm.h | 2 ++
 sysemu.h | 1 +
 vl.c | 11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2. GSI maps to ioapic 1-1. This is not
      * the cleanest way of doing it but it should work. */
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 #define MAX_IDE_BUS 2
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables, acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
             return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_DEVICE);
-    if (kvm_enabled())
-        kvm_init_ap();
+    if (kvm_enabled()) {
+        kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
     machine->init(ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
Re: [PATCH 1/5] BIOS changes for configuring irq0->inti2 override (v4)
Beth Kon wrote:
Sebastian Herbszt wrote:
Beth Kon wrote:

These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm. Override and HPET changes are sent as a series because HPET depends on the override.

Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS. And the HPET spec states that in legacy mode, timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case).

Changes from v3:
- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best approximates a multiplexer that disables PIT interrupts when HPET is in legacy mode (as described by HPET spec). Any changes to the PIT that may occur while HPET is operating in legacy mode are saved, so if HPET leaves legacy mode, the PIT is just reenabled, with mode set to whatever the last setting from guest was. Legacy mode is disabled at least during crash and shutdown (in Linux), so this needs to be handled properly.
---
 kvm/bios/rombios32.c | 60 -
 1 files changed, 44 insertions(+), 16 deletions(-)

What about the mptable entry count? Think it would need something like

#ifdef BX_QEMU
    if (irq0_override)
        putle16(&q, smp_cpus + 17); /* entry count */
    else
        putle16(&q, smp_cpus + 18); /* entry count */
#else
    putle16(&q, smp_cpus + 18); /* entry count */
#endif

Your patch "Fix non-ACPI Timer Interrupt Routing - v3" [1] included such a change.

[1] http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01396.html

Yes, I lost that somehow! Thanks (again!).

Actually, it isn't that simple. That patch that you referred to was a qemu patch. But I still don't see it in qemu-patched bochs bios. Apparently, I did neglect to add it to the kvm bios patches that I had waiting. Anthony, do you know what happened to this patch?

- Sebastian
Re: [PATCH 1/5] BIOS changes for configuring irq0->inti2 override (v4)
Sebastian Herbszt wrote:
Beth Kon wrote:

These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm. Override and HPET changes are sent as a series because HPET depends on the override.

Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS. And the HPET spec states that in legacy mode, timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case).

Changes from v3:
- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best approximates a multiplexer that disables PIT interrupts when HPET is in legacy mode (as described by HPET spec). Any changes to the PIT that may occur while HPET is operating in legacy mode are saved, so if HPET leaves legacy mode, the PIT is just reenabled, with mode set to whatever the last setting from guest was. Legacy mode is disabled at least during crash and shutdown (in Linux), so this needs to be handled properly.
---
 kvm/bios/rombios32.c | 60 -
 1 files changed, 44 insertions(+), 16 deletions(-)

What about the mptable entry count? Think it would need something like

#ifdef BX_QEMU
    if (irq0_override)
        putle16(&q, smp_cpus + 17); /* entry count */
    else
        putle16(&q, smp_cpus + 18); /* entry count */
#else
    putle16(&q, smp_cpus + 18); /* entry count */
#endif

Your patch "Fix non-ACPI Timer Interrupt Routing - v3" [1] included such a change.

[1] http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01396.html

Yes, I lost that somehow! Thanks (again!).

- Sebastian
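The entry-count arithmetic discussed in this thread can be spelled out as a short sketch. It assumes, and the thread does not state this explicitly, that the 18 fixed entries are one bus entry, one I/O APIC entry and sixteen ISA interrupt entries; the function and enum names are hypothetical. With the override enabled, the entry for source IRQ 2 is skipped, which is where the 17 comes from.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed breakdown of the fixed MP table entries (not stated in the thread). */
enum {
    MP_BUS_ENTRIES    = 1,   /* the ISA bus */
    MP_IOAPIC_ENTRIES = 1,   /* the single I/O APIC */
    MP_ISA_IRQ_COUNT  = 16   /* one I/O interrupt entry per ISA IRQ */
};

/* Hypothetical helper: total number of MP configuration table entries. */
static unsigned mp_entry_count(unsigned cpus, bool irq0_override)
{
    unsigned irq_entries = MP_ISA_IRQ_COUNT;

    assert(cpus > 0);
    if (irq0_override)
        irq_entries -= 1;    /* the entry for source IRQ 2 is not emitted */

    return cpus + MP_BUS_ENTRIES + MP_IOAPIC_ENTRIES + irq_entries;
}
```

Computing the count from the same condition that drives the interrupt loop keeps the header and the emitted entries from drifting apart, which is the inconsistency Sebastian pointed out.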
[PATCH 4/5] Userspace changes for KVM HPET (v4)
The big change here is handling of enabling/disabling of hpet legacy mode. When hpet enters legacy mode, the spec says that the pit stops generating interrupts. In practice, we want to stop the pit periodic timer from running because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in linux, happens only during a shutdown or crash). At this point, according to the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT timer needs to be restarted.

This patch handles this situation better than the previous version by coming closer to just disabling PIT interrupts. It allows the PIT state to change if the OS modifies it, even while PIT is disabled, but does not allow a pit timer to start. Then if HPET legacy mode is disabled, whatever the PIT state is at that point, the PIT timer is restarted accordingly.

Signed-off-by: Beth Kon
---
 hw/hpet.c | 15 +++
 hw/i8254.c | 43 ++-
 hw/i8254.h | 2 ++
 hw/pc.h | 4 ++--
 kvm/include/x86/asm/kvm.h | 1 +
 qemu-kvm.c | 20
 qemu-kvm.h | 3 ++-
 vl.c | 7 ++-
 8 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..043b92b 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_timer(f, s->timer[i].qemu_timer);
         }
     }
+    if (hpet_in_legacy_mode()) {
+        hpet_disable_pit();
+    }
     return 0;
 }
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
             }
             /* i8254 and RTC are disabled when HPET is in legacy mode */
             if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                hpet_pit_disable();
+                hpet_disable_pit();
+                dprintf("qemu: hpet disabled pit\n");
             } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                hpet_pit_enable();
+                hpet_enable_pit();
+                dprintf("qemu: hpet enabled pit\n");
             }
             break;
         case HPET_CFG + 4:
@@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) {
     /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
     s->capability = 0x8086a201ULL;
     s->capability |= ((HPET_CLK_PERIOD) << 32);
-    if (count > 0)
+    if (count > 0) {
         /* we don't enable pit when hpet_reset is first called (by hpet_init)
          * because hpet is taking over for pit here. On subsequent invocations,
          * hpet_reset is called due to system reset. At this point control must
          * be returned to pit until SW reenables hpet.
          */
-        hpet_pit_enable();
+        hpet_enable_pit();
+        dprintf("qemu: hpet enabled pit\n");
+    }
     count = 1;
 }
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..8c8076f 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 //#define DEBUG_PIT
@@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel)
 static inline void pit_load_count(PITChannelState *s, int val)
 {
+    if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+        return;
+    }
     if (val == 0)
         val = 0x1;
     s->count_load_time = qemu_get_clock(vm_clock);
@@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time)
            (double)(expire_time - current_time) / ticks_per_sec);
 #endif
     s->next_transition_time = expire_time;
-    if (expire_time != -1)
+    if (expire_time != -1) {
         qemu_mod_timer(s->irq_timer, expire_time);
-    else
+    } else {
         qemu_del_timer(s->irq_timer);
+    }
 }
 static void pit_irq_timer(void *opaque)
@@ -451,6 +456,7 @@ void pit_reset(void *opaque)
     PITChannelState *s;
     int i;
+    pit->hpet_legacy_mode = 0;
     for(i = 0;i < 3; i++) {
         s = &pit->channels[i];
         s->mode = 3;
@@ -460,32 +466,43 @@ void pit_reset(void *opaque)
 }
 /* When HPET is operating in legacy mode, i8254 timer0 is disabled */
-void hpet_pit_disable(void) {
-    PITChannelState *s;
-    s = &pit_state.channels[0];
-    if (s->irq_timer)
-        qemu_del_timer(s->irq_timer);
+
+void hpet_disable_pit(void)
+{
+    PITChannelState *s = &pit_state.channels[0];
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_hpet_disable_kpit();
+    } else {
+        if (s->irq_timer) {
+            qemu_del_time
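The legacy-mode behaviour described in the text of this patch (PIT state may still change while HPET owns the timer interrupt, but no PIT timer may start; on leaving legacy mode the PIT restarts from whatever state it has) can be modelled in a few lines. This is a simplified sketch of that description only, not the QEMU or KVM code: struct pit_model and both function names are hypothetical, and the real patch differs in detail (for example, it filters channel 0 writes inside pit_load_count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal model of PIT channel 0 as seen by the HPET. */
struct pit_model {
    bool     hpet_legacy_mode;  /* true while HPET owns the timer interrupt */
    bool     timer_running;     /* true while the PIT periodic timer is armed */
    uint8_t  mode;              /* last mode programmed by the guest */
    uint32_t count;             /* last count programmed by the guest */
};

/* Guest programs channel 0: state is always recorded, but the timer is
 * only armed when HPET is not in legacy mode. */
static void pit_write_count(struct pit_model *p, uint8_t mode, uint32_t count)
{
    assert(p);
    p->mode = mode;
    p->count = count;
    if (!p->hpet_legacy_mode)
        p->timer_running = true;
}

/* HPET enters or leaves legacy mode. */
static void hpet_set_legacy(struct pit_model *p, bool enable)
{
    assert(p);
    if (enable && !p->hpet_legacy_mode) {
        p->hpet_legacy_mode = true;
        p->timer_running = false;   /* stop the wasteful periodic timer */
    } else if (!enable && p->hpet_legacy_mode) {
        p->hpet_legacy_mode = false;
        p->timer_running = true;    /* restart using the latest mode/count */
    }
}
```

The point of the model is that disabling is treated as masking the timer rather than discarding guest writes, so the shutdown or crash path that turns legacy mode off finds the PIT in the state the guest last asked for.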
[PATCH 5/5] HPET interaction with in-kernel PIT
Signed-off-by: Beth Kon
---
 arch/x86/include/asm/kvm.h | 1 +
 arch/x86/kvm/i8254.c | 24 +++-
 arch/x86/kvm/i8254.h | 3 ++-
 arch/x86/kvm/x86.c | 5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 struct kvm_pit_state {
     struct kvm_pit_channel_state channels[3];
+    u8 hpet_legacy_mode;
 };
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
     }
 }
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+    u8 saved_mode;
     mutex_lock(&kvm->arch.vpit->pit_state.lock);
-    pit_load_count(kvm, channel, val);
+    if (hpet_legacy_start) {
+        /* save existing mode for later reenablement */
+        saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+        kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+        pit_load_count(kvm, channel, val);
+        kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+    } else {
+        if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+            pit_load_count(kvm, channel, val);
+    }
     mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
         switch (s->write_state) {
         default:
         case RW_STATE_LSB:
-            pit_load_count(kvm, addr, val);
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, val);
             break;
         case RW_STATE_MSB:
-            pit_load_count(kvm, addr, val << 8);
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, val << 8);
             break;
         case RW_STATE_WORD0:
             s->write_latch = val;
             s->write_state = RW_STATE_WORD1;
             break;
         case RW_STATE_WORD1:
-            pit_load_count(kvm, addr, s->write_latch | (val << 8));
+            if (!(addr == 0 && pit_state->hpet_legacy_mode))
+                pit_load_count(kvm, addr, s->write_latch | (val << 8));
             s->write_state = RW_STATE_WORD0;
             break;
         }
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
     struct kvm_kpit_channel_state *c;
     mutex_lock(&pit->pit_state.lock);
+    pit->pit_state.hpet_legacy_mode = 0;
     for (i = 0; i < 3; i++) {
         c = &pit->pit_state.channels[i];
         c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 struct kvm_kpit_state {
     struct kvm_kpit_channel_state channels[3];
+    u8 hpet_legacy_mode;
     struct kvm_timer pit_timer;
     bool is_periodic;
     u32 speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK 0x3
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
     int r = 0;
+    int hpet_legacy_start = 0;
+    if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+        hpet_legacy_start = 1;
     memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-    kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+    kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_leg
[PATCH 2/5] Userspace changes for configuring irq0->inti2 override (v4)
Signed-off-by: Beth Kon
---
 hw/ioapic.c | 6 +++---
 hw/pc.c | 2 ++
 qemu-kvm-x86.c | 6 +-
 qemu-kvm.h | 2 ++
 sysemu.h | 1 +
 vl.c | 11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2. GSI maps to ioapic 1-1. This is not
      * the cleanest way of doing it but it should work. */
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 #define MAX_IDE_BUS 2
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables, acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
             return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_DEVICE);
-    if (kvm_enabled())
-        kvm_init_ap();
+    if (kvm_enabled()) {
+        kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
     machine->init(ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
[PATCH 3/5] BIOS changes for KVM HPET (v5)
Signed-off-by: Beth Kon
---
 kvm/bios/acpi-dsdt.dsl | 2 --
 kvm/bios/rombios32.c | 11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
         })
     }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     Device(HPET) {
         Name(_HID, EISAID("PNP0103"))
         Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
         })
     }
 #endif
-#endif
 }
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 /*
- * * HPET Description Table
- * */
+ * HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF /* ACPI common table header */
     uint32_t timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
     addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
     /* RSDP */
     memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
     }
     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1);
-#endif
     acpi_additional_tables(); /* resets cfg to required entry */
     for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
     /* kvm has no ssdt (processors are in dsdt) */
 // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-// rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif
[PATCH 1/5] BIOS changes for configuring irq0->inti2 override (v4)
These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm. Override and HPET changes are sent as a series because HPET depends on the override.

Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS. And the HPET spec states that in legacy mode, timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case).

Changes from v3:
- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best approximates a multiplexer that disables PIT interrupts when HPET is in legacy mode (as described by HPET spec). Any changes to the PIT that may occur while HPET is operating in legacy mode are saved, so if HPET leaves legacy mode, the PIT is just reenabled, with mode set to whatever the last setting from guest was. Legacy mode is disabled at least during crash and shutdown (in Linux), so this needs to be handled properly.
--- kvm/bios/rombios32.c | 60 - 1 files changed, 44 insertions(+), 16 deletions(-) diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 369cbef..9d6910e 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -444,6 +444,9 @@ uint32_t cpuid_features; uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; +#ifdef BX_QEMU +uint8_t irq0_override; +#endif #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_ARCH_LOCAL 0x8000 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0) #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) +#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2) int qemu_cfg_port; @@ -553,6 +557,17 @@ uint64_t qemu_cfg_get64 (void) } #endif +#ifdef BX_QEMU +void irq0_override_probe(void) +{ +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +} +#endif + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1195,6 +1210,13 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +#ifdef BX_QEMU +/* One entry per ioapic interrupt destination. Destination 2 is covered + * by irq0->inti2 override (i == 0). 
Source IRQ 2 is unused + */ +if (irq0_override && i == 2) +continue; +#endif putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1202,7 +1224,12 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +#ifdef BX_QEMU +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +#endif +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1758,23 +1785,21 @@ void acpi_bios_init(void) io_apic->io_apic_id = smp_cpus; io_apic->address = cpu_to_le32(0xfec0); io_apic->interrupt = cpu_to_le32(0); -#ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM io_apic++; - -int_override = (void *)io_apic; -int_override->type = APIC_XRUPT_OVERRIDE; -int_override->length = sizeof(*int_override); -int_override->bus = cpu_to_le32(0); -int_override->source = cpu_to_le32(0); -int_override->gsi = cpu_to_le32(2); -int_override->flags = cpu_to_le32(0); -#endif +int_override = (struct madt_int_override*)(io_apic); +#ifdef BX_QEMU +if (irq0_override) { +memset(int_override, 0, sizeof(*int_override)); +int_override->type = APIC_XRUPT_OVERRIDE; +int_override->length = sizeof(*int_override); +int_override->source = 0; +int_override->gsi = 2; +int_override->flags = 0; /* conforms to bus specifications */ +int_override++; +} #endif - -int_override = (struct madt_int_override*)(io_apic + 1); -for ( i = 0; i < 16; i++ ) { -if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +for (i = 0; i < 16; i++) { +if (PCI_ISA_IRQ_MASK & (1U << i)) { memset(int_override, 0, sizeof(*int_override)); int_override->type = APIC_XRUPT_OVERRIDE; int_override->length = sizeof(*int_override); @@ -2697,6 +2722,9 @@ void rombios32_init(uint32_t
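The MP-table part of this patch boils down to a small mapping from ISA source IRQ to I/O APIC input pin: with the override active, IRQ0 is routed to pin 2 and no entry is emitted for source IRQ 2; otherwise the mapping is 1:1. A minimal sketch of that rule, assuming a hypothetical helper name `isa_irq_to_ioapic_pin` and a return value of -1 to mean "emit no entry" (neither is in the patch):

```c
/* Hypothetical helper (not in rombios32.c) mirroring the loop body the
 * patch adds to mptable_init().
 *
 * Returns the I/O APIC input pin for ISA IRQ `irq`, or -1 when no MP
 * table entry should be emitted for that source IRQ. */
static int isa_irq_to_ioapic_pin(int irq, int irq0_override)
{
    if (irq0_override && irq == 2)
        return -1;      /* pin 2 is claimed by IRQ0; source IRQ 2 unused */
    if (irq0_override && irq == 0)
        return 2;       /* the irq0->inti2 override */
    return irq;         /* identity mapping for everything else */
}
```

With the override enabled the table therefore has 15 ISA entries rather than 16, and each I/O APIC pin appears as a destination at most once.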
qemu-kvm broken after ./configure --disable-kvm
Building latest git with ./configure --disable-kvm breaks with errors in pcspk.c
[PATCH 1/2] Clean up MADT Table Creation (v2)
This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. There has been much ado over the acpi_bios_init function recently. I had actually done a rewrite very similar to Gleb's, but Avi argued that the rewrite has to be more incremental. This patch contains minimal changes without any rewrite because the changes are kvm-only. The rewrite would better be a separate step, submitted to qemu and then merged into kvm. I am submitting the RSDT fix to kvm because the kvm and qemu RSDT implementation differs. Again, as a separate rewrite effort, the kvm and qemu RSDT manipulation could be merged into one base as a later, separate step. This patch will get MADT into reasonable enough shape for me to resubmit hpet patches on top of it. After that, I'd be willing to submit incremental rewrite patches for acpi_bios_init to qemu, starting with MADT and RSDT. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 369cbef..cdae363 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -86,6 +86,8 @@ typedef unsigned long long uint64_t; #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg)) #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1) +#define MAX_INT_OVERRIDES 16 + static inline void outl(int addr, int val) { asm volatile ("outl %1, %w0" : : "d" (addr), "a" (val)); @@ -1600,7 +1602,7 @@ void acpi_bios_init(void) uint32_t hpet_addr; #endif uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr; -uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size; +uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end; uint32_t srat_addr,srat_size; uint16_t i, external_tables; int nb_numa_nodes; @@ -1668,7 +1670,7 @@ void acpi_bios_init(void) madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU -sizeof(struct madt_io_apic) /* + 
sizeof(struct madt_int_override) */; +sizeof(struct madt_io_apic) + sizeof(struct madt_int_override) * MAX_INT_OVERRIDES; #else sizeof(struct madt_io_apic); #endif @@ -1786,8 +1788,9 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } +madt_end = (uint32_t)int_override; +madt_size = madt_end - madt_addr; acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
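The approach in this version is to reserve worst-case space for the override entries at layout time, then derive the real table size from the end pointer once the entries are written. The sketch below illustrates that pattern in isolation. `struct int_override` is a simplified stand-in for `struct madt_int_override`, and `overrides_reserved` and `fill_overrides` are hypothetical names; only `MAX_INT_OVERRIDES` comes from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INT_OVERRIDES 16

/* Simplified stand-in for struct madt_int_override; the real layout in
 * rombios32.c has more fields. */
struct int_override {
    uint8_t type;
    uint8_t length;
    uint8_t source;
    uint8_t gsi;
};

/* Worst-case space to reserve when the tables are laid out. */
static size_t overrides_reserved(void)
{
    return sizeof(struct int_override) * MAX_INT_OVERRIDES;
}

/* Emit one entry per set bit in irq_mask (16 ISA IRQs) and return the
 * number of bytes actually used, computed from the end pointer rather
 * than from a counter that is bumped inside the loop. */
static size_t fill_overrides(struct int_override *first, uint16_t irq_mask)
{
    struct int_override *cur = first;
    int i;

    for (i = 0; i < 16; i++) {
        if (!(irq_mask & (1U << i)))
            continue;
        memset(cur, 0, sizeof(*cur));
        cur->type = 2;              /* ACPI MADT interrupt source override */
        cur->length = sizeof(*cur);
        cur->source = (uint8_t)i;
        cur->gsi = (uint8_t)i;
        cur++;
    }
    return (size_t)((uint8_t *)cur - (uint8_t *)first);
}
```

Because the reservation never changes after layout, tables placed after the MADT cannot be overwritten by entries added later; the only cost is some unused space when fewer than `MAX_INT_OVERRIDES` entries are written.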
[PATCH 2/2] Clean up RSDT Table Creation (v2)
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES. I moved the for() loop to the end of the code that adds table_offset_entry entries so I could add the check for overflow - || (nb_rsdt_entries > MAX_RSDT_ENTRIES) This is not ideal. An ideal fix would require a rewrite of the rsdt build code, which I can do later and submit to qemu. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cdae363..7db91d8 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1602,7 +1602,7 @@ void acpi_bios_init(void) uint32_t hpet_addr; #endif uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr; -uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end; +uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end, rsdt_end; uint32_t srat_addr,srat_size; uint16_t i, external_tables; int nb_numa_nodes; @@ -1628,7 +1628,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1872,16 +1872,6 @@ void acpi_bios_init(void) "HPET", sizeof(*hpet), 1); #endif -acpi_additional_tables(); /* resets cfg to required entry */ -for(i = 0; i < external_tables; i++) { -uint16_t len; -if(acpi_load_table(i, addr, &len) < 0) -BX_PANIC("Failed to load ACPI table from QEMU\n"); -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); -addr += len; -if(addr >= ram_size) -BX_PANIC("ACPI table overflow\n"); -} #endif /* RSDT */ @@ -1894,9 +1884,19 @@ void acpi_bios_init(void) // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); if (nb_numa_nodes > 0) rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr); 
+acpi_additional_tables(); /* resets cfg to required entry */ +for(i = 0; i < external_tables; i++) { +uint16_t len; +if(acpi_load_table(i, addr, &len) < 0) +BX_PANIC("Failed to load ACPI table from QEMU\n"); +rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); +addr += len; +if ((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES)) +BX_PANIC("ACPI table overflow\n"); +} #endif -rsdt_size -= MAX_RSDT_ENTRIES * 4; -rsdt_size += nb_rsdt_entries * 4; +rsdt_end = (uint32_t)(&rsdt->table_offset_entry[nb_rsdt_entries]); +rsdt_size = rsdt_end - rsdt_addr; acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", rsdt_size, 1); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
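The overflow check in this patch runs after the entry has been stored. A variant that tests the bound before the store is sketched here for comparison; it is not what the patch does. `struct rsdt_sketch`, `rsdt_add_entry`, `rsdt_entries_size`, and the value chosen for `MAX_RSDT_ENTRIES` are all assumptions made for the example.

```c
#include <stdint.h>

#define MAX_RSDT_ENTRIES 8   /* illustrative value, not taken from rombios32.c */

/* Simplified stand-in for the entry array of the RSDT. */
struct rsdt_sketch {
    uint32_t table_offset_entry[MAX_RSDT_ENTRIES];
    int nb_entries;
};

/* Append one table address. Returns 0 on success, or -1 if the array is
 * already full; the bound is checked before anything is written. */
static int rsdt_add_entry(struct rsdt_sketch *r, uint32_t table_addr)
{
    if (r->nb_entries >= MAX_RSDT_ENTRIES)
        return -1;
    r->table_offset_entry[r->nb_entries++] = table_addr;
    return 0;
}

/* Bytes of the entry array in use, derived from the address just past
 * the last filled slot (the same idea as rsdt_end - rsdt_addr). */
static uint32_t rsdt_entries_size(const struct rsdt_sketch *r)
{
    return (uint32_t)((const uint8_t *)&r->table_offset_entry[r->nb_entries]
                      - (const uint8_t *)r->table_offset_entry);
}
```

In the BIOS the caller would turn a -1 into a `BX_PANIC`, as the patch does for its own check.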
Re: hpet missing in qemu-kvm's acpi table
Jan Kiszka wrote:
Hi,
does qemu-kvm's bios intentionally refrain from reporting hpet support via acpi or is this a bug? It works nicely with upstream (tcg & kvm mode).
Jan

Hi Jan. HPET is not in qemu-kvm yet because there are some issues unique to qemu-kvm regarding disabling of the in-kernel PIT. I have patches ready to submit and should be able to do so next week (long story), so hopefully this will be resolved shortly.
Re: [PATCH 1/2] Clean up MADT Table Creation
Avi Kivity wrote:
Beth Kon wrote:
This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. And to give credit where it is due, this cleanup is also related to the patch Marcelo provided when the HPET addition tripped over the same problem. (Thanks again Marcelo :-)

This patch moves all the table layout calculations to the same area of acpi_bios_init. This prevents corruption problems when, in the middle of filling in the tables, the MADT table size grows. The idea is to do all the layout in one section, then fill things in afterwards. It also corrects a problem where the madt table was memset to 0 before the final size of the table had been determined.

Signed-off-by: Beth Kon

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..7f62e4f 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
     addr = (addr + 7) & ~7;
     madt_addr = addr;
+    madt = (void *)(addr);
     madt_size = sizeof(*madt) +
         sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
         sizeof(struct madt_io_apic);
 #endif
-    madt = (void *)(addr);
+    for ( i = 0; i < 16; i++ ) {
+        if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+            madt_size += sizeof(struct madt_int_override);
+        }
+    }
     addr += madt_size;

You're just duplicating the override creation loop (with its internal if); if we update it, we'll have to update this too.

Yep, that's a valid complaint. I'll resubmit shortly.

Why not set madt_end = int_override and calculate madt_size = madt_end - madt?
Re: qemu-kvm.git regression in configure
Avi Kivity wrote:
Beth Kon wrote:
Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.

Works for me. What error do you get?

./configure: 1364: Syntax error: "(" unexpected (expecting "fi")
qemu-kvm.git regression in configure
Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it. Beth Kon
[PATCH 1/2] Clean up MADT Table Creation
This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. And to give credit where it is due, this cleanup is also related to the patch Marcelo provided when the HPET addition tripped over the same problem. (Thanks again Marcelo :-) This patch moves all the table layout calculations to the same area of acpi_bios_init. This prevents corruption problems when, in the middle of filling in the tables, the MADT table size grows. The idea is to do all the layout in one section, then fill things in afterwards. It also corrects a problem where the madt table was memset to 0 before the final size of the table had been determined. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..7f62e4f 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1665,6 +1665,7 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU @@ -1672,7 +1673,11 @@ void acpi_bios_init(void) #else sizeof(struct madt_io_apic); #endif -madt = (void *)(addr); +for ( i = 0; i < 16; i++ ) { +if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +madt_size += sizeof(struct madt_int_override); +} +} addr += madt_size; #ifdef BX_QEMU @@ -1786,7 +1791,6 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Clean up RSDT Table Creation
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 7f62e4f..ac8f9c5 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1873,16 +1873,6 @@ void acpi_bios_init(void) "HPET", sizeof(*hpet), 1); #endif -acpi_additional_tables(); /* resets cfg to required entry */ -for(i = 0; i < external_tables; i++) { -uint16_t len; -if(acpi_load_table(i, addr, &len) < 0) -BX_PANIC("Failed to load ACPI table from QEMU\n"); -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); -addr += len; -if(addr >= ram_size) -BX_PANIC("ACPI table overflow\n"); -} #endif /* RSDT */ @@ -1895,6 +1885,19 @@ void acpi_bios_init(void) // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); if (nb_numa_nodes > 0) rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr); +acpi_additional_tables(); /* resets cfg to required entry */ +/* external_tables load must occur last to + * properly check for MAX_RSDT_ENTRIES overflow. 
+ */ +for(i = 0; i < external_tables; i++) { +uint16_t len; +if(acpi_load_table(i, addr, &len) < 0) +BX_PANIC("Failed to load ACPI table from QEMU\n"); +rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); +addr += len; +if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES)) +BX_PANIC("ACPI table overflow\n"); +} #endif rsdt_size -= MAX_RSDT_ENTRIES * 4; rsdt_size += nb_rsdt_entries * 4; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Subject:[PATCH 1/2] Clean up MADT Table Creation
Beth Kon wrote:
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES.
Signed-off-by: Beth Kon

This should have been patch 2/2. I think git-send-email didn't like that I didn't have a space after "Subject:". Let me try to resend with the space added.
Subject:[PATCH 1/2] Clean up MADT Table Creation
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 7f62e4f..ac8f9c5 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1873,16 +1873,6 @@ void acpi_bios_init(void) "HPET", sizeof(*hpet), 1); #endif -acpi_additional_tables(); /* resets cfg to required entry */ -for(i = 0; i < external_tables; i++) { -uint16_t len; -if(acpi_load_table(i, addr, &len) < 0) -BX_PANIC("Failed to load ACPI table from QEMU\n"); -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); -addr += len; -if(addr >= ram_size) -BX_PANIC("ACPI table overflow\n"); -} #endif /* RSDT */ @@ -1895,6 +1885,19 @@ void acpi_bios_init(void) // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); if (nb_numa_nodes > 0) rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr); +acpi_additional_tables(); /* resets cfg to required entry */ +/* external_tables load must occur last to + * properly check for MAX_RSDT_ENTRIES overflow. 
+ */ +for(i = 0; i < external_tables; i++) { +uint16_t len; +if(acpi_load_table(i, addr, &len) < 0) +BX_PANIC("Failed to load ACPI table from QEMU\n"); +rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); +addr += len; +if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES)) +BX_PANIC("ACPI table overflow\n"); +} #endif rsdt_size -= MAX_RSDT_ENTRIES * 4; rsdt_size += nb_rsdt_entries * 4; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Subject:[PATCH 1/2] Clean up MADT Table Creation
This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. And to give credit where it is due, this cleanup is also related to the patch Marcelo provided when the HPET addition tripped over the same problem. (Thanks again Marcelo :-) This patch moves all the table layout calculations to the same area of acpi_bios_init. This prevents corruption problems when, in the middle of filling in the tables, the MADT table size grows. The idea is to do all the layout in one section, then fill things in afterwards. It also corrects a problem where the madt table was memset to 0 before the final size of the table had been determined. Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..7f62e4f 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1665,6 +1665,7 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU @@ -1672,7 +1673,11 @@ void acpi_bios_init(void) #else sizeof(struct madt_io_apic); #endif -madt = (void *)(addr); +for ( i = 0; i < 16; i++ ) { +if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +madt_size += sizeof(struct madt_int_override); +} +} addr += madt_size; #ifdef BX_QEMU @@ -1786,7 +1791,6 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
Marcelo Tosatti wrote:
Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
Anthony Liguori wrote:
Vincent Minet wrote:
External ACPI tables are counted twice for the RSDT size and the load address for the first external table is in the MADT (interrupt override entries are overwritten).
Signed-off-by: Vincent Minet

Beth, I think you had a patch attempting to address the same issue. It was a bit more involved though. Which is the proper fix and are they both to the same problem?

They are for 2 different bases. My patch was for qemu's bochs bios and this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in this area of setting up the ACPI tables. My patch is still needed for the qemu base. I hope we'll be getting to one base soon :-)

Assuming the intent of the code was for MAX_RSDT_ENTRIES to include external_tables, this patch looks correct. I think one additional check would be needed (in my patch) to make sure that the code doesn't exceed MAX_RSDT_ENTRIES when the external tables are being loaded.

My patch also puts all the code that calculates madt_size in the same place, at the beginning of the table layout. I believe this is neater and will avoid problems like this one in the future. As much as possible, I think it best to get all the tables laid out, then fill them in. If for some reason this is not acceptable, we need to add a big note that no tables should be laid out after the madt because the madt may grow further down in the code and overwrite the other table.

I like this better too, see questions/comments below.
Regards, Anthony Liguori --- kvm/bios/rombios32.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..289361b 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1787,6 +1787,7 @@ void acpi_bios_init(void) } int_override++; madt_size += sizeof(struct madt_int_override); +addr += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..23835b6 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1665,6 +1665,7 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU @@ -1672,7 +1673,11 @@ void acpi_bios_init(void) #else sizeof(struct madt_io_apic); #endif -madt = (void *)(addr); +for ( i = 0; i < 16; i++ ) { +if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +madt_size += sizeof(struct madt_int_override); +} +} addr += madt_size; This bug could only affect the HPET descriptor right? I'm not sure what you're asking. There were 2 bugs that Vincent pointed out. The first caused an incorrect rsdt_size to be reported, and the second (missing addr += sizeof(struct madt_int_override)) caused corruption of whatever came after the MADT. 
But even if his patch were applied, any future code that added a table and manipulated addr between the following points: ... (about line 1676) madt = (void *)(addr); addr += madt_size; ... (about line 1789) madt_size += sizeof(struct madt_int_override); addr += sizeof(struct madt_int_override); would have wound up causing some kind of corruption, as happened with the HPET. Also the "memset(madt, 0, madt_size)" around line 1740 was not using the complete madt_size. So this seems undesirable, and that's why I suggested moving all addr manipulation (with the exception of additional tables at the very end) to the same section of the table layout code. Seems best to manage madt_size all in one place. #ifdef BX_QEMU @@ -1786,7 +1791,6 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1);
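The corruption described in this message can be reproduced with plain address arithmetic: table A is placed at a running address, table B is placed right after it, and A's size is then increased while it is being filled in. The sketch below models only the arithmetic; `struct region`, `regions_overlap`, and `late_growth_overlaps` are hypothetical names, and the sizes used in any call are made up.

```c
#include <stdint.h>

/* A half-open address range [start, start + size). */
struct region {
    uint32_t start;
    uint32_t size;
};

/* Nonzero if the two ranges share at least one byte. */
static int regions_overlap(struct region a, struct region b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Lay out table A, then table B directly behind it, then let A grow by
 * a_growth bytes after B has already been placed. Returns nonzero if A
 * now runs into B, which is the situation the HPET table ended up in. */
static int late_growth_overlaps(uint32_t base, uint32_t a_initial,
                                uint32_t a_growth, uint32_t b_size)
{
    uint32_t addr = base;
    struct region a, b;

    a.start = addr;
    a.size = a_initial;
    addr += a.size;

    b.start = addr;         /* B placed using A's initial size */
    b.size = b_size;

    a.size += a_growth;     /* A grows while being filled in */
    return regions_overlap(a, b);
}
```

Any nonzero growth after B is placed produces an overlap, which is why computing the final `madt_size` before advancing `addr` (or reserving the maximum up front) removes the problem.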
Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
Anthony Liguori wrote:
Vincent Minet wrote:
External ACPI tables are counted twice for the RSDT size and the load address for the first external table is in the MADT (interrupt override entries are overwritten).
Signed-off-by: Vincent Minet

Beth, I think you had a patch attempting to address the same issue. It was a bit more involved though. Which is the proper fix and are they both to the same problem?

They are for 2 different bases. My patch was for qemu's bochs bios and this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in this area of setting up the ACPI tables. My patch is still needed for the qemu base. I hope we'll be getting to one base soon :-)

Assuming the intent of the code was for MAX_RSDT_ENTRIES to include external_tables, this patch looks correct. I think one additional check would be needed (in my patch) to make sure that the code doesn't exceed MAX_RSDT_ENTRIES when the external tables are being loaded.

My patch also puts all the code that calculates madt_size in the same place, at the beginning of the table layout. I believe this is neater and will avoid problems like this one in the future. As much as possible, I think it best to get all the tables laid out, then fill them in. If for some reason this is not acceptable, we need to add a big note that no tables should be laid out after the madt because the madt may grow further down in the code and overwrite the other table.
Regards, Anthony Liguori --- kvm/bios/rombios32.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..289361b 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1787,6 +1787,7 @@ void acpi_bios_init(void) } int_override++; madt_size += sizeof(struct madt_int_override); +addr += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..23835b6 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) addr = base_addr = ram_size - ACPI_DATA_SIZE; rsdt_addr = addr; rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; +rsdt_size = sizeof(*rsdt); addr += rsdt_size; fadt_addr = addr; @@ -1665,6 +1665,7 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + #ifdef BX_QEMU @@ -1672,7 +1673,11 @@ void acpi_bios_init(void) #else sizeof(struct madt_io_apic); #endif -madt = (void *)(addr); +for ( i = 0; i < 16; i++ ) { +if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +madt_size += sizeof(struct madt_int_override); +} +} addr += madt_size; #ifdef BX_QEMU @@ -1786,7 +1791,6 @@ void acpi_bios_init(void) continue; } int_override++; -madt_size += sizeof(struct madt_int_override); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); @@ 
-1868,17 +1872,6 @@ void acpi_bios_init(void) acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1); #endif - -acpi_additional_tables(); /* resets cfg to required entry */ -for(i = 0; i < external_tables; i++) { -uint16_t len; -if(acpi_load_table(i, addr, &len) < 0) -BX_PANIC("Failed to load ACPI table from QEMU\n"); -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); -addr += len; -if(addr >= ram_size) -BX_PANIC("ACPI table overflow\n"); -} #endif /* RSDT */ @@ -1891,6 +1884,16 @@ void acpi_bios_init(void) // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); if (nb_numa_nodes > 0) rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr); +acpi_additional_tables(); /* resets cfg to required entry */ +for(i = 0; i < external_tables; i++) { +uint16_t len; +if(acpi_load_table(i, addr, &len) < 0) +BX_PANIC("Failed to load ACPI table from QEMU\n"); +rsdt->table_offset_entry[
Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)
Beth Kon wrote:
Avi Kivity wrote:
Beth Kon wrote:
Signed-off-by: Beth Kon

diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
     return 0;
 }
+static void hpet_legacy_enable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_disable();
+        dprintf("qemu: hpet disabled kernel pit\n");
+    } else {
+        hpet_pit_disable();
+        dprintf("qemu: hpet disabled userspace pit\n");
+    }
+}
+
+static void hpet_legacy_disable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_enable();
+        dprintf("qemu: hpet enabled kernel pit\n");
+    } else {
+        hpet_pit_enable();
+        dprintf("qemu: hpet enabled userspace pit\n");
+    }
+}

I think it's better to move these into hpet_pit_enable() and hpet_pit_disable(). This avoids changing the calls below, and puts pit stuff in i8254.c instead of hpet.c.

Might also need to be called from hpet_load(); probably a problem in upstream as well.

My assumption about hpet_load was that the correct pit state would be established via pit_load (since all saves/loads are done together). But when I wrote this, I was thinking only about the userspace pit (for qemu). I'm not sure how the "load" concept applies to kernel state. Do I need to explicitly re-enable or disable the kernel pit during load?

Looking further at the code, it looks like kvm_pit_load should take care of this. Agree?
Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)
Avi Kivity wrote:
Beth Kon wrote:
Signed-off-by: Beth Kon

diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
     return 0;
 }
+static void hpet_legacy_enable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_disable();
+        dprintf("qemu: hpet disabled kernel pit\n");
+    } else {
+        hpet_pit_disable();
+        dprintf("qemu: hpet disabled userspace pit\n");
+    }
+}
+
+static void hpet_legacy_disable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_enable();
+        dprintf("qemu: hpet enabled kernel pit\n");
+    } else {
+        hpet_pit_enable();
+        dprintf("qemu: hpet enabled userspace pit\n");
+    }
+}

I think it's better to move these into hpet_pit_enable() and hpet_pit_disable(). This avoids changing the calls below, and puts pit stuff in i8254.c instead of hpet.c.

Might also need to be called from hpet_load(); probably a problem in upstream as well.

My assumption about hpet_load was that the correct pit state would be established via pit_load (since all saves/loads are done together). But when I wrote this, I was thinking only about the userspace pit (for qemu). I'm not sure how the "load" concept applies to kernel state. Do I need to explicitly re-enable or disable the kernel pit during load?
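Avi's suggestion amounts to keeping the kernel-versus-userspace decision inside the PIT enable/disable entry points, so that hpet.c keeps calling the same two functions regardless of where the PIT lives. The following is a rough standalone model of that shape, not the code that was eventually merged. Every name ending in `_sketch` or `_stub`, and the three state flags, are assumptions standing in for the real qemu-kvm functions and device state.

```c
#include <stdbool.h>

/* Stand-ins for real state: whether the PIT is emulated in the kernel,
 * and whether each PIT implementation is currently delivering ticks. */
static bool pit_in_kernel;
static bool kernel_pit_enabled = true;
static bool user_pit_enabled = true;

/* Stub for qemu_kvm_pit_in_kernel(). */
static bool qemu_kvm_pit_in_kernel_stub(void)
{
    return pit_in_kernel;
}

/* Called when the HPET enters legacy mode: silence whichever PIT is
 * active. The caller does not need to know which one that is. */
static void hpet_pit_disable_sketch(void)
{
    if (qemu_kvm_pit_in_kernel_stub())
        kernel_pit_enabled = false;   /* would be kvm_kpit_disable() */
    else
        user_pit_enabled = false;     /* existing userspace path */
}

/* Called when the HPET leaves legacy mode: re-enable the active PIT. */
static void hpet_pit_enable_sketch(void)
{
    if (qemu_kvm_pit_in_kernel_stub())
        kernel_pit_enabled = true;    /* would be kvm_kpit_enable() */
    else
        user_pit_enabled = true;
}
```

With this arrangement the separate `hpet_legacy_enable()` and `hpet_legacy_disable()` wrappers from the patch become unnecessary, since the existing call sites already invoke the PIT enable/disable functions.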
Re: [PATCH 1/4] BIOS changes for configuring irq0->inti2 override(v3)
Gleb Natapov wrote:
On Mon, May 11, 2009 at 01:29:43PM -0400, Beth Kon wrote:

Signed-off-by: Beth Kon

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..53359b8 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -444,6 +444,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2)
 int qemu_cfg_port;
@@ -553,6 +557,18 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+        return;
+    }
+    memset(&irq0_override, 0, 1);
+}

Why memset and not irq0_override = 0? Actually, it should be zero already.

This was an oversight, left over from some early cut-and-paste coding I was doing. You're right - not necessary. Thanks.

-- Gleb.
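To illustrate the point about the memset, here is a minimal sketch, not the BIOS code itself. The names fake_cfg_set and cfg_read_byte and the two fake_* variables are hypothetical stand-ins for the real qemu_cfg accessors. It relies only on the C rule that file-scope objects without an initializer start out as zero, and it assumes the probe runs once at boot.

```c
#include <stdint.h>

/* File-scope object with static storage duration: C guarantees it starts at 0,
 * so no memset or explicit assignment is needed for the "no config port" case. */
uint8_t irq0_override;

/* Hypothetical stand-ins for the firmware config interface (not the real API). */
static int     fake_cfg_port_present;  /* nonzero when the config port was detected */
static uint8_t fake_cfg_value;         /* byte the host side would hand back */

void fake_cfg_set(int present, uint8_t value)
{
    fake_cfg_port_present = present;
    fake_cfg_value = value;
}

static void cfg_read_byte(uint8_t *dst)
{
    *dst = fake_cfg_value;
}

/* Probe as suggested in the review: read the flag when the port exists,
 * otherwise leave the zero-initialized default untouched. */
void irq0_override_probe(void)
{
    if (fake_cfg_port_present)
        cfg_read_byte(&irq0_override);
}
```

If the probe could run more than once, an explicit irq0_override = 0 in the fallback path would still be the simpler choice over memset.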
Re: [PATCH 2/4] Userspace changes for configuring irq0->inti2override (v3)
Gleb Natapov wrote: On Mon, May 11, 2009 at 01:29:44PM -0400, Beth Kon wrote: Signed-off-by: Beth Kon diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index e1b19d7..bb74f38 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic); fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override); It is read as 1 byte by the BIOS, but it is 2 bytes here. And arch specific config should be registered in arch specific place (hw/pc.c) ok. register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s); qemu_register_reset(fw_cfg_reset, s); diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index f616ed2..1de7360 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -19,6 +19,7 @@ #define FW_CFG_WRITE_CHANNEL0x4000 #define FW_CFG_ARCH_LOCAL 0x8000 +#define FW_CFG_IRQ0_OVERRIDE(FW_CFG_ARCH_LOCAL + 2) This should go to hw/pc.c ok. #define FW_CFG_ENTRY_MASK ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) #define FW_CFG_INVALID 0x diff --git a/hw/ioapic.c b/hw/ioapic.c index 0b70cf6..2d77a2c 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -23,6 +23,7 @@ #include "hw.h" #include "pc.h" +#include "sysemu.h" #include "qemu-timer.h" #include "host-utils.h" @@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level) { IOAPICState *s = opaque; -#if 0 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps * to GSI 2. GSI maps to ioapic 1-1. This is not * the cleanest way of doing it but it should work. 
*/ -if (vector == 0) +if (vector == 0 && irq0override) { vector = 2; -#endif +} if (vector >= 0 && vector < IOAPIC_NUM_PINS) { uint32_t mask = 1 << vector; diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 8cb6faa..2e52c87 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void) return r; } for (i = 0; i < 24; ++i) { -r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +if (i == 0) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2); +} else if (i != 2) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +} There is no entry for IRQ2, is this OK? What happens if IRQ2 triggers? Answered in separate email. if (r < 0) return r; } diff --git a/qemu-kvm.h b/qemu-kvm.h index dd045dd..6a1968a 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env); #define kvm_enabled() (kvm_allowed) #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context) #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context) +#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context) #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu() void kvm_init_vcpu(CPUState *env); void kvm_load_tsc(CPUState *env); @@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env); #define kvm_nested 0 #define qemu_kvm_irqchip_in_kernel() (0) #define qemu_kvm_pit_in_kernel() (0) +#define qemu_kvm_has_gsi_routing() (0) #define kvm_has_sync_mmu() (0) #define kvm_load_registers(env) do {} while(0) #define kvm_save_registers(env) do {} while(0) diff --git a/sysemu.h b/sysemu.h index 1f45fd6..292bbc3 100644 --- a/sysemu.h +++ b/sysemu.h @@ -93,6 +93,7 @@ extern int graphic_width; extern int graphic_height; extern int graphic_depth; extern int nographic; +extern int irq0override; extern const char *keyboard_layout; extern int win2k_install_hack; extern int rtc_td_hack; diff --git a/vl.c b/vl.c index d9f0607..0bffc82 100644 --- a/vl.c +++ b/vl.c @@ -207,6 +207,7 @@ static int 
vga_ram_size; enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; static DisplayState *display_state; int nographic; +int irq0override; static int curses; static int sdl; const char* keyboard_layout = NULL; @@ -5035,6 +5036,7 @@ int main(int argc, char **argv, char **envp) vga_ram_size = VGA_RAM_SIZE; snapshot = 0; nographic = 0; +irq0override = 1; Why not do that when defining the variable? Yeah I realize this is how it is done for other variables too, but why? Good question. I don't think there is any good reason. I was conforming to the existing style. curses = 0; kernel_filename = NULL; kernel_cmdline = ""; @@ -6129,8 +6131,14 @@ int main(int argc, char **argv, char **envp) } } -if (kvm_enabled()) - kvm_init_ap(); +if (kvm_enabled()) { + kvm_init_ap(); +#ifdef USE_KVM +if (kvm_irqc
Re: [PATCH 2/4] Userspace changes for configuring irq0->inti2override (v3)
Gleb Natapov wrote:
On Tue, May 12, 2009 at 01:22:06PM +0300, Avi Kivity wrote:
Gleb Natapov wrote:

 for (i = 0; i < 24; ++i) {
-    r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+    if (i == 0) {
+        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+    } else if (i != 2) {
+        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+    }

There is no entry for IRQ2, is this OK? What happens if IRQ2 triggers?

irq 2 is the PIC cascade interrupt. If it is somehow triggered, the kernel will ignore it.

But here we configure IOAPIC routing. What if the IOAPIC is used for interrupt delivery and something triggers irq2? There is no entry describing it in the IOAPIC routing table, so what gsi will it be mapped to?

--

The ACPI spec states that systems that support both APIC and dual-8259 interrupt models must map system interrupt vectors 0-15 to 8259 IRQs 0-15, except where interrupt source overrides are provided. We provide an irq0->inti2 override, and no irq2 override, so irq2 must be unused.
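The mapping described in this exchange can be written down as a small lookup. This is only a sketch under stated assumptions: isa_irq_to_gsi and GSI_UNUSED are hypothetical names, and the function covers just the 16 ISA IRQs rather than all 24 IOAPIC pins handled by the loop in the patch.

```c
#define NUM_ISA_IRQS 16
#define GSI_UNUSED   (-1)   /* hypothetical marker: source has no IOAPIC route */

/* Return the IOAPIC input (GSI) an ISA IRQ is routed to.
 * Without the override the mapping is the identity. With it, irq0 lands on
 * inti2 and irq2 (the PIC cascade) gets no route at all, because pin 2 is
 * already taken and no separate irq2 override is provided. */
int isa_irq_to_gsi(int irq, int irq0_override)
{
    if (irq < 0 || irq >= NUM_ISA_IRQS)
        return GSI_UNUSED;
    if (!irq0_override)
        return irq;
    if (irq == 0)
        return 2;
    if (irq == 2)
        return GSI_UNUSED;
    return irq;
}
```

With the override active, isa_irq_to_gsi(0, 1) gives 2 and isa_irq_to_gsi(2, 1) gives GSI_UNUSED, which is the "irq2 must be unused" case from the reply above.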
[PATCH 4/4] Userspace changes for KVM HPET (v3)
Signed-off-by: Beth Kon diff --git a/hw/hpet.c b/hw/hpet.c index c7945ec..100abf5 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -30,6 +30,7 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "qemu-kvm.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void) return 0; } +static void hpet_legacy_enable(void) +{ +if (qemu_kvm_pit_in_kernel()) { +kvm_kpit_disable(); +dprintf("qemu: hpet disabled kernel pit\n"); +} else { +hpet_pit_disable(); +dprintf("qemu: hpet disabled userspace pit\n"); +} +} + +static void hpet_legacy_disable(void) +{ +if (qemu_kvm_pit_in_kernel()) { +kvm_kpit_enable(); +dprintf("qemu: hpet enabled kernel pit\n"); +} else { +hpet_pit_enable(); +dprintf("qemu: hpet enabled userspace pit\n"); +} +} + static uint32_t timer_int_route(struct HPETTimer *timer) { uint32_t route; @@ -475,9 +498,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +hpet_legacy_enable(); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +hpet_legacy_disable(); } break; case HPET_CFG + 4: @@ -560,7 +583,7 @@ static void hpet_reset(void *opaque) { * hpet_reset is called due to system reset. At this point control must * be returned to pit until SW reenables hpet. 
*/ -hpet_pit_enable(); +hpet_legacy_disable(); count = 1; } diff --git a/qemu-kvm.c b/qemu-kvm.c index f55cee8..1bb853b 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -463,6 +463,25 @@ void kvm_init_vcpu(CPUState *env) qemu_cond_wait(&qemu_vcpu_cond); } +void kvm_kpit_enable(void) +{ +struct kvm_pit_state ps; +if (qemu_kvm_pit_in_kernel()) { +kvm_get_pit(kvm_context, &ps); +kvm_set_pit(kvm_context, &ps); +} +} + +void kvm_kpit_disable(void) +{ +struct kvm_pit_state ps; +if (qemu_kvm_pit_in_kernel()) { +kvm_get_pit(kvm_context, &ps); +ps.channels[0].mode = 0xff; +kvm_set_pit(kvm_context, &ps); +} +} + int kvm_init_ap(void) { #ifdef TARGET_I386 diff --git a/qemu-kvm.h b/qemu-kvm.h index 6a1968a..13353ec 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -31,6 +31,8 @@ int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap); int kvm_qemu_init_env(CPUState *env); int kvm_qemu_check_extension(int ext); void kvm_apic_init(CPUState *env); +void kvm_kpit_enable(void); +void kvm_kpit_disable(void); int kvm_set_irq(int irq, int level, int *status); int kvm_physical_memory_set_dirty_tracking(int enable); diff --git a/vl.c b/vl.c index 0bffc82..8f120c5 100644 --- a/vl.c +++ b/vl.c @@ -6132,10 +6132,15 @@ int main(int argc, char **argv, char **envp) } if (kvm_enabled()) { - kvm_init_ap(); +kvm_init_ap(); #ifdef USE_KVM if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { irq0override = 0; +/* if kernel can't do irq routing, interrupt source + * override 0->2 can not be set up as required by hpet, + * so disable hpet. + */ +no_hpet=1; } #endif } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
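As a rough model of what the hpet.c part of this patch does, the sketch below shows the legacy-mode switch handing the timer interrupt between the PIT and the HPET, whichever PIT back end is in use. Everything here is hypothetical (struct pit_model, pit_set_ticking, hpet_set_legacy, hpet_reset_model); it does not call any QEMU or KVM interface and only mirrors the control flow.

```c
#include <stdbool.h>

/* Hypothetical model: which PIT back end exists and whether each one ticks. */
struct pit_model {
    bool in_kernel;       /* true: PIT emulated in the kernel, false: in userspace */
    bool kernel_ticking;  /* state of the in-kernel PIT */
    bool user_ticking;    /* state of the userspace PIT */
};

/* The one place that knows about the back end split. */
void pit_set_ticking(struct pit_model *pit, bool on)
{
    if (pit->in_kernel)
        pit->kernel_ticking = on;
    else
        pit->user_ticking = on;
}

/* Entering legacy mode stops the PIT so the HPET owns the timer interrupt;
 * leaving legacy mode gives it back. */
void hpet_set_legacy(struct pit_model *pit, bool legacy_on)
{
    pit_set_ticking(pit, !legacy_on);
}

/* On system reset, control returns to the PIT until software re-enables the HPET. */
void hpet_reset_model(struct pit_model *pit)
{
    hpet_set_legacy(pit, false);
}
```

Keeping the back end check inside one PIT-side helper means the HPET code never has to ask whether the PIT lives in the kernel.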
[PATCH 3/4] BIOS changes for KVM HPET (v3)
Signed-off-by: Beth Kon diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl index c756fed..0e142be 100755 --- a/kvm/bios/acpi-dsdt.dsl +++ b/kvm/bios/acpi-dsdt.dsl @@ -308,7 +308,6 @@ DefinitionBlock ( }) } #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM Device(HPET) { Name(_HID, EISAID("PNP0103")) Name(_UID, 0) @@ -328,7 +327,6 @@ DefinitionBlock ( }) } #endif -#endif } Scope(\_SB.PCI0) { diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 53359b8..df83ee7 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1519,8 +1519,8 @@ struct acpi_20_generic_address { } __attribute__((__packed__)); /* - * * HPET Description Table - * */ + * HPET Description Table + */ struct acpi_20_hpet { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ uint32_t timer_block_id; @@ -1706,13 +1706,11 @@ void acpi_bios_init(void) #endif addr += madt_size; #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM addr = (addr + 7) & ~7; hpet_addr = addr; hpet = (void *)(addr); addr += sizeof(*hpet); #endif -#endif /* RSDP */ memset(rsdp, 0, sizeof(*rsdp)); @@ -1884,7 +1882,6 @@ void acpi_bios_init(void) } /* HPET */ -#ifdef HPET_WORKS_IN_KVM memset(hpet, 0, sizeof(*hpet)); /* Note timer_block_id value must be kept in sync with value advertised by * emulated hpet @@ -1893,7 +1890,6 @@ void acpi_bios_init(void) hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS); acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1); -#endif acpi_additional_tables(); /* resets cfg to required entry */ for(i = 0; i < external_tables; i++) { @@ -1913,8 +1909,7 @@ void acpi_bios_init(void) /* kvm has no ssdt (processors are in dsdt) */ // rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr); #ifdef BX_QEMU -/* No HPET (yet) */ -// rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); +rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr); if (nb_numa_nodes > 0) rsdt->table_offset_entry[nb_rsdt_entries++] = 
cpu_to_le32(srat_addr); #endif
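The table placement in this patch uses the expression (addr + 7) & ~7 before carving out the HPET table. A small sketch of that idiom follows; align_up_8 and reserve_table are hypothetical helper names, not functions from rombios32.c.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 8; already aligned values are unchanged. */
uint32_t align_up_8(uint32_t addr)
{
    return (addr + 7) & ~(uint32_t)7;
}

/* Reserve size bytes at the next 8-byte boundary at or after *cursor.
 * Returns the start address of the table and advances the cursor past it,
 * mirroring the "addr = align; table = addr; addr += sizeof" pattern. */
uint32_t reserve_table(uint32_t *cursor, uint32_t size)
{
    uint32_t start = align_up_8(*cursor);
    *cursor = start + size;
    return start;
}
```

Adding 7 pushes any unaligned address past the next boundary, and clearing the low three bits then snaps it back down onto that boundary.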
[PATCH 2/4] Userspace changes for configuring irq0->inti2 override (v3)
Signed-off-by: Beth Kon diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index e1b19d7..bb74f38 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic); fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override); register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s); qemu_register_reset(fw_cfg_reset, s); diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index f616ed2..1de7360 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -19,6 +19,7 @@ #define FW_CFG_WRITE_CHANNEL0x4000 #define FW_CFG_ARCH_LOCAL 0x8000 +#define FW_CFG_IRQ0_OVERRIDE(FW_CFG_ARCH_LOCAL + 2) #define FW_CFG_ENTRY_MASK ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL) #define FW_CFG_INVALID 0x diff --git a/hw/ioapic.c b/hw/ioapic.c index 0b70cf6..2d77a2c 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -23,6 +23,7 @@ #include "hw.h" #include "pc.h" +#include "sysemu.h" #include "qemu-timer.h" #include "host-utils.h" @@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level) { IOAPICState *s = opaque; -#if 0 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps * to GSI 2. GSI maps to ioapic 1-1. This is not * the cleanest way of doing it but it should work. 
*/ -if (vector == 0) +if (vector == 0 && irq0override) { vector = 2; -#endif +} if (vector >= 0 && vector < IOAPIC_NUM_PINS) { uint32_t mask = 1 << vector; diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 8cb6faa..2e52c87 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void) return r; } for (i = 0; i < 24; ++i) { -r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +if (i == 0) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2); +} else if (i != 2) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +} if (r < 0) return r; } diff --git a/qemu-kvm.h b/qemu-kvm.h index dd045dd..6a1968a 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env); #define kvm_enabled() (kvm_allowed) #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context) #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context) +#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context) #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu() void kvm_init_vcpu(CPUState *env); void kvm_load_tsc(CPUState *env); @@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env); #define kvm_nested 0 #define qemu_kvm_irqchip_in_kernel() (0) #define qemu_kvm_pit_in_kernel() (0) +#define qemu_kvm_has_gsi_routing() (0) #define kvm_has_sync_mmu() (0) #define kvm_load_registers(env) do {} while(0) #define kvm_save_registers(env) do {} while(0) diff --git a/sysemu.h b/sysemu.h index 1f45fd6..292bbc3 100644 --- a/sysemu.h +++ b/sysemu.h @@ -93,6 +93,7 @@ extern int graphic_width; extern int graphic_height; extern int graphic_depth; extern int nographic; +extern int irq0override; extern const char *keyboard_layout; extern int win2k_install_hack; extern int rtc_td_hack; diff --git a/vl.c b/vl.c index d9f0607..0bffc82 100644 --- a/vl.c +++ b/vl.c @@ -207,6 +207,7 @@ static int vga_ram_size; enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; static DisplayState 
*display_state; int nographic; +int irq0override; static int curses; static int sdl; const char* keyboard_layout = NULL; @@ -5035,6 +5036,7 @@ int main(int argc, char **argv, char **envp) vga_ram_size = VGA_RAM_SIZE; snapshot = 0; nographic = 0; +irq0override = 1; curses = 0; kernel_filename = NULL; kernel_cmdline = ""; @@ -6129,8 +6131,14 @@ int main(int argc, char **argv, char **envp) } } -if (kvm_enabled()) - kvm_init_ap(); +if (kvm_enabled()) { + kvm_init_ap(); +#ifdef USE_KVM +if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { +irq0override = 0; +} +#endif +} machine->init(ram_size, vga_ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
[PATCH 1/4] BIOS changes for configuring irq0->inti2 override (v3)
These patches resolve the irq0->inti2 override issue and get the HPET working on kvm. The override and HPET changes are sent as one series because the HPET depends on the override. Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS, and the HPET spec states that in legacy mode the timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., for compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface and adds the irq0->inti2 override to the MADT interrupt source override table and to the mp table (for the no-acpi case).

A couple of months ago, Marcelo was seeing RHEL5 guests complain of an invalid checksum with these patches, but later he couldn't reproduce it, and I'm not seeing it now. While all guests still need to be fully tested, everything appears to be in order. I've tested on win2k8 64-bit, win2k8 32-bit, RHEL5.3 32-bit, and ubuntu 8.10 64-bit.
Changes from v2: - rebased on latest kvm - fixed build problems with --disable-kvm (kvm_kpit_enable/disable) Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index cbd5f15..53359b8 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -444,6 +444,9 @@ uint32_t cpuid_features; uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; +#ifdef BX_QEMU +uint8_t irq0_override; +#endif #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_ARCH_LOCAL 0x8000 #define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0) #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1) +#define QEMU_CFG_IRQ0_OVERRIDE (QEMU_CFG_ARCH_LOCAL + 2) int qemu_cfg_port; @@ -553,6 +557,18 @@ uint64_t qemu_cfg_get64 (void) } #endif +#ifdef BX_QEMU +void irq0_override_probe(void) +{ +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +memset(&irq0_override, 0, 1); +} +#endif + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1195,6 +1211,13 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +#ifdef BX_QEMU +/* One entry per ioapic interrupt destination. Destination 2 is covered + * by irq0->inti2 override (i == 0). 
Source IRQ 2 is unused + */ +if (irq0_override && i == 2) +continue; +#endif putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1202,7 +1225,12 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +#ifdef BX_QEMU +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +#endif +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1665,16 +1693,18 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + -#ifdef BX_QEMU -sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */; -#else sizeof(struct madt_io_apic); +#ifdef BX_QEMU +for (i = 0; i < 16; i++) +if (PCI_ISA_IRQ_MASK & (1U << i)) +madt_size += sizeof(struct madt_int_override); +if (irq0_override) +madt_size += sizeof(struct madt_int_override); #endif -madt = (void *)(addr); addr += madt_size; - #ifdef BX_QEMU #ifdef HPET_WORKS_IN_KVM addr = (addr + 7) & ~7; @@ -1758,23 +1788,20 @@ void acpi_bios_init(void) io_apic->io_apic_id = smp_cpus; io_apic->address = cpu_to_le32(0xfec0); io_apic->interrupt = cpu_to_le32(0); +int_override = (struct madt_int_override*)(io_apic + 1); #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM -io_apic++; - -int_override = (void *)io_apic; -int_override->type = APIC_XRUPT_OVERRIDE; -int_override->length = sizeof(*int_override); -int_override->bus = cpu_to_le32(0); -int_override->source = cpu_to_le32(0); -int_override->gsi = cpu_to_le32(2); -int_override->flags = cpu_to_le32(0); -#endif +if (irq0_override) { +memset(int_override, 0, sizeof(*int_override)); +int_override->type = APIC_XRUPT_OVERRIDE; +int_override->length = sizeof(*int_override); +int_override-&g
Re: [PATCH 1/4] BIOS changes for configuring irq0->inti2 override(v2)
Avi Kivity wrote:
Beth Kon wrote:

These patches resolve the irq0->inti2 override issue and get the HPET working on kvm. They are dependent on Jes Sorensen's recent 0006-qemu-kvm-irq-routing.patch. The override and HPET changes are sent as one series because the HPET depends on the override. Win2k8 expects the HPET interrupt on inti2, regardless of whether an override exists in the BIOS, and the HPET spec states that in legacy mode the timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., for compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface and adds the irq0->inti2 override to the MADT interrupt source override table and to the mp table (for the no-acpi case).

A couple of months ago, Marcelo was seeing RHEL5 guests complain of an invalid checksum with these patches, but later he couldn't reproduce it, and I'm not seeing it now. While all guests still need to be fully tested, everything appears to be in order. I've tested on win2k8 64-bit, win2k8 32-bit, RHEL5.3 32-bit, and ubuntu 8.10 64-bit.

What are the changes relative to v1?

Just merge issues with the changes you put in when moving to the newer bios. I submitted prematurely, incorrectly thinking I was done testing. When I finished, some problems surfaced.

@@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_SIGNATURE 0x00
 #define QEMU_CFG_ID 0x01
 #define QEMU_CFG_UUID 0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x0e

As noted, this should be in the arch local space.

The base changes were not in the code yet. As we discussed on IRC, I'll resubmit once they're there.
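For readers unfamiliar with the "arch local space" mentioned above, here is a small sketch of how such a key is formed. The constants are the ones that appear in the fw_cfg.h hunks of this series; the two helper functions are hypothetical and only illustrate the bit layout, not how QEMU stores the entries.

```c
#include <stdint.h>

/* Values as they appear in the fw_cfg.h hunks of this series. */
#define FW_CFG_WRITE_CHANNEL  0x4000
#define FW_CFG_ARCH_LOCAL     0x8000
#define FW_CFG_ENTRY_MASK     (~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
#define FW_CFG_IRQ0_OVERRIDE  (FW_CFG_ARCH_LOCAL + 2)

/* Hypothetical helper: nonzero when the key lives in the arch-specific space. */
int fw_cfg_key_is_arch_local(uint16_t key)
{
    return (key & FW_CFG_ARCH_LOCAL) != 0;
}

/* Hypothetical helper: strip the flag bits and return the entry index. */
uint16_t fw_cfg_key_index(uint16_t key)
{
    return (uint16_t)(key & FW_CFG_ENTRY_MASK);
}
```

The v2 key 0x0e sits in the generic range, while the v3 key 0x8002 carries the arch-local bit and index 2, which is what the review asked for.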
[PATCH 3/4] BIOS changes for KVM HPET (v2)
Just a note here... The number of table_offset_entry entries for the non BX_QEMU case doesn't make sense here. There are only 2 entries. I left it as is, since it does not impact HPET's interaction with it. Actually it seems like dead code since this is in kvm code but with BX_QEMU undefined. It doesn't seem to be a problem.

Signed-off-by: Beth Kon

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..0e142be 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -308,7 +308,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID, EISAID("PNP0103"))
 Name(_UID, 0)
@@ -328,7 +327,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index ddfa828..7441cd7 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1293,7 +1293,7 @@ struct rsdt_descriptor_rev1 {
 ACPI_TABLE_HEADER_DEF /* ACPI common table header */
 #ifdef BX_QEMU
- uint32_t table_offset_entry [2]; /* Array of pointers to other */
+ uint32_t table_offset_entry [3]; /* Array of pointers to other */
 // uint32_t table_offset_entry [4]; /* Array of pointers to other */
 #else
 uint32_t table_offset_entry [3]; /* Array of pointers to other */
@@ -1450,8 +1450,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 /*
- * * HPET Description Table
- * */
+ * HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF /* ACPI common table header */
 uint32_t timer_block_id;
@@ -1591,13 +1591,11 @@ void acpi_bios_init(void)
 #endif
 addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 acpi_tables_size = addr - base_addr;
@@ -1620,10 +1618,10 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
-//rsdt->table_offset_entry[2] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-//rsdt->table_offset_entry[3] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 #endif
+//rsdt->table_offset_entry[3] = cpu_to_le32(ssdt_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", sizeof(*rsdt), 1);
@@ -1723,7 +1721,6 @@ void acpi_bios_init(void)
 #ifdef BX_QEMU
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
 * emulated hpet
@@ -1733,7 +1730,6 @@ void acpi_bios_init(void)
 acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1);
 #endif
-#endif
 }
[PATCH 4/4] Userspace changes for KVM HPET (v2)
Signed-off-by: Beth Kon diff --git a/hw/hpet.c b/hw/hpet.c index c7945ec..47c9f89 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -30,6 +30,7 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "qemu-kvm.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void) return 0; } +static void hpet_kpit_enable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_kpit_disable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +ps.channels[0].mode = 0xff; +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_legacy_enable(void) +{ +if (qemu_kvm_pit_in_kernel()) { + hpet_kpit_disable(); + dprintf("qemu: hpet disabled kernel pit\n"); +} else { + hpet_pit_disable(); + dprintf("qemu: hpet disabled userspace pit\n"); +} +} + +static void hpet_legacy_disable(void) +{ +if (qemu_kvm_pit_in_kernel()) { + hpet_kpit_enable(); + dprintf("qemu: hpet enabled kernel pit\n"); +} else { + hpet_pit_enable(); + dprintf("qemu: hpet enabled userspace pit\n"); +} +} + static uint32_t timer_int_route(struct HPETTimer *timer) { uint32_t route; @@ -475,9 +513,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +hpet_legacy_enable(); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +hpet_legacy_disable(); } break; case HPET_CFG + 4: @@ -560,7 +598,7 @@ static void hpet_reset(void *opaque) { * hpet_reset is called due to system reset. At this point control must * be returned to pit until SW reenables hpet. 
*/ -hpet_pit_enable(); +hpet_legacy_disable(); count = 1; } diff --git a/vl.c b/vl.c index f9a72b3..b860b82 100644 --- a/vl.c +++ b/vl.c @@ -6138,10 +6138,15 @@ int main(int argc, char **argv, char **envp) } if (kvm_enabled()) { - kvm_init_ap(); +kvm_init_ap(); #ifdef USE_KVM if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { irq0override = 0; +/* if kernel can't do irq routing, interrupt source + * override 0->2 can not be set up as required by hpet, + * so disable hpet. + */ +no_hpet=1; } #endif }
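The vl.c hunk in this patch boils down to one decision at startup. A minimal sketch of that decision, written as a pure function so it can be checked in isolation, is given here. The struct timer_config and pick_timer_config names are hypothetical; the real code sets the irq0override and no_hpet globals directly.

```c
#include <stdbool.h>

/* Hypothetical container for the two flags the patch sets in main(). */
struct timer_config {
    bool irq0_override;  /* advertise and route irq0 -> inti2 */
    bool no_hpet;        /* HPET emulation disabled */
};

/* Defaults: override on, HPET allowed. If kvm is enabled with an in-kernel
 * irqchip that cannot do GSI routing, the override cannot be set up, and
 * since the HPET depends on it, the HPET is switched off as well. */
struct timer_config pick_timer_config(bool kvm_enabled,
                                      bool irqchip_in_kernel,
                                      bool has_gsi_routing)
{
    struct timer_config cfg = { true, false };

    if (kvm_enabled && irqchip_in_kernel && !has_gsi_routing) {
        cfg.irq0_override = false;
        cfg.no_hpet = true;
    }
    return cfg;
}
```

Only the combination of kvm, in-kernel irqchip, and missing routing support changes the defaults; every other combination keeps the override and leaves the HPET available.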
[PATCH 2/4] Userspace changes for configuring irq0->inti2 override (v2)
Signed-off-by: Beth Kon diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index e1b19d7..bb74f38 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic); fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override); register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s); qemu_register_reset(fw_cfg_reset, s); diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index f616ed2..498c1e3 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -15,6 +15,7 @@ #define FW_CFG_INITRD_SIZE 0x0b #define FW_CFG_BOOT_DEVICE 0x0c #define FW_CFG_NUMA 0x0d +#define FW_CFG_IRQ0_OVERRIDE0x0e #define FW_CFG_MAX_ENTRY0x10 #define FW_CFG_WRITE_CHANNEL0x4000 diff --git a/hw/ioapic.c b/hw/ioapic.c index 0b70cf6..2d77a2c 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -23,6 +23,7 @@ #include "hw.h" #include "pc.h" +#include "sysemu.h" #include "qemu-timer.h" #include "host-utils.h" @@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level) { IOAPICState *s = opaque; -#if 0 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps * to GSI 2. GSI maps to ioapic 1-1. This is not * the cleanest way of doing it but it should work. 
*/ -if (vector == 0) +if (vector == 0 && irq0override) { vector = 2; -#endif +} if (vector >= 0 && vector < IOAPIC_NUM_PINS) { uint32_t mask = 1 << vector; diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 8cb6faa..2e52c87 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void) return r; } for (i = 0; i < 24; ++i) { -r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +if (i == 0) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2); +} else if (i != 2) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +} if (r < 0) return r; } diff --git a/qemu-kvm.h b/qemu-kvm.h index 8226001..c64718c 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env); #define kvm_enabled() (kvm_allowed) #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context) #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context) +#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context) #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu() void kvm_init_vcpu(CPUState *env); void kvm_load_tsc(CPUState *env); @@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env); #define kvm_nested 0 #define qemu_kvm_irqchip_in_kernel() (0) #define qemu_kvm_pit_in_kernel() (0) +#define qemu_kvm_has_gsi_routing() (0) #define kvm_has_sync_mmu() (0) #define kvm_load_registers(env) do {} while(0) #define kvm_save_registers(env) do {} while(0) diff --git a/sysemu.h b/sysemu.h index 1f45fd6..292bbc3 100644 --- a/sysemu.h +++ b/sysemu.h @@ -93,6 +93,7 @@ extern int graphic_width; extern int graphic_height; extern int graphic_depth; extern int nographic; +extern int irq0override; extern const char *keyboard_layout; extern int win2k_install_hack; extern int rtc_td_hack; diff --git a/vl.c b/vl.c index 6b4b7d2..f9a72b3 100644 --- a/vl.c +++ b/vl.c @@ -207,6 +207,7 @@ static int vga_ram_size; enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; static DisplayState 
*display_state; int nographic; +int irq0override; static int curses; static int sdl; const char* keyboard_layout = NULL; @@ -5039,6 +5040,7 @@ int main(int argc, char **argv, char **envp) vga_ram_size = VGA_RAM_SIZE; snapshot = 0; nographic = 0; +irq0override = 1; curses = 0; kernel_filename = NULL; kernel_cmdline = ""; @@ -6135,8 +6137,14 @@ int main(int argc, char **argv, char **envp) } } -if (kvm_enabled()) - kvm_init_ap(); +if (kvm_enabled()) { + kvm_init_ap(); +#ifdef USE_KVM +if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { +irq0override = 0; +} +#endif +} machine->init(ram_size, vga_ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
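The routing loop in the patch above can be sketched as a small table builder. This is a standalone illustration, not the qemu-kvm code: `struct irq_route` and `build_ioapic_routes` are hypothetical names, and each record stands in for one `kvm_add_irq_route()` call. Unlike the loop in the patch, the sketch skips GSI 2 with an explicit `continue`, so no return value left over from the previous iteration is ever checked.

```c
#include <assert.h>
#include <stddef.h>

#define IOAPIC_NUM_PINS 24   /* same pin count the patch loops over */

/* Hypothetical record standing in for one kvm_add_irq_route() call. */
struct irq_route {
    int gsi;   /* global system interrupt number */
    int pin;   /* IOAPIC input pin it is delivered to */
};

/*
 * Fill `out` with the routes the patch sets up: GSI 0 goes to pin 2,
 * GSI 2 gets no route of its own, and every other GSI maps 1:1.
 * Returns the number of routes written.
 */
static size_t build_ioapic_routes(struct irq_route *out, size_t cap)
{
    size_t n = 0;

    for (int gsi = 0; gsi < IOAPIC_NUM_PINS; ++gsi) {
        if (gsi == 2) {
            continue;               /* pin 2 is already taken by IRQ0 */
        }
        assert(n < cap);            /* caller must provide enough room */
        out[n].gsi = gsi;
        out[n].pin = (gsi == 0) ? 2 : gsi;
        ++n;
    }
    return n;
}
```

With 24 pins this yields 23 routes, the first of which sends GSI 0 to pin 2.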
[PATCH 1/4] BIOS changes for configuring irq0->inti2 override (v2)
These patches resolve the irq0->inti2 override issue and get the hpet working on kvm. They depend on Jes Sorensen's recent 0006-qemu-kvm-irq-routing.patch. The override and HPET changes are sent as one series because HPET depends on the override. Win2k8 expects the HPET interrupt on inti2 regardless of whether an override exists in the BIOS, and the HPET spec states that in legacy mode the timer interrupt is on inti2. The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., for compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface and adds the irq0->inti2 override to the MADT interrupt source override table and to the mp table (for the no-acpi case). A couple of months ago, Marcelo saw RHEL5 guests complain of an invalid checksum with these patches, but he could not reproduce it later, and I'm not seeing it now. All guests still need to be fully tested, but everything appears to be in order. I've tested on Win2k8 64-bit, Win2k8 32-bit, RHEL 5.3 32-bit, and Ubuntu 8.10 64-bit.
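The rule for when the override is used can be written as one predicate. The sketch below is only an illustration under stated assumptions: `struct host_caps` and `use_irq0_override` are hypothetical names, while in the patch the same decision is made in vl.c through `kvm_irqchip` and `qemu_kvm_has_gsi_routing()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the host capabilities the decision depends on. */
struct host_caps {
    bool irqchip_in_kernel;  /* the in-kernel irqchip is in use */
    bool has_gsi_routing;    /* the kernel offers the irq routing interface */
};

/*
 * The override defaults to on. It is turned off only when the in-kernel
 * irqchip is used and the kernel cannot route irq0 to inti2.
 */
static bool use_irq0_override(const struct host_caps *caps)
{
    assert(caps != NULL);
    return !(caps->irqchip_in_kernel && !caps->has_gsi_routing);
}
```

With the userspace irqchip the override stays on regardless of kernel support, because the emulated ioapic performs the redirection itself.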
Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 8684987..07dda73 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -445,6 +445,9 @@ uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; uint8_t bios_uuid[16]; +#ifdef BX_QEMU +uint8_t irq0_override; +#endif #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_SIGNATURE 0x00 #define QEMU_CFG_ID 0x01 #define QEMU_CFG_UUID 0x02 +#define QEMU_CFG_IRQ0_OVERRIDE 0x0e int qemu_cfg_port; @@ -518,6 +522,18 @@ void uuid_probe(void) memset(bios_uuid, 0, 16); } +#ifdef BX_QEMU +void irq0_override_probe(void) +{ +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +memset(&irq0_override, 0, 1); +} +#endif + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1160,6 +1176,13 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +#ifdef BX_QEMU +/* One entry per ioapic interrupt destination. Destination 2 is covered + * by irq0->inti2 override (i == 0). 
Source IRQ 2 is unused + */ +if (irq0_override && i == 2) +continue; +#endif putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1167,7 +1190,12 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +#ifdef BX_QEMU +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +#endif +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1550,16 +1578,18 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + -#ifdef BX_QEMU -sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */; -#else sizeof(struct madt_io_apic); +#ifdef BX_QEMU +for (i = 0; i < 16; i++) +if (PCI_ISA_IRQ_MASK & (1U << i)) +madt_size += sizeof(struct madt_int_override); +if (irq0_override) +madt_size += sizeof(struct madt_int_override); #endif -madt = (void *)(addr); addr += madt_size; - #ifdef BX_QEMU #ifdef HPET_WORKS_IN_KVM addr = (addr + 7) & ~7; @@ -1660,23 +1690,20 @@ void acpi_bios_init(void) io_apic->io_apic_id = smp_cpus; io_apic->address = cpu_to_le32(0xfec0); io_apic->interrupt = cpu_to_le32(0); +int_override = (struct madt_int_override*)(io_apic + 1); #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM -io_apic++; - -int_override = (void *)io_apic; -int_override->type = APIC_XRUPT_OVERRIDE; -int_override->length = sizeof(*int_override); -int_override->bus = cpu_to_le32(0); -int_override->source = cpu_to_le32(0); -int_override->gsi = cpu_to_le32(2); -int_override->flags = cpu_to_le32(0); -#endif +if (irq0_override) { +memset(int_override, 0, sizeof(*int_override)); +int_override->type = APIC_XRUPT_OVERRIDE; +int_override->length = sizeof(*int_override); +int_override->source = 
0; +int_override->gsi = 2; +int_override->
Re: [PATCH 1/4] BIOS changes for configuring irq0->inti2 override
Sebastian Herbszt wrote:
Beth Kon wrote:
@@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
#define QEMU_CFG_SIGNATURE 0x00
#define QEMU_CFG_ID 0x01
#define QEMU_CFG_UUID 0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x0e

Small thing to consider before you resubmit: In his patch "read-additional-acpi-tables-from-a-vm.patch" Gleb introduced:

#define QEMU_CFG_ARCH_LOCAL 0x8000
#define QEMU_CFG_ACPI_TABLES (QEMU_CFG_ARCH_LOCAL + 0)

I think the idea behind this was to separate the generic part from the arch-specific part. The IRQ0 override seems to be arch specific (x86 only?) just like the ACPI tables, right?

I'm not sure what the intent is. It looks like it would be just for additional tables (as opposed to "local")? Gleb? I don't believe irq0 override would fall into that category. But in any case, since this is not in any code base, I don't think there's anything to be done yet.

- Sebastian
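To make the key-space question concrete, here is a small sketch of the split being discussed. The first four values are the ones quoted in this thread. `EXAMPLE_CFG_IRQ0_OVERRIDE_ARCH` and `cfg_key_is_arch_local` are hypothetical and only show what moving the key into the arch-local range could look like; nothing in the thread says this was done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread. */
#define QEMU_CFG_UUID           0x02
#define QEMU_CFG_IRQ0_OVERRIDE  0x0e    /* as posted: in the generic range */
#define QEMU_CFG_ARCH_LOCAL     0x8000
#define QEMU_CFG_ACPI_TABLES    (QEMU_CFG_ARCH_LOCAL + 0)

/* Hypothetical: the same setting expressed as an arch-local key. */
#define EXAMPLE_CFG_IRQ0_OVERRIDE_ARCH (QEMU_CFG_ARCH_LOCAL + 1)

/* A key is arch-local when the QEMU_CFG_ARCH_LOCAL bit is set. */
static bool cfg_key_is_arch_local(uint16_t key)
{
    return (key & QEMU_CFG_ARCH_LOCAL) != 0;
}
```

Under this reading, the key as posted (0x0e) sits with the generic entries, which is the point Sebastian raises.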
Re: [PATCH 1/4] BIOS changes for configuring irq0->inti2 override
Beth Kon wrote: These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm. I've found a problem with these patches. I'll resubmit shortly.
[PATCH 3/4] BIOS changes for KVM HPET
Signed-off-by: Beth Kon diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl index c756fed..0e142be 100755 --- a/kvm/bios/acpi-dsdt.dsl +++ b/kvm/bios/acpi-dsdt.dsl @@ -308,7 +308,6 @@ DefinitionBlock ( }) } #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM Device(HPET) { Name(_HID, EISAID("PNP0103")) Name(_UID, 0) @@ -328,7 +327,6 @@ DefinitionBlock ( }) } #endif -#endif } Scope(\_SB.PCI0) { diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index ddfa828..7441cd7 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1293,7 +1293,7 @@ struct rsdt_descriptor_rev1 { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ #ifdef BX_QEMU - uint32_t table_offset_entry [2]; /* Array of pointers to other */ + uint32_t table_offset_entry [3]; /* Array of pointers to other */ // uint32_t table_offset_entry [4]; /* Array of pointers to other */ #else uint32_t table_offset_entry [3]; /* Array of pointers to other */ @@ -1450,8 +1450,8 @@ struct acpi_20_generic_address { } __attribute__((__packed__)); /* - * * HPET Description Table - * */ + * HPET Description Table + */ struct acpi_20_hpet { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ uint32_t timer_block_id; @@ -1591,13 +1591,11 @@ void acpi_bios_init(void) #endif addr += madt_size; #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM addr = (addr + 7) & ~7; hpet_addr = addr; hpet = (void *)(addr); addr += sizeof(*hpet); #endif -#endif acpi_tables_size = addr - base_addr; @@ -1620,10 +1618,10 @@ void acpi_bios_init(void) memset(rsdt, 0, sizeof(*rsdt)); rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr); rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr); -//rsdt->table_offset_entry[2] = cpu_to_le32(ssdt_addr); #ifdef BX_QEMU -//rsdt->table_offset_entry[3] = cpu_to_le32(hpet_addr); +rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr); #endif +//rsdt->table_offset_entry[3] = cpu_to_le32(ssdt_addr); acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", sizeof(*rsdt), 1); @@ -1723,7 +1721,6 @@ void 
acpi_bios_init(void) #ifdef BX_QEMU /* HPET */ -#ifdef HPET_WORKS_IN_KVM memset(hpet, 0, sizeof(*hpet)); /* Note timer_block_id value must be kept in sync with value advertised by * emulated hpet @@ -1733,7 +1730,6 @@ void acpi_bios_init(void) acpi_build_table_header((struct acpi_table_header *)hpet, "HPET", sizeof(*hpet), 1); #endif -#endif }
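The table placement that this patch enables follows one pattern: round the running address up to 8 bytes, record it, then advance by the table size. A minimal sketch of that pattern follows. `struct acpi_layout`, `align_up8` and `place_madt_and_hpet` are hypothetical names; the BIOS code does the same arithmetic inline in `acpi_bios_init()`.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address up to the next multiple of 8, as (addr + 7) & ~7 does. */
static uint32_t align_up8(uint32_t addr)
{
    return (addr + 7u) & ~7u;
}

/* Hypothetical result type: where each table starts and where the data ends. */
struct acpi_layout {
    uint32_t madt_addr;
    uint32_t hpet_addr;
    uint32_t end;
};

/* Place the MADT, then the HPET table, each on an 8-byte boundary. */
static struct acpi_layout place_madt_and_hpet(uint32_t addr,
                                              uint32_t madt_size,
                                              uint32_t hpet_size)
{
    struct acpi_layout l;

    addr = align_up8(addr);
    l.madt_addr = addr;
    addr += madt_size;

    addr = align_up8(addr);
    l.hpet_addr = addr;
    addr += hpet_size;

    l.end = addr;
    return l;
}
```

The address recorded as `hpet_addr` is the one that ends up in the third RSDT entry, which is why the entry array grows from 2 to 3 in the same patch.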
[PATCH 4/4] Userspace changes for KVM HPET
Signed-off-by: Beth Kon diff --git a/hw/hpet.c b/hw/hpet.c index c7945ec..47c9f89 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -30,6 +30,7 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "qemu-kvm.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void) return 0; } +static void hpet_kpit_enable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_kpit_disable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +ps.channels[0].mode = 0xff; +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_legacy_enable(void) +{ +if (qemu_kvm_pit_in_kernel()) { + hpet_kpit_disable(); + dprintf("qemu: hpet disabled kernel pit\n"); +} else { + hpet_pit_disable(); + dprintf("qemu: hpet disabled userspace pit\n"); +} +} + +static void hpet_legacy_disable(void) +{ +if (qemu_kvm_pit_in_kernel()) { + hpet_kpit_enable(); + dprintf("qemu: hpet enabled kernel pit\n"); +} else { + hpet_pit_enable(); + dprintf("qemu: hpet enabled userspace pit\n"); +} +} + static uint32_t timer_int_route(struct HPETTimer *timer) { uint32_t route; @@ -475,9 +513,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, } /* i8254 and RTC are disabled when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +hpet_legacy_enable(); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +hpet_legacy_disable(); } break; case HPET_CFG + 4: @@ -560,7 +598,7 @@ static void hpet_reset(void *opaque) { * hpet_reset is called due to system reset. At this point control must * be returned to pit until SW reenables hpet. 
*/ -hpet_pit_enable(); +hpet_legacy_disable(); count = 1; } diff --git a/pc-bios/bios.bin b/pc-bios/bios.bin index d5d42f3..2503783 100644 Binary files a/pc-bios/bios.bin and b/pc-bios/bios.bin differ diff --git a/vl.c b/vl.c index 5eacd6a..1334344 100644 --- a/vl.c +++ b/vl.c @@ -5666,10 +5666,15 @@ int main(int argc, char **argv, char **envp) } if (kvm_enabled()) { - kvm_init_ap(); +kvm_init_ap(); #ifdef USE_KVM if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { - irq0override = 0; +irq0override = 0; +/* if kernel can't do irq routing, interrupt source + * override 0->2 can not be set up as required by hpet, + * so disable hpet. + */ +no_hpet=1; } #endif }
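The hand-over between the PIT and the HPET in this patch is driven by edges on the legacy bit of the HPET configuration register. The sketch below models only that decision and performs no device access. `enum pit_action` and `pit_action_for_cfg_write` are hypothetical names, and the value of `HPET_CFG_LEGACY` is an assumption taken from the HPET specification (bit 1 of the general configuration register).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HPET_CFG_LEGACY 0x002u   /* assumed: legacy replacement route bit */

/* Hypothetical description of what the caller should do to the PIT. */
enum pit_action {
    PIT_NO_CHANGE,
    PIT_DISABLE_KERNEL,
    PIT_DISABLE_USERSPACE,
    PIT_ENABLE_KERNEL,
    PIT_ENABLE_USERSPACE
};

/* True when `mask` goes from clear to set across a register write. */
static bool activating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return !(old_val & mask) && (new_val & mask);
}

/* True when `mask` goes from set to clear across a register write. */
static bool deactivating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return (old_val & mask) && !(new_val & mask);
}

/* Entering legacy mode disables the PIT; leaving it gives the PIT back. */
static enum pit_action pit_action_for_cfg_write(uint64_t old_val,
                                                uint64_t new_val,
                                                bool pit_in_kernel)
{
    if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
        return pit_in_kernel ? PIT_DISABLE_KERNEL : PIT_DISABLE_USERSPACE;
    }
    if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
        return pit_in_kernel ? PIT_ENABLE_KERNEL : PIT_ENABLE_USERSPACE;
    }
    return PIT_NO_CHANGE;
}
```

Rewriting the register without changing the legacy bit yields `PIT_NO_CHANGE`, so repeated writes do not toggle the PIT.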
[PATCH 2/4] Userspace changes for configuring irq0->inti2 override
Signed-off-by: Beth Kon diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c index e1b19d7..bb74f38 100644 --- a/hw/fw_cfg.c +++ b/hw/fw_cfg.c @@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic); fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override); register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s); qemu_register_reset(fw_cfg_reset, s); diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h index f616ed2..498c1e3 100644 --- a/hw/fw_cfg.h +++ b/hw/fw_cfg.h @@ -15,6 +15,7 @@ #define FW_CFG_INITRD_SIZE 0x0b #define FW_CFG_BOOT_DEVICE 0x0c #define FW_CFG_NUMA 0x0d +#define FW_CFG_IRQ0_OVERRIDE0x0e #define FW_CFG_MAX_ENTRY0x10 #define FW_CFG_WRITE_CHANNEL0x4000 diff --git a/hw/ioapic.c b/hw/ioapic.c index 0b70cf6..2d77a2c 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -23,6 +23,7 @@ #include "hw.h" #include "pc.h" +#include "sysemu.h" #include "qemu-timer.h" #include "host-utils.h" @@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level) { IOAPICState *s = opaque; -#if 0 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps * to GSI 2. GSI maps to ioapic 1-1. This is not * the cleanest way of doing it but it should work. 
*/ -if (vector == 0) +if (vector == 0 && irq0override) { vector = 2; -#endif +} if (vector >= 0 && vector < IOAPIC_NUM_PINS) { uint32_t mask = 1 << vector; diff --git a/qemu-kvm.c b/qemu-kvm.c index 68a9218..5b27179 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -814,9 +814,14 @@ int kvm_qemu_create_context(void) return r; } for (i = 0; i < 24; ++i) { -r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); -if (r < 0) +if (i == 0) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2); +} else if (i != 2) { +r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i); +} +if (r < 0) { return r; +} } kvm_commit_irq_routes(kvm_context); } diff --git a/qemu-kvm.h b/qemu-kvm.h index ca59af8..a836579 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -166,6 +166,7 @@ void qemu_kvm_cpu_stop(CPUState *env); #define kvm_enabled() (kvm_allowed) #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context) #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context) +#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context) #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu() void kvm_init_vcpu(CPUState *env); void kvm_load_tsc(CPUState *env); diff --git a/sysemu.h b/sysemu.h index e8dd381..a5f96f9 100644 --- a/sysemu.h +++ b/sysemu.h @@ -96,6 +96,7 @@ extern int graphic_width; extern int graphic_height; extern int graphic_depth; extern int nographic; +extern int irq0override; extern const char *keyboard_layout; extern int win2k_install_hack; extern int rtc_td_hack; diff --git a/vl.c b/vl.c index 9ff4a5a..ee7f29a 100644 --- a/vl.c +++ b/vl.c @@ -207,6 +207,7 @@ static int vga_ram_size; enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; static DisplayState *display_state; int nographic; +int irq0override; static int curses; static int sdl; const char* keyboard_layout = NULL; @@ -4599,6 +4600,7 @@ int main(int argc, char **argv, char **envp) vga_ram_size = VGA_RAM_SIZE; snapshot = 0; nographic = 0; +irq0override = 1; curses = 0; 
kernel_filename = NULL; kernel_cmdline = ""; @@ -5682,8 +5684,14 @@ int main(int argc, char **argv, char **envp) } } -if (kvm_enabled()) - kvm_init_ap(); +if (kvm_enabled()) { + kvm_init_ap(); +#ifdef USE_KVM +if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) { + irq0override = 0; +} +#endif +} machine->init(ram_size, vga_ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
[PATCH 1/4] BIOS changes for configuring irq0->inti2 override
These patches resolve the irq0->inti2 override issue and get the hpet working on kvm. The override and HPET changes are sent as one series because HPET depends on the override. Win2k8 expects the HPET interrupt on inti2 regardless of whether an override exists in the BIOS, and the HPET spec states that in legacy mode the timer interrupt is on inti2. The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., for compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface and adds the irq0->inti2 override to the MADT interrupt source override table and to the mp table (for the no-acpi case). A couple of months ago, Marcelo saw RHEL5 guests complain of an invalid checksum with these patches, but he could not reproduce it later, and I'm not seeing it now. All guests still need to be fully tested, but everything appears to be in order. I've tested on Win2k8 64-bit, Win2k8 32-bit, RHEL 5.3 32-bit, and Ubuntu 8.10 64-bit.
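For the no-acpi case, the mp table carries one 8-byte I/O interrupt entry per ISA IRQ. The sketch below shows how the override changes that list, writing into a plain byte buffer. `mp_write_isa_irqs` and its parameters are hypothetical; the BIOS emits the same bytes with `putb()` in `mptable_init()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MP_ENTRY_IOINT       3   /* entry type: I/O interrupt assignment */
#define MP_IOINT_ENTRY_SIZE  8   /* bytes per I/O interrupt entry */

/*
 * Write the ISA IRQ entries of an mp table into `buf`.
 * With the override, IRQ0 is delivered to ioapic input 2 and source
 * IRQ 2 gets no entry, so 15 entries are written instead of 16.
 * Returns the number of bytes written.
 */
static size_t mp_write_isa_irqs(uint8_t *buf, size_t cap,
                                uint8_t isa_bus_id, uint8_t ioapic_id,
                                int irq0_override)
{
    uint8_t *q = buf;

    for (int i = 0; i < 16; i++) {
        if (irq0_override && i == 2) {
            continue;                   /* destination 2 is used by IRQ0 */
        }
        assert((size_t)(q - buf) + MP_IOINT_ENTRY_SIZE <= cap);
        *q++ = MP_ENTRY_IOINT;          /* entry type */
        *q++ = 0;                       /* interrupt type = vectored */
        *q++ = 0;                       /* flags, low byte: po=0, el=0 */
        *q++ = 0;                       /* flags, high byte */
        *q++ = isa_bus_id;              /* source bus ID */
        *q++ = (uint8_t)i;              /* source bus IRQ */
        *q++ = ioapic_id;               /* dest I/O APIC ID */
        *q++ = (uint8_t)((irq0_override && i == 0) ? 2 : i); /* dest input */
    }
    return (size_t)(q - buf);
}
```

Each ioapic input appears at most once as a destination, which is the property the comment in the patch describes.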
Signed-off-by: Beth Kon diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 8684987..ddfa828 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -445,6 +445,9 @@ uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; uint8_t bios_uuid[16]; +#ifdef BX_QEMU +uint8_t irq0_override; +#endif #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_SIGNATURE 0x00 #define QEMU_CFG_ID 0x01 #define QEMU_CFG_UUID 0x02 +#define QEMU_CFG_IRQ0_OVERRIDE 0x0e int qemu_cfg_port; @@ -518,6 +522,18 @@ void uuid_probe(void) memset(bios_uuid, 0, 16); } +#ifdef BX_QEMU +void irq0_override_probe(void) +{ +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +memset(&irq0_override, 0, 1); +} +#endif + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1160,6 +1176,13 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +#ifdef BX_QEMU +/* One entry per ioapic interrupt destination. Destination 2 is covered + * by irq0->inti2 override (i == 0). 
Source IRQ 2 is unused + */ +if (irq0_override && i == 2) +continue; +#endif putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1167,7 +1190,12 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +#ifdef BX_QEMU +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +#endif +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1550,16 +1578,18 @@ void acpi_bios_init(void) addr = (addr + 7) & ~7; madt_addr = addr; +madt = (void *)(addr); madt_size = sizeof(*madt) + sizeof(struct madt_processor_apic) * MAX_CPUS + -#ifdef BX_QEMU -sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */; -#else sizeof(struct madt_io_apic); +#ifdef BX_QEMU +for (i = 0; i < 16; i++) +if (PCI_ISA_IRQ_MASK & (1U << i)) +madt_size += sizeof(struct madt_int_override); +if (irq0_override) +madt_size += sizeof(struct madt_int_override); #endif -madt = (void *)(addr); addr += madt_size; - #ifdef BX_QEMU #ifdef HPET_WORKS_IN_KVM addr = (addr + 7) & ~7; @@ -1660,23 +1690,21 @@ void acpi_bios_init(void) io_apic->io_apic_id = smp_cpus; io_apic->address = cpu_to_le32(0xfec0); io_apic->interrupt = cpu_to_le32(0); +int_override = (struct madt_int_override*)(io_apic + 1); #ifdef BX_QEMU -#ifdef HPET_WORKS_IN_KVM -io_apic++; - -int_override = (void *)io_apic; -int_override->type = APIC_XRUPT_OVERRIDE; -int_override->length = sizeof(*int_override); -int_override->bus = cpu_to_le32(0); -int_override->source = cpu_to_le32(0); -int_override->gsi = cpu_to_le32(2); -int_override->flags = cpu_to_le32(0); -#endif +if (irq0_override) { +int_override = (void *)io_apic; +int_override->type = APIC_XRUPT_OVERRIDE; +int_override->length = sizeof(*int_override); +int_override->bus = cpu_to_le32(0); 
+int_override->source = cpu_to_le32(0); +int_override->gsi = cpu_to_le32(2); +int_override->flags = cpu_to_le32(0); /* conf
[PATCH 2/2] Finish HPET implementation for KVM
Signed-off-by: Beth Kon diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl index 06ab25d..84697db 100755 --- a/bios/acpi-dsdt.dsl +++ b/bios/acpi-dsdt.dsl @@ -307,6 +307,24 @@ DefinitionBlock ( ,, , AddressRangeMemory, TypeStatic) }) } +Device(HPET) { +Name(_HID, EISAID("PNP0103")) +Name(_UID, 0) +Method (_STA, 0, NotSerialized) { +Return(0x0F) +} +Name(_CRS, ResourceTemplate() { +DWordMemory( +ResourceConsumer, PosDecode, MinFixed, MaxFixed, +NonCacheable, ReadWrite, +0x, +0xFED0, +0xFED003FF, +0x, +0x0400 /* 1K memory: FED0 - FED003FF */ +) +}) +} } Scope(\_SB.PCI0) { diff --git a/bios/rombios32.c b/bios/rombios32.c index 5cf1f54..959a784 100755 --- a/bios/rombios32.c +++ b/bios/rombios32.c @@ -1275,7 +1275,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */ struct rsdt_descriptor_rev1 { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t table_offset_entry [2]; /* Array of pointers to other */ + uint32_t table_offset_entry [3]; /* Array of pointers to other */ /* ACPI tables */ } __attribute__((__packed__)); @@ -1415,6 +1415,31 @@ struct madt_processor_apic #endif } __attribute__((__packed__)); +/* + * * ACPI 2.0 Generic Address Space definition. 
+ * */ +struct acpi_20_generic_address { +uint8_t address_space_id; +uint8_t register_bit_width; +uint8_t register_bit_offset; +uint8_t reserved; +uint64_t address; +} __attribute__((__packed__)); + +/* + * * HPET Description Table + * */ +struct acpi_20_hpet { +ACPI_TABLE_HEADER_DEF/* ACPI common table header */ +uint32_t timer_block_id; +struct acpi_20_generic_address addr; +uint8_thpet_number; +uint16_t min_tick; +uint8_tpage_protect; +} __attribute__((__packed__)); + +#define ACPI_HPET_ADDRESS 0xFED0UL + struct madt_io_apic { APIC_HEADER_DEF @@ -1487,6 +1512,8 @@ void acpi_bios_init(void) struct facs_descriptor_rev1 *facs; struct multiple_apic_table *madt; uint8_t *dsdt; +struct acpi_20_hpet *hpet; +uint32_t hpet_addr; uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr; uint32_t acpi_tables_size, madt_addr, madt_size; int i; @@ -1534,6 +1561,11 @@ void acpi_bios_init(void) madt_size += sizeof(struct madt_intsrcovr); addr += madt_size; +addr = (addr + 7) & ~7; +hpet_addr = addr; +hpet = (void *)(addr); +addr += sizeof(*hpet); + acpi_tables_size = addr - base_addr; BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n", @@ -1555,6 +1587,7 @@ void acpi_bios_init(void) memset(rsdt, 0, sizeof(*rsdt)); rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr); rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr); +rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr); acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", sizeof(*rsdt), 1); @@ -1644,6 +1677,15 @@ void acpi_bios_init(void) } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); +/* HPET */ +memset(hpet, 0, sizeof(*hpet)); +/* Note: timer_block_id value must be kept in sync with value + * advertised by emulated hpet in hpet.c + */ +hpet->timer_block_id = cpu_to_le32(0x8086a201); +hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS); +acpi_build_table_header((struct acpi_table_header *)hpet, + "HPET", sizeof(*hpet), 1); } } diff --git 
a/qemu/hw/hpet.c b/qemu/hw/hpet.c index 7df2d05..2b817a6 100644 --- a/qemu/hw/hpet.c +++ b/qemu/hw/hpet.c @@ -30,6 +30,7 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "qemu-kvm.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void) return 0; } +static void hpet_kpit_enable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_kpit_disable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +ps.channels[0].mode = 0xff; +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_legacy_enable(void) +{ +
[PATCH 1/2] Make BIOS irq0->inti2 override configurable from userspace
These patches resolve the irq0->inti2 override issue, and get the hpet working on kvm with and without -no-kvm-irqchip (i.e., when hpet takes over, it disables userspace or in-kernel pit as appropriate). The irq0->inti2 override will always be used unless the kernel cannot do irq routing (i.e., compatibility with old kernels). So if the kernel is capable, userspace sets up irq0->inti2 via the irq routing interface, and adds the irq0->inti2 override to the MADT interrupt source override table, and the mp table (for the no-acpi case). A couple of months ago, Marcelo was seeing RHEL5 guests complain of invalid checksum with these patches, but later he couldn't reproduce it, and I'm not seeing it now. While all guests still need to be fully tested, everything appears to be in order. I've tested on win2k864, win2k832, RHEL5.3 32 bit, and ubuntu 8.10 64 bit. Signed-off-by: Beth Kon diff --git a/bios/rombios32.c b/bios/rombios32.c index 4dea066..5cf1f54 100755 --- a/bios/rombios32.c +++ b/bios/rombios32.c @@ -443,6 +443,7 @@ uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; uint8_t bios_uuid[16]; +uint8_t irq0_override; #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -475,6 +476,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_SIGNATURE 0x00 #define QEMU_CFG_ID 0x01 #define QEMU_CFG_UUID 0x02 +#define QEMU_CFG_IRQ0_OVERRIDE 0x0d int qemu_cfg_port; @@ -516,6 +518,18 @@ void uuid_probe(void) memset(bios_uuid, 0, 16); } +void irq0_override_probe(void) +{ +#ifdef BX_QEMU +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +#endif +memset(&irq0_override, 0, 1); +} + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1152,6 +1166,8 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +if (irq0_override && i == 2) +continue; putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* 
flags: po=0, el=0 */ @@ -1159,7 +1175,10 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ len = q - mp_config_table; @@ -1508,6 +1527,11 @@ void acpi_bios_init(void) sizeof(struct madt_processor_apic) * MAX_CPUS + sizeof(struct madt_io_apic); madt = (void *)(addr); +for (i = 0; i < 16; i++) +if (PCI_ISA_IRQ_MASK & (1U << i)) +madt_size += sizeof(struct madt_intsrcovr); +if (irq0_override) +madt_size += sizeof(struct madt_intsrcovr); addr += madt_size; acpi_tables_size = addr - base_addr; @@ -1597,8 +1621,15 @@ void acpi_bios_init(void) io_apic->interrupt = cpu_to_le32(0); intsrcovr = (struct madt_intsrcovr*)(io_apic + 1); -for ( i = 0; i < 16; i++ ) { -if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +for (i = 0; i < 16; i++) { +if (irq0_override && i == 0) { +memset(intsrcovr, 0, sizeof(*intsrcovr)); +intsrcovr->type = APIC_XRUPT_OVERRIDE; +intsrcovr->length = sizeof(*intsrcovr); +intsrcovr->source = i; +intsrcovr->gsi= 2; +intsrcovr->flags = 0; //conforms to bus specifications +} else if (PCI_ISA_IRQ_MASK & (1U << i)) { memset(intsrcovr, 0, sizeof(*intsrcovr)); intsrcovr->type = APIC_XRUPT_OVERRIDE; intsrcovr->length = sizeof(*intsrcovr); @@ -1610,7 +1641,6 @@ void acpi_bios_init(void) continue; } intsrcovr++; -madt_size += sizeof(struct madt_intsrcovr); } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); @@ -2230,6 +2260,8 @@ void rombios32_init(uint32_t *s3_resume_vector, uint8_t *shutdown_flag) if (bios_table_cur_addr != 0) { +irq0_override_probe(); + mptable_init(); uuid_probe(); diff --git a/qemu/hw/fw_cfg.c b/qemu/hw/fw_cfg.c index e324e8d..f06dc3c 100644 --- a/qemu/hw/fw_cfg.c +++ b/qemu/hw/fw_cfg.c @@ -279,6 +279,7 @@ void 
*fw_cfg_init(uint32_t ctl_port, uint32_t data_port, fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nograp
[RFC][PATCH 2/2] Finish hpet implementation for KVM
- add hpet to BIOS - add disable/enable of kernel pit when hpet enters/leaves legacy mode Signed-off-by: Beth Kon diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl index d67616d..9981a1f 100755 --- a/bios/acpi-dsdt.dsl +++ b/bios/acpi-dsdt.dsl @@ -233,6 +233,24 @@ DefinitionBlock ( ,, , AddressRangeMemory, TypeStatic) }) } +Device(HPET) { +Name(_HID, EISAID("PNP0103")) +Name(_UID, 0) +Method (_STA, 0, NotSerialized) { +Return(0x0F) +} +Name(_CRS, ResourceTemplate() { +DWordMemory( +ResourceConsumer, PosDecode, MinFixed, MaxFixed, +NonCacheable, ReadWrite, +0x, +0xFED0, +0xFED003FF, +0x, +0x0400 /* 1K memory: FED0 - FED003FF */ +) +}) +} } Scope(\_SB.PCI0) { diff --git a/bios/rombios32.c b/bios/rombios32.c index 84f15fb..17c3704 100755 --- a/bios/rombios32.c +++ b/bios/rombios32.c @@ -1272,7 +1272,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */ struct rsdt_descriptor_rev1 { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t table_offset_entry [2]; /* Array of pointers to other */ + uint32_t table_offset_entry [3]; /* Array of pointers to other */ /* ACPI tables */ } __attribute__((__packed__)); @@ -1412,6 +1412,31 @@ struct madt_processor_apic #endif } __attribute__((__packed__)); +/* + * * ACPI 2.0 Generic Address Space definition. 
+ * */ +struct acpi_20_generic_address { +uint8_t address_space_id; +uint8_t register_bit_width; +uint8_t register_bit_offset; +uint8_t reserved; +uint64_t address; +} __attribute__((__packed__)); + +/* + * * HPET Description Table + * */ +struct acpi_20_hpet { +ACPI_TABLE_HEADER_DEF /* ACPI common table header */ +uint32_t timer_block_id; +struct acpi_20_generic_address addr; +uint8_thpet_number; +uint16_t min_tick; +uint8_tpage_protect; +} __attribute__((__packed__)); + +#define ACPI_HPET_ADDRESS 0xFED0UL + struct madt_io_apic { APIC_HEADER_DEF @@ -1484,6 +1509,8 @@ void acpi_bios_init(void) struct facs_descriptor_rev1 *facs; struct multiple_apic_table *madt; uint8_t *dsdt; +struct acpi_20_hpet *hpet; +uint32_t hpet_addr; uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr; uint32_t acpi_tables_size, madt_addr, madt_size; int i; @@ -1531,6 +1558,11 @@ void acpi_bios_init(void) madt_size += sizeof(struct madt_intsrcovr); addr += madt_size; +addr = (addr + 7) & ~7; +hpet_addr = addr; +hpet = (void *)(addr); +addr += sizeof(*hpet); + acpi_tables_size = addr - base_addr; BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n", @@ -1552,6 +1584,7 @@ void acpi_bios_init(void) memset(rsdt, 0, sizeof(*rsdt)); rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr); rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr); +rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr); acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", sizeof(*rsdt), 1); @@ -1641,6 +1674,15 @@ void acpi_bios_init(void) } acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); +/* HPET */ +memset(hpet, 0, sizeof(*hpet)); +/* Note timer_block_id value must be kept in sync with value advertised by + * emulated hpet + */ +hpet->timer_block_id = cpu_to_le32(0x8086a201); +hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS); +acpi_build_table_header((struct acpi_table_header *)hpet, + "HPET", sizeof(*hpet), 1); } } diff --git 
a/qemu/hw/hpet.c b/qemu/hw/hpet.c index 7df2d05..80b2edd 100644 --- a/qemu/hw/hpet.c +++ b/qemu/hw/hpet.c @@ -30,8 +30,9 @@ #include "console.h" #include "qemu-timer.h" #include "hpet_emul.h" +#include "qemu-kvm.h" -//#define HPET_DEBUG +#define HPET_DEBUG #ifdef HPET_DEBUG #define dprintf printf #else @@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void) return 0; } +static void hpet_kpit_enable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +kvm_set_pit(kvm_context, &ps); +} + +static void hpet_kpit_disable(void) +{ +struct kvm_pit_state ps; +kvm_get_pit(kvm_context, &ps); +
[RFC][PATCH 1/2] Make irq0->inti2 override in BIOS configurable from userspace
This series of patches (nearly) resolves the irq0->inti2 override issue, and gets the hpet working on kvm with and without the in-kernel irqchip (i.e., it disables the userspace and in-kernel pit as needed).

- irq0->inti2
The resolution was to always use the override unless the kernel cannot do irq routing (i.e., for compatibility with old kernels). So qemu checks whether the kernel is capable of irq routing. If so, qemu tells kvm to route irq0 to inti2 via the irq routing interface, and tells the bios to add the irq0->inti2 override to the MADT interrupt source override table and to the mp table (for the non-acpi case). The only outstanding problem here is that when I set acpi=off on the kernel boot line, the boot fails. Apparently Linux does not like the way I implemented the override for the mp table in rombios32.c. Since I am pressed for time at the moment, I'm putting this patch set out for comments in hopes that someone else may immediately see the problem. Otherwise I'll keep looking into it as time permits.

- hpet
The hpet works with and without the in-kernel irqchip. Many thanks to Marcelo for finding a bios corruption bug that was the primary source of the win2k864 problems. The hpet now works on Linux (Ubuntu 8.04) and win2k832. On win2k864 it works with the in-kernel irqchip but is broken (i.e., black screen) when -no-kvm-irqchip is specified. However, it is also broken when I remove these 2 patches, so the failure appears to have nothing to do with hpet or irq routing. Needs more looking into.
Signed-off-by: Beth Kon --- bios/Makefile|2 +- bios/rombios32.c | 40 qemu/hw/apic.c |5 ++--- qemu/hw/fw_cfg.c |1 + qemu/hw/fw_cfg.h |1 + qemu/qemu-kvm.c |5 - qemu/sysemu.h|1 + qemu/vl.c| 10 -- 8 files changed, 54 insertions(+), 11 deletions(-) diff --git a/bios/Makefile b/bios/Makefile index 2d1f40d..374d70e 100644 --- a/bios/Makefile +++ b/bios/Makefile @@ -48,7 +48,7 @@ LIBS = -lm RANLIB = ranlib BCC = bcc -GCC = gcc $(CFLAGS) +GCC = gcc $(CFLAGS) -fno-stack-protector HOST_CC = gcc AS86 = as86 diff --git a/bios/rombios32.c b/bios/rombios32.c index 9d2eaaa..84f15fb 100755 --- a/bios/rombios32.c +++ b/bios/rombios32.c @@ -443,6 +443,7 @@ uint32_t cpuid_ext_features; unsigned long ram_size; uint64_t ram_end; uint8_t bios_uuid[16]; +uint8_t irq0_override; #ifdef BX_USE_EBDA_TABLES unsigned long ebda_cur_addr; #endif @@ -475,6 +476,7 @@ void wrmsr_smp(uint32_t index, uint64_t val) #define QEMU_CFG_SIGNATURE 0x00 #define QEMU_CFG_ID 0x01 #define QEMU_CFG_UUID 0x02 +#define QEMU_CFG_IRQ0_OVERRIDE 0x07 int qemu_cfg_port; @@ -516,6 +518,18 @@ void uuid_probe(void) memset(bios_uuid, 0, 16); } +void irq0_override_probe(void) +{ +#ifdef BX_QEMU +if(qemu_cfg_port) { +qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE); +qemu_cfg_read(&irq0_override, 1); +return; +} +#endif +memset(&irq0_override, 0, 1); +} + void cpu_probe(void) { uint32_t eax, ebx, ecx, edx; @@ -1149,6 +1163,8 @@ static void mptable_init(void) /* irqs */ for(i = 0; i < 16; i++) { +if (irq0_override && i == 2) +continue; putb(&q, 3); /* entry type = I/O interrupt */ putb(&q, 0); /* interrupt type = vectored interrupt */ putb(&q, 0); /* flags: po=0, el=0 */ @@ -1156,7 +1172,10 @@ static void mptable_init(void) putb(&q, 0); /* source bus ID = ISA */ putb(&q, i); /* source bus IRQ */ putb(&q, ioapic_id); /* dest I/O APIC ID */ -putb(&q, i); /* dest I/O APIC interrupt in */ +if (irq0_override && i == 0) +putb(&q, 2); /* dest I/O APIC interrupt in */ +else +putb(&q, i); /* dest I/O APIC interrupt in */ } /* patch length */ 
len = q - mp_config_table; @@ -1505,6 +1524,11 @@ void acpi_bios_init(void) sizeof(struct madt_processor_apic) * MAX_CPUS + sizeof(struct madt_io_apic); madt = (void *)(addr); +for (i = 0; i < 16; i++) +if (PCI_ISA_IRQ_MASK & (1U << i)) +madt_size += sizeof(struct madt_intsrcovr); +if (irq0_override) +madt_size += sizeof(struct madt_intsrcovr); addr += madt_size; acpi_tables_size = addr - base_addr; @@ -1594,8 +1618,15 @@ void acpi_bios_init(void) io_apic->interrupt = cpu_to_le32(0); intsrcovr = (struct madt_intsrcovr*)(io_apic + 1); -for ( i = 0; i < 16; i++ ) { -if ( PCI_ISA_IRQ_MASK & (1U << i) ) { +for (i = 0; i < 16; i++) { +if (irq0_override && i == 0) { +memset(intsrcovr, 0, sizeof(*intsrcovr)); +intsrcovr->type = APIC_XRUPT_OVERRIDE; +intsrcovr->length
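The mapping that the mptable_init() hunk above encodes can be looked at in isolation. The following is only a minimal sketch, not code from the patch: the helper name sketch_isa_irq_to_inti and the "-1 means emit no entry" convention are hypothetical, chosen for illustration. It mirrors the two conditions in the hunk: with the override active, ISA IRQ 0 is described as arriving on I/O APIC input 2, and no entry is emitted for ISA IRQ 2.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): return the I/O APIC input
 * pin that an MP-table entry for the given ISA IRQ would name, or -1 if
 * no entry should be emitted for that IRQ.
 */
static int sketch_isa_irq_to_inti(int irq, int irq0_override)
{
    assert(irq >= 0 && irq < 16);   /* only the 16 ISA IRQs are covered */

    if (irq0_override) {
        if (irq == 0)
            return 2;    /* timer interrupt is reported on INTI2 */
        if (irq == 2)
            return -1;   /* INTI2 is already used, so IRQ2 gets no entry */
    }
    return irq;          /* identity mapping for everything else */
}
```

With the override disabled the function is a plain identity mapping, which matches the pre-patch loop that passed i for both the source IRQ and the destination input.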
[PATCH] hpet config mask fix
I discovered a bug in the hpet code that caused Windows to boot without hpet. The config mask I was using was preventing the guest from placing the hpet into 32 bit mode. diff --git a/qemu/hw/hpet.c b/qemu/hw/hpet.c index 5c1aca2..7df2d05 100644 --- a/qemu/hw/hpet.c +++ b/qemu/hw/hpet.c @@ -388,7 +388,8 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, switch ((addr - 0x100) % 0x20) { case HPET_TN_CFG: dprintf("qemu: hpet_ram_writel HPET_TN_CFG\n"); -timer->config = hpet_fixup_reg(new_val, old_val, 0x3e4e); +timer->config = hpet_fixup_reg(new_val, old_val, + HPET_TN_CFG_WRITE_MASK); if (new_val & HPET_TN_32BIT) { timer->cmp = (uint32_t)timer->cmp; timer->period = (uint32_t)timer->period; @@ -456,7 +457,8 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr, case HPET_ID: return; case HPET_CFG: -s->config = hpet_fixup_reg(new_val, old_val, 0x3); +s->config = hpet_fixup_reg(new_val, old_val, + HPET_CFG_WRITE_MASK); if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) { /* Enable main counter and interrupt generation. */ s->hpet_offset = ticks_to_ns(s->hpet_counter) diff --git a/qemu/hw/hpet_emul.h b/qemu/hw/hpet_emul.h index fbe7a44..60893b6 100644 --- a/qemu/hw/hpet_emul.h +++ b/qemu/hw/hpet_emul.h @@ -36,6 +36,7 @@ #define HPET_TN_CFG 0x000 #define HPET_TN_CMP 0x008 #define HPET_TN_ROUTE 0x010 +#define HPET_CFG_WRITE_MASK 0x3 #define HPET_TN_ENABLE 0x004 @@ -45,6 +46,7 @@ #define HPET_TN_SETVAL 0x040 #define HPET_TN_32BIT0x100 #define HPET_TN_INT_ROUTE_MASK 0x3e00 +#define HPET_TN_CFG_WRITE_MASK 0x3f4e #define HPET_TN_INT_ROUTE_SHIFT 9 #define HPET_TN_INT_ROUTE_CAP_SHIFT 32 #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U
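To see why the old literal kept the guest out of 32-bit mode, compare the two masks bit by bit: 0x3e4e and 0x3f4e differ only in bit 0x100, which is HPET_TN_32BIT. The sketch below assumes that hpet_fixup_reg() takes writable bits from the new value and preserves the remaining bits from the old one; that behaviour is an assumption for illustration (the helper's body is not shown in this patch), and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TN_32BIT        0x100u   /* same value as HPET_TN_32BIT */
#define SKETCH_OLD_WRITE_MASK  0x3e4eu  /* literal used before the fix */
#define SKETCH_NEW_WRITE_MASK  0x3f4eu  /* HPET_TN_CFG_WRITE_MASK from the patch */

/* The two masks differ in exactly the 32-bit mode bit. */
static_assert((SKETCH_OLD_WRITE_MASK ^ SKETCH_NEW_WRITE_MASK) == SKETCH_TN_32BIT,
              "masks should differ only in the 32-bit mode bit");

/* Assumed semantics: bits inside the mask come from the guest's write,
 * bits outside the mask keep their previous value. */
static uint64_t sketch_fixup_reg(uint64_t new_val, uint64_t old_val,
                                 uint64_t mask)
{
    return (new_val & mask) | (old_val & ~mask);
}

/* Returns nonzero if a guest write of the 32-bit mode bit survives the mask. */
static int sketch_guest_can_set_32bit(uint64_t mask)
{
    uint64_t cfg = sketch_fixup_reg(SKETCH_TN_32BIT, 0, mask);
    return (cfg & SKETCH_TN_32BIT) != 0;
}
```

Under that assumption, a write of HPET_TN_32BIT is silently dropped with the old mask and accepted with the new one, which is consistent with the symptom described above.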
KVM userspace build fails with 2.6.28-rc7 kernel installed
I pulled the latest: kvm (commit 3c260758b41000986c3c064b17a9771286e98d1e) kvm-userspace (commit 6892f63c18a526c7b54bbde2f59287787eabe1f8) and built and installed the 2.6.28-rc7 x86_64 kernel from kvm pull, then tried to build kvm-userspace and the build failed: make -C /lib/modules/2.6.28-rc7/build M=`pwd` \ LINUXINCLUDE="-I`pwd`/include -Iinclude \ \ -Iarch/x86/include -I`pwd`/include-compat \ -include include/linux/autoconf.h \ -include `pwd`/x86/external-module-compat.h " make[2]: Entering directory `/home/beth/git/build/kvm.kernel/kvm' LD /home/beth/git/test/kvm-userspace/kernel/x86/built-in.o CC [M] /home/beth/git/test/kvm-userspace/kernel/x86/svm.o In file included from /home/beth/git/test/kvm-userspace/kernel/include/asm/kvm_host.h:64, from /home/beth/git/test/kvm-userspace/kernel/include/linux/kvm_host.h:67, from /home/beth/git/test/kvm-userspace/kernel/x86/svm.c:56: arch/x86/include/asm/mtrr.h:60: error: redefinition of ‘struct mtrr_var_range’ arch/x86/include/asm/mtrr.h:69: error: redefinition of typedef ‘mtrr_type’ /home/beth/git/test/kvm-userspace/kernel/x86/external-module-compat.h:349: error: previous declaration of ‘mtrr_type’ was here arch/x86/include/asm/mtrr.h:74: error: redefinition of ‘struct mtrr_state_type’ make[4]: *** [/home/beth/git/test/kvm-userspace/kernel/x86/svm.o] Error 1 When I moved the machine back to 2.6.27.7 the build succeeded. -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 1/2] Add HPET emulation to qemu (v3)
On Mon, 2008-10-27 at 12:49 +0200, Dor Laor wrote: > Beth Kon wrote: > > On Fri, 2008-10-17 at 16:49 +0100, Jamie Lokier wrote: > > > > > Beth Kon wrote: > > > > > > > Clock drift on Linux is in the range of .017% - .019%, loaded and > > > > unloaded. I > > > > haven't found a straightforward way to test on Windows and would > > > > appreciate > > > > any pointers to existing approaches. > > > > > > > Is there any reason why there should be any clock drift, when the > > > guest is using a non-PIT clock? > > > > > > I'm probably being naive, but with 32-bit or 64-bit HPET counters > > > available to the guest, and accurate values from the CMOS clock > > > emulation, I don't see why drift would accumulate over the long term > > > relative to the host clock. > > > > > > > I was measuring with ntpdate, so the drift is with respect to the ntp > > server pool, not the host clock. But in any case, since timer interrupts > > and reads of the hpet counter are at the mercy of the host scheduler > > (i.e., the qemu process can be swapped out at any time during hpet read > > or timer expiration), I'd guess there would always be some amount of > > inaccuracy. Also, qemu checks for timer expiration (qemu_run_timers) as > > part of a bigger loop (main_loop_wait), so the varying amounts of work > > to do elsewhere in the loop from iteration to iteration would also > > introduce irregular delays. > > > This is exactly why hpet as the other clock emulation in qemu (pit, > rtc, pm?) need > to check whether their irq was really injected. Gleb sent patches for > the rtc, pit. > The idea is to check with the irq chip if the injected irq was really > successful. > I assume these are the patches you're referring to? http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18974/focus=18977 Looks like they were never merged. Does anyone know the history on that? Also, HPET generates edge-triggered interrupts (as dictated by Linux and Windows) so I'm not sure if this scheme could work for it. 
> Dor -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED]
Re: [PATCH 1/2] Add HPET emulation to qemu (v3)
On Tue, 2008-10-21 at 10:21 -0500, Anthony Liguori wrote: > Beth Kon wrote: Thanks for the feedback, Anthony. I'll only respond where I have specific comments. Otherwise, I agree to your suggestions and will make the changes. > > +if(timer_enabled(timer) && hpet_enabled(timer->state)) { > > +qemu_irq_pulse(irq); > > +/* windows wants timer0 on irq2 and linux wants irq0, > > + * so we pulse both > > + */ > > +if (do_ioapic) > > +qemu_irq_pulse(timer->state->irqs[2]); > > > > This seems curious and not quite right. We should be able to detect > whether the HPET is being used in IO APIC mode and raise the appropriate > interrupt instead of generating a spurious irq0 interrupt. > After digging further on this, it turns out that the need for the 2 interrupts was caused by what looks like a problem with the way qemu is generating interrupts for the ioapic. I will send out a separate patch for that issue, and make the necessary changes in this hpet code. > > +} > > +} > > + > > +static void hpet_save(QEMUFile *f, void *opaque) > > +{ > > +HPETState *s = opaque; > > +int i; > > +qemu_put_be64s(f, &s->config); > > +qemu_put_be64s(f, &s->isr); > > +/* save current counter value */ > > +s->hpet_counter = hpet_get_ticks(s); > > +qemu_put_be64s(f, &s->hpet_counter); > > + > > +for(i = 0; i < HPET_NUM_TIMERS; i++) { > > +qemu_put_8s(f, &s->timer[i].tn); > > +qemu_put_be64s(f, &s->timer[i].config); > > +qemu_put_be64s(f, &s->timer[i].cmp); > > +qemu_put_be64s(f, &s->timer[i].fsb); > > +qemu_put_be64s(f, &s->timer[i].period); > > +if (s->timer[i].qemu_timer) { > > +qemu_put_timer(f, s->timer[i].qemu_timer); > > +} > > > > Would qemu_timer ever be NULL? You're right... the answer is no. I'll fix that. > > + > > + > > +diff = hpet_calculate_diff(t, cur_tick); > > +qemu_mod_timer(t->qemu_timer, qemu_get_clock(vm_clock) > > ++ (int64_t)ticks_to_ns(diff)); > > > > May want to convert ticks_to_ns to take and return an int64_t. The > explicit casting could introduce very subtle bugs. 
> It seems better this way to me, since muldiv64 in ticks_to_ns takes uint64_t. The likelihood of diff being big enough to create a problem seems small enough. Am I missing something? > > +case HPET_COUNTER: > > +if (hpet_enabled(s)) > > +cur_tick = hpet_get_ticks(s); > > > > Any reason for hpet_get_ticks(s) to not have this check integrated into it? When the hpet is being disabled, we need to get the actual count, even though the hpet_enabled check would return false. So if I made this change it would introduce an ordering issue in the disable code (i.e., get the ticks before setting the hpet to disabled). > > + > > +/* XXX this is a dirty hack for HPET support w/o LPC > > + Actually this is a config descriptor for the RCBA */ > > > > What's the dirty hack? This comment is left over from Alexander Graf's code. I'm not sure why it is in this location and I'll remove it. But in comments on the first version of hpet code I produced, Alexander said, regarding the fixed assignment of HPET_BASE: "This is a dirty hack that I did to make Mac OS X happy. Actually the HPET base address gets specified in the RCBA on the LPC and is configured by the BIOS to point to a valid address, with 0xfed0 being the default (IIRC if you write 0 to the fields you end up with that address)." But in other areas of qemu code I see base addresses being hardcoded and am not sure anything different needs to be done here. Comments? > > Regards, > > Anthony Liguori > -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED]
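On the int64_t cast question discussed above, a rough bound helps. The sketch below assumes a 10 ns tick (the period named in the patch's comment), written as 10,000,000 fs with 1,000,000 fs per ns; those two constants and all sketch_ names are assumptions for illustration. It also substitutes a plain multiplication for muldiv64, which is only valid here because the assumed ratio is exactly 10.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: a 10 ns tick expressed in femtoseconds. */
#define SKETCH_CLK_PERIOD_FS 10000000ULL  /* 10 ns = 10,000,000 fs */
#define SKETCH_FS_PER_NS     1000000ULL   /* 1 ns  =  1,000,000 fs */

static_assert(SKETCH_CLK_PERIOD_FS % SKETCH_FS_PER_NS == 0,
              "sketch relies on a whole number of ns per tick");

/* Convert ticks to nanoseconds; stands in for ticks_to_ns() in the patch. */
static uint64_t sketch_ticks_to_ns(uint64_t ticks)
{
    return ticks * (SKETCH_CLK_PERIOD_FS / SKETCH_FS_PER_NS);
}

/* Nonzero if the nanosecond value for `ticks` can be cast to int64_t
 * without changing sign, i.e. it does not exceed INT64_MAX. */
static int sketch_ns_fits_int64(uint64_t ticks)
{
    const uint64_t ns_per_tick = SKETCH_CLK_PERIOD_FS / SKETCH_FS_PER_NS;
    return ticks <= (uint64_t)INT64_MAX / ns_per_tick;
}
```

INT64_MAX nanoseconds is roughly 292 years, so under these assumptions a diff would have to be implausibly large before the cast misbehaves, which supports the "small enough" judgment in the reply.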
Re: [Qemu-devel] [PATCH 1/2] Add HPET emulation to qemu (v3)
On Fri, 2008-10-17 at 16:49 +0100, Jamie Lokier wrote: > Beth Kon wrote: > > Clock drift on Linux is in the range of .017% - .019%, loaded and unloaded. > > I > > haven't found a straightforward way to test on Windows and would appreciate > > any pointers to existing approaches. > > Is there any reason why there should be any clock drift, when the > guest is using a non-PIT clock? > > I'm probably being naive, but with 32-bit or 64-bit HPET counters > available to the guest, and accurate values from the CMOS clock > emulation, I don't see why drift would accumulate over the long term > relative to the host clock. I was measuring with ntpdate, so the drift is with respect to the ntp server pool, not the host clock. But in any case, since timer interrupts and reads of the hpet counter are at the mercy of the host scheduler (i.e., the qemu process can be swapped out at any time during hpet read or timer expiration), I'd guess there would always be some amount of inaccuracy. Also, qemu checks for timer expiration (qemu_run_timers) as part of a bigger loop (main_loop_wait), so the varying amounts of work to do elsewhere in the loop from iteration to iteration would also introduce irregular delays. > > -- Jamie > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
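To put the quoted drift figures in more familiar units: a relative drift expressed in percent translates directly into seconds gained or lost over an interval. The helper below is a hypothetical illustration (the name sketch_drift_seconds is not from the thread); it only performs the unit conversion and says nothing about how the drift was measured.

```c
#include <assert.h>

/*
 * Seconds of accumulated clock error over `interval_seconds`, given a
 * relative drift expressed in percent (e.g. 0.017 for .017%).
 */
static double sketch_drift_seconds(double drift_percent,
                                   double interval_seconds)
{
    assert(interval_seconds >= 0.0);
    return interval_seconds * drift_percent / 100.0;
}
```

Over one day (86400 seconds), .017% comes to about 14.7 seconds and .019% to about 16.4 seconds.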
[PATCH 2/2] Add HPET emulation to qemu (v3)
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] This patch contains the bochs bios changes needed to support the qemu hpet. --- bios/acpi-dsdt.dsl | 18 ++ bios/rombios32.c | 46 -- 2 files changed, 62 insertions(+), 2 deletions(-) Index: bochs-2.3.7/bios/acpi-dsdt.dsl === --- bochs-2.3.7.orig/bios/acpi-dsdt.dsl 2008-10-15 12:39:14.0 -0500 +++ bochs-2.3.7/bios/acpi-dsdt.dsl 2008-10-15 18:50:44.0 -0500 @@ -159,6 +159,24 @@ Return (MEMP) } } +Device(HPET) { +Name(_HID, EISAID("PNP0103")) +Name(_UID, 0) +Method (_STA, 0, NotSerialized) { +Return(0x0F) +} +Name(_CRS, ResourceTemplate() { +DWordMemory( +ResourceConsumer, PosDecode, MinFixed, MaxFixed, +NonCacheable, ReadWrite, +0x, +0xFED0, +0xFED003FF, +0x, +0x0400 /* 1K memory: FED0 - FED003FF */ +) +}) +} } Scope(\_SB.PCI0) { Index: bochs-2.3.7/bios/rombios32.c === --- bochs-2.3.7.orig/bios/rombios32.c 2008-10-15 12:39:36.0 -0500 +++ bochs-2.3.7/bios/rombios32.c 2008-10-16 18:30:40.0 -0500 @@ -1087,7 +1087,7 @@ struct rsdt_descriptor_rev1 { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t table_offset_entry [3]; /* Array of pointers to other */ + uint32_t table_offset_entry [4]; /* Array of pointers to other */ /* ACPI tables */ }; @@ -1227,6 +1227,31 @@ #endif }; +/* + * * ACPI 2.0 Generic Address Space definition. 
+ * */ +struct acpi_20_generic_address { +uint8_t address_space_id; +uint8_t register_bit_width; +uint8_t register_bit_offset; +uint8_t reserved; +uint64_t address; +}; + +/* + * * HPET Description Table + * */ +struct acpi_20_hpet { +ACPI_TABLE_HEADER_DEF /* ACPI common table header */ +uint32_t timer_block_id; +struct acpi_20_generic_address addr; +uint8_thpet_number; +uint16_t min_tick; +uint8_tpage_protect; +}; +#define ACPI_HPET_ADDRESS 0xFED0UL + + struct madt_io_apic { APIC_HEADER_DEF @@ -1341,8 +1366,9 @@ struct fadt_descriptor_rev1 *fadt; struct facs_descriptor_rev1 *facs; struct multiple_apic_table *madt; +struct acpi_20_hpet *hpet; uint8_t *dsdt, *ssdt; -uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr; +uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr, hpet_addr; uint32_t acpi_tables_size, madt_addr, madt_size; int i; @@ -1388,6 +1414,11 @@ madt = (void *)(addr); addr += madt_size; +addr = (addr + 7) & ~7; +hpet_addr = addr; +hpet = (void *)(addr); +addr += sizeof(*hpet); + acpi_tables_size = addr - base_addr; BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n", @@ -1410,6 +1441,7 @@ rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr); rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr); rsdt->table_offset_entry[2] = cpu_to_le32(ssdt_addr); +rsdt->table_offset_entry[3] = cpu_to_le32(hpet_addr); acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT", sizeof(*rsdt), 1); @@ -1471,6 +1503,16 @@ acpi_build_table_header((struct acpi_table_header *)madt, "APIC", madt_size, 1); } +/* HPET */ +memset(hpet, 0, sizeof(*hpet)); +/* Note timer_block_id value must be kept in sync with value advertised by + * emulated hpet + */ +hpet->timer_block_id = cpu_to_le32(0x80868201); +hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS); +acpi_build_table_header((struct acpi_table_header *)hpet, + "HPET", sizeof(*hpet), 1); + } /* SMBIOS entry point -- must be written to a 16-bit 
aligned address
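A note on the table layout in the hunk above: the HPET description table is a fixed 56-byte structure (a 36-byte ACPI header followed by 20 bytes of HPET fields), so the C struct has to match that layout byte for byte. A later posting of this hunk in this archive adds __attribute__((__packed__)) to both structs. The sketch below is a standalone illustration using GCC-style attributes; the sketch_ struct names and the expanded header are stand-ins for ACPI_TABLE_HEADER_DEF, not code from the patch. It also shows the round-up-to-8 idiom used for addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ACPI_TABLE_HEADER_DEF: the 36-byte ACPI common header. */
struct sketch_acpi_header {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     asl_compiler_id[4];
    uint32_t asl_compiler_revision;
} __attribute__((__packed__));

/* ACPI generic address structure: 12 bytes when packed. */
struct sketch_generic_address {
    uint8_t  address_space_id;
    uint8_t  register_bit_width;
    uint8_t  register_bit_offset;
    uint8_t  reserved;
    uint64_t address;
} __attribute__((__packed__));

/* HPET description table. Without the packed attribute the compiler
 * would typically pad before min_tick, which sits at an odd offset. */
struct sketch_hpet_table {
    struct sketch_acpi_header     header;
    uint32_t                      timer_block_id;
    struct sketch_generic_address addr;
    uint8_t                       hpet_number;
    uint16_t                      min_tick;
    uint8_t                       page_protect;
} __attribute__((__packed__));

static_assert(sizeof(struct sketch_acpi_header) == 36, "header is 36 bytes");
static_assert(sizeof(struct sketch_hpet_table) == 56, "HPET table is 56 bytes");

/* Round an address up to the next multiple of 8, as in
 * addr = (addr + 7) & ~7 from the hunk. */
static uint32_t sketch_align8(uint32_t addr)
{
    return (addr + 7u) & ~7u;
}
```

The static assertions make a layout mistake a compile-time error rather than something the guest OS discovers while parsing the table.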
[PATCH 1/2] Add HPET emulation to qemu (v3)
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] This version contains many miscellaneous changes, including incorporation of comments received, addition of save/restore support and a reset handler, and a couple of bugfixes. I've booted Linux and Win2k832 guests on QEMU. Win2k832 still looks shaky on QEMU even without the hpet - I've gotten intermittent blue screens. But I did get it to boot at least once both with and without hpet. Clock drift on Linux is in the range of .017% - .019%, loaded and unloaded. I haven't found a straightforward way to test on Windows and would appreciate any pointers to existing approaches. The second patch in this series contains the needed bochs bios changes. Signed-off-by: Beth Kon <[EMAIL PROTECTED]> --- Makefile.target |2 +- hw/hpet.c| 572 ++ hw/i8254.c | 11 + hw/mc146818rtc.c | 30 +++- hw/pc.c |2 + 5 files changed, 614 insertions(+), 3 deletions(-) diff --git a/Makefile.target b/Makefile.target index e2edf9d..9e80b3d 100644 --- a/Makefile.target +++ b/Makefile.target @@ -545,7 +545,7 @@ ifeq ($(TARGET_BASE_ARCH), i386) OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o -OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o +OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE endif ifeq ($(TARGET_BASE_ARCH), ppc) diff --git a/hw/hpet.c b/hw/hpet.c new file mode 100644 index 000..61fbaaf --- /dev/null +++ b/hw/hpet.c @@ -0,0 +1,572 @@ +/* + * High Precisition Event Timer emulation + * + * Copyright (c) 2007 Alexander Graf + * Copyright (c) 2008 IBM Corporation + * + * Authors: Beth Kon <[EMAIL PROTECTED]> + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the 
License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * * + * + * This driver attempts to emulate an HPET device in software. + * + */ +#include "hw.h" +#include "console.h" +#include "qemu-timer.h" + +//#define HPET_DEBUG + +#define HPET_BASE 0xfed0 +#define HPET_CLK_PERIOD 1000ULL /* 1000 femtoseconds == 10ns*/ + +#define FS_PER_NS 100 +#define HPET_NUM_TIMERS 3 +#define HPET_TIMER_TYPE_LEVEL 1 +#define HPET_TIMER_TYPE_EDGE 0 +#define HPET_TIMER_DELIVERY_APIC 0 +#define HPET_TIMER_DELIVERY_FSB 1 +#define HPET_TIMER_CAP_FSB_INT_DEL (1 << 15) +#define HPET_TIMER_CAP_PER_INT (1 << 4) + +#define HPET_CFG_ENABLE 0x001 +#define HPET_CFG_LEGACY 0x002 + +#define HPET_ID 0x000 +#define HPET_PERIOD 0x004 +#define HPET_CFG0x010 +#define HPET_STATUS 0x020 +#define HPET_COUNTER0x0f0 +#define HPET_TN_REGS0x100 ... 
0x3ff /*address range of all TN regs*/ +#define HPET_TN_CFG 0x000 +#define HPET_TN_CMP 0x008 +#define HPET_TN_ROUTE 0x010 + + +#define HPET_TN_ENABLE 0x004 +#define HPET_TN_PERIODIC 0x008 +#define HPET_TN_PERIODIC_CAP 0x010 +#define HPET_TN_SIZE_CAP 0x020 +#define HPET_TN_SETVAL 0x040 +#define HPET_TN_32BIT0x100 +#define HPET_TN_INT_ROUTE_MASK 0x3e00 +#define HPET_TN_INT_ROUTE_SHIFT 9 +#define HPET_TN_INT_ROUTE_CAP_SHIFT 32 +#define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U + +#define timer_int_route(timer) \ +((timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT) + +#define hpet_enabled(s) (s->config & HPET_CFG_ENABLE) +#define timer_is_periodic(t) (t->config & HPET_TN_PERIODIC) +#define timer_enabled(t) (t->config & HPET_TN_ENABLE) + +#define hpet_time_after(a, b) ((int32_t)(b) - (int32_t)(a) < 0) +#define hpet_time_after64(a, b) ((int64_t)(b) - (int64_t)(a) < 0) + + +/*indicator hpet is operating in legacy mode */ +int hpet_legacy=0; + +struct HPETState; +typedef struct HPETTimer { /* timers */ +uint8_t tn; /*timer number*/ +QEMUTimer *qemu_timer; +struct HPETState *state; +/* Memory-mapped, software visible timer registers */ +uint64_t config;/* configuration/cap
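The hpet_time_after() macros above compare counter values by looking at the sign of their difference, which keeps the comparison meaningful when the counter wraps around. The sketch below illustrates the 32-bit case with a slightly different formulation: it subtracts as unsigned and only then reinterprets the result as signed, which avoids signed overflow in the subtraction. That variant and the sketch_ names are substitutions for illustration, not the patch's code, and the final conversion to int32_t relies on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nonzero if counter value `a` is later than `b`, treating the 32-bit
 * counter as circular. Valid as long as the two values are less than
 * 2^31 ticks apart.
 */
static int sketch_time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Small self-check covering the ordinary and the wrapped case. */
void sketch_time_after_selfcheck(void)
{
    assert(sketch_time_after32(10u, 5u));            /* plain ordering */
    assert(!sketch_time_after32(5u, 10u));
    assert(sketch_time_after32(5u, 0xFFFFFFF0u));    /* 5 is after a wrap */
    assert(!sketch_time_after32(0xFFFFFFF0u, 5u));
}
```

A naive a > b comparison would get the wrapped case backwards, reporting 0xFFFFFFF0 as later than 5 even though the counter has rolled over in between.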
Re: Need help with windows debug tools - HPET problems on win2k864
On Sat, 2008-09-13 at 07:50 +0300, Avi Kivity wrote: > Beth Kon wrote: > > I ran into trouble trying to get the hpet working with win2k864. It > > hangs very early on (black screen with "Windows is loading Files" at the > > bottom). My guess is there are problems with our acpi/bios changes, > > since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI > > 1.0. We may not have enough 2.0 infrastructure to get it working right > > for 64 bit. Win2k832 does work with the hpet, but 64 doesn't. > > > > So! I need to figure out how to debug Windows and Anthony suggested that > > there may be some people with experience here. > > > > Something that's worked for me is to enable memory dump triggered by > ctrl-ctrl-scroll-lock. There's some registry key you set, and on the > next hang you can generate a memory dump, which you can then analyze > with windbg. > > Of course, your hang might well occur earlier than disk driver > initialization, so this is not bulletproof. > The problem is, I've been told (and confirmed with a test) I can't add the hpet after windows is installed. It needs to be present for installation, and I'm getting the black screen at the start of the install. So there is no registry to play with yet. My guess is this is just too early to get useful debug, though I was hoping a checked build would provide information over the serial port even during install. I'll keep trying to see if there's a way to make that happen. Did you need a host and target for the kind of memory dump analysis you did? Or just use windbg on a local dump file? I ask because I'm trying to figure out if the "-serial pty" and "-serial /dev/pts/0" is the right way to set up the null modem cable between 2 guests. 
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED]
[Fwd: Re: Need help with windows debug tools - HPET problems on win2k864]
Oops... meant to copy the list too... -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] --- Begin Message --- On Fri, 2008-09-12 at 20:40 +0200, Sebastian Herbszt wrote: > Beth Kon wrote: > > >I ran into trouble trying to get the hpet working with win2k864. It > > hangs very early on (black screen with "Windows is loading Files" at the > > bottom). My guess is there are problems with our acpi/bios changes, > > since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI > > 1.0. We may not have enough 2.0 infrastructure to get it working right > > for 64 bit. Win2k832 does work with the hpet, but 64 doesn't. > > Does win2k864 without hpet work? > Yes, win2k864 works on kvm, but not qemu. > Maybe the article "Using the windows debugger under Xen" at > http://wiki.xensource.com/xenwiki/XenWindowsGplPv can help. I'll look into that... thanks. > > - Sebastian > -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] --- End Message ---
Need help with windows debug tools - HPET problems on win2k864
I ran into trouble trying to get the hpet working with win2k864. It hangs very early on (black screen with "Windows is loading Files" at the bottom). My guess is there are problems with our acpi/bios changes, since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI 1.0. We may not have enough 2.0 infrastructure to get it working right for 64 bit. Win2k832 does work with the hpet, but 64 doesn't. So! I need to figure out how to debug Windows and Anthony suggested that there may be some people with experience here. I have set up a host and a target guest on kvm, one started with -serial pty and the other with -serial /dev/pts/0 (which was the value spit out by the first guest). I'm trying to test the serial connection by following instructions at: http://msdn.microsoft.com/en-us/library/cc266323.aspx which just sends a message over the serial connection from one guest to the other. I specify the baud rate as shown , assuming it shouldn't matter in this emulated environment, but I can't get the message to show up. Does anyone have any suggestions? Is this the proper way to set up a "null modem cable" between 2 guests? Also, reading the windows docs, I see that there are several debuggers, like kd, cdb, ntsd, and windbg (gui wrapper). I downloaded the debugging package from: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx#a which claims to download all tools, but I can't figure out how to invoke kd directly. It appears not to be there. I can invoke WinDbg from the Start menu, however, and I assume that will be sufficient for my needs. Lastly, I still haven't figured out if these debuggers require setup on both host and target. If so, they will be useless for my needs since I get a hang before the OS boots. But there must be some way to debug boot issues. If anyone has any comments/suggestions on any of the above, I'd be very appreciative! 
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED]
[RFC][PATCH]Problems with hpet on kvm
I've been playing with my hpet patch on kvm and seeing some strange behavior. The patch I've been using is attached below.

/usr/local/bin/qemu-system-x86_64 -boot cd -hda /home/beth/images/ubuntu_server_8.04_10G.img -m 1024 -net nic,model=e1000 -net user -smp 2 -vnc :1

With the above command line the boot intermittently fails with an infinite roll of error messages that look something like this:

*BEGIN ERROR MESSAGES
...
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ E] - event disabled [20070126]
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ F] - event disabled [20070126]
printk: 242 messages suppressed.
kacpid: page allocation failure. order:0, mode:0x20
Pid: 93, comm: kacpid Not tainted 2.6.25.9 #13

Call Trace:
 [] __alloc_pages+0x325/0x33e
 [] kmem_getpages+0xc6/0x194
 [] fallback_alloc+0x10d/0x185
 [] kmem_cache_alloc+0xbd/0xe7
 [] acpi_ev_asynch_execute_gpe_method+0x0/0x117
 [] acpi_os_execute+0x2e/0x9a
 [] acpi_ev_gpe_dispatch+0xd0/0x149
 [] acpi_ev_gpe_detect+0xb1/0x104
 [] acpi_ev_fixed_event_detect+0x34/0xd4
 [] acpi_ev_sci_xrupt_handler+0x1a/0x22
 [] acpi_irq+0x11/0x23
 [] handle_IRQ_event+0x25/0x53
 [] handle_fasteoi_irq+0x90/0xc8
 [] do_IRQ+0xf1/0x15f
 [] ret_from_intr+0x0/0xa
 [] __do_softirq+0x5a/0xce
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x68
 [] irq_exit+0x3f/0x83
 [] do_IRQ+0x13e/0x15f
 [] ret_from_intr+0x0/0xa
 [] acpi_ns_get_parent_node+0x14/0x15
 [] acpi_ns_delete_namespace_by_owner+0xb7/0xde
 [] acpi_ds_terminate_control_method+0x73/0xc6
 [] acpi_ps_parse_aml+0x179/0x254
 [] acpi_ps_execute_method+0x12b/0x1d7
 [] acpi_ns_evaluate+0xa4/0x100
 [] acpi_ev_asynch_execute_gpe_method+0xc4/0x117
 [] acpi_os_execute_deferred+0x0/0x2c
 [] acpi_os_execute_deferred+0x23/0x2c
 [] run_workqueue+0x79/0x104
 [] worker_thread+0xd9/0xe8
 [] autoremove_wake_function+0x0/0x2e
 [] worker_thread+0x0/0xe8
 [] kthread+0x47/0x76
 [] schedule_tail+0x28/0x5c
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x76
 [] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
CPU0: hi:0, btch: 1 usd: 0
CPU1: hi:0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU0: hi: 186, btch: 31 usd: 65
CPU1: hi: 186, btch: 31 usd: 184
Active:0 inactive:0 dirty:0 writeback:0 unstable:0
 free:0 slab:255796 mapped:0 pagetables:0 bounce:0
Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:8924kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:1018020kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
Swap cache: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
Free swap:0kB
262128 pages of RAM
5637 reserved pages
0 pages shared
0 pages swap cached
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ 8] - event disabled [20070126]
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ 9] - event disabled [20070126]
...
*END ERROR MESSAGES**

If I add -no-kvm-irqchip, the error disappears. Can anyone offer any insight about what is going on here?

I don't know if it is related, but booting linux with the hpet seems to stall in some places, and I don't see that when booting without the hpet.

Other than this problem, I have booted win2k8 and linux with the hpet. The only other odd situation is that, to get linux to work I have to use irq 0 for timer0, but to get windows to work, I have to use irq 2.

In hpet.c update_irq:

    if (timer->tn == 0)
        irq=timer->state->irqs[0];

must be changed to

    if (timer->tn == 0)
        irq=timer->state->irqs[2];

to get win2k8 to boot. Any ideas?

Beth Kon
IBM Linux Technology Center

******
signed-off-by Beth Kon <[EMAIL PROTECTED]>

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index a86464f..8634186 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -607,7 +607,7 @@ ifeq ($(TARGET_BASE_ARCH), i386)
 OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
-OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
+OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o hpet.o
 ifeq ($(USE_KVM_PIT), 1)
 OBJS+= i8254-kvm.o
 endif
diff --git a/qemu/hw/hpet.c b/qemu/hw/hpet.c
new file mode 100644
index 0000000..01dee56
--- /dev/null
+++ b/qemu/hw/hpet.c
@@ -0,0 +1,486 @@
+/*
+ * High Precisition Event Timer emulation
+ *
+ * Copyrig
Re: [Qemu-devel] [RFC][PATCH] Add HPET emulation to qemu (v2)
On Sat, 2008-08-02 at 18:21 +0100, Samuel Thibault wrote:
> Anthony Liguori, on Sat 02 Aug 2008 09:46:30 -0500, wrote:
> > Samuel Thibault wrote:
> > >Beth Kon, on Sat 02 Aug 2008 06:05:14 -0500, wrote:
> > >
> > >>I was trying to reproduce the wakeup every 10ms that
> > >>Samuel Thibault mentioned, thinking the HPET would improve it.
> > >>But for an idle guest in both cases (with and without HPET), the
> > >>number of wakeups per second was relatively low (28).
> > >>
> > >
> > >I was referring to vl.c's timeout = 10; which makes the select call
> > >use a timeout of 10ms. That said, "/* If all cpus are halted then wait
> > >until the next IRQ */", so maybe that's why you get slower wakeups per
> > >second. I'm still surprised because of the call to qemu_mod_timer in
> > >pit_irq_timer_update which should setup at least a 100Hz timer with
> > >linux guests (when they don't have HPET available).
> > >
> >
> > The patch disables that when hpet is active.
>
> That's what I would expect, indeed, but he is reporting that _without_
> HPET he gets low wakeups per second already.
>
> Samuel

Yes, 28 is incorrect. I was misinterpreting the output of powertop, shown here:

Wakeups-from-idle per second : 27.7 interval: 10.0s
no ACPI power usage estimate available

Top causes for wakeups:
  46.1% ( 63.9) qemu-system-x86 : schedule_timeout (process_timeout)
  36.5% ( 50.6) qemu-system-x86 : sys_timer_settime (posix_timer_fn)
  ...

The "Wakeups-from-idle per second" line reports 27.7, but the powertop source code shows that this value is the total wakeups-per-second divided by the number of online processors. So the proper number of wakeups-per-second caused by the select is 63.9, which makes more sense. Looking at the main_loop code, there is no way to get a timeout greater than 10 ms without setting icount.
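The averaging described above can be sketched as plain arithmetic. This is only an illustration of the relationship as I read it, not code from powertop; the function names are hypothetical, and no particular processor count is assumed for the machine in the report.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the powertop source): the headline
 * "Wakeups-from-idle per second" figure, assuming it is the system-wide
 * wakeup rate divided by the number of online processors.
 */
static double per_cpu_wakeups_per_sec(double total_per_sec, int online_cpus)
{
    assert(online_cpus > 0);
    return total_per_sec / (double)online_cpus;
}

/* Inverse: recover the system-wide rate from the per-processor average. */
static double total_wakeups_per_sec(double per_cpu_per_sec, int online_cpus)
{
    assert(online_cpus > 0);
    return per_cpu_per_sec * (double)online_cpus;
}
```

The per-cause figures in parentheses (such as 63.9 for schedule_timeout) are not divided this way, which is why they can exceed the headline number.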
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
[RFC][PATCH] Add HPET emulation to qemu (v2)
Major changes:
- Rebased to register-based operations for ease of save/restore.
- Looked through Xen's hpet implementation and picked up a bunch of things, though not quite everything yet. Thanks!
- PIT and RTC are entirely disabled in legacy mode, not just their interrupts.

There is still a bunch to do but I'm re-posting primarily because of the switch to register-based operations. I have still only tested with a linux guest. Windows guest is next on my list... as soon as I return from my week of vacation.

I've been playing with CONFIG_NO_HZ and have been surprised by the results. I was trying to reproduce the wakeup every 10ms that Samuel Thibault mentioned, thinking the HPET would improve it. But for an idle guest in both cases (with and without HPET), the number of wakeups per second was relatively low (28). Ultimately this depends on exactly what the guest is doing when idle, so maybe the HPET won't provide any improvement here. But in any case, I didn't see the 10ms wakeup cycle with CONFIG_NO_HZ. If anyone can shed any light on this, I could look into it more if need be.
Signed-off-by: Beth Kon <[EMAIL PROTECTED]>

***
 Makefile.target  |    2
 hw/hpet.c        |  441 +++
 hw/i8254.c       |   11 +
 hw/mc146818rtc.c |   30 +++
 hw/pc.c          |    2
 5 files changed, 483 insertions(+), 3 deletions(-)
***

diff --git a/Makefile.target b/Makefile.target
index 42162c3..946bdef 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -536,7 +536,7 @@ ifeq ($(TARGET_BASE_ARCH), i386)
 OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
-OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
+OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/hpet.c b/hw/hpet.c
new file mode 100644
index 0000000..adfecf0
--- /dev/null
+++ b/hw/hpet.c
@@ -0,0 +1,441 @@
+/*
+ * High Precisition Event Timer emulation
+ *
+ * Copyright (c) 2007 Alexander Graf
+ * Copyright (c) 2008 IBM Corporation
+ *
+ * Authors: Beth Kon <[EMAIL PROTECTED]>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * *
+ *
+ * This driver attempts to emulate an HPET device in software. It is by no
+ * means complete and is prone to break on certain conditions.
+ *
+ */
+#include "hw.h"
+#include "console.h"
+#include "qemu-timer.h"
+
+//#define HPET_DEBUG
+
+#define HPET_BASE               0xfed00000
+#define HPET_CLK_PERIOD         10000000ULL /* 10000000 femtoseconds == 10ns*/
+
+#define FS_PER_NS 1000000
+#define HPET_NUM_TIMERS 3
+#define HPET_TIMER_TYPE_LEVEL 1
+#define HPET_TIMER_TYPE_EDGE 0
+#define HPET_TIMER_DELIVERY_APIC 0
+#define HPET_TIMER_DELIVERY_FSB 1
+#define HPET_TIMER_CAP_FSB_INT_DEL (1 << 15)
+#define HPET_TIMER_CAP_PER_INT (1 << 4)
+
+#define HPET_CFG_ENABLE 0x001
+#define HPET_CFG_LEGACY 0x002
+
+#define HPET_ID         0x000
+#define HPET_PERIOD     0x004
+#define HPET_CFG        0x010
+#define HPET_STATUS     0x020
+#define HPET_COUNTER    0x0f0
+#define HPET_TN_REGS    0x100 ... 0x3ff /*address range of all TN regs*/
+#define HPET_TN_CFG     0x000
+#define HPET_TN_CMP     0x008
+#define HPET_TN_ROUTE   0x010
+
+
+#define HPET_TN_INT_TYPE_LEVEL   0x002
+#define HPET_TN_ENABLE           0x004
+#define HPET_TN_PERIODIC         0x008
+#define HPET_TN_PERIODIC_CAP     0x010
+#define HPET_TN_SIZE_CAP         0x020
+#define HPET_TN_SETVAL           0x040
+#define HPET_TN_32BIT            0x100
+#define HPET_TN_INT_ROUTE_MASK   0x3e00
+#define HPET_TN_INT_ROUTE_SHIFT  9
+#define HPET_TN_INT_ROUTE_CAP_SHIFT 32
+#define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U
+
+#define timer_int_route(timer) \
+    ((timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT)
+
+#define hpet_enabled(s) (s->config & HPET_CFG_ENABLE)
+#define timer_is_periodic(t) (t->config & HPET_TN_PERIODIC)
+#define timer_enabled(t) (t->config & HPET_TN_ENABLE)
+
+struct
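The clock constants in the patch imply a simple conversion between HPET ticks and nanoseconds. The following is a minimal standalone sketch, assuming a 10 ns tick (10,000,000 femtoseconds) and modeled loosely on the ticks_to_ns()/ns_to_ticks() helpers from the first posting of the patch; NS_PER_TICK and hpet_counter_at() are names introduced here for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define HPET_CLK_PERIOD 10000000ULL /* femtoseconds per HPET tick (10 ns), assumed */
#define FS_PER_NS       1000000ULL  /* femtoseconds per nanosecond */
#define NS_PER_TICK     (HPET_CLK_PERIOD / FS_PER_NS) /* hypothetical helper constant: 10 */

/* Convert a tick count to nanoseconds. */
static inline uint64_t ticks_to_ns(uint64_t ticks)
{
    return ticks * NS_PER_TICK;
}

/* Convert nanoseconds to ticks, truncating any partial tick. */
static inline uint64_t ns_to_ticks(uint64_t ns)
{
    return ns / NS_PER_TICK;
}

/*
 * Hypothetical illustration of deriving the main counter from a
 * nanosecond clock: ticks elapsed since the counter was enabled.
 */
static inline uint64_t hpet_counter_at(uint64_t now_ns, uint64_t offset_ns)
{
    assert(now_ns >= offset_ns);
    return ns_to_ticks(now_ns - offset_ns);
}
```

Reducing the ratio to a single factor first is a deliberate choice in this sketch: multiplying a tick count by 10,000,000 before dividing would overflow a signed 64-bit intermediate after roughly 9.2e11 ticks, which is only a few hours of guest time at 10 ns per tick.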
Re: [RFC][PATCH] Add HPET emulation to qemu
On Sat, 2008-07-12 at 17:42 +0200, Alexander Graf wrote:
> Hi Beth,
>
> On Jul 10, 2008, at 5:48 AM, Beth Kon wrote:
>
> > This patch, based on an earlier patch by Alexander Graf, adds HPET
> > emulation to qemu. I am sending out a separate patch to kvm with the
> > required bios changes.
> >
> > This work is incomplete.
>
> Wow it's good to see that someone's working on it. I am pretty sure
> that you're basing on an older version of my HPET emulation, so you
> might also want to take a look at the current patch file residing in
> http://alex.csgraf.de/qemu/osxpatches.tar.bz2

Hi Alex. Thanks for the feedback! Sorry for the delayed response, I've been on vacation. I did check the patch you pointed me to and it is actually the same one that I started with.

> While reading through the code I realized how badly commented it is.
> At least the functions should have some comments on them what their
> purpose is.
> Furthermore there still are a lot of magic numbers in the code. While
> that is "normal qemu code style" and I wrote it this way, I'm not too
> fond of it. So it might be a good idea to at least make the switch
> numbers defines.

Ok... added those things to my todo list :-)

> > The one area that feels ugly/wrong at the moment is handling the
> > disabling of 8254 and RTC timer interrupts when operating in legacy
> > mode. The HPET spec says "in this case the 8254/RTC timer will not
> > cause any interrupts". I'm not sure if I should disable the RTC/8254
> > in some more general way, or just disable interrupts. Comments
> > appreciated.
>
> IIRC the spec defines that the HPET _can_ replace the 8254, but does
> not have to. So you should be mostly fine on that part.
>
> > +
> > +//#define HPET_DEBUG
> > +
> > +#define HPET_BASE 0xfed00000
>
> This is a dirty hack that I did to make Mac OS X happy. Actually the
> HPET base address gets specified in the RCBA on the LPC and is
> configured by the BIOS to point to a valid address, with 0xfed00000
> being the default (IIRC if you write 0 to the fields you end up with
> that address).

Yes, Ryan Harper's BIOS patch that was submitted with this patch specified the HPET address in ACPI. I am not familiar with this stuff, so I'm not sure how that relates to the RCBA and whether more needs to be done here. For the time being I'll add it to the todo list.

> > +#define HPET_PERIOD 0x00989680 /* 10000000 femptoseconds, 10ns*/
>
> Any reason why this is a hex value? I find 10000000 a lot easier to
> read :-)

Well that's a VERY good question! Job security? :-)

> > +static uint32_t hpet_ram_readw(void *opaque, target_phys_addr_t addr)
> > +{
> > +#ifdef HPET_DEBUG
> > +    fprintf(stderr, "qemu: hpet_read w at %#lx\n", addr);
> > +#endif
> > +    return 10;
> > +}
>
> If I'm not completely mistaken, all reads and writes need to be in 32-
> or 64-bit mode. So it's pretty safe to remove these. I only added them
> to see if Mac OS X actually would access them. To still enable other
> people to do the same you might as well ifdef them out.

Yep, you're right. I'll do that.

> > +
> > +static uint32_t hpet_ram_readl(void *opaque, target_phys_addr_t addr)
> > +{
> > +    HPETState *s = (HPETState *)opaque;
> > +#ifdef HPET_DEBUG
> > +    fprintf(stderr, "qemu: hpet_read l at %#lx\n", addr);
> > +#endif
> > +    switch(addr - HPET_BASE) {
> > +        case 0x00:
> > +            return 0x8086a201;
> > +        case 0x04:
> > +            return HPET_PERIOD;
> > +        case 0x10:
> > +            return ((s->legacy_route << 1) | s->enabled);
> > +        case 0x14:
> > +#ifdef HPET_DEBUG
> > +            fprintf(stderr, "qemu: invalid hpet_read l at %#lx\n", addr);
> > +#endif
> > +            return 0;
> > +        case 0xf0:
> > +            s->hpet_counter = ns_to_ticks(qemu_get_clock(vm_clock)
> > +                                          - s->hpet_offset) ;
>
> I'm having trouble understanding this part. The hpet_counter is
> actually the ticks of the internal main clock of the HPET. This value
> is actually supposed to constantly change wrt to the current time. The
> "timers" in the HPET can now compare themselves to the "current value"
> of the hpet_counter at all times, raising an interrupt if something
> matches.
>
> So far for the theor
Re: [Qemu-devel] [RFC][PATCH] Add HPET emulation to qemu
On Thu, 2008-07-10 at 10:18 +0100, Samuel Thibault wrote:
> Cool!
> Does it now happen that qemu no longer wakes up every 10ms? If not,
> please try to make sure it happens, that would eventually fix that power
> leak :)
>
> Samuel

I will look into CONFIG_NO_HZ operation next. Haven't tried that yet.

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
[PATCH] Add HPET support to BIOS
This patch, written by Ryan Harper, adds HPET support to BIOS.

Signed-off-by: Beth Kon <[EMAIL PROTECTED]>

diff --git a/bios/Makefile b/bios/Makefile
index 48022ea..3e73fb5 100644
--- a/bios/Makefile
+++ b/bios/Makefile
@@ -40,7 +40,7 @@ LIBS = -lm
 RANLIB = ranlib
 BCC = bcc
-GCC = gcc -m32
+GCC = gcc -m32 -fno-stack-protector
 HOST_CC = gcc
 AS86 = as86
diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index d1bfa2c..1548c86 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -262,6 +262,24 @@ DefinitionBlock (
                 Return (MEMP)
             }
         }
+        Device(HPET) {
+            Name(_HID, EISAID("PNP0103"))
+            Name(_UID, 0)
+            Method (_STA, 0, NotSerialized) {
+                Return(0x00)
+            }
+            Name(_CRS, ResourceTemplate() {
+                DWordMemory(
+                    ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+                    NonCacheable, ReadWrite,
+                    0x00000000,
+                    0xFED00000,
+                    0xFED003FF,
+                    0x00000000,
+                    0x00000400 /* 1K memory: FED00000 - FED003FF */
+                )
+            })
+        }
     }
 Scope(\_SB.PCI0) {
@@ -628,7 +646,7 @@ DefinitionBlock (
             {
                 Or (PRQ3, 0x80, PRQ3)
             }
-            Method (_CRS, 0, NotSerialized)
+            Method (_CRS, 1, NotSerialized)
             {
                 Name (PRR0, ResourceTemplate ()
                 {
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 2dc1d25..c1ec015 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1182,7 +1182,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */
 struct rsdt_descriptor_rev1
 {
     ACPI_TABLE_HEADER_DEF /* ACPI common table header */
-    uint32_t table_offset_entry [2]; /* Array of pointers to other */
+    uint32_t table_offset_entry [3]; /* Array of pointers to other */
     /* ACPI tables */
 };
@@ -1322,6 +1322,30 @@ struct madt_processor_apic
 #endif
 };
+/*
+ * ACPI 2.0 Generic Address Space definition.
+ */
+struct acpi_20_generic_address {
+    uint8_t address_space_id;
+    uint8_t register_bit_width;
+    uint8_t register_bit_offset;
+    uint8_t reserved;
+    uint64_t address;
+};
+
+/*
+ * HPET Description Table
+ */
+struct acpi_20_hpet {
+    ACPI_TABLE_HEADER_DEF /* ACPI common table header */
+    uint32_t timer_block_id;
+    struct acpi_20_generic_address addr;
+    uint8_t hpet_number;
+    uint16_t min_tick;
+    uint8_t page_protect;
+};
+#define ACPI_HPET_ADDRESS 0xFED00000UL
+
 struct madt_io_apic
 {
     APIC_HEADER_DEF
@@ -1393,8 +1417,9 @@ void acpi_bios_init(void)
     struct fadt_descriptor_rev1 *fadt;
     struct facs_descriptor_rev1 *facs;
     struct multiple_apic_table *madt;
+    struct acpi_20_hpet *hpet;
     uint8_t *dsdt;
-    uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr;
+    uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, hpet_addr;
     uint32_t acpi_tables_size, madt_addr, madt_size;
     int i;
@@ -1436,6 +1461,11 @@ void acpi_bios_init(void)
     madt = (void *)(addr);
     addr += madt_size;
+    addr = (addr + 7) & ~7;
+    hpet_addr = addr;
+    hpet = (void *)(addr);
+    addr += sizeof(*hpet);
+
     acpi_tables_size = addr - base_addr;
     BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n",
@@ -1457,6 +1487,7 @@ void acpi_bios_init(void)
     memset(rsdt, 0, sizeof(*rsdt));
     rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
     rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
+    rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
     acpi_build_table_header((struct acpi_table_header *)rsdt,
                             "RSDT", sizeof(*rsdt), 1);
@@ -1540,6 +1571,15 @@ void acpi_bios_init(void)
         acpi_build_table_header((struct acpi_table_header *)madt,
                                 "APIC", madt_size, 1);
     }
+
+    /* HPET */
+    memset(hpet, 0, sizeof(*hpet));
+    hpet->timer_block_id = cpu_to_le32(0x8086a201);
+    // hpet->timer_block_id = cpu_to_le32(0x80862201);
+    hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
+    acpi_build_table_header((struct acpi_table_header *)hpet,
+                            "HPET", sizeof(*hpet), 1);
+
 }
 /* SMBIOS entry point -- must be written to a 16-bit aligned address

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
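The table placement in acpi_bios_init() rounds the running address up to an 8-byte boundary with (addr + 7) & ~7 before reserving room for the HPET table. A generalized standalone sketch of that idiom follows; align_up() is a hypothetical helper name, not something the BIOS code defines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round addr up to the next multiple of alignment.
 * alignment must be a nonzero power of two; with alignment == 8 this is
 * the same computation as (addr + 7) & ~7. The addition can wrap for
 * addresses within alignment - 1 of UINT32_MAX, which this sketch does
 * not guard against.
 */
static uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

An address that is already aligned is returned unchanged, so applying the rounding before each table costs at most alignment - 1 bytes of padding.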
[RFC][PATCH] Add HPET emulation to qemu
This patch, based on an earlier patch by Alexander Graf, adds HPET emulation to qemu. I am sending out a separate patch to kvm with the required bios changes.

This work is incomplete.

Currently working (at least generally):
- linux 2.6.25.9 guest

Todo:
- other guest support (i.e. adding whatever may be missing for support of other modes of operation used by other OS's).
- level-triggered interrupts
- non-legacy routing
- 64-bit operation
- ...

Basically what I've done so far is make it work for linux.

The one area that feels ugly/wrong at the moment is handling the disabling of 8254 and RTC timer interrupts when operating in legacy mode. The HPET spec says "in this case the 8254/RTC timer will not cause any interrupts". I'm not sure if I should disable the RTC/8254 in some more general way, or just disable interrupts. Comments appreciated.

Signed-off-by: Beth Kon <[EMAIL PROTECTED]>

diffstat output:
 Makefile.target  |    2
 hw/hpet.c        |  393 +++
 hw/i8254.c       |    8 -
 hw/mc146818rtc.c |   25 ++-
 hw/pc.c          |    5
 5 files changed, 427 insertions(+), 6 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 73adbb1..05829ea 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -530,7 +530,7 @@ ifeq ($(TARGET_BASE_ARCH), i386)
 OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
-OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
+OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/hpet.c b/hw/hpet.c
new file mode 100644
index 0000000..e74de08
--- /dev/null
+++ b/hw/hpet.c
@@ -0,0 +1,393 @@
+/*
+ * High Precisition Event Timer emulation
+ *
+ * Copyright (c) 2007 Alexander Graf
+ * Copyright (c) 2008 IBM Corporation
+ *
+ * Authors: Beth Kon <[EMAIL PROTECTED]>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * *
+ *
+ * This driver attempts to emulate an HPET device in software. It is by no
+ * means complete and is prone to break on certain conditions.
+ *
+ */
+#include "hw.h"
+#include "console.h"
+#include "qemu-timer.h"
+
+//#define HPET_DEBUG
+
+#define HPET_BASE 0xfed00000
+#define HPET_PERIOD 0x00989680 /* 10000000 femptoseconds, 10ns*/
+
+#define FS_PER_NS 1000000
+#define HPET_NUM_TIMERS 3
+#define HPET_TIMER_TYPE_LEVEL 1
+#define HPET_TIMER_TYPE_EDGE 0
+#define HPET_TIMER_DELIVERY_APIC 0
+#define HPET_TIMER_DELIVERY_FSB 1
+#define HPET_TIMER_CAP_FSB_INT_DEL (1 << 15)
+#define HPET_TIMER_CAP_PER_INT (1 << 4)
+
+struct HPETState;
+typedef struct HPETTimer {
+    QEMUTimer *timer;
+    struct HPETState *state;
+    uint8_t type;
+    uint8_t active;
+    uint8_t delivery;
+    uint8_t apic_port;
+    uint8_t periodic;
+    uint8_t enabled;
+    uint32_t comparator; // if(hpet_counter == comparator) IRQ();
+    uint32_t delta;
+    qemu_irq irq;
+} HPETTimer;
+
+typedef struct HPETState {
+    uint64_t hpet_counter;
+    uint64_t hpet_offset;
+    int64_t next_periodic_time;
+    uint8_t legacy_route;
+    uint8_t enabled;
+    uint8_t updated;
+    qemu_irq *irqs;
+    HPETTimer timer[HPET_NUM_TIMERS];
+} HPETState;
+
+
+int hpet_legacy;
+
+static void update_irq(struct HPETTimer *timer)
+{
+    if(timer->enabled && timer->state->enabled) {
+        qemu_irq_pulse(timer->irq);
+    }
+}
+
+static void update_irq_all(struct HPETState *s)
+{
+    int i;
+    for(i=0; i<HPET_NUM_TIMERS; i++)
+        update_irq(&s->timer[i]);
+}
+
+static inline int64_t ticks_to_ns(int64_t value)
+{
+    return (value * HPET_PERIOD / FS_PER_NS);
+}
+
+static inline int64_t ns_to_ticks(int64_t value)
+{
+    return (value * FS_PER_NS / HPET_PERIOD);
+}
+
+static void hpet_timer(void *opaque)
+{
+    HPETTimer *s = (HPETTimer*)opaque;
+    if(s->periodic) {
+        s->comparator += s->delta;
+        qemu_mod_timer(s->timer, qemu_get_clock(vm_clock)
+                       + ticks_to_ns(s->delta));
+    }
+    update_irq(s);
+}
+
+static void h
Re: ata exception messages
On Wed, 2008-06-04 at 15:38 +0300, Avi Kivity wrote:
> Beth Kon wrote:
> > On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote:
> >
> >> I'm running an Ubuntu 7.10 guest on a kvm git build (commit
> >> 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated
> >> messages like:
> >>
> >> [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >> [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
> >>
> >> I see that they're coming from ata_eh_link_report in
> >> drivers/ata/libata-eh.c but am not familiar enough with this code to
> >> understand what the problem is.
> >>
> >> Does anyone have any idea what might be causing this?
> >>
> >
> > I discovered that these messages were associated with my disk image being
> > NFS mounted.

Yes, the network has been misbehaving lately, so that could be causing timeouts.

>
> Interesting. Is it an exceptionally slow server (or perhaps, on a lossy
> network)?
>
> I can see how timeouts can annoy the ide driver, but I've never seen
> this myself.
>

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
Re: ata exception messages
On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote:
> I'm running an Ubuntu 7.10 guest on a kvm git build (commit
> 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated
> messages like:
>
> [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>
> I see that they're coming from ata_eh_link_report in
> drivers/ata/libata-eh.c but am not familiar enough with this code to
> understand what the problem is.
>
> Does anyone have any idea what might be causing this?
>

I discovered that these messages were associated with my disk image being NFS mounted.

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
ata exception messages
I'm running an Ubuntu 7.10 guest on a kvm git build (commit 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated messages like:

[3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out

I see that they're coming from ata_eh_link_report in drivers/ata/libata-eh.c but am not familiar enough with this code to understand what the problem is.

Does anyone have any idea what might be causing this?

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]