No longer working on HPET

2010-02-02 Thread Beth Kon
I have decided to take a job outside of IBM and so will not be involved 
with HPET any longer. Working on KVM has been great fun... top-notch 
people and a top-notch technology. Wishing KVM and you all the best!


--
Regards,

Beth Kon

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: The HPET issue on Linux

2010-01-06 Thread Beth Kon

Dor Laor wrote:

On 01/06/2010 12:09 PM, Gleb Natapov wrote:

On Wed, Jan 06, 2010 at 05:48:52PM +0800, Sheng Yang wrote:

Hi Beth

I still found the emulated HPET would result in some boot failure. For
example, on my 2.6.30, with HPET enabled, the kernel would fail
check_timer(), especially in timer_irq_works().

The test in timer_irq_works() lets 10 ticks pass (using mdelay()) and
wants to confirm that the clock source advanced jiffies by at least 5
ticks. I've checked that, on my machine, it would mostly get only 4 ticks
when HPET is enabled, and so fail the test. On the other hand, if I use
PIT, it gets more than 10 ticks (maybe understandable if there are some
complementary ticks). Of course, extending the tick count/mdelay() time
can work.

I think it's a major issue of HPET. And it may just be due to a too long
userspace path for interrupt injection... If that's true, I think it's
not easy to deal with.


PIT ticks are reinjected automatically, HPET should probably do the same
although it may just create another set of problems.


Older Linux kernels do automatic adjustment for lost ticks, so automatic
reinjection causes time to run too fast. This is why we added the
-no-kvm-pit-reinject flag...


It took lots of time for pit/rtc to stabilize; in order to seriously
consider the hpet emulation, lots of testing should be done.

I will try to look into this. Since HPET is edge-triggered, it looks like
this problem is of a different nature than PIT.  Is this a solid failure
or intermittent?






--
Gleb.



--
Regards,

Beth Kon



Re: The HPET issue on Linux

2010-01-06 Thread Beth Kon

Beth Kon wrote:

Dor Laor wrote:

On 01/06/2010 12:09 PM, Gleb Natapov wrote:

On Wed, Jan 06, 2010 at 05:48:52PM +0800, Sheng Yang wrote:

Hi Beth

I still found the emulated HPET would result in some boot failure. For
example, on my 2.6.30, with HPET enabled, the kernel would fail
check_timer(), especially in timer_irq_works().

The test in timer_irq_works() lets 10 ticks pass (using mdelay()) and
wants to confirm that the clock source advanced jiffies by at least 5
ticks. I've checked that, on my machine, it would mostly get only 4 ticks
when HPET is enabled, and so fail the test. On the other hand, if I use
PIT, it gets more than 10 ticks (maybe understandable if there are some
complementary ticks). Of course, extending the tick count/mdelay() time
can work.

I think it's a major issue of HPET. And it may just be due to a too long
userspace path for interrupt injection... If that's true, I think it's
not easy to deal with.


PIT ticks are reinjected automatically, HPET should probably do the same
although it may just create another set of problems.


Older Linux kernels do automatic adjustment for lost ticks, so automatic
reinjection causes time to run too fast. This is why we added the
-no-kvm-pit-reinject flag...


It took lots of time for pit/rtc to stabilize; in order to seriously
consider the hpet emulation, lots of testing should be done.

I will try to look into this. Since HPET is edge-triggered, it looks like
this problem is of a different nature than PIT.  Is this a solid
failure or intermittent?

Anthony just explained that on x86, even edge-triggered interrupts are
queued in the apic and an eoi will occur, so this is not different than
the PIT.
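
For reference, the pass/fail rule Sheng describes can be sketched roughly
as below. This is a minimal illustration and not the kernel's code:
timer_check_passes() and MIN_TICKS are hypothetical names, and the
threshold of 5 ticks is taken from the description above.

```c
#include <stdbool.h>

/* Assumption from the thread: the guest waits roughly 10 ticks' worth of
 * time and requires jiffies to have advanced by at least 5. */
#define MIN_TICKS 5

/* Hypothetical helper: true when enough timer interrupts were delivered
 * (i.e. jiffies advanced far enough) during the delay window. */
static bool timer_check_passes(unsigned long jiffies_before,
                               unsigned long jiffies_after)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    unsigned long delivered = jiffies_after - jiffies_before;

    return delivered >= MIN_TICKS;
}
```

With the numbers reported above, 4 delivered ticks fails the check while
the 10 or more seen with the PIT passes.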






--
Gleb.






--
Regards,

Beth Kon



Re: [Qemu-devel] Re: PC machine types switched to SeaBIOS/gPXE

2009-11-03 Thread Beth Kon

Kevin O'Connor wrote:

On Mon, Nov 02, 2009 at 05:22:00PM -0600, Anthony Liguori wrote:
  

Beth Kon wrote:

Serendipity allowed us to find this really easily, thanks to some old  
builds lying around...


The following Seabios commit breaks gpxe boot with e1000:
  

[...]
  

Any thoughts Kevin?

Before this commit, the gPXE e1000 rom was able to successfully netboot  
when selected as a boot device.  With this commit, we get a device not  
found error within gPXE when launched as a boot device but when run  
from the gPXE command line, it launches successfully.



The easiest way to debug this is to enable debugging output.  It's
possible to modify qemu's hw/pc.c and enable DEBUG_BIOS, but it's
probably simpler to recompile SeaBIOS and set CONFIG_DEBUG_SERIAL in
src/config.h (and possibly increase CONFIG_DEBUG_LEVEL).

With the latter, one can then run:

qemu -net nic,model=e1000 -boot n -serial stdio

and what comes out is:

Scan for option roms
Running option rom at c900:0003
pnp call arg1=60
pmm call arg1=0
Found option rom with bad checksum: loc=0x000c9000 len=72192 sum=37

So, the e1000 option rom is modifying itself and not properly updating
its checksum - therefore SeaBIOS doesn't consider it in its BEV list.
The fact that it changed in the commit highlighted above was probably
just random.

When was this gpxe rom built?  I know gpxe used to have an issue with
the checksum not being updated, but I thought that was fixed about six
months ago.
  
The rom was built about 2 weeks ago. But I don't follow what you're 
saying. The same rom works when the Seabios tree is reset to the commit 
just prior to this one. That would suggest to me that the rom isn't the 
problem. But I agree that there is no obvious connection between that 
commit and a bad checksum. What am I missing?
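
For readers following the checksum discussion, here is a minimal sketch
of the check involved, assuming the usual rule for PC option ROMs that
all bytes of the image sum to zero modulo 256. It is not SeaBIOS or gPXE
code; rom_checksum() and rom_fix_checksum() are hypothetical helpers
named only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes in the ROM image, modulo 256.
 * Under the assumed rule, a valid image sums to 0. */
static uint8_t rom_checksum(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum;
}

/* Hypothetical fix-up: adjust the final byte so the image sums to 0
 * again, as a ROM would need to do after modifying itself. */
static void rom_fix_checksum(uint8_t *rom, size_t len)
{
    if (len == 0)
        return;
    rom[len - 1] = (uint8_t)(rom[len - 1] - rom_checksum(rom, len));
}
```

A ROM that patches its own image without a step like rom_fix_checksum()
would be reported with a nonzero sum, as in the "bad checksum" line
quoted above.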

-Kevin
  






Re: [Qemu-devel] Re: PC machine types switched to SeaBIOS/gPXE

2009-11-02 Thread Beth Kon

Jan Kiszka wrote:

Stefan Weil wrote:
  

Anthony Liguori schrieb:


Jan Kiszka wrote:
  

Anthony Liguori wrote:
 


Hi,

I just wanted to let everyone know that I've switched the PC machine
type to SeaBIOS and gPXE.  SeaBIOS is a port of the Bochs BIOS to GCC,
by Kevin O'Connor, along with quite a lot of clean up and new
feature work.

gPXE is the new development tree of etherboot which is now
deprecated. We've done a lot of testing and while there are a few
outstanding issues, almost everything seems to be working okay.

Some known issues:
o e1000 pxe booting doesn't seem to work
o gPXE does not like the slirp tftp server

  

Can't confirm, works nicely here with default settings (e1000).
  


Interesting.  I'll have to look again.
  

Hi,

it loads the ROM, but ROM and e1000 don't match.



But they work together if you do not specify '-boot n' and go via the
command prompt instead.

  

I tested with a fresh build.

Jan, did you check that qemu uses the correct bios path?
Maybe -L dir was missing.



Yes, it's correct and -L makes no difference.

Jan

  
Serendipity allowed us to find this really easily, thanks to some old 
builds lying around...


The following Seabios commit breaks gpxe boot with e1000:

commit a5826b5ad482f44d293387dc7513e5e98802a54e
Author: Kevin O'Connor ke...@koconnor.net
Date:   Sat Oct 24 17:57:29 2009 -0400

   Add simple cooperative threading scheme to allow parallel hw init.
  
   Enable system for running hardware initialization in parallel.

   The yield() call can now round-robin between threads.
   Rework ata controller init to use a thread per controller.
   Make sure internal drives are registered in a defined order.
   Run keyboard initialization in a thread.
   Rework usb init to use a thread per controller.







[Fwd: Re: [Qemu-devel] Re: PC machine types switched to SeaBIOS/gPXE]

2009-11-02 Thread Beth Kon

Anthony, I assume you meant to cc Kevin...
---BeginMessage---

Beth Kon wrote:
Serendipity allowed us to find this really easily, thanks to some old 
builds lying around...


The following Seabios commit breaks gpxe boot with e1000:

commit a5826b5ad482f44d293387dc7513e5e98802a54e
Author: Kevin O'Connor ke...@koconnor.net
Date:   Sat Oct 24 17:57:29 2009 -0400

   Add simple cooperative threading scheme to allow parallel hw init.
 Enable system for running hardware initialization in parallel.
   The yield() call can now round-robin between threads.
   Rework ata controller init to use a thread per controller.
   Make sure internal drives are registered in a defined order.
   Run keyboard initialization in a thread.
   Rework usb init to use a thread per controller.


Any thoughts Kevin?

Before this commit, the gPXE e1000 rom was able to successfully netboot 
when selected as a boot device.  With this commit, we get a device not 
found error within gPXE when launched as a boot device but when run 
from the gPXE command line, it launches successfully.


Regards,

Anthony Liguori


---End Message---


[Qemu-devel] accidental mistyping of command line kills networking

2009-10-27 Thread Beth Kon

I accidentally entered a command line as follows:

/usr/bin/qemu-kvm -drive 
file=/scratch/images/beth/windows/win2k3_32_R2.dat.10G.img,if=ide -m 
2048 -boot cd -net nic,model=rtl8139 -net tap,script=/etc/qemu-ifup -vnc 
:12 -usbdevice tablet -monitor stdio -net nic,model=e1000 -net 
tap,script=/etc/qemu-ifup


and the machine's networking broke, requiring a network restart to get 
it back in order. The second -net tap,script=/etc/qemu-ifup causes the 
problem.


/var/log/messages shows
tap0: received packet with  own address as source address

I don't have time at the moment to look into what's going wrong.  Just 
wanted to make people aware.


Beth Kon






Re: [Qemu-devel] accidental mistyping of command line kills networking

2009-10-27 Thread Beth Kon

Beth Kon wrote:

I accidentally entered a command line as follows:

/usr/bin/qemu-kvm -drive 
file=/scratch/images/beth/windows/win2k3_32_R2.dat.10G.img,if=ide -m 
2048 -boot cd -net nic,model=rtl8139 -net tap,script=/etc/qemu-ifup 
-vnc :12 -usbdevice tablet -monitor stdio -net nic,model=e1000 -net 
tap,script=/etc/qemu-ifup


and the machine's networking broke, requiring a network restart to get 
it back in order. The second -net tap,script=/etc/qemu-ifup causes the 
problem.


/var/log/messages shows
tap0: received packet with  own address as source address

I don't have time at the moment to look into what's going wrong.  Just 
wanted to make people aware.


Beth Kon


A clarification... this command line is OK. But as it happens, Windows
Datacenter does not have a driver for rtl8139. So somehow, this
driverless adapter in Windows is effectively causing an extra tap
device to be specified on the qemu command line, wreaking havoc with the
networking on the host.





[PATCH 1/5] BIOS changes for irq0-inti2 override (v9)

2009-07-07 Thread Beth Kon
bios: allow qemu to configure irq0-inti2 override

Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy
mode, timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).
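
As a rough illustration of the ISA IRQ to I/O APIC pin mapping this
patch sets up in the MP table (a sketch, not the BIOS code itself;
isa_irq_to_intin() and isa_entry_count() are hypothetical helpers
introduced only here):

```c
#define ISA_IRQS 16

/* Returns the I/O APIC input pin for an ISA IRQ, or -1 if no MP-table
 * entry is emitted for it. Follows the rule described in this patch. */
static int isa_irq_to_intin(int irq, int irq0_override)
{
    if (irq0_override && irq == 2)
        return -1;  /* source IRQ 2 is unused; pin 2 belongs to IRQ0 */
    if (irq0_override && irq == 0)
        return 2;   /* the irq0-inti2 override */
    return irq;     /* identity mapping otherwise */
}

/* Number of ISA interrupt entries that would be emitted. */
static int isa_entry_count(int irq0_override)
{
    int n = 0;

    for (int irq = 0; irq < ISA_IRQS; irq++)
        if (isa_irq_to_intin(irq, irq0_override) >= 0)
            n++;
    return n;
}
```

With the override active one ISA entry fewer is emitted, which matches
the entry count dropping from MAX_CPUS + 18 to MAX_CPUS + 17 in the
patch.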

Changes from v8 - Incorporated Gleb's comments to patch 1/5 and 4/5.
  In 1/5, removed a return per Gleb's comment.
  See 4/5 for v8-v9 change description. 

Signed-off-by: Beth Kon e...@us.ibm.com

---
 kvm/bios/rombios32.c |   66 +
 1 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 0369111..f9e0452 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -555,6 +559,16 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1167,14 @@ static void mptable_init(void)
     putstr(&q, "0.1         "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee00000); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1218,13 @@ static void mptable_init(void)
 
     /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1232,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+        putb(&q, i); /* dest I/O APIC interrupt in */
     }
     /* patch length */
     len = q - mp_config_table;
@@ -1768,23 +1801,21 @@ void acpi_bios_init(void)
         io_apic->io_apic_id = smp_cpus;
         io_apic->address = cpu_to_le32(0xfec00000);
         io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
         io_apic++;
-
-        int_override = (void *)io_apic;
-        int_override->type = APIC_XRUPT_OVERRIDE;
-        int_override->length = sizeof(*int_override);
-        int_override->bus = cpu_to_le32(0);
-        int_override->source = cpu_to_le32(0);
-        int_override->gsi = cpu_to_le32(2);
-        int_override->flags = cpu_to_le32(0);
-#endif
+        int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+        if (irq0_override) {
+            memset(int_override, 0, sizeof(*int_override));
+            int_override->type = APIC_XRUPT_OVERRIDE;
+            int_override->length = sizeof(*int_override);
+            int_override->source = 0;
+            int_override->gsi = 2;
+            int_override->flags = 0; /* conforms to bus specifications */
+            int_override++;
+        }
 #endif
-
-        int_override = (struct madt_int_override*)(io_apic + 1);
-        for ( i = 0; i < 16; i++ ) {
-            if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+        for (i = 0; i < 16; i++) {
+            if (PCI_ISA_IRQ_MASK & (1U << i)) {
                 memset(int_override, 0, sizeof(*int_override));
                 int_override->type   = APIC_XRUPT_OVERRIDE;
                 int_override->length = sizeof

[PATCH 3/5] BIOS changes for qemu-kvm hpet support (v9)

2009-07-07 Thread Beth Kon
Advertise HPET in ACPI HPET table

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index 3560baa..26fc7ad 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -194,7 +194,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -214,7 +213,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index f9e0452..28f2b21 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1526,8 +1526,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1716,13 +1716,11 @@ void acpi_bios_init(void)
     addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
 
     /* RSDP */
     memset(rsdp, 0, sizeof(*rsdp));
@@ -1900,7 +1898,6 @@ void acpi_bios_init(void)
     }
 
     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1909,7 +1906,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct  acpi_table_header *)hpet,
                              "HPET", sizeof(*hpet), 1);
-#endif
 
 #endif
 
@@ -1919,8 +1915,7 @@ void acpi_bios_init(void)
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
     rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(madt_addr);
 #ifdef BX_QEMU
-    /* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
     acpi_additional_tables(); /* resets cfg to required entry */


[PATCH 2/5] Userspace changes for irq0-inti2 override support (v9)

2009-07-07 Thread Beth Kon
Select irq0-irq2 override based on kernel gsi routing availability

If the kernel does not support gsi routing, we cannot do the irq0-irq2
override, so disable it in that case.
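
The selection rule can be summarised with a small sketch. It only
mirrors the logic of the vl.c hunk in this patch; the function
select_irq0_override() and its boolean parameters are hypothetical
stand-ins for kvm_enabled(), kvm_irqchip and qemu_kvm_has_gsi_routing().

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the value irq0override would end up with: enabled by default,
 * disabled only when the in-kernel irqchip is used without GSI routing. */
static uint8_t select_irq0_override(bool kvm_on, bool irqchip_in_kernel,
                                    bool has_gsi_routing)
{
    if (kvm_on && irqchip_in_kernel && !has_gsi_routing)
        return 0;
    return 1;
}
```

So the override is only dropped for old kernels that provide the
in-kernel irqchip but cannot reroute GSIs.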

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index a7a5ef9..c894b72 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
 
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
      * the cleanest way of doing it but it should work. */
 
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
 
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 05d05e0..043a0da 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a78073e..f7c66d1 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1561,7 +1561,11 @@ int kvm_arch_init_irq_routing(void)
             return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index eb99bc4..b044ead 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -167,6 +167,7 @@ int kvm_has_sync_mmu(void);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
 #else
@@ -175,6 +176,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 2824b0d..5b42506 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -111,6 +111,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index df583b7..d8b7198 100644
--- a/vl.c
+++ b/vl.c
@@ -255,6 +255,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6199,8 +6200,14 @@ int main(int argc, char **argv, char **envp)
 
     module_call_init(MODULE_INIT_DEVICE);
 
-    if (kvm_enabled())
-       kvm_init_ap();
+    if (kvm_enabled()) {
+       kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
 
     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 5/5] Kernel changes for HPET legacy support(v9)

2009-07-07 Thread Beth Kon
When kvm is in hpet_legacy_mode, the hpet is providing the 
timer interrupt and the pit should not be. So in legacy mode, the pit timer is
destroyed, but the *state* of the pit is maintained. So if kvm or the guest
tries to modify the state of the pit, this modification is accepted, *except* 
that the timer isn't actually started. When we exit hpet_legacy_mode, 
the current state of the pit (which is up to date since we've been 
accepting modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 
0xff in order to destroy the timer, but then restores the actual value,
again maintaining current state of the pit for possible later reenablement.
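
To make the intended behaviour concrete, here is a simplified model of
it. This is a sketch under assumptions, not the kernel code: struct
pit_model and the pit_model_*() helpers are hypothetical, only channel 0
is modelled, and leaving legacy mode is assumed to always re-arm the
timer from the saved state.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIT_FLAG_HPET_LEGACY 0x1u  /* stand-in for KPIT_FLAGS_HPET_LEGACY */

struct pit_model {
    uint32_t flags;
    uint8_t  mode;           /* last programmed channel 0 mode */
    uint32_t count;          /* last programmed channel 0 count */
    bool     timer_running;  /* whether the backing timer is armed */
};

/* Always record the new state; only arm the timer outside legacy mode. */
static void pit_model_load_count(struct pit_model *p, uint8_t mode,
                                 uint32_t count)
{
    p->mode = mode;
    p->count = count;
    if (!(p->flags & PIT_FLAG_HPET_LEGACY))
        p->timer_running = true;
}

/* Entering legacy mode stops the timer but keeps the state; leaving it
 * restarts the timer from whatever state was programmed meanwhile. */
static void pit_model_set_legacy(struct pit_model *p, bool enable)
{
    if (enable) {
        p->flags |= PIT_FLAG_HPET_LEGACY;
        p->timer_running = false;
    } else {
        p->flags &= ~PIT_FLAG_HPET_LEGACY;
        p->timer_running = true;
    }
}
```

A count loaded while in legacy mode is therefore remembered but does not
start the timer until legacy mode is exited.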

Changes from v7:
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

Changes from v6:
- added ioctl interface for legacy mode in order not to break the abi.


Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |8 ++
 arch/x86/kvm/i8254.c   |   22 ++---
 arch/x86/kvm/i8254.h   |3 +-
 arch/x86/kvm/x86.c |   55 +++-
 include/linux/kvm.h|6 
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..f5554dd 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -18,6 +18,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
+#define __KVM_HAVE_PIT_STATE2
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -237,6 +238,13 @@ struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
 };
 
+#define KPIT_FLAGS_HPET_LEGACY  0x0001
+
+struct kvm_pit_state2 {
+   struct kvm_pit_channel_state channels[3];
+   __u32 flags;
+};
+
 struct kvm_reinject_control {
__u8 pit_reinject;
__u8 reserved[31];
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6e0a203..0b0a761 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -329,20 +329,33 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	case 1:
         /* FIXME: enhance mode 4 precision */
 	case 4:
-		create_pit_timer(ps, val, 0);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)) {
+			create_pit_timer(ps, val, 0);
+		}
 		break;
 	case 2:
 	case 3:
-		create_pit_timer(ps, val, 1);
+		if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)){
+			create_pit_timer(ps, val, 1);
+		}
 		break;
 	default:
 		destroy_pit_timer(&ps->pit_timer);
 	}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
-	pit_load_count(kvm, channel, val);
+	u8 saved_mode;
+	if (hpet_legacy_start) {
+		/* save existing mode for later reenablement */
+		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(kvm, channel, val);
+		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+	} else {
+		pit_load_count(kvm, channel, val);
+	}
 }
 
 static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
@@ -546,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	struct kvm_kpit_channel_state *c;
 
 	mutex_lock(&pit->pit_state.lock);
+	pit->pit_state.flags = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..d4c1c7f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u32 flags;
struct kvm_timer pit_timer;
bool is_periodic;
u32speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af53f64..aa75466 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1196,6 +1196,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_IRQFD:
case KVM_CAP_PIT2:
+   case KVM_CAP_PIT_STATE2:
r = 1

[PATCH 1/5] BIOS changes for irq0-inti2 override (v8)

2009-06-30 Thread Beth Kon
bios: allow qemu to configure irq0-inti2 override

Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy
mode, timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

---
 kvm/bios/rombios32.c |   67 ++
 1 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 0369111..9e5370e 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -555,6 +559,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+        return;
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1168,14 @@ static void mptable_init(void)
     putstr(&q, "0.1         "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee00000); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1219,13 @@ static void mptable_init(void)
 
     /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1233,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+        putb(&q, i); /* dest I/O APIC interrupt in */
     }
     /* patch length */
     len = q - mp_config_table;
@@ -1768,23 +1802,21 @@ void acpi_bios_init(void)
         io_apic->io_apic_id = smp_cpus;
         io_apic->address = cpu_to_le32(0xfec00000);
         io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
         io_apic++;
-
-        int_override = (void *)io_apic;
-        int_override->type = APIC_XRUPT_OVERRIDE;
-        int_override->length = sizeof(*int_override);
-        int_override->bus = cpu_to_le32(0);
-        int_override->source = cpu_to_le32(0);
-        int_override->gsi = cpu_to_le32(2);
-        int_override->flags = cpu_to_le32(0);
-#endif
+        int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+        if (irq0_override) {
+            memset(int_override, 0, sizeof(*int_override));
+            int_override->type = APIC_XRUPT_OVERRIDE;
+            int_override->length = sizeof(*int_override);
+            int_override->source = 0;
+            int_override->gsi = 2;
+            int_override->flags = 0; /* conforms to bus specifications */
+            int_override++;
+        }
 #endif
-
-        int_override = (struct madt_int_override*)(io_apic + 1);
-        for ( i = 0; i < 16; i++ ) {
-            if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+        for (i = 0; i < 16; i++) {
+            if (PCI_ISA_IRQ_MASK & (1U << i)) {
                 memset(int_override, 0, sizeof(*int_override));
                 int_override->type   = APIC_XRUPT_OVERRIDE;
                 int_override->length = sizeof(*int_override);
@@ -2708,6 +2740,9 @@ void rombios32_init(uint32_t *s3_resume_vector, uint8_t *shutdown_flag)
 
 if (bios_table_cur_addr != 0

[PATCH 2/5] Userspace changes for irq0->inti2 override support (v8)

2009-06-30 Thread Beth Kon
Select irq0->irq2 override based on kernel gsi routing availability

If the kernel does not support gsi routing, we cannot do the irq0->irq2
override, so disable it in that case.
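
The selection rule can be sketched in isolation. This is only an illustration of the logic in the vl.c and hw/ioapic.c hunks of this patch; `select_irq0_override` and `isa_irq_to_ioapic_pin` are made-up names, and the two booleans stand in for `kvm_irqchip` and `qemu_kvm_has_gsi_routing()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOAPIC_NUM_PINS 24

/* Hypothetical helper: the override stays on unless an in-kernel irqchip
 * is in use and the kernel cannot do GSI routing. */
static uint8_t select_irq0_override(bool irqchip_in_kernel, bool has_gsi_routing)
{
    if (irqchip_in_kernel && !has_gsi_routing)
        return 0;
    return 1;
}

/* Hypothetical helper: ISA IRQs map 1-1 to IOAPIC pins, except that IRQ0
 * is delivered on pin 2 when the override is active. */
static int isa_irq_to_ioapic_pin(int irq, uint8_t irq0_override)
{
    assert(irq >= 0 && irq < IOAPIC_NUM_PINS);
    if (irq == 0 && irq0_override)
        return 2;
    return irq;
}
```

With an old kernel (in-kernel irqchip, no GSI routing) the first helper returns 0 and IRQ0 stays on pin 0, which is the compatibility case described above.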

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index a7a5ef9..c894b72 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
 IOAPICState *s = opaque;
 
-#if 0
 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)
+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 05d05e0..043a0da 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
  acpi_tables_len);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
 smbios_table = smbios_get_table(&smbios_len);
 if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a78073e..f7c66d1 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1561,7 +1561,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}
 if (r < 0)
 return r;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index eb99bc4..b044ead 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -167,6 +167,7 @@ int kvm_has_sync_mmu(void);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
 #else
@@ -175,6 +176,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
 #define qemu_kvm_cpu_stop(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 2824b0d..5b42506 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -111,6 +111,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index df583b7..d8b7198 100644
--- a/vl.c
+++ b/vl.c
@@ -255,6 +255,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6199,8 +6200,14 @@ int main(int argc, char **argv, char **envp)
 
 module_call_init(MODULE_INIT_DEVICE);
 
-if (kvm_enabled())
-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 3/5] BIOS changes for qemu-kvm hpet support (v8)

2009-06-30 Thread Beth Kon
Advertise HPET in ACPI HPET table

Signed-off-by: Beth Kon e...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index 3560baa..26fc7ad 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -194,7 +194,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -214,7 +213,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9e5370e..110d130 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1527,8 +1527,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1717,13 +1717,11 @@ void acpi_bios_init(void)
 addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1901,7 +1899,6 @@ void acpi_bios_init(void)
 }
 
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1910,7 +1907,6 @@ void acpi_bios_init(void)
 hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
-#endif
 
 #endif
 
@@ -1920,8 +1916,7 @@ void acpi_bios_init(void)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(madt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 acpi_additional_tables(); /* resets cfg to required entry */


[PATCH 5/5] Kernel changes for HPET legacy support(v8)

2009-06-30 Thread Beth Kon
When kvm is in hpet_legacy_mode, the hpet is providing the 
timer interrupt and the pit should not be. So in legacy mode, the pit timer is
destroyed, but the *state* of the pit is maintained. So if kvm or the guest
tries to modify the state of the pit, this modification is accepted, *except* 
that the timer isn't actually started. When we exit hpet_legacy_mode, 
the current state of the pit (which is up to date since we've been 
accepting modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 
0xff in order to destroy the timer, but then restores the actual value,
again maintaining current state of the pit for possible later reenablement.
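
The behaviour described above can be modelled outside the kernel. The following is a toy sketch, not the i8254.c code: `struct toy_pit` and both functions are invented for illustration, a boolean stands in for the real timer, and only modes 1 to 4 are treated as "timer-starting" modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_HPET_LEGACY 0x1u   /* plays the role of KPIT_FLAGS_HPET_LEGACY */

struct toy_pit {
    uint8_t  mode;           /* 0xff means no usable mode is programmed */
    uint32_t count;
    uint32_t flags;
    bool     timer_running;  /* stands in for the real pit timer */
};

/* State is always accepted; the timer only starts outside legacy mode. */
static void toy_pit_load_count(struct toy_pit *p, uint32_t val)
{
    assert(p != NULL);
    p->count = val;
    switch (p->mode) {
    case 1: case 2: case 3: case 4:
        if (!(p->flags & TOY_FLAG_HPET_LEGACY))
            p->timer_running = true;
        break;
    default:
        p->timer_running = false;   /* "destroy" the timer */
    }
}

/* Entering legacy mode forces the default branch once (mode 0xff) so the
 * running timer is destroyed, then restores the saved mode. Leaving legacy
 * mode reloads the current count, which restarts the timer. */
static void toy_pit_set_legacy(struct toy_pit *p, bool legacy)
{
    assert(p != NULL);
    bool entering = legacy && !(p->flags & TOY_FLAG_HPET_LEGACY);

    if (legacy)
        p->flags |= TOY_FLAG_HPET_LEGACY;
    else
        p->flags &= ~TOY_FLAG_HPET_LEGACY;

    if (entering) {
        uint8_t saved_mode = p->mode;
        p->mode = 0xff;
        toy_pit_load_count(p, p->count);
        p->mode = saved_mode;
    } else {
        toy_pit_load_count(p, p->count);
    }
}
```

The point of the saved-mode step is that simply skipping timer creation would leave an already running timer alive; temporarily selecting an invalid mode reuses the existing destroy path without losing the guest-visible mode.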

Changes from v7:
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

changes from v6:

- added ioctl interface for legacy mode in order not to break the abi.


Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |8 ++
 arch/x86/kvm/i8254.c   |   22 ++---
 arch/x86/kvm/i8254.h   |3 +-
 arch/x86/kvm/x86.c |   55 +++-
 include/linux/kvm.h|6 
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..f5554dd 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -18,6 +18,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
+#define __KVM_HAVE_PIT_STATE2
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -237,6 +238,13 @@ struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
 };
 
+#define KPIT_FLAGS_HPET_LEGACY  0x0001
+
+struct kvm_pit_state2 {
+   struct kvm_pit_channel_state channels[3];
+   __u32 flags;
+};
+
 struct kvm_reinject_control {
__u8 pit_reinject;
__u8 reserved[31];
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6e0a203..0b0a761 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -329,20 +329,33 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
case 1:
 /* FIXME: enhance mode 4 precision */
case 4:
-   create_pit_timer(ps, val, 0);
+   if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)) {
+   create_pit_timer(ps, val, 0);
+   }
break;
case 2:
case 3:
-   create_pit_timer(ps, val, 1);
+   if (!(ps->flags & KPIT_FLAGS_HPET_LEGACY)){
+   create_pit_timer(ps, val, 1);
+   }
break;
default:
destroy_pit_timer(&ps->pit_timer);
}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
-   pit_load_count(kvm, channel, val);
+   u8 saved_mode;
+   if (hpet_legacy_start) {
+   /* save existing mode for later reenablement */
+   saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+   kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+   pit_load_count(kvm, channel, val);
+   kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+   } else {
+   pit_load_count(kvm, channel, val);
+   }
 }
 
 static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
@@ -546,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
struct kvm_kpit_channel_state *c;
 
mutex_lock(&pit->pit_state.lock);
+   pit->pit_state.flags = 0;
for (i = 0; i < 3; i++) {
c = &pit->pit_state.channels[i];
c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..d4c1c7f 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u32 flags;
struct kvm_timer pit_timer;
bool is_periodic;
u32speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af53f64..aa75466 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1196,6 +1196,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_IRQFD:
case KVM_CAP_PIT2:
+   case KVM_CAP_PIT_STATE2:
r = 1

[PATCH 4/5] Userspace changes for qemu-kvm HPET support(v8)

2009-06-30 Thread Beth Kon
The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the earlier versions by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while PIT is disabled, but does not allow a pit timer
to start. Then if HPET legacy mode is disabled, whatever the PIT state is at
that point, the PIT timer is restarted accordingly.
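
The enter/leave detection on a config register write can be sketched as follows. The helper names `activating_bit` and `deactivating_bit` appear in the hw/hpet.c hunk, but their bodies here are an assumption, as are `on_hpet_config_write` and the `pit_action` enum. `HPET_CFG_LEGACY` is taken to be bit 1 of the general configuration register (LEG_RT_CNF in the HPET spec).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HPET_CFG_LEGACY 0x2ULL  /* assumed: LEG_RT_CNF, bit 1 */

/* True when the masked bit goes from clear to set. */
static bool activating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    assert(mask != 0);
    return !(old_val & mask) && (new_val & mask);
}

/* True when the masked bit goes from set to clear. */
static bool deactivating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    assert(mask != 0);
    return (old_val & mask) && !(new_val & mask);
}

enum pit_action { PIT_NO_CHANGE, PIT_DISABLE, PIT_ENABLE };

/* Hypothetical wrapper: what should happen to the PIT on a config write. */
static enum pit_action on_hpet_config_write(uint64_t old_val, uint64_t new_val)
{
    if (activating_bit(old_val, new_val, HPET_CFG_LEGACY))
        return PIT_DISABLE;   /* HPET takes over the timer interrupt */
    if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY))
        return PIT_ENABLE;    /* PIT timer restarts from its current state */
    return PIT_NO_CHANGE;
}
```

Acting only on transitions means a guest that rewrites the register with the legacy bit unchanged does not disturb the PIT.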

Changes from v7:
- added flags field to PITState
- added kvm_pit_state2 struct with flags field
- replaced hpet legacy mode ioctl with get/set pit2 ioctl

Changes from v6:

- added ioctl interface for setting hpet legacy mode in kernel pit
- moved check for hpet_legacy_mode in pit_load_count to allow state info
  to be copied before returning if legacy mode is enabled.
- sprinkled in some #ifdef TARGET_I386


Signed-off-by: Beth Kon e...@us.ibm.com
---
 hw/hpet.c |   16 +++--
 hw/i8254-kvm.c|   26 ++-
 hw/i8254.c|   77 
 hw/i8254.h|3 ++
 hw/pc.h   |4 +-
 kvm/include/linux/kvm.h   |4 ++
 kvm/include/x86/asm/kvm.h |7 
 libkvm-all.h  |   32 ++-
 qemu-kvm-x86.c|   38 ++
 qemu-kvm.c|   20 
 qemu-kvm.h|8 +
 vl.c  |   21 ++--
 12 files changed, 217 insertions(+), 39 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index e0be486..462e6db 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_timer(f, s->timer[i].qemu_timer);
 }
 }
+if (hpet_in_legacy_mode()) {
+hpet_disable_pit();
+}
 return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_disable_pit();
+dprintf("qemu: hpet disabled pit\n");
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
 }
 break;
 case HPET_CFG + 4:
@@ -554,13 +559,16 @@ static void hpet_reset(void *opaque) {
 /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
 s->capability = 0x8086a201ULL;
 s->capability |= ((HPET_CLK_PERIOD) << 32);
-if (count > 0)
+s->config = 0ULL;
+if (count > 0) {
 /* we don't enable pit when hpet_reset is first called (by hpet_init)
  * because hpet is taking over for pit here. On subsequent invocations,
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
+}
 count = 1;
 }
 
diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8390d75..8145658 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -33,15 +33,20 @@ static PITState pit_state;
 static void kvm_pit_save(QEMUFile *f, void *opaque)
 {
 PITState *s = opaque;
-struct kvm_pit_state pit;
+struct kvm_pit_state2 pit2;
 struct kvm_pit_channel_state *c;
 struct PITChannelState *sc;
 int i;
 
-kvm_get_pit(kvm_context, &pit);
-
+if(qemu_kvm_has_pit_state2()) {
+kvm_get_pit2(kvm_context, &pit2);
+s->flags = pit2.flags;
+} else {
+/* pit2 is superset of pit struct so just cast it and use it */
+kvm_get_pit(kvm_context, (struct kvm_pit_state *)&pit2);
+}
 for (i = 0; i < 3; i++) {
-   c = &pit.channels[i];
+   c = &pit2.channels[i];
sc = &s->channels[i];
sc->count = c->count;
sc->latched_count = c->latched_count;
@@ -64,15 +69,16 @@ static void kvm_pit_save(QEMUFile *f, void *opaque)
 static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id)
 {
 PITState *s = opaque;
-struct kvm_pit_state pit;
+struct kvm_pit_state2 pit2;
 struct kvm_pit_channel_state *c;
 struct PITChannelState *sc;
 int i;
 
 pit_load(f, s, version_id

Re: [PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)

2009-06-19 Thread Beth Kon

Jan Kiszka wrote:

Beth Kon wrote:
  
When kvm is in hpet_legacy_mode, the hpet is providing the 
timer interrupt and the pit should not be. So in legacy mode, the pit timer is

destroyed, but the *state* of the pit is maintained. So if kvm or the guest
tries to modify the state of the pit, this modification is accepted, *except* 
that the timer isn't actually started. When we exit hpet_legacy_mode, 
the current state of the pit (which is up to date since we've been 
accepting modifications) is used to restart the pit timer.


The saved_mode code in kvm_pit_load_count temporarily changes mode to 
0xff in order to destroy the timer, but then restores the actual value,

again maintaining current state of the pit for possible later reenablement.

changes from v6:

- Added ioctl interface for legacy mode in order not to break the abi.


Signed-off-by: Beth Kon e...@us.ibm.com



...

  

@@ -1986,7 +1987,24 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
int r = 0;
 
 	memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));

-   kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+   kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0);
+   return r;
+}
+
+static int kvm_vm_ioctl_get_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+   int r = 0;
+   *mode = kvm->arch.vpit->pit_state.hpet_legacy_mode;
+   return r;
+}



This only applies if we go for a separate mode IOCTL:
The legacy mode is not directly modifiable by the guest. Is it planned
to add in-kernel hpet support? Otherwise get_hpet_legacy_mode looks a
bit like overkill given that user space could easily track the state.
  
Assuming I will at least generalize the ioctl, I'll leave this question 
for the time being.
  

+
+static int kvm_vm_ioctl_set_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+   int r = 0, start = 0;
+   if (kvm->arch.vpit->pit_state.hpet_legacy_mode == 0 && *mode == 1)



Here you check for mode == 1, but legacy_mode is only checked for != 0.
I would make this consistent.

  

ok

+   start = 1;
+   kvm->arch.vpit->pit_state.hpet_legacy_mode = *mode;
+   kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start);
return r;
 }
 
@@ -2047,6 +2065,7 @@ long kvm_arch_vm_ioctl(struct file *filp,

struct kvm_pit_state ps;
struct kvm_memory_alias alias;
struct kvm_pit_config pit_config;
+   u8 hpet_legacy_mode;



Hmm, instead of introducing a new pair of single-purpose IOCTLs, why not
add KVM_GET/SET_PIT2 which exchanges an extended kvm_pit_state2. And
that struct should also include some flags field and enough padding to
be potentially extended yet again in the future. In that case I see no
problem having also a mode read-back interface.

  
I thought about that, but it seemed to add unnecessary complexity, since 
this legacy control is really outside of normal PIT operation, which is 
embodied by KVM_GET/SET_PIT. It might be worth making this ioctl more 
general. Rather than SET_HPET_LEGACY, have SET_PIT_CONTROLS and pass a 
bit field, one of which is HPET_LEGACY. But, if general consensus is 
that it would be better to create a kvm_pit_state2, and get/set that, I 
can do that.
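
For comparison, the extended-struct option Jan describes could look roughly like this. It is a sketch only: the `toy_` types are stand-ins rather than the real kvm structures, the channel fields are reduced to two, and the amount of reserved padding is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a per-channel state record. */
struct toy_channel_state {
    uint32_t count;
    uint8_t  mode;
};

/* Existing layout: channels only. */
struct toy_pit_state {
    struct toy_channel_state channels[3];
};

#define TOY_PIT_FLAGS_HPET_LEGACY 0x00000001u

/* Extended layout: same prefix, plus flags and room to grow. */
struct toy_pit_state2 {
    struct toy_channel_state channels[3];
    uint32_t flags;
    uint32_t reserved[9];   /* padding size chosen for illustration */
};

/* Build an extended record from an old one; flags and padding start at 0. */
static void toy_pit_state_upgrade(const struct toy_pit_state *old,
                                  struct toy_pit_state2 *out)
{
    assert(old != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    memcpy(out->channels, old->channels, sizeof(old->channels));
}
```

Because the old struct is a strict prefix of the new one, the existing get/set pair can keep its ABI while the new pair carries the legacy bit in `flags`.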

Jan

  




[PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)

2009-06-18 Thread Beth Kon
When kvm is in hpet_legacy_mode, the hpet is providing the 
timer interrupt and the pit should not be. So in legacy mode, the pit timer is
destroyed, but the *state* of the pit is maintained. So if kvm or the guest
tries to modify the state of the pit, this modification is accepted, *except* 
that the timer isn't actually started. When we exit hpet_legacy_mode, 
the current state of the pit (which is up to date since we've been 
accepting modifications) is used to restart the pit timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to 
0xff in order to destroy the timer, but then restores the actual value,
again maintaining current state of the pit for possible later reenablement.

changes from v6:

- Added ioctl interface for legacy mode in order not to break the abi.


Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..25cae50 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -18,6 +18,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
+#define __KVM_HAVE_HPET_LEGACY_MODE
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..02de293 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -329,21 +329,32 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
case 1:
 /* FIXME: enhance mode 4 precision */
case 4:
-   create_pit_timer(ps, val, 0);
+   if (!ps->hpet_legacy_mode)
+   create_pit_timer(ps, val, 0);
break;
case 2:
case 3:
-   create_pit_timer(ps, val, 1);
+   if (!ps->hpet_legacy_mode)
+   create_pit_timer(ps, val, 1);
break;
default:
destroy_pit_timer(&ps->pit_timer);
}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+   u8 saved_mode;
mutex_lock(&kvm->arch.vpit->pit_state.lock);
-   pit_load_count(kvm, channel, val);
+   if (hpet_legacy_start) {
+   /* save existing mode for later reenablement */
+   saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+   kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+   pit_load_count(kvm, channel, val);
+   kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+   } else {
+   pit_load_count(kvm, channel, val);
+   }
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
@@ -548,6 +559,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
struct kvm_kpit_channel_state *c;
 
mutex_lock(&pit->pit_state.lock);
+   pit->pit_state.hpet_legacy_mode = 0;
for (i = 0; i < 3; i++) {
c = &pit->pit_state.channels[i];
c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u8 hpet_legacy_mode;
struct kvm_timer pit_timer;
bool is_periodic;
u32speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6025e5b..8562eeb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1134,6 +1134,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_IRQFD:
case KVM_CAP_PIT2:
+   case KVM_CAP_HPET_LEGACY_MODE:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -1986,7 +1987,24 @@ static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
int r = 0;
 
memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-   kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+   kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0);
+   return r;
+}
+
+static int kvm_vm_ioctl_get_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+   int r = 0;
+   *mode = kvm->arch.vpit->pit_state.hpet_legacy_mode;
+   return r;
+}
+
+static int kvm_vm_ioctl_set_hpet_legacy_mode(struct kvm *kvm, u8 *mode)
+{
+   int r = 0, start = 0;
+   if (kvm->arch.vpit->pit_state.hpet_legacy_mode == 0 && *mode == 1)
+   start

[PATCH 0/2][RFC] Completing HPET in KVM (v7)

2009-06-18 Thread Beth Kon
There is a problem in the latest git with savevm (it aborts). So I've 
been unable to test savevm with these patches, but am submitting them RFC. 
Everything else has been tested, including compatibility testing between
old/new kernel/userspace combinations.



[PATCH 1/2][RFC] Userspace changes for KVM HPET (v7)

2009-06-18 Thread Beth Kon

This patch series must be applied on top of the hpet branch. 

The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the earlier versions by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while PIT is disabled, but does not allow a pit timer
to start. Then if HPET legacy mode is disabled, whatever the PIT state is at
that point, the PIT timer is restarted accordingly.

Changes from v6:

- added ioctl interface for setting hpet legacy mode in kernel pit
- moved check for hpet_legacy_mode in pit_load_count to allow state info
  to be copied before returning if legacy mode is enabled. 
- sprinkled in some #ifdef TARGET_I386
 

Signed-off-by: Beth Kon e...@us.ibm.com
diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..2f5255f 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_timer(f, s->timer[i].qemu_timer);
 }
 }
+if (hpet_in_legacy_mode()) {
+hpet_disable_pit();
+}
 return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_disable_pit();
+dprintf("qemu: hpet disabled pit\n");
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
 }
 break;
 case HPET_CFG + 4:
@@ -554,13 +559,16 @@ static void hpet_reset(void *opaque) {
 /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
 s->capability = 0x8086a201ULL;
 s->capability |= ((HPET_CLK_PERIOD) << 32);
-if (count > 0)
+s->config = 0ULL;
+if (count > 0) {
 /* we don't enable pit when hpet_reset is first called (by hpet_init)
  * because hpet is taking over for pit here. On subsequent invocations,
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
+}
 count = 1;
 }
 
diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8390d75..76ce6f2 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -36,6 +36,7 @@ static void kvm_pit_save(QEMUFile *f, void *opaque)
 struct kvm_pit_state pit;
 struct kvm_pit_channel_state *c;
 struct PITChannelState *sc;
+__u8 hpet_legacy_mode;
 int i;
 
 kvm_get_pit(kvm_context, &pit);
@@ -59,6 +60,10 @@ static void kvm_pit_save(QEMUFile *f, void *opaque)
 }
 
 pit_save(f, s);
+if (kvm_has_hpet_legacy_mode(kvm_context)) {
+kvm_get_hpet_legacy_mode(kvm_context, &hpet_legacy_mode);
+qemu_put_8s(f, &hpet_legacy_mode);
+}
 }
 
 static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id)
@@ -67,6 +72,7 @@ static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id)
 struct kvm_pit_state pit;
 struct kvm_pit_channel_state *c;
 struct PITChannelState *sc;
+__u8 hpet_legacy_mode;
 int i;
 
 pit_load(f, s, version_id);
@@ -89,8 +95,13 @@ static int kvm_pit_load(QEMUFile *f, void *opaque, int version_id)
c->count_load_time = sc->count_load_time;
 }
 
-kvm_set_pit(kvm_context, &pit);
+if (kvm_has_hpet_legacy_mode(kvm_context)) {
+qemu_get_8s(f, &hpet_legacy_mode);
+kvm_get_hpet_legacy_mode(kvm_context, &hpet_legacy_mode);
+}
 
+kvm_set_hpet_legacy_mode(kvm_context, &hpet_legacy_mode);
+kvm_set_pit(kvm_context, &pit);
 return 0;
 }
 
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..0136c64 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 
 //#define DEBUG_PIT
@@ -202,6 +203,11 @@ static inline void pit_load_count(PITChannelState *s, int val)
 val = 0x10000;
 s->count_load_time = qemu_get_clock(vm_clock);
 s->count = val;
+#ifdef TARGET_I386
+if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+return;
+}
+#endif

ioctl number overlap?

2009-06-15 Thread Beth Kon

kvm.h has

#define KVM_SET_GUEST_DEBUG   _IOW(KVMIO,  0x9b, struct kvm_guest_debug)

and

#define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)

Seems that these could conflict?
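
For reference, whether two `_IOW` definitions collide depends on the whole encoded number, which also folds in the argument size. A minimal sketch, assuming the common asm-generic bit layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction) and KVMIO being 0xAE; `toy_iow` is a made-up helper, not a kernel macro, and the sizes are supplied by the caller rather than taken from any real struct:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOC_WRITE 1u     /* direction value for _IOW in the assumed layout */
#define TOY_KVMIO     0xAEu  /* assumed value of KVMIO */

/* Encode an _IOW-style request:
 * bits 0-7 number, 8-15 type, 16-29 argument size, 30-31 direction. */
static uint32_t toy_iow(uint32_t type, uint32_t nr, uint32_t size)
{
    assert(type < 256 && nr < 256 && size < (1u << 14));
    return (TOY_IOC_WRITE << 30) | (size << 16) | (type << 8) | nr;
}
```

Two definitions that share type and number yield the same request only when their argument sizes also match, so the answer for the pair above depends on `sizeof(struct kvm_guest_debug)` versus `sizeof(void *)` on the architecture in question; this sketch does not settle that.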


qemu-kvm broken after ./configure --disable-kvm

2009-06-11 Thread Beth Kon
Building latest git with ./configure --disable-kvm breaks with errors in 
pcspk.c



[PATCH 1/5] BIOS changes for configuring irq0->inti2 override (v4)

2009-06-11 Thread Beth Kon
These patches resolve the irq0->inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0->inti2 via the irq routing interface, and adds the
irq0->inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).
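
The resulting source-to-destination table for the 16 ISA IRQs can be sketched like this. It mirrors the loop in the mptable_init hunk of this patch, but `struct isa_route` and `build_isa_routes` are illustrative names only, not BIOS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative record: one ISA source IRQ and the IOAPIC pin it lands on. */
struct isa_route {
    uint8_t src_irq;
    uint8_t dst_pin;
};

/* Fill `out` with one entry per IOAPIC destination and return the count.
 * With the override, IRQ0 goes to pin 2 and source IRQ2 gets no entry. */
static size_t build_isa_routes(uint8_t irq0_override,
                               struct isa_route *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (int i = 0; i < 16; i++) {
        if (irq0_override && i == 2)
            continue;               /* pin 2 is already taken by IRQ0 */
        assert(n < cap);
        out[n].src_irq = (uint8_t)i;
        out[n].dst_pin = (uint8_t)((irq0_override && i == 0) ? 2 : i);
        n++;
    }
    return n;
}
```

With the override enabled the table has 15 entries and no two sources share a pin; without it, all 16 IRQs map 1-1.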

Changes from v3:

- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best
  approximates a multiplexer that disables PIT interrupts when HPET is 
  in legacy mode (as described by HPET spec). Any changes to the PIT that 
  may occur while HPET is operating in legacy mode are saved, so if 
  HPET leaves legacy mode, the PIT is just reenabled, with mode set 
  to whatever the last setting from guest was. Legacy mode is disabled
  at least during crash and shutdown (in Linux), so this needs to be 
  handled properly.


---
 kvm/bios/rombios32.c |   60 -
 1 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..9d6910e 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -444,6 +444,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -553,6 +557,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1195,6 +1210,13 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+/* One entry per ioapic interrupt destination. Destination 2 is covered
+ * by irq0->inti2 override (i == 0). Source IRQ 2 is unused
+ */
+if (irq0_override && i == 2)
+continue;
+#endif
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1202,7 +1224,12 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+if (irq0_override  i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+#endif
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1758,23 +1785,21 @@ void acpi_bios_init(void)
     io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec00000);
     io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     io_apic++;
-
-    int_override = (void *)io_apic;
-    int_override->type = APIC_XRUPT_OVERRIDE;
-    int_override->length = sizeof(*int_override);
-    int_override->bus = cpu_to_le32(0);
-    int_override->source = cpu_to_le32(0);
-    int_override->gsi = cpu_to_le32(2);
-    int_override->flags = cpu_to_le32(0);
-#endif
+    int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+    if (irq0_override) {
+        memset(int_override, 0, sizeof(*int_override));
+        int_override->type = APIC_XRUPT_OVERRIDE;
+        int_override->length = sizeof(*int_override);
+        int_override->source = 0;
+        int_override->gsi = 2;
+        int_override->flags = 0; /* conforms to bus specifications */
+        int_override++;
+    }
 #endif
-
-    int_override = (struct madt_int_override*)(io_apic + 1);
-    for ( i = 0; i < 16; i++ ) {
-        if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+    for (i = 0; i < 16; i++) {
+        if (PCI_ISA_IRQ_MASK & (1U << i)) {
             memset(int_override, 0, sizeof(*int_override));
             int_override->type   = APIC_XRUPT_OVERRIDE;
             int_override->length = sizeof(*int_override);
@@ -2697,6 +2722,9 @@ void rombios32_init(uint32_t *s3_resume_vector, uint8_t 
*shutdown_flag)

[PATCH 3/5] BIOS changes for KVM HPET (v5)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     Device(HPET) {
         Name(_HID,  EISAID("PNP0103"))
         Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
 addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
 }
 
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct  acpi_table_header *)hpet,
                              "HPET", sizeof(*hpet), 1);
-#endif
 
     acpi_additional_tables(); /* resets cfg to required entry */
     for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
     /* kvm has no ssdt (processors are in dsdt) */
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-    /* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif


[PATCH 5/5] HPET interaction with in-kernel PIT

2009-06-11 Thread Beth Kon

Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |1 +
 arch/x86/kvm/i8254.c   |   24 +++-
 arch/x86/kvm/i8254.h   |3 ++-
 arch/x86/kvm/x86.c |5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 
 struct kvm_pit_state {
 	struct kvm_pit_channel_state channels[3];
+	u8 hpet_legacy_mode;
 };
 
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+	u8 saved_mode;
 	mutex_lock(&kvm->arch.vpit->pit_state.lock);
-	pit_load_count(kvm, channel, val);
+	if (hpet_legacy_start) {
+		/* save existing mode for later reenablement */
+		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(kvm, channel, val);
+		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+	} else {
+		if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+			pit_load_count(kvm, channel, val);
+	}
 	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
 		switch (s->write_state) {
 		default:
 		case RW_STATE_LSB:
-			pit_load_count(kvm, addr, val);
+			if (!(addr == 0 && pit_state->hpet_legacy_mode))
+				pit_load_count(kvm, addr, val);
 			break;
 		case RW_STATE_MSB:
-			pit_load_count(kvm, addr, val << 8);
+			if (!(addr == 0 && pit_state->hpet_legacy_mode))
+				pit_load_count(kvm, addr, val << 8);
 			break;
 		case RW_STATE_WORD0:
 			s->write_latch = val;
 			s->write_state = RW_STATE_WORD1;
 			break;
 		case RW_STATE_WORD1:
-			pit_load_count(kvm, addr, s->write_latch | (val << 8));
+			if (!(addr == 0 && pit_state->hpet_legacy_mode))
+				pit_load_count(kvm, addr, s->write_latch | (val << 8));
 			s->write_state = RW_STATE_WORD0;
 			break;
 		}
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	struct kvm_kpit_channel_state *c;
 
 	mutex_lock(&pit->pit_state.lock);
+	pit->pit_state.hpet_legacy_mode = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
 	struct kvm_kpit_channel_state channels[3];
+	u8 hpet_legacy_mode;
 	struct kvm_timer pit_timer;
 	bool is_periodic;
 	u32    speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
 	int r = 0;
+	int hpet_legacy_start = 0;
 
+	if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+		hpet_legacy_start = 1;
 	memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-	kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+	kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_legacy_start);
 	return r;
 }
 


[PATCH 2/5] Userspace changes for configuring irq0-inti2 override (v4)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
 
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
      * the cleanest way of doing it but it should work. */
 
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
 
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
  acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
     smbios_table = smbios_get_table(&smbios_len);
 if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
 
 module_call_init(MODULE_INIT_DEVICE);
 
-    if (kvm_enabled())
-       kvm_init_ap();
+    if (kvm_enabled()) {
+       kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
 
     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 4/5] Userspace changes for KVM HPET (v4)

2009-06-11 Thread Beth Kon
The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the previous version by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while PIT is disabled, but does not allow a pit timer
to start. Then if HPET legacy mode is disabled, whatever the PIT state is at
that point, the PIT timer is restarted accordingly.
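
The behaviour described above can be sketched as a small state model. This is only an illustration of the description, not the QEMU or KVM code: the `toy_pit` structure and all `toy_*` functions are made-up names, and a single boolean stands in for the periodic host timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the QEMU/KVM data structures. */
struct toy_pit {
    bool hpet_legacy_mode;  /* set while HPET owns the timer interrupt */
    int mode;               /* last mode programmed by the guest */
    unsigned count;         /* last count programmed; 0 = never programmed */
    bool timer_running;     /* stands in for the periodic host timer */
};

/* Guest writes always update the saved state; the timer only starts
 * when HPET is not in legacy mode. */
void toy_pit_program(struct toy_pit *p, int mode, unsigned count)
{
    assert(mode >= 0 && mode <= 5);      /* the i8254 defines modes 0..5 */
    p->mode = mode;
    p->count = count ? count : 0x10000;  /* a count of 0 means 65536 */
    if (!p->hpet_legacy_mode)
        p->timer_running = true;
}

void toy_hpet_enter_legacy(struct toy_pit *p)
{
    p->hpet_legacy_mode = true;
    p->timer_running = false;            /* stop the periodic timer */
}

void toy_hpet_leave_legacy(struct toy_pit *p)
{
    p->hpet_legacy_mode = false;
    /* restart from whatever state the guest last programmed */
    p->timer_running = (p->count != 0);
}
```

The point of keeping `mode` and `count` writable while the timer is stopped is that leaving legacy mode needs no extra bookkeeping: the timer simply restarts from the current state.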

Signed-off-by: Beth Kon e...@us.ibm.com


---
 hw/hpet.c |   15 +++
 hw/i8254.c|   43 ++-
 hw/i8254.h|2 ++
 hw/pc.h   |4 ++--
 kvm/include/x86/asm/kvm.h |1 +
 qemu-kvm.c|   20 
 qemu-kvm.h|3 ++-
 vl.c  |7 ++-
 8 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..043b92b 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_timer(f, s->timer[i].qemu_timer);
         }
     }
+    if (hpet_in_legacy_mode()) {
+        hpet_disable_pit();
+    }
     return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 }
                 /* i8254 and RTC are disabled when HPET is in legacy mode */
                 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_disable();
+                    hpet_disable_pit();
+                    dprintf("qemu: hpet disabled pit\n");
                 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_enable();
+                    hpet_enable_pit();
+                    dprintf("qemu: hpet enabled pit\n");
                 }
                 break;
             case HPET_CFG + 4:
@@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) {
     /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
     s->capability = 0x8086a201ULL;
     s->capability |= ((HPET_CLK_PERIOD) << 32);
-    if (count > 0)
+    if (count > 0) {
         /* we don't enable pit when hpet_reset is first called (by hpet_init)
          * because hpet is taking over for pit here. On subsequent invocations,
          * hpet_reset is called due to system reset. At this point control must
          * be returned to pit until SW reenables hpet.
          */
-        hpet_pit_enable();
+        hpet_enable_pit();
+        dprintf("qemu: hpet enabled pit\n");
+    }
     count = 1;
 }
 
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..8c8076f 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 
 //#define DEBUG_PIT
@@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel)
 
 static inline void pit_load_count(PITChannelState *s, int val)
 {
+    if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+        return;
+    }
     if (val == 0)
         val = 0x10000;
     s->count_load_time = qemu_get_clock(vm_clock);
@@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time)
            (double)(expire_time - current_time) / ticks_per_sec);
 #endif
     s->next_transition_time = expire_time;
-    if (expire_time != -1)
+    if (expire_time != -1) {
         qemu_mod_timer(s->irq_timer, expire_time);
-    else
+    } else {
         qemu_del_timer(s->irq_timer);
+    }
 }
 
 static void pit_irq_timer(void *opaque)
@@ -451,6 +456,7 @@ void pit_reset(void *opaque)
     PITChannelState *s;
     int i;
 
+    pit->hpet_legacy_mode = 0;
     for(i = 0;i < 3; i++) {
         s = &pit->channels[i];
         s->mode = 3;
@@ -460,32 +466,43 @@ void pit_reset(void *opaque)
 }
 
 /* When HPET is operating in legacy mode, i8254 timer0 is disabled */
-void hpet_pit_disable(void) {
-    PITChannelState *s;
-    s = &pit_state.channels[0];
-    if (s->irq_timer)
-        qemu_del_timer(s->irq_timer);
+
+void hpet_disable_pit(void)
+{
+    PITChannelState *s = &pit_state.channels[0];
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_hpet_disable_kpit();
+    } else {
+        if (s->irq_timer) {
+            qemu_del_timer(s->irq_timer);
+        }
+    }
 }
 
 /* When HPET is reset or leaving legacy mode, it must reenable i8254
  * timer 0
  */
 
-void hpet_pit_enable(void)
+void hpet_enable_pit(void

Re: [PATCH 1/5] BIOS changes for configuring irq0-inti2 override (v4)

2009-06-11 Thread Beth Kon

Beth Kon wrote:

Sebastian Herbszt wrote:

Beth Kon wrote:
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).

Changes from v3:

- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best
  approximates a multiplexer that disables PIT interrupts when HPET is
  in legacy mode (as described by HPET spec). Any changes to the PIT that
  may occur while HPET is operating in legacy mode are saved, so if
  HPET leaves legacy mode, the PIT is just reenabled, with mode set
  to whatever the last setting from guest was. Legacy mode is disabled
  at least during crash and shutdown (in Linux), so this needs to be
  handled properly.

---
 kvm/bios/rombios32.c |   60 -
 1 files changed, 44 insertions(+), 16 deletions(-)

What about the mptable entry count?
Think it would need something like

#ifdef BX_QEMU
 if (irq0_override)
   putle16(&q, smp_cpus + 17); /* entry count */
 else
   putle16(&q, smp_cpus + 18); /* entry count */
#else
 putle16(&q, smp_cpus + 18); /* entry count */
#endif

Your patch "Fix non-ACPI Timer Interrupt Routing - v3" [1] included such a
change.

[1] http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01396.html

Yes, I lost that somehow! Thanks (again!).
Actually, it isn't that simple. That patch that you referred to was a
qemu patch. But I still don't see it in qemu-patched bochs bios.
Apparently, I did neglect to add it to the kvm bios patches that I had
waiting.

Anthony, do you know what happened to this patch?

- Sebastian




[PATCH 1/5] Userspace changes for configuring irq0-inti2 override (v6)

2009-06-11 Thread Beth Kon
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).

Changes from v3:

- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best
  approximates a multiplexer that disables PIT interrupts when HPET is 
  in legacy mode (as described by HPET spec). Any changes to the PIT that 
  may occur while HPET is operating in legacy mode are saved, so if 
  HPET leaves legacy mode, the PIT is just reenabled, with mode set 
  to whatever the last setting from guest was. Legacy mode is disabled
  at least during crash and shutdown (in Linux), so this needs to be 
  handled properly.

Changes from v4:

- Modify mp_table entry count depending on whether irq_override is enabled.
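
A minimal sketch of how the entry count follows from the irq loop, for illustration only. The function `mp_entry_count` is a hypothetical name, and the breakdown of the constant 18 into one bus entry, one I/O APIC entry and 16 ISA interrupt entries is an assumption; the patch itself only shows the totals 17 and 18.

```c
#include <assert.h>
#include <stdbool.h>

enum { ISA_IRQS = 16 };

/* Assumption: "+ 18" is 1 bus entry + 1 I/O APIC entry + 16 ISA
 * interrupt entries; with the override, the entry for source IRQ 2
 * is skipped, leaving 15 interrupt entries. */
int mp_entry_count(int cpus, bool irq0_override)
{
    int irq_entries = 0;

    assert(cpus > 0);
    for (int i = 0; i < ISA_IRQS; i++) {
        if (irq0_override && i == 2)
            continue;            /* inti2 is already taken by IRQ0 */
        irq_entries++;
    }
    return cpus + 1 /* bus */ + 1 /* I/O APIC */ + irq_entries;
}
```

Deriving the count from the same condition that skips the entry keeps the header and the table body in step, which is exactly what the v4 posting got wrong.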


Signed-off-by: Beth Kon e...@us.ibm.com
---
 kvm/bios/rombios32.c |   67 ++
 1 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7db91d8..d6886ee 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -555,6 +559,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+        return;
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1168,14 @@ static void mptable_init(void)
     putstr(&q, "0.1         "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee00000); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1219,13 @@ static void mptable_init(void)
 
     /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1233,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+            putb(&q, i); /* dest I/O APIC interrupt in */
     }
     /* patch length */
     len = q - mp_config_table;
@@ -1760,23 +1794,21 @@ void acpi_bios_init(void)
     io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec00000);
     io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     io_apic++;
-
-    int_override = (void *)io_apic;
-    int_override->type = APIC_XRUPT_OVERRIDE;
-    int_override->length = sizeof(*int_override);
-    int_override->bus = cpu_to_le32(0);
-    int_override->source = cpu_to_le32(0);
-    int_override->gsi = cpu_to_le32(2);
-    int_override->flags = cpu_to_le32(0);
-#endif
+    int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+    if (irq0_override) {
+        memset(int_override, 0, sizeof(*int_override));
+        int_override->type = APIC_XRUPT_OVERRIDE;
+        int_override->length = sizeof(*int_override);
+        int_override->source

[PATCH 2/5] Userspace changes for configuring irq0-inti2 override (v6)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
 
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
      * the cleanest way of doing it but it should work. */
 
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
 
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
  acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
     smbios_table = smbios_get_table(&smbios_len);
 if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
 
 module_call_init(MODULE_INIT_DEVICE);
 
-    if (kvm_enabled())
-       kvm_init_ap();
+    if (kvm_enabled()) {
+       kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
 
     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 3/5] BIOS changes for KVM HPET (v6)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     Device(HPET) {
         Name(_HID,  EISAID("PNP0103"))
         Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
 addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
 }
 
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct  acpi_table_header *)hpet,
                              "HPET", sizeof(*hpet), 1);
-#endif
 
     acpi_additional_tables(); /* resets cfg to required entry */
     for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
     /* kvm has no ssdt (processors are in dsdt) */
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-    /* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif


[PATCH 4/5] Userspace changes for KVM HPET (v6)

2009-06-11 Thread Beth Kon
The big change here is handling of enabling/disabling of hpet legacy mode.
When hpet enters legacy mode, the spec says that the pit stops generating
interrupts. In practice, we want to stop the pit periodic timer from running
because it is wasteful in a virtual environment.

We also have to worry about the hpet leaving legacy mode (which, at least in
linux, happens only during a shutdown or crash). At this point, according to
the hpet spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the previous version by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while PIT is disabled, but does not allow a pit timer
to start. Then if HPET legacy mode is disabled, whatever the PIT state is at
that point, the PIT timer is restarted accordingly.

Signed-off-by: Beth Kon e...@us.ibm.com
---

diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..043b92b 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
 qemu_get_timer(f, s->timer[i].qemu_timer);
 }
 }
+if (hpet_in_legacy_mode()) {
+hpet_disable_pit();
+}
 return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_disable_pit();
+dprintf("qemu: hpet disabled pit\n");
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
 }
 break;
 case HPET_CFG + 4:
@@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) {
 /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
 s->capability = 0x8086a201ULL;
 s->capability |= ((HPET_CLK_PERIOD) << 32);
-if (count > 0)
+if (count > 0) {
 /* we don't enable pit when hpet_reset is first called (by hpet_init)
  * because hpet is taking over for pit here. On subsequent invocations,
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
+}
 count = 1;
 }
 
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..8c8076f 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 
 //#define DEBUG_PIT
@@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel)
 
 static inline void pit_load_count(PITChannelState *s, int val)
 {
+if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+return;
+}
 if (val == 0)
 val = 0x1;
 s->count_load_time = qemu_get_clock(vm_clock);
@@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time)
 (double)(expire_time - current_time) / ticks_per_sec);
 #endif
 s->next_transition_time = expire_time;
-if (expire_time != -1)
+if (expire_time != -1) {
 qemu_mod_timer(s->irq_timer, expire_time);
-else
+} else {
 qemu_del_timer(s->irq_timer);
+}
 }
 
 static void pit_irq_timer(void *opaque)
@@ -451,6 +456,7 @@ void pit_reset(void *opaque)
 PITChannelState *s;
 int i;
 
+pit->hpet_legacy_mode = 0;
 for(i = 0;i < 3; i++) {
 s = &pit->channels[i];
 s->mode = 3;
@@ -460,32 +466,43 @@ void pit_reset(void *opaque)
 }
 
 /* When HPET is operating in legacy mode, i8254 timer0 is disabled */
-void hpet_pit_disable(void) {
-PITChannelState *s;
-s = &pit_state.channels[0];
-if (s->irq_timer)
-qemu_del_timer(s->irq_timer);
+
+void hpet_disable_pit(void)
+{
+PITChannelState *s = &pit_state.channels[0];
+if (qemu_kvm_pit_in_kernel()) {
+kvm_hpet_disable_kpit();
+} else {
+if (s->irq_timer) {
+qemu_del_timer(s->irq_timer);
+}
+}
 }
 
 /* When HPET is reset or leaving legacy mode, it must reenable i8254
  * timer 0
  */
 
-void hpet_pit_enable(void)
+void hpet_enable_pit(void)
 {
 PITState *pit = &pit_state;
-PITChannelState *s;
-s = &pit->channels[0];
-s->mode = 3;
-s->gate = 1;
-pit_load_count(s, 0);
+PITChannelState *s = &pit->channels[0];
+if (qemu_kvm_pit_in_kernel()) {
+kvm_hpet_enable_kpit();
+} else {
+pit_load_count(s, s->count);
+}
 }
 
 PITState *pit_init(int base, qemu_irq irq)
 {
 PITState *pit = &pit_state;
 PITChannelState *s;
+int i

[PATCH 5/5] HPET interaction with in-kernel PIT (v6)

2009-06-11 Thread Beth Kon

Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |1 +
 arch/x86/kvm/i8254.c   |   24 +++-
 arch/x86/kvm/i8254.h   |3 ++-
 arch/x86/kvm/x86.c |5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 
 struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
+   u8 hpet_legacy_mode;
 };
 
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+   u8 saved_mode;
    mutex_lock(&kvm->arch.vpit->pit_state.lock);
-   pit_load_count(kvm, channel, val);
+   if (hpet_legacy_start) {
+   /* save existing mode for later reenablement */
+   saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+   kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+   pit_load_count(kvm, channel, val);
+   kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+   } else {
+   if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+   pit_load_count(kvm, channel, val);
+   }
    mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
    switch (s->write_state) {
    default:
    case RW_STATE_LSB:
-   pit_load_count(kvm, addr, val);
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, val);
    break;
    case RW_STATE_MSB:
-   pit_load_count(kvm, addr, val << 8);
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, val << 8);
    break;
    case RW_STATE_WORD0:
    s->write_latch = val;
    s->write_state = RW_STATE_WORD1;
    break;
    case RW_STATE_WORD1:
-   pit_load_count(kvm, addr, s->write_latch | (val << 8));
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, s->write_latch | (val << 8));
    s->write_state = RW_STATE_WORD0;
    break;
    }
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
struct kvm_kpit_channel_state *c;
 
    mutex_lock(&pit->pit_state.lock);
+   pit->pit_state.hpet_legacy_mode = 0;
    for (i = 0; i < 3; i++) {
    c = &pit->pit_state.channels[i];
    c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u8 hpet_legacy_mode;
struct kvm_timer pit_timer;
bool is_periodic;
    u32    speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
    int r = 0;
+   int hpet_legacy_start = 0;
 
+   if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+   hpet_legacy_start = 1;
    memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-   kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+   kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_legacy_start);
return r;
 }
 


[PATCH 1/2] Clean up MADT Table Creation (v2)

2009-06-09 Thread Beth Kon
This patch is based on the recent patch from Vincent Minet. I split Vincent's
changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by
Marcelo) and added a bit to them.

There has been much ado over the acpi_bios_init function recently. I had
actually done a rewrite very similar to Gleb's, but Avi argued that the
rewrite has to be more incremental. This patch contains minimal changes
without any rewrite because the changes are kvm-only. The rewrite would
better be a separate step, submitted to qemu and then merged into kvm.

I am submitting the RSDT fix to kvm because the kvm and qemu RSDT
implementations differ. Again, as a separate rewrite effort, the kvm and qemu
RSDT manipulation could be merged into one base as a later, separate step.

This patch will get MADT into reasonable enough shape for me to resubmit hpet
patches on top of it. After that, I'd be willing to submit incremental
rewrite patches for acpi_bios_init to qemu, starting with MADT and RSDT.

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..cdae363 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -86,6 +86,8 @@ typedef unsigned long long uint64_t;
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
+#define MAX_INT_OVERRIDES 16
+
 static inline void outl(int addr, int val)
 {
 asm volatile ("outl %1, %w0" : : "d" (addr), "a" (val));
@@ -1600,7 +1602,7 @@ void acpi_bios_init(void)
 uint32_t hpet_addr;
 #endif
 uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr;
-uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size;
+uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end;
 uint32_t srat_addr,srat_size;
 uint16_t i, external_tables;
 int nb_numa_nodes;
@@ -1668,7 +1670,7 @@ void acpi_bios_init(void)
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
-sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */;
+sizeof(struct madt_io_apic)  + sizeof(struct madt_int_override) * MAX_INT_OVERRIDES;
 #else
 sizeof(struct madt_io_apic);
 #endif
@@ -1786,8 +1788,9 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
+madt_end = (uint32_t)int_override;
+madt_size = madt_end - madt_addr;
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
 }


[PATCH 2/2] Clean up RSDT Table Creation (v2)

2009-06-09 Thread Beth Kon
This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES,
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.

I moved the for() loop to the end of the code that adds table_offset_entry
entries so I could add the check for overflow:
|| (nb_rsdt_entries > MAX_RSDT_ENTRIES)
This is not ideal. An ideal fix would require a rewrite of the rsdt build
code, which I can do later and submit to qemu.
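
For illustration only, a bounded append could look like the sketch below.
It is not the code in this patch: demo_rsdt, demo_rsdt_add and
DEMO_MAX_ENTRIES are hypothetical names, and unlike the loop in the diff it
tests the limit before storing the entry rather than after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; stands in for MAX_RSDT_ENTRIES. */
#define DEMO_MAX_ENTRIES 4

struct demo_rsdt {
    uint32_t entry[DEMO_MAX_ENTRIES];
    int nb_entries;
};

/* Append one table address. Returns false, without writing, when the
 * entry array is already full. */
static bool demo_rsdt_add(struct demo_rsdt *t, uint32_t addr)
{
    assert(t->nb_entries >= 0);
    if (t->nb_entries >= DEMO_MAX_ENTRIES) {
        return false;
    }
    t->entry[t->nb_entries++] = addr;
    return true;
}

/* Size in bytes of the header plus only the entries actually used. */
static uint32_t demo_rsdt_size(const struct demo_rsdt *t, uint32_t header_size)
{
    return header_size + (uint32_t)t->nb_entries * (uint32_t)sizeof(t->entry[0]);
}
```

Deriving the size from the number of entries in use mirrors the
rsdt_end - rsdt_addr calculation in the diff below.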

Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cdae363..7db91d8 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1602,7 +1602,7 @@ void acpi_bios_init(void)
 uint32_t hpet_addr;
 #endif
 uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr;
-uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end;
+uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size, madt_end, rsdt_end;
 uint32_t srat_addr,srat_size;
 uint16_t i, external_tables;
 int nb_numa_nodes;
@@ -1628,7 +1628,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1872,16 +1872,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1894,9 +1884,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if ((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
-rsdt_size -= MAX_RSDT_ENTRIES * 4;
-rsdt_size += nb_rsdt_entries * 4;
+rsdt_end = (uint32_t)(&rsdt->table_offset_entry[nb_rsdt_entries]);
+rsdt_size = rsdt_end - rsdt_addr;
 acpi_build_table_header((struct acpi_table_header *)rsdt, "RSDT",
 rsdt_size, 1);
 


Re: hpet missing in qemu-kvm's acpi table

2009-06-05 Thread Beth Kon

Jan Kiszka wrote:

Hi,

does qemu-kvm's bios intentionally refrain from reporting hpet support
via acpi or is this a bug? It works nicely with upstream (tcg & kvm mode).

Jan

  
Hi Jan. HPET is not in qemu-kvm yet because there are some issues unique 
to qemu-kvm regarding disabling of the in-kernel PIT. I have patches 
ready to submit and should be able to do so next week (long story), so 
hopefully this will be resolved shortly.



Re: [PATCH 1/2] Clean up MADT Table Creation

2009-05-20 Thread Beth Kon

Avi Kivity wrote:

Beth Kon wrote:
This patch is based on the recent patch from Vincent Minet. I split
Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup,
as suggested by Marcelo) and added a bit to them. And to give credit where
it is due, this cleanup is also related to the patch Marcelo provided when
the HPET addition tripped over the same problem. (Thanks again Marcelo :-)

This patch moves all the table layout calculations to the same area of
acpi_bios_init. This prevents corruption problems when, in the middle of
filling in the tables, the MADT table size grows. The idea is to do all the
layout in one section, then fill things in afterwards. It also corrects a
problem where the madt table was memset to 0 before the final size of the
table had been determined.


Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..7f62e4f 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;

 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;
 
  


You're just duplicating the override creation loop (with its internal 
if); if we update it, we'll have to update this too.

Yep, that's a valid complaint. I'll resubmit shortly.


Why not set madt_end = int_override and calculate madt_size = madt_end 
- madt?
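
A rough sketch of that idea, under stated assumptions: the size is derived
from where the fill loop stopped, so there is no second loop to keep in
sync. The struct demo_entry layout and demo_fill_overrides are hypothetical
and much simpler than the real madt_int_override handling in rombios32.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte entry; the real override entry has more fields. */
struct demo_entry {
    uint8_t type;
    uint8_t length;
    uint8_t source;
    uint8_t gsi;
};

/* Write one entry per bit set in irq_mask, then return the number of
 * bytes written, computed from the final cursor position. */
static size_t demo_fill_overrides(uint8_t *buf, size_t cap, uint16_t irq_mask)
{
    struct demo_entry *cur = (struct demo_entry *)buf;
    struct demo_entry *limit = cur + cap / sizeof(*cur);
    int i;

    for (i = 0; i < 16; i++) {
        if (!(irq_mask & (1U << i))) {
            continue;
        }
        assert(cur < limit); /* caller reserves room for the worst case */
        cur->type = 2;       /* MADT "Interrupt Source Override" type */
        cur->length = (uint8_t)sizeof(*cur);
        cur->source = (uint8_t)i;
        cur->gsi = (uint8_t)i;
        cur++;
    }
    /* size = end - start, so it cannot drift from what the loop wrote */
    return (size_t)((uint8_t *)cur - buf);
}
```

The caller still has to reserve worst-case space up front; only the size
reported in the table header comes from the end pointer.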






qemu-kvm.git regression in configure

2009-05-19 Thread Beth Kon
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.



Beth Kon


Re: qemu-kvm.git regression in configure

2009-05-19 Thread Beth Kon

Avi Kivity wrote:

Beth Kon wrote:
Latest qemu-kvm.git fails with ./configure, and reverting 
22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.




Works for me.  What error do you get?



./configure: 1364: Syntax error: "(" unexpected (expecting "fi")



Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-15 Thread Beth Kon

Marcelo Tosatti wrote:

Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
  

Anthony Liguori wrote:


Vincent Minet wrote:
  

External ACPI tables are counted twice for the RSDT size and the load
address for the first external table is in the MADT (interrupt override
entries are overwritten).

Signed-off-by: Vincent Minet vinc...@vincent-minet.net
  


Beth,

I think you had a patch attempting to address the same issue.  It was  
a bit more involved though.


Which is the proper fix and are they both to the same problem?
  
They are for 2 different bases. My patch was for qemu's bochs bios and  
this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in  
this area of setting up the ACPI tables. My patch is still needed for  
the qemu base. I hope we'll be getting to one base soon :-)


Assuming the intent of the code was for MAX_RSDT_ENTRIES to include  
external_tables, this patch looks correct. I think one additional check  
would be needed (in my patch) to make sure that the code doesn't exceed  
MAX_RSDT_ENTRIES when the external tables are being loaded.


My patch also puts all the code that calculates madt_size in the same
place, at the beginning of the table layout. I believe this is neater
and will avoid problems like this one in the future. As much as
possible, I think it best to get all the tables laid out, then fill
them in. If for some reason this is not acceptable, we need to add a big
note that no tables should be laid out after the madt because the madt
may grow further down in the code and overwrite the other table.



I like this better too, see questions/comments below.

  

Regards,

Anthony Liguori

  

---
 kvm/bios/rombios32.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..289361b 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
  fadt_addr = addr;
@@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
 }
 int_override++;
 madt_size += sizeof(struct madt_int_override);
+addr += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
  




  

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..23835b6 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;

@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;

 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;



This bug could only affect the HPET descriptor right? 
  
I'm not sure what you're asking. There were 2 bugs that Vincent pointed 
out. The first caused an incorrect rsdt_size to be reported, and the 
second (missing addr += sizeof(struct madt_int_override)) caused 
corruption of whatever came after the MADT. But even if his patch were 
applied, any future code that added a table and manipulated addr between 
the following points:


...
(about line 1676)
madt = (void *)(addr);
addr += madt_size;
...
(about line 1789)
madt_size += sizeof(struct madt_int_override);
addr += sizeof(struct madt_int_override);

would have wound up causing some kind of corruption, as happened with 
the HPET. Also the memset(madt, 0, madt_size) around line 1740 was not 
using the complete madt_size.


So this seems undesirable, and that's why I suggested moving all addr 
manipulation (with the exception of additional tables at the very end) 
to the same section of the table layout code. Seems best to manage 
madt_size all in one place.
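
As a sketch of what "lay everything out first, then fill in" could mean
(hypothetical names and sizes, not the rombios32.c code): every address is
assigned in one pass from worst-case sizes, so nothing written later can
move a table that was already placed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout record; the real code uses separate locals. */
struct demo_layout {
    uint32_t rsdt_addr;
    uint32_t fadt_addr;
    uint32_t madt_addr;
    uint32_t hpet_addr;
    uint32_t end;
};

/* Round up to the next multiple of 8, as the BIOS code does for addr. */
static uint32_t demo_align8(uint32_t addr)
{
    return (addr + 7) & ~7U;
}

/* Phase 1: assign every address using worst-case sizes. Phase 2 (not
 * shown) fills the tables and may use less than the reserved space. */
static struct demo_layout demo_plan(uint32_t base, uint32_t rsdt_max,
                                    uint32_t fadt_size, uint32_t madt_max,
                                    uint32_t hpet_size)
{
    struct demo_layout l;
    uint32_t addr = base;

    l.rsdt_addr = addr;
    addr += rsdt_max;
    l.fadt_addr = addr;
    addr += fadt_size;
    addr = demo_align8(addr);
    l.madt_addr = addr;
    addr += madt_max; /* reserve room for every possible override */
    l.hpet_addr = addr;
    addr += hpet_size;
    l.end = addr;
    assert(l.end >= base); /* no 32-bit wrap-around */
    return l;
}
```

With the MADT reserved at its maximum size, a table placed after it (such
as the HPET table here) cannot be overwritten when overrides are added.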


  

 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
@@ -1868,17 +1872,6 @@ void acpi_bios_init(void

Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;


Re: Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon

Beth Kon wrote:

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within

MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon e...@us.ibm.com

  
This should have been patch 2/2. I think git-send-email didn't like that 
I didn't have a space after Subject: . Let me try to resend with the 
space added.



[PATCH 2/2] Clean up RSDT Table Creation

2009-05-15 Thread Beth Kon
This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;


Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-14 Thread Beth Kon

Anthony Liguori wrote:

Vincent Minet wrote:

External ACPI tables are counted twice for the RSDT size and the load
address for the first external table is in the MADT (interrupt override
entries are overwritten).

Signed-off-by: Vincent Minet vinc...@vincent-minet.net
  


Beth,

I think you had a patch attempting to address the same issue.  It was 
a bit more involved though.


Which is the proper fix and are they both to the same problem?
They are for 2 different bases. My patch was for qemu's bochs bios and 
this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in 
this area of setting up the ACPI tables. My patch is still needed for 
the qemu base. I hope we'll be getting to one base soon :-)


Assuming the intent of the code was for MAX_RSDT_ENTRIES to include 
external_tables, this patch looks correct. I think one additional check 
would be needed (in my patch) to make sure that the code doesn't exceed 
MAX_RSDT_ENTRIES when the external tables are being loaded.


My patch also puts all the code that calculates madt_size in the same
place, at the beginning of the table layout. I believe this is neater
and will avoid problems like this one in the future. As much as
possible, I think it best to get all the tables laid out, then fill
them in. If for some reason this is not acceptable, we need to add a big
note that no tables should be laid out after the madt because the madt
may grow further down in the code and overwrite the other table.







Regards,

Anthony Liguori


---
 kvm/bios/rombios32.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..289361b 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;

@@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
 }
 int_override++;
 madt_size += sizeof(struct madt_int_override);
+addr += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
  




diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..23835b6 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;
 
 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
@@ -1868,17 +1872,6 @@ void acpi_bios_init(void)
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
 #endif
-
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1891,6 +1884,16 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+

Re: [PATCH 2/4] Userspace changes for configuring irq0->inti2 override (v3)

2009-05-12 Thread Beth Kon

Gleb Natapov wrote:

On Tue, May 12, 2009 at 01:22:06PM +0300, Avi Kivity wrote:
  

Gleb Natapov wrote:


 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}



There is no entry for IRQ2, is this OK? What happens if IRQ2 triggers?
  
  
irq 2 is the PIC cascade interrupt.  If it is somehow triggered, the  
kernel will ignore it.




But here we configure IOAPIC routing. What if IOAPIC is used for
interrupt delivery and something triggers irq2. There is no entry
describing it in IOAPIC routing table, so what gsi it will be mapped to?

--
  
The ACPI spec states that systems that support both APIC and dual-8259 
interrupt models must map system interrupt vectors 0-15 to 8259 IRQs 
0-15, except where interrupt source overrides are provided. We provide 
an irq0->inti2 override, and no irq2 override, so irq2 must be unused.
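
As a rough illustration of the mapping rule described above, here is a minimal sketch in C. It assumes a simplified override record; struct irq_override and isa_irq_to_gsi() are hypothetical names used only for illustration and are not part of KVM, QEMU, or the BIOS code. Treating a GSI that is claimed by another source's override as unavailable to its identity-mapped IRQ is one reading of the argument above, not spec text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified override record: one ISA source IRQ that is
 * remapped to a different global system interrupt (GSI). */
struct irq_override {
    uint8_t  source_irq;
    uint32_t gsi;
};

/* Return the GSI for an ISA IRQ (0-15), or -1 if the IRQ has no usable
 * mapping under this model. */
int isa_irq_to_gsi(uint8_t irq, const struct irq_override *ov, size_t n)
{
    size_t i;

    if (irq > 15)
        return -1;

    /* An explicit override for this source takes precedence. */
    for (i = 0; i < n; i++)
        if (ov[i].source_irq == irq)
            return (int)ov[i].gsi;

    /* Otherwise the mapping is 1:1, unless another source's override has
     * already claimed that GSI; in that case the IRQ is treated as unused. */
    for (i = 0; i < n; i++)
        if (ov[i].gsi == irq)
            return -1;

    return irq;
}
```

With a single { 0, 2 } override, IRQ0 resolves to GSI 2, IRQ1 stays on GSI 1, and IRQ2 has no mapping, which matches the "irq2 must be unused" conclusion.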




Re: [PATCH 2/4] Userspace changes for configuring irq0->inti2 override (v3)

2009-05-12 Thread Beth Kon

Gleb Natapov wrote:

On Mon, May 11, 2009 at 01:29:44PM -0400, Beth Kon wrote:
  

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index e1b19d7..bb74f38 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic);
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override);
 


It is read as 1 byte by the BIOS, but it is 2 bytes here. And arch
specific config should be registered in arch specific place (hw/pc.c)
  

ok.
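
To make the size mismatch raised above concrete, here is a minimal sketch in C. It assumes the 16-bit value is laid out little-endian in the config blob; store_le16() and read_first_byte() are hypothetical helpers for illustration, not QEMU or BIOS functions.

```c
#include <stdint.h>

/* Store a 16-bit value in little-endian byte order (assumed layout). */
void store_le16(uint8_t out[2], uint16_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)(v >> 8);
}

/* Read only the first byte, as a consumer expecting a 1-byte entry would. */
uint8_t read_first_byte(const uint8_t *blob)
{
    return blob[0];
}
```

Under that assumption a flag value of 0 or 1 survives the 1-byte read, but any value that only sets bits in the high byte would be read back as 0, so the two sides should agree on the width.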
  

 register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s);
 qemu_register_reset(fw_cfg_reset, s);
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index f616ed2..1de7360 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -19,6 +19,7 @@
 
 #define FW_CFG_WRITE_CHANNEL    0x4000

 #define FW_CFG_ARCH_LOCAL       0x8000
+#define FW_CFG_IRQ0_OVERRIDE    (FW_CFG_ARCH_LOCAL + 2)


This should go to hw/pc.c
  

ok.
  

 #define FW_CFG_ENTRY_MASK   ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL)
 
 #define FW_CFG_INVALID  0x

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 0b70cf6..2d77a2c 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"

 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)

 {
 IOAPICState *s = opaque;
 
-#if 0

 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)

+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {

 uint32_t mask = 1 << vector;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 8cb6faa..2e52c87 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}


There is no entry for IRQ2, is this OK? What happens if IRQ2 triggers?

  

Answered in separate email.

 if (r < 0)
 return r;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index dd045dd..6a1968a 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..292bbc3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -93,6 +93,7 @@ extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
 extern int nographic;
+extern int irq0override;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
 extern int rtc_td_hack;
diff --git a/vl.c b/vl.c
index d9f0607..0bffc82 100644
--- a/vl.c
+++ b/vl.c
@@ -207,6 +207,7 @@ static int vga_ram_size;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 static DisplayState *display_state;
 int nographic;
+int irq0override;
 static int curses;
 static int sdl;
 const char* keyboard_layout = NULL;
@@ -5035,6 +5036,7 @@ int main(int argc, char **argv, char **envp)
 vga_ram_size = VGA_RAM_SIZE;
 snapshot = 0;
 nographic = 0;
+irq0override = 1;


Why not do that when defining the variable? Yeah I realize this is how
it is done for other variables too, but why?

  
Good question. I don't think there is any good reason. I was conforming 
to the existing style.

 curses = 0;
 kernel_filename = NULL;
 kernel_cmdline = "";
@@ -6129,8 +6131,14 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
-if (kvm_enabled())

-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size

Re: [PATCH 1/4] BIOS changes for configuring irq0->inti2 override (v3)

2009-05-12 Thread Beth Kon

Gleb Natapov wrote:

On Mon, May 11, 2009 at 01:29:43PM -0400, Beth Kon wrote:
  

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..53359b8 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -444,6 +444,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -553,6 +557,18 @@ uint64_t qemu_cfg_get64 (void)

 }
 #endif
 
+#ifdef BX_QEMU

+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+memset(&irq0_override, 0, 1);
+}


Why memset and not irq0_override = 0? Actually, it should be zero already.

  
This was an oversight, left over from some early cut-and-paste coding I 
was doing. You're right - not necessary. Thanks.

--
Gleb.
  




Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)

2009-05-12 Thread Beth Kon

Avi Kivity wrote:

Beth Kon wrote:

Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG

 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_legacy_enable(void)

+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_disable();
+dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+hpet_pit_disable();
+dprintf("qemu: hpet disabled userspace pit\n");
+}
+}
+
+static void hpet_legacy_disable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_enable();
+dprintf("qemu: hpet enabled kernel pit\n");
+} else {
+hpet_pit_enable();
+dprintf("qemu: hpet enabled userspace pit\n");
+}
+}
  
I think it's better to move these into hpet_pit_enable() and 
hpet_pit_disable().  This avoids changing the calls below, and puts pit 
stuff in i8254.c instead of hpet.c.


Might also need to be called from hpet_load(); probably a problem in 
upstream as well.


My assumption about hpet_load was that the correct pit state would be 
established via pit_load (since all saves/loads are done together).  But 
when I wrote this, I was thinking only about the userspace pit (for 
qemu). I'm not sure how the load concept applies to kernel state.  Do 
I need to explicitly re-enable or disable the kernel pit during load?



Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)

2009-05-12 Thread Beth Kon

Beth Kon wrote:

Avi Kivity wrote:

Beth Kon wrote:

Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG

 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_legacy_enable(void)

+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_disable();
+dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+hpet_pit_disable();
+dprintf("qemu: hpet disabled userspace pit\n");
+}
+}
+
+static void hpet_legacy_disable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_enable();
+dprintf("qemu: hpet enabled kernel pit\n");
+} else {
+hpet_pit_enable();
+dprintf("qemu: hpet enabled userspace pit\n");
+}
+}
  
I think it's better to move these into hpet_pit_enable() and 
hpet_pit_disable().  This avoids changing the calls below, and puts 
pit stuff in i8254.c instead of hpet.c.


Might also need to be called from hpet_load(); probably a problem in 
upstream as well.


My assumption about hpet_load was that the correct pit state would be 
established via pit_load (since all saves/loads are done together).  
But when I wrote this, I was thinking only about the userspace pit 
(for qemu). I'm not sure how the load concept applies to kernel 
state.  Do I need to explicitly re-enable or disable the kernel pit 
during load?

Looking further at the code, it looks like kvm_pit_load should take care 
of this. Agree?







[PATCH 4/4] Userspace changes for KVM HPET (v3)

2009-05-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_legacy_enable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_disable();
+dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+hpet_pit_disable();
+dprintf("qemu: hpet disabled userspace pit\n");
+}
+}
+
+static void hpet_legacy_disable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+kvm_kpit_enable();
+dprintf("qemu: hpet enabled kernel pit\n");
+} else {
+hpet_pit_enable();
+dprintf("qemu: hpet enabled userspace pit\n");
+}
+}
+
 static uint32_t timer_int_route(struct HPETTimer *timer)
 {
 uint32_t route;
@@ -475,9 +498,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_legacy_enable();
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_legacy_disable();
 }
 break;
 case HPET_CFG + 4:
@@ -560,7 +583,7 @@ static void hpet_reset(void *opaque) {
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_legacy_disable();
 count = 1;
 }
 
diff --git a/qemu-kvm.c b/qemu-kvm.c
index f55cee8..1bb853b 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -463,6 +463,25 @@ void kvm_init_vcpu(CPUState *env)
 qemu_cond_wait(&qemu_vcpu_cond);
 }
 
+void kvm_kpit_enable(void)
+{
+struct kvm_pit_state ps;
+if (qemu_kvm_pit_in_kernel()) {
+kvm_get_pit(kvm_context, &ps);
+kvm_set_pit(kvm_context, &ps);
+}
+}
+
+void kvm_kpit_disable(void)
+{
+struct kvm_pit_state ps;
+if (qemu_kvm_pit_in_kernel()) {
+kvm_get_pit(kvm_context, &ps);
+ps.channels[0].mode = 0xff;
+kvm_set_pit(kvm_context, &ps);
+}
+}
+
 int kvm_init_ap(void)
 {
 #ifdef TARGET_I386
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6a1968a..13353ec 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -31,6 +31,8 @@ int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap);
 int kvm_qemu_init_env(CPUState *env);
 int kvm_qemu_check_extension(int ext);
 void kvm_apic_init(CPUState *env);
+void kvm_kpit_enable(void);
+void kvm_kpit_disable(void);
 int kvm_set_irq(int irq, int level, int *status);
 
 int kvm_physical_memory_set_dirty_tracking(int enable);
diff --git a/vl.c b/vl.c
index 0bffc82..8f120c5 100644
--- a/vl.c
+++ b/vl.c
@@ -6132,10 +6132,15 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (kvm_enabled()) {
-   kvm_init_ap();
+kvm_init_ap();
 #ifdef USE_KVM
 if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
 irq0override = 0;
+/* if kernel can't do irq routing, interrupt source
+ * override 0->2 can not be set up as required by hpet,
+ * so disable hpet.
+ */
+no_hpet=1;
 }
 #endif
 }


[PATCH 2/4] Userspace changes for configuring irq0->inti2 override (v3)

2009-05-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index e1b19d7..bb74f38 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic);
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override);
 
 register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s);
 qemu_register_reset(fw_cfg_reset, s);
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index f616ed2..1de7360 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -19,6 +19,7 @@
 
 #define FW_CFG_WRITE_CHANNEL    0x4000
 #define FW_CFG_ARCH_LOCAL       0x8000
+#define FW_CFG_IRQ0_OVERRIDE    (FW_CFG_ARCH_LOCAL + 2)
 #define FW_CFG_ENTRY_MASK   ~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL)
 
 #define FW_CFG_INVALID  0x
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 0b70cf6..2d77a2c 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
 IOAPICState *s = opaque;
 
-#if 0
 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)
+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 8cb6faa..2e52c87 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}
 if (r < 0)
 return r;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index dd045dd..6a1968a 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..292bbc3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -93,6 +93,7 @@ extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
 extern int nographic;
+extern int irq0override;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
 extern int rtc_td_hack;
diff --git a/vl.c b/vl.c
index d9f0607..0bffc82 100644
--- a/vl.c
+++ b/vl.c
@@ -207,6 +207,7 @@ static int vga_ram_size;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 static DisplayState *display_state;
 int nographic;
+int irq0override;
 static int curses;
 static int sdl;
 const char* keyboard_layout = NULL;
@@ -5035,6 +5036,7 @@ int main(int argc, char **argv, char **envp)
 vga_ram_size = VGA_RAM_SIZE;
 snapshot = 0;
 nographic = 0;
+irq0override = 1;
 curses = 0;
 kernel_filename = NULL;
 kernel_cmdline = "";
@@ -6129,8 +6131,14 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
-if (kvm_enabled())
-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size, vga_ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 3/4] BIOS changes for KVM HPET (v3)

2009-05-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..0e142be 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -308,7 +308,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -328,7 +327,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 53359b8..df83ee7 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1519,8 +1519,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1706,13 +1706,11 @@ void acpi_bios_init(void)
 #endif
 addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1884,7 +1882,6 @@ void acpi_bios_init(void)
 }
 
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1893,7 +1890,6 @@ void acpi_bios_init(void)
 hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
-#endif
 
 acpi_additional_tables(); /* resets cfg to required entry */
 for(i = 0; i < external_tables; i++) {
@@ -1913,8 +1909,7 @@ void acpi_bios_init(void)
 /* kvm has no ssdt (processors are in dsdt) */
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif


[PATCH 1/4] BIOS changes for configuring irq0->inti2 override (v3)

2009-05-11 Thread Beth Kon
These patches resolve the irq0->inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the 
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether 
an override exists in the BIOS. And the HPET spec states that in legacy mode, 
timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq 
routing (i.e., compatibility with old kernels). So if the kernel is capable, 
userspace sets up irq0->inti2 via the irq routing interface, and adds the 
irq0->inti2 override to the MADT interrupt source override table, 
and the mp table (for the no-acpi case).
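
For readers unfamiliar with the MADT entry involved, here is a minimal sketch in C of an interrupt source override record and how an IRQ0-to-GSI2 entry could be filled in. The field layout follows the ACPI "Interrupt Source Override" structure (type 2, 10 bytes); the names struct madt_int_src_override and fill_irq0_override() are hypothetical and are not the identifiers used in rombios32.c. The packed attribute is a GCC/Clang extension, and the multi-byte fields are written in host byte order, which is only correct as-is on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

#define MADT_TYPE_INT_SRC_OVERRIDE 2  /* ACPI MADT entry type */

/* Interrupt Source Override entry, 10 bytes when packed. */
struct madt_int_src_override {
    uint8_t  type;    /* 2 = interrupt source override */
    uint8_t  length;  /* size of this entry in bytes */
    uint8_t  bus;     /* 0 = ISA */
    uint8_t  source;  /* bus-relative source IRQ */
    uint32_t gsi;     /* global system interrupt the source is wired to */
    uint16_t flags;   /* 0 = polarity/trigger conform to bus specification */
} __attribute__((__packed__));

/* Fill in an entry describing "ISA IRQ0 is delivered on GSI 2". */
void fill_irq0_override(struct madt_int_src_override *e)
{
    memset(e, 0, sizeof(*e));
    e->type   = MADT_TYPE_INT_SRC_OVERRIDE;
    e->length = sizeof(*e);
    e->bus    = 0;
    e->source = 0;
    e->gsi    = 2;
    e->flags  = 0;
}
```

A guest that parses the MADT would then route the legacy timer source (IRQ0) to I/O APIC input 2 instead of assuming the identity mapping.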

A couple of months ago, Marcelo was seeing RHEL5 guests complain of invalid
checksum with these patches, but later he couldn't reproduce it, and I'm not 
seeing it now. While all guests still need to be fully tested, everything 
appears to be in order.  I've tested on Win2k8 64-bit, Win2k8 32-bit, RHEL5.3 32-bit, 
and Ubuntu 8.10 64-bit.

Changes from v2:

- rebased on latest kvm 
- fixed build problems with --disable-kvm (kvm_kpit_enable/disable)  

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..53359b8 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -444,6 +444,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -553,6 +557,18 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+memset(&irq0_override, 0, 1);
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1195,6 +1211,13 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+/* One entry per ioapic interrupt destination. Destination 2 is covered
+ * by irq0->inti2 override (i == 0). Source IRQ 2 is unused 
+ */
+if (irq0_override && i == 2)
+continue;
+#endif
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1202,7 +1225,12 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+#endif
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1665,16 +1693,18 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
-#ifdef BX_QEMU
-sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */;
-#else
 sizeof(struct madt_io_apic);
+#ifdef BX_QEMU
+for (i = 0; i < 16; i++)
+if (PCI_ISA_IRQ_MASK & (1U << i))
+madt_size += sizeof(struct madt_int_override);
+if (irq0_override)
+madt_size += sizeof(struct madt_int_override);
 #endif
-madt = (void *)(addr);
 addr += madt_size;
-
 #ifdef BX_QEMU
 #ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
@@ -1758,23 +1788,20 @@ void acpi_bios_init(void)
 io_apic->io_apic_id = smp_cpus;
 io_apic->address = cpu_to_le32(0xfec0);
 io_apic->interrupt = cpu_to_le32(0);
+int_override = (struct madt_int_override*)(io_apic + 1);
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
-io_apic++;
-
-int_override = (void *)io_apic;
-int_override->type = APIC_XRUPT_OVERRIDE;
-int_override->length = sizeof(*int_override);
-int_override->bus = cpu_to_le32(0);
-int_override->source = cpu_to_le32(0);
-int_override->gsi = cpu_to_le32(2);
-int_override->flags = cpu_to_le32(0);
-#endif
+if (irq0_override) {
+memset(int_override, 0, sizeof(*int_override));
+int_override->type = APIC_XRUPT_OVERRIDE;
+int_override->length = sizeof(*int_override);
+int_override->source = 0;
+int_override->gsi = 2;
+int_override->flags = 0; /* conforms to bus specifications */
+int_override++;
+}
 #endif

[PATCH 1/4] BIOS changes for configuring irq0->inti2 override (v2)

2009-05-07 Thread Beth Kon
These patches resolve the irq0->inti2 override issue, and get the hpet working
on kvm.

They are dependent on Jes Sorensen's recent 0006-qemu-kvm-irq-routing.patch.

Override and HPET changes are sent as a series because HPET depends on the 
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether 
an override exists in the BIOS. And the HPET spec states that in legacy mode, 
timer interrupt is on inti2.

The irq0->inti2 override will always be used unless the kernel cannot do irq 
routing (i.e., compatibility with old kernels). So if the kernel is capable, 
userspace sets up irq0->inti2 via the irq routing interface, and adds the 
irq0->inti2 override to the MADT interrupt source override table, 
and the mp table (for the no-acpi case).

A couple of months ago, Marcelo was seeing RHEL5 guests complain of invalid
checksum with these patches, but later he couldn't reproduce it, and I'm not 
seeing it now. While all guests still need to be fully tested, everything 
appears to be in order.  I've tested on Win2k8 64-bit, Win2k8 32-bit, RHEL5.3 32-bit, 
and Ubuntu 8.10 64-bit.

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 8684987..07dda73 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -445,6 +445,9 @@ uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
 uint8_t bios_uuid[16];
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_SIGNATURE  0x00
 #define QEMU_CFG_ID 0x01
 #define QEMU_CFG_UUID   0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x0e
 
 int qemu_cfg_port;
 
@@ -518,6 +522,18 @@ void uuid_probe(void)
 memset(bios_uuid, 0, 16);
 }
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+memset(&irq0_override, 0, 1);
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1160,6 +1176,13 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+/* One entry per ioapic interrupt destination. Destination 2 is covered
+ * by irq0->inti2 override (i == 0). Source IRQ 2 is unused 
+ */
+if (irq0_override && i == 2)
+continue;
+#endif
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1167,7 +1190,12 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+#endif
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1550,16 +1578,18 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
-#ifdef BX_QEMU
-sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */;
-#else
 sizeof(struct madt_io_apic);
+#ifdef BX_QEMU
+for (i = 0; i < 16; i++)
+if (PCI_ISA_IRQ_MASK & (1U << i))
+madt_size += sizeof(struct madt_int_override);
+if (irq0_override)
+madt_size += sizeof(struct madt_int_override);
 #endif
-madt = (void *)(addr);
 addr += madt_size;
-
 #ifdef BX_QEMU
 #ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
@@ -1660,23 +1690,20 @@ void acpi_bios_init(void)
 io_apic->io_apic_id = smp_cpus;
 io_apic->address = cpu_to_le32(0xfec0);
 io_apic->interrupt = cpu_to_le32(0);
+int_override = (struct madt_int_override*)(io_apic + 1);
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
-io_apic++;
-
-int_override = (void *)io_apic;
-int_override->type = APIC_XRUPT_OVERRIDE;
-int_override->length = sizeof(*int_override);
-int_override->bus = cpu_to_le32(0);
-int_override->source = cpu_to_le32(0);
-int_override->gsi = cpu_to_le32(2);
-int_override->flags = cpu_to_le32(0);
-#endif
+if (irq0_override) {
+memset(int_override, 0, sizeof(*int_override));
+int_override->type = APIC_XRUPT_OVERRIDE;
+int_override->length = sizeof(*int_override);
+int_override->source = 0;
+int_override->gsi = 2;
+int_override->flags = 0; /* conforms to bus specifications */
+int_override++;
+}
 #endif
-
-int_override = (struct madt_int_override*)(io_apic + 1);
 for ( i = 0; i < 16; i

[PATCH 3/4] BIOS changes for KVM HPET (v2)

2009-05-07 Thread Beth Kon

Just a note here... 
The number of table_offset_entry entries for the non-BX_QEMU case doesn't make
sense here. There are only 2 entries. I left it as is, since it does not
affect HPET's interaction with it. Actually it seems like dead code, since
this is in kvm code but with BX_QEMU undefined. It doesn't seem to be a
problem.


Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..0e142be 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -308,7 +308,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -328,7 +327,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index ddfa828..7441cd7 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1293,7 +1293,7 @@ struct rsdt_descriptor_rev1
 {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 #ifdef BX_QEMU
-   uint32_t table_offset_entry [2]; /* Array of pointers to other */
+   uint32_t table_offset_entry [3]; /* Array of pointers to other */
 // uint32_t table_offset_entry [4]; /* Array of pointers to other */
 #else
 uint32_t table_offset_entry [3]; /* Array of pointers to other */
@@ -1450,8 +1450,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1591,13 +1591,11 @@ void acpi_bios_init(void)
 #endif
 addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 
 acpi_tables_size = addr - base_addr;
 
@@ -1620,10 +1618,10 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
-//rsdt->table_offset_entry[2] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-//rsdt->table_offset_entry[3] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 #endif
+//rsdt->table_offset_entry[3] = cpu_to_le32(ssdt_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt,
 "RSDT", sizeof(*rsdt), 1);
 
@@ -1723,7 +1721,6 @@ void acpi_bios_init(void)
 
 #ifdef BX_QEMU
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1733,7 +1730,6 @@ void acpi_bios_init(void)
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
 #endif
-#endif
 
 }
 


[PATCH 4/4] Userspace changes for KVM HPET (v2)

2009-05-07 Thread Beth Kon

Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..47c9f89 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_kpit_enable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_kpit_disable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+ps.channels[0].mode = 0xff;
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_legacy_enable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+   hpet_kpit_disable();
+   dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+   hpet_pit_disable();
+   dprintf("qemu: hpet disabled userspace pit\n");
+}
+}
+
+static void hpet_legacy_disable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+   hpet_kpit_enable();
+   dprintf("qemu: hpet enabled kernel pit\n");
+} else {
+   hpet_pit_enable();
+   dprintf("qemu: hpet enabled userspace pit\n");
+}
+}
+
 static uint32_t timer_int_route(struct HPETTimer *timer)
 {
 uint32_t route;
@@ -475,9 +513,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_legacy_enable();
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_legacy_disable();
 }
 break;
 case HPET_CFG + 4:
@@ -560,7 +598,7 @@ static void hpet_reset(void *opaque) {
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_legacy_disable();
 count = 1;
 }
 
diff --git a/vl.c b/vl.c
index f9a72b3..b860b82 100644
--- a/vl.c
+++ b/vl.c
@@ -6138,10 +6138,15 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (kvm_enabled()) {
-   kvm_init_ap();
+kvm_init_ap();
 #ifdef USE_KVM
 if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
 irq0override = 0;
+/* if kernel can't do irq routing, interrupt source
+ * override 0-2 can not be set up as required by hpet,
+ * so disable hpet.
+ */
+no_hpet=1;
 }
 #endif
 }


[PATCH 2/4] Userspace changes for configuring irq0-inti2 override (v2)

2009-05-07 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index e1b19d7..bb74f38 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic);
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override);
 
 register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s);
 qemu_register_reset(fw_cfg_reset, s);
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index f616ed2..498c1e3 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -15,6 +15,7 @@
 #define FW_CFG_INITRD_SIZE      0x0b
 #define FW_CFG_BOOT_DEVICE      0x0c
 #define FW_CFG_NUMA             0x0d
+#define FW_CFG_IRQ0_OVERRIDE    0x0e
 #define FW_CFG_MAX_ENTRY        0x10
 
 #define FW_CFG_WRITE_CHANNEL    0x4000
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 0b70cf6..2d77a2c 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
 IOAPICState *s = opaque;
 
-#if 0
 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)
+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 8cb6faa..2e52c87 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -879,7 +879,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}
 if (r < 0)
 return r;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 8226001..c64718c 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -165,6 +165,7 @@ void qemu_kvm_cpu_stop(CPUState *env);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -173,6 +174,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..292bbc3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -93,6 +93,7 @@ extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
 extern int nographic;
+extern int irq0override;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
 extern int rtc_td_hack;
diff --git a/vl.c b/vl.c
index 6b4b7d2..f9a72b3 100644
--- a/vl.c
+++ b/vl.c
@@ -207,6 +207,7 @@ static int vga_ram_size;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 static DisplayState *display_state;
 int nographic;
+int irq0override;
 static int curses;
 static int sdl;
 const char* keyboard_layout = NULL;
@@ -5039,6 +5040,7 @@ int main(int argc, char **argv, char **envp)
 vga_ram_size = VGA_RAM_SIZE;
 snapshot = 0;
 nographic = 0;
+irq0override = 1;
 curses = 0;
 kernel_filename = NULL;
 kernel_cmdline = "";
@@ -6135,8 +6137,14 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
-if (kvm_enabled())
-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size, vga_ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


Re: [PATCH 1/4] BIOS changes for configuring irq0-inti2 override(v2)

2009-05-07 Thread Beth Kon

Avi Kivity wrote:
> Beth Kon wrote:
>> These patches resolve the irq0-inti2 override issue, and get the hpet
>> working on kvm.
>> They are dependent on Jes Sorensen's recent
>> 0006-qemu-kvm-irq-routing.patch.
>>
>> Override and HPET changes are sent as a series because HPET depends
>> on the override. Win2k8 expects the HPET interrupt on inti2,
>> regardless of whether an override exists in the BIOS. And the HPET
>> spec states that in legacy mode, timer interrupt is on inti2.
>>
>> The irq0-inti2 override will always be used unless the kernel cannot
>> do irq routing (i.e., compatibility with old kernels). So if the
>> kernel is capable, userspace sets up irq0-inti2 via the irq routing
>> interface, and adds the irq0-inti2 override to the MADT interrupt
>> source override table, and the mp table (for the no-acpi case).
>>
>> A couple of months ago, Marcelo was seeing RHEL5 guests complain of
>> invalid checksum with these patches, but later he couldn't reproduce
>> it, and I'm not seeing it now. While all guests still need to be fully
>> tested, everything appears to be in order.  I've tested on win2k864,
>> win2k832, RHEL5.3 32 bit, and ubuntu 8.10 64 bit.
>
> What are the changes relative to v1?

Just merge issues with the changes you put in when moving to the newer
bios. I submitted prematurely, incorrectly thinking I was done testing.
When I finished, some problems surfaced.

>> @@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
>>  #define QEMU_CFG_SIGNATURE  0x00
>>  #define QEMU_CFG_ID 0x01
>>  #define QEMU_CFG_UUID   0x02
>> +#define QEMU_CFG_IRQ0_OVERRIDE 0x0e
>
> As noted, this should be in the arch local space.

The base changes were not in the code yet. As we discussed on IRC, I'll
resubmit once they're there.
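For readers following the "arch local space" point, here is a rough sketch of what moving the key would look like, assuming the QEMU_CFG_ARCH_LOCAL base of 0x8000 from Gleb's patch mentioned elsewhere in this thread. The name QEMU_CFG_IRQ0_OVERRIDE_SKETCH, its offset of 1 and the helper cfg_key_is_arch_local() are hypothetical and only illustrate how the generic and arch-specific key ranges are split.

```c
#include <stdint.h>

/* Base of the arch-specific key range, as in Gleb's patch. */
#define QEMU_CFG_ARCH_LOCAL   0x8000
#define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)

/* Hypothetical slot for illustration only; the real offset would be
 * whatever is free once the base changes are merged. */
#define QEMU_CFG_IRQ0_OVERRIDE_SKETCH (QEMU_CFG_ARCH_LOCAL + 1)

/* Returns nonzero when a key lies in the arch-local range. */
int cfg_key_is_arch_local(uint16_t key)
{
    return (key & QEMU_CFG_ARCH_LOCAL) != 0;
}
```

Under this split the 0x0e key used in the patch sits in the generic range, while a key defined relative to QEMU_CFG_ARCH_LOCAL cannot collide with generic entries added later.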




Re: [PATCH 1/4] BIOS changes for configuring irq0-inti2 override

2009-05-06 Thread Beth Kon

Sebastian Herbszt wrote:
> Beth Kon wrote:
>> @@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
>> #define QEMU_CFG_SIGNATURE  0x00
>> #define QEMU_CFG_ID 0x01
>> #define QEMU_CFG_UUID   0x02
>> +#define QEMU_CFG_IRQ0_OVERRIDE 0x0e
>
> Small thing to consider before you resubmit:
> In his patch read-additional-acpi-tables-from-a-vm.patch Gleb
> introduced:
>
> #define QEMU_CFG_ARCH_LOCAL 0x8000
> #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
>
> I think the idea behind this was to separate the generic part from
> arch specific.
> The IRQ0 override seems to be arch specific (x86 only?) just like the
> ACPI tables, right?

I'm not sure what the intent is. It looks like it would be just for
additional tables (as opposed to local)? Gleb? I don't believe irq0
override would fall into that category. But in any case since this is
not in any code base, I don't think there's anything to be done yet.

> - Sebastian



Re: [PATCH 1/4] BIOS changes for configuring irq0-inti2 override

2009-05-05 Thread Beth Kon

Beth Kon wrote:
> These patches resolve the irq0-inti2 override issue, and get the hpet working
> on kvm.

I've found a problem with these patches. I'll resubmit shortly.


[PATCH 1/4] BIOS changes for configuring irq0-inti2 override

2009-05-04 Thread Beth Kon
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm. 

Override and HPET changes are sent as a series because HPET depends on the 
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether 
an override exists in the BIOS. And the HPET spec states that in legacy mode, 
timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq 
routing (i.e., compatibility with old kernels). So if the kernel is capable, 
userspace sets up irq0-inti2 via the irq routing interface, and adds the 
irq0-inti2 override to the MADT interrupt source override table, 
and the mp table (for the no-acpi case).
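The mapping described above can be sketched as a small function. This is only an illustration of the rule the series applies in the mp table, the MADT and the kvm irq routes, not code from the patches; the function name isa_irq_to_gsi() and the -1 "no entry" convention are assumptions made for the example.

```c
#include <assert.h>

#define ISA_NUM_IRQS 16

/* Returns the IOAPIC input (GSI) an ISA IRQ is delivered on,
 * or -1 if the IRQ gets no entry at all. */
int isa_irq_to_gsi(int irq, int irq0_override)
{
    assert(irq >= 0 && irq < ISA_NUM_IRQS);
    if (irq0_override) {
        if (irq == 0)
            return 2;   /* timer interrupt is delivered on inti2 */
        if (irq == 2)
            return -1;  /* source IRQ 2 is unused; inti2 belongs to IRQ0 */
    }
    return irq;         /* identity mapping for everything else */
}
```

When the kernel cannot do irq routing the override is off, and every ISA IRQ, including IRQ0, keeps its identity mapping.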

A couple of months ago, Marcelo was seeing RHEL5 guests complain of invalid
checksum with these patches, but later he couldn't reproduce it, and I'm not 
seeing it now. While all guests still need to be fully tested, everything 
appears to be in order.  I've tested on win2k864, win2k832, RHEL5.3 32 bit, 
and ubuntu 8.10 64 bit. 


Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 8684987..ddfa828 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -445,6 +445,9 @@ uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
 uint8_t bios_uuid[16];
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -477,6 +480,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_SIGNATURE  0x00
 #define QEMU_CFG_ID 0x01
 #define QEMU_CFG_UUID   0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x0e
 
 int qemu_cfg_port;
 
@@ -518,6 +522,18 @@ void uuid_probe(void)
 memset(bios_uuid, 0, 16);
 }
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+memset(&irq0_override, 0, 1);
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1160,6 +1176,13 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+/* One entry per ioapic interrupt destination. Destination 2 is covered
+ * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+ */
+if (irq0_override && i == 2)
+continue;
+#endif
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1167,7 +1190,12 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+#endif
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1550,16 +1578,18 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
-#ifdef BX_QEMU
-sizeof(struct madt_io_apic) /* + sizeof(struct madt_int_override) */;
-#else
 sizeof(struct madt_io_apic);
+#ifdef BX_QEMU
+for (i = 0; i < 16; i++)
+if (PCI_ISA_IRQ_MASK & (1U << i))
+madt_size += sizeof(struct madt_int_override);
+if (irq0_override)
+madt_size += sizeof(struct madt_int_override);
 #endif
-madt = (void *)(addr);
 addr += madt_size;
-
 #ifdef BX_QEMU
 #ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
@@ -1660,23 +1690,21 @@ void acpi_bios_init(void)
 io_apic->io_apic_id = smp_cpus;
 io_apic->address = cpu_to_le32(0xfec00000);
 io_apic->interrupt = cpu_to_le32(0);
+int_override = (struct madt_int_override*)(io_apic + 1);
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
-io_apic++;
-
-int_override = (void *)io_apic;
-int_override->type = APIC_XRUPT_OVERRIDE;
-int_override->length = sizeof(*int_override);
-int_override->bus = cpu_to_le32(0);
-int_override->source = cpu_to_le32(0);
-int_override->gsi = cpu_to_le32(2);
-int_override->flags = cpu_to_le32(0);
-#endif
+if (irq0_override) {
+int_override = (void *)io_apic;
+int_override->type = APIC_XRUPT_OVERRIDE;
+int_override->length = sizeof(*int_override);
+int_override->bus = cpu_to_le32(0);
+int_override->source = cpu_to_le32(0);
+int_override->gsi = cpu_to_le32(2);
+int_override->flags = cpu_to_le32(0); /* conforms to bus specifications */
+int_override++;
+}
 #endif
-
-int_override = (struct madt_int_override*)(io_apic + 1);
 for ( i = 0; i < 16; i

[PATCH 2/4] Userspace changes for configuring irq0-inti2 override

2009-05-04 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/fw_cfg.c b/hw/fw_cfg.c
index e1b19d7..bb74f38 100644
--- a/hw/fw_cfg.c
+++ b/hw/fw_cfg.c
@@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic);
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override);
 
 register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load, s);
 qemu_register_reset(fw_cfg_reset, s);
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index f616ed2..498c1e3 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -15,6 +15,7 @@
 #define FW_CFG_INITRD_SIZE      0x0b
 #define FW_CFG_BOOT_DEVICE      0x0c
 #define FW_CFG_NUMA             0x0d
+#define FW_CFG_IRQ0_OVERRIDE    0x0e
 #define FW_CFG_MAX_ENTRY        0x10
 
 #define FW_CFG_WRITE_CHANNEL    0x4000
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 0b70cf6..2d77a2c 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
 IOAPICState *s = opaque;
 
-#if 0
 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)
+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 68a9218..5b27179 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -814,9 +814,14 @@ int kvm_qemu_create_context(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
-if (r < 0)
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}
+if (r < 0) {
 return r;
+}
 }
 kvm_commit_irq_routes(kvm_context);
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index ca59af8..a836579 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -166,6 +166,7 @@ void qemu_kvm_cpu_stop(CPUState *env);
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
diff --git a/sysemu.h b/sysemu.h
index e8dd381..a5f96f9 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -96,6 +96,7 @@ extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
 extern int nographic;
+extern int irq0override;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
 extern int rtc_td_hack;
diff --git a/vl.c b/vl.c
index 9ff4a5a..ee7f29a 100644
--- a/vl.c
+++ b/vl.c
@@ -207,6 +207,7 @@ static int vga_ram_size;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 static DisplayState *display_state;
 int nographic;
+int irq0override;
 static int curses;
 static int sdl;
 const char* keyboard_layout = NULL;
@@ -4599,6 +4600,7 @@ int main(int argc, char **argv, char **envp)
 vga_ram_size = VGA_RAM_SIZE;
 snapshot = 0;
 nographic = 0;
+irq0override = 1;
 curses = 0;
 kernel_filename = NULL;
 kernel_cmdline = "";
@@ -5682,8 +5684,14 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
-if (kvm_enabled())
-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+   irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size, vga_ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 3/4] BIOS changes for KVM HPET

2009-05-04 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..0e142be 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -308,7 +308,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -328,7 +327,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index ddfa828..7441cd7 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1293,7 +1293,7 @@ struct rsdt_descriptor_rev1
 {
    ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 #ifdef BX_QEMU
-   uint32_t table_offset_entry [2]; /* Array of pointers to other */
+   uint32_t table_offset_entry [3]; /* Array of pointers to other */
 // uint32_t table_offset_entry [4]; /* Array of pointers to other */
 #else
    uint32_t table_offset_entry [3]; /* Array of pointers to other */
@@ -1450,8 +1450,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1591,13 +1591,11 @@ void acpi_bios_init(void)
 #endif
 addr += madt_size;
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 
 acpi_tables_size = addr - base_addr;
 
@@ -1620,10 +1618,10 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
-//rsdt->table_offset_entry[2] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-//rsdt->table_offset_entry[3] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 #endif
+//rsdt->table_offset_entry[3] = cpu_to_le32(ssdt_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt,
 "RSDT", sizeof(*rsdt), 1);
 
@@ -1723,7 +1721,6 @@ void acpi_bios_init(void)
 
 #ifdef BX_QEMU
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1733,7 +1730,6 @@ void acpi_bios_init(void)
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
 #endif
-#endif
 
 }
 


[PATCH 4/4] Userspace changes for KVM HPET

2009-05-04 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..47c9f89 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_kpit_enable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_kpit_disable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+ps.channels[0].mode = 0xff;
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_legacy_enable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+   hpet_kpit_disable();
+   dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+   hpet_pit_disable();
+   dprintf("qemu: hpet disabled userspace pit\n");
+}
+}
+
+static void hpet_legacy_disable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+   hpet_kpit_enable();
+   dprintf("qemu: hpet enabled kernel pit\n");
+} else {
+   hpet_pit_enable();
+   dprintf("qemu: hpet enabled userspace pit\n");
+}
+}
+
 static uint32_t timer_int_route(struct HPETTimer *timer)
 {
 uint32_t route;
@@ -475,9 +513,9 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_legacy_enable();
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_legacy_disable();
 }
 break;
 case HPET_CFG + 4:
@@ -560,7 +598,7 @@ static void hpet_reset(void *opaque) {
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_legacy_disable();
 count = 1;
 }
 
diff --git a/pc-bios/bios.bin b/pc-bios/bios.bin
index d5d42f3..2503783 100644
Binary files a/pc-bios/bios.bin and b/pc-bios/bios.bin differ
diff --git a/vl.c b/vl.c
index 5eacd6a..1334344 100644
--- a/vl.c
+++ b/vl.c
@@ -5666,10 +5666,15 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (kvm_enabled()) {
-   kvm_init_ap();
+kvm_init_ap();
 #ifdef USE_KVM
 if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
-   irq0override = 0;
+irq0override = 0;
+/* if kernel can't do irq routing, interrupt source
+ * override 0-2 can not be set up as required by hpet,
+ * so disable hpet.
+ */
+no_hpet=1;
 }
 #endif
 }


[PATCH 2/2] Finish HPET implementation for KVM

2009-04-09 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index 06ab25d..84697db 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -307,6 +307,24 @@ DefinitionBlock (
 ,, , AddressRangeMemory, TypeStatic)
 })
 }
+Device(HPET) {
+Name(_HID,  EISAID("PNP0103"))
+Name(_UID, 0)
+Method (_STA, 0, NotSerialized) {
+Return(0x0F)
+}
+Name(_CRS, ResourceTemplate() {
+DWordMemory(
+ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+NonCacheable, ReadWrite,
+0x00000000,
+0xFED00000,
+0xFED003FF,
+0x00000000,
+0x00000400 /* 1K memory: FED00000 - FED003FF */
+)
+})
+}
 }
 
 Scope(\_SB.PCI0) {
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 5cf1f54..959a784 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1275,7 +1275,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */
 struct rsdt_descriptor_rev1
 {
    ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
-   uint32_t table_offset_entry [2]; /* Array of pointers to other */
+   uint32_t table_offset_entry [3]; /* Array of pointers to other */
 /* ACPI tables */
 } __attribute__((__packed__));
 
@@ -1415,6 +1415,31 @@ struct madt_processor_apic
 #endif
 } __attribute__((__packed__));
 
+/*
+ *  * ACPI 2.0 Generic Address Space definition.
+ *   */
+struct acpi_20_generic_address {
+uint8_t  address_space_id;
+uint8_t  register_bit_width;
+uint8_t  register_bit_offset;
+uint8_t  reserved;
+uint64_t address;
+} __attribute__((__packed__));
+
+/*
+ *  * HPET Description Table
+ *   */
+struct acpi_20_hpet {
+ACPI_TABLE_HEADER_DEF    /* ACPI common table header */
+uint32_t   timer_block_id;
+struct acpi_20_generic_address addr;
+uint8_t    hpet_number;
+uint16_t   min_tick;
+uint8_t    page_protect;
+} __attribute__((__packed__));
+
+#define ACPI_HPET_ADDRESS 0xFED00000UL
+
 struct madt_io_apic
 {
APIC_HEADER_DEF
@@ -1487,6 +1512,8 @@ void acpi_bios_init(void)
 struct facs_descriptor_rev1 *facs;
 struct multiple_apic_table *madt;
 uint8_t *dsdt;
+struct acpi_20_hpet *hpet;
+uint32_t hpet_addr;
 uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr;
 uint32_t acpi_tables_size, madt_addr, madt_size;
 int i;
@@ -1534,6 +1561,11 @@ void acpi_bios_init(void)
 madt_size += sizeof(struct madt_intsrcovr);
 addr += madt_size;
 
+addr = (addr + 7) & ~7;
+hpet_addr = addr;
+hpet = (void *)(addr);
+addr += sizeof(*hpet);
+
 acpi_tables_size = addr - base_addr;
 
 BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n",
@@ -1555,6 +1587,7 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt,
 "RSDT", sizeof(*rsdt), 1);
 
@@ -1644,6 +1677,15 @@ void acpi_bios_init(void)
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
+/* HPET */
+memset(hpet, 0, sizeof(*hpet));
+/* Note: timer_block_id value must be kept in sync with value 
+ * advertised by emulated hpet in hpet.c
+ */
+hpet->timer_block_id = cpu_to_le32(0x8086a201);
+hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
+acpi_build_table_header((struct  acpi_table_header *)hpet,
+ "HPET", sizeof(*hpet), 1);
 }
 }
 
diff --git a/qemu/hw/hpet.c b/qemu/hw/hpet.c
index 7df2d05..2b817a6 100644
--- a/qemu/hw/hpet.c
+++ b/qemu/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_kpit_enable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_kpit_disable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+ps.channels[0].mode = 0xff;
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_legacy_enable(void)
+{
+if (qemu_kvm_pit_in_kernel()) {
+   hpet_kpit_disable();
+   dprintf("qemu: hpet disabled kernel pit\n");
+} else {
+   hpet_pit_disable

[PATCH 1/2] Make BIOS irq0-inti2 override configurable from userspace

2009-04-09 Thread Beth Kon
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm with and without -no-kvm-irqchip (i.e., when hpet takes over, it 
disables userspace or in-kernel pit as appropriate).
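The hand-over between HPET and PIT mentioned above hinges on detecting when the guest flips the legacy bit in the HPET configuration register. A minimal sketch of that edge detection follows. The helper names activating_bit and deactivating_bit and the HPET_CFG_LEGACY constant appear in the hpet.c patch, but their bodies here, the value 0x002 (bit 1 of the general configuration register) and the legacy_transition() wrapper with its enum are assumptions written for illustration.

```c
#include <stdint.h>

/* Legacy replacement bit (LEG_RT_CNF) in the HPET general configuration
 * register; value assumed for this sketch. */
#define HPET_CFG_LEGACY 0x002

/* Hypothetical result type: what should happen to the PIT. */
enum pit_action { PIT_UNCHANGED, PIT_DISABLE, PIT_ENABLE };

/* True when mask goes from clear to set between old_val and new_val. */
int activating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return !(old_val & mask) && (new_val & mask);
}

/* True when mask goes from set to clear between old_val and new_val. */
int deactivating_bit(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return (old_val & mask) && !(new_val & mask);
}

/* Decide what to do with the PIT on a configuration register write. */
enum pit_action legacy_transition(uint64_t old_cfg, uint64_t new_cfg)
{
    if (activating_bit(old_cfg, new_cfg, HPET_CFG_LEGACY))
        return PIT_DISABLE;   /* HPET takes over the timer interrupt */
    if (deactivating_bit(old_cfg, new_cfg, HPET_CFG_LEGACY))
        return PIT_ENABLE;    /* control returns to the PIT */
    return PIT_UNCHANGED;
}
```

Whether the PIT being disabled or re-enabled is the userspace one or the in-kernel one is a separate decision, made in the patch by checking qemu_kvm_pit_in_kernel().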

The irq0-inti2 override will always be used unless the kernel cannot do irq 
routing (i.e., compatibility with old kernels). So if the kernel is capable, 
userspace sets up irq0-inti2 via the irq routing interface, and adds the 
irq0-inti2 override to the MADT interrupt source override table, 
and the mp table (for the no-acpi case).

A couple of months ago, Marcelo was seeing RHEL5 guests complain of invalid
checksum with these patches, but later he couldn't reproduce it, and I'm not 
seeing it now. While all guests still need to be fully tested, everything 
appears to be in order.  I've tested on win2k864, win2k832, RHEL5.3 32 bit, 
and ubuntu 8.10 64 bit. 

Signed-off-by: Beth Kon e...@us.ibm.com

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 4dea066..5cf1f54 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -443,6 +443,7 @@ uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
 uint8_t bios_uuid[16];
+uint8_t irq0_override;
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -475,6 +476,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_SIGNATURE  0x00
 #define QEMU_CFG_ID 0x01
 #define QEMU_CFG_UUID   0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x0d
 
 int qemu_cfg_port;
 
@@ -516,6 +518,18 @@ void uuid_probe(void)
 memset(bios_uuid, 0, 16);
 }
 
+void irq0_override_probe(void)
+{
+#ifdef BX_QEMU
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+#endif
+memset(&irq0_override, 0, 1);
+}
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1152,6 +1166,8 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+if (irq0_override && i == 2)
+continue;
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1159,7 +1175,10 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1508,6 +1527,11 @@ void acpi_bios_init(void)
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 sizeof(struct madt_io_apic);
 madt = (void *)(addr);
+for (i = 0; i < 16; i++)
+if (PCI_ISA_IRQ_MASK & (1U << i))
+madt_size += sizeof(struct madt_intsrcovr);
+if (irq0_override)
+madt_size += sizeof(struct madt_intsrcovr);
 addr += madt_size;
 
 acpi_tables_size = addr - base_addr;
@@ -1597,8 +1621,15 @@ void acpi_bios_init(void)
 io_apic->interrupt = cpu_to_le32(0);
 
 intsrcovr = (struct madt_intsrcovr*)(io_apic + 1);
-for ( i = 0; i < 16; i++ ) {
-if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+for (i = 0; i < 16; i++) {
+if (irq0_override && i == 0) {
+memset(intsrcovr, 0, sizeof(*intsrcovr));
+intsrcovr->type   = APIC_XRUPT_OVERRIDE;
+intsrcovr->length = sizeof(*intsrcovr);
+intsrcovr->source = i;
+intsrcovr->gsi    = 2;
+intsrcovr->flags  = 0;  //conforms to bus specifications
+} else if (PCI_ISA_IRQ_MASK & (1U << i)) {
 memset(intsrcovr, 0, sizeof(*intsrcovr));
 intsrcovr->type   = APIC_XRUPT_OVERRIDE;
 intsrcovr->length = sizeof(*intsrcovr);
@@ -1610,7 +1641,6 @@ void acpi_bios_init(void)
 continue;
 }
 intsrcovr++;
-madt_size += sizeof(struct madt_intsrcovr);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
@@ -2230,6 +2260,8 @@ void rombios32_init(uint32_t *s3_resume_vector, uint8_t *shutdown_flag)
 
 if (bios_table_cur_addr != 0) {
 
+irq0_override_probe();
+
 mptable_init();
 
 uuid_probe();
diff --git a/qemu/hw/fw_cfg.c b/qemu/hw/fw_cfg.c
index e324e8d..f06dc3c 100644
--- a/qemu/hw/fw_cfg.c
+++ b/qemu/hw/fw_cfg.c
@@ -279,6 +279,7 @@ void *fw_cfg_init(uint32_t ctl_port, uint32_t data_port,
 fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)nographic);
 fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(s, FW_CFG_IRQ0_OVERRIDE, (uint16_t)irq0override);
 
 register_savevm("fw_cfg", -1, 1, fw_cfg_save, fw_cfg_load

[RFC][PATCH 2/2] Finish hpet implementation for KVM

2009-01-22 Thread Beth Kon
- add hpet to BIOS
- add disable/enable of kernel pit when hpet enters/leaves legacy mode
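
As a rough model of the second item: the PIT is stopped when the guest sets the legacy replacement bit in the HPET configuration register and resumed when the guest clears it. The sketch below only models that decision. The function and macro names are illustrative, and the real patch calls into kvm to change the in-kernel PIT state rather than returning a flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_LEGACY 0x002u  /* legacy replacement bit of the HPET config register */

/* True when `mask` is clear in old_val and set in new_val. */
static bool bit_turned_on(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return !(old_val & mask) && (new_val & mask);
}

/* True when `mask` is set in old_val and clear in new_val. */
static bool bit_turned_off(uint64_t old_val, uint64_t new_val, uint64_t mask)
{
    return (old_val & mask) && !(new_val & mask);
}

/* Hypothetical model: should the PIT be running after this config write? */
static bool pit_enabled_after_write(bool pit_enabled,
                                    uint64_t old_cfg, uint64_t new_cfg)
{
    if (bit_turned_on(old_cfg, new_cfg, CFG_LEGACY))
        return false;       /* HPET takes over the timer interrupt: stop the PIT */
    if (bit_turned_off(old_cfg, new_cfg, CFG_LEGACY))
        return true;        /* HPET leaves legacy mode: the PIT resumes */
    return pit_enabled;     /* no transition, nothing changes */
}
```

Acting only on transitions, not on the current bit value, means a guest that rewrites the same configuration repeatedly does not keep toggling the PIT.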

Signed-off-by: Beth Kon e...@us.ibm.com
diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index d67616d..9981a1f 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -233,6 +233,24 @@ DefinitionBlock (
 ,, , AddressRangeMemory, TypeStatic)
 })
 }
+Device(HPET) {
+Name(_HID,  EISAID("PNP0103"))
+Name(_UID, 0)
+Method (_STA, 0, NotSerialized) {
+Return(0x0F)
+}
+Name(_CRS, ResourceTemplate() {
+DWordMemory(
+ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+NonCacheable, ReadWrite,
+0x00000000,
+0xFED00000,
+0xFED003FF,
+0x00000000,
+0x00000400 /* 1K memory: FED00000 - FED003FF */
+)
+})
+}
 }
 
 Scope(\_SB.PCI0) {
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 84f15fb..17c3704 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1272,7 +1272,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */
 struct rsdt_descriptor_rev1
 {
ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
-   uint32_t table_offset_entry [2]; /* Array of pointers to other */
+   uint32_t table_offset_entry [3]; /* Array of pointers to other */
 /* ACPI tables */
 } __attribute__((__packed__));
 
@@ -1412,6 +1412,31 @@ struct madt_processor_apic
 #endif
 } __attribute__((__packed__));
 
+/*
+ *  * ACPI 2.0 Generic Address Space definition.
+ *   */
+struct acpi_20_generic_address {
+uint8_t  address_space_id;
+uint8_t  register_bit_width;
+uint8_t  register_bit_offset;
+uint8_t  reserved;
+uint64_t address;
+} __attribute__((__packed__));
+
+/*
+ *  * HPET Description Table
+ *   */
+struct acpi_20_hpet {
+ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
+uint32_t   timer_block_id;
+struct acpi_20_generic_address addr;
+uint8_t    hpet_number;
+uint16_t   min_tick;
+uint8_t    page_protect;
+} __attribute__((__packed__));
+
+#define ACPI_HPET_ADDRESS 0xFED00000UL
+
 struct madt_io_apic
 {
APIC_HEADER_DEF
@@ -1484,6 +1509,8 @@ void acpi_bios_init(void)
 struct facs_descriptor_rev1 *facs;
 struct multiple_apic_table *madt;
 uint8_t *dsdt;
+struct acpi_20_hpet *hpet;
+uint32_t hpet_addr;
 uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr;
 uint32_t acpi_tables_size, madt_addr, madt_size;
 int i;
@@ -1531,6 +1558,11 @@ void acpi_bios_init(void)
 madt_size += sizeof(struct madt_intsrcovr);
 addr += madt_size;
 
+addr = (addr + 7) & ~7;
+hpet_addr = addr;
+hpet = (void *)(addr);
+addr += sizeof(*hpet);
+
 acpi_tables_size = addr - base_addr;
 
 BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n",
@@ -1552,6 +1584,7 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt,
 "RSDT", sizeof(*rsdt), 1);
 
@@ -1641,6 +1674,15 @@ void acpi_bios_init(void)
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
+/* HPET */
+memset(hpet, 0, sizeof(*hpet));
+/* Note timer_block_id value must be kept in sync with value advertised by
+ * emulated hpet
+ */
+hpet->timer_block_id = cpu_to_le32(0x8086a201);
+hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
+acpi_build_table_header((struct  acpi_table_header *)hpet,
+ "HPET", sizeof(*hpet), 1);
 }
 }
 
diff --git a/qemu/hw/hpet.c b/qemu/hw/hpet.c
index 7df2d05..80b2edd 100644
--- a/qemu/hw/hpet.c
+++ b/qemu/hw/hpet.c
@@ -30,8 +30,9 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
-//#define HPET_DEBUG
+#define HPET_DEBUG
 #ifdef HPET_DEBUG
 #define dprintf printf
 #else
@@ -48,6 +49,43 @@ uint32_t hpet_in_legacy_mode(void)
 return 0;
 }
 
+static void hpet_kpit_enable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_kpit_disable(void)
+{
+struct kvm_pit_state ps;
+kvm_get_pit(kvm_context, &ps);
+ps.channels[0].mode = 0xff;
+kvm_set_pit(kvm_context, &ps);
+}
+
+static void hpet_legacy_enable(void)
+{
+if (qemu_kvm_pit_in_kernel

[RFC][PATCH 1/2] Make irq0->inti2 override in BIOS configurable from userspace

2009-01-22 Thread Beth Kon
This series of patches (nearly) resolves the irq0->inti2 override issue, and
gets the hpet working on kvm with and without the in-kernel irqchip (i.e., it
disables userspace and in-kernel pit as needed).

- irq0->inti2
  The resolution was to always use the override unless the kernel cannot do
  irq routing (i.e., compatibility with old kernels). So qemu checks whether
  the kernel is capable of irq routing. If so, qemu tells kvm to route irq0 to
  inti2 via the irq routing interface, and tells bios to add the irq0->inti2
  override to the MADT interrupt source override table, and to the mp table
  (for the non-acpi case). The only outstanding problem here is that when I
  set acpi=off on the kernel boot line, the boot fails. Apparently linux does
  not like the way I implemented the override for the mp table in rombios32.c.
  Since I am pressed for time at the moment, I'm putting this patch set out
  for comments in hopes that someone else may immediately see the problem.
  Otherwise I'll keep looking into it as time permits.

- hpet
  The hpet works with and without in-kernel irqchip. And many thanks to
  Marcelo for finding a bios corruption bug that was the primary source of
  win2k864 problems. Now the hpet works on linux (ubuntu 8.04) and win2k832.
  On win2k864 it works with the in-kernel irqchip but is broken (i.e., black
  screen) when -no-kvm-irqchip is specified. However, I found that it is also
  broken when I remove these 2 patches, so it appears to have nothing to do
  with hpet or irq routing. Needs more looking into.
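
To make the mp table part easier to review, here is a standalone sketch of the ISA interrupt entries as this patch generates them: with the override on, IRQ 0 is pointed at I/O APIC input 2 and the entry for IRQ 2 is dropped. It mirrors the approach in the patch, which (as noted above) Linux does not accept with acpi=off, so it should not be read as a known-good layout. The function name is made up, and the 8-byte entry layout is the I/O interrupt entry from the MultiProcessor Specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the ISA I/O interrupt entries of an MP configuration table into
 * `out` (8 bytes per entry) and return the number of bytes written.
 * `out` must have room for 16 * 8 bytes. */
static size_t mp_write_isa_irqs(uint8_t *out, int irq0_override,
                                uint8_t ioapic_id)
{
    uint8_t *q = out;
    int i;

    for (i = 0; i < 16; i++) {
        if (irq0_override && i == 2)
            continue;               /* input 2 is taken by IRQ 0 */
        *q++ = 3;                   /* entry type = I/O interrupt */
        *q++ = 0;                   /* interrupt type = vectored interrupt */
        *q++ = 0;                   /* flags: po=0, el=0 (low byte) */
        *q++ = 0;                   /* flags (high byte) */
        *q++ = 0;                   /* source bus ID = ISA */
        *q++ = (uint8_t)i;          /* source bus IRQ */
        *q++ = ioapic_id;           /* dest I/O APIC ID */
        *q++ = (uint8_t)((irq0_override && i == 0) ? 2 : i); /* dest input */
    }
    return (size_t)(q - out);
}
```

With the override enabled the table carries 15 entries instead of 16, so any length or checksum computed for the table has to be taken after this loop has run.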


Signed-off-by: Beth Kon e...@us.ibm.com
---
 bios/Makefile|2 +-
 bios/rombios32.c |   40 
 qemu/hw/apic.c   |5 ++---
 qemu/hw/fw_cfg.c |1 +
 qemu/hw/fw_cfg.h |1 +
 qemu/qemu-kvm.c  |5 -
 qemu/sysemu.h|1 +
 qemu/vl.c|   10 --
 8 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/bios/Makefile b/bios/Makefile
index 2d1f40d..374d70e 100644
--- a/bios/Makefile
+++ b/bios/Makefile
@@ -48,7 +48,7 @@ LIBS =  -lm
 RANLIB = ranlib
 
 BCC = bcc
-GCC = gcc $(CFLAGS)
+GCC = gcc $(CFLAGS) -fno-stack-protector
 HOST_CC = gcc
 AS86 = as86
 
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 9d2eaaa..84f15fb 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -443,6 +443,7 @@ uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
 uint8_t bios_uuid[16];
+uint8_t irq0_override;
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -475,6 +476,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_SIGNATURE  0x00
 #define QEMU_CFG_ID 0x01
 #define QEMU_CFG_UUID   0x02
+#define QEMU_CFG_IRQ0_OVERRIDE 0x07
 
 int qemu_cfg_port;
 
@@ -516,6 +518,18 @@ void uuid_probe(void)
 memset(bios_uuid, 0, 16);
 }
 
+void irq0_override_probe(void)
+{
+#ifdef BX_QEMU
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+#endif
+memset(&irq0_override, 0, 1);
+}
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1149,6 +1163,8 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+if (irq0_override && i == 2)
+continue;
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1156,7 +1172,10 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else 
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1505,6 +1524,11 @@ void acpi_bios_init(void)
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 sizeof(struct madt_io_apic);
 madt = (void *)(addr);
+for (i = 0; i < 16; i++)
+if (PCI_ISA_IRQ_MASK & (1U << i))
+madt_size += sizeof(struct madt_intsrcovr);
+if (irq0_override)
+madt_size += sizeof(struct madt_intsrcovr);
 addr += madt_size;
 
 acpi_tables_size = addr - base_addr;
@@ -1594,8 +1618,15 @@ void acpi_bios_init(void)
 io_apic->interrupt = cpu_to_le32(0);
 
 intsrcovr = (struct madt_intsrcovr*)(io_apic + 1);
-for ( i = 0; i < 16; i++ ) {
-if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+for (i = 0; i < 16; i++) {
+if (irq0_override && i == 0) {
+memset(intsrcovr, 0, sizeof(*intsrcovr));
+intsrcovr->type   = APIC_XRUPT_OVERRIDE;
+intsrcovr->length = sizeof(*intsrcovr);
+intsrcovr->source = i;
+intsrcovr->gsi    = 2;
+intsrcovr->flags  = 0;  //conforms

[PATCH] hpet config mask fix

2009-01-14 Thread Beth Kon
I discovered a bug in the hpet code that caused Windows to boot without 
hpet. The config mask I was using was preventing the guest from placing 
the hpet into 32 bit mode.
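
To show why the old mask mattered, here is a minimal sketch, assuming hpet_fixup_reg is a plain masked merge (only bits set in the mask are taken from the guest's write, all others keep their old value). That body is an assumption, since it is not shown in this patch; the names below are illustrative.

```c
#include <stdint.h>

#define TN_32BIT 0x100u /* timer "32-bit mode" bit, HPET_TN_32BIT in the patch */

/* Keep only the bits in write_mask from the guest's value; all other bits
 * retain their previous contents. */
static uint64_t fixup_reg(uint64_t new_val, uint64_t old_val,
                          uint64_t write_mask)
{
    return (new_val & write_mask) | (old_val & ~write_mask);
}
```

Under that assumption, bit 8 (0x100) is clear in the old mask 0x3e4e, so a guest write of the 32-bit mode bit was silently dropped; the new mask 0x3f4e has that bit set and lets the write through.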


diff --git a/qemu/hw/hpet.c b/qemu/hw/hpet.c
index 5c1aca2..7df2d05 100644
--- a/qemu/hw/hpet.c
+++ b/qemu/hw/hpet.c
@@ -388,7 +388,8 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 switch ((addr - 0x100) % 0x20) {
 case HPET_TN_CFG:
 dprintf("qemu: hpet_ram_writel HPET_TN_CFG\n");
-timer->config = hpet_fixup_reg(new_val, old_val, 0x3e4e);
+timer->config = hpet_fixup_reg(new_val, old_val,
+   HPET_TN_CFG_WRITE_MASK);
 if (new_val & HPET_TN_32BIT) {
 timer->cmp = (uint32_t)timer->cmp;
 timer->period = (uint32_t)timer->period;
@@ -456,7 +457,8 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
 case HPET_ID:
 return;
 case HPET_CFG:
-s->config = hpet_fixup_reg(new_val, old_val, 0x3);
+s->config = hpet_fixup_reg(new_val, old_val,
+   HPET_CFG_WRITE_MASK);
 if (activating_bit(old_val, new_val, HPET_CFG_ENABLE)) {
 /* Enable main counter and interrupt generation. */
 s->hpet_offset = ticks_to_ns(s->hpet_counter)
diff --git a/qemu/hw/hpet_emul.h b/qemu/hw/hpet_emul.h
index fbe7a44..60893b6 100644
--- a/qemu/hw/hpet_emul.h
+++ b/qemu/hw/hpet_emul.h
@@ -36,6 +36,7 @@
 #define HPET_TN_CFG 0x000
 #define HPET_TN_CMP 0x008
 #define HPET_TN_ROUTE   0x010
+#define HPET_CFG_WRITE_MASK  0x3
 
 
 #define HPET_TN_ENABLE   0x004
@@ -45,6 +46,7 @@
 #define HPET_TN_SETVAL   0x040
 #define HPET_TN_32BIT    0x100
 #define HPET_TN_INT_ROUTE_MASK  0x3e00
+#define HPET_TN_CFG_WRITE_MASK  0x3f4e
 #define HPET_TN_INT_ROUTE_SHIFT  9
 #define HPET_TN_INT_ROUTE_CAP_SHIFT 32
 #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0xffff80b1U


KVM userspace build fails with 2.6.28-rc7 kernel installed

2008-12-05 Thread Beth Kon
I pulled the latest:
 kvm (commit 3c260758b41000986c3c064b17a9771286e98d1e)
 kvm-userspace (commit 6892f63c18a526c7b54bbde2f59287787eabe1f8)

and built and installed the 2.6.28-rc7 x86_64 kernel from kvm pull, then
tried to build kvm-userspace and the build failed:


make -C /lib/modules/2.6.28-rc7/build M=`pwd` \
LINUXINCLUDE=-I`pwd`/include -Iinclude \
 \
-Iarch/x86/include -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-include `pwd`/x86/external-module-compat.h 
make[2]: Entering directory `/home/beth/git/build/kvm.kernel/kvm'
  LD  /home/beth/git/test/kvm-userspace/kernel/x86/built-in.o
  CC [M]  /home/beth/git/test/kvm-userspace/kernel/x86/svm.o
In file included
from /home/beth/git/test/kvm-userspace/kernel/include/asm/kvm_host.h:64,

from /home/beth/git/test/kvm-userspace/kernel/include/linux/kvm_host.h:67,

from /home/beth/git/test/kvm-userspace/kernel/x86/svm.c:56:
arch/x86/include/asm/mtrr.h:60: error: redefinition of ‘struct
mtrr_var_range’
arch/x86/include/asm/mtrr.h:69: error: redefinition of typedef
‘mtrr_type’
/home/beth/git/test/kvm-userspace/kernel/x86/external-module-compat.h:349: 
error: previous declaration of ‘mtrr_type’ was here
arch/x86/include/asm/mtrr.h:74: error: redefinition of ‘struct
mtrr_state_type’
make[4]: *** [/home/beth/git/test/kvm-userspace/kernel/x86/svm.o] Error
1

When I moved the machine back to 2.6.27.7 the build succeeded.

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/2] Add HPET emulation to qemu (v3)

2008-10-30 Thread Beth Kon
On Mon, 2008-10-27 at 12:49 +0200, Dor Laor wrote:
> Beth Kon wrote:
> > On Fri, 2008-10-17 at 16:49 +0100, Jamie Lokier wrote:
> > > Beth Kon wrote:
> > > > Clock drift on Linux is in the range of .017% - .019%, loaded and
> > > > unloaded. I haven't found a straightforward way to test on Windows
> > > > and would appreciate any pointers to existing approaches.
> > >
> > > Is there any reason why there should be any clock drift, when the
> > > guest is using a non-PIT clock?
> > >
> > > I'm probably being naive, but with 32-bit or 64-bit HPET counters
> > > available to the guest, and accurate values from the CMOS clock
> > > emulation, I don't see why drift would accumulate over the long term
> > > relative to the host clock.
> >
> > I was measuring with ntpdate, so the drift is with respect to the ntp
> > server pool, not the host clock. But in any case, since timer interrupts
> > and reads of the hpet counter are at the mercy of the host scheduler
> > (i.e., the qemu process can be swapped out at any time during hpet read
> > or timer expiration), I'd guess there would always be some amount of
> > inaccuracy. Also, qemu checks for timer expiration (qemu_run_timers) as
> > part of a bigger loop (main_loop_wait), so the varying amounts of work
> > to do elsewhere in the loop from iteration to iteration would also
> > introduce irregular delays.
>
> This is exactly why hpet as the other clock emulation in qemu (pit,
> rtc, pm?) need to check whether their irq was really injected. Gleb sent
> patches for the rtc, pit. The idea is to check with the irq chip if the
> injected irq was really successful.

I assume these are the patches you're referring to?
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18974/focus=18977

Looks like they were never merged. Does anyone know the history on that?
Also, HPET generates edge-triggered interrupts (as dictated by Linux and
Windows) so I'm not sure if this scheme could work for it.

> Dor
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [PATCH 1/2] Add HPET emulation to qemu (v3)

2008-10-27 Thread Beth Kon
On Tue, 2008-10-21 at 10:21 -0500, Anthony Liguori wrote:
> Beth Kon wrote:
<snip>
Thanks for the feedback, Anthony. I'll only respond where I have
specific comments. Otherwise, I agree to your suggestions and will make
the changes.
<snip>
> > +if(timer_enabled(timer) && hpet_enabled(timer->state)) {
> > +qemu_irq_pulse(irq);
> > +/* windows wants timer0 on irq2 and linux wants irq0,
> > + * so we pulse both
> > + */
> > +if (do_ioapic)
> > +qemu_irq_pulse(timer->state->irqs[2]);
>
> This seems curious and not quite right.  We should be able to detect
> whether the HPET is being used in IO APIC mode and raise the appropriate
> interrupt instead of generating a spurious irq0 interrupt.

After digging further on this, it turns out that the need for the 2
interrupts was caused by what looks like a problem with the way qemu is
generating interrupts for the ioapic. I will send out a separate patch
for that issue, and make the necessary changes in this hpet code.

> > +}
> > +}
> > +
> > +static void hpet_save(QEMUFile *f, void *opaque)
> > +{
> > +HPETState *s = opaque;
> > +int i;
> > +qemu_put_be64s(f, &s->config);
> > +qemu_put_be64s(f, &s->isr);
> > +/* save current counter value */
> > +s->hpet_counter = hpet_get_ticks(s);
> > +qemu_put_be64s(f, &s->hpet_counter);
> > +
> > +for(i = 0; i < HPET_NUM_TIMERS; i++) {
> > +qemu_put_8s(f, &s->timer[i].tn);
> > +qemu_put_be64s(f, &s->timer[i].config);
> > +qemu_put_be64s(f, &s->timer[i].cmp);
> > +qemu_put_be64s(f, &s->timer[i].fsb);
> > +qemu_put_be64s(f, &s->timer[i].period);
> > +if (s->timer[i].qemu_timer) {
> > +qemu_put_timer(f, s->timer[i].qemu_timer);
> > +}
>
> Would qemu_timer ever be NULL?

You're right... the answer is no. I'll fix that.
<snip>
> > +
> > +
> > +diff = hpet_calculate_diff(t, cur_tick);
> > +qemu_mod_timer(t->qemu_timer, qemu_get_clock(vm_clock)
> > +        + (int64_t)ticks_to_ns(diff));
>
> May want to convert ticks_to_ns to take and return an int64_t.  The
> explicit casting could introduce very subtle bugs.

It seems better this way to me, since muldiv64 in ticks_to_ns takes uint64_t.
The likelihood of diff being big enough to create a problem seems small enough.
Am I missing something?
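
For reference while discussing this, a standalone sketch of the conversion and of one way to make the signed cast explicit about its limits. It assumes a 10 ns tick (10,000,000 fs); muldiv_u64 here is a simplified stand-in written for this example, not qemu's muldiv64, and ticks_to_ns_checked is a hypothetical wrapper.

```c
#include <stdint.h>

#define CLK_PERIOD_FS 10000000ULL /* femtoseconds per HPET tick (10 ns), assumed */
#define FS_PER_NS     1000000ULL  /* femtoseconds per nanosecond */

/* a * b / c without overflowing the intermediate product, valid while
 * (c - 1) * b fits in 64 bits and the final result fits in 64 bits. */
static uint64_t muldiv_u64(uint64_t a, uint64_t b, uint64_t c)
{
    return (a / c) * b + ((a % c) * b) / c;
}

static uint64_t ticks_to_ns_u64(uint64_t ticks)
{
    return muldiv_u64(ticks, CLK_PERIOD_FS, FS_PER_NS);
}

/* Signed variant: returns 0 and stores the result, or -1 if the value
 * would not fit in an int64_t (instead of wrapping silently on the cast). */
static int ticks_to_ns_checked(uint64_t ticks, int64_t *ns_out)
{
    if (ticks > (uint64_t)INT64_MAX / (CLK_PERIOD_FS / FS_PER_NS))
        return -1;
    *ns_out = (int64_t)ticks_to_ns_u64(ticks);
    return 0;
}
```

With a 10 ns tick the cast only misbehaves for diffs above roughly 9.2e17 ticks, which supports the point that the problem case is unlikely in practice; the checked form just makes that bound visible.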
> > +case HPET_COUNTER:
> > +if (hpet_enabled(s))
> > +cur_tick = hpet_get_ticks(s);
>
> Any reason for hpet_get_ticks(s) to not have this check integrated into it?

When the hpet is being disabled, we need to get the actual count, even though
the hpet_enabled check would return false. So if I made this change it would
introduce an ordering issue in the disable code (i.e., get the ticks before
setting the hpet to disabled).
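
A reduced model of the ordering point, using a hypothetical hpet_model struct and helper names that are not from the patch. Because the raw tick getter does not look at the enable bit, the disable path can clear the bit first and still latch the correct count; if the check lived inside the getter, the latch would have to happen before the bit is cleared.

```c
#include <stdint.h>

#define CFG_ENABLE 0x001u

struct hpet_model {            /* hypothetical, much reduced device state */
    uint64_t config;
    uint64_t saved_counter;    /* counter value latched while halted */
    uint64_t offset;           /* clock value corresponding to counter 0 */
};

/* Raw running count; deliberately does NOT check the enable bit. */
static uint64_t get_ticks(const struct hpet_model *s, uint64_t now)
{
    return now - s->offset;
}

/* Guest-visible read: running count while enabled, latched value otherwise. */
static uint64_t read_counter(const struct hpet_model *s, uint64_t now)
{
    return (s->config & CFG_ENABLE) ? get_ticks(s, now) : s->saved_counter;
}

static void hpet_enable(struct hpet_model *s, uint64_t now)
{
    s->offset = now - s->saved_counter;   /* resume from the latched value */
    s->config |= CFG_ENABLE;
}

static void hpet_disable(struct hpet_model *s, uint64_t now)
{
    s->config &= ~(uint64_t)CFG_ENABLE;   /* bit is already clear here ... */
    s->saved_counter = get_ticks(s, now); /* ... yet the raw count is still valid */
}
```

Keeping the check in the caller, as in the quoted HPET_COUNTER case, is what makes the order of the two statements in the disable path irrelevant.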

<snip>
> > +
> > +/* XXX this is a dirty hack for HPET support w/o LPC
> > +   Actually this is a config descriptor for the RCBA */
>
> What's the dirty hack?

This comment is left over from Alexander Graf's code. I'm not sure why it is in
this location and I'll remove it. But in comments on the first version of hpet
code I produced, Alexander said, regarding the fixed assignment of HPET_BASE:

"This is a dirty hack that I did to make Mac OS X happy. Actually the HPET base
address gets specified in the RCBA on the LPC and is configured by the BIOS to
point to a valid address, with 0xfed00000 being the default (IIRC if you write
0 to the fields you end up with that address)."

But in other areas of qemu code I see base addresses being hardcoded and am not
sure anything different needs to be done here. Comments?


<snip>
>
> Regards,
>
> Anthony Liguori
>
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Qemu-devel] [PATCH 1/2] Add HPET emulation to qemu (v3)

2008-10-20 Thread Beth Kon
On Fri, 2008-10-17 at 16:49 +0100, Jamie Lokier wrote:
> Beth Kon wrote:
> > Clock drift on Linux is in the range of .017% - .019%, loaded and
> > unloaded. I haven't found a straightforward way to test on Windows and
> > would appreciate any pointers to existing approaches.
>
> Is there any reason why there should be any clock drift, when the
> guest is using a non-PIT clock?
>
> I'm probably being naive, but with 32-bit or 64-bit HPET counters
> available to the guest, and accurate values from the CMOS clock
> emulation, I don't see why drift would accumulate over the long term
> relative to the host clock.

I was measuring with ntpdate, so the drift is with respect to the ntp
server pool, not the host clock. But in any case, since timer interrupts
and reads of the hpet counter are at the mercy of the host scheduler
(i.e., the qemu process can be swapped out at any time during hpet read
or timer expiration), I'd guess there would always be some amount of
inaccuracy. Also, qemu checks for timer expiration (qemu_run_timers) as
part of a bigger loop (main_loop_wait), so the varying amounts of work
to do elsewhere in the loop from iteration to iteration would also
introduce irregular delays.

> -- Jamie
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



[PATCH 1/2] Add HPET emulation to qemu (v3)

2008-10-17 Thread Beth Kon

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
This version contains many miscellaneous changes, including incorporation 
of comments received, addition of save/restore support and a reset handler, 
and a couple of bugfixes.

I've booted Linux and Win2k832 guests on QEMU. Win2k832 still looks shaky
on QEMU even without the hpet - I've gotten intermittent blue screens. But 
I did get it to boot at least once both with and without hpet. 

Clock drift on Linux is in the range of .017% - .019%, loaded and unloaded. I
haven't found a straightforward way to test on Windows and would appreciate
any pointers to existing approaches.
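
To put those percentages in more familiar units, a trivial conversion to seconds per day (the helper name is made up for this note):

```c
/* Seconds gained or lost per day for a given relative drift in percent. */
static double drift_seconds_per_day(double drift_percent)
{
    return 86400.0 * drift_percent / 100.0;
}
```

For the range quoted above this works out to roughly 14.7 to 16.4 seconds per day.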


The second patch in this series contains the needed bochs bios changes.


Signed-off-by: Beth Kon [EMAIL PROTECTED]
---
 Makefile.target  |2 +-
 hw/hpet.c|  572 ++
 hw/i8254.c   |   11 +
 hw/mc146818rtc.c |   30 +++-
 hw/pc.c  |2 +
 5 files changed, 614 insertions(+), 3 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index e2edf9d..9e80b3d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -545,7 +545,7 @@ ifeq ($(TARGET_BASE_ARCH), i386)
 OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
-OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
+OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/hpet.c b/hw/hpet.c
new file mode 100644
index 0000000..61fbaaf
--- /dev/null
+++ b/hw/hpet.c
@@ -0,0 +1,572 @@
+/*
+ *  High Precisition Event Timer emulation
+ *
+ *  Copyright (c) 2007 Alexander Graf
+ *  Copyright (c) 2008 IBM Corporation
+ *
+ *  Authors: Beth Kon [EMAIL PROTECTED]
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * *
+ *
+ * This driver attempts to emulate an HPET device in software.
+ *
+ */
+#include "hw.h"
+#include "console.h"
+#include "qemu-timer.h"
+
+//#define HPET_DEBUG
+
+#define HPET_BASE		0xfed00000
+#define HPET_CLK_PERIOD 10000000ULL /* 10000000 femtoseconds == 10ns*/
+
+#define FS_PER_NS 1000000
+#define HPET_NUM_TIMERS 3
+#define HPET_TIMER_TYPE_LEVEL 1
+#define HPET_TIMER_TYPE_EDGE 0
+#define HPET_TIMER_DELIVERY_APIC 0
+#define HPET_TIMER_DELIVERY_FSB 1
+#define HPET_TIMER_CAP_FSB_INT_DEL (1 << 15)
+#define HPET_TIMER_CAP_PER_INT (1 << 4)
+
+#define HPET_CFG_ENABLE 0x001
+#define HPET_CFG_LEGACY 0x002
+
+#define HPET_ID         0x000
+#define HPET_PERIOD     0x004
+#define HPET_CFG        0x010
+#define HPET_STATUS     0x020
+#define HPET_COUNTER    0x0f0
+#define HPET_TN_REGS    0x100 ... 0x3ff /*address range of all TN regs*/
+#define HPET_TN_CFG     0x000
+#define HPET_TN_CMP     0x008
+#define HPET_TN_ROUTE   0x010
+
+
+#define HPET_TN_ENABLE       0x004
+#define HPET_TN_PERIODIC     0x008
+#define HPET_TN_PERIODIC_CAP 0x010
+#define HPET_TN_SIZE_CAP     0x020
+#define HPET_TN_SETVAL       0x040
+#define HPET_TN_32BIT        0x100
+#define HPET_TN_INT_ROUTE_MASK  0x3e00
+#define HPET_TN_INT_ROUTE_SHIFT  9
+#define HPET_TN_INT_ROUTE_CAP_SHIFT 32
+#define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0xffff80b1U
+
+#define timer_int_route(timer)   \
+((timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT)
+
+#define hpet_enabled(s)  (s->config & HPET_CFG_ENABLE)
+#define timer_is_periodic(t) (t->config & HPET_TN_PERIODIC)
+#define timer_enabled(t) (t->config & HPET_TN_ENABLE)
+
+#define hpet_time_after(a, b)   ((int32_t)(b) - (int32_t)(a) < 0)
+#define hpet_time_after64(a, b) ((int64_t)(b) - (int64_t)(a) < 0)
+
+
+/*indicator hpet is operating in legacy mode */
+int hpet_legacy=0;
+
+struct HPETState;
+typedef struct HPETTimer {  /* timers */
+uint8_t tn; /*timer number*/
+QEMUTimer *qemu_timer;
+struct HPETState *state;
+/* Memory-mapped, software visible timer registers */
+uint64_t config;/* configuration/cap */
+uint64_t cmp;   /* comparator */
+uint64_t fsb;   /* FSB route, not supported now */
+/* Hidden register

Re: Need help with windows debug tools - HPET problems on win2k864

2008-09-14 Thread Beth Kon
On Sat, 2008-09-13 at 07:50 +0300, Avi Kivity wrote:
> Beth Kon wrote:
> > I ran into trouble trying to get the hpet working with win2k864. It
> > hangs very early on (black screen with "Windows is loading Files" at the
> > bottom). My guess is there are problems with our acpi/bios changes,
> > since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI
> > 1.0. We may not have enough 2.0 infrastructure to get it working right
> > for 64 bit. Win2k832 does work with the hpet, but 64 doesn't.
> >
> > So! I need to figure out how to debug Windows and Anthony suggested that
> > there may be some people with experience here.
>
> Something that's worked for me is to enable memory dump triggered by
> ctrl-ctrl-scroll-lock.  There's some registry key you set, and on the
> next hang you can generate a memory dump, which you can then analyze
> with windbg.
>
> Of course, your hang might well occur earlier than disk driver
> initialization, so this is not bulletproof.

The problem is, I've been told (and confirmed with a test) I can't add
the hpet after windows is installed. It needs to be present for
installation, and I'm getting the black screen at the start of the
install. So there is no registry to play with yet. My guess is this is
just too early to get useful debug, though I was hoping a checked build
would provide information over the serial port even during install. I'll
keep trying to see if there's a way to make that happen.

Did you need a host and target for the kind of memory dump analysis you
did? Or just use windbg on a local dump file? I ask because I'm trying
to figure out if the -serial pty and -serial /dev/pts/0 is the right
way to set up the null modem cable between 2 guests.
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Need help with windows debug tools - HPET problems on win2k864

2008-09-12 Thread Beth Kon
I ran into trouble trying to get the hpet working with win2k864. It
hangs very early on (black screen with "Windows is loading Files" at the
bottom). My guess is there are problems with our acpi/bios changes,
since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI
1.0. We may not have enough 2.0 infrastructure to get it working right
for 64 bit. Win2k832 does work with the hpet, but 64 doesn't.

So! I need to figure out how to debug Windows and Anthony suggested that
there may be some people with experience here. 

I have set up a host and a target guest on kvm, one started with -serial
pty and the other with -serial /dev/pts/0 (which was the value spit out
by the first guest). I'm trying to test the serial connection by
following instructions at:

http://msdn.microsoft.com/en-us/library/cc266323.aspx

which just sends a message over the serial connection from one guest to
the other. I specify the baud rate as shown , assuming it shouldn't
matter in this emulated environment, but I can't get the message to show
up. Does anyone have any suggestions? Is this the proper way to set up a
null modem cable between 2 guests?

Also, reading the windows docs, I see that there are several debuggers,
like kd, cdb, ntsd, and windbg (gui wrapper). I downloaded the debugging
package from:

http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx#a

which claims to download all tools, but I can't figure out how to invoke
kd directly. It appears not to be there. I can invoke WinDbg from the
Start menu, however, and I assume that will be sufficient for my needs.

Lastly, I still haven't figured out if these debuggers require setup on
both host and target. If so, they will be useless for my needs since I
get a hang before the OS boots. But there must be some way to debug boot
issues. 

If anyone has any comments/suggestions on any of the above, I'd be very
appreciative!

-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



[Fwd: Re: Need help with windows debug tools - HPET problems on win2k864]

2008-09-12 Thread Beth Kon
Oops... meant to copy the list too...
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
---BeginMessage---
On Fri, 2008-09-12 at 20:40 +0200, Sebastian Herbszt wrote:
 Beth Kon wrote:
 
 I ran into trouble trying to get the hpet working with win2k864. It
  hangs very early on (black screen with Windows is loading Files at the
  bottom). My guess is there are problems with our acpi/bios changes,
  since they introduce some ACPI 2.0 structures and QEMU/KVM supports ACPI
  1.0. We may not have enough 2.0 infrastructure to get it working right
  for 64 bit. Win2k832 does work with the hpet, but 64 doesn't.
 
 Does win2k864 without hpet work?
 
 Yes, win2k864 works on kvm, but not qemu.
<snip>
 Maybe the article "Using the windows debugger under Xen" at
 http://wiki.xensource.com/xenwiki/XenWindowsGplPv can help.

I'll look into that... thanks.
 
 - Sebastian
 
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]
---End Message---


[RFC][PATCH]Problems with hpet on kvm

2008-08-19 Thread Beth Kon
I've been playing with my hpet patch on kvm and seeing some strange 
behavior. The patch I've been using is attached below.

/usr/local/bin/qemu-system-x86_64 -boot cd -hda 
/home/beth/images/ubuntu_server_8.04_10G.img -m 1024 -net nic,model=e1000 -net 
user -smp 2 -vnc :1

With the above command line the boot intermittently fails with an infinite 
roll of error messages that look something like this:

*BEGIN ERROR MESSAGES
...
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ E] 
- event disabled [20070126]
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ F] 
- event disabled [20070126]
printk: 242 messages suppressed.
kacpid: page allocation failure. order:0, mode:0x20
Pid: 93, comm: kacpid Not tainted 2.6.25.9 #13

Call Trace:
 IRQ  [8025f143] __alloc_pages+0x325/0x33e
 [8027b27c] kmem_getpages+0xc6/0x194
 [8027b85a] fallback_alloc+0x10d/0x185
 [8027bea7] kmem_cache_alloc+0xbd/0xe7
 [80369944] acpi_ev_asynch_execute_gpe_method+0x0/0x117
 [80362e9f] acpi_os_execute+0x2e/0x9a
 [80369823] acpi_ev_gpe_dispatch+0xd0/0x149
 [80369b0c] acpi_ev_gpe_detect+0xb1/0x104
 [80367600] acpi_ev_fixed_event_detect+0x34/0xd4
 [8036800a] acpi_ev_sci_xrupt_handler+0x1a/0x22
 [80362895] acpi_irq+0x11/0x23
 [802553a0] handle_IRQ_event+0x25/0x53
 [802567f6] handle_fasteoi_irq+0x90/0xc8
 [8020da12] do_IRQ+0xf1/0x15f
 [8020b471] ret_from_intr+0x0/0xa
 [80233398] __do_softirq+0x5a/0xce
 [8020c0ec] call_softirq+0x1c/0x28
 [8020d794] do_softirq+0x2c/0x68
 [802332fa] irq_exit+0x3f/0x83
 [8020da5f] do_IRQ+0x13e/0x15f
 [8020b471] ret_from_intr+0x0/0xa
 EOI  [80371f1c] acpi_ns_get_parent_node+0x14/0x15
 [80371b08] acpi_ns_delete_namespace_by_owner+0xb7/0xde
 [80365641] acpi_ds_terminate_control_method+0x73/0xc6
 [80373933] acpi_ps_parse_aml+0x179/0x254
 [80374c4c] acpi_ps_execute_method+0x12b/0x1d7
 [80371c18] acpi_ns_evaluate+0xa4/0x100
 [80369a08] acpi_ev_asynch_execute_gpe_method+0xc4/0x117
 [80362dd6] acpi_os_execute_deferred+0x0/0x2c
 [80362df9] acpi_os_execute_deferred+0x23/0x2c
 [8023cb6c] run_workqueue+0x79/0x104
 [8023d47f] worker_thread+0xd9/0xe8
 [8023fc91] autoremove_wake_function+0x0/0x2e
 [8023d3a6] worker_thread+0x0/0xe8
 [8023fb5d] kthread+0x47/0x76
 [80229e74] schedule_tail+0x28/0x5c
 [8020bd78] child_rip+0xa/0x12
 [8023fb16] kthread+0x0/0x76
 [8020bd6e] child_rip+0x0/0x12

Mem-info:
Node 0 DMA per-cpu:
CPU0: hi:0, btch:   1 usd:   0
CPU1: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: hi:  186, btch:  31 usd:  65
CPU1: hi:  186, btch:  31 usd: 184
Active:0 inactive:0 dirty:0 writeback:0 unstable:0
 free:0 slab:255796 mapped:0 pagetables:0 bounce:0
Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB 
present:8924kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB 
present:1018020kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 
0*2048kB 0*4096kB = 0kB
Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 
0*2048kB 0*4096kB = 0kB
0 total pagecache pages
Swap cache: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
Free swap:0kB
262128 pages of RAM
5637 reserved pages
0 pages shared
0 pages swap cached
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ 8] 
- event disabled [20070126]
ACPI Exception (evgpe-0704): AE_NO_MEMORY, Unable to queue handler for GPE[ 9] 
- event disabled [20070126]
...
*END ERROR MESSAGES**

If I add -no-kvm-irqchip, the error disappears. 

Can anyone offer any insight about what is going on here? I don't know if it 
is related, but booting linux with the hpet seems to stall in some places, and 
I don't see that when booting without the hpet.

Other than this problem, I have booted win2k8 and linux with the hpet. The 
only other odd situation is that, to get linux to work I 
have to use irq 0 for timer0, but to get windows to work, I have to 
use irq 2. In hpet.c update_irq:

 if (timer->tn == 0)
     irq = timer->state->irqs[0];

must be changed to 

 if (timer->tn == 0)
     irq = timer->state->irqs[2];

to get win2k8 to boot. 
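
One thing that may be relevant here, offered only as a sketch and not as a confirmed diagnosis: the HPET specification's legacy replacement mapping delivers timer 0 on IRQ0 when interrupts go through the 8259 PIC, and on IRQ2 when they go through the I/O APIC. Whether that accounts for the Linux/Windows difference above is not established in this thread. The helper name hpet_timer0_legacy_irq and the via_ioapic flag are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>

/*
 * Legacy replacement routing for HPET timer 0, as described in the
 * HPET specification: IRQ0 through the 8259 PIC, IRQ2 through the
 * I/O APIC. Hypothetical helper; via_ioapic is an assumed flag and
 * is not part of the patch discussed above.
 */
static int hpet_timer0_legacy_irq(bool via_ioapic)
{
    return via_ioapic ? 2 : 0;
}
```

A device model would still need some way to know which interrupt controller the guest is using; that part is left out of this sketch.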

Any ideas?

Beth Kon
IBM Linux Technology Center
**

signed-off-by Beth Kon [EMAIL PROTECTED]

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index a86464f..8634186 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -607,7 +607,7 @@ ifeq

[RFC][PATCH] Add HPET emulation to qemu (v2)

2008-08-02 Thread Beth Kon
Major changes:

- Rebased to register-based operations for ease of save/restore.
- Looked through Xen's hpet implementation and picked up a bunch of 
  things, though not quite everything yet. Thanks!
- PIT and RTC are entirely disabled in legacy mode, not just their 
  interrupts.
 
There is still a bunch to do, but I'm re-posting primarily because 
of the switch to register-based operations. I have still only tested with a 
linux guest. Windows guest is next on my list... as soon as I return 
from my week of vacation.

I've been playing with CONFIG_NO_HZ and been surprised by the 
results.  I was trying to reproduce the wakeup every 10ms that 
Samuel Thibault mentioned, thinking the HPET would improve it. 
But for an idle guest in both cases (with and without HPET), the 
number of wakeups per second was relatively low (28). Ultimately 
this depends on exactly what the guest is doing
when idle, so maybe the HPET won't provide any improvement here. 
But in any case, I didn't see the 10ms wakeup cycle with CONFIG_NO_HZ. 
If anyone can shed any light on this, I could look into it more if need
be.
 
Signed-off-by: Beth Kon [EMAIL PROTECTED]
***

Makefile.target  |    2 
hw/hpet.c        |  441 +++
hw/i8254.c       |   11 +
hw/mc146818rtc.c |   30 +++
hw/pc.c          |    2 
5 files changed, 483 insertions(+), 3 deletions(-)

***

diff --git a/Makefile.target b/Makefile.target
index 42162c3..946bdef 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -536,7 +536,7 @@ ifeq ($(TARGET_BASE_ARCH), i386)
 OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
-OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o
+OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
diff --git a/hw/hpet.c b/hw/hpet.c
new file mode 100644
index 000..adfecf0
--- /dev/null
+++ b/hw/hpet.c
@@ -0,0 +1,441 @@
+/*
+ *  High Precision Event Timer emulation
+ *
+ *  Copyright (c) 2007 Alexander Graf
+ *  Copyright (c) 2008 IBM Corporation
+ *
+ *  Authors: Beth Kon [EMAIL PROTECTED]
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * *
+ *
+ * This driver attempts to emulate an HPET device in software. It is by no
+ * means complete and is prone to break on certain conditions.
+ *
+ */
+#include "hw.h"
+#include "console.h"
+#include "qemu-timer.h"
+
+//#define HPET_DEBUG
+
+#define HPET_BASE  0xfed00000
+#define HPET_CLK_PERIOD 10000000ULL /* 10000000 femtoseconds == 10ns*/
+
+#define FS_PER_NS 1000000
+#define HPET_NUM_TIMERS 3
+#define HPET_TIMER_TYPE_LEVEL 1
+#define HPET_TIMER_TYPE_EDGE 0
+#define HPET_TIMER_DELIVERY_APIC 0
+#define HPET_TIMER_DELIVERY_FSB 1
+#define HPET_TIMER_CAP_FSB_INT_DEL (1 << 15)
+#define HPET_TIMER_CAP_PER_INT (1 << 4)
+
+#define HPET_CFG_ENABLE 0x001
+#define HPET_CFG_LEGACY 0x002
+
+#define HPET_ID         0x000
+#define HPET_PERIOD     0x004
+#define HPET_CFG        0x010
+#define HPET_STATUS     0x020
+#define HPET_COUNTER    0x0f0
+#define HPET_TN_REGS    0x100 ... 0x3ff /*address range of all TN regs*/
+#define HPET_TN_CFG     0x000
+#define HPET_TN_CMP     0x008
+#define HPET_TN_ROUTE   0x010
+
+
+#define HPET_TN_INT_TYPE_LEVEL   0x002
+#define HPET_TN_ENABLE           0x004
+#define HPET_TN_PERIODIC         0x008
+#define HPET_TN_PERIODIC_CAP     0x010
+#define HPET_TN_SIZE_CAP         0x020
+#define HPET_TN_SETVAL           0x040
+#define HPET_TN_32BIT            0x100
+#define HPET_TN_INT_ROUTE_MASK  0x3e00
+#define HPET_TN_INT_ROUTE_SHIFT      9
+#define HPET_TN_INT_ROUTE_CAP_SHIFT 32
+#define HPET_TN_CFG_BITS_READONLY_OR_RESERVED 0x80b1U
+
+#define timer_int_route(timer)   \
+((timer->config & HPET_TN_INT_ROUTE_MASK) >> HPET_TN_INT_ROUTE_SHIFT)
+
+#define hpet_enabled(s)  (s->config & HPET_CFG_ENABLE)
+#define timer_is_periodic(t) (t->config & HPET_TN_PERIODIC)
+#define timer_enabled(t) (t->config & HPET_TN_ENABLE)
+
+struct HPETState;
+typedef struct HPETTimer {  /* timers */
+uint8_t tn; /*timer number*/
+QEMUTimer *qemu_timer;
+struct HPETState
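
The per-timer configuration masks defined in the patch can be exercised in isolation. The following is a minimal sketch, assuming a stand-in struct timer_cfg that holds only the config field (the real HPETTimer in the patch carries more state, and the posting is cut off before its definition ends). The function names here are illustrative and are not the patch's macros.

```c
#include <stdint.h>

/* Mask and shift values as they appear in the patch above. */
#define HPET_TN_ENABLE           0x004
#define HPET_TN_PERIODIC         0x008
#define HPET_TN_INT_ROUTE_MASK   0x3e00
#define HPET_TN_INT_ROUTE_SHIFT  9

/* Stand-in for the per-timer state; only the field used here. */
struct timer_cfg {
    uint64_t config;
};

/* Extract the 5-bit interrupt route field (bits 13:9). */
static unsigned timer_route(const struct timer_cfg *t)
{
    return (unsigned)((t->config & HPET_TN_INT_ROUTE_MASK)
                      >> HPET_TN_INT_ROUTE_SHIFT);
}

/* Nonzero when the timer's interrupt enable bit is set. */
static int timer_is_enabled(const struct timer_cfg *t)
{
    return (t->config & HPET_TN_ENABLE) != 0;
}

/* Nonzero when the timer is configured for periodic mode. */
static int timer_in_periodic_mode(const struct timer_cfg *t)
{
    return (t->config & HPET_TN_PERIODIC) != 0;
}
```

Writing the accessors as functions instead of macros is only a choice made for this sketch; it gives type checking on the argument at no runtime cost.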

Re: [RFC][PATCH] Add HPET emulation to qemu

2008-07-22 Thread Beth Kon
On Sat, 2008-07-12 at 17:42 +0200, Alexander Graf wrote:
 Hi Beth,
 
 On Jul 10, 2008, at 5:48 AM, Beth Kon wrote:
 
  This patch, based on an earlier patch by Alexander Graf, adds HPET
  emulation to qemu. I am sending out a separate patch to kvm with the
  required bios changes.
 
  This work is incomplete.
 
 Wow it's good to see that someone's working on it. I am pretty sure  
 that you're basing on an older version of my HPET emulation, so you  
 might also want to take a look at the current patch file residing in 
 http://alex.csgraf.de/qemu/osxpatches.tar.bz2
<snip>
Hi Alex. Thanks for the feedback! Sorry for the delayed response, I've
been on vacation. I did check the patch you pointed me to and it is
actually the same one that I started with.
 
 While reading through the code I realized how badly commented it is.  
 At least the functions should have some comments on them what their  
 purpose is.
 Furthermore there still are a lot of magic numbers in the code. While  
 that is normal qemu code style and I wrote it this way, I'm not too  
 fond of it. So it might be a good idea to at least make the switch  
 numbers defines.
 
Ok... added those things to my todo list :-)
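
As an illustration of replacing the magic switch numbers with defines, here is a minimal sketch of a 32-bit register read that dispatches on named offsets. The offsets match those used in the patch; struct hpet_regs and hpet_read32 are hypothetical names for this example and are not the patch's HPETState or hpet_ram_readl.

```c
#include <stdint.h>

/* Register offsets from the HPET base address. */
#define HPET_ID       0x000
#define HPET_PERIOD   0x004
#define HPET_CFG      0x010
#define HPET_COUNTER  0x0f0

/* Counter period in femtoseconds (10 ns per tick). */
#define HPET_CLK_PERIOD 10000000U

/* Hypothetical register file, reduced to what the sketch needs. */
struct hpet_regs {
    uint32_t id;
    uint32_t config;
    uint64_t counter;
};

/* Read the low 32 bits of a register; unknown offsets read as 0. */
static uint32_t hpet_read32(const struct hpet_regs *s, uint32_t offset)
{
    switch (offset) {
    case HPET_ID:
        return s->id;
    case HPET_PERIOD:
        return HPET_CLK_PERIOD;
    case HPET_CFG:
        return s->config;
    case HPET_COUNTER:
        return (uint32_t)s->counter;
    default:
        return 0;
    }
}
```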

 
  The one area that feels ugly/wrong at the moment is handling the
  disabling of 8254 and RTC timer interrupts when operating in legacy
  mode. The HPET spec says in this case the 8254/RTC timer will not  
  cause
  any interrupts. I'm not sure if I should disable the RTC/8254 in some
  more general way, or just disable interrupts. Comments appreciated.
 
 IIRC the spec defines that the HPET _can_ replace the 8254, but does  
 not have to. So you should be mostly fine on that part.
 
 

  +
  +//#define HPET_DEBUG
  +
  +#define HPET_BASE  0xfed00000
 
 This is a dirty hack that I did to make Mac OS X happy. Actually the  
 HPET base address gets specified in the RCBA on the LPC and is  
 configured by the BIOS to point to a valid address, with 0xfed00000  
 being the default (IIRC  if you write 0 to the fields you end up with  
 that address).
Yes, Ryan Harper's BIOS patch that was submitted with this patch
specified the HPET address in ACPI. I am not familiar with this stuff,
so not sure how that relates to the RCBA and whether more needs to be
done here. For the time being I'll add it to the todo list.

 
 
  +#define HPET_PERIOD 0x00989680 /* 10000000 femtoseconds,
  10ns*/
 
 Any reason why this is a hex value? I find 10000000 a lot easier to  
 read :-)
 
Well that's a VERY good question! Job security? :-)

 
  
  +static uint32_t hpet_ram_readw(void *opaque, target_phys_addr_t addr)
  +{
  +#ifdef HPET_DEBUG
  +fprintf(stderr, "qemu: hpet_read w at %#lx\n", addr);
  +#endif
  +return 10;
  +}
 
 If I'm not completely mistaken, all reads and writes need to be in 32-  
 or 64-bit mode. So it's pretty safe to remove these. I only added them  
 to see if Mac OS X actually would access them. To still enable other  
 people to do the same you might as well ifdef them out.
 
Yep, you're right. I'll do that.
<snip>
 
  +
  +static uint32_t hpet_ram_readl(void *opaque, target_phys_addr_t addr)
  +{
  +HPETState *s = (HPETState *)opaque;
  +#ifdef HPET_DEBUG
  +  fprintf(stderr, "qemu: hpet_read l at %#lx\n", addr);
  +#endif
  +switch(addr - HPET_BASE) {
  +case 0x00:
  +return 0x8086a201;
  +case 0x04:
  +return HPET_PERIOD;
  +case 0x10:
  +return ((s->legacy_route << 1) | s->enabled);
  +case 0x14:
  +#ifdef HPET_DEBUG
  +fprintf(stderr, "qemu: invalid hpet_read l at %#lx\n",
  addr);
  +#endif
  +return 0;
  +case 0xf0:
  +s->hpet_counter = ns_to_ticks(qemu_get_clock(vm_clock)
  +  - s->hpet_offset) ;
 
 I'm having trouble understanding this part. The hpet_counter is  
 actually the ticks of the internal main clock of the HPET. This value  
 is supposed to change constantly with respect to the current time. The  
 timers in the HPET can compare themselves to the current value  
 of the hpet_counter at all times, raising an interrupt if something  
 matches.
 
 So far for the theory. I thought that it'd be a lot more convenient to  
 simply write down an internal offset (hpet_counter) to the actual  
 clock and base all calculations on that, so we can actually trigger a  
 timer based on event_time - offset. I don't see any reason to set  
 hpet_counter again, but maybe it's been too long that I've looked at  
 that code :).
 
 Hum ... having looked further, is that what hpet_offset is supposed to  
 be and the reason you're setting s->updated?

Ok, let me explain my thinking. The hpet_counter tracks elapsed ticks,
but these ticks are emulated based on the value of the vm_clock. So to
find the elapsed ticks, I have to determine how many nanoseconds have
elapsed since the hpet_counter was enabled by software, then translate
nanoseconds to ticks
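
The calculation described here can be sketched as follows, assuming the 10 ns tick period from the patch (HPET_CLK_PERIOD in femtoseconds, FS_PER_NS for the unit conversion). NS_PER_TICK and hpet_counter_value are names introduced for this example only; dividing by NS_PER_TICK instead of multiplying by FS_PER_NS first is a choice made here to avoid 64-bit overflow on long uptimes, not necessarily what the patch does.

```c
#include <stdint.h>

#define HPET_CLK_PERIOD 10000000ULL  /* femtoseconds per tick (10 ns) */
#define FS_PER_NS       1000000ULL   /* femtoseconds per nanosecond */
#define NS_PER_TICK     (HPET_CLK_PERIOD / FS_PER_NS)  /* 10 */

/* Convert elapsed nanoseconds to HPET ticks (rounds down). */
static uint64_t ns_to_ticks(uint64_t ns)
{
    return ns / NS_PER_TICK;
}

/* Convert HPET ticks back to nanoseconds. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
    return ticks * NS_PER_TICK;
}

/*
 * Emulated main counter value.
 * now_ns:    current virtual clock reading, in nanoseconds
 * offset_ns: virtual clock reading when the counter was enabled
 */
static uint64_t hpet_counter_value(uint64_t now_ns, uint64_t offset_ns)
{
    return ns_to_ticks(now_ns - offset_ns);
}
```

With this arrangement the counter never has to be incremented by the emulator; it is derived from the virtual clock each time the guest reads it.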

[PATCH] Add HPET support to BIOS

2008-07-09 Thread Beth Kon
This patch, written by Ryan Harper, adds HPET support to BIOS.

Signed-off-by: Beth Kon [EMAIL PROTECTED]

diff --git a/bios/Makefile b/bios/Makefile
index 48022ea..3e73fb5 100644
--- a/bios/Makefile
+++ b/bios/Makefile
@@ -40,7 +40,7 @@ LIBS =  -lm
 RANLIB = ranlib
 
 BCC = bcc
-GCC = gcc -m32
+GCC = gcc -m32 -fno-stack-protector
 HOST_CC = gcc
 AS86 = as86
 
diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index d1bfa2c..1548c86 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -262,6 +262,24 @@ DefinitionBlock (
 Return (MEMP)
 }
 }
+Device(HPET) {
+Name(_HID,  EISAID("PNP0103"))
+Name(_UID, 0)
+Method (_STA, 0, NotSerialized) {
+Return(0x00)
+}
+Name(_CRS, ResourceTemplate() {
+DWordMemory(
+ResourceConsumer, PosDecode, MinFixed, MaxFixed,
+NonCacheable, ReadWrite,
+0x00000000,
+0xFED00000,
+0xFED003FF,
+0x00000000,
+0x00000400 /* 1K memory: FED00000 - FED003FF */
+)
+})
+}
 }
 
 Scope(\_SB.PCI0) {
@@ -628,7 +646,7 @@ DefinitionBlock (
 {
 Or (PRQ3, 0x80, PRQ3)
 }
-Method (_CRS, 0, NotSerialized)
+Method (_CRS, 1, NotSerialized)
 {
 Name (PRR0, ResourceTemplate ()
 {
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 2dc1d25..c1ec015 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1182,7 +1182,7 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */
 struct rsdt_descriptor_rev1
 {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
-   uint32_t table_offset_entry [2]; /* Array of pointers to other */
+   uint32_t table_offset_entry [3]; /* Array of pointers to other */
 /* ACPI tables */
 };
 
@@ -1322,6 +1322,30 @@ struct madt_processor_apic
 #endif
 };
 
+/*
+ * ACPI 2.0 Generic Address Space definition.
+ */
+struct acpi_20_generic_address {
+uint8_t  address_space_id;
+uint8_t  register_bit_width;
+uint8_t  register_bit_offset;
+uint8_t  reserved;
+uint64_t address;
+};
+
+/*
+ * HPET Description Table
+ */
+struct acpi_20_hpet {
+ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
+uint32_t   timer_block_id;
+struct acpi_20_generic_address addr;
+uint8_t    hpet_number;
+uint16_t   min_tick;
+uint8_t    page_protect;
+};
+#define ACPI_HPET_ADDRESS 0xFED00000UL
+
 struct madt_io_apic
 {
APIC_HEADER_DEF
@@ -1393,8 +1417,9 @@ void acpi_bios_init(void)
 struct fadt_descriptor_rev1 *fadt;
 struct facs_descriptor_rev1 *facs;
 struct multiple_apic_table *madt;
+struct acpi_20_hpet *hpet;
 uint8_t *dsdt;
-uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr;
+uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, hpet_addr;
 uint32_t acpi_tables_size, madt_addr, madt_size;
 int i;
 
@@ -1436,6 +1461,11 @@ void acpi_bios_init(void)
 madt = (void *)(addr);
 addr += madt_size;
 
+addr = (addr + 7) & ~7;
+hpet_addr = addr;
+hpet = (void *)(addr);
+addr += sizeof(*hpet);
+
 acpi_tables_size = addr - base_addr;
 
 BX_INFO("ACPI tables: RSDP addr=0x%08lx ACPI DATA addr=0x%08lx size=0x%x\n",
@@ -1457,6 +1487,7 @@ void acpi_bios_init(void)
 memset(rsdt, 0, sizeof(*rsdt));
 rsdt->table_offset_entry[0] = cpu_to_le32(fadt_addr);
 rsdt->table_offset_entry[1] = cpu_to_le32(madt_addr);
+rsdt->table_offset_entry[2] = cpu_to_le32(hpet_addr);
 acpi_build_table_header((struct acpi_table_header *)rsdt,
 "RSDT", sizeof(*rsdt), 1);
 
@@ -1540,6 +1571,15 @@ void acpi_bios_init(void)
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
 }
+
+/* HPET */
+memset(hpet, 0, sizeof(*hpet));
+hpet->timer_block_id = cpu_to_le32(0x8086a201);
+// hpet->timer_block_id = cpu_to_le32(0x80862201);
+hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
+acpi_build_table_header((struct acpi_table_header *)hpet,
+ "HPET", sizeof(*hpet), 1);
+
 }
 
 /* SMBIOS entry point -- must be written to a 16-bit aligned address
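
Two details of the patch can be checked with a small standalone sketch: the 8-byte alignment applied before placing the HPET table, and the meaning of the timer_block_id value. The field layout below follows the HPET specification's General Capabilities and ID register (revision in bits 7:0, index of the last timer in bits 12:8, 64-bit counter capability in bit 13, legacy replacement capability in bit 15, vendor ID in bits 31:16), assuming the ACPI table field mirrors that encoding. The names align8, struct hpet_id and decode_hpet_id are hypothetical and are not part of rombios32.c.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 8, as the patch does inline. */
static uint32_t align8(uint32_t addr)
{
    return (addr + 7u) & ~7u;
}

/* Decoded view of the 32-bit HPET event timer block ID. */
struct hpet_id {
    uint16_t vendor;            /* PCI vendor ID */
    unsigned legacy_route_cap;  /* 1 if legacy replacement routing is supported */
    unsigned counter_64bit;     /* 1 if the main counter is 64 bits wide */
    unsigned num_timers;        /* number of timers (stored field + 1) */
    uint8_t  revision;          /* hardware revision ID */
};

static struct hpet_id decode_hpet_id(uint32_t id)
{
    struct hpet_id d;

    d.revision         = (uint8_t)(id & 0xff);
    d.num_timers       = ((id >> 8) & 0x1f) + 1;
    d.counter_64bit    = (id >> 13) & 1;
    d.legacy_route_cap = (id >> 15) & 1;
    d.vendor           = (uint16_t)(id >> 16);
    return d;
}
```

Under that layout, 0x8086a201 decodes to vendor 0x8086, three timers, a 64-bit counter and legacy replacement capability, while the commented-out 0x80862201 differs only in clearing the legacy replacement bit.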



-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: ata exception messages

2008-06-04 Thread Beth Kon
On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote:
 I'm running an Ubuntu 7.10 guest on a kvm git build (commit
 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated
 messages like:
 
 [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
 frozen
 [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb
 0x0 data 4096 out
 
 I see that they're coming from ata_eh_link_report in
 drivers/ata/libata-eh.c but am not familiar enough with this code to
 understand what the problem is.
 
 Does anyone have any idea what might be causing this? 
 
I discovered that these messages were associated with my disk image being NFS 
mounted. 
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: ata exception messages

2008-06-04 Thread Beth Kon
On Wed, 2008-06-04 at 15:38 +0300, Avi Kivity wrote:
 Beth Kon wrote:
  On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote:

  I'm running an Ubuntu 7.10 guest on a kvm git build (commit
  3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated
  messages like:
 
  [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
  frozen
  [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb
  0x0 data 4096 out
 
  I see that they're coming from ata_eh_link_report in
  drivers/ata/libata-eh.c but am not familiar enough with this code to
  understand what the problem is.
 
  Does anyone have any idea what might be causing this? 
 
  
  I discovered that these messages were associated with my disk image being 
  NFS 
  mounted. 
Yes, the network has been misbehaving lately, so could be causing
timeouts.
 
 Interesting.  Is it an exceptionally slow server (or perhaps, on a lossy 
 network)?
 
 I can see how timeouts can annoy the ide driver, but I've never seen 
 this myself.
 
-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



ata exception messages

2008-06-03 Thread Beth Kon
I'm running an Ubuntu 7.10 guest on a kvm git build (commit
3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated
messages like:

[3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
frozen
[3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb
0x0 data 4096 out

I see that they're coming from ata_eh_link_report in
drivers/ata/libata-eh.c but am not familiar enough with this code to
understand what the problem is.

Does anyone have any idea what might be causing this? 


-- 
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] The problem of the definition of tuning informations

2007-11-13 Thread beth kon

beth kon wrote:


Daniel Veillard wrote:


On Thu, Nov 08, 2007 at 02:00:10PM -0600, Ryan Harper wrote:
 


* Daniel Veillard [EMAIL PROTECTED] [2007-11-08 10:08]:
  


 I promised that mail for the beginning of the week but I still have
I think tuning information is the set of parameters associated
with a domain or a host which are not strictly needed to get the 
domain(s) working but improve their runtime behaviour.

To me this includes:
  - scheduling parameters the scope may be host/hypervisor/domain
  - vcpu affinity i.e. to which set of physical CPU each of the
vcpu may be bound
  - and possibly others ...

The problem:

People would like to associate those to the XML domain informations,
the goal being to be able to restore those informations when a domain
(re-)starts. I have been objecting to it so far because I think those
informations don't have the same lifetime and scope as the other domain
informations saved in the XML. Since they are not needed to start the domain, and
that once the domain is started the existing domain API can be used
to change those informations, it is better to keep them separate.



For at least (maybe only) Xen NUMA systems, the application of tuning
information after a domain is started does not achieve the same effect
as including the information during the initial construction of the
domain.  In particular, Xen needs to know which physical cpus are being
used to determine from which NUMA node it will allocate
memory.  Adjusting affinity after the domain has allocated memory
doesn't allow libvirt or any management app to control from which node
domains pull memory.
  



 yes, I understand and that's why I agreed to add the cpuset information;
at that point it's more than tuning because it may be irreversible for the
lifetime of the domain, so this really should be in the XML. I'm not
suggesting to go back about 'cpu affinity' i.e. to which physical CPUs
a domain should be bound, but 'vcpu affinity' i.e. then how the virtual
CPUs of the domain are mapped onto that cpu set, that can change
dynamically without (serious) performance penalty.
 

I don't have any objection to separating tuning information as long as
we have the ability to merge permanent domain parameters with its
tuning information prior to domain construction.
  



 My point is that you don't need the tuning informations to create the
domain, if you need them it's not tuning. When you say you want to
merge them, do you want this to create the domain ? It should not
be necessary (or I take a counter example that would help me), right ?
 

It seems to me that the only reason cpuset information is being 
treated as more than tuning is due to an artifact of Xen (i.e., it 
must be specified at domain creation). For KVM, for example, I believe 
this can be specified after domain creation.


From a libvirt perspective, I think the XML config/tuning split should 
be hypervisor-neutral, and based solely on what is required to get a 
domain running (ignoring performance):


1) XML contains arguments absolutely needed to start a domain in any 
hypervisor. This could be thought of as the minimum requirements for 
starting a domain.


2) Tuning information contains arguments that affect performance, and 
may be changed.


When a domain is started, the caller can specify a minimal start (XML 
only) or a tuned start (XML plus tuning). Lower level libvirt code 
would understand the specifics of the hypervisor well enough to know 
whether it had to include some of the tuning information at domain 
creation time.


Daniel and I have been discussing this a bit on IRC, so I will dump that 
information on the list... (correct me if I misstate something here, 
Daniel :-)


Daniel wants to have the xml contain all parameters that must be 
specified at domain creation in order to achieve proper function, and 
cannot be tuned later. I agree this is a reasonable definition. In this 
case, cpuset would need to be in the xml.


My concern is that currently Xen will fail a domain create request if 
the cpu is out of range with the error "invalid argument", so the user 
will not have enough information to correct the problem in the xml and 
try again. We can pursue getting a more explicit error message from 
Xen.  Or Xen could ignore the cpuset and start the domain, perhaps with 
a warning message.


My thinking was that ideally it might be good to have libvirt provide 2 
start methods - minimal and tuned, but Daniel thinks it is not worth the 
complexity. It should be up to the user to correct issues in the xml and 
try again.





Daniel

 







--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[Libvir] latest NUMA/cpuset code testing

2007-11-13 Thread beth kon
I tested the latest CVS libvirt on a 128-way x3950 and create, define, 
and start appear to work well with various cpusets specified. The only 
thing I noticed was that dumpxml does not grab the cpuset info. I have 
not looked at the code to verify, but is this expected? Maybe you 
discussed this Daniel, but I can't remember at the moment.


--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] The problem of the definition of tuning informations

2007-11-08 Thread beth kon

Daniel Veillard wrote:

<snip>




My opinion:
---

We need better tools, even for simple use cases, to be able to save
an existing tuning for a domain or a full machine, and reload it
when needed. This is IMHO better done on top of the existing API,
which already has the entry points to implement them. My idea is
to provide tuning commands in virsh [5]. If you implement tuning both
at creation time and in the tool, this means you either make them
different, in which case you have no coherency between what you say
when you create a domain or save its config and what you do at the
virsh level. If you don't make it different (for example trying to
use the same kind of XML syntax), then you need code for doing this
both in the tool and in the library itself, or you export as a
new API the tuning load and save. Exporting as a parallel API what
we have already for scheduling and VCPU affinity makes the API
more complex, and less coherent.

 


Daniel,

Just to be sure I understand, are you suggesting removing tuning 
information from any configuration file and making it a runtime exercise 
to set it up? (That is, after the domain has been started)


--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] [PATCH] finish NUMA code reorg, plug cpuset at creat time support

2007-11-02 Thread beth kon

Daniel Veillard wrote:


On Wed, Oct 31, 2007 at 05:20:01PM +0900, Saori Fukuta wrote:
 


On Tue, 30 Oct 2007 16:54:36 -0400 Daniel Veillard wrote:
   

 The associated patch compiles, but I have not yet tested it, it's
basically how I would expect to finish the NUMA work, but it certainly needs
debugging and testing. I will look at this tomorrow, but I welcome feedback :-)
 


sounds good to me, I tested with your patch, and I have four fixes for it.
   

I tested with the latest code, and found a problem with specifying a 
value in the cpuset in XML that is greater than maxcpu-1. The attached 
patch corrects this behavior, but the issue remains that an out of range 
cpu will prevent the domain from starting. This is the way xm create 
works today.
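
The boundary condition described above can be illustrated with a small sketch. It is written in Python rather than the C of the attached patch, the name parse_cpu_number is made up for this example, and it assumes only what the message states: valid cpu numbers run from 0 to maxcpu-1.

```python
def parse_cpu_number(text, maxcpu):
    """Parse a decimal cpu number from the start of `text`.

    Returns (cpu, chars_consumed), or None if `text` does not start
    with a digit or the value is outside the range 0..maxcpu-1.
    Hypothetical helper for illustration; not the libvirt function.
    """
    if not text or not ("0" <= text[0] <= "9"):
        return None
    value = 0
    pos = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        value = value * 10 + (ord(text[pos]) - ord("0"))
        # Reject maxcpu itself as well: the highest valid id is maxcpu - 1.
        if value >= maxcpu:
            return None
        pos += 1
    return value, pos
```

With a strict "greater than" comparison in place of the "greater than or equal" one, a request for cpu 4 on a 4-cpu host would slip through the parser, which is the off-by-one case the patch addresses.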


In talking with Daniel about this, he made the point that it is not 
necessarily desirable for the failure of a tuning parameter (e.g., 
config specifies cpu 5 and the domain is now being started on a 4 cpu 
machine) to cause domain creation to fail. He mentioned phones ringing 
in the middle of the night somewhere for support when the domain fails 
to start. I think he has a point. Wouldn't it make more sense to start 
the domain(s) at perhaps suboptimal performance, than to have everything 
shut down until someone can come and figure out the problem?


It would seem better if the domain started (with no cpu affinity) and a 
warning was posted. This might also argue for separation of domain 
config and tuning parameters. Or at least for 2 flavors of create? One 
that fails if tuning parameters cannot be activated, and one that doesn't.
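
A rough sketch of what the two flavors of create could look like at the point where the cpuset is validated. Everything here is hypothetical (the function name, the strict flag, the return shape); it is only meant to make the proposal concrete, not to describe an existing libvirt or Xen interface.

```python
def resolve_cpuset(requested, host_cpus, strict):
    """Validate a requested cpuset against the number of host cpus.

    Returns (cpuset, warning). In strict mode an out-of-range entry
    raises ValueError, so domain creation would fail. In lenient mode
    the cpuset is dropped (None means "no cpu affinity") and a warning
    string is returned so the caller can log it and start the domain.
    """
    invalid = sorted(cpu for cpu in requested if cpu < 0 or cpu >= host_cpus)
    if not invalid:
        return sorted(requested), None
    message = "cpuset entries %s out of range for a %d-cpu host" % (invalid, host_cpus)
    if strict:
        raise ValueError(message)
    return None, message
```

For the case mentioned above (config specifies cpu 5, host has 4 cpus), the lenient call returns no affinity plus a warning, while the strict call raises.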




--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]

Index: src/xml.c
===
RCS file: /data/cvs/libvirt/src/xml.c,v
retrieving revision 1.97
diff -u -r1.97 xml.c
--- src/xml.c	31 Oct 2007 09:39:13 -	1.97
+++ src/xml.c	2 Nov 2007 15:20:44 -
@@ -126,7 +126,7 @@
 
 while ((*cur >= '0') && (*cur <= '9')) {
 ret = ret * 10 + (*cur - '0');
-if (ret > maxcpu)
+if (ret >= maxcpu)
 return (-1);
 cur++;
 }
@@ -1647,6 +1647,8 @@
 }
 }
 free(cpuset);
+if (res < 0)
+goto error;
 } else {
 virXMLError(conn, VIR_ERR_NO_MEMORY, xmldesc, 0);
 }


[XenPPC] Re: [Xen-devel] [PATCH] nr_cpus calculation problem due to incorrect sockets_per_node

2007-10-29 Thread beth kon
Hi. Wondering if this patch has been reviewed and could be considered 
for inclusion in 3.2. Sorry about the late request. You were asking for 
input last week.


beth kon wrote:

Testing on an 8-node 128-way NUMA machine has exposed a problem with 
Xen's nr_cpus calculation. In this case, since Xen cuts off recognized 
CPUs at 32, the machine appears to have 16 CPUs on the first and 
second nodes and none on the remaining nodes. Given this asymmetry, 
the calculation of sockets_per_node (which is later used to calculate 
nr_cpus) is incorrect:


pi->sockets_per_node = num_online_cpus() / (num_online_nodes() *
pi->cores_per_socket * pi->threads_per_core);


The most straightforward solution is to remove sockets_per_node, and 
instead determine nr_cpus directly from num_online_cpus.


This patch has been tested on x86_64 NUMA machines.



diff -r b4278beaf354 docs/man/xm.pod.1
--- a/docs/man/xm.pod.1 Wed Oct 17 13:12:03 2007 +0100
+++ b/docs/man/xm.pod.1 Wed Oct 17 20:09:46 2007 -0700
@@ -446,7 +446,6 @@ page more readable):
 machine: i686
 nr_cpus: 2
 nr_nodes   : 1
- sockets_per_node   : 2
 cores_per_socket   : 1
 threads_per_core   : 1
 cpu_mhz: 696
diff -r b4278beaf354 tools/python/xen/lowlevel/xc/xc.c
--- a/tools/python/xen/lowlevel/xc/xc.c Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/lowlevel/xc/xc.c Wed Oct 17 20:09:46 2007 -0700
@@ -721,7 +721,7 @@ static PyObject *pyxc_physinfo(XcObject 
 "max_cpu_id",   info.max_cpu_id,
 "threads_per_core", info.threads_per_core,
 "cores_per_socket", info.cores_per_socket,
-"sockets_per_node", info.sockets_per_node,
+"nr_cpus",  info.nr_cpus,
 "total_memory", pages_to_kib(info.total_pages),
 "free_memory",  pages_to_kib(info.free_pages),
 "scrub_memory", pages_to_kib(info.scrub_pages),
diff -r b4278beaf354 tools/python/xen/xend/XendNode.py
--- a/tools/python/xen/xend/XendNode.py Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/xend/XendNode.py Wed Oct 17 20:09:46 2007 -0700
@@ -475,7 +475,7 @@ class XendNode:

 cpu_info = {
 "nr_nodes": phys_info["nr_nodes"],
-"sockets_per_node": phys_info["sockets_per_node"],
+"nr_cpus":  phys_info["nr_cpus"],
 "cores_per_socket": phys_info["cores_per_socket"],
 "threads_per_core": phys_info["threads_per_core"]
 }
@@ -580,17 +580,9 @@ class XendNode:
str='none\n'
return str[:-1];

-def count_cpus(self, pinfo):
-count=0
-node_to_cpu=pinfo['node_to_cpu']
-for i in range(0, pinfo['nr_nodes']):
-count+=len(node_to_cpu[i])
-return count;
-
def physinfo(self):
info = self.xc.physinfo()

-info['nr_cpus'] = self.count_cpus(info)
info['cpu_mhz'] = info['cpu_khz'] / 1000

# physinfo is in KiB, need it in MiB

@@ -600,7 +592,6 @@ class XendNode:

ITEM_ORDER = ['nr_cpus',
  'nr_nodes',
-  'sockets_per_node',
  'cores_per_socket',
  'threads_per_core',
  'cpu_mhz',
diff -r b4278beaf354 tools/python/xen/xm/main.py
--- a/tools/python/xen/xm/main.py   Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/xm/main.py   Wed Oct 17 20:09:46 2007 -0700
@@ -1667,9 +1667,8 @@ def xm_info(args):
 "release":   getVal(["software_version", "release"]),
 "version":   getVal(["software_version", "version"]),
 "machine":   getVal(["software_version", "machine"]),
-"nr_cpus":   len(getVal(["host_CPUs"], [])),
+"nr_cpus":   getVal(["cpu_configuration", "nr_cpus"]),
 "nr_nodes":  getVal(["cpu_configuration", "nr_nodes"]),
-"sockets_per_node":  getVal(["cpu_configuration", "sockets_per_node"]),
 "cores_per_socket":  getVal(["cpu_configuration", "cores_per_socket"]),
 "threads_per_core":  getVal(["cpu_configuration", "threads_per_core"]),
 "cpu_mhz":   getCpuMhz(),
diff -r b4278beaf354 tools/xenmon/xenbaked.c
--- a/tools/xenmon/xenbaked.c   Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/xenmon/xenbaked.c   Wed Oct 17 20:09:46 2007 -0700
@@ -460,10 +460,7 @@ unsigned int get_num_cpus(void)
xc_interface_close(xc_handle);
opts.cpu_freq = (double)physinfo.cpu_khz/1000.0;

-return (physinfo.threads_per_core *
-physinfo.cores_per_socket *
-physinfo.sockets_per_node *
-physinfo.nr_nodes);
+return physinfo.nr_cpus;
}


diff -r b4278beaf354 tools/xenstat/libxenstat/src/xenstat.c
--- a/tools/xenstat/libxenstat/src/xenstat.cWed Oct 17 13

[XenPPC] [PATCH] nr_cpus calculation problem due to incorrect sockets_per_node

2007-10-19 Thread beth kon
Testing on an 8-node 128-way NUMA machine has exposed a problem with 
Xen's nr_cpus calculation. In this case, since Xen cuts off recognized 
CPUs at 32, the machine appears to have 16 CPUs on the first and second 
nodes and none on the remaining nodes. Given this asymmetry, the 
calculation of sockets_per_node (which is later used to calculate 
nr_cpus) is incorrect:


pi->sockets_per_node = num_online_cpus() / (num_online_nodes() *
pi->cores_per_socket * pi->threads_per_core);


The most straightforward solution is to remove sockets_per_node, and 
instead determine nr_cpus directly from num_online_cpus.
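
To make the arithmetic concrete, here is a small Python sketch of the indirect calculation. The multiplication back to nr_cpus follows the expression removed from xenbaked.c in the patch below; the topology values are hypothetical and chosen only to show how the integer division can lose information, they are not measurements from the machine described above.

```python
def nr_cpus_via_sockets(online_cpus, online_nodes, cores_per_socket, threads_per_core):
    """Derive nr_cpus the indirect way, through sockets_per_node."""
    # Integer division, as in the C expression quoted above.
    sockets_per_node = online_cpus // (
        online_nodes * cores_per_socket * threads_per_core
    )
    # Multiply back out, as the tools did before the patch.
    return online_nodes * sockets_per_node * cores_per_socket * threads_per_core


# Hypothetical values: 32 recognized cpus spread over a topology that
# would really hold more, so the division truncates.
online_cpus = 32
derived = nr_cpus_via_sockets(
    online_cpus, online_nodes=8, cores_per_socket=4, threads_per_core=2
)
print("derived nr_cpus:", derived, "online cpus:", online_cpus)
```

When the cpus divide evenly across nodes the two numbers agree, which is why the problem only shows up on asymmetric configurations. Reporting the online cpu count directly avoids the round trip.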


This patch has been tested on x86_64 NUMA machines.

--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]

diff -r b4278beaf354 docs/man/xm.pod.1
--- a/docs/man/xm.pod.1	Wed Oct 17 13:12:03 2007 +0100
+++ b/docs/man/xm.pod.1	Wed Oct 17 20:09:46 2007 -0700
@@ -446,7 +446,6 @@ page more readable):
  machine: i686
  nr_cpus: 2
  nr_nodes   : 1
- sockets_per_node   : 2
  cores_per_socket   : 1
  threads_per_core   : 1
  cpu_mhz: 696
diff -r b4278beaf354 tools/python/xen/lowlevel/xc/xc.c
--- a/tools/python/xen/lowlevel/xc/xc.c	Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/lowlevel/xc/xc.c	Wed Oct 17 20:09:46 2007 -0700
@@ -721,7 +721,7 @@ static PyObject *pyxc_physinfo(XcObject 
 "max_cpu_id",   info.max_cpu_id,
 "threads_per_core", info.threads_per_core,
 "cores_per_socket", info.cores_per_socket,
-"sockets_per_node", info.sockets_per_node,
+"nr_cpus",  info.nr_cpus,
 "total_memory", pages_to_kib(info.total_pages),
 "free_memory",  pages_to_kib(info.free_pages),
 "scrub_memory", pages_to_kib(info.scrub_pages),
diff -r b4278beaf354 tools/python/xen/xend/XendNode.py
--- a/tools/python/xen/xend/XendNode.py	Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/xend/XendNode.py	Wed Oct 17 20:09:46 2007 -0700
@@ -475,7 +475,7 @@ class XendNode:
 
 cpu_info = {
 "nr_nodes": phys_info["nr_nodes"],
-"sockets_per_node": phys_info["sockets_per_node"],
+"nr_cpus":  phys_info["nr_cpus"],
 "cores_per_socket": phys_info["cores_per_socket"],
 "threads_per_core": phys_info["threads_per_core"]
 }
@@ -580,17 +580,9 @@ class XendNode:
 str='none\n'
 return str[:-1];
 
-def count_cpus(self, pinfo):
-count=0
-node_to_cpu=pinfo['node_to_cpu']
-for i in range(0, pinfo['nr_nodes']):
-count+=len(node_to_cpu[i])
-return count;
-
 def physinfo(self):
 info = self.xc.physinfo()
 
-info['nr_cpus'] = self.count_cpus(info)
 info['cpu_mhz'] = info['cpu_khz'] / 1000
 
 # physinfo is in KiB, need it in MiB
@@ -600,7 +592,6 @@ class XendNode:
 
 ITEM_ORDER = ['nr_cpus',
   'nr_nodes',
-  'sockets_per_node',
   'cores_per_socket',
   'threads_per_core',
   'cpu_mhz',
diff -r b4278beaf354 tools/python/xen/xm/main.py
--- a/tools/python/xen/xm/main.py	Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/python/xen/xm/main.py	Wed Oct 17 20:09:46 2007 -0700
@@ -1667,9 +1667,8 @@ def xm_info(args):
 "release":   getVal(["software_version", "release"]),
 "version":   getVal(["software_version", "version"]),
 "machine":   getVal(["software_version", "machine"]),
-"nr_cpus":   len(getVal(["host_CPUs"], [])),
+"nr_cpus":   getVal(["cpu_configuration", "nr_cpus"]),
 "nr_nodes":  getVal(["cpu_configuration", "nr_nodes"]),
-"sockets_per_node":  getVal(["cpu_configuration", "sockets_per_node"]),
 "cores_per_socket":  getVal(["cpu_configuration", "cores_per_socket"]),
 "threads_per_core":  getVal(["cpu_configuration", "threads_per_core"]),
 "cpu_mhz":   getCpuMhz(),
diff -r b4278beaf354 tools/xenmon/xenbaked.c
--- a/tools/xenmon/xenbaked.c	Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/xenmon/xenbaked.c	Wed Oct 17 20:09:46 2007 -0700
@@ -460,10 +460,7 @@ unsigned int get_num_cpus(void)
 xc_interface_close(xc_handle);
 opts.cpu_freq = (double)physinfo.cpu_khz/1000.0;
 
-return (physinfo.threads_per_core *
-physinfo.cores_per_socket *
-physinfo.sockets_per_node *
-physinfo.nr_nodes);
+return physinfo.nr_cpus;
 }
 
 
diff -r b4278beaf354 tools/xenstat/libxenstat/src/xenstat.c
--- a/tools/xenstat/libxenstat/src/xenstat.c	Wed Oct 17 13:12:03 2007 +0100
+++ b/tools/xenstat/libxenstat/src/xenstat.c	Wed Oct 17 20:09:46 2007 -0700
@@ -155,9 +155,7 @@ xenstat_node 

Re: [Libvir] [PATCH] Topology fix for no cpus

2007-10-08 Thread beth kon

Daniel Veillard wrote:


On Fri, Oct 05, 2007 at 03:55:14PM -0400, beth kon wrote:
 

I was able to test on a 128-way NUMA box and found a bug. My code did 
not handle the case of no cpus being associated with a node. I decided 
to represent (pretty straightforward decision :-) no cpus as follows in 
the xml...



<cell id='2'>
  <cpus num='0'>
  </cpus>
</cell>

Here is the patch...

Signed-off-by: Beth Kon [EMAIL PROTECTED]
   



 Hi Beth,

the patch makes sense but I think a small improvement is in order:

 


diff -urpN libvirt.orig/src/xend_internal.c libvirt/src/xend_internal.c
--- libvirt.orig/src/xend_internal.c 2007-10-03 19:27:25.0 -0700
+++ libvirt/src/xend_internal.c 2007-10-04 05:41:13.0 -0700
@@ -1989,6 +1989,15 @@ sexpr_to_xend_topology_xml(virConnectPtr
/* get list of cpus associated w/ single cell */
while (1) {
if ((len = getNumber(offset, &cpuNum)) < 0) {
+if (!strncmp (offset, "no cpus", 7)){
+*(cpuIdsPtr++) = -1;
+break;
+} else {
+virXendError(conn, VIR_ERR_XEN_CALL, "topology string syntax error");
+goto error;
+}
+}
+if ((len = getNumber(offset, &cpuNum)) < 0) {
   



 Seems to me that at this point the test should read
  if (len < 0) {

as getNumber has no side effect and offset or cpuNum are not changed.
Actually I would just move the
  len = getNumber(offset, &cpuNum)
as a separate statement out at the beginning of the block of the while loop
for clarity of the code,

Daniel

 


Yes, that makes perfect sense. Thanks.
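
For readers following along, here is a Python sketch of the loop shape Daniel describes: the number is parsed once at the top of each iteration, and only on failure is the "no cpus" marker checked. The input format (a comma-separated cpu list per cell) is a simplification I am assuming for illustration, and the names leading_number and parse_cell_cpus are made up; this is not the xend_internal.c code.

```python
NO_CPUS = "no cpus"


def leading_number(text, offset):
    """Return (value, length) for the digits at `offset`, or (None, -1)."""
    end = offset
    while end < len(text) and "0" <= text[end] <= "9":
        end += 1
    if end == offset:
        return None, -1
    return int(text[offset:end]), end - offset


def parse_cell_cpus(text):
    """Parse '0,1,2' into [0, 1, 2]; parse 'no cpus' into []."""
    cpus = []
    offset = 0
    while True:
        # Single call at the top of the loop; the result is tested below.
        value, length = leading_number(text, offset)
        if length < 0:
            # Not a number: acceptable only if the cell has no cpus at all.
            if not cpus and text.startswith(NO_CPUS, offset):
                return []
            raise ValueError("topology string syntax error")
        cpus.append(value)
        offset += length
        if offset == len(text):
            return cpus
        if text[offset] != ",":
            raise ValueError("topology string syntax error")
        offset += 1
```

An empty list plays the role of the -1 sentinel in the C patch: the caller sees a cell with zero cpus instead of an error.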

--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



[Libvir] [PATCH] Topology fix for no cpus

2007-10-08 Thread beth kon

Here is a resubmission of this patch, corrected.


--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]

Index: src/xend_internal.c
===
RCS file: /data/cvs/libvirt/src/xend_internal.c,v
retrieving revision 1.146
diff -u -r1.146 xend_internal.c
--- src/xend_internal.c	5 Oct 2007 01:08:17 -	1.146
+++ src/xend_internal.c	8 Oct 2007 14:37:58 -
@@ -1988,9 +1988,15 @@
 offset++;
 /* get list of cpus associated w/ single cell */
 while (1) {
-if ((len = getNumber(offset, &cpuNum)) < 0) {
-virXendError(conn, VIR_ERR_XEN_CALL, " topology string syntax error");
-goto error;
+len = getNumber(offset, &cpuNum);
+if (len < 0) {
+if (!strncmp (offset, "no cpus", 7)){
+*(cpuIdsPtr++) = -1;
+break;
+} else {
+virXendError(conn, VIR_ERR_XEN_CALL, "topology string syntax error");
+goto error;
+}
 }
 offset += len;
 next = *(offset);
@@ -2058,6 +2064,8 @@
 if (r == -1) goto vir_buffer_failed;
 
 for (i = 0; i < cellCpuCount; i++) {
+if (*(iCpuIdsPtr + i) == -1)
+break;
 r = virBufferVSprintf (xml, "\
 <cpu id='%d'/>\n", *(iCpuIdsPtr + i));
 if (r == -1) goto vir_buffer_failed;


Re: [Libvir] [RFC][PATCH 1/2] Tested NUMA patches for available memory and topology

2007-09-28 Thread beth kon

Richard W.M. Jones wrote:


beth kon wrote:


Patch for accessing available memory.



--- libvirt.danielpatch/src/driver.h 2007-09-11 15:29:43.0 -0400
+++ libvirt.cellsMemory/src/driver.h 2007-09-27 18:39:52.0 -0400

@@ -258,8 +258,9 @@ typedef virDriver *virDriverPtr;
 typedef int
 (*virDrvNodeGetCellsFreeMemory)
 (virConnectPtr conn,
- unsigned long *freeMems,
- int nbCells);
+ long long *freeMems,

This needs to be declared unsigned long long.  If you configure with 
--enable-compile-warnings=error then the compiler will catch these 
sorts of errors.


--- libvirt.danielpatch/src/xend_internal.c 2007-09-10 17:35:39.0 -0400
+++ libvirt.cellsMemory/src/xend_internal.c 2007-09-27 18:39:52.0 -0400

@@ -1954,6 +1954,8 @@ xenDaemonOpen(virConnectPtr conn, const
 {
 xmlURIPtr uri = NULL;
 int ret;
+
+virNodeInfo nodeInfo;

This variable is never used.

[ And from part 2/2 of the patch ]

+ * getNumber:

sscanf?


The reason I created this is because I also wanted to find the length of 
the segment so I could add it to the parsing offset to check what was 
next in the string.  That level of checking may be unnecessary 
(overkill), and in any case could be more easily achieved using 
something like sscanf for some token portion of the string. As I said, I 
am *certain* there is a prettier way to do this!
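
As one possible "prettier way", here is a Python sketch of a helper that returns both the parsed value and the number of characters consumed, so the caller can advance its parsing offset. It uses a regular expression instead of a hand-written digit loop; the name get_number mirrors the patch, but the signature and return shape are my own assumption, not the C function.

```python
import re

_NUMBER = re.compile(r"[0-9]+")


def get_number(text, offset=0):
    """Return (value, length) for the digits starting at `offset`.

    Returns None when no digit is found at that position, so the
    caller can check what is there instead.
    """
    match = _NUMBER.match(text, offset)
    if match is None:
        return None
    return int(match.group()), match.end() - offset
```

The caller adds the returned length to its offset and then inspects the next character, which is the extra checking step described above.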




[ And in general ]

I compiled this version and was hoping to test it, but I don't seem to 
have the right combination of Xen to make it work.  At least I don't 
see any topology section in the XML capabilities.  What patches do I 
need for Xen to make this work?  I have a 2 socket AMD machine which I 
assume should work with this.


Daniel has built the kernel and xen rpms with the needed patches.



Rich.




--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] [RFC][PATCH 1/2] Tested NUMA patches for available memory and topology

2007-09-28 Thread beth kon

Daniel P. Berrange wrote:


On Fri, Sep 28, 2007 at 02:52:51PM +0100, Richard W.M. Jones wrote:
 


# src/virsh capabilities
[...]
 <topology>
   <cells num='1'>
     <cell id='0'>
       <cpus num='4'>
         <cpu id='0'/>
         <cpu id='1'/>
         <cpu id='2'/>
         <cpu id='3'/>
       </cpus>
     </cell>
   </cells>
 </topology>
   



Do we really need such verbose XML? At the very least the 'num' attribute
is redundant, since you can trivially do count(/topology/cells/cell) or
count(/topology/cells/[EMAIL PROTECTED]/cpus/cpu) XPath exprs in both cases.

The addition of extra tags every time we have a list is not the style we
have normally used in libvirt. eg, we don't use

   <disks>
     <disk>
       ..
     </disk>
     <disk>
       ..
     </disk>
   </disks>

to surround the list of disks in a domain.

I'd prefer to see it looking more like this:

  <topology>
    <cell id='0'>
      <cpu id='0'/>
      <cpu id='1'/>
      <cpu id='2'/>
      <cpu id='3'/>
    </cell>
  </topology>

Regards,
Dan.
 

That would simplify the code since the counts wouldn't need to be known 
up front. This was the format suggested by Daniel V and I used it, 
assuming he knows more about libvirt's desired/required xml structure.
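
To illustrate the point that the counts can be derived from the structure, here is a short Python sketch that parses the flatter format Dan suggests and counts cells and cpus without any num attribute. The sample document is copied from his proposal; the variable names are mine.

```python
import xml.etree.ElementTree as ET

# The flatter topology format proposed above, with no 'num' attributes.
TOPOLOGY = """
<topology>
  <cell id='0'>
    <cpu id='0'/>
    <cpu id='1'/>
    <cpu id='2'/>
    <cpu id='3'/>
  </cell>
</topology>
"""

root = ET.fromstring(TOPOLOGY)
cells = root.findall("cell")

# Equivalent of count(/topology/cell)
cell_count = len(cells)

# Equivalent of counting the cpu children of each cell
cpus_per_cell = {cell.get("id"): len(cell.findall("cpu")) for cell in cells}

print(cell_count, cpus_per_cell)
```

A consumer that needs the counts up front can compute them this way after parsing, so the producer does not have to know them before it starts writing the XML.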


--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] [RFC][PATCH 1/2] Tested NUMA patches for available memory and topology

2007-09-28 Thread beth kon

Richard W.M. Jones wrote:


beth kon wrote:


Patch for accessing available memory.



--- libvirt.danielpatch/src/driver.h 2007-09-11 15:29:43.0 -0400
+++ libvirt.cellsMemory/src/driver.h 2007-09-27 18:39:52.0 -0400

@@ -258,8 +258,9 @@ typedef virDriver *virDriverPtr;
 typedef int
 (*virDrvNodeGetCellsFreeMemory)
 (virConnectPtr conn,
- unsigned long *freeMems,
- int nbCells);
+ long long *freeMems,

This needs to be declared unsigned long long.  If you configure with 
--enable-compile-warnings=error then the compiler will catch these 
sorts of errors.


--- libvirt.danielpatch/src/xend_internal.c 2007-09-10 17:35:39.0 -0400
+++ libvirt.cellsMemory/src/xend_internal.c 2007-09-27 18:39:52.0 -0400

@@ -1954,6 +1954,8 @@ xenDaemonOpen(virConnectPtr conn, const
 {
 xmlURIPtr uri = NULL;
 int ret;
+
+virNodeInfo nodeInfo;
This variable is never used.


Somehow I missed this part of the note last time. Thanks for the catches.


[ And from part 2/2 of the patch ]

+ * getNumber:

sscanf?

[ And in general ]

I compiled this version and was hoping to test it, but I don't seem to 
have the right combination of Xen to make it work.  At least I don't 
see any topology section in the XML capabilities.  What patches do I 
need for Xen to make this work?  I have a 2 socket AMD machine which I 
assume should work with this.



Daniel's RPMs are at
http://veillard.com/NUMA/


Rich.




--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] [RFC][PATCH 0/2] Tested NUMA patches for available memory and topology

2007-09-28 Thread beth kon

Ryan Harper wrote:


* Elizabeth Kon [EMAIL PROTECTED] [2007-09-28 11:27]:
 


Daniel Veillard wrote:

   

- isolate as a separate call what is the total sum of free memory available
 on the Node


 


There is currently no way to get that information from Xen.
   



no, we can always get a total of _free_ memory, we just don't have a
call for _total_ ram (ie, free and non-free) -- only what's in the heap
(free mem).

 

I asked DV about this off-list and he said he actually wanted total, not 
free. DV please correct me if I misunderstood.



- on NUMA boxes in the capability dump I would like to see the
 amount of memory available on the cell see
 https://www.redhat.com/archives/libvir-list/2007-September/msg00015.html
 <memory size='2097152'/>
 in the topology example.
- now that the code is in CVS reorganize a bit for example move back generic
 code to xen_unified.c

Anyway it looks like we are in good shape,

thanks a lot !

Daniel



 


   



 




--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]



Re: [Libvir] [RFC][PATCH 0/2] Tested NUMA patches for available memory and topology

2007-09-28 Thread beth kon

Daniel Veillard wrote:


 - isolate as a separate call what is the total sum of free memory available
   on the Node
 


There is currently no way to get that information from Xen.


 - on NUMA boxes in the capability dump I would like to see the
   amount of memory available on the cell see
   https://www.redhat.com/archives/libvir-list/2007-September/msg00015.html
   <memory size='2097152'/>
   in the topology example.
 - now that the code is in CVS reorganize a bit for example move back generic
   code to xen_unified.c

Anyway it looks like we are in good shape,

 thanks a lot !

Daniel

 




--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]


